Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Always use person_distinct_id_override #22347

Merged
merged 15 commits into from
Jun 5, 2024

Conversation

timgl
Copy link
Collaborator

@timgl timgl commented May 17, 2024

Problem

For one version of PoE we were still using person_override instead of the correct person_distinct_id_override

Two types of people that would run into this

  1. Anyone who opted in to poev2 in the old settings page
  2. Anyone opting in to this on the new settings page
image

This will somewhat change data for these groups. I think this is acceptable as they opted in to this, and it's better to be correct. It will also reduce confusion for anyone now opting in to this, as the new settings don't explain the trade off the old settings did.

Changes

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@timgl timgl requested a review from fuziontech as a code owner May 21, 2024 11:26
@Twixes
Copy link
Collaborator

Twixes commented May 21, 2024

I think the base commit of this PR is wrong, so the diff is much larger than it should be @timgl

Copy link
Contributor

Size Change: +172 B (+0.02%)

Total Size: 1.05 MB

ℹ️ View Unchanged
Filename Size Change
frontend/dist/toolbar.js 1.05 MB +172 B (+0.02%)

compressed-size-action

@timgl timgl force-pushed the always-use-distinct-id-overrides branch from 300f387 to f89b102 Compare May 21, 2024 11:48
@timgl
Copy link
Collaborator Author

timgl commented May 21, 2024

@Twixes done!

Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Collaborator

@Twixes Twixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but results are different in a couple of tests, not sure why

@tkaemming
Copy link
Contributor

tkaemming commented May 21, 2024

Looks good to me, but results are different in a couple of tests, not sure why

This also looks good to me in general. I spent a bit of time trying to figure out what was going on with these tests, too.

The issue with posthog/hogql/test/test_modifiers.py::TestModifiers::test_modifiers_persons_on_events_mode_mapping has a pretty straightforward fix: 8977334

The other test failures are more confusing: I assumed they were because the overrides created in the tests were still being written to the old person_overrides table and fixed that with 55a3179, but that doesn't seem to fix anything and actually causes more test failures (7 failures with persons-on-events test flag off - up from 3 if you exclude the modifiers test, and tests then also start failing with the persons-on-events test flag on where they weren't before.)

It seems like these tests are still using the legacy event query. When patched with:

diff --git a/posthog/queries/event_query/event_query.py b/posthog/queries/event_query/event_query.py
index 49d4565a94..56ed872b81 100644
--- a/posthog/queries/event_query/event_query.py
+++ b/posthog/queries/event_query/event_query.py
@@ -134,6 +134,8 @@ class EventQuery(metaclass=ABCMeta):
         return f"{self.DISTINCT_ID_TABLE_ALIAS}.person_id"
 
     def _get_person_ids_query(self, *, relevant_events_conditions: str = "") -> str:
+        raise NotImplementedError("using legacy event query")
+
         if not self._should_join_distinct_ids:
             return ""
 

… running py.test -k person_on_events_v2 results in:

FAILED ee/clickhouse/queries/test/test_paths.py::TestClickhousePaths::test_person_on_events_v2 - NotImplementedError: using legacy event query
FAILED ee/clickhouse/queries/test/test_retention.py::TestClickhouseRetention::test_groups_filtering_person_on_events_v2 - NotImplementedError: using legacy event query
FAILED ee/clickhouse/views/test/test_clickhouse_stickiness.py::TestClickhouseStickiness::test_stickiness_with_person_on_events_v2 - AssertionError: b'{"type":"server_error","code":"error","detail":"A server error occurred.","attr":null}'
FAILED posthog/queries/funnels/test/test_funnel.py::TestFOSSFunnel::test_funnel_events_with_person_on_events_v2 - NotImplementedError: using legacy event query
FAILED posthog/queries/test/test_retention.py::TestFOSSRetention::test_month_interval_with_person_on_events_v2 - NotImplementedError: using legacy event query
FAILED posthog/queries/test/test_trends.py::TestTrends::test_same_day_with_person_on_events_v2 - NotImplementedError: using legacy event query
FAILED posthog/queries/test/test_trends.py::TestTrends::test_same_day_with_person_on_events_v2_latest_override - NotImplementedError: using legacy event query

Not sure what the plan is with updating or replacing these tests to use the new queries, but hopefully this saves some debugging time for you.

Also, once we're not using this table in production queries any longer, I can take care of removing the override worker deployment and other plugin-server bits.

@Twixes
Copy link
Collaborator

Twixes commented May 22, 2024

Ah, all tests in posthog/queries/ use the legacy EventQuery, but they aren't actually being used in production anymore. The new engine lives in posthog/hogql_queries/.
But I don't think any of your failures with the raise NotImplementedError("using legacy event query") patch are of tests that fail on this PR. The failing ones are mostly posthog/hogql_queries/, which for sure uses the new table. So I think these differences are real. I don't actually have context on what use_distinct_id_overrides means though, so I think @mariusandra or @timgl probably have a better idea on what's expected and what isn't

@tkaemming
Copy link
Contributor

tkaemming commented May 22, 2024

You're right, and I misspoke here:

I assumed they were because the overrides created in the tests were still being written to the old person_overrides table and fixed that with 55a3179, but that doesn't seem to fix anything

That change does fix something — the errors in posthog/hogql_queries/! But it also also breaks the old tests in posthog/queries/ because the change to use person_distinct_id_overrides instead of person_overrides was never backported to the legacy queries (intentionally.)

To me that seems to suggest a discrepancy in test coverage between the old queries and the new queries (otherwise many more would have been broken here in this PR.) Seems like the tests that break with 55a3179 (the ones listed above) should either be ported to the new system, or deleted to unblock this?

@timgl timgl force-pushed the always-use-distinct-id-overrides branch from 2a51e72 to 3946521 Compare May 24, 2024 13:13
@posthog-bot
Copy link
Contributor

It looks like the code of hogql-parser has changed since last push, but its version stayed the same at 1.0.8. 👀
Make sure to resolve this in hogql_parser/setup.py before merging!

@timgl
Copy link
Collaborator Author

timgl commented May 24, 2024

@tkaemming I fixed the old queries so they use the new overrides too.

Copy link
Collaborator

@Twixes Twixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tkaemming tkaemming merged commit 1ec3b5f into master Jun 5, 2024
76 checks passed
@tkaemming tkaemming deleted the always-use-distinct-id-overrides branch June 5, 2024 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants