test: add test for impure function correlation behavior #9014

NickCrews · 2024-04-18T19:47:55Z

Related to #8921; trying to write down exactly what the expected behavior is.

I figure we can use this PR to hash out exactly what we want the semantics to be; then the other discussions might be easier because our goal is written down precisely somewhere. Please let me know if you agree or disagree with this behavior, or if there are other tests we should add.

Need to fix the UDF test case in a followup. Also, I wasn't sure where to put these tests; I put them in their own little file, but if you point me elsewhere I will move them.

NickCrews · 2024-04-18T19:56:41Z

oooh, these CI failures are revealing differences in the backend behaviors... wheeee into the 🐰 🕳️ we go!

ibis/backends/tests/test_impure.py

NickCrews · 2024-04-19T00:41:58Z

I can think of two reasons a user might care about the semantics here, and I hope we can support both of their needs:

1. Impureness/correlatedness. If a function is impure, they may want it executed once or many times, depending on whether they want the results to be correlated. See this example.
2. Performance. If some computation is slow, they only want it to happen a single time.
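To make the correlation question concrete, here is a toy pure-Python model (not ibis code; every function name is hypothetical) of what happens when a column defined by an impure expression, such as a random number, is referenced twice downstream. If the compiler inlines the expression into each use site, each reference re-evaluates it and the two columns come out uncorrelated; if the value is computed once (for example in its own CTE) and then referenced, both columns hold the same value.

```python
import random


def inline_references(n_rows, n_refs, rng):
    """Each downstream reference re-evaluates the impure expression,
    as if `random()` were pasted into every use site of the SQL."""
    return [[rng.random() for _ in range(n_refs)] for _ in range(n_rows)]


def materialize_then_reference(n_rows, n_refs, rng):
    """The impure expression is evaluated once per row (as if in its own
    CTE), and every downstream reference reads that single value."""
    rows = []
    for _ in range(n_rows):
        value = rng.random()  # evaluated exactly once per row
        rows.append([value] * n_refs)
    return rows


def is_correlated(rows):
    """True if every reference within every row saw the same value."""
    return all(len(set(row)) == 1 for row in rows)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the demo is reproducible
    print(is_correlated(inline_references(5, 2, rng)))  # False
    print(is_correlated(materialize_then_reference(5, 2, rng)))  # True
```

Neither behavior is "wrong" in this model; the point is that the two strategies give observably different results, so the user needs a way to choose.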

So I think this means that we, as ibis authors, can't assume what the user's goals are or how many times they want an expression executed. Therefore, we shouldn't do any clever rewrites or merging of selects. I think we need to keep a closer 1:1 correspondence between what the user writes and the SQL we produce: every time the user calls .select(), .mutate(), etc. (except for simple column renamings, and maybe a few other cases), that should produce exactly one more CTE (`WITH ... AS (SELECT ...)`) in the generated SQL.
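A minimal sketch of the "one CTE per .select()" proposal, assuming a hypothetical compiler where each user step is just a mapping from output alias to SQL expression (the `Select` class and `compile_one_cte_per_step` function are illustrative, not ibis internals, and the sketch ignores the column-renaming exception mentioned above). Because steps are never merged, an impure expression the user wrote once appears exactly once in the SQL text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Select:
    """One user-level .select()/.mutate() call: output alias -> SQL expression."""

    columns: dict[str, str]


def compile_one_cte_per_step(table: str, steps: list[Select]) -> str:
    """Emit exactly one CTE per step; each CTE reads only from the previous one.

    Expressions are never merged across steps, so an expression written once
    by the user appears once in the SQL text.
    """
    if not steps:
        return f"SELECT * FROM {table}"
    ctes = []
    source = table
    for i, step in enumerate(steps):
        name = f"t{i}"
        cols = ", ".join(f"{expr} AS {alias}" for alias, expr in step.columns.items())
        ctes.append(f"{name} AS (SELECT {cols} FROM {source})")
        source = name  # the next step reads from this CTE
    return f"WITH {', '.join(ctes)} SELECT * FROM {source}"


# Hypothetical usage: a mutate that adds a random column, then a select
# that references that column twice.
sql = compile_one_cte_per_step(
    "t",
    [Select({"common": "random()"}), Select({"a": "common", "b": "common"})],
)
print(sql)
```

Here `random()` stays inside `t0`, and `t1` only references `common`, so the SQL expresses "compute once, use twice". Whether the engine honors that is a separate question.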

IDK, what do you think of this train of reasoning? I'm fairly convinced that those two use cases are the requirements for success, but perhaps there is a different/better way of accomplishing that goal.

kszucs · 2024-04-22T15:58:41Z

@NickCrews please rebase to test with #9023 change included

cpcloud · 2024-04-22T16:32:50Z

@NickCrews My only objection would be that

> performance. If some computation is slow, they only want it to happen a single time.

is not something we can enforce even if we never merge any select statements. This kind of guarantee is at the level of the query engine.
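This point can be illustrated with another toy model (hypothetical names, not any real engine's API): the same CTE-structured query can be executed either by materializing the CTE or by inlining it into each reference, and only the engine makes that choice. Counting calls to an "expensive" function shows that the SQL text alone does not fix how many times the function runs.

```python
from typing import Callable, Iterable


class CountingCall:
    """Wrap a function and count how many times it is invoked."""

    def __init__(self, fn: Callable[[int], int]):
        self.fn = fn
        self.calls = 0

    def __call__(self, x: int) -> int:
        self.calls += 1
        return self.fn(x)


def run_plan(rows: Iterable[int], expensive, n_refs: int, inline_ctes: bool):
    """Toy execution of:
        WITH c AS (SELECT expensive(x) AS v FROM t) SELECT v, v, ... FROM c
    with `n_refs` references to v. `inline_ctes` stands in for an optimizer
    decision that the SQL author does not control.
    """
    out = []
    for x in rows:
        if inline_ctes:
            # Optimizer substituted the CTE body into every reference.
            out.append(tuple(expensive(x) for _ in range(n_refs)))
        else:
            # Optimizer materialized the CTE: compute once, reuse the value.
            v = expensive(x)
            out.append((v,) * n_refs)
    return out


for inline in (False, True):
    slow_square = CountingCall(lambda x: x * x)
    result = run_plan([1, 2, 3], slow_square, n_refs=2, inline_ctes=inline)
    print(inline, result, slow_square.calls)
```

For a pure function both strategies return the same rows and differ only in cost (3 calls versus 6 here); for an impure function they would also differ in results, which is the correlation question above.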

NickCrews · 2024-04-23T07:11:47Z

@cpcloud Yup, you're right about guarantees; "suggesting" to the backend is the best we can do.

What do you think in general of my proposal of "one CTE per .select"? I'm not sure whether you're skeptical of the whole thing or just the performance claims. Thanks!

NickCrews · 2024-04-23T07:17:35Z

I'm trying to decide what is higher priority:

An implementation that is faster 90% of the time, but does clever things and therefore can't be tuned by the user in the 10% of cases where they need it

Vs

An implementation that is a bit slower in most cases, but is always fine-tunable to get the performance you need in the edge cases.

NickCrews · 2024-04-23T19:22:12Z

slowly going through the backends and adding the correct marks for each kind of failure...

Need to fix a few broken cases. Related to ibis-project#8921; trying to write down exactly what the expected behavior is.

NickCrews · 2024-05-07T01:48:44Z

@cpcloud @kszucs I think this is ready for review whenever you get the chance! I think this is the groundwork for defining the current state, and after we get this in then we can start talking about what we think ideal behavior should be, and how to get there.

NickCrews · 2024-05-21T21:13:54Z

Anything I can do here to help move this forward?

NickCrews force-pushed the test-correlation branch from 0992a3f to 48d4a51 April 18, 2024 19:51

cpcloud reviewed Apr 18, 2024

ibis/backends/tests/test_impure.py (outdated, resolved)

NickCrews force-pushed the test-correlation branch from 48d4a51 to 1fea07f April 19, 2024 00:27

NickCrews mentioned this pull request Apr 19, 2024

bug: inlining expressions leads to wrong results for non-pure functions #8921 (Open)

NickCrews force-pushed the test-correlation branch 2 times, most recently from 5d9a1ec to 21b180c April 19, 2024 20:22

NickCrews force-pushed the test-correlation branch 2 times, most recently from 7700941 to d14384c April 23, 2024 19:21

NickCrews force-pushed the test-correlation branch 3 times, most recently from 53c6941 to ee0ae0c April 23, 2024 21:33

test: add test for impure function correlation behavior

37ae5dd

NickCrews force-pushed the test-correlation branch from ee0ae0c to 37ae5dd April 23, 2024 22:47

NickCrews enabled auto-merge (rebase) April 23, 2024 23:13

NickCrews mentioned this pull request Apr 25, 2024

bug: merging selections combines filters in incorrect way #9058 (Closed)

NickCrews requested review from cpcloud and kszucs May 7, 2024 01:48
