Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to Table.union #9968

Merged
merged 45 commits into from
May 22, 2024
Merged

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented May 16, 2024

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@radeusgd radeusgd force-pushed the wip/radeusgd/9952-update-table-union branch 4 times, most recently from 7409ed8 to cd6822e Compare May 20, 2024 16:02
@radeusgd radeusgd force-pushed the wip/radeusgd/9952-update-table-union branch from cd6822e to ea43cca Compare May 21, 2024 15:27
Comment on lines 349 to 362
date_time_pending = if setup.test_selection.date_time.not then "Date/Time operations are not supported."
group_builder.specify "should warn when converting a Date to Date_Time" pending=date_time_pending <|
t1 = table_builder [["D", [Date_Time.new 2024 5 16 16 48 23]]]
t2 = table_builder [["D", [Date.new 2019 10 23, Date.new 2020]]]

t6 = call_union [t1, t4]
expect_column_names ["A"] t6
t6.at "A" . value_type . should_be_a (Value_Type.Decimal ...)
t6.at "A" . to_vector . should_equal [0, 1, 2, 2^100, 2^10, 2]
action = call_union [t1, t2] on_problems=_
tester table =
IO.println ""
table.print
expect_column_names ["D"] table
table.at "D" . value_type . should_equal Value_Type.Date_Time
table.at "D" . to_vector . should_equal_tz_agnostic [Date_Time.new 2024 5 16 16 48 23, Date_Time.new 2019 10 23 0 0 0, Date_Time.new 2020 1 1 0 0 0]
problems = [Mixing_Date_Time_Types.Date_To_Date_Time "D"]
Problems.test_problem_handling action problems tester
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example shows why the warning is needed.

When running in-memory, the coercion happens as +- expected:

   | D
---+------------------------------------------
 0 | 2024-05-16 16:48:23+02:00[Europe/Warsaw]
 1 | 2019-10-23 00:00:00+02:00[Europe/Warsaw]
 2 | 2020-01-01 00:00:00+01:00[Europe/Warsaw]

But when running on Postgres, it's a bit different story:

 D
----------------------
 2024-05-16 14:48:23Z
 2019-10-22 22:00:00Z
 2019-12-31 23:00:00Z

This behaviour actually does make sense - the Date is converted to Date_Time by setting it to midnight at the timezone it was inputted in - which is Europe/Warsaw in my case. However then, when we round-trip it back from Postgres to JDBC, due to how Postgres stores the data, it is sent to us in UTC. Hence it is no longer 00:00. To make matters worse - it is sometimes 22:00 and sometimes 23:00 - because Europe/Warsaw has DST but UTC doesn't! This is why a warning is useful to let the user know that things may not necessarily work as they expect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a slightly safer way to do that could be to always interpret the Date as 00:00 in UTC instead of default zone. That would make Postgres and in-memory consistent. But it may still not be a catch-all solution once we add more DB backends. This kind of coercion is bound to have some implicit assumptions, so I think best course of action is relying on backend's default and warning (current behaviour).

@radeusgd radeusgd self-assigned this May 21, 2024
@radeusgd radeusgd marked this pull request as ready for review May 21, 2024 15:46
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to have this done so quickly.
A minor nit but nothing more.

Comment on lines +29 to +35
## PRIVATE
The default widget for `Columns_To_Keep`.
It does not display the internal `In_Any_Warn_On_Missing` variant, since
that variant is only meant to be used as the default value.
default_widget -> Widget =
make_single_choice <|
["In_Any", "In_All", "In_List"].map c-> [c, ".."+c]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for now but feels like we should do a construct where we pass the ctors in and use meta to do this.

@radeusgd
Copy link
Member Author

Showing our widgets in action:
https://github.com/enso-org/enso/assets/1436948/e3ab0f33-77b9-4e9c-923d-d12c0b946d6e

One thing I don't like is that when we click + and add a new _ entry, this results in an ugly error:
image

That's why we usually rely on some nicer default. But for Table what may that be? Any suggestions?

I guess we could have Table.new [["X", []]] but I'm not convinced that's the right thing to do. IMHO I'd leave this as is for now - even if not ideal.

@radeusgd
Copy link
Member Author

For now manually tested the scenario of mixing Date_Time with and without timezone. If needed, we can add a Postgres-specific test (no other backend currently supports both types).

image

## PRIVATE
A warning indicating that no common type could be found, so the operation
had to fall back to converting all values to text.
Warning (types : Vector Value_Type) (related_column_name:Text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename this to Warning_Convert_To_Text or Convert_To_Text, since we might have other approaches for handling No_Common_Type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do

t5.at "A" . to_vector . should_equal [1.0, 0.0, 1.0, 1.5, 0.0, 2.0]

## TODO once the https://github.com/enso-org/enso/issues/8355 is implemented, the behaviour of this test will probably change:
When merging Float and big Integers, maybe we shall convert to BigDecimal, not to Float?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss this. Conversion to Decimal would be the most general way to do it, but Decimals are a lot slower and we might want to prevent conversion to it in cases like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I guess we can decide to not do it in that case.

@radeusgd radeusgd force-pushed the wip/radeusgd/9952-update-table-union branch from 716af01 to 04527d5 Compare May 21, 2024 18:13
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label May 21, 2024
@mergify mergify bot merged commit 1e0649f into develop May 22, 2024
34 checks passed
@mergify mergify bot deleted the wip/radeusgd/9952-update-table-union branch May 22, 2024 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Table.union
3 participants