Update Table.union #9952

Closed
jdunkerley opened this issue May 14, 2024 · 5 comments · Fixed by #9968
Comments

@jdunkerley

Allow more relaxed type coercion.

  • If there is a mixed column, then the result will be mixed. This is the only way to get a Mixed column.
  • Widening includes text (so any pair of types can be widened far enough to merge). If binary columns are involved, it should error.
  • By default we warn if we take a non-text type to text.
  • Warn if potential loss of accuracy (e.g. Int64 merging to a Float64).
    • Int32 and Float32 can merge to Float64 (no warn).
  • Warn if attaching a TZ to a non-TZ column.
    • Date ==> Date_Time with 00:00:00 and attach a TZ (hence warning).
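
The rules above can be sketched as a pairwise type merge. This is a Python model for illustration only, not the Enso implementation: the `T` enum, the `merge` function and the warning strings are hypothetical names. It also assumes that two identical types merge to themselves, that Int32 and Int64 widen to Int64, that Binary errors only when merged with a different type, and that otherwise incompatible types fall back to Text.

```python
from enum import Enum


class T(Enum):
    # Hypothetical stand-ins for the column value types named in this issue.
    INT32 = "Int32"
    INT64 = "Int64"
    FLOAT32 = "Float32"
    FLOAT64 = "Float64"
    TEXT = "Text"
    BINARY = "Binary"
    DATE = "Date"
    DATE_TIME = "Date_Time"
    MIXED = "Mixed"


INTS = {T.INT32, T.INT64}
FLOATS = {T.FLOAT32, T.FLOAT64}


def merge(a, b):
    """Return (merged_type, warnings) for two column types being unioned."""
    if a == b:
        return a, []
    # A Mixed input always yields a Mixed result.
    if T.MIXED in (a, b):
        return T.MIXED, []
    # Assumption: Binary cannot be widened to or from anything else.
    if T.BINARY in (a, b):
        raise TypeError(f"cannot merge {a.value} with {b.value}")
    pair = {a, b}
    if pair <= INTS:
        return T.INT64, []  # assumption: plain integer widening
    if pair <= FLOATS:
        return T.FLOAT64, []
    if pair <= INTS | FLOATS:
        # Integer + float: Float64 holds every Int32 exactly, but not every Int64.
        if T.INT64 in pair:
            return T.FLOAT64, ["possible loss of accuracy: Int64 -> Float64"]
        return T.FLOAT64, []
    if pair == {T.DATE, T.DATE_TIME}:
        # Dates become midnight Date_Times with a time zone attached.
        return T.DATE_TIME, ["Date -> Date_Time at 00:00:00, time zone attached"]
    # Everything else is widened to Text, warning for each non-text input.
    warnings = [f"{t.value} converted to Text" for t in (a, b) if t != T.TEXT]
    return T.TEXT, warnings
```

For example, `merge(T.INT32, T.FLOAT32)` gives Float64 with no warnings, while `merge(T.INT64, T.FLOAT64)` gives Float64 plus a loss-of-accuracy warning.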

Changes to the API:

  • allow_type_widening is removed.
  • keep_unmatched_columns is removed.
  • New keep_columns argument with a new Atom type (feel free to name better!):
type Columns_To_Keep
    ## All columns are kept.
    In_Any

    ## Only columns present in all tables are kept.
    In_All

    ## Specific list of column names
    In_List (column_names:Vector Text)
  • If ..By_Position and ..In_List are used together, throw an error.
  • The tables argument should have a vector editor as its default widget.
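
A rough Python model of how the proposed keep_columns options could pick the result columns. Everything here is a hypothetical sketch rather than the Enso API: `InAny`, `InAll`, `InList` and `result_columns` are illustrative names, tables are reduced to lists of column names, and only name-based matching is modelled (for by-position matching, only the error case from the list above is shown).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InAny:
    """Keep every column that appears in any table."""


@dataclass(frozen=True)
class InAll:
    """Keep only the columns that appear in every table."""


@dataclass(frozen=True)
class InList:
    """Keep exactly the listed column names."""
    column_names: tuple


def result_columns(tables, keep, by_position=False):
    """tables is a list of column-name lists; returns the names to keep."""
    if isinstance(keep, InList):
        # A list of names makes no sense when columns are matched by position.
        if by_position:
            raise ValueError("In_List cannot be combined with By_Position")
        return list(keep.column_names)
    # Collect names in first-seen order across all tables.
    seen = []
    for columns in tables:
        for name in columns:
            if name not in seen:
                seen.append(name)
    if isinstance(keep, InAny):
        return seen
    if isinstance(keep, InAll):
        return [n for n in seen if all(n in columns for columns in tables)]
    raise TypeError(f"unknown keep_columns value: {keep!r}")
```

With tables `[["A", "B"], ["B", "C"]]`, `InAny()` keeps A, B and C, `InAll()` keeps only B, and `InList(("A", "C"))` keeps A and C.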
@jdunkerley jdunkerley added the -libs Libraries: New libraries to be implemented label May 14, 2024
@jdunkerley jdunkerley added this to the Beta Release milestone May 14, 2024
@enso-bot

enso-bot bot commented May 16, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-05-15):

Progress: Some additional fixes in the DataLink PR. Merged a PR preparing for the Cloud endpoint move. Prepared a PR reverting a temporary fallback. Started work on the Table.union improvement: clarifications and documenting the new behaviour to better understand what we need. It should be finished by 2024-05-21.

Next Day: Next day I will be working on the same task. Update tests. Update implementation.

@enso-bot

enso-bot bot commented May 17, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-05-16):

Progress: Updated Union tests to reflect the new semantics. Some minor docs clarifications. Last minute improvements to Datalink PR, finally merged. It should be finished by 2024-05-21.

Next Day: Next day I will be working on the same task. Update implementation.

@enso-bot

enso-bot bot commented May 20, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-05-17):

Progress: Implemented the new column matching logic, incl. In_List and error handling. Updating the column type unification logic - WIP. It should be finished by 2024-05-21.

Next Day: Next day I will be working on the same task. Continue implementation of new type merging logic for Union.

@enso-bot

enso-bot bot commented May 20, 2024

Radosław Waśko reports a new STANDUP for today (2024-05-20):

Progress: Got the unification mostly working on in-memory. Also aligned the DB backend - SQLite tests are passing. Postgres crashing. It should be finished by 2024-05-21.

Next Day: Next day I will be working on the same task. Fix Postgres problem. Add missing warning on Date->Date_Time. Try cleanup as there's some repetitiveness in current implementation between DB and in-mem.

@enso-bot

enso-bot bot commented May 22, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-05-21):

Progress: Finished DB implementation. Code cleanup, implemented warnings for Date->Date_Time, fixed failing tests and separated type logic changes to only affect Union. PR ready and reviewed. One seemingly unrelated test is failing so blocked from merge :( It should be finished by 2024-05-21.

Next Day: Next day I will be working on the #9849 task. Fix that one test and get the PR in. Work on next task (UTF BOM).

@mergify mergify bot closed this as completed in #9968 May 22, 2024
mergify bot pushed a commit that referenced this issue May 22, 2024