refactor(api): refactor the implementation of windowing #9200

Draft
wants to merge 7 commits into main

chloeh13q
Contributor

@chloeh13q chloeh13q commented May 15, 2024

Description of changes

Some design considerations:

  • Removed the original implementation of windowing to avoid confusion.
  • I considered creating a window class to hold all the relevant windowing info (window type, window size, window slide, etc.). The API would be slightly cleaner that way, but I worry that it would be confused with the existing ibis.window().
  • Tumble and hop are the most basic window types, and we should support them across the board. Session windows and cumulate windows are not supported in every backend, and their APIs differ slightly.
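
To make the difference between the two basic window types concrete, here is a minimal plain-Python sketch of how a single timestamp maps to tumble and hop windows. It is not the implementation in this PR: the helper names `tumble_window` and `hop_windows` are hypothetical, and it assumes windows are aligned to the Unix epoch and are half-open, `[start, end)`. Individual backends may align or bound windows differently.

```python
from datetime import datetime, timedelta

# Assumption: windows are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def tumble_window(ts, size):
    """Return the single (start, end) tumbling window that contains ts."""
    start = ts - (ts - EPOCH) % size
    return start, start + size


def hop_windows(ts, size, slide):
    """Return every (start, end) hopping window that contains ts, earliest first."""
    # Latest window start at or before ts, aligned to the slide.
    start = ts - (ts - EPOCH) % slide
    windows = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= slide
    return windows[::-1]


ts = datetime(2024, 5, 15, 12, 0, 47)
one = tumble_window(ts, timedelta(seconds=30))
many = hop_windows(ts, timedelta(seconds=30), timedelta(seconds=10))
```

With a 30-second size, the timestamp falls into exactly one tumble window. With a 30-second size and a 10-second slide, it falls into three overlapping hop windows, which is why hop needs both a size and a slide while tumble needs only a size.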

The new API:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(
...     ibis.schema(
...         {
...             "createTime": "timestamp(3)",
...             "orderId": "int64",
...             "payAmount": "float64",
...             "payPlatform": "int32",
...             "provinceId": "int32",
...         }
...     ),
...     name="payment_msg",
... )
>>> expr = (
...     t.window_by(time_col="createTime")
...     .tumble(size=ibis.interval(seconds=30))
...     .agg(by=["provinceId"], avgPayAmount=_.payAmount.mean())
... )
>>> expr
r0 := UnboundTable: payment_msg
  createTime  timestamp(3)
  orderId     int64
  payAmount   float64
  payPlatform int32
  provinceId  int32

WindowAggregate[r0]
  window_type:
    tumble
  time_col:
    r0.createTime
  groups:
    provinceId: r0.provinceId
  metrics:
    avgPayAmount: Mean(r0.payAmount)
  window_size:
    30 s
  schema:
    window_start timestamp
    window_end   timestamp
    provinceId   int32
    avgPayAmount float64
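
As a rough illustration of what the expression above is meant to compute, the following plain-Python sketch groups rows by 30-second tumble window and `provinceId`, then averages `payAmount`. It only mirrors the output schema shown above (`window_start`, `window_end`, `provinceId`, `avgPayAmount`); the function `tumble_mean` and the sample rows are hypothetical, epoch-aligned half-open windows are an assumption, and streaming concerns such as watermarks and late data are ignored.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: windows are aligned to the Unix epoch and are half-open.
EPOCH = datetime(1970, 1, 1)


def tumble_mean(rows, size):
    """Average payAmount per (tumble window, provinceId)."""
    acc = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in rows:
        ts = row["createTime"]
        start = ts - (ts - EPOCH) % size
        key = (start, start + size, row["provinceId"])
        acc[key][0] += row["payAmount"]
        acc[key][1] += 1
    return [
        {
            "window_start": start,
            "window_end": end,
            "provinceId": province,
            "avgPayAmount": total / count,
        }
        for (start, end, province), (total, count) in sorted(
            acc.items(), key=lambda kv: kv[0]
        )
    ]


# Hypothetical sample data, for illustration only.
rows = [
    {"createTime": datetime(2024, 5, 15, 12, 0, 5), "provinceId": 1, "payAmount": 10.0},
    {"createTime": datetime(2024, 5, 15, 12, 0, 20), "provinceId": 1, "payAmount": 20.0},
    {"createTime": datetime(2024, 5, 15, 12, 0, 35), "provinceId": 1, "payAmount": 30.0},
    {"createTime": datetime(2024, 5, 15, 12, 0, 10), "provinceId": 2, "payAmount": 5.0},
]
result = tumble_mean(rows, timedelta(seconds=30))
```

Each output row corresponds to one window and one group, so the first two payments for province 1 are averaged together while the third lands in the next window.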

Issues closed

#8847

@chloeh13q chloeh13q force-pushed the refactor/windowing branch 2 times, most recently from 79de201 to ae7bd82 Compare May 30, 2024 18:46