Add ApplyBucketsWithInterpolation TFTransform #31291

jrmccluskey · 2024-05-14T17:51:42Z

Adds the TensorFlow Transform apply_buckets_with_interpolation() function to MLTransform as the ApplyBucketsWithInterpolation operation.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

github-actions · 2024-05-14T19:06:32Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

jrmccluskey · 2024-05-14T19:44:04Z

assign set of reviewers

jrmccluskey · 2024-05-14T19:44:33Z

looks like the python 3.11 ML tests have some sort of setup issue (#31287)

github-actions · 2024-05-14T19:45:17Z

Assigning reviewers. If you would like to opt out of this review, comment "assign to next reviewer".

R: @shunping for label python.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

tvalentyn · 2024-05-14T21:55:51Z

> looks like the python 3.11 ML tests have some sort of setup issue (#31287)

for now you can consider this a new suite that never worked.

tvalentyn · 2024-05-14T21:56:29Z

our ML tests are in bad shape, but they were not running extensively before.

github-actions · 2024-05-22T12:13:52Z

Reminder, please take a look at this pr: @shunping


tvalentyn · 2024-05-22T15:08:28Z

sdks/python/apache_beam/ml/transforms/tft.py

+    [0, 1].
+
+    Input values are bucketized based on the provided boundaries such that the
+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=


do we need some escaping for code snippets, with backticks or something similar, to make them look nicer in pydoc?

jrmccluskey: We can try that out; I added some backticks.
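The index mapping quoted in the docstring (a positive index i with bucket_boundaries[i-1] <= element < bucket_boundaries[i]) can be illustrated with a plain-Python sketch. It uses the standard-library bisect module and a hypothetical bucket_index helper; it is not the TFT or Beam implementation, and the handling of values outside the boundaries (0 below the first boundary, len(bucket_boundaries) at or above the last) is an assumed convention.

```python
import bisect
from typing import Sequence


def bucket_index(element: float, bucket_boundaries: Sequence[float]) -> int:
  """Hypothetical helper: index i with boundaries[i-1] <= element < boundaries[i].

  Assumes bucket_boundaries is sorted in ascending order. Elements below the
  first boundary get 0; elements at or above the last get len(boundaries).
  """
  # bisect_right returns the insertion point to the right of equal values,
  # so an element equal to boundaries[j] lands in bucket j + 1.
  return bisect.bisect_right(bucket_boundaries, element)


boundaries = [0.0, 10.0, 20.0]
print([bucket_index(x, boundaries) for x in [-1.0, 5.0, 10.0, 15.0, 25.0]])
```

Using bisect_right rather than bisect_left is what makes each bucket closed on the left and open on the right, matching the inequality in the docstring.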

tvalentyn · 2024-05-22T15:08:54Z

sdks/python/apache_beam/ml/transforms/tft.py

+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=
+    element < bucket_boundaries[i], if it exists. The values are then
+    normalized to the range [0,1] within the bucket, with NaN values being
+    mapped to 0.5.


should we link to TFT docs for more info as in some other ML Ops?
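The normalization described in this docstring (bucketize, then scale each value to [0, 1] within its bucket, with NaN mapped to 0.5) can be sketched in plain Python. This sketch assumes the k sorted boundaries correspond to evenly spaced points in [0, 1] with linear interpolation inside each bucket, and that values at or below the first boundary map to 0 and values at or above the last map to 1. That endpoint handling is an assumption based on the TFT documentation, not something stated in this PR, and bucketize_with_interpolation is a hypothetical helper, not the TFT implementation.

```python
import bisect
import math
from typing import List, Sequence


def bucketize_with_interpolation(
    values: Sequence[float], boundaries: Sequence[float]) -> List[float]:
  """Hypothetical sketch: bucketize, then interpolate linearly within buckets.

  Assumes boundaries are sorted ascending and there are at least two of them.
  """
  if len(boundaries) < 2:
    raise ValueError('this sketch needs at least two boundaries')
  k = len(boundaries)
  out = []
  for x in values:
    if math.isnan(x):
      out.append(0.5)  # NaN is mapped to the middle of the range.
      continue
    if x <= boundaries[0]:
      out.append(0.0)  # Assumed: at or below the lowest boundary -> 0.
      continue
    if x >= boundaries[-1]:
      out.append(1.0)  # Assumed: at or above the highest boundary -> 1.
      continue
    # Positive index i with boundaries[i-1] <= x < boundaries[i].
    i = bisect.bisect_right(boundaries, x)
    left, right = boundaries[i - 1], boundaries[i]
    within = (x - left) / (right - left)  # Position inside the bucket, [0, 1).
    # Each of the k - 1 buckets covers an equal share of [0, 1].
    out.append((i - 1 + within) / (k - 1))
  return out


print(bucketize_with_interpolation(
    [-1.0, 5.0, 10.0, 15.0, 25.0, float('nan')], [0.0, 10.0, 20.0]))
```

Because each bucket gets an equal share of the output range regardless of its width, distances between raw values are not preserved; with quantile-based boundaries this tends to spread out densely populated regions.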

tvalentyn · 2024-05-22T15:11:03Z

sdks/python/apache_beam/ml/transforms/tft.py

+
+    Args:
+      columns: A list of column names to apply the transformation on.
+      bucket_boundaries: A rank 2 Tensor or list representing the bucket


Is the type hint set correctly for bucket_boundaries? Is a rank 2 Tensor a 2D matrix?

jrmccluskey: It's consistent with what we accept as valid input in ApplyBuckets; both co-opt the language from TFT. Updating both docstrings to match our function signature more accurately seems reasonable, so I changed it for both functions.
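On the rank-2 question: a rank-2 tensor is a 2-D matrix, so a flat list of k boundaries corresponds to a single row of shape [1, k]. The sketch below shows that conversion with a hypothetical to_rank2_boundaries helper. It assumes the operation accepts a flat iterable of numbers and wraps it into one row before handing it to TFT; that is an assumption, not something this thread confirms.

```python
from typing import Iterable, List, Union

Number = Union[int, float]


def to_rank2_boundaries(bucket_boundaries: Iterable[Number]) -> List[List[float]]:
  """Hypothetical helper: wrap flat, sorted boundaries into a one-row matrix.

  The result has shape [1, num_boundaries], i.e. a rank-2 nested list.
  """
  row = [float(b) for b in bucket_boundaries]
  if not row:
    raise ValueError('at least one boundary is required')
  # Boundaries must be sorted in ascending order (equal neighbors allowed).
  if any(a > b for a, b in zip(row, row[1:])):
    raise ValueError('bucket boundaries must be sorted in ascending order')
  return [row]


print(to_rank2_boundaries([0, 10, 20]))
```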

github-actions · 2024-05-27T12:13:46Z

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment "assign to next reviewer".

R: @tvalentyn for label python.


tvalentyn · 2024-05-28T02:33:20Z

waiting on author


Add ApplyBucketsWithInterpolation TFTransform

1585576

github-actions bot added the python label May 14, 2024

github-actions bot added the Next Action: Reviewers label May 14, 2024

github-actions bot added the slow-review label May 22, 2024

tvalentyn reviewed May 22, 2024


github-actions bot removed the slow-review label May 27, 2024

github-actions bot added Next Action: Author and removed Next Action: Reviewers labels May 28, 2024

Update sdks/python/apache_beam/ml/transforms/tft.py

5deb65e

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>

github-actions bot added Next Action: Reviewers and removed Next Action: Author labels May 29, 2024

jrmccluskey added 2 commits May 29, 2024 15:16

add tft documentation link

f4b4d18

change docstring wording around bucket_boundaries

ed034d4

tvalentyn approved these changes May 29, 2024



Update sdks/python/apache_beam/ml/transforms/tft.py

b157d12

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>

jrmccluskey merged commit 06e103d into apache:master May 29, 2024
89 checks passed

