Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-2736] Adds output batch compression for Data Transforms #18514

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from

Conversation

oleiman
Copy link
Member

@oleiman oleiman commented May 15, 2024

This PR adds the ability to configure output batch compression on WASM transforms, either at deploy time or post facto, via metadata patch request. Includes rpk experience for same

TODO:

  • rpk bits and tests for deploying w/ compression turned on

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Add output batch compression for Data Transforms (configurable per deployed transform)

@oleiman oleiman self-assigned this May 15, 2024
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch 2 times, most recently from 4adf296 to 58df36c Compare May 16, 2024 05:45
@oleiman
Copy link
Member Author

oleiman commented May 16, 2024

/ci-repeat 1

@oleiman
Copy link
Member Author

oleiman commented May 16, 2024

/ci-repeat 1
release
skip-units
skip-redpanda-build

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented May 16, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/49248#018f82f5-d2fc-4bd5-b204-a1774afd79bf:

"rptest.tests.connection_virtualizing_test.TestVirtualConnections.test_handling_invalid_ids"

new failures in https://buildkite.com/redpanda/redpanda/builds/49248#018f82f5-d2ff-46d2-8840-d9e063a709e2:

"rptest.tests.connection_virtualizing_test.TestVirtualConnections.test_no_head_of_line_blocking.different_clusters=False.different_connections=False"

@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch 2 times, most recently from d9f6691 to f13207f Compare May 17, 2024 18:31
@oleiman oleiman marked this pull request as ready for review May 17, 2024 18:32
@oleiman oleiman requested review from a team and michael-redpanda and removed request for a team May 17, 2024 22:55
@@ -149,6 +153,7 @@ The --var flag can be repeated to specify multiple variables like so:
cmd.Flags().StringSliceVarP(&fc.outputTopics, "output-topic", "o", []string{}, "The output topic to write the transform results to (repeatable)")
cmd.Flags().StringVar(&fc.functionName, "name", "", "The name of the transform")
cmd.Flags().Var(&fc.env, "var", "Specify an environment variable in the form of KEY=VALUE")
cmd.Flags().StringVar(&fc.compression, "compression", "", "Output batch compression type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be an explicit "none"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah fair point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current flag-overrides-file model, wouldn't this preclude the ability to control the setting via YAML?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think so, if you ask me!

NAME INPUT TOPIC OUTPUT TOPIC RUNNING LAG
foo2bar foo bar 2 / 3 7
scrubber pii cleaned, munged 0 / 1 99
NAME INPUT TOPIC OUTPUT TOPIC RUNNING LAG COMPRESSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this useful? We don't print env vars and this feels like that configuration. Maybe in the detail view? Doesn't feel like there is a user story where someone wants this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, yeah I suppose not.

@@ -81,3 +81,40 @@ Subsequent 'rpk transform list' operations will show transform processors as
}
return cmd
}

func newSetCompressionCommand(fs afero.Fs, p *config.Params) *cobra.Command {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably confusing for users that this is not sticky across deploys

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like on the backend (i.e. sticks to (name, transform))? Or writing back to the config (i.e. sticks to the transform itself)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure! I just wanted to point out that the following sequence of commands would end up with a non expected outcome.

rpk transform deploy
rpk transform set compression=gzip
rpk transform deploy
# no more gzip!

I personally would just not have this command, but otherwise I think writing back to the file is a reasonable choice (or adding a warning if there is no config file)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's a great point. The set command doesn't seem particularly useful. Thanks!

Comment on lines 102 to 103
at a cost. So, while it does occur asynchronously with respect to transform
execution, compression may introduce latency on the output topic.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
at a cost. So, while it does occur asynchronously with respect to transform
execution, compression may introduce latency on the output topic.
at a cost. So, while it occurs asynchronously with respect to transform
execution, compression may introduce latency on the output topic.

@oleiman oleiman marked this pull request as draft May 31, 2024 19:56
boost::lexical_cast<compression>("bogus) should raise bad_lexical cast.

Previous behavior: runtime_error("Fell off the end of a string-switch")

Failure to match the source string to one of the compression lexical cases
should set the fail bit on the istream instead of allowing the runtime_error
to escape.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Also adds bool transform_metadata_patch::empty()

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the xform/core-2736/compressed-batches branch from f13207f to 1e48f71 Compare May 31, 2024 20:07
@oleiman
Copy link
Member Author

oleiman commented May 31, 2024

force push to sync w/ dev after a couple weeks of inactivity

"lz4",
"zstd"
],
"description": "Output batch compression mode (TODO(oren): might want to limit which?)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo removal reminder

"zstd"
],
"required": false,
"description": "Output batch compression mode (TODO(oren): might want to limit which?)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo removal reminder

"zstd"
],
"required": false,
"description": "Output batch compression mode (TODO(oren): might want to limit which?)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo removal reminder

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Respecting transform_metadata::compression_mode

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Feediver1
Feediver1 previously approved these changes Jun 4, 2024
- tranform_from_json
- deploy

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>

dt/rpk
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman
Copy link
Member Author

oleiman commented Jun 7, 2024

force push contents:

  • remove set_compression command (compression set only at deploy time to avoid confusion)
  • fix up some tests to account for above
  • remove compression column from list transforms output table (still present in JSON representation)

@oleiman oleiman marked this pull request as ready for review June 7, 2024 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants