[BitSail][Connector] Optimize split strategy in Kudu Reader #143

Open
BlockLiu opened this issue Nov 9, 2022 · 1 comment · May be fixed by #451
Labels: difficulty-easy, help wanted

Comments

BlockLiu commented Nov 9, 2022

Is your feature request related to a problem? Please describe

The current Kudu reader supports only one split strategy, SIMPLE_DIVIDE, which evenly divides an integer range into several sub-ranges and uses each sub-range as a split.
Document reference: connector-kudu
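
To make the even division concrete, here is a minimal sketch of how an integer range could be cut into sub-ranges. It is not BitSail's actual code: the class name `SimpleDivideSketch`, the `Range` holder, and the `divide` method are all hypothetical, and the choice to give the leftover values to the first splits is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a SIMPLE_DIVIDE-style strategy; not taken from BitSail.
final class SimpleDivideSketch {

    /** One split: the closed interval [lower, upper] on the split column. */
    static final class Range {
        final long lower;
        final long upper;

        Range(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** Divides [lower, upper] into at most splitNum contiguous, non-overlapping ranges. */
    static List<Range> divide(long lower, long upper, int splitNum) {
        if (lower > upper || splitNum <= 0) {
            throw new IllegalArgumentException("need lower <= upper and splitNum > 0");
        }
        // Number of integer values in the range; throws if it does not fit in a long.
        long total = Math.addExact(Math.subtractExact(upper, lower), 1L);
        // Never create more splits than there are values.
        int n = (int) Math.min((long) splitNum, total);
        long base = total / n;
        long remainder = total % n;

        List<Range> ranges = new ArrayList<>(n);
        long start = lower;
        for (int i = 0; i < n; i++) {
            // Assumption: the first `remainder` splits each take one extra value.
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new Range(start, end));
            start = end + 1;
        }
        return ranges;
    }
}
```

Calling `SimpleDivideSketch.divide(0, 9, 3)` yields the ranges [0, 3], [4, 6], and [7, 9]. Both bounds must be known before any split can be computed, and a row whose split column is null falls into none of the ranges.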

This split strategy has several shortcomings:

  1. The user has to designate an integer-type column (int8, int16, int32, int64) as the split dimension.
  2. If the user does not know the lower and upper bounds, the reader scans the whole table to find the actual bounds.
  3. It does not support null values in the split dimension.

This issue asks for someone to optimize the current split strategy or to implement other split strategies.

Describe the solution you'd like

  1. Similar to KuduTableInputFormat in kudu-mapreduce, maybe we can let users set serialized KuduPredicates directly in configuration files.
  2. KuduTable provides a List<Partition> getRangePartitions(long timeout) method, which returns all range partitions of the table. Maybe these range partitions can be used directly as splits.
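
For the first idea, serialized predicates are binary, so they would need a text-safe encoding before they can sit in a configuration file. The sketch below shows one possible round trip using Base64 from the JDK. The class and method names are hypothetical. The input bytes are assumed to come from the Kudu client's predicate serialization (for example `KuduPredicate.serialize`, which is an assumption about the client API and not something this issue confirms).

```java
import java.util.Base64;

// Hypothetical helper; not part of BitSail or the Kudu client.
final class EncodedPredicateConfigSketch {

    /** Encodes already-serialized predicate bytes into a string that is safe to store in a config file. */
    static String toConfigValue(byte[] serializedPredicates) {
        return Base64.getEncoder().encodeToString(serializedPredicates);
    }

    /** Decodes the config string back into the original serialized bytes. */
    static byte[] fromConfigValue(String configValue) {
        return Base64.getDecoder().decode(configValue);
    }
}
```

A reader could decode the value once per split and hand the bytes to the Kudu client for deserialization against the table schema, which would remove the need for an integer split column.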

Describe alternatives you've considered

Additional context

garyli1019 added the help wanted and difficulty-easy labels Nov 9, 2022

beyond-up (Contributor) commented

please assign to me, thx!
