binlog parallel parsing #270

Open
Ryan-Git opened this issue Mar 29, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

Collaborator

Ryan-Git commented Mar 29, 2020

In some use cases, parsing the binlog is the bottleneck, for example when syncing from a position several hours or days old after dumping a big table we are not interested in. In our production environment, throughput is around 40,000 to 50,000 records per second, and that exhausts one CPU core. If we could parse the binlog in parallel, I think we could reach much higher throughput in this scenario.

To make this possible, we have to break the sequential assumption (within one input stream) that holds from the input to the sliding window. One possible solution is to add a prepare step before submit in the scheduler. Sequence-sensitive logic such as ID allocation should be done before prepare; parsing then runs in parallel, and finally the result is submitted as before.
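A minimal sketch of that flow, to make the proposal concrete. This is not the project's scheduler API: `rawEvent`, `parsedEvent`, and `prepareAndSubmit` are hypothetical names, and the `parse` and `submit` callbacks stand in for the real binlog parser and the existing submit path. Sequence IDs are allocated in input order, parsing fans out to a worker pool, and a small reorder buffer makes sure submit still sees events in sequence.

```go
package main

import "sync"

// rawEvent is a hypothetical stand-in for an unparsed binlog event.
type rawEvent struct {
	seq     int64 // allocated in input order, before parallel parsing starts
	payload string
}

// parsedEvent is a hypothetical stand-in for a parsed message.
type parsedEvent struct {
	seq  int64
	data string
}

// prepareAndSubmit allocates sequence IDs sequentially, parses payloads on
// `workers` goroutines, and calls submit strictly in sequence order.
// submit is only ever called from the calling goroutine.
func prepareAndSubmit(payloads []string, workers int,
	parse func(string) string, submit func(parsedEvent)) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan rawEvent)
	results := make(chan parsedEvent, workers)

	// Parallel step: each worker parses events independently.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range jobs {
				results <- parsedEvent{seq: ev.seq, data: parse(ev.payload)}
			}
		}()
	}

	// Sequential step: ID allocation happens here, in input order.
	go func() {
		for i, p := range payloads {
			jobs <- rawEvent{seq: int64(i), payload: p}
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	// Reorder buffer: hold out-of-order results until their turn comes.
	// Note: this sketch does not bound the buffer if one event is very slow.
	pending := make(map[int64]parsedEvent)
	next := int64(0)
	for r := range results {
		pending[r.seq] = r
		for {
			ev, ok := pending[next]
			if !ok {
				break
			}
			delete(pending, next)
			submit(ev)
			next++
		}
	}
}
```

Calling `prepareAndSubmit(payloads, 4, parseFn, submitFn)` would invoke `submitFn` once per payload, in the original order, no matter which worker finished first. A real implementation would probably also need backpressure on the reorder buffer and error handling in the parse step.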

Ryan-Git added the enhancement label on Mar 29, 2020