
[Bug] After a Flink job writing to Paimon changes sink.parallelism, the job cannot recover from the checkpoint #3232

Closed
huyuanfeng2018 opened this issue Apr 18, 2024 · 3 comments · Fixed by #3300
Labels
bug Something isn't working

Comments

@huyuanfeng2018
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

master

Compute Engine

Flink

Minimal reproduce step

  1. Start a Flink job that writes to Paimon with sink parallelism 1.

  2. Stop the running job.

  3. Increase the sink parallelism to a value greater than 1.

  4. Restore from the last checkpoint (a sketch of these steps follows the list).
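
A minimal sketch of these steps with the Flink Table API; the warehouse path, table names, and checkpoint interval are placeholders, not part of the original report:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class SinkParallelismRepro {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
            tEnv.getConfig().getConfiguration()
                    .setString("execution.checkpointing.interval", "10 s");

            tEnv.executeSql(
                    "CREATE CATALOG paimon WITH ("
                            + "'type' = 'paimon', "
                            + "'warehouse' = 'file:///tmp/paimon')"); // placeholder path
            tEnv.executeSql("USE CATALOG paimon");

            // Step 1: run with sink parallelism 1 until a checkpoint completes,
            // then stop the job (step 2). For steps 3 and 4, change '1' to e.g.
            // '4' and resubmit from the last checkpoint: recovery fails because
            // the writer/committer chain, and with it the operator IDs, changed.
            tEnv.executeSql(
                    "INSERT INTO t /*+ OPTIONS('sink.parallelism' = '1') */ "
                            + "SELECT * FROM src");
        }
    }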

What doesn't meet your expectations?

The job should be able to resume normally from the last checkpoint or savepoint, even if I change the parallelism of the sink.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
huyuanfeng2018 added the bug label on Apr 18, 2024
@huyuanfeng2018
Contributor Author

        // Proposed fix: rebalance() before transform() prevents Flink from
        // chaining the committer with the writer (see the explanation below).
        SingleOutputStreamOperator<?> committed =
                written.rebalance().transform(
                                GLOBAL_COMMITTER_NAME,
                                new MultiTableCommittableTypeInfo(),
                                new CommitterOperator<>(
                                        streamingCheckpointEnabled,
                                        commitUser,
                                        createCommitterFactory(),
                                        createCommittableStateManager()))
                        .setParallelism(1)
                        .setMaxParallelism(1);

This can be avoided by adding rebalance() before the commit operator. The committer's parallelism is always 1, so when the write parallelism is also 1, Flink chains the two operators into a single task; when the write parallelism increases, the chain is split. Because the operator topology, and with it the generated operator IDs, changes between the two plans, the state in the checkpoint can no longer be mapped back, and recovery fails.
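
The chaining behavior can be seen with plain Flink operators, no Paimon classes needed; in this sketch the two map operators stand in for the writer and the committer:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ChainingDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Stands in for the Paimon writer; try parallelism 1 vs. 4 here
            // and compare the execution plans.
            DataStream<Long> written =
                    env.fromSequence(0, 1_000)
                            .map(x -> x)
                            .returns(Types.LONG)
                            .setParallelism(1);

            // Stands in for the committer, which always has parallelism 1.
            // Without rebalance() it chains with the writer whenever the
            // writer also has parallelism 1; with rebalance() it never does.
            written.rebalance()
                    .map(x -> x)
                    .returns(Types.LONG)
                    .setParallelism(1)
                    .setMaxParallelism(1)
                    .print();

            System.out.println(env.getExecutionPlan()); // shows the chaining
            env.execute("chaining-demo");
        }
    }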

@JingsongLi
Contributor

@huyuanfeng2018 this is a good issue, but a chained committer can reduce resource cost.

Maybe we can have an option to control this.

@huyuanfeng2018
Contributor Author

> @huyuanfeng2018 this is a good issue, but a chained committer can reduce resource cost.
>
> Maybe we can have an option to control this.

+1. Agree
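
A hedged sketch of what such an option might look like; the helper class and the option name in the comment are assumptions for illustration, not the actual fix (which landed in #3300):

    import org.apache.flink.streaming.api.datastream.DataStream;

    public final class CommitterChaining {
        private CommitterChaining() {}

        /** Choose the committer's input edge based on a chaining toggle. */
        public static <T> DataStream<T> committerInput(
                DataStream<T> written, boolean chainCommitter) {
            // Leaving the default edge lets Flink pick FORWARD and chain the
            // committer with the writer when their parallelisms match, which
            // saves a task slot and a serialization hop; forcing rebalance()
            // always breaks the chain, so the committer's operator identity
            // does not depend on the writer's parallelism.
            // 'chainCommitter' could come from a hypothetical table option,
            // e.g. 'sink.committer-operator-chaining' (default true).
            return chainCommitter ? written : written.rebalance();
        }
    }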
