
[Bug] After a Flink job writing to Paimon changes sink.parallelism, the job cannot recover from the checkpoint #3232

Closed
huyuanfeng2018 opened this issue Apr 18, 2024 · 3 comments · Fixed by #3300
Labels
bug Something isn't working

Comments

@huyuanfeng2018
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

master

Compute Engine

Flink

Minimal reproduce step

  1. Start a Flink job that writes to Paimon with sink parallelism 1.

  2. Stop the running job.

  3. Increase the sink parallelism to a value greater than 1.

  4. Restore from the last checkpoint (a sketch of these steps follows the list).
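
A minimal sketch of these steps with the Flink Table API; the warehouse path, table names, and checkpoint interval are placeholders, not part of the original report:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class SinkParallelismRepro {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());
            tEnv.getConfig().getConfiguration()
                    .setString("execution.checkpointing.interval", "10 s");

            tEnv.executeSql(
                    "CREATE CATALOG paimon WITH ("
                            + "'type' = 'paimon', "
                            + "'warehouse' = 'file:///tmp/paimon')"); // placeholder path
            tEnv.executeSql("USE CATALOG paimon");

            // Step 1: run with sink parallelism 1 until a checkpoint completes,
            // then stop the job (step 2). For steps 3 and 4, change '1' to e.g.
            // '4' and resubmit from the last checkpoint: recovery fails because
            // the writer/committer chain, and with it the operator IDs, changed.
            tEnv.executeSql(
                    "INSERT INTO t /*+ OPTIONS('sink.parallelism' = '1') */ "
                            + "SELECT * FROM src");
        }
    }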

What doesn't meet your expectations?

The job should be able to resume normally from the last checkpoint or savepoint, even if I change the parallelism of the sink.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
huyuanfeng2018 added the bug label on Apr 18, 2024
@huyuanfeng2018
Contributor Author

        // Proposed fix: rebalance() before transform() prevents Flink from
        // chaining the committer with the writer (see the explanation below).
        SingleOutputStreamOperator<?> committed =
                written.rebalance().transform(
                                GLOBAL_COMMITTER_NAME,
                                new MultiTableCommittableTypeInfo(),
                                new CommitterOperator<>(
                                        streamingCheckpointEnabled,
                                        commitUser,
                                        createCommitterFactory(),
                                        createCommittableStateManager()))
                        .setParallelism(1)
                        .setMaxParallelism(1);

This can be avoided by adding rebalance() before the commit operator. The committer's parallelism is always 1, so when the write parallelism is also 1, Flink chains the two operators into a single task; when the write parallelism increases, the chain is split. Because the operator topology, and with it the generated operator IDs, changes between the two plans, the state in the checkpoint can no longer be mapped back, and recovery fails.
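
The chaining behavior can be seen with plain Flink operators, no Paimon classes needed; in this sketch the two map operators stand in for the writer and the committer:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ChainingDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Stands in for the Paimon writer; try parallelism 1 vs. 4 here
            // and compare the execution plans.
            DataStream<Long> written =
                    env.fromSequence(0, 1_000)
                            .map(x -> x)
                            .returns(Types.LONG)
                            .setParallelism(1);

            // Stands in for the committer, which always has parallelism 1.
            // Without rebalance() it chains with the writer whenever the
            // writer also has parallelism 1; with rebalance() it never does.
            written.rebalance()
                    .map(x -> x)
                    .returns(Types.LONG)
                    .setParallelism(1)
                    .setMaxParallelism(1)
                    .print();

            System.out.println(env.getExecutionPlan()); // shows the chaining
            env.execute("chaining-demo");
        }
    }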

@JingsongLi
Contributor

@huyuanfeng2018 this is a good issue, but a chained committer can reduce resource cost.

Maybe we can have an option to control this.

@huyuanfeng2018
Contributor Author

> @huyuanfeng2018 this is a good issue, but a chained committer can reduce resource cost.
>
> Maybe we can have an option to control this.

+1. Agree
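
A hedged sketch of what such an option might look like; the helper class and the option name in the comment are assumptions for illustration, not the actual fix (which landed in #3300):

    import org.apache.flink.streaming.api.datastream.DataStream;

    public final class CommitterChaining {
        private CommitterChaining() {}

        /** Choose the committer's input edge based on a chaining toggle. */
        public static <T> DataStream<T> committerInput(
                DataStream<T> written, boolean chainCommitter) {
            // Leaving the default edge lets Flink pick FORWARD and chain the
            // committer with the writer when their parallelisms match, which
            // saves a task slot and a serialization hop; forcing rebalance()
            // always breaks the chain, so the committer's operator identity
            // does not depend on the writer's parallelism.
            // 'chainCommitter' could come from a hypothetical table option,
            // e.g. 'sink.committer-operator-chaining' (default true).
            return chainCommitter ? written : written.rebalance();
        }
    }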
