Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug](Fe) fix potential deadlock in show proc statement #34988

Merged
merged 1 commit into from
May 27, 2024

Conversation

xy720
Copy link
Member

@xy720 xy720 commented May 16, 2024

Proposed changes

Issue Number: close #xxx

version: 1.2|2.0|2.1

We have encountered Fe hang and the number connections reach limits.
Unfortunately, I restarted FE without Jstack the Fe process.

But I found a lot of logs like:

2024-05-11 14:09:32,533 WARN (thrift-server-pool-424042|4684871) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100.
 current: 101
2024-05-11 14:09:32,545 WARN (thrift-server-pool-392529|4367180) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100.
 current: 101
2024-05-11 14:09:32,663 WARN (thrift-server-pool-421895|4663462) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100.
 current: 101
2024-05-11 14:09:32,816 WARN (thrift-server-pool-379243|4167917) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100.
 current: 101
2024-05-11 14:09:33,531 WARN (thrift-server-pool-420797|4652941) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100.
 current: 101

The tablet report task is hang:

2024-05-11 14:07:57,671 INFO (Thread-57|105) [ReportHandler.tabletReport():250] backend[10009] reports 28982 tablet(s). repor
t version: 17094493386232
2024-05-11 14:24:15,323 INFO (Thread-57|105) [TabletInvertedIndex.tabletReport():308] finished to do tablet diff with backend[10009].
sync: 0. metaDel: 7. foundInMeta: 28953. migration: 0. found invalid transactions 0. found republish transactions 0. tabletInMemorySync:
0. need recovery: 0. cost: 66 ms

And the dynamic partition scheduler seems also hang:

image

I found a similar issue, as issue #11319 saying, we should avoid using common global ForkJoinPool to execute parallelStream tasks and prevent Fe deadlocks from occurring.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@xy720
Copy link
Member Author

xy720 commented May 16, 2024

run buildall

@lide-reed
Copy link
Contributor

lide-reed commented May 17, 2024

It's said that java.util.concurrent.ForkJoinPool may degrade performance. Please confirm whether the changes here involve high-frequency operations.
https://stackoverflow.com/questions/20288379/analysis-performance-of-forkjoinpool

@xy720
Copy link
Member Author

xy720 commented May 17, 2024

It's said that java.util.concurrent.ForkJoinPool may degrade performance. Please confirm whether the changes here involve high-frequency operations.
https://stackoverflow.com/questions/20288379/analysis-performance-of-forkjoinpool

As we discussed, the pr will not affect performance. No longer use the global only thread pool here, but instead use its own thread pool separately.

Copy link
Contributor

@lide-reed lide-reed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 27, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@cambyzju cambyzju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xy720 xy720 merged commit 3664ca4 into apache:master May 27, 2024
32 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants