Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(linter/eslint): add no_unreachable rule. #3238

Draft
wants to merge 17 commits into
base: 06-06-feat_semantic_cfg_add_condition_instruction
Choose a base branch
from

Conversation

rzvxa
Copy link
Collaborator

@rzvxa rzvxa commented May 11, 2024

no-unreachable

oxlint-echosystem-ci result

This rule is done but since it is running for every possible statement and does quite a bit of work on them to determine whether it is 100% reachable or not; The performance in my opinion is kind of abysmal.

I'll try to work it out, I know Biome does 2 types of checks to simplify the rule for some nodes, However, they have a lot more false negatives than our implementation.

Here is one example of those false negatives

Update 1:

I've benchmarked this rule using only the simplified reachability checks and it was around 5% faster, To be honest, it isn't much improvement especially considering that we can only use this check for a small portion of nodes and even that is accompanied by newly introduced checks which would lessen the amount of performance gain further.

Most of the performance regression is because of allocations during our depth first search since we have to store both the visited and finished nodes which results in a bunch of rapid-fire allocations back to back. Currently, At the moment I don't have a great idea of how to improve it, We may have to implement our own graph to use arenas underneath.

Given that this rule is the most extensive use case of control flow (It doesn't come with a limited scope similar to property and constructor rules already implemented) this performance drop might be reasonable to some extent.

Copy link

graphite-app bot commented May 11, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

@github-actions github-actions bot added A-linter Area - Linter A-ast Area - AST labels May 11, 2024
@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch 2 times, most recently from 57f6ca1 to 94c3001 Compare May 12, 2024 19:41
Copy link

codspeed-hq bot commented May 12, 2024

CodSpeed Performance Report

Merging #3238 will degrade performances by 22.18%

Comparing 05-11-feat_linter_eslint_add_no_unreachable_rule (6aa7af5) with 06-06-feat_semantic_cfg_add_condition_instruction (1644851)

Summary

❌ 1 regressions
✅ 21 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark 06-06-feat_semantic_cfg_add_condition_instruction 05-11-feat_linter_eslint_add_no_unreachable_rule Change
linter[checker.ts] 1.3 s 1.6 s -22.18%

@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch from 94c3001 to ece1ff3 Compare May 21, 2024 08:52
@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch from 4b0951e to a4f9ee3 Compare May 28, 2024 13:30
@rzvxa rzvxa changed the base branch from main to 05-28-improvement_semantic_cfg_better_control_flow_for_forstatement_s May 28, 2024 13:30
@rzvxa rzvxa force-pushed the 05-28-improvement_semantic_cfg_better_control_flow_for_forstatement_s branch from 658e5d5 to ae657bc Compare May 28, 2024 14:48
@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch from a4f9ee3 to ba99160 Compare May 28, 2024 14:50
@rzvxa rzvxa force-pushed the 05-28-improvement_semantic_cfg_better_control_flow_for_forstatement_s branch from ae657bc to 857840a Compare May 28, 2024 18:06
@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch from ba99160 to 842eccc Compare May 30, 2024 17:32
@rzvxa rzvxa changed the base branch from 05-28-improvement_semantic_cfg_better_control_flow_for_forstatement_s to 05-30-improvement_semantic_cfg_cfg_api_for_detecting_conditional_paths May 30, 2024 17:32
@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch 3 times, most recently from 6bb0206 to 9bdf934 Compare May 30, 2024 20:16
@rzvxa rzvxa changed the base branch from 05-30-improvement_semantic_cfg_cfg_api_for_detecting_conditional_paths to 05-30-improvement_linter_react_use_new_cfg_for_detection_conditional_nodes May 30, 2024 20:17
@rzvxa rzvxa force-pushed the 06-06-feat_semantic_cfg_add_condition_instruction branch from 504c906 to 1644851 Compare June 6, 2024 18:40
@rzvxa rzvxa force-pushed the 05-11-feat_linter_eslint_add_no_unreachable_rule branch from b54237e to 3b12589 Compare June 6, 2024 18:40
@rzvxa rzvxa requested a review from Boshen June 6, 2024 22:03
@rzvxa
Copy link
Collaborator Author

rzvxa commented Jun 6, 2024

@Boshen Can you read the first comment on this PR? I was wondering how much should I improve the performance to make it acceptable.

@Boshen
Copy link
Member

Boshen commented Jun 7, 2024

@rzvxa Instead of doing the check inside the rule, are you able to mark the statement as unreachable during the cfg building phase inside semantic builder? For example add any of the following:

pub struct Semantic<'a> {
    pub unreachable: Vec<Span>, // All the unreachable spans
}

In the rule you just print out these spans in one go - a rule with only one line of code lol.

Or:

pub struct SemanticNode<'a> {
  pub unreachable: bool
}

Think outside of the box, we have the whole compiler pipeline, we can precompute a lot of stuff!

@rzvxa
Copy link
Collaborator Author

rzvxa commented Jun 7, 2024

@rzvxa Instead of doing the check inside the rule, are you able to mark the statement as unreachable during the cfg building phase inside semantic builder? For example add any of the following:

pub struct Semantic<'a> {
    pub unreachable: Vec<Span>, // All the unreachable spans
}

In the rule you just print out these spans in one go - a rule with only one line of code lol.

Or:

pub struct SemanticNode<'a> {
  pub unreachable: bool
}

Think outside of the box, we have the whole compiler pipeline, we can precompute a lot of stuff!

Doesn't adding this to the semantic mean we have to pay for this cost in places where it isn't necessary? Not to mention the amount of code we have to add to our semantic builder to compute and determine the reachability.

I'm sure with this much performance regression even I wouldn't like my current results (and if it was my project I wouldn't allow it to merge) But having it in semantic doesn't feel right to me either. Maybe we can achieve a better performance by rewriting the rule to use run_once and compare reachablity in relation to the last rechable node instead of checking from function(or root) down.

For example if we visit an infinite loop we know for sure everything after this loop is unreachable, But in my current implementation we can't remember that information between nodes so we get to do it all over again.

@Boshen
Copy link
Member

Boshen commented Jun 7, 2024

Yes, a run_once through the entire CFG can do the trick! Assuming you can query out the unreachable spans.

@Boshen
Copy link
Member

Boshen commented Jun 7, 2024

In TypeScript, it marks nodes as unreachable via FlowFlags.Unreachable, and is queried in checker.ts by the function isReachableFlowNode, maybe we can draw some inspirations there 😅

@Boshen
Copy link
Member

Boshen commented Jun 7, 2024

Maybe there's a bug somewhere? Or maybe you need to add some cache to the visited nodes?

image

checker.ts has some really complicated TypeScript code so maybe we hit an edge case somewhere.

@rzvxa
Copy link
Collaborator Author

rzvxa commented Jun 7, 2024

In TypeScript, it marks nodes as unreachable via FlowFlags.Unreachable, and is queried in checker.ts by the function isReachableFlowNode, maybe we can draw some inspirations there 😅

We already have an unreachable instruction to mark the first unreachable block but it doesn't propagate and we use it to filter out the subgraph when walking through the flow, I assume they are doing all of our checks - if not more - to set these flags correctly. I'll give the typescript approach a read before making any major changes but I have a hunch that whatever approach we use to implement this rule it should get rid of our current redundant revisits either by running once or precomputing the information(which in of itself is another kind of run once happening in the core instead of linter crate).

It all brings me to the question I was going to ask before seeing this comment. Considering that this information can be reused but comes at the cost of more memory and overhead for everything using the semantics, Do you think having it there is going to be better overall?
Let's say we will have a 15% regression for the linter benchmark(which includes everything else so the number is higher than 15% if we pinpoint it to the linter itself) but if we do it in the semantic it would be faster let's say the time is cut in half and is 7% for the doing it in the semantic. It would make our linter benchmark to be 8% faster than the other approach but in return, it makes all of the other benchmarks to be about 7% slower; in reality, things like semantic benchmarks are going to get hit much greater than something like transformers - when they are all implemented - unless we heavily lean into feature-gating it.

@rzvxa
Copy link
Collaborator Author

rzvxa commented Jun 7, 2024

Maybe there's a bug somewhere? Or maybe you need to add some cache to the visited nodes?
image

checker.ts has some really complicated TypeScript code so maybe we hit an edge case somewhere.

I know we are revisiting nodes over and over since we can't memorize the information from our run on the previous nodes, That's why I hope we get a great boost after getting rid of them, if I'm correct it would be something like going from around O(n^3) or even O(n!) in the worst case scenarios to around O(n^2).

@rzvxa
Copy link
Collaborator Author

rzvxa commented Jun 7, 2024

The sad part is that all of our dfs paths are only 8% of the performance regression the other 12% is for the allocations these visits do.

@Boshen
Copy link
Member

Boshen commented Jun 7, 2024

Do you think having it there is going to be better overall

We'll eventually make cfg construction optional at some point.

Regarding performance regression, I think we can accept around 10 - 15% given that we are doing a lot of work for cfg.

@rzvxa
Copy link
Collaborator Author

rzvxa commented Jun 7, 2024

Regarding performance regression, I think we can accept around 10 - 15% given that we are doing a lot of work for cfg.

Just to be clear do you mean it is acceptable if we keep it in the semantics or are you talking about this rule alone?

@Boshen
Copy link
Member

Boshen commented Jun 7, 2024

Regarding performance regression, I think we can accept around 10 - 15% given that we are doing a lot of work for cfg.

Just to be clear do you mean it is acceptable if we keep it in the semantics or are you talking about this rule alone?

Overall performance regression of 10 to 15%.

We can keep the unreachable query in the linter rule if that's what you prefer, I was only suggesting that maybe it's better to precompute in the semantic builder. I'll let you make the decision 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST A-linter Area - Linter A-semantic Area - Semantic
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants