Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast path for parsing decimal literals #3288

Open
overlookmotel opened this issue May 15, 2024 · 3 comments
Open

Fast path for parsing decimal literals #3288

overlookmotel opened this issue May 15, 2024 · 3 comments
Assignees
Labels
A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance

Comments

@overlookmotel
Copy link
Collaborator

overlookmotel commented May 15, 2024

#3283 made an optimization to the parser to avoid checking for _s in numeric literals twice, and got a nice 1% speed-up on parser benchmarks.

It occurs to me that we could repeat the trick for the common case of simple decimal literals (e.g. 0, 1, 123). str::parse::<f64>() is quite a complicated function which checks for all kinds of things (exponent, decimal point, valid input etc). All of this is repeated work, since we've already determined in the lexer what the input is.

Perhaps we could have a flag which lexer sets if the input is a simple case, and then use a hand-rolled parse_simple_decimal function to convert to f64, like the ones we have for binary and octal?

fn parse_octal(s: &str) -> f64 {
debug_assert!(!s.is_empty());
let mut result = 0_f64;
for c in s.as_bytes().iter().filter(|s| s != &&b'_') {
#[allow(clippy::cast_lossless)]
let value = (c - b'0') as f64;
result = result.mul_add(8.0, value);
}
result
}

(we should base implementation on std's str::parse::<f64> as it's probably very optimized already, but just cut out all the extraneous checks)

Given that #3283 got a 1% speed-up, maybe this can get the same again, or maybe more?

@DonIsaac Would you be interested in investigating?

@overlookmotel overlookmotel added the C-performance Category - Solution not expected to change functional behavior, only performance label May 15, 2024
@Boshen Boshen removed their assignment May 15, 2024
@overlookmotel
Copy link
Collaborator Author

overlookmotel commented May 15, 2024

As I mentioned in #3289, it's mysterious why #3283 got a 5% speed bump on lexer benchmarks.

Before we test out what I'm suggesting in this issue, we should investigate that further. If it turns out that converting the padding bytes in Token from u8 to bool is a speed-gain in itself, we should do that first, so then we can measure the impact of the change described here in isolation (as it will also involve using one of those padding bytes as a bool).

@overlookmotel overlookmotel added the A-parser Area - Parser label May 15, 2024
@DonIsaac DonIsaac self-assigned this May 15, 2024
@DonIsaac
Copy link
Collaborator

Just did something similar for non-decimals in #3296. Decimals are up next

@DonIsaac
Copy link
Collaborator

@overlookmotel I discovered that Kind::Decimals are never floating point numbers. However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements. However, my local benchmarking is not always accurate, so I'll make a small PR so we can see bench results from CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance
Projects
None yet
Development

No branches or pull requests

3 participants