Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect parsing of large binary/octal/hex literals #3347

Open
overlookmotel opened this issue May 18, 2024 · 1 comment
Open

Incorrect parsing of large binary/octal/hex literals #3347

overlookmotel opened this issue May 18, 2024 · 1 comment
Assignees
Labels
A-parser Area - Parser C-bug Category - Bug

Comments

@overlookmotel
Copy link
Collaborator

@DonIsaac I'm afraid I found an edge case with #3296.

Because of the switch from using floating point to integers, we need to handle arithmetic overflow now.

For numbers which evaluate to more than u64::MAX, parsing is incorrect.

e.g.: 0x10000000000000000 should be evaluated as 2.0f64.powf(64.0), but instead produces 0.

Playground

@overlookmotel overlookmotel added the C-bug Category - Bug label May 18, 2024
@overlookmotel
Copy link
Collaborator Author

A possible fix would be:

  • Trim any 0s from start of string before looping over chars.
  • Stop the main loop after 16 characters (16 for hex, more for octal or binary).
  • If further characters remain, shift the number after conversion to f64 to account for the extra digits.

Such enormous numbers are an uncommon case, and we wouldn't want to slow down the fast path to accommodate it. So I'd suggest:

  • Implement the full algorithm which handles large numbers in a separate function parse_hex_slow.
  • At start of parse_hex, quickly check if length is larger than what can be handled by the fast algorithm.
  • Switch to parse_hex_slow if so.
  • Mark parse_hex_slow as #[cold] and #[inline(never)].

This should only cost the fast path 2 CPU ops (test length, conditional jump), but will preserve correct behavior in all cases.

That's just a suggestion, you may see a better way. And if you don't have time for this, just say, and I'll do it.

@DonIsaac DonIsaac self-assigned this May 20, 2024
@DonIsaac DonIsaac added the A-parser Area - Parser label May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area - Parser C-bug Category - Bug
Projects
None yet
Development

No branches or pull requests

3 participants