Incorrect parsing of large binary/octal/hex literals #3347

overlookmotel · 2024-05-18T20:58:43Z

@DonIsaac I'm afraid I found an edge case with #3296.

Because of the switch from using floating point to integers, we need to handle arithmetic overflow now.

For numbers which evaluate to more than u64::MAX, parsing is incorrect.

e.g.: 0x10000000000000000 should be evaluated as 2.0f64.powf(64.0), but instead produces 0.
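To make the failure concrete, here is a minimal sketch of how an integer accumulator can lose the leading digit. `parse_hex_naive` is a hypothetical stand-in, not the actual parser code from #3296, and it assumes the `0x` prefix has already been stripped and that only valid hex digits are passed in.

```rust
/// Hypothetical naive hex parser: accumulates into a u64 with no overflow handling.
/// Assumes `digits` contains only ASCII hex digits, without the `0x` prefix.
fn parse_hex_naive(digits: &str) -> f64 {
    let mut result: u64 = 0;
    for c in digits.chars() {
        let digit = c.to_digit(16).expect("caller passes only hex digits") as u64;
        // `<<` on a u64 silently discards any bits shifted past bit 63,
        // so the 17th hex digit pushes the first one out of the value.
        result = (result << 4) | digit;
    }
    result as f64
}
```

With this sketch, `parse_hex_naive("10000000000000000")` returns `0.0`, whereas the correct value is `2.0f64.powi(64)`.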


overlookmotel · 2024-05-18T21:10:46Z

A possible fix would be:

Trim any 0s from start of string before looping over chars.
Stop the main loop once the maximum number of digits guaranteed to fit in a u64 has been consumed (16 for hex, 21 for octal, 64 for binary).
If further characters remain, shift the number after conversion to f64 to account for the extra digits.
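The three steps above could look roughly like this for hex. This is only a sketch under assumptions: the function name `parse_hex_with_overflow` is made up for illustration, the input is assumed to be pre-validated ASCII hex digits without the `0x` prefix, and digits beyond the first 16 are treated as zeros, so the result may differ from the correctly rounded value in the last bit for some inputs.

```rust
/// Hypothetical overflow-aware hex parser following the three steps above.
/// Assumes `digits` contains only ASCII hex digits, without the `0x` prefix.
fn parse_hex_with_overflow(digits: &str) -> f64 {
    // Step 1: leading zeros carry no value and would waste the 16-digit budget.
    let bytes = digits.trim_start_matches('0').as_bytes();

    // Step 2: accumulate at most 16 hex digits (16 * 4 bits = 64 bits), which always fit in a u64.
    let head_len = bytes.len().min(16);
    let (head, rest) = bytes.split_at(head_len);
    let mut result: u64 = 0;
    for &b in head {
        let digit = (b as char).to_digit(16).expect("caller passes only hex digits") as u64;
        result = (result << 4) | digit;
    }

    // Step 3: each remaining hex digit stands for 4 more bits, so scale by 2^(4 * remaining).
    // Multiplying by a power of two is exact in f64 until the value overflows to infinity.
    let value = result as f64;
    if rest.is_empty() {
        return value;
    }
    value * 2.0_f64.powi(4 * rest.len() as i32)
}
```

For the literal from the report, `parse_hex_with_overflow("10000000000000000")` yields `2.0f64.powi(64)`: the first 16 digits give 2^60, and the one remaining digit scales that by 2^4.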

Such enormous numbers are an uncommon case, and we wouldn't want to slow down the fast path to accommodate it. So I'd suggest:

Implement the full algorithm which handles large numbers in a separate function parse_hex_slow.
At start of parse_hex, quickly check if length is larger than what can be handled by the fast algorithm.
Switch to parse_hex_slow if so.
Mark parse_hex_slow as #[cold] and #[inline(never)].
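A rough sketch of that split is below. `parse_hex` and `parse_hex_slow` are the names suggested above; `MAX_FAST_HEX_LEN` and `hex_digit` are hypothetical helpers added for illustration, and the input is again assumed to be pre-validated ASCII hex digits without the `0x` prefix.

```rust
/// Most hex digits that always fit in a u64 (16 digits * 4 bits = 64 bits).
const MAX_FAST_HEX_LEN: usize = 16;

/// Fast path: one length check, then plain integer accumulation.
fn parse_hex(digits: &str) -> f64 {
    if digits.len() > MAX_FAST_HEX_LEN {
        // Rare case: hand off to the out-of-line slow path.
        return parse_hex_slow(digits);
    }
    let mut result: u64 = 0;
    for b in digits.bytes() {
        result = (result << 4) | hex_digit(b);
    }
    result as f64
}

/// Slow path for long literals, kept out of line so it does not bloat the fast path.
#[cold]
#[inline(never)]
fn parse_hex_slow(digits: &str) -> f64 {
    // Trim leading zeros, parse up to 16 digits exactly, then scale for the rest.
    let bytes = digits.trim_start_matches('0').as_bytes();
    let (head, rest) = bytes.split_at(bytes.len().min(MAX_FAST_HEX_LEN));
    let head_value = head.iter().fold(0u64, |acc, &b| (acc << 4) | hex_digit(b));
    head_value as f64 * 2.0_f64.powi(4 * rest.len() as i32)
}

/// Converts one ASCII hex digit to its numeric value.
fn hex_digit(b: u8) -> u64 {
    (b as char).to_digit(16).expect("caller passes only hex digits") as u64
}
```

Note that a short value padded with many leading zeros also takes the slow path in this sketch, since the fast path checks only the length; the slow path still returns the right value for it.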

This should only cost the fast path 2 CPU ops (test length, conditional jump), but will preserve correct behavior in all cases.

That's just a suggestion, you may see a better way. And if you don't have time for this, just say, and I'll do it.

overlookmotel added the C-bug Category - Bug label May 18, 2024

overlookmotel assigned Boshen May 18, 2024

overlookmotel unassigned Boshen May 18, 2024

DonIsaac self-assigned this May 20, 2024

DonIsaac added the A-parser Area - Parser label May 20, 2024
