Convert simple byte handlers in lexer to branchless code #3292

overlookmotel · 2024-05-15T13:03:20Z

Why the lexer is slower than it needs to be

Many of the simpler byte handlers in the lexer perform multiple branches.

e.g. The handler for = which needs to figure out "is the whole token =, ==, ===, or =>?"

oxc/crates/oxc_parser/src/lexer/byte_handlers.rs

Lines 389 to 403 in b9d69ad

    
           // = 
        
           ascii_byte_handler!(EQL(lexer) { 
        
               lexer.consume_char(); 
        
               if lexer.next_eq('=') { 
        
                   if lexer.next_eq('=') { 
        
                       Kind::Eq3 
        
                   } else { 
        
                       Kind::Eq2 
        
                   } 
        
               } else if lexer.next_eq('>') { 
        
                   Kind::Arrow 
        
               } else { 
        
                   Kind::Eq 
        
               } 
        
           });

Typical JS code will contain a random mix of assignments x = 1, equality checks x === 1, and arrow functions =>.

Because of the random distribution, the branch predictor won't be able to predict what the token is going to be, and so which way these branches are going to go. It will likely get it wrong at least 50% of the time. Branch mis-predication is fairly expensive (I think I read about 20 CPU cycles each time), and these functions are on a hot path because =, ===, and => are all common in JS code.

Make it branchless!

We could convert these matchers to branchless code with something like this:

// Set the discriminants for `Kind` to always follow the pattern
// that `discriminant % 4` is number of bytes this token takes
const EQ_BASE: u8 = 128; // or whatever multiple of 4

#[derive(PartialEq, Clone, Copy)]
#[repr(u8)]
pub enum Kind {
    // `=`
    Eq = EQ_BASE + 1,
    // `==`
    Eq2 = EQ_BASE + 2,
    // `===`
    Eq3 = EQ_BASE + 3,
    // `=>`
    Arrow = EQ_BASE + 4 + 2,

    // ...all the other token kinds
}

ascii_byte_handler!(EQL(lexer) { 
    const fn num_bytes(kind: Kind) -> usize { kind as usize % 4 }

    // Check the invariant about discrimants mentioned above at compile time
    const_assert!(num_bytes(Kind::Eq) == 1);
    const_assert!(num_bytes(Kind::Eq2) == 2);
    const_assert!(num_bytes(Kind::Eq3) == 3);
    const_assert!(num_bytes(Kind::Arrow) == 2);

    // NB: For simplicity, not handling case where close to EOF
    // and there are not 2 more bytes remaining in source to read.
    // Would need to handle that case separately, in a `#[cold]` branch.

    let next2: [u8; 2] = unsafe { lexer.source.position().add(1).read2() };
    let kind = if next2 == [b'=', b'='] {
        Kind::Eq3
    } else {
        match next2[0] {
            b'=' => Kind::Eq2,
            b'>' => Kind::Arrow,
            _ => Kind::Eq,
        }
    };

    unsafe { lexer.source.advance(num_bytes(kind)); }

    kind
});

The trick is to make Kind play two roles. It's the token kind, but it also encodes how many bytes the token is. This is what allows to compiler to use cmovne instructions rather than branches.

Godbolt shows this compiles down to branchless code: https://godbolt.org/z/Me5Pvj5cP

Note on benchmarking

We need to alter the benchmarking setup before we do this. CodSpeed have advised that the way their benchmarking works, it doesn't take into account branch mis-prediction.

So, even if this change makes a solid difference in real world (which I think it could), it won't show up on our CodSpeed benchmarks.

We need some way to check that this is improving performance in the way that we think it is.

The text was updated successfully, but these errors were encountered:

overlookmotel added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels May 15, 2024

Boshen added the E-Help Wanted Experience level - For the experienced collaborators label May 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert simple byte handlers in lexer to branchless code #3292

Convert simple byte handlers in lexer to branchless code #3292

overlookmotel commented May 15, 2024 •

edited

Convert simple byte handlers in lexer to branchless code #3292

Convert simple byte handlers in lexer to branchless code #3292

Comments

overlookmotel commented May 15, 2024 • edited

Why the lexer is slower than it needs to be

Make it branchless!

Note on benchmarking

overlookmotel commented May 15, 2024 •

edited