Niche Optimization for Option<ReferenceId> and IndexVec<ReferenceId, T> #3318

Open
Boshen opened this issue May 16, 2024 · 7 comments
Labels
C-enhancement: Category - New feature or request
C-performance: Category - Solution not expected to change functional behavior, only performance
E-Help Wanted: Experience level - For the experienced collaborators

Comments

@Boshen
Member

Boshen commented May 16, 2024

We have these index types:

define_index_type! {
pub struct ReferenceId = u32;
}

define_index_type! {
pub struct SymbolId = u32;
}

where

pub reference_id: Cell<Option<ReferenceId>>,

symbol_id: Option<SymbolId>,

can be trimmed by niche optimization, so that Option<ReferenceId> takes 4 bytes instead of 8.
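(As a rough standalone sketch - not oxc code - of what the niche buys: wrapping a NonZeroU32 lets the compiler use the all-zeroes bit pattern as the None discriminant, so the Option stays 4 bytes.)

use std::mem::size_of;
use std::num::NonZeroU32;

// Plain u32 newtype: `Option` needs an extra word for the discriminant
#[allow(dead_code)]
struct PlainId(u32);

// NonZeroU32 newtype: `Option` can reuse the 0 bit pattern as `None`
#[allow(dead_code)]
struct NicheId(NonZeroU32);

fn main() {
    assert_eq!(size_of::<Option<PlainId>>(), 8);
    assert_eq!(size_of::<Option<NicheId>>(), 4);
}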

But,

use std::num::NonZeroU32;
define_index_type! {
    pub struct ReferenceId = NonZeroU32;
}

is not supported by the define_index_type! macro:

error[E0599]: no function or associated item named `max_value` found for struct `NonZero` in the current scope
   --> crates/oxc_syntax/src/reference.rs:7:1
    |
7   | / define_index_type! {
8   | |     pub struct ReferenceId = NonZeroU32;
9   | | }
    | |_^ function or associated item not found in `NonZero<u32>`
    |
note: if you're trying to build a new `NonZero<u32>` consider using one of the following associated functions:
      NonZero::<T>::new
      NonZero::<T>::new_unchecked
   --> /Users/boshen/.rustup/toolchains/1.78.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/num/nonzero.rs:308:5

But the good news is that we forked it, and our fork lives at https://github.com/oxc-project/oxc/tree/main/crates/oxc_index

The task of this issue is to make it work, if possible.
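For illustration only (this is not the oxc_index API - names like from_usize and index here are hypothetical), a hand-rolled newtype shows roughly what the macro would need to generate for a NonZeroU32-backed index:

use std::num::NonZeroU32;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ReferenceId(NonZeroU32);

impl ReferenceId {
    // Store `index + 1` so index 0 is still representable while the
    // all-zeroes bit pattern stays free as the `None` niche
    pub fn from_usize(index: usize) -> Self {
        let raw = u32::try_from(index + 1).expect("index out of range");
        // `index + 1` is always >= 1, so `new` cannot return `None`
        Self(NonZeroU32::new(raw).unwrap())
    }

    pub fn index(self) -> usize {
        (self.0.get() - 1) as usize
    }
}

fn main() {
    let id = ReferenceId::from_usize(0);
    assert_eq!(id.index(), 0);
    assert_eq!(std::mem::size_of::<Option<ReferenceId>>(), 4);
}

Another encoding (discussed later in this thread) keeps the raw value and reserves slot 0 of the vec for a dummy entry, so the index-to-slot conversion stays a no-op.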

@Boshen Boshen added the C-enhancement, E-Help Wanted, and C-performance labels May 16, 2024
@Boshen Boshen assigned Boshen and unassigned Boshen May 16, 2024
@overlookmotel

This comment was marked as outdated.

@overlookmotel
Collaborator

overlookmotel commented May 17, 2024

There's also a much more complicated, but more performant solution.

As far as I'm aware, the Vecs which ReferenceId, SymbolId and ScopeId index into can only grow, never contract.

We could capitalize on this invariant to remove bounds checks. Something like:

pub struct ReferenceVec(Vec<Whatever>);
pub struct ReferenceId(u32);

impl ReferenceId {
  // Index 0 always refers to the dummy entry pushed in `new()`
  const EMPTY: Self = ReferenceId(0);
}

impl ReferenceVec {
  pub fn new() -> Self {
    // Initialize vec with a dummy at index 0
    let mut inner = Vec::new();
    inner.push(Whatever::default());
    Self(inner)
  }

  pub fn push(&mut self, value: Whatever) -> ReferenceId {
    // `len()` is a `usize`, so convert to `u32` for the index type
    let index = ReferenceId(self.0.len() as u32);
    self.0.push(value);
    index
  }

  // Returns `&Whatever` not `Option<&Whatever>`
  // because `ReferenceId` is always a valid index
  pub fn get(&self, index: ReferenceId) -> &Whatever {
    // SAFETY: `index` was created by `push` and the vec never shrinks,
    // so it must be in bounds - skip the bounds check
    unsafe { self.0.get(index.0 as usize).unwrap_unchecked() }
  }

  // NB: No `pop` method
}

i.e. the only way to get a ReferenceId is via push(), and because ReferenceVec never shrinks, once you have a ReferenceId it always remains valid. So there's no need for a bounds check in get().

Actual implementation would need to be more complex than the above. Would need some mechanism to prevent you using a ReferenceId from one ReferenceVec on another ReferenceVec. But hopefully the above gives the general idea.
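One possible mechanism, as a hypothetical sketch (the tag field and the assert are illustrative, not a concrete proposal): give each ReferenceVec a unique tag and check it on access. Storing the tag doubles the id size, so a real implementation might only carry it in debug builds, or use lifetime branding instead.

use std::sync::atomic::{AtomicU32, Ordering};

// `Whatever` stands in for the real element type
#[derive(Default)]
pub struct Whatever;

static NEXT_TAG: AtomicU32 = AtomicU32::new(0);

pub struct ReferenceVec {
    tag: u32,
    inner: Vec<Whatever>,
}

#[derive(Clone, Copy)]
pub struct ReferenceId {
    index: u32,
    tag: u32, // which `ReferenceVec` this id belongs to
}

impl ReferenceVec {
    pub fn new() -> Self {
        Self {
            // Unique tag per vec, so ids from one vec are rejected by another
            tag: NEXT_TAG.fetch_add(1, Ordering::Relaxed),
            inner: Vec::new(),
        }
    }

    pub fn push(&mut self, value: Whatever) -> ReferenceId {
        let id = ReferenceId { index: self.inner.len() as u32, tag: self.tag };
        self.inner.push(value);
        id
    }

    pub fn get(&self, id: ReferenceId) -> &Whatever {
        // Could be `debug_assert_eq!` to make the check free in release builds
        assert_eq!(id.tag, self.tag, "ReferenceId used with the wrong ReferenceVec");
        // SAFETY: `id` came from this vec's `push` and the vec never shrinks
        unsafe { self.inner.get(id.index as usize).unwrap_unchecked() }
    }
}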

NB: Unclear how much of a gain this would be, and whether worth the complexity. In my opinion, we should do the main change (shrink Option<ReferenceId>) and then maybe look into this further optimization later on.

@Boshen
Member Author

Boshen commented May 17, 2024

It seems like you're proposing a new crate for this use case, instead of reusing the crates I found from the community.

Edit: the issue description has been changed from NonMax to NonZero.

@Boshen Boshen changed the title Niche Optimization for define_index_type! { pub struct ReferenceId = NonMaxU32; } Niche Optimization for Option<ReferenceId> and IndexVec<ReferenceId, T> May 17, 2024
@overlookmotel
Collaborator

It seems like you're proposing a new crate for this use case, instead of reusing the crates I found from the community.

I hope not! I thought the reason we swapped index_vec for our own oxc_index was so we could bend it to our use cases. I think @rzvxa had exactly the NonZeroU32 use case in mind when he proposed bringing the indexed vec type "in house".

The append-only-with-no-bounds-checks Vec would I guess have to be a different type. But I don't think we should do that right now - I just mention it as something to have in mind for later.

@Conaclos

Initialize our indexed Vecs with a dummy entry at index 0, so usable indexes start at 1.

Naive question: could you not subtract 1 from the index each time you want to access the element in the array?

@overlookmotel
Collaborator

overlookmotel commented May 17, 2024

Naive question: could you not subtract 1 from the index each time you want to access the element in the array?

Yes we could, and for some use cases that would be a good solution. But personally I don't think it's ideal for ours, because:

  • These vecs are typically very large, so 1 dummy entry is a drop in the ocean. Costs almost nothing as a % of the total.
  • In comparison, reads and writes are very common and on hot paths. So while subtracting 1 is a small cost (only 1 CPU op) per individual operation, it adds up to quite a lot in total.

Therefore, on balance, I think a dummy entry will be more performant for our use case.

Does that make sense?
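For concreteness, here's a minimal sketch contrasting the two access patterns (helper names are illustrative, not oxc code):

use std::num::NonZeroU32;

// Dummy entry at slot 0: real entries start at index 1,
// so the non-zero id can be used directly
fn get_with_dummy<T>(vec: &[T], id: NonZeroU32) -> &T {
    &vec[id.get() as usize]
}

// No dummy entry: every access pays one extra subtraction
fn get_with_offset<T>(vec: &[T], id: NonZeroU32) -> &T {
    &vec[(id.get() - 1) as usize]
}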

@rzvxa
Collaborator

rzvxa commented May 17, 2024

I had some limited success with non-zero indices here: #3104. The issue is that the performance gain I saw wasn't high enough for me to consider taking the risk of destabilizing index_vec with this major change.

But as the number of Option<Index> fields in the project grows, we would gain more and more performance from this change.

The hard part is having to support all vector operations and their iterators. There are a lot of safety concerns to think about before switching to non-zero indices.

And some operations are slow because of safety checks; we can switch to unsafe methods in our vector to gain more wherever it is safe to skip the checks.
