Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reader: Improve performance of read_all #441

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

wcampbell0x2a
Copy link
Collaborator

@wcampbell0x2a wcampbell0x2a commented May 19, 2024

  • Add Leftover::{Byte, Bits}, so that instead of conversion straight away
    into a Bitvec, we keep it as a slice instead. In the case of the read_all
    attribute this improves the speed, as reading until EOF doesn't keep
    causing our reader to convert to bits and back again all the time.

  • This does result in some slowdown, but with some #[inline(never)]
    we can keep it to a minimum. The total gain to read_all speed is
    worth any .8 ms slowdown I saw in testing.

Closes #439

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch 3 times, most recently from c9d336a to 43dbdfb Compare May 19, 2024 06:19
Copy link

Benchmark for b49a65c

Click to view benchmark
Test Base PR %
count 3.8±0.02µs N/A N/A
deku_read_bits 1230.7±22.08ns 1250.3±20.97ns +1.59%
deku_read_byte 3.5±0.05ns 3.4±0.13ns -2.86%
deku_read_enum 3.0±0.05ns 2.6±0.04ns -13.33%
deku_read_vec 34.1±0.44ns 37.6±0.89ns +10.26%
deku_write_bits 161.9±3.24ns 162.7±17.05ns +0.49%
deku_write_byte 21.8±0.20ns 22.7±0.34ns +4.13%
deku_write_enum 21.1±0.41ns 21.7±0.38ns +2.84%
deku_write_vec 422.7±5.49ns 362.6±7.05ns -14.22%
read_all 4.1±0.06µs N/A N/A
read_all_bytes 3.6±0.03µs N/A N/A

Copy link

Benchmark for 7f81f7b

Click to view benchmark
Test Base PR %
count 3.6±0.05µs N/A N/A
deku_read_bits 1187.3±15.54ns 1238.2±13.65ns +4.29%
deku_read_byte 3.3±0.03ns 3.4±0.03ns +3.03%
deku_read_enum 2.6±0.05ns 2.6±0.05ns 0.00%
deku_read_vec 35.8±0.55ns 37.5±0.68ns +4.75%
deku_write_bits 150.3±1.46ns 153.7±3.39ns +2.26%
deku_write_byte 22.7±0.20ns 22.7±0.69ns 0.00%
deku_write_enum 21.5±0.31ns 21.7±0.45ns +0.93%
deku_write_vec 351.2±5.78ns 406.7±4.96ns +15.80%
read_all 5.2±0.04µs N/A N/A
read_all_bytes 4.8±0.04µs N/A N/A

Copy link

Benchmark for 42291cc

Click to view benchmark
Test Base PR %
count 3.8±0.03µs N/A N/A
deku_read_bits 1232.7±21.71ns 1283.2±13.27ns +4.10%
deku_read_byte 3.3±0.02ns 3.4±0.25ns +3.03%
deku_read_enum 2.6±0.04ns 2.6±0.16ns 0.00%
deku_read_vec 34.0±0.50ns 38.0±0.73ns +11.76%
deku_write_bits 148.0±2.23ns 155.2±12.15ns +4.86%
deku_write_byte 21.8±0.35ns 22.7±0.33ns +4.13%
deku_write_enum 21.1±0.35ns 21.8±0.46ns +3.32%
deku_write_vec 353.8±28.27ns 370.1±11.25ns +4.61%
read_all 4.1±0.10µs N/A N/A
read_all_bytes 3.6±0.04µs N/A N/A

Copy link

Benchmark for a473082

Click to view benchmark
Test Base PR %
count 3.8±0.03µs N/A N/A
deku_read_bits 1253.0±20.87ns 1198.3±18.86ns -4.37%
deku_read_byte 3.3±0.07ns 3.6±0.07ns +9.09%
deku_read_enum 2.6±0.05ns 3.0±0.07ns +15.38%
deku_read_vec 33.9±0.53ns 37.7±0.69ns +11.21%
deku_write_bits 152.9±3.45ns 153.9±3.13ns +0.65%
deku_write_byte 21.8±0.51ns 22.4±0.40ns +2.75%
deku_write_enum 21.1±0.33ns 21.7±0.37ns +2.84%
deku_write_vec 357.3±38.06ns 421.8±4.87ns +18.05%
read_all 5.0±0.04µs N/A N/A
read_all_bytes 4.6±0.18µs N/A N/A

Copy link

Benchmark for e1f913a

Click to view benchmark
Test Base PR %
count 3.8±0.04µs N/A N/A
deku_read_bits 1413.7±27.29ns 1298.0±20.27ns -8.18%
deku_read_byte 3.3±0.08ns 3.4±0.08ns +3.03%
deku_read_enum 2.6±0.04ns 2.6±0.11ns 0.00%
deku_read_vec 34.0±0.56ns 43.0±1.08ns +26.47%
deku_write_bits 170.9±8.62ns 165.2±3.57ns -3.34%
deku_write_byte 21.9±0.39ns 23.6±0.62ns +7.76%
deku_write_enum 21.4±0.53ns 22.2±0.60ns +3.74%
deku_write_vec 351.9±9.65ns 355.1±4.87ns +0.91%
read_all 4.1±0.07µs N/A N/A
read_all_bytes 5.0±0.03µs N/A N/A

Copy link

Benchmark for 8c98ec9

Click to view benchmark
Test Base PR %
count 3.8±0.03µs N/A N/A
deku_read_bits 1183.6±16.25ns 1325.2±11.72ns +11.96%
deku_read_byte 3.3±0.05ns 3.4±0.05ns +3.03%
deku_read_enum 2.6±0.06ns 2.6±0.08ns 0.00%
deku_read_vec 33.9±0.53ns 39.9±0.74ns +17.70%
deku_write_bits 150.7±2.83ns 167.3±12.46ns +11.02%
deku_write_byte 21.7±0.36ns 22.4±0.39ns +3.23%
deku_write_enum 21.1±0.33ns 21.7±0.33ns +2.84%
deku_write_vec 350.3±4.98ns 350.9±3.68ns +0.17%
read_all 4.1±0.05µs N/A N/A
read_all_bytes 5.0±0.04µs N/A N/A

@wcampbell0x2a wcampbell0x2a force-pushed the 439-improve-read-all-perf branch 2 times, most recently from 833f696 to 558ed33 Compare May 22, 2024 03:27
Copy link

Benchmark for d2397b4

Click to view benchmark
Test Base PR %
count 4.6±0.04µs N/A N/A
deku_read_bits 1234.3±15.55ns 1211.2±12.58ns -1.87%
deku_read_byte 3.4±0.07ns 3.4±0.11ns 0.00%
deku_read_enum 2.6±0.41ns 2.6±0.06ns 0.00%
deku_read_vec 33.5±0.56ns 38.9±0.46ns +16.12%
deku_write_bits 153.5±21.23ns 153.5±8.61ns 0.00%
deku_write_byte 21.8±0.35ns 21.9±0.37ns +0.46%
deku_write_enum 21.1±0.27ns 21.7±0.22ns +2.84%
deku_write_vec 404.6±5.31ns 406.5±5.66ns +0.47%
read_all 17.6±0.01µs N/A N/A
read_all_bytes 17.6±0.03µs N/A N/A

Copy link

Benchmark for 9ff94ec

Click to view benchmark
Test Base PR %
count 3.3±0.05µs N/A N/A
deku_read_bits 1197.1±17.14ns 1299.6±18.08ns +8.56%
deku_read_byte 3.4±0.04ns 3.4±0.11ns 0.00%
deku_read_enum 2.5±0.04ns 2.6±0.04ns +4.00%
deku_read_vec 34.2±0.47ns 37.9±0.63ns +10.82%
deku_write_bits 148.0±3.18ns 166.4±5.02ns +12.43%
deku_write_byte 21.9±0.21ns 21.8±0.38ns -0.46%
deku_write_enum 21.1±0.44ns 21.8±0.54ns +3.32%
deku_write_vec 347.1±6.39ns 351.0±6.85ns +1.12%
read_all 17.5±0.09µs N/A N/A
read_all_bytes 16.5±0.10µs N/A N/A

@wcampbell0x2a wcampbell0x2a changed the title DRAFT: reader: Improve performance of read_all reader: Improve performance of read_all May 22, 2024
@wcampbell0x2a
Copy link
Collaborator Author

For reference, not as perf as it was ;)

+ critcmp stable nightly stable-perf nightly-perf
group                 nightly                                nightly-perf                            stable                                 stable-perf
-----                 -------                                ------------                            ------                                 -----------
Deserialize/binrw     1.38      2.8±0.06ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec     1.37      2.8±0.10ms        ? ?/sec    1.01      2.1±0.07ms        ? ?/sec
Deserialize/custom    1.21      2.2±0.06ms        ? ?/sec    1.00  1826.9±30.32µs        ? ?/sec     1.21      2.2±0.07ms        ? ?/sec    1.03  1873.1±56.09µs        ? ?/sec
Deserialize/deku      1.11      3.1±0.08ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec     1.05      3.0±0.06ms        ? ?/sec    1.01      2.8±0.13ms        ? ?/sec
Serialize/binrw       2.09      5.5±0.21ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec     2.19      5.8±0.21ms        ? ?/sec    1.00      2.7±0.10ms        ? ?/sec
Serialize/custom      1.11  1943.4±49.78µs        ? ?/sec    1.02  1790.1±137.87µs        ? ?/sec    1.14      2.0±0.04ms        ? ?/sec    1.00  1752.8±44.16µs        ? ?/sec
Serialize/deku        1.12      3.1±0.17ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec     1.10      3.1±0.08ms        ? ?/sec    1.01      2.8±0.14ms        ? ?/sec

before:
image

Copy link

Benchmark for bd2a83c

Click to view benchmark
Test Base PR %
count 3.4±0.05µs N/A N/A
deku_read_bits 1241.8±15.85ns 1185.6±38.67ns -4.53%
deku_read_byte 15.4±0.02ns 3.4±0.10ns -77.92%
deku_read_enum 15.5±0.06ns 2.6±0.05ns -83.23%
deku_read_vec 41.3±0.42ns 36.9±0.61ns -10.65%
deku_write_bits 161.8±2.92ns 169.1±3.97ns +4.51%
deku_write_byte 22.1±0.32ns 22.1±0.34ns 0.00%
deku_write_enum 21.4±0.30ns 22.0±0.30ns +2.80%
deku_write_vec 420.7±3.94ns 342.9±4.97ns -18.49%
read_all 16.6±0.11µs N/A N/A
read_all_bytes 17.1±0.24µs N/A N/A

@wcampbell0x2a
Copy link
Collaborator Author

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

@wcampbell0x2a
Copy link
Collaborator Author

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

Or! Just take the improvements and go to sleep ;)

* Add Leftover::{Byte, Bits}, so that instead of conversion straight away
  into a Bitvec, we keep it as a slice instead. In the case of the read_all
  attribute this improves the speed, as reading until EOF doesn't keep
  causing our reader to convert to bits and back again all the time.

* This _does_ result in some slowdown, but with some #[inline(never)]
  we can keep it to a minimum. The total gain to read_all speed is
  worth any .8 ms slowdown I saw in testing.
* This was causing issues with bit_size of buf, and BitSize already takes
  care of this
Copy link

Benchmark for 5f329ba

Click to view benchmark
Test Base PR %
count 3.7±0.05µs N/A N/A
deku_read_bits 1304.6±18.95ns 1274.5±11.97ns -2.31%
deku_read_byte 3.3±0.10ns 3.4±0.07ns +3.03%
deku_read_enum 2.6±0.05ns 3.0±0.05ns +15.38%
deku_read_vec 35.2±0.58ns 35.6±0.53ns +1.14%
deku_write_bits 149.8±2.65ns 170.6±3.72ns +13.89%
deku_write_byte 22.1±0.30ns 22.4±0.22ns +1.36%
deku_write_enum 21.4±0.43ns 21.7±0.20ns +1.40%
deku_write_vec 398.3±3.11ns 333.4±7.79ns -16.29%
read_all 4.4±0.07µs N/A N/A
read_all_bytes 3.9±0.03µs N/A N/A

@wcampbell0x2a wcampbell0x2a added this to the 0.18.0 milestone May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance of read_all and count
1 participant