Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/field: Think about using fiat-crypto #78

Open
4 of 5 tasks
Yawning opened this issue Jul 28, 2021 · 0 comments
Open
4 of 5 tasks

internal/field: Think about using fiat-crypto #78

Yawning opened this issue Jul 28, 2021 · 0 comments
Labels
enhancement New feature or request

Comments

@Yawning
Copy link
Contributor

Yawning commented Jul 28, 2021

fiat-crypto has neat formally verified/auto-generated field arithmetic that can be used to replace the fiddly bits of the internal/field package. It would be nice to be able to use it, since it would let me feel less scared of exposing the package, and it's a nice checkbox to have.

Actually doing the switch requires the fiat-crypto performance to not be horrifically bad, since the current code works (and has been reviewed externally for correctness). Benchmarks from Novi's curve25519-dalek fork (that has been upstreamed) documentation indicate that a 8-15% performance degradation is to be expected, but the actual observed results are significantly worse.

  • Write an initial naive experimental branch to get ballpark benchmarks.
  • Be really really really sad about performance.
  • Try to figure out why performance is, significantly worse, when the rust code is just "slightly worse".
    • CarryMul is ridiculously slow because of addcarryxU64
    • CarrySquare is ridiculously slow because of addcarryxU64
    • Go's shit inliner strikes again. The rust code #[inline]s fiat_25519_carry_mul among other things. Carry is over the inliner budget.
  • Try to reduce the performance penalty further
    • Switch CarrySquare over to CarryPow2k
    • Add Add/Sub/Opp + Carry
    • Add Add/Sub + CarryMul/CarrySquare
    • Fix X25519 performance
  • Benchmark on 32-bit ARM or something.
Current status (2021/08/09)

The upstream fiat-crypto developers were kind enough to fold in some performance related changes (CarryMul/CarrySquare performance fixed, Add/Sub/Opp + Carry added).

My branch still adds CarryPow2k and Add/Sub + CarryMul/CarrySquare, which I suppose I can ask about.

Since the branch is close enough to what I envision a switch would look like, these are the current rough benchmarks:

With AVX2 used where it exists:

name                                old time/op    new time/op    delta
VerifyBatchOnly/1-8                    103µs ± 0%     109µs ± 0%   +5.67%
VerifyBatchOnly/2-8                    146µs ± 0%     157µs ± 0%   +7.56%
VerifyBatchOnly/4-8                    232µs ± 0%     252µs ± 0%   +8.59%
VerifyBatchOnly/8-8                    402µs ± 0%     444µs ± 0%  +10.49%
VerifyBatchOnly/16-8                   742µs ± 0%     826µs ± 0%  +11.31%
VerifyBatchOnly/32-8                  1.43ms ± 0%    1.58ms ± 0%  +10.91%
VerifyBatchOnly/64-8                  2.76ms ± 0%    3.11ms ± 0%  +12.55%
VerifyBatchOnly/128-8                 5.24ms ± 0%    5.95ms ± 0%  +13.61%
VerifyBatchOnly/256-8                 9.52ms ± 0%   10.91ms ± 0%  +14.57%
VerifyBatchOnly/384-8                 13.5ms ± 0%    15.6ms ± 0%  +15.35%
VerifyBatchOnly/512-8                 17.7ms ± 0%    20.4ms ± 0%  +15.28%
VerifyBatchOnly/768-8                 25.4ms ± 0%    29.6ms ± 0%  +16.32%
VerifyBatchOnly/1024-8                32.8ms ± 0%    38.4ms ± 0%  +17.11%
GenerateKey/voi-8                     25.5µs ± 0%    27.9µs ± 0%   +9.33%
GenerateKey/stdlib-8                  43.3µs ± 0%    43.1µs ± 0%   -0.41%
NewKeyFromSeed/voi-8                  25.3µs ± 0%    27.7µs ± 0%   +9.43%
NewKeyFromSeed/stdlib-8               42.7µs ± 0%    42.8µs ± 0%   +0.07%
Signing/voi-8                         28.0µs ± 0%    30.3µs ± 0%   +8.21%
Signing/stdlib-8                      52.2µs ± 0%    52.2µs ± 0%   -0.04%
Verification/voi-8                    72.9µs ± 0%    79.4µs ± 0%   +8.94%
Verification/voi_stdlib-8             83.3µs ± 0%    90.9µs ± 0%   +9.13%
Verification/stdlib-8                  124µs ± 0%     124µs ± 0%   +0.15%
Expanded/NewExpandedPublicKey-8       11.4µs ± 0%    14.2µs ± 0%  +24.98%
Expanded/Verification/voi-8           62.6µs ± 0%    65.5µs ± 0%   +4.65%
Expanded/Verification/voi_stdlib-8    74.9µs ± 0%    78.9µs ± 0%   +5.39%
Expanded/VerifyBatchOnly/1-8          91.9µs ± 0%    96.7µs ± 0%   +5.18%
Expanded/VerifyBatchOnly/2-8           123µs ± 0%     128µs ± 0%   +4.02%
Expanded/VerifyBatchOnly/4-8           184µs ± 0%     197µs ± 0%   +6.76%
Expanded/VerifyBatchOnly/8-8           308µs ± 0%     331µs ± 0%   +7.54%
Expanded/VerifyBatchOnly/16-8          556µs ± 0%     597µs ± 0%   +7.37%
Expanded/VerifyBatchOnly/32-8         1.05ms ± 0%    1.13ms ± 0%   +7.70%
Expanded/VerifyBatchOnly/64-8         2.02ms ± 0%    2.20ms ± 0%   +8.56%
Expanded/VerifyBatchOnly/128-8        3.87ms ± 0%    4.24ms ± 0%   +9.47%
Expanded/VerifyBatchOnly/256-8        7.03ms ± 0%    7.77ms ± 0%  +10.38%
Expanded/VerifyBatchOnly/384-8        10.0ms ± 0%    11.1ms ± 0%  +10.93%
Expanded/VerifyBatchOnly/512-8        13.1ms ± 0%    14.5ms ± 0%  +11.00%
Expanded/VerifyBatchOnly/768-8        18.6ms ± 0%    20.9ms ± 0%  +12.10%
Expanded/VerifyBatchOnly/1024-8       24.0ms ± 0%    26.9ms ± 0%  +11.68%

name                      old time/op    new time/op    delta
ScalarBaseMult/voi-8        24.5µs ± 0%    26.9µs ± 0%   +9.54%
ScalarMult/voi-8            80.2µs ± 0%   113.0µs ± 0%  +40.98%

Note: The massive regression for X25519 ScalarMult is due to the removal of assembly. Sufficiently recent x/crypto/curve25519 does away with it as well, and clocks in at ~107us/op (~161us/op purego).

purego:

name                                old time/op    new time/op    delta
VerifyBatchOnly/1-8                    164µs ± 0%     179µs ± 0%   +9.15%
VerifyBatchOnly/2-8                    238µs ± 0%     247µs ± 0%   +3.79%
VerifyBatchOnly/4-8                    351µs ± 0%     383µs ± 0%   +9.18%
VerifyBatchOnly/8-8                    603µs ± 0%     657µs ± 0%   +9.03%
VerifyBatchOnly/16-8                  1.10ms ± 0%    1.21ms ± 0%   +9.29%
VerifyBatchOnly/32-8                  2.09ms ± 0%    2.29ms ± 0%   +9.60%
VerifyBatchOnly/64-8                  4.09ms ± 0%    4.48ms ± 0%   +9.36%
VerifyBatchOnly/128-8                 8.04ms ± 0%    8.93ms ± 0%  +11.05%
VerifyBatchOnly/256-8                 14.3ms ± 0%    15.9ms ± 0%  +11.09%
VerifyBatchOnly/384-8                 20.2ms ± 0%    22.5ms ± 0%  +11.43%
VerifyBatchOnly/512-8                 26.3ms ± 0%    29.3ms ± 0%  +11.46%
VerifyBatchOnly/768-8                 37.3ms ± 0%    41.4ms ± 0%  +11.10%
VerifyBatchOnly/1024-8                48.1ms ± 0%    53.4ms ± 0%  +11.20%
GenerateKey/voi-8                     48.8µs ± 0%    52.6µs ± 0%   +7.80%
GenerateKey/stdlib-8                  58.3µs ± 0%    58.2µs ± 0%   -0.02%
NewKeyFromSeed/voi-8                  48.5µs ± 0%    52.8µs ± 0%   +8.84%
NewKeyFromSeed/stdlib-8               58.2µs ± 0%    58.1µs ± 0%   -0.12%
Signing/voi-8                         51.3µs ± 0%    55.0µs ± 0%   +7.20%
Signing/stdlib-8                      72.4µs ± 0%    72.2µs ± 0%   -0.27%
Verification/voi-8                     108µs ± 0%     117µs ± 0%   +8.55%
Verification/voi_stdlib-8              131µs ± 0%     145µs ± 0%  +10.17%
Verification/stdlib-8                  181µs ± 0%     181µs ± 0%   -0.22%
Expanded/NewExpandedPublicKey-8       14.5µs ± 0%    16.0µs ± 0%  +10.85%
Expanded/Verification/voi-8           93.4µs ± 0%   101.9µs ± 0%   +9.16%
Expanded/Verification/voi_stdlib-8     119µs ± 0%     132µs ± 0%  +11.36%
Expanded/VerifyBatchOnly/1-8           149µs ± 0%     163µs ± 0%   +9.48%
Expanded/VerifyBatchOnly/2-8           196µs ± 0%     215µs ± 0%   +9.80%
Expanded/VerifyBatchOnly/4-8           293µs ± 0%     319µs ± 0%   +9.03%
Expanded/VerifyBatchOnly/8-8           484µs ± 0%     530µs ± 0%   +9.44%
Expanded/VerifyBatchOnly/16-8          868µs ± 0%     943µs ± 0%   +8.67%
Expanded/VerifyBatchOnly/32-8         1.62ms ± 0%    1.77ms ± 0%   +9.44%
Expanded/VerifyBatchOnly/64-8         3.13ms ± 0%    3.43ms ± 0%   +9.75%
Expanded/VerifyBatchOnly/128-8        6.33ms ± 0%    7.00ms ± 0%  +10.72%
Expanded/VerifyBatchOnly/256-8        11.3ms ± 0%    12.6ms ± 0%  +11.41%
Expanded/VerifyBatchOnly/384-8        15.9ms ± 0%    17.7ms ± 0%  +11.16%
Expanded/VerifyBatchOnly/512-8        20.7ms ± 0%    23.0ms ± 0%  +11.17%
Expanded/VerifyBatchOnly/768-8        29.1ms ± 0%    32.4ms ± 0%  +11.34%
Expanded/VerifyBatchOnly/1024-8       37.4ms ± 0%    41.5ms ± 0%  +10.98%

name                      old time/op    new time/op    delta
ScalarBaseMult/voi-8        47.8µs ± 0%    51.7µs ± 0%  +8.18%
ScalarMult/voi-8             118µs ± 0%     113µs ± 0%  -4.05%

This is getting to "an acceptable slowdown" if people think that using the fiat code is better over the code that came from dalek, but the issue of having to ship modified routines from the 64/curve25519 implementation remains.

@Yawning Yawning added the enhancement New feature or request label Jul 29, 2021
@Yawning Yawning changed the title internal/field: Keep an eye on fiat-crypto internal/field: Think about using fiat-crypto Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant