
Base.FastMath.pow_fast(a::Float64, i::Int) fails for huge i #53857

Closed
KlausC opened this issue Mar 26, 2024 · 6 comments · Fixed by #54512
Labels: domain:maths (Mathematical functions) · kind:bug (Indicates an unexpected problem or unintended behavior)

Comments

KlausC (Contributor) commented Mar 26, 2024

From calculus we know that (1 + 1/n)^n converges to Euler's number e as n → ∞.

julia> n = 2^52
4503599627370496

julia> (1 + 1 / n) ^ n
2.7182818284590446

That's quite good! But

julia> @fastmath (1 + 1 / n) ^ n
ERROR: InexactError: trunc(Int32, 4503599627370496)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Vararg{Any})
   @ Core ./boot.jl:772
 [2] checked_trunc_sint
   @ ./boot.jl:786 [inlined]
 [3] toInt32
   @ ./boot.jl:823 [inlined]
 [4] Int32
   @ ./boot.jl:913 [inlined]
 [5] convert
   @ ./number.jl:7 [inlined]
 [6] cconvert
   @ ./essentials.jl:662 [inlined]
 [7] pow_fast(x::Float64, y::Int64)
   @ Base.FastMath ./fastmath.jl:286
 [8] top-level scope
   @ REPL[31]:1

The reason is that Base.FastMath.pow_fast calls the LLVM intrinsic llvm.powi.f64.i32 for an integer second argument, and to do so it converts the exponent to an Int32, which fails when the value does not fit in 32 bits.
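For illustration (independent of pow_fast itself), 2^52 is far outside the Int32 range, so the conversion attempted by the intrinsic wrapper must throw:

julia> typemax(Int32)
2147483647

julia> Int32(2^52)
ERROR: InexactError: trunc(Int32, 4503599627370496)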

KlausC changed the title from "Base.FastMath.pow_fast(a, i) fails for huge i" to "Base.FastMath.pow_fast(a::Float64, i::Int) fails for huge i" on Mar 26, 2024
oscardssmith added the kind:bug and domain:maths labels on Mar 26, 2024
oscardssmith (Member) commented
Is powi actually faster than our Float64^Int path? My guess is that this code path dates from before Julia 1.9 and that our current algorithm is probably just as fast.

KlausC (Contributor, Author) commented Mar 26, 2024

Actually, powi is faster for 2^32 > i > 3 on my hardware, and slower for small integers.
Replacing @fastmath a ^ i with @fastmath a ^ float(i) is also an option, maybe.
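For reference, a sketch of that workaround (illustrative only, not a proposed fix): converting the exponent to Float64 routes the call through the Float64^Float64 fast path instead of the 32-bit powi intrinsic.

n = 2^52
@fastmath (1 + 1 / n) ^ float(n)   # ≈ 2.718281828…, no InexactError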

oscardssmith (Member) commented
Interesting, I can also reproduce this.

fastpow(x, n) = @fastmath x^n
julia> @benchmark Base.power_by_squaring(x,5) setup = x=rand()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.638 ns … 73.319 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.657 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.902 ns ±  1.213 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▂ ▆▁▂                                                  ▁
  █▇▁▇█████▇█▇▇▃▇▆▆▄▆▅▄▅▄▄▅▅▆▆▅▅▄▅▅▄▃▅▁▄▄▆▅▄▆▅▅▅▆▄▆▇▇▇█▇▇█▇▆ █
  3.64 ns      Histogram: log(frequency) by time     6.86 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark fastpow(x,5) setup = x=rand()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.507 ns … 17.839 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.520 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.600 ns ±  0.423 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄  ▁   ▄▆    ▂ ▂                                          ▁
  ██▃▁██▁▁██▃▄▃██▁██▄▃▃▃▁▃▁▁▁▃▃▁▁▁▁▃▁▁▁▃▁▅▅▄▄▃▄▄▁▃▁▃▃▇▇█▇▆▇▇ █
  1.51 ns      Histogram: log(frequency) by time     2.47 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

It looks like power_by_squaring is doing something dumb. Should be relatively easy to fix since https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/powisf2.c is pretty darn simple.
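For reference, the builtin linked above is just square-and-multiply; a rough Julia transcription (an illustrative sketch, not Base's power_by_squaring) looks like this:

# Rough transcription of compiler-rt's powisf2.c: binary exponentiation,
# with a final reciprocal for a negative exponent.
function powi_sketch(a::Float64, b::Integer)
    recip = b < 0
    r = one(a)
    while true
        isodd(b) && (r *= a)
        b = div(b, 2)      # truncating division, as in the C code
        b == 0 && break
        a *= a
    end
    return recip ? inv(r) : r
end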

eschnett (Contributor) commented Apr 4, 2024

I looked a bit into this. I think the main difference is that Base.power_by_squaring is so large that it doesn't get inlined, and the benchmark code then involves an additional function call. When I add @inline to Base.power_by_squaring then the performance difference becomes smaller.

With @inline, the generated assembler code is functionally identical, but a performance difference remains. I do not understand this.

eschnett (Contributor) commented Apr 4, 2024

Regarding inlining: when the second argument (the power) is known at compile time, inlining simplifies the function significantly. If it is not known, inlining would not help and would probably only make the code larger.
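A small illustration of that point, using a hypothetical inlinable square-and-multiply helper (a sketch, not Base.power_by_squaring itself):

# With the helper marked @inline and a literal exponent, the compiler can
# unroll the loop; with a runtime exponent, inlining mostly just duplicates code.
# Nonnegative n assumed.
@inline function pbs_sketch(x::Float64, n::Int)
    r = one(x)
    while n > 0
        isodd(n) && (r *= x)
        x *= x
        n >>= 1
    end
    return r
end

pow5(x)      = pbs_sketch(x, 5)   # exponent known at compile time
powvar(x, n) = pbs_sketch(x, n)   # exponent only known at run time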

PallHaraldsson (Contributor) commented Apr 4, 2024

Replacing @fastmath a ^ i with @fastmath a ^ float(i) is also an option, maybe.

I'm not sure users would want a different result type, though in this context a is already a float, so that concern would not apply here. I'm just wondering: when both arguments are integers, should returning a float be allowed if it is known to be faster, given that @fastmath is opt-in? And would people be OK with float(i) only for certain values of i? I think that would be possible with a macro, at least for a literal i. Does the macro know whether i is a literal or a variable, and can it take that into account? Also, could it be faster to cast to float and then, for type stability, cast back to an integer, and could that be an option?

With @inline, the generated assembler code is functionally identical, but a performance difference remains. I do not understand this.

For an integer exponent it's a software routine, but for a float exponent the work is done in hardware; isn't that the main reason for the difference? The hardware needs to do something similar internally, and it's unclear how many functional units it has for it, so even if it's faster in latency, it might allow less throughput. [I used to think floats were always inherently slower, which is why they get hardware support to make them fast for the user, but at least some simple float operations become cheaper in hardware cost/area than their integer counterparts once you get down to FP4 (vs. Int4), as in the latest Nvidia hardware. That's from a paper I read, so I assume the result is still computed rather than looked up; at that width it could also be done with a lookup table, in which case it should be just as small.]

oscardssmith pushed a commit that referenced this issue May 26, 2024
Fixes #53857

For `x::Union{Float32,Float64}` and `y::Integer` with `y % Int32 != y`, an InexactError with a stack trace is thrown when calling `@fastmath x^y`.

This fix calls `x^y` in those cases, while leaving the other cases as is.
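A minimal sketch of that guard (assuming the intrinsic call that Base's pow_fast uses; the actual code merged in #54512 may differ in its details):

# Use the 32-bit powi intrinsic only when the exponent fits in an Int32;
# otherwise fall back to the ordinary x ^ y path.
function pow_fast_sketch(x::Float64, y::Integer)
    if y % Int32 == y
        return ccall("llvm.powi.f64.i32", llvmcall, Float64, (Float64, Int32), x, y % Int32)
    else
        return x ^ y
    end
end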