
Base.FastMath.pow_fast(a::Float64, i::Int) fails for huge i #53857

Closed
KlausC opened this issue Mar 26, 2024 · 6 comments · Fixed by #54512
Labels: domain:maths (Mathematical functions) · kind:bug (Indicates an unexpected problem or unintended behavior)

Comments

KlausC (Contributor) commented Mar 26, 2024

From calculus we know that (1 + 1/n)^n converges to Euler's number e as n → ∞.

julia> n = 2^52
4503599627370496

julia> (1 + 1 / n) ^ n
2.7182818284590446

That's quite good! But

julia> @fastmath (1 + 1 / n) ^ n
ERROR: InexactError: trunc(Int32, 4503599627370496)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Vararg{Any})
   @ Core ./boot.jl:772
 [2] checked_trunc_sint
   @ ./boot.jl:786 [inlined]
 [3] toInt32
   @ ./boot.jl:823 [inlined]
 [4] Int32
   @ ./boot.jl:913 [inlined]
 [5] convert
   @ ./number.jl:7 [inlined]
 [6] cconvert
   @ ./essentials.jl:662 [inlined]
 [7] pow_fast(x::Float64, y::Int64)
   @ Base.FastMath ./fastmath.jl:286
 [8] top-level scope
   @ REPL[31]:1

The reason is that Base.FastMath.pow_fast calls the LLVM intrinsic llvm.powi.f64.i32 for an integer second argument, and to do so it converts the exponent to an Int32, which fails when the value does not fit in 32 bits.
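For illustration (independent of pow_fast itself), 2^52 is far outside the Int32 range, so the conversion attempted by the intrinsic wrapper must throw:

julia> typemax(Int32)
2147483647

julia> Int32(2^52)
ERROR: InexactError: trunc(Int32, 4503599627370496)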

KlausC changed the title from "Base.FastMath.pow_fast(a, i) fails for huge i" to "Base.FastMath.pow_fast(a::Float64, i::Int) fails for huge i" on Mar 26, 2024
oscardssmith added the kind:bug and domain:maths labels on Mar 26, 2024
oscardssmith (Member) commented
Is powi actually faster than our Float64^Int path? My guess is that this code path dates from before Julia 1.9 and that our current algorithm is probably just as fast.

KlausC (Contributor, Author) commented Mar 26, 2024

Actually, powi is faster for 2^32 > i > 3 on my hardware, and slower for small integers.
Replacing @fastmath a ^ i with @fastmath a ^ float(i) is also an option, maybe.
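For reference, a sketch of that workaround (illustrative only, not a proposed fix): converting the exponent to Float64 routes the call through the Float64^Float64 fast path instead of the 32-bit powi intrinsic.

n = 2^52
@fastmath (1 + 1 / n) ^ float(n)   # ≈ 2.718281828…, no InexactError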

oscardssmith (Member) commented
Interesting, I can also reproduce this.

fastpow(x, n) = @fastmath x^n
julia> @benchmark Base.power_by_squaring(x,5) setup = x=rand()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.638 ns … 73.319 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.657 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.902 ns ±  1.213 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▂ ▆▁▂                                                  ▁
  █▇▁▇█████▇█▇▇▃▇▆▆▄▆▅▄▅▄▄▅▅▆▆▅▅▄▅▅▄▃▅▁▄▄▆▅▄▆▅▅▅▆▄▆▇▇▇█▇▇█▇▆ █
  3.64 ns      Histogram: log(frequency) by time     6.86 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark fastpow(x,5) setup = x=rand()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.507 ns … 17.839 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.520 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.600 ns ±  0.423 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄  ▁   ▄▆    ▂ ▂                                          ▁
  ██▃▁██▁▁██▃▄▃██▁██▄▃▃▃▁▃▁▁▁▃▃▁▁▁▁▃▁▁▁▃▁▅▅▄▄▃▄▄▁▃▁▃▃▇▇█▇▆▇▇ █
  1.51 ns      Histogram: log(frequency) by time     2.47 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

It looks like power_by_squaring is doing something dumb. Should be relatively easy to fix since https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/powisf2.c is pretty darn simple.
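For reference, the builtin linked above is just square-and-multiply; a rough Julia transcription (an illustrative sketch, not Base's power_by_squaring) looks like this:

# Rough transcription of compiler-rt's powisf2.c: binary exponentiation,
# with a final reciprocal for a negative exponent.
function powi_sketch(a::Float64, b::Integer)
    recip = b < 0
    r = one(a)
    while true
        isodd(b) && (r *= a)
        b = div(b, 2)      # truncating division, as in the C code
        b == 0 && break
        a *= a
    end
    return recip ? inv(r) : r
end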

eschnett (Contributor) commented Apr 4, 2024

I looked a bit into this. I think the main difference is that Base.power_by_squaring is so large that it doesn't get inlined, and the benchmark code then involves an additional function call. When I add @inline to Base.power_by_squaring then the performance difference becomes smaller.

With @inline, the generated assembler code is functionally identical, but a performance difference remains. I do not understand this.

eschnett (Contributor) commented Apr 4, 2024

Regarding inlining: when the second argument (the power) is known at compile time, inlining simplifies the function significantly. If it is not known, inlining would not help and would probably only make the code larger.
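A small illustration of that point, using a hypothetical inlinable square-and-multiply helper (a sketch, not Base.power_by_squaring itself):

# With the helper marked @inline and a literal exponent, the compiler can
# unroll the loop; with a runtime exponent, inlining mostly just duplicates code.
# Nonnegative n assumed.
@inline function pbs_sketch(x::Float64, n::Int)
    r = one(x)
    while n > 0
        isodd(n) && (r *= x)
        x *= x
        n >>= 1
    end
    return r
end

pow5(x)      = pbs_sketch(x, 5)   # exponent known at compile time
powvar(x, n) = pbs_sketch(x, n)   # exponent only known at run time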

PallHaraldsson (Contributor) commented Apr 4, 2024

Replacing @fastmath a ^ i with @fastmath a ^ float(i) is also an option, maybe.

I'm not sure users would want a different result type, though in this context a is already a float, so that concern would not apply here. I'm just wondering: when both arguments are integers, should returning a float be allowed if it is known to be faster, given that @fastmath is opt-in? And would people be OK with float(i) only for certain values of i? I think that would be possible with a macro, at least for a literal i. Does the macro know whether i is a literal or a variable, and can it take that into account? Also, could it be faster to cast to float and then, for type stability, cast back to an integer, and could that be an option?

With @inline, the generated assembler code is functionally identical, but a performance difference remains. I do not understand this.

For an integer exponent it's a software routine, but for a float exponent the work is done in hardware; isn't that the main reason for the difference? The hardware needs to do something similar internally, and it's unclear how many functional units it has for it, so even if it's faster in latency, it might allow less throughput. [I used to think floats were always inherently slower, which is why they get hardware support to make them fast for the user, but at least some simple float operations become cheaper in hardware cost/area than their integer counterparts once you get down to FP4 (vs. Int4), as in the latest Nvidia hardware. That's from a paper I read, so I assume the result is still computed rather than looked up; at that width it could also be done with a lookup table, in which case it should be just as small.]

oscardssmith pushed a commit that referenced this issue May 26, 2024
Fixes #53857

For `x::Union{Float32,Float64}` and `y::Integer` with `y % Int32 != y`, an InexactError with a stack trace is thrown when calling `@fastmath x^y`.

This fix calls `x^y` in those cases, while leaving the other cases as is.
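A minimal sketch of that guard (assuming the intrinsic call that Base's pow_fast uses; the actual code merged in #54512 may differ in its details):

# Use the 32-bit powi intrinsic only when the exponent fits in an Int32;
# otherwise fall back to the ordinary x ^ y path.
function pow_fast_sketch(x::Float64, y::Integer)
    if y % Int32 == y
        return ccall("llvm.powi.f64.i32", llvmcall, Float64, (Float64, Int32), x, y % Int32)
    else
        return x ^ y
    end
end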