Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indirect (and direct) tail calls trigger invalid LLVM error #11776

Closed
Techcable opened this issue Jun 2, 2022 · 6 comments
Closed

Indirect (and direct) tail calls trigger invalid LLVM error #11776

Techcable opened this issue Jun 2, 2022 · 6 comments
Labels
backend-llvm The LLVM backend outputs an LLVM IR Module. bug Observed behavior contradicts documented or intended behavior stage1 The process of building from source via WebAssembly and the C backend.
Milestone

Comments

@Techcable
Copy link
Contributor

Techcable commented Jun 2, 2022

Zig Version

0.10.0-dev.2461+69323fc14

Steps to Reproduce

To trigger this bug, invoke an (indirect) function pointer with @call(.{ .modifier = .always_tail }).

Although the frontend runs successfully, the generated LLVM IR is considered invalid.

Here is a minimal test case: https://gist.github.com/9258584d33227d4c85eb81efcfd048a1

Switching to a direct tail call seems to invoke the same bug with invalid LLVM output.

Expected Behavior

Mandatory tail calls should work correctly, both with direct calls and indirect function pointers.

Actual Behavior

Error message (same for direct calls):

  %9 = musttail call fastcc double %7(i64 %8), !dbg !2631

This is a bug in the Zig compiler.thread 720277 panic: 
/Users/nicholas/git/zig/src/stage1.zig:178:5: 0x104e3274b in stage2_panic (zig1)
    @panic(ptr[0..len]);
    ^
???:?:?: 0x10a17c7f7 in __Z9zig_panicPKcz (???)
Relevant LLVM IR Here is the llvm IR for the direct call ``` %7 = load i64, i64* %val, align 8, !dbg !2629 %8 = musttail call fastcc double @handler_sin(i64 %7), !dbg !2630 store double %8, double* %result, align 8, !dbg !2630 %9 = load double, double* %result, align 8, !dbg !2631 ret double %9, !dbg !2631 ```

Here is the relevant llvm IR for the indirect call:

BoundsCheckOk:                                    ; preds = %RemZeroOk
  %5 = getelementptr inbounds [3 x double (i64)*], [3 x double (i64)*]* @TABLE, i64 0, i64 %3, !dbg !2627
  %6 = load double (i64)*, double (i64)** %5, align 8, !dbg !2627
  store double (i64)* %6, double (i64)** %tgt, align 8, !dbg !2627
  call void @llvm.dbg.declare(metadata double (i64)** %tgt, metadata !2621, metadata !DIExpression()), !dbg !2628
  %7 = load double (i64)*, double (i64)** %tgt, align 8, !dbg !2629
  %8 = load i64, i64* %val, align 8, !dbg !2630
  %9 = musttail call fastcc double %7(i64 %8), !dbg !2631
  store double %9, double* %result, align 8, !dbg !2631
  %10 = load double, double* %result, align 8, !dbg !2632
  ret double %10, !dbg !2632

Motivation & Related issues

The motivating use-case here is a "tail call interpreter" suggested as a workaround to #8220.

A simple example is here.

LLVM gives a different error message for compiling my "simple" interpreter, but I would expect the underlying issues are related.

@Techcable Techcable added the bug Observed behavior contradicts documented or intended behavior label Jun 2, 2022
@andrewrk andrewrk added stage1 The process of building from source via WebAssembly and the C backend. backend-llvm The LLVM backend outputs an LLVM IR Module. labels Jun 2, 2022
@andrewrk andrewrk added this to the 0.11.0 milestone Jun 2, 2022
@Techcable
Copy link
Contributor Author

I am beginning to believe this is a combination of several bugs, some of which are in LLVM itself.

Our first issue (easier) is that spilling/saving results after the tail call is causing issues. When generating tail-calls, clang doesn't emit this IR even at -O0. We should probably special-case it (or wait for stage2).

This is why direct calls right now are broken. Once we fix that, we can move onto indirect calls (which is also broken in clang....).

Zig support for @musttail() is very similar to clang [[musttail]]

I was able to produce some similar C code that crashed with a similar error:

  1. c version
  2. zig version

The underlying error from LLVM is the same on both ends. Clang catches it and puts a fancy message on it, Zig just crashes

cannot perform a tail call to function 'tgt' because its signature is incompatible with the calling function
target function has different number of parameters (expected 2 but has 1) calling

In both cases, I'm fairly certian the message is from the LLVM verifier.

I'm not sure superficially changing the way the verifier checks arguments is safe. The extra "argument" the verifier seems to detect for indirect functions could be there for a good reason (web-assembly something?).
There are also two additional problems I'm worried about:

  1. The calling conventions of the function pointers use the default calling convention, but fastcc or tailcc is required for musttail.
    • Neither of these calling conventions are exposed to zig code (see below).
    • Do we need to emit an error on mismatched calling conventions? Does clang need to do this too?
  2. I found webassembly test seems to imply webassembly function pointers are more than just pointers
    • Don't know enough about webassembly to do anything more than say "that's interesting"

Overall LLVM rules seem unclear here. Looking through the github I see a lot oft compiler nerds debating the exact semantics (surprise)

I seem to be running into a lot of bugs here not just in Zig, but also in clang (line between clang/LLVM is blurry).
Looking through the test cases, it seems that LLVM upstream has no test cases for "tall calls for indirect branch".

I think we should wait to fix this until clang fixes generating the same IR.

NOTE: What I said above about save/restore is really a separate issue from this.

@dvmason
Copy link

dvmason commented Aug 30, 2022

I have a similar problem to Techcable, but my interpreter may be clearer, and it works (until it doesn't) with -fstage1 (i.e. it uses *const fn(...
The source is https://hastebin.com/uhutafopap.zig and it produces an LLVM error for every tailcall (direct or indirect), which look like:

: Aug29 ; zig test -fstage1 tails.zig
broken LLVM module found: musttail call must precede a ret with an optional bitcast
  %23 = musttail call fastcc i64 @Thread.checkP(%P.Code* %18, i64* %19, i64 %20, %Thread* %21, i64 %22), !dbg !2018
musttail call must precede a ret with an optional bitcast
  %36 = musttail call fastcc i64 %29(%P.Code* %31, i64* %32, i64 %33, %Thread* %34, i64 %35), !dbg !2029
musttail call must precede a ret with an optional bitcast
  %22 = musttail call fastcc i64 @Thread.checkP(%P.Code* %17, i64* %18, i64 %19, %Thread* %20, i64 %21), !dbg !2058
musttail call must precede a ret with an optional bitcast
     .... many more
This is a bug in the Zig compiler.thread 20158410 panic: 
Unable to dump stack trace: debug info stripped
Abort trap: 6

If I compile it without the -fstage1, it wants functions without the * const or the p(& but says,

tails.zig:67:36: error: modifier 'always_tail' requires a comptime-known function

which is no more useful. I don't really care which syntax (although the earlier one (without the &) looks nicer)

I have built a whole infrastructure with this, because my initial tests (the top part of the file) worked. So I REALLY hope it can be fixed (and ideally, soon)! My project is kind of dead in the water until I can verify it.

I saw this problem before with functions that returned an error union - which should work, but doen't - so I stopped doing that. It looks like there is something that it thinks is being done between the tailcall and the return. Unfortunately -femit-llvm-ir doesn't seem to do anything, so I don't know how else to check.

@malcolmstill
Copy link
Sponsor Contributor

@dvmason as you say, there's the same issue with returning error unions.

To work around that in the case of returning an error I include an optional pointer to an error and effectively return an error by setting that value. See https://github.com/malcolmstill/foxwren/blob/master/src/interpreter.zig

In your case I wonder if it is because the recursive functions are returning i64. If you tried a similar trick of returning the i64 via a pointer arg, does that get passed the error to give you are working binary?

The stage2 issue I think is a separate issue, which I am also facing.

@dvmason
Copy link

dvmason commented Aug 30, 2022

I dumped the IR with --verbose-llvm-ir and discovered that because my tail-called functions were returning i64, the IR that was generated was taking the returned value, and then moving it to %result and then returning that... so LLVM was correct that there was code between the musttail and the return.

  %23 = musttail call fastcc i64 @Thread.checkP(%P.Code* %18, i64* %19, i64 %20, %Thread* %21, i64 %22), !dbg !1700
  store i64 %23, i64* %result, align 8, !dbg !1700
  %24 = load i64, i64* %result, align 8, !dbg !1701
  ret i64 %24, !dbg !1701

Changing my functions to return void, solved the problem. @malcolmstill that sounds like the solution you discovered, too. I am greatly relieved that I found a work-around!!!

Now, that is not to say that the problem is fixed... just that I've found a work-around for my case.

Vexu added a commit to Vexu/zig that referenced this issue Aug 30, 2022
Vexu added a commit to Vexu/zig that referenced this issue Aug 30, 2022
@andrewrk andrewrk modified the milestones: 0.11.0, 0.10.0 Aug 30, 2022
@Techcable
Copy link
Contributor Author

Dear lord, if this works I will be so glad.

I was planning to resort to writing my interpreter in C......

@dvmason
Copy link

dvmason commented Aug 31, 2022

I was getting desperate too.... I'd been telling everyone how happy I was with Zig.... but was getting worried.... and now I am happy again!

TUSF pushed a commit to TUSF/zig that referenced this issue May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend-llvm The LLVM backend outputs an LLVM IR Module. bug Observed behavior contradicts documented or intended behavior stage1 The process of building from source via WebAssembly and the C backend.
Projects
None yet
Development

No branches or pull requests

4 participants