Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build.zig: issue with caching with chained build.step.Run steps #19817

Open
bfredl opened this issue Apr 30, 2024 · 2 comments · May be fixed by #19974
Open

build.zig: issue with caching with chained build.step.Run steps #19817

bfredl opened this issue Apr 30, 2024 · 2 comments · May be fixed by #19974
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@bfredl
Copy link
Sponsor Contributor

bfredl commented Apr 30, 2024

Zig Version

0.13.0-dev.56+956f53beb

Steps to Reproduce and Observed Behavior

Consider build.zig logic where one build.step.Run step takes the output of another such step as an input:

    const step1 = b.addRunArtifact(extracter);
    step1.addFileArg(b.path("src/foo.c"));
    const foo_header = step1.addOutputFileArg("foo.generated.h");

    const step2 = b.addRunArtifact(generator);
    step2.addFileArg(foo_header);
    const rpc_bindings_header = step2.addOutputFileArg("foo.rpc_bindings.h");

    // dummy installs so that output state is visible..
    b.getInstallStep().dependOn(&b.addInstallFileWithDir(foo_header, .header, "foo.generated.h").step);
    b.getInstallStep().dependOn(&b.addInstallFileWithDir(rpc_bindings_header, .header, "foo.rpc_bindings.h").step);

Assume that step2 is much more expensive than step1. If src/foo.c changed we obviously need to re-run step1, but step2 should only need to re-run if the foo_header actually changed as a result.

But any new state of src/foo.c causes both step1 and step2 to run, even when foo.generated.h output is identical. This is because the complete path of the built foo_header depend on the input state of step1. and in turn, the hash calculated by build.step.Run.make in step2 depends not only of the actual contents of addFileArg() file, but also its name which leaks the hash of step1 inputs.

Complete executable test case: https://github.com/bfredl/zig-run4run . The first step extracts only the prototypes of the c functions, so changing implementation_of_foo() to implementation_of_foo2() would illustrate the behavior.

Expected Behavior

When doing an incremental rebuild, if a change to src/foo.c causes step1 to run but still producing the same "foo.generated.h" contents as an earlier rebuild, do not run step2 but use the exiting cached foo.rpc_bindings.h from that earlier build.

I think this could be done in two different ways:

  • change the full path of foo_header.getPath() to only depend on that file's contents, not all inputs to step1
  • somehow make the input hash of step2 only be calculated by the files contents and intended file name, not the full prefixed path ( like zig-cache/o/{HASH}/foo.generated.h )

I experimented a bit with the second option but I couldn't make it work myself.

@bfredl bfredl added the bug Observed behavior contradicts documented or intended behavior label Apr 30, 2024
@rohlem
Copy link
Contributor

rohlem commented Apr 30, 2024

Option (2) (not including the full input path in its hash) seems like it would be backwards incompatible to me:
Currently the file path could very well influence the behavior of the run step (f.e. using it as argument to other programs, writing it to a file, etc.).
I assume we would want the build.zig file to specify when to trigger this behavior via the API somehow.

Option (1) (moving generated files to a path based on their contents) seems like a more general solution -
we would still need a way to find a file based on its inputs(' hash), f.e. via (a file providing the content's hash, which can act like) a symlink.

Just FYI, I think providing a file's content to a run step via stdin already uses the file content's hash,
which might be an appropriate workaround for you in the meantime.

@bfredl
Copy link
Sponsor Contributor Author

bfredl commented May 4, 2024

Unfortunately step2.setStdIn(.{ .lazy_path = foo_header }) does not work as a workaround. it still adds the full generated path of foo_header to the hash, same as for addFileArg(foo_header)

bfredl added a commit to bfredl/zig that referenced this issue May 15, 2024
fixes ziglang#19817

This improves the efficiency of the cache when chaining muliple commands
like

    const step1 = b.addRunArtifact(tool_fast);
    step1.addFileArg(b.path("src/input.c"));
    const output1 = step1.addOutputFileArg("output1.h");

    const step2 = b.addRunArtifact(tool_slow);
    step2.addFileArg(output2);
    const chained_output = step2.addOutputFileArg("output2.h");

assume that step2 takes much long time than step1
if we make a change to "src/input.c" which produces an identical
"output1.h" as a previous input, one would expect step2 not to
rerun as the cached output2.h only depends on the content of output1.h

However, this does not work yet as the hash of src/input.c leaks into
the file name of the cached output1.h, which the second run step
interprets as a different cache key. Erasing the "zig-build/o/{HASH}"
part of the file name in the hash key fixes this.
bfredl added a commit to bfredl/zig that referenced this issue May 15, 2024
fixes ziglang#19817

This improves the efficiency of the cache when chaining muliple commands
like

    const step1 = b.addRunArtifact(tool_fast);
    step1.addFileArg(b.path("src/input.c"));
    const output1 = step1.addOutputFileArg("output1.h");

    const step2 = b.addRunArtifact(tool_slow);
    step2.addFileArg(output1);
    const chained_output = step2.addOutputFileArg("output2.h");

assume that step2 takes much long time than step1
if we make a change to "src/input.c" which produces an identical
"output1.h" as a previous input, one would expect step2 not to
rerun as the cached output2.h only depends on the content of output1.h

However, this does not work yet as the hash of src/input.c leaks into
the file name of the cached output1.h, which the second run step
interprets as a different cache key. Erasing the "zig-build/o/{HASH}"
part of the file name in the hash key fixes this.
bfredl added a commit to bfredl/zig that referenced this issue May 15, 2024
fixes ziglang#19817

This improves the efficiency of the cache when chaining muliple commands
like

    const step1 = b.addRunArtifact(tool_fast);
    step1.addFileArg(b.path("src/input.c"));
    const output1 = step1.addOutputFileArg("output1.h");

    const step2 = b.addRunArtifact(tool_slow);
    step2.addFileArg(output1);
    const chained_output = step2.addOutputFileArg("output2.h");

assume that step2 takes much long time than step1
if we make a change to "src/input.c" which produces an identical
"output1.h" as a previous input, one would expect step2 not to
rerun as the cached output2.h only depends on the content of output1.h

However, this does not work yet as the hash of src/input.c leaks into
the file name of the cached output1.h, which the second run step
interprets as a different cache key. Erasing the "zig-build/o/{HASH}"
part of the file name in the hash key fixes this.
bfredl added a commit to bfredl/zig that referenced this issue May 15, 2024
fixes ziglang#19817

This improves the efficiency of the cache when chaining muliple commands
like

    const step1 = b.addRunArtifact(tool_fast);
    step1.addFileArg(b.path("src/input.c"));
    const output1 = step1.addOutputFileArg("output1.h");

    const step2 = b.addRunArtifact(tool_slow);
    step2.addFileArg(output1);
    const chained_output = step2.addOutputFileArg("output2.h");

assume that step2 takes much long time than step1
if we make a change to "src/input.c" which produces an identical
"output1.h" as a previous input, one would expect step2 not to
rerun as the cached output2.h only depends on the content of output1.h

However, this does not work yet as the hash of src/input.c leaks into
the file name of the cached output1.h, which the second run step
interprets as a different cache key. Erasing the "zig-build/o/{HASH}"
part of the file name in the hash key fixes this.
bfredl added a commit to bfredl/zig that referenced this issue May 15, 2024
fixes ziglang#19817

This improves the efficiency of the cache when chaining muliple commands
like

    const step1 = b.addRunArtifact(tool_fast);
    step1.addFileArg(b.path("src/input.c"));
    const output1 = step1.addOutputFileArg("output1.h");

    const step2 = b.addRunArtifact(tool_slow);
    step2.addFileArg(output1);
    const chained_output = step2.addOutputFileArg("output2.h");

assume that step2 takes much long time than step1
if we make a change to "src/input.c" which produces an identical
"output1.h" as a previous input, one would expect step2 not to
rerun as the cached output2.h only depends on the content of output1.h

However, this does not work yet as the hash of src/input.c leaks into
the file name of the cached output1.h, which the second run step
interprets as a different cache key. Erasing the "zig-build/o/{HASH}"
part of the file name in the hash key fixes this.
bfredl added a commit to bfredl/zig that referenced this issue May 15, 2024
fixes ziglang#19817

This improves the efficiency of the cache when chaining muliple commands
like

    const step1 = b.addRunArtifact(tool_fast);
    step1.addFileArg(b.path("src/input.c"));
    const output1 = step1.addOutputFileArg("output1.h");

    const step2 = b.addRunArtifact(tool_slow);
    step2.addFileArg(output1);
    const chained_output = step2.addOutputFileArg("output2.h");

assume that step2 takes much long time than step1
if we make a change to "src/input.c" which produces an identical
"output1.h" as a previous input, one would expect step2 not to
rerun as the cached output2.h only depends on the content of output1.h

However, this does not work yet as the hash of src/input.c leaks into
the file name of the cached output1.h, which the second run step
interprets as a different cache key. Erasing the "zig-build/o/{HASH}"
part of the file name in the hash key fixes this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants