Add `podman system check` #22733

nalind · 2024-05-16T20:34:28Z

Does this PR introduce a user-facing change?

Added `podman system check`.

openshift-ci · 2024-05-16T20:34:36Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nalind
Once this PR has been reviewed and has the lgtm label, please assign ashley-cui for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

packit-as-a-service · 2024-05-16T20:38:32Z

Ephemeral COPR build failed. @containers/packit-build please check.

packit-as-a-service · 2024-05-16T21:29:33Z

Cockpit tests failed for commit 701dff4. @martinpitt, @jelly, @mvollmer please check.

Luap99

I am not a fan at all of having commands in podman and the API that corrupt the local storage: it bloats podman, makes it a bit slower due to the extra argument setup, and, most importantly, these commands should never be used by any user outside of testing.

If we need these extra commands for testing, then I would move them into their own command, e.g. cmd/podman-testing or something like that.

Luap99 · 2024-05-17T09:17:08Z

pkg/bindings/testing/testing.go

+	"github.com/containers/podman/v5/pkg/domain/entities/types"
+)
+
+func CreateStorageLayer(ctx context.Context, options *CreateStorageLayerOptions) (*types.CreateStorageLayerReport, error) {


This is technically part of a public API, since we support pkg/bindings for consumers, which is very bad.

Moved it to an internal package.

edsantiago

I'll stay out of the broader conversation for now. Some specific comments inline about system tests.

edsantiago · 2024-05-20T12:52:13Z

test/system/helpers.bash

+}
+
+function make_layer_blob() {
+    local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX)


You already have $PODMAN_TMPDIR for exactly this purpose

Switched to using that as a parent directory.
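
For illustration, the change described above might look like the sketch below. It assumes the test framework has already exported `$PODMAN_TMPDIR` (the fallback assignment exists only so the snippet runs standalone), and that GNU `mktemp` is available for `--tmpdir`. The helper name `make_scratch_dir` and the `layer_blob.XXXXXX` template are hypothetical, not taken from the PR.

```shell
#!/usr/bin/env bash
# Assumption: the system-test setup normally exports PODMAN_TMPDIR.
# The fallback below exists only so this sketch can run on its own.
PODMAN_TMPDIR=${PODMAN_TMPDIR:-$(mktemp -d)}

# Hypothetical helper: create a scratch directory under $PODMAN_TMPDIR
# rather than directly under $BATS_TMPDIR or /tmp.
make_scratch_dir() {
    mktemp -d --tmpdir="$PODMAN_TMPDIR" layer_blob.XXXXXX
}

scratch=$(make_scratch_dir)
echo "scratch directory: $scratch"
```

Keeping every scratch directory under one parent means that whatever cleans up `$PODMAN_TMPDIR` also cleans up the scratch directories, so the helper does not need its own cleanup logic.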

edsantiago · 2024-05-20T12:54:15Z

test/system/helpers.bash

@@ -1178,5 +1178,57 @@ function wait_for_command_output() {
    die "Timed out waiting for '$cmd' to return '$want'"
 }

+function make_random_file() {


These new functions are unlikely to be used anywhere outside the new test file; they probably belong there instead of the global helpers file.

Moved two of the three, left make_random_file() where it was.

edsantiago · 2024-05-20T12:55:28Z

test/system/helpers.bash

+
+function testing_make_image_metadata_for_layer_blobs() {
+    local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX)
+    local imageID=$1


Suggestion: imageID=${1?Missing IMAGEID argument} to catch caller errors early
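
A minimal sketch of how that expansion behaves, using a hypothetical function name (`show_image_id`) rather than the helper from the PR. Note that `${1?...}` only triggers when the argument is unset; `${1:?...}` would also reject an empty string.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the PR: ${1?message} aborts the shell
# with an error when the first positional argument is unset.
show_image_id() {
    local imageID=${1?Missing IMAGEID argument}
    echo "imageID=$imageID"
}

# With an argument, the function runs normally.
show_image_id abc123

# Without one, the expansion fails. Run it in a subshell so that the
# calling script survives and can inspect the exit status.
if ! (show_image_id) 2>/dev/null; then
    echo "caller error detected"
fi
```

In a test helper this turns a silent "empty variable" bug into an immediate failure that names the missing argument.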

edsantiago · 2024-05-20T13:01:55Z

test/system/331-system-check.bats

+#
+
+load helpers
+


This module urgently needs a teardown(): otherwise, any individual test failure will leave the system in an undefined and probably unrecoverable state. I don't know what said teardown() will look like; it's possible that it will involve system reset -f.

Or, perhaps better, run all these tests with temporary storage; see https://github.com/containers/podman/blob/main/test/system/330-corrupt-images.bats

I'm not sure that the current testing framework makes it possible to do any of that for a subset of the tests, when they are being run as part of a remote testing suite.

True: there is no way to use --root et al with podman-remote. Is there any actual reason to run these tests under podman-remote?

I expected that we would be keenly interested in having this function available for Podman Desktop and podman-machine cases.

I know you have written a lot of bash test code already, but maybe doing this in e2e would be better?
All tests have a custom --root/--runroot by default, including remote. Also, I assume these checks are expensive and slow? If so, that would be another reason for e2e: those tests run in parallel, so slow tests are not that bad there.

Poring over actual data, they can be time-intensive. The ones we set up for testing are small enough that the new tests finish in about 15 seconds on my dev machine.

To be clear, I don't want to force you to rewrite it all in e2e if you don't want to.
15s is fine, I guess. I am working on strategies to make the system tests faster, so I will keep this in mind, but I know there are worse offenders, so this is not a reason to block this one.

Luap99 · 2024-05-23T09:32:48Z

Makefile

@@ -678,7 +688,7 @@ remotesystem:
 	if timeout -v 1 true; then \
 		SOCK_FILE=$(shell mktemp --dry-run --tmpdir podman_tmp_XXXX);\
 		export PODMAN_SOCKET=unix://$$SOCK_FILE; \
-		./bin/podman system service --timeout=0 $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \
+		./bin/podman system service --timeout=0 --features=testing $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \


I still see no point in running the system service for this. You could just execute the podman-testing binary now, and there really should not need to exist a remote API to corrupt the storage.

When we're doing remote testing, the client has access to the storage that the server is using? I thought that we'd taken steps to make that harder in order to avoid cases where we unintentionally used the non-remote code paths.

Yes, in both e2e and system tests. It is not as straightforward, but in both test suites there are some tests that manually override the default command.

podman/test/system/001-basic.bats

Line 114 in e53b96c

# $PODMAN may be a space-separated string, e.g. if we include a --url.

So I don't think there is any problem in just executing the local test binary to corrupt the storage of the service.
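
A minimal sketch of that idea, under stated assumptions: the value of `$PODMAN` shown here, the socket path, and the `PODMAN_TESTING` variable with its default path are all hypothetical, and the real tests may split or override the command differently.

```shell
#!/usr/bin/env bash
# Hypothetical value: a remote client command plus its connection flag.
PODMAN="podman-remote --url unix:///tmp/podman.sock"

# Split the space-separated string into an array, so the binary and
# its extra arguments can be inspected separately.
read -r -a podman_as_array <<<"$PODMAN"
echo "client binary: ${podman_as_array[0]}"
echo "extra args:    ${podman_as_array[*]:1}"

# Hypothetical: a test that needs direct storage access can bypass
# $PODMAN and name a local helper binary instead.
PODMAN_TESTING=${PODMAN_TESTING:-./bin/podman-testing}
echo "local helper:  $PODMAN_TESTING"
```

The point is only that the remote client command and the local helper are separate settings, so a remote test run can still invoke a local binary against the storage the service is using.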

Okay, then, giving that a try.

nalind · 2024-05-28T21:42:27Z

/retitle Add podman system check

packit-as-a-service · 2024-05-29T13:50:34Z

Ephemeral COPR build failed. @containers/packit-build please check.

lsm5 · 2024-05-29T14:04:01Z

ignore rawhide

Add a `podman system check` that performs consistency checks on local storage, optionally removing damaged items so that they can be recreated. Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>

Testing `podman system check` requires that we have a way to intentionally introduce storage corruptions. Add a hidden `podman testing` command that provides the necessary internal logic in subcommands. Stub out the tunnel implementation for now. Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>

baude · 2024-05-31T13:20:28Z

cmd/podman/utils/utils.go

@@ -104,6 +106,62 @@ func PrintNetworkPruneResults(networkPruneReport []*entities.NetworkPruneReport,
 	return errs.PrintErrors()
 }

+func PrintSystemCheckResults(report *entities.SystemCheckReport) error {


I think there is precedent for output-related functions to live in cmd/podman/foo alongside the command itself? I think we have a bunch of PrintJSON functions and so forth.

openshift-ci bot added the release-note and do-not-merge/work-in-progress labels May 16, 2024

github-actions bot added the kind/api-change label May 16, 2024

nalind force-pushed the system-check branch from 4b89b72 to 701dff4 Compare May 16, 2024 20:54

Luap99 requested changes May 17, 2024

edsantiago reviewed May 20, 2024

nalind force-pushed the system-check branch 5 times, most recently from 407aba9 to 7edf6b6 Compare May 22, 2024 21:54

Luap99 reviewed May 23, 2024

nalind force-pushed the system-check branch 4 times, most recently from 4d9a87e to a837188 Compare May 28, 2024 21:42

openshift-ci bot changed the title ~~WIP: add podman system check~~ Add podman system check May 28, 2024

openshift-ci bot removed the do-not-merge/work-in-progress label May 28, 2024

nalind force-pushed the system-check branch from a837188 to 700b696 Compare May 29, 2024 13:46

Add podman system check for checking storage consistency

cd93056

nalind force-pushed the system-check branch from 700b696 to 95b0563 Compare May 29, 2024 14:49

nalind force-pushed the system-check branch from 95b0563 to c172e1e Compare May 29, 2024 15:10

baude reviewed May 31, 2024
