Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add podman system check #22733

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add podman system check #22733

wants to merge 2 commits into from

Conversation

nalind
Copy link
Member

@nalind nalind commented May 16, 2024

Does this PR introduce a user-facing change?

Added `podman system check`.

@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 16, 2024
Copy link
Contributor

openshift-ci bot commented May 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nalind
Once this PR has been reviewed and has the lgtm label, please assign ashley-cui for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label May 16, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

Copy link

Cockpit tests failed for commit 701dff4. @martinpitt, @jelly, @mvollmer please check.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan at all of actually having commands in podman and the API to corrupt the local storage, it bloats podman, makes it a bit slower due extra argument setup and most importantly should never ever be used by any user outside of testing.

If we need these extra commands for testing then I would move them into their own command, i.e. cmd/podman-testing or something like this.

"github.com/containers/podman/v5/pkg/domain/entities/types"
)

func CreateStorageLayer(ctx context.Context, options *CreateStorageLayerOptions) (*types.CreateStorageLayerReport, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is technically part of a public API as we support pkg/bindings for consumers which is very bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to an internal package.

Copy link
Collaborator

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll stay out of the broader conversation for now. Some specific comments inline about system tests.

}

function make_layer_blob() {
local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already have $PODMAN_TMPDIR for exactly this purpose

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using that as a parent directory.

@@ -1178,5 +1178,57 @@ function wait_for_command_output() {
die "Timed out waiting for '$cmd' to return '$want'"
}

function make_random_file() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new functions are unlikely to be used anywhere outside the new test file; they probably belong there instead of the global helpers file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved two of the three, left make_random_file() where it was.


function testing_make_image_metadata_for_layer_blobs() {
local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX)
local imageID=$1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: imageID=${1?Missing IMAGEID argument} to catch caller errors early

#

load helpers

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module urgently needs a teardown(): otherwise, any individual test failure will leave the system in an undefined and probably unrecoverable state. I don't know what said teardown() will look like; it's possible that it will involve system reset -f.

Or, perhaps better, run all these tests with temporary storage; see https://github.com/containers/podman/blob/main/test/system/330-corrupt-images.bats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the current testing framework makes it possible to do any of that for a subset of the tests, when they are being run as part of a remote testing suite.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True: there is no way to use --root et al with podman-remote. Is there any actual reason to run these tests under podman-remote?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected that we would be keenly interested in having this function available for Podman Desktop and podman-machine cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you have written a lot of badh test code already but maybe doing this in e2e would be better?
All tests have a custom --roor/--runroot by default including remote. Also I assume these checks are expensive and slow? In this case it would be another reason for e2e as they run parallel so slow tests are not that bad there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poring over actual data, they can be time-intensive. The ones we set up for testing are small enough that the new tests finish in about 15 seconds on my dev machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear I don't want you to force you to write it all to e2e if you don't want this.
15s is fine I guess, I am working on strategies to make the system tests faster so I will keep this in mind but I know there are worse offenders so not a reason to block this one

@nalind nalind force-pushed the system-check branch 5 times, most recently from 407aba9 to 7edf6b6 Compare May 22, 2024 21:54
Makefile Outdated
@@ -678,7 +688,7 @@ remotesystem:
if timeout -v 1 true; then \
SOCK_FILE=$(shell mktemp --dry-run --tmpdir podman_tmp_XXXX);\
export PODMAN_SOCKET=unix://$$SOCK_FILE; \
./bin/podman system service --timeout=0 $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \
./bin/podman system service --timeout=0 --features=testing $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I Still see no point in running the system service for this? You could just execute the podman-testing binary now and there really should not need to exists a remote API to corrupt the storage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're doing remote testing, the client has access to the storage that the server is using? I thought that we'd taken steps to make that harder in order to avoid cases where we unintentionally used the non-remote code paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in both e2e and system tests, it is not as straight forward but in both tests suite there are some tests that manually overwrite the default command

# $PODMAN may be a space-separated string, e.g. if we include a --url.

So I don't think there is any problem in just executing the local test binary to corrupt the storage of the service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then, giving that a try.

@nalind nalind force-pushed the system-check branch 4 times, most recently from 4d9a87e to a837188 Compare May 28, 2024 21:42
@nalind
Copy link
Member Author

nalind commented May 28, 2024

/retitle Add podman system check

@openshift-ci openshift-ci bot changed the title WIP: add podman system check Add podman system check May 28, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@lsm5
Copy link
Member

lsm5 commented May 29, 2024

ignore rawhide

Add a `podman system check` that performs consistency checks on local
storage, optionally removing damaged items so that they can be
recreated.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Testing `podman system check` requires that we have a way to
intentionally introduce storage corruptions.  Add a hidden `podman
testing` command that provides the necessary internal logic in
subcommands.  Stub out the tunnel implementation for now.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
@@ -104,6 +106,62 @@ func PrintNetworkPruneResults(networkPruneReport []*entities.NetworkPruneReport,
return errs.PrintErrors()
}

func PrintSystemCheckResults(report *entities.SystemCheckReport) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think? there is precedent for output related functions to live in cmd/podman/foo alongside the command itself? I think we have a bunch of PrintJSON and so forths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Change to remote API; merits scrutiny release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants