sidecar: Add /api/v1/flush endpoint #7359

Open
wants to merge 4 commits into main

Conversation

Nashluffy

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Adds a sidecar API with one endpoint, /api/v1/flush, which calls the TSDB snapshot endpoint on the Prometheus instance and then uploads all blocks from the snapshot that are not already present in object storage.

There are a few issues that explain the motivation:

Essentially, if this is the last time the sidecar will be running (i.e. the cluster is being deleted, a shard is being removed, etc.), then without some flushing mechanism you permanently lose up to two hours of data.
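For concreteness, here is a minimal sketch of the flow described under Changes, assuming Prometheus's admin snapshot API (POST /api/v1/admin/tsdb/snapshot, which requires --web.enable-admin-api) and the SetDirectoryToSync/Sync shipper methods this PR adds; the promURL/dataDir parameters and error handling are illustrative only, not the PR's actual code:

```go
package flushsketch

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"path/filepath"
)

// syncer is the slice of the Thanos shipper this sketch relies on;
// SetDirectoryToSync is the method added by this PR.
type syncer interface {
	SetDirectoryToSync(dir string)
	Sync(ctx context.Context) (uploaded int, err error)
}

// flush asks Prometheus to snapshot its TSDB and then ships the snapshot
// directory, returning the number of blocks uploaded.
func flush(ctx context.Context, promURL, dataDir string, sh syncer) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, promURL+"/api/v1/admin/tsdb/snapshot", nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, fmt.Errorf("snapshot request: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("snapshot request: unexpected status %s", resp.Status)
	}

	// The snapshot API returns the snapshot directory name, e.g.
	// {"status":"success","data":{"name":"20240515T084400Z-..."}}.
	var out struct {
		Data struct {
			Name string `json:"name"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, fmt.Errorf("decode snapshot response: %w", err)
	}

	// Snapshots land under <data-dir>/snapshots/<name>; upload any blocks
	// from there that are not already present in object storage.
	sh.SetDirectoryToSync(filepath.Join(dataDir, "snapshots", out.Data.Name))
	return sh.Sync(ctx)
}
```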

Verification

Besides the unit tests, running Prometheus locally and calling the endpoint works as expected.

Signed-off-by: mluffman <nashluffman@gmail.com>
@Nashluffy Nashluffy marked this pull request as ready for review May 15, 2024 08:44
BlocksUploaded int `json:"blocksUploaded"`
}

func (s *SidecarAPI) flush(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
Member

Doesn't /api/v1/snapshot just create hardlinks to existing blocks? https://github.com/prometheus/prometheus/blob/main/tsdb/block.go#L687-L696. You will have to somehow filter out the non-head block here. I'm not sure how to do that.

Alternatively, Prometheus could have a new parameter to take a snapshot of the head only.
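One hypothetical way to do the filtering mentioned above, purely as a sketch and not something this PR implements, would be to keep only the snapshot block directories whose ULIDs do not already exist in the live Prometheus data directory, i.e. the block(s) cut from the head during the snapshot call:

```go
package flushsketch

import (
	"os"
	"path/filepath"
)

// headOnlyBlocks returns snapshot block dirs that have no counterpart in the
// live data dir, dropping the hardlinked copies of already-persisted blocks.
func headOnlyBlocks(dataDir, snapshotDir string) ([]string, error) {
	existing := map[string]struct{}{}
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		if e.IsDir() {
			existing[e.Name()] = struct{}{}
		}
	}

	snapEntries, err := os.ReadDir(snapshotDir)
	if err != nil {
		return nil, err
	}
	var headBlocks []string
	for _, e := range snapEntries {
		if _, ok := existing[e.Name()]; e.IsDir() && !ok {
			headBlocks = append(headBlocks, filepath.Join(snapshotDir, e.Name()))
		}
	}
	return headBlocks, nil
}
```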

Author

shipper.Sync only uploads blocks that don't already exist in the bucket
https://github.com/thanos-io/thanos/blob/main/pkg/shipper/shipper.go#L284-L289
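For illustration only, that "skip blocks that already exist" behavior boils down to a check like the sketch below, written against the thanos-io/objstore Bucket interface; the real shipper also tracks what it has already uploaded in its thanos.shipper.json meta file:

```go
package flushsketch

import (
	"context"
	"path"

	"github.com/thanos-io/objstore"
)

// blocksToUpload filters out block ULIDs whose meta.json already exists in
// the bucket, so only missing blocks would get uploaded.
func blocksToUpload(ctx context.Context, bkt objstore.Bucket, blockIDs []string) ([]string, error) {
	var missing []string
	for _, id := range blockIDs {
		ok, err := bkt.Exists(ctx, path.Join(id, "meta.json"))
		if err != nil {
			return nil, err
		}
		if !ok {
			missing = append(missing, id)
		}
	}
	return missing, nil
}
```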

snapshotDir := s.dataDir + "/" + dir

s.shipper.SetDirectoryToSync(snapshotDir)
uploaded, err := s.shipper.Sync(r.Context())
Member

Prometheus might produce a block in the middle of the snapshot call. I think you will have to disable the other syncer before doing anything.
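If the worry is two shippers driving Sync at the same time, one hypothetical mitigation (a sketch only, not part of this PR) would be to serialize the periodic uploader and the flush handler on a shared lock; this does nothing about overlapping blocks:

```go
package flushsketch

import (
	"context"
	"sync"
)

// guardedSync serializes callers so only one of them drives the shipper at a
// time; the periodic uploader and the flush handler would both go through it.
type guardedSync struct {
	mu sync.Mutex
}

func (g *guardedSync) run(ctx context.Context, do func(context.Context) (int, error)) (int, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	return do(ctx)
}
```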

Author

Is the concern that it'd produce an overlapping block? Or that there's a potential race between two shippers calling Sync at the same time?

if err != nil {
return nil, nil, &api.ApiError{Typ: api.ErrorInternal, Err: fmt.Errorf("failed to upload head block: %w", err)}, func() {}
}
return &flushResponse{BlocksUploaded: uploaded}, nil, nil, func() {}
Member

Shouldn't we immediately shut down Prometheus after this is done? Otherwise, the same risk of overlapping blocks appears.

Author
@Nashluffy Nashluffy May 15, 2024

That's a good point - if the endpoint is only meant to be called once before shutting down, does it even need to be an endpoint? Should it just be the default behavior of the sidecar shutting down? That (I think) should eliminate the concern of both overlapping blocks and two shippers racing against one another.

That said, the endpoint does alleviate concerns around shutdown ordering: prometheus-operator could hit the endpoint and then just delete the StatefulSet.

Member
@GiedriusS GiedriusS May 20, 2024

I would love it if all of this were handled automatically by Sidecar when it is shutting down, but that would entail deeper integration between Sidecar and Prometheus. In Sidecar, we should only care about the head block and avoid overlaps. If only there were a way to tell Prometheus to trim data from the head block that has already been uploaded by Sidecar. :(

Maybe an alternative would be to shut down Prometheus and then read the HEAD block ourselves from Sidecar? We could produce a block from it, upload it, and then trim the head. What do you think about such an idea?

Author

I quite like that idea - I played around with it locally and it does work!
https://gist.github.com/Nashluffy/097e7df7d0a90b0cdefd2b87fb3129c8
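Roughly, the approach in the gist boils down to the sketch below: with Prometheus stopped, open its data directory read-only and persist the head as a regular block via FlushWAL, which a shipper could then upload like any other block. Note that the exact OpenDBReadOnly signature varies across Prometheus releases (newer ones take an extra sandbox-directory argument), so treat this as an outline rather than copy-paste code:

```go
package flushsketch

import (
	"os"

	"github.com/go-kit/log"
	"github.com/prometheus/prometheus/tsdb"
)

// flushHeadToBlock opens the (stopped) Prometheus data directory read-only
// and writes the WAL/head out as a normal block under dataDir.
func flushHeadToBlock(dataDir string) error {
	db, err := tsdb.OpenDBReadOnly(dataDir, log.NewLogfmtLogger(os.Stderr))
	if err != nil {
		return err
	}
	defer db.Close()

	// FlushWAL replays the WAL and persists the in-memory head as a block.
	return db.FlushWAL(dataDir)
}
```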

But there are a few issues with it, mostly that opening a ReadOnlyDB reads in the WAL before it can be persisted to disk, which, depending on the size, can be very time-consuming and memory-intensive. If we did want to take this approach, I think there'd have to be a finalizer on the Prometheus-owned StatefulSets, and a controller that would remove the finalizer. The controller would create a pod that mounts the volume, opens a ReadOnlyDB, and then persists to disk and uploads. But this is quite a lot of orchestration.

The more I think about this, the more I think persist-head-to-block needs to be an endpoint on prometheus/tsdb that we consume. There's quite an extensive thread about it here: prometheus-junkyard/tsdb#346

Member
@GiedriusS GiedriusS left a comment

If overlapping blocks are not a problem because vertical compaction is enabled, that should be clearly noted in the docs and logs.

Contributor
@MichaHoffmann

If we want a flush API, we should probably add it to the shipper so that Ruler and Receiver also get it, right?
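For illustration, a flush primitive at the shipper level would not need to be much more than the hypothetical shape below (this is not an existing shipper API); sidecar, ruler, and receiver could then each expose an endpoint on top of it:

```go
package flushsketch

import "context"

// Flusher is a hypothetical shipper-level primitive: force-upload whatever
// the component still holds locally and report how many blocks went out.
type Flusher interface {
	Flush(ctx context.Context) (uploaded int, err error)
}
```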
