Design for Velero Backup performance Improvements and VolumeGroupSnapshot enablement #7474

Open
sseago opened this issue Feb 27, 2024 · 2 comments · May be fixed by #7628

sseago commented Feb 27, 2024

Describe the problem/challenge you have

Note that this issue tracks the design only. The design will be implemented in two separate phases, each tracked by its own issue, since the implementation will land in a future Velero release.

There are two different goals here, linked by a single primary missing feature in the Velero backup workflow.
The first goal is to enhance backup performance by allowing the primary backup controller to run in multiple threads, enabling Velero to back up multiple items at the same time for a given backup.
The second goal is to enable Velero to eventually support VolumeGroupSnapshots.
For both of these goals, Velero needs a way to determine which items should be backed up together.
This design proposal will include two development phases:

  • Phase 1 will refactor the backup workflow to identify blocks of items that should be backed up together, and then coordinate backup hooks among items in the block.
  • Phase 2 will add multiple worker threads for backing up item blocks, so instead of backing up each block as it is identified, the Velero backup workflow will add the block to a channel and one of the workers will pick it up.
  • Actual support for VolumeGroupSnapshots is out-of-scope here and will be handled in a future design proposal, but the item block refactor introduced in Phase 1 is a primary building block for this future proposal.
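
As a rough illustration of the Phase 2 idea (blocks go onto a channel and a pool of workers picks them up), here is a minimal Go sketch. The `ItemBlock` type, the `backupBlocks` function, and the per-item "backup" placeholder are all hypothetical names made up for this example; they are not taken from the Velero code base or from the design document.

```go
package main

import (
	"fmt"
	"sync"
)

// ItemBlock is a hypothetical stand-in for a group of items that must be
// backed up together. The real type is left to the design document.
type ItemBlock struct {
	Name  string
	Items []string
}

// backupBlocks sends each block to a channel and lets numWorkers goroutines
// pick blocks up. It returns the number of items processed per block.
func backupBlocks(blocks []ItemBlock, numWorkers int) map[string]int {
	blockCh := make(chan ItemBlock)
	results := make(map[string]int)
	var mu sync.Mutex // guards results, which all workers write to

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for block := range blockCh {
				// Items inside one block are handled sequentially by a
				// single worker, so the block stays a unit.
				count := 0
				for range block.Items {
					count++ // placeholder for the real per-item backup
				}
				mu.Lock()
				results[block.Name] = count
				mu.Unlock()
			}
		}()
	}

	// The producer side: add each identified block to the channel.
	for _, b := range blocks {
		blockCh <- b
	}
	close(blockCh)
	wg.Wait()
	return results
}

func main() {
	blocks := []ItemBlock{
		{Name: "db", Items: []string{"pod/db-0", "pvc/data-0"}},
		{Name: "web", Items: []string{"deployment/web"}},
	}
	fmt.Println(backupBlocks(blocks, 2))
}
```

The point of the sketch is only that parallelism happens between blocks, never inside one, which is why the blocks have to be identified first.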

Background

Currently, during backup processing, the main Velero backup controller runs in a single thread, completely finishing the primary backup processing for one resource before moving on to the next one.
We can improve the overall backup performance by backing up multiple items for a backup at the same time, but before we can do this we must first identify resources that need to be backed up together.
As part of this initial refactoring, once these ItemBlocks are identified, pod hook processing will move up to the ItemBlock level.
If there are multiple pods in the ItemBlock, pre-hooks for all pods will be run before backing up the items, followed by post-hooks for all pods.
This change to hook processing is another prerequisite for future VolumeGroupSnapshot support, since supporting this will require backing up the pods and volumes together for any volumes which belong to the same group.
Once we are backing up items by block, the next step will be to create multiple worker threads to process and back up ItemBlocks, so that we can back up multiple ItemBlocks at the same time.
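
The hook ordering described above (pre-hooks for all pods, then the items, then post-hooks for all pods) could be sketched roughly as follows. This is only an illustration of the ordering: the `Item` type and `backupItemBlock` function are hypothetical, and the returned step strings stand in for real hook execution and item backup.

```go
package main

import "fmt"

// Item is a hypothetical, simplified view of a resource in an ItemBlock.
type Item struct {
	Kind string
	Name string
}

// backupItemBlock returns the ordered steps for one block: pre-hooks for
// every pod first, then the backup of every item, then post-hooks for
// every pod.
func backupItemBlock(items []Item) []string {
	var steps []string

	// 1. Run pre-hooks for all pods in the block before backing up anything.
	for _, it := range items {
		if it.Kind == "Pod" {
			steps = append(steps, "pre-hook "+it.Name)
		}
	}

	// 2. Back up every item in the block.
	for _, it := range items {
		steps = append(steps, "backup "+it.Kind+"/"+it.Name)
	}

	// 3. Run post-hooks for all pods once the whole block is backed up.
	for _, it := range items {
		if it.Kind == "Pod" {
			steps = append(steps, "post-hook "+it.Name)
		}
	}
	return steps
}

func main() {
	block := []Item{
		{Kind: "Pod", Name: "db-0"},
		{Kind: "PersistentVolumeClaim", Name: "data-0"},
		{Kind: "Pod", Name: "db-1"},
	}
	for _, step := range backupItemBlock(block) {
		fmt.Println(step)
	}
}
```

With a single pod in the block this reduces to the familiar pre-hook, backup, post-hook sequence for that pod.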

Goals

  • Identify groups of items to back up together (ItemBlocks)
  • Manage backup hooks at the ItemBlock level rather than per-item
  • Back up multiple ItemBlocks concurrently using worker threads

Non Goals

  • Support VolumeGroupSnapshots: this is a future feature, although certain prerequisites for this enhancement are included in this proposal.
  • Process multiple backups in parallel: this is a future feature, although certain prerequisites for this enhancement are included in this proposal.

sseago changed the title from "Velero Backup performance Improvements and VolumeGroupSnapshot enablement" to "Design for Velero Backup performance Improvements and VolumeGroupSnapshot enablement" on Feb 27, 2024
sseago self-assigned this on Feb 27, 2024
reasonerjt added Area/Design Design Documents and removed 1.14-candidate labels on Mar 13, 2024
reasonerjt added this to the v1.14 milestone on Mar 13, 2024
sseago linked a pull request on Apr 5, 2024 that will close this issue
reasonerjt commented

I understand that the mechanism for hook execution needs to change to support VolumeGroupSnapshot, but it's not clear to me why multi-threaded backup is a prerequisite for VolumeGroupSnapshot. In other words, if none of the pods have hooks defined, do we need multi-threading in Velero backup to support VolumeGroupSnapshot?

"If there are multiple pods in the ItemBlock, pre-hooks for all pods will be run before backing up the items, followed by post-hooks for all pods."

Isn't this only necessary when the bound PVCs are in the same VolumeGroupSnapshot? Otherwise, how the hooks are executed within one ItemBlock does not really matter.


sseago commented Apr 17, 2024

@reasonerjt multi-threaded backup is not a prerequisite for VolumeGroupSnapshot. Rather, the ItemBlock concept is a prerequisite for both VolumeGroupSnapshot and multithreaded backups. In other words, if there is a need for both features, then it is far simpler to use the same building block for VolumeGroupSnapshot and multithreaded backup -- otherwise we risk doing the same work in two different ways, resulting in far more complexity.

"Isn't this only necessary when the bound PVCs are in the same VolumeGroupSnapshot" -- yes, but ordinarily having a VolumeGroup in common is exactly how two pods will end up in the same ItemBlock. The pod BIA will need to look at bound PVCs, and if any of those are in VolumeGroups, then find any other pods bound to other PVCs in the same VolumeGroup. The resulting multi-pod ItemBlock will therefore include all of the pods with volumes in the same VolumeGroup. There may be some outlying situations where a third-party plugin will create a VolumeGroup with two pods in it, and in those cases hooks may end up running together, but 1) it's an edge case and 2) it won't produce incorrect results.
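
To make the grouping concrete, here is a small Go sketch of how pods that share a VolumeGroup could end up in the same block. It assumes the VolumeGroup membership of each pod has already been resolved from its bound PVCs; the `groupPods` function and its input map are hypothetical and are not part of the pod BIA or any Velero API.

```go
package main

import (
	"fmt"
	"sort"
)

// groupPods takes a map of pod name -> VolumeGroup names (assumed to be
// already derived from the pod's bound PVCs) and returns pods grouped so
// that pods sharing any VolumeGroup land in the same block.
func groupPods(podGroups map[string][]string) [][]string {
	parent := make(map[string]string)
	// find returns the representative pod of the set containing p.
	find := func(p string) string {
		for parent[p] != p {
			parent[p] = parent[parent[p]] // path halving
			p = parent[p]
		}
		return p
	}

	pods := make([]string, 0, len(podGroups))
	for pod := range podGroups {
		parent[pod] = pod
		pods = append(pods, pod)
	}
	sort.Strings(pods) // deterministic processing order

	// Merge each pod with the first pod seen for every group it belongs to.
	firstPod := make(map[string]string)
	for _, pod := range pods {
		for _, group := range podGroups[pod] {
			if other, ok := firstPod[group]; ok {
				parent[find(pod)] = find(other)
			} else {
				firstPod[group] = pod
			}
		}
	}

	// Collect pods by representative. Pods with no group stay on their own.
	blocks := make(map[string][]string)
	for _, pod := range pods {
		root := find(pod)
		blocks[root] = append(blocks[root], pod)
	}
	result := make([][]string, 0, len(blocks))
	for _, members := range blocks {
		result = append(result, members)
	}
	sort.Slice(result, func(i, j int) bool { return result[i][0] < result[j][0] })
	return result
}

func main() {
	fmt.Println(groupPods(map[string][]string{
		"db-0":  {"vg-db"},
		"db-1":  {"vg-db"},
		"web-0": {},
	}))
}
```

Pods without any VolumeGroup stay in single-pod blocks, so their hooks run exactly as they do today.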
