
Data mover micro service design #7576

Open
wants to merge 1 commit into main from data-mover-micro-service-design

Conversation

@Lyndon-Li (Contributor) commented Mar 28, 2024

Fix issue #7198. Add the design for the data mover micro service.

codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.80%. Comparing base (24941b4) to head (470484d).
Report is 245 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #7576      +/-   ##
==========================================
- Coverage   61.70%   58.80%   -2.91%
==========================================
  Files         263      345      +82
  Lines       28861    28759     -102
==========================================
- Hits        17808    16911     -897
- Misses       9793    10420     +627
- Partials     1260     1428     +168
```


@Lyndon-Li Lyndon-Li force-pushed the data-mover-micro-service-design branch from adc1979 to 2264d56 on March 28, 2024 08:37
- This enables users to control resource (i.e., CPU, memory) requests/limits in a granular manner, e.g., control them per backup/restore of a volume
- This increases resilience: a crash of one VGDP activity won't affect others
- In cases where the backup storage must be represented by Kubernetes persistent volumes (e.g., NFS storage, [COSI][3]), this avoids dynamically mounting the persistent volumes to node-agent pods and causing node-agent pods to restart (which is not acceptable since node-agent loses its current state after its pods restart)
- This prevents unnecessary full backups. Velero's fs uploaders support file-level incremental backup by comparing file names and metadata. However, at present files are visited by host path, and the pod's and PVC's IDs are part of the host path, so once the pod is recreated, the same file is regarded as a different file since the pod's ID has changed. If the fs uploader runs in a dedicated pod and files are visited by the pod's volume path, files' full paths do not change after a pod restart, so incremental backups can continue.
Contributor

> ... once the pod is recreated, the same file is regarded as a different file....

Here, is the pod the backupPod created by VBDM? Have you verified the incremental backup scenario?

Contributor Author

The pod refers to the source pod to be backed up.
The incremental backup compares the full path of each file, and the source pod's ID is part of the full path. Once the pod is recreated, the full path changes, so the same file is treated as a new file.
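
For illustration, a hedged sketch of the two path shapes (the layout below is hypothetical and varies by kubelet/CSI configuration):

```go
// Host-path mode: the source pod's UID is part of the path, so the same file
// gets a new full path whenever the source pod is recreated.
const hostPath = "/host_pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount/data/file1"

// Pod-volume mode: the mount path inside the dedicated backupPod is stable
// across re-creations of the source pod, so incremental backup keeps matching file1.
const podVolumePath = "/<volume-mount-path>/data/file1"
```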


## Goals
- Create a solution to run VGDP instances as micro services
- Modify the VBDM to launch VGDP micro services instead of running the VGDP instances in place
Contributor

I suggest modifying this statement or adding one more statement to specify what we want to implement.

For example: offload the work of data movement from the node-agent to the backupPod.

Contributor Author

Done

## Detailed Design
### Exposer
At present, the exposer sets ```velero-helper pause``` as the command run by the backupPod/restorePod. Now the VGDP-MS command will be used instead; the command looks like below:
```velero vgdp-ms backup --this-pod xxx --this-container xxx --this-volume xxx --resource-timeout xxx --log-format xxx --log-level xxx```
Contributor

This is subjective, but isn't it easier to understand if we call it velero data-move? Or compile it into another binary like velero-data-mover?

Contributor

To avoid the overhead of building an extra binary, we should keep it in velero but rename the sub-command.

Contributor Author

Done, changed the command name to velero data-mover

Below are the parameters of the command:
**this-pod**: Delivers the backupPod/restorePod's name to VGDP-MS. Even though VGDP-MS runs inside the backupPod/restorePod, it is not easy for it to get the name of the pod it is running in, so the name is simply delivered through this parameter.
**this-container**: When the backupPod/restorePod is started, the cluster may insert other containers into the pod, so we cannot assume that the container running VGDP-MS is always the first container in the pod. The container name is therefore delivered through this parameter.
**this-volume**: The name of the snapshot/target volume. VGDP-MS accesses the snapshot/target volume by matching this-volume to a volumeMount/volumeDevice in the backupPod/restorePod's spec, so that it can access the snapshot/target volume locally.
Contributor

Why not pass the path directly to the command?

I have the same concern about passing this-pod and this-container to the command.
The workload should not need to know the pod or container name of where it runs.

Contributor Author

@Lyndon-Li Lyndon-Li commented May 29, 2024

Besides the full path, we also need the volume mode (fs or block) to initialize the backup. So instead of passing more detailed parameters (path, mode, etc.), it is better to simply pass these 3 parameters and let the VGDP get the detailed info from them.

Moreover, the fileSystemBR should work for both host-path mode (for PVB/PVR) and pod-path mode (data mover); passing 3 simple parameters and deducing the mode-specific info from them helps reduce code redundancy.

In conclusion, we need these 3 parameters for the purposes below (see the sketch after this list):

  1. Find the path of the volume to be backed up
  2. Find the DUCR/DDCR this pod is for
  3. Create events on the backupPod/restorePod
  4. Retrieve the volume mode (fs or block)
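
For purposes 1 and 4, a minimal sketch (the helper name is hypothetical; only standard corev1 pod types are assumed) of how VGDP-MS could deduce the volume path and mode from the three parameters:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// resolveVolume matches this-volume against the container's volumeDevices and
// volumeMounts to find the local path and the volume mode (block vs. fs).
func resolveVolume(pod *corev1.Pod, containerName, volumeName string) (path string, block bool, err error) {
	for _, c := range pod.Spec.Containers {
		if c.Name != containerName {
			continue
		}
		for _, d := range c.VolumeDevices { // block mode
			if d.Name == volumeName {
				return d.DevicePath, true, nil
			}
		}
		for _, m := range c.VolumeMounts { // filesystem mode
			if m.Name == volumeName {
				return m.MountPath, false, nil
			}
		}
	}
	return "", false, fmt.Errorf("volume %s not found in container %s", volumeName, containerName)
}
```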

Contributor

Is this-pod a workload created by velero, or a user workload mounting the volume prior to backup?

Contributor Author

this-pod runs a subcommand of velero

Contributor Author

After further discussion, we will not use this-pod, this-container, and this-volume; instead, we will pass the specific info required by the vgdp-ms --- volume path, volume mode, and DUCR/DDCR.
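
A hedged sketch of what the revised invocation might look like (the flag names here are illustrative, not final):
```velero data-mover backup --volume-path /backup-volume --volume-mode fs --data-upload xxx --resource-timeout xxx --log-format xxx --log-level xxx```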

Contributor Author

Done with the changes.


In order to have the same capabilities and permissions as node-agent, the pod configurations below are inherited from node-agent and set in the backupPod/restorePod's spec (see the sketch after this list):
- Volumes: Some configMaps are mapped as volumes into node-agent, so we add the same node-agent volumes to the backupPod/restorePod
- Environment Variables
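
A minimal sketch of this inheritance, assuming the exposer has both pod specs at hand (the function name is hypothetical):

```go
package main

import corev1 "k8s.io/api/core/v1"

// inheritFromNodeAgent copies node-agent's volumes and the first container's
// environment variables into the backupPod/restorePod spec.
func inheritFromNodeAgent(nodeAgent, backupPod *corev1.Pod) {
	// Reuse the configMap-backed volumes that are mounted into node-agent.
	backupPod.Spec.Volumes = append(backupPod.Spec.Volumes, nodeAgent.Spec.Volumes...)
	if len(nodeAgent.Spec.Containers) > 0 && len(backupPod.Spec.Containers) > 0 {
		backupPod.Spec.Containers[0].Env = append(backupPod.Spec.Containers[0].Env,
			nodeAgent.Spec.Containers[0].Env...)
	}
}
```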
Contributor

If time permits, we should follow the best practice to limit/minimize the scope and capability of the backupPod/restorePod.

Contributor Author

Yes, theoretically the backupPod/restorePod only needs to host the VGDP instance. Besides that, we also need a kube client to fulfill the communication and control purposes below:

  • Watch some status of the DUCR/DDCR for synchronization with the controller, e.g., ready to start VGDP; VGDP cancellation
  • Deliver information to the controller, e.g., VGDP progress

There are no other functionalities in the backupPod/restorePod.

Contributor

We may open an issue to limit the scope of the roles of the backupPod/restorePod, but it can be optional for the phase 1 implementation.

Contributor Author

Added a clarification for the roles/privileges/capabilities of the backupPod/restorePod.
We can open an issue for further changes after this micro service feature is merged.


We use two mechanisms to make the watch:
**Pod Phases**: VGDP Watcher watches the backupPod/restorePod's phases updated by Kubernetes. That is, VGDP Watcher creates an informer to watch the pod resource for the backupPod/restorePod and detect when the pod reaches one of the terminal phases (i.e., PodSucceeded, PodFailed). We also check the availability & status of the backupPod/restorePod at the beginning of the watch so as to detect the start of the backupPod/restorePod.
**Custom Kubernetes Events**: VGDP-MS generates Kubernetes events and associates them with the backupPod/restorePod when the VGDP instance starts/stops and on progress updates; VGDP Watcher then creates another informer to watch the Event resources associated with the backupPod/restorePod.
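
A minimal sketch of the Pod Phases mechanism, assuming client-go (the function name and resync interval are illustrative):

```go
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchPodPhase signals on the returned channel once the named backupPod/restorePod
// reaches a terminal phase (PodSucceeded or PodFailed).
func watchPodPhase(client kubernetes.Interface, ns, podName string, stop <-chan struct{}) <-chan corev1.PodPhase {
	terminated := make(chan corev1.PodPhase, 1)
	factory := informers.NewSharedInformerFactoryWithOptions(client, 30*time.Second,
		informers.WithNamespace(ns),
		informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			// Watch only the backupPod/restorePod itself.
			o.FieldSelector = "metadata.name=" + podName
		}))
	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, obj interface{}) {
			pod := obj.(*corev1.Pod)
			if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
				select {
				case terminated <- pod.Status.Phase:
				default: // already signaled
				}
			}
		},
	})
	factory.Start(stop)
	return terminated
}
```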
Contributor

What's the specific use case for this? Reporting the transfer status?
IMHO, we want to try to make the code that runs in the backupPod less k8s-aware or k8s-dependent, so we can test it or verify certain cases more easily. But this is an immature thought, and we can have further discussions on this topic.

Contributor Author

Custom Kubernetes Events doesn't mean a mechanism outside the k8s scope, so it is still k8s-aware/dependent.
Here the Events refer to the Kubernetes Event resource; that is, we generate events against the backupPod/restorePod during the data movement.
We use the Events to transfer messages to the controller during the data movement, since the Pod Phase and Pod Termination Message mechanisms can only deliver info at the time the pod terminates.

Contributor

After offline discussion, we agreed to choose k8s events as the approach for the backup/restore pod to communicate with the node-agent, but maybe pointing the involvedObject to the DUCR is a better choice.

Contributor Author

After some further checks, I can see the negative points below if we make the involvedObject the DUCR/DDCR:

  1. Kubernetes' etcd is not good at storing a huge number of objects, which is why event objects are automatically removed after 1 hour (by default). Since DUCRs/DDCRs are kept permanently, this will increase the burden on etcd when there are lots of them. On the other hand, since the events are deleted after 1 hour, we have no way to keep them for further use, so there is no gain in keeping the events with the DUCR/DDCR.
  2. The micro-service solution introduced in this design can be reused by other requirements in the future, and Event is a core part of the solution. Adding the DUCR/DDCR as the involvedObject requires one more input parameter for the micro service; as a comparison, originally we only required this-pod, this-container, and this-volume as the input.

Contributor Author

After further discussion, we decided to associate the events with the DUCR/DDCR.
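
A minimal sketch of emitting such an event with the DUCR as the involvedObject, assuming client-go's event recorder (the component and reason names are hypothetical, and the Velero API types must be registered in the scheme):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// newRecorder wires an EventRecorder that writes Events into the given namespace.
func newRecorder(client kubernetes.Interface, ns string) record.EventRecorder {
	broadcaster := record.NewBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
		Interface: client.CoreV1().Events(ns),
	})
	return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "velero-data-mover"})
}

// reportProgress emits a progress event whose involvedObject is the DUCR/DDCR,
// so the node-agent's Event informer can pick it up during the data movement.
// The DUCR's type must be added to the scheme for reference resolution to work.
func reportProgress(recorder record.EventRecorder, ducr runtime.Object, msg string) {
	recorder.Event(ducr, corev1.EventTypeNormal, "DataMoverProgress", msg)
}
```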

Contributor Author

Done with the changes.

Contributor

@shawn-hurley shawn-hurley left a comment

I don't see this, but it feels like a perfect thing to at least think through how a third-party data mover could now plug into this system.

@Lyndon-Li
Contributor Author

> I don't see this, but it feels like a perfect thing to at least think through how a third-party data mover could now plug into this system.

The scope of this design is VGDP; VGDP is part of VBDM (Velero Built-in Data Mover), so this design doesn't cover 3rd-party data movers.
3rd-party data movers still follow this section in the Volume Snapshot Data Movement design. Specifically, 3rd-party data movers handle the DUCR/DDCR, and they are free to make their data movers micro-service style or monolithic style.

I can add a clarification for this in the design.

@Lyndon-Li Lyndon-Li force-pushed the data-mover-micro-service-design branch from 2264d56 to 3527da1 on June 4, 2024 10:22
@shawn-hurley
Contributor

So my thought process was:

We could use a generic controller to handle the DUCR/DDCR and then move all the data movers to micro services and have the generic controller be able to control what data mover is called when.

The biggest problem with the design at the moment is that I can only have a single Data Mover. I have to copy the full VBDM code and extend it for my use case (a particular volume for a particular driver).

Maybe this isn't how you want to solve that problem, and that is ok, but it is an option.

I would like to see us think through a solution to this problem though.

@Lyndon-Li
Contributor Author

> So my thought process was:
>
> We could use a generic controller to handle the DUCR/DDCR and then move all the data movers to micro services and have the generic controller be able to control what data mover is called when.
>
> The biggest problem with the design at the moment is that I can only have a single Data Mover. I have to copy the full VBDM code and extend it for my use case (a particular volume for a particular driver).
>
> Maybe this isn't how you want to solve that problem, and that is ok, but it is an option.
>
> I would like to see us think through a solution to this problem though.

I think this is possible theoretically, though we still need to cover lots of details, e.g., how we publish the communication mechanism so that 3rd-party data movers could implement it.
However, the biggest problem is that this conflicts with the existing 3rd-party data mover integration mechanism already defined in the Volume Snapshot Data Movement design --- the 3rd-party data mover is integrated through DUCR/DDCR, which is simple and clear.
Therefore, without enough motivation, we don't want to modify the original design. The benefit I can see in your proposal is that 3rd-party data mover authors don't need to implement the DUCR/DDCR controllers, but I don't think that is motivation enough. Moreover, I am not sure all 3rd-party data mover authors want this.

Anyway, this is out of the scope of the current design. I suggest opening another issue after this design is merged, and we can continue this discussion there.

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
@Lyndon-Li Lyndon-Li force-pushed the data-mover-micro-service-design branch from 3527da1 to 470484d on June 5, 2024 02:50