
Idea: docker-lock migrate #104

Open

jonjohnsonjr opened this issue Jun 8, 2021 · 10 comments

@jonjohnsonjr

I've been thinking about building something similar to this that includes the ability to copy images to an alternative registry, rather than just resolving tags to digests.

Abstractly, I'd love to have some way to map various functions over collections that contain image references.

You've already implemented support for lots of collections (Dockerfiles, docker-compose files, and kubernetes manifests) and two functions (rewrite and verify). We could add a migrate function (we can bikeshed the name) that calls crane.Copy, too.
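Concretely, the copy itself is a one-liner in go-containerregistry; a migrate function would mostly be plumbing around something like this (untested sketch, image names are made up):

```go
package main

import (
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Hypothetical references; a real migrate command would pull these
	// out of Dockerfiles, compose files, or kubernetes manifests.
	src := "docker.io/library/redis:6.2"
	dst := "private.example.com/library/redis:6.2"

	// crane.Copy copies the image (or index) from src to dst without
	// pulling it through the docker daemon.
	if err := crane.Copy(src, dst); err != nil {
		log.Fatalf("copying %s to %s: %v", src, dst, err)
	}
}
```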

Some applications:

  1. Air-gapped/locked-down clusters that can only pull from a specific registry.
  2. Copying all your dependencies to a closer registry (for availability, latency, and rate limiting reasons).

I put together a proof of concept a while back that only worked for kubernetes manifests: ko-build/ko#11

What do you think?

cc @imjasonh

@michaelperel
Collaborator

I think this is an excellent idea - it would make migrating to a private registry much easier.

Let me know if this somewhat matches what you had in mind in terms of workflow:

  • The user has a project that uses base images from public registries like Docker Hub.
  • The user runs docker lock generate to create a Lockfile. The Lockfile records all the relevant information about the public base images.
  • The user runs docker lock migrate <flags about private registry, such as auth info>. It reads the Lockfile and uses crane.Copy to copy all the base images into the private registry.
  • The Lockfile is updated to include information about both the original public registry and the new private registry.
  • The user runs docker lock rewrite and it rewrites all the base images to reference those in the private registry.
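In code, the migrate step could be little more than a loop over the Lockfile images that calls crane.Copy (rough sketch; the struct fields are illustrative, not the actual Lockfile schema):

```go
package migrate

import (
	"fmt"
	"path"

	"github.com/google/go-containerregistry/pkg/crane"
)

// Image is illustrative only, not the real Lockfile schema.
type Image struct {
	Name   string // e.g. "docker.io/library/python"
	Tag    string // e.g. "3.9-slim"
	Digest string // e.g. "sha256:..."
}

// Migrate copies every image recorded in the Lockfile into privateRegistry,
// pulling by digest and (naively) reusing only the last path component.
func Migrate(images []Image, privateRegistry string) error {
	for _, img := range images {
		src := fmt.Sprintf("%s@%s", img.Name, img.Digest)
		dst := fmt.Sprintf("%s/%s:%s", privateRegistry, path.Base(img.Name), img.Tag)
		if err := crane.Copy(src, dst); err != nil {
			return fmt.Errorf("copying %s to %s: %w", src, dst, err)
		}
	}
	return nil
}
```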

Now, let’s assume that an update has been pushed to the public registries:

  • The user runs docker lock generate, which updates info about the images from the public registry, but leaves the private registry untouched (technically it generates a new Lockfile, but the effect is the same).
  • The user runs docker lock migrate, which reads the Lockfile and copies the updated images to the private registry
  • … The same workflow from the step above …

What do you think?

@jonjohnsonjr
Author

flags about private registry, such as auth info

I would expect this not to require flags and just use the docker auth config file, in the same way that crane.Digest works.
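i.e. the default keychain already resolves credentials from the docker config file, so migrate could authenticate exactly the way digest resolution does today (sketch; the registry name is made up):

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// authn.DefaultKeychain reads ~/.docker/config.json (and any configured
	// credential helpers), so `docker login` is all the setup a user needs.
	// This is crane's default behavior; it's spelled out here for clarity.
	opt := crane.WithAuthFromKeychain(authn.DefaultKeychain)

	digest, err := crane.Digest("private.example.com/library/redis:6.2", opt)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(digest)
}
```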

The Lockfile is updated to include information about both the original public registry and the new private registry

Seems reasonable. I haven't looked at the lockfile format, so I'm not sure how this would affect existing data structures.

How would you handle migrating to multiple private registries? Say I want to have geo-redundant k8s clusters that each pull from their nearest registry. For GCR, you could rewrite manifests three times to point to eu.gcr.io, asia.gcr.io, or us.gcr.io, deploying each to their respective continent. Would you want to track that as three separate lockfiles? Or a single lockfile with multiple downstreams, somehow?

The user runs docker lock rewrite and it rewrites all the base images to reference those in the private registry.

With some flag or something to select which registry should be used in the rewrite? Or would that decision happen during docker lock migrate?

Now, let’s assume that an update...

This seems like a reasonable workflow -- one nice thing about tracking both upstream and downstream in lockfiles is that you could do some diffing client-side to skip migrating anything that's already downstream. That might complicate things, though.

@michaelperel
Collaborator

michaelperel commented Jun 9, 2021

I would expect this not to require flags and just use the docker auth config file, in the same way that crane.Digest works.

My apologies, that was a lapse on my part - yes, currently docker-lock just uses the auth info from the docker config file. No flags would be needed (in older versions, before I migrated to crane, this had to be configured manually).

How would you handle migrating to multiple private registries?

I am not super familiar with the geo-redundant case, but shouldn't this be handled by the registry itself? Quickly reading up on Azure (I am more familiar with it), for Azure Container Registry it appears as though you can just use one URL and it will pull from the ideal replica for you.

That said, I understand the case for having multiple replicas with multiple URLs. An alternative to having multiple Lockfiles as you suggested would be to have docker lock migrate accept flags, such as docker lock migrate <-downstream-registries=us.gcr.io,asia.gcr.io>. This could produce a Lockfile that has a key-value pair that looks like

downstream: [us.gcr.io, asia.gcr.io]

Then, when running docker lock rewrite, it could produce multiple Dockerfiles such as Dockerfile.us.gcr.io or Dockerfile.asia.gcr.io.

Just a thought, not wed to any solution, but I think it might be more ergonomic to always just have one Lockfile.
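For example, the copy step could fan out over that downstream list (sketch; registry and repository names are made up):

```go
package migrate

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/crane"
)

// copyToDownstreams copies one pinned source image into every downstream
// registry recorded in the Lockfile's downstream key.
func copyToDownstreams(src, repoPath, tag string, downstreams []string) error {
	for _, registry := range downstreams { // e.g. ["us.gcr.io/my-project", "asia.gcr.io/my-project"]
		dst := fmt.Sprintf("%s/%s:%s", registry, repoPath, tag)
		if err := crane.Copy(src, dst); err != nil {
			return fmt.Errorf("copying %s to %s: %w", src, dst, err)
		}
	}
	return nil
}
```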

you could do some diffing client-side to skip migrating anything that's already downstream

As for client-side diffing, I assume the goal is to have every replica registry contain the same images, so if an image already exists, docker-lock could just skip the crane.Copy step. (This is about reducing the amount of work someone needs to do to use docker-lock, which is why all the current flags are implemented instead of having people pair commands such as find with docker-lock.)
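Something like this, comparing digests so only missing or out-of-date images get copied (sketch):

```go
package migrate

import "github.com/google/go-containerregistry/pkg/crane"

// alreadyMigrated reports whether dst already holds the same content as src,
// in which case the crane.Copy step can be skipped.
func alreadyMigrated(src, dst string) bool {
	srcDigest, err := crane.Digest(src)
	if err != nil {
		return false
	}
	dstDigest, err := crane.Digest(dst)
	if err != nil {
		// dst is missing (or unreadable), so copy it.
		return false
	}
	return srcDigest == dstDigest
}
```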

I haven't looked at the lockfile format

Here is the Lockfile used in this project. It only uses Docker Hub, but following the README, you can generate Lockfiles for sample projects using your own private registry.

@jonjohnsonjr
Author

I am not super familiar with the geo-redundant case, but shouldn't this be handled by the registry itself? Quickly reading up on Azure (I am more familiar with it), for Azure Container Registry it appears as though you can just use one URL and it will pull from the ideal replica for you.

Indeed, this is how a lot of registries work, but it introduces a single point of failure at the DNS or load balancer level. Sometimes it's nice to have complete isolation between two environments, which generally means you'll need multiple image references for the "same" workload. (There are other ways to accomplish similar things, I'm just brainstorming here.)

Just a thought, not wed to any solution, but I think it might be more ergonomic to always just have one Lockfile.

That seems reasonable to me. For the use case I really have in mind, this is sufficient. What I want is this:

  1. I have a bunch of kubernetes yaml.
  2. I want to run it on my cluster.
  3. My cluster can only pull from private.example.com.
  4. docker-lock helps me copy everything I need from that yaml into private.example.com and rewrites the kubernetes manifest images to point to private.example.com instead of wherever they came from originally.

I don't know that I really need the lockfile, but it seems integral to how docker-lock functions currently, and I don't think it really hurts anything to have it. I would defer to you for the best UX here.

One thing I haven't solved is how to rename images across registries. Ideally, you could mirror the structure of the source:

docker.io/library/foo -> private.example.com/library/foo

But, what if we also have gcr.io/library/foo in an image? There would be a collision.

One nice thing is that the collisions don't really matter if you're pulling by digest, but it's something to consider (especially if we're copying tags over).

ko has worked around this in a bunch of terrible ways with different naming strategy flags -- maybe this could be solved using go templates or something to let users specify how things should be renamed.
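e.g. something like this, where the user supplies the template string (rough sketch; the template fields are made up, not an existing docker-lock or ko feature):

```go
package main

import (
	"fmt"
	"log"
	"strings"
	"text/template"

	"github.com/google/go-containerregistry/pkg/name"
)

func main() {
	// Hypothetical user-supplied rename template. Including the source
	// registry in the path sidesteps the docker.io vs gcr.io collision.
	const tmpl = `private.example.com/{{.Registry}}/{{.Repository}}`

	ref, err := name.ParseReference("gcr.io/library/foo:1.2.3")
	if err != nil {
		log.Fatal(err)
	}

	data := struct{ Registry, Repository string }{
		Registry:   ref.Context().RegistryStr(),   // "gcr.io"
		Repository: ref.Context().RepositoryStr(), // "library/foo"
	}

	var out strings.Builder
	if err := template.Must(template.New("rename").Parse(tmpl)).Execute(&out, data); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s:%s\n", out.String(), ref.Identifier())
	// Prints: private.example.com/gcr.io/library/foo:1.2.3
}
```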

I assume the goal is to have every replica registry contain the same images, so if an image already exists, docker-lock could just skip the crane.Copy step.

Yep, exactly.

@michaelperel
Collaborator

I don't know that I really need the lockfile, but it seems integral to how docker-lock functions currently, and I don't think it really hurts anything to have it. I would defer to you for the best UX here.

This was raised in the other open issue, and I tend to agree that in many cases you don't need the Lockfile. When I developed this (for my own use case) I thought it would be nice to keep the hash information out of the Dockerfiles so that they would remain as readable as possible. As the project evolved, I remain 50/50 on whether this is necessary, but currently that is how it works.

One thing I haven't solved is how to rename images across registries.

This seems pretty hairy and makes me wonder if it might just be worth supporting a smaller subset of use cases.

In terms of time for this feature, I am not sure when I will next have time to add features, but I am willing to review any PRs.

In terms of UX, I would generally see:
(1) generate a Lockfile (docker lock generate)
(2) read lockfile, push to new registries via crane.Copy (docker lock migrate)

@michaelperel
Collaborator

Upon second thought, I will play around with it in the next week and ping you with an update / code, but feel free to try some ideas out as well if you have time.

@jonjohnsonjr
Copy link
Author

In terms of time for this feature, I am not sure when I will next have time to add features, but I am willing to review any PRs.

Upon second thought, I will play around with it in the next week and ping you with an update / code, but feel free to try some ideas out as well if you have time.

Sounds good -- no rush on my side as I am also a bit busy, but I might point some people towards this as a potential solution if they have time to implement it.

@michaelperel
Collaborator

@jonjohnsonjr
I have an MVP of the copy behavior working in the branch miperel/migrate. It works with Docker Hub, but when testing it with Azure Container Registry, I ran into issues (opened an issue in crane) that also occur with the crane CLI.

@michaelperel
Collaborator

One thing I haven't solved is how to rename images across registries. Ideally, you could mirror the structure of the source:
docker.io/library/foo -> private.example.com/library/foo
But, what if we also have gcr.io/library/foo in an image? There would be a collision.

One other problem with this is that the same structure may not even work. For instance:

docker tag docker.io/library/redis myaccount/library/redis
docker push myaccount/library/redis

fails

but

docker tag docker.io/library/redis myaccount/redis
docker push myaccount/redis

succeeds.

In light of that, I was thinking that the simplest solution would be to just use the last part of the path, as in the example above.
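i.e. roughly (sketch, assuming tag references):

```go
package migrate

import (
	"fmt"
	"path"

	"github.com/google/go-containerregistry/pkg/name"
)

// lastPathComponent maps e.g. docker.io/library/redis:6.2 with prefix
// docker.io/myaccount to docker.io/myaccount/redis:6.2.
func lastPathComponent(src, dstPrefix string) (string, error) {
	ref, err := name.ParseReference(src)
	if err != nil {
		return "", err
	}
	repo := path.Base(ref.Context().RepositoryStr()) // "redis"
	return fmt.Sprintf("%s/%s:%s", dstPrefix, repo, ref.Identifier()), nil
}
```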

However, this would be annoying for the case of a project that uses 2 images with the same last path:
bitnami/redis -> myaccount/redis
docker.io/library/redis -> myaccount/redis

I think that this case is somewhat rare though, and the command could throw a warning/error/rename a path in this case, so it could be an acceptable solution. Thoughts?

@jonjohnsonjr
Author

I think that this case is somewhat rare though, and the command could throw a warning/error/rename a path in this case, so it could be an acceptable solution. Thoughts?

Many! So this has bitten me in ~four different contexts now, and it feels like I should write something up, but I honestly haven't found a great solution. Let me try to enumerate some constraints to explain why I think this is difficult, and one potential path forward:

  1. Some registries allow 1 or more path components.
  2. Some registries allow 2 or more path components.
  3. Some registries allow 3 or more path components.
  4. Some registries allow exactly two path components.

If your destination is one of the first three cases, this isn't actually too bad. You can just take a configured "root" in the destination registry and append all the path components from the source registry, e.g.:

DESTINATION=dst.example.com/lock

| in                               | out                                   |
| -------------------------------- | ------------------------------------- |
| src.example.com/foo:bar          | dst.example.com/lock/foo:bar          |
| src.example.com/foo/bar:baz      | dst.example.com/lock/foo/bar:baz      |
| src.example.com/foo/bar/baz:quux | dst.example.com/lock/foo/bar/baz:quux |

As long as we're unrestricted in the maximum number of path components, you can always choose a DESTINATION that works for your target registry.
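In code, that strategy is just a join (sketch):

```go
package migrate

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/name"
)

// preservePath appends the full source repository path to a configured
// destination root, e.g. src.example.com/foo/bar:baz under
// dst.example.com/lock becomes dst.example.com/lock/foo/bar:baz.
func preservePath(src, destinationRoot string) (string, error) {
	ref, err := name.ParseReference(src)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s/%s:%s",
		destinationRoot,
		ref.Context().RepositoryStr(), // "foo/bar"
		ref.Identifier(),              // "baz" (assumes a tag reference)
	), nil
}
```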

The problem is with e.g. the fourth case: if the destination registry has a fixed upper bound on path components, how do we flatten an arbitrary number of source path components into it?

With ko, we ended up adding a bunch of flags to address this (I think flags were a mistake, but whatever):

  • --preserve-import-path just keeps the entire structure, similarly to the table above.
  • --base-import-paths does what you suggest, just taking the last path component.
  • --bare just lets you hardcode the whole path as DESTINATION -- because we stick the digests in there, it doesn't really matter, but this would cause tag collisions normally.
  • and the default, which is similar to --bare, but appends an md5 of the whole path to avoid collisions

The default is ugly, but it works :/
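Roughly, the default looks like this (sketch of the idea, not ko's exact code):

```go
package migrate

import (
	"crypto/md5"
	"fmt"
	"path"
)

// flatten maps an arbitrarily deep source repository onto a single path
// component under destination, appending an md5 of the full source path so
// docker.io/library/foo and gcr.io/library/foo don't collide.
func flatten(destination, srcRepo string) string {
	sum := md5.Sum([]byte(srcRepo))
	return fmt.Sprintf("%s/%s-%x", destination, path.Base(srcRepo), sum)
}
```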

the command could throw a warning/error/rename a path in this case

This is fine, but what is a user to do if there's an error? We would need some kind of knob to turn, I think. They are not likely to be able to change the names of all their source repositories, as those are often outside of their control.

So I've got a handful of bad ideas to deal with this, but I'm not sure if any are palatable:

  1. Stick a map[string]string in a config file that statically maps src -> dst
  2. Have a hacky config language that is a little less verbose than a map e.g.:
     source: "src.example.com/foo/bar/*"
     destination: "dst.example.com/foo-bar/*"
  3. Have a way to provide a go template for renaming, possibly with some custom functions for common scenarios.
  4. Have a way to configure a binary we can shell out to for renaming, which would let you do stuff in bash or whatever language you want e.g.:
     $ echo 'src.example.com/foo/bar:baz' | configured-rename-binary
     dst.example.com/foo-bar:baz
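For scale, option 1 is trivial to implement but tedious to maintain (sketch):

```go
package migrate

// Sketch of option 1: a static src -> dst repository map, which would be
// loaded from a config file in practice.
var renames = map[string]string{
	"docker.io/library/foo": "private.example.com/library/foo",
	"gcr.io/library/foo":    "private.example.com/gcr-library/foo",
}

func rename(srcRepo string) (string, bool) {
	dst, ok := renames[srcRepo]
	return dst, ok
}
```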

I feel like solutions 3 and 4 are maybe overkill, but it's hard to find a method that works for everything.

Not sure if this is helpful or not :)
