Image Security Scanning RFC #136

silenteh · 2020-08-12T07:25:18Z

This PR contains an RFC to discuss the implementation of Trow image (and OCI artifacts in general) security scanning.
Please use this pull request for requesting changes or in general give feedback regarding this specifc RFC

amouat

I like this a lot. Thanks - you've put a lot of thought into this.

I wonder if we can simplify it a bit or at least break it into stages, so that it doesn't all have to happen at once.

For example, the first stage could have no storage or updating and just call out to the scanner each time a report is requested.

amouat · 2020-08-12T09:28:56Z

rfcs/image-scanning.md

+## Abstract
+This document describes the introduction of container images scanning to Trow thus providing additional security 
+to a container based infrastructures.    
+Container image scanning is currently supported by all the major container registries.  


I guess you mean the cloud provider registries like ECR etc? Wasn't it removed from Docker Hub? Certainly a common feature though.

Yep I meant cloud registries. Will adjust the wording

amouat · 2020-08-12T09:30:07Z

rfcs/image-scanning.md

+
+
+## 1. Objectives
+* [ ] Security scanners integrate via the [Pluggable Image Vulnerability Scanning](https://github.com/goharbor/community/blob/master/proposals/pluggable-image-vulnerability-scanning_proposal.md) 


Should this change to be a clear goal i.e. Implement the bla proposal?

Yes good point

amouat · 2020-08-12T09:31:41Z

rfcs/image-scanning.md

+Trow web UI to consume the vulnerabilities data
+* [ ] If the security scanning fails or is not configured properly, Trow **runtime** functionalities are not impacted
+* [ ] **Deployment** of Trow should not be impacted by the availability of image scanners like Clair or Trivy
+* [ ] Secure by default. Whenever a security scanner is available all images are automatically scanned for vulnerabilities. 


I'm not sure what is intended here. If we add a security scanning service to an existing registry, it should immediately begin scanning all images?

The idea is to have Trow to be secure by default. In other words:

when a security scanner is available all images will be scanned (we don't add the possibility to skip the security scanning for specific images)

when a security scanner is available existing images should be immediately scanned and then the next run should happen based on the scheduling interval. This is how Harbor behaved 3 years ago :)
As as an Admin/Ops I would expect that once the security scanner is added I don't have to do anything else to have my images scanned.
This is of course my personal opinion. I think we need to check how heavy the security scanning is in order to have a better understanding of how Trow should behave once a new security scanner is added. It probably also depends on the amount of stored artifacts in Trow.
We should also probably fix the tokio runtime creation for each route before allowing this.

Ok, makes sense to me as long as it's just a default.

amouat · 2020-08-12T14:32:04Z

rfcs/image-scanning.md

+* [ ] If the security scanning fails or is not configured properly, Trow **runtime** functionalities are not impacted
+* [ ] **Deployment** of Trow should not be impacted by the availability of image scanners like Clair or Trivy
+* [ ] Secure by default. Whenever a security scanner is available all images are automatically scanned for vulnerabilities. 
+* [ ] New vulnerabilities are released every day. Trow should schedule image security scanning on a daily basis. 


Is daily a common default?

The way security scanning normally works is with a first step that builds a list of packages, libs etc installed in an image, before figuring out which of those packages have known vulnerabilities. If you store the list of installed software and the images it relates to, you don't need to rescan - if a new vulnerability is found that affects one of the packages in the list you can automatically update the report. But that implies the scanner needs to keep track of package lists and is somehow able to update the reports...

My idea was to allow images to be scanned both regularly and also when they are pushed to the registry.
I definitely worded it badly. I wanted to say that Trow should support scanning of images based on a intervals and at push. Daily should be mentioned as an example only. We could use the cron format to specify the interval.

Regarding the security scanning process you mentioned, I personally think it is the responsibility of the security scanners (Trivy, Clair) to behave like that. I don't think this is something Trow should take care of. Instead Trow gets the result of the scan as a JSON file and just stores it as an artifact.
The Harbor Security Spec is nothing more than an HTTP API communication channel between a registry and a security scanner. The API calls available only return the full result of the scan, not that incremental ones, as far as I understand it.

Ok, that's fine by me. Keep it simple if we can :)

amouat · 2020-08-12T14:51:10Z

rfcs/image-scanning.md

+* [ ] Allow to run a scan on demand by a user triggered action (api call and/or web ui) 
+* [ ] Expose a webhook to subscribe to notifications regarding vulnerable images
+* [ ] API endpoint exposing information about vulnerable images MUST be protected by default by authentication and authorisation methods
+* [ ] Trow admission controller MUST support blocking the pulling of images in case they have SEVERE vulnerabilities. 


Can we reword this one a bit? Perhaps something like:

"The Trow Admission Controller SHOULD add controls for blocking pulling of images with vulnerabilities. There MUST be support for configuring what level or number of vulnerabilities result in blocking and it MUST be configurable on an image or namespace level. "?

Yep good point!

amouat · 2020-08-12T14:52:53Z

rfcs/image-scanning.md

+### a. Components
+This is a breakdown of all components and their descriptions.
+
+#### Trow Security Scanner registry


Interesting - is this what Harbor does?

I'm wondering if we can simplify this at all.

I am not sure I understand what you are referring to. Github shows me that your comments are related to the #### Trow Security Scanner registry title. Were the comments meant for a different line ? If not then could you please clarify: is this what Harbor does ?

I was referring to the whole section :)

Could we just support a single scanner to begin with?

Ah apologies, I did not get the fact you meant the whole section.
Harbor does have the notion of a security scanner registry and a controller so yes the architecture is similar.
I personally liked their architecture and I believe Harbor's devs must have put quite a lot of thoughts on it already.

The list of HTTP api entry points is a copy and paste of the current spec Harbor has defined.
I am not sure whether and how the Harbor spec will be updated.
So I wanted to make sure we have a stable reference to work against and as a documentation of the current implementation.
Inevitably at some point the two implementations will diverge. For instance when Harbor implements a new spec Trow will inevitably be behind with its own implementation. Or might even diverge...

I have added the ordered list of steps to implement this RFC as a comment to the whole PR.

amouat · 2020-08-12T15:35:10Z

rfcs/image-scanning.md

+We do not need a config manifest for the vulnerability scanning report, so the mediaType is set to: `vnd.unknown.config.v1+json`  
+See Azure container [documentation](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-oci-artifacts)  
+
+* Step 3: Add an entry to the `index` file of the image which was scanned.  


I'm not sure this will work. I don't think all images will have indexes (we should check this). And what about images with multiple different tag names?

I think we have to be able to look up a report based on the image digest somehow.

I did not understand either the role of index, but after re-reading it this morning, now I think I do.
The OCI Index spec is useful at the moment only for multi platform artifacts.
They explain it in the documentation like this:
The goal is to allow multi-architecture images, through a "fat manifest" which references image manifests for platform-specific versions of an image

So basically there is one index per image:tag. So for example let's assume we have one image tag nginx:1.19 which has images for both amd64 and arm.
The client pulls the index first and then pulls the specific OCI image via its digest, based on the platform the client runs on. (For arm it will pull the arm image, for amd64 the respective amd64 image)

{ "schemaVersion": 2, "manifests": [ { "mediaType": "application/vnd.oci.image.manifest.v1+json", "size": 7143, "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f", "platform": { "architecture": "arm", "os": "linux" } }, { "mediaType": "application/vnd.oci.image.manifest.v1+json", "size": 7682, "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270", "platform": { "architecture": "amd64", "os": "linux" } } ] }

So the index in our case could be useful if the client wants to pull the security scanner reports for specific platforms. For example: pull the security scanner report for nginx:1.19 for arm.
I agree this is something we should not be focusing on at the moment.
This means also that I will need to adjust a bit how we store the image security report and instead of using the index, just use a manifest and the store the json as a file and place the manifest and the file inside the digest folder of the image/artifact.
So we can use the image digest to lookup the security report.

I've used the OCI Index (previously called manifest list) a bit, as I played with multi-arch stuff (and we have a GitHub action to build a multi-arch image). I think it only exists for images that are multi-arch - most images won't have one.

I know it's an extra step, but I think I'd rather keep the reports in a separate directory structure - so just have a scans directory with multiple directories inside (each one named after a digest) which store the reports.

Good point about keeping it in a different root folder.
One thing however which might complicate things further down the path is when (and if) we add artifacts replication to Trow.
At that point we need to make sure we replicate both the artifacts (root folder) and the scan reports (root folder).

True, but I think that's ok. I can see use cases where people want the images but not the reports, or even vice versa.

amouat · 2020-08-12T15:39:12Z

rfcs/image-scanning.md

+``` 
+
+#### Trow WebHook
+Once an image is found to have vulnerabilities we can allow pushing this information out to third party HTTP/s end points.  


I like this. Does Harbor do the same thing? What are the use cases?

I wonder if rather than send out the report it would be better just to say "this image has an updated vulnerability report" and leave it to them to go and pull the report if interested.

I am not aware of Harbor doing this, but I find it useful for simplifying third party integrations and in general to build SIEM systems.
We can either send the full report (which as long as the client implements the signature verification it's secure) or as you proposed just send a simpler JSON (this image has an updated vulnerability report).
The reason why I was suggesting to send the full report is that, as a developer, I prefer to keep the integration steps to a minimum. So basically if we send only the information regarding this image has an updated vulnerability report then as a developer I will need to:

verify the signature of the webhook (which requires a shared secret key between both the client and the server)

add API key to my http client to authenticate the next request I need to make to Trow API to pull the report
So I end up with having to maintain 2 secrets.

Yeah, it might be easiest to send the whole report. I'm just thinking they might be quite large in some cases, but it's text that compresses well.

Regarding security, is it enough to say that if you can pull the image you can get the security report? After all they could run the security scan themselves.

amouat · 2020-08-12T15:40:08Z

rfcs/image-scanning.md

+
+#### Future potential improvements
+
+The usage of HMAC for signing files could be an interesting security feature of Trow.  


By files, you mean the vulnerability reports?

I actually think that we can sign both security reports and even artifacts.
HMAC has the property of making sure nobody has tempered with local files.
Here an example:

Currently we use sha256 to make sure that an image has not been tempered with. However if the client pulls images via TAG rather than image digests, as an attacker I could modify the image at the registry level and the recalculate the sha256 and the client would not be aware of the changes.
By signing locally every file with HMAC the attacker would need to gain access not only to the registry, but also to the secret key used to signed the files via HMAC.
Ideally the secret key lives either in Vault or an HSM or cliud HSM.
This is something we can explore later because it complicates things quite a bit. The good part is that we would guarantee stronger security for Trow without any OCI changes needed. It would just be an additional security feature of Trow.

I also had some thoughts about a merkle-tree style approach. We could create a hash of all the digests in the registry, which gives us a master hash for the current state of the registry. I'm not really sure that helps much, but auditing and security is definitely one of the use-cases I want to explore with Trow.

amouat · 2020-08-12T15:40:47Z

rfcs/image-scanning.md

+#### Future potential improvements
+
+The usage of HMAC for signing files could be an interesting security feature of Trow.  
+HMAC in fact guarantees that nobody has tempered with the local files, as long as the attacker gains access to the secret key.  


Typo: doesn't gain access

Thanks for catching it. Will fix it

amouat · 2020-08-12T15:44:12Z

Also, is it common to have more than one scanning service? There's a fair bit of complexity associated with supporting mulitple services. Could we just skip and say you can register one for the time being?

silenteh · 2020-08-13T08:43:59Z

We should definitely allow only 1 scanning service to begin with. The difference in terms of complexity to support more however is not that big.
I agree we should do it in stages.
However while writing down the steps I realized that if we allow only scanning of images with push, without storing the result then how can a Trow user check the scan report ? We should then implement the Webhook I suppose or an api.
Again the API needs to be queried, so we would need to store the reports somewhere...
So here my suggestion:

Add the possibility to configure a scanner
Scan images only when images are pushed without storage
Add webhook
Add storage
Add Trow API to retrieve scan reports
Allow scheduling of security scans

What do you think ?

amouat · 2020-08-13T13:17:32Z

We should definitely allow only 1 scanning service to begin with. The difference in terms of complexity to support more however is not that big.
I agree we should do it in stages.
However while writing down the steps I realized that if we allow only scanning of images with push, without storing the result then how can a Trow user check the scan report ? We should then implement the Webhook I suppose or an api.
Again the API needs to be queried, so we would need to store the reports somewhere...
So here my suggestion:
1. Add the possibility to configure a scanner

2. Scan images only when images are pushed without storage

3. Add webhook

4. Add storage

5. Add Trow API to retrieve scan reports

6. Allow scheduling of security scans
What do you think ?

What about only scanning when the report is requested? So a user would send something like:

GET /vuln-scans/1234def

Where 1234def is the digest of the image they are interested in. When Trow gets the request, it will kick off scanning the image and return a not ready yet response to the client. The client can then poll at a regular interval. When the scan completes, Trow can store it in temporary storage and return to the client on it's next request.

It might be a bit simpler for the first pass. If the scans are deleted, it doesn't matter as they will get re-created on request.

silenteh and others added 6 commits August 5, 2020 17:29

RFC draft: image security scanning

f15a519

Adjusted headings

d29b98b

Formatting and Rust struct samples

a13cfc3

Trow security scanner controller

08337ce

Added simple architecture diagram for the security scanning

a43d5f1

Added link to architecture diagram

2f24719

amouat reviewed Aug 12, 2020

View reviewed changes

amouat added this to In progress in Trow Aug 20, 2020

amouat mentioned this pull request Aug 21, 2020

Integrate Security Scanning #144

Open

amouat changed the base branch from master to main April 21, 2021 08:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Image Security Scanning RFC #136

Image Security Scanning RFC #136

silenteh commented Aug 12, 2020

amouat left a comment

amouat Aug 12, 2020

silenteh Aug 13, 2020 •

edited

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 12, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

silenteh Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat Aug 13, 2020

amouat Aug 12, 2020

silenteh Aug 13, 2020

amouat commented Aug 12, 2020

silenteh commented Aug 13, 2020 •

edited

amouat commented Aug 13, 2020



		## 1. Objectives
		* [ ] Security scanners integrate via the [Pluggable Image Vulnerability Scanning](https://github.com/goharbor/community/blob/master/proposals/pluggable-image-vulnerability-scanning_proposal.md)


		#### Future potential improvements

		The usage of HMAC for signing files could be an interesting security feature of Trow.

Image Security Scanning RFC #136

Are you sure you want to change the base?

Image Security Scanning RFC #136

Conversation

silenteh commented Aug 12, 2020

amouat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

silenteh Aug 13, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amouat commented Aug 12, 2020

silenteh commented Aug 13, 2020 • edited

amouat commented Aug 13, 2020

silenteh Aug 13, 2020 •

edited

silenteh commented Aug 13, 2020 •

edited