Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add binary version parsing for nodejs, php and python #6524

Draft
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

kovacs-levent
Copy link

@kovacs-levent kovacs-levent commented Apr 19, 2024

Description

Implement binary version parsing for at least key binaries. The feature should be extended upon, but first round, I'm going to be focusing stuff which matters for my use-case (PHP, NodeJS, Python)... I'm keeping Java binary parsing since it's already been implemented by laurentdelosieresmano in the related PR👍 Since go-dep-parser was moved to this Repo, I'm reopening the PR and adding Python dep parsing.

Related issues

Related PRs

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@CLAassistant
Copy link

CLAassistant commented Apr 19, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Levente Kovacs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kovacs-levent
Copy link
Author

I signed the cla... Not sure why it's not triggering as signed. On my forked branch, every test passed, so if you trigger the tests it should be good.

@kovacs-levent
Copy link
Author

kovacs-levent commented Apr 19, 2024

Hmm, while all tests are passing, I do not see the new binary results when I build the trivy from source and run the same experiment as in #6457, I'm not sure why that is the case.

EDIT: Guess I realized that I not only need to implement the parsing, but the analyzer logic as part of fanal. I'll try that.

@kovacs-levent
Copy link
Author

kovacs-levent commented Apr 20, 2024

I have spent some time with this. Now python version is detected correctly by trivy and added into the sbom in a way that xeol detects it properly according to my use case.

I looked into implementing vulnerability scanning as well for standalone binaries, but I think at first only having it part of sboms is a step forward.

There are a few problems regarding implementing this with standalone binaries:

  1. It's not an OS package, hence the whole PR. But that also means I can't implement the vuln scanning as part of the ospkg detector library. When I tried that, trivy started to correctly detect the vulns in the binaries, but it also discarded everything else due to conflicting OS versions (if I implement it as ospkg detector driver, then I just lost all dpkg vulns for example).
  2. It's also not really a langpkg. but this could be solved with some extra code as part of the driver logic and without messing more with the core. However, none of the ecosystems in trivy-db are really fit for querying vulnerabilities in these standalone binaries. The best bet would be NVD I thought.
  3. BUT NVD does not have an interface to query it in trivy-db. This means that I can't query NVD's information about the standalone binary, neither OS package sources.
  4. A Get function could be implemented for NVD, but I suspect that the best way to query it is through CPE2.3 vectors. For standalone-binaries, this is easy to generate correctly I think, but afaik, trivy does not support CPE for complexity reasons (it's also not in the SBOMs).

The only way I see to implement this nicely is by adding CPE support and also putting together a Get function for the NVD vulnsrc linked in 3rd point. Other way: just hack together an NVD get interface since it'd be used only for the scanner as a fallback in case of generic binary is detected, not for regular vulnerability scanning.

Now that I know how to implement these things, I think I can put together the remaining parts (Java, PHP, NodeJS binary as part of package detection and SBOMs) in the upcoming days and the PR would be ready.

@kovacs-levent kovacs-levent marked this pull request as draft April 20, 2024 08:37
@kovacs-levent kovacs-levent changed the title feat: Add binary version parsing for java, nodejs, php and python feat: Add binary version parsing for nodejs, php and python Apr 20, 2024
@kovacs-levent
Copy link
Author

I have added support for PHP and Nodejs standalone binaries and it works like a charm with the sboms. I'm dropping support for Java because I found that it's harder to correctly determine the package type (JRE vs. JDK), and I won't go down the rabbithole since it's not part of my usecase.

Only issue I discovered yesterday is that this will break lots of integration tests which I'll have to adjust those accordingly. After I'll adjust integration tests, I think this PR can be merged.

Copy link
Contributor

@DmitriyLewen DmitriyLewen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kovacs-levent
Thanks for this PR.

I left some comments. Some these comments same for nodejs, php and python.

Also parsers are very similar.
I think we can merge them to 1 package. Something like this:
parser:

├── executable
│   ├── exe.go
│   ├── nodejs
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata
│   ├── python
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata
│   └── php
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata

about analyzers:
looks like we can add new logic here - https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/analyzer/executable/executable.go

If a php/python/npm binary is found, use its parser and add the library.

wdyt?

go.mod Outdated
@@ -124,6 +124,7 @@ require (
github.com/alecthomas/chroma v0.10.0
github.com/antchfx/htmlquery v1.3.0
github.com/apparentlymart/go-cidr v1.1.0
github.com/aquasecurity/go-dep-parser v0.0.0-20240213093706-423cd04548a5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the go-dep-parser as a dependency, it only uses the internal package now

return nil, err
}

if bytes.HasPrefix(data, []byte("\x7FELF")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about windows and macos formats?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't thought about that yet. Will have to check on those.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried testing with Windows PEs, the interface can be implemented for exe and OpenExe, but the regex based detection methods don't work on the PE files I tried (I did manual testing to confirm). I honestly don't have any experience with macos binaries 🫤 So not sure if I can implement the interface for those. I also tried to obtain files to test with, but failed, since now I don't have access to a macos system. Any idea if regex based detection would work on those?

If not, I'd rather not bother with implementing this for those binaries, even though you could implement the interface and executableAnalyzer would still run on those, the binary version detection won't give package in the AnalysisResult.

}

// Python's version pattern is [NUL]3.11.2[NUL]
re := regexp.MustCompile(`^\d{1,4}\.\d{1,4}\.\d{1,4}[-._a-zA-Z0-9]*$`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment with a link where you get this regex?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from syft... Should I link that? All the regexes I used, I mostly copied them from syft. For example python regex:
https://github.com/anchore/syft/blob/b04bc0fbfe5742a970aa3f06594b1e82c87eb4a5/syft/pkg/cataloger/binary/classifiers.go#L495


var libs []types.Library
libs = append(libs, types.Library{
ID: packageID(name, vers),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use dependent.ID once. So you can just use it here.

Suggested change
ID: packageID(name, vers),
ID: dependency.ID(ftypes.PythonGeneric, name, version),

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the packageID function from all parsers and replaced with dependency.ID call only

}
}

return "python", vers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to determine package name?
Looks like it might be confusing if all detected binaries are named "python"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so unfortunately, if there was additional package information, the standalone binary detection would not be needed. But I will look into it if it's obtainable from some metadata.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a way to try detect it with OS package managers, but they don't necessarily yield results unfortunately. I think if these were in package manager, then OS pkg analyzer would be scanning these, so probably also not the right way to go about it.

I wasn't able to find any way to figure out more concrete package name from the standalone binaries.

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
a := pythonBinaryAnalyzer{}
fileInfo, err := os.Lstat(tt.filePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can use 1 file and change names, permissions, etc. in test.
Then we can reduce the number of copies of test files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll rework the tests for fanal in an upcoming commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fanal test cases were reworked. It uses the common logic of executable tests. Added test cases for Python/Node/PHP binaries.

@@ -65,7 +65,6 @@ func (s *scanner) Scan(target types.ScanTarget, _ types.ScanOptions) (types.Resu
}

logger := log.WithPrefix(string(app.Type))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't need this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure why I did it, but I'll remove it in next commit, it's just some "automatic" line deletion I did I guess :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -0,0 +1,82 @@
// Ported from https://github.com/golang/go/blob/e9c96835971044aa4ace37c7787de231bbde05d9/src/cmd/go/internal/version/exe.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Go 1.22 doesn't have this file. Can you please update the link and double check - we may need to add some changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I unfortunately didn't find this in Go 1.22 either. I'll take a look into this in next commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the link and also the implementation was changed.

It's not a 1on1 port, since they repurposed it for capturing Go main module version, but we need to read whole executable file, but I used the same ideas. It also contains interfaces for other binaries (didn't put those in, only elf).

Comment on lines 41 to 42
pythonLibNameRegex := regexp.MustCompile("^libpython[0-9]+(?:[.0-9]+)+[a-z]?[.]so.*$")
pythonBinaryNameRegex := regexp.MustCompile("(?:.*/|^)python(?P<version>[0-9]+(?:[.0-9]+)+)?$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be great if you add some comment about these regex (e.g. link to docs)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypePip Type = "pip"
TypePipenv Type = "pipenv"
TypePoetry Type = "poetry"
TypePythonGeneric Type = "python"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we can use executable for new binaries

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change all the types to become "TypeXExecutable".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you meant here now, that all the new binaries could be TypeExecutable.

I use the language specific types in fanal executable.go, to figure out which binary type was detected in case of "detectable" binary. It can be localized to executable.go instead and use the common type for the "outside" return types.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them as a separate, new group of variables instead in this same file.

@kovacs-levent
Copy link
Author

Thanks for the feedback, I will make the changes when I have the time, this week’s been busy, but I’ll try to make progress when I have the time.

@kovacs-levent
Copy link
Author

kovacs-levent commented Jun 3, 2024

As you can see, I finally got to finish up this PR. I'm going to work some more on this in the upcoming days, so that all the comments are reflected in the changes.

I did refactor the code to have the proposed structure.

@kovacs-levent
Copy link
Author

I had some troubles with rebasing, probably shouldn't have waited so long with that 😓 I had to force push, because other people's commits started appearing here, so I did a hard reset. Sorry for the ugly commit history 🤷

@kovacs-levent
Copy link
Author

kovacs-levent commented Jun 4, 2024

I have encountered a weird issue now... I found that if Rekor sbom source is not specificed, then ExecutableAnalyzer is getting disabled. https://github.com/aquasecurity/trivy/blob/main/pkg/commands/artifact/run.go#L514 Can someone shed some light on why is that? The commit message contains a PR... Seems like completely unrelated issue: #6163

For some reason, the whole executable analyzer was disabled, instead of only disabling Rekor queries based on the executables. Can you please provide some insight on whether it's fine to remove this, and how could I go about it to disable Rekor queries instead? @DmitriyLewen

@kovacs-levent
Copy link
Author

Problem is that, with Rekor enabled, it works fine for me, but it takes significantly longer to scan with Rekor, even though I'm not interested in that information.

@kovacs-levent
Copy link
Author

I have encountered a weird issue now... I found that if Rekor sbom source is not specificed, then ExecutableAnalyzer is getting disabled. https://github.com/aquasecurity/trivy/blob/main/pkg/commands/artifact/run.go#L514 Can someone shed some light on why is that? The commit message contains a PR... Seems like completely unrelated issue: #6163

For some reason, the whole executable analyzer was disabled, instead of only disabling Rekor queries based on the executables. Can you please provide some insight on whether it's fine to remove this, and how could I go about it to disable Rekor queries instead? @DmitriyLewen

I fixed this issue in the following commit. Now if the Rekor flag isn't enabled, it only skips the "unpackaged" hook, so Rekor queries are only made if the Rekor is actually specified, but executable analysis still happens regardless.

@DmitriyLewen
Copy link
Contributor

Hello @kovacs-levent
Thanks for your work!

I'm working on other priority tasks.
I will consider this PR when I have time for it

Regards, Dmitriy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants