Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Alpine parsing to actually extract commit instead of the checksum #241

Merged
merged 6 commits into from
Oct 31, 2024

Conversation

another-rex
Copy link
Collaborator

Migrate parts of osv-scanner's alpine extractor logic here to replace textproto, as textproto MIMEHeaders assumes the key is case insensitive which is not the case.

(There doesn't seem like there's a way to preserve the case with textproto from what I can find)

wantResultMetric: stats.FileExtractedResultSuccess,
wantInventory: nil,
wantErr: cmpopts.AnyError,
wantResultMetric: stats.FileExtractedResultErrorUnknown,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason invalid apk installed files still gets successfully extracted?

This does change the behavior to make this extractor fail on invalid files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases like NPM extraction we opted to skip invalid formats and not return an error, e.g. https://github.com/google/osv-scalibr/blob/main/extractor/filesystem/language/javascript/packagejson/extractor.go#L177

The reasoning was that it's not the extractor that's the problem but the file, so the scan shouldn't be marked "failed" just because there was an invalid file somewhere.

Copy link
Collaborator Author

@another-rex another-rex Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be handled at the scanner level? The extraction still failed, it should be up to the caller to determine what to do in those cases. We can add a "ErrInvalidFile" error to be returned that can be handled by the caller?

This is something we would want to configure in osv-scanner, where if there is an invalid file we generally want to notify the user that we didn't fully complete the scan and let them correct the file. (Though I also do see how having a log entry also kind of accomplishes this, it would be nice to have more control on the caller side)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically the idea was that the plugin is the best place to determine if this is maybe not the intended file or the file itself is invalid.

Example for a file which FileRequired says yes, but Extract should ignore and not through an error:

  • For go binaries FileRequired returns true based on some heuristics, ignoring the file content to be efficient. Extract then parses the file and might figure it's not a go binary. Thus this plugin should just ignore it.
  • package.json is used by multiple applications (e.g. vscode extensions). FileRequired just checks the filename. Extract can then determine if this is actually a javascript package or not.

Example for invalid file, which should error:

  • DPKG location is /var/lib/dpkg/status. If the parser finds an error, it's very likely because the file is corrupted. It's unlikely that there is any other app using that path for something else.

So in case of APK, which also has a quite unique location (lib/apk/db/installed), I'd say an invalid file should fail, yes.

@another-rex another-rex requested a review from vpasdf October 22, 2024 23:01
// check both that line is empty and we have filled out data in group
// this avoids double empty lines returning early
if line == "" && len(group) > 0 {
// scanner.Err() could only be non nil when Scan() returns false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this comment. I don't see why it's here. It's the expected behavior from my point of view. The Scan=false and Err=nil on EOF is more unexpected. But the code handles this nicely

@copybara-service copybara-service bot merged commit 761ca67 into google:main Oct 31, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants