-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: migrate semantic
from osv-scanner
#316
base: main
Are you sure you want to change the base?
Conversation
"math/big" | ||
"strings" | ||
|
||
"github.com/google/osv-scalibr/internal/cachedregexp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just define all regexes at the top-level and refer to those? Then we don't need a cachedregexp implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but then everyone that imported the library would pay the cost of compiling every regexp regardless of if the function using that regexp actually gets called, which is especially interesting in a CLI context where it's expected that only a handful of "branches" will be executed in a single run (i.e. on an average project we'd expect to scan one or two lockfiles, rather than running every single extractor and semantic comparator).
Plus, I like not having to think about it: with this I can just use cachedregexp
inline every single time I need a regexp rather than having to think about the cost - semantic in particular has the largest regexp I've ever had the horror of using for parsing Python versions which is sourced from an official PEP so we don't really want to touch it.
(by extension I guess I'm implying I have a preference for collocating regexps to their usage too, as typically they're only used in one function)
Granted the key thing is that the regexps are not inlined - beyond that we're talking about saving a couple of milliseconds more, but in contrast as far as I know the only cost to using cachedregexp
is an extra ~8 bytes in the binary size, which already varies more than that across different OSs 🤷
(fun side fact: moving that Python regexp inline will take go test ./semantic/... -count=10
from 3 seconds to 20 seconds, and it just keeps growing from there 😅)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option if we really don't want the global map is to lazily initialise the few big regexp's, but as Gareth said then you have to start thinking about the cost of each regexp and whether it is worth it to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to challenge this slowly compiling regex. Can we do something about that? Like change the regex to be more efficient or use code for the heavy part and use multiple regexes for sub problems? I haven't looked into the regex yet, just wanting to make sure we don't ignore an easy solution.
If there is really no chance on that: I think cachedregexp is a good alternative for those cases which are not on the hot path. We need to avoid using cachedregexp everywhere tho, as it has an overhead.
on an average project we'd expect to scan one or two lockfiles, rather than running every single extractor and semantic comparator
This is just one use case. When scanning entire systems we usually run most plugins eventually. It can happen that a single plugin runs more than 100k times. In those use cases it's very important to keep hot paths fast. Thus we can not use cachedregexp on the hot path of FileRequired. Usage inside Extract is likely fine. In an ideal world, we would have benchmarks for this so we can make quantitative decisions.
We plan to add better benchmarks to Scalibr, so it is easier to understand performance impact of various changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to challenge this slowly compiling regex. Can we do something about that?
Technically yes, but as I said this particular regexp comes from a third-party - specifically, it's the official regexp provided by Python in their PEP on how versions should be parsed, meaning by using it we're officially compliant with the whole ecosystem and know for sure that any version which is not matched by that regexp is not a valid version.
Attempting to optimize it or break it up would risk introducing a different behaviour, and so far it's not been an issue so long as it's only compiled the once - but yes technically we could break it up (though I'd not want to do that as part of this migration, as I think there'd be too much risk)
We need to avoid using cachedregexp everywhere tho, as it has an overhead.
Can you explain more about this overhead? My understanding is that there is already an overhead by having the regexps as globals, which is that they all get compiled when the package is imported/used - are you saying that the cost of calling the function or doing the lookup in the map every time has a more impactful overhead than just always compiling every regexp, or is there another source of overhead I'm missing?
This is just one use case. When scanning entire systems we usually run most plugins eventually.
"scanning entire systems" is also just one use case 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the ability to trigger all regexes and populate the cache on startup instead of invocation.
I'm fine leaving this as is, but would like a way to force compile all regexes at startup. Specifically because we'll be running this in a server and:
- I don't want to risk a crash due to specific requests. I'd rather the crash happen at server startup.
- All functions will eventually be hit. I'd rather have the upfront startup costs than variable request latencies.
I'm not sure how to do this in the current architecture. I like that the regexes live inside the specific ecosystem files, but I'm not opposed to @erikvarga 's suggestion to move them into the cache package so this is easier. As a stop-gap, if submitted as is, we'll just call all functions that use cachedregexp
at server startup to force the compilation. But I don't like that long term.
"cachedregexp everywhere tho, as it has an overhead"
@vpasdf what overhead are you concerned about? Additional binary size, slowdown of calling the functions, something else? Would pre-calling all functions at startup, mimicking inline, fix this for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alowayed that's a very good, and has me leaning to removing cachedregexp
since the performance difference between having it vs using globals is a lot smaller than compiling on the hot path, and in a server context those risks are tons worse than then warm fuzzy "we're being efficient devs" feeling cachedregexp
can give us.
Ideally long-term it would be awesome if we could have something like this and some kind of "hey if you're on a server, call precompileEverything
" kind of pattern, though sadly I can't think of any best pattern for enabling that which doesn't have tradeoffs - ultimately most of the ideas I have come back to having globals anyway...
Since this should be straightforward to remove and I've still not decided if I want to do it here or in a dedicated PR, I'll leave this to action later as I'd still like to hear more about the overhead @vpasdf has hinted at (for my own learning, if nothing else), but confirming I'm now leaning towards not using cachedregexp
for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about the overhead of the hashmap lookup compared to compiling it before running and having it in a global variable. So I'm concerned about cpu time. This assumes of course that the hashmap lookup might be slower than the regex matching. This is for the python case likely not the case, but for trivial regexes I'm not sure.
There is also some memory overhead through the hashmap, as hashmaps store some metadata around the actual data. But I think this is neglectible here.
I'm fine with having cachedregexp. I think @alowayed made a good point about the server use case. If we use it in FileRequired I want to see benchmarks first, but I think this is not the plan rn anyways.
semantic/utilities.go
Outdated
return num | ||
} | ||
|
||
panic(fmt.Sprintf("failed to convert %s to a number", str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd like to avoid panics in SCALIBR - can we propagate an error instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it feels like using an error would require a bunch of churn, it would probably still be straightforward to do (though involve a lot of tedious error up-passing), but I really dislike the idea because this panic should never ever happen - it's intended to be called by parsers after they've determined the input can be cast i.e. as they've done a regexp match on \d+
(otherwise, they should be using convertToBigInt
which returns a bool indicating if the input was successfully cast).
Less important but still part of my feelings is that right now we behave like Compare
and friends, which I think returning an error would make no longer the case (i.e. right now we're doing x.Compare(y): int
).
Having said that, we've never really discussed how we should handle if semantic
is passed a version that is invalid and maybe this is a good time for that? Generally from memory just about all native version comparers handle "invalid" versions in a non-erroring way, which is why we've not needed to return an error most of the time (which in turn has made things a lot simpler) though I'd have to check my notes to confirm if that's the case for all native comparers, and with this becoming public it might just be safer to change the functions to return an error (worst case we could introduce a MustCompare
🤷)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 please no panics. In some use cases panics are very hard to handle.
An alternative could be to introduce a wrapping struct, so you can check if it is a number close to the comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative could be to introduce a wrapping struct, so you can check if it is a number close to the comparison
could you expand or give an example of what you mean by that, just to make sure I understand the pattern you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now we behave like
Compare
This panic can only be hit while parsing the version. So (version) Compare(version) int
can still be error free because the parsing and error handling happened beforehand. This would only impact (version) CompareStr(str) (int, error)
which seems reasonable to me because the str
may not be a valid version.
This also seems more cohesive with the top level Parse
which would look like:
func Parse(str string, ecosystem string) (Version, error) {
//nolint:exhaustive // Using strings to specify ecosystem instead of lockfile types
switch ecosystem {
case "Alpine":
return parseAlpineVersion(str)
...
all native version comparers handle "invalid" versions in a non-erroring way
What "native version comparers" are you referring to? In our use case specifically, we have no control over where the version strings comes from. And we'll be using this server side, so this will be a query of death for our service.
this panic should never ever happen
Optimistic words :)
This assumes the regexes are always correct. If there's an update or a new ecosystem where a non-number can sneak through the regex would panic. A panic is dangerous because it is possible for deferred functions to deadlock or further corrupt internal or external state. If submitted as is, we'd wrap semantic.Parse(...)
in a recovery block to avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This panic can only be hit while parsing the version
nope, it's called on the compare path for Debian (splitDebianDigitPrefix
-> compareDebianVersions
-> DebianVersion#Compare
)
What "native version comparers" are you referring to?
I'm referring to the libraries and packages and code and everything that handles doing version comparisons in the native ecosystem for the particular version, which is what semantic
is mirroring i.e. org.apache.maven.artifact.versioning.ComparableVersion
in Maven, Gem::Version
for Bundler, packaging.version
in Python, version_compare
in Composer, etc
The goal of semantic
is to mirror both the spec and the implementation as closely as possible (in that order), so if a particular native version comparer rejected an input, I would have pursued semantic
doing the same - at this point it has been so long since I implemented most of these that I don't want to swear by that and so yes its worth checking cause better safe than sorry, but I have high confidence and wanted to share some of the history behind the current structure.
Optimistic words :)
I don't think they're overly optimistic given the extensiveness of my test suite ;) (in saying that, I'd have a better argument if the test suite included a fuzzer 😅)
I don't disagree though that it's actually impossible, and yes I get your position more now knowing its in a server side context.
As some more background (and maybe as a small point of pride), I have done a lot of testing as part of building all the comparators, involving a lot of manual and automatic crafting of version strings for these ecosystems, and tossing a lot of crap at both the native and semantic
comparators to check their behaviour - despite that, yes a panic
is death for a server, which trumps all the optimism in the world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative could be to introduce a wrapping struct, so you can check if it is a number close to the comparison
could you expand or give an example of what you mean by that, just to make sure I understand the pattern you have in mind?
Instead of Parse(foo) (version, error)
you could do Parse(foo) version
, where version has a function Valid() bool
or something like that. Then you could do the error handling in the place you actually use the version. Just a random thought, idk if this is the best idea here.
@another-rex @erikvarga thinking about this in the public context and especially in regards to error handling, I'm wondering if we want to start by making just about everything except for I don't know much about the desired uses for i.e. this usage of |
func Parse(str string, ecosystem string) (Version, error) { | ||
//nolint:exhaustive // Using strings to specify ecosystem instead of lockfile types | ||
switch ecosystem { | ||
case "Alpine": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would be nice to have these as package level consts. Can be punted to a future PR as I don't have a good place in mind to put these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is a point to be discussed w/ @another-rex and co, since these are currently OSV ecosystems - string
was used over the Ecosystem
to avoid a cycle import as that particular type has had a mess (and duplicate) history which we've been working to improve
i.e. we probably want to change these to be the new Ecosystem
type and then use the constants defined in the osv-schema
package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Leaving to your and @another-rex to hash out. No AIs on this comment thread from my end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@another-rex and I caught up on this and confirmed we're wanting to keep using the string
type for now since that's what we're using elsewhere in osv-scalibr
.
Since the goal is to eventually switch over to using the type / constants from https://github.com/ossf/osv-schema I think we should leave them as-is rather than make them package constants
semantic/utilities.go
Outdated
return num | ||
} | ||
|
||
panic(fmt.Sprintf("failed to convert %s to a number", str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now we behave like
Compare
This panic can only be hit while parsing the version. So (version) Compare(version) int
can still be error free because the parsing and error handling happened beforehand. This would only impact (version) CompareStr(str) (int, error)
which seems reasonable to me because the str
may not be a valid version.
This also seems more cohesive with the top level Parse
which would look like:
func Parse(str string, ecosystem string) (Version, error) {
//nolint:exhaustive // Using strings to specify ecosystem instead of lockfile types
switch ecosystem {
case "Alpine":
return parseAlpineVersion(str)
...
all native version comparers handle "invalid" versions in a non-erroring way
What "native version comparers" are you referring to? In our use case specifically, we have no control over where the version strings comes from. And we'll be using this server side, so this will be a query of death for our service.
this panic should never ever happen
Optimistic words :)
This assumes the regexes are always correct. If there's an update or a new ecosystem where a non-number can sneak through the regex would panic. A panic is dangerous because it is possible for deferred functions to deadlock or further corrupt internal or external state. If submitted as is, we'd wrap semantic.Parse(...)
in a recovery block to avoid this.
semantic/version.go
Outdated
// when parsed as the concrete Version relative to the subject Version. | ||
// | ||
// The result will be 0 if v == w, -1 if v < w, or +1 if v > w. | ||
CompareStr(str string) int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this should be:
Comare(Version v) int
Instead of CompareStr
. This would solve some of the error / panic discussions above. And is also more inline with how most custom comparison within Go works (comparing objects of the same type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because then it'd mean we have to handle being given a version from a different ecosystem, which itself would require returning an error
- that was the reason why I did CompareStr
, though yes if we're talking about returning an error at all here then we could explore changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still possible for someone to provide a version from a different ecosystem as a string no?
Here's the user journey that would results in this:
semantic
is run in a server that exposes a osv.dev like API that accepts package names, version, ecosystem, etc and returns vulnerabilities.- A user provides some incorrect package like
foo
, claims the ecosystem isDebian
, and provides a non-debian version, let's say1.2.3-not-a-debian-vesion!@#$
. - The server retrieves a vulnerability with version range [
1.0.0
,2.0.0
]. - The server runs
lowerBound := semantic.Parse(1.0.0, "Debian")
. - The server runs
lowerBound.CompareStr("1.2.3-not-a-debian-vesion!@#$")
, leading to a panic.
I'm surprised osv-scanner doesn't crash when there's a non-debian version in say a status
file that's scanned.
For our use case, we'd likely do:
func CompareVuln(vulnVersion, packageVersion, ecosystem str) (error) {
defer func() {
if r := recover(); r != nil {
fmt.Println("Recovered:", r)
}
}()
vulnV, err := semantic.Parse(lowerV, ecosystem)
if err != nil { // handle }
_ := vulnV.CompareStr(packaveVersion)
// Rest omitted for brevity.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it's possible, but it just doesn't break anything because either implicitly or not, generally the native comparators resolve every input down to a result rather than erroring and so we do too i.e. in your example semantic
handles that just fine in the same way that dpkg
does:
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ go test ./pkg/semantic/...
--- FAIL: TestVersion_Compare_Ecosystems (0.00s)
--- FAIL: TestVersion_Compare_Ecosystems/Debian (0.00s)
compare_test.go:235: Expected 1.2.3-not-a-debian-vesion!@#$ to be less than 1.2.3, but it was greater than
compare_test.go:235: 1 of 31 failed
FAIL
FAIL github.com/g-rath/osv-detector/pkg/semantic 0.230s
FAIL
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ dpkg --compare-versions '1.2.3-not-a-debian-vesion!@#$' 'lt' '1.2.3'
dpkg: warning: version '1.2.3-not-a-debian-vesion!@#$' has bad syntax: invalid character in revision number
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ echo $?
1
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ dpkg --compare-versions '1.2.3-not-a-debian-vesion!@#$' 'gt' '1.2.3'
dpkg: warning: version '1.2.3-not-a-debian-vesion!@#$' has bad syntax: invalid character in revision number
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ echo $?
0
(again, I'd really like to say this is really the case for "all comparators" but it's been so long since I implemented some of these that I want to confirm that before throwing such a big claim around)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm looking at the code, wouldn't this panic with 1.2.3-not-a-debian:version!@#$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup that one does it, which also matches dpkg --compare-version
and is why I didn't want to double down on "this never panics" 😅
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0 took 4s
❯ go test ./pkg/semantic/...
--- FAIL: TestVersion_Compare_Ecosystems (0.00s)
--- FAIL: TestVersion_Compare_Ecosystems/Debian (0.00s)
panic: failed to convert 1.2.3-not-a-debian to a number [recovered]
panic: failed to convert 1.2.3-not-a-debian to a number
goroutine 35 [running]:
testing.tRunner.func1.2({0x52fc20, 0xc00038c040})
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1545 +0x238
testing.tRunner.func1()
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1548 +0x397
panic({0x52fc20?, 0xc00038c040?})
/home/jones/.goenv/versions/1.21.11/src/runtime/panic.go:914 +0x21f
github.com/g-rath/osv-detector/pkg/semantic.convertToBigIntOrPanic({0xc00039c000, 0x12})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/utilities.go:13 +0xa5
github.com/g-rath/osv-detector/pkg/semantic.parseDebianVersion({0xc00039c000?, 0x0?})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/version-debian.go:153 +0xc7
github.com/g-rath/osv-detector/pkg/semantic.Parse({0xc00039c000?, 0x2?}, {0x55c248?, 0x4c0505?})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/parse.go:29 +0x777
github.com/g-rath/osv-detector/pkg/semantic_test.parseAsVersion(0xc0001d0680, {0xc00039c000, 0x1e}, {0x55c248, 0x6})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:97 +0x6b
github.com/g-rath/osv-detector/pkg/semantic_test.expectCompareResult(0xc0001d0680, {0x55c248, 0x6}, {0xc00039c000, 0x1e}, {0xc00039c021, 0x5}, 0xffffffffffffffff)
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:115 +0x92
github.com/g-rath/osv-detector/pkg/semantic_test.expectEcosystemCompareResult(0xc0001d0680, {0x55c248, 0x6}, {0xc00039c000, 0x1e}, {0xc00039c01f, 0x1}, {0xc00039c021, 0x5})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:141 +0x97
github.com/g-rath/osv-detector/pkg/semantic_test.runAgainstEcosystemFixture(0xc0001d0680, {0x55c248, 0x6}, {0x55f5da, 0x13})
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:78 +0x328
github.com/g-rath/osv-detector/pkg/semantic_test.TestVersion_Compare_Ecosystems.func1(0x0?)
/home/jones/workspace/projects-personal/osv-detector/pkg/semantic/compare_test.go:235 +0x5a
testing.tRunner(0xc0001d0680, 0xc000121560)
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1595 +0xff
created by testing.(*T).Run in goroutine 18
/home/jones/.goenv/versions/1.21.11/src/testing/testing.go:1648 +0x3ad
FAIL github.com/g-rath/osv-detector/pkg/semantic 0.006s
FAIL
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ dpkg --compare-versions '1.2.3-not-a-debian:version!@#$' 'gt' '1.2.3'
dpkg: error: version '1.2.3-not-a-debian:version!@#$' has bad syntax: epoch in version is not number
osv-detector on main [$!?] via 🐹 v1.21.11 via v20.11.0
❯ echo $?
2
@alowayed @zpavlinovic @erikvarga thanks for the reviews and discussions so far! I think there's been enough for me to do another pass before we continue further - notably, I think we should look at making I expect too I'll probably go ahead for now with making most things private to make it easier to iterate on the internals - I think that probably comes down to I'll also try to do a general refresher on the package and related native version comparers as a whole since while I do have confidence in what I've said so far, I wrote a lot of this 1-2 years ago and its required very little change besides adding new comparators, which while a great sign of stability (or lack of use... 😅) does mean I'm not feeling as confident as I'd like when trying to think through tradeoffs and make architecture decisions Feel free to continue discussions and review, but also feel free to wait until I've actioned the above 🙂 |
12bee2c
to
cc237d6
Compare
"math/big" | ||
) | ||
|
||
func convertToBigInt(str string) (*big.Int, error, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now I went with having this return a bool
along with an error
because a lot of the existing code was v, vIsNumber
and it felt weird to either rename it to vErr
or to keep it the name but compare it against nil
.
I don't think this is super bad, but now that I've got everything transitioned if others felt it would be more Go-y to change all the checks to be checking the err
, I can do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a concrete example of what I mean:
vv, vErr := convertToBigInt(vt.value)
wv, wErr := convertToBigInt(wt.value)
// numeric tokens have the same natural order
if vErr != nil && wErr != nil {
return vv.Cmp(wv) == -1, nil
}
vs
vv, vIsNumber := convertToBigInt(vt.value)
wv, wIsNumber := convertToBigInt(wt.value)
// numeric tokens have the same natural order
if vIsNumber != nil && wIsNumber != nil {
return vv.Cmp(wv) == -1, nil
}
though, I've just realised I could probably actually switch to checking if vv
is nil
...? but I think that has a similar trade off in readability 🤔
@alowayed @zpavlinovic @erikvarga @another-rex I've updated the implementation so that instead of panicking I think long-term there should be some exploring into refactoring the code to better leverage what's already been asserted to eliminate the code paths that cannot actually even happen e.g. there are functions where we convert to a There's a few places already where we could just ignore the error as we know for sure it can only ever be a number, but I'm guessing people's preference is to just always check-and-return the error (let me know if I'm wrong, as that would let me revert some logic changes for the Alpine implementation in particular) I'm probably still going to remove |
// returning an ErrUnsupportedEcosystem error if the ecosystem is not supported. | ||
func Parse(str string, ecosystem string) (Version, error) { | ||
//nolint:exhaustive // Using strings to specify ecosystem instead of lockfile types | ||
switch ecosystem { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the ecosystems we support in osv.dev (https://github.com/google/osv.dev/blob/master/osv/ecosystems/_ecosystems.py) here? The missing ones are all pretty simple to add I believe, either using an existing sorting method, or literally just an int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exploiting my IDE, it looks like these are the ones not in our switch
statement:
case "Bitnami":
SemverEcosystem()
case "SwiftURL":
SemverEcosystem()
// # Non SemVer-based ecosystems
case "Bioconductor":
Bioconductor()
case "Chainguard":
Chainguard()
case "GHC":
GHC()
case "Hackage":
Hackage()
case "Wolfi":
Wolfi()
// # Ecosystems which require a release version for enumeration, which is
// # handled separately in get().
// # Ecosystems missing implementations:
case "Android":
OrderingUnsupportedEcosystem()
case "ConanCenter":
OrderingUnsupportedEcosystem()
case "GitHub Actions":
OrderingUnsupportedEcosystem()
case "Linux":
OrderingUnsupportedEcosystem()
case "OSS-Fuzz":
OrderingUnsupportedEcosystem()
case "Photon OS":
OrderingUnsupportedEcosystem()
of those, the semver ones can just be dropped in, and we can't support the last 6, so that leaves 5 that would need custom implementations (though I wouldn't be surprised if Bioconductor is just CRAN).
I'm happy to look into those, but think they probably shouldn't be a blocker to landing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not blockers and can be done in a followup, the hardest part will just be the tests I think.
Bioconductor -> Semver
Chainguard -> Alpine
GHC -> Semver
Hackage -> SemverLike (just numbers and . separators)
Wolfi -> Alpine
This brings across the
semantic
package fromosv-scanner
, which provides support for parsing and comparing versions across different ecosystems per their unique specifications (or lack thereof).Main actions / discussion points:
cachedregexp
(though worth noting thatsemantic
includes some chonky regexps which do take a second or two to compile)semantic
weekly workflowResolves #257