[Discussion] Findings on Discrepancy Assessments within the SBOM Ecosystem. #905
Comments
We have made some changes to this issue to specify the details and the code. |
Thanks for the analysis @dw763j - very interesting. This would be a great analysis to present at our next SPDX DocFest, since some of the analysis and suggestions seem more relevant to the specific tooling used to generate the SBOMs than to the SPDX spec itself. I have also tried to respond to some of your suggestions below:
This seems to me like a general ecosystem naming problem. Having hard naming requirements in SPDX would make SPDX too rigid and hard to adapt. Encouraging tools to try to use the same naming conventions would definitely be helpful, though. If you are ever able to attend the SPDX implementers call every other Wednesday morning, this would be a good topic to discuss there.
The package version field can be omitted if the tool finds nothing. It is not required to use NONE or NOASSERTION. As you mentioned, there are a variety of reasons the version might be empty (e.g. how the package was built, the package manager used, the tool used, etc.). If you know and want to indicate the specific reason, I suggest using the package comment field to explain the empty/omitted field. If the field is empty or omitted, no inference can be made as to the reason, which is likely unknown to the tool generating the SBOM. Regardless of the reason the field is empty/omitted, the effect is the same for anybody consuming an SBOM: it is an empty/unknown value.
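To make the consumer-side point concrete, here is a minimal sketch of how a consumer might fold the different "no version" spellings into one unknown value. The function name `normalize_version` and the marker set are assumptions made for illustration; they are not part of SPDX or of any tool discussed here.

```python
from typing import Optional

# Spellings that this sketch treats as "version unknown" (an assumption for
# illustration; compared case-insensitively, so "None" is covered by "NONE").
_UNKNOWN_MARKERS = {"", "NONE", "NOASSERTION"}


def normalize_version(raw: Optional[str]) -> Optional[str]:
    """Return the version string, or None when the field is omitted or empty."""
    if raw is None:
        # Field omitted from the SBOM entirely.
        return None
    value = raw.strip()
    if value.upper() in _UNKNOWN_MARKERS:
        # Empty string or a placeholder: same effect for the consumer.
        return None
    return value
```

With this, an omitted field, an empty string, `NONE`, `NOASSERTION` and `None` all compare equal after normalization, which matches the observation that the effect for the consumer is the same.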
In SPDX 3.0, for any Element you may have, there's an optional verifiedUsing property which you can use to provide an IntegrityMethod with which the integrity of an Element can be asserted. In this field you can provide a Hash and specify the HashAlgorithm used. SPDX can't force tools to provide this information, but we encourage tools to do so by having fields in the spec for them to communicate this information. Thanks again for your detailed analysis. I'd love to continue discussion in the Implementers call or on this issue. ccing @goneall to see if he has other comments on the matter. |
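As a rough sketch of the verifiedUsing idea described above, the snippet below computes a digest and attaches it to a dictionary standing in for an Element. The key names `type`, `algorithm` and `hashValue` reflect one reading of the SPDX 3.0 model and should be treated as assumptions to check against the spec; the helper names `hash_integrity_method` and `attach_integrity` are hypothetical.

```python
import hashlib


def hash_integrity_method(data: bytes, algorithm: str = "sha256") -> dict:
    """Build a Hash-style integrity record for the given bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # Key names are assumed from the SPDX 3.0 model, not verified here.
    return {"type": "Hash", "algorithm": algorithm, "hashValue": digest}


def attach_integrity(element: dict, data: bytes) -> dict:
    """Return a copy of the element with one more verifiedUsing entry."""
    updated = dict(element)
    existing = list(element.get("verifiedUsing", []))
    updated["verifiedUsing"] = existing + [hash_integrity_method(data)]
    return updated
```

Because the algorithm is recorded next to the value, a consumer can at least tell which hash function was used, even though the record alone does not say what preprocessing, if any, was applied to the input.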
Thanks for your reply @rnjudge, we are interested in improving the applications of SBOMs. Let's discuss your reply in detail.
Agreed. While more mandatory requirements could lead to acceptance issues, especially for tools, we can instead provide "suggested" use cases for tools and users to follow. Our analysis indicates that SBOMs can fully support the mandatory requirements from the standard, but only partially support certain use cases. Tools often have discrepancies in these partially supported use cases due to the lack of official, detailed suggestions for these scenarios. Providing "suggested" use cases can guide tool implementation and potentially enhance the performance of SBOMs in real-world applications.
I'm interested in participating in the next call 😊.
That's correct. In terms of
As you mentioned, this is an excellent method to avoid discrepancies in fields like hash. However, developers may not be aware of this feature. Directly suggesting improvements based on use cases at the hash field could be beneficial. SPDX 3.0 has provided profiles such as Software, Security, Licensing, and more. Making specific use case suggestions could help tools focus their development efforts.
Thank you for your reply and participation in the discussion. We hope our findings can be beneficial to the community. We are considering making a pull request to clarify our suggestions. |
Having run across SBOMs with a wide range of quality in my day job, I'm quite interested in any efforts to reduce the inconsistencies. I really appreciate the analysis done - this will really help put data behind some of the solution discussions. I have several thoughts (too many to list here), but one thing I'd like to offer for consideration is the creation of specific profiles that change the mandatory field requirements if the producer claims to support the profile, for example requiring checksums on artifacts. This would make it easy for producers to set expectations on SBOM quality for consumers and transform utilities. I look forward to our next discussion / DocFest, where we can discuss in real time. The package version alone is worth some time discussing. I've found wild inconsistencies in SBOMs - some are due to tooling omissions and some are due to the information just not being available. I'm thinking that the SBOM type and the package's primary purpose could be used to identify whether the version information "should" be available. Anyway - perhaps better as a real-time discussion. |
I see the schedule in tech-team-meetings, but what is the exact date 🧐? @goneall @rnjudge
|
@dw763j - This week - May 1. |
BTW - I'll be 30 minutes late to this week's call |
Assessment results on discrepancies in the SBOM ecosystem and some suggestions
Background
SBOMs can be widely used in software supply chain management, so the capabilities and issues of the SBOM ecosystem directly affect how users can adopt them. An accurate assessment of the current state of SBOMs is therefore important. To this end, we have conducted a series of assessments of key characteristics in SBOM applications to reveal the potential discrepancies that hinder usage.
Questions
We asked 3 questions:
1. Compliance: Do SBOM tools generate outputs that adhere to user requirements and standards?
2. Consistency: Do SBOM tools maintain consistency in transforming the produced SBOM?
3. Accuracy: How accurately do the SBOMs produced by tools reflect the target software?
We assess these questions on 9,970 SBOM documents generated by 6 SBOM tools (sbom-tool, ort, syft, gh-sbom, cdxgen and scancode) in both SPDX and CycloneDX formats from 1,162 GitHub repositories. To evaluate accuracy, 100 repositories were annotated as a benchmark, comprising 660 components and 4,000 data fields.
Results
This table shows average results across all 6 tools; all results are at the package level. Note that the results for information about the software itself are quite poor: for instance, 89.59% of repositories contain licenses, yet only a minority of these are identified.
The findings indicate that while SBOM tools fully (100%) support the mandatory requirements of the standards (including Doc.: specVersion, License, Namespace, Creator; Comp.: Name, Identifier, downloadLocation, verificationCode), their support for use cases is at 49.37%, and the consistency within these supported use cases averages 17.63% (as the table shows). The accuracy assessments reveal significant discrepancies, with accuracy rates of 8.62%, 25.81%, and 12.3% for software metadata, identified dependent components, and detailed component information respectively, underscoring substantial areas for improvement within the SBOM ecosystem.
Suggestions
1. name is recorded by some tools together with its information source (pip, maven, npm, etc.), while other tools do not do so. In version, tools vary in how they record the value, for example whether they add a 'V' before the version string. This leads to problems when using SBOMs from different SBOM tools. We suggest requiring tools to state the pattern they use for recording information wherever the standard gives no explicit specification.
2. NOASSERTION, NONE and None can be confusing in specific data fields. For instance, version can naturally be empty when the developers did not record it in the software; tools turn such empty values into an empty string or one of the three forms, which can lead to inconsistency in further exchange. We suggest providing specific markers for these naturally empty data fields.
3. hashes does not even specify the object the hash is computed on. We suggest requiring tools that create checksums to explicitly describe their process for creating them, e.g. salt value or other preprocessing.
We hope our findings can help improve the SBOM ecosystem. Any questions or discussion are welcome.
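The naming and version discrepancies in the first suggestion can be illustrated with a small sketch of what a consumer currently has to do to compare entries from two tools. The `source:name` prefix format, the `KNOWN_SOURCES` tuple and all function names are assumptions made for this illustration; real tools may encode the source differently.

```python
from typing import Tuple

# Package sources mentioned in the discussion; the "source:" prefix format
# is a hypothetical encoding chosen for this sketch.
KNOWN_SOURCES = ("pip", "maven", "npm")


def normalize_name(name: str) -> str:
    """Strip an assumed 'source:' prefix such as 'pip:' from a package name."""
    stripped = name.strip()
    for source in KNOWN_SOURCES:
        prefix = source + ":"
        if stripped.startswith(prefix):
            return stripped[len(prefix):]
    return stripped


def strip_v_prefix(version: str) -> str:
    """Drop a leading 'v' or 'V' only when a digit follows it."""
    v = version.strip()
    if len(v) > 1 and v[0] in "vV" and v[1].isdigit():
        return v[1:]
    return v


def same_component(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Compare two (name, version) pairs after normalization."""
    key_a = (normalize_name(a[0]), strip_v_prefix(a[1]))
    key_b = (normalize_name(b[0]), strip_v_prefix(b[1]))
    return key_a == key_b
```

For example, `same_component(("pip:requests", "v2.31.0"), ("requests", "2.31.0"))` evaluates to `True` under these assumptions. Each consumer has to guess such rules today, which is the motivation for asking tools to state their recording pattern.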
Quick check with code
We provide quick-check code here, based on part of our dataset.
Examples:
For checksums, here are examples where both the file and the hash algorithm match, yet the resulting checksums still differ:
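The examples from the dataset are not reproduced in this text. As a purely hypothetical illustration of how such a mismatch can arise (not a claim about what any of the assessed tools actually do), the sketch below hashes the same logical file with the same algorithm, once with LF and once with CRLF line endings. The variable names and file content are made up for the example.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the given bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


# The same two-line text, stored with different line endings.
content_lf = b"line one\nline two\n"
content_crlf = content_lf.replace(b"\n", b"\r\n")

# Hashing the raw bytes gives two different checksums.
raw_lf = sha1_hex(content_lf)
raw_crlf = sha1_hex(content_crlf)

# A tool that normalizes line endings before hashing agrees with the LF
# digest, but disagrees with a tool that hashes the CRLF bytes as stored.
normalized_crlf = sha1_hex(content_crlf.replace(b"\r\n", b"\n"))

print(raw_lf == raw_crlf)         # False
print(normalized_crlf == raw_lf)  # True
```

Without a statement of what was hashed and how it was preprocessed, a consumer cannot tell which of these digests to expect.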