How to reference and retrieve another RO-Crate #296
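Could @dgarijo or @ptsefton have a look at this? I've used it here: https://stain.github.io/workflow-run-crate/profiles/0.5-DRAFT/process_run_crate/ro-crate-preview.html#https%3A//www.researchobject.org/workflow-run-crate-paper/mapping/ Perhaps we should add that isPartOf pattern as well on how to mention a file within another crate? (Could get tricky to make absolute URIs..)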
Will do when I finish the ISWC reviews that are due tomorrow :(((
On Mon, 20 May 2024 at 9:14 p.m., Stian Soiland-Reyes wrote:
> Could @dgarijo <https://github.com/dgarijo> or @ptsefton <https://github.com/ptsefton> have a look at this? I've used it here:
> https://stain.github.io/workflow-run-crate/profiles/0.5-DRAFT/process_run_crate/ro-crate-preview.html#https%3A//www.researchobject.org/workflow-run-crate-paper/mapping/
> Perhaps we should add that isPartOf pattern as well on how to mention a file within another crate? (Could get tricky to make absolute URIs..)
Thanks @stain, I have had a look. The only thing that is not fully clear to me is where the distribution information is supposed to be added: is it on the crate referencing the other crate, or in the referenced crate's metadata? For example, let's say crate A references crate B. Usually I would add a link in A to B. But here you recommend also adding where B is stored, correct? As opposed to adding a link to B, and hoping that when I resolve the id I get a JSON-LD with the distribution information. The only potential issue I see is that distributions may not have persistent ids. If the link from A to B persists, but the distribution is hosted elsewhere in the meantime, B has no means to tell A about this. But I am ok with this limitation.
Small clarification
Think this is good to go in now, waiting for final approval from @dgarijo or @jmfernandez.
I can't seem to resolve the requested changes from @dgarijo although I have addressed them in earlier commits.
Sorry @stain, I am not available right now for review. Will do as soon as I can.
After meeting 2024-08-22, revised to not require I now feel this should be split out into a page separate from data-entities.md |
Once again, apologies. Will review by the end of the week.
##### Referencing RO-Crates that have a persistent identifier

If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. For instance, if crate B had declared the identifier `https://pid.example.com/another-crate/` then crate A can reference B as an entity:
`a identifier` -> `an identifier`
If an `identifier` is not declared in a referenced RO-Crate B, but the determined absolute URI has [Signposting] declared for a `Link:` with `rel=cite-as`, then that link MAY be considered as an equivalent permalink for B.
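
The `rel=cite-as` fallback quoted above could be sketched in Python roughly as follows. This is only an illustration, not part of the PR: the helper names `parse_link_header` and `find_cite_as` are hypothetical, the parser handles only simple `Link:` header values, and the example header is made up.

```python
import re

# Matches one link-value: <target> followed by zero or more ;name=value params.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
# Matches a single ;name=value parameter (quoted or bare value).
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (target, params) tuples from a Link header value."""
    links = []
    for target, raw_params in LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def find_cite_as(value):
    """Return the first target whose rel includes cite-as, else None."""
    for target, params in parse_link_header(value):
        # rel may carry several space-separated relation types.
        if "cite-as" in params.get("rel", "").lower().split():
            return target
    return None


# Hypothetical header, as a server for crate B might return it.
example_header = (
    '<https://pid.example.com/another-crate/>; rel="cite-as", '
    '<https://example.com/another-crate/ro-crate-metadata.json>; '
    'rel="describedby"; type="application/ld+json"'
)
permalink = find_cite_as(example_header)
```

A production client would more likely rely on an established Link header parser; the regular expressions here only cover the common case.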
##### Determening entity identifier for a referenced RO-Crate
Determening -> Determining
##### Referencing another metadata document

If a referenced RO-Crate Metadata Document is known at a given URI or path, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails or requires heuristics), then an referenced metadata descriptor entity SHOULD be added. For instance, if `http://example.com/another-crate/ro-crate-metadata.json` resolves to an RO-Crate Metadata Document describing root `./`, but `http://example.com/another-crate/` always return a HTML page without [Signposting] to the metadata document, then `subjectOf` SHOULD be added to an explicit metadata descriptor entity, which has `encodingFormat` declared for JSON-LD:
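
The JSON-LD example that followed this paragraph is not reproduced in this thread. As a rough illustration only, the pattern it describes could be generated in Python as below. The function name is hypothetical, and the `CreativeWork` type and the `conformsTo` on the `Dataset` are assumptions based on the rest of this discussion rather than a copy of the specification's example.

```python
import json

RO_CRATE_PROFILE = "https://w3id.org/ro/crate"


def reference_via_metadata_descriptor(crate_uri, metadata_uri):
    """Build the two entities crate A would add to reference crate B."""
    dataset = {
        "@id": crate_uri,
        "@type": "Dataset",
        # Assumption: marks the referenced Dataset as an RO-Crate.
        "conformsTo": {"@id": RO_CRATE_PROFILE},
        # Points at the explicit metadata descriptor entity below.
        "subjectOf": {"@id": metadata_uri},
    }
    descriptor = {
        "@id": metadata_uri,
        "@type": "CreativeWork",  # assumed type for the descriptor
        "encodingFormat": "application/ld+json",
    }
    return [dataset, descriptor]


entities = reference_via_metadata_descriptor(
    "http://example.com/another-crate/",
    "http://example.com/another-crate/ro-crate-metadata.json",
)
print(json.dumps(entities, indent=2))
```

Both entities would then be appended to the `@graph` of the referencing crate A.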
then an referenced metadata descriptor -> then a referenced metadata descriptor
Minor editorial changes. It looks very nice. The bit on content negotiation helps with the issue I opened about retrieving contents (issue #228), since you can now add a distribution from which to retrieve the zipped contents.
"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
"@type": "DataDownload",
"encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
"conformsTo": { "@id": "https://w3id.org/ro/crate" }
I understand that this is needed to correctly describe a downloadable RO-Crate. But, in this scenario, if the graph does not contain an `ro-crate-metadata.json` (or similar) node, or contains it without a proper `conformsTo`, shall we assume that the metadata should appear in the RO-Crate fetchable from the downloadable content?
This is a cheeky triple-use of our `conformsTo`, as we define what the profile means on different files.. so here the implication is that if there is `https://w3id.org/ro/crate` as profile of a `DataDownload` then yes, there should be a `ro-crate-metadata.json` inside its archive, which presumably would have its own (versioned) `conformsTo` declared. Or if it's a `Dataset` with the same `conformsTo` then the metadata file is found differently.
We also use the same profile as part of the JSON-LD content type https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/appendix/jsonld.html#ro-crate-json-ld-media-type
You could argue this should be several different profile URIs.. I think it's a bit overloading, but if you squint a bit it's all saying "an RO-Crate", just in different ways.
But we won't try to list the content of the zip-file in the referring crate, unless we need to (which would bring in `arcp` etc), so it's a nice simplification that we moved to having the references on the `Dataset` rather than requiring mention of the second RO-Crate Metadata Document.
https://signposting.org/adopters/#workflowhub Signposting also uses the same profile for the ZIP file. But that should also be valid to mean "There will be an `ro-crate-metadata.json` inside", as stated at the top of section "#### Retrieving an RO-Crate".
Although the heuristics are properly described, I'm realizing that at some point we are going to need code snippet examples in a couple of programming languages to demonstrate how to properly fetch an RO-Crate with negotiation.
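
As a possible starting point for such a snippet, the negotiation step might be sketched in Python as below. It is not taken from the specification text: the function names are hypothetical, the exact `Accept` value is an assumption based on the RO-Crate JSON-LD media type linked earlier in this thread, and no request is actually sent.

```python
from urllib.request import Request

RO_CRATE_PROFILE = "https://w3id.org/ro/crate"
# Assumed Accept value: JSON-LD with the RO-Crate profile parameter.
ACCEPT = f'application/ld+json; profile="{RO_CRATE_PROFILE}"'


def build_request(crate_uri):
    """Prepare (but do not send) a content-negotiated GET request."""
    return Request(crate_uri, headers={"Accept": ACCEPT})


def classify_response(content_type):
    """Decide how to handle a response from its Content-Type header."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/ld+json":
        return "metadata"  # an RO-Crate Metadata Document
    if media_type == "application/zip":
        return "archive"   # look for ro-crate-metadata.json inside
    return "unknown"       # e.g. HTML: fall back to Signposting links


request = build_request("http://example.com/another-crate/")
```

Passing `request` to `urllib.request.urlopen` would perform the retrieval, after which `classify_response` could be applied to the returned `Content-Type` header.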
Thanks everyone, will merge now after fixing the latest round of typos! Then we'll see if we need to move it out.
This PR fixes #228 #160
Generalizes the content-negotiation-or-Signposting section so that it no longer applies only to Profile Crates.
For ZIP files this is still vague in that it says: If the retrieved resource is a ZIP file (`Content-Type: application/zip`), then extract `ro-crate-metadata.json`, or, if the archive root only contains a single folder (e.g. `folder1/`), extract `folder1/ro-crate-metadata.json`.
I've also added a BagIt reference, as this would be a second folder, e.g. `folder1/data/ro-crate-metadata.json`, and then the checksums should be verified first as we do in https://trefx.uk/5s-crate/0.4/#check-phase
As for referencing one RO-Crate from another, either the referenced RO-Crate can have its own `distribution` with a `conformsTo`:
or it can have a `subjectOf` to a `ro-crate-metadata.json`:
As used by the 5s-crate profile: https://trefx.uk/5s-crate/0.4/#referencing-a-workflow-crate
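
The ZIP lookup order described in this PR description (archive root, then a single top-level folder, then a BagIt-style `data/` folder) could be sketched in Python as follows. This is an illustrative assumption of how a client might apply the heuristic; the function name is hypothetical, and the BagIt checksum verification mentioned above is deliberately left out.

```python
import io
import zipfile

METADATA = "ro-crate-metadata.json"


def find_metadata_path(zf):
    """Return the archive path of the RO-Crate Metadata Document, or None."""
    names = set(zf.namelist())
    # 1. Metadata file directly in the archive root.
    if METADATA in names:
        return METADATA
    # 2. Archive root contains only a single folder (e.g. folder1/).
    top_level = {name.split("/", 1)[0] for name in names if name}
    root_files = {name for name in names if "/" not in name}
    if len(top_level) == 1 and not root_files:
        folder = next(iter(top_level))
        # 3. BagIt payloads add a second level: folder1/data/.
        #    Checksums should be verified first; not done in this sketch.
        for candidate in (f"{folder}/{METADATA}", f"{folder}/data/{METADATA}"):
            if candidate in names:
                return candidate
    return None


def make_zip(paths):
    """Build an in-memory ZIP with empty JSON files, for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for path in paths:
            zf.writestr(path, "{}")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


bag = make_zip(["folder1/bagit.txt", "folder1/data/ro-crate-metadata.json"])
found = find_metadata_path(bag)
```

Archives with several top-level entries and no root metadata file return `None` here, since the description above does not say how to disambiguate them.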