How to reference and retrieve another RO-Crate #296

Merged
merged 19 commits into from
Aug 29, 2024

@stain stain commented Apr 25, 2024

This PR fixes #228 #160

Generalizes the content-negotiation-or-Signposting section so that it applies to any RO-Crate, not just Profile Crates.

For ZIP files this is still vague, in that it says: if the retrieved resource is a ZIP file (Content-Type: application/zip), then extract ro-crate-metadata.json or, if the archive root only contains a single folder (e.g. folder1/), extract folder1/ro-crate-metadata.json.
I've also added a BagIt reference, as this would add a second folder level, e.g. folder1/data/ro-crate-metadata.json, and then the checksums should be verified first, as we do in https://trefx.uk/5s-crate/0.4/#check-phase
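
The ZIP heuristic described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names `find_metadata_path` and `read_metadata_from_zip` are hypothetical, the presence of `bagit.txt` is used here as an assumed marker for a BagIt bag, and BagIt checksum verification (which should happen first) is deliberately left out.

```python
import io
import zipfile
from typing import Optional

METADATA = "ro-crate-metadata.json"


def find_metadata_path(names: list[str]) -> Optional[str]:
    """Guess where the RO-Crate metadata file sits inside a ZIP listing.

    Hypothetical helper; `names` is what ZipFile.namelist() returns.
    """
    names_set = set(names)
    # Case 1: the metadata file is directly in the archive root.
    if METADATA in names_set:
        return METADATA
    # Case 2: the archive root contains only a single folder, e.g. folder1/.
    top_level = {n.split("/", 1)[0] for n in names if n.strip("/")}
    root_files = [n for n in names if "/" not in n]
    if len(top_level) == 1 and not root_files:
        folder = next(iter(top_level))
        candidate = f"{folder}/{METADATA}"
        if candidate in names_set:
            return candidate
        # Case 3 (assumption): a BagIt bag keeps the payload under data/.
        # Checksums are NOT verified in this sketch.
        bagit_candidate = f"{folder}/data/{METADATA}"
        if f"{folder}/bagit.txt" in names_set and bagit_candidate in names_set:
            return bagit_candidate
    return None


def read_metadata_from_zip(data: bytes) -> Optional[bytes]:
    """Return the raw metadata document from ZIP bytes, or None if not found."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        path = find_metadata_path(zf.namelist())
        return zf.read(path) if path else None
```

Anything that does not match one of the three cases returns `None`, so the caller can decide whether to apply further heuristics.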

As for referencing one RO-Crate from another, either the referenced RO-Crate can have its own distribution with a conformsTo:

  {
    "@id": "./",
    "@type": "Dataset",
    "identifier": "https://doi.org/10.48546/workflowhub.workflow.775.1",
    "url": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
    "name": "Research Object Crate for Jupyter Notebook Molecular Structure Checking",
    "distribution": {"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1"},
    "…": ""
  },
  {
    "@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
    "@type": "DataDownload",
    "encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
    "conformsTo": { "@id": "https://w3id.org/ro/crate" }
  }
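
A consumer of the referencing crate might pick out such downloadable RO-Crates along these lines. This is a minimal sketch, not part of the PR: `crate_downloads` and `as_list` are hypothetical helper names, and the profile is matched by exact string comparison against the unversioned URI used in the example above.

```python
RO_CRATE_PROFILE = "https://w3id.org/ro/crate"


def as_list(value):
    """Normalise a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def crate_downloads(graph: list[dict], dataset_id: str) -> list[str]:
    """Return URLs of distributions declared as downloadable RO-Crates.

    Hypothetical helper: `graph` is the @graph array of the referencing crate.
    """
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    dataset = by_id.get(dataset_id, {})
    urls = []
    for ref in as_list(dataset.get("distribution")):
        target = by_id.get(ref.get("@id")) if isinstance(ref, dict) else None
        if target is None:
            continue  # distribution is referenced but not described in the graph
        if "DataDownload" not in as_list(target.get("@type")):
            continue
        profiles = {
            p.get("@id") for p in as_list(target.get("conformsTo")) if isinstance(p, dict)
        }
        if RO_CRATE_PROFILE in profiles:
            urls.append(target["@id"])
    return urls
```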

or it can have a subjectOf to a ro-crate-metadata.json:

{
  "@id": "http://example.com/another-crate/",
  "@type": "Dataset",
  "conformsTo": { "@id": "https://w3id.org/ro/crate" },
  "subjectOf": { "@id": "http://example.com/another-crate/ro-crate-metadata.json" }
},
{
  "@id": "http://example.com/another-crate/ro-crate-metadata.json",
  "@type": "CreativeWork",
  "encodingFormat": "application/ld+json"
}
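
For the `subjectOf` variant, a reader of the referencing crate could look up the metadata document URL roughly like this. Again a sketch under assumptions: `metadata_document_url` is a hypothetical name, and only entities with a plain-string `encodingFormat` of `application/ld+json` are accepted.

```python
from typing import Optional


def as_list(value):
    """Normalise a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def metadata_document_url(graph: list[dict], crate_id: str) -> Optional[str]:
    """Follow subjectOf from a referenced crate to its JSON-LD metadata document.

    Hypothetical helper: returns the @id of the first subjectOf target that
    declares encodingFormat application/ld+json, or None.
    """
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    crate = by_id.get(crate_id)
    if crate is None:
        return None
    for ref in as_list(crate.get("subjectOf")):
        ref_id = ref.get("@id") if isinstance(ref, dict) else None
        target = by_id.get(ref_id)
        if target is None:
            continue
        formats = [f for f in as_list(target.get("encodingFormat")) if isinstance(f, str)]
        if "application/ld+json" in formats:
            return ref_id
    return None
```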

As used by the 5s-crate profile: https://trefx.uk/5s-crate/0.4/#referencing-a-workflow-crate

stain commented May 20, 2024

Could @dgarijo or @ptsefton have a look at this? I've used it here:
https://stain.github.io/workflow-run-crate/profiles/0.5-DRAFT/process_run_crate/ro-crate-preview.html#https%3A//www.researchobject.org/workflow-run-crate-paper/mapping/

Perhaps we should also add the isPartOf pattern for how to mention a file within another crate? (It could get tricky to make absolute URIs.)

dgarijo commented May 20, 2024 via email

dgarijo commented May 27, 2024

Thanks @stain, I have had a look. The only thing that is not fully clear to me is where the distribution information is supposed to be added: is it in the crate referencing the other crate, or in the referenced crate's metadata?

For example, let's say crate A references crate B. Usually I would add a link in A to B. But here you recommend adding also where B is stored, correct? As opposed to adding a link to B, and hoping that when I resolve the id I get a JSON-LD with the distribution information.

The only potential issue I see is that distributions may not have persistent ids. If the link from A to B persists, but the distribution is hosted elsewhere in the meantime, B has no means to tell A about this. But I am ok with this limitation

@dgarijo dgarijo left a comment

Small clarification

@stain stain requested a review from dgarijo July 11, 2024 09:17

stain commented Jul 11, 2024

I think this is good to go in now; waiting for final approval from @dgarijo or @jmfernandez.

@stain stain requested a review from jmfernandez July 18, 2024 08:00

stain commented Jul 18, 2024

I can't seem to resolve the requested changes from @dgarijo although I have addressed them in earlier commits.

dgarijo commented Jul 18, 2024

Sorry @stain I am not available right now for review. Will do as soon as I can.

stain commented Aug 22, 2024

After the meeting on 2024-08-22, revised to not require a subjectOf declaration if the RO-Crate metadata file can be resolved. This meant adding a section on how the absolute URI of the RO-Crate can be determined if there is no identifier.

I now feel this should be split out into a page separate from data-entities.md

dgarijo commented Aug 23, 2024

Once again, apologies. Will review by the end of the week.


##### Referencing RO-Crates that have a persistent identifier

If the referenced RO-Crate B has a `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. For instance, if crate B had declared the identifier `https://pid.example.com/another-crate/` then crate A can reference B as an entity:
Contributor

a identifier -> an identifier

If an `identifier` is not declared in a referenced RO-Crate B, but the determined absolute URI has [Signposting] declared for a `Link:` with `rel=cite-as`, then that link MAY be considered as an equivalent permalink for B.
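
Picking up a `cite-as` permalink from a Signposting `Link:` header could look roughly like the following. This is a simplified sketch, not a full RFC 8288 parser: `parse_link_header` and `cite_as` are hypothetical names, and the regular expressions assume that `<` does not appear inside quoted parameter values.

```python
import re
from typing import Optional

# Simplified patterns (assumption): each link is "<target>" followed by
# ";name=value" parameters, up to the next "<".
LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value: str) -> list[tuple[str, dict]]:
    """Split a Link header value into (target URI, parameters) pairs."""
    links = []
    for target, params in LINK_RE.findall(value):
        parsed = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            parsed[name.lower()] = quoted or bare
        links.append((target, parsed))
    return links


def cite_as(value: str) -> Optional[str]:
    """Return the first link target whose rel contains cite-as, or None."""
    for target, params in parse_link_header(value):
        if "cite-as" in params.get("rel", "").split():
            return target
    return None
```

The returned URI would then be the candidate permalink to use as the `@id` of the referenced crate.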


##### Determening entity identifier for a referenced RO-Crate
Contributor

Determening -> Determining


##### Referencing another metadata document

If a referenced RO-Crate Metadata Document is known at a given URI or path, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails or requires heuristics), then an referenced metadata descriptor entity SHOULD be added. For instance, if `http://example.com/another-crate/ro-crate-metadata.json` resolves to an RO-Crate Metadata Document describing root `./`, but `http://example.com/another-crate/` always return a HTML page without [Signposting] to the metadata document, then `subjectOf` SHOULD be added to an explicit metadata descriptor entity, which has `encodingFormat` declared for JSON-LD:
Contributor

then an referenced metadata descriptor -> then a referenced metadata descriptor

@dgarijo dgarijo left a comment

Minor editorial changes. It looks very nice. The bit on content negotiation helps with the issue I opened about retrieving contents (issue #228), since you can now add a distribution from which to retrieve the zipped contents.

"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
"@type": "DataDownload",
"encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
"conformsTo": { "@id": "https://w3id.org/ro/crate" }
Contributor

I understand that this is needed to correctly describe a downloadable RO-Crate. But, in this scenario, if the graph does not contain an ro-crate-metadata.json (or similar) node, or contains it without a proper conformsTo, shall we assume that the metadata should appear in the RO-Crate fetchable from the downloadable content?

Contributor Author

This is a cheeky triple-use of our conformsTo, as we define what the profile means on different files. So here the implication is that if https://w3id.org/ro/crate is the profile of a DataDownload then yes, there should be an ro-crate-metadata.json inside its archive, which presumably would have its own (versioned) conformsTo declared. Or if it's a Dataset with the same conformsTo, then the metadata file is found differently.

We also use the same profile as part of the JSON-LD content type https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/appendix/jsonld.html#ro-crate-json-ld-media-type

You could argue this should be several different profile URIs. I think it's a bit overloaded, but if you squint a bit it's all saying "an RO-Crate", just in different ways.

But we won't try to list the content of the zip-file in the referring crate, unless we need to (which would bring in arcp etc), so that's a nice simplification that we moved to having the references on the Dataset rather than requiring mention of the second RO-Crate Metadata Document.

Contributor Author

https://signposting.org/adopters/#workflowhub Signposting also uses the same profile for the ZIP file. But that should also be valid to mean "There will be an ro-crate-metadata.json inside", as stated at the top of section "#### Retrieving an RO-Crate".

@jmfernandez jmfernandez left a comment

Although the heuristics are properly described, I'm realizing that at some point we are going to need code snippet examples in a couple of programming languages to demonstrate how to properly fetch an RO-Crate with negotiation.
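
As one possible starting point for such a snippet, here is a hedged Python sketch of the negotiation step using only the standard library. The `Accept` value, its `q` weighting, the treatment of `application/json` as metadata, and the names `build_request`, `classify_response` and `fetch_crate` are all assumptions for illustration, not wording from the specification.

```python
import urllib.request

# Assumption: prefer the RO-Crate JSON-LD profile, fall back to a ZIP archive.
ACCEPT = 'application/ld+json;profile="https://w3id.org/ro/crate", application/zip;q=0.9'


def build_request(url: str) -> urllib.request.Request:
    """Prepare a GET request that asks for an RO-Crate representation."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})


def classify_response(content_type: str) -> str:
    """Map a Content-Type header value to 'metadata', 'zip' or 'unknown'."""
    media = content_type.split(";", 1)[0].strip().lower()
    # Assumption: plain application/json is also treated as a metadata document.
    if media in ("application/ld+json", "application/json"):
        return "metadata"
    if media == "application/zip":
        return "zip"
    # e.g. text/html: the caller would fall back to Signposting Link headers.
    return "unknown"


def fetch_crate(url: str, timeout: float = 30.0) -> tuple[str, bytes]:
    """Fetch the URL with negotiation and return (kind, body bytes)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        kind = classify_response(resp.headers.get("Content-Type", ""))
        return kind, resp.read()
```

A `"zip"` result would then go through the archive heuristics, while `"unknown"` signals that negotiation did not yield a crate directly.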

stain commented Aug 29, 2024

Thanks everyone, will merge now after fixing latest round of typos! Then we see if we need to move it out.

@stain stain merged commit d92d9f1 into main Aug 29, 2024
1 check passed
@stain stain deleted the referencing-crates branch August 29, 2024 10:20
Successfully merging this pull request may close these issues.

Use Case: How to get the contents of a RO as a zip file?
3 participants