Questions/issues regarding CSH scores #15

Open
8 tasks
vera opened this issue Feb 7, 2024 · 5 comments

Comments

@vera

vera commented Feb 7, 2024

F

  • Why is "CSH-RDA-F3-01M: Metadata includes the identifier for the data" always failed? Shouldn't it be successful if the metadata contains a resource_identifier?

A

  • "CSH-RDA-A1-01M: Metadata contains information to enable the user to get access to the data" checks the wrong path ["resource","study_design","study_data_sharing_plan","study_data_sharing_plan_description"] instead of ["resource","study_design","study_data_sharing_plan","study_data_sharing_plan_generally"]
  • "CSH-RDA-A2-01M: Metadata is guaranteed to remain available after data is no longer available": as I mentioned during our call, the software used in our backend (Dataverse) provides this one. You may be interested in a reference document created by a Dataverse community member: "Dataverse and the FAIR principles" https://docs.google.com/document/d/176B36Ja947_JTquWY9gW9wPjCF3Mvb5okuS88bd3TeY/edit?usp=sharing
    For A2 it says: "Deaccession of a dataset published in Dataverse leaves a tombstone landing page with basic citation metadata that remains publically accessible."
    See also: https://guides.dataverse.org/en/latest/user/dataset-management.html#dataset-deaccession
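To illustrate why the A1-01M check fails on the current path, here is a minimal sketch of a nested-key lookup. The helper `get_path` and the example record are hypothetical and not taken from the code base; only the two paths come from this issue.

```python
from typing import Any, Sequence


def get_path(doc: dict, path: Sequence[str]) -> Any:
    """Follow `path` through nested dicts; return None if any key is missing."""
    current: Any = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical MDS 3.0-style record; the value is made up for illustration.
record = {
    "resource": {
        "study_design": {
            "study_data_sharing_plan": {
                "study_data_sharing_plan_generally": "Yes",
            }
        }
    }
}

PLAN = ["resource", "study_design", "study_data_sharing_plan"]
checked_path = PLAN + ["study_data_sharing_plan_description"]
proposed_path = PLAN + ["study_data_sharing_plan_generally"]

print(get_path(record, checked_path))   # None, so the indicator would fail
print(get_path(record, proposed_path))  # Yes
```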

I

  • "CSH-RDA-I3-01M: Metadata includes references to other metadata" checks the wrong path ["resource", "ids", "typeGeneral"] instead of ["resource", "ids", "type_general"] (I assume the code base is meant to be based on MDS 3.0? for MDS 3.3 this would be correct of course)
  • same for "CSH-RDA-I3-02M"

Some thoughts regarding "CSH-RDA-I3-01M", "CSH-RDA-I3-02M", "CSH-RDA-I3-03M" and "CSH-RDA-I3-04M" ((qualified) references to (meta)data):

I think the current check is too strict. You are checking whether "ids" contains entries that are "Datasets" (or not).

The MDS can contain references in the following fields:

  1. contributors (with mandatory type, e.g. "Contact", "Creator/Author")
    -> = "other referenced metadata"

  2. ids
    -> always "qualified" because "relationType" (e.g. "A continues B") is mandatory
    -> unsure how to differentiate "data" and "metadata", maybe using "typeGeneral" as you are already doing, but I don't think everything but "Dataset" is metadata. E.g. is a "Journal article" metadata?

  3. idsNfdi4health
    -> "qualified" if "relationType" is given
    -> also unsure how to differentiate "data" and "metadata" here. in general, NFDI4Health resources are metadata, but they may have data attached. If you use the API to request the NFDI4Health resource, the "link" field will tell you whether data is attached. Could you use this?
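The "qualified" part of the list above could be sketched roughly as follows. This assumes `ids` and `idsNfdi4health` are lists of objects and that a non-empty `relationType` is what makes a reference qualified; the function name, the `identifier` key and the example values are hypothetical. It deliberately does not try to separate "data" from "metadata", since that question is still open.

```python
from typing import Any


def count_references(resource: dict[str, Any]) -> dict[str, int]:
    """Count references in `ids` and `idsNfdi4health`, split by qualification."""
    counts = {"qualified": 0, "unqualified": 0}
    for field in ("ids", "idsNfdi4health"):
        for entry in resource.get(field) or []:
            # Assumption: a reference is qualified when it states a relation type.
            if entry.get("relationType"):
                counts["qualified"] += 1
            else:
                counts["unqualified"] += 1
    return counts


# Made-up example resource; identifiers are placeholders.
resource = {
    "ids": [
        {"identifier": "id-1", "relationType": "A continues B", "typeGeneral": "Dataset"},
    ],
    "idsNfdi4health": [
        {"identifier": "id-2"},
        {"identifier": "id-3", "relationType": "A continues B"},
    ],
}

print(count_references(resource))  # {'qualified': 2, 'unqualified': 1}
```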

R

  • "CSH-RDA-R1.1-01M: Metadata includes information about the licence under which the data can be reused" checks the wrong path ["resource", "nonStudyDetails", "useRights"] instead of ["resource", "non_study_details", "resource_use_rights"] (again assuming we are based on MDS 3.0, this check is using a weird mixture of MDS 3.0 and 3.3 paths)
  • "CSH-RDA-R1.1-02M: Metadata refers to a standard reuse licence" checks the wrong path ["resource", "nonStudyDetails", "useRights"] instead of ["resource", "non_study_details", "resource_use_rights", "resource_use_rights_label"] (again assuming we are based on MDS 3.0)
  • "RDA-R1.2-01M: Metadata includes provenance information according to community-specific standards" and "RDA-R1.2-02M: Metadata includes provenance information according to a cross-community language": currently this is always failed, could this be marked as success if the "provenance" block in the MDS is filled?
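One way to avoid mixing MDS 3.0 and 3.3 paths would be to keep the paths in a table keyed by MDS version. The sketch below is only an illustration: the table layout and function names are hypothetical, the I3-01M paths are the ones quoted in this issue, and the 3.3 path for R1.1-01M is an assumption based on the path the check currently uses. It only tests for presence of a value, not for what the real indicator logic does with it.

```python
from typing import Any, Iterator, Sequence

# Paths per indicator and MDS version (hypothetical table layout).
PATHS = {
    "CSH-RDA-I3-01M": {
        "3.0": ["resource", "ids", "type_general"],
        "3.3": ["resource", "ids", "typeGeneral"],
    },
    "CSH-RDA-R1.1-01M": {
        "3.0": ["resource", "non_study_details", "resource_use_rights"],
        # Assumption: taken from the path the check uses today.
        "3.3": ["resource", "nonStudyDetails", "useRights"],
    },
}


def iter_values(node: Any, path: Sequence[str]) -> Iterator[Any]:
    """Yield every value reachable via `path`, fanning out over lists."""
    if isinstance(node, list):
        for item in node:
            yield from iter_values(item, path)
    elif not path:
        yield node
    elif isinstance(node, dict) and path[0] in node:
        yield from iter_values(node[path[0]], path[1:])


def has_value(record: dict, indicator: str, mds_version: str) -> bool:
    """True if the indicator's path holds at least one non-empty value."""
    path = PATHS[indicator][mds_version]
    return any(v not in (None, "", [], {}) for v in iter_values(record, path))


# Made-up MDS 3.0-style record.
record_30 = {"resource": {"ids": [{"type_general": "Dataset"}]}}

print(has_value(record_30, "CSH-RDA-I3-01M", "3.0"))  # True
print(has_value(record_30, "CSH-RDA-I3-01M", "3.3"))  # False
```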
@vera
Author

vera commented Feb 8, 2024

Btw, since the Study Hub is already using MDS 3.3, you could also update to MDS 3.3.

@AtinkutZeleke

F

* [ ]  Why is "CSH-RDA-F3-01M: Metadata includes the identifier for the data" always failed? Shouldn't it be successful if the metadata contains a `resource_identifier`?

That is an important point: this indicator deals with including the reference (i.e. the identifier) of the resource in the metadata so that the resource can be accessed. Can we use `resource_identifier` for both the metadata identifier and the resource identifier? Which one is for the resource and which one is for the metadata of that specific resource?

A

* [ ]  "CSH-RDA-A1-01M: Metadata contains information to enable the user to get access to the data" checks the wrong path `["resource","study_design","study_data_sharing_plan","study_data_sharing_plan_description"]` instead of `["resource","study_design","study_data_sharing_plan","study_data_sharing_plan_generally"]`

What about the following additional lists? Can you also confirm that?

  • Resource.resourceDesign.dataSharingPlan.accessCriteria
  • Resource.resourceDesign.dataSharingPlan.description
  • Resource.resourceDesign.dataSharingPlan.datashield
  • Resource.resourceDesign.dataSharingPlan.url

* [ ]  "CSH-RDA-A2-01M: Metadata is guaranteed to remain available after data is no longer available": as I mentioned during our call, the software used in our backend (Dataverse) provides this one. You may be interested in a reference document created by a Dataverse community member: "Dataverse and the FAIR principles" https://docs.google.com/document/d/176B36Ja947_JTquWY9gW9wPjCF3Mvb5okuS88bd3TeY/edit?usp=sharing
  For A2 it says: "Deaccession of a dataset published in Dataverse leaves a tombstone landing page with basic citation metadata that remains publically accessible."
  See also: https://guides.dataverse.org/en/latest/user/dataset-management.html#dataset-deaccession

I

* [ ]  "CSH-RDA-I3-01M: Metadata includes references to other metadata" checks the wrong path `["resource", "ids", "typeGeneral"]` instead of `["resource", "ids", "type_general"]` (I assume the code base is meant to be based on MDS 3.0? for MDS 3.3 this would be correct of course)

Can we use the following as well?

  • Resource.idsNfdi4health.relationType
  • Resource.ids.relationType

* [ ]  same for "CSH-RDA-I3-02M"

Some thoughts regarding "CSH-RDA-I3-01M", "CSH-RDA-I3-02M", "CSH-RDA-I3-03M" and "CSH-RDA-I3-04M" ((qualified) references to (meta)data):

I think the current check is too strict. You are checking whether "ids" contains entries that are "Datasets" (or not).

The MDS can contain references in the following fields:

1. `contributors` (with mandatory type, e.g. "Contact", "Creator/Author")
   -> = "other referenced metadata"

2. `ids`
   -> always "qualified" because "relationType" (e.g. "A continues B") is mandatory
   -> unsure how to differentiate "data" and "metadata", maybe using "typeGeneral" as you are already doing, but I don't think everything but "Dataset" is metadata. E.g. is a "Journal article" metadata?

3. `idsNfdi4health`
   -> "qualified" if "relationType" is given
   -> also unsure how to differentiate "data" and "metadata" here. in general, NFDI4Health resources are metadata, but they may have data attached. If you use the API to request the NFDI4Health resource, the "link" field will tell you whether data is attached. Could you use this?

You are right! We really need an agreement or a contextual understanding here.

R

* [ ]  "CSH-RDA-R1.1-01M: Metadata includes information about the licence under which the data can be reused" checks the wrong path `["resource", "nonStudyDetails", "useRights"]` instead of `["resource", "non_study_details", "resource_use_rights"]` (again assuming we are based on MDS 3.0, this check is using a weird mixture of MDS 3.0 and 3.3 paths)
* [ ]  "CSH-RDA-R1.1-02M: Metadata refers to a standard reuse licence" checks the wrong path `["resource", "nonStudyDetails", "useRights"]` instead of `["resource", "non_study_details", "resource_use_rights", "resource_use_rights_label"]` (again assuming we are based on MDS 3.0)

We have a block of items that can be used for the licence-related indicators (CSH-RDA-R1.1-01M, CSH-RDA-R1.1-02M, and CSH-RDA-R1.1-03M). Some of them are not mandatory, so we can categorize them by the indicator they apply to. What do you think?

  • Resource.nonStudyDetails.useRights.label
  • Resource.nonStudyDetails.useRights.description
  • Resource.nonStudyDetails.useRights.confirmations.terms
  • Resource.nonStudyDetails.useRights.confirmations.supportByLicensing
  • Resource.nonStudyDetails.useRights.confirmations.irrevocability
  • Resource.nonStudyDetails.useRights.confirmations.authority
  • Resource.nonStudyDetails.useRights.confirmations

* [ ]  "RDA-R1.2-01M: Metadata includes provenance information according to community-specific standards" and "RDA-R1.2-02M: Metadata includes provenance information according to a cross-community language": currently this is always failed, could this be marked as success if the "provenance" block in the MDS is filled?

If we assume that one or more of the following provenance-related items conform to the NFDI4Health community-specific standards and a cross-community language, then we can. What do you think the practice has been so far?

  • Resource.provenance.dataSource
  • Resource.provenance.verificationDate
  • Resource.provenance.verificationUser
  • Resource.provenance.firstSubmittedDate
  • Resource.provenance.firstSubmittedUser
  • Resource.provenance.firstPostedDate
  • Resource.provenance.firstPostedUser
  • Resource.provenance.lastUpdateSubmittedDate
  • Resource.provenance.lastUpdateSubmittedUser
  • Resource.provenance.lastUpdatePostedDate
  • Resource.provenance.lastUpdatePostedUser
  • Resource.provenance.resourceVersion

@AtinkutZeleke

I will come back with more specific questions.

@vera
Author

vera commented Feb 13, 2024

I am skipping questions we already discussed in the call today.

That is an important point: this indicator deals with including the reference (i.e. the identifier) of the resource in the metadata so that the resource can be accessed. Can we use `resource_identifier` for both the metadata identifier and the resource identifier? Which one is for the resource and which one is for the metadata of that specific resource?

Not sure. Do all metadata entries in the Study Hub have data attached? E.g., does a "Study" metadata entry have associated data?

As I mentioned during the call, the Study Hub allows attaching data to metadata entries (if type != "Study", "Substudy", "Registry", "Secondary data source"):

image

This is outside of the MDS. The MDS describes the metadata only.

If data is attached, it will be returned by the API like this

{
  "link": {
    "external": false,
    "url": "/api/resource/622/data"
  },
  "resource": {...},
  "versions": [...]
}

(https://csh.nfdi4health.de/api/resource/113)
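A check for attached data based on this response could look roughly like the sketch below. It assumes that a resource without attached data either omits `link` or returns it empty; that behaviour is not confirmed here and would need to be checked against the API. The function name is hypothetical.

```python
import json
from typing import Any, Optional


def attached_data_link(api_response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the `link` object if it points to data, otherwise None."""
    link = api_response.get("link")
    # Assumption: no attached data means `link` is missing, null, or has no URL.
    if isinstance(link, dict) and link.get("url"):
        return link
    return None


# Shaped like the response above, with the elided parts left empty.
with_data = json.loads(
    '{"link": {"external": false, "url": "/api/resource/622/data"},'
    ' "resource": {}, "versions": []}'
)
without_data = json.loads('{"resource": {}, "versions": []}')

print(attached_data_link(with_data))     # {'external': False, 'url': '/api/resource/622/data'}
print(attached_data_link(without_data))  # None
```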

Can we use the following as well?
Resource.idsNfdi4health.relationType
Resource.ids.relationType

What do you want to use it for?

@vera
Author

vera commented Feb 22, 2024

@AtinkutZeleke Regarding the provenance metric: The current provenance block contains mostly timestamps and usernames. In the deliverable, you listed four fields required to fulfill the provenance metric:

  • provenance.verificationDate
  • provenance.dataSource
  • provenance.firstSubmittedDate
  • provenance.lastUpdateSubmittedDate

I am unsure if this matches the description below. What do you think?

For others to reuse your data, they should know where the data came from (i.e., clear story of origin/history, see R1), who to cite and/or how you wish to be acknowledged. Include a description of the workflow that led to your data: Who generated or collected it? How has it been processed? Has it been published before? Does it contain data from someone else that you may have transformed or completed? Ideally, this workflow is described in a machine-readable format.

(https://www.go-fair.org/fair-principles/r1-2-metadata-associated-detailed-provenance/)
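For reference, a presence check on the four fields from the deliverable could be sketched as below. The function name and the example values are hypothetical. Note that it only tests whether the fields are filled, so it does not settle whether timestamps and a data source amount to the workflow description that R1.2 asks for.

```python
from typing import Any

# The four fields listed in the deliverable.
REQUIRED_PROVENANCE_FIELDS = (
    "verificationDate",
    "dataSource",
    "firstSubmittedDate",
    "lastUpdateSubmittedDate",
)


def missing_provenance_fields(resource: dict[str, Any]) -> list[str]:
    """Return the required provenance fields that are absent or empty."""
    provenance = resource.get("provenance") or {}
    return [name for name in REQUIRED_PROVENANCE_FIELDS if not provenance.get(name)]


# Made-up resource with two of the four fields filled.
resource = {
    "provenance": {
        "dataSource": "Manually collected",
        "firstSubmittedDate": "2024-02-07",
    }
}

missing = missing_provenance_fields(resource)
print(missing)      # ['verificationDate', 'lastUpdateSubmittedDate']
print(not missing)  # False, so the metric would not be fulfilled
```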
