
Metadata and DOI registration at dara

Andy Daniel edited this page Jun 28, 2024 · 30 revisions

Introduction

This document describes how the MDM uses the da|ra [1] API to publish its metadata and to register a DOI (Digital Object Identifier) for each version of a data package or analysis package.

The VerbundFDB [2] harvests the published metadata of data packages from da|ra in order to make it available in its data search. It has therefore introduced some additional restrictions to the metadata schema of da|ra. These restrictions do not apply to analysis packages.

This is an example of a registered data package at da|ra.

The following questions are answered in this document:

When does the MDM send metadata to da|ra? When does it register a DOI?

The MDM sends the metadata of a data package or an analysis package to da|ra when one of the following events occurs:

  1. Non-beta release of a new version of a project (version >= 1.0.0):
    In this case the metadata of a data package or an analysis package version is sent to da|ra for the first time (e.g. version 1.0.0 or 2.0.0). A new DOI (10.21249/DZHW:${projectId}:${version}) is therefore registered at DataCite by da|ra. The release is triggered by clicking the release button in the project cockpit of the MDM.
  2. Re-release of an existing version of a project (version >= 1.0.0):
    In this case the metadata at da|ra is updated (overwritten). The release is triggered by clicking the release button in the project cockpit of the MDM using the same version number as before.
  3. Related publications of a data package or an analysis package have changed (via import of the Citavi database):
    In this case the metadata at da|ra is updated (overwritten) automatically.
  4. Project version (aka shadow copy) has been hidden in the MDM:
    In this case the metadata of the data package or analysis package is updated (overwritten) at da|ra so that the version is marked as "not available". The delivered metadata remains available at da|ra, but the version is no longer available to public users of the MDM.
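
The release rules and the DOI pattern above can be sketched in a few lines of Python. This is only an illustration, not the MDM implementation: the function names, the returned action labels and the project id `exampleproject` are hypothetical, and treating versions below 1.0.0 as "nothing is sent" is an assumption derived from the "non-beta" wording.

```python
import re

DOI_PREFIX = "10.21249"  # prefix taken from the DOI pattern in the list above


def parse_version(version: str) -> tuple[int, int, int]:
    """Split a version such as '1.0.0' into comparable integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_beta(version: str) -> bool:
    """Versions below 1.0.0 count as beta releases."""
    return parse_version(version) < (1, 0, 0)


def doi_for(project_id: str, version: str) -> str:
    """Build the DOI following 10.21249/DZHW:${projectId}:${version}."""
    return f"{DOI_PREFIX}/DZHW:{project_id}:{version}"


def dara_action(version: str, already_registered: bool) -> str:
    """Hypothetical decision helper for a release event.

    'skip'     -> beta release, nothing is sent (assumption)
    'register' -> first non-beta release of this version, new DOI
    'update'   -> re-release, metadata at da|ra is overwritten
    """
    if is_beta(version):
        return "skip"
    return "update" if already_registered else "register"


if __name__ == "__main__":
    print(doi_for("exampleproject", "1.0.0"))
    print(dara_action("1.0.0", already_registered=False))
```

Comparing the parsed tuple instead of the raw string avoids ordering mistakes such as "10.0.0" sorting before "9.0.0".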

Which metadata does the MDM send to da|ra?

In each of the above cases a complete XML document is sent to da|ra. Depending on the project configuration, this document describes either a data package or an analysis package.

Data Package

The current template for the XML document for data packages can be found here.

The following table describes the mapping from the MDM domain model to da|ra's XML schema. It also shows which XML elements have further restrictions introduced by the VerbundFDB. [3]

| da\|ra sequence | da\|ra property | MDM property | MDM computed from ... | da\|ra restrictions (controlled vocabulary) | VerbundFDB restrictions (controlled vocabulary) |
| --- | --- | --- | --- | --- | --- |
| 1 | resourceType | --- | constant: Dataset | mandatory (Appendix 4.1.1) | mandatory |
| 2.1 | resourceTypeFree.typeName | --- | dataPackage.surveys[].dataType in en and de | --- | mandatory (cv) |
| 3.1 | resourceIdentifier | --- | dzhw:${dataAcquisitionProject.masterId}:1.0.0 | mandatory (since we might want to update metadata) | --- |
| 3.2 | currentVersion | dataAcquisitionProject.release.version | --- | --- | --- |
| 4.1.2 | title.titleName | dataPackage.title in en and de | --- | mandatory | mandatory |
| 7.1.1.1 | creator.person.firstName | dataPackage.projectContributors[].firstName | --- | mandatory | mandatory |
| 7.1.1.2 | creator.person.middleName | dataPackage.projectContributors[].middleName | --- | --- | --- |
| 7.1.1.3 | creator.person.lastName | dataPackage.projectContributors[].lastName | --- | mandatory | mandatory |
| 7.1.1.4.1.1 | creator.person.personIDs.personID.identifierURI | dataPackage.projectContributors[].orcid | --- | --- | --- |
| 7.1.1.4.1.2 | creator.person.personIDs.personID.identifierSchema | --- | constant: ORCID | --- | --- |
| 7.1.2.1 | creator.institution.institutionName | dataPackage.institutions[] in de if available else en | --- | --- | --- |
| 8.1 | dataUrl | --- | Prod URL of data package version | mandatory | mandatory |
| 9 | doiProposal | --- | DZHW:${dataAcquisitionProject.masterId}:${dataAcquisitionProject.release.version} | mandatory (since we might want to register a new DOI) | mandatory |
| 10.1 | publicationDate.date | dataAcquisitionProject.release.firstDate | --- | mandatory | mandatory |
| 13.1 | availability.availabilityType | --- | constant: Delivery; if dataPackage is hidden: NotAvailable | mandatory (Appendix 4.1.3) | mandatory |
| 13.2 | availability.availabilityFree | --- | constant: "Beantragung notwendig unter..."/"Application necessary under ..." in de and en | --- | mandatory |
| 16.1.1 | alternativeID.identifier | --- | constant: 1 | --- | mandatory |
| 16.1.2 | alternativeID.type | --- | constant: VerbundFDB | --- | mandatory |
| 16.1.1 | alternativeID.identifier | --- | constant: 2 | --- | mandatory |
| 16.1.2 | alternativeID.type | --- | constant: QDN | --- | mandatory |
| 19 | freeKeywords | --- | dataPackage.tags in de and en | --- | mandatory |
| 20.1.2 | description.freetext | dataPackage.description in de and en | --- | --- | mandatory |
| 20.1.3 | description.type | --- | constant: Abstract | Appendix 4.1.4 | mandatory (constant: Abstract) |
| 21.1.1 | geographicCoverage.geographicCoverageControlled | --- | deduplicated(dataPackage.surveys[].population.geographicCoverages[]).country | --- | mandatory if 21.1.2.1 empty (country code) |
| 21.1.2.1 | geographicCoverage.geographicCoveragesFree.geographicCoverageFree.freeText | --- | deduplicated(dataPackage.surveys[].population.geographicCoverages[]).description in de and en | --- | mandatory if 21.1.1 empty |
| 22.1.2 | universe.sampled | --- | html list of dataPackage.surveys[].population.description in de and en, survey.title is used as title of each list item | --- | mandatory |
| 23.1.2 | sampling.method | --- | semicolon separated list of dataPackage.surveys[].sample in de and en | --- | mandatory (cv) |
| 24.1.1.1 | temporalCoverage.temporalCoverageFormal.startDate | dataPackage.surveys[].fieldPeriod.start | --- | --- | mandatory |
| 24.1.1.2 | temporalCoverage.temporalCoverageFormal.endDate | dataPackage.surveys[].fieldPeriod.end | --- | --- | mandatory |
| 24.1.2.1.2 | temporalCoverage.temporalCoverageFree.freetext | dataPackage.surveys[].title in de and en | --- | --- | --- |
| 25.1.1 | timeDimension.timeDimensionType | --- | dataPackage.surveyDesign (mapped to "CrossSection" or "Longitudinal.Panel") | Appendix 4.1.5 | mandatory (cv) |
| 26.1.1.1 | contributor.person.firstName | dataPackage.dataCurators[].firstName | --- | --- | --- |
| 26.1.1.2 | contributor.person.middleName | dataPackage.dataCurators[].middleName | --- | --- | --- |
| 26.1.1.3 | contributor.person.lastName | dataPackage.dataCurators[].lastName | --- | --- | --- |
| 26.1.1.4 | contributor.person.contributorType | --- | constant: DataCurator | Appendix 4.1.6 | --- |
| 26.1.1.5.1.1 | contributor.person.personIDs.personID.identifierURI | dataPackage.dataCurators[].orcid | --- | --- | --- |
| 26.1.1.5.1.2 | contributor.person.personIDs.personID.identifierSchema | --- | constant: ORCID | --- | --- |
| 26.1.2.1 | contributor.institution.institutionName | --- | constant: FDZ-DZHW | --- | mandatory |
| 26.1.2.2 | contributor.institution.contributorType | --- | constant: Distributor | Appendix 4.1.6 | mandatory (Distributor) |
| 27.1.2.1 | fundingReference.institution.institutionName | dataPackage.sponsors[] in de if available else en | --- | --- | --- |
| 28.1.1 | collectionMode.collectionModeType | --- | one entry per dataPackage.surveys[].instruments[].type mapped to cv | Appendix 4.1.7 | mandatory (cv) |
| 28.1.2.1.2 | collectionMode.collectionModeFree.freetext | --- | dataPackage.surveys[].title: dataPackage.surveys[].surveyMethod in de and en | --- | --- |
| 29.1.2 | dataSet.unitType | --- | constant: Individual | Appendix 4.1.8 | --- |
| 29.1.3 | dataSet.numberUnits | dataPackage.dataSets[].subDataSets[].numberOfObservations | --- | --- | --- |
| 29.1.4 | dataSet.numberVariables | --- | dataPackage.dataSets[].variables[].length | --- | --- |
| 29.1.6.1.1 | dataSet.files.file.name | dataPackage.dataSets[].subDataSets[].name | --- | --- | --- |
| 30.1.2 | note.text | dataPackage.annotations in de and en | additionally in de: Erhebungseinheit dataPackage.surveys[].population.unit (semicolon separated) | --- | mandatory (cv) |
| 31.1.1 | relation.identifier | --- | if available, DOI of previous version | --- | --- |
| 31.1.2 | relation.identifierType | --- | constant: DOI | --- | --- |
| 31.1.3 | relation.relationType | --- | constant: IsNewVersionOf | Appendix 4.1.9 | --- |
| 32.1.2.1 | publication.unstructuredPublication.freetext | dataPackage.publications[].sourceReference | --- | --- | --- |
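
A few of the "MDM computed from ..." entries in the table above can be illustrated with a short Python sketch. It is not the MDM template: the function name, the plain-dict input shape, the flat output keyed by da|ra property name, the "; " separator and the order-preserving deduplication are all assumptions made for illustration.

```python
def data_package_fields(master_id: str, version: str, hidden: bool,
                        surveys: list[dict]) -> dict:
    """Compute a handful of da|ra values for a data package (sketch)."""
    # Deduplicate country codes across all surveys, keeping first-seen order
    # (the ordering is an assumption; the table only says "deduplicated").
    countries: list[str] = []
    for survey in surveys:
        for coverage in survey["population"]["geographicCoverages"]:
            if coverage["country"] not in countries:
                countries.append(coverage["country"])

    # One semicolon separated string per language ("; " is an assumption).
    sampling = {
        lang: "; ".join(survey["sample"][lang] for survey in surveys)
        for lang in ("de", "en")
    }

    return {
        "resourceType": "Dataset",
        # The resource identifier always carries 1.0.0, whatever the version.
        "resourceIdentifier": f"dzhw:{master_id}:1.0.0",
        "currentVersion": version,
        "doiProposal": f"DZHW:{master_id}:{version}",
        "availability.availabilityType":
            "NotAvailable" if hidden else "Delivery",
        "geographicCoverage.geographicCoverageControlled": countries,
        "sampling.method": sampling,
    }
```

Because the resource identifier is fixed to 1.0.0 while `currentVersion` and `doiProposal` carry the released version, every version of a project maps to the same da|ra resource in this sketch.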

Analysis Package

The current template for the XML document for analysis packages can be found here.

The following table describes the mapping from the MDM domain model to da|ra's XML schema.

| da\|ra sequence | da\|ra property | MDM property | MDM computed from ... | da\|ra restrictions (controlled vocabulary) |
| --- | --- | --- | --- | --- |
| 1 | resourceType | --- | constant: Collection | mandatory (Appendix 4.1.1) |
| 3.1 | resourceIdentifier | --- | dzhw:${dataAcquisitionProject.masterId}:1.0.0 | mandatory (since we might want to update metadata) |
| 3.2 | currentVersion | dataAcquisitionProject.release.version | --- | --- |
| 4.1.2 | title.titleName | analysisPackage.title in en and de | --- | mandatory |
| 7.1.1.1 | creator.person.firstName | analysisPackage.authors[].firstName | --- | mandatory |
| 7.1.1.2 | creator.person.middleName | analysisPackage.authors[].middleName | --- | --- |
| 7.1.1.3 | creator.person.lastName | analysisPackage.authors[].lastName | --- | mandatory |
| 7.1.1.4.1.1 | creator.person.personIDs.personID.identifierURI | analysisPackage.authors[].orcid | --- | --- |
| 7.1.1.4.1.2 | creator.person.personIDs.personID.identifierSchema | --- | constant: ORCID | --- |
| 7.1.2.1 | creator.institution.institutionName | analysisPackage.institutions[] in de if available else en | --- | --- |
| 8.1 | dataUrl | --- | Prod URL of analysis package version | mandatory |
| 9 | doiProposal | --- | DZHW:${dataAcquisitionProject.masterId}:${dataAcquisitionProject.release.version} | mandatory (since we might want to register a new DOI) |
| 10.1 | publicationDate.date | dataAcquisitionProject.release.firstDate | --- | mandatory |
| 13.1 | availability.availabilityType | --- | constant: Delivery; if analysisPackage is hidden: NotAvailable | mandatory (Appendix 4.1.3) |
| 13.2 | availability.availabilityFree | --- | constant: "Download oder Beantragung notwendig unter..."/"Download or application necessary under ..." in de and en | --- |
| 19 | freeKeywords | --- | analysisPackage.tags in de and en | --- |
| 20.1.2 | description.freetext | analysisPackage.description in de and en | --- | --- |
| 20.1.3 | description.type | --- | constant: Abstract | Appendix 4.1.4 |
| 26.1.1.1 | contributor.person.firstName | analysisPackage.dataCurators[].firstName | --- | --- |
| 26.1.1.2 | contributor.person.middleName | analysisPackage.dataCurators[].middleName | --- | --- |
| 26.1.1.3 | contributor.person.lastName | analysisPackage.dataCurators[].lastName | --- | --- |
| 26.1.1.4 | contributor.person.contributorType | --- | constant: DataCurator | Appendix 4.1.6 |
| 26.1.1.5.1.1 | contributor.person.personIDs.personID.identifierURI | analysisPackage.dataCurators[].orcid | --- | --- |
| 26.1.1.5.1.2 | contributor.person.personIDs.personID.identifierSchema | --- | constant: ORCID | --- |
| 26.1.2.1 | contributor.institution.institutionName | --- | constant: FDZ-DZHW | --- |
| 26.1.2.2 | contributor.institution.contributorType | --- | constant: Distributor | Appendix 4.1.6 |
| 27.1.2.1 | fundingReference.institution.institutionName | analysisPackage.sponsors[] in de if available else en | --- | --- |
| 30.1.2 | note.text | analysisPackage.annotations in de and en | --- | --- |
| 31.1.1 | relation.identifier | --- | if available, DOI of previous version | --- |
| 31.1.2 | relation.identifierType | --- | constant: DOI | --- |
| 31.1.3 | relation.relationType | --- | constant: IsNewVersionOf | Appendix 4.1.9 |
| 32.1.2.1 | publication.unstructuredPublication.freetext | analysisPackage.publications[].sourceReference | --- | --- |
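
The "mandatory" entries in the da|ra restrictions column of the table above lend themselves to a simple pre-flight check before an analysis package is sent. The sketch below is a hypothetical helper, not part of the MDM: it assumes the metadata is held in a flat dict keyed by da|ra property name, and it only tests that each mandatory property has a non-empty value.

```python
# Properties marked "mandatory" in the da|ra restrictions column above.
DARA_MANDATORY = (
    "resourceType",
    "resourceIdentifier",
    "title.titleName",
    "creator.person.firstName",
    "creator.person.lastName",
    "dataUrl",
    "doiProposal",
    "publicationDate.date",
    "availability.availabilityType",
)


def missing_mandatory(fields: dict) -> list[str]:
    """Return the mandatory da|ra properties that are absent or empty."""
    return [name for name in DARA_MANDATORY if not fields.get(name)]


if __name__ == "__main__":
    # Illustrative values only; the project id is hypothetical.
    draft = {
        "resourceType": "Collection",
        "resourceIdentifier": "dzhw:exampleproject:1.0.0",
        "doiProposal": "DZHW:exampleproject:1.0.0",
        "availability.availabilityType": "Delivery",
    }
    print(missing_mandatory(draft))
```

Since the VerbundFDB restrictions do not apply to analysis packages, the check covers only the da|ra column.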

Where can I find the current metadata schema of da|ra?

The metadata XML schema is best described here. A short description of the da|ra web service API can be found here.

Further documentation like examples and controlled vocabulary is available at da|ra.

Where can I find the specification of the additional restrictions introduced by the VerbundFDB?

The additional restrictions introduced by the VerbundFDB (aka Kernset) are described here.

What happens to the registered data package information at da|ra?

See https://github.com/dzhw/FDZ_Allgemein/wiki/Strukturierte-Metadaten#schnittstellen-und-standards-zu-unseren-metadaten


[1] da|ra is a DOI registration agency for social and economic data in Germany. It is connected to DataCite, which organizes the administration of prefixes and the connection to the International DOI Foundation (IDF).
[2] The VerbundFDB is a research data infrastructure for empirical educational research that collects and shares research data and information. The FDZ-DZHW is a network partner within the VerbundFDB.
[3] If both the da|ra restrictions and the VerbundFDB restrictions show "---", we send these attributes voluntarily and/or the attributes are marked "optional" by the VerbundFDB Kernset.
