A simple parser that converts a Markdown metadata file (M_README.md), whose identifiers are similar to Dublin Core terms, into a simple JSON file for further processing. The JSON file is inspired by the ZENODO.json schema; see also the ZENODO developers guide. Metadata (data about data) is crucial for finding and understanding the data in your project tree, and these JSON files can be used for further data processing, e.g. to create a database catalog for your files or to provide additional metadata in a public repository. Feel free to adapt it to your needs.
Follow these instructions to run the application ParsingMetadataMD2JSON.
Requirements for the software:
- clone the repository
git clone https://github.com/Bondoki/ParsingMetadataMD2JSON
- run the application with the sample file M_Dataset_README_Example.md
python3 ParsingMetadataMD2JSON.py M_Dataset_README_Example.md
- this should generate a new file M_Dataset_README_Example.json and print a success message:
SUCCESS: M_Dataset_README_Example.md parsed to M_Dataset_README_Example.json
- alternatively, run the Jupyter notebook ParsingMetadataMD2JSON.ipynb with
jupyter-lab ParsingMetadataMD2JSON.ipynb
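Once files such as M_Dataset_README_Example.json exist, they can be collected into a simple catalog of the project tree. The following is only a minimal sketch and not part of ParsingMetadataMD2JSON: the function name `build_catalog` is hypothetical, and the key `"title"` is an assumption based on the ZENODO.json-inspired layout, so check a generated file for the actual key names.

```python
import json
from pathlib import Path


def build_catalog(root):
    """Collect basic metadata from every *.json file below `root`."""
    catalog = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                metadata = json.load(handle)
        except (OSError, json.JSONDecodeError):
            # Skip files that cannot be read or are not valid JSON.
            continue
        if not isinstance(metadata, dict):
            # Skip JSON files whose top level is not an object.
            continue
        catalog.append({
            "file": str(path),
            # "title" is an assumed key name; adjust it to the generated files.
            "title": metadata.get("title", "(no title)"),
        })
    return catalog


if __name__ == "__main__":
    for entry in build_catalog("."):
        print(f"{entry['file']}: {entry['title']}")
```

Note that this picks up every JSON file in the tree, so in a real project you may want to restrict the search pattern to your metadata files.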
The following keywords will be parsed and converted:
Keyword | Description |
---|---|
Title | Descriptive name of the Paper/Project/Thesis/Dataset. |
Creator | A consecutive list of the names of those who created the resource and are primarily responsible for it. |
Creator.ORCID | Additional information: The ORCID identifier of the Creator. |
Creator.Email | Additional information: The email address of the Creator. |
Publisher | The department/institute responsible for making the resource available. |
Contributor | A consecutive list of the names of those who contributed to the resource and are secondary to the Creators. |
Contributor.ORCID | Additional information: The ORCID identifier of the Contributor. |
Contributor.Email | Additional information: The email address of the Contributor. |
Description | A textual description of the content of the resource. |
Subject | Phrases/keywords describing the content of the resource. |
Date | A date associated with the creation or availability of the resource. Recommended format: YYYY-MM-DD. |
Language | The language of the resource, preferably given as a BCP 47 language tag. |
Format | The data format to identify the software and possibly hardware that might be needed to display or operate the resource. For a list of MIME types see here. |
Type | The category of the resource, e.g. Collection, Dataset, Event, Image, Experiment, Simulation, Report, Text, Draft. See also DCMI Type Vocabulary. |
Coverage | Temporal coverage is typically a period for acquiring the data. |
Source | Information about a second resource from which the present resource is derived - if applicable. |
Relation | Provide a relationship from source to the present resource, e.g. IsVersionOf, IsReplacedBy, IsPartOf, IsReferencedBy, see Qualified Dublin Core Terms. |
Identifier | A unique identifier of the resource, e.g. DOI, ISBN, Number. |
Method | Refer to your (post-)processing tools/methods, e.g. URL or git hash, as relation. |
Rights | A rights management statement of the resource, e.g. license for publishing and sharing. |
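To illustrate how the keywords in the table could be picked up, here is a minimal sketch. It is not the implementation used by ParsingMetadataMD2JSON.py: it assumes a hypothetical `Keyword: value` line layout (the actual syntax is shown in the sample M_README file), and the helper names `parse_metadata` and `is_iso_date` as well as the sample values are invented for illustration. Repeated keywords such as Creator are collected into lists, and the recommended YYYY-MM-DD format of Date is checked separately.

```python
import json
import re
from datetime import date

# Keywords taken from the table above.
KEYWORDS = {
    "Title", "Creator", "Creator.ORCID", "Creator.Email", "Publisher",
    "Contributor", "Contributor.ORCID", "Contributor.Email", "Description",
    "Subject", "Date", "Language", "Format", "Type", "Coverage", "Source",
    "Relation", "Identifier", "Method", "Rights",
}

# Assumed line layout "Keyword: value"; the real M_README.md syntax may differ.
LINE_PATTERN = re.compile(r"^\s*([A-Za-z.]+)\s*:\s*(.+?)\s*$")


def parse_metadata(text):
    """Map each known keyword to the list of values found for it."""
    metadata = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match and match.group(1) in KEYWORDS:
            metadata.setdefault(match.group(1), []).append(match.group(2))
    return metadata


def is_iso_date(value):
    """Return True if `value` is a valid calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Invented sample input for illustration only.
sample = """\
Title: Example dataset
Creator: Jane Doe
Creator: John Doe
Date: 2024-02-30
Note: this keyword is not in the table and is ignored
"""

parsed = parse_metadata(sample)
print(json.dumps(parsed, indent=2))

# Report Date values that are not valid YYYY-MM-DD dates.
print([value for value in parsed.get("Date", []) if not is_iso_date(value)])
```

Storing every keyword as a list keeps the handling of single and repeated entries uniform; a real converter would likely flatten single-valued fields such as Title before writing the JSON file.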
This project is licensed under the Unlicense.