Update cores to match latest changes to Darwin Core #21

Closed
tucotuco opened this issue Nov 17, 2018 · 14 comments · Fixed by #85

Comments

@tucotuco
Collaborator

With the new release of Darwin Core, comments and examples have been separated from definitions. The definitions are stable and require a community process to be changed; the examples and comments do not. The extensions schema has no attribute for comments, but it does have attributes for the definition and examples.

Is it wise to have to maintain the examples in two places? Not so much, but if all this information is not also in an extension, it will be missed in practice, because many tools use the extensions and not (necessarily directly) the current standard.

Given the disparities, there are at least two options:

1) Leave examples and comments out of extensions altogether, forcing users to go to the sources using the links captured in the dc:relation attributes in the extension XML files (rendered as "See also" in the GBIF extensions pages, such as https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence).
2) Execute a process that updates all affected extensions each time there are changes in Darwin Core definitions, comments, and/or examples.

The second option would require that a property attribute for comments be added to the extensions schema in order to capture all the relevant documentation that is managed with the standard.
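
To make the mismatch concrete, here is a minimal Python sketch that reads one property from an extension-style XML fragment and reports which kinds of documentation it carries. The fragment is illustrative and not copied from a published file. The namespace URIs are assumptions to be checked against the real schema, the dc:relation URL is a placeholder, and `comments` is the hypothetical attribute that option 2 would require.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the actual extension.xsd.
EXT_NS = "http://rs.gbif.org/extension/"
DC_NS = "http://purl.org/dc/terms/"

# Illustrative fragment only. It has a definition, examples, and a link
# back to the source (placeholder URL), but nowhere to put comments.
SAMPLE = f"""
<extension xmlns="{EXT_NS}" xmlns:dc="{DC_NS}">
  <property name="type"
            dc:description="The nature or genre of the resource."
            dc:relation="https://example.org/terms/type"
            examples="StillImage, MovingImage, Sound"/>
</extension>
"""


def documentation_fields(prop):
    """Return the documentation attributes of one <property> element."""
    return {
        "definition": prop.get(f"{{{DC_NS}}}description"),
        "examples": prop.get("examples"),
        "see_also": prop.get(f"{{{DC_NS}}}relation"),
        # Hypothetical attribute that option 2 would add to the schema.
        "comments": prop.get("comments"),
    }


root = ET.fromstring(SAMPLE)
for prop in root.findall(f"{{{EXT_NS}}}property"):
    fields = documentation_fields(prop)
    missing = [key for key, value in fields.items() if value is None]
    print(prop.get("name"), "is missing:", missing)
```

With this fragment only `comments` is reported as missing, which is the gap described above.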

@tucotuco
Collaborator Author

If I can get a decision on this, I would be happy to make and submit the changes with a pull request.

@timrobertson100
Member

timrobertson100 commented Nov 24, 2018 via email

@tucotuco
Collaborator Author

I am not sure I understand what you mean in 1. The core extensions consist entirely of Darwin Core terms, no? Thus there is always a primary source for these.

An alternative for 2 would be a periodic update. Versioning as separate files is somewhat unfortunate, as it makes the diffs harder to see than if only one file were maintained. I understand the reasoning for the distinctly named version files, though. Perhaps comments in the commits could alert people when there was no functional change (that is, only comments and examples changed, as opposed to terms or their definitions).
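
The distinction between a functional change and a documentation-only change could be checked mechanically before writing the commit message. The following is a rough sketch under assumed inputs: each version is represented as a plain dictionary keyed by term name, with hypothetical `definition`, `comments`, and `examples` fields. It is not how the repository actually stores or compares versions.

```python
def classify_change(old, new):
    """Classify the difference between two versions of a core.

    `old` and `new` map term names to dicts with the (hypothetical) keys
    'definition', 'comments', and 'examples'.
    """
    # Added or removed terms are always a functional change.
    if set(old) != set(new):
        return "functional"
    # So is any change to a definition.
    for name in old:
        if old[name]["definition"] != new[name]["definition"]:
            return "functional"
    # Anything else that differs is commentary or examples only.
    if old != new:
        return "documentation-only"
    return "unchanged"


v1 = {"type": {"definition": "The nature or genre of the resource.",
               "comments": "", "examples": "StillImage"}}
v2 = {"type": {"definition": "The nature or genre of the resource.",
               "comments": "", "examples": "StillImage, Sound"}}

print(classify_change(v1, v2))
```

A result of "documentation-only" is the case where a note in the commit would tell readers that nothing functional changed.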

As for 3., wouldn't these restrictions necessarily be information on top of the standard rather than changes to the standard? Some restrictions are already captured by the schema, but perhaps the schema needs an element for a text-based further restrictions description.

I think 4. is a very good idea indeed. But what would be in the text? With or without the comments and examples?

@tucotuco tucotuco changed the title Update core extensions to match latest changes to Darwin Core Update cores to match latest changes to Darwin Core Dec 18, 2018
@timrobertson100
Member

Thanks John. From our private chat, 1. is no longer an issue now we're clear we're talking about only the cores in http://rs.gbif.org/core/

If we go for 4, I think I'd favour simply copying the text in verbatim, so that any tool relying on those elements shows it inline and doesn't end up with empty recommendations. I understand the fragility of this, but it is slow changing and likely a quick task.

@tucotuco
Collaborator Author

Just to be clear, are you recommending (using dc:type as an example)

a) definition only
"The nature or genre of the resource."

b) definition plus commentary
"The nature or genre of the resource. Must be populated with a value from the DCMI type vocabulary (http://dublincore.org/documents/2010/10/11/dcmi-type-vocabulary/)."

c) definition plus commentary plus examples
"The nature or genre of the resource. Must be populated with a value from the DCMI type vocabulary (http://dublincore.org/documents/2010/10/11/dcmi-type-vocabulary/). Examples: 'StillImage', 'MovingImage', 'Sound', 'PhysicalObject', 'Event', 'Text' "

or some variant of the above?
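
The three variants can be expressed as one small function, which may help when comparing them side by side. This is only a sketch: the function name, its parameters, and the "Examples:" separator are assumptions chosen to reproduce the strings above, not an agreed format.

```python
def compose_description(definition, comments="", examples=(), variant="c"):
    """Build the description text for a term.

    variant "a": definition only
    variant "b": definition plus commentary
    variant "c": definition plus commentary plus examples
    """
    parts = [definition.strip()]
    if variant in ("b", "c") and comments:
        parts.append(comments.strip())
    if variant == "c" and examples:
        quoted = ", ".join(f"'{example}'" for example in examples)
        parts.append(f"Examples: {quoted}")
    return " ".join(parts)


definition = "The nature or genre of the resource."
comments = "Must be populated with a value from the DCMI type vocabulary."
examples = ["StillImage", "MovingImage", "Sound"]

for variant in ("a", "b", "c"):
    print(variant, "->", compose_description(definition, comments, examples, variant))
```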

@tucotuco
Collaborator Author

tucotuco commented Aug 17, 2021

I have created a script to generate GBIF Darwin Core extensions as part of preparing new XML extension definitions for testing.

First, the extension schema needs to be updated to allow the term comments to be separate from the term definitions. In making this change, it became clear that the dc.xsd referenced by extension.xsd was lacking a creator attribute. All of this has been committed in #71 and is referred to in issue #45.

Second, there are three terms that now have formal vocabularies (see also gbif/portal-feedback#3257). It looks like only one of these (establishmentMeans) is in production in the GBIF registry. The IPT can take advantage of a vocabulary that is pointed to for a given term via the thesaurus attribute. As far as I understand it, these need to be XML files that are valid according to the schema at http://rs.gbif.org/schema/thesaurus.xsd. Right now there are no such XML vocabulary files in http://rs.gbif.org/vocabulary/ for the vocabularies for the terms degreeOfEstablishment and pathway. There is one for establishmentMeans (https://github.com/gbif/rs.gbif.org/blob/master/vocabulary/gbif/establishment_means.xml), but it is out of date with respect to the recommendations of the standard. I can build or update and commit XML vocabulary files for these three terms, but I wonder whether there is a plan to use the vocabularies in the registry as thesauri directly or to generate XML files for rs.tdwg.org and access them from there. Until this is resolved, we can't make a completely up-to-date set of core and extension XML files. We can only make a version without these vocabularies linked or with placeholder thesaurus URLs.
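
A quick way to see which of these terms still lack a linked vocabulary is to scan a generated file for the thesaurus attribute. The sketch below assumes the extension namespace URI shown and uses an invented fragment with a placeholder vocabulary URL; only the three term names and the `thesaurus` attribute come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; verify against the actual extension.xsd.
EXT_NS = "http://rs.gbif.org/extension/"

# The three terms named above as now having formal vocabularies.
TERMS_WITH_VOCABULARIES = {"establishmentMeans", "degreeOfEstablishment", "pathway"}

# Invented fragment; the thesaurus URL is a placeholder, not a real file.
SAMPLE = f"""
<extension xmlns="{EXT_NS}">
  <property name="establishmentMeans"
            thesaurus="https://example.org/vocabulary/establishment_means.xml"/>
  <property name="degreeOfEstablishment"/>
  <property name="pathway" thesaurus=""/>
  <property name="catalogNumber"/>
</extension>
"""


def terms_missing_thesaurus(xml_text, expected=TERMS_WITH_VOCABULARIES):
    """Return the expected terms whose thesaurus attribute is absent or empty."""
    root = ET.fromstring(xml_text)
    missing = []
    for prop in root.findall(f"{{{EXT_NS}}}property"):
        name = prop.get("name")
        if name in expected and not prop.get("thesaurus"):
            missing.append(name)
    return sorted(missing)


print(terms_missing_thesaurus(SAMPLE))
```

Terms without an expected vocabulary, such as catalogNumber here, are ignored by the check.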

Assuming the issues above are resolved, I have also committed updated cores and extensions generated by the scripts and config files here for review and testing.

@tucotuco
Collaborator Author

Updated vocabulary XML files for basisOfRecord, degreeOfEstablishment, establishmentMeans, and pathway in c82d4bc.

@MattBlissett
Member

Thanks @tucotuco,

I've deployed everything to the sandbox, so it should be visible in test IPTs after synchronizing vocabularies and extensions.

@tucotuco
Collaborator Author

Ehhksellent!

@tucotuco
Collaborator Author

Is there a formal testing process? What can/should I tell people interested in knowing when the updates will be in production?

@MattBlissett
Member

There is not a formal process.

IPT users can update test-mode IPTs and start to use the new vocabularies and updated terms. Published datasets can be reviewed on UAT.

We need to update our interpretation, indexing, APIs and the website (first on UAT) to recognize and (if applicable) interpret the new terms, and update our vocabularies where these have been introduced. This will require some coordination between our systems; we'll aim to link related issues to this one.

CC @fmendezh. The first step of updating the DWC-API is done: gbif/dwc-api@c79f124, which shows the new terms. degreeOfEstablishment and pathway come with vocabularies, and establishmentMeans has a new vocabulary.

@MattBlissett
Member

@EstebanMH-SiB

We would like to know whether there has been any progress on adding the new terms to the IPT, or whether there is a tentative date for it.

We are waiting for the release in order to update a couple of resources. Thank you very much.

@timrobertson100
Member

Thanks @EstebanMH-SiB

It is our aim to have all of GBIF.org updated to the latest in the next 4 weeks. We can't update the IPT schemas alone as it will break all data indexing.
