Connect harvester to NDE Dataset Register #97
Comments
During today’s tech day, we discussed the idea of having a preparatory SPARQL query that returns a list of provider URIs to use in the regular query. On the side of the NDE Dataset Register we can add some predicate to datasets that should be included in the CLARIAH Registry. To keep things standardised, NDE then provides a SPARQL query that selects on that predicate to the CLARIAH Harvester. This is similar to the |
@menzowindhouwer As discussed, I’ve now changed the example query to a CONSTRUCT, allowing you to get its results as a single RDF graph per dataset rather than (duplicated) SELECT result bindings.
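For context on the duplication mentioned above: a SELECT query returns one row per combination of multi-valued properties, so a dataset with two keywords appears twice. A minimal sketch of collapsing such rows on the harvester side, assuming the rows have already been simplified to plain dicts of variable name to value (the row shape and the example URIs are illustrative, not the harvester's actual data model):

```python
from collections import defaultdict


def group_bindings(rows):
    """Collapse SELECT result rows into one record per dataset.

    `rows` is a list of dicts mapping variable names to values, a
    simplified stand-in for SPARQL JSON result bindings (assumption).
    """
    datasets = defaultdict(lambda: defaultdict(set))
    for row in rows:
        dataset = row["dataset"]
        for variable, value in row.items():
            if variable != "dataset":
                # A set drops the values that are repeated across rows.
                datasets[dataset][variable].add(value)
    # Convert to plain dicts with sorted lists for deterministic output.
    return {
        dataset: {variable: sorted(values) for variable, values in fields.items()}
        for dataset, fields in datasets.items()
    }


# Hypothetical rows: the same dataset repeated once per keyword.
rows = [
    {"dataset": "https://example.org/d/1", "title": "Example", "keyword": "theatre"},
    {"dataset": "https://example.org/d/1", "title": "Example", "keyword": "film"},
]
grouped = group_bindings(rows)
```

With a CONSTRUCT query this regrouping step should not be needed, since each dataset's triples arrive as one graph.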
@menzowindhouwer @vicding-mi Can you elaborate on how you select datasets from the NDE Dataset Register for inclusion in the CLARIAH one? If I remember correctly, you do so on the level of the dataset’s publisher. If so, we want to add more publishers to that list, including https://uba.uva.nl, as requested by @LvanWissen.
On the other hand, I also see that not all datasets published by https://uba.uva.nl/ are relevant for CLARIAH. Some of the datasets in there were created in research projects, such as ECARTICO, OnStage and Cinema Context, and are relevant. The main collection, for instance, can stay in the NDE Register only. A more advanced filter would look not only at the publisher but possibly also at the creator/contributor (and their ORCID or ROR identifiers).
@LvanWissen In that case, please see if netwerk-digitaal-erfgoed/dataset-register#483 would solve your use case.
Yes, but that's an 'opt-in' on my side, as it requires adding an extra attribute to the dataset description. I'd rather see an 'opt-out'. In my opinion, filtering should be done on the harvesting party's side.
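To make the harvester-side filtering idea concrete, here is a rough sketch, assuming each harvested dataset has been reduced to a dict with `uri`, `publisher`, `creators` and `contributors` keys. That shape, the function name and all example identifiers below are hypothetical; nothing here reflects how the CLARIAH Harvester is actually implemented.

```python
def is_relevant(dataset, trusted_publishers, trusted_agents, excluded=frozenset()):
    """Decide whether the harvester should include `dataset`.

    - `excluded` is an opt-out list of dataset URIs.
    - `trusted_publishers` are publishers whose datasets are all included.
    - `trusted_agents` are creator/contributor identifiers (for example
      ORCID or ROR URIs) that make a dataset relevant regardless of publisher.
    """
    if dataset["uri"] in excluded:
        return False
    if dataset.get("publisher") in trusted_publishers:
        return True
    agents = set(dataset.get("creators", ())) | set(dataset.get("contributors", ()))
    return not agents.isdisjoint(trusted_agents)


# Hypothetical configuration and datasets, for illustration only.
trusted_publishers = {"http://data.bibliotheken.nl/id/thes/p075301482"}
trusted_agents = {"https://orcid.org/0000-0000-0000-0000"}  # placeholder ORCID

research_dataset = {
    "uri": "https://example.org/dataset/research",
    "publisher": "https://uba.uva.nl",
    "creators": ["https://orcid.org/0000-0000-0000-0000"],
}
main_collection = {
    "uri": "https://example.org/dataset/main-collection",
    "publisher": "https://uba.uva.nl",
}

included = [d["uri"] for d in (research_dataset, main_collection)
            if is_relevant(d, trusted_publishers, trusted_agents)]
```

In this sketch the publisher does not have to add anything to the dataset description; the lists live entirely on the harvesting side.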
The harvest is based on the following SPARQL query; please correct me if I am wrong, @menzo ;)
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT * WHERE
{{
  BIND ***@***.***}> as ?publisher)
  ?dataset a dcat:Dataset ;
    dct:title ?title ;
    dct:license ?license ;
    dct:publisher ?publisher .
  OPTIONAL {{ ?dataset dct:description ?description }}
  OPTIONAL {{ ?dataset dcat:keyword ?keyword }}
  OPTIONAL {{ ?dataset dcat:landingPage ?landingPage }}
  OPTIONAL {{ ?dataset dct:source ?source }}
  OPTIONAL {{ ?dataset dct:created ?created }}
  OPTIONAL {{ ?dataset dct:modified ?modified }}
  OPTIONAL {{ ?dataset dct:issued ?published }}
  OPTIONAL {{ ?dataset owl:versionInfo ?version }}
  OPTIONAL {{ ?dataset dcat:distribution ?distribution .
    ?distribution dcat:accessURL ?distribution_url .
  }}
  OPTIONAL {{ ?distribution dcat:mediaType ?distribution_mediaType }}
  OPTIONAL {{ ?distribution dct:format ?distribution_format }}
  OPTIONAL {{ ?distribution dct:issued ?distribution_published }}
  OPTIONAL {{ ?distribution dct:modified ?distribution_modified }}
  OPTIONAL {{ ?distribution dct:description ?distribution_description }}
  OPTIONAL {{ ?distribution dct:license ?distribution_license }}
  OPTIONAL {{ ?distribution dct:title ?distribution_title }}
  OPTIONAL {{ ?distribution dcat:byteSize ?distribution_size }}
}}
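A note on the doubled braces: they suggest the query is stored as a Python format-string template, where `{{` and `}}` are escapes for literal braces and a single-brace placeholder carries the publisher URI. The placeholder in the BIND line was mangled in the email, so the name `publisher` below is an assumption, as is the shortened query. A minimal sketch of how such a template could be filled in:

```python
# Shortened, hypothetical version of the harvest query as a str.format template.
QUERY_TEMPLATE = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT * WHERE
{{
  BIND (<{publisher}> as ?publisher)
  ?dataset a dcat:Dataset ;
    dct:title ?title ;
    dct:publisher ?publisher .
  OPTIONAL {{ ?dataset dct:description ?description }}
}}
"""

# Characters that the SPARQL grammar does not allow inside <...> IRI references.
FORBIDDEN = set('<>"{}|^`\\')


def build_query(publisher_uri):
    """Fill the template with one publisher URI, rejecting unsafe input."""
    if any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in publisher_uri):
        raise ValueError(f"not a valid IRI for SPARQL: {publisher_uri!r}")
    # str.format turns '{{' and '}}' into single braces and fills {publisher}.
    return QUERY_TEMPLATE.format(publisher=publisher_uri)


query = build_query("http://data.bibliotheken.nl/id/thes/p075301482")
```

The character check is there because the URI is pasted into the query text; without it, a malformed value could change the meaning of the query.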
On 30 Nov 2023, at 11:20, David de Boer wrote:
@menzowindhouwer @vicding-mi Can you elaborate on how you select datasets from the NDE Dataset Register for inclusion in the CLARIAH one? If I remember correctly, you do so on the level of the dataset’s publisher. If so, we want to add more publishers to that list, including https://uba.uva.nl, as requested by @LvanWissen.
The NDE Register will be used for (at the very least) B&G (#96) and KB.
Please find an example query here. Replace the
BIND (<http://data.bibliotheken.nl/id/thes/p075301482> as ?publisher)
with the publisher you want to retrieve datasets for. For a list of publishers, see this query.
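If datasets for several publishers are needed in one request, one option is to swap the single BIND for a SPARQL 1.1 VALUES clause. A small sketch of generating that clause from a list of publisher URIs; the function name is hypothetical, the second URI is a placeholder, and the input is assumed to be already-validated IRIs:

```python
def values_clause(publisher_uris):
    """Build a VALUES clause that binds ?publisher to each given IRI."""
    iris = " ".join(f"<{uri}>" for uri in publisher_uris)
    # Doubled braces in the f-string produce literal '{' and '}'.
    return f"VALUES ?publisher {{ {iris} }}"


clause = values_clause([
    "http://data.bibliotheken.nl/id/thes/p075301482",
    "https://example.org/publisher/2",  # placeholder publisher URI
])
```

The resulting line would take the place of the BIND line in the query, leaving the rest of the pattern unchanged.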
Semantics of the query arguments are described at https://github.com/netwerk-digitaal-erfgoed/dataset-register#dcatdataset and based on the Requirements for Datasets.
You can also have a look at the NDE Dataset Register website for examples.