
Unify dc:creator oio:created_by and dc:contributor, IAO:term editor #60

Open
matentzn opened this issue Mar 8, 2021 · 40 comments

@matentzn
Contributor

matentzn commented Mar 8, 2021

Is there any way we can, OBO wide, agree to

  • move to dc:creator with orcids as values OR
  • move to oio:created_by with orcids as values

and

  • agree that dc:contributor should always refer to a valid orcid?

@cthoyt

@cthoyt
Contributor

cthoyt commented Mar 8, 2021

I'm not familiar with the difference in semantics of dc:creator and dc:created_by. Does one refer to a resource and the other a literal? Because it would be great to refer to ORCID identifiers as resources.

Either way, I 100% support using structured information as attribution. It's very disconcerting to read through such high-quality resources and find somebody's initials that take 2 hours to look up by reading old papers. This has happened to me in GO, MONDO, and others.

@matentzn
Contributor Author

matentzn commented Mar 8, 2021

That's because it was a typo. Sorry about that. Fixed now: oio:created_by!

@cthoyt
Contributor

cthoyt commented Mar 8, 2021

Okay, then rephrased: I'm not familiar with oio, but since DC is so ubiquitous, I'd vote for using that (unless the semantics of oio:created_by are more suggestive of relations between resources than of plain text).

@matentzn
Contributor Author

matentzn commented Mar 8, 2021

oio stands for oboInOwl and is basically the OBO format internal vocabulary namespace. You have been using oio:hasDbXref a lot!

@cmungall
Contributor

cmungall commented Mar 9, 2021

> agree that dc:contributor should always refer to a valid orcid?

I think SHOULD, not MUST, is OK here, but be prepared for many violations. We have many ontologies that are decades old, with contributions that predate ORCID. In some cases we have retrospectively tracked down historic contributors and rewired their contributor dbxref to an ORCID, but this is not always possible. Many historic contributors still lack ORCIDs. I worry that by saying SHOULD we either generate a lot of busy work on resource-poor ontologies that would be better spent elsewhere, or weaken the meaning of SHOULD to the point where it's meaningless.

@matentzn
Contributor Author

matentzn commented Mar 9, 2021

I would say SHOULD is good, and we just agree on using ORCIDs moving forward. I don't think it's busy work. If we could use this consolidated way of attributing to generate a dashboard that makes individuals' contributions to ontologies other than their own more visible, this would be a great incentive!

@sbello

sbello commented Mar 9, 2021

When adding terms in Protege if you use the new entities metadata settings to automatically add creator and date information to new terms, the default setting is for creator (see image). If we are not going to settle on 'creator' it would be good to ask protege to change the default setting to whatever we settle on.
[Image: creator_metadata]

@matentzn
Contributor Author

matentzn commented Mar 9, 2021

@sbello thanks! Yes! And it would be even better if the Protege config were a separate config file that could be reused across OBO. We are contemplating something like that at the moment!

@matentzn changed the title from "Unify dc:creator oio:created_by and dc:contributor" to "Unify dc:creator oio:created_by and dc:contributor, IAO:term editor" on Nov 3, 2021
@matentzn
Contributor Author

matentzn commented Nov 3, 2021

In reference to #76 maybe we should first gather the use cases for attributing terms.

I want to emphasise one more time how strongly I feel about OBO being a driving force in world-wide ontology standardisation efforts beyond the biomedical domain. To do that, we need to cut back on some of our siloed annotation properties in the OIO and IAO vocabularies in favour of more widely used ones, like Dublin Core, SKOS, VoID, and friends. Please open a new issue, "We should not re-use external vocabularies if it means even the slightest compromise", and provide your arguments to convince me otherwise. So yes, standardisation means that we may lose some subtle distinctions.

Here is how I would suggest we use the creation vocabulary. Please tell me what you think.

  • dc:creator: the person or group that is responsible for the ID (IRI) of the term coming into being. This is synonymous with oio:created_by. The primary use case of this annotation is attribution (not provenance).
  • dc:contributor: any person that contributed anything to a term (adding a synonym, label, etc). The primary use case of this annotation is attribution (not provenance).
  • dc:source: if a person (or group) invented something, i.e. a definition or something along these lines, they can be referenced as a source. The primary use case of this annotation is provenance, not attribution. We can consider using this for robot templates or dosdp template-based generations as well.
  • IAO:0000117 (term editor): @zhengj2007 points out that "the person who add the term in the OWL file may not be the creator of the term". While true, I would argue this distinction is so subtle that it would help with neither provenance nor attribution. I would suggest using dc:source or dc:creator, whichever is more appropriate given the definitions above.
  • oio:created_by means the same as dc:creator above, and should be retired.
  • The range of any of the above should be one of the following, sorted from most to least desirable:
    1. ORCID
    2. ROR
    3. Wikidata Identifier
    4. ..... huge threshold of desirability....
    5. http://purl.obolibrary.org/obo/mondo#CJM (this is just a hack to contextualise the current "CJM" used by ontologies like GO).
    6. "Chris Mungall"
    7. "CJM"

I am not saying to change all legacy annotations now to this: I am saying, let's find a standard we can use moving forward, or agree that standardising this is not worth the cost.
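
The ranking above could be checked mechanically. What follows is a minimal Python sketch, not an agreed OBO tool: the function names `orcid_checksum_ok` and `classify_agent` and the category labels are hypothetical, and the patterns are deliberately loose approximations of the usual ORCID, ROR, and Wikidata IRI shapes. The ORCID check digit uses ISO 7064 MOD 11-2.

```python
import re

# Loose, assumed IRI shapes; real-world data may need more variants.
ORCID_RE = re.compile(r"^https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")
ROR_RE = re.compile(r"^https?://ror\.org/[0-9a-z]{9}$")
WIKIDATA_RE = re.compile(r"^https?://www\.wikidata\.org/(?:entity|wiki)/Q\d+$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character of an ORCID iD using ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:-1]:
        total = (total + int(digit)) * 2
    expected = (12 - total % 11) % 11
    return chars[-1] == ("X" if expected == 10 else str(expected))


def classify_agent(value: str) -> str:
    """Map an annotation value to a (hypothetical) desirability category."""
    match = ORCID_RE.match(value)
    if match:
        return "orcid" if orcid_checksum_ok(match.group(1)) else "invalid-orcid"
    if ROR_RE.match(value):
        return "ror"
    if WIKIDATA_RE.match(value):
        return "wikidata"
    if value.startswith(("http://", "https://")):
        return "other-iri"   # e.g. the mondo#CJM style hack
    return "literal"         # e.g. "Chris Mungall" or "CJM"
```

A report built on `classify_agent` could count how many annotations fall above or below the "threshold of desirability" without touching the legacy values themselves.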

@StroemPhi

Wouldn't the semantics behind IAO:0000117 be sufficiently provided if each term has its own issue (using IAO_0000233 - term tracker item) that is properly assigned to be handled by the "term editor(s)"?

@matentzn
Contributor Author

matentzn commented Nov 3, 2021

I totally agree. I would love to make this a standard habit: tagging all new terms with their respective GitHub issues. It would create a layer of indirection for obtaining the "responsible editor", but I think this is much better than using non-standard properties for something like that.

@sbello

sbello commented Nov 3, 2021

@matentzn can these annotations be added when using ROBOT templates?
I like the creator/contributor/source trio, ideally in combination with an ORCID, but it would be helpful if I could include this information in ROBOT templates for bulk addition. Would it be as simple as adding columns for these attributes?

@matentzn
Contributor Author

matentzn commented Nov 3, 2021

Absolutely no problem! :)

@zhengj2007
Contributor

@matentzn I'd like to correct my comment. I never used 'dc:creator' when I added a new term. So, what I mean is "the person who adds the term in the OWL file may not be the IAO: 'term editor' of the term".

@matentzn
Contributor Author

matentzn commented Nov 3, 2021

I get it now @zhengj2007 thanks! But perhaps that is secondary. In this case of ambiguity, you could simply use dc:contributor which is certainly true, right?

@bpeters42

bpeters42 commented Nov 3, 2021 via email

@bpeters42

bpeters42 commented Nov 3, 2021 via email

@matentzn
Contributor Author

matentzn commented Nov 3, 2021

Yes, I agree with that as well. dc:contributor should be the default and, realistically, given that ontologies are always a massively collaborative effort, I would even agree to a motion that gets rid of dc:creator altogether. Thank you @bpeters42 for your input :)

@sbello

sbello commented Nov 3, 2021

It looks like I can change the user metadata in protege to use whatever relation we decide in the creator property field. So, if the group wants to go with contributor instead of creator I'm fine with that.

@wdduncan

wdduncan commented Nov 3, 2021

FWIW, I've been setting the "New entities metadata" to "Use user name".

[image]

But, in my "User details" setting, I include my name and ORCID.

[image]

I like having both a name and ORCID, since I don't have people's ORCIDs memorized.

@cmungall
Contributor

cmungall commented Nov 3, 2021

I agree with Nico's recommendations.

What this doesn't address is how this interacts with definition-level axiom annotations (done using OWL reification). It's very common in many ontologies to provide, as provenance for a definition, some mix of primary, secondary, and tertiary sources, individuals, and groups of people.

How should this interact with term-level source and contributor annotations?

  1. Favor term-level annotations over axiom-level
  2. Favor axiom-level and only include term-level if non-redundant
  3. Have redundancy in the release version, non-redundancy in the edit version, and a standard SPARQL update to propagate selectively from axiom level to term level as part of the release process
  4. No recommendation. Every ontology does this as it pleases

I favor 3 and disfavor 1; it is important for many ontologies to have the provenance at the axiom level.
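
Option 3 could look roughly like the following. This is a plain-Python sketch over in-memory tuples rather than the SPARQL update itself; the `AxiomAnnotation` shape, the `PROPAGATED` property set, and all example identifiers are hypothetical and only illustrate "propagate selectively from axiom level to term level".

```python
from typing import NamedTuple


class AxiomAnnotation(NamedTuple):
    source: str     # owl:annotatedSource (the term)
    prop: str       # owl:annotatedProperty (e.g. the definition property)
    target: str     # owl:annotatedTarget (e.g. the definition text)
    ann_prop: str   # annotation property placed on the axiom
    ann_value: str  # value of that annotation


# Assumed choice: only these axiom-level properties are copied to the term.
PROPAGATED = {"dc:source", "dc:contributor"}


def propagate(term_annotations, axiom_annotations):
    """Return the term-level (term, property, value) triples for a release."""
    released = set(term_annotations)
    for ax in axiom_annotations:
        if ax.ann_prop in PROPAGATED:
            released.add((ax.source, ax.ann_prop, ax.ann_value))
    return released


# Edit version: non-redundant, provenance lives on the definition axiom.
edit_terms = {("EX:0000001", "dc:contributor", "https://orcid.org/0000-0002-1825-0097")}
edit_axioms = [
    AxiomAnnotation("EX:0000001", "IAO:0000115", "A definition.",
                    "dc:source", "https://example.org/paper"),
    AxiomAnnotation("EX:0000001", "IAO:0000115", "A definition.",
                    "rdfs:comment", "not propagated"),
]

released = propagate(edit_terms, edit_axioms)
```

The edit file stays free of duplicates, while the release gains a term-level `dc:source` that simple term-level queries can find.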

@graybeal

graybeal commented Nov 4, 2021

I think I agree, it isn't clear what IAO:0000117 (term editor) adds to the others, nor which of the others it truly represents (but I infer 'creator' from the description), and therefore it is less helpful to the average non-OBO user. (if that's a user you're trying to reach, that's a good thing I think.)

Some nuances in case they are useful.

Is making at least one dc:contributor required, but making dc:creator optional consistent with both your idea of compromise and the previous comments?

Note there is no reason people and institutions can't both be contributors/creators/etc on one term. Right?

Presumably dc:source can also be a place (location on the web), not just a person or group.

I think you've dropped a few person identification systems that have some scientific following and are LOD-friendly (FOAF, VIVO). Whereas I'm not sure why you'd include 4 through 7, given this is a future-looking recommendation.

@matentzn
Contributor Author

matentzn commented Nov 5, 2021

I agree wholeheartedly with your assessment, @graybeal. The reason I added these three is purely that I want to avoid pushback from GO, which has used 4-7 for 30 years and will be resistant to retro-curating all the various CJMs and others to ORCIDs. Maybe I will volunteer to do it for them one weekend, if we can agree that ORCID is the preferable identification. If someone has no ORCID, I would follow the radical @cthoyt method of simply creating an entity on Wikidata and using that; I prefer that to using FOAF or VIVO, because we know how to edit it easily. But yes, FOAF and VIVO would still be better than 4-7.

@matentzn
Contributor Author

Related to #2

@wdduncan

@cthoyt Does it help your script if I reverse how I do my dc:creator annotations so that the orcid comes first? E.g.:

https://orcid.org/0000-0001-9625-1899 (Bill Duncan)

@matentzn
Contributor Author

@cthoyt's script is not the only problem: we simply want to aggregate contributions across all ontologies using SPARQL. The labelling approach you chose is not well defined; everyone will do it differently. If we want human-readable editor names as well, we should provide a map in the ontology header.

@wdduncan

What is not well specified about the example? Are you wanting something like a regex? How about this:

{orcidid} *({first-name last-name},+)

I.e.: an ORCID followed by an optional set of one or more comma-delimited names contained within parentheses.
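
One possible reading of this pattern as a concrete regular expression is sketched below in Python. It is an interpretation, not an agreed format: the function name `parse_creator` is hypothetical, and the sketch assumes the ORCID is written as a full `https://orcid.org/` IRI, as in the example earlier in this thread.

```python
import re

# ORCID IRI, optionally followed by "(name, name, ...)".
CREATOR_RE = re.compile(
    r"^(?P<orcid>https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX])"
    r"(?:\s*\((?P<names>[^()]+)\))?$"
)


def parse_creator(value: str):
    """Split 'ORCID (name, name)' into (orcid, [names]); None if it does not match."""
    match = CREATOR_RE.match(value.strip())
    if match is None:
        return None
    names = match.group("names")
    name_list = [n.strip() for n in names.split(",") if n.strip()] if names else []
    return match.group("orcid"), name_list
```

A script that only cares about the identifier can take the first element of the result and ignore the names; values that are names or initials alone return `None`.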

I don't like the idea of putting a map in the header. It makes people go looking for the name associated with the orcid.

@cthoyt
Contributor

cthoyt commented Dec 20, 2021

I agree with Nico; there's no useful, machine-readable attribution via dc:creator that isn't structured by directly and only using the IRI for the ORCID record.

I can't see how adding a mini-language within the OWL spec would be helpful. I'd strongly disagree with anything that isn't just using the ORCID IRI for attribution purposes.

With regard to ease of access to human-readable names for contributors, I think that's a different conversation that has to happen somewhere else at a later time, after first getting a consensus that people would generally actually use this human-readable metadata.

@matentzn
Contributor Author

First name and last name is super error-prone no matter what (what about middle initials, special chars for Spanish, etc.). I am thinking more of how this would feed into knowledge graphs like Wikidata, where your name would be a label on your ID, and how all the information OBO generates gets connected to the wider world. The ORCID here really is not a literal, but a node in a graph. We can always generate human-readable labels from the ORCIDs automatically if people ask for it, and connect them using a different property.

@wdduncan

What is the benefit vs cost of using only ORCIDs? In what I proposed, you (or a script) can ignore whatever comes after the ORCID. People (or at least I) read names, not ORCIDs.

I'm trying to find a compromise. But, you don't seem open to such a compromise.

@matentzn
Contributor Author

Just think of it this way: an ORCID is the ID of a person. Would you recommend the ID of Limb in Uberon to be "http://purl.obolibrary.org/obo/UBERON_123 (limb)"? Again, the compromise is to have a well-defined second property that generates human-readable contributor statements.

@wdduncan

Where does it say that the dc:creator can only have orcids as values?

There is also the term editor annotation. Do you also want to restrict it in the same way? Again

@matentzn
Contributor Author

You are right, there is no rule restricting the range of dc:creator. I just have a use case I want to implement, which is to accurately aggregate contributions across OBO ontologies. So I want to be able to write a SPARQL query that counts all the terms you have contributed to. For that, anything beyond the ORCID will lead to inconsistencies. I would be OK with repurposing the term editor relation to do something like what you are proposing, though, basically saying that term editor is the human-readable variant of dc:contributor.
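
The aggregation use case can be illustrated with a small Python sketch. It stands in for the SPARQL query and runs over hypothetical `(term, property, value)` tuples; the term IDs, the sample values, and the helper name `count_contributions` are made up for illustration.

```python
from collections import Counter

# Hypothetical annotations for one person, written three different ways.
annotations = [
    ("EX:0000001", "dc:contributor", "https://orcid.org/0000-0002-1825-0097"),
    ("EX:0000002", "dc:contributor", "https://orcid.org/0000-0002-1825-0097"),
    ("EX:0000003", "dc:contributor", "https://orcid.org/0000-0002-1825-0097 (Example Name)"),
    ("EX:0000004", "dc:creator", "EN"),
]


def count_contributions(triples, properties=("dc:contributor", "dc:creator")):
    """Count distinct terms per annotation value, grouping on the exact value."""
    terms_by_value = {}
    for term, prop, value in triples:
        if prop in properties:
            terms_by_value.setdefault(value, set()).add(term)
    return Counter({value: len(terms) for value, terms in terms_by_value.items()})


counts = count_contributions(annotations)
```

Grouping on the exact value splits this one person across three keys, so the four terms never add up under a single identifier. With the bare ORCID IRI as the only value, the same grouping yields one key per person with no post-processing.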

@wdduncan

Or you could propose a new annotation that is defined to only take orcids as values. That way there would be no special re-defining of annotations that are already used.

@alanruttenberg
Collaborator

alanruttenberg commented Dec 20, 2021 via email

@matentzn
Contributor Author

The reason I want to use dc:contributor is that it is an international standard, and I want OBO terms to be queryable in Wikidata and similar using this property, with the terms we added connected to our Wikidata records through our ORCIDs. We could label the ORCIDs as @alanruttenberg suggests. In any case, this is not for me, or you, to decide: there is no point in me mandating it if no one cares about or implements it. So don't worry. Maybe the proposal does not fly, and that's that. But to achieve what I want to achieve, which is machine-unambiguous attribution analysis, there is just no alternative to using a standard property (that Wikidata understands) and an identifier as range.

@alanruttenberg
Collaborator

alanruttenberg commented Dec 20, 2021 via email

@wdduncan

You can also have the ORCID as an annotation on dc:creator. E.g.:

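# Prefixes added so the snippet parses; ex: is a placeholder namespace and
# the dc: IRI is assumed to be Dublin Core elements 1.1.
@prefix ex:  <http://example.org/> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

ex:foo a owl:Class ;
    dc:creator "someone name" .

# Axiom annotation attaching the ORCID to the dc:creator assertion above
[ a owl:Axiom ;
  ex:orcid "0000-0000-0000-0000" ;
  owl:annotatedProperty dc:creator ;
  owl:annotatedSource ex:foo ;
  owl:annotatedTarget "someone name"
] .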

Not sure what is wrong with the proposal of having the dc:creator use the format:
<orcid> <name>. In SPARQL, you can split the orcid and name parts, and the name part can be ignored. The name part is just for us humans to read.

@cmungall
Contributor

cmungall commented Jan 3, 2022 via email

@cmungall
Contributor

cmungall commented Dec 3, 2022

There are a lot of orthogonal issues being discussed here.

I have tried to separate these out into actionable proposals that can be voted on:

If someone wants to make issues for other aspects covered here (e.g., whether to put labels on axiom annotations), then go ahead!
