isa: DANGLING:123 causes parse error #226

Open
matentzn opened this issue May 13, 2024 · 2 comments

@matentzn

We have the following problem in the latest Mondo release:

[Term]
id: MONDO:0021125
name: disease characteristic
def: "An attribute of a disease." [https://orcid.org/0000-0002-6601-2165]
synonym: "disease qualifier" EXACT []
synonym: "modifier" EXACT [NCIT:C41009]
synonym: "qualifier" EXACT [NCIT:C41009]
xref: NCIT:C41009 {source="MONDO:equivalentTo"}
is_a: PATO:0000001
property_value: exactMatch NCIT:C41009

Running:

runoak --input pronto:$< info MONDO:0000001 

raises:

KeyError: 'PATO:0000001'

When running:

fastobo-validator mondo.obo
     Parsing `mondo.obo`
    Finished parsing `mondo.obo` in 0.73s
   Completed validation of `mondo.obo`

fastobo-validator reports no problems.

After removing the is_a statement above:

[Term]
id: MONDO:0021125
name: disease characteristic
def: "An attribute of a disease." [https://orcid.org/0000-0002-6601-2165]
synonym: "disease qualifier" EXACT []
synonym: "modifier" EXACT [NCIT:C41009]
synonym: "qualifier" EXACT [NCIT:C41009]
xref: NCIT:C41009 {source="MONDO:equivalentTo"}
property_value: exactMatch NCIT:C41009

the same runoak command succeeds:

runoak --input pronto:mondo.obo info MONDO:0000001 
MONDO:0000001 ! disease

There are thousands of dangling classes in mondo.obo, so why does this particular one cause a problem?
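To check whether the dangling is_a target is what sets this term apart, one could list every is_a parent that has no [Term] stanza of its own. The following is only a rough sketch: `find_dangling_is_a` is a hypothetical helper, and it does a simplified line-based scan rather than a full OBO parse (it assumes one tag per line and ignores escaping and imports).

```python
def find_dangling_is_a(obo_text):
    """Return {parent_id: [child_ids]} for is_a targets with no [Term] stanza."""
    defined = set()
    parents = {}  # parent id -> ids of the terms that reference it via is_a
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are inspected.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[4:].strip()
            defined.add(current_id)
        elif in_term and line.startswith("is_a: ") and current_id:
            # Drop trailing comments (! ...) and qualifiers ({...}).
            target = line[6:].split("!")[0].split("{")[0].strip()
            parents.setdefault(target, []).append(current_id)
    return {p: c for p, c in parents.items() if p not in defined}


sample = """\
[Term]
id: MONDO:0000001
name: disease

[Term]
id: MONDO:0021125
name: disease characteristic
is_a: PATO:0000001
"""

print(find_dangling_is_a(sample))
# prints {'PATO:0000001': ['MONDO:0021125']}
```

Running this over the full mondo.obo text (for example `find_dangling_is_a(open("mondo.obo").read())`) would show whether other dangling references also appear as is_a parents, or only in other tags such as xref.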

@gouttegd

The KeyError is thrown by the symmetrize_lineage method in the pronto.parsers.base.BaseParser class:

def symmetrize_lineage(self):
    for getter in self._entities.values():
        entities, graphdata = getter(self.ont)
        for entity in entities():
            graphdata.lineage.setdefault(entity.id, Lineage())
        for subentity, lineage in graphdata.lineage.items():
            for superentity in lineage.sup:
                graphdata.lineage[superentity].sub.add(subentity)

which is itself called at the end of the OBO parser parse_from method:

def parse_from(self, handle, threads=None):
    […]
    # Update lineage cache with symmetric of `SubClassOf`
    self.symmetrize_lineage()

Overall, it seems there is an assumption here that when a class is a subclass of another, the parent class must exist somewhere in the graph. This does not take into account the possibility of dangling is_a references, which are explicitly acknowledged by the OBO specification (§6.1.2) – and for which the OBO Flat File Format Guide recommends (§S.3.4) that they should be silently accepted without yielding an error.
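As an illustration of how the symmetrization step could tolerate dangling parents, here is a minimal standalone sketch. It is not pronto's actual code or fix: the `Lineage` dataclass is a simplified stand-in, and the function signature (a plain dict plus an iterable of known ids) is an assumption made so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Lineage:
    # Simplified stand-in for a lineage record (assumption, not pronto's class).
    sub: set = field(default_factory=set)
    sup: set = field(default_factory=set)


def symmetrize_lineage(lineage, known_ids):
    """Fill `sub` sets from `sup` sets; return the dangling parent ids."""
    for entity_id in known_ids:
        lineage.setdefault(entity_id, Lineage())
    dangling = set()
    for subentity, record in lineage.items():
        for superentity in record.sup:
            parent = lineage.get(superentity)
            if parent is None:
                # Parent is not declared anywhere: accept silently, no KeyError.
                dangling.add(superentity)
                continue
            parent.sub.add(subentity)
    return dangling


lineage = {
    "MONDO:0021125": Lineage(sup={"PATO:0000001"}),
    "MONDO:0000002": Lineage(sup={"MONDO:0000001"}),
}
known = ["MONDO:0000001", "MONDO:0000002", "MONDO:0021125"]
dangling = symmetrize_lineage(lineage, known)
print(dangling)
# prints {'PATO:0000001'}
```

Using `dict.get` instead of indexing keeps the lineage cache limited to declared entities; whether dangling parents should instead get placeholder entries is a separate design question for the maintainers.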

@cmungall

Potential duplicate of #225
