September 3, 2019
Resources
- Software Release: GitHub, Zenodo
- DockerHub Build: https://hub.docker.com/repository/docker/callahantiff/pheknowlator
- Build Data Sources: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources
The KG Benchmark Builds can be downloaded from Zenodo:
- KGs: https://zenodo.org/doi/10.5281/zenodo.7030200
- Embeddings: https://zenodo.org/doi/10.5281/zenodo.7030188
resource_info.txt
class_source_list.txt
instance_source_list.txt
ontology_source_list.txt
Data Download Date: November 30, 2018 - Data Source Details
Ontologies
Classes
- Human Disease Ontology
- Gene Ontology: gene associations
- Reactome: gene associations
- Human Phenotype Ontology: all source annotations - genes to phenotypes
- Human Phenotype Ontology: all source annotations - diseases to genes to phenotypes
Instances
- CTD: chemicals-genes
- CTD: chemicals-pathways
- CTD: chemicals-diseases
- CTD: genes-pathways
- CTD: diseases-pathways
- STRING DB: Proteins
- STRING DB: Entrez gene mappings
Knowledge Representation
We worked with a PhD-level biologist to develop a knowledge representation (see the figure below) that modeled mechanisms underlying human disease.
To do this, we manually mapped all possible combinations of the following six node types:
- Human Diseases
- Human Phenotypes
- Human Genes
- Gene Ontology concepts
- Reactome Pathways
- Chemicals
As shown in the figure above, the Basic Formal Ontology and the Relation Ontology were then used to create edges between the node types.
As shown in this figure, the following edge-types were created:
- Phenotypes-Genes: The Human Phenotype Ontology (HP) provides phenotype-Entrez gene annotations that were used to map 6,651 HP classes to 120,288 Entrez genes.
- Phenotypes-Diseases: The HP provides HP-DOID-Gene annotations that were used to map 5,438 HP concepts to 43,817 DOID concepts.
- Biological Processes, Molecular Functions, and Cellular Locations-Genes: The Gene Ontology (GO) provides GO-Gene annotations that were used to map 17,505 GO concepts to 265,002 Entrez genes.
- Biological Processes, Molecular Functions, and Cellular Locations-Pathways: Reactome provides GO-Gene links that were used to map 17,906 pathways to 1,910 biological processes, molecular functions, and cellular locations.
- Chemicals-Pathways: The Comparative Toxicogenomics Database (CTD) provides Chemical-Pathway links that were used to map 8,886 MeSH concepts to 711,043 Reactome pathways.
- Chemicals-Genes: The CTD provides Chemical-Gene links that were used to map 8,881 MeSH concepts to 410,379 Entrez genes.
- Chemicals-Diseases: The CTD provides Chemical-Disease links that were used to map 14,238 MeSH concepts to 1,216,900 DOID concepts.
- Genes-Genes: The STRING Database provides Gene-Gene links that were used to create 594,100 gene-gene interactions. When generating these mappings, only the inferred protein-protein relationships considered high confidence (a score of 700 or higher) were used.
- Genes-Disease: Mappings between genes and diseases were retrieved from DisGeNET via its SPARQL endpoint and used to map 6,051 Entrez genes to 20,452 DOID concepts.
- Genes-Pathways: The CTD provides Gene-Pathway links that were used to map 110,370 Entrez genes to 107,029 Reactome pathways.
- Pathways-Disease: The CTD provides Pathway-Disease links that were used to map 1,818 Reactome pathways to 106,727 DOID concepts.
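The high-confidence filter described for the Genes-Genes edges can be sketched as follows. This is a minimal illustration, not the project's own code. It assumes a space-delimited input with the columns `protein1`, `protein2`, and `combined_score`; the sample rows and the function name `filter_high_confidence` are hypothetical.

```python
import csv
import io

MIN_SCORE = 700  # high-confidence threshold stated in the text


def filter_high_confidence(handle, min_score=MIN_SCORE):
    """Yield (protein1, protein2, score) for rows at or above the threshold."""
    # Assumed layout: a header row followed by space-delimited records.
    reader = csv.DictReader(handle, delimiter=" ")
    for row in reader:
        score = int(row["combined_score"])
        if score >= min_score:
            yield row["protein1"], row["protein2"], score


# Hypothetical sample rows, used only to exercise the filter.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000000233 9606.ENSP00000272298 490\n"
    "9606.ENSP00000000233 9606.ENSP00000253401 898\n"
    "9606.ENSP00000000233 9606.ENSP00000401445 700\n"
)

edges = list(filter_high_confidence(sample))
for edge in edges:
    print(edge)
```

Rows scoring exactly 700 are kept, matching the "700 or better" wording; mapping the retained protein identifiers to Entrez genes would be a separate step.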
Knowledge Graph
The knowledge graph represented above was built using the following steps:
Merge Ontologies: Merge ontologies using the OWL Tools API.
Express New Ontology Concept Annotations: Create new ontology annotations by asserting a relation between the instance and an instance of the ontology class. For example, to assert the following relation:
Morphine --> is substance that treats --> Migraine
We would need to create two axioms:
- isSubstanceThatTreats(Morphine, x1)
- instanceOf(x1, Migraine)
While the instance of the HP class hemiplegic migraine can be treated as an anonymous node in the knowledge graph, we generate a new Internationalized Resource Identifier (IRI) for each newly generated instance.
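The two-axiom pattern above can be sketched in a few lines. This is a hedged illustration using plain tuples rather than an OWL library; the namespace `BASE_IRI`, the use of a UUID for the new identifier, and the function name `assert_via_instance` are assumptions made for the example, not the scheme used in the build.

```python
import uuid

# Hypothetical namespace for newly minted instance identifiers.
BASE_IRI = "https://example.org/pheknowlator/instance/"


def assert_via_instance(subject, relation, target_class):
    """Return the two triples linking `subject` to a fresh instance of `target_class`."""
    instance_iri = BASE_IRI + str(uuid.uuid4())  # new identifier for x1
    return [
        (subject, relation, instance_iri),            # isSubstanceThatTreats(Morphine, x1)
        (instance_iri, "instanceOf", target_class),   # instanceOf(x1, Migraine)
    ]


triples = assert_via_instance("Morphine", "isSubstanceThatTreats", "Migraine")
for triple in triples:
    print(triple)
```

Giving each generated instance its own identifier, instead of leaving it anonymous, lets later steps refer to the node directly.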
Deductively Close Knowledge Graph: The knowledge graph is deductively closed using the OWL 2 EL reasoner ELK via Protégé v5.1.1. ELK is able to classify instances and supports inference over class hierarchies and object properties, as well as inference over disjointness, intersection, and existential quantification (ontology class hierarchies).
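To make "deductively closed" concrete, the toy sketch below materializes one kind of inference: an instance of a class is also an instance of every superclass. It is not ELK and covers none of its other reasoning; the class hierarchy, the instance name `x1`, and the function name `close_instance_types` are hypothetical.

```python
def close_instance_types(subclass_of, instance_of):
    """Return all (instance, class) pairs implied by following subClassOf upward."""
    inferred = set(instance_of)
    frontier = list(instance_of)
    while frontier:
        instance, cls = frontier.pop()
        for parent in subclass_of.get(cls, ()):
            pair = (instance, parent)
            if pair not in inferred:  # also guards against cycles
                inferred.add(pair)
                frontier.append(pair)
    return inferred


# Hypothetical hierarchy and a single asserted instance.
subclass_of = {
    "HemiplegicMigraine": ["Migraine"],
    "Migraine": ["Headache"],
}
instance_of = [("x1", "HemiplegicMigraine")]

closed = close_instance_types(subclass_of, instance_of)
for pair in sorted(closed):
    print(pair)
```

The closed set contains the asserted pair plus the two inferred ones, which is the kind of additional edge a "Closed" build would carry compared with a "Not Closed" one.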
Generate Edge List: The final step before exporting the edge list is to remove any nodes that are not biologically meaningful or that would otherwise reduce the performance of machine learning algorithms, including the algorithm used to generate embeddings.
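A rough sketch of this last step is shown below: drop triples that touch excluded nodes, then map the remaining labels to integers, in the spirit of the label, integer, and label-map files listed in the builds. The exclusion rule (quoted literal values), the shared integer space for nodes and relations, and the function name `build_edge_list` are assumptions; the text does not state the actual criteria or encoding.

```python
def build_edge_list(triples, is_excluded):
    """Filter out triples touching excluded nodes, then integer-encode the rest."""
    kept = [t for t in triples if not (is_excluded(t[0]) or is_excluded(t[2]))]
    label_to_int = {}
    int_triples = []
    for triple in kept:
        # Assign the next unused integer the first time a label is seen.
        ids = tuple(label_to_int.setdefault(label, len(label_to_int)) for label in triple)
        int_triples.append(ids)
    return kept, int_triples, label_to_int


# Hypothetical triples; the third one attaches a literal metadata value.
triples = [
    ("Morphine", "isSubstanceThatTreats", "x1"),
    ("x1", "instanceOf", "Migraine"),
    ("Migraine", "hasLabel", '"migraine"'),
]


def is_literal(node):
    """Assumed exclusion rule: quoted strings are metadata, not graph nodes."""
    return node.startswith('"')


kept, int_triples, label_to_int = build_edge_list(triples, is_literal)
print(int_triples)
print(label_to_int)
```

Keeping the label-to-integer map alongside the integer triples makes it possible to translate embedding rows back to readable identifiers.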
All Builds |
---|
PheKnowLator_v1_ClassInstancesOnly_KG.owl |
PheKnowLator_v1_ClassInstancesOnly_KG_ClassInstanceMap.json |
PheKnowLator_v1_Full_KG.owl |
PheKnowLator_v1_Full_KG_NoDisjointness.owl |
PheKnowLator_v1_MergedOntologies_BioKG.owl |

Closed KGs | Not Closed KGs |
---|---|
PheKnowLator_v1_Full_BioKG_Closed_Triples_Integer_Labels_Map.json | PheKnowLator_v1_Full_BioKG_NoDisjointness_NotClosed_NoMetadataNodes.owl |
PheKnowLator_v1_Full_BioKG_NoDisjointness_Closed_ELK.owl | PheKnowLator_v1_Full_BioKG_NoDisjointness_NotClosed_Triples_Integers.txt |
PheKnowLator_v1_Full_BioKG_NoDisjointness_Closed_ELK_Reasoner_RESULTS.txt | PheKnowLator_v1_Full_BioKG_NoDisjointness_NotClosed_Triples_Integers_.bcsr |
PheKnowLator_v1_Full_BioKG_NoDisjointness_Closed_ELK_Triples_Integers.bcsr | PheKnowLator_v1_Full_BioKG_NoDisjointness_NotClosed_Triples_Labels.txt |
PheKnowLator_v1_Full_BioKG_NoDisjointness_Closed_ELK_Triples_Integers.txt | PheKnowLator_v1_Full_BioKG_NotClosed_Triples_Integer_Labels_Map.json |
PheKnowLator_v1_Full_BioKG_NoDisjointness_Closed_ELK_Triples_Labels.txt | |
PheKnowLator_v1_Full_BioKG_NoDisjointness_Closed_NoMetadataNodes.owl | |

Embeddings |
---|
closed_knowledge_graphs.zip |
not_closed_knowledge_graphs.zip |