v1 Build Details
All data for this release are available through the following links:
Data Download Date: November 30, 2018 - Data Source Details
Downloaded Resource Information: Ontologies
Downloaded Resource Information: Classes
- Human Disease Ontology
- Gene Ontology: gene associations
- Reactome: gene associations
- Human Phenotype Ontology: all source annotations - genes to phenotypes
- Human Phenotype Ontology: all source annotations - diseases to genes to phenotypes
Downloaded Resource Information: Instances
- CTD: chemicals-genes
- CTD: chemicals-pathways
- CTD: chemicals-diseases
- CTD: genes-pathways
- CTD: diseases-pathways
- STRING DB: Proteins
- STRING DB: Entrez gene mappings
We worked with a PhD-level biologist to develop a knowledge representation (see Figure 1 below) that modeled mechanisms underlying human disease. To do this, we manually mapped all possible combinations of the following six node types:
- Human Diseases
- Human Phenotypes
- Human Genes
- Gene Ontology concepts
- Reactome Pathways
- Chemicals
As shown in Figure 1, the Basic Formal Ontology and the Relation Ontology were then used to create edges between the node types. The downloaded resource information used to generate these edges can be accessed here.
As shown in this figure, the following edge types were created:
- Phenotypes-Genes: The Human Phenotype Ontology (HP) provides phenotype-Entrez gene annotations that were used to map 6,651 HP classes to 120,288 Entrez genes.
- Phenotypes-Diseases: The HP provides HP-DOID-Gene annotations that were used to map 5,438 HP concepts to 43,817 DOID concepts.
- Biological Processes, Molecular Functions, and Cellular Locations-Genes: The Gene Ontology (GO) provides GO-Gene annotations that were used to map 17,505 GO concepts to 265,002 Entrez genes.
- Biological Processes, Molecular Functions, and Cellular Locations-Pathways: Reactome provides GO-Pathway links that were used to map 17,906 pathways to 1,910 biological processes, molecular functions, and cellular locations.
- Chemicals-Pathways: The Comparative Toxicogenomics Database (CTD) provides Chemical-Pathway links that were used to map 8,886 MeSH concepts to 711,043 Reactome pathways.
- Chemicals-Genes: The Comparative Toxicogenomics Database (CTD) provides Chemical-Gene links that were used to map 8,881 MeSH concepts to 410,379 Entrez genes.
- Chemicals-Diseases: The Comparative Toxicogenomics Database (CTD) provides Chemical-Disease links that were used to map 14,238 MeSH concepts to 1,216,900 DOID concepts.
- Genes-Genes: The STRING Database provides Gene-Gene links that were used to create 594,100 gene-gene interactions. When generating these mappings, only inferred protein-protein relationships considered to be high confidence (a score of 700 or better) were used.
- Genes-Disease: Mappings between genes and diseases were retrieved from DisGeNET via its SPARQL endpoint and used to map 6,051 Entrez genes to 20,452 DOID concepts.
- Genes-Pathways: The Comparative Toxicogenomics Database (CTD) provides Gene-Pathway links that were used to map 110,370 Entrez genes to 107,029 Reactome pathways.
- Pathways-Disease: The Comparative Toxicogenomics Database (CTD) provides Pathway-Disease links that were used to map 1,818 Reactome pathways to 106,727 DOID concepts.
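The high-confidence filter used for the Genes-Genes edges (keep only STRING interactions with a score of 700 or better) can be sketched as follows. This is a minimal illustration, not the project's code. It assumes the input follows the whitespace-delimited `protein1 protein2 combined_score` layout of STRING's `protein.links` downloads, and that `protein_to_entrez` is a hypothetical dictionary built from the STRING Entrez gene mappings. The protein IDs and gene numbers in the usage lines are made up.

```python
from typing import Dict, Iterable, Set, Tuple

HIGH_CONFIDENCE = 700  # STRING score threshold reported for this release


def high_confidence_gene_pairs(
    lines: Iterable[str],
    protein_to_entrez: Dict[str, str],
    min_score: int = HIGH_CONFIDENCE,
) -> Set[Tuple[str, str]]:
    """Return undirected Entrez gene-gene pairs whose STRING score >= min_score."""
    pairs: Set[Tuple[str, str]] = set()
    for line in lines:
        fields = line.split()
        # Skip the header row and any malformed lines.
        if len(fields) != 3 or fields[0] == "protein1":
            continue
        prot1, prot2, score = fields
        if int(score) < min_score:
            continue
        gene1 = protein_to_entrez.get(prot1)
        gene2 = protein_to_entrez.get(prot2)
        # Drop proteins with no Entrez mapping, and self-interactions.
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue
        # Sort so that (A, B) and (B, A) collapse into a single edge.
        a, b = sorted((gene1, gene2))
        pairs.add((a, b))
    return pairs


# Usage with hypothetical STRING rows and a hypothetical protein-to-Entrez map.
links = [
    "protein1 protein2 combined_score",
    "9606.ENSP0000000A 9606.ENSP0000000B 950",
    "9606.ENSP0000000B 9606.ENSP0000000A 950",
    "9606.ENSP0000000A 9606.ENSP0000000C 420",
]
mapping = {
    "9606.ENSP0000000A": "111",
    "9606.ENSP0000000B": "222",
    "9606.ENSP0000000C": "333",
}
print(high_confidence_gene_pairs(links, mapping))  # {('111', '222')}
```

STRING lists each interaction in both directions, so the sketch sorts each pair to keep one undirected edge per gene pair.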
The knowledge graph represented above was built using the following steps:
- Merge Ontologies: Merge ontologies using the OWL Tools API.
- Express New Ontology Concept Annotations: Create new ontology annotations by asserting a relation between the instance and an instance of the ontology class. For example, to assert the following relation:
  Morphine --> is substance that treats --> Migraine
  we would need to create two axioms:
  - isSubstanceThatTreats(Morphine, x1)
  - instanceOf(x1, Migraine)
  While the instance of the HP class hemiplegic migraines could be treated as an anonymous node in the knowledge graph, we instead generate a new internationalized resource identifier (IRI) for each newly generated instance.
- Deductively Close Knowledge Graph: The knowledge graph is deductively closed using the OWL 2 EL reasoner ELK, run via Protégé v5.1.1. ELK can classify instances and supports inference over ontology class hierarchies and object properties, including inference over disjointness, intersection, and existential quantification.
- Generate Edge List: The final step before exporting the edge list is to remove any nodes that are not biologically meaningful or that would otherwise reduce the performance of the machine learning algorithms, including the algorithm used to generate embeddings.
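The "Express New Ontology Concept Annotations" and "Generate Edge List" steps can be illustrated with plain subject-predicate-object triples. This is a minimal sketch under stated assumptions, not the project's implementation: the real build worked on OWL with the OWL Tools API and ELK, which this does not reproduce. The namespace `https://example.org/pkl/`, the Morphine, Migraine, and relation IRIs, and the choice of `owl:Thing` as a node to drop are all hypothetical. The fresh instance IRI is derived with `uuid5` from the full assertion, so each assertion gets its own instance and re-running the build yields the same IRI.

```python
import uuid
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"
BASE = "https://example.org/pkl/"  # hypothetical namespace for minted instances


def annotate(subject: str, relation: str, ontology_class: str) -> List[Triple]:
    """Express `subject --relation--> ontology_class` as two instance-level axioms.

    relation(subject, x1) and instanceOf(x1, ontology_class), where x1 receives
    its own IRI instead of remaining an anonymous (blank) node.
    """
    # Deterministic IRI: the same assertion always mints the same instance.
    key = f"{subject}|{relation}|{ontology_class}"
    instance = BASE + "instance_" + uuid.uuid5(uuid.NAMESPACE_URL, key).hex
    return [
        (subject, relation, instance),         # isSubstanceThatTreats(Morphine, x1)
        (instance, RDF_TYPE, ontology_class),  # instanceOf(x1, Migraine)
    ]


def export_edge_list(triples: Iterable[Triple], excluded: Set[str]) -> List[Tuple[str, str]]:
    """Drop triples that touch an excluded node and return (subject, object) edges."""
    return [(s, o) for s, _, o in triples if s not in excluded and o not in excluded]


# Usage with hypothetical IRIs.
MORPHINE = BASE + "Morphine"
MIGRAINE = BASE + "Migraine"
TREATS = BASE + "isSubstanceThatTreats"

triples = annotate(MORPHINE, TREATS, MIGRAINE) + [(MIGRAINE, SUBCLASS, OWL_THING)]
edges = export_edge_list(triples, excluded={OWL_THING})
print(len(edges))  # 2
```

Which nodes count as "not biologically meaningful" is a project decision; the `excluded` set here only stands in for that list.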
A modified version of the DeepWalk algorithm was implemented to generate molecular mechanism embeddings from the biomedical knowledge graph. A t-SNE plot of the dimensionality-reduced mechanism embeddings is shown in Figure 2 below. For this release, the hyperparameters were set to 512 dimensions, 100 walks, a walk length of 20, and a window of 10.
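The corpus-generation stage of DeepWalk, using this release's walk settings, can be sketched as follows. This is a minimal sketch of standard, unmodified DeepWalk (uniform random walks over an undirected graph, 100 walks per node of length 20, then skip-gram (target, context) pairs within a window of 10). It does not reproduce the project's modifications, which are not described here, and it omits training the 512-dimensional skip-gram model itself (for example with a word2vec implementation). The toy edge list in the usage lines is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hyperparameters reported for this release.
NUM_WALKS = 100   # walks started from each node
WALK_LENGTH = 20  # nodes per walk
WINDOW = 10       # skip-gram context window
DIMENSIONS = 512  # embedding size for the (omitted) skip-gram training step


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Treat the edge list as undirected, as standard DeepWalk does."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walks(
    adj: Dict[str, List[str]],
    num_walks: int = NUM_WALKS,
    walk_length: int = WALK_LENGTH,
    seed: int = 0,
) -> List[List[str]]:
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks: List[List[str]] = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # new start order on each pass over the graph
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def skipgram_pairs(walk: List[str], window: int = WINDOW) -> List[Tuple[str, str]]:
    """(target, context) pairs within `window` positions, as fed to skip-gram."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Usage on a hypothetical four-node graph.
edges = [("geneA", "geneB"), ("geneB", "pathwayX"), ("pathwayX", "diseaseY")]
adj = build_adjacency(edges)
walks = random_walks(adj)
print(len(walks))  # 400 (100 walks from each of 4 nodes)
```

Each walk plays the role of a "sentence" and each node the role of a "word", which is what lets a word-embedding model learn node embeddings from the graph.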
- Knowledge Graph Output
- Embedding Output: embeddings.zip
This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details. If you intend to use any of the information on this wiki, please provide appropriate attribution by citing this repository:
@software{callahan_tj_2019_3830982,
author = {Callahan, TJ and
William A. Baumgartner Jr and
Ignacio J. Tripodi and
Adrianne L. Stefanski and
Jordan M. Wyrwa and
Lawrence Hunter},
title = {PheKnowLator},
month = mar,
year = 2019,
note = {{Newer version of the v1.0.0 release that includes output data generated by this code.}},
publisher = {Zenodo},
version = {v.1.0.0},
doi = {10.5281/zenodo.3830982},
url = {https://doi.org/10.5281/zenodo.3830982}
}