Goals
The original idea has two questions of interest in mind:
Can we describe a pangenome in more detail than a simple "clustering families of genes"?
Can we use the topology of pangenomes to define features that allow us to compare them?
Motivation
To answer these questions we chose topology, specifically persistent homology (PH) and other tools from topological data analysis (TDA), because these techniques have been applied in other settings in ways that parallel this one:
For example, PH can be used to "measure" how far the metric on a set of sequences is from being additive (with applications to evolution and complex events).
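To make "additive" concrete: a metric is additive (tree-like) when it satisfies the four-point condition, meaning that for any four points the two largest of the three pairwise-sum pairings are equal. The sketch below does not compute PH. It only checks that condition directly on a toy metric, as a rough illustration of what "distance from additive" could mean. The function name `four_point_defect` and both toy metrics are assumptions made for this example.

```python
from itertools import combinations


def four_point_defect(points, d):
    """Largest gap between the two biggest pair sums over all quadruples.

    The result is zero exactly when d satisfies the four-point condition
    on the given points, i.e. when d is additive (tree-like) on them.
    """
    worst = 0.0
    for x, y, z, w in combinations(points, 4):
        sums = sorted([d(x, y) + d(z, w), d(x, z) + d(y, w), d(x, w) + d(y, z)])
        worst = max(worst, sums[2] - sums[1])
    return worst


def line_metric(i, j):
    """Distance along a path 0 - 1 - 2 - 3 (a tree)."""
    return abs(i - j)


def cycle_metric(i, j):
    """Shortest-path distance on a 4-cycle (not a tree)."""
    return min(abs(i - j), 4 - abs(i - j))


tree_defect = four_point_defect(range(4), line_metric)
cycle_defect = four_point_defect(range(4), cycle_metric)
```

The path metric has zero defect, while the 4-cycle does not. A cycle like this is the kind of loop that degree-1 PH is designed to detect.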
Suggested Pipelines and analysis
1. Compute a metric function between the genes in a pangenome.
2. Compute the filtration associated with the usual Čech/Vietoris-Rips construction on the metric space obtained in step 1.
3. Explore two options to generate features of a pangenome:
   a. Persistence of simplexes. Each simplex defines a family (in the pangenomic sense) to which we can assign birth and death times; the death time is the first time the simplex is absorbed into another one.
   b. Persistent homology.
4. Explore useful features that give insight into the pangenomes at hand.
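The first two steps can be sketched in plain Python as follows. This is a minimal illustration, not the pipeline itself. The normalized Hamming distance, the `genome|gene` identifier scheme, the toy sequences, and the names `hamming` and `rips_filtration` are all assumptions made for the example; the choice of metric is left open above. In a Vietoris-Rips filtration a simplex appears at the largest pairwise distance among its vertices.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def rips_filtration(names, dist, max_dist, max_dim):
    """Return {simplex (sorted tuple of gene names): filtration value}.

    A simplex enters the filtration at the largest pairwise distance among
    its vertices; simplexes whose value exceeds max_dist are dropped.
    """
    filtration = {}
    for k in range(1, max_dim + 2):  # a simplex with k vertices has dimension k - 1
        for simplex in combinations(sorted(names), k):
            pairs = combinations(simplex, 2)
            value = max((dist[frozenset(p)] for p in pairs), default=0.0)
            if value <= max_dist:
                filtration[simplex] = value
    return filtration


# Toy genes with hypothetical "genome|gene" identifiers and made-up sequences.
genes = {
    "g1|a": "ACGTACGT",
    "g2|a": "ACGTACGA",
    "g3|a": "ACGTTCGA",
    "g1|b": "TTTTGGGG",
}
dist = {
    frozenset((u, v)): hamming(genes[u], genes[v])
    for u, v in combinations(genes, 2)
}
filtration = rips_filtration(genes, dist, max_dist=0.3, max_dim=2)
```

Enumerating all vertex subsets is only practical for tiny inputs. A real analysis would more likely rely on a TDA library, which this sketch deliberately avoids so that it stays self-contained.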
We decided to explore the persistence-of-simplexes route first, since it offers more obvious ways to associate the genome information with the topological structure.
NOTE: Some key parameters must be decided at this stage:
Distance function between the genes.
Maximum distance of association (when are two genes too far apart to be part of the same family?).
Maximum dimension of a simplex in the complex.
Persistence of simplexes
Once we have the persistence of each simplex, we can associate with it the genomes present in that simplex (the genomes that contain the genes in that simplex). This information can be used to label the family and, moreover, to classify each simplex into the category of genes shared by those genomes.
Note that this process yields a hierarchical clustering of the families and lets us weight each family by the persistence of the simplex that represents it.
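One possible reading of this step, sketched under stated assumptions: the birth of a simplex is its filtration value, and its death is the smallest filtration value of any simplex that strictly contains it (infinite if none exists in the truncated complex). The hard-coded toy filtration, the `genome|gene` identifier scheme, and the names `simplex_persistence` and `genomes_of` are hypothetical and only serve the illustration.

```python
import math


def simplex_persistence(filtration):
    """Map each simplex to (birth, death).

    birth: filtration value at which the simplex appears.
    death: smallest filtration value of a simplex that strictly contains it,
           or math.inf if no such simplex is present.
    """
    result = {}
    for simplex, birth in filtration.items():
        vertices = set(simplex)
        coface_values = [v for s, v in filtration.items() if vertices < set(s)]
        result[simplex] = (birth, min(coface_values, default=math.inf))
    return result


def genomes_of(simplex):
    """Genomes contributing a gene to the simplex (ids look like 'genome|gene')."""
    return frozenset(gene.split("|")[0] for gene in simplex)


# Toy filtration: simplex (tuple of gene ids) -> filtration value.
filtration = {
    ("g1|a",): 0.0, ("g2|a",): 0.0, ("g3|a",): 0.0, ("g1|b",): 0.0,
    ("g1|a", "g2|a"): 0.125, ("g2|a", "g3|a"): 0.125, ("g1|a", "g3|a"): 0.25,
    ("g1|a", "g2|a", "g3|a"): 0.25,
}

persistence = simplex_persistence(filtration)
for simplex, (birth, death) in persistence.items():
    # Each row: the family, the genomes that label it, and its weight.
    print(simplex, sorted(genomes_of(simplex)), birth, death, death - birth)
```

Under this reading the containment relation between simplexes gives the hierarchy, and `death - birth` gives the weight of each family.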
Next steps
Complete the pipeline and compare the output with a classic pangenomic analysis.
Use this structure to suggest a more refined labeling of the families of genes (core, shell, cloud).
Use this structure to define features to compare pangenomes.
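As a baseline for that comparison, the classic occupancy-based labeling can be written in a few lines. This is a sketch of the conventional scheme, not the refined labeling proposed here. The function name `label_family` and the thresholds `core_min` and `cloud_max` are assumptions; pangenome tools use different cut-offs.

```python
def label_family(family_genomes, n_genomes, core_min=0.95, cloud_max=0.15):
    """Label a gene family by the fraction of genomes in which it occurs.

    core_min and cloud_max are illustrative thresholds, not fixed standards.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    fraction = len(set(family_genomes)) / n_genomes
    if fraction >= core_min:
        return "core"
    if fraction < cloud_max:
        return "cloud"
    return "shell"


# Hypothetical genome sets attached to three families.
labels = [
    label_family({"g1", "g2", "g3"}, n_genomes=3),
    label_family({"g1", "g2"}, n_genomes=4),
    label_family({"g1"}, n_genomes=10),
]
```

A refined labeling could combine this fraction with the persistence of the simplex that represents the family, but how to do so is still an open design question.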