⬆️ 🎨 Allow the use of gtdb taxonomy in Autometa #284
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,7 +26,7 @@ jobs: | |
strategy: | ||
matrix: | ||
os: [ubuntu-latest] | ||
python-version: [3.7] | ||
python-version: [3.8] | ||
Review comment: Is this necessary?
Reply: Yes, it's needed; otherwise one of the tests fails. |
||
env: | ||
OS: ${{ matrix.os }} | ||
PYTHON: ${{ matrix.python-version }} | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -17,7 +17,9 @@ | |||||||||||||
|
||||||||||||||
from Bio import SeqIO | ||||||||||||||
|
||||||||||||||
from autometa.taxonomy.database import TaxonomyDatabase | ||||||||||||||
from autometa.taxonomy.ncbi import NCBI | ||||||||||||||
from autometa.taxonomy.gtdb import GTDB | ||||||||||||||
from autometa.taxonomy import majority_vote | ||||||||||||||
from autometa.common.markers import load as load_markers | ||||||||||||||
|
||||||||||||||
|
@@ -226,16 +228,16 @@ def get_metabin_stats( | |||||||||||||
|
||||||||||||||
|
||||||||||||||
def get_metabin_taxonomies( | ||||||||||||||
bin_df: pd.DataFrame, ncbi: NCBI, cluster_col: str = "cluster" | ||||||||||||||
bin_df: pd.DataFrame, taxa_db: TaxonomyDatabase, cluster_col: str = "cluster" | ||||||||||||||
) -> pd.DataFrame: | ||||||||||||||
"""Retrieve taxonomies of all clusters recovered from Autometa binning. | ||||||||||||||
|
||||||||||||||
Parameters | ||||||||||||||
---------- | ||||||||||||||
bin_df : pd.DataFrame | ||||||||||||||
Autometa binning table. index=contig, cols=['cluster','length','taxid', *canonical_ranks] | ||||||||||||||
ncbi : autometa.taxonomy.ncbi.NCBI instance | ||||||||||||||
Autometa NCBI class instance | ||||||||||||||
taxa_db : autometa.taxonomy.database.TaxonomyDatabase instance | ||||||||||||||
Autometa NCBI or GTDB class instance | ||||||||||||||
cluster_col : str, optional | ||||||||||||||
Clustering column by which to group metabins | ||||||||||||||
|
||||||||||||||
|
@@ -246,7 +248,9 @@ def get_metabin_taxonomies( | |||||||||||||
Indexed by cluster | ||||||||||||||
""" | ||||||||||||||
logger.info(f"Retrieving metabin taxonomies for {cluster_col}") | ||||||||||||||
canonical_ranks = [rank for rank in NCBI.CANONICAL_RANKS if rank != "root"] | ||||||||||||||
canonical_ranks = [ | ||||||||||||||
rank for rank in TaxonomyDatabase.CANONICAL_RANKS if rank != "root" | ||||||||||||||
] | ||||||||||||||
is_clustered = bin_df[cluster_col].notnull() | ||||||||||||||
bin_df = bin_df[is_clustered] | ||||||||||||||
outcols = [cluster_col, "length", "taxid", *canonical_ranks] | ||||||||||||||
|
@@ -277,11 +281,13 @@ def get_metabin_taxonomies( | |||||||||||||
taxonomies[cluster][canonical_rank].update({taxid: length}) | ||||||||||||||
else: | ||||||||||||||
taxonomies[cluster][canonical_rank][taxid] += length | ||||||||||||||
cluster_taxonomies = majority_vote.rank_taxids(taxonomies, ncbi) | ||||||||||||||
cluster_taxonomies = majority_vote.rank_taxids(taxonomies, taxa_db=taxa_db) | ||||||||||||||
# With our cluster taxonomies, let's place these into a dataframe for easy data accession | ||||||||||||||
cluster_taxa_df = pd.Series(data=cluster_taxonomies, name="taxid").to_frame() | ||||||||||||||
# With the list of taxids, we'll retrieve their complete canonical-rank information | ||||||||||||||
lineage_df = ncbi.get_lineage_dataframe(cluster_taxa_df.taxid.tolist(), fillna=True) | ||||||||||||||
lineage_df = taxa_db.get_lineage_dataframe( | ||||||||||||||
cluster_taxa_df.taxid.tolist(), fillna=True | ||||||||||||||
) | ||||||||||||||
# Now put it all together | ||||||||||||||
cluster_taxa_df = pd.merge( | ||||||||||||||
cluster_taxa_df, lineage_df, how="left", left_on="taxid", right_index=True | ||||||||||||||
|
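The hunks above swap the concrete `NCBI` type for the shared `TaxonomyDatabase` type, so the function only relies on what every database class has in common. A minimal sketch of that pattern follows. The base class here is a stand-in written for illustration, the rank list is an assumption, and `get_lineage`, `ToyNCBI`, `ToyGTDB` and `lineage_rows` are hypothetical names with placeholder data, not Autometa's API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TaxonomyDatabase(ABC):
    """Stand-in base class (illustration only, not Autometa's implementation)."""

    # Assumed rank list; the diff only confirms that "root" is one of the ranks.
    CANONICAL_RANKS = [
        "species", "genus", "family", "order",
        "class", "phylum", "superkingdom", "root",
    ]

    @abstractmethod
    def get_lineage(self, taxid: int) -> Dict[str, str]:
        """Return a mapping of rank -> name for taxid (hypothetical method)."""


class ToyNCBI(TaxonomyDatabase):
    """Toy database backed by an in-memory dict of placeholder names."""

    def __init__(self) -> None:
        self._lineages = {1: {"species": "toy_species_a", "genus": "toy_genus_a"}}

    def get_lineage(self, taxid: int) -> Dict[str, str]:
        return self._lineages.get(taxid, {})


class ToyGTDB(TaxonomyDatabase):
    """Second toy database with its own placeholder names."""

    def __init__(self) -> None:
        self._lineages = {1: {"species": "toy_species_b", "phylum": "toy_phylum_b"}}

    def get_lineage(self, taxid: int) -> Dict[str, str]:
        return self._lineages.get(taxid, {})


def lineage_rows(taxids: List[int], taxa_db: TaxonomyDatabase) -> Dict[int, Dict[str, str]]:
    """Build one row per taxid using only the shared base-class interface."""
    ranks = [rank for rank in taxa_db.CANONICAL_RANKS if rank != "root"]
    rows = {}
    for taxid in taxids:
        lineage = taxa_db.get_lineage(taxid)
        # Fill ranks missing from the lineage with a placeholder value.
        rows[taxid] = {rank: lineage.get(rank, "unclassified") for rank in ranks}
    return rows
```

Because `lineage_rows` never names a concrete class, passing `ToyNCBI()` or `ToyGTDB()` changes the data source without changing the caller.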
@@ -323,11 +329,18 @@ def main(): | |||||||||||||
required=True, | ||||||||||||||
) | ||||||||||||||
parser.add_argument( | ||||||||||||||
"--ncbi", | ||||||||||||||
help="Path to user NCBI databases directory (Required for retrieving metabin taxonomies)", | ||||||||||||||
"--dbdir", | ||||||||||||||
help="Path to user taxonomy database directory (Required for retrieving metabin taxonomies)", | ||||||||||||||
metavar="dirpath", | ||||||||||||||
required=False, | ||||||||||||||
) | ||||||||||||||
parser.add_argument( | ||||||||||||||
"--dbtype", | ||||||||||||||
help="Taxonomy database type to use (NOTE: must correspond to the same database type used during contig taxon assignment.)", | ||||||||||||||
choices=["ncbi", "gtdb"], | ||||||||||||||
required=False, | ||||||||||||||
default="ncbi", | ||||||||||||||
) | ||||||||||||||
parser.add_argument( | ||||||||||||||
"--binning-column", | ||||||||||||||
help="Binning column to use for grouping metabins", | ||||||||||||||
|
@@ -377,14 +390,17 @@ def main(): | |||||||||||||
logger.info(f"Wrote metabin stats to {args.output_stats}") | ||||||||||||||
# Finally if taxonomy information is available then write out each metabin's taxonomy by modified majority voting method. | ||||||||||||||
if "taxid" in bin_df.columns: | ||||||||||||||
if not args.ncbi: | ||||||||||||||
if not args.dbdir: | ||||||||||||||
logger.warn( | ||||||||||||||
"taxid found in dataframe. --ncbi argument is required to retrieve metabin taxonomies. Skipping..." | ||||||||||||||
"taxid found in dataframe. --dbdir argument is required to retrieve metabin taxonomies. Skipping..." | ||||||||||||||
) | ||||||||||||||
else: | ||||||||||||||
ncbi = NCBI(dirpath=args.ncbi) | ||||||||||||||
if args.dbtype == "ncbi": | ||||||||||||||
taxa_db = NCBI(dbdir=args.dbdir) | ||||||||||||||
elif args.dbtype == "gtdb": | ||||||||||||||
taxa_db = GTDB(dbdir=args.dbdir) | ||||||||||||||
Comment on lines +398 to +401
Review comment: I'm not sure if this should be a dispatcher routine or to simply keep a list of
Suggested change
Reply: At least for me the |
||||||||||||||
taxa_df = get_metabin_taxonomies( | ||||||||||||||
bin_df=bin_df, ncbi=ncbi, cluster_col=args.binning_column | ||||||||||||||
bin_df=bin_df, taxa_db=taxa_db, cluster_col=args.binning_column | ||||||||||||||
) | ||||||||||||||
taxa_df.to_csv(args.output_taxonomy, sep="\t", index=True, header=True) | ||||||||||||||
|
||||||||||||||
|
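On the dispatcher question raised in the review comment on lines +398 to +401: one option is a lookup table from the `--dbtype` value to the class, so that supporting another database type means adding one entry. The sketch below is only an illustration of that idea. `NCBI` and `GTDB` are stand-ins that accept a `dbdir` keyword as in the diff, and `TAXA_DB_TYPES`, `build_taxa_db` and `parse_args` are hypothetical names, not part of Autometa.

```python
import argparse


class NCBI:
    """Stand-in for autometa.taxonomy.ncbi.NCBI (illustration only)."""

    def __init__(self, dbdir: str):
        self.dbdir = dbdir


class GTDB:
    """Stand-in for autometa.taxonomy.gtdb.GTDB (illustration only)."""

    def __init__(self, dbdir: str):
        self.dbdir = dbdir


# Hypothetical registry: maps each --dbtype choice to its class.
TAXA_DB_TYPES = {"ncbi": NCBI, "gtdb": GTDB}


def build_taxa_db(dbtype: str, dbdir: str):
    """Instantiate the taxonomy database class registered for dbtype."""
    try:
        taxa_db_cls = TAXA_DB_TYPES[dbtype]
    except KeyError:
        raise ValueError(f"Unknown dbtype: {dbtype!r}") from None
    return taxa_db_cls(dbdir=dbdir)


def parse_args(argv=None):
    """Parse only the two database-related options shown in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dbdir", metavar="dirpath", required=False)
    # Deriving choices from the registry keeps the CLI and dispatcher in sync.
    parser.add_argument("--dbtype", choices=sorted(TAXA_DB_TYPES), default="ncbi")
    return parser.parse_args(argv)
```

With this layout, `build_taxa_db(args.dbtype, args.dbdir)` would replace the `if`/`elif` chain. Whether that is worth it for two database types is a judgment call.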
Review comment: Why is the Python version for tests being changed here? Is this necessary for `gtdb_to_taxdump` installation?
Reply: It was giving an error with 3.7 (link). `from typing import Union, List, Literal` fails there because `Literal` is only available in Python >=3.8.
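If Python 3.7 support were still wanted, the `Literal` import could be guarded instead of raising the minimum version. This is a sketch under assumptions: it relies on the third-party `typing_extensions` backport being installed on 3.7, and `DbType` and `normalize_dbtypes` are hypothetical names, not taken from the PR.

```python
import sys
from typing import List, Union

if sys.version_info >= (3, 8):
    from typing import Literal
else:
    # Python 3.7 fallback: needs the third-party typing_extensions package.
    from typing_extensions import Literal

# Hypothetical alias for the two database types offered by --dbtype.
DbType = Literal["ncbi", "gtdb"]


def normalize_dbtypes(dbtypes: Union[DbType, List[DbType]]) -> List[DbType]:
    """Return the requested database type(s) as a list.

    Literal is a hint for static type checkers; nothing is enforced at runtime.
    """
    if isinstance(dbtypes, str):
        return [dbtypes]
    return list(dbtypes)
```

Bumping the test matrix to 3.8, as this PR does, avoids the extra dependency.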