merge function doubt #369

Open
jamorillo opened this issue May 24, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation


@jamorillo

jamorillo commented May 24, 2024

Dear Chi,
I have a question about the merge_taxa function of microeco. Since it collapses (combines) all ASVs/OTUs into genera, for example, I would expect the row names of the OTU table to be genus names, but they remain OTU IDs. How can I do this within microeco? I need all the data collapsed into genera.
For example:

```
dataset

microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 404 rows and 90 columns
tax_table have 404 rows and 7 columns
phylo_tree have 404 tips
rep_fasta have 404 sequences

dataset_Genus <- dataset$merge_taxa(taxa = "Genus")
dataset_Genus

microtable-class object:
sample_table have 90 rows and 4 columns
otu_table have 180 rows and 90 columns
tax_table have 180 rows and 6 columns
```

-> OK, 180 genera. BUT:

```
row.names(dataset_Genus$otu_table)

[1] "OTU_16" "OTU_172" "OTU_150" "OTU_12" "OTU_357" "OTU_1" "OTU_5" "OTU_34" "OTU_82" "OTU_102" "OTU_36" "OTU_2" ...
```

-> This is where I would expect genus names like "Bacillus" or "Pseudomonas", with all ASVs belonging to the same genus collapsed into one row.

@ChiLiubio
Owner

Hi @jamorillo,
The main reason is that the collapsed genera contain a lot of unclassified information from different families, e.g. multiple "g__" entries in the Genus column of tax_table. If we used genus names directly as row names, these "g__" rows would be merged into one, which would discard some of the unclassified information. If we combined the names of all taxonomic levels into the row names, they would not be readable. So the best option is to select one representative OTU/ASV to temporarily represent its genus. This does not affect any of the following analyses, because the collapsed data (a microtable object) has exactly the same format as the original one, which is a very important principle of the pipeline. If you want to use genus names instead of OTU/ASV IDs, you can replace them directly. Here is an example.

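```r
library(microeco)
library(magrittr)  # provides the %<>% assignment pipe
data(dataset)
# collapse OTUs/ASVs to genus level
test <- dataset$merge_taxa("Genus")
# delete duplicated names, e.g. g__ or other repeated genus names
test$tax_table %<>% .[! duplicated(.$Genus), ]
# delete the remaining g__ if necessary
test$tax_table %<>% .[.$Genus != "g__", ]
# trim the other tables so they match the taxa kept in tax_table
test$tidy_dataset()
# replace the representative OTU IDs with genus names (prefix "g__" removed)
rownames(test$otu_table) <- rownames(test$tax_table) <- gsub("g__", "", test$tax_table$Genus)
```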
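The sketch below makes the "g__" problem concrete. It uses a hypothetical toy table and only base R. It does not call microeco and is not how `merge_taxa` works internally.

- Summing OTUs with `rowsum()` by the Genus column alone merges the unclassified "g__" OTUs from two different families into one row.
- Grouping by the full Family|Genus lineage keeps them apart.
- The last step shows one possible readable naming scheme, which is an assumption and not part of microeco. It labels unclassified genera after their family and uses `make.unique()` to guard against repeated names.

```r
# Hypothetical toy data: 5 OTUs in 2 samples, with "f__"/"g__" prefixes
# in the same style as the microeco example dataset.
tax <- data.frame(
  Family = c("f__Bacillaceae", "f__Bacillaceae", "f__Pseudomonadaceae",
             "f__Bacillaceae", "f__Pseudomonadaceae"),
  Genus  = c("g__Bacillus", "g__Bacillus", "g__Pseudomonas", "g__", "g__"),
  row.names = paste0("OTU_", 1:5),
  stringsAsFactors = FALSE
)
# abundance matrix, filled column by column: S1 then S2
otu <- matrix(c(10, 5, 7, 3, 4,
                0, 2, 6, 1, 8),
              ncol = 2,
              dimnames = list(paste0("OTU_", 1:5), c("S1", "S2")))

# Collapsing by Genus alone: OTU_4 (Bacillaceae) and OTU_5 (Pseudomonadaceae)
# are summed into a single "g__" row (7 and 9), although they differ in family.
by_genus <- rowsum(otu, group = tax$Genus)

# Collapsing by the full lineage keeps the two unclassified groups separate.
lineage <- paste(tax$Family, tax$Genus, sep = "|")
by_lineage <- rowsum(otu, group = lineage)

# Hypothetical naming scheme: strip the prefixes, call unclassified genera
# "unclassified_<Family>", and make every label unique.
fam   <- sub("^f__", "", sub("\\|.*$", "", rownames(by_lineage)))
genus <- sub("^g__", "", sub("^.*\\|", "", rownames(by_lineage)))
labels <- ifelse(genus == "", paste0("unclassified_", fam), genus)
rownames(by_lineage) <- make.unique(labels)

print(by_genus)
print(by_lineage)
```

The example above drops duplicated genera. This naming scheme instead keeps the abundance of unclassified groups, at the cost of less uniform row names.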

ChiLiubio added the documentation label on May 25, 2024
@jamorillo
Author

Aha, I understand the reason. I tested your pipeline and it works perfectly. My idea for this "collapsed" table is to use it for specific heatmaps of selected genera, so it is useful to have the genus names already in the row names.
Thanks a lot once more,
jose
