Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using query_missingGenes_from_module on multiple modules #3

Open
arianccbasile opened this issue Jun 10, 2019 · 3 comments
Open

using query_missingGenes_from_module on multiple modules #3

arianccbasile opened this issue Jun 10, 2019 · 3 comments

Comments

@arianccbasile
Copy link

Dear Andreas,
I am trying to use your script query_missingGenes_from_module to spot missing genes from module but I need to do it recursively on multiple modules (a list provided) and then write the result in a file. When I do it on one module only I have no problem but when I put my list in a vector and call with a loop the elements of the vector to be my "ID_MODULE" I don't manage to format the output correctly.

Any hint? Thank you in advantage and for having provided this amazing resource.
Sincerely,
Arianna

@asmvernon
Copy link
Collaborator

Hi Arianna,
I'm glad you're finding it useful!

The query_missingGenes_from_module returns a data frame where MISSING_GENES contains a list of the genes. There are a few options on how you can then retrieve these data and aggregate it for multiple modules depending on what you want to do with it afterward.

Here's some code with some options. Hope one of them help guide you to best achieve your goal:

library(Metqy)
# install.packages('reshape2')

data("data_example_KOnumbers_vector")
modules <- c('M00001', 'M00002')

module_missing_genes_str <- data.frame('MODULE' = modules,
                                   'MISSING_GENES' = '',
                                   stringsAsFactors = FALSE)

# Collect pairs of modules and each missing gene by constructnig the following structure:
#
#   MODULE  MISSING_GENE
#   M00001 K00001
#   M00001 K00002
#   M00001 K00003
#   M00002 K00004
module_missing_genes_pairs <- NULL
  

for(M in 1:length(modules)){
  OUT <- query_missingGenes_from_module(data_example_KOnumbers_vector,modules[M])
  these_genes_list <- OUT$MISSING_GENES
  # collapse list into string
  these_genes_str <- paste(these_genes_list, collapse=',')
  module_missing_genes_str$MISSING_GENES[M] <- these_genes_str
  
  # Append the combination of the module ID repeted as many times as there are missing genes 
  #                             combine rows together
  module_missing_genes_pairs <- rbind(module_missing_genes_pairs,
                                      # combine two list into pair-wise columns
                                      cbind(rep(modules, length(these_genes_list)), these_genes_list))
  
}
# Tiddy module_missing_genes_pairs up 
module_missing_genes_pairs <- data.frame(module_missing_genes_pairs, stringsAsFactors = FALSE)
names(module_missing_genes_pairs) <- c('MODULE', 'MISSING_GENES')

# GENERATE A MATRIX OF THE MODULES AND THE MISSING GENES WHERE '1' INDICATES THE GENE IS MISSING FOR THAT MODULE
module_missing_genes_pairs$Value <- 1
module_missing_genes_matrix <- reshape2::dcast(module_missing_genes_pairs,formula = MODULE~MISSING_GENES,value.var = "Value",drop = F,fill = 0)
# Use the module ID to name the rows
rownames(module_missing_genes_matrix) <- module_missing_genes_matrix[,1]
# Remove the colums with the module IDS to have a numeric matrix

module_missing_genes_matrix           <- module_missing_genes_matrix[,2:ncol(module_missing_genes_matrix)]

You can find the script here

Let me know if you need any further help!

Also, I'd be interested to know what you are using it for :)

All the best,
Andrea

@arianccbasile
Copy link
Author

It sounds super good! I will try it asap. I was using it for gapfilling of metabolic models but actually I abandoned this way and used an other tool at the very ending. Now this pipeline came to my mind because I am doing metagenome assembled genomes annotation and I am back to this script.

Random question, do you plan to update the db?

All the best,
Arianna

@asmvernon
Copy link
Collaborator

Hi Arianna,
Hope you find it useful!

I don't plan to update the db, but you can pass updated data if you have access to it :)
Please look at the documentation on how to do this by using the use_module_reference_table and use_genome_reference_table argument fields

Best,
A

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants