Analyse sample growth on multiple media (multi threading) #164
Comments
Yeah, good idea. I think it makes sense to allow that. There is a way to make this work right now if you exploit the sample-specific media and are fine with storing the models multiple times. The trick is to create one sample with a unique id for each medium and have a matching sample_id in each medium. Here is an example with the test data included in MICOM. We will simulate an E. coli community with a normal and an anaerobic medium.

import micom as mm
from micom.workflows import build, grow
import pandas as pd
tax = mm.data.test_data(1) # only one sample
tax["sample_id"] = "aerobic"
tax_noo2 = tax.copy()
tax_noo2["sample_id"] = "anaerobic"
multi_tax = pd.concat([tax, tax_noo2])
# We now have a table with the same 3 taxa under two sample ids, so we build those models
manifest = build(multi_tax, mm.data.test_db, "models")
# Now we build the sample-specific media
med = mm.qiime_formats.load_qiime_medium(mm.data.test_medium)
med_noo2 = med[med.reaction != "EX_o2_m"].copy()
med["sample_id"] = "aerobic"
med_noo2["sample_id"] = "anaerobic"
multi_med = pd.concat([med, med_noo2])
# The rest is straightforward
# This runs in parallel and makes sure the RAM is freed after each sample
results = grow(manifest, "models", multi_med, 1.0, threads=2)
# And let's have a look
print(results.growth_rates)

This gives you a table of growth rates with one row per taxon in each of the two samples.
But that is pretty inefficient in terms of storage.
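The same trick scales to many media if the duplication is scripted: clone the taxonomy once per medium and tag each medium table with the matching sample_id. Below is a minimal sketch along those lines, assuming media is a dict mapping a label to a medium DataFrame in the format load_qiime_medium returns (the dict and its labels are illustrative, not part of the example above).

import pandas as pd
import micom as mm
from micom.workflows import build, grow

# Start from the single-sample taxonomy and two illustrative media
tax = mm.data.test_data(1)
base_med = mm.qiime_formats.load_qiime_medium(mm.data.test_medium)
media = {
    "aerobic": base_med,
    "anaerobic": base_med[base_med.reaction != "EX_o2_m"],
}

# Clone the sample once per medium and give each medium the matching sample_id
tax_tables, med_tables = [], []
for label, med in media.items():
    t = tax.copy()
    t["sample_id"] = label
    m = med.copy()
    m["sample_id"] = label
    tax_tables.append(t)
    med_tables.append(m)
multi_tax = pd.concat(tax_tables)
multi_med = pd.concat(med_tables)

# build() writes one set of models per sample_id, which is the storage cost noted above
manifest = build(multi_tax, mm.data.test_db, "models")
results = grow(manifest, "models", multi_med, 1.0, threads=2)
print(results.growth_rates)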
Hi @cdiener. This is interesting! I didn't know one could give specific tags to media as well! Yeah... the process is quite wasteful because one is forced to create multiple duplicate models, but it is better than the alternative multiprocessing wrapper I wrote, which suffered from an inexplicable RAM buildup (likely due to termination issues with the created worker processes).
The RAM buildup is because of opencobra/cobrapy#568, which affects multiple solvers. You can get around that by setting
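The exact setting is not preserved above, but independent of it, one general way to keep leaked solver memory from accumulating is to recycle worker processes so the memory is returned to the OS when each process exits. A minimal sketch with Python's standard multiprocessing module; simulate_one_medium is a hypothetical worker that would load the model, apply one medium, and run the simulation.

from multiprocessing import Pool

def simulate_one_medium(medium_id):
    # Hypothetical worker: load the community model, apply the medium with
    # this id, run the simulation, and return the growth rates. Any solver
    # memory that is never released dies with the worker process.
    ...

if __name__ == "__main__":
    medium_ids = ["aerobic", "anaerobic"]  # placeholder list of media to test
    # maxtasksperchild=1 replaces every worker after a single task, so the
    # leak cannot pile up across many simulations
    with Pool(processes=2, maxtasksperchild=1) as pool:
        results = pool.map(simulate_one_medium, medium_ids)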
Is your feature related to a problem? Please describe it.
I want to analyse the microbial growth rates for a given microbiome profile (single sample) when run on multiple different diets (or media). The idea is to understand how individual growth rates are impacted by changes in the medium. Running multiple samples on the same medium through grow with the appropriate models has built-in multithreading to take advantage of a multi-core CPU, but growth of a single sample on multiple media doesn't have a multithreading option.

Describe the solution you would like.
Given that grow for a single sample uses only 1 thread, I want to be able to run growth calculations on different media (100-200 types of media) using multithreading so that all available threads are used simultaneously.

Describe alternatives you considered
I currently use the pool.starmap function from the standard Python multiprocessing library. I am facing an issue of memory pile-up (likely because worker processes are not deallocated/terminated even after the calculation completes) that causes the program to crash due to heavy RAM usage.
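For reference, here is a rough sketch of the kind of pool.starmap wrapper described above, evaluating one pickled MICOM community on several media. The model path and the media dictionaries are placeholders, and maxtasksperchild=1 applies the process-recycling idea sketched earlier so that unreleased solver memory cannot pile up.

import micom
import pandas as pd
from multiprocessing import Pool

def grow_on_medium(model_file, label, medium):
    # Load the pickled community, apply one medium, and run cooperative tradeoff
    com = micom.load_pickle(model_file)
    com.medium = medium  # dict of exchange reaction id -> maximum import flux
    sol = com.cooperative_tradeoff(fraction=1.0)
    rates = sol.members.growth_rate.dropna()
    rates.name = label
    return rates

if __name__ == "__main__":
    model_file = "models/my_sample.pickle"  # placeholder path to a built community
    media = {  # illustrative media, keyed by a label
        "aerobic": {"EX_glc__D_m": 10.0, "EX_o2_m": 20.0},
        "anaerobic": {"EX_glc__D_m": 10.0},
    }
    args = [(model_file, label, med) for label, med in media.items()]
    # Recycling each worker after one medium keeps unreleased solver memory
    # from accumulating across runs
    with Pool(processes=2, maxtasksperchild=1) as pool:
        per_medium = pool.starmap(grow_on_medium, args)
    print(pd.concat(per_medium, axis=1))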