threading question #85
FYI, I do see the section in the docs that says the default is 4; I am more curious where that is set, given that the values I see in the config file suggest 1.
First, regarding 1 vs 2: Snakemake doesn't really have an internal notion of samples; it only considers jobs. It builds a DAG of all jobs that need to be run, and then runs each one as soon as all of its dependencies are met (e.g., SNV calling needs an alignment first and won't start before that is done) and as soon as enough resources are free (e.g., enough threads are available). Currently, it traverses the DAG breadth-first, so it will tend to run most of the samples in parallel (i.e., the alignment jobs will tend to all be started before the SNV calling jobs). So if you want each sample to be processed separately, you would need to run a separate Snakemake instance for each.
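To make the scheduling behaviour described above more concrete, here is a toy sketch in Python. It is not Snakemake's actual scheduler: the `schedule` function, the job names, and the "wave" simplification (all jobs in a wave start together and finish before the next wave) are assumptions made purely for illustration.

```python
# Toy illustration only: NOT Snakemake's real scheduler.
# A job may start once its dependencies are done and enough threads are free.

def schedule(jobs, total_threads):
    """jobs maps a job name to (threads_needed, set_of_dependency_names).

    Returns a list of "waves"; each wave lists the jobs started together.
    """
    done, waves = set(), []
    pending = dict(jobs)
    while pending:
        free = total_threads
        wave = []
        for name, (threads, deps) in pending.items():
            # Runnable only if every dependency finished in an earlier wave
            # and the job still fits in the remaining thread budget.
            if deps <= done and threads <= free:
                wave.append(name)
                free -= threads
        if not wave:
            raise ValueError("no runnable job: unmet dependency or too few threads")
        for name in wave:
            del pending[name]
        done.update(wave)
        waves.append(wave)
    return waves


# Hypothetical jobs for two samples, A and B.
jobs = {
    "align_A": (4, set()),
    "align_B": (4, set()),
    "snv_A": (4, {"align_A"}),
    "snv_B": (4, {"align_B"}),
}
print(schedule(jobs, total_threads=8))
```

With 8 threads available, both alignment jobs land in the first wave and both SNV jobs in the second, which mirrors the "alignments first, across samples" tendency described above.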
Now for your questions: regarding threads:
Now, regarding fine-tuning your configuration file: SNV calling is done by default using ShoRAH, which works in independent local windows. It is thus an embarrassingly parallel problem and can scale to more threads (we currently run 64 concurrent threads on our Threadrippers), requesting 1 GiB of RAM per thread on average, which works most of the time:
[snv]
consensus=false
time=240
threads=64
mem=1024
localscratch=$TMPDIR
bwa (the default aligner for SARS-CoV-2) works in batches of ~1 million reads:
[bwa_align]
mem=2048
threads=6
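As a rough sanity check on what such settings request in total, the arithmetic can be sketched with Python's standard `configparser`. This assumes that `mem` is given in MiB per thread, as the "1 GiB of RAM per thread" remark suggests; the helper `total_mem_gib` is made up for this example and is not part of the pipeline.

```python
import configparser

# The two sections quoted above, copied verbatim.
CONFIG_TEXT = """
[snv]
consensus=false
time=240
threads=64
mem=1024
localscratch=$TMPDIR

[bwa_align]
mem=2048
threads=6
"""


def total_mem_gib(cfg, section):
    """Total RAM in GiB, ASSUMING `mem` means MiB per thread."""
    threads = cfg.getint(section, "threads")
    mem_mib = cfg.getint(section, "mem")
    return threads * mem_mib / 1024


cfg = configparser.ConfigParser()
cfg.read_string(CONFIG_TEXT)
for section in cfg.sections():
    print(section, total_mem_gib(cfg, section), "GiB")
```

Under that assumption, the `[snv]` settings amount to 64 GiB in total and the `[bwa_align]` settings to 12 GiB.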
For running exactly 8 samples in parallel and allocating exactly 16 threads to each: this is not easily done with the way vpipe is currently written. Another approach would be to split your sample file into batches of 8 samples and run them separately. But in that case, you had better use the
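The batch-splitting idea can be sketched in a few lines of Python. This is only an illustration: the function names, the `samples_batch_<n>.tsv` output names, and the assumption that the sample file has one sample per line with no header are all hypothetical, so adjust them to your setup.

```python
from pathlib import Path


def split_into_batches(lines, size=8):
    """Group non-empty sample lines into consecutive batches of at most `size`."""
    rows = [ln for ln in lines if ln.strip()]
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def write_batches(sample_file, size=8, prefix="samples_batch"):
    """Write each batch to <prefix>_<n>.tsv next to the input; return the new paths."""
    src = Path(sample_file)
    lines = src.read_text().splitlines()
    paths = []
    for n, batch in enumerate(split_into_batches(lines, size), start=1):
        out = src.with_name(f"{prefix}_{n}.tsv")
        out.write_text("\n".join(batch) + "\n")
        paths.append(out)
    return paths


# Demo with 20 made-up sample rows (tab-separated: sample name, date).
demo = [f"sample{i}\t20200101" for i in range(20)]
print([len(b) for b in split_into_batches(demo)])
```

Each resulting batch file could then be handed to its own pipeline run, one after the other.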
Last, a different approach, if you run on HPC (and not on a single 128-core workstation), would be to let Snakemake dispatch jobs on the cluster using its
Thanks, awesome information. I am indeed on an HPC system. I think I have settled on running each sample independently and am hoping the cluster option scales the various steps (jobs) according to the threads used. For samples that seem to be more diverse, the last step of making the JSON file is painfully slow. I noticed that just prior to that step there are 10 partitioned VCF files; would it be more efficient (in compute time) to process those individually and then combine the sub-JSONs?
Hi, thanks for this pipeline, loving it. BUT I am not yet a Snakemake guru.
I have a question regarding optimizing the compute. It seems like I can run the pipeline two ways:
If I do (1), I am starting vpipe with the --cores 128 option (AMD server with 128 physical cores), but it seems to use only 4 threads for those sub-programs that can use them. In the vpipe config files, I see the threads option, but that seems to be set to 1. So, where did it get the 4, and is there an easy way to change that globally? --threads=128 or something?
If I do (2), is there a way to specify the number of samples that should be processed simultaneously AND, similar to above, the threads to use for each process? Something like: process 8 samples at a time using 16 threads each.
Thanks
Bob