STAR index generation is extremely slow after recipe rebuild #53018
Comments
Oddly, the Conda channel contains a newer build than the current recipe in this repo (build 4 vs. build 3): https://anaconda.org/bioconda/star/files |
I've just tried the newer build (h5ca1c30_4) and found that it has the same problem I reported previously: the Suffix Array sorting step does not use multiple threads even when they are requested, which leads to a very long run time for building the index. However, I've also just tested the build from about 6 months ago (star_2.7.11b-h43eeafb_2) and found that the issue is not present in that build. Here are the timings for it:
So performance for this version is similar to what was seen in version 2.7.11a. Overall, it does seem like something related to the recipe rebuild has changed (turned off?) the multi-threading capabilities, at least for the genomeGenerate step. |
I'm currently failing to reproduce this with Docker (Linux image on my Mac), at least with a smaller testing genome. I wonder if it might be Singularity-specific. Or maybe the genome is just not big enough...
2.7.10b, build 0, 4 cores, 2:04.28 total:
2.7.10b, build 0, 8 cores, 1:41.17 total:
2.7.11b, build 4, 4 cores, 2:05.42 total:
2.7.11b, build 4, 8 cores, 1:55.76 total:
|
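The four wall-clock totals quoted above are easier to compare as speedups from 4 to 8 cores. The following is a minimal sketch that only reuses the numbers from the comment; the helper name `to_seconds` and the dictionary layout are illustrative choices, not part of any STAR or Bioconda tooling.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an 'm:ss.ss' or 'h:mm:ss' wall-clock string to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Wall-clock totals quoted in the comment above: (version, build, cores) -> time.
timings = {
    ("2.7.10b", 0, 4): "2:04.28",
    ("2.7.10b", 0, 8): "1:41.17",
    ("2.7.11b", 4, 4): "2:05.42",
    ("2.7.11b", 4, 8): "1:55.76",
}

for version, build in [("2.7.10b", 0), ("2.7.11b", 4)]:
    t4 = to_seconds(timings[(version, build, 4)])
    t8 = to_seconds(timings[(version, build, 8)])
    print(f"{version} build {build}: {t4:.2f}s -> {t8:.2f}s, speedup {t4 / t8:.2f}x")
```

On these numbers both builds gain little from doubling the core count (roughly 1.23x and 1.08x), which fits the suspicion that this test genome may be too small to expose the slowdown.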
adding |
Thanks, @pinin4fjords and @pabloaledo! I'm traveling for the next couple of days, but if any testing is still needed once I get back, I'm happy to help with it. |
I think that does it @pabloaledo !
I monitored actual core usage, and that change seems to restore things |
@davidecarlson if you could verify the latest build whenever you're available and confirm it fixes things, I'll deploy the fix to rnaseq. |
Hi All,
I noticed that after the recent rebuild of the STAR recipe (#52349), building genome indices is dramatically slower than it was before the rebuild.
I don't have a test that uses version 2.7.11b before and after the recipe rebuild, but below are timing comparisons for building the human reference genome index (Gencode v47 of the GRCh38.p14 genome) with the current version and an older 2.7.11a version, using the following command:
Timings for version 2.7.11b (compiled 2024-12-15T08:41:36):
Timings for version 2.7.11a (compiled 2023-09-15T02:58:53):
The biggest difference is that sorting the Suffix Array chunks takes more than 3 hours with the current version of the conda build, while in the earlier version it takes only about 20 minutes.
I noticed that while I requested 12 threads for both versions, the version using the latest recipe never uses more than a single core. I suspect that this accounts for at least some of the slowdown.
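For anyone who wants to check the single-core observation on their own run, here is a minimal sketch that estimates how many cores a running process keeps busy. It assumes a Linux host with `/proc` mounted; the function names `cpu_ticks` and `cores_in_use` are hypothetical helpers written for this illustration, and the PID has to be looked up separately (for example with `pgrep`).

```python
import os
import time


def cpu_ticks(stat_text: str) -> int:
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before tokenising the remaining fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cores_in_use(pid: int, interval: float = 5.0) -> float:
    """Average number of cores a process kept busy over `interval` seconds."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as fh:
        before = cpu_ticks(fh.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as fh:
        after = cpu_ticks(fh.read())
    elapsed = time.monotonic() - start
    # CPU seconds consumed (summed over all threads) divided by wall seconds.
    return (after - before) / ticks_per_second / elapsed
```

Calling `cores_in_use(pid)` while the Suffix Array sorting step is running should return a value near the requested thread count if multi-threading is active, and a value near 1.0 if the process is effectively single-threaded.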
Another difference I noticed is that the older version of the binary is dynamically linked to the relevant OpenMP library:
While the latest version is not:
All tests were done on an AMD Milan 96-core server with 256 GB of RAM.
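The linkage comparison described above can be repeated for any build with a short script. This is a sketch under stated assumptions: it relies on the standard Linux `ldd` tool, it treats the library name substrings in `OPENMP_MARKERS` as a reasonable but not exhaustive list of OpenMP runtimes, and the helper names are made up for this example.

```python
import subprocess

# Substrings that identify common OpenMP runtimes (GNU, LLVM, Intel).
# Assumption: this list is illustrative, not exhaustive.
OPENMP_MARKERS = ("libgomp", "libomp", "libiomp")


def openmp_libs(ldd_output: str) -> list[str]:
    """Return the ldd output lines that mention an OpenMP runtime library."""
    return [
        line.strip()
        for line in ldd_output.splitlines()
        if any(marker in line for marker in OPENMP_MARKERS)
    ]


def check_binary(path: str) -> list[str]:
    """Run `ldd` on a binary and return its OpenMP-related lines (Linux only)."""
    # No check=True: ldd exits non-zero for statically linked binaries,
    # and an empty result is the answer we want in that case.
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return openmp_libs(result.stdout)
```

For example, `check_binary(shutil.which("STAR"))` (after `import shutil`, and assuming STAR is on `PATH`) returns an empty list when no OpenMP runtime is dynamically linked. An empty list alone does not prove that OpenMP is disabled, since the runtime could also be linked statically.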
Any idea what could have caused the change?
Thanks!
Dave