Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STAR index generation is extremely slow after recipe rebuild #53018

Closed
davidecarlson opened this issue Jan 2, 2025 · 9 comments · Fixed by #53155
Closed

STAR index generation is extremely slow after recipe rebuild #53018

davidecarlson opened this issue Jan 2, 2025 · 9 comments · Fixed by #53155

Comments

@davidecarlson
Copy link

Hi All,

I noticed that after the recent rebuild of the STAR recipe (#52349), building genome indices is dramatically slower than it was prior to the rebuild of the recipe.

I don't have a test that users version 2.7.11b before and after the recipe rebuild, but below are timing comparisons of the time it takes to build the human reference genome index (Gencode v47 of the GRCh38.p14 genome) with the current version and an older 2.7.11a version using the following command:

STAR --runMode genomeGenerate --genomeDir star_out --genomeFastaFiles GRCh38.p14.genome.fa --sjdbGTFfile gencode.v47.primary_assembly.annotation.gtf --runThreadN 12

Timings for version 2.7.11b (compiled 2024-12-15T08:41:36):

Jan 02 08:43:38 ... starting to generate Genome files
Jan 02 08:44:21 ..... processing annotations GTF
Jan 02 08:44:37 ..... finished GTF processing
Jan 02 08:44:43 ... starting to sort Suffix Array. This may take a long time...
Jan 02 08:44:58 ... sorting Suffix Array chunks and saving them to disk...
Jan 02 12:03:57 ... loading chunks from disk, packing SA...
Jan 02 12:05:10 ... finished generating suffix array
Jan 02 12:05:10 ... generating Suffix Array index
Jan 02 12:09:01 ... completed Suffix Array index
Jan 02 12:09:01   Finished preparing junctions
Jan 02 12:09:01 ..... inserting junctions into the genome indices
Jan 02 12:20:45   Finished SA search: number of new junctions=528709, old junctions=0
Jan 02 12:21:33   Finished sorting SA indicesL nInd=211483600
Jan 02 12:22:31   Finished inserting junction indices
Jan 02 12:23:04   Finished SAi
Jan 02 12:23:04 ..... finished inserting junctions into genome
Jan 02 12:23:04 ... writing Genome to disk ...
Jan 02 12:23:04 ... writing Suffix Array to disk ...
Jan 02 12:23:08 ... writing SAindex to disk
Jan 02 12:23:10 ..... finished successfully

Timings for version 2.7.11a (compiled 2023-09-15T02:58:53):

Jan 02 12:39:57 ... starting to generate Genome files
Jan 02 12:40:40 ..... processing annotations GTF
Jan 02 12:40:57 ..... finished GTF processing
Jan 02 12:41:03 ... starting to sort Suffix Array. This may take a long time...
Jan 02 12:41:17 ... sorting Suffix Array chunks and saving them to disk...
Jan 02 13:02:37 ... loading chunks from disk, packing SA...
Jan 02 13:03:51 ... finished generating suffix array
Jan 02 13:03:51 ... generating Suffix Array index
Jan 02 13:07:50 ... completed Suffix Array index
Jan 02 13:07:51   Finished preparing junctions
Jan 02 13:07:51 ..... inserting junctions into the genome indices
Jan 02 13:08:54   Finished SA search: number of new junctions=528709, old junctions=0
Jan 02 13:09:42   Finished sorting SA indicesL nInd=211483600
Jan 02 13:10:41   Finished inserting junction indices
Jan 02 13:11:15   Finished SAi
Jan 02 13:11:15 ..... finished inserting junctions into genome
Jan 02 13:11:15 ... writing Genome to disk ...
Jan 02 13:11:15 ... writing Suffix Array to disk ...
Jan 02 13:11:28 ... writing SAindex to disk
Jan 02 13:11:31 ..... finished successfully

The biggest difference is that sorting the Suffix Array chunks takes more than 3 hours with the current version of the conda build, while in the earlier version it takes only about 20 minutes or so.

I noticed that while I requested 12 threads for both versions, the version using the latest recipe never uses more than a single a core. I suspect that this accounts for at least some of the slowdown.

Another difference I noticed is that the older version of the binary is dynamically linked to the relevant OpenMP library:

$ ldd STAR-avx2
	linux-vdso.so.1 (0x000015555551c000)
	libhts.so.3 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/libhts.so.3 (0x0000155555258000)
	libz.so.1 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/libz.so.1 (0x000015555523d000)
	libstdc++.so.6 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/libstdc++.so.6 (0x000015555505b000)
	libm.so.6 => /lib64/libm.so.6 (0x0000155554f6f000)
	libgomp.so.1 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/libgomp.so.1 (0x0000155554f30000)
	libgcc_s.so.1 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/libgcc_s.so.1 (0x0000155554f0f000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x0000155554f0a000)
	libc.so.6 => /lib64/libc.so.6 (0x0000155554d01000)
	libdeflate.so.0 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/./libdeflate.so.0 (0x0000155554ce9000)
	liblzma.so.5 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/./liblzma.so.5 (0x0000155554cb9000)
	libbz2.so.1.0 => /gpfs/software/Anaconda/envs/star-test2/bin/./../lib/./libbz2.so.1.0 (0x0000155554ca3000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000155554c9e000)
	librt.so.1 => /lib64/librt.so.1 (0x0000155554c99000)
	/lib64/ld-linux-x86-64.so.2 (0x000015555551e000)

While the latest version is not:

ldd STAR-avx2
	linux-vdso.so.1 (0x000015555551c000)
	libhts.so.3 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/libhts.so.3 (0x0000155555243000)
	libz.so.1 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/libz.so.1 (0x0000155555228000)
	libstdc++.so.6 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/libstdc++.so.6 (0x0000155555046000)
	libm.so.6 => /lib64/libm.so.6 (0x0000155554f5a000)
	libgcc_s.so.1 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/libgcc_s.so.1 (0x0000155554f3b000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x0000155554f34000)
	libc.so.6 => /lib64/libc.so.6 (0x0000155554d2b000)
	libdeflate.so.0 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/./libdeflate.so.0 (0x0000155554d13000)
	liblzma.so.5 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/./liblzma.so.5 (0x0000155554ce3000)
	libbz2.so.1.0 => /gpfs/software/Anaconda/envs/star_test/bin/./../lib/./libbz2.so.1.0 (0x0000155554ccf000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000155554cc8000)
	librt.so.1 => /lib64/librt.so.1 (0x0000155554cc3000)
	/lib64/ld-linux-x86-64.so.2 (0x000015555551e000)

All tests were done on an AMD Milan 96-core server with 256 GB of RAM.

Any idea what could have caused the change?

Thanks!
Dave

@pinin4fjords
Copy link
Contributor

@mencian , #52349 was your work, would you have any input?

@pinin4fjords
Copy link
Contributor

I notice weirdly that the Conda channel contains a newer build than the current recipe on this repo (4 vs 3): https://anaconda.org/bioconda/star/files

@davidecarlson
Copy link
Author

I've just tried using the newer version (h5ca1c30_4) and found that it has the same problems that I reported previously. Namely, that the Suffix Array sorting step does not use multiple threads even when they are requested, leading to a very long run time for building the index.

However, I've also now just tested the build from ~ 6 months ago (star_2.7.11b-h43eeafb_2) and found that these issues are not present in this version. Here are the timings for this version:

$ singularity exec ../star_2.7.11b--h43eeafb_2.sif STAR --runMode genomeGenerate --genomeDir star_out_vh5ca1c30_4 --genomeFastaFiles GRCh38.p14.genome.fa --sjdbGTFfile gencode.v47.primary_assembly.annotation.gtf --runThreadN 12
	/usr/local/bin/STAR-avx2 --runMode genomeGenerate --genomeDir star_out_vh5ca1c30_4 --genomeFastaFiles GRCh38.p14.genome.fa --sjdbGTFfile gencode.v47.primary_assembly.annotation.gtf --runThreadN 12
	STAR version: 2.7.11b   compiled: 2024-07-03T14:39:20+0000 :/opt/conda/conda-bld/star_1720017372352/work/source
Jan 06 14:40:47 ..... started STAR run
Jan 06 14:40:47 ... starting to generate Genome files
Jan 06 14:41:34 ..... processing annotations GTF
Jan 06 14:42:00 ... starting to sort Suffix Array. This may take a long time...
Jan 06 14:42:14 ... sorting Suffix Array chunks and saving them to disk...
Jan 06 15:05:34 ... loading chunks from disk, packing SA...
Jan 06 15:06:48 ... finished generating suffix array
Jan 06 15:06:48 ... generating Suffix Array index
Jan 06 15:10:53 ... completed Suffix Array index
Jan 06 15:10:54 ..... inserting junctions into the genome indices
Jan 06 15:14:32 ... writing Genome to disk ...
Jan 06 15:14:32 ... writing Suffix Array to disk ...
Jan 06 15:14:40 ... writing SAindex to disk
Jan 06 15:14:43 ..... finished successfully

So performance for this version is similar to what was seen in version 2.7.11a.

Overall, it does seem like something related to the recipe rebuild has changed (turned off?) the multi-threading capabilities, at least for the genomeGenerate step.

@pinin4fjords
Copy link
Contributor

pinin4fjords commented Jan 8, 2025

I'm currently failing to reproduce this with Docker (Linux image on my Mac), at least with a smaller testing genome. I wonder if it might be Singularity-specific. Or maybe the genome is just not big enough....

2.7.10b, build 0, 4 cores, 2:04.28 total:

time docker run --rm -it \
    -v "$(pwd)":/data \
    quay.io/biocontainers/star:2.7.10b--h9ee0642_0 \
    STAR --runMode genomeGenerate \
         --genomeDir /data/dmel_index \
         --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa \
         --runThreadN 4
Unable to find image 'quay.io/biocontainers/star:2.7.10b--h9ee0642_0' locally
2.7.10b--h9ee0642_0: Pulling from biocontainers/star
73349e34840e: Already exists 
acab339ca1e8: Already exists 
006dd9638427: Pull complete 
Digest: sha256:5fd11e81f508a3513b7274fa84204c9f1baff31c387c8b95b1687ab91c9abe7d
Status: Downloaded newer image for quay.io/biocontainers/star:2.7.10b--h9ee0642_0
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
	/usr/local/bin/STAR --runMode genomeGenerate --genomeDir /data/dmel_index --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa --runThreadN 4
	STAR version: 2.7.10b   compiled: 2022-11-01T09:53:26-04:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
Jan 08 18:39:16 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /data/dmel_index/Log.out. Will keep ./Log.out

Jan 08 18:39:16 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=143726002, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 12
Jan 08 18:39:20 ... starting to sort Suffix Array. This may take a long time...
Jan 08 18:39:22 ... sorting Suffix Array chunks and saving them to disk...
Jan 08 18:40:13 ... loading chunks from disk, packing SA...
Jan 08 18:40:18 ... finished generating suffix array
Jan 08 18:40:18 ... generating Suffix Array index
Jan 08 18:41:04 ... completed Suffix Array index
Jan 08 18:41:04 ... writing Genome to disk ...
Jan 08 18:41:05 ... writing Suffix Array to disk ...
Jan 08 18:41:07 ... writing SAindex to disk
Jan 08 18:41:10 ..... finished successfully
docker run --rm -it -v "$(pwd)":/data  STAR --runMode genomeGenerate      4  0.02s user 0.03s system 0% cpu **2:04.28 total**

2.7.10b, build 0, 8 cores, 1:41.17 total:

time docker run --rm -it \
    -v "$(pwd)":/data \
    quay.io/biocontainers/star:2.7.10b--h9ee0642_0 \
    STAR --runMode genomeGenerate \
         --genomeDir /data/dmel_index \
         --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa \
         --runThreadN 8
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
	/usr/local/bin/STAR --runMode genomeGenerate --genomeDir /data/dmel_index --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa --runThreadN 8
	STAR version: 2.7.10b   compiled: 2022-11-01T09:53:26-04:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
Jan 08 18:44:28 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /data/dmel_index/Log.out. Will keep ./Log.out

Jan 08 18:44:28 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=143726002, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 12
Jan 08 18:44:32 ... starting to sort Suffix Array. This may take a long time...
Jan 08 18:44:34 ... sorting Suffix Array chunks and saving them to disk...
Jan 08 18:45:14 ... loading chunks from disk, packing SA...
Jan 08 18:45:19 ... finished generating suffix array
Jan 08 18:45:19 ... generating Suffix Array index
Jan 08 18:46:04 ... completed Suffix Array index
Jan 08 18:46:04 ... writing Genome to disk ...
Jan 08 18:46:05 ... writing Suffix Array to disk ...
Jan 08 18:46:06 ... writing SAindex to disk
Jan 08 18:46:08 ..... finished successfully
docker run --rm -it -v "$(pwd)":/data  STAR --runMode genomeGenerate      8  0.02s user 0.02s system 0% cpu **1:41.17 total**

2.7.11b, build 4, 4 cores, 2:05.42 total:

time docker run --rm -it \
    -v "$(pwd)":/data \
    quay.io/biocontainers/star:2.7.11b--h5ca1c30_4 \
    STAR --runMode genomeGenerate \
         --genomeDir /data/dmel_index \
         --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa \
         --runThreadN 4
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
	/usr/local/bin/STAR-plain --runMode genomeGenerate --genomeDir /data/dmel_index --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa --runThreadN 4
	STAR version: 2.7.11b   compiled: 2024-12-15T08:50:10+0000 :/opt/conda/conda-bld/star_1734251957083/work/source
Jan 08 18:48:40 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /data/dmel_index/Log.out. Will keep ./Log.out

Jan 08 18:48:40 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=143726002, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 12
Jan 08 18:48:52 ... starting to sort Suffix Array. This may take a long time...
Jan 08 18:48:53 ... sorting Suffix Array chunks and saving them to disk...
Jan 08 18:49:48 ... loading chunks from disk, packing SA...
Jan 08 18:49:52 ... finished generating suffix array
Jan 08 18:49:52 ... generating Suffix Array index
Jan 08 18:50:39 ... completed Suffix Array index
Jan 08 18:50:39 ... writing Genome to disk ...
Jan 08 18:50:41 ... writing Suffix Array to disk ...
Jan 08 18:50:42 ... writing SAindex to disk
Jan 08 18:50:45 ..... finished successfully
docker run --rm -it -v "$(pwd)":/data  STAR --runMode genomeGenerate      4  0.02s user 0.02s system 0% cpu **2:05.42 total**

2.7.11b, build 4, 8 cores, 1:55.76 total:

time docker run --rm -it \
    -v "$(pwd)":/data \
    quay.io/biocontainers/star:2.7.11b--h5ca1c30_4 \
    STAR --runMode genomeGenerate \
         --genomeDir /data/dmel_index \
         --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa \
         --runThreadN 8
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
	/usr/local/bin/STAR-plain --runMode genomeGenerate --genomeDir /data/dmel_index --genomeFastaFiles /data/Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa --runThreadN 8
	STAR version: 2.7.11b   compiled: 2024-12-15T08:50:10+0000 :/opt/conda/conda-bld/star_1734251957083/work/source
Jan 08 18:53:21 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /data/dmel_index/Log.out. Will keep ./Log.out

Jan 08 18:53:21 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=143726002, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 12
Jan 08 18:53:32 ... starting to sort Suffix Array. This may take a long time...
Jan 08 18:53:34 ... sorting Suffix Array chunks and saving them to disk...
Jan 08 18:54:19 ... loading chunks from disk, packing SA...
Jan 08 18:54:24 ... finished generating suffix array
Jan 08 18:54:24 ... generating Suffix Array index
Jan 08 18:55:10 ... completed Suffix Array index
Jan 08 18:55:10 ... writing Genome to disk ...
Jan 08 18:55:11 ... writing Suffix Array to disk ...
Jan 08 18:55:13 ... writing SAindex to disk
Jan 08 18:55:16 ..... finished successfully
docker run --rm -it -v "$(pwd)":/data  STAR --runMode genomeGenerate      8  0.02s user 0.02s system 0% cpu **1:55.76 total**

@pabloaledo
Copy link
Contributor

adding -f openmp in here should at least enable openmp and (hopefully) solve the issue?.

@pinin4fjords
Copy link
Contributor

pinin4fjords commented Jan 9, 2025

adding -f openmp in here should at least enable openmp and (hopefully) solve the issue?.

Thanks! I'll try #53155

@davidecarlson
Copy link
Author

Thanks, @pinin4fjords and @pabloaledo! I'm traveling for the next couple of days, but if any testing is still needed once I get back, I'm happy to help with it.

@pinin4fjords
Copy link
Contributor

I think that does it @pabloaledo !

=================== SUMMARY ===================
CONTAINER,REQUESTED_THREADS,STAR_THREADS,SYS_MAX_THREADS,REAL_TIME
quay.io/biocontainers/star:2.7.10b--h9ee0642_0,4,NA,4,177
quay.io/biocontainers/star:2.7.10b--h9ee0642_0,8,NA,8,175
quay.io/biocontainers/star:2.7.11b--h43eeafb_2,4,NA,4,189
quay.io/biocontainers/star:2.7.11b--h43eeafb_2,8,NA,8,181
quay.io/biocontainers/star:2.7.11b--h5ca1c30_4,4,NA,1,344
quay.io/biocontainers/star:2.7.11b--h5ca1c30_4,8,NA,1,328
quay.io/biocontainers/star:2.7.11b--h5ca1c30_5,4,NA,4,199 << PR artifact with open mp
quay.io/biocontainers/star:2.7.11b--h5ca1c30_5,8,NA,8,194 << PR artifact with open mp
===============================================

I monitored actual core usage, and that change seems to restore things

@pinin4fjords
Copy link
Contributor

@davidecarlson if you could verify the latest build whenever you're available and confirm it fixes things, I'll deploy the fix to rnaseq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants