SemiBin2 fails to generate bins for large assemblies #166

Closed
apcamargo opened this issue Jun 12, 2024 · 3 comments

Comments

apcamargo commented Jun 12, 2024

I'm trying to bin a couple of assemblies with SemiBin2 (v2.1.0), using the single_easy_bin command. For some assemblies, the job finishes early and no bins are generated:

[2024-06-11 10:41:43,492] INFO: Setting number of CPUs to 64
[2024-06-11 10:41:43,492] INFO: Binning for short_read
[2024-06-11 10:41:43,495] INFO: SemiBin will run in self supervised mode
[2024-06-11 10:41:49,295] INFO: Did not detect GPU, using CPU.
[2024-06-11 10:42:01,482] INFO: Generating training data...
[2024-06-11 10:49:31,759] INFO: Calculating coverage for every sample.
[2024-06-11 11:21:08,692] INFO: Processed: mapping_binning/B_1.bam
[2024-06-11 11:21:08,694] INFO: Processed: mapping_binning/B_2.bam
[2024-06-11 11:21:57,939] INFO: Processed: mapping_binning/B_3.bam
[2024-06-11 11:22:34,630] INFO: Start training from a single sample.
[2024-06-11 11:22:42,504] INFO: Training model...
[2024-06-11 12:13:52,844] INFO: Training finished.
[2024-06-11 12:13:52,909] INFO: Start binning.

This seems to affect only large assemblies: the runs on small assemblies finished without issue, while the runs on large ones failed. The contigs in our assemblies are ≥1 kb, and we mapped the reads with strobealign and sorted the resulting alignments. Everything we are doing is standard, except that we are running SemiBin2 through Apptainer.

apptainer pull semibin.sif docker://quay.io/biocontainers/semibin:2.1.0--pyhdfd78af_0

apptainer exec semibin.sif SemiBin2 single_easy_bin \
    -i binning_assemblies/${SAMPLE}.fna.gz \
    -b mapping_binning/${SAMPLE}/*.bam \
    -o semibin2_output/${SAMPLE}

I suspect that this might be an issue with memory that happens because there's too much data. We will try to run it again setting --min-len 2500, assuming that it will ignore shorter contigs and generate network inputs only for the contigs longer than the threshold. I will update the issue if there are any developments.
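
To gauge how much a given --min-len would shrink the input before rerunning, one could count the contigs at or above the threshold. This is only a rough sketch: it assumes a plain or gzip-compressed FASTA, the `count_contigs` helper is hypothetical (not part of SemiBin2), and it approximates rather than reproduces whatever filtering SemiBin2 applies internally.

```shell
# count_contigs MIN_LEN FASTA[.gz]
# Prints "<kept> <total>": contigs with length >= MIN_LEN, and all contigs.
# Hypothetical helper for illustration; not part of SemiBin2.
count_contigs() {
    min_len=$1
    fasta=$2
    # Decompress on the fly if the file name ends in .gz
    case "$fasta" in
        *.gz) gzip -dc "$fasta" ;;
        *)    cat "$fasta" ;;
    esac | awk -v min="$min_len" '
        # Record the contig that just ended (if any)
        function tally() { if (seen) { total++; if (len >= min) kept++ } }
        /^>/ { tally(); seen = 1; len = 0; next }   # header starts a new contig
        { sub(/\r$/, ""); len += length($0) }       # sequence line adds to length
        END { tally(); printf "%d %d\n", kept, total }
    '
}
```

For example, `count_contigs 2500 binning_assemblies/${SAMPLE}.fna.gz` would print how many contigs remain at a 2500 bp threshold next to the total number of contigs.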

This might be related to #150. I decided to open another issue because my jobs finished without generating any bins, instead of hanging indefinitely.

@apcamargo
Author

Increasing --min-len did fix the issue. My guess is that this issue was due to lack of memory.

psj1997 closed this as completed Jul 8, 2024

@apcamargo
Author

I realize this issue has been closed, but I wanted to suggest that an error message be shown in cases like this. It might not be obvious to most users that the failure is due to insufficient memory.
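
Until such a message exists, a user-side check after the run can at least make the failure explicit. The sketch below assumes that bins are written to a subdirectory of the output folder (believed to be `output_bins`, but passed in as an argument here because that name is an assumption); `check_bins` is a hypothetical helper, not something SemiBin2 provides.

```shell
# check_bins BINS_DIR
# Returns non-zero and prints an explicit error if no bin files were produced.
# Hypothetical helper for illustration; not part of SemiBin2.
check_bins() {
    bins_dir=$1
    if [ ! -d "$bins_dir" ]; then
        echo "ERROR: bin directory '$bins_dir' not found; the run may have been killed (for example, by running out of memory)." >&2
        return 1
    fi
    # Count regular files; strip the padding some wc implementations add
    n_bins=$(find "$bins_dir" -type f | wc -l | tr -d ' ')
    if [ "$n_bins" -eq 0 ]; then
        echo "ERROR: no bins in '$bins_dir'; the run may have been killed (for example, by running out of memory)." >&2
        return 1
    fi
    echo "Found $n_bins file(s) in '$bins_dir'."
}
```

It could be called right after the SemiBin2 command, for instance as `check_bins "semibin2_output/${SAMPLE}/output_bins"`, so that a batch job fails loudly instead of finishing with an empty result.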

@luispedro
Member

@apcamargo You're right (and it's the sort of thing that I would generally be saying myself).

I've now created a meta-issue about the memory requirements to aggregate this discussion: #171
