-
Notifications
You must be signed in to change notification settings - Fork 0
Simulating long reads
Aaron McKenna edited this page Aug 1, 2024
·
8 revisions
The underlying sequence and barcodes are combined in: simulate_reads_2024_07_31.ipynb
Barcodes from freebarcodes, using the 15 nucleotide barcodes with two-base error correction.
We generally want full-length sequences, as this is what we get from amplicon sequencing:
badread simulate --length 15000,100 --reference test_basic_long_read_no_error.fa --quantity 50x | gzip > reads.fastq.gz
results:
Generating fragment lengths from a gamma distribution:
mean = 15000 bp parameters:
stdev = 13000 bp k (shape) = 1.3314e+00
N50 = 22622 bp theta (scale) = 1.1267e+04
│ ▖▌▌▌▌▌▖▖▖
│ ▖▌▌▌▌▌▌▌▌▌▌▌▖▖
f │ ▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖
r │▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖
a │▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖
g │▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖
s │▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖▖▖
│▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖▖▖▖▖▖▖▖▖▖▖▖▖▖▖▖
├─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ ▖▖▖▌▌▌▌▌▌▌▌▌▖▖▖▖▖
│ ▖▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖
b │ ▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖
a │ ▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖
s │ ▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖
e │ ▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖▖
s │ ▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖▖▖▖▖
│ ▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▖▖▖▖▖▖▖
├─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
0 6800 13600 20400 27200 34000 40800 47600 54400 61200 68000
Generating read identities from a beta distribution:
mean = 95% shape parameters:
max = 99% alpha = 5.7384e+01
stdev = 2.5% beta = 2.4162e+00
│ ▖▌▌▖
│ ▌▌▌▌▌
│ ▖▌▌▌▌▌▌▌
│ ▖▌▌▌▌▌▌▌▌
│ ▌▌▌▌▌▌▌▌▌▌▌
│ ▖▌▌▌▌▌▌▌▌▌▌▌▌
│ ▖▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌
│ ▖▖▖▖▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌
├─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
50 55 60 65 70 75 80 85 90 95 100
minimap2 -a minimap_alignment_reference_2024_07_31.fa reads.fastq.gz > 2024_07_31_aligned_simulated_reads.sam
samtools view -b -h 2024_07_31_aligned_simulated_reads.sam > 2024_07_31_aligned_simulated_reads.bam
samtools sort 2024_07_31_aligned_simulated_reads.bam > 2024_07_31_aligned_simulated_reads_sorted.bam
samtools index 2024_07_31_aligned_simulated_reads_sorted.bam
After running ipython notebook:
gzip simulation/test_basic_long_read_no_error.fq
minimap2 -a minimap_alignment_reference_2024_07_31.fa test_basic_long_read_no_error.fq.gz > 2024_07_31_aligned_perfect_reads.sam
samtools view -b -h 2024_07_31_aligned_perfect_reads.sam > 2024_07_31_aligned_perfect_reads.bam
samtools sort 2024_07_31_aligned_perfect_reads.bam > 2024_07_31_aligned_perfect_reads_sorted.bam
samtools index 2024_07_31_aligned_perfect_reads_sorted.bam
After running ipython notebook:
gzip simulation/test_basic_long_read_no_error_3code.fq
minimap2 -a minimap_alignment_reference_2024_07_31.fa test_basic_long_read_no_error_3code.fq.gz > 2024_07_31_aligned_perfect_reads_tripple.sam
samtools view -b -h 2024_07_31_aligned_perfect_reads_tripple.sam > 2024_07_31_aligned_perfect_reads_tripple.bam
samtools sort 2024_07_31_aligned_perfect_reads_tripple.bam > 2024_07_31_aligned_perfect_reads_tripple_sorted.bam
samtools index 2024_07_31_aligned_perfect_reads_tripple_sorted.bam