layout | title |
---|---|
default |
Samtools - Documentation |
Documentation for BCFtools, SAMtools, and HTSlib's utilities is available
by using man command
on the command line.
The manual pages for several releases are also included below --- be sure
to consult the documentation for the release you are using.
Older manual pages are available for releases: 0.1.19, 1.0, 1.1, 1.2, 1.3, 1.3.1, 1.4, 1.4.1, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.20
- Options for viewing files with & without headers
-
Installation
-
Calling
-
Tips and Tricks
SAMtools conforms to the specifications produced by the GA4GH File Formats working group. Details of the current specifications are available on the hts-specs page.
HTSlib also includes brief manual pages outlining aspects of several of
the more important file formats.
These are available via man format
on the command line
or here on the web site:
- faidx describes .fai FASTA index files
- sam lists the mandatory SAM fields and meanings of flag values
- vcf lists the mandatory VCF fields and common INFO tags
- htslib-s3-plugin describes the S3 plugin
- Duplicate marking.
-
Zlib implementations comparing samtools read and write speeds.
-
CRAM comparisons between version 2.1, version 3.0 and BAM formats.
A joint publication of SAMtools and BCFtools improvements over the last 12 years was published in 2021.
- Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H, Twelve years of SAMtools and BCFtools, GigaScience (2021) 10(2) giab008 [33590861]
The same journal issue also saw the HTSlib paper, describing the C library.
- Bonfield JK, Marshall J, Danecek P, Li H, Ohan V, Whitwham A, Keane T, Davies RM, HTSlib: C library for reading/writing high-throughput sequencing data, GigaScience (2021) 10(2) giab007 [33594436]
The introduction of the SAM/BAM format and the samtools
command line tool:
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9 [19505943]
Extension of the SAM/BAM format to support de novo assemblies:
- Cock PJA, Bonfield JK, Chevreux B, Li H, SAM/BAM format v1.5 extensions for de novo assemblies, bioRxiv (2015) 020024 [doi:10.1101/020024]
The introduction of the CRAM format:
- Hsi-Yang Fritz M, Leinonen R, Cochrane G, and Birney E, Efficient storage of high throughput DNA sequencing data using reference-based compression, Genome Research (2011) 21(5) 734-740. [21245279]
The introduction of the VCF format:
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group, The variant call format and VCFtools, Bioinformatics (2011) 27(15) 2156-8 [21653522]
The original mpileup calling algorithm plus mathematical notes (mpileup/bcftools call -c
):
- Li H, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, Bioinformatics (2011) 27(21) 2987-93. [21903627]
- Li H, Mathematical Notes on SAMtools Algorithms (2010) [link]
Mathematical notes for the updated multiallelic calling model (mpileup/bcftools call -m
):
- Danecek P, Schiffels S, and Durbin R, Multiallelic calling model in bcftools (-m) (2014) [link]
Hidden Markov model for detecting runs of homozygosity (bcftools roh
):
- Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, and Durbin R, BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data, Bioinformatics (2016) 32(11) 1749-51 [26826718]
Copy number variation/aneuploidy calling from microarray data (bcftools cnv/bcftools polysomy
):
- Danecek P, McCarthy SA, HipSci Consortium, and Durbin R, A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data, PLoS One (2016) 11(5) e0155014 [27176002]
Haplotype-aware calling of variant consequences (bcftools csq
):
- Danecek P, McCarthy SA, BCFtools/csq: Haplotype-aware variant consequences, Bioinformatics (2017) 33(13) 2037-39 [28205675]
Base alignment quality (BAQ) method improve SNP calling around INDELs:
- Li H, Improving SNP discovery by base alignment quality, Bioinformatics (2011) 27(8) 1157-8 [21320865]
Segregation based QC metric originally implemented in SGA:
- Durbin R, Segregation based metric for variant call QC (2014) [link]