GLPhase

This is a CUDA-enabled fork of SNPTools impute.cpp. This code should scale linearly with sample size up to a small multiple of the number of CUDA cores (shaders) on the GPU being used.

GLPhase also has an option for incorporating pre-existing haplotypes into the phasing and imputation process. Release 1.4.13 was used with this option to impute genotypes for the first release of the Haplotype Reference Consortium.

Installation

Dependencies

GLPhase depends on libgsl, boost, and libz.

Compilation

# to compile all code (with all optimizations turned on)
make

# run the glphase executable to get a description of the
# glphase command line arguments
bin/glphase

# run regression tests (turns off optimizations)
make test

# run regression tests + longer integration tests
make disttest

# compile without CUDA support
# first clean the work dir
make clean
make NCUDA=1

# compile without CUDA or OMP support (on MacOSX for example)
make NCUDA=1 NOMP=1

Converting a VCF to SNPTools `.bin` format

A Perl script at scripts/vcf2STbin.pl can be used to convert a VCF with PL format fields to a SNPTools-conformant .bin file. For example, this command will convert a gzipped input VCF at input.vcf.gz into a SNPTools .bin file at input.bin:

scripts/vcf2STbin.pl input.vcf.gz
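
Because the conversion relies on PL format fields, it can help to confirm that the input VCF carries them before converting. The following is a minimal sketch, not part of GLPhase: the function name `vcf_has_pl` is hypothetical, and it assumes a tab-delimited, gzipped VCF in which inspecting the FORMAT column of the first data record is representative of the whole file.

```shell
# Hypothetical helper (not shipped with GLPhase): succeed only if the FORMAT
# column (column 9) of the first data record in a gzipped VCF lists PL.
vcf_has_pl() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            n = split($9, keys, ":")        # FORMAT keys are colon-separated
            for (i = 1; i <= n; i++) if (keys[i] == "PL") found = 1
            exit                            # only the first record is inspected
        }
        END { exit (found ? 0 : 1) }'
}

# Example use (commented out so the sketch has no side effects):
# vcf_has_pl input.vcf.gz && scripts/vcf2STbin.pl input.vcf.gz
```

The check only looks at one record, so it is a quick sanity test rather than a full validation of the file.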

Running GLPhase (v1.4.13)

As a drop-in replacement for SNPTools/impute.cpp

GLPhase can be run as a CUDA-enabled drop-in replacement for SNPTools/impute.cpp. Assuming a SNPTools style .bin file with genotype likelihoods exists:

bin/glphase input.bin

Using pre-existing haplotypes

GLPhase can use pre-existing haplotypes to restrict the set of possible haplotypes from which the Metropolis-Hastings (MH) sampler may choose surrogate parent haplotypes. This approach is described in:

The Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics (accepted) -- bioRxiv

This command phases and imputes haplotypes on a SNPTools .bin file using a genetic map and pre-existing haplotypes. The output file is a gzipped VCF file at output_base_name.vcf.gz.

glphase -B0 -i5 -m95 -q0 -Q1 -t2 -C100 -K200 \
    input.bin \
    -g genetic_map.txt \
    -h pre_existing_haplotypes.haps.gz \
    -s pre_existing_haplotypes.sample \
    -o output_base_name

The pre-existing haplotypes should be in WTCCC format, and a genetic map can be obtained from the Impute2 website.
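
Before a long run it may be worth checking that the .haps and .sample files describe the same number of samples. The sketch below is an illustration under stated assumptions, not a GLPhase feature: the function name `haps_sample_counts_match` is hypothetical, and it assumes the usual WTCCC-style layout in which each .haps line has 5 leading site columns followed by 2 allele columns per sample, and the .sample file has 2 header lines followed by one line per sample.

```shell
# Hypothetical helper (not shipped with GLPhase): compare the sample count
# implied by a gzipped .haps file with the count in the matching .sample file.
haps_sample_counts_match() {
    haps_gz=$1
    sample=$2
    # Columns on the first .haps line: 5 site columns + 2 alleles per sample.
    n_cols=$(gzip -dc "$haps_gz" | awk 'NR == 1 { print NF; exit }')
    # Total lines in the .sample file; 2 of them are assumed to be headers.
    n_lines=$(awk 'END { print NR }' "$sample")
    [ -n "$n_cols" ] &&
        [ $(( (n_cols - 5) % 2 )) -eq 0 ] &&
        [ $(( (n_cols - 5) / 2 )) -eq $(( n_lines - 2 )) ]
}

# Example use (commented out so the sketch has no side effects):
# haps_sample_counts_match pre_existing_haplotypes.haps.gz \
#     pre_existing_haplotypes.sample || echo "sample count mismatch" >&2
```

Only the first line of the .haps file is inspected, so this catches mismatched file pairs but not malformed lines further down.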

Ligating haplotypes

It is recommended to ligate haplotypes using hapfuse. Before fusing, the output from GLPhase needs to be converted from gzipped VCF to something htslib can read. Here is an example using bcftools:

zcat output_base_name.vcf.gz | bcftools view -Ob -o \
    output_base_name.bcf -
