Kids First DRC Consensus Calling Workflow

This workflow is used by the Kids First (KF) Data Resource Center (DRC) to create consensus calls from outputs generated by our somatic variant callers.

This workflow takes the protected VCF outputs from the Kids First DRC Somatic Workflow and creates protected and public consensus VCF and MAF files. Benchmarking of our SNV callers and consensus methods can be found here. The general outline is as follows:

Prep MNP Variants
- Strelka2 outputs multi-nucleotide polymorphisms (MNPs) as consecutive single-nucleotide polymorphisms
- In order to preserve MNPs, we gather MNP calls from the other caller inputs and search for evidence supporting these consecutive SNP calls as MNP candidates
- Once found, the Strelka2 SNP calls supporting an MNP are converted to a single MNP call
- This is done to preserve the predicted gene model as accurately as possible in our consensus calls
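
The MNP preparation step above can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the `Call` record and the `merge_snps_to_mnps` function are hypothetical names, and the sketch assumes that an MNP candidate counts as supported only when every changed base in it matches a Strelka2 SNP at the same position.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical minimal variant record (not a real VCF parser type)."""
    chrom: str
    pos: int  # 1-based position of the first REF base
    ref: str
    alt: str


def merge_snps_to_mnps(snps, mnp_candidates):
    """Replace consecutive SNPs with an MNP when a candidate from another caller supports them."""
    snp_lookup = {(c.chrom, c.pos): c for c in snps}
    consumed = set()  # SNP positions already folded into an MNP
    merged = []
    for mnp in mnp_candidates:
        # Only equal-length substitutions of two or more bases are MNPs.
        if len(mnp.ref) != len(mnp.alt) or len(mnp.ref) < 2:
            continue
        keys = []
        supported = True
        for offset, (ref_base, alt_base) in enumerate(zip(mnp.ref, mnp.alt)):
            if ref_base == alt_base:
                continue  # unchanged base inside the MNP needs no SNP support
            key = (mnp.chrom, mnp.pos + offset)
            snp = snp_lookup.get(key)
            if snp is None or key in consumed or (snp.ref, snp.alt) != (ref_base, alt_base):
                supported = False
                break
            keys.append(key)
        if supported and keys:
            consumed.update(keys)
            merged.append(mnp)
    remaining = [c for c in snps if (c.chrom, c.pos) not in consumed]
    return sorted(remaining + merged, key=lambda c: (c.chrom, c.pos))


strelka2_snps = [
    Call("chr1", 100, "A", "T"),
    Call("chr1", 101, "C", "G"),
    Call("chr1", 200, "G", "A"),
]
other_caller_mnps = [Call("chr1", 100, "AC", "TG")]
result = merge_snps_to_mnps(strelka2_snps, other_caller_mnps)
```

In this example the two adjacent SNPs at positions 100 and 101 collapse into the single `AC>TG` call, while the isolated SNP at position 200 is left untouched.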
Consensus merge
- Calls are gathered from all four callers
- By default, calls with support from 2+ callers OR calls that are marked as HotSpotAllele in the INFO field are retained
- Retained calls then have their MQ and MQ0 values calculated from the input tumor cram
- GT fields are estimated as "majority rules," and when no majority exists, set as 0/1 by default
- AD, DP, and AF are calculated as the average value between callers
- ADR, DPR, and AFR fields are added as the range of values from the previous point, to give the observer a sense of confidence in the value
VEP Annotate Consensus (see Kids First DRC Somatic Variant Annotation Workflow for details)
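
The consensus rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's code: the function names are hypothetical, "majority" is assumed to mean a strict majority of the callers reporting the site, and the range is represented as a `(min, max)` pair, which may differ from how the real ADR, DPR, and AFR fields are encoded.

```python
from collections import Counter
from statistics import mean


def keep_call(callers, hotspot, min_callers=2):
    """Retain a call supported by enough distinct callers, or flagged HotSpotAllele."""
    return len(set(callers)) >= min_callers or hotspot


def consensus_genotype(genotypes, default="0/1"):
    """Majority-rules genotype; fall back to the default when no strict majority exists."""
    top, count = Counter(genotypes).most_common(1)[0]
    return top if count > len(genotypes) / 2 else default


def mean_and_range(values):
    """Average across callers plus the (min, max) spread of the per-caller values."""
    return mean(values), (min(values), max(values))


# A call seen by three callers with slightly different depth estimates.
retained = keep_call(["strelka2", "mutect2", "vardict"], hotspot=False)
gt = consensus_genotype(["0/1", "0/1", "1/1"])
dp, dpr = mean_and_range([40, 50, 60])
```

A wide range relative to the mean signals that the callers disagree about the underlying evidence, which is the confidence hint the range fields are meant to provide.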
Echtvar Annotation
- Additional annotation is performed to augment the VEP annotation
- While VEP does have extensive gnomAD allele frequency annotation, it is limited to exome values. The gnomAD AF-only resource we use supplements this with an additional INFO/AF field that adds WGS frequencies
Soft filter variants
- A soft filter is added based on criteria provided
- By default, we perform soft filtering as outlined in the KFDRC Annotation Subworkflow
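
As a rough illustration of what the default soft filter does, the sketch below mirrors the default `gatk_filter_name` and `gatk_filter_expression` values listed under the recommended inputs: a normal-sample depth of 7 or less is tagged `NORM_DP_LOW`, and a gnomAD AF above 0.001 on a gnomAD PASS record is tagged `GNOMAD_AF_HIGH`. The `soft_filters` function and its argument names are hypothetical; the workflow itself applies these criteria through GATK filter expressions rather than Python.

```python
def soft_filters(normal_dp, gnomad_af, gnomad_filter,
                 max_low_dp=7, max_gnomad_af=0.001):
    """Return the FILTER string a record would receive under the default criteria."""
    labels = []
    if normal_dp <= max_low_dp:
        labels.append("NORM_DP_LOW")
    # A missing gnomAD AF ('.' in the VCF) is modeled here as None and never triggers the filter.
    if gnomad_af is not None and gnomad_af > max_gnomad_af and gnomad_filter == "PASS":
        labels.append("GNOMAD_AF_HIGH")
    return ";".join(labels) if labels else "PASS"


well_covered_rare = soft_filters(normal_dp=30, gnomad_af=None, gnomad_filter=None)
shallow_and_common = soft_filters(normal_dp=5, gnomad_af=0.01, gnomad_filter="PASS")
```

Because the filter is soft, records are only labeled; nothing is removed from the Protected VCF at this stage.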
VCF2MAF protected
- For convenience of analysis, we convert the resulting soft-filtered VCF (the "Protected VCF") into MAF format
Hard filter VCF
- The Protected VCF is hard filtered on PASS and HotSpotAllele for reasons outlined in the Soft filter variants step
- This VCF is known as the "Public VCF"
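
The hard filter keeps a record when it passed the soft filters or carries the hotspot flag, matching the default `bcftools_public_filter` value of `FILTER="PASS"|INFO/HotSpotAllele=1`. A minimal Python equivalent of that predicate is shown below; the `is_public` function and the dictionary-based record layout are assumptions made for illustration only.

```python
def is_public(filter_field, info):
    """True when a record would survive the default public (hard) filter."""
    # Compare as a string so both a parsed integer 1 and a raw "1" are accepted.
    return filter_field == "PASS" or str(info.get("HotSpotAllele")) == "1"


protected_records = [
    {"id": "var1", "filter": "PASS", "info": {}},
    {"id": "var2", "filter": "NORM_DP_LOW", "info": {"HotSpotAllele": 1}},
    {"id": "var3", "filter": "GNOMAD_AF_HIGH", "info": {"HotSpotAllele": 0}},
]
public_ids = [r["id"] for r in protected_records if is_public(r["filter"], r["info"])]
```

Here `var2` is retained despite failing a soft filter because it is flagged as a hotspot allele, while `var3` is dropped from the public output.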
VCF2MAF public
Rename outputs

Workflow Description and KF Recommended Inputs

General workflow inputs (all file references can be obtained here):

indexed_reference_fasta: Homo_sapiens_assembly38.fasta
strelka2_vcf
mutect2_vcf
lancet_vcf
vardict_vcf
cram #Tumor cram recommended for MQ score calculation
input_tumor_name
input_normal_name
output_basename
tool_name: "consensus_somatic"
ncallers: # Optional number of callers required for consensus, recommend 2
consensus_ram: 3
annotation_zip: gnomad.v3.1.1.custom.echtvar.zip # population stats VCF for public filtering
vep_cache: homo_sapiens_merged_vep_105_indexed_GRCh38.tar.gz
gatk_filter_name: [NORM_DP_LOW, GNOMAD_AF_HIGH]
gatk_filter_expression: [vc.getGenotype('insert_norm_sample_id_here').getDP() <= 7, gnomad_3_1_1_AF != '.' && gnomad_3_1_1_AF > 0.001 && gnomad_3_1_1_FILTER=='PASS']
bcftools_public_filter: FILTER="PASS"|INFO/HotSpotAllele=1
retain_info: "gnomad_3_1_1_AC,gnomad_3_1_1_AN,gnomad_3_1_1_AF,gnomad_3_1_1_nhomalt,gnomad_3_1_1_AC_popmax,gnomad_3_1_1_AN_popmax,gnomad_3_1_1_AF_popmax,gnomad_3_1_1_nhomalt_popmax,gnomad_3_1_1_AC_controls_and_biobanks,gnomad_3_1_1_AN_controls_and_biobanks,gnomad_3_1_1_AF_controls_and_biobanks,gnomad_3_1_1_AF_non_cancer,gnomad_3_1_1_primate_ai_score,gnomad_3_1_1_splice_ai_consequence,gnomad_3_1_1_AF_non_cancer_afr,gnomad_3_1_1_AF_non_cancer_ami,gnomad_3_1_1_AF_non_cancer_asj,gnomad_3_1_1_AF_non_cancer_eas,gnomad_3_1_1_AF_non_cancer_fin,gnomad_3_1_1_AF_non_cancer_mid,gnomad_3_1_1_AF_non_cancer_nfe,gnomad_3_1_1_AF_non_cancer_oth,gnomad_3_1_1_AF_non_cancer_raw,gnomad_3_1_1_AF_non_cancer_sas,gnomad_3_1_1_AF_non_cancer_amr,gnomad_3_1_1_AF_non_cancer_popmax,gnomad_3_1_1_AF_non_cancer_all_popmax,gnomad_3_1_1_FILTER,MQ,MQ0,CAL,HotSpotAllele"
retain_fmt: # csv string with FORMAT fields that you want to keep
retain_ann: "HGVSg"
maf_center: "."
custom_enst: kf_isoform_override.tsv. As of VEP 104, several genes have had their canonical transcripts redefined. While the VCF will have all possible isoforms, this affects MAF file output and may result in representative protein changes that defy historical expectations

Workflow outputs

annotated_protected_outputs: Array of files containing MAF format of PASS hits, PASS VCF with annotation pipeline soft FILTER-added values, and VCF index
annotated_public_outputs: Same as above, except MAF and VCF have had entries with soft FILTER values removed
