Does Exomiser support only certain VCF format versions?
Exomiser v14.0.0 does not recognize unstructured meta-information lines with the key “##META” as valid header lines.
I have some old VCFs whose headers contain 16 lines that start with “##META” and are unstructured meta-information lines. However, this seems to be allowed in VCFv4.4 (page 5, section 1.4).
When I run one of these VCFs through Exomiser, I get the following error:
htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Invalid VCFSimpleHeaderLine: key=META name=null, for input source: file:///oak/stanford/groups/euan/UDN/gateway/data/UDN644400/WES/FromSequencingCore/WES_blood_hg19/Processed/UDN644400_family_merged.vcf.gz
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97) ~[htsjdk-3.0.5.jar:3.0.5]
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82) ~[htsjdk-3.0.5.jar:3.0.5]
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117) ~[htsjdk-3.0.5.jar:3.0.5]
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81) ~[htsjdk-3.0.5.jar:3.0.5] ……..
……………………….
##fileformat=VCFv4.0
Lines with ##META in the header of my VCF:
##META='Cassandra_version=15.4.29'
##META='Pileup_File=/stornext/snfswgl/next-gen/Illumina/Instruments/D00143/170130_D00143_0967_BHF7KHBCXY/Results/Project_170130_D00143_0967_BHF7KHBCXY/Sample_HF7KHBCXY-2-ID10/SNP/7
##META='Annovar-refGene(hg19).Version=2013-08-23'
##META='Annovar-knownGene(hg19).Version=2013-08-23'
##META='Annovar-ensgene(hg19).Version=2013-08-23'
##META='Annovar-ensgene(GRCh37_MT).Version=2013-08-23'
##META='DbNSFP.Description=The dbNSFP is an integrated database of functional annotations from multiple sources for the comprehensive collection of human non-synonymous SNPs. v2.5.
##META='Hgmd.Database_version=null.Description=HGMD_PRO_2016.1.Downloaded=2016-07-8'
##META='1000 Genomes Phase 1.Description=SNPs Indels and SVs friom 1000 Genomes.Downloaded=2014-03-04'
##META='DbSNP.Description=NCBIs SNP database. v141 (GRCh37).Downloaded=2014-07-16'
##META='ARIC.Description=Allele freq from Aric cohort.Downloaded=2014-7-16'
##META='Mappability.Description=Encode 100bp alignability track. v1.Downloaded=2014-03-04'
##META='CgMaf.Description=Complete genomics variations from the reference genome identified across 54-genome subset of the 69 CG public genomes. Version 2.Downloaded=2014-03-04'
##META='ESP.Description=ESP5400 taken from 5400 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data. Version 1.Downloaded=2014-03-04'
##META='Encode.Description=Reglatory features from Encode. Taken from ensembl release 75.Downloaded=2014-03-04'
##META='Swissprot.Description=Uniprot gene annotation. Version 2014_02.Downloaded=2014-07-16'
##INFO=<ID=ReqIncl,Number=.,Type=String,Description="Site was required to be included in the VCF">
If I delete the “##META” lines from the header of my VCF file, Exomiser runs successfully. However, I have several such VCFs and do not want to create new copies with modified headers. Is there a way to mitigate this?
Thanks,
Shruti
Shruti Marwaha, PhD.
Research Engineer,
Stanford Center for Undiagnosed Diseases
GREGoR (Genomics Research to Elucidate the Genetics of Rare disease) Stanford Site
Stanford University
That's weird. Did it run OK on earlier versions? Under the hood Exomiser uses the HTSJDK, so which VCF versions are supported is entirely down to that library. I think it only supports up to 4.2.
VCFv4.3 introduced ##META header lines for defining phenotype metadata.
This is shown on page 7, section 1.4.8:
1.4.8 Sample field format
It is possible to define sample to genome mappings as shown below:
##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>
##META=<ID=Disease,Type=String,Number=.,Values=[None, Cancer]>
##META=<ID=Ethnicity,Type=String,Number=.,Values=[AFR, CEU, ASN, MEX]>
##META=<ID=Tissue,Type=String,Number=.,Values=[Blood, Breast, Colon, Lung, ?]>
##SAMPLE=<ID=Sample1,Assay=WholeGenome,Ethnicity=AFR,Disease=None,Description="Patient germline genome from unaffected",DOI=url>
##SAMPLE=<ID=Sample2,Assay=Exome,Ethnicity=CEU,Disease=Cancer,Tissue=Breast,Description="European patient exome from breast cancer">
So the line should have the form ##META=<ID....>, but that requirement applies to VCFv4.3. Your old files declare themselves as v4.0, and their unstructured ##META lines should therefore be considered legal.
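To make the distinction concrete, here is a minimal Python sketch that sorts header lines into the structured `##META=<ID=...>` form from the spec excerpt and the unstructured form found in the old files. The function name `classify_meta_line` and the regular expression are illustrative assumptions; this is not how HTSJDK itself validates headers.

```python
import re

# Assumption: "structured" means the ##META=<ID=...,...> shape shown in the
# spec excerpt. This is an illustrative check, not a full VCF validator.
STRUCTURED_META = re.compile(r"^##META=<ID=[^,>]+(,.*)?>\s*$")


def classify_meta_line(line: str) -> str:
    """Return 'structured', 'unstructured', or 'not-meta' for a header line."""
    if not line.startswith("##META="):
        return "not-meta"
    return "structured" if STRUCTURED_META.match(line) else "unstructured"


if __name__ == "__main__":
    examples = [
        "##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>",
        "##META='Cassandra_version=15.4.29'",
        "##fileformat=VCFv4.0",
    ]
    for example in examples:
        print(classify_meta_line(example), example, sep="\t")
```

Lines reported as unstructured are the ones that trigger the `Invalid VCFSimpleHeaderLine: key=META name=null` error in the report above.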
I think HTSJDK effectively supports reading VCFv4.3 and writing VCFv4.2, which would explain why the error is happening. It would be more useful if they could precisely support the version stated in the header, or throw an error saying that the file's version does not match one they fully support. What they actually support isn't clearly defined beyond checking that the file starts with the header "##fileformat=VCFv4".
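Until the parser handles these lines, one possible workaround is to filter them out before the file reaches Exomiser. The sketch below is only an illustration under stated assumptions: the helper names `drop_unstructured_meta` and `clean_vcf` are hypothetical, any `##META=` line not followed by `<` is treated as unstructured, and the output is written as plain text. It does create a new file, which the original question hoped to avoid, but it can be scripted across many VCFs.

```python
import gzip
from typing import Iterable, Iterator


def drop_unstructured_meta(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping header lines like ##META='...'.

    Structured ##META=<ID=...> lines and all non-header lines pass through.
    """
    in_header = True
    for line in lines:
        if in_header:
            if not line.startswith("##"):
                # Reached the #CHROM column line (or the first record).
                in_header = False
            elif line.startswith("##META=") and not line.startswith("##META=<"):
                continue  # unstructured META line: drop it
        yield line


def clean_vcf(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path (plain text) without unstructured META lines.

    Assumption: a source ending in .gz is gzip- or bgzip-compressed; Python's
    gzip module reads both.
    """
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(drop_unstructured_meta(src))
```

The output is deliberately uncompressed. If a `.vcf.gz` is needed, it would have to be recompressed with `bgzip` and re-indexed with `tabix -p vcf`, because a file written by Python's `gzip` module is not block-compressed.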