Occasionally you might want to keep the matching ref/alt allele bases, because they give you more information about the surrounding bases. There are three options for the base normalization:
Strip off only the very first matching base from ref+alt (first)
Strip off all matching starting bases from ref+alt (all) -- this is the current behavior
Don't do any harmonization, but still store all the appropriate fields as if the variant were normalized (none)
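To make the three modes concrete, here is a minimal Python sketch. The function name `strip_matching_bases`, the use of `-` for an empty allele, and the simple "advance the position by the number of stripped bases" bookkeeping are assumptions for illustration only; this is not the actual code of either tool, and it does not follow the MAF Start_Position/End_Position conventions for insertions. In this sketch "none" simply leaves the input untouched.

```python
def strip_matching_bases(pos, ref, alt, mode="all"):
    """Strip shared leading bases from ref/alt according to `mode`.

    Hypothetical helper: returns (pos, ref, alt), with "-" standing in
    for an allele that became empty after stripping.
    """
    if mode not in ("first", "all", "none"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Upper bound on how many leading bases may be removed.
    if mode == "none":
        limit = 0
    elif mode == "first":
        limit = 1
    else:  # "all"
        limit = min(len(ref), len(alt))

    # Count the matching leading bases, up to the limit.
    n = 0
    while n < limit and n < len(ref) and n < len(alt) and ref[n] == alt[n]:
        n += 1

    return pos + n, ref[n:] or "-", alt[n:] or "-"


# An insertion of T written with extra shared context (GCA > GCAT):
print(strip_matching_bases(100, "GCA", "GCAT", "first"))
print(strip_matching_bases(100, "GCA", "GCAT", "all"))
print(strip_matching_bases(100, "GCA", "GCAT", "none"))
```

With "first" the alleles keep two bases of context (CA/CAT), while "all" reduces the variant to a bare insertion of T.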
This could be relevant for both annotation-tools and genome-nexus-annotation-pipeline. The former does the vcf2maf conversion, while the latter also harmonizes bases (the API returns a harmonized version of chrom/pos/ref/alt). We should probably add options to both tools for this, so the annotation pipeline could have an option like this:
--strip-matching-bases {first,all,none}
And the annotation-tools could have something like:
--strip-matching-bases {first,all}
For annotation-tools it probably doesn't make sense to have the "none" option, since you are starting from the VCF file, which by definition lists the additional base in ref and alt for indels.
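As a rough illustration of how the proposed flag could be declared, here is a sketch using Python's `argparse`. Only the flag name and its choices come from the proposal above; the `build_parser` helper, the `allow_none` switch, and the help text are hypothetical, and neither tool is necessarily written this way. The default of `all` mirrors what is described above as the current behavior.

```python
import argparse


def build_parser(allow_none):
    """Build a parser exposing the proposed --strip-matching-bases flag.

    allow_none=True models the annotation pipeline (first, all, none);
    allow_none=False models annotation-tools (first, all).
    """
    choices = ["first", "all"]
    if allow_none:
        choices.append("none")

    parser = argparse.ArgumentParser(description="Sketch of the proposed option")
    parser.add_argument(
        "--strip-matching-bases",
        choices=choices,
        default="all",  # stripping all matching bases is the current behavior
        help="how many matching leading ref/alt bases to strip",
    )
    return parser


args = build_parser(allow_none=True).parse_args(["--strip-matching-bases", "first"])
print(args.strip_matching_bases)
```

Restricting `choices` per tool means that passing `none` to the annotation-tools variant is rejected by the parser itself rather than deeper in the conversion code.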
Note that the issue with using "first" is that it is not idempotent: if you run the MAF through multiple times, it will change every time until all matching bases are stripped off. This is not a big deal if you start from the source VCF, which is how it works for most internal pipelines at MSK, but it can be an issue when you use the MAF as the source-of-truth file. A way to capture immutable genomic locations was implemented previously but never merged, so it might be good to revisit that. Another option is to add a similar feature to the VCF-to-MAF conversion script, i.e. add the original VCF fields to the resulting MAF so you don't lose the source of truth. Then, whenever you re-annotate, you use the source-of-truth fields rather than the potentially harmonized fields.
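The drift described here, and the source-of-truth workaround, can be sketched in a few lines of Python. Everything in this block is illustrative: `strip_first`, `passes_until_stable`, `reannotate`, and the column names `Source_Pos`, `Source_Ref`, `Source_Alt`, `Pos`, `Ref`, `Alt` are hypothetical and not taken from either tool, and an empty allele is represented as an empty string.

```python
def strip_first(pos, ref, alt):
    """Strip at most one shared leading base (the "first" mode)."""
    if ref and alt and ref[0] == alt[0]:
        return pos + 1, ref[1:], alt[1:]
    return pos, ref, alt


def passes_until_stable(pos, ref, alt):
    """Feed the output of "first" back in until it stops changing."""
    states = [(pos, ref, alt)]
    while True:
        nxt = strip_first(*states[-1])
        if nxt == states[-1]:
            return states
        states.append(nxt)


def reannotate(record):
    """Always normalize from the preserved source fields (hypothetical columns)."""
    pos, ref, alt = strip_first(
        record["Source_Pos"], record["Source_Ref"], record["Source_Alt"]
    )
    return {**record, "Pos": pos, "Ref": ref, "Alt": alt}


# Re-running "first" on its own output keeps changing the variant.
for state in passes_until_stable(100, "GCA", "GCAT"):
    print(state)

# Re-annotating from the preserved source fields gives the same result every time.
record = {"Source_Pos": 100, "Source_Ref": "GCA", "Source_Alt": "GCAT"}
once = reannotate(record)
twice = reannotate(once)
print(once == twice)
```

Because `reannotate` reads only the preserved source columns, repeated runs cannot drift, whichever stripping mode is chosen.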
Note: need to figure out what to do with matching ending bases
This is related to this issue:
mskcc/vcf2maf#279