VIS: HGVS variant interpretation using SPLICEAI

The current version does not support insertions and is poorly tested for duplications!
Written and tested in Python 3.7.9.

This script predicts the transcript effect of variants given directly in HGVS notation, using SpliceAI.
The script is based entirely on SpliceAI, whose code can be found on GitHub: https://github.com/Illumina/SpliceAI

Prerequisites

Genome Annotation
This script requires the reference genome sequence (a FASTA file) for the genome build the user provides. These can be downloaded here:
hg38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
hg19: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
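The downloads above are gzip-compressed. A minimal sketch for unpacking one of them, assuming the script expects an uncompressed FASTA file (the README does not state this); the function name decompress_fasta is hypothetical and not part of the script:

```python
import gzip
import shutil
from pathlib import Path


def decompress_fasta(gz_path, out_path=None):
    """Decompress a gzip-compressed FASTA file and return the output path.

    Hypothetical helper for illustration; not part of HGVSpredict.py.
    """
    gz_path = Path(gz_path)
    # Default output name: drop the trailing ".gz" (hg38.fa.gz -> hg38.fa).
    out_path = Path(out_path) if out_path else gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # copyfileobj streams in chunks, so the genome is never fully in memory.
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, decompress_fasta("hg38.fa.gz") would write hg38.fa next to the download.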


Libraries
This script requires some libraries to run. These can be found on their respective GitHub pages:
HGVS: https://github.com/biocommons/hgvs
pyfaidx: https://github.com/mdshw5/pyfaidx
Ensembl Rest: https://github.com/gawbul/pyEnsemblRest
Pandas: https://github.com/pandas-dev/pandas

Alternatively, these can be installed directly via:

pip install hgvs
pip install pyfaidx
pip install pyensemblrest
pip install pandas

Usage

The script can be run directly from the command line:

python3 HGVSpredict.py -I input -O output -G genome -P preferred_transcript (optional)
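For illustration, the flags in the command above could be parsed with argparse roughly as follows. This is a sketch under assumptions, not the script's actual code: the dest names and help texts are hypothetical, and only the flags -I, -O, -G and -P come from the README.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(
        prog="HGVSpredict.py",
        description="Predict transcript effects of HGVS variants (sketch).",
    )
    # The three required flags from the usage line.
    parser.add_argument("-I", dest="input", required=True,
                        help="text file with one HGVS variant per line")
    parser.add_argument("-O", dest="output", required=True,
                        help="output file, ideally ending in .csv")
    parser.add_argument("-G", dest="genome", required=True,
                        help="genome, as passed to -G")
    # -P is optional, so it defaults to None when omitted.
    parser.add_argument("-P", dest="preferred_transcript", default=None,
                        help="preferred transcript (optional)")
    return parser


# Example call with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["-I", "variants.txt", "-O", "results.csv", "-G", "hg38.fa"]
)
print(args.input, args.output, args.genome, args.preferred_transcript)
```

Marking -I, -O and -G as required lets argparse report missing arguments before any variant is processed.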

The input can be any plain-text file readable by Python 3.7.9 (.txt for example), with one variant per line. Encoding does not matter.
The output is written in CSV format, so it is advisable to give the output file name a .csv extension.
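As a rough sketch of these file formats: the helper names and the CSV column names below are hypothetical, and the standard csv module is used here in place of pandas to keep the example dependency-free. The script's real output columns may differ.

```python
import csv


def read_variants(path):
    """Return the non-empty lines of a text file, one variant per line."""
    # utf-8-sig tolerates a byte-order mark; undecodable bytes are replaced.
    with open(path, "r", encoding="utf-8-sig", errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_results(path, rows, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

An input file for this sketch would simply hold lines such as NM_000059.3:c.68-7T>A, one per row (an illustrative variant, not taken from the README).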

Code flow

  • Check arguments
  • Validate variants with preferred transcripts
  • Per-variant runs:
    • Conversion from HGVS to genomic variant
    • Locating the mutation within the gene
    • Get SpliceAI scores
    • Predict transcript effect based on location and scores
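Because insertions are not supported and duplications are poorly tested, it can help to screen the input before running the steps above. The following is a minimal sketch of such a pre-check based on HGVS edit keywords; it is not the script's own validation, and the function name classify_edit is hypothetical.

```python
import re

# Order matters: "delins" must be checked before "ins" and "del",
# because it contains both keywords.
_EDIT_PATTERNS = [
    ("delins", re.compile(r"delins")),
    ("duplication", re.compile(r"dup")),
    ("insertion", re.compile(r"ins")),
    ("deletion", re.compile(r"del")),
    ("substitution", re.compile(r"[ACGT]>[ACGT]$")),
]


def classify_edit(variant):
    """Return a coarse edit type for an HGVS string such as 'NM_x.1:c.76dup'."""
    # Inspect only the part after the accession, so that letters in the
    # transcript name are never mistaken for an edit keyword.
    change = variant.strip().partition(":")[2] or variant.strip()
    for name, pattern in _EDIT_PATTERNS:
        if pattern.search(change):
            return name
    return "unknown"


def is_supported(variant):
    """Treat plain insertions as unsupported, following the README warning."""
    return classify_edit(variant) != "insertion"
```

Variants flagged as insertions could then be reported and skipped, and duplications reviewed manually.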



Kyran Wissink
Student Biomedical Sciences
University of Groningen
github.com/KyranWissink
[email protected]