AlphaFold-based pipeline for prediction of peptide-MHC structures.

Please cite as:
V. Mikhaylov, A. J. Levine, "Accurate modeling of peptide-MHC structures with AlphaFold,"
bioRxiv 2023.03.06.531396; doi: https://doi.org/10.1101/2023.03.06.531396

Download and install

Download AlphaFold and its parameters. (This pipeline was tested with AlphaFold 2.1.0.) There is no need to download the PDB or the protein sequence databases.
Clone this repository:

git clone https://github.com/v-mikhaylov/tfold-release.git

Enter the tfold-release folder.

Install the dependencies. With conda, you should be able to create an environment that works for both the TFold pipeline and AlphaFold:

conda env create --file tfold-env.yml
conda activate tfold-env

(This environment for running AlphaFold outside of Docker is based on https://github.com/kalininalab/alphafold_non_docker.)

Download the data file data.tar.gz with templates and other information from Zenodo, https://zenodo.org/record/7803946. This can be done in a web browser or with zenodo-get:

pip install zenodo-get
zenodo_get 7803946

Unpack data.tar.gz into the tfold-release folder. This will create a folder data.

Set the paths to a couple of folders in tfold/config.py and tfold_patch/tfold_config.py.
That should be it.

Model pMHCs

Prepare an input file. An example can be found in data/examples/sample.csv. It should be a .csv file with a header row, a pep column, and either an MHC allele column or an MHC sequence column.
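As a minimal sketch, an input file with allele names could be written with Python's csv module as shown below. The column names follow the description above; the peptides, the output file name my_input.csv, and the helper write_input_csv are illustrative assumptions, so check data/examples/sample.csv for the exact header the pipeline expects.

```python
import csv

# Illustrative rows only: two peptides paired with an allele name in the
# SpeciesId-Locus*Allele format. Replace with your own data.
rows = [
    {"pep": "NLVPMVATV", "MHC allele": "HLA-A*02:01"},
    {"pep": "GILGFVFTL", "MHC allele": "HLA-A*02:01"},
]


def write_input_csv(path, rows):
    """Write peptide/allele pairs to a .csv file with a header row.

    Hypothetical helper, not part of the TFold code base.
    """
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["pep", "MHC allele"])
        writer.writeheader()
        writer.writerows(rows)


write_input_csv("my_input.csv", rows)
```

The resulting my_input.csv can then be passed to the script as $input_file.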

The format for MHC alleles is SpeciesId-Locus*Allele for class I and SpeciesId-LocusA*AlleleA/LocusB*AlleleB for class II. Some examples: HLA-A*02:01, H2-K*d, HLA-DRA*01:01/DRB4*01:144, H2-IEA*d/IEB*k.
For class II, the MHC sequence should contain alpha-chain and beta-chain sequences separated by '/'.
For more details and options, please see details.ipynb.
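To make the allele format concrete, here is a small sketch that splits an allele string into its species, loci and allele names. The function parse_mhc_allele is a hypothetical helper, not part of the TFold code base, and it only checks the syntax described above; it does not verify that the allele actually exists.

```python
def parse_mhc_allele(name):
    """Split an allele string into species and (locus, allele) pairs.

    One 'Locus*Allele' part is treated as class I, two parts separated
    by '/' as class II. This is a purely syntactic check.
    """
    species, sep, rest = name.partition("-")
    if not sep or not species or not rest:
        raise ValueError(f"expected 'SpeciesId-Locus*Allele', got {name!r}")

    chains = []
    for part in rest.split("/"):
        locus, star, allele = part.partition("*")
        if not star or not locus or not allele:
            raise ValueError(f"expected 'Locus*Allele', got {part!r}")
        chains.append((locus, allele))

    if len(chains) > 2:
        raise ValueError(f"too many chains in {name!r}")

    return {
        "species": species,
        "mhc_class": "I" if len(chains) == 1 else "II",
        "chains": chains,
    }


# The examples listed above
for example in ["HLA-A*02:01", "H2-K*d", "HLA-DRA*01:01/DRB4*01:144", "H2-IEA*d/IEB*k"]:
    print(example, "->", parse_mhc_allele(example))
```

A check like this can catch malformed entries in the input file before a long modeling run is started.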

Activate the conda environment:

conda activate tfold-env

Choose an output folder $working_dir and run the script as follows:

model_pmhcs.sh $input_file $working_dir [-d YYYY-MM-DD]

Here [-d YYYY-MM-DD] is an optional cutoff on the allowed template dates.
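If the script is launched from Python, the command line can be assembled and the cutoff date validated before anything is run. The sketch below assumes the script is invoked as ./model_pmhcs.sh from inside the tfold-release folder; build_command is a hypothetical helper and the file names are placeholders.

```python
import datetime
import shlex


def build_command(input_file, working_dir, cutoff=None):
    """Return the argument list for model_pmhcs.sh.

    If given, cutoff must be a zero-padded YYYY-MM-DD string;
    otherwise a ValueError is raised.
    """
    cmd = ["./model_pmhcs.sh", input_file, working_dir]
    if cutoff is not None:
        # strptime rejects impossible dates such as 2020-13-01
        parsed = datetime.datetime.strptime(cutoff, "%Y-%m-%d").date()
        # the round trip rejects non-padded forms such as 2020-1-5
        if parsed.isoformat() != cutoff:
            raise ValueError(f"cutoff must be YYYY-MM-DD, got {cutoff!r}")
        cmd += ["-d", cutoff]
    return cmd


cmd = build_command("my_input.csv", "my_working_dir", cutoff="2020-01-31")
print(shlex.join(cmd))
# To actually run it (requires the installed pipeline):
# import subprocess; subprocess.run(cmd, check=True)
```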

The models will be saved in $working_dir/outputs, with a separate folder for each pMHC. There will also be a summary .csv file in $working_dir with information about the best models (by predicted score).
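After a run, the layout described above can be inspected with a few lines of Python. This is only a sketch: list_results is a hypothetical helper, and because the name of the summary file is not stated here, it simply lists every .csv file found directly in the working folder.

```python
from pathlib import Path


def list_results(working_dir):
    """Return (pMHC folder names, .csv file names) for a working folder.

    Looks for subfolders of <working_dir>/outputs and for .csv files
    located directly in <working_dir>.
    """
    working_dir = Path(working_dir)
    outputs = working_dir / "outputs"
    if outputs.is_dir():
        pmhc_dirs = sorted(p.name for p in outputs.iterdir() if p.is_dir())
    else:
        pmhc_dirs = []
    csv_files = sorted(p.name for p in working_dir.glob("*.csv"))
    return pmhc_dirs, csv_files


# Example usage with a placeholder folder name
pmhc_dirs, csv_files = list_results("my_working_dir")
print(f"{len(pmhc_dirs)} pMHC folders, summary candidates: {csv_files}")
```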

Details

The notebook details.ipynb contains additional details on the pipeline that can be useful, e.g., for splitting jobs across multiple GPUs. It also contains a description of our cleaned pMHC and TCR structure database and associated tools.
