3D-vac pipeline #163

Open
4 of 5 tasks
gcroci2 opened this issue Sep 26, 2023 · 2 comments
Labels
docs Improvements or additions to documentation GNNs pMHC-I production

Comments

Collaborator

gcroci2 commented Sep 26, 2023

We want to add ready-to-use notebooks that run the entire 3D-Vac pipeline. In particular, we can develop two notebooks:

  • 1. 3D modeling notebook. Given one or more peptide-protein complex sequences as input, build the 3D structure model(s) with PANDORA and output the PDB file(s). @DarioMarzella
  • 2. Featurization and prediction script. @gcroci2
    • 2.1 Use deeprank2 to featurize the structure(s) and save the result into HDF5 file(s).
    • 2.2 Run a pre-trained GNN model on the featurized data. Side note: we need to re-train the GNN architecture on all the available data (~100k), using the best parameters selected in issue #151 (Finalize GNNs for the scientific paper).
    • 2.3 Print the predictions and report the classification threshold selected on the validation set of the shuffled data configuration.
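As a rough illustration of step 2.3, here is a minimal sketch of how the predicted scores could be turned into class labels and printed together with the threshold. It does not use the DeepRank2 API; the function names (`classify`, `format_report`), the complex IDs, the scores, and the threshold value are all hypothetical placeholders, and it assumes that a score at or above the threshold means the positive class.

```python
def classify(scores, threshold):
    """Return {complex_id: (score, label)}.

    Assumption: label is 1 when score >= threshold, otherwise 0.
    """
    return {cid: (score, int(score >= threshold)) for cid, score in scores.items()}


def format_report(scores, threshold):
    """Build the printable report: the threshold first, then one line per complex."""
    lines = [f"threshold: {threshold:.4f}"]
    for cid, (score, label) in sorted(classify(scores, threshold).items()):
        lines.append(f"{cid}\tscore={score:.4f}\tlabel={label}")
    return lines


if __name__ == "__main__":
    # Hypothetical model outputs; real scores would come from the pre-trained GNN.
    example_scores = {"complex_A": 0.91, "complex_B": 0.12, "complex_C": 0.50}
    for line in format_report(example_scores, threshold=0.5):
        print(line)
```

Printing the threshold next to the predictions keeps the labels interpretable if the threshold is later re-selected after re-training.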
gcroci2 added the docs, pMHC-I, GNNs, and production labels on Sep 26, 2023
Collaborator Author

gcroci2 commented Oct 25, 2023

The DeepRank2 part (data processing + testing) is in the script src/4_train_models/DeepRank2/GNN/pre-trained_testing.py.

The threshold selected by maximizing MCC on the validation set of the shuffled data configuration is 0.5151 (AUC on test 0.8565, MCC on test 0.5582, from exp_100k_std_transf_bs64_naivegnn1_wloss_0_230607 as described in #151).

Any suggestions for improvement? @LilySnow, @DarioMarzella. Otherwise, I am done with the DeepRank2 part.
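
For readers unfamiliar with the procedure, selecting a threshold by maximizing MCC on a validation set can be sketched as below. This is a plain-Python illustration, not the code in `pre-trained_testing.py`: the function names and the toy labels and scores are made up, and it assumes binary labels (0/1) with scores at or above the threshold predicted as class 1.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def select_threshold(y_true, scores):
    """Try every distinct score as a threshold and keep the one with the highest MCC."""
    best_thr, best_mcc = None, -math.inf
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        value = mcc(y_true, preds)
        if value > best_mcc:  # strict comparison: ties keep the lowest threshold
            best_thr, best_mcc = thr, value
    return best_thr, best_mcc


if __name__ == "__main__":
    # Toy validation data, for illustration only.
    val_labels = [0, 0, 1, 0, 1, 1]
    val_scores = [0.1, 0.4, 0.35, 0.6, 0.8, 0.9]
    thr, value = select_threshold(val_labels, val_scores)
    print(f"selected threshold: {thr}, validation MCC: {value:.4f}")
```

The selected threshold is then kept fixed and applied to the test set, so that the test metrics are not biased by the selection.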

Collaborator Author

gcroci2 commented Jan 12, 2024

The DeepRank2 part (data processing + testing) is in the script src/4_train_models/DeepRank2/GNN/pre-trained_testing.py.

  • Note that, for now, the script runs only with this branch of DeepRank2, because the required edits are still under review in PR515 (to be merged soon).

The threshold selected by maximizing MCC on the validation set of the shuffled data configuration is 0.5151 (AUC on test 0.8565, MCC on test 0.5582, from exp_100k_std_transf_bs64_naivegnn1_wloss_0_230607 as described in #151).

Any suggestions for improvement? @LilySnow, @DarioMarzella. Otherwise, I am done with the DeepRank2 part.

The relevant scripts are now in src/6_test_cases/. @DarioMarzella will finalize the part that generates the PDB files (currently in src/6_test_cases/generate_pdb_test_case.py).

Projects
Status: In progress