feat: improve Trainer and DeeprankDataset logic for production testing #515

Merged
merged 63 commits into from
Jan 3, 2024
Merged
Commits
a5ad04a
add relevant attributes to the Trainer and improve their logic
gcroci2 Oct 19, 2023
a094e2f
add tests for testing when no test is provided and when no mmodel is …
gcroci2 Oct 19, 2023
a6209cb
fix test_optim
gcroci2 Oct 20, 2023
d562513
change dataset_train to train_data and update docs, for later functio…
gcroci2 Oct 20, 2023
fa45c60
change dataset_train to train_data in all relevant scripts
gcroci2 Oct 20, 2023
fe18d3d
improve logic for handling both a pre-trained model and a dataset_tra…
gcroci2 Oct 20, 2023
f9b82de
add logic for handling the pre-trained model as input in DeeprankData…
gcroci2 Oct 21, 2023
c21fe2b
add tests for catching uncorrect pre-trained models
gcroci2 Oct 21, 2023
fc5f6af
add folder for pretrained models in tests
gcroci2 Oct 21, 2023
0a068b5
update data paths in test_dataset.py
gcroci2 Oct 21, 2023
bcc5138
implement inheritance in dataset from a pre-trained model
gcroci2 Oct 23, 2023
5280ece
add tests for inheritance from pre-trained model
gcroci2 Oct 23, 2023
7fcc033
add classes_to_index as inherited param and to the pre-trained model
gcroci2 Oct 23, 2023
cc3d79a
add classes_to_index to the tests' models
gcroci2 Oct 23, 2023
6b037ad
add classes_to_index's check to the tests
gcroci2 Oct 23, 2023
8e2496c
save features_transform's lambdas as strings and load them as functio…
gcroci2 Oct 23, 2023
11f826e
update pre-trained models
gcroci2 Oct 23, 2023
0fcea3a
add trainer tests for testing without defining the dataset_train
gcroci2 Oct 23, 2023
f8e6c57
fix test_dataset.py for the newly defined features_transform in the s…
gcroci2 Oct 23, 2023
b8e2348
remove dill usage since we're not saving lambda functions anymore (di…
gcroci2 Oct 23, 2023
060e6bf
improve initialization order in the Trainer class
gcroci2 Oct 23, 2023
1d3c0f5
fix datasets for cases in which there is a target attribute but no ta…
gcroci2 Oct 24, 2023
4d588e1
fix Trainer _eval method for cases in which there is a target attribu…
gcroci2 Oct 24, 2023
147a16a
add logic for checking the target settings in the init, and fix _filt…
gcroci2 Oct 24, 2023
08c90e0
add tests for cases with no target and improve target's filter tests
gcroci2 Oct 24, 2023
3b5d746
fix tests according to the new target's checks
gcroci2 Oct 24, 2023
a5fd524
add hdf5 file with no target
gcroci2 Oct 24, 2023
d728b2a
add new file with no target
gcroci2 Oct 24, 2023
0d35844
Merge branch '510_testing_pre_trained_gcroci2' of https://github.com/…
gcroci2 Oct 24, 2023
32f3e25
add test for verifying that the testing output is correct when target…
gcroci2 Oct 24, 2023
38f26c8
fix prospector errors
gcroci2 Oct 24, 2023
eee5512
fix build with python 3.11
gcroci2 Oct 24, 2023
d842fc7
try to fix geometric installation using pip instead of conda
gcroci2 Oct 24, 2023
e1265ae
fix prospector error
gcroci2 Oct 24, 2023
10a5795
add docs for testing a pre-trained model
gcroci2 Oct 25, 2023
bf1c8a3
fix bug in trainer for testing cases with no target
gcroci2 Oct 30, 2023
623ef38
Merge branch 'dev' into 510_testing_pre_trained_gcroci2
gcroci2 Nov 17, 2023
4a2b32a
uniform use_tqdm parameter
gcroci2 Nov 17, 2023
d91377f
uniform root_directory_path parameter
gcroci2 Nov 17, 2023
96c9b94
uniform parameters' order in dataset.py
gcroci2 Nov 17, 2023
86b15fb
put redundant code for inheriting training info in the parent class
gcroci2 Nov 17, 2023
4164c5b
uniform grp variable
gcroci2 Nov 17, 2023
6a6a305
make inherit_params an attribute of the dataset classes
gcroci2 Nov 17, 2023
79b00a2
improve testing new data part
gcroci2 Nov 21, 2023
ef57a25
add testing new data in the readme
gcroci2 Nov 21, 2023
d07b8c3
uniform pretrained_model_path to pretrained_model
gcroci2 Nov 21, 2023
55c061c
make error msg about the dataset clearer
gcroci2 Nov 21, 2023
2584917
use None instead of 'None' in the trainer _eval and _epoch methods
gcroci2 Nov 21, 2023
0906b2d
fix prospector errors
gcroci2 Nov 21, 2023
ae7fe4d
Merge branch 'dev' into 510_testing_pre_trained_gcroci2
gcroci2 Nov 21, 2023
836c5b2
move features checking after inheritance
gcroci2 Nov 22, 2023
b364eba
fix prospector errors
gcroci2 Nov 22, 2023
780f2b9
try to fix optimizer error in py3.11
gcroci2 Nov 22, 2023
2935414
Update docs/getstarted.md
gcroci2 Nov 24, 2023
2e4ec65
remove train parameter from dataset.py
gcroci2 Jan 3, 2024
d1da845
remove train refs from trainer.py
gcroci2 Jan 3, 2024
0bb3023
update tests with the new train_data logic
gcroci2 Jan 3, 2024
0255dcf
update docs
gcroci2 Jan 3, 2024
f4ba712
update tutorials
gcroci2 Jan 3, 2024
bff8a3d
change train_data to train_source
gcroci2 Jan 3, 2024
8c3c5d1
add comment for clarifying tests
gcroci2 Jan 3, 2024
3d31c59
merge with dev
gcroci2 Jan 3, 2024
7f33a68
fix integration test
gcroci2 Jan 3, 2024
4 changes: 2 additions & 2 deletions .github/actions/install-python-and-package/action.yml
@@ -50,8 +50,8 @@ runs:
 conda install -c bioconda msms
 ## PyTorch, PyG, PyG adds
 ### Installing for CPU only on the CI
-conda install pytorch torchvision torchaudio cpuonly -c pytorch
-conda install pyg -c pyg
+conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 -c pytorch
+pip install torch_geometric==2.3.1
 pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-$(python3 -c "import torch; print(torch.__version__)")+cpu.html
 - name: Install dependencies on MacOS
 shell: bash {0}
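The `-f` option in the last `pip install` line points pip at a PyG wheel index whose URL is built from the installed torch version. A minimal Python sketch of that string construction, assuming PyG's `torch-<version>+<variant>.html` naming exactly as used in the command; the helper `pyg_wheel_index` is hypothetical and not part of the workflow. Unlike the shell one-liner, it strips a local version label, in case the installed build reports something like `2.0.1+cpu`:

```python
def pyg_wheel_index(torch_version: str, variant: str = "cpu") -> str:
    """Return the PyG wheel-index URL for a torch version and build variant.

    Mirrors the string the CI step builds with
    torch-$(python3 -c "import torch; print(torch.__version__)")+cpu.html,
    but first drops any local version label (everything after "+"), so a
    version string like "2.0.1+cpu" does not produce "torch-2.0.1+cpu+cpu.html".
    """
    base_version = torch_version.split("+", 1)[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{variant}.html"


if __name__ == "__main__":
    # In CI the version would come from the installed package, e.g.
    #   import torch; pyg_wheel_index(torch.__version__)
    print(pyg_wheel_index("2.0.1"))      # plain version string
    print(pyg_wheel_index("2.0.1+cpu"))  # version string carrying a local label
```

With a plain version string such as `2.0.1`, the sketch and the shell command produce the same URL; the stripping only changes the result when a local label is present.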
54 changes: 43 additions & 11 deletions README.md
@@ -38,14 +38,15 @@ DeepRank2 extensive documentation can be found [here](https://deeprank2.rtfd.io/
 - [Table of contents](#table-of-contents)
 - [Installation](#installation)
 - [Dependencies](#dependencies)
-- [Deeprank2 Package](#deeprank2-package)
+- [Deeprank2 Package](#deeprank2-package)
 - [Test installation](#test-installation)
 - [Contributing](#contributing)
 - [Data generation](#data-generation)
 - [Datasets](#datasets)
 - [GraphDataset](#graphdataset)
 - [GridDataset](#griddataset)
 - [Training](#training)
+- [Run a pre-trained model on new data](#run-a-pre-trained-model-on-new-data)
 - [Computational performances](#computational-performances)
 - [Package development](#package-development)

@@ -61,7 +62,8 @@ Before installing deeprank2 you need to install some dependencies. We advise to
 * [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
 * [PyTorch](https://pytorch.org/get-started/locally/)
 * We support torch's CPU library as well as CUDA.
-* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
+* Currently, the package is tested using [PyTorch 2.0.1](https://pytorch.org/get-started/previous-versions/#v201).
+* [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
 * [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
 * Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
 * on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
@@ -70,7 +72,7 @@ Before installing deeprank2 you need to install some dependencies. We advise to
 * Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
 * For MacOS with M1 chip users only install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).

-### Deeprank2 Package
+## Deeprank2 Package

 Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:

@@ -214,14 +216,12 @@ dataset_train = GraphDataset(
 dataset_val = GraphDataset(
 hdf5_path = hdf5_paths,
 subset = valid_ids,
-train = False,
-dataset_train = dataset_train
+train_source = dataset_train
 )
 dataset_test = GraphDataset(
 hdf5_path = hdf5_paths,
 subset = test_ids,
-train = False,
-dataset_train = dataset_train
+train_source = dataset_train
 )
 ```

@@ -248,14 +248,12 @@ dataset_train = GridDataset(
 dataset_val = GridDataset(
 hdf5_path = hdf5_paths,
 subset = valid_ids,
-train = False,
-dataset_train = dataset_train,
+train_source = dataset_train,
 )
 dataset_test = GridDataset(
 hdf5_path = hdf5_paths,
 subset = test_ids,
-train = False,
-dataset_train = dataset_train,
+train_source = dataset_train,
 )
 ```

@@ -313,6 +311,40 @@ trainer.test()

 ```

+### Run a pre-trained model on new data
+
+If you want to analyze new PDB files using a pre-trained model, the first step is to process and save them into HDF5 files [as we have done above](#data-generation).
+
+Next, create the `DeeprankDataset` instance for the newly processed data by passing the path to the pre-trained model as `train_source`, together with the path to the HDF5 files just created. There is no need to set the dataset's parameters, since they are inherited from the information saved in the pre-trained model. For example, assuming that the model was trained with `GraphDataset` objects:
+
+```python
+from deeprank2.dataset import GraphDataset
+
+dataset_test = GraphDataset(
+hdf5_path = "<output_folder>/<prefix_for_outputs>",
+train_source = "<pretrained_model_path>"
+)
+```
+
+Finally, define the `Trainer` instance and test the new data:
+
+```python
+from deeprank2.trainer import Trainer
+from deeprank2.neuralnets.gnn.naive_gnn import NaiveNetwork
+from deeprank2.utils.exporters import HDF5OutputExporter
+
+trainer = Trainer(
+NaiveNetwork,
+dataset_test = dataset_test,
+pretrained_model = "<pretrained_model_path>",
+output_exporters = [HDF5OutputExporter("<output_folder_path>")]
+)
+
+trainer.test()
+```
+
+For more details about how to run a pre-trained model on new data, see the [docs](https://deeprank2.readthedocs.io/en/latest/getstarted.html#run-a-pre-trained-model-on-new-data).
+
 ## Computational performances

 We measured the efficiency of data generation in DeepRank2 using the tutorials' [PDB files](https://zenodo.org/record/8187806) (~100 data points per data set), averaging the results run on Apple M1 Pro, using a single CPU.
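In the README changes above, `train_source` replaces the `train`/`dataset_train` pair: validation and test sets receive the training dataset, while a dataset for new data receives the path to a pre-trained model. A minimal, self-contained Python sketch of such a dispatch follows. It is not the deeprank2 implementation: `SketchDataset`, `INHERITED_PARAMS`, the listed parameter names and the JSON file format are hypothetical stand-ins (a real pre-trained model file is not JSON), and letting inherited values override explicitly passed ones is an assumption:

```python
import json
import os
import tempfile
from typing import Any, Optional, Union

# Hypothetical names of the parameters copied from the training source.
INHERITED_PARAMS = ("target", "task", "classes", "classes_to_index")


class SketchDataset:
    """Toy stand-in for a dataset class with a `train_source` argument.

    `train_source` may be:
      * None            -> this is the training set; parameters are used as given;
      * a SketchDataset -> parameters are copied from that training set;
      * a str           -> parameters are read from a saved model file
                           (plain JSON here, standing in for a real checkpoint).
    """

    def __init__(self, train_source: Optional[Union[str, "SketchDataset"]] = None, **params: Any):
        if train_source is None:
            inherited: dict = {}  # training set: nothing to inherit
        elif isinstance(train_source, SketchDataset):
            inherited = {k: train_source.params.get(k) for k in INHERITED_PARAMS}
        elif isinstance(train_source, str):
            with open(train_source, encoding="utf-8") as fh:
                saved = json.load(fh)
            inherited = {k: saved.get(k) for k in INHERITED_PARAMS}
        else:
            raise TypeError("train_source must be None, a dataset, or a path to a saved model")
        # Assumption: inherited values override anything passed explicitly.
        self.params = {**params, **inherited}
        self.train_source = train_source


if __name__ == "__main__":
    train = SketchDataset(target="binary", task="classif", classes=["neg", "pos"],
                          classes_to_index={"neg": 0, "pos": 1}, subset=["a", "b"])
    valid = SketchDataset(train_source=train, subset=["c"])

    # Save the training info, then build a dataset for new data from that file.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
        json.dump({k: train.params[k] for k in INHERITED_PARAMS}, fh)
        model_path = fh.name
    new_data = SketchDataset(train_source=model_path)
    os.remove(model_path)

    print(valid.params["subset"], new_data.params["classes_to_index"])
```

Routing all three cases through one constructor argument lets validation, test and new-data datasets share the training set's target and class mapping without the caller repeating them.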