Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joss paper #17

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Joss paper #17

wants to merge 18 commits into from

Conversation

davidegraff
Copy link
Collaborator

Description

Provide a brief description of the PR's purpose here.

Todos

Notable points that this PR has either accomplished or will accomplish.

  • TODO 1

Questions

  • Question1

Status

  • Ready to go

@codecov
Copy link

codecov bot commented Nov 17, 2021

Codecov Report

Merging #17 (0fecbbe) into main (d182a9d) will not change coverage.
The diff coverage is n/a.

paper.md Outdated
```python
>>> metadata = ps.build_metadata("vina", dict(exhaustivness=32))
```
We may also utilize other docking engines in the AutoDock Vina family by specifying the `software` for Vina-type metadata. Here, we use the accelerated optimization routine of QVina for faster docking. Note that we also support `software` values of `"smina"` and `"psovina"` in addition to `"vina"` and `"qvina"`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add links/references to the different software packages listed here?
cc openjournals/joss-reviews#3950

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikemhenry references have been added!


In contrast, the multi-node setup exhibits less ideal scaling \autoref{fig:dist} with a measured speedup approximately 55% that of perfect scaling. We attribute this scaling behavior to hardware-dependent network communication overhead. Distributing a `sleep(5)` function allocated 6 CPU cores per task (to mimic a fairly quick docking simulation) in parallel over differing hardware setups
led to an approximate 2.5% increase in wall-time relative to the single-node setup each time the number of nodes in the setup was doubled while keeping the total number of CPU cores the same. Such a trend is consistent with network communication being detrimental to scaling behavior. This test also communicated the absolute minimum amount of data over the network, as there were no function arguments or return values. When communicating `CalculationData` objects (approximately 600 bytes in serialized form) over the network, as in `pyscreener`, the drop increased to 6% for each doubling of the total number of nodes. Minimizing the total size of `CalculationData` objects was therefore an explicit implementation goal. Future development will seek to further reduce network communication overhead costs to bring `pyscreener` scaling closer to ideal scaling.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite an interesting observation, could you also add what is the configuration of the cluster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.rc.fas.harvard.edu/about/cluster-architecture/ contains the cluster architecture information. Because I'm not completely familiar with the question, what would be the necessary details to include here?


![Wall-time of the computational docking of all 1,615 FDA-approved drugs against 5WIU using QVina2 over six CPU cores for a single-node setup with the specified number of CPU cores. (Left) calculated speedup. (Right) wall time in minutes. Bars reflect mean $\pm$ standard deviation over three runs.\label{fig:local}](figures/timing-local.png)

To handle task distribution, `pyscreener` relies on the `ray` library [@moritz_ray_2018] for distributed computation. For multithreaded docking software, `pyscreener` allows a user to specify how many CPU cores to run each individual docking simulation over, running as many docking simulations in parallel as possible for a given number of total CPU cores in the `ray` cluster. To examine the scaling behavior of `pyscreener`, we docked all 1,615 FDA-approved drugs into the active site of the D4 dopamine receptor (PDB ID 5WIU [@wang_d4_2017]) with QVina2 running over 6 CPU cores. We tested both single node hardware setups, scaling the total number of CPU cores on one machine, and multi-node setups, scaling the total number of machines. In the single-node case, `pyscreener` exhibited essentially perfect scaling \autoref{fig:local} as we scaled the size of the `ray` cluster from 6 to 48 CPU cores running QVina over 6 CPU cores.
Copy link
Contributor

@mikemhenry mikemhenry Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that I'm doing something wrong here (maybe using the wrong flags or backend?) But on a cluster when I run pyscreener --config integration-tests/configs/test_vina.ini -nc 32 it takes 0h 36m 5.93s but when I run pyscreener --config integration-tests/configs/test_vina.ini -nc 64 it takes 1h 12m 9.79s. This is with the vina backend and my hunch is that I'm using the flags incorrectly and instead of parallelizing jobs across the CPUs, I'm trying to give vina N CPUs for each job. I've attached the full output below:

https://gist.github.com/mikemhenry/e0075ee01466145c116f2b8ebeda35b7

cc openjournals/joss-reviews#3950

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the timing look like for vina with 64 vs 32 CPUs on your computer? I'm unfamiliar with any firm vina scaling benchmarks, but in my experience I get severely diminishing returns beyond 8 CPUs. If you get no benefit doubling the CPUs per job from 32 to 64, then all that would serve to do is limit the number of parallel jobs you're able to run (2 jobs simultaneously in the 32 CPUs/job case vs only 1 in the 64 CPUs/job.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! I was able to verify the performance claim with a small test:

Total time to dock 200 ligands: 0h 24m 17.38s (7.29s/ligand) # 32 CPU
Total time to dock 200 ligands: 0h 13m 5.39s (3.93s/ligand) # 64 CPU

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants