
One question about DDP #3

Open · cswwp opened this issue Nov 21, 2020 · 2 comments
cswwp commented Nov 21, 2020

@richardkxu Nice repo. One question: is there a difference between single node, multiple GPUs with torch.distributed.launch (①) and single node, multiple GPUs with multi-processes (②)? Or are they equivalent, just two different ways to launch?


[screenshot ①: "Single node, multiple GPUs with torch.distributed.launch"]

[screenshot ②: "Single node, multiple GPUs with multi-processes"]

richardkxu (Owner) commented:
The main difference is which distributed training library you use. The first one uses the NVIDIA Apex library; the second uses torch.nn.parallel.DistributedDataParallel. The first gives better performance and works better with NVIDIA GPUs, and it has also become the default approach in newer versions of PyTorch (> 1.6.0). Hope this is helpful!
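
For reference, a minimal sketch of the two launch styles being compared. The model, port, and helper names here are placeholders, not the repo's actual code; the Apex pairing for style ① follows the answer above.

```python
# Minimal sketch: two ways to get one process per GPU on a single node.
import argparse
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    """Shared per-process body: one process drives one GPU."""
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(10, 10).cuda(rank)  # placeholder model
    # Style ① in the repo wraps with Apex instead:
    #   from apex.parallel import DistributedDataParallel as ApexDDP
    #   model = ApexDDP(model)
    model = DDP(model, device_ids=[rank])
    # ... training loop elided ...
    dist.destroy_process_group()


def main_launch():
    # ① started as: python -m torch.distributed.launch --nproc_per_node=NGPU script.py
    # The launcher forks the processes, sets RANK/WORLD_SIZE/MASTER_* env vars,
    # and passes --local_rank to each copy of the script.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
    args = parser.parse_args()
    world_size = int(os.environ["WORLD_SIZE"])
    worker(args.local_rank, world_size)  # local rank == global rank on a single node


def main_spawn():
    # ② started as: python script.py
    # The script forks its own workers with torch.multiprocessing.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # placeholder port
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)


if __name__ == "__main__":
    main_spawn()  # or main_launch() when started through the launcher
```

Either way you end up with one process per GPU all-reducing gradients; the visible difference is who forks the workers (the torch.distributed.launch helper vs. mp.spawn inside the script) and, in this repo, which DDP wrapper is used.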

cswwp (Author) commented Nov 22, 2020

Thank you, very helpful
