
One question about DDP #3

Open · cswwp opened this issue Nov 21, 2020 · 2 comments
cswwp commented Nov 21, 2020

@richardkxu Nice repo. One question: is there a difference between single node, multiple GPUs with torch.distributed.launch (①) and single node, multiple GPUs with multi-processes (②)? Or are they equivalent, just two different ways to launch?


[screenshot ①: "Single node, multiple GPUs with torch.distributed.launch"]

[screenshot ②: "Single node, multiple GPUs with multi-processes"]

richardkxu (Owner) commented:
The main difference is which distributed training library you use. The first one uses the NVIDIA Apex library; the second uses torch.nn.parallel.DistributedDataParallel. The first gives better performance and works better with NVIDIA GPUs, and it has also become the default approach in newer versions of PyTorch (> 1.6.0). Hope this is helpful!
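
For reference, a minimal sketch of the two launch styles being compared. The model, port, and helper names here are placeholders, not the repo's actual code; the Apex pairing for style ① follows the answer above.

```python
# Minimal sketch: two ways to get one process per GPU on a single node.
import argparse
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    """Shared per-process body: one process drives one GPU."""
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(10, 10).cuda(rank)  # placeholder model
    # Style ① in the repo wraps with Apex instead:
    #   from apex.parallel import DistributedDataParallel as ApexDDP
    #   model = ApexDDP(model)
    model = DDP(model, device_ids=[rank])
    # ... training loop elided ...
    dist.destroy_process_group()


def main_launch():
    # ① started as: python -m torch.distributed.launch --nproc_per_node=NGPU script.py
    # The launcher forks the processes, sets RANK/WORLD_SIZE/MASTER_* env vars,
    # and passes --local_rank to each copy of the script.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
    args = parser.parse_args()
    world_size = int(os.environ["WORLD_SIZE"])
    worker(args.local_rank, world_size)  # local rank == global rank on a single node


def main_spawn():
    # ② started as: python script.py
    # The script forks its own workers with torch.multiprocessing.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # placeholder port
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)


if __name__ == "__main__":
    main_spawn()  # or main_launch() when started through the launcher
```

Either way you end up with one process per GPU all-reducing gradients; the visible difference is who forks the workers (the torch.distributed.launch helper vs. mp.spawn inside the script) and, in this repo, which DDP wrapper is used.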

cswwp (Author) commented Nov 22, 2020

Thank you, very helpful
