Add an example of training a tabular model on multiple GPUs #474

akihironitta · 2024-12-27T01:10:02Z

Adds an example for training a model on multiple GPUs.

Usage

$ python examples/trompt_multi_gpu.py

Changes

To enable this, the PR removes Trompt.forward_stacked; calling the model directly now returns the stacked per-layer output:

- out = Trompt(...).forward_stacked(tf)  # forward_stacked is removed
+ out = Trompt(...)(tf)
  assert out.size() == (batch_size, num_layers, num_channels)

Some highlights in the script

Unlike examples/trompt.py, it computes training accuracy batch-wise, as training proceeds, instead of at the end of each epoch. This saves time, although the reported accuracy mixes predictions from different parameter states because the model changes between steps within an epoch.
It reduces losses and metrics across all ranks at the end of each epoch, using all_reduce and the torchmetrics API.
It avoids unnecessary device synchronisations within each epoch, e.g., by not calling float(cuda_tensor) and by passing output_size to repeat_interleave(..., output_size=...).
... (I'm happy to elaborate if anything in the script is unclear.)
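
The cross-rank reduction described above can be illustrated without any GPUs. The following is a minimal plain-Python sketch, not the code in examples/trompt_multi_gpu.py: `all_reduce_sum` is a hypothetical stand-in for a SUM all-reduce, and the per-rank counters are made-up numbers. It shows why counts accumulated batch-wise are summed across ranks before dividing, rather than averaging each rank's accuracy.

```python
# Plain-Python stand-in for a SUM all-reduce across ranks.
# Hypothetical helper for illustration; it does not use torch.distributed.
def all_reduce_sum(per_rank_values):
    """Return the element-wise sum that every rank would hold afterwards."""
    return [sum(column) for column in zip(*per_rank_values)]


# Made-up counters, accumulated batch-wise by each rank during one epoch:
# [number of correct predictions, number of samples seen].
per_rank_counts = [
    [90, 100],  # rank 0
    [45, 50],   # rank 1
    [80, 100],  # rank 2
    [30, 50],   # rank 3
]

# Reduce the raw counts once at the end of the epoch, then divide.
correct, total = all_reduce_sum(per_rank_counts)
global_accuracy = correct / total

# Averaging per-rank accuracies instead weights small shards too heavily.
naive_accuracy = sum(c / t for c, t in per_rank_counts) / len(per_rank_counts)

print(f"global accuracy from summed counts: {global_accuracy:.4f}")
print(f"mean of per-rank accuracies:        {naive_accuracy:.4f}")
```

Keeping the counters as running sums and reducing them once per epoch also fits the point about synchronisation: nothing needs to be converted to a Python number inside the training loop.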

Benchmark results from `--dataset jannis` on `g6.12xlarge` with four L4 GPUs

| | four GPUs | single GPU |
|---|---|---|
| time per training step | 0.725 seconds | 0.741 seconds |
| time per training epoch | 29 seconds | 117 seconds |
| test accuracy | 80.23 % | 80.29 % |
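
As a rough reading of these numbers, the short calculation below derives the per-epoch speedup and the implied number of steps per epoch. It uses only the figures from the table; the steps-per-epoch values are estimates that assume epoch time is dominated by training steps, and the reported times are rounded.

```python
# Figures copied from the benchmark table (jannis, g6.12xlarge, four L4 GPUs).
step_seconds = {"four GPUs": 0.725, "single GPU": 0.741}
epoch_seconds = {"four GPUs": 29.0, "single GPU": 117.0}
num_gpus = 4

# Speedup of one training epoch, and how close it is to ideal linear scaling.
epoch_speedup = epoch_seconds["single GPU"] / epoch_seconds["four GPUs"]
scaling_efficiency = epoch_speedup / num_gpus

# Estimated steps per epoch, assuming epoch time is mostly training steps.
steps_per_epoch = {
    name: epoch_seconds[name] / step_seconds[name] for name in step_seconds
}

print(f"epoch speedup: {epoch_speedup:.2f}x")
print(f"scaling efficiency: {scaling_efficiency:.0%}")
for name, steps in steps_per_epoch.items():
    print(f"{name}: ~{steps:.0f} steps per epoch")
```

The step time is nearly the same in both setups, so the roughly fourfold shorter epoch comes from each rank running about a quarter of the steps.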

weihua916 · 2024-12-27T05:06:49Z

torch_frame/nn/models/trompt.py

@@ -122,7 +122,7 @@ def reset_parameters(self) -> None:
            trompt_conv.reset_parameters()
        self.trompt_decoder.reset_parameters()

-    def forward_stacked(self, tf: TensorFrame) -> Tensor:


Let's make sure this change does not break the example code.

akihironitta · 2024-12-28

Confirmed that the change doesn't break the following scripts across all supported task types:

examples/trompt.py
benchmark/data_frame_benchmark.py
benchmark/data_frame_text_benchmark.py

akihironitta added 2 commits December 27, 2024 00:57

add trompt ddp

1e3868d

update

e3431d2

akihironitta self-assigned this Dec 27, 2024

github-actions bot added example nn labels Dec 27, 2024

akihironitta removed the nn label Dec 27, 2024

update

f566dfd

akihironitta added the nn label Dec 27, 2024

akihironitta added 11 commits December 27, 2024 01:26

update test

c411f93

update

024e8a5

update

30ac943

no stream sync

4d07d34

update changelog

a494e01

update

1941549

update

35d2fe5

add --compile

b312385

update

aabe3b0

update

cd1ddc0

update

7d090e7

weihua916 reviewed Dec 27, 2024

Merge branch 'master' into aki/ddp

24ad096

akihironitta requested a review from weihua916 December 28, 2024 15:52

akihironitta merged commit 655730c into master Dec 30, 2024
14 checks passed

akihironitta deleted the aki/ddp branch December 30, 2024 11:41
