
[WIP] Allow benchmark between multiple configs #703

Open
wants to merge 1 commit into base: gh/H-Huang/17/base

Conversation

@H-Huang (Member) commented Nov 26, 2024

Stack from ghstack (oldest at bottom):

This is WIP and needs to be cleaned up, but I'm creating the PR to solicit feedback.

There isn't an easy way to run multiple configurations together and view their metrics in a single place. We already save the metrics to TensorBoard, so this PR adds a data structure to retrieve those metrics and display them in a table. It is an enhancement to test_runner.py.

Example:

# In test_runner.py there are 3 different configurations whose metrics we want to compare
OverrideDefinitions(
    [
        [
            "--experimental.pipeline_parallel_degree 2",
            "--training.data_parallel_shard_degree 2",
            "--metrics.enable_tensorboard",
        ],
        [
            "--training.data_parallel_shard_degree 4",
            "--metrics.enable_tensorboard",
        ],
        [
            "--training.tensor_parallel_degree 4",
            "--metrics.enable_tensorboard",
        ],
    ],
    "example",
    "my_example",
    ngpu=4,
),

This produces the following output:

Run ID: 0, args: ['--experimental.pipeline_parallel_degree 2', '--training.data_parallel_shard_degree 2', '--metrics.enable_tensorboard']
Run ID: 1, args: ['--training.data_parallel_shard_degree 4', '--metrics.enable_tensorboard']
Run ID: 2, args: ['--training.tensor_parallel_degree 4', '--metrics.enable_tensorboard']

           | wps                          | mfu(%)                       | memory/max_active(GiB)       | memory/max_active(%)         | memory/max_reserved(%)       | loss_metrics/global_avg_loss | loss_metrics/global_max_loss
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0          | 55928.27734375               | 1.6978793144226074           | 0.7439417839050293           | 0.940051257610321            | 1.0439579486846924           | 7.137195110321045            | 7.143610954284668           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1          | 97578.0234375                | 2.9622886180877686           | 1.9954571723937988           | 2.521476984024048            | 2.6999762058258057           | 7.031539440155029            | 7.066774845123291           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2          | 10559.697265625              | 0.3205729126930237           | 1.0774049758911133           | 1.3614182472229004           | 1.4314316511154175           | 7.13456916809082             | 7.13456916809082            
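
The retrieval code itself isn't shown in this description, but a minimal sketch of the idea, assuming a per-run TensorBoard log directory and using TensorBoard's EventAccumulator, could look like the following. The class and function names here are placeholders, not necessarily what the PR implements.

# Hypothetical sketch -- class/function names and log-dir layout are assumptions,
# not necessarily what this PR implements.
from typing import Dict, Sequence

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


class MetricRetriever:
    """Reads the final value of every scalar tag from a TensorBoard log directory."""

    def __init__(self, log_dir: str):
        self.log_dir = log_dir

    def get_metrics(self) -> Dict[str, float]:
        acc = EventAccumulator(self.log_dir)
        acc.Reload()  # parse the event files on disk
        metrics: Dict[str, float] = {}
        for tag in acc.Tags()["scalars"]:
            events = acc.Scalars(tag)
            if events:
                metrics[tag] = events[-1].value  # value at the last recorded step
        return metrics


def print_metrics(runs: Sequence[Dict[str, float]]) -> None:
    """Prints one row per run, one column per metric tag."""
    if not runs:
        return
    tags = list(runs[0].keys())
    header = f"{'':<10} | " + " | ".join(f"{t:<28}" for t in tags)
    print(header)
    for run_id, metrics in enumerate(runs):
        print("-" * len(header))
        row = f"{run_id:<10} | " + " | ".join(
            f"{metrics.get(t, float('nan')):<28}" for t in tags
        )
        print(row)

Taking only the last recorded step keeps the table to one number per metric per run; averaging over the last few steps would be a reasonable alternative for noisy metrics like wps.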

H-Huang added a commit that referenced this pull request Nov 26, 2024
ghstack-source-id: 613f40b90f936d11f3ffe5f523d878ba7150f18f
Pull Request resolved: #703
@H-Huang H-Huang marked this pull request as ready for review November 26, 2024 22:09
@H-Huang H-Huang changed the title Allow benchmark between multiple configs [WIP] Allow benchmark between multiple configs Nov 26, 2024
@wconstab (Contributor) left a comment


I think the MetricRetriever and print_metrics are nice. But I'm kinda curious why you felt they were needed. You can also just fire up tensorboard and compare all the metrics that way, right? (and then you get graphs of loss too).

Actually iirc I once tried to compare titan runs in tensorboard and I found the way the logdir was set up made it a bit annoying. It might be worth addressing that pain point if that is a pain point. (I might have just been running things in a dumb way too).
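
For reference, the side-by-side comparison can also be done in the TensorBoard UI by pointing it at a parent directory (each immediate subdirectory shows up as a separate run) or, if the installed TensorBoard supports it, by naming runs explicitly with --logdir_spec. The paths below are hypothetical:

tensorboard --logdir ./outputs/tb
tensorboard --logdir_spec=pp2_dp2:./outputs/tb/0,dp4:./outputs/tb/1,tp4:./outputs/tb/2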

@@ -10,7 +10,9 @@
 import subprocess
 from collections import defaultdict
 from dataclasses import dataclass
-from typing import Sequence
+from typing import Any, Dict, Sequence
A contributor commented on this hunk:
RE the binary files above, not sure if you meant to include those in the PR
