Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit DAG benchmarks #327

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Revisit DAG benchmarks #327

wants to merge 3 commits into from

Conversation

psalz
Copy link
Member

@psalz psalz commented Jan 14, 2025

This overhauls the DAG benchmarks to address some long standing issues, as well as to future-proof them for loop templates:

  • Instead of measuring the combined time taken to generate a graph and all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only measure the generation of the graph being benchmarked.
  • Don't measure benchmark context creation / destruction.
  • Fix IDAG benchmarks creating host tasks instead of device tasks (i.e., number of devices and p2p copy support had no effect until now).
  • Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG generation, instead of all three graphs.
  • Remove outdated "throttled submission" scheduler benchmarks (these were designed to measure the impact of the scheduler's lock-step operation with the main thread, which has long been removed).

Due to these changes, we should expect CDAG benchmark times to go down, because they no longer include TDAG generation. IDAG benchmarks no longer include TDAG / CDAG, but due to the switch from host_task to parallel_for, this is a tossup (benchmarks with lots of communication, such as the (strangely named) chain topology now take a lot longer).

For the scheduler benchmarks I've decided to include the TDAG generation time in the main thread, which is overlapped with CDAG / IDAG generation in the scheduler thread, as this best reflects reality.

I've also made some improvements to our CI perf impact plotting script:

  • Group benchmarks by category
  • Left-align legend, one entry per line
  • Don't create images for categories without changes ("No Data")
  • Support newly added and removed benchmarks

@psalz psalz force-pushed the revisit-dag-benchmarks branch from 6cd07f3 to dbf0901 Compare January 14, 2025 12:25
psalz added 3 commits January 14, 2025 13:30
- Group benchmarks by category
- Left-align legend, one entry per line
- Don't create images for categories without changes ("No Data")
- Support newly added and removed benchmarks
This overhauls the DAG benchmarks to address some long standing issues:

- Instead of measuring the combined time taken to generate a graph and
  all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only
  measure the generation of the graph being benchmarked.
- Don't measure benchmark context creation / destruction.
- Fix IDAG benchmarks creating host tasks instead of device tasks (i.e.,
  number of devices and p2p copy support had no effect).
- Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG
  generation, instead of all three graphs.
- Remove outdated "throttled submission" scheduler benchmarks.
@psalz psalz force-pushed the revisit-dag-benchmarks branch from dbf0901 to 7b13888 Compare January 14, 2025 12:31
Copy link

Check-perf-impact results: (e103131cc2f75d73207c09cd7ef74e9c)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: 8 individual benchmarks affected
🚀 Significant speedup (<0.80x) in some microbenchmark results: 39 individual benchmarks affected
Added microbenchmark(s): 24 individual benchmarks affected
Removed microbenchmark(s): 48 individual benchmarks affected

Relative execution time per category: (mean of relative medians)

  • command-graph : 0.72x 🚀
  • graph-nodes : 0.99x
  • grid : 1.02x
  • instruction-graph : 3.40x ⚠️
  • scheduler : new 🌟
  • system : 1.05x
  • task-graph : 0.87x 🚀

@coveralls
Copy link

Pull Request Test Coverage Report for Build 12767682328

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 95.05%

Totals Coverage Status
Change from base Build 12390656669: 0.0%
Covered Lines: 7088
Relevant Lines: 7193

💛 - Coveralls

Copy link
Contributor

@PeterTh PeterTh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -19,6 +19,14 @@
#define CELERITY_DETAIL_UTILS_CAT_2(a, b) a##b
#define CELERITY_DETAIL_UTILS_CAT(a, b) CELERITY_DETAIL_UTILS_CAT_2(a, b)

#define CELERITY_DETAIL_UTILS_NON_COPYABLE(classname) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan.

Copy link
Contributor

@fknorr fknorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the new visualization!


void initialize() {
tdag_benchmark_context::initialize();
create_all_commands();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do anything since m_tasks is empty, right?

Also, what is it supposed to do, pre-allocate m_command_batches? Maybe also add a comment.

Comment on lines +134 to +140
void initialize() { tm.generate_epoch_task(celerity::detail::epoch_action::init); }

void prepare() { m_tasks.reserve(m_command_groups.size()); }

void execute() { create_all_tasks(); }

void finalize() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have a comment about what the purpose of these four functions is (I would naively expect all setup to happen in the ctor, and the benchmarked code to be in a single member function).

void finalize() {
m_tasks.clear();
tdag_benchmark_context::finalize();
create_all_commands();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do commands need to be created in what appears to be a teardown function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants