Revisit DAG benchmarks #327

psalz · 2025-01-14T12:20:54Z

This overhauls the DAG benchmarks to address some long standing issues, as well as to future-proof them for loop templates:

Instead of measuring the combined time taken to generate a graph and all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only measure the generation of the graph being benchmarked.
Don't measure benchmark context creation / destruction.
Fix IDAG benchmarks creating host tasks instead of device tasks (i.e., number of devices and p2p copy support had no effect until now).
Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG generation, instead of all three graphs.
Remove outdated "throttled submission" scheduler benchmarks (these were designed to measure the impact of the scheduler's lock-step operation with the main thread, which has long been removed).

Due to these changes, we should expect CDAG benchmark times to go down, because they no longer include TDAG generation. IDAG benchmarks no longer include TDAG / CDAG, but due to the switch from host_task to parallel_for, this is a tossup (benchmarks with lots of communication, such as the (strangely named) chain topology now take a lot longer).

For the scheduler benchmarks I've decided to include the TDAG generation time in the main thread, which is overlapped with CDAG / IDAG generation in the scheduler thread, as this best reflects reality.

I've also made some improvements to our CI perf impact plotting script:

Group benchmarks by category
Left-align legend, one entry per line
Don't create images for categories without changes ("No Data")
Support newly added and removed benchmarks

- Group benchmarks by category - Left-align legend, one entry per line - Don't create images for categories without changes ("No Data") - Support newly added and removed benchmarks

This overhauls the DAG benchmarks to address some long standing issues: - Instead of measuring the combined time taken to generate a graph and all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only measure the generation of the graph being benchmarked. - Don't measure benchmark context creation / destruction. - Fix IDAG benchmarks creating host tasks instead of device tasks (i.e., number of devices and p2p copy support had no effect). - Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG generation, instead of all three graphs. - Remove outdated "throttled submission" scheduler benchmarks.

github-actions · 2025-01-14T12:31:58Z

Check-perf-impact results: (e103131cc2f75d73207c09cd7ef74e9c)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: 8 individual benchmarks affected
🚀 Significant speedup (<0.80x) in some microbenchmark results: 39 individual benchmarks affected
➕ Added microbenchmark(s): 24 individual benchmarks affected
➖ Removed microbenchmark(s): 48 individual benchmarks affected

Relative execution time per category: (mean of relative medians)

command-graph : 0.72x 🚀
graph-nodes : 0.99x
grid : 1.02x
instruction-graph : 3.40x ⚠️
scheduler : new 🌟
system : 1.05x
task-graph : 0.87x 🚀

coveralls · 2025-01-14T12:39:53Z

Pull Request Test Coverage Report for Build 12767682328

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 95.05%

Totals
Change from base Build 12390656669:	0.0%
Covered Lines:	7088
Relevant Lines:	7193

💛 - Coveralls

PeterTh

LGTM

PeterTh · 2025-01-14T13:39:20Z

include/utils.h

@@ -19,6 +19,14 @@
 #define CELERITY_DETAIL_UTILS_CAT_2(a, b) a##b
 #define CELERITY_DETAIL_UTILS_CAT(a, b) CELERITY_DETAIL_UTILS_CAT_2(a, b)

+#define CELERITY_DETAIL_UTILS_NON_COPYABLE(classname)                                                                                                          \


fknorr

Like the new visualization!

fknorr · 2025-01-15T13:30:51Z

test/dag_benchmarks.cc

+
+	void initialize() {
+		tdag_benchmark_context::initialize();
+		create_all_commands();


This doesn't do anything since m_tasks is empty, right?

Also, what is it supposed to do, pre-allocate m_command_batches? Maybe also add a comment.

fknorr · 2025-01-15T13:33:30Z

test/dag_benchmarks.cc

+	void initialize() { tm.generate_epoch_task(celerity::detail::epoch_action::init); }
+
+	void prepare() { m_tasks.reserve(m_command_groups.size()); }
+
+	void execute() { create_all_tasks(); }
+
+	void finalize() {


Maybe have a comment about what the purpose of these four functions is (I would naively expect all setup to happen in the ctor, and the benchmarked code to be in a single member function).

fknorr · 2025-01-15T13:34:29Z

test/dag_benchmarks.cc

+	void finalize() {
+		m_tasks.clear();
+		tdag_benchmark_context::finalize();
+		create_all_commands();


Why do commands need to be created in what appears to be a teardown function?

psalz force-pushed the revisit-dag-benchmarks branch from 6cd07f3 to dbf0901 Compare January 14, 2025 12:25

psalz added 3 commits January 14, 2025 13:30

CI: Improve plots generated by performance check script

1898534

- Group benchmarks by category - Left-align legend, one entry per line - Don't create images for categories without changes ("No Data") - Support newly added and removed benchmarks

Update benchmark results for DAG benchmark overhaul

7b13888

psalz force-pushed the revisit-dag-benchmarks branch from dbf0901 to 7b13888 Compare January 14, 2025 12:31

PeterTh approved these changes Jan 14, 2025

View reviewed changes

fknorr reviewed Jan 15, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisit DAG benchmarks #327

Revisit DAG benchmarks #327

psalz commented Jan 14, 2025

github-actions bot commented Jan 14, 2025

coveralls commented Jan 14, 2025

PeterTh left a comment

PeterTh Jan 14, 2025

fknorr left a comment

fknorr Jan 15, 2025

fknorr Jan 15, 2025

fknorr Jan 15, 2025

Revisit DAG benchmarks #327

Are you sure you want to change the base?

Revisit DAG benchmarks #327

Conversation

psalz commented Jan 14, 2025

github-actions bot commented Jan 14, 2025

coveralls commented Jan 14, 2025

Pull Request Test Coverage Report for Build 12767682328

Details

💛 - Coveralls

PeterTh left a comment

Choose a reason for hiding this comment

PeterTh Jan 14, 2025

Choose a reason for hiding this comment

fknorr left a comment

Choose a reason for hiding this comment

fknorr Jan 15, 2025

Choose a reason for hiding this comment

fknorr Jan 15, 2025

Choose a reason for hiding this comment

fknorr Jan 15, 2025

Choose a reason for hiding this comment