-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit DAG benchmarks #327
base: master
Are you sure you want to change the base?
Conversation
6cd07f3
to
dbf0901
Compare
- Group benchmarks by category - Left-align legend, one entry per line - Don't create images for categories without changes ("No Data") - Support newly added and removed benchmarks
This overhauls the DAG benchmarks to address some long standing issues: - Instead of measuring the combined time taken to generate a graph and all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only measure the generation of the graph being benchmarked. - Don't measure benchmark context creation / destruction. - Fix IDAG benchmarks creating host tasks instead of device tasks (i.e., number of devices and p2p copy support had no effect). - Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG generation, instead of all three graphs. - Remove outdated "throttled submission" scheduler benchmarks.
dbf0901
to
7b13888
Compare
Check-perf-impact results: (e103131cc2f75d73207c09cd7ef74e9c)
Relative execution time per category: (mean of relative medians)
|
Pull Request Test Coverage Report for Build 12767682328Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -19,6 +19,14 @@ | |||
#define CELERITY_DETAIL_UTILS_CAT_2(a, b) a##b | |||
#define CELERITY_DETAIL_UTILS_CAT(a, b) CELERITY_DETAIL_UTILS_CAT_2(a, b) | |||
|
|||
#define CELERITY_DETAIL_UTILS_NON_COPYABLE(classname) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a fan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the new visualization!
|
||
void initialize() { | ||
tdag_benchmark_context::initialize(); | ||
create_all_commands(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't do anything since m_tasks
is empty, right?
Also, what is it supposed to do, pre-allocate m_command_batches
? Maybe also add a comment.
void initialize() { tm.generate_epoch_task(celerity::detail::epoch_action::init); } | ||
|
||
void prepare() { m_tasks.reserve(m_command_groups.size()); } | ||
|
||
void execute() { create_all_tasks(); } | ||
|
||
void finalize() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have a comment about what the purpose of these four functions is (I would naively expect all setup to happen in the ctor, and the benchmarked code to be in a single member function).
void finalize() { | ||
m_tasks.clear(); | ||
tdag_benchmark_context::finalize(); | ||
create_all_commands(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do commands need to be created in what appears to be a teardown function?
This overhauls the DAG benchmarks to address some long standing issues, as well as to future-proof them for loop templates:
Due to these changes, we should expect CDAG benchmark times to go down, because they no longer include TDAG generation. IDAG benchmarks no longer include TDAG / CDAG, but due to the switch from
host_task
toparallel_for
, this is a tossup (benchmarks with lots of communication, such as the (strangely named)chain topology
now take a lot longer).For the scheduler benchmarks I've decided to include the TDAG generation time in the main thread, which is overlapped with CDAG / IDAG generation in the scheduler thread, as this best reflects reality.
I've also made some improvements to our CI perf impact plotting script: