Unless I missed something, divan currently only supports running over a set of configurations that is known at compile time. While this results in a very nice API for simple use cases, having only this option can be a problem in more complex benchmarks for several reasons:
1. It will lead to severe code bloat when benchmarking over a wide range of problem sizes (problem sizes can currently only be specified via `#[bench(consts = ...)]`, which generates a separate copy of the function for each problem size), unless one remembers to dispatch to a utility function that takes a run-time problem size (see the sketch right after this list).
2. It can lead to undesirable bias when one wants to benchmark how the code behaves in the face of a problem size that is only known at run time (which is the common case), unless one remembers to use the above dynamic dispatch trick or another optimization barrier to hide the problem size from the compiler; and in that case, suffering the aforementioned code bloat is pointless.
3. It precludes running over a set of benchmark configurations that cannot be known until runtime.
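For concreteness, here is a minimal sketch of the dynamic-dispatch workaround from the first two points, written as a standalone divan bench file. It assumes divan's `consts = ...` option accepts an array literal as in its documentation; `sum_up_to` is just a placeholder workload:

```rust
use std::hint::black_box;

// Shared, non-generic implementation: only one copy gets compiled, and it
// only ever sees a run-time problem size.
fn sum_up_to(len: usize) -> u64 {
    (0..len as u64).sum()
}

#[divan::bench(consts = [1_000, 10_000, 100_000])]
fn sum<const LEN: usize>() -> u64 {
    // Hide the compile-time constant behind an optimization barrier before
    // forwarding it, so the shared helper cannot be specialized on it.
    sum_up_to(black_box(LEN))
}

fn main() {
    divan::main();
}
```

This keeps the code bloat down to a trivial forwarding shim per problem size, but only if one remembers to write the benchmark this way.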
To give you an example of the third point, I have a set of criterion benchmarks that exercise parallel code over a range of thread pinning configurations. Given N, the host's CPU thread count, I want to test with N threads, N/2 threads, N/4 threads..., all the way down to 1 thread; and for each of these thread counts, I want to test two thread pinning configurations: a "dense" one where threads are packed into as few NUMA/NUCA domains as possible (which minimizes synchronization costs), and a "sparse" one where threads are spread over as many NUMA/NUCA domains as possible (which maximizes shared resource usage).
This sort of benchmarking cannot be done unless, at some point during benchmark initialization, I get a chance to probe the host using something like hwloc, generate the benchmark configurations, and register them.
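As a rough illustration, the configuration set described above could be generated at startup along these lines; `available_parallelism()` stands in for a real hwloc topology probe, and the `Pinning` enum and naming scheme are made up for the sketch:

```rust
use std::thread::available_parallelism;

/// Thread pinning policy (illustrative only; real pinning needs hwloc).
#[derive(Clone, Copy, Debug)]
enum Pinning {
    /// Pack threads into as few NUMA/NUCA domains as possible.
    Dense,
    /// Spread threads over as many NUMA/NUCA domains as possible.
    Sparse,
}

/// Build the (name, thread count, pinning) configurations at run time.
fn benchmark_configs() -> Vec<(String, usize, Pinning)> {
    // Stand-in for a real hwloc topology probe.
    let max_threads = available_parallelism().map(|n| n.get()).unwrap_or(1);

    let mut configs = Vec::new();
    let mut threads = max_threads;
    loop {
        for pinning in [Pinning::Dense, Pinning::Sparse] {
            configs.push((format!("{threads} threads, {pinning:?}"), threads, pinning));
        }
        if threads == 1 {
            break;
        }
        threads /= 2;
    }
    configs
}

fn main() {
    for (name, ..) in benchmark_configs() {
        println!("{name}");
    }
}
```

The point is that this list only exists at run time, so there is currently no way to turn each entry into a separately reported divan benchmark.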
Given the significant heterogeneity of GPU hardware, I suspect that people who try to use divan for benchmarking GPU code will face similar issues in much less exotic use cases.
While a fully general run-time benchmark registration mechanism would require some kind of `Divan::add_benchmark()` API, I suspect the most common use cases could be covered by just having some sort of `Bencher::with_input_configurations(configs: impl IntoIterator<Item = (String, InputGenerator)>)` API that lets you run a single benchmark function over N different input configurations and report each configuration separately, under a different name, in the final divan output.
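To make the intended semantics concrete, here is a purely hypothetical, standalone mock (not divan code, and not a proposed implementation) of what such an API could do: run the same benchmark body once per named input configuration and report each run under its configuration's name:

```rust
use std::time::Instant;

struct MockBencher;

impl MockBencher {
    /// Run `body` once per (name, input generator) pair and report each
    /// configuration separately. Divan would instead take many timed samples
    /// per configuration; this mock only illustrates the shape of the API.
    fn with_input_configurations<I, G, T>(&self, configs: I, mut body: impl FnMut(&T))
    where
        I: IntoIterator<Item = (String, G)>,
        G: FnMut() -> T,
    {
        for (name, mut generate_input) in configs {
            let input = generate_input();
            let start = Instant::now();
            body(&input);
            println!("{name}: {:?}", start.elapsed());
        }
    }
}

fn main() {
    // Hypothetical run-time configuration set: 1_000, 2_000, 4_000, 8_000 elements.
    let configs = (0..4u32).map(|i| {
        let len = 1_000usize << i;
        (
            format!("{len} elements"),
            move || (0..len as u64).collect::<Vec<u64>>(),
        )
    });

    MockBencher.with_input_configurations(configs, |input: &Vec<u64>| {
        std::hint::black_box(input.iter().sum::<u64>());
    });
}
```

Real integration would of course reuse divan's existing timing, sample-count, and reporting machinery rather than this naive single-shot timing.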