Share data across samples in the same multi-thread benchmark run #51

Open
anko opened this issue May 2, 2024 · 0 comments
Labels
enhancement New feature or request

anko commented May 2, 2024

I'm trying to benchmark a concurrent data structure, and I want to benchmark its read/write behaviour under thread contention. However, unlike all of the threaded examples in documentation, this structure's performance characteristics change as it is modified: internal parts of it are consumed or rearranged by different threads, so it needs to be constructed again for each run of the benchmark.

This means:

  • If the tested structure is made static, or initialised in the benchmark function body before calling divan::Bencher methods, then only the first iteration sees the structure as it was constructed. The other iterations see one whose contents have already been consumed by the first iteration, with almost no work left to benchmark.

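    #[divan::bench(threads = [1, 2, 4, 8, 16])]
    fn benchmark_function(bencher: divan::Bencher) {
        // A static needs a const initialiser, so build it lazily on first use.
        static X: std::sync::OnceLock<MyStruct> = std::sync::OnceLock::new();
        let x = X.get_or_init(create_structure);
        bencher
            .bench(|| x.consume_contents());
    }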
  • If it is initialised in with_inputs, each thread gets its own copy of the whole structure, so they never contend.

    #[divan::bench(threads=[1, 2, 4, 8, 16])]
    fn benchmark_function(bencher: divan::Bencher) {
        bencher
            .with_inputs(|| create_structure())
            .bench_values(|x| x.consume_contents());
    }

Either the structure is constructed once and then shared among all iterations (the first option), or it is constructed separately for each thread and never shared (the second option). I need a way to have it constructed once per benchmark run and shared only among the threads that are part of that same run. Do I correctly understand that this is currently not possible using the threads option?


My current workaround is to start a const number of threads myself inside the with_inputs closure and have them wait at a std::sync::Barrier, then as part of the bench_local_values closure, release the Barrier and join the threads to time them:

#[divan::bench(consts = [1, 2, 4, 8, 16])]
fn benchmark_function<const THREADS: usize>(bencher: divan::Bencher) {
    use std::sync::{Arc, Barrier};
    bencher
        .with_inputs(|| -> (Vec<std::thread::JoinHandle<_>>, _) {
            let x: Arc<MyStruct> = Arc::new(create_structure());
            let barrier = Arc::new(Barrier::new(THREADS + 1));
            let threads = (0..THREADS).map(|_| {
                let x = x.clone();
                let barrier = barrier.clone();
                std::thread::spawn(move || {
                    barrier.wait();
                    x.consume_contents();
                })
            }).collect();
            (threads, barrier)
        })
        .bench_local_values(|(threads, barrier)| {
            barrier.wait();
            for t in threads {
                t.join().unwrap();
            }
        });
}
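
For reference, the barrier pattern above can be pulled out into a small std-only helper so the benchmark body stays short. This is only a sketch: `Parked`, `park`, and `release_and_join` are hypothetical names, not part of Divan, and the helper assumes the shared structure is `Send + Sync`.

```rust
use std::sync::{Arc, Barrier};
use std::thread::{self, JoinHandle};

/// Worker threads parked at a barrier, all holding the same shared value.
/// Hypothetical helper; nothing here is provided by Divan.
pub struct Parked {
    handles: Vec<JoinHandle<()>>,
    barrier: Arc<Barrier>,
}

/// Untimed setup: spawn `threads` workers that wait at one barrier and then
/// run `work` against the same shared value.
pub fn park<T, F>(shared: T, threads: usize, work: F) -> Parked
where
    T: Send + Sync + 'static,
    F: Fn(&T) + Send + Sync + 'static,
{
    let shared = Arc::new(shared);
    let work = Arc::new(work);
    // One extra participant: the benchmarking thread that releases the workers.
    let barrier = Arc::new(Barrier::new(threads + 1));
    let handles = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            let work = Arc::clone(&work);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait();
                (*work)(&*shared);
            })
        })
        .collect();
    Parked { handles, barrier }
}

impl Parked {
    /// Timed part: release every worker, then wait for all of them to finish.
    pub fn release_and_join(self) {
        self.barrier.wait();
        for handle in self.handles {
            handle.join().expect("worker thread panicked");
        }
    }
}

// Intended use with the benchmark above (MyStruct and create_structure are
// the placeholders from this issue):
//
//     bencher
//         .with_inputs(|| park(create_structure(), THREADS, |x| x.consume_contents()))
//         .bench_local_values(Parked::release_and_join);
```

The split mirrors the workaround: everything in `park` runs during input generation, and only the barrier release plus the joins are timed.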

This works, but there's a lot of code duplicating what I imagine Divan would do internally to implement the threads option.

I also see worse performance when benchmarking with 1 thread using this method than I do from an otherwise-identical benchmark with #[divan::bench(threads = [1])], probably because Divan doesn't use a Barrier when single-threaded. That is smart, and another reason why I feel this could be handled by Divan itself.
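
If the Barrier really is the cause (a guess on my part, not something I have verified against Divan's internals), the workaround could special-case one thread and skip both the spawn and the barrier. A rough std-only sketch, where `Prepared` and `prepare` are hypothetical names and not Divan API:

```rust
use std::sync::{Arc, Barrier};
use std::thread::{self, JoinHandle};

/// Input prepared outside the timed region. Hypothetical type, not from Divan.
pub enum Prepared<T, F> {
    /// Exactly one thread: no spawn and no barrier, the caller runs the work.
    Inline { shared: T, work: F },
    /// Any other count: workers parked at a barrier until released.
    Spawned {
        handles: Vec<JoinHandle<()>>,
        barrier: Arc<Barrier>,
    },
}

/// Untimed setup. With `threads == 1` nothing is spawned at all.
pub fn prepare<T, F>(shared: T, threads: usize, work: F) -> Prepared<T, F>
where
    T: Send + Sync + 'static,
    F: Fn(&T) + Send + Sync + 'static,
{
    if threads == 1 {
        return Prepared::Inline { shared, work };
    }
    let shared = Arc::new(shared);
    let work = Arc::new(work);
    // One extra participant for the thread that calls `run`.
    let barrier = Arc::new(Barrier::new(threads + 1));
    let handles = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            let work = Arc::clone(&work);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait();
                (*work)(&*shared);
            })
        })
        .collect();
    Prepared::Spawned { handles, barrier }
}

impl<T, F: Fn(&T)> Prepared<T, F> {
    /// Timed part: either run the work directly, or release and join workers.
    pub fn run(self) {
        match self {
            Prepared::Inline { shared, work } => work(&shared),
            Prepared::Spawned { handles, barrier } => {
                barrier.wait();
                for handle in handles {
                    handle.join().expect("worker thread panicked");
                }
            }
        }
    }
}
```

I have not measured whether this closes the gap with threads = [1]; it only removes the synchronisation that the single-threaded case does not need.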

Am I missing a better existing way to do this?

  • If yes, could an example be added illustrating it?

  • If no, do you think this use-case could be handled by Divan?

@nvzqz nvzqz added the enhancement New feature or request label Nov 27, 2024
@nvzqz nvzqz changed the title How do I initialise a structure so it's shared between threads of the same bench iteration? Share data across samples in the same multi-thread benchmark run Nov 27, 2024