
Add statistically-significant improvement reporting #48

Open
TheLostLambda opened this issue Mar 15, 2024 · 1 comment

Comments

@TheLostLambda

Similar to what criterion does, but I think a useful starting point would just be a ±% change in times between runs (if it's determined that the two runs differ significantly given the variance of each)!

I imagine this is somewhat blocked on writing out the previous benchmark results somewhere they can be referenced first!
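The reported quantity itself is simple once previous results are available. A minimal sketch of the ±% change calculation (the function name and values are hypothetical, not Divan's API):

```rust
/// Relative change, in percent, of the current time vs. the previous one.
/// Negative means the benchmark got faster.
fn percent_change(previous: f64, current: f64) -> f64 {
    (current - previous) / previous * 100.0
}

fn main() {
    // Hypothetical median times (ns) loaded from a previous and current run.
    let (previous_median_ns, current_median_ns) = (100.0, 90.0);
    let delta = percent_change(previous_median_ns, current_median_ns);
    println!("{delta:+.1}%"); // prints "-10.0%"
}
```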

@nvzqz
Owner

nvzqz commented Jun 30, 2024

This is a bit of a nuanced issue. Currently benchmarks don't have any statistics outside of min/max/median/mean time. But I would very much like to do proper statistical analysis across benchmark runs to determine if a difference is distinguishable from random (i.e. statistically-significant).

The approach the Stabilizer folks took produced a normal distribution of results, but Stabilizer is an LLVM plugin; as an easy-to-pick-up userspace library, Divan doesn't have that luxury. That said, benchmark times tend to follow a log-normal distribution, so perhaps the same statistics can be made to work from that?
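One way to exploit the log-normal tendency: log-transform the sample times, which makes them approximately normal, then apply a standard two-sample test such as Welch's t-test. A rough sketch under that assumption (this is not Divan code, and the |t| > 2 cutoff is only a crude small-sample threshold, not a proper lookup of the t-distribution):

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Unbiased sample variance (n - 1 denominator).
fn variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() - 1) as f64
}

/// Welch's t statistic computed on log-transformed times, since raw
/// benchmark times are assumed (roughly) log-normally distributed.
fn welch_t_log(a: &[f64], b: &[f64]) -> f64 {
    let la: Vec<f64> = a.iter().map(|x| x.ln()).collect();
    let lb: Vec<f64> = b.iter().map(|x| x.ln()).collect();
    let (ma, mb) = (mean(&la), mean(&lb));
    let (va, vb) = (variance(&la), variance(&lb));
    (ma - mb) / (va / la.len() as f64 + vb / lb.len() as f64).sqrt()
}

fn main() {
    // Hypothetical per-run sample times (ns) from two benchmark runs.
    let before = [10.1, 10.3, 9.9, 10.2, 10.0, 10.1];
    let after = [8.0, 8.2, 7.9, 8.1, 8.0, 8.3];
    let t = welch_t_log(&before, &after);
    // |t| well above ~2 suggests the runs genuinely differ.
    println!("t = {t:.2}, likely significant: {}", t.abs() > 2.0);
}
```

A real implementation would compute the Welch–Satterthwaite degrees of freedom and a proper p-value rather than a fixed cutoff, but this shows the shape of the comparison.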


As per the intent of the issue, my plan once #10/#42 are complete is to report a ±% change from the previous run, based on information previously recorded in target/divan.
