-
Notifications
You must be signed in to change notification settings - Fork 496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize the SampleXXX function and fixes #249 #261
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First things first, sorry about the very delayed review.
Second, this looks like a great PR. I am not at all qualified to review the assembly bits, but if the tests pass, they pass.
A couple of things I'd ask:
-
please split up the work into more manageable commits. As is, it's very difficult for me to test the effect this might have on non-AMD architectures without copying a bunch of code back and forth between branches. I would suggest the following commit order:
- fix the float comparisons so that tests pass on all architectures
- add the benchmark for the sample method
- add all the assembly bits and conditional logic.
This way, it's possible to actually usebenchcmp
without copying a bunch of code around from different branches.
-
I'm not sure if it actually has an effect, or is something that'll get optimized away, but it seems like the conditional logic could be simplified. Rather than defining 3 bools and checking them to see which version of the function to execute, could we just assign the appropriate function in the global scope once, based on the same booleans? Whether this has a perf impact or not, IMO it'll make the code a lot cleaner.
Thank you!
I think these advices are very useful. I will modify this pr in few days. |
SampleCalc4KSampleSum-32 3.38µs ± 3% 0.47µs ± 1% -86.02% (p=0.100 n=3+3) SampleCalc4KSampleMax-32 5.35µs ± 5% 0.86µs ± 1% -83.83% (p=0.100 n=3+3) SampleCalc4KSampleMin-32 5.25µs ± 4% 0.88µs ± 2% -83.26% (p=0.100 n=3+3) SampleCalc4KSampleMean-32 2.53µs ±10% 0.44µs ±11% -82.76% (p=0.100 n=3+3) SampleCalc4KSampleStdDev-32 7.16µs ± 3% 5.22µs ± 2% -27.06% (p=0.100 n=3+3) SampleCalc4KSampleVariance-32 7.34µs ± 1% 5.32µs ± 2% -27.54% (p=0.100 n=3+3) SampleCalc4KSamplePercentile-32 262µs ± 6% 270µs ± 4% ~ (p=1.000 n=3+3)
4672687
to
dee2f26
Compare
Hi @mihasya , |
Hi @rcrowley, @mihasya , could you have time to review this patch?