Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize the SampleXXX function and fixes #249 #261

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

lrita
Copy link
Contributor

@lrita lrita commented Jul 30, 2019

Hi @rcrowley, @mihasya , could you have time to review this patch?

name                             old time/op  new time/op  delta
SampleCalc4K/SampleSum-4         2.07µs ± 6%  0.25µs ± 3%  -88.07%  (p=0.100 n=3+3)
SampleCalc4K/SampleMax-4         2.79µs ± 4%  0.71µs ± 1%  -74.50%  (p=0.100 n=3+3)
SampleCalc4K/SampleMin-4         2.73µs ± 1%  0.70µs ± 1%  -74.34%  (p=0.100 n=3+3)
SampleCalc4K/SampleMean-4        1.99µs ± 2%  0.24µs ± 1%  -87.73%  (p=0.100 n=3+3)
SampleCalc4K/SampleStdDev-4      6.16µs ± 1%  4.09µs ± 4%  -33.52%  (p=0.100 n=3+3)
SampleCalc4K/SampleVariance-4    6.10µs ± 2%  4.03µs ± 2%  -33.93%  (p=0.100 n=3+3)
SampleCalc4K/SamplePercentile-4   222µs ± 1%   215µs ± 1%   -2.78%  (p=0.100 n=3+3)

@lrita
Copy link
Contributor Author

lrita commented Sep 4, 2019

Hi @rcrowley, @mihasya , could you have time to review this patch?

Making it support go1.4-go1.12 already and remove the go1.3 support.

Copy link
Collaborator

@mihasya mihasya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First things first, sorry about the very delayed review.

Second, this looks like a great PR. I am not at all qualified to review the assembly bits, but if the tests pass, they pass.

A couple of things I'd ask:

  1. please split up the work into more manageable commits. As is, it's very difficult for me to test the effect this might have on non-AMD architectures without copying a bunch of code back and forth between branches. I would suggest the following commit order:

    • fix the float comparisons so that tests pass on all architectures
    • add the benchmark for the sample method
    • add all the assembly bits and conditional logic.
      This way, it's possible to actually use benchcmp without copying a bunch of code around from different branches.
  2. I'm not sure if it actually has an effect, or is something that'll get optimized away, but it seems like the conditional logic could be simplified. Rather than defining 3 bools and checking them to see which version of the function to execute, could we just assign the appropriate function in the global scope once, based on the same booleans? Whether this has a perf impact or not, IMO it'll make the code a lot cleaner.

Thank you!

@lrita
Copy link
Contributor Author

lrita commented Mar 13, 2020

I think these advices are very useful. I will modify this pr in few days.

lrita added 4 commits March 31, 2020 14:08
SampleCalc4KSampleSum-32         3.38µs ± 3%   0.47µs ± 1%  -86.02%  (p=0.100 n=3+3)
SampleCalc4KSampleMax-32         5.35µs ± 5%   0.86µs ± 1%  -83.83%  (p=0.100 n=3+3)
SampleCalc4KSampleMin-32         5.25µs ± 4%   0.88µs ± 2%  -83.26%  (p=0.100 n=3+3)
SampleCalc4KSampleMean-32        2.53µs ±10%   0.44µs ±11%  -82.76%  (p=0.100 n=3+3)
SampleCalc4KSampleStdDev-32      7.16µs ± 3%   5.22µs ± 2%  -27.06%  (p=0.100 n=3+3)
SampleCalc4KSampleVariance-32    7.34µs ± 1%   5.32µs ± 2%  -27.54%  (p=0.100 n=3+3)
SampleCalc4KSamplePercentile-32   262µs ± 6%    270µs ± 4%     ~     (p=1.000 n=3+3)
@lrita lrita force-pushed the optimize_sample_function branch from 4672687 to dee2f26 Compare March 31, 2020 12:11
@lrita lrita requested a review from mihasya March 31, 2020 12:14
@lrita
Copy link
Contributor Author

lrita commented Mar 31, 2020

Hi @mihasya ,
Re-commit by above good advices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants