Fix iperf no connection and PerfResult warmup timestamp calculation #382
Conversation
When the iperf job fails, we create a slightly different PerfResult data structure, which can cause problems in other parts of the code that assume a stable structure. This fixes that by adding one additional level of a SequentialPerfResult. Signed-off-by: Ondrej Lichtner <[email protected]>
When creating these PerfIntervals representing zero flows, we need to set the duration to a non-zero value to avoid zero-division issues. Signed-off-by: Ondrej Lichtner <[email protected]>
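To illustrate what the two commits above describe, here is a minimal, self-contained sketch using hypothetical stand-in classes (not the actual LNST result types): the failure path keeps the same extra level of nesting as a successful run, and the placeholder interval gets a non-zero duration so averaging does not divide by zero.

```python
# Hypothetical stand-ins, not the real LNST classes; for illustration only.

class PerfInterval:
    """A single measured interval."""
    def __init__(self, value, duration, unit, timestamp):
        self.value = value
        self.duration = duration
        self.unit = unit
        self.timestamp = timestamp

    @property
    def average(self):
        # A zero duration would raise ZeroDivisionError here, which is why
        # the "no data" placeholder interval gets a non-zero duration.
        return self.value / self.duration


class SequentialPerfResult(list):
    """A list of intervals measured back to back."""
    @property
    def start_timestamp(self):
        return self[0].timestamp

    @property
    def end_timestamp(self):
        return self[-1].timestamp + self[-1].duration


# A successful run produces: flow -> SequentialPerfResult -> PerfInterval(s).
ok_flow = SequentialPerfResult([PerfInterval(1000, 5, "bits", 1700000000.0)])

# The failure path used to return a flatter structure; wrapping the placeholder
# in the same SequentialPerfResult level keeps the shape consumers expect.
failed_flow = SequentialPerfResult([PerfInterval(0, 1, "bits", 1700000000.0)])

for flow in (ok_flow, failed_flow):
    print(flow.start_timestamp, flow.end_timestamp, flow[0].average)
```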
This will need some testing on real data to see if the warmup/warmdown trimming isn't causing issues... I tested this locally by printing the difference between the old calculation and the new one, and the difference was minimal. The original implementation comes from #248, so that will be relevant for testing this.
Besides my concern about the warmup/warmdown timestamps, this looks ok.
Did you test this on any parallel iperf test we have internally? E.g. 35469f23-2f89-4b8a-a40d-d29395f71fdf (96 parallel flows)
This is an arm test which I can't run at the moment... instead I'll schedule a job with all tests that have
Test job scheduled: Edit: I made a mistake in scheduling... new job:
Scheduled a test for the other solution to check the difference.
Force-pushed from 83c9e63 to fef6817
Compared the "Option A" test run vs the "Option B" test run here: There are a couple of tests where different results are seen, however IMO these are mostly due to the unstable nature of multistream performance tests that we usually see anyway. Both test runs evaluate just OK when compared to historical baselines. So IMO both implementations are basically equivalent. As one more step, I'll run a multistream test with 64 threads locally and print out the calculation of the `warmup_end`.
So the local comparison from running a test in containers is here. The `warmup_end` code I used:
The output I got:
We can see that optionA and optionB are identical and they're always basically at most
The recipe I used for this locally was with this configuration:
And it is worth noting that the startup of the iperf clients took a while, as the difference between the first and the last client is sizable:
Did the same with the
logs:
Here we see some small difference between A and B, but when comparing A to old or B to old, IMO the difference is irrelevant.
Force-pushed from fef6817 to 8e924a8
With the new data and testing I believe the simpler option A implementation is better and there should be no real impact on any of our testing.
Besides my only comment, this looks ok.
lnst/RecipeCommon/Perf/Measurements/Results/NeperFlowMeasurementResults.py (review comment outdated, resolved)
This index-based calculation is inaccurate if the data structure doesn't reflect a single very specific use case... instead we can simply calculate the timestamps by addition... Signed-off-by: Ondrej Lichtner <[email protected]>
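As a rough illustration of the difference this commit describes (hypothetical helper functions and made-up interval data, not the LNST implementation): the index-based variant assumes the relevant timestamp sits at a fixed position in one specific layout, while the addition-based variant only needs the start timestamp and the warmup durations, so it does not depend on how the intervals are grouped. With a flat, evenly spaced layout the two results differ only by the small per-interval timestamp drift, which matches the minimal differences mentioned earlier in this thread.

```python
# Hypothetical illustration; interval records are made up, not from a real run.

def warmup_end_by_index(intervals, warmup):
    # Index based: take the timestamp of the first interval after the warmup,
    # which only holds for one specific data layout.
    return intervals[warmup]["timestamp"]

def warmup_end_by_addition(intervals, warmup):
    # Addition based: start of the measurement plus the durations of the
    # warmup intervals, independent of how the intervals are nested.
    start = intervals[0]["timestamp"]
    return start + sum(i["duration"] for i in intervals[:warmup])

# Per-second intervals whose timestamps drift slightly, as real iperf
# interval reports tend to.
intervals = [{"timestamp": 100.0 + i * 1.001, "duration": 1.0} for i in range(10)]

print(warmup_end_by_index(intervals, warmup=2))     # 102.002
print(warmup_end_by_addition(intervals, warmup=2))  # 102.0
```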
Force-pushed from 8e924a8 to 3cf4333
`XDPBenchMeasurements` results used to use `FlowMeasurementResults` as a base class, overriding some of its methods, because the xdp-bench tool doesn't measure CPU usage and so the `FlowMeasurementResults` CPU metrics were set to `None`. Recent changes from [0] now expect `FlowMeasurementResults` to use both perf and CPU results. In particular, the `.{start,end}_timestamp` properties expect CPU metrics to be set; if they are not, the code crashes, since the absence of CPU metrics is "masked" by setting `None` instead of regular result containers. [0] LNST-project#382
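A short, self-contained sketch of the failure mode described above, with hypothetical class and attribute names rather than the real `XDPBenchMeasurements`/`FlowMeasurementResults` code: a timestamp property that reads both the flow results and the CPU results breaks as soon as the CPU side is masked with `None` instead of a regular (possibly empty) result container.

```python
# Hypothetical sketch; names do not match the real LNST classes.

class Interval:
    def __init__(self, timestamp, duration):
        self.timestamp = timestamp
        self.duration = duration


class Results:
    """A result container holding flow data and (optionally) CPU data."""

    def __init__(self, flow_intervals, cpu_intervals):
        self.flow_data = flow_intervals
        self.cpu_data = cpu_intervals

    @property
    def start_timestamp(self):
        # Reads both sides; crashes if cpu_data is None instead of a
        # regular result container -- the problem described above.
        return min(self.flow_data[0].timestamp, self.cpu_data[0].timestamp)


flow = [Interval(100.0, 1.0)]

ok = Results(flow, [Interval(100.5, 1.0)])
print(ok.start_timestamp)  # 100.0

masked = Results(flow, None)
try:
    print(masked.start_timestamp)
except TypeError as exc:  # 'NoneType' object is not subscriptable
    print(f"masking CPU results with None breaks the property: {exc}")
```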
Description
This should fix the issues we regularly see with lnst crashing with an exception when iperf fails to establish a connection. The source of the issue is that the perf result data structure created in that case was slightly different.
While testing this I also found issues with the warmup/warmdown timestamp calculation, because the created zero iperf measurements don't have a duration long enough to include a warmup/warmdown... this however shouldn't crash the trimming, which should instead return another zero measurement.
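To make the intended trimming behaviour concrete, here is a minimal sketch (hypothetical function and data layout, not the LNST trimming code): if the measurement is too short to contain the warmup and warmdown windows, as the zero placeholder measurements are, trimming falls back to a zero measurement instead of raising.

```python
# Hypothetical sketch of warmup/warmdown trimming that degrades gracefully.
# Intervals are (value, duration) tuples; times are in seconds.

def trim(intervals, warmup, warmdown):
    """Drop `warmup` seconds from the start and `warmdown` seconds from the end."""
    total = sum(duration for _, duration in intervals)
    if total <= warmup + warmdown:
        # Too short to trim (e.g. the zero placeholder created when iperf
        # failed to connect): return a zero measurement instead of crashing.
        return [(0, 1)]

    trimmed = []
    elapsed = 0.0
    for value, duration in intervals:
        if elapsed >= warmup and elapsed + duration <= total - warmdown:
            trimmed.append((value, duration))
        elapsed += duration
    return trimmed


print(trim([(1000, 1.0)] * 10, warmup=2, warmdown=2))  # the 6 middle intervals
print(trim([(0, 1.0)], warmup=2, warmdown=2))          # [(0, 1)] fallback
```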
Tests
(Please provide a list of tests that prove that the pull
request doesn't break the stable state of the master branch. This should
include test runs with valid results for all of critical workflows.)
Reviews
@jtluka @enhaut @Axonis
Closes: #