Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duration histogram addition and subtraction #50

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

elv-nate
Copy link
Contributor

I need this change for some of the backend metrics work. It's useful to be able to add and subtract these in order to handle the monotonic counts that come out of influx.

@elv-nate elv-nate requested a review from elv-gilles September 12, 2024 22:34
@elv-nate elv-nate self-assigned this Sep 12, 2024
util/histogram/histogram.go Outdated Show resolved Hide resolved
util/histogram/histogram.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@elv-gilles elv-gilles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, and please add unit-test(s) for moving average.

util/histogram/histogram.go Outdated Show resolved Hide resolved
util/histogram/histogram.go Outdated Show resolved Hide resolved
util/histogram/moving_average_histogram.go Outdated Show resolved Hide resolved
func NewMovingAverageHistogram(
typ DurationHistogramType,
maxDuration time.Duration,
durationPerHistogram time.Duration,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be easier to pass histCount and durationPerHistogram as parameters (instead of maxDuration & durationPerHistogram) ?

This would allows to replace StatLastMinute with StatLastN where the caller knows what n means since he passed the values in the ctor.

func (m *MovingAverageHistogram) StatLastN(n int, f func(h *DurationHistogram) time.Duration) time.Duration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's reasonable, yes. I do want to keep StatLastMinute because realistically, this will be used for metrics largely, and that way the metrics call stays exactly the same regardless of the duration / frequency and what not.

util/histogram/moving_average_histogram.go Outdated Show resolved Hide resolved
Comment on lines 98 to 104
countToKeep := int(time.Minute / m.durationPerHistogram)
if time.Minute%m.durationPerHistogram != 0 {
countToKeep++
}
if countToKeep == 1 {
countToKeep++
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so countToKeep is 2 if m.durationPerHistogram >= 30s ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic to be more correct / hopefully more clear.

util/histogram/moving_average_histogram.go Outdated Show resolved Hide resolved
Comment on lines 86 to 92
// We calculate this countToKeep in order to at least capture the last minute worth of data,
// accounting for the possibility of an empty period that just started.
countToKeep := int(time.Minute/m.durationPerHistogram) + 1
// This case occurs with the duration per histogram is not a divisor of a minute.
if time.Duration(countToKeep)*m.durationPerHistogram < time.Minute+m.durationPerHistogram {
countToKeep++
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces:

durationPerHistogram     countToKeep
 1s                        61
 5s                        13
10s                         7
20s                         4
29s                         4
30s                         3
1m0s                        2
1m0.000000001s              2
2m0s                        2
3m0s                        2
3m0.000000001s              2
4m0s                        2

Instead I propose:

func countToKeep(durationPerHistogram, durationToCover time.Duration) int {
	countToKeep := int((durationToCover + 1) / durationPerHistogram)
	if time.Duration(countToKeep)*durationPerHistogram < durationToCover {
		countToKeep++
	}
	return countToKeep
}

func TestCountToKeep(t *testing.T) {
	type testCase struct {
		durationPerHistogram time.Duration
		toCover              time.Duration
		wantCount            int
	}
	for i, tc := range []*testCase{
		{durationPerHistogram: time.Second * 1, toCover: time.Minute, wantCount: 60},
		{durationPerHistogram: time.Second * 5, toCover: time.Minute, wantCount: 12},
		{durationPerHistogram: time.Second * 10, toCover: time.Minute, wantCount: 6},
		{durationPerHistogram: time.Second * 20, toCover: time.Minute, wantCount: 3},
		{durationPerHistogram: time.Second * 29, toCover: time.Minute, wantCount: 3},
		{durationPerHistogram: time.Second * 30, toCover: time.Minute, wantCount: 2},
		{durationPerHistogram: time.Minute * 1, toCover: time.Minute, wantCount: 1},
		{durationPerHistogram: time.Minute + 1, toCover: time.Minute, wantCount: 1},
		{durationPerHistogram: time.Minute * 2, toCover: time.Minute, wantCount: 1},
		{durationPerHistogram: time.Minute * 3, toCover: time.Minute, wantCount: 1},
		{durationPerHistogram: time.Minute*3 + 1, toCover: time.Minute, wantCount: 1},
		{durationPerHistogram: time.Minute * 4, toCover: time.Minute, wantCount: 1},
		{durationPerHistogram: time.Second * 1, toCover: time.Hour, wantCount: 3600},
		{durationPerHistogram: time.Second * 5, toCover: time.Hour, wantCount: 720},
		{durationPerHistogram: time.Minute * 1, toCover: time.Hour, wantCount: 60},
		{durationPerHistogram: time.Minute * 5, toCover: time.Hour, wantCount: 12},
	} {
		ck := countToKeep(tc.durationPerHistogram, tc.toCover)
		require.Equal(t, tc.wantCount, ck, "failed at: %d / %v", i, tc.durationPerHistogram)
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem with this is that it's possible to report basically no data, depending on the rotation schedule. If we always synchronized rotation of the histograms with collection of stats, it would be fine, but that is not the case and in general I prefer to design the system assuming that there is no synchronization between the rotation and data being requested.

For instance, with your example for covering a minute with a one minute histogram, consider the following scenarios. In all these cases, the last histogram was rotated at 12:00:15 (a random time), and so the next histogram will start being filled at 12:01:15.

Case 1 (good case)

A stat over data in the last minute is requested at 12:01:14. countToKeep is 1, so the histogram used to calculate the output statistic holds data from 12:00:15 to 12:01:14, 59 seconds.

Case 2 (average case)

A stat over data in the last minute is requested at 12:00:45. The histogram used to calculate the output statistic holds data from 12:00:15 to 12:00:45, 30 seconds.

Case 3 (bad case)

A stat over data in the last minute is requested at 12:01:16. The histogram has just rotated, and as only one histogram is requested, it holds less than a second of data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Or keep in memory when the rotation occurred and decide - based on when the call happens - if it is necessary to add 1.
Or always add 1 and weight the current and the older buckets based on the time spent since rotation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants