
[Profiler] Implement WaitHandle wait duration profiling #6426

Open
chrisnas wants to merge 5 commits into master from chrisnas/r+d_week_wait_events
Conversation

chrisnas (Contributor)

Summary of changes

Listen to .NET 9+ events related to WaitHandle-derived instances to profile wait durations.

Reason for change

R&D week based on dotnet/runtime#94737 and #5814

Implementation details

Listen to the WaitHandleWaitStart/WaitHandleWaitStop events to measure wait durations and, when possible, identify the type of the wait object (e.g. Mutex, Semaphore, ManualResetEvent, or AutoResetEvent).
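For illustration, a minimal sketch of how the start/stop pair could be turned into a wait duration, assuming the profiler already parses CLR events and keeps per-thread state. The type and handler names here are illustrative, not the actual implementation (the real code uses methods such as SetWaitStart/GetWaitType, visible in the review comments below):

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Illustrative per-thread state; the real profiler stores similar data on its
// managed-thread objects (SetWaitStart / GetWaitType appear in the diff below).
struct ThreadWaitState
{
    std::chrono::nanoseconds waitStart{0ns};
    int waitType{-1}; // kind of handle waited on (Mutex, Semaphore, ...), when known
};

// Called when a WaitHandleWaitStart event is received.
void OnWaitHandleWaitStart(ThreadWaitState* thread, std::chrono::nanoseconds timestamp, int waitType)
{
    thread->waitStart = timestamp; // remember when the wait began
    thread->waitType = waitType;   // remember what is being waited on, if the event tells us
}

// Called when the matching WaitHandleWaitStop event is received on the same thread.
std::chrono::nanoseconds OnWaitHandleWaitStop(ThreadWaitState* thread, std::chrono::nanoseconds timestamp)
{
    auto waitDuration = timestamp - thread->waitStart;
    thread->waitStart = 0ns; // reset until the next wait
    return waitDuration;     // the duration is then turned into a sample
}
```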

Test coverage

Other details

@chrisnas chrisnas requested review from a team as code owners December 12, 2024 08:15
@github-actions github-actions bot added the area:profiler Issues related to the continuous-profiler label Dec 12, 2024
datadog-ddstaging bot commented Dec 12, 2024

Datadog Report

Branch report: chrisnas/r+d_week_wait_events
Commit report: 797a57f
Test service: dd-trace-dotnet

✅ 0 Failed, 121740 Passed, 537 Skipped, 6h 7m 2.08s Total Time

andrewlock (Member) commented Dec 12, 2024

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used when comparing execution times:

  • Welch's t-test with a 5% significance level
  • Only differences greater than 5% and 5 ms are considered

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

The table below shows, for each scenario and configuration, the mean execution time with its p99 interval (derived from the mean and StdDev of the test run) for this PR (6426) and for master.

| Scenario | Configuration | This PR (6426) | master |
| --- | --- | --- | --- |
| FakeDbCommand (.NET Framework 4.6.2) | Baseline | 69 ms (67–71) | 69 ms (65–73) |
| FakeDbCommand (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 978 ms (960–996) | 979 ms (952–1006) |
| FakeDbCommand (.NET Core 3.1) | Baseline | 107 ms (105–110) | 107 ms (104–110) |
| FakeDbCommand (.NET Core 3.1) | CallTarget+Inlining+NGEN | 678 ms (662–695) | 678 ms (663–693) |
| FakeDbCommand (.NET 6) | Baseline | 91 ms (89–92) | 91 ms (89–93) |
| FakeDbCommand (.NET 6) | CallTarget+Inlining+NGEN | 634 ms (616–651) | 633 ms (616–650) |
| HttpMessageHandler (.NET Framework 4.6.2) | Baseline | 191 ms (187–195) | 189 ms (185–193) |
| HttpMessageHandler (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 1,091 ms (1063–1119) | 1,091 ms (1058–1124) |
| HttpMessageHandler (.NET Core 3.1) | Baseline | 276 ms (271–280) | 275 ms (270–279) |
| HttpMessageHandler (.NET Core 3.1) | CallTarget+Inlining+NGEN | 866 ms (839–892) | 868 ms (838–898) |
| HttpMessageHandler (.NET 6) | Baseline | 264 ms (259–269) | 264 ms (259–270) |
| HttpMessageHandler (.NET 6) | CallTarget+Inlining+NGEN | 848 ms (810–885) | 846 ms (817–874) |

```cpp
currentThreadInfo->SetWaitStart(0ns);

// We are not interested in waits that are too short
if (waitDuration < 1ms)
```
Contributor

I'm not sure about this. Assuming a memcache request takes 300 µs to complete, we would miss all sync-over-async waits on it, even though that could be a very serious issue.

What would be the issue with not filtering any waits?

Contributor Author

> I'm not sure about this. Assuming a memcache request takes 300 µs to complete, we would miss all sync-over-async waits on it, even though that could be a very serious issue.
>
> What would be the issue with not filtering any waits?

There could be a performance impact (both from the number of samples to serialize and from their collection).
This minimum wait duration will be configurable via an environment variable.
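A minimal sketch of how such a threshold could be read once at startup; the environment variable name DD_INTERNAL_WAIT_DURATION_THRESHOLD_MS is hypothetical, not the actual setting:

```cpp
#include <chrono>
#include <cstdlib>
#include <exception>
#include <string>

using namespace std::chrono_literals;

// Hypothetical helper: read the minimum wait duration to report, in milliseconds,
// falling back to 1 ms (the current hard-coded value) when the variable is absent or invalid.
std::chrono::milliseconds GetMinimumWaitDuration()
{
    const char* value = std::getenv("DD_INTERNAL_WAIT_DURATION_THRESHOLD_MS"); // hypothetical name
    if (value == nullptr)
    {
        return 1ms;
    }

    try
    {
        return std::chrono::milliseconds(std::stoll(value));
    }
    catch (const std::exception&)
    {
        return 1ms;
    }
}
```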

Contributor

In my experience, this event is very cheap both in count and in payload size, especially compared to GCAllocationTick, so I wouldn't be too concerned.

```cpp
    return;
}

AddContentionSample(0ns, -1, currentThreadInfo->GetWaitType(), waitDuration, 0, WStr(""), _emptyStack);
```
Contributor

Does it mean the wait events will be reported as contention? I'm not sure sync-over-async can be considered contention.

@chrisnas (Contributor Author) commented on Jan 16, 2025

> Does it mean the wait events will be reported as contention? I'm not sure sync-over-async can be considered contention.

This is an easy way to make the wait information available immediately.
I'm not sure it is worth creating a new profile type, but we are discussing it.
I was hoping to be able to distinguish between the different synchronization mechanisms based on the AssociatedObjectID, but it is not always available, so it won't be possible: the user will have to look at the call stacks in the flame graph.
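For illustration only, a sketch of how the type name behind AssociatedObjectID could be resolved through the profiling API when the object ID is provided. ResolveWaitObjectTypeName is a hypothetical helper, error handling is trimmed, and this assumes the ObjectID is still valid when the event is processed:

```cpp
#include <cor.h>
#include <corprof.h>
#include <string>

// Hypothetical helper: given the ObjectID carried by WaitHandleWaitStart, resolve the
// managed type name (e.g. System.Threading.Mutex) if possible. Returns an empty string
// when the object ID is missing or resolution fails. std::wstring is used for brevity.
std::wstring ResolveWaitObjectTypeName(ICorProfilerInfo4* info, ObjectID associatedObject)
{
    if (associatedObject == 0)
    {
        return L""; // the runtime does not always provide the object
    }

    ClassID classId = 0;
    if (FAILED(info->GetClassFromObject(associatedObject, &classId)))
    {
        return L"";
    }

    ModuleID moduleId = 0;
    mdTypeDef typeDef = mdTypeDefNil;
    if (FAILED(info->GetClassIDInfo2(classId, &moduleId, &typeDef, nullptr, 0, nullptr, nullptr)))
    {
        return L"";
    }

    IMetaDataImport* metadata = nullptr;
    if (FAILED(info->GetModuleMetaData(moduleId, ofRead, IID_IMetaDataImport,
                                       reinterpret_cast<IUnknown**>(&metadata))))
    {
        return L"";
    }

    WCHAR typeName[512];
    ULONG nameLength = 0;
    HRESULT hr = metadata->GetTypeDefProps(typeDef, typeName, 512, &nameLength, nullptr, nullptr);
    metadata->Release();

    return SUCCEEDED(hr) ? std::wstring(typeName) : std::wstring(L"");
}
```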

Contributor

Is there a risk that contention events will show up twice?
