[CONTP-547] reduce language detection refresh period to 1 minute #1336

adel121 · 2025-01-02T12:45:27Z

What does this PR do?

This PR fixes flakiness in an E2E test (see here).

The language detection client runs in the core agent and periodically sends detected languages to the cluster agent.
When the cluster agent receives a detected language, it patches the related deployment with a language annotation.

The default refresh period is 20 minutes.

If the first detected language is sent before the cluster agent is available, the cluster agent does not receive it until the next refresh, 20 minutes later. This happens because the language detection client has no retry mechanism.

As a result, some assertions in the datadog-agent E2E tests fail and are reported as flaky.

The ideal fix would be to implement a retry mechanism in the PLD client. In the meantime, a temporary fix for the flakiness is to reduce the refresh period to 1 minute.

Which scenarios this will impact?

All scenarios that install the agent with Helm.

Motivation

Fix flakiness in the E2E tests.

Additional Notes

pducolin · 2025-01-03T09:31:33Z

components/datadog/agent/kubernetes_helm.go

@@ -344,6 +344,17 @@ func buildLinuxHelmValues(baseName, agentImagePath, agentImageTag, clusterAgentI
 			},
 			"containers": pulumi.Map{
 				"agent": pulumi.Map{
+					"env": pulumi.StringMapArray{
+						pulumi.StringMap{
+							// TODO: remove this environment variable override once a retry mechanism is added to the language detection client


👏 praise
This does seem like something that could impact customers too, though their environments run longer, so they are less affected by start-up flakiness.

reduce language detection refresh period to 1 minute

589ebc5

adel121 requested a review from a team as a code owner January 2, 2025 12:45

pducolin approved these changes Jan 3, 2025


adel121 merged commit 4850a5c into main Jan 3, 2025
7 checks passed

adel121 deleted the adelhajhassan/reduce_languagedetection_client_refresh_period_to_1minute branch January 3, 2025 09:46
