[CONTP-547] reduce language detection refresh period to 1 minute #1336


@adel121 adel121 commented Jan 2, 2025

What does this PR do?

This PR fixes a flaky e2e test (see here).

The language detection client runs in the core agent and periodically sends detected languages to the cluster agent.
When the cluster agent receives a detected language, it patches the related deployment with a language annotation.

The default period is 20 minutes.

If the client sends the first detected language before the cluster agent is available, the cluster agent does not receive it until the next refresh, 20 minutes later, because the language detection client has no retry mechanism.

As a result, some assertions in the E2E tests of datadog-agent fail, and are reported as flaky.

The ideal fix would be to implement a retry mechanism in the PLD client. In the meantime, a temporary fix for the flakiness is simply to reduce the refresh period to 1 minute.

Which scenarios will this impact?

All scenarios that install the agent with Helm.

Motivation

Fix flakiness in e2e.

Additional Notes

@adel121 adel121 requested a review from a team as a code owner January 2, 2025 12:45
@@ -344,6 +344,17 @@ func buildLinuxHelmValues(baseName, agentImagePath, agentImageTag, clusterAgentI
			},
			"containers": pulumi.Map{
				"agent": pulumi.Map{
					"env": pulumi.StringMapArray{
						pulumi.StringMap{
							// TODO: remove this environment variable override once a retry mechanism is added to the language detection client

👏 praise
This does indeed seem like something that could impact customers too, though their environments run longer, so they are less affected by startup flakiness.

@adel121 adel121 merged commit 4850a5c into main Jan 3, 2025
7 checks passed
@adel121 adel121 deleted the adelhajhassan/reduce_languagedetection_client_refresh_period_to_1minute branch January 3, 2025 09:46