From 4317df5abab0d5ad09e36deb757a7eadd1b920ce Mon Sep 17 00:00:00 2001
From: Jianquan Ye <jianquanye@microsoft.com>
Date: Wed, 8 Jan 2025 12:49:49 +1000
Subject: [PATCH] Update snmpd.conf.j2 prolong agentXTimeout to avoid timeout
 failure in high CPU utilization scenario (#21316)

Why I did it
Fix #21314
Increase the timeout for requests between snmpd and the SNMP AgentX subagent.

In the SONiC SNMP AgentX subagent, the MIB updaters and the AgentX client share the same asyncio (coroutine) event loop.
While the MIB updaters are updating the SNMP values, the AgentX client cannot respond to requests from snmpd.
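
The blocking behaviour described above can be reproduced with a minimal asyncio sketch. This is an illustration only, not the SONiC implementation: `mib_updater`, `agentx_respond` and `request_latency` are hypothetical names, and `time.sleep()` stands in for a slow, CPU-bound update pass.

```python
import asyncio
import time


async def mib_updater(busy_seconds):
    # Hypothetical stand-in for a MIB updater. time.sleep() blocks the whole
    # event loop, much as a long CPU-bound update pass would.
    time.sleep(busy_seconds)


async def agentx_respond():
    # Hypothetical stand-in for the AgentX client answering one snmpd request.
    return "response"


async def request_latency(busy_seconds):
    # Schedule the updater first, then the responder, on the same event loop.
    start = time.monotonic()
    updater = asyncio.create_task(mib_updater(busy_seconds))
    responder = asyncio.create_task(agentx_respond())
    reply = await responder
    latency = time.monotonic() - start
    await updater
    return reply, latency


if __name__ == "__main__":
    reply, latency = asyncio.run(request_latency(0.3))
    print(f"{reply} after {latency:.2f}s")
```

Although the responder does no work of its own, its reply is delayed by at least the time the updater holds the loop. If that delay exceeds agentXTimeout, snmpd treats the request as timed out.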

By default, snmpd uses a 1s timeout with 5 retries for each AgentX request.

When CPU utilization is high, the MIB updaters run slowly and a 1s timeout is not enough, even with 5 retries.
Hence change to a 5s timeout with 4 retries, giving a 20s time window (5s * 4), so that SNMP requests can still be handled even at 100% CPU utilization.
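
The arithmetic above can be checked with a short sketch. It assumes, as the text does, that the usable window is simply timeout * retries; `retry_window` and `tolerates_stall` are hypothetical helpers, and the 8s stall is an illustrative value rather than a measurement.

```python
def retry_window(timeout_s, retries):
    # Simplifying assumption taken from the description: window = timeout * retries.
    return timeout_s * retries


def tolerates_stall(stall_s, timeout_s, retries):
    # A stall is tolerated if the subagent becomes responsive again
    # before the whole retry window has elapsed.
    return stall_s < retry_window(timeout_s, retries)


default_window = retry_window(1, 5)   # snmpd defaults
new_window = retry_window(5, 4)       # values set by this patch

print(default_window, new_window)
print(tolerates_stall(8, 1, 5), tolerates_stall(8, 5, 4))
```

Under this model, a hypothetical 8s stall of the event loop exhausts the default window but fits within the new one.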

Work item tracking
Microsoft ADO: 30112399

How I did it
Override the default values in snmpd.conf (see https://linux.die.net/man/5/snmpd.conf):

agentXTimeout: 1 (default) -> 5
agentXRetries: 5 (default) -> 4

How to verify it
Tested on a Cisco chassis with test_snmp_cpu.py, which drives CPU utilization to 100% and checks that SNMP requests still succeed.
---
 dockers/docker-snmp/snmpd.conf.j2 | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/dockers/docker-snmp/snmpd.conf.j2 b/dockers/docker-snmp/snmpd.conf.j2
index b6dd826007a1..796b041c9cff 100644
--- a/dockers/docker-snmp/snmpd.conf.j2
+++ b/dockers/docker-snmp/snmpd.conf.j2
@@ -194,6 +194,9 @@ trapsink {{ v3SnmpTrapIp }}:{{ v3SnmpTrapPort }}{% if v3SnmpTrapVrf != 'None' %}
 #
                                            #  Run as an AgentX master agent
 master          agentx
+agentXTimeout   5
+agentXRetries   4
+
 # internal socket to allow extension to other docker containers
 # Currently the other container using this is docker-fpm-frr
 # make sure this line matches bgp:/etc/snmp/frr.conf