From 4317df5abab0d5ad09e36deb757a7eadd1b920ce Mon Sep 17 00:00:00 2001
From: Jianquan Ye
Date: Wed, 8 Jan 2025 12:49:49 +1000
Subject: [PATCH] Update snmpd.conf.j2 prolong agentXTimeout to avoid timeout failure in high CPU utilization scenario (#21316)

Why I did it
Fix #21314

Update and prolong the timeout of the requests between snmpd and SNMP AgentX.

In SONiC SNMP AgentX, the MIB updaters and the AgentX client share the same asyncio/coroutine event loop. While the MIB updaters are updating the SNMP values, the AgentX client cannot respond to snmpd requests.

The default snmpd request budget is 1s (timeout) * 5 (retries), a 5s window.
When CPU utilization is high, the MIB updaters are slow, and a 1s timeout is not enough, even with 5 retries.

Hence update to 5s (timeout) * 4 (retries). The time window becomes 20s, which ensures that SNMP requests can be handled even at 100% CPU utilization.

Work item tracking
Microsoft ADO 30112399

How I did it
Update the default values (https://linux.die.net/man/5/snmpd.conf):
agentXTimeout 1 (default value) -> 5
agentXRetries 5 (default value) -> 4

How to verify it
Tested on a Cisco chassis with test_snmp_cpu.py, which triggers 100% CPU utilization, to check that SNMP requests still work.
---
 dockers/docker-snmp/snmpd.conf.j2 | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/dockers/docker-snmp/snmpd.conf.j2 b/dockers/docker-snmp/snmpd.conf.j2
index b6dd826007a1..796b041c9cff 100644
--- a/dockers/docker-snmp/snmpd.conf.j2
+++ b/dockers/docker-snmp/snmpd.conf.j2
@@ -194,6 +194,9 @@ trapsink {{ v3SnmpTrapIp }}:{{ v3SnmpTrapPort }}{% if v3SnmpTrapVrf != 'None' %}
 #
 # Run as an AgentX master agent
 master agentx
+agentXTimeout 5
+agentXRetries 4
+
 # internal socket to allow extension to other docker containers
 # Currently the other container using this is docker-fpm-frr
 # make sure this line matches bgp:/etc/snmp/frr.conf
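The timing argument in the commit message can be illustrated with a small asyncio model. This is a hypothetical sketch, not code from the SONiC SNMP agent: `request_window`, `mib_updater`, `agentx_responder`, `measure_latency`, and `survives_stall` are illustrative names, `time.sleep()` stands in for a CPU-bound MIB update that never awaits, and the total request window is modelled simply as timeout * retries, the same simplification the commit message uses. Real net-snmp retry behaviour may differ in detail.

```python
import asyncio
import time


def request_window(timeout_s: float, retries: int) -> float:
    # Simplified model from the commit message: snmpd gives up after
    # roughly timeout_s * retries seconds without an AgentX answer.
    return timeout_s * retries


def survives_stall(stall_s: float, timeout_s: float, retries: int) -> bool:
    # Hypothetical check: does a stall of stall_s seconds fit in the window?
    return stall_s < request_window(timeout_s, retries)


async def mib_updater(block_s: float) -> None:
    # Stand-in for a slow MIB update: time.sleep() blocks the whole event
    # loop, just as a long computation with no await point would.
    time.sleep(block_s)


async def agentx_responder(request_time: float) -> float:
    # Stand-in for the AgentX client: reports how long the "request"
    # waited before the shared event loop got around to running it.
    return time.monotonic() - request_time


async def measure_latency(block_s: float) -> float:
    request_time = time.monotonic()
    # Both tasks share one event loop; the updater is scheduled first,
    # so the responder cannot run until the updater releases the loop.
    updater = asyncio.create_task(mib_updater(block_s))
    responder = asyncio.create_task(agentx_responder(request_time))
    _, latency = await asyncio.gather(updater, responder)
    return latency


if __name__ == "__main__":
    print("old window:", request_window(1, 5), "s")  # defaults: 1 s x 5
    print("new window:", request_window(5, 4), "s")  # this patch: 5 s x 4
    # A 0.2 s blocking update delays the responder by at least 0.2 s.
    print(f"responder latency: {asyncio.run(measure_latency(0.2)):.2f} s")
    # A hypothetical 8 s stall under heavy CPU load:
    print(survives_stall(8, 1, 5), survives_stall(8, 5, 4))  # False True
```

The point of the model is that the responder's latency is bounded below by however long the updater holds the loop, so the only knob on the snmpd side is how long it is willing to wait; raising `agentXTimeout` and `agentXRetries` widens that window from 5 s to 20 s.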