Fix orch stuck when removing vlan member (#3294) #3295
base: master
What I did: ignore the return value of setPortPvid()
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@prsunny: Is it because the two commands are executed at the same time? What if there is a delay between the two commands?
@prsunny thanks for your reply.
What I did
Fixes #3294
The root cause of this issue is that the two data structures holding VLAN member info in orchagent fall out of sync.
Why I did it
To fix the orchagent hang reported in #3294.
How I did it
The return value of setPortPvid() does not matter here, so we ignore it.
This keeps "m_members" and "m_portVlanMember" in sync at all times.
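A minimal sketch of the fixed flow, assuming hypothetical, simplified stand-ins (VlanState, setPortPvidStub(), removeVlanMemberFixed()) for the real orchagent types; it is not the actual PortsOrch::removeVlanMember() code. It illustrates the idea of the fix: because a failed PVID reset no longer causes an early return, both structures are always updated together. The membership check at the top is an extra defensive step in the sketch, not part of this PR.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-ins for the two orchagent structures
// (not the real sonic-swss types).
struct VlanState
{
    std::set<std::string> vlanMembers;                       // like Port::m_members of one VLAN
    std::map<std::string, std::set<std::string>> portVlans;  // like PortsOrch::m_portVlanMember
};

// Hypothetical stand-in for setPortPvid(): fails if the port is already a router interface.
bool setPortPvidStub(bool isRouterInterface)
{
    return !isRouterInterface;
}

// Post-fix flow: both structures are updated whether or not the PVID reset succeeds.
bool removeVlanMemberFixed(VlanState &s, const std::string &vlan,
                           const std::string &port, bool isRouterInterface)
{
    auto it = s.portVlans.find(port);
    // Defensive check added for the sketch: bail out if the port is not in this VLAN.
    if (it == s.portVlans.end() || it->second.erase(vlan) == 0)
    {
        return false;
    }
    if (it->second.empty())
    {
        s.portVlans.erase(it);  // drop the port's entry once it has no VLANs left
    }

    // The fix: the result is ignored, so the member-set update below always runs.
    (void)setPortPvidStub(isRouterInterface);

    s.vlanMembers.erase(port);
    return true;
}
```

Removing a port that is already a router interface (isRouterInterface = true) still clears it from both vlanMembers and portVlans, so a repeated removal finds nothing to erase.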
How I verified it
Details if related
Suppose one removes an interface from a VLAN and then turns that interface into a router interface (by adding an IP address to it).
There is no way to guarantee that these two configuration changes arrive at orchagent in order.
If the "create router interface" change arrives first, the "remove VLAN member" operation fails.
Two data structures in orchagent store VLAN member info, and they should always be in sync:
"class Port" in port.h
For a VLAN port object (e.g. Vlan100), "m_members" is the set of its member ports, like EthernetXX, EthernetYY...
"class PortsOrch" in portsorch.h
PortsOrch holds "m_portVlanMember", a map keyed by port name (e.g. EthernetXX) whose entries record that port's VLAN memberships, like Vlan100, Vlan200...
Please take a look at PortsOrch::removeVlanMember().
If setPortPvid() fails (because the port is already a router interface), removeVlanMember() returns before finishing its bookkeeping, so "m_members" and "m_portVlanMember" are left out of sync.
On the next call to removeVlanMember() with the same parameters, the iterator "vlan_member" points to end(). The assert() does not catch this because the NOS is a release build (in a debug build, assert() would call abort()).
When m_portVlanMember[port.m_alias].erase(vlan_member) then erases this end() iterator, which is undefined behavior, the C++ standard library gets stuck and orchagent occupies 100% CPU.
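To make the failure sequence concrete, here is a hypothetical, simplified reproduction of the pre-fix ordering. VlanState, removeVlanMemberPreFix() and inSync() are illustrative names, not sonic-swss code. Removing the port-side entry and then returning early on a failed PVID reset leaves the two structures disagreeing; on the retry, find() returns end(). Where the real code relies on assert(), this sketch returns false instead of erasing the end() iterator.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-ins for the two orchagent structures
// (not the real sonic-swss types).
struct VlanState
{
    std::set<std::string> vlanMembers;                       // like Port::m_members of one VLAN
    std::map<std::string, std::set<std::string>> portVlans;  // like PortsOrch::m_portVlanMember
};

// Pre-fix ordering: drop the port-side entry, then bail out if the PVID reset fails.
bool removeVlanMemberPreFix(VlanState &s, const std::string &vlan,
                            const std::string &port, bool pvidResetOk)
{
    auto &vlans = s.portVlans[port];  // operator[] inserts an empty set if the port is missing
    auto it = vlans.find(vlan);
    if (it == vlans.end())
    {
        // With NDEBUG, an assert() here is compiled out and erase(end()) would be
        // undefined behavior; the sketch reports failure instead.
        return false;
    }
    vlans.erase(it);
    if (!pvidResetOk)
    {
        return false;  // early return: vlanMembers still lists the port
    }
    s.vlanMembers.erase(port);
    return true;
}

// True when both structures agree on whether `port` is a member of `vlan`.
bool inSync(const VlanState &s, const std::string &vlan, const std::string &port)
{
    auto it = s.portVlans.find(port);
    bool portSide = it != s.portVlans.end() && it->second.count(vlan) > 0;
    bool vlanSide = s.vlanMembers.count(port) > 0;
    return portSide == vlanSide;
}
```

After a first call with pvidResetOk = false, inSync() reports false because vlanMembers still contains the port while portVlans no longer does; a second call then hits the end() check and returns false rather than erasing an invalid iterator.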