Unconfiguring of 'router bgp <ASN> vrf <VRF_NAME>' fails because VRF disable ignores BGP_VRF_AUTO instances #17903

Open
2 tasks done
piotrsuchy opened this issue Jan 22, 2025 · 4 comments

@piotrsuchy
Contributor

Description

The configs and exact scripts to reproduce this, using two bridged Docker containers, are linked below.

Genesis of the issue:

Normally, we do the following:

  1. Add the VRF, VXLAN and bridge to the kernel:
ip link add vrfv10 type vrf table 10
ip link add brv10 type bridge
ip link add vxlan10 type vxlan id 10 dstport 4789
ip link set vxlan10 master brv10
ip link set brv10 master vrfv10
  2. Reload FRR with a new configuration that has the VRF configured (assigning a VNI to it) and a BGP instance for that VRF:
 diff -u --color frr-no-vrf-host1.conf frr-vrf-host1.conf
--- frr-no-vrf-host1.conf       2025-01-21 16:54:56.246640943 +0100
+++ frr-vrf-host1.conf  2025-01-21 16:53:12.926118775 +0100
@@ -14,6 +14,10 @@
 debug bgp updates out
 debug bgp zebra
 !
+vrf vrfv10
+ vni 10
+exit-vrf
+!
 interface frr_if0
  ipv6 nd ra-interval 10
  no ipv6 nd suppress-ra
@@ -24,6 +28,19 @@
  no ipv6 nd suppress-ra
 exit
 !
+router bgp 4250100001 vrf vrfv10
+ bgp router-id 10.40.0.1
+ !
+ address-family ipv4 unicast
+  redistribute kernel
+ exit-address-family
+ !
+ address-family l2vpn evpn
+  advertise ipv4 unicast route-map VPC_OUT
+  advertise ipv6 unicast route-map VPC_OUT
+ exit-address-family
+exit
+!
 router bgp 4250100001
  bgp router-id 10.40.0.1
  no bgp suppress-duplicates

The events happen in that order, and the logs look like this:

2025/01/21 16:02:43 BGP: [GQ4MW-3N8GB] VRF vrfv10(38) is enabled. <====== this is picked up from the kernel
2025/01/21 16:02:43 BGP: [H3RGF-V9W58] VRF enable add vrfv10 id 38
2025/01/21 16:02:43 BGP: [HKBB3-YX6A9] Rx Intf add VRF 38 IF vrfv10
...
...
...
2025/01/21 16:02:43 BGP: [TV0XP-3WR0A] Rx VNI add VRF default VNI 10 tenant-vrf vrfv10 SVI ifindex 39
2025/01/21 16:02:43 BGP: [M3X4Y-24DVB] VRF None vni 10 type-3 route evp [3]:[0]:[32]:[10.40.0.1] RMAC 00:00:00:00:00:00 nexthop 10.40.0.1 esi (null)
2025/01/21 16:02:43 BGP: [TV0XP-3WR0A] Rx VNI add VRF default VNI 10 tenant-vrf vrfv10 SVI ifindex 39
2025/01/21 16:02:43 BGP: [RHWNZ-VRQBG] Rx route ADD VRF 0 kernel[0] 0.0.0.0/0 nexthop 192.168.10.1 (type 3 if 1899) metric 0 distance 0 tag 0
2025/01/21 16:02:43 BGP: [RHWNZ-VRQBG] Rx route ADD VRF 0 kernel[0] 10.40.0.2/32 nexthop 192.168.10.2 (type 3 if 1899) metric 0 distance 0 tag 0
2025/01/21 16:02:43 BGP: [Z38CW-7NYWG] group_announce_route_walkcb: afi=l2vpn, safi=evpn, p=[3]:[0]:[32]:[10.40.0.1]
2025/01/21 16:02:43 BGP: [T5JFA-13199] subgroup_process_announce_selected: p=[3]:[0]:[32]:[10.40.0.1], selected=0x561f363dd350
2025/01/21 16:02:43 BGP: [TN0HX-6G1RR] u2:s2 send UPDATE w/ attr: nexthop 10.40.0.1, extcommunity ET:8 RT:24865:10, pmsi tnltype 6, path
2025/01/21 16:02:43 BGP: [H06SA-0JAPR] u2:s2 send MP_REACH for afi/safi l2vpn/evpn
2025/01/21 16:02:43 BGP: [HVRWP-5R9NQ] u2:s2 send UPDATE RD 10.40.0.1:1 [3]:[0]:[32]:[10.40.0.1] label 10 l2vpn evpn
2025/01/21 16:02:43 BGP: [WEV7K-2GAQ5] u2:s2 send UPDATE len 100 (max message len: 65535) numpfx 1
2025/01/21 16:02:43 BGP: [MBFVT-8GSC6] u2:s2 10.40.0.2 send UPDATE w/ nexthop 10.40.0.1
2025/01/21 16:02:43 BGP: [XXWBM-V772F] 10.40.0.2(host2) rcvd UPDATE w/ attr: nexthop 10.40.0.1, extcommunity RT:24865:10 ET:8, pmsi tnltype 6, path 4250100002 4250100001
2025/01/21 16:02:43 BGP: [RZMGQ-A03CG] 10.40.0.2(host2) rcvd UPDATE about RD 10.40.0.1:1 [3]:[0]:[32]:[10.40.0.1] l2vpn evpn -- DENIED due to: as-path contains our own AS; <==================== END OF KERNEL UPDATES
2025/01/21 16:02:43 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001 <========== HERE we finally create the vrf in BGP
2025/01/21 16:02:43 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/21 16:02:43 BGP: [TNK7N-FJF7K] Registering VRF 38
2025/01/21 16:02:43 BGP: [HKBB3-YX6A9] Rx Intf add VRF 38 IF brv10
2025/01/21 16:02:43 BGP: [HKBB3-YX6A9] Rx Intf add VRF 38 IF vrfv10
2025/01/21 16:02:43 MGMTD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 <==== 
2025/01/21 16:02:43 MGMTD: [G6NKK-8C6DV] end_config: VTY:0x5606c77e7590, pending SET-CFG: 2
2025/01/21 16:02:43 BGP: [RBZV6-DW61Y] Tx redistribute add VRF 38 afi 1 kernel 0
2025/01/21 16:02:43 BGP: [TV0XP-3WR0A] Rx VNI del VRF default VNI 10 tenant-vrf default SVI ifindex 0
2025/01/21 16:02:43 BGP: [XXJ7P-NWW2X] Rx L3VNI ADD VRF vrfv10 VNI 10 Originator-IP 0.0.0.0 RMAC svi-mac fe:ae:3b:21:31:6a vrr-mac fe:ae:3b:21:31:6a filter none svi-if 39
2025/01/21 16:02:43 BGP: [N6DMZ-VH4HB] PSUCHY: bgp_vrf is vrfv10
2025/01/21 16:02:43 BGP: [TZAHW-7DQTC] VRF vrfv10 vni 10 pip enable RMAC fe:ae:3b:21:31:6a sys RMAC fe:ae:3b:21:31:6a static RMAC 00:00:00:00:00:00 is_anycast_mac Disable
2025/01/21 16:02:43 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/21 16:02:43 ZEBRA: [G6NKK-8C6DV] end_config: VTY:0x5599189dc670, pending SET-CFG: 0
...

However, once every several runs of this sequence, the reload happens before BGP finishes learning about the addition of the VRF, VXLAN and bridge in the kernel. In that case the logs look like this:

2025/01/22 11:57:04 BGP: [TN0HX-6G1RR] u2:s2 send UPDATE w/ attr: nexthop 10.40.0.1, extcommunity ET:8 RT:24865:10, pmsi tnltype 6, path
2025/01/22 11:57:04 BGP: [H06SA-0JAPR] u2:s2 send MP_REACH for afi/safi l2vpn/evpn
2025/01/22 11:57:04 BGP: [HVRWP-5R9NQ] u2:s2 send UPDATE RD 10.40.0.1:1 [3]:[0]:[32]:[10.40.0.1] label 10 l2vpn evpn
2025/01/22 11:57:04 BGP: [WEV7K-2GAQ5] u2:s2 send UPDATE len 100 (max message len: 65535) numpfx 1
2025/01/22 11:57:04 BGP: [MBFVT-8GSC6] u2:s2 10.40.0.2 send UPDATE w/ nexthop 10.40.0.1
2025/01/22 11:57:04 BGP: [XXWBM-V772F] 10.40.0.2(host2) rcvd UPDATE w/ attr: nexthop 10.40.0.1, extcommunity RT:24865:10 ET:8, pmsi tnltype 6, path 4250100002 4250100001
2025/01/22 11:57:04 BGP: [RZMGQ-A03CG] 10.40.0.2(host2) rcvd UPDATE about RD 10.40.0.1:1 [3]:[0]:[32]:[10.40.0.1] l2vpn evpn -- DENIED due to: as-path contains our own AS;
2025/01/22 11:57:04 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 11:57:04 ZEBRA: [G6NKK-8C6DV] end_config: VTY:0x558501bb0470, pending SET-CFG: 0
2025/01/22 11:57:04 MGMTD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 11:57:04 MGMTD: [G6NKK-8C6DV] end_config: VTY:0x55689f7efaf0, pending SET-CFG: 2
2025/01/22 11:57:04 BGP: [TV0XP-3WR0A] Rx VNI del VRF default VNI 10 tenant-vrf default SVI ifindex 0
2025/01/22 11:57:04 BGP: [XXJ7P-NWW2X] Rx L3VNI ADD VRF vrfv10 VNI 10 Originator-IP 0.0.0.0 RMAC svi-mac 5a:9e:d5:f9:6a:5d vrr-mac 5a:9e:d5:f9:6a:5d filter none svi-if 331
2025/01/22 11:57:04 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 11:57:04 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 11:57:04 BGP: [TNK7N-FJF7K] Registering VRF 330
2025/01/22 11:57:04 BGP: [TZAHW-7DQTC] VRF vrfv10 vni 10 pip enable RMAC 5a:9e:d5:f9:6a:5d sys RMAC 5a:9e:d5:f9:6a:5d static RMAC 00:00:00:00:00:00 is_anycast_mac Disable
2025/01/22 11:57:04 BGP: [HKBB3-YX6A9] Rx Intf add VRF 330 IF brv10
2025/01/22 11:57:04 BGP: [HKBB3-YX6A9] Rx Intf add VRF 330 IF vrfv10
2025/01/22 11:57:04 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 11:57:04 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 11:57:04 BGP: [TNK7N-FJF7K] Registering VRF 330
2025/01/22 11:57:04 BGP: [HKBB3-YX6A9] Rx Intf add VRF 330 IF brv10
2025/01/22 11:57:04 BGP: [HKBB3-YX6A9] Rx Intf add VRF 330 IF vrfv10
2025/01/22 11:57:04 STATIC: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 11:57:04 STATIC: [G6NKK-8C6DV] end_config: VTY:0x563478248fa0, pending SET-CFG: 0
2025/01/22 11:57:04 BGP: [RBZV6-DW61Y] Tx redistribute add VRF 330 afi 1 kernel 0

So you can see that the VRF is created twice in BGP, and that its order relative to the L3VNI ADD is flipped:

Correct - no race condition:

2025/01/21 16:02:43 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001 <========== HERE we finally create the vrf in BGP
...
2025/01/21 16:02:43 BGP: [TV0XP-3WR0A] Rx VNI del VRF default VNI 10 tenant-vrf default SVI ifindex 0
2025/01/21 16:02:43 BGP: [XXJ7P-NWW2X] Rx L3VNI ADD VRF vrfv10 VNI 10 Originator-IP 0.0.0.0 RMAC svi-mac fe:ae:3b:21:31:6a vrr-mac fe:ae:3b:21:31:6a filter none svi-if 39

Race condition case:

2025/01/22 11:57:04 BGP: [TV0XP-3WR0A] Rx VNI del VRF default VNI 10 tenant-vrf default SVI ifindex 0
2025/01/22 11:57:04 BGP: [XXJ7P-NWW2X] Rx L3VNI ADD VRF vrfv10 VNI 10 Originator-IP 0.0.0.0 RMAC svi-mac 5a:9e:d5:f9:6a:5d vrr-mac 5a:9e:d5:f9:6a:5d filter none svi-if 331
2025/01/22 11:57:04 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
...
2025/01/22 11:57:04 BGP: [HKBB3-YX6A9] Rx Intf add VRF 330 IF brv10
2025/01/22 11:57:04 BGP: [HKBB3-YX6A9] Rx Intf add VRF 330 IF vrfv10
2025/01/22 11:57:04 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001 <==== ANOTHER ADD
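
The flipped ordering in the two excerpts above can also be checked mechanically. This is a minimal sketch, assuming the log has already been split into an array of lines and that the "Rx L3VNI ADD VRF <name>" and "Creating VRF <name>," messages are formatted as in the excerpts. The function name l3vni_add_precedes_create is hypothetical and is not part of FRR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical log helper (not an FRR function).
 * Returns 1 if an "Rx L3VNI ADD VRF <vrf> " line appears before the first
 * "Creating VRF <vrf>," line, which is the ordering seen in the race
 * condition case. Returns 0 if the instance creation comes first or if
 * neither message is present.
 */
int l3vni_add_precedes_create(const char *const *lines, size_t n,
                              const char *vrf)
{
	char add_marker[128];
	char create_marker[128];

	assert(lines != NULL || n == 0);
	assert(vrf != NULL);

	/* Markers are built from the message text shown in the excerpts. */
	snprintf(add_marker, sizeof(add_marker), "Rx L3VNI ADD VRF %s ", vrf);
	snprintf(create_marker, sizeof(create_marker), "Creating VRF %s,", vrf);

	for (size_t i = 0; i < n; i++) {
		if (strstr(lines[i], create_marker) != NULL)
			return 0; /* instance created first: normal order */
		if (strstr(lines[i], add_marker) != NULL)
			return 1; /* L3VNI ADD seen first: flipped order */
	}
	return 0;
}
```

Calling it with the lines of one iteration and the VRF name "vrfv10" would flag the iterations in which the instance is likely to have been auto-created before the configuration was applied.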

The problem occurs when the race condition is hit and the order is flipped, and we then go to the next step: removing the interfaces and reloading the FRR config back to the previous one (frr-no-vrf-host1.conf) by:

  1. removal of vrf, vxlan and bridge in kernel (using ip link del)
  2. unconfiguring of VRF and bgp instance of that VRF in FRR (reloading the config)

The config reload fails because step 1 doesn't remove the whole state of the VRF: the l3vni lingers, because bgp_vrf_disable() calls bgp = bgp_lookup_by_name(vrf->name);, which includes this check:

	for (ALL_LIST_ELEMENTS(bm->bgp, node, nnode, bgp)) {
		if (CHECK_FLAG(bgp->vrf_flags, BGP_VRF_AUTO)) {
			// zlog_warn("PSUCHY: it skips bgp instance %s", name); // log added by me for debug, shows up in the logs
			continue;
		}

So even though it's a legitimate BGP instance that we want to get, it's skipped. Here's the change that introduced it:
#16159

I would like to know if it's possible to revert this PR, or to fix it in another way that wouldn't introduce this behavior in the race condition case that we see.
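
To illustrate why the instance is missed, here is a minimal self-contained model of the lookup described above. It is a sketch under assumptions, not bgpd code: struct model_bgp, MODEL_VRF_AUTO, model_lookup_by_name and model_vrf_disable are hypothetical stand-ins, and the single l3vni field stands in for whatever state the real disable path is expected to clean up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the bgpd types; NOT the real FRR definitions. */
#define MODEL_VRF_AUTO 0x01u /* plays the role of BGP_VRF_AUTO */

struct model_bgp {
	const char *name;       /* VRF name, NULL for the default instance */
	unsigned int vrf_flags; /* MODEL_VRF_AUTO if auto-created */
	unsigned int l3vni;     /* 0 means no L3VNI attached */
};

/* Mirrors the check quoted above: auto-created instances are skipped. */
struct model_bgp *model_lookup_by_name(struct model_bgp *list, size_t n,
                                       const char *name)
{
	assert(list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct model_bgp *bgp = &list[i];

		if (bgp->vrf_flags & MODEL_VRF_AUTO)
			continue;
		if ((bgp->name == NULL && name == NULL) ||
		    (bgp->name && name && strcmp(bgp->name, name) == 0))
			return bgp;
	}
	return NULL;
}

/*
 * Hypothetical disable path: state is cleaned up only when the lookup
 * returns an instance. Returns 1 if cleanup ran, 0 if it was skipped.
 */
int model_vrf_disable(struct model_bgp *list, size_t n, const char *vrf_name)
{
	struct model_bgp *bgp = model_lookup_by_name(list, n, vrf_name);

	if (bgp == NULL)
		return 0; /* an auto-flagged instance ends up here */
	bgp->l3vni = 0;
	return 1;
}
```

In this model, disabling a VRF whose instance carries the auto flag returns 0 and leaves l3vni set, which corresponds to the lingering L3VNI that later blocks the "no router bgp ... vrf ..." command.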

Example logs of the issue appearing when using the reproduction script:

Iteration 75
Creating VRF, bridge, and VXLAN...
Bringing them up...
Reloading FRR with the VRF/BGP config...
Setup complete.
Removing VRF, bridge, and VXLAN...
Reloading FRR with the no vrf / vpc config...
Failed to execute no router bgp 4250100001 vrf vrfv2050947 <===== here FRR struggles to reload because "Please unconfigure l3vni"
Failed to execute no router bgp 4250100001 vrf
Failed to execute no router bgp 4250100001
Failed to execute no router bgp
[27645|mgmtd] done
[27646|zebra] done
[27652|bgpd] done
[27660|watchfrr] done
[27662|staticd] done
[27670|mgmtd] done
[27671|zebra] done
[27677|bgpd] done
[27685|watchfrr] done
[27687|staticd] done
router bgp 4250100001 vrf vrfv2050947
BUG FOUND. It wasn't possible to delete 'router bgp ASN vrf VRF_NAME'. Stopping script
Stopping loop at iteration 75 due to VRF configuration found.

Version

root@host1:~# vtysh -c 'show ver'
FRRouting 10.1.1 (host1) on Linux(5.15.0-119-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--disable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'



It is frr-10.1.1 plus a small patch with additional logs:



cat diff_frr_10.1.1
diff --git a/bgpd/bgp_evpn.c b/bgpd/bgp_evpn.c
index 7af6ff7ce..99e1e637f 100644
--- a/bgpd/bgp_evpn.c
+++ b/bgpd/bgp_evpn.c
@@ -6781,6 +6781,11 @@ int bgp_evpn_local_l3vni_add(vni_t l3vni, vrf_id_t vrf_id,

        /* if the BGP vrf instance doesn't exist - create one */
        bgp_vrf = bgp_lookup_by_vrf_id(vrf_id);
+       if (!bgp_vrf) {
+               zlog_debug("PSUCHY: bgp_vrf is not found");
+       } else {
+               zlog_debug("PSUCHY: bgp_vrf is %s", vrf_id_to_name(bgp_vrf->vrf_id));
+       }
        if (!bgp_vrf) {

                int ret = 0;
@@ -6804,6 +6809,10 @@ int bgp_evpn_local_l3vni_add(vni_t l3vni, vrf_id_t vrf_id,

                /* mark as auto created */
                SET_FLAG(bgp_vrf->vrf_flags, BGP_VRF_AUTO);
+               if (bgp_debug_zebra(NULL))
+                       zlog_debug(
+                               "PSUCHY: VRF %s vni %u IS SET TO AUTO",
+                vrf_id_to_name(bgp_vrf->vrf_id), bgp_vrf->l3vni);
        }

        /* associate the vrf with l3vni and related parameters */
diff --git a/bgpd/bgp_evpn_vty.c b/bgpd/bgp_evpn_vty.c
index 846a82ba9..7225919ce 100644
--- a/bgpd/bgp_evpn_vty.c
+++ b/bgpd/bgp_evpn_vty.c
@@ -6527,6 +6527,7 @@ DEFUN (show_bgp_vrf_l3vni_info,
                vty_out(vty, "BGP VRF: %s\n", name);
                vty_out(vty, "  Local-Ip: %pI4\n", &bgp->originator_ip);
                vty_out(vty, "  L3-VNI: %u\n", bgp->l3vni);
+               vty_out(vty, "  FLAGS: %u\n", bgp->vrf_flags);
                vty_out(vty, "  Rmac: %s\n",
                        prefix_mac2str(&bgp->rmac, buf, sizeof(buf)));
                vty_out(vty, "  VNI Filter: %s\n",
diff --git a/bgpd/bgp_vty.c b/bgpd/bgp_vty.c
index 1a87799ad..726afb93e 100644
--- a/bgpd/bgp_vty.c
+++ b/bgpd/bgp_vty.c
@@ -12626,6 +12626,7 @@ static void bgp_show_all_instances_summary_vty(struct vty *vty, afi_t afi,

        for (ALL_LIST_ELEMENTS(bm->bgp, node, nnode, bgp)) {
                if (CHECK_FLAG(bgp->vrf_flags, BGP_VRF_AUTO))
+                       zlog_debug("PSUCHY: VRF_AUTO for vrf %s vni %d", vrf_id_to_name(bgp->vrf_id), bgp->l3vni);
                        continue;

                nbr_output = true;
diff --git a/bgpd/bgpd.c b/bgpd/bgpd.c
index 894226ada..559deb9fa 100644
--- a/bgpd/bgpd.c
+++ b/bgpd/bgpd.c
@@ -3621,8 +3621,10 @@ struct bgp *bgp_lookup_by_name(const char *name)
        struct listnode *node, *nnode;

        for (ALL_LIST_ELEMENTS(bm->bgp, node, nnode, bgp)) {
-               if (CHECK_FLAG(bgp->vrf_flags, BGP_VRF_AUTO))
+               if (CHECK_FLAG(bgp->vrf_flags, BGP_VRF_AUTO)) {
+                       zlog_warn("PSUCHY: it skips bgp instance %s", name);
                        continue;
+               }
                if ((bgp->name == NULL && name == NULL)
                    || (bgp->name && name && strcmp(bgp->name, name) == 0))
                        return bgp;

How to reproduce

Everything is described here:
https://github.com/piotrsuchy/tinynetlab/tree/main/reproduction_setups/unconfigure_l3vni_upstream

Running a single bash script sets up the whole environment, using Docker images from Docker Hub and some extra setup steps.

Expected behavior

I expect that, as in FRR 8.4.2, disabling the VRF properly removes the l3vni even if the race condition occurs.

Actual behavior

When the race condition happens, the BGP instance of the VRF is flagged BGP_VRF_AUTO, and when we ask for a VRF disable, it is skipped:

f153b9a

That means the subsequent reload fails with the message:
"Please unconfigure l3vni"

And "router bgp <ASN> vrf <VRF_NAME>" is not deleted from the config.

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
@piotrsuchy piotrsuchy added the triage Needs further investigation label Jan 22, 2025
@ton31337
Member

Could you test the latest version, and also please test with this patch: #17652?

@piotrsuchy
Contributor Author

piotrsuchy commented Jan 22, 2025

It does appear on master as well:

root@host1:~# vtysh -c 'show ver'
FRRouting 10.3-dev (host1) on Linux(5.15.0-119-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--disable-rpki' '--disable-scripting' '--enable-pim6d' '--disable-grpc' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-pcre2posix' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
Iteration 2
Creating VRF, bridge, and VXLAN...
Bringing them up...
Reloading FRR with the VRF/BGP config...
Setup complete.
Removing VRF, bridge, and VXLAN...
Reloading FRR with the no vrf / vpc config...
[793|zebra] done
[792|mgmtd] done
[799|bgpd] done
[807|watchfrr] done
[809|staticd] done
[818|zebra] done
[817|mgmtd] done
[824|bgpd] done
[832|watchfrr] done
[834|staticd] done
Failed to execute no router bgp 4250100001 vrf vrfv10
Failed to execute no router bgp 4250100001 vrf
Failed to execute no router bgp 4250100001
Failed to execute no router bgp
router bgp 4250100001 vrf vrfv10
BUG FOUND. It wasn't possible to delete 'router bgp ASN vrf VRF_NAME'. Stopping script
root@host1:~# cat /frr.log  | grep --color -C 10 Creating
2025/01/22 13:38:38 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 13:38:38 ZEBRA: [G6NKK-8C6DV] end_config: VTY:0x5601a818b1d0, pending SET-CFG: 0
2025/01/22 13:38:38 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 13:38:38 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 13:38:38 BGP: [VE0JD-7ZESQ] Registering VRF vrfv10
2025/01/22 13:38:38 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF brv10
2025/01/22 13:38:38 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF vrfv10
2025/01/22 13:38:38 BGP: [HKP8J-MZHJ2] Tx redistribute add VRF vrfv10 afi 1 kernel 0
2025/01/22 13:38:38 BGP: [Z1RP6-0X3QJ] Creating Default VRF, AS 4250100001
2025/01/22 13:38:38 BGP: [TTAN7-16X9N] dup addr detect enable max_moves 5 time 180 freeze disable freeze_time 0
2025/01/22 13:38:38 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF default to zebra
2025/01/22 13:38:38 BGP: [VE0JD-7ZESQ] Registering VRF default
2025/01/22 13:38:38 BGP: [KGTKH-FVHEW] Rx Router Id update VRF 0 Id 10.40.0.1/32
2025/01/22 13:38:38 BGP: [WMCA1-27995] RID change : vrf VRF default(0), RTR ID 10.40.0.1
2025/01/22 13:38:38 BGP: [R28N5-6MEFQ] Rx Intf add VRF default IF eth0
2025/01/22 13:38:38 BGP: [HFMHR-E3VMR] Rx Intf address add VRF default IF eth0 addr 192.168.10.2/24
2025/01/22 13:38:38 BGP: [R28N5-6MEFQ] Rx Intf add VRF default IF lo
2025/01/22 13:38:38 BGP: [HFMHR-E3VMR] Rx Intf address add VRF default IF lo addr 10.40.0.1/32
2025/01/22 13:38:38 BGP: [R28N5-6MEFQ] Rx Intf add VRF default IF vxlan10
--
2025/01/22 13:38:39 BGP: [TV0XP-3WR0A] Rx VNI add VRF default VNI 10 tenant-vrf vrfv10 SVI ifindex 6
2025/01/22 13:38:39 BGP: [M4H2R-7F9N8] Rx route ADD VRF default kernel[0] 0.0.0.0/0 nexthop 192.168.10.1 (type 3 if 1932) metric 0 distance 0 tag 0
2025/01/22 13:38:39 BGP: [M4H2R-7F9N8] Rx route ADD VRF default kernel[0] 10.40.0.2/32 nexthop 192.168.10.2 (type 3 if 1932) metric 0 distance 0 tag 0
2025/01/22 13:38:39 BGP: [QFD5T-57760] install_uninstall_routes_for_vni: Total 1 L2VNI VPNs pending to be processed for remote route installation
2025/01/22 13:38:39 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 13:38:39 ZEBRA: [G6NKK-8C6DV] end_config: VTY:0x5601a818b1d0, pending SET-CFG: 0
2025/01/22 13:38:39 MGMTD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 13:38:39 MGMTD: [G6NKK-8C6DV] end_config: VTY:0x5587f1bc5ab0, pending SET-CFG: 2
2025/01/22 13:38:39 BGP: [TV0XP-3WR0A] Rx VNI del VRF default VNI 10 tenant-vrf default SVI ifindex 0
2025/01/22 13:38:39 BGP: [XXJ7P-NWW2X] Rx L3VNI ADD VRF vrfv10 VNI 10 Originator-IP 0.0.0.0 RMAC svi-mac f6:a4:3a:0e:3e:ab vrr-mac f6:a4:3a:0e:3e:ab filter none svi-if 6
2025/01/22 13:38:39 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 13:38:39 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 13:38:39 BGP: [VE0JD-7ZESQ] Registering VRF vrfv10
2025/01/22 13:38:39 BGP: [TZAHW-7DQTC] VRF vrfv10 vni 10 pip enable RMAC f6:a4:3a:0e:3e:ab sys RMAC f6:a4:3a:0e:3e:ab static RMAC 00:00:00:00:00:00 is_anycast_mac Disable
2025/01/22 13:38:39 BGP: [T0MP2-YRTMX] Scheduling L3VNI ADD to be processed later for VRF vrfv10 VNI 10
2025/01/22 13:38:39 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF brv10
2025/01/22 13:38:39 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF vrfv10
2025/01/22 13:38:39 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 13:38:39 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 13:38:39 BGP: [VE0JD-7ZESQ] Registering VRF vrfv10
2025/01/22 13:38:39 STATIC: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 13:38:39 STATIC: [G6NKK-8C6DV] end_config: VTY:0x5595e7c9c7b0, pending SET-CFG: 0
2025/01/22 13:38:39 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF brv10
2025/01/22 13:38:39 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF vrfv10
2025/01/22 13:38:39 BGP: [HKP8J-MZHJ2] Tx redistribute add VRF vrfv10 afi 1 kernel 0
2025/01/22 13:38:39 BGP: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 13:38:39 BGP: [G6NKK-8C6DV] end_config: VTY:0x556e8ebfb0b0, pending SET-CFG: 0
2025/01/22 13:38:39 BGP: [J9NHR-QQRCW] install_uninstall_routes_for_vrf: Total 1 L3VNI BGP-VRFs pending to be processed for remote route installation

Trying with the patch in a moment. EDIT: with the patch it also appears:

2025/01/22 14:11:28 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 14:11:28 ZEBRA: [G6NKK-8C6DV] end_config: VTY:0x5647b7c44530, pending SET-CFG: 0
2025/01/22 14:11:28 MGMTD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 14:11:28 MGMTD: [G6NKK-8C6DV] end_config: VTY:0x55a1a8aa7270, pending SET-CFG: 2
2025/01/22 14:11:28 BGP: [TV0XP-3WR0A] Rx VNI del VRF default VNI 10 tenant-vrf default SVI ifindex 0
2025/01/22 14:11:28 BGP: [XXJ7P-NWW2X] Rx L3VNI ADD VRF vrfv10 VNI 10 Originator-IP 0.0.0.0 RMAC svi-mac b2:02:7f:22:91:56 vrr-mac b2:02:7f:22:91:56 filter none svi-if 45
2025/01/22 14:11:28 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 14:11:28 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 14:11:28 BGP: [VE0JD-7ZESQ] Registering VRF vrfv10
2025/01/22 14:11:28 BGP: [TZAHW-7DQTC] VRF vrfv10 vni 10 pip enable RMAC b2:02:7f:22:91:56 sys RMAC b2:02:7f:22:91:56 static RMAC 00:00:00:00:00:00 is_anycast_mac Disable
2025/01/22 14:11:28 BGP: [T0MP2-YRTMX] Scheduling L3VNI ADD to be processed later for VRF vrfv10 VNI 10
2025/01/22 14:11:28 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF brv10
2025/01/22 14:11:28 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF vrfv10
2025/01/22 14:11:28 BGP: [NTAZ6-NXSGN] Creating VRF vrfv10, AS 4250100001
2025/01/22 14:11:28 BGP: [ZZKY3-FX5JH] bgp_get: Registering BGP instance VRF vrfv10 to zebra
2025/01/22 14:11:28 BGP: [VE0JD-7ZESQ] Registering VRF vrfv10
2025/01/22 14:11:28 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF brv10
2025/01/22 14:11:28 BGP: [R28N5-6MEFQ] Rx Intf add VRF vrfv10 IF vrfv10
2025/01/22 14:11:28 STATIC: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025/01/22 14:11:28 STATIC: [G6NKK-8C6DV] end_config: VTY:0x560315f407b0, pending SET-CFG: 0
Iteration 6
Creating VRF, bridge, and VXLAN...
Bringing them up...
Reloading FRR with the VRF/BGP config...
Setup complete.
Removing VRF, bridge, and VXLAN...
Reloading FRR with the no vrf / vpc config...
Failed to execute no router bgp 4250100001 vrf vrfv10
Failed to execute no router bgp 4250100001 vrf
Failed to execute no router bgp 4250100001
Failed to execute no router bgp
[3282|zebra] done
[3281|mgmtd] done
[3288|bgpd] done
[3298|staticd] done
[3296|watchfrr] done
[3306|mgmtd] done
[3307|zebra] done
[3313|bgpd] done
[3321|watchfrr] done
[3323|staticd] done
router bgp 4250100001 vrf vrfv10
BUG FOUND. It wasn't possible to delete 'router bgp ASN vrf VRF_NAME'. Stopping script
Stopping loop at iteration 6 due to VRF configuration found.

@ton31337
Member

I just pulled https://github.com/piotrsuchy/tinynetlab, changed iteration count from 100 to 1000, and ran sudo ./prepare_environment.sh...

Iteration 978
Creating VRF, bridge, and VXLAN...
Bringing them up...
Reloading FRR with the VRF/BGP config...
...

Am I missing something to reproduce this?

@piotrsuchy
Contributor Author

I provided a virtual machine for Donatas to run the reproduction on. Because the issue relies on a race condition, it may occur with lower probability on 'faster' machines.
