43 – DCNM EPL and Network Insight in the Context of VXLAN EVPN Multi-site

Refresh on Endpoint Locator in the context of DCNM

The Endpoint Locator (EPL) feature allows DCNM to monitor in real time the endpoints (EPs), the switches (locations) where the EPs have been learnt, and the networks and VRFs associated with the EPs. This analysis is supported for up to 4 VXLAN EVPN fabrics, with a maximum of 100,000 endpoints across all fabrics. The results can be displayed in a single pane of glass covering the whole infrastructure, either from the datacenter scope or from the Multi-Site Domain (MSD) scope.

The state information for each element is displayed in real time and recorded between two sync processes, providing the network life history of an endpoint: the connection time of a particular IP and/or MAC address, and the moves of endpoints across different locations.

An endpoint can be a virtual machine (VM), a container, a bare-metal server, a network service appliance and so on; in other words, any element whose network reachability information is learnt by the VXLAN BGP EVPN fabric (EVPN Route-type 2 advertisements).

EPL relies on the MP-BGP EVPN address family updates to pinpoint endpoint information. To get full visibility of the EPs and their locations, DCNM needs to peer with the BGP Route-Reflector(s) (a function usually deployed on the Spine nodes) to receive these updates. For that purpose, DCNM offers a dedicated interface, eth2, also known as the In-Band interface, to peer with the RR(s).

DCNM Cluster and Use of Compute Nodes

Some applications, like “Network Insights for Resources” (NIR), require more CPU and memory resources. Consequently, DCNM must be deployed in “Cluster Mode”, that is, with additional Compute nodes, in order to distribute the workload and replicate database information across different service resources.

The Compute nodes therefore expand the resources available to the DCNM framework so that it can scale out according to the end-user requirements. Additional Compute nodes (traditionally 3) are also required if you need to manage a large number of nodes (typically more than 80).

When cluster mode is deployed, in the context of the EPL tracking service, the network interface used for the peering between the RR and the DCNM framework is automatically provisioned by DCNM in a container created on one of the available Compute nodes. One In-Band Management IP address is automatically created for each Fabric.

Note that for the NIR application, DCNM also automatically provisions additional In-Band interfaces across the cluster of Compute nodes for all network devices solicited for telemetry and data analytics, as discussed in the section “Network Insights in the context of DCNM” below.

Figure 1: DCNM HA connectivity in Cluster Mode

Figure 1 depicts the DCNM HA cluster with 3 Compute Nodes, the In-Band Management network (in red), and the two IP addresses provisioned by DCNM on Compute Nodes 1 and 2 (33.33.33.97 and 33.33.33.98, respectively). Each IP address is automatically created during the EPL configuration for each VXLAN EVPN Fabric. In our case study, we have two VXLAN EVPN Fabrics, as presented in Figure 4 below.

The IP address assigned to the compute nodes for In-band management purposes can be retrieved directly from the EPL configuration page (Control => Endpoint Locator => Configure). As shown in Figure 2, you can click on the “i” icon in the top right corner to display the BGP neighbor configuration of the Route-Reflector, highlighting the peering with the DCNM In-band Mgmt address provisioned on one of the Compute nodes.

Figure 2: Endpoint Locator Configuration pane in Fabric-1

Repeat the same steps in the EPL configuration of the other Fabric to get the second In-band IP address, used for Fabric-2.

Figure 3: Endpoint Locator Configuration pane in Fabric-2

In addition to the BGP configurations, DCNM automatically configures the eth2 interfaces for In-Band Management on the Compute nodes.

Showing the In-Band interface eth2 on one DCNM Compute Node:

$ ifconfig eth2
eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 33.33.33.97  netmask 255.255.255.0  broadcast 33.33.33.255
        inet6 fe80::250:56ff:fe98:138d  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:98:13:8d  txqueuelen 1000  (Ethernet)
        RX packets 59136853  bytes 48581283056 (45.2 GiB)
        RX errors 0  dropped 371  overruns 0  frame 0
        TX packets 29383282  bytes 2195885777 (2.0 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

DCNM relies on a dedicated policy template called “epl_evpn_bgp_rr_neighbor” to push the BGP configuration to each Route-Reflector (Spine node) belonging to a VXLAN EVPN fabric from which EPL needs to retrieve endpoint information. The policy template can be viewed by selecting “View/Edit policies” for every RR Spine node of interest. The BGP AS number and the BGP IPv4 neighbor address are automatically configured by DCNM. This template is non-editable.

As depicted in Figure 2 and Figure 3, the BGP configuration CLI is pushed to the RRs as shown in this sample below for the RR in Fabric-2.

 router bgp 65512
   neighbor 33.33.33.98
     remote-as 65512
     address-family l2vpn evpn
       send-community
       send-community extended
       route-reflector-client

Notice that the “update-source loopback0” command is not included in the BGP neighbor configuration of the Spine nodes. Currently, DCNM does not require the packets it receives to be sourced from interface loopback0 (the BGP Router-ID) of each configured RR (Spine node).

Figure 4 below shows the BGP peerings that must be established between the Route-Reflectors of each VXLAN EVPN Fabric and the In-band management interfaces of the compute nodes part of the DCNM cluster.

Figure 4: BGP peering between DCNM Compute Nodes and RR’s from each Fabric

Adding redistribution of BGP to OSPF (and vice-versa) between the two Fabrics

As depicted in Figure 4, routing information to reach the Route-Reflectors deployed in the remote site must be exchanged across sites in order to establish the MP-BGP EVPN adjacencies required to track the endpoints located in Fabric-2. Those adjacencies must be established between the In-band IP addresses of the compute nodes and the loopback0 interfaces of the remote RRs.

When deploying VXLAN Multi-Site, only the minimum required ‘underlay’ IP prefixes are exchanged between sites (just to ensure the establishment of control plane and data plane connectivity between the Border Gateway nodes). This means that the In-band subnet (33.33.33.0/24 in our example) used to address the containers on the compute nodes and the loopback addresses of the remote RRs (addressed out of the original TEP pool assigned by DCNM during the fabric bring up process) are not exchanged by default between fabrics.

It is therefore required to explicitly create specific route-maps to redistribute the In-band network 33.33.33.0/24 from Fabric-1 to Fabric-2, and the Loopback0 IP addresses of the remote RRs to the local leaf nodes where the DCNM cluster nodes are attached (in practice, we will redistribute the remote RR reachability information to the whole of Fabric-1).

Hence, we first need to create a FreeForm Template and apply it to the pair of BGWs located in Fabric-1, as follows:

  1. Go to Control => Fabric Builder => select Fabric-1
  2. Select Tabular View
  3. Check the 2 BGWs
  4. Select View/Edit Policies
  5. Add a new Policy (press + button)
    • In the policy name field, type “free”, and select “switch_freeform”
  6. A pop-up window appears
    • Add a description
    • Enter the CLI as shown in Figure 5

Note: by selecting the two BGW devices, the freeform template will be created for each selected node. This is possible only if the CLI configuration to be applied is strictly identical for both devices.

Figure 5: Switch FreeForm Template to redistribute the Route-map

Example of Route-map for Fabric-1

The configuration sample below shows how to exchange the required routing information in and out of Fabric 1.

ip prefix-list EPL-RR-FAB2 seq 5 permit 10.22.0.5/32
ip prefix-list EPL-NWK seq 5 permit 33.33.33.0/24
!
route-map EPL-BGP-to-OSPF permit 10
  match ip address prefix-list EPL-RR-FAB2
route-map EPL-OSPF-to-BGP permit 10
  match ip address prefix-list EPL-NWK
!
router ospf UNDERLAY
  redistribute bgp 65510 route-map EPL-BGP-to-OSPF
!
router bgp 65510
  address-family ipv4 unicast
    redistribute ospf UNDERLAY route-map EPL-OSPF-to-BGP

  • 10.22.0.5/32, matched by the prefix-list EPL-RR-FAB2, is the Router-ID (Loopback0) of the Route-Reflector located in Fabric-2.
  • 33.33.33.0/24, matched by the prefix-list EPL-NWK, is the In-Band Management network used for the services that are created on the compute nodes.
  • Under the router OSPF process, the route-map EPL-BGP-to-OSPF permits the redistribution of 10.22.0.5/32, learnt via BGP from the remote BGW nodes, into the OSPF (underlay) protocol used in Fabric 1.
  • Under the router BGP process, the route-map EPL-OSPF-to-BGP permits the redistribution of the subnet 33.33.33.0/24 from OSPF into BGP (IPv4 unicast address family), so that it can be advertised to the remote BGW nodes.

Figure 6: Route-maps in BGWs located in Fabric-1

Example of Route-map for Fabric-2

Likewise, you need to create another FreeForm Template to redistribute the proper routes in and out of Fabric 2, as listed below and depicted in Figure 7.

ip prefix-list EPL-NWK seq 5 permit 33.33.33.0/24
ip prefix-list EPL-RR-FAB2 seq 5 permit 10.22.0.5/32
!
route-map EPL-BGP-to-OSPF permit 10
  match ip address prefix-list EPL-NWK
route-map EPL-OSPF-to-BGP permit 10
  match ip address prefix-list EPL-RR-FAB2
!
router ospf UNDERLAY
  redistribute bgp 65512 route-map EPL-BGP-to-OSPF
!
router bgp 65512
  address-family ipv4 unicast
    redistribute ospf UNDERLAY route-map EPL-OSPF-to-BGP

Figure 7: Route-maps in BGWs located in Fabric-2
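The two freeform templates above are mirror images of each other: only the BGP AS number changes, and the two prefix-lists swap roles between the route-maps. As a minimal sketch of that symmetry (not a DCNM API; the function name render_freeform and its parameters are hypothetical and used only for illustration), the CLI text could be rendered from a few parameters in Python:

```python
# Prefix-lists taken from the examples above.
PREFIX_LISTS = {
    "EPL-RR-FAB2": "10.22.0.5/32",  # Loopback0 of the RR in Fabric-2
    "EPL-NWK": "33.33.33.0/24",     # In-band management subnet
}


def render_freeform(asn, bgp_to_ospf, ospf_to_bgp, ospf_tag="UNDERLAY"):
    """Return the redistribution CLI for one pair of BGWs (hypothetical helper).

    bgp_to_ospf / ospf_to_bgp are the prefix-list names matched by the
    route-maps applied in each redistribution direction.
    """
    lines = [
        f"ip prefix-list {name} seq 5 permit {prefix}"
        for name, prefix in PREFIX_LISTS.items()
    ]
    lines += [
        "!",
        "route-map EPL-BGP-to-OSPF permit 10",
        f"  match ip address prefix-list {bgp_to_ospf}",
        "route-map EPL-OSPF-to-BGP permit 10",
        f"  match ip address prefix-list {ospf_to_bgp}",
        "!",
        f"router ospf {ospf_tag}",
        f"  redistribute bgp {asn} route-map EPL-BGP-to-OSPF",
        "!",
        f"router bgp {asn}",
        "  address-family ipv4 unicast",
        f"    redistribute ospf {ospf_tag} route-map EPL-OSPF-to-BGP",
    ]
    return "\n".join(lines)


# Fabric-1 imports the remote RR loopback and exports the in-band subnet;
# Fabric-2 does the opposite.
fabric1 = render_freeform(65510, bgp_to_ospf="EPL-RR-FAB2", ospf_to_bgp="EPL-NWK")
fabric2 = render_freeform(65512, bgp_to_ospf="EPL-NWK", ospf_to_bgp="EPL-RR-FAB2")
print(fabric1)
print()
print(fabric2)
```

The rendered text is only meant to be pasted into the switch_freeform policy by hand; the sketch does not talk to DCNM or to the switches.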

When the freeform templates have been saved and deployed, you can check the In-band reachability from the Route-reflector in Fabric-2:

Fabric-2-Spine-2# show bgp l2vpn evpn neighbors 33.33.33.98
BGP neighbor is 33.33.33.98, remote AS 65512, ibgp link, Peer index 7
BGP version 4, remote router ID 33.33.33.1
Neighbor previous state = OpenConfirm
BGP state = Established, up for 10:01:07
Neighbor vrf: default
Last read 00:00:07, hold time = 90, keepalive interval is 30 seconds
..//..
Forwarding state preserved by peer for:
Restart time advertised to peer: 120 seconds
Stale time for routes advertised by peer: 300 seconds
Extended Next Hop Encoding Capability: advertised
Message statistics:
Sent Rcvd
Opens: 3 2
Notifications: 1 0
Updates: 369 0
Keepalives: 1647 1659
Route Refresh: 0 0
Capability: 0 0
Total: 1684 1661
Total bytes: 95164 31483
Bytes in queue: 0 0
For address family: L2VPN EVPN
BGP table version 9805, neighbor version 9805
..//..
Local host: 10.22.0.5, Local port: 179
Foreign host: 33.33.33.98, Foreign port: 42354

fd = 79
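If this check needs to be scripted, the fields of interest can be pulled out of the saved command output. The following is a minimal Python sketch under that assumption, not a DCNM or NX-OS tool: the helper names are hypothetical, and the sample text is just an excerpt of the output above.

```python
import re

# Excerpt of the `show bgp l2vpn evpn neighbors` output shown above.
SAMPLE = """\
BGP neighbor is 33.33.33.98, remote AS 65512, ibgp link, Peer index 7
BGP state = Established, up for 10:01:07
Local host: 10.22.0.5, Local port: 179
Foreign host: 33.33.33.98, Foreign port: 42354
"""


def _first(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def bgp_session_summary(text):
    """Extract neighbor address, session state and local address (hypothetical helper)."""
    return {
        "neighbor": _first(r"BGP neighbor is ([\d.]+)", text),
        "state": _first(r"BGP state = (\w+)", text),
        "local_host": _first(r"Local host: ([\d.]+)", text),
    }


summary = bgp_session_summary(SAMPLE)
print(summary)
print("EPL peering up:", summary["state"] == "Established")
```

Here the local host 10.22.0.5 is the Loopback0 of the RR in Fabric-2 and the neighbor 33.33.33.98 is the In-band address provisioned for Fabric-2, so an Established state confirms that both redistributed prefixes are reachable.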

Network Insights in the context of DCNM

Network Insights provides, over time, network resource utilization, rate of change, trends, and environmental and resource anomalies for operational, configuration and hardware resources, for all the VXLAN EVPN nodes supporting these functions. To collect those telemetry statistics and flow analytics, each network device concerned must be configured by NI to push the required information to the Network Insights database hosted on the DCNM Cluster nodes.

Even though Network Insights is not strictly related to EPL, both have one common requirement when it comes to establishing inter-fabric connectivity.

When configuring the telemetry parameters, different groups are created to export the Software telemetry information, as well as the Flow telemetry data, from the leaf and spine nodes. This information is sent to different IP addresses belonging to the In-Band management network, which are automatically created by DCNM using micro-services in containers distributed across the 3 Compute nodes, as shown in the examples below:

..//.. Software Telemetry

  destination-profile
    use-vrf default
    source-interface loopback0
  destination-group 500
    ip address 33.33.33.103 port 57500 protocol gRPC encoding GPB
    use-chunking size 4096

..//.. Flow Telemetry

flow exporter telemetryExp_0
  destination 33.33.33.101
  transport udp 30000
  source loopback0
  dscp 44
As a result, all leaf nodes, including the devices that belong to the remote Fabrics, must be able to reach the In-band management network in the VXLAN EVPN Fabric where the DCNM Compute Nodes are located (Fabric-1 in our example).
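Because every telemetry destination is taken from the In-band subnet, advertising the single prefix 33.33.33.0/24 to the remote Fabric covers all collectors. A minimal Python sketch of that sanity check follows; it assumes the configuration is available as text, and the names CONFIG and collector_addresses are hypothetical.

```python
import ipaddress
import re

IN_BAND = ipaddress.ip_network("33.33.33.0/24")

# Destination lines taken from the telemetry samples above.
CONFIG = """\
destination-group 500
  ip address 33.33.33.103 port 57500 protocol gRPC encoding GPB
flow exporter telemetryExp_0
  destination 33.33.33.101
"""


def collector_addresses(config):
    """Return the IPv4 destinations found in telemetry/flow-exporter lines."""
    pattern = re.compile(
        r"^\s*(?:ip address|destination)\s+(\d+\.\d+\.\d+\.\d+)", re.MULTILINE
    )
    return [ipaddress.ip_address(addr) for addr in pattern.findall(config)]


collectors = collector_addresses(CONFIG)
outside = [str(addr) for addr in collectors if addr not in IN_BAND]
print("collectors:", [str(addr) for addr in collectors])
print("outside the in-band subnet:", outside)
```

An empty "outside" list means no additional prefix has to be redistributed for the collectors themselves.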

Figure 8: BGP peering between all switches and the In-Band Mgmt Network

To achieve connectivity for all nodes where the telemetry and analytics functions are enabled, it suffices to slightly modify the prefix-list already shown for the EPL use case, replacing the specific RR router-ID(s) with a less specific prefix matching the subnet used to assign the Loopback0 addresses to all the nodes of the remote VXLAN EVPN Fabric:

Example of FreeForm Template deployed in the BGWs located in Fabric-1

ip prefix-list TELEMETRY-FAB2 seq 10 permit 10.22.0.0/24 le 32
ip prefix-list IB-NWK seq 5 permit 33.33.33.0/24
!
route-map TELEMETRY-BGP-to-OSPF permit 10
  match ip address prefix-list TELEMETRY-FAB2
route-map IB-OSPF-to-BGP permit 10
  match ip address prefix-list IB-NWK
!
router ospf UNDERLAY
  redistribute bgp 65510 route-map TELEMETRY-BGP-to-OSPF
!
router bgp 65510
  address-family ipv4 unicast
    redistribute ospf UNDERLAY route-map IB-OSPF-to-BGP

  • 10.22.0.0/24 with “le 32”, matched by the prefix-list TELEMETRY-FAB2, covers all the Router-IDs (Loopback0) of the VXLAN EVPN network devices located in Fabric-2.
  • 33.33.33.0/24, matched by the prefix-list IB-NWK, is the In-Band Management network used for the services that are created on the DCNM compute nodes.
  • Under the router OSPF process, the route-map TELEMETRY-BGP-to-OSPF permits the redistribution of any prefix that belongs to 10.22.0.0/24, learnt via BGP from the remote BGW nodes, into the OSPF (underlay) protocol used in Fabric 1.
  • Under the router BGP process, the route-map IB-OSPF-to-BGP permits the redistribution of the subnet 33.33.33.0/24 from OSPF into BGP (IPv4 unicast address family), so that it can be advertised to the remote BGW nodes.
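The difference between the EPL prefix-list (exact /32 match) and the telemetry prefix-list (10.22.0.0/24 le 32) can be illustrated with a small Python sketch of the matching rule. This is a simplified model of prefix-list evaluation, covering only a single permit entry with an optional “le” value; the function name prefix_list_permits is hypothetical, and the addresses 10.22.0.6 and 10.23.0.1 are made up for the example.

```python
import ipaddress
from typing import Optional


def prefix_list_permits(route: str, entry: str, le: Optional[int] = None) -> bool:
    """Model one `permit <entry> [le <le>]` prefix-list line for one route.

    Without `le`, the route must match the entry exactly. With `le`, the
    route must fall inside the entry and have a prefix length between the
    entry's length and `le`.
    """
    route_net = ipaddress.ip_network(route)
    entry_net = ipaddress.ip_network(entry)
    if not route_net.subnet_of(entry_net):
        return False
    if le is None:
        return route_net.prefixlen == entry_net.prefixlen
    return route_net.prefixlen <= le


# EPL use case: only the RR loopback is permitted.
print(prefix_list_permits("10.22.0.5/32", "10.22.0.5/32"))           # True
print(prefix_list_permits("10.22.0.6/32", "10.22.0.5/32"))           # False

# Telemetry use case: any loopback inside 10.22.0.0/24 is permitted.
print(prefix_list_permits("10.22.0.6/32", "10.22.0.0/24", le=32))    # True
print(prefix_list_permits("10.23.0.1/32", "10.22.0.0/24", le=32))    # False

# Without "le 32", the /24 entry would not match the individual /32 loopbacks.
print(prefix_list_permits("10.22.0.6/32", "10.22.0.0/24"))           # False
```

The last case shows why the “le 32” keyword matters: the loopbacks are advertised as host routes, so an entry for 10.22.0.0/24 alone would not match them.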

Example of FreeForm Template deployed in the BGWs located in Fabric-2

ip prefix-list IB-NWK seq 5 permit 33.33.33.0/24
ip prefix-list TELEMETRY-FAB2 seq 10 permit 10.22.0.0/24 le 32
!
route-map IB-BGP-to-OSPF permit 10
  match ip address prefix-list IB-NWK
route-map TELEMETRY-OSPF-to-BGP permit 10
  match ip address prefix-list TELEMETRY-FAB2
!
router ospf UNDERLAY
  redistribute bgp 65512 route-map IB-BGP-to-OSPF
!
router bgp 65512
  address-family ipv4 unicast
    redistribute ospf UNDERLAY route-map TELEMETRY-OSPF-to-BGP
