Together with my friend and respected colleague Max Ardica, we have tested and qualified the current solution for interconnecting multiple VXLAN EVPN fabrics. We wrote this technical article to clarify the network design requirements when the Layer 3 anycast gateway function is distributed across all compute leaf nodes and all VXLAN EVPN fabrics. The whole article is organized into five posts:
- This first post covers the design considerations for interconnecting two VXLAN EVPN-based fabrics.
- The second post discusses the Layer 2 DCI requirements for interconnecting Layer 2-based VXLAN EVPN fabrics deployed in conjunction with active/active external routing blocks.
- The third post covers the Layer 2 and Layer 3 DCI requirements for interconnecting VXLAN EVPN fabrics deployed in conjunction with a distributed Layer 3 anycast gateway.
- The fourth post examines host mobility across two VXLAN EVPN fabrics with a Layer 3 anycast gateway.
- Finally, the last post covers inbound and outbound path optimization with geographically dispersed VXLAN EVPN fabrics.
Recently, fabric architecture has become a common and popular design option for building new-generation data center networks. Virtual Extensible LAN (VXLAN) with a Multiprotocol Border Gateway Protocol (MP-BGP) Ethernet VPN (EVPN) control plane is becoming the de facto standard technology for deploying network virtualization overlays in data center fabrics.
Data center networks often require the interconnection of separate network fabrics, which may also be deployed across geographically dispersed sites. Consequently, organizations need to consider the possible deployment alternatives for extending Layer 2 and 3 connectivity between these fabrics and understand how these alternatives differ.
The main goal of this article is to present one of these deployment options, using Layer 2 and 3 data center interconnect (DCI) technology between two or more independent VXLAN EVPN fabrics. This design is usually referred to as VXLAN multi-fabrics.
To best understand the design presented in this article, the reader should be familiar with VXLAN EVPN, including the way it works in conjunction with the Layer 3 anycast gateway and its design for operation in a single site. For more information, see the following Cisco® VXLAN EVPN documents:
- VXLAN Overview: Cisco Nexus 9000 Series Switches
- VXLAN Network with MP-BGP EVPN Control Plane Design Guide
- VXLAN Design with Cisco Nexus 9300 Platform Switches
- VXLAN Configuration Guide
To extend Layer 2 as well as Layer 3 segmentation across multiple geographically dispersed VXLAN-based fabrics, you can consider two main architectural approaches: VXLAN EVPN Multi-pod fabric and VXLAN EVPN Multi-fabric.
You can create a single logical VXLAN EVPN fabric in which multiple pods are dispersed across different locations and interconnected by a Layer 3 underlay. Also known as a stretched fabric, this option is now more commonly called a VXLAN EVPN Multi-pod fabric. The main benefit of this design is that the same architecture model can be used to interconnect multiple pods within a building or campus area as well as across metropolitan-area distances.
The VXLAN EVPN Multi-pod design is discussed in detail in post 32 – VXLAN Multipod stretched across geographically dispersed datacenters.
You can also interconnect multiple VXLAN EVPN fabrics using a DCI solution that provides multitenant Layer 2 and Layer 3 connectivity services across fabrics. This article focuses on this option.
Compared to the VXLAN EVPN Multi-pod fabric design, the VXLAN EVPN Multi-fabric architecture offers greater independence between the data center fabrics. Reachability information for VXLAN tunnel endpoints (VTEPs) is confined to each individual VXLAN EVPN fabric. As a result, VXLAN tunnels are initiated and terminated within the same fabric (in the same location). This approach allows policies such as storm control to be applied at the Layer 2 interface level between each fabric and the DCI network, limiting or eliminating storm and fault propagation across fabrics.
The Multi-fabric deployment model also offers greater scalability than the Multi-pod approach, supporting a larger total number of leaf nodes across fabrics. The total number of leaf devices supported equals the maximum number of leaf nodes supported in a single fabric (256 at the time of writing) multiplied by the number of interconnected fabrics. A Multi-fabric deployment can also support a greater overall number of endpoints than a Multi-pod deployment: commonly, only a subset of Layer 2 segments is extended across separate fabrics, so most MAC address entries can be contained within each fabric. Additionally, host route advertisement can be filtered across fabrics for the IP subnets that are defined only locally within each fabric, increasing the overall number of supported IP addresses.
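The leaf-count arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 256-leaf per-fabric figure quoted in this post and assuming that a Multi-pod design, being a single logical fabric, is bound by one per-fabric limit. The function names are hypothetical.

```python
# Hypothetical sizing helper. The 256-leaf value is the per-fabric maximum
# quoted in this post at the time of writing; check the verified scalability
# guide for your platform and NX-OS release before relying on it.
PER_FABRIC_LEAF_LIMIT = 256


def max_leaves_multi_fabric(num_fabrics: int, per_fabric: int = PER_FABRIC_LEAF_LIMIT) -> int:
    """Independent fabrics: each fabric gets its own leaf budget."""
    if num_fabrics < 1:
        raise ValueError("at least one fabric is required")
    return per_fabric * num_fabrics


def max_leaves_multi_pod(per_fabric: int = PER_FABRIC_LEAF_LIMIT) -> int:
    """Assumption: a Multi-pod design is one logical fabric, so all pods
    together share a single per-fabric leaf budget."""
    return per_fabric


for n in (1, 2, 3, 4):
    print(f"{n} fabric(s): Multi-fabric = {max_leaves_multi_fabric(n)} leaves, "
          f"Multi-pod = {max_leaves_multi_pod()} leaves")
```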
Separate VXLAN fabrics can be interconnected using Layer 2 and 3 functions, as shown in Figure 1:
- VLAN hand-off to DCI for Layer 2 extension
- Virtual Routing and Forwarding Lite (VRF-Lite) hand-off to DCI for multitenant Layer 3 extension
The maximum distance between separate VXLAN EVPN fabrics is determined mainly by the application software framework requirements (maximum tolerated latency between two active members) or by the mode of disaster recovery required by the enterprise (hot, warm, or cold migration).
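The following Python sketch puts the latency constraint into numbers by estimating the longest fiber path that fits a tolerated round-trip time. It assumes propagation of roughly 5 µs per km of fiber in each direction and treats equipment and queuing delays as a single fixed overhead. The 10 ms budget in the example is a hypothetical application requirement, not a recommendation.

```python
# Propagation-only estimate. Light in optical fiber travels at roughly
# 200,000 km/s, i.e. about 5 microseconds per km in each direction.
# Equipment, serialization, and queuing delays are folded into fixed_overhead_ms.
US_PER_KM_ONE_WAY = 5.0


def max_fiber_distance_km(tolerated_rtt_ms: float, fixed_overhead_ms: float = 0.0) -> float:
    """Longest fiber path whose round-trip time stays within the budget."""
    budget_us = (tolerated_rtt_ms - fixed_overhead_ms) * 1000.0
    if budget_us <= 0:
        return 0.0
    # Round trip = 2 x one-way propagation delay per km.
    return budget_us / (2 * US_PER_KM_ONE_WAY)


# Hypothetical requirement: at most 10 ms RTT between two active members,
# optionally reserving 2 ms for equipment and queuing delays.
print(max_fiber_distance_km(10.0))        # 1000.0
print(max_fiber_distance_km(10.0, 2.0))   # 800.0
```

Fiber routes are typically longer than the straight-line distance between sites, so the result is an upper bound on the geographic distance.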
After Layer 2 and 3 traffic is sent out of each fabric through the border nodes, several DCI solutions are available for extending Layer 2 and 3 connectivity across fabrics while maintaining end-to-end logical isolation.
- Layer 2 DCI:
Layer 2 connectivity can be provided in several ways. For metropolitan-area distances, it can be provided through a dedicated Layer 2 dual-sided virtual port channel (vPC) using fiber or dense wavelength-division multiplexing (DWDM) links between the two sites. For any distance, it can be provided by any valid Layer 2 overlay technology running over a Layer 3 transport mechanism, such as Overlay Transport Virtualization (OTV), Multiprotocol Label Switching (MPLS) EVPN, VXLAN EVPN, or Virtual Private LAN Service (VPLS).
- Layer 3 DCI:
The Layer 3 DCI connection has two main purposes:
- It advertises the network prefixes of local hosts and IP subnets between sites.
- It propagates host routes and subnet prefixes for stretched IP subnets to the remote sites.
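The split between subnet prefixes and host routes can be illustrated with a small Python sketch of the export decision a border node makes toward the remote fabric. It assumes the filtering option mentioned earlier: host routes are exported only for stretched subnets, while locally defined subnets are advertised as prefixes only. The `Subnet` type, the function name, and the addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network, ip_address, ip_network


@dataclass(frozen=True)
class Subnet:
    prefix: IPv4Network
    stretched: bool  # True if the Layer 2 segment is extended across fabrics


def routes_to_advertise(subnets: list[Subnet], local_hosts: list[IPv4Address]) -> set[IPv4Network]:
    """Prefixes a border node would export to the remote fabric over the Layer 3 DCI.

    - Every subnet prefix is advertised.
    - /32 host routes are advertised only for stretched subnets; host routes
      for subnets defined locally in this fabric are filtered.
    """
    routes: set[IPv4Network] = set()
    for subnet in subnets:
        routes.add(subnet.prefix)
        if subnet.stretched:
            for host in local_hosts:
                if host in subnet.prefix:
                    routes.add(ip_network(f"{host}/32"))
    return routes


subnets = [
    Subnet(ip_network("10.1.1.0/24"), stretched=True),   # hypothetical stretched VLAN
    Subnet(ip_network("10.2.2.0/24"), stretched=False),  # hypothetical local-only VLAN
]
hosts = [ip_address("10.1.1.10"), ip_address("10.2.2.20")]
for route in sorted(routes_to_advertise(subnets, hosts)):
    print(route)
```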
The DCI solutions selected for extending multitenant Layer 2 and 3 connectivity across VXLAN EVPN fabrics usually depend on the type of service available in the transport network connecting the fabrics:
Scenario 1: Enterprise-owned direct links (dark fibers or DWDM circuits)
- vPC or OTV for Layer 2 extension
- Back-to-back VRF-Lite subinterfaces for Layer 3 extension
Scenario 2: Enterprise-owned or service provider–managed multitenant Layer 3 WAN service
- OTV for Layer 2 extension through a dedicated VRF-Lite subinterface
- VRF-Lite subinterfaces mapped to each WAN Layer 3 VPN service
Scenario 3: Enterprise-owned or service provider–managed single-tenant Layer 3 WAN service
- OTV for Layer 2 extension across native Layer 3
- VRF-Lite over OTV for Layer 3 segmentation
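The scenario mapping above is essentially a lookup on what the transport network offers. The following Python sketch encodes it. The two boolean inputs and the helper names are hypothetical simplifications of a real design assessment, and the option strings are taken from the lists above.

```python
from enum import Enum


class Transport(Enum):
    DIRECT_LINKS = 1          # Scenario 1: enterprise-owned dark fiber or DWDM
    MULTITENANT_L3_WAN = 2    # Scenario 2: Layer 3 VPN service per tenant
    SINGLE_TENANT_L3_WAN = 3  # Scenario 3: native, single-tenant Layer 3 only


# (Layer 2 extension, Layer 3 extension) as listed for each scenario.
DCI_OPTIONS = {
    Transport.DIRECT_LINKS: ("vPC or OTV", "back-to-back VRF-Lite subinterfaces"),
    Transport.MULTITENANT_L3_WAN: ("OTV through a dedicated VRF-Lite subinterface",
                                   "VRF-Lite subinterfaces mapped to each WAN Layer 3 VPN service"),
    Transport.SINGLE_TENANT_L3_WAN: ("OTV across native Layer 3", "VRF-Lite over OTV"),
}


def classify_transport(direct_links: bool, wan_offers_l3vpn: bool) -> Transport:
    """Hypothetical decision helper: map what the transport offers to a scenario."""
    if direct_links:
        return Transport.DIRECT_LINKS
    if wan_offers_l3vpn:
        return Transport.MULTITENANT_L3_WAN
    return Transport.SINGLE_TENANT_L3_WAN


scenario = classify_transport(direct_links=False, wan_offers_l3vpn=False)
layer2, layer3 = DCI_OPTIONS[scenario]
print(scenario.name, "->", layer2, "|", layer3)
```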
Note: In this article, OTV is the DCI solution used to extend Layer 2 connectivity between VXLAN fabrics. OTV is an IP-based DCI technology purpose-built to provide Layer 2 extension capabilities over any transport infrastructure. OTV provides an overlay that enables Layer 2 connectivity between separate Layer 2 domains while keeping these domains independent and preserving the fault-isolation, resiliency, and load-balancing benefits of an IP-based interconnection. For more information about OTV as a DCI technology and associated deployment considerations, refer to the document at:
Scenario 1: In a metropolitan area network (MAN), you can use a direct DWDM transport for Layer 2 and 3 extension. Dedicated links for Layer 2 and 3 segmentation are therefore established using different DWDM circuits (or direct fiber when the network is owned by the enterprise). Figure 2 shows two separate physical interfaces: one is used for Layer 3 segmentation, and the other is used for Layer 2 extension (classic Ethernet or overlay). A dedicated Layer 3 connection in conjunction with vPC is a best practice, and it is usually required for platforms that do not support dynamic routing over vPC. However, with border node platforms (such as Cisco Nexus® 7000 or 9000 Series Switches) and software releases that support it, you can also run Layer 3 over a dual-sided vPC connection (not covered in this article).
Note: Dynamic routing over vPC is supported on Cisco Nexus 9000 Series Switches starting from Cisco NX-OS Software Release 7.0(3)I5(1).
Nonetheless, the recommended approach is to deploy an overlay solution such as OTV to provide Layer 2 DCI services, especially if more than two sites need to be interconnected (a scenario not documented here). OTV inherently offers multipoint Layer 2 DCI services while providing robust protection against the creation of end-to-end Layer 2 loops.
Note: At the time of writing, the OTV encapsulation can be built with a generic routing encapsulation (GRE) header or with a VXLAN and User Datagram Protocol (UDP) header. The latter option is preferred because it allows better load balancing of OTV traffic across the Layer 3 network that interconnects the VXLAN fabrics. However, it requires F3 or M3 line cards and NX-OS Release 7.2 or later.
- OTV edge devices are connected to the border nodes with both a Layer 3 and a Layer 2 connection, as shown in Figure 3, a deployment model referred to as “on a stick”. The OTV overlay is carried within a dedicated Layer 3 subinterface. This subinterface is carved out of the same physical interface that provides cross-fabric Layer 3 connectivity for all the tenants (discussed later in this section), and it can belong either to the default VRF instance or to a dedicated OTV VRF instance.
Note: Dual-sided vPC can be used as a Layer 2 DCI alternative for VXLAN dual-fabric deployments, though it is less desirable because it lacks the failure-isolation functions natively offered by OTV.
- Dedicated back-to-back subinterfaces carry OTV-encapsulated traffic and provide Layer 3 tenant connectivity:
- OTV: Default VRF = E1/1.999
- Tenant 1: VRF T1 = E1/1.101
- Tenant 2: VRF T2 = E1/1.102
- Tenant 3: VRF T3 = E1/1.103
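To illustrate the subinterface plan above, the following Python sketch renders NX-OS-style Layer 3 subinterface stanzas for the border node. It assumes that the 802.1Q tag equals the subinterface ID and uses hypothetical /30 addressing. The output is a starting point to check against the NX-OS configuration guide for your release, not a validated configuration.

```python
from ipaddress import ip_interface
from typing import Optional

# Subinterface plan from the list above: subinterface ID -> VRF
# (None means the default VRF, used here for the OTV subinterface).
SUBIF_PLAN = {101: "T1", 102: "T2", 103: "T3", 999: None}


def subif_config(parent: str, subif_id: int, vrf: Optional[str], address: str) -> str:
    """Render an NX-OS-style Layer 3 subinterface stanza (illustrative only)."""
    lines = [
        f"interface {parent}.{subif_id}",
        f"  encapsulation dot1q {subif_id}",  # assumption: 802.1Q tag equals subinterface ID
    ]
    if vrf is not None:
        # On NX-OS, "vrf member" removes existing Layer 3 settings, so apply it first.
        lines.append(f"  vrf member {vrf}")
    lines.append(f"  ip address {ip_interface(address).with_prefixlen}")
    lines.append("  no shutdown")
    return "\n".join(lines)


# Hypothetical /30 point-to-point addressing, one subnet per subinterface.
for index, (subif_id, vrf) in enumerate(sorted(SUBIF_PLAN.items())):
    print(subif_config("Ethernet1/1", subif_id, vrf, f"10.99.{index}.1/30"))
    print()
```

The sketch uses only common interface commands (`encapsulation dot1q`, `vrf member`, `ip address`). OTV, BGP, and MTU settings are intentionally left out.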
Scenario 2: The second scenario applies to deployments in which a Layer 3 multitenant transport service is available across separate fabrics.
This deployment model is similar to the one in scenario 1: OTV edge devices “on a stick” are used to extend Layer 2 connectivity across sites. The OTV traffic is sent out of the border nodes on dedicated subinterfaces that are mapped to a specific Layer 3 VPN service on the WAN edge router acting as a provider edge device.
At the same time, independent Layer 3 subinterfaces are deployed for each tenant for end-to-end Layer 3 communication. Each subinterface is then mapped to a dedicated Layer 3 VPN service.
Figures 4 and 5 show this scenario.
Scenario 3: In this scenario, the WAN and MAN transport network offers only single-tenant Layer 3 service (it does not support MPLS or VRF-Lite).
In this case, the Layer 2 DCI services offered by an overlay technology such as OTV can be used to establish cross-fabric Layer 2 and 3 multitenant connectivity. This connection is achieved in two steps:
- First, a Layer 2 DCI overlay service is established over the native Layer 3 WAN and MAN core.
- Next, per-tenant Layer 3 peerings are established across the fabrics over this Layer 2 overlay transport. The dedicated tenant Layer 3 interfaces used to connect to the remote sites can be deployed as Layer 3 subinterfaces or as switch virtual interfaces (SVIs). This choice affects the physical connectivity between the border nodes and the OTV edge devices:
- Use of subinterfaces for tenant-routed communication: In this case, you must deploy two separate internal interfaces to connect the border nodes to the OTV devices. As shown in Figure 6, the first interface is configured with subinterfaces to carry tenant traffic (a unique VLAN tag is associated with each tenant), and the second interface is a Layer 2 trunk used to extend Layer 2 domains between sites.
Because each tenant's Layer 3 routing peering with the remote border nodes is established across the OTV overlay service, a failure in the path between sites does not bring down the local subinterface and would otherwise be detected only when the routing protocol hold timer expires. You must therefore enable bidirectional forwarding detection (BFD) on each Layer 3 subinterface to detect these indirect failures and achieve faster convergence.
- Use of SVIs for tenant-routed communication: In this case, shown in Figure 7, the same Layer 2 trunk interface can be used to carry both the VLANs associated with each tenant Layer 3 interface and the VLANs extending the Layer 2 domains across sites. Note that these two sets of VLANs must not overlap, and that no SVIs are defined for the VLANs that extend the Layer 2 domains across sites.
In this specific scenario, the join interfaces of the OTV edge devices can be connected directly to the Layer 3 core, as shown in Figure 7.
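Two of the Scenario 3 rules above lend themselves to quick checks, sketched here in Python. First, the SVI option requires disjoint VLAN sets on the shared trunk, with SVIs only on the tenant routing VLANs. Second, the BFD timers determine how quickly an indirect failure across the OTV overlay is detected. The VLAN numbers and the 50 ms / multiplier 3 timers are hypothetical examples, and the detection-time formula follows the asynchronous mode described in RFC 5880.

```python
# Hypothetical Scenario 3 planning checks (names and values are illustrative).


def validate_trunk_plan(tenant_l3_vlans: dict[str, int],
                        extended_vlans: set[int],
                        svi_vlans: set[int]) -> list[str]:
    """SVI option: tenant routing VLANs and Layer 2-extended VLANs share one trunk."""
    errors = []
    l3_vlans = set(tenant_l3_vlans.values())
    if len(l3_vlans) != len(tenant_l3_vlans):
        errors.append("each tenant needs its own Layer 3 VLAN")
    overlap = l3_vlans & extended_vlans
    if overlap:
        errors.append(f"VLANs used for both tenant routing and Layer 2 extension: {sorted(overlap)}")
    missing = l3_vlans - svi_vlans
    if missing:
        errors.append(f"tenant Layer 3 VLANs without an SVI: {sorted(missing)}")
    extended_svis = svi_vlans & extended_vlans
    if extended_svis:
        errors.append(f"SVIs defined on Layer 2-extended VLANs: {sorted(extended_svis)}")
    return errors


def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode BFD detection time as described in RFC 5880: the remote
    Detect Mult times the larger of the two negotiated intervals."""
    return remote_detect_mult * max(local_required_min_rx_ms, remote_desired_min_tx_ms)


plan = {"T1": 3001, "T2": 3002, "T3": 3003}  # hypothetical tenant routing VLANs
print(validate_trunk_plan(plan, extended_vlans={100, 200}, svi_vlans={3001, 3002, 3003}))
print(bfd_detection_time_ms(50, 50, 3), "ms")  # 150 ms with 50 ms timers and multiplier 3
```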
The following sections present two use cases in which independent VXLAN EVPN fabrics are interconnected:
- The first use case is a more traditional deployment in which each VXLAN EVPN fabric runs in Layer 2 mode only and uses a centralized external routing block for intersubnet communication. This scenario can be a requirement when the default gateway function is deployed on external devices such as firewalls. This use case is discussed in the post VXLAN EVPN Multi-Fabrics with External Routing Block (part 2).
- The second use case deploys the distributed anycast Layer 3 gateway function across both sites, reducing traffic hairpinning across remote network fabrics. This enhanced scenario can address a variety of requirements:
- For business-continuance purposes, it is common practice to allow transparent “hot” mobility of virtual machines from site to site without any service interruption. To reduce application latency, the same default gateways are active at both sites, avoiding hairpinning across long distances for local routing as well as for egress path optimization. To reduce hairpinning for east-west traffic and to optimize north-south traffic, the same MAC and IP addresses for the default gateways must be replicated on all active routers at both sites. With the anycast Layer 3 gateway, the default gateway function is performed by all compute leaf nodes. This scenario is the main focus of this article and is discussed in greater detail in the post VXLAN EVPN Multi-fabric with Distributed Anycast Layer 3 Gateway (part 3).
- For disaster-recovery purposes and for operational cost containment, enterprises may want to move endpoints using only a “cold” migration process. In this case, only the IP address of the gateway is replicated, to simplify operations after machines have been moved (for example, to maintain the same IP addressing scheme after the relocation of the servers). The MAC address of the default gateway can differ on each fabric. With a “cold” migration, the endpoint sends an Address Resolution Protocol (ARP) request for its default gateway when it restarts and learns the new MAC address. This use case is not covered in this article.
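The difference between the hot and cold migration cases comes down to the endpoint's ARP cache. The following Python sketch is a deliberately simplified model. It assumes that a live-migrated endpoint keeps its cached gateway MAC, while a restarted endpoint flushes its cache and sends a new ARP request. The classes, the MAC addresses, and the `migrate` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fabric:
    name: str
    gateway_ip: str
    gateway_mac: str


@dataclass
class Endpoint:
    arp_cache: dict[str, str] = field(default_factory=dict)

    def resolve_gateway(self, fabric: Fabric) -> None:
        """Send an ARP request for the default gateway and cache the reply."""
        self.arp_cache[fabric.gateway_ip] = fabric.gateway_mac

    def gateway_entry_valid(self, fabric: Fabric) -> bool:
        """True if the cached MAC matches the gateway MAC in this fabric."""
        return self.arp_cache.get(fabric.gateway_ip) == fabric.gateway_mac


def migrate(endpoint: Endpoint, destination: Fabric, cold: bool) -> bool:
    if cold:
        endpoint.arp_cache.clear()             # restart: ARP cache is lost
        endpoint.resolve_gateway(destination)  # endpoint ARPs for its gateway at boot
    # Hot (live) migration: the ARP cache moves with the running endpoint unchanged.
    return endpoint.gateway_entry_valid(destination)


site_a = Fabric("A", "10.1.1.1", "0000.2222.3333")
site_b_anycast = Fabric("B", "10.1.1.1", "0000.2222.3333")  # same gateway IP and MAC
site_b_ip_only = Fabric("B", "10.1.1.1", "0000.2222.4444")  # same IP, different MAC

for destination, cold in ((site_b_anycast, False), (site_b_ip_only, False), (site_b_ip_only, True)):
    vm = Endpoint()
    vm.resolve_gateway(site_a)
    print(destination.gateway_mac, "cold" if cold else "hot", migrate(vm, destination, cold))
```

In this model, a hot move keeps reaching the default gateway without interruption only when the destination fabric uses the same gateway IP and MAC addresses, which is what the distributed anycast Layer 3 gateway provides. A cold move tolerates a different gateway MAC because the endpoint resolves it again at restart.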