Since I wrote this article discussing future DCI features, VXLAN EVPN Multi-Site has become available with NX-OS 7.0(3)I7(1).
Good read!
++++++++++++++++++++++ from June 2017 +++++++++++++++++++++++++
Some may find the title a bit strange, but actually it’s not 100% wrong. It all depends on what the acronym “DCI” stands for. Indeed, a new definition for DCI may arrive shortly, disrupting the way we have been interconnecting multiple Data Centres.
For many years, the DCI acronym has conventionally stood for Data Centre Interconnect.
Soon, the “Data Centre Interconnect” naming convention may essentially be used to describe solutions for interconnecting traditional DC network designs, which have been deployed for many years. I am not denigrating any traditional DCI solution per se, but Data Centre networking is evolving very quickly from the traditional hierarchical architecture to the emerging VXLAN-based fabric model, which is gaining momentum as enterprises adopt it to optimize modern applications, make better use of computing resources, save costs and gain operational benefits. Consequently, these independent DCI technologies will continue to be deployed primarily for extending Layer 2 and Layer 3 networks between traditional DC networks. However, for the interconnection of modern standalone (1) VXLAN EVPN Fabrics, a new and innovative solution called “VXLAN EVPN Multi-Site” – which integrates the extension of Layer 2 and Layer 3 services across multiple sites into a single device – has been created and is here to stay. As a consequence, DCI may soon stand for Data Centre Integration, hence this strange title.
(1) The term “standalone Fabric” describes a Network-based Fabric. The concepts of Multi-Pod, Multi-Fabric and Multi-Site discussed in this article don’t concern the enhanced SDN-based Fabric, also known as the ACI Fabric. While both rely on VXLAN as the network overlay transport within the Fabric, they use different solutions for interconnecting multiple Fabrics together.
ACI already offers a more sophisticated and solid integration of Layer 2 and Layer 3 services with its Multi-Pod solution since CY2016. ACI Multi-site is coming with ACI 3.0 (Q3CY17).
A bit of history:
After a long process of qualification and testing, this article was written two years ago to describe the VXLAN EVPN Multi-Pod approach. At the time, it offered the best design considerations and recommendations for interconnecting multiple VXLAN EVPN Pods. Afterwards, we ran a similar process to validate the architecture for interconnecting multiple independent VXLAN EVPN Fabrics, for which a new post with its sub-sections has been published. This second set of technical information and design considerations was mainly elaborated to clarify the network services and network design required for interconnecting multiple VXLAN EVPN Fabrics through an independent DCI technology, in conjunction with the Layer 3 Anycast Gateway (1) function.
(1) The Layer 3 Anycast Gateway is distributed among all compute and service leaf nodes and across all VXLAN EVPN Fabrics, offering the default gateway for any endpoint at the closest Top of Rack switch.
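For readers less familiar with the feature, the Layer 3 Anycast Gateway is simply configured identically on every leaf node. A minimal NX-OS sketch follows; the MAC address, VLAN ID, VRF name and subnet are illustrative values, not taken from any particular deployment:

```
! Same virtual gateway MAC on every leaf of every Fabric (illustrative value)
fabric forwarding anycast-gateway-mac 2020.0000.00aa
!
! Distributed default gateway for an example subnet of tenant "Tenant-1"
interface Vlan100
  no shutdown
  vrf member Tenant-1
  ip address 192.168.100.1/24
  fabric forwarding mode anycast-gateway
```

Because the same gateway IP and MAC exist on every leaf, an endpoint always resolves its default gateway on its directly attached Top of Rack switch.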
Although the VXLAN EVPN Multi-Fabric solution is fully robust, this methodical design validation highlighted the critical software and hardware components required when interconnecting VXLAN EVPN Fabrics with the Layer 3 Anycast Gateway, with manually configured connectivity for independent and dedicated Layer 2 and Layer 3 DCI services, making the whole traditional connectivity a bit more complex than usual.
To summarize, until now we have had two choices when interconnecting multiple VXLAN EVPN Pods or Fabrics:
VXLAN EVPN Multi-Pod
The first option aims to create a single logical VXLAN EVPN Fabric in which multiple Pods are dispersed to different locations, using a Layer 3 underlay to interconnect the Pods. Also known as a stretched Fabric, this option is more commonly called the VXLAN EVPN Multi-Pod Fabric. This model goes without a DCI solution: from a VXLAN overlay point of view, there is no demarcation between one Pod and another. The key benefit of this design is that the same architecture model can be used to interconnect multiple Pods within a building or campus area as well as across metropolitan areas.
While this solution is simple and easy to deploy thanks to the flexibility of VXLAN EVPN, it lacks sturdiness, as the same Data-Plane and Control-Plane are shared across all geographically dispersed Pods.
Although with the VXLAN Multi-Pod methodology the Underlay and Overlay Control-Planes have been optimised – separating the Control-Planes for the Underlay network (IGP areas) and the Overlay network (BGP ASN) – to offer a more robust end-to-end architecture than a basic stretched Fabric, the same VXLAN EVPN Fabric is overextended across different locations. Indeed, the reachability information for all VXLAN Tunnel Endpoints (VTEPs), as well as that of the endpoints, is extended across all VXLAN EVPN Pods (throughout the whole VXLAN Domain).
VXLAN EVPN Multi-Fabric
The second option, when interconnecting multiple VXLAN EVPN Fabrics in a more solid fashion, is to insert an autonomous DCI solution – extending Layer 2 and Layer 3 connectivity services between independent VXLAN Fabrics while maintaining segmentation for multi-tenancy purposes. Compared to the VXLAN EVPN Multi-Pod Fabric design, the VXLAN EVPN Multi-Fabric architecture offers a higher level of separation for each of the Data Centre Fabrics, from a Data-Plane and Control-Plane point of view. The reachability information for all VXLAN Tunnel Endpoints (VTEPs) is contained within each single VXLAN EVPN Fabric. As a result, VXLAN tunnels are initiated and terminated inside each Data Centre Fabric (same location).
The standalone VXLAN EVPN Multi-Fabric model also offers higher scalability than the VXLAN Multi-Pod approach, supporting a larger total number of leaf nodes and endpoints across all Fabrics.
However, the sensitive aspect of this solution is that it requires complex, manual configuration of dedicated Layer 2 and Layer 3 extensions between each VXLAN EVPN Fabric. In addition, the original protocols responsible for Layer 2 and Layer 3 segmentation must be handed off at the egress side of the border leaf nodes (Dot1Q handoff and VRF-Lite handoff, respectively). Last but not least, in addition to requiring a dual-box design for dedicated Layer 2 and Layer 3 DCI services, endpoints cannot be locally attached to those devices.
Although the additional DCI layer has been a key added value – maintaining the failure domain inside each Data Centre with a physical boundary between DC and DCI components – it has forced a trade-off between sturdiness and CAPEX, sometimes affecting the final decision to deploy a solid DCI solution.
In part 5 of the same article, Inbound and Outbound Path Optimisation solutions that can be leveraged for different Data Centre deployments have also been elaborated. And although the VXLAN Multi-Pod and Multi-Fabric solutions may soon become “outdated” for interconnecting modern VXLAN-based Fabrics (see next section), the technical content on path optimisation is still valid for VXLAN Multi-Site, or simply for better understanding the full learning and distribution processes for endpoint reachability information within a VXLAN EVPN Fabric. Hence, don’t ignore them yet; they may still be very useful for education purposes.
VXLAN EVPN Multi-Site
That short refresher on the traditional solutions for interconnecting geographically dispersed modern DC Fabrics being said – and while some network vendors (hardware and software/virtual platforms) are trying to offer suboptimal variants of the options discussed previously – these two traditional solutions might soon become obsolete for interconnecting standalone VXLAN EVPN-based Fabrics. Indeed, a new architecture called VXLAN EVPN Multi-Site is becoming available starting Q3CY17. A related IETF draft is available for further details (https://tools.ietf.org/html/draft-sharma-Multi-Site-evpn) if needed.
The concept of VXLAN EVPN Multi-Site integrates the Layer 2 and Layer 3 services so that they can be extended from within the same box. This model relies on Layer 2 and Layer 3 VNI stitching. A new function known as the Border Gateway is responsible for that integrated extension. As a result, an intra-site VXLAN tunnel terminates at a Border Gateway which, from the same VTEP, re-initiates a new VXLAN tunnel toward the remote sites (inter-site). The extension of the L2 and L3 services is transported using the same protocol. This new innovation combines the simplicity and flexibility of VXLAN Multi-Pod with the sturdiness of VXLAN Multi-Fabric.
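As a rough illustration of how the Border Gateway function is enabled, here is a hedged NX-OS sketch. The site-id, loopback numbers, interfaces and VNI values are illustrative, and the exact syntax may vary per software release:

```
! Declare this node a Border Gateway for site 1 (illustrative site-id)
evpn multisite border-gateway 1
!
interface nve1
  host-reachability protocol bgp
  source-interface loopback1
  ! Shared virtual IP used to source/terminate the inter-site tunnels
  multisite border-gateway interface loopback100
  member vni 30001
    ! Stitch this VNI toward the remote sites
    multisite ingress-replication
!
! Track the DCI-facing and fabric-facing links for failure isolation
interface Ethernet1/1
  evpn multisite dci-tracking
interface Ethernet1/7
  evpn multisite fabric-tracking
```

The key point is that the same `nve1` VTEP both terminates the intra-site tunnels and re-originates the inter-site ones, which is exactly the stitching behaviour described above.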
The Border Gateway role is a unique function offered in hardware at line rate thanks to Cisco ASIC technology (Cloud Scale ASIC) – yet another feature that will never come to a whitebox built on Merchant Silicon.
It is not mandatory for the non-Border Gateway network devices within the VXLAN EVPN Fabric to support the Multi-Site function, neither in hardware nor in software. As a result, any VXLAN EVPN Fabric based on the first generation of N9k in production today will be able to deploy Multi-Site by simply adding the pair of required devices (1) that support the Border Gateway function, which will both integrate and extend the Layer 2 and Layer 3 services.
(1) VXLAN Multi-Site is supported on –EX and –FX Nexus 9k series platforms.
In addition, this new approach to interconnecting multiple sites brings several added values.
VXLAN EVPN Multi-Site offers an “All-In-One” box solution with, if desired, the possibility to locally attach endpoints for other purposes (compute, firewalls, load balancers, WAN edge routers, etc.).
For VXLAN EVPN Multi-Pod, attaching endpoints to a transit node requires hardware support for bud-node mode. A bud node is a device that acts as a VXLAN VTEP and, at the same time, as an IP transit device for the same multicast group used for the VXLAN VNIs.
For VXLAN EVPN Multi-Fabric, a pair of additional boxes must be dedicated for DCI purposes only, with no SVI and no attached endpoint.
This new integrated solution offers a full separation of Data-Plane and Control-Plane per site, maintaining the failure domain to its smallest diameter.
The stitching of the Layer 2 and Layer 3 VNIs together (intra-Fabric tunnel ⇔ inter-Fabric tunnel) happens in the same Border node (Leaf or Spine).
The decapsulation and encapsulation of the respective tunnels happen inside the same VTEP. This method allows traffic policers to be enabled per tenant, controlling storm rates independently for Broadcast, Unknown Unicast and Multicast traffic, or other policers limiting the bandwidth per tunnel. Therefore, failure containment is improved through granular control of BUM traffic, protecting remote sites from broadcast storms.
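On platforms supporting Multi-Site, the granular BUM rate limiting described above can be expressed with per-traffic-class storm control on the Border Gateway. A hedged sketch follows; the percentage values are purely illustrative:

```
! Rate-limit each BUM traffic class independently on the Border Gateway
! (levels expressed as a percentage of the DCI link bandwidth)
evpn storm-control broadcast level 0.5
evpn storm-control unicast level 0.5
evpn storm-control multicast level 0.5
```

Traffic exceeding the configured level is dropped at the Border Gateway, so a broadcast storm in one site cannot saturate the inter-site links.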
The approach of VXLAN EVPN Multi-Site is to offer a level of simplicity and flexibility never-before-reached. Indeed, the Border Gateway function can be initiated from existing vPC-based Border leaf nodes, with a maximum of the two usual vPC peer devices, offering seamless migration from traditional to Multi-Site architectures. This vPC-based option allows for locally attached dual-homed endpoints at Layer 2.
However, it is also possible to deploy up to four independent border leaf nodes per site, relying on BGP to deliver the Border Gateway function in a cluster fashion, all active with the same VIP address. With this BGP-based option, Layer 3 services (FW, router, SLB, etc.) can be leveraged on the same switches.
Finally, it is also possible to initiate the Multi-Site function with its Border Gateway role directly from the Spine layer (up to 4 Spine nodes), as depicted in Figure 5.
When the Border Gateway is deployed – however the cluster is built, with two or four nodes – it shares the same VIP (Virtual IP address) across the associated members. The same Border Gateway VIP is used for the virtual networks initiated inside the VXLAN Fabric as well as for the VXLAN tunnels established with the remote sites.
The architecture of VXLAN EVPN Multi-Site is transport agnostic. It can be established between two sites connected with direct fibre links in a back-to-back fashion, or across an external native Layer 3 WAN/MAN or Layer 3 VPN Core (MPLS VPN), interconnecting multiple Data Centres across any distances. This new solution offers higher reliability and resiliency for the end-to-end design with all redundant devices delivering fast convergence in case of link or device failure.
From an operational, administration and network-management point of view, VXLAN OAM tools can be leveraged to understand the physical path taken by the VXLAN tunnel, inside and between the Fabrics, providing information on path reachability; Overlay-to-Underlay correlation; interface load and error statistics along the path; latency, etc. – a data lake of information which is usually “hidden” in a traditional network overlay deployment.
Because the VNI-to-VNI stitching happens in the same VTEP, there is no requirement to hand off VRF-Lite or the Dot1Q frame. Hence, theoretically, Multi-Site allows going beyond the 4K Layer 2 segments imposed by traditional DCI solutions.
As depicted in Figure 7, the border leaf offers multiple functions, including local endpoint attachment and Border Gateway for integrated Layer 2 and Layer 3 service extension.
vPC must be used for any locally dual-homed endpoints at layer 2. This picture depicts a BGP-based deployment of the Border Gateways with Layer 3 connections toward an external Layer 3 network.
The VXLAN EVPN Multi-Site solution is not limited to interconnecting modern Greenfield Data Centres. It offers two additional flavours for seamless network connectivity with Brownfield Data Centres.
- The first use case concerns a legacy DC devoid of any DCI service. For this option, if the enterprise wants to build two new Greenfield DCs and keep the original one, with all sites interconnected, it is possible to deploy a pair of Border Gateways dual-homed toward the aggregation layer. The VLANs to be extended are mapped to the VNIs extended toward the brand-new VXLAN Multi-Site domain, as shown in Figure 8.
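On the pair of Border Gateways attached to the legacy aggregation layer, each classical VLAN is simply mapped to a Layer 2 VNI. A hedged NX-OS sketch, with illustrative VLAN, VNI and port-channel numbers:

```
! Map the legacy VLAN to a Layer 2 VNI extended by Multi-Site
vlan 100
  vn-segment 30100
!
! Dual-homed trunk toward the Brownfield aggregation layer
interface port-channel10
  switchport mode trunk
  switchport trunk allowed vlan 100
```

No change is required inside the legacy DC itself; the Border Gateway pair handles the VLAN-to-VNI translation at the boundary.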
- The second concerns a traditional OTV solution already in production between two or more historic, traditional Data Centre networks. With that model, it will be possible to leverage OTV, instead of VXLAN EVPN, as the Overlay transport used for Multi-Site (this will require an N7k with M3 modules as the Border Gateway), as shown in Figure 9.
Figure 9 illustrates Brownfield Data Centres leveraging their existing OTV service to interconnect with a Greenfield DC Fabric (DC 3), where OTV is fully controlled and stitched to the local VXLAN EVPN fabric. This option should come in a later phase, when the N7k/M3 supports VXLAN EVPN Multi-Site. This solution will offer coexistence of FHRP filtering and the Layer 3 Anycast Gateway.
- Every DC fabric needs external connectivity to the campus or core network (WAN/MAN).
- The VXLAN EVPN fabric can be connected across Layer 3 boundaries using MPLS L3VPN, Virtual Routing and Forwarding (VRF) IP routing (VRF-lite), or Locator/ID Separation Protocol (LISP).
- Integration occurs for the Layer 3 segmentation toward a WAN/MAN, whether a native or a segmented Layer 3 network. If required, each tenant of the VXLAN EVPN Fabric can connect to a specific L3 VPN network, as was achieved previously with the Multi-Pod and Multi-Fabric models. In this case, in addition to the Multi-Site service, VRF-Lite is handed off at the Border Leaf nodes to be bound to its external segmentation, while the L3 segmentation is also maintained with the remote Fabric, as illustrated in Figure 10 with VXLAN Fabric 1.
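The per-tenant VRF-Lite handoff at the Border Leaf might look like the following hedged sketch; the tenant name, L3 VNI, dot1q tag, addressing and ASN are all illustrative:

```
! Tenant VRF carried as a Layer 3 VNI inside the Fabric
vrf context Tenant-1
  vni 50001
!
! Dot1q sub-interface handing the tenant off to the external PE router
interface Ethernet1/2.10
  encapsulation dot1q 10
  vrf member Tenant-1
  ip address 10.1.1.1/30
!
router bgp 65001
  vrf Tenant-1
    address-family ipv4 unicast
      ! Re-advertise EVPN-learned routes toward the external L3 network
      advertise l2vpn evpn
```

Each additional tenant requires its own sub-interface and BGP session, which is precisely the scaling concern raised in the next point.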
- In deployments with a very large number of extended VRF instances, the number of eBGP sessions or IGP neighbors can cause scalability problems and configuration complexity, usually requiring a two-box solution.
- It is now possible to merge the Layer 3 segmentation with the external Layer 3 connectivity (Provider Edge Router) with a new single-device solution.
- Fabric 2 depicts a complete integration of the Layer 3 segmentation provided by the VXLAN EVPN fabric toward an MPLS L3VPN network using the Fabric Border Provider Edge (BorderPE) feature. This feature of L3 VNI to L3 VPN stitching is supported with the Nexus 7k, ASR1k, and ASR9k.
- Finally, the technology used to replicate BUM traffic is independent per site. One site can run PIM ASM Multicast to replicate BUM traffic inside its own VXLAN domain regardless of the replication mode used on other sites. BUM traffic is carried across the Multi-Site overlay using Ingress Replication, while each site can run whatever the Network team desires, Multicast or Ingress Replication. Although Ingress Replication allows BUM transport across any WAN network, VXLAN Multi-Site will also support Multicast replication in a future software release.
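A given fabric can therefore keep multicast-based replication internally while Ingress Replication is used across the Multi-Site overlay. A hedged NX-OS sketch on a Border Gateway, with illustrative group and VNI values:

```
interface nve1
  member vni 30100
    ! Intra-site BUM replication via PIM ASM (illustrative group address)
    mcast-group 239.1.1.1
    ! Inter-site BUM replication via Ingress Replication
    multisite ingress-replication
```

A remote site could instead configure `ingress-replication protocol bgp` for its intra-site BUM handling; the two choices remain fully independent per site.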
DC Interconnection versus DC Integration
With VXLAN EVPN Multi-Site coming shortly, we will consider three main use cases for interconnecting multiple DC networks:
- Brownfield to Brownfield DC Interconnection
- Brownfield to Greenfield DC Interconnection
- Greenfield to Greenfield DC Interconnection
- For the first, if OTV already exists for DC interconnection requirements, then it’s likely the same DCI approach will continue to be used. There is no technical reason to change to a different DCI solution if the OTV Edge devices are already available. Indeed, OTV offers an easy way to add a new DC into its Layer 2 Overlay network.
- Nonetheless, we never want to extend all VLANs from site to site; hence, even if not often elaborated in DCI documentation, it is important to take routed inter-site traffic into consideration. The new Border Gateway function integrated with VXLAN EVPN Multi-Site offers a great alternative for interconnecting Data Centres for Layer 2 and Layer 3 requirements, maintaining the independence of each DC’s network transport while reducing the failure domain to a single site, with built-in, very granular storm control for Broadcast, Unknown Unicast and Multicast traffic. Although OTV 2.5 relies on VXLAN as the encapsulation method, VXLAN continues to evolve with the necessary embedded intelligence, leveraging the open-standard MP-BGP EVPN control plane to provide a viable, efficient and solid DCI solution comparable with OTV. The VXLAN EVPN Multi-Site architecture represents this evolution. It offers a single integrated solution for interconnecting multiple traditional Data Centres with an embedded and distributed Layer 3 Anycast Gateway on the same Border Gateway devices, extending classical Ethernet frames and routed packets via its Overlay network.
- For the second, as depicted in Figure 8, VXLAN EVPN Multi-Site can be leveraged to interconnect the Brownfield DC by adding a new pair of N93xx-EX (or -FX) series switches running VXLAN Multi-Site with the Border Gateway function to control the VLAN-to-VNI transport. On the remote site, the Greenfield DC leverages the Multi-Site feature that integrates the Layer 2 and Layer 3 services toward the Brownfield DC, all functions offered in a single-box model.
- For the third, with standalone VXLAN EVPN fabrics, the solution is VXLAN EVPN Multi-Site. There is no technical reason to add a dedicated DCI solution beside a modern Fabric when the Layer 2 and Layer 3 services are integrated in the same Border Leaf or Border Spine layer, reducing the failure domain to its smallest diameter, that is to say, the link between the host and its Top of Rack switch.
Naming Convention: Pod, Multi-Pod, Multi-Fabric, Multi-Site
The same naming convention for Pod, Multi-Pod, Multi-Fabric and Multi-Site applies to both VXLAN EVPN Standalone and ACI.
Nonetheless, it is important to specify that the three models discussed in the previous sections differ when comparing VXLAN Standalone with ACI Fabric.
In a VXLAN context deployment, a Pod is seen as a “Fault Domain” area.
The Pod is usually represented by a Leaf-Spine Network Fabric sharing a common Control-plane (MP-BGP) across its whole physical connectivity plant.
A Fabric represents the VXLAN Domain deployed in the same location (same building or campus) or geographically dispersed (Metro distances), sharing a common Control-plane and Data-plane. A single Pod or multiple Pods therefore form a single VXLAN Fabric, or VXLAN Domain.
In VXLAN Standalone, the Multi-Pod architecture is organised with multiple Pods sharing a common Control-plane (MP-BGP) and Data-plane across the whole physical connectivity plant of the same VXLAN Domain. The VXLAN Multi-Pod represents a single Availability Zone (VXLAN Fabric) within a single Region. It is strongly recommended to limit the distances of a single Availability Zone to a Metro area, but technically nothing prevents deploying a VXLAN Multi-Pod across intercontinental distances; the same VXLAN domain is simply stretched across multiple locations. The term Multi-Pod differs from the VXLAN EVPN stretched Fabric because the strength of the whole solution has been improved, as discussed above.
A VXLAN Multi-Fabric deployment offers a Multiple Availability Zone (multiple VXLAN Fabrics) design, one per VXLAN Domain, each managed by its own isolated Control-plane with independent Layer 2 and Layer 3 DCI services. There is no concept of Region per se, as each VXLAN Fabric runs autonomously.
VXLAN Multi-Site offers the same concept as Multi-Fabric, with independent VXLAN Fabrics, but with the extended Layer 2 and Layer 3 services integrated and fully controlled. As a result, multiple autonomous VXLAN domains with integrated L2 & L3 services form the VXLAN Region.