In the VXLAN Multi-fabric design discussed in this post, each data center normally represents a separate BGP autonomous system (AS) and is assigned a unique BGP autonomous system number (ASN).
Three types of BGP peering are usually established as part of the VXLAN Multi-fabric solution:
- Multiprotocol internal BGP (MP-iBGP) EVPN peering sessions are established in each VXLAN EVPN fabric between all the deployed leaf nodes. As previously discussed, EVPN is the intrafabric control plane used to exchange reachability information for all the endpoints connected to the fabric and for external destinations.
- Layer 3 peering sessions are established between the border nodes of separate fabrics to exchange IP reachability information (host routes) for the endpoints connected to the different VXLAN fabrics and the IP subnets that are not stretched (east-west communication). Often, a dedicated Layer 3 Data Center Interconnect (DCI) network connection is used for this purpose. In a multitenant VXLAN fabric deployment, a separate Layer 3 logical connection is required for each VRF instance defined in the fabric (VRF-Lite model). Although either eBGP or an IGP can be used to establish interfabric Layer 3 connectivity, the eBGP scenario is the most common and is the one discussed in this post.
- Per-VRF eBGP peering sessions are frequently used for WAN connectivity to exchange IP reachability information with the external Layer 3 network domain (north-south communication). A common best practice and the recommended approach is to deploy eBGP for this purpose.
When extending IP subnets across separate VXLAN fabrics, you need to consider the paths used for ingress and egress communication. In particular, you should avoid an asymmetric path such as the one shown in Figure 1 (virtual machines VM1 and VM3 are part of the same extended IP subnet, which is advertised in the MAN and WAN). Such a path causes communication failure when independent stateful network services are deployed at each site, because each stateful device sees only one direction of the flow and drops traffic for which it holds no session state.
The solution presented in this article avoids establishing asymmetric traffic paths by following three main design principles, illustrated in Figure 2:
- Traffic originating from the external Layer 3 domain and destined for endpoints connected to a specific VXLAN fabric should always come inbound through the site’s local border nodes (ingress traffic-path optimization).
- Traffic originating from data center endpoints and destined for the external Layer 3 domain should always prefer the outbound path through the local border nodes (egress traffic-path optimization).
- All east-west routed communication between endpoints that are part of different data center sites should always prefer the dedicated Layer 3 DCI connection if it is available.
Ingress Traffic Path Optimization
This article proposes the use of host-route advertisement from the border nodes to the local WAN edge routers to influence and optimize ingress traffic flows. After the WAN edge routers receive the host routes, two approaches are possible:
- You can inject specific host-route information from each VXLAN fabric into the MAN or WAN so that incoming traffic can be optimally steered to the destination. This method is usually applicable when a Layer 3 VPN hand-off is deployed to the WAN, so that host routes can be announced in each specific Layer 3 VPN service. Before adopting this approach, be sure to assess the scalability implications of the solution for consumption of local resources on the data center WAN edge and remote routers and within the WAN provider network.
- In designs where advertising host routes in the MAN or WAN is undesirable because of scalability concerns, or is not possible (as in many cases in which the MAN or WAN is managed by a service provider), you can deploy Locator/ID Separation Protocol (LISP) as an IP-based hand-off technology. LISP deployment is beyond the scope of this paper. For more information about LISP and LISP mobility, see http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/DCI/5-0/LISPmobility/DCI_LISP_Host_Mobility.html.
The rest of this section focuses on the deployment model using a Layer 3 VPN hand-off to the MAN or WAN. The goal is to ensure that the border nodes in each VXLAN fabric advertise to the local WAN edge routers only the host routes for endpoints connected to the local fabric.
As shown in Figure 3, the border nodes at a given site receive host-route information for endpoints connected to remote fabrics through the routing peering established over the dedicated Layer 3 DCI connection. You therefore must ensure that these host routes learned from the Layer 3 DCI are never sent to the local WAN edge routers. Host routes in the WAN should always steer traffic to the fabric to which those destinations are connected, and a less specific route should be used only in the specific WAN isolation scenarios discussed later in this section.
Many different approaches can be used to achieve the behavior shown in Figure 3. The following example simply configures the border nodes in a given fabric so that they do not announce to the local WAN edge routers any host route received through the Layer 3 DCI connection. Sequence 10 of the route map denies routes that are /32 host routes and whose AS path contains the remote fabric AS (65200). Sequence 20 permits all other routes.
! Match any AS path that contains the remote fabric AS (65200)
ip as-path access-list 1 permit "_65200_"
!
! Match any IPv4 host route (/32)
ip prefix-list MATCH-HOST-ROUTES seq 5 permit 0.0.0.0/0 eq 32
!
ip access-list ANY
  10 permit ip any any
!
route-map DENY-HOST-ROUTES-FROM-REMOTE-DCs deny 10
  match as-path 1
  match ip address prefix-list MATCH-HOST-ROUTES
!
route-map DENY-HOST-ROUTES-FROM-REMOTE-DCs permit 20
  match ip address ANY
!
router bgp 65100
  vrf Tenant-1
    neighbor 10.0.1.1
      description Local WAN Edge Device
      remote-as 65300
      address-family ipv4 unicast
        ! Filter the updates announced to the WAN edge router
        route-map DENY-HOST-ROUTES-FROM-REMOTE-DCs out
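Note: An equivalent configuration must be applied to all the border nodes that face the WAN edge in every VXLAN fabric. The AS numbers must be adjusted for each fabric. For example, the AS-path access list on the fabric 2 border nodes must match 65100.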
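The following minimal Python sketch makes the route-map logic explicit. It is an illustrative model, not Cisco code. Clauses are checked in sequence order, all match statements within one clause must succeed, and the first matching clause decides. A route that matches no clause is implicitly denied. The translation of the Cisco underscore delimiter into a Python regular expression is a simplification, and the prefixes are hypothetical. The model also assumes that routes originated inside the local fabric have an empty AS path.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Callable, List


def cisco_aspath_regex(pattern: str) -> "re.Pattern[str]":
    # Simplified translation: a Cisco "_" matches the start or end of the
    # AS path, a space, a comma, or a brace/parenthesis.
    return re.compile(pattern.replace("_", r"(?:^|$|[ ,{}()])"))


@dataclass
class Route:
    prefix: str   # e.g. "10.1.1.10/32" (hypothetical)
    as_path: str  # space-separated ASNs; "" assumed for routes from the local fabric


@dataclass
class Clause:
    action: str                             # "permit" or "deny"
    matches: List[Callable[[Route], bool]]  # ANDed within one clause


def evaluate(route_map: List[Clause], route: Route) -> bool:
    """True if the route map permits the route (first matching clause wins)."""
    for clause in route_map:
        if all(match(route) for match in clause.matches):
            return clause.action == "permit"
    return False  # implicit deny when no clause matches


AS_PATH_1 = cisco_aspath_regex("_65200_")  # ip as-path access-list 1


def match_as_path_1(route: Route) -> bool:
    return AS_PATH_1.search(route.as_path) is not None


def match_host_routes(route: Route) -> bool:
    # prefix-list MATCH-HOST-ROUTES: 0.0.0.0/0 eq 32, i.e. any IPv4 /32
    return ipaddress.ip_network(route.prefix).prefixlen == 32


def match_any(route: Route) -> bool:
    return True  # ip access-list ANY: permit ip any any


DENY_HOST_ROUTES_FROM_REMOTE_DCS = [
    Clause("deny", [match_as_path_1, match_host_routes]),  # sequence 10
    Clause("permit", [match_any]),                         # sequence 20
]

for r in (Route("10.1.1.10/32", "65200"),  # remote host route: denied
          Route("10.1.1.0/24", "65200"),   # remote subnet: permitted
          Route("10.1.1.20/32", "")):      # local host route: permitted
    print(r.prefix, r.as_path or "(local)", evaluate(DENY_HOST_ROUTES_FROM_REMOTE_DCS, r))
```

Modeling each clause as a list of ANDed predicates mirrors how different match types inside one route-map sequence combine.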
Note that other prefixes received from remote sites are still announced to the local WAN edge routers, as long as they are not host routes. This behavior is required mainly to handle the WAN isolation scenario, in which a given fabric loses connectivity to the MAN or WAN because of a dual failure of the WAN edge routers or a WAN or MAN outage. In that case, traffic originating from the external routed domain and destined for an endpoint connected to the WAN-isolated fabric should be steered to a different VXLAN fabric. It follows a less specific route, usually the IP prefix for the subnet to which the destination endpoint is connected. This scenario is shown in Figure 4.
Note: The same considerations apply if VXLAN fabric 1 experiences a WAN isolation scenario.
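This fallback relies on longest-prefix matching in the MAN or WAN. The following Python sketch uses hypothetical prefixes and a simplified routing table in which each entry is labeled only with the fabric that advertised it. It shows that the host route wins while fabric 2 is reachable, and that the subnet route takes over once fabric 2's routes are withdrawn.

```python
import ipaddress
from typing import List, Optional, Tuple

# (prefix, advertising fabric) entries in a simplified, hypothetical WAN table.
Table = List[Tuple[str, str]]


def longest_prefix_match(table: Table, destination: str) -> Optional[str]:
    """Return the fabric that advertised the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    best_len = -1
    best_fabric: Optional[str] = None
    for prefix, fabric in table:
        net = ipaddress.ip_network(prefix)
        # Strictly greater: on equal prefix lengths the first entry wins (simplification).
        if addr in net and net.prefixlen > best_len:
            best_len, best_fabric = net.prefixlen, fabric
    return best_fabric


wan_table: Table = [
    ("10.1.1.0/24", "fabric 1"),   # subnet, advertised by both fabrics
    ("10.1.1.0/24", "fabric 2"),
    ("10.1.1.20/32", "fabric 2"),  # host route for VM2, advertised only by fabric 2
]
print(longest_prefix_match(wan_table, "10.1.1.20"))  # fabric 2

# Fabric 2 loses its WAN connectivity: every route it advertised is withdrawn.
isolated = [entry for entry in wan_table if entry[1] != "fabric 2"]
print(longest_prefix_match(isolated, "10.1.1.20"))  # fabric 1
```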
When virtual machine VM1 migrates to data center DC2, the host-route information is updated across the VXLAN fabrics, as previously described in part 4, “Host Mobility Across Fabrics.” Border nodes BL3 and BL4 in fabric 2 (AS 65200) then start advertising VM1’s host route to the fabric 2 WAN edge router. At the same time, border nodes BL1 and BL2 stop sending that host route to the fabric 1 WAN edge router. They now learn it from fabric 2 over the Layer 3 DCI connection, and the DENY-HOST-ROUTES-FROM-REMOTE-DCs route map filters it. As a result, traffic destined for VM1 and originating from the Layer 3 MAN or WAN is steered directly to fabric 2.
Egress Traffic Path Optimization
After you have optimized the ingress traffic, you should usually do the same for the egress traffic to maintain symmetry for communication with the external Layer 3 domain. As previously mentioned, this optimization is mandatory for deployments across sites with independent stateful network services such as firewalls.
To force the egress traffic to prefer the local WAN connection, you can modify the local-preference value for the prefixes learned through peering with the local WAN edge routers. The local preference is an attribute that routers exchange in the same autonomous system and that tells the autonomous system which path to prefer to reach destinations that are external to it. A path with a higher local-preference value is preferred. The default local-preference value is 100.
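The comparison itself can be sketched in a few lines of Python. This minimal model uses hypothetical path labels and implements only the local-preference step of BGP best-path selection. The real decision process has further tie-breakers, and on Cisco devices the weight attribute is compared before local preference.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_LOCAL_PREF = 100


@dataclass
class Path:
    learned_from: str                 # hypothetical label, e.g. "WAN edge" or "Layer 3 DCI"
    local_pref: Optional[int] = None  # None: no policy set a value, so the default applies


def effective_local_pref(path: Path) -> int:
    return DEFAULT_LOCAL_PREF if path.local_pref is None else path.local_pref


def best_path(paths: List[Path]) -> Path:
    # Only the local-preference step is modeled: the highest value wins.
    # max() keeps the first path on ties; real BGP would move on to later tie-breakers.
    return max(paths, key=effective_local_pref)


candidates = [Path("Layer 3 DCI"), Path("WAN edge", local_pref=200)]
print(best_path(candidates).learned_from)  # WAN edge
```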
In the configuration sample shown here, one of the border nodes in VXLAN EVPN fabric 1 uses the route map INCREASE-LOCAL-PREF-FOR-WAN-ROUTES to assign a higher local preference (200) to all the prefixes received from the local WAN edge routers in data center DC1.
ip access-list ANY
  10 permit ip any any
!
route-map INCREASE-LOCAL-PREF-FOR-WAN-ROUTES permit 10
  match ip address ANY
  set local-preference 200
!
router bgp 65100
  vrf Tenant-1
    neighbor 10.0.1.1
      remote-as 65300
      description eBGP Peering with WAN Edge router in DC1
      address-family ipv4 unicast
        route-map INCREASE-LOCAL-PREF-FOR-WAN-ROUTES in
Local preference is not carried across eBGP sessions. The same external prefixes received through the Layer 3 DCI connection are therefore assigned the default local-preference value of 100, and the local path, with local preference 200, is always preferred, as shown in Figure 5.
In the WAN isolation scenario previously considered, the border nodes in isolated VXLAN fabric 2 would start using the external prefixes received over the Layer 3 DCI connection from the border nodes in VXLAN fabric 1. Inbound and outbound communication between VM2 and the WAN therefore remains symmetrical, as you can verify by comparing Figure 4 with Figure 6.
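The following Python sketch models the egress decision on a border node in fabric 2. It rests on several stated assumptions. Fabric 2's border nodes are assumed to carry an equivalent of INCREASE-LOCAL-PREF-FOR-WAN-ROUTES, reduced here to "set local preference 200 on everything from the WAN edge". The neighbor labels wan-edge and dci and the prefix 198.51.100.0/24 are hypothetical. Withdrawing the WAN edge path reproduces the isolation case.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DEFAULT_LOCAL_PREF = 100
# Hypothetical per-neighbor inbound policy, assumed to mirror
# INCREASE-LOCAL-PREF-FOR-WAN-ROUTES: everything from the WAN edge gets 200.
INBOUND_LOCAL_PREF: Dict[str, int] = {"wan-edge": 200}


@dataclass
class Path:
    prefix: str
    neighbor: str  # "wan-edge" or "dci" (hypothetical labels)


def local_pref(path: Path) -> int:
    # Neighbors without an inbound local-preference policy keep the default.
    return INBOUND_LOCAL_PREF.get(path.neighbor, DEFAULT_LOCAL_PREF)


def best_egress(paths: List[Path]) -> Optional[Path]:
    return max(paths, key=local_pref) if paths else None


paths = [Path("198.51.100.0/24", "dci"), Path("198.51.100.0/24", "wan-edge")]
print(best_egress(paths).neighbor)  # wan-edge (local preference 200 beats 100)

# WAN isolation: the WAN edge path is withdrawn, so the DCI path is used.
print(best_egress([p for p in paths if p.neighbor != "wan-edge"]).neighbor)  # dci
```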
Keeping Inter-Fabric Routing via Layer 3 DCI
The last requirement is to ensure that communication between endpoints belonging to different VXLAN fabrics preferably uses the dedicated Layer 3 DCI connection. This path is desirable because it usually offers lower latency and higher bandwidth than the path through the MAN or WAN.
To meet this requirement, you must ensure that a prefix (host route or IP subnet) belonging to fabric 2 and received in fabric 1 through the WAN edge router is always considered less preferable than the same prefix received through the Layer 3 DCI connection.
Recall the configuration previously discussed to optimize the egress traffic flows: all the routes received by the border nodes from the WAN edge routers have a local-preference value of 200. You therefore must apply a route map (INCREASE-LOCAL-PREF-FOR-REMOTE-FABRIC-ROUTES) to the routing updates received from the remote border nodes. It ensures that all the IP prefixes belonging to the remote fabric, that is, those received in eBGP updates originated by the autonomous system of the remote data center, get a higher local-preference value (300 in the example in Figure 7). The AS-path regular expression ^65200$ matches only routes whose entire AS path is 65200. External prefixes that the remote fabric relays over the DCI therefore keep the default value of 100.
The following sample shows the required configuration.
! Match only routes originated in the remote fabric AS (AS path exactly 65200)
ip as-path access-list 2 permit "^65200$"
!
ip access-list ANY
  10 permit ip any any
!
route-map INCREASE-LOCAL-PREF-FOR-REMOTE-FABRIC-ROUTES permit 10
  match as-path 2
  set local-preference 300
!
route-map INCREASE-LOCAL-PREF-FOR-REMOTE-FABRIC-ROUTES permit 20
  match ip address ANY
!
router bgp 65100
  vrf Tenant-1
    neighbor 172.16.1.1
      remote-as 65200
      description eBGP Peering with BL Node 1 in DC2
      address-family ipv4 unicast
        route-map INCREASE-LOCAL-PREF-FOR-REMOTE-FABRIC-ROUTES in
As a result of this configuration, all intersite communication stays on the Layer 3 DCI connection and uses the MAN or WAN path only if this connection fails completely.
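The following Python sketch combines the two inbound route maps. It computes the local preference that a fabric 1 border node would assign to each path and then selects the best one. The neighbor labels, the prefixes, and the AS paths of WAN-learned routes (65300 followed by the originating AS) are assumptions for illustration. As in the earlier sketches, only the local-preference step of best-path selection is modeled.

```python
import re
from dataclasses import dataclass
from typing import List

DEFAULT_LOCAL_PREF = 100
AS_PATH_2 = re.compile(r"^65200$")  # ip as-path access-list 2 permit "^65200$"


@dataclass
class Path:
    prefix: str
    neighbor: str  # "wan-edge" (10.0.1.1) or "dci" (172.16.1.1); labels are hypothetical
    as_path: str   # space-separated, nearest AS first


def local_pref(path: Path) -> int:
    if path.neighbor == "wan-edge":
        return 200  # INCREASE-LOCAL-PREF-FOR-WAN-ROUTES
    if path.neighbor == "dci" and AS_PATH_2.search(path.as_path):
        return 300  # INCREASE-LOCAL-PREF-FOR-REMOTE-FABRIC-ROUTES, sequence 10
    return DEFAULT_LOCAL_PREF  # sequence 20: permitted without changing local preference


def best_path(paths: List[Path]) -> Path:
    return max(paths, key=local_pref)


# A fabric 2 host route learned both ways (hypothetical prefix and AS paths).
host = [Path("10.2.2.30/32", "wan-edge", "65300 65200"),
        Path("10.2.2.30/32", "dci", "65200")]
# An external prefix that fabric 2 relays over the DCI.
external = [Path("198.51.100.0/24", "wan-edge", "65300"),
            Path("198.51.100.0/24", "dci", "65200 65300")]

print(best_path(host).neighbor)      # dci (300 beats 200)
print(best_path(external).neighbor)  # wan-edge (200 beats 100)
# If the Layer 3 DCI fails, only the WAN edge path remains.
print(best_path([p for p in host if p.neighbor != "dci"]).neighbor)  # wan-edge
```

The anchored expression is the key design choice. An unanchored _65200_ match would also raise relayed external prefixes to 300 and undo the egress optimization.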
VXLAN EVPN Multi-Fabric is a hierarchical network design made up of individual, interconnected fabrics. The design described in this article focuses on the individuality of the data center domains, allowing independent scale and, more important, independent failure domains. The connectivity between the individual fabric domains is independent of the technology used within each data center, so a natural separation is achieved.
Overlay Transport Virtualization (OTV) is the recommended technology to provide Layer 2 extension while maintaining failure containment. With this ability and the additional attributes OTV offers for Data Center Interconnect (DCI), modern data center fabrics can be extended in an optimized fashion.
Different solutions can also be adopted to extend multitenant Layer 3 connectivity across fabrics, depending mainly on the nature of the transport network interconnecting them.
Finally, specific deployment considerations and configuration can be used to make inbound and outbound traffic flows symmetrical. Symmetry is always desirable to optimize access to data center resources that are spread across different fabrics. It becomes mandatory when independent sets of stateful network services (such as firewalls) are deployed in separate fabrics.