Note: When I first presented this solution almost a year ago, we were using alpha software releases. Since then, some features have been improved and some command lines have changed in the released code, so I have revised the original article to include the final CLI syntax. Hope that helps!
Special thanks to Patrice Bellagamba and Marco Pessi for their great support
Enjoy the revolutionary evolution of DCI 🙂
The following article describes the mechanism for detecting and notifying the movement of virtual machines across multiple locations, and for redirecting the end-user traffic accordingly to the new location hosting the active application. When deployed in conjunction with subnet extension across sites (e.g. OTV), this solution is also known as Ingress Path Optimisation using IP localisation, because it automatically and dynamically redirects the original request straight to the DC that hosts the application (VM), reducing hairpinning through the primary site.
You could say that this is exactly what LISP IP Mobility aims to achieve. That’s true; however, the idea here is to go a step further, with a hybrid network rather than a traditional physical deployment, and see how this solution fits the rapid evolution of virtualisation (network and hypervisor). The movement of the virtual machine is detected from inside a virtual container (a.k.a. a Tenant Container in the hybrid cloud service), which is usually separated from the LISP Site gateway by multiple layer 3 hops (e.g. virtual FW, virtual SLB or virtual layer 3 routers).
A tenant container consists of one or more application zones belonging to a co-tenant, with virtual network and security services (virtual SLB, virtual FW…) enabled inside the container according to the service class the client has subscribed to. Hundreds or thousands of tenant containers can coexist in a hybrid cloud, all isolated from each other, usually by L3 segmentation.
LISP IP Mobility: a brief description
For more details, check the section 20 – Locator/ID Separation Protocol (LISP).
In very short, high-level terms, LISP IP Mobility, in conjunction with a DCI LAN extension technique such as OTV, dynamically redirects the traffic from the end-user to the location the Virtual Machine has moved to, while keeping the session stateful.
LISP IP Mobility supports two main functions:
The first main function of LISP Mobility is the LISP Site gateway, which initiates the overlay tunnel between the two endpoints, encapsulating and decapsulating the original IP packet with an additional IP header carrying the LISP locators. Usually, though not exclusively, the tunnel is built between the remote location where the end-user resides and the DC where the application lives. The IP-in-IP encapsulation is performed by the Ingress Tunnel Router (ITR) or by a Proxy Tunnel Router (PxTR). The overlay traffic is decapsulated by the Egress Tunnel Router (ETR) and forwarded to the server of interest; in general, the ETR function sits at the edge of the DC. When a VM migrates from one data centre to another, the move of the machine, identified by its EndPoint Identifier (EID), is detected in real time by the LISP device, and the association between the EID and its Locator (RLOC) is immediately updated in the mapping database. All traffic destined to the VM is then dynamically, automatically and transparently redirected to the new location.
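To make the encapsulation overhead concrete, here is a minimal Python sketch of the 8-byte LISP data header defined in RFC 6830. In a real packet it sits between the outer IP/UDP headers (UDP destination port 4341) and the original packet. This is an illustrative model only, not how the Nexus 7000 builds the header in hardware. The function names are hypothetical, and only the N (nonce), L (locator-status-bits) and I (instance ID) flags are modelled.

```python
import struct
from typing import Optional

LISP_DATA_PORT = 4341  # UDP destination port for LISP-encapsulated data (RFC 6830)
UDP_HEADER_LEN = 8
LISP_HEADER_LEN = 8

# Flag bits in the first byte of the LISP data header (subset of RFC 6830, section 5.3)
FLAG_N = 0x80  # a 24-bit nonce is present
FLAG_L = 0x40  # the locator-status-bits field is in use
FLAG_I = 0x08  # the second word carries a 24-bit instance ID


def lisp_data_header(nonce: Optional[int] = None,
                     instance_id: Optional[int] = None,
                     lsb: int = 0) -> bytes:
    """Build the 8-byte LISP data header (hypothetical helper, simplified)."""
    flags = 0
    nonce_field = 0
    if nonce is not None:
        flags |= FLAG_N
        nonce_field = nonce & 0xFFFFFF
    if lsb:
        flags |= FLAG_L
    if instance_id is not None:
        flags |= FLAG_I
        # Instance ID in the upper 24 bits, 8 locator-status bits in the lowest byte
        second_word = ((instance_id & 0xFFFFFF) << 8) | (lsb & 0xFF)
    else:
        second_word = lsb & 0xFFFFFFFF
    first_word = (flags << 24) | nonce_field
    return struct.pack("!II", first_word, second_word)


def encapsulation_overhead(outer_ip_header_len: int = 20) -> int:
    """Bytes added to the original packet: outer IP + UDP + LISP headers."""
    return outer_ip_header_len + UDP_HEADER_LEN + LISP_HEADER_LEN


if __name__ == "__main__":
    header = lisp_data_header(nonce=0x123456)
    print(header.hex())                     # 8012345600000000
    print(1500 + encapsulation_overhead())  # 1536 bytes with an IPv4 outer header
```

With an IPv4 outer header, a 1,500-byte original packet grows to 1,536 bytes (56 extra bytes with an IPv6 outer header). This is why the transport MTU between LISP sites usually needs some headroom.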
The other important component of LISP is the LISP Multi-hop function. It is unusual to attach the application/server directly to the LISP gateway device (xTR): for security and optimisation reasons, stateful devices such as firewalls, IPS and load balancers are often deployed between the applications and the LISP encap/decap function. Because of the multiple routed hops that may exist between the application (EID) and the border LISP gateway (xTR), it becomes challenging for the LISP xTR to detect the movement of the EID (i.e. the VM). A LISP agent located on the first-hop router (the default gateway of the server of interest) must therefore detect the movement of the EID and notify its local xTR, which in turn applies the required action (e.g. updating the LISP mapping database). The overall service is called LISP Multi-hop, while the function running on the default gateway itself is called LISP First Hop Router (LISP FHR).
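The division of labour between the FHR and the xTR can be sketched as a small Python model. The FHR sees the host directly on its server-facing subnet, checks whether the address falls inside a mobile (dynamic-EID) prefix, and emits a single notification toward its upstream xTR. The xTR, several routed hops away, cannot easily do this on its own. The class and method names are hypothetical, and the detection trigger (a data packet or ARP from the host) is an assumption. The values (CSR-103, 10.20.0.254, 10.17.0.0/24) come from the test described later; the /24 mask on 10.18.0.0 is assumed.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class FirstHopRouter:
    """Toy model of a LISP first-hop router (hypothetical, not an IOS XE API)."""
    name: str
    dynamic_eid_prefixes: list   # server subnets whose hosts may roam between DCs
    upstream_xtr: str            # address of the local xTR that receives notifications
    local_eids: set = field(default_factory=set)

    def host_seen(self, host_ip: str):
        """Call when the FHR sees a packet or ARP from a host on a server-facing VLAN.

        Returns (xtr_address, eid_prefix) for a newly detected mobile EID, else None.
        """
        ip = ipaddress.ip_address(host_ip)
        in_scope = any(ip in ipaddress.ip_network(p) for p in self.dynamic_eid_prefixes)
        if not in_scope or host_ip in self.local_eids:
            return None  # not a mobile EID, or already known locally
        self.local_eids.add(host_ip)
        return (self.upstream_xtr, f"{host_ip}/32")

    def host_left(self, host_ip: str) -> None:
        """Forget an EID that is no longer seen locally (e.g. after a migration)."""
        self.local_eids.discard(host_ip)


if __name__ == "__main__":
    csr103 = FirstHopRouter("CSR-103", ["10.17.0.0/24", "10.18.0.0/24"], "10.20.0.254")
    print(csr103.host_seen("10.17.0.14"))  # ('10.20.0.254', '10.17.0.14/32')
    print(csr103.host_seen("10.17.0.14"))  # None: already notified
```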
For the purpose of this document, workload mobility is addressed by combining LISP mobility with a LAN extension solution (OTV) to offer fully transparent, stateful live migration (zero interruption). This mode is known as LISP Extended Subnet Mode (ESM).
The other great method, out of scope here, uses LISP to allow IP mobility without IP address renumbering and, above all, without any LAN extension technique (this is known as Across Subnet Mode, ASM). This mode is often deployed for cold migration requirements (usually requiring the applications to restart) in a fully routed environment, providing dynamic redirection to the active DC. Avoiding IP address renumbering not only eases the migration of applications from the Enterprise, but also reduces risk and accelerates the deployment of those services.
LISP Evolution requirements
Many tests of LISP IP mobility have already been performed in ESM and ASM modes, but always with both the xTR (encap/decap) and the Multi-hop / First Hop Router (FHR) functions running on physical LISP devices (Nexus 7000 or ASR 1000).
Although few data centres are fully virtualised today, there is an increasing need for data centres to support a mix of physical and virtual network and security services, together with virtual machines and bare-metal platforms, in a hybrid network environment. This is especially important in SP clouds hosting enterprises’ virtual DCs (virtual tenant containers) on a physical multi-tenant infrastructure. As of today, you still need ASIC-based network devices to maintain the same level of performance when handling multiple network overlays.
Each tenant DC is represented by an “autonomous” virtual container (Virtual DC) inside the SP public cloud or the Enterprise private cloud, supporting the tenant’s or subsidiary’s applications respectively, with the SLA level the client subscribed to (e.g. Virtual switching, Virtual Firewall, IPS, Virtual Secure Gateway, multiple security zones, multiple tiers). The goal here is to test the LISP First Hop Router function inside the tenant container: an active session established from a branch office must be dynamically redirected to a distant DC during the stateful hot migration of the application of interest, with zero interruption both between the end-user and the application and between the tiers of the software framework (front-end and back-end). For this first test, we keep the network services as simple as possible.
Test description :
For the tests, LISP is deployed in a hybrid model: the xTR function runs on the physical edge router (Nexus 7000), while the LISP first hop router (FHR) function runs inside the virtual tenant container on the Cisco virtual router CSR 1000v (Cloud Services Router), directly facing the virtual servers on VLAN 1300 (front-end) and VLAN 1301 (back-end). The CSR LISP FHR is the default gateway of the virtual machines. In addition, the same Nexus 7000 platform provides the OTV services from a dedicated Virtual Device Context (VDC). A VDC is a form of virtual chassis on the Nexus 7000, fully abstracted from the physical device.
Traffic destined to the application is encapsulated between the ITR (on the end-user side) and the ETR in the DC currently hosting the active application (VM). The LISP ETR then decapsulates the traffic and forwards it toward the application servers.
The first series of tests uses a tenant container built with two security zones (front-end and back-end), through which a two-tier application is accessible with no access control list of any kind.
The main goal of this test is to detect automatically and dynamically the movement of the EID (the IP identifier of a VM) using the remote LISP FHR (CSR 1000v), which relays the information first to its local LISP ETR and then to the LISP map servers, so that the ITR in the branch office can build the LISP tunnel toward the new DC.
Two data centres, DC-1 and DC-2, both supporting the LISP xTR and LISP Multi-hop functions, serve the same virtual container environment for a specific tenant. The tenant container is provisioned with a dedicated virtual router on each site. Although a CSR 1000v is a virtual machine and could easily migrate from one site to the other, it was decided that only the virtual machines supporting the tenant’s applications migrate between DCs. The LISP FHR routers are duplicated on both sides and offer the same default gateway parameters for the server subnets (here VLAN 1300 and VLAN 1301), which keeps the stateful live migration free of any interruption. However, they use different outbound connectivity to communicate with the outside world (VLAN 10 in DC-1 and VLAN 11 in DC-2).
Note that for this test I am using a single CSR 1000v in each DC; in a production private or public cloud, all network and security services should be redundant.
To keep egress traffic symmetrical and reduce hairpinning, FHRP isolation is enabled on the OTV virtual device context (Nexus 7000), so the default gateway can be replicated on both sides without any issue caused by duplicate identifiers. After its migration, the VM keeps using the same default gateway (same VIP and vMAC). See 24 – Enabling FHRP filter for more details on HSRP filtering.
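The FHRP isolation logic can be sketched as two checks at the OTV edge. First, drop HSRP hellos (UDP port 1985 to 224.0.0.2 for HSRPv1, or to 224.0.0.102 for HSRPv2). Second, avoid advertising the HSRP virtual MAC (0000.0c07.acXX for v1, 0000.0c9f.fXXX for v2) across the overlay, so each site keeps its own active gateway with the same VIP and vMAC. This Python model is illustrative only. On the Nexus 7000, the filtering is typically configured with a VACL and an OTV route-map (see 24 – Enabling FHRP filter). The function names and the HSRP group number used below are hypothetical.

```python
import ipaddress

HSRP_UDP_PORT = 1985
# HSRPv1 hellos go to 224.0.0.2, HSRPv2 hellos to 224.0.0.102
HSRP_HELLO_GROUPS = {ipaddress.ip_address("224.0.0.2"), ipaddress.ip_address("224.0.0.102")}


def hsrp_vmac(group: int, version: int = 1) -> str:
    """Return the well-known HSRP virtual MAC for a group (Cisco dotted notation)."""
    if version == 1 and 0 <= group <= 255:
        return f"0000.0c07.ac{group:02x}"
    if version == 2 and 0 <= group <= 4095:
        return f"0000.0c9f.f{group:03x}"
    raise ValueError("unsupported HSRP version or group number")


def is_hsrp_vmac(mac: str) -> bool:
    """True if a dotted-notation MAC falls in the HSRPv1 or HSRPv2 vMAC range."""
    mac = mac.lower()
    return mac.startswith("0000.0c07.ac") or mac.startswith("0000.0c9f.f")


def drop_at_otv_edge(dst_ip: str, udp_dst_port: int) -> bool:
    """True if the packet is an HSRP hello that must stay local to the site."""
    return udp_dst_port == HSRP_UDP_PORT and ipaddress.ip_address(dst_ip) in HSRP_HELLO_GROUPS


def advertise_over_otv(mac: str) -> bool:
    """True if a locally learnt MAC may be advertised to the remote site."""
    return not is_hsrp_vmac(mac)


if __name__ == "__main__":
    vmac = hsrp_vmac(10)                        # hypothetical HSRP group 10
    print(vmac, advertise_over_otv(vmac))       # 0000.0c07.ac0a False
    print(drop_at_otv_edge("224.0.0.2", 1985))  # True
```

Because both sites answer for the same VIP and vMAC, a migrated VM can keep its existing ARP entry for the gateway and still reach a local default gateway.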
The LISP ETR in turn dynamically notifies the LISP mapping database, which notifies the original LISP ETR. All xTRs thus learn the new location of the endpoint identifier, either automatically or on request. The whole notification process completes very quickly, and full redirection takes a couple of seconds without any interruption.
To test the First Hop Router function, let’s focus on a single-device attachment (no redundant HA configuration is described here).
Four main functions are deployed for LISP Mobility:
- The LISP Site gateway, or egress Tunnel Router (eTR), terminates the tunnel (network overlay) established with a remote Ingress Tunnel Router (iTR). It decapsulates the LISP header and sends the original IP packet toward its final Endpoint Identifier (EID) destination. For a comprehensive LISP workflow, the eTR is assumed to reside inside the DC; this function is usually performed at the core or distribution layer of the DC architecture. The eTR also registers the subnet of the dynamic hosts with the mapping server. For our test, this function runs at the aggregation layer (Nexus 7000). On the Nexus 7000, encapsulation and decapsulation are performed in hardware with no impact on performance.
- The iTR initiates the tunnel toward the remote site. For a comprehensive LISP workflow, the iTR is assumed to reside in the remote site. Upon a ‘no route’ or ‘default route’ event, the iTR queries the LISP mapping system to obtain the association between the unknown EID and its current location, called the RLOC.
- The LISP First Hop Router (FHR) provides the Multi-hop functionality when the endpoint (EID) is not directly routed by the LISP gateway but by an additional router (the default gateway) facing the server. For our test, this function runs inside the tenant container on the Cloud Services Router (CSR 1000v).
- The LISP Mapping Database (M-DB) maintains the location of each EID in real time and comprises two sub-functions: Map-Resolver (MR) and Map-Server (MS).
- The Map-Resolver receives map requests directly from the iTRs asking for the mapping between an Endpoint Identifier and its current location (Locator), and forwards them to the Map-Server system, where all eTRs are registered, thus distributing EID mappings among all LISP devices.
- The Map-Server system receives the map requests and forwards each one to the registered eTR that currently owns the EID. That eTR then replies directly to the original iTR seeking the EID.
These two functions can cohabit on the same physical device, but this is not mandatory; they can also be distributed over dedicated hosts. For our test, both the MS and MR functions run on the same device.
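The register/resolve interplay of a co-located MS/MR can be modelled in a few lines of Python. To keep the sketch short, the mapping system answers the request itself instead of forwarding it to the owning eTR, which is similar to the proxy Map-Reply option a Map-Server can offer. The return value of map_register stands in for the Map-Notify sent to the previous owner. Class names, method names and RLOC labels are hypothetical.

```python
import ipaddress


class MappingSystem:
    """Co-located Map-Server / Map-Resolver (toy model, hypothetical API)."""

    def __init__(self):
        self.registrations = {}  # EID prefix (ip_network) -> RLOC of the registering eTR

    def map_register(self, eid_prefix: str, rloc: str):
        """Map-Server role: an eTR registers or re-registers an EID prefix.

        Returns the previous RLOC, i.e. the eTR that should receive a Map-Notify.
        """
        net = ipaddress.ip_network(eid_prefix)
        previous = self.registrations.get(net)
        self.registrations[net] = rloc
        return previous if previous != rloc else None

    def map_request(self, eid: str):
        """Map-Resolver role: longest-prefix match of an EID against registrations."""
        ip = ipaddress.ip_address(eid)
        matches = [net for net in self.registrations if ip in net]
        if not matches:
            return None  # no mapping: a negative Map-Reply in a real deployment
        best = max(matches, key=lambda net: net.prefixlen)
        return best, self.registrations[best]


if __name__ == "__main__":
    mdb = MappingSystem()
    mdb.map_register("10.17.0.0/24", "rloc-dc1")
    mdb.map_register("10.17.0.14/32", "rloc-dc1")
    print(mdb.map_register("10.17.0.14/32", "rloc-dc2"))  # rloc-dc1 (gets a Map-Notify)
    print(mdb.map_request("10.17.0.14"))  # (IPv4Network('10.17.0.14/32'), 'rloc-dc2')
```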
eTR (egress Tunnel Router) and iTR (Ingress TR) are often referred to as xTR, because each xTR can act as both iTR and eTR depending on the direction of the traffic. When the traffic is encapsulated with a LISP header and sent toward the remote site (Locator), the LISP gateway acts as an iTR; when the encapsulated traffic reaches the Locator in the DC of interest, the LISP gateway acts as an eTR, removing the LISP header and sending the original IP packet toward its final destination (EID).
The LISP mapping database initially used a BGP-based mapping system called LISP Alternative Logical Topology (LISP+ALT); this has since been replaced by a DNS-like hierarchical indexing system called LISP-DDT (Delegated Database Tree), inspired by LISP-TREE.
The Mapping Database:
The Mapping Database is configured with the EID prefixes handled by the tunnel routers. As each site can act as an ETR (including the branch, for the return traffic), the Branch_1 office is also configured with an EID prefix. This is optional and depends on whether the return traffic must be encapsulated. For this test, let’s assume that both inbound and outbound traffic flows are LISP-encapsulated. The three EID prefixes therefore represent the different source addresses concerned by the encapsulation (end-users and server VLANs), each with its network mask and location.
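The effect of declaring EID prefixes on both sides can be illustrated with a short Python check. In this simplified model, a flow is LISP-encapsulated only when both its source and its destination belong to EID prefixes of different sites, so configuring the branch prefix is what makes the return traffic encapsulated too. The server prefixes come from the test (the /24 mask on 10.18.0.0 is assumed); the branch prefix 192.0.2.0/24 is a hypothetical placeholder, and the rule itself is a simplification of real ITR behaviour.

```python
import ipaddress
from typing import Optional

# EID prefixes per site. The branch prefix is a hypothetical placeholder and the
# /24 mask on 10.18.0.0 is an assumption.
EID_DATABASE = {
    "branch": ["192.0.2.0/24"],
    "dc": ["10.17.0.0/24", "10.18.0.0/24"],  # VLAN 1300 and VLAN 1301
}


def eid_site(ip: str) -> Optional[str]:
    """Return the site whose EID prefixes contain this address, if any."""
    addr = ipaddress.ip_address(ip)
    for site, prefixes in EID_DATABASE.items():
        if any(addr in ipaddress.ip_network(p) for p in prefixes):
            return site
    return None


def is_lisp_encapsulated(src: str, dst: str) -> bool:
    """Simplified rule: encapsulate only EID-to-EID traffic between different sites."""
    src_site, dst_site = eid_site(src), eid_site(dst)
    return src_site is not None and dst_site is not None and src_site != dst_site


if __name__ == "__main__":
    print(is_lisp_encapsulated("192.0.2.10", "10.17.0.14"))  # True: user -> application
    print(is_lisp_encapsulated("10.17.0.14", "192.0.2.10"))  # True: return traffic
    print(is_lisp_encapsulated("10.17.0.14", "10.18.0.5"))   # False: routed locally by the FHR
```

Removing the "branch" entry makes the return-traffic check evaluate to False, which mirrors the optional nature of the branch EID prefix.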
Branch Office (ITR):
For this test, the interface loopback 1 address (22.214.171.124) simulates the remote end-user. This address is used to send pings to the EID.
Data Center 1 : LISP ETR – Nexus 7000:
The eTR is responsible for decapsulating the traffic from the LISP tunnel initiated by the ITR (branch office).
Loopback 10 carries the well-known IP address used as the RLOC in DC-1.
The database-mapping describes VLAN 1300 and VLAN 1301, whose EIDs are mobile.
Interface VLAN 10 is the target for notifications from the downstream LISP First Hop Router (CSR-106). Any new EID move within VLAN 1300 or 1301 is notified toward interface VLAN 10.
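The eTR-side bookkeeping can be sketched in Python as a dynamic-EID table. It only accepts notifications arriving on the configured interface (interface VLAN 10 in DC-1) for addresses inside the mobile prefixes, and installs a /32 host route toward the FHR. Later, it flips that route to Null0 when the mapping system reports that the EID has moved to the other DC (the Null0 entry observed in the test results). The class, the method names and the interface label "Vlan10" are hypothetical; the /24 mask on 10.18.0.0 is assumed.

```python
import ipaddress


class DynamicEidTable:
    """Toy model of an eTR dynamic-EID table (hypothetical, not NX-OS internals)."""

    def __init__(self, prefixes, notify_interface):
        self.prefixes = [ipaddress.ip_network(p) for p in prefixes]
        self.notify_interface = notify_interface
        self.host_routes = {}  # EID -> next hop, or "Null0" once the EID lives elsewhere

    def fhr_notification(self, eid: str, in_interface: str, fhr_address: str) -> bool:
        """Handle an EID notification from the downstream FHR; True if accepted."""
        if in_interface != self.notify_interface:
            return False  # notifications are only expected on the FHR-facing interface
        if not any(ipaddress.ip_address(eid) in p for p in self.prefixes):
            return False  # outside the mobile (dynamic-EID) prefixes
        self.host_routes[eid] = fhr_address  # /32 host route toward the FHR
        return True

    def moved_away(self, eid: str) -> None:
        """Map-Notify from the mapping system: the EID is now registered by another eTR."""
        self.host_routes[eid] = "Null0"


if __name__ == "__main__":
    dc1 = DynamicEidTable(["10.17.0.0/24", "10.18.0.0/24"], "Vlan10")
    dc1.fhr_notification("10.17.0.14", "Vlan10", "10.10.0.1")  # from CSR-106 G2
    dc1.moved_away("10.17.0.14")
    print(dc1.host_routes)  # {'10.17.0.14': 'Null0'}
```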
Data centre 1 : LISP First Hop Router on the tenant container CSR:
Prerequisite: before configuring LISP, the Premium license must be activated on the CSR 1000v, running IOS XE 3.11 at minimum.
The database mapping covers VLANs 1300 and 1301, from which the applications (EIDs) will migrate. As soon as a new EID movement is detected, the CSR 1000v First Hop Router notifies its upstream LISP gateway (xTR) using the destination IP address of the outbound VLAN 10 (10.10.0.254), configured on DC-1 aggregation 1 (a.k.a. the eTR).
Let’s also look at the configuration details of the LISP devices in DC-2.
Data Center 2 : ETR – Aggregation Nexus 7000
Data centre 2 : LISP First Hop Router on the tenant container CSR1000v:
The database mapping for 10.17.0.0/24 and 10.18.0.0 covers VLAN 1300 and VLAN 1301, from which the applications of interest (EIDs) will migrate. The CSR 1000v LISP First Hop Router notifies any EID movement to the IP address of its upstream LISP gateway on the outbound VLAN 11 (10.20.0.254 on DC-2 aggregation 1).
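Both FHRs present the same mobile prefixes while pointing at different upstream xTRs, so a short Python sanity check over the per-site parameters can catch mistakes before testing. The addresses come from the test: FHR outbound interfaces 10.10.0.1 and 10.20.0.1, and notification targets 10.10.0.254 and 10.20.0.254. The /24 masks on VLAN 10, VLAN 11 and 10.18.0.0 are assumptions, and the dictionary layout and function names are hypothetical.

```python
import ipaddress

# Per-site parameters from the test; the /24 masks are assumptions.
SITES = {
    "DC-1": {"fhr": "CSR-106", "outbound_if": "10.10.0.1/24",
             "notify_target": "10.10.0.254",
             "dynamic_eids": ["10.17.0.0/24", "10.18.0.0/24"]},
    "DC-2": {"fhr": "CSR-103", "outbound_if": "10.20.0.1/24",
             "notify_target": "10.20.0.254",
             "dynamic_eids": ["10.17.0.0/24", "10.18.0.0/24"]},
}


def check_site(site: dict) -> list:
    """Return a list of problems found in one site's FHR parameters."""
    problems = []
    iface = ipaddress.ip_interface(site["outbound_if"])
    target = ipaddress.ip_address(site["notify_target"])
    if target not in iface.network:
        problems.append(f"{site['fhr']}: notify target {target} not on {iface.network}")
    if target == iface.ip:
        problems.append(f"{site['fhr']}: notify target is the FHR's own address")
    return problems


def same_mobile_prefixes(sites: dict) -> bool:
    """In this ESM test, both sites carry the same dynamic-EID prefixes."""
    return len({frozenset(s["dynamic_eids"]) for s in sites.values()}) == 1


if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, check_site(site) or "ok")
    print("consistent prefixes:", same_mobile_prefixes(SITES))
```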
Test results: a brief workflow
- The end-user (126.96.36.199) sends a request to the application (10.17.0.14).
- The iTR (branch office) intercepts the user request and checks the location (RLOC) of the application (EID) with the Map Server (188.8.131.52).
- The Map Server (MS) replies with the location of the application (EID <=> RLOC mapping), which is the eTR in the primary DC-1.
- The iTR encapsulates the packet and sends it to the RLOC of ETR DC-1 (184.108.40.206).
- The application migrates to the remote site, into a similar tenant container to which VLAN 1300 (Apps) is extended.
- The LISP first hop router in DC-2 (CSR-103, 10.1.1.103) detects the movement of the application and informs its local eTR (10.20.0.254) about the new EID.
- The eTR in DC-2 then registers the new location of the application with the MS.
- The mapping database (MS/MR, co-located here) notifies eTR DC-1 accordingly.
- The eTR in DC-1 updates its table with a Null0 host route (10.17.0.14: Null0).
- The iTR, still unaware of the move, keeps sending traffic to eTR DC-1.
- The original eTR in DC-1 replies to the branch-office iTR with a LISP Solicit Map Request (SMR), indicating that it no longer owns this EID and that the iTR must obtain the new mapping from the Map Resolver (MR).
- The iTR sends a Map Request to the Map Resolver and receives the reply directly from the eTR in DC-2 as the new RLOC for 10.17.0.14; the traffic from the iTR is consequently redirected to the RLOC in DC-2 (eTR DC-2).
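The workflow above can be replayed with a compact Python simulation. It is a deliberately simplified model under stated assumptions. The mapping system answers Map-Requests itself, and the Map-Notify to the old eTR is collapsed into the registration call. The ping that triggers the SMR is counted as lost, which matches the behaviour observed in the test results. All class names and RLOC labels are hypothetical.

```python
class Etr:
    """eTR with its local EIDs and the Null0 entries left behind by migrations."""

    def __init__(self, rloc: str):
        self.rloc = rloc
        self.local_eids = set()
        self.null0 = set()

    def receive(self, eid: str) -> str:
        if eid in self.local_eids:
            return "delivered"   # decapsulate and forward toward the EID
        if eid in self.null0:
            return "smr"         # EID moved away: answer the iTR with an SMR
        return "dropped"


class MappingSystem:
    """Co-located MS/MR keeping one owner eTR per EID."""

    def __init__(self):
        self.owner = {}

    def register(self, eid: str, etr: Etr) -> None:
        old = self.owner.get(eid)
        self.owner[eid] = etr
        etr.local_eids.add(eid)
        etr.null0.discard(eid)
        if old is not None and old is not etr:
            # Map-Notify to the previous owner, which installs a Null0 host route
            old.local_eids.discard(eid)
            old.null0.add(eid)

    def resolve(self, eid: str) -> Etr:
        return self.owner[eid]


class Itr:
    """Branch-office iTR with a map cache."""

    def __init__(self, mapping: MappingSystem):
        self.mapping = mapping
        self.map_cache = {}

    def ping(self, eid: str) -> str:
        if eid not in self.map_cache:
            self.map_cache[eid] = self.mapping.resolve(eid)  # Map-Request on a cache miss
        result = self.map_cache[eid].receive(eid)
        if result == "smr":
            self.map_cache[eid] = self.mapping.resolve(eid)  # refresh after the SMR
            return "timeout"                                 # this echo is lost
        return "ok" if result == "delivered" else "timeout"


if __name__ == "__main__":
    mapping, etr_dc1, etr_dc2 = MappingSystem(), Etr("eTR-DC-1"), Etr("eTR-DC-2")
    mapping.register("10.17.0.14", etr_dc1)             # VM initially in DC-1
    itr = Itr(mapping)
    print([itr.ping("10.17.0.14") for _ in range(3)])  # ['ok', 'ok', 'ok']
    mapping.register("10.17.0.14", etr_dc2)             # FHR DC-2 -> eTR DC-2 -> MS
    print([itr.ping("10.17.0.14") for _ in range(4)])  # ['timeout', 'ok', 'ok', 'ok']
    print(itr.map_cache["10.17.0.14"].rloc)             # eTR-DC-2
```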
Pings are sent from the remote-site end-user (loopback 10, 220.127.116.11).
Before the move
Initially, the application of interest (10.17.0.14) is located in DC-1. On eTR DC-1, the LISP dynamic-EID table shows EID 10.17.0.14 as notified by the LISP FHR CSR-106 from interface G2 (10.10.0.1).
1- The LISP eTR located in DC-1 (DC-1-Agg1) shows several EIDs, including silent hosts, belonging to VLANs 1300 and 1301. The local EIDs, including our EID of interest (.14), are learnt from the CSR 1000v interface 10.10.0.1.
2- On the LISP FHR in DC-1 (CSR-106), the EID of interest is learnt from its local interface G1 (connected to VLAN 1300).
3- On the remote site DC-2, the dynamic-EID summary on the LISP ETR (DC2-Agg1) shows several EIDs, including silent hosts, located in DC-2 on different VLANs. These local EIDs are learnt from the CSR 1000v interface 10.20.0.1.
4- The LISP FHR in DC-2 (CSR-103) shows additional local stations belonging to the extended VLAN 1300 as well as to VLAN 1301.
5- The mapping database (Map Server) shows that the last registered RLOC for 10.17.0.14 is in DC-1 (eTR 18.104.22.168); traffic is therefore sent to this RLOC until a migration occurs.
6- The native routing table of the branch office knows nothing about the network 10.17.0.0/24, so the traffic is encapsulated and routed by LISP.
Note that if the remote site had an explicit route to the final destination 10.17.0.14, no LISP encapsulation would take place. LISP is invoked when only the default route (no explicit route) matches the destination.
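The forwarding decision described in this note can be sketched in Python. An explicit (non-default) route wins, and the packet is forwarded natively. Otherwise, the iTR looks up its map cache (longest match) and encapsulates, or sends a Map-Request when no mapping is cached. The function name, the branch routing table and the RLOC label are hypothetical. Real ITR implementations also apply further conditions, for example whether the source is a local EID.

```python
import ipaddress


def itr_decision(dst: str, routes: list, map_cache: dict) -> tuple:
    """Return (action, rloc) for a packet to dst.

    routes: prefixes in the native routing table ("0.0.0.0/0" is the default route).
    map_cache: EID prefix -> RLOC learnt from the mapping system.
    """
    addr = ipaddress.ip_address(dst)
    explicit = [p for p in map(ipaddress.ip_network, routes)
                if p.prefixlen > 0 and addr in p]
    if explicit:
        return ("native", None)          # a specific route exists: no LISP
    hits = [(ipaddress.ip_network(p), rloc) for p, rloc in map_cache.items()
            if addr in ipaddress.ip_network(p)]
    if hits:
        _, rloc = max(hits, key=lambda hit: hit[0].prefixlen)
        return ("lisp-encap", rloc)      # encapsulate toward the cached RLOC
    return ("map-request", None)         # only the default route: ask the mapping system


if __name__ == "__main__":
    branch_routes = ["0.0.0.0/0", "192.0.2.0/24"]  # hypothetical branch routing table
    print(itr_decision("10.17.0.14", branch_routes, {}))
    # ('map-request', None)
    print(itr_decision("10.17.0.14", branch_routes, {"10.17.0.14/32": "eTR-DC-1"}))
    # ('lisp-encap', 'eTR-DC-1')
    print(itr_decision("10.17.0.14", branch_routes + ["10.17.0.0/24"], {}))
    # ('native', None)
```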
However, the LISP map cache of the remote iTR knows that EID 10.17.0.14 was last seen behind the ETR located in DC-1 (22.214.171.124), associated with the locator 126.96.36.199 in DC-1.
7- Although there is no native route toward the final destination, pings from the end-user (loopback 10, 188.8.131.52) to the EID of interest succeed because the traffic is routed via LISP. The iTR knows from its map cache that, to reach the final destination, it must encapsulate the traffic of interest and send it toward the RLOC located in DC-1. This explains why there is no packet loss in this first series of pings.
Migrate the Virtual Machine from DC-1 to DC-2
8- Just after EID 10.17.0.14 has migrated to DC-2, the CSR 1000v running the LISP FHR in DC-2 immediately detects the move; it notifies its local eTR about the EID of interest, which in turn notifies the Map Server. This detection and notification take a couple of seconds.
The original LISP FHR in DC-1 (CSR-106) is updated accordingly, and EID 10.17.0.14 immediately disappears from it.
It now appears on the FHR located in DC-2 (CSR-103).
9- The eTR DC2-Agg1 has been notified of the EID by its local FHR.
10- DC1-Agg1 has been notified accordingly and now shows a “Null0” entry for the host route 10.17.0.14.
11- The MS shows which RLOC last registered the EID 10.17.0.14/32 (184.108.40.206, the eTR in Data Center 2), 31 seconds ago.
12- The branch-office iTR is not aware of the EID move until it sends traffic to the original RLOC; until then, its LISP map cache still shows the original RLOC 12.1.19.
13- The first ping sent to the original RLOC causes that eTR to send a Solicit Map Request (SMR) to the iTR, which then issues a new map request and updates its LISP map cache with the new location. That explains why the first ping times out while all following pings succeed.
14- The ITR LISP map cache is therefore updated with the map source being eTR DC-2 (220.127.116.11)
This series of tests described the process, components and configuration required to reach, across geographically distributed sites, applications running in a virtual environment within a tenant container (also known as a virtual DC).
An application in a virtual tenant container can migrate from one DC to another without any business interruption, while the user traffic is automatically redirected to the new location within a few seconds:
The tests demonstrated that the movement of virtual machines between fully virtual routed containers can be detected in real time, with the upstream physical LISP gateway notified and, through it, the LISP Map-Server updated with the new location of the EID. The physical Nexus 7000 can centralise the encapsulation and decapsulation of user traffic into LISP tunnels, while the detection of the movement is performed directly inside the tenant container by the CSR 1000v.
This approach offers two main benefits:
- It allows a very granular deployment of tenant containers, preserving the autonomy of each virtual DC and selectively enabling the detection of new virtual machines to trigger a dynamic redirection of the traffic flow from a branch office to the DC hosting the active virtual machine.
- It provides great flexibility to add any additional layer 3 device bumped in the wire between a fully virtual container (and/or bare-metal server) and the core layer of the data centre, while still detecting any movement of EIDs between virtual containers.