27 – Bis – Path Optimisation with ASA cluster stretched across long distances – Part 2

How can we talk about security service extension across multiple locations without elaborating on path optimisation? :)

Path Optimization with ASA Cluster stretched across long Distances

In the previous post, 27 – Active/Active Firewall spanned across multiple sites – Part 1, we demonstrated the integration of ASA clustering in a DCI environment.

We discussed the need to keep active sessions stateful while the machines migrate to a new location. However, after the move, the original DC still receives new requests from outside and must send them across the broadcast domain (via the extended layer 2) to reach the destination endpoint in the distant location. This is the expected behavior: the same IP broadcast domain is extended across all sites concerned, so the IP network (WAN) is natively unaware of the physical location of the end-node. Routing simply follows the best path, at the lowest cost, via the most specific route. As a result, the traffic “ping-pongs” from site to site, adding pointless latency that may affect the performance of applications distributed across long distances.

With the increasing demand for dynamic workload mobility across multiple data centers, it becomes important to locate the DC where the application resides in order to dynamically redirect the traffic to the right location, transparently for the end-users as well as the applications and without any interruption to the workflow.

The goal is to dynamically inform the upward layer 3 network of the physical location of the target (virtual machines, applications, HA clusters, etc.), while maintaining the active sessions stateful and allowing new sessions to be established directly to the site where the application resides. Maintaining the sessions stateful imposes one-way symmetry, meaning that the return traffic must hit the original firewall that owns the session. However, all new sessions must take advantage of the path optimization using dynamic redirection through only local firewalls.

There are two ways to efficiently achieve these requirements:

  1. The first is to use LISP mobility, as discussed in post 23 – LISP Mobility in a virtualized environment (update). LISP establishes a network overlay between Tunnel Routers (xTR) in the path between distant sites. When a machine is moved between two data centers, a notification is sent to the LISP mapping database, which maps the end-nodes to their last known location and in turn notifies the LISP Egress Tunnel Routers (eTR). The session is therefore LISP-encapsulated toward the data center where the application of interest currently resides. The underlying layer 3 network is not affected by any dynamic configuration change, which is a huge added value.
  2. The second is a new network service called LISP IGP Assist. Actually, it’s a sub-function of a LISP First Hop Router (FHR), which is leveraged to detect the machine’s movement and redistribute the LISP route of the specific end-node (/32) into the IGP routing tables on each site. With a more specific route, the routed traffic is natively directed to the DC where the application of interest exists.

1. ASA Cluster and LISP Mobility

In the following scenario, an ASA cluster built with four ASA units, two on each site, is stretched across DC-1 and DC-2. The ASA cluster is configured in Spanned Mode (cLACP). Note that it could be configured in Individual Mode; the configuration and behavior of the OTV and LISP services would be exactly the same. OTV is used here to extend the ASA Cluster Control Link (CCL) as well as the data VLANs of the compute layer affected by the movement of the VMs. We assume that the ASA cluster is configured in Transparent Mode: at the time of writing, with ASA cluster 9.2(4), routed mode is not yet qualified in Spanned Mode in a DCI scenario.

Let’s go step by step to better dig through the end-to-end workflow.

An end-user from a branch office requests access to an application that resides in DC-1 (1).

The request is intercepted by the Ingress Tunnel Router (iTR).

The iTR sends a Map-Request (2) to the mapping database, via the Map Resolver, asking for the location of the application.

The LISP Map Server (MS) relays the request to the eTR in DC-1 where the application resides. This redirection via the eTR aims to validate the path between the relevant eTR and the iTR.

Consequently, the eTR in DC-1 replies to the iTR that Subnet A is located behind the eTR of DC-1 (3).

 

Now the iTR records the location of the specific end-node (application). It can now encapsulate the user request with a LISP header and route it toward the eTR in DC-1 (4).

The eTR in DC-1 strips off the LISP header and routes the packet to the next hop to reach the subnet where the application exists.

The first level of the switch computes its LACP hashing algorithm, selecting the link to ASA-2 and forwarding the packet accordingly.
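
How a switch pins a flow to one member link can be sketched as a hash over the flow's addresses and ports, taken modulo the number of links. This is only an illustration: the real hash inputs and algorithm are platform- and configuration-dependent, and the CRC32 function, the addresses and the link names below are assumptions, not taken from any real device.

```python
import zlib


def select_link(flow, links):
    """Pick one member link of a port-channel for a flow.

    flow is a (src_ip, dst_ip, src_port, dst_port) tuple. CRC32 is an
    assumed stand-in for the switch's real, platform-specific hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    return links[zlib.crc32(key) % len(links)]


# Hypothetical port-channel toward the two local ASA units in DC-1.
links = ["to-ASA-1", "to-ASA-2"]
flow = ("198.51.100.7", "10.1.1.10", 40312, 443)

first = select_link(flow, links)
again = select_link(flow, links)  # same flow, same link every time
```

Every packet of a given flow lands on the same link. Unless the hash is symmetric, the return direction (source and destination swapped) may be hashed to a different unit, which is one reason the cluster needs a way to redirect packets to the session owner.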

ASA-2 becomes the owner of the session, encodes its identifier into the TCP SYN cookie and forwards the request to the server (5).

 

The application “hot” migrates to DC-2 (6) and replies to the user request (8) while maintaining a stateful session.

LISP Control Plane: In the meantime, the LISP FHR detects the presence of the new virtual machine (the application) and notifies its local eTR (7) in DC-2 about this new Endpoint Identifier (EID).

Subsequently, the eTR in DC-2 informs (10) the Map Server (M-DB) about the new location of the EID. The Map Server then signals the original eTR in DC-1 about the change of location. The eTR in DC-1 updates its LISP table accordingly with a “Null0” entry for the end-node of interest.

Data Plane: Consequently, ASA-3 receives the response from the server (8), decodes the TCP SYN cookie, learns which unit owns the TCP session and redirects the packet toward ASA-2 over the CCL (9).

The eTR in DC-1 encapsulates the IP packet for the iTR (11), which strips off the LISP header and routes the packet to the end-user.

One-way symmetry has been established via the CCL. The session has not been interrupted; the movement of the application has been transparent for the end-user and the application.
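
The owner and forwarder behavior described above can be sketched with a toy model. This is not ASA code: the shared dictionary stands in for the owner information carried in the SYN cookie, the flow identifier is treated as direction-independent, and the other cluster roles and state replication are deliberately left out. All names and addresses are hypothetical.

```python
class Cluster:
    """Toy model of flow ownership in a firewall cluster (not ASA code)."""

    def __init__(self, units):
        self.units = set(units)
        self.owner_of = {}  # flow id -> owning unit (stands in for the SYN cookie)

    def receive(self, unit, flow):
        """Return the hops a packet takes after arriving at `unit`."""
        if unit not in self.units:
            raise ValueError(f"unknown unit {unit}")
        # The first unit that sees a flow becomes its owner.
        owner = self.owner_of.setdefault(flow, unit)
        if owner == unit:
            return [unit]
        # Any other unit acts as a forwarder and redirects over the CCL.
        return [unit, "CCL", owner]


cluster = Cluster(["ASA-1", "ASA-2", "ASA-3", "ASA-4"])
flow = ("198.51.100.7", 40312, "10.1.1.10", 443)

request_path = cluster.receive("ASA-2", flow)  # new session in DC-1
reply_path = cluster.receive("ASA-3", flow)    # reply after the move to DC-2
new_session_path = cluster.receive(
    "ASA-3", ("198.51.100.9", 51000, "10.1.1.10", 443)
)
```

In this sketch the existing session keeps crossing the CCL to reach ASA-2, while a session opened after the move is owned by the local unit and never leaves DC-2.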

The end-user continues using the application and sends the next request to it.

 

The iTR has cached the original location of the application in its local database. Hence, it still encapsulates the IP packet for the eTR in DC-1 (12).

The eTR in DC-1 has previously been notified that it no longer owns this endpoint identifier, so it replies to the iTR with a “Solicit Map Request (SMR)” (13). This prompts the iTR to ask the Map Resolver (14) for the new location of the application (EID). The iTR then receives the new location (15): the egress tunnel router (eTR) located in DC-2.

 

As a result, the iTR encapsulates the end-user’s request with a LISP header and sends it toward the eTR in DC-2 (15).
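
The cache refresh triggered by the SMR can be sketched as follows. This is a simplified model under stated assumptions: the class names, the EID address and the locator names are hypothetical, the Map Server and Map Resolver are collapsed into a single object, and no LISP packet format is modeled.

```python
class MappingSystem:
    """EID -> locator database (stands in for Map Server and Map Resolver)."""

    def __init__(self):
        self.db = {}

    def register(self, eid, locator):
        self.db[eid] = locator

    def resolve(self, eid):
        return self.db[eid]


class ITR:
    def __init__(self, mapping_system):
        self.mapping_system = mapping_system
        self.cache = {}

    def locator_for(self, eid):
        # Query the mapping system only on a cache miss.
        if eid not in self.cache:
            self.cache[eid] = self.mapping_system.resolve(eid)
        return self.cache[eid]

    def on_smr(self, eid):
        # An SMR makes the iTR send a fresh Map-Request for this EID.
        self.cache[eid] = self.mapping_system.resolve(eid)


class ETR:
    def __init__(self, name):
        self.name = name
        self.local_eids = set()

    def receive(self, eid, itr):
        """Deliver locally, or solicit a new Map-Request if the EID has left."""
        if eid in self.local_eids:
            return "delivered"
        itr.on_smr(eid)
        return "smr"


ms = MappingSystem()
etr_dc1, etr_dc2 = ETR("eTR-DC1"), ETR("eTR-DC2")
etrs = {etr.name: etr for etr in (etr_dc1, etr_dc2)}
itr = ITR(ms)
eid = "10.1.1.10"

etr_dc1.local_eids.add(eid)
ms.register(eid, etr_dc1.name)
first = itr.locator_for(eid)             # learned from the mapping system

# The application moves: DC-2 registers the EID, DC-1 no longer owns it.
etr_dc1.local_eids.discard(eid)
etr_dc2.local_eids.add(eid)
ms.register(eid, etr_dc2.name)

stale = itr.locator_for(eid)             # the cache still points to DC-1
outcome = etrs[stale].receive(eid, itr)  # DC-1 answers with an SMR
refreshed = itr.locator_for(eid)         # the cache now points to DC-2
```

The stale cache entry is corrected by the first packet that reaches the old site, so the iTR needs no timer or polling to learn about the move.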

ASA-3 receives the IP packets and determines that the owner is ASA-2, to which it redirects the flow via the extended CCL. ASA-2 processes the packet accordingly and sends it toward the next L3 hop, which forwards it to the machine supporting the application (16).

Thanks to FHRP filtering, the application responds to its local default gateway (17), and the return packet is hashed to ASA-3. The LACP hashing algorithm could also have selected ASA-4; however, the behavior and final redirection via the CCL would have been exactly the same.

ASA-3 notices again that the owner is ASA-2, to which it redirects the packet via the CCL.

The eTR encapsulates the IP packet and forwards it to the iTR (18).

The session is maintained stateful with zero interruption.

The drawback of keeping active sessions stateful is that their traffic hairpins for the duration of the session. Thus, a question that comes to mind is:

  • Is the application response time still as good as before?

If the distance is very long, the active sessions may be affected from the moment the application moves. Nonetheless, the impact is limited to the sessions that are currently active and stateful.

 

 

For any new sessions, the traffic will be directed to the DC where the application resides. In addition, the traffic will be validated and inspected using one of the local ASA members (20).

Hence, while the current sessions are maintained stateful through the firewall that owns the active data flow, any new sessions will take the shortest path to reach the application with their respective local firewalls treating the flow accordingly. The application response time will be improved due to the distance between the end-user and the application being shortened and hairpinning reduced, if not eliminated.

2. ASA Cluster and LISP IGP Assist

Some enterprise network managers may think that LISP IP mobility is a bit complex or challenging to deploy. In reality, there is nothing very complex per se, as detailed here, but there are indeed several different LISP functions running on different platforms that require different configurations, including the LISP iTR configuration at the branch offices. Another point to take into consideration is that the complexity depends on whether the enterprise owns the WAN edge layer or relies on a layer 3 managed service.

Whatever the original reason for not deploying LISP VM mobility, IGP Assist may be a nifty trick to consider as a lighter alternative to LISP Mobility. Indeed, we just need to leverage the basic function of a LISP FHR: detecting any movement on the EID side and relaying this change to the IGP routing tables. Currently, IGP Assist is only supported on the Nexus 7k.

If you remember the great function of Route Health Injection (rest in peace) available on some SLB devices (e.g. CSM, ACE), it will be easy for you to deploy IGP Assist.

The main concept is to inject a /32 host route upstream toward the Internet access (usually intranet or extranet) in order to direct traffic, using a more specific route, to the site where the mobile application currently resides.

Let’s look, step by step, at how this is achieved, applying the same scenario that we used with LISP mobility, still with OTV as the DCI solution deployed for the LAN extension.

From a routing point of view, this is traditional IGP access (Enterprise Intra-DC L3 Routing) connected to one or multiple ISPs built with an E-BGP network (L3 Internet Core).

By default, the traffic to the application (subnet A) is directed to DC-1 with the most specific route /25.

Usually, for an HA cluster failover, the network manager performs a manual redirection by withdrawing the /25 of the subnet of interest from DC-1; consequently, all traffic to that subnet is re-routed to DC-2 within a few seconds (directed to the next most specific route, the /24).

And that’s what IGP Assist is aiming to achieve automatically with host-based granular redirection (/32).
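
The difference between withdrawing the /25 and injecting a /32 can be sketched with a longest-prefix match over a small routing table. The prefixes and host addresses are hypothetical, and the table is a plain dictionary rather than a real RIB.

```python
import ipaddress


def next_hop(routes, destination):
    """Longest-prefix match: return the site of the most specific matching route."""
    dest = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), site)
        for prefix, site in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    return max(matching, key=lambda entry: entry[0].prefixlen)[1]


# Default state: DC-1 advertises the /25, DC-2 the covering /24.
routes = {"10.1.1.0/25": "DC-1", "10.1.1.0/24": "DC-2"}
default_site = next_hop(routes, "10.1.1.10")

# Manual failover: withdrawing the /25 moves the whole subnet to DC-2.
manual = dict(routes)
del manual["10.1.1.0/25"]
failover_site = next_hop(manual, "10.1.1.10")

# Host-based redirection: a /32 moves only the migrated host.
assisted = dict(routes)
assisted["10.1.1.10/32"] = "DC-2"
moved_host_site = next_hop(assisted, "10.1.1.10")
other_host_site = next_hop(assisted, "10.1.1.20")
```

With the /32 in place, only the migrated host is reached through DC-2; the other hosts of the subnet keep following the /25 toward DC-1.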

The virtual machine (end-node) supporting the application migrates (“hot” live migration) from DC-1 to DC-2 across the LAN extension (1). The application server continues replying to the original request maintaining the whole session stateful without any interruption.

The new EID is detected on DC-2. IGP Assist notifies all LISP FHRs (2), or more precisely, the redundant LISP FHR in DC-2 as well as the two LISP FHRs in DC-1.

Meanwhile, the LISP FHR redistributes the LISP routes into the IGP routing table (3).

As a result, once the IGP control plane has processed this notification, the data traffic hits the local ASA-3 in DC-2, which notices that the owner of the stateful session is ASA-2. ASA-3 immediately redirects the packet flow over the CCL to ASA-2, which in turn forwards it toward the requestor through the WAN edge layer in DC-1.

Following the redistribution of the LISP routes into the IGP, IGP Assist installs a /32 LISP interface route for the host (the EID, i.e. the virtual machine supporting the application concerned) in DC-2, where the EID was detected. In DC-1, the original location of the application, it removes the /32 LISP interface route of the same host, which has moved away.

Consequently, the host route (/32) is propagated to the core via the DC-2 WAN edge with a more specific route, directing the traffic to its destination directly through DC-2.
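
The install and remove steps can be sketched as a single event handler. This is a minimal model, assuming one set of host routes per site; the function name, the site names and the address are hypothetical, and no real LISP or IGP signaling is represented.

```python
def on_eid_detected(host_routes, eid, site):
    """Install the /32 at the detecting site and remove it everywhere else."""
    route = f"{eid}/32"
    for name, routes in host_routes.items():
        if name == site:
            routes.add(route)
        else:
            routes.discard(route)


def snapshot(host_routes):
    """Return a sorted, comparable copy of the per-site host routes."""
    return {site: sorted(routes) for site, routes in host_routes.items()}


host_routes = {"DC-1": set(), "DC-2": set()}

on_eid_detected(host_routes, "10.1.1.10", "DC-1")  # initial location
before_move = snapshot(host_routes)

on_eid_detected(host_routes, "10.1.1.10", "DC-2")  # detected after the migration
after_move = snapshot(host_routes)
```

At any time only one site holds the /32 for a given host, so the core never sees two competing host routes for the same EID.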

ASA clustering automatically keeps the current session stateful (7). As with the LISP IP Mobility described previously, when the ASA cluster is deployed in conjunction with LISP IGP Assist, the path to the active DC where the application resides is optimized for all new sessions (6), while the current sessions are maintained stateful through the original owner.

The drawback of this solution is that it injects a /32 host route into the L3 core. If the enterprise owns a block of IP address space, or if the L3 core is an intranet using private IP addressing, this is straightforward. However, it might be challenging, if not impossible, to get multiple ISPs to accept a /32. Thus, in addition to the negotiation process, it may require a mechanism such as AS path prepending or BGP conditional advertisement to identify the site where the host route is active and to redirect the traffic accordingly.

Conclusion

With the rapid growth of virtual machines and the easy mobility that comes from abstracting the OS from the physical host, it has become common for the resources supporting an application to be spread over multiple locations that may be separated by long distances.

It is therefore important that the IT manager be able to dynamically redirect, in real time and in a very granular fashion, the user data flow to the site where the application resides.

In the meantime, stateful firewalls require the return traffic to go back through the firewall that owns the session (one-way symmetry), meaning that the “ping-pong” effect stretches the data path further as the multi-tier application moves from one site to another.

In order to reduce the pointless latency from the hairpinning workflow as much as possible while achieving elasticity with the virtual DC through stateful live migration, we can leverage multiple network and security services to work in conjunction. Nevertheless, these solutions are not tied to each other. Hence, any one of them can be enabled on its own for a specific use, for example if you think that reducing the latency would bring no benefit to your application’s performance (e.g. metro distances).

What follows is a high-level summary of the different options with their respective added values:

Option A:

1 – ASA clustering spanned across multiple locations

  • All ASA units are active.
  • Sessions are maintained stateful with redirection via the CCL and without any artificial cheats such as source NAT.
  • New sessions are treated locally.
  • The maximum latency for the Cluster Control Link (CCL) is 20 ms round trip (theoretically 2000 km maximum between two sites[1]).
  • All configurations are synchronized between all ASA units.
  • Not limited to two sites.
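
The 2000 km figure quoted for the CCL follows from simple arithmetic, sketched here. The function name is hypothetical, and only propagation delay is counted, using the 200 km per millisecond assumption from the footnote.

```python
def max_site_distance_km(rtt_budget_ms, km_per_ms=200.0):
    """One-way distance reachable within a round-trip latency budget.

    Only propagation delay is counted; queuing, serialization and
    equipment latency would reduce the usable distance.
    """
    one_way_ms = rtt_budget_ms / 2
    return one_way_ms * km_per_ms


ccl_limit_km = max_site_distance_km(20)  # 20 ms round trip -> 10 ms one way
```

In practice the usable distance is shorter, because the fiber path is rarely a straight line and every device in the path adds its own delay.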

2 – OTV for DCI LAN extension
(A long list. The following items constitute the most important ones only.)

  • Failure domain containment
    • Eliminate flooding of unknown unicast
    • STP domain confined inside the physical data center
    • Control plane independence
    • Site independence
  • Dual-homing with independent paths
  • Reduced hairpinning
    • Internal site behaves like any traditional switch with local lookup and forwarding
    • FHRP isolation support
  • Fast convergence
  • Transport-agnostic
    • IP, MPLS, Dark Fiber, etc.
    • IP multicast transport or unicast-only
  • ARP caching
  • VLAN translation
  • Control plane learning
  • Native multi-homing (no STP)
  • Load balancing
    • Today: VLAN-based
    • Future: flow-based

3 – LISP IP mobility for ingress path optimization

  • For our purpose of a “hot” live migration across two sites, we leveraged LISP mobility in Extended Subnet Mode or ESM (LAN extension).
  • Provides dynamic redirections for ingress traffic toward the site where the active application resides.
    • Requires an additional egress redirection such as FHRP localization.
  • Offers host-based granular and dynamic signal notification.
  • Doesn’t impose any configuration change on either the IP core or in the DNS service.
  • Allows end-point migration to a new location without changing the assigned IP address.
  • Detects dynamically any IP movements based on data plane events.
  • Selective prefix allowed to roam.
  • Works in conjunction with any type of LAN extension.
  • Endpoint-agnostic
    • No special agent running on the host required.

4 – FHRP isolation for egress path optimization

  • Available for any network redundancy protocols (VRRP, HSRP).
  • Offers same active default gateway in different locations.
  • Optimizes the outbound traffic (server to client).
  • Optimizes the intra-tier application traffic (server to server).

Option B:

1 – ASA clustering with all firewalls active

Same as above.

2 – OTV for DCI LAN extension

Same as above.

3 – LISP IGP Assist for ingress path optimization
  • Simple and easy configuration
  • Host route-based notification
  • Endpoint-agnostic
    • No special agent running on the host required.
  • Works in conjunction with any type of LAN extension.
  • Can be deployed without LAN extension for “cold” migration.
    • Requires a LISP Mapping database to signal the remote LISP FHR engines.
  • Detects dynamically any IP movements based on data plane events.

4 – FHRP isolation
Same as above.


[1] Assuming light takes 1 ms to travel 200 km one way in optical fiber.