27 – Stateful Firewall devices and DCI challenges – Part 1

Note: Since I wrote the following articles on ASA clustering stretched across multiple locations, additional improvements have been made to address some of the concerns listed in post 27.x. Please have a look at the ASA release-notes (especially 9.5(1) and 9.5(2)).

  • 9.1(4) Geographically dispersed ASA cluster up to 10ms of Latency
  • 9.2(1) Validated Spanned Interface mode (L2) North-South Insertion
  • 9.3(2) Spanned Interface mode (L2) – East-West Insertion
  • 9.5(1) Site Specific Identifier and MAC address
  • 9.5(2) LISP Inspection for Inter-site Flow Mobility

Refer to the latest configuration guide for the updated features that are not discussed in this post.




Stateful Firewall devices and DCI challenges

Having dual sites or multiple sites in Active/Active mode aims to offer elasticity of resources available everywhere in different locations, just as with a single logical data center. This solution brings as well the business continuity with disaster avoidance. This is achieved by manually or dynamically moving the applications and software framework where resources are available. When “hot”-moving virtual machines from one DC to another, there are some important requirements to take into consideration:

  • Maintain the active sessions stateful without any interruption for hot live migration purposes.
  • Maintain the same level of security regardless of the placement of the application.
  • Migrate the whole application tier (not just one single VM) and enable FHRP isolation on each side to provide a local default gateway (which works in conjunction with the next bullet point).
  • While maintaining the live migration, it can be crucial to optimize the workflow and reduce the hairpinning effect as much as we can, since it adds latency. As such, the distances between the sites, as well as the network services used to optimize and secure the multi-tier application workflows, amplify the impact on performance.

As with several other network and security services, the firewall is a stateful device that imposes a one-way symmetrical establishment. That means return traffic must hit the owner of the session; otherwise, the packet is dropped. Traditionally, firewalls are deployed as a pair of devices in an Active/Standby manner, with dedicated layer 2 adjacency links to synchronize the states of all sessions and to probe the health of the peer. When the active firewall stops, the standby takes over, maintaining all active sessions stateful in a manner transparent to the application and the end-user. As of today, most enterprises deploy their perimeter firewalling in Active/Standby mode, mainly for tightly coupled data center designs (metro distances using fiber links).

Figure 1: Typical tightly-coupled DC deployment with firewalling. The primary DC-1 attracts all the traffic for the application of interest (best metrics). By default it is expected to maintain the session workflow within the same DC.

As discussed in the previous high level post 13 – Network Service Localization and Path Optimization, as a result of state failover of network services as well as application mobility, it is not rare to see 10 to 20 roundtrips between the two sites for the same active session. This is forced by all the stateful devices in the path imposing a one-way symmetrical establishment with the return traffic. This includes the security WAN edge, IPS, SSL offloader and SLB devices, as well as the default gateways between application tiers, just to list the most common stateful devices.

Figure 2: The same application has moved to the secondary DC and a failover happened on the first firewall. NB: this is a basic design to keep the logic simple; usually additional stateful devices (SLB, SSL, IPS, WAAS, etc.) exist along the path, hence you can infer a longer final ping-pong effect.

If we consider that the signal propagation delay takes 1ms roundtrip to travel a 100km distance from each data center, 10 roundtrips bring almost 10ms between request and response for the same session, which might have a performance impact on the application. In the context of metro distances, it has been usually well accepted by network managers to work in “degraded” mode during maintenance windows, as these were fully controlled by the network and security organizations.

With the increased demand of virtual machines and dynamic workload mobility, it becomes challenging to control all of the component states and placement impacting the application workflow. Hence, the desire to control dynamically the optimum path to reach the application.

ASA Firewall clustering

Last year at Cisco Live in London, I presented a new concept based on firewall clustering to improve the DCI architecture; however, this enhanced solution was not yet supported due to some limitations with the ASA code (9.0) as well as the lack of testing.

Since v9.1(4) and recently version 9.2(1), the ASA clustering software has been improved to support long distances between members of a cluster (up to 10ms one-way latency) and several designs have been tested and qualified in DCI deployment scenarios. Thus, the excitement to post this article now :).

There are several detailed documents available on ASA clustering itself. Hence, for the purposes of this post, let's focus only on the mechanisms that we can leverage in a DCI scenario. For further details on the ASA cluster, I recommend reading the configuration guide, which gives all the details and explains the concepts and nomenclature of ASA clustering. You will also find many great posts from others on the web.

Originally, ASA clustering aims to provide high-scale firewalling by stacking several physical ASA devices to form a single logical high-end firewall. All ASA devices are active and work in concert to pass connections as a single firewall.

To achieve this function, a "new" component called the Cluster Control Link (CCL) was created to collapse all physical members of the ASA cluster together to form a single logical firewall. The CCL is used for the control plane, health checks, state sync and config sync, as well as to redirect data plane traffic to the original owner of the session when needed. Remember that in classical A/S or A/A clustering mode, it is mandatory that the return traffic hits the owner of the session, otherwise the packet will be dropped. With ASA clustering, this rule still applies, but instead of dropping an asymmetric flow, the firewall that owns the session is known by all other ASA devices, thus the packet is automatically redirected to the original owner via the CCL. The traffic is load-distributed in a clever fashion between the members of the ASA cluster.

There are two possible modes to load balance the data traffic from the upstream device across the ASA units:

  • Individual Interface Mode (layer 3) using ECMP or PBR
  • Spanned Ether-Channel Mode (layer 2) using LACP
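As a minimal sketch, the interface distribution mode is a global, cluster-wide choice made on the ASA before the rest of the clustering configuration (ASA CLI; only the two mode names come from the list above):

```
! Choose ONE distribution mode per cluster, before configuring the cluster group.
! Layer 2 distribution (LACP):
cluster interface-mode spanned force
! ...or layer 3 distribution (ECMP/PBR):
! cluster interface-mode individual force
```

The `force` keyword applies the mode without checking the existing configuration for incompatible settings; both modes cannot be mixed within the same cluster.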

Figure 3: Individual Interface Mode (left) versus Spanned Ether-Channel Mode (right) in a fully redundant deployment using Multi-chassis Ether-Channel (MEC).

In both modes, each ASA device is dual-homed using a virtual port-channel toward a Multi-chassis Ether-Channel engine (e.g. vPC) and the traffic from the upstream device is layer 3 load distributed (ECMP or PBR) for the Individual mode (left) or is layer 2 distributed (LACP) for the Spanned mode. It is not possible to mix different modes for the same cluster.

From a protocol point of view, only a single logical device can exist on each side of the LACP establishment between two entities.

Figure 4: A single logical device back to back for the LACP establishment.

Thus, in Spanned Mode, in order to form a logical LACP peer device, on the network side, the upstream pair of Nexus switches uses a virtual port-channel (vPC) and on the firewall side, the ASA cluster uses an enhanced LACP mode called cluster LACP (cLACP), which forms a virtual port-channel extended across all ASA units in the cluster, using the same IP address and the same virtual MAC address.

The ASA clustering deployment design is flexible. It can be deployed using a single port-channel to the same insertion point at the aggregation layer. As a result, both external (non-secured) and internal traffic (secured) traverse the same physical port-channel. The separation is achieved at layer 2 using VLAN tagging. The other method is to deploy the ASA cluster in sandwich mode between two Virtual Device Contexts (VDC) in order to physically separate external and internal traffic, each dedicated to an inside and outside port-channel as discussed in this post. This article relies on that method of sandwich mode providing solid hierarchy architecture.

Figure 5: Don't confuse the forwarding mode of the firewall with the interface distribution mode among the ASA cluster. The ASA cluster can run in Routed Mode or Transparent Mode like most firewalls, with or without multiple contexts. It is important to clarify which combination is supported. For example, if an enterprise wants the firewall to run in Transparent Mode, the only option to distribute the load among the ASA units is Spanned Ether-Channel Interface Mode.

Whichever load balancing protocol is used, there is no symmetrical algorithm that ensures return traffic re-uses the same path in reverse. Hence, it is not guaranteed that the return traffic automatically hits the original owner of the session. The ASA cluster gets around this one-way symmetrical establishment by redirecting the session to its owner via the CCL.

Figure 6: TCP handshake establishment with redirection to the session owner over the CCL

  • In the example above, a new session (TCP SYN) is established, hitting ASA 2, which encodes a SYN cookie with its own information (1) and then forwards the SYN packet to the next destination (2).
  • When the TCP SYN/ACK response comes back, the traffic is load balanced based on the layer 2 or layer 3 source and destination identifiers and sent toward a new ASA; in our example the TCP SYN/ACK arrives at ASA 3.
  • ASA 3 decodes the owner information from the SYN cookie and notices that ASA 2 is the owner of that session (4).
  • ASA 3 immediately forwards the packet to the owner unit ASA 2 over the CCL, which in turn forwards the SYN/ACK via its inbound interface.

Consequently, the CCL must be dimensioned according to the speed of the inbound and outbound interfaces (e.g. 2 x 10GE for resiliency and performance).

Figure 7: To increase resiliency, it is recommended to dual-home each data interface using a port-channel split between the two upstream switches. There is one port-channel per ASA device.

ASA Firewall clustering spanned across multiple sites

Now that we understand how ASA clustering is created, let’s see how to leverage the clustering mode for a DCI solution.

For our purposes and examples, an ASA cluster formed with four physical ASA units is stretched across two DCs, with two ASA members on each site. We will keep this scenario for the whole article, but nothing prevents us from adding more ASA units (up to 16 units per ASA cluster are currently supported) stretched across three or more DCs interconnected using LAN extension (we are using OTV for multi-site LAN extension).

ASA cluster Configuration

Some of the added value of deploying the ASA cluster is that configuration is synchronized between all units, hence there is no need to manually replicate all policies on each unit, thus avoiding risk of misconfigurations.
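To illustrate, here is a hedged sketch of the bootstrap configuration on the first unit; once the other units join with a matching `cluster group` name and key, the policies configured on the master replicate to all members (unit names, addresses and port-channel numbers are hypothetical):

```
! CCL member interfaces bundled into a dedicated port-channel
interface TenGigabitEthernet0/6
 channel-group 48 mode on
interface TenGigabitEthernet0/7
 channel-group 48 mode on
!
! Cluster bootstrap (repeated per unit with a unique local-unit name and CCL IP)
cluster group DC-CLUSTER
 local-unit ASA-1
 cluster-interface Port-channel48 ip 192.168.100.1 255.255.255.0
 priority 1
 key ClusterSecretKey
 enable
```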

Extending the Cluster Control Link across two DCs

The first component to extend between the DCs is the Cluster Control Link. This CCL connection must be fully resilient, as discussed previously, and extended across sites using a solid DCI LAN extension. Each ASA uses a dedicated port-channel split between the two vPC peers. From the upstream logical switch's point of view, the port-channels are unrelated, except that they carry the same VLAN between the ASA units across the two sites. From a security point of view, it is preferable to deploy the CCL inside the secure perimeter.

Figure 8: CCL deployment, each ASA uses a dedicated port-channel split between the two vPC peers.

As discussed above, there are two modes to load distribute the data traffic among the ASA members: a layer 3 mode called Individual Interface Mode and a layer 2 mode called Spanned EtherChannel Mode. Both modes are valid for DCI, with slightly different added values for one mode versus the other, as discussed as we move forward.

Extending the Data plane using Individual Interface Mode

The ASA units are deployed in sandwich mode between the inside and outside routers and ECMP is used to load distribute the traffic across the local ASA members. The CCL as well as the data plane VLANs are extended between sites.

Figure 9: For the purposes of this topic, the CCL is represented using a logical straightforward link (orange) extended between the two DCs; however, in the final deployment it should be distributed in a sturdy and redundant fashion as described in Figure 8.

The application data traffic (layer 2) is isolated by a layer 3 hop. The CCL is also isolated from the data workflow. However, nothing prevents network managers from collapsing the CCL VLAN with the application data VLANs within the same overlay network as the segmentation is therefore maintained at layer 2 with dot1Q tagging.

From a logical layer 3 point of view, IGP adjacency is established between the inside and outside routers through the local ASA units, as well as between sites (higher cost across sites used for disaster recovery purposes).
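The cost asymmetry described above could be sketched as follows on one of the inside routers (a hypothetical NX-OS example; interface names, area and cost values are assumptions):

```
router ospf 1
!
interface Ethernet1/1
 description Toward the local ASA units (preferred path)
 ip router ospf 1 area 0.0.0.0
 ip ospf cost 10
!
interface Ethernet1/2
 description Inter-site path (disaster recovery only)
 ip router ospf 1 area 0.0.0.0
 ip ospf cost 1000
```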

Figure 10: The default gateway is unique and still active in DC-1. After the migration of the application, the traffic hits the original router, returns to DC-2 for the communication with the application's upper tier, and finally returns to DC-1. The ping-pong effect starts after the move.

  • In the scenario on the left, the primary DC-1 attracts the request for the application that exists in DC-1.
  • The default gateway for the application is active in DC-1 and standby in DC-2.
  • The ASA-1 (far left) becomes the owner of the session and routes the packet to the application.
  • In the scenario on the right, the application has moved (hot stateful live migration) and continues to respond with no interruption.
  • The return traffic hits the active default gateway in DC-1 which routes the packet toward the frontend server.
  • The frontend server responds to the end-user via its default gateway (DG) active on DC-1.
  • The DG on DC-1 distributes the packet to the ASA-2.
  • ASA-2 checks the owner of the session and redirects the packet to ASA-1 over the CCL.
  • The workflow exits DC-1 toward the end-user.
  • The session is maintained stateful with zero interruption.

Beyond the ASA clustering, the concern is that, even if the tiers of the applications are all moved to the distant DC-2,  the routed communication between the tiers is established via the active default gateway still located in DC-1. Hence the traffic from the back-end to the front-end is hair-pinned via DC-1.

Consequently, most network managers have a great interest in enabling HSRP isolation to improve the server-to-server communication, as shown below.

Figure 11: HSRP isolation reduces the hairpinning; however, the return traffic must still hit the owner of the session to keep the establishment stateful.

With FHRP isolation techniques, server-to-server communication is routed locally, eliminating the pointless latency (far right bottom). The outbound traffic from the frontend server toward the end-user is routed to the upstream local firewall (shortest path). However, as we want to maintain the session as stateful, the local ASA-3 redirects the traffic workflow to its original owner (ASA-1) via the CCL extension.
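FHRP isolation is typically achieved by dropping the HSRP hellos at the DCI boundary so each site elects its own active gateway. A hedged sketch, assuming NX-OS and HSRP (the interface name is hypothetical; HSRPv1 uses group address 224.0.0.2 and HSRPv2 uses 224.0.0.102, both on UDP port 1985):

```
ip access-list HSRP_ISOLATION
 10 deny udp any 224.0.0.2/32 eq 1985
 20 deny udp any 224.0.0.102/32 eq 1985
 30 permit ip any any
!
interface Ethernet1/10
 description OTV internal interface
 ip port access-group HSRP_ISOLATION in
```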

While machines migrate from one location to another, the session is maintained stateful with zero interruption. However, although the application has moved to DC-2, the next request will hit DC-1 until we manually or dynamically inform the layer 3 network about the move. This is discussed in Part 2, next.

The added value with Individual Interface Mode is that it maintains the layer 2 failure domain in isolation from the ASA control and data planes. Traffic is routed up and down the ASA cluster. The application data VLAN can be extended in a transparent fashion using any validated DCI solution. However, the other side of the coin is that the ASA unit cannot be the first hop default gateway for the application, as another layer 3 router separates it. Thus, Individual Interface mode might be challenging for enterprises that would like the firewall to be the default gateway of the application servers. Another point to mention about Individual Interface Mode is that it doesn’t support firewalls configured in Transparent Mode.

Extending the Data plane using Spanned Ether-channel Mode

In Spanned Ether-channel Mode, all ASA members of our cluster share the same IP address and can therefore act as the first hop default gateway for the application (not yet qualified, though). However, for that latter option we need to be very cautious with the layer 2 extension and the vMAC address.

Figure 12: The ASA Cluster LACP (cLACP) is spanned across the 2 data centers. The LACP established from the vPC peers is local. A layer 3 device isolates the ASA Spanned Ether-channel interface from the application data VLAN.

From each vPC peer, a local port-channel is established on each site toward the local ASA units. From the ASA cluster, a single LACP port-channel is spanned across the two distant DCs (cLACP). cLACP requires spanning the same port-channel across the whole ASA cluster; therefore the vPC domain identifier must be identical on each vPC peer.

In regard to workflow, the behavior is the same as with the Individual Interface Mode. Note a slight difference in case of failure; convergence with LACP should happen faster than with ECMP or PBR.

Figure 13: The ASA cluster is running in transparent mode; HSRP isolation is enabled, improving the workflow while the session is maintained stateful.

When the ASA cluster is running in routed mode, the members share the same IP address and the same vMAC address. Consequently, to avoid duplicate MAC addresses appearing from different switch ports, a router can be added on each site (preferred method) to separate the layer 2 data traffic between the spanned ether-channel and the LAN extension. Indeed, if the VLAN attaching the front-end servers is L2-adjacent with the firewall, then the vPC peer will detect the same vMAC address bouncing between different interfaces (toward the ASA members and from the data LAN extension), which is definitely not an expected situation in Ethernet.

Figure 14: The same vMAC is learnt on both sides of each vPC peer, a challenging situation definitely not supported by Ethernet.

To prevent this situation where the same MAC address is learnt on different sides, a layer 3 gateway is inserted to separate the L2 data traffic between the spanned ether-channel and the extended data VLAN. It also prevents layer 2 loops in case of a human design mistake.

Figure 15: a router inserted between the data VLAN and the spanned port-channel isolates the duplicate MAC address.

In the figure above, the design on the left shows the layer 3 separation between the spanned ether-channel VLAN 10 and the application data VLAN 20. Only VLAN 20 is extended. On the right side, the application servers are L2-adjacent with the inside interfaces of the ASA units via VLAN 10. As a result, the vPC peers on both sites learn the duplicate vMAC address from their respective ASA units and from the extended VLAN 10, which is not acceptable.

If you are willing to offer the default gateway function from the firewall (not yet qualified, though), it definitely requires filtering the vMAC address between sites, as well as the ARP requests.

Figure 16: Careful: please don't get me wrong; currently this is neither recommended nor supported. Don't do the following until you understand the exact ramifications 🙂.

To filter the vMAC address with OTV, you will need to perform the following on the internal interface of each OTV edge device.

The following access list will block the vMAC 1111.2222.3333 shared between all ASA units:
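A hedged sketch of such an access list, implemented as a VACL on each OTV edge device for the extended VLAN (NX-OS syntax assumed; the VLAN number and ACL names are hypothetical, the vMAC is the one quoted above):

```
mac access-list ASA_vMAC
 10 permit any 1111.2222.3333 0000.0000.0000
mac access-list ALL_OTHER_MACS
 10 permit any any
!
vlan access-map VMAC_FILTER 10
 match mac address ASA_vMAC
 action drop
vlan access-map VMAC_FILTER 20
 match mac address ALL_OTHER_MACS
 action forward
!
vlan filter VMAC_FILTER vlan-list 10
```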

In addition we need to apply a route-map on the OTV control plane to avoid communicating vMAC information to the remote OTV edge devices.
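A hedged sketch of that route-map, following the same pattern Cisco documents for HSRP vMAC filtering with OTV (NX-OS; the overlay and mac-list names are hypothetical):

```
mac-list VMAC_DENY seq 10 deny 1111.2222.3333 ffff.ffff.ffff
mac-list VMAC_DENY seq 20 permit 0000.0000.0000 0000.0000.0000
!
route-map BLOCK_VMAC permit 10
 match mac-list VMAC_DENY
!
otv-isis default
 vpn Overlay1
  redistribute filter route-map BLOCK_VMAC
```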

Indeed, OTV uses its control plane to populate each remote OTV edge device with its local layer 2 MAC table. As this MAC table is built from regular MAC learning, it is important that OTV does not inform any remote OTV edge device about the existence of this vMAC address, as it exists on each site.

However, a possible drawback of the Spanned mode is that all ASA units are active and share the same IP and MAC addresses; hence an ARP request will hit all units that form the ASA cluster, and all members will reply with the same source (IP and vMAC). For the local ASA units, the reply passes along the unique local port-channel, which is fine; however, it could be tricky if the reply comes from the remote site via the DCI link. Fortunately, the vMAC of the remote ASA units will be blocked as described with the previous access list. Therefore the ARP reply will come only from the local ASA units, which is the desired behaviour. However, to reduce broadcast traffic, you may want to filter the ARP requests destined to the default gateway across the DCI connection.

Some additional recommendations

It is recommended to enable jumbo-frame reservation and to set the cluster MTU to at least 1600 bytes for the cluster control link. When a packet is forwarded over the cluster control link, an additional trailer is added, which could cause fragmentation. Set the MTU to 9216 to match the system jumbo frame size configured on the Nexus 7000. The MTU of the inter-site IP network must be sized accordingly.
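As a sketch, the two sides of that MTU alignment might look like this (ASA and NX-OS CLI; the port-channel number is hypothetical, and `jumbo-frame reservation` requires a reload on the ASA):

```
! On each ASA unit:
jumbo-frame reservation
mtu cluster 9216
!
! On the Nexus 7000 switches carrying the CCL:
system jumbomtu 9216
interface port-channel48
 mtu 9216
```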

For a deep understanding of what is supported and what is not, please follow the inter-site Clustering guidelines recommendation in the ASA 9.2 Configuration Guide.


22 Responses to 27 – Stateful Firewall devices and DCI challenges – Part 1

  1. Thank you for this amazing post: it is clear, detailed and helpful.
    I believe Cisco should enhance the way clustering works between ASAs: instead of redirecting all return traffic to the owner of the flow, ASAs should truly share all their flows states with all other members in the cluster so that any ASA can directly inspect the return traffic.
    This would simplify the operations and optimize the return traffic flows.

    • Yves says:

      Thank you Jean-Christophe,

      I got your point. It would be great indeed, but the concern is that, as a stateful security service, a single unit must be responsible for processing all the packets of a session flow it has acknowledged, in order to apply the TCP stateful checks. And that's why ASA clustering in conjunction with LISP mobility becomes very efficient: all current active sessions are redirected to the original owner to be maintained stateful, while all new sessions use the local firewall units.

      Kind regards, yves

      • There’s an ongoing discussion about this particular topic in the following thread: https://learningnetwork.cisco.com/message/440977#440977 All your comments are most welcome.

        As a side note, you should be able to answer some of the points which triggered that thread in the first place: https://learningnetwork.cisco.com/thread/74057?tstart=0

        • Yves says:

          Hi Jean Christophe,

          Thank you for the pointer, this is a great discussion.
          Forgive me, I just realized that I never came back to this point.
          I don't have more official answers than what exists in the Cisco design guide, but here are my personal thoughts 😉

          We need to consider the official "supported" design for each specific deployment. What is supported when the ASA clustering is deployed locally is not necessarily supported in a stretched deployment across long distances and multiple sites.
          e.g. I mentioned in this post that firewall routed mode is supported in spanned interface mode, which is true when the ASA cluster is traditionally deployed locally (that is my short summary of the ASA cluster in a single-campus deployment, not DCI).
          The ASA Cluster Config Guide (http://www.cisco.com/c/en/us/td/docs/security/asa/asa92/configuration/general/asa-general-cli/ha-cluster.html#pgfId-2577768) mentions that firewall routed mode is NOT supported when the interface distribution mode is configured as spanned mode, which is officially true when the ASA cluster is stretched across an inter-site/DCI deployment.
          However, what "not supported" means here is that firewall routed mode with interface distribution in spanned mode has not been tested nor qualified in an inter-site environment (over long distances); hence it is not validated yet, and consequently not yet supported.
          Having said that, it doesn't mean that it doesn't work. We would need to run all possible tests with all possible scenarios and make sure we are not facing any caveats.
          I personally don't see any major issue in a stretched design with the firewall initiated in routed mode while the distribution is made using spanned mode.
          But I admit that it would be great to find the time to run deeper tests with different platforms.
          I took your point and I will discuss this scenario with our qualification team to see if they can validate it (of course, if there is no sad roadblock).


          Kind regards, yves

  2. Matteo says:

    Thanks Luis, very helpful post!

    I just want to ask for clarification on one point, if possible:

    In Spanned Ether-channel Mode you add a routing layer between the ASA and the data VLAN in order to use an interconnection subnet between the ASA and the router that is not extended. This makes it possible to use the same IP on the ASA in the two different DCs, am I right?

    If this is the case, how do you deal with a scenario where the ASA is the default gateway for several VLANs? Adding a routing layer for each VLAN doesn't seem to be the best solution.

    Looking at the following link http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/DCI/4-0/EMC/implementation_guide/DCI4_EMC_IG/EMC_1.html#wp1267459
    it seems that when using FHRP filtering, the vMAC learned from the DCI is not added to the MAC table, as the local static entry overrides this info. Is this a possible option also in the case of ASA clustering, or is this still unsupported?

    kind regards


    • Yves says:

      Hi Matteo,

      This is correct, when configured in spanned mode, all members of the ASA clusters share the same identifiers (virtual IP & MAC addresses).
      The original purpose of DCI being primarily to extend the VLAN for the applications. If the VLAN for the apps of interest (e.g. Web server VLAN) is also extended toward the port-channel distributing the local ASA’s, thus the same vMAC will be seen from different interfaces (once from the local ASA’s and another from the remote ASA’s over the DCI http://yves-louis.com/DCI/wp-content/uploads/2014/07/ASA-cluster-DCI-duplicate-vMAC.png), which goes against the Ethernet rules, thus the basic need to bring an additional routing layer between the port-channel and the Apps VLAN to easily block that duplicate vMAC address. As the result though, with this design, the ASA canNOT act as the default gateway for the servers.

      That said, theoretically it is possible to directly filter this vMAC address across the DCI internal interfaces and therefore extend the apps VLAN at layer 2 toward the ASA port-channel, the ASA becoming the first hop router for the apps. There are different ways to filter the MAC address, depending on the DCI solution (the link you pointed out is one option that drops the HSRP IP and its vMAC addresses, which you can adapt to filter the ASA vMAC; or you can refer to this post, which includes an example described just after figure 16, although not yet tested as far as I know). Therefore you should be able to use the ASA as the default gateway for the desired VLANs.

      Hopefully after Cisco Live Milan, I will make a slight update on this.

      Thank you, yves

  3. extmode says:


    What about Inter-Site Clustering with ASA 9.6(2)?
    Routed mode with spanned interface, or maybe transparent mode?
    Please explain the best implementation, because even in the Cisco guide there is no specific and precise information on how to implement this scenario, and the solutions vary from version to version.

    • Yves says:


      The features for inter-site clustering actually started with 9.5(1) with the concept of the site-specific ID and MAC address, followed by LISP inspection, which came with 9.5(2), as mentioned at the beginning of this post 27.1.

      With this feature, when ASA members are configured in routed mode using Spanned EtherChannel, it is now possible to configure a site-specific MAC address per site; as a result, data packets are routed by the FW with a unique source MAC address. However, for the data VLANs of interest extended across the sites using OTV, or any DCI technology, it is required to create a filter to prevent traffic sent toward the global MAC address from crossing the DCI. The key reason is that even though each ASA unit routes the data packets using its site-specific MAC address, the ingress data packets to the ASA cluster are nonetheless sent using the global MAC address, so data packets can be received by any of the units at both sites. Look at this example in the ASA cluster UG.

      As for LISP inspection, it is used to enable flow mobility when a server moves between sites. The FW cluster members must be inserted between a LISP first-hop router (FHR) and the LISP site gateway (xTR). Then, the cluster can inspect and intercept LISP messages that indicate when an endpoint (EID) has moved from one site to the other. The ASA maintains an EID table that correlates the EID with the site ID. As a result, when the "trusted" EID moves from one site to another, the local cluster member can secure the sessions for that specific EID locally, reducing the hairpinning with the original owner.

      Best regards, yves

      • extmode says:

        Thanks for reply.

        Ok, you talk about the scenario where the ASA cluster works in routed mode (spanned interface) and the cluster is the first hop for the data VLANs (traffic from inside to outside), with a sub-interface created on the ASA for each data VLAN.
        And what about using HSRP for the outside VLAN and HSRP for the inside VLAN, with an IGP routing protocol (for example EIGRP) between the outside and inside HSRP gateways?

        And what about transparent mode: is this mode not recommended, and does it not work correctly for inter-site clustering?

        • Yves says:

          You are correct; as of today there are indeed different supported scenarios. When I tested and wrote this article a couple of years ago (prior to 9.5), these features were not available, hence the reason I made this post. I assume you have been through the design guide, which discusses the different supported scenarios with inter-site clustering, including transparent mode (only using spanned etherchannel interface mode).
          The reason I talked about the routed mode is because in routed mode with spanned etherchannel, the vMAC address is traditionally duplicated across all ASA members, which brings the issue of the same MAC appearing from different interfaces (as described in figure 13 of post 27.1).

          Does that make sense ?

          regards, yves

          • extmode says:

            I agree with you, but I'm trying to get your opinion: which method would you choose for building inter-site clustering, transparent mode or routed?
            I like transparent mode better, but it is subject to problems.

          • Yves says:


            I would tend to say that there are 2 main questions to ask when deploying the FW clustering across multiple sites:

            1) Do we want to load-distribute the traffic toward the cluster members using a Layer 3-based algorithm (PBR or ECMP with Individual Interface mode) or using a Layer 2-based technique (LACP with Spanned mode)?
            To answer part of that question, it is important to remember that in Individual Interface mode the FW does not support bridge (transparent) mode, only routed mode.
            In Individual Interface mode, the FW cannot be the default gateway for the hosts, because it must be sandwiched between routers facing the cluster members. As a result, it cannot be the first-hop router for the hosts.

            2) FW in Transparent or Routed mode? There are a couple of additional questions to ask here:
            – Should the FW be the default gateway?
            * If YES, then the whole firewall cluster must be configured in routed mode using inter-site Spanned mode (EtherChannel), so the cluster members are the first-hop router for the hosts.
            * If NO, then the FW forwarding in Transparent mode can be a good solution. The next-hop router can then sit behind the FW farm, so all the routed traffic for a multi-tier application goes through the security appliance. Bridge mode is often used to better secure access through the FW; it is easy to deploy and operate, and it fits the East-West inter-site insertion mode, securing the inter-tier traffic through the FW cluster (described here).

            – Hence, should the inter-tier application traffic go through the FW?
            * If YES, then follow the comment above for the placement of the first-hop router behind the cluster members (FW members in Transparent mode and distribution in Spanned mode).
            * If NOT, then the first-hop router can be inserted between the server farm and the switch that distributes the load in Spanned mode across the FWs running in Transparent mode. This is usually called North-South inter-site mode (described here).

            All that said, it’s difficult to give one recommendation. It depends on the security team’s experience and the enterprise security rules. Some may recommend running the FW members in routed mode, others the FW in transparent mode. In the last informal survey I ran at a large technical conference, the split was 50/50.

            Personally, I prefer the FW in routed mode using Individual Interface mode (Layer 3 ECMP) to distribute the routed traffic among the cluster members of interest, because all the reasons not to extend an L2 segment outside a DC are good ones 🙂

            PS: For the transparent versus routed mode intra DC, maybe this paper can help a bit (not recent but still valid)
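            To make the Individual Interface mode option a bit more concrete, here is a minimal sketch of the upstream router side, with hypothetical IOS addressing: one equal-cost static route per cluster member’s individual data IP lets the router spread flows via ECMP, while the cluster’s CCL still redirects any asymmetric return traffic to the flow owner:

            ```
            ! Upstream router (hypothetical addressing): each next hop is
            ! the individual data IP of one ASA cluster member on the
            ! shared subnet; equal-cost routes give per-flow ECMP hashing
            ip route 0.0.0.0 0.0.0.0 10.1.1.11
            ip route 0.0.0.0 0.0.0.0 10.1.1.12
            ip route 0.0.0.0 0.0.0.0 10.1.1.13
            ip route 0.0.0.0 0.0.0.0 10.1.1.14
            ```

            Because the hashing is per-flow, a given session tends to stick to one member; only flows that land on a non-owner member pay the CCL redirection cost.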

  4. extmode says:

    I select transparent mode with spanned etherchannel.
    But i see this scenario bad work for two sites. MAC moving – its one problem-Solved this problem by filtering mac port-channel and add static vmac HSRP on ASA outside interface.
    Drop packet its second problem, when traffic move N-S. And i dont understand how
    troubleshoot this problem, because in cisco-guide no recomendation and not enough information

    • Yves says:

      In multi-site, there is not just the ASA cluster to look at.

      FW in transparent mode (using spanned load distribution) is well supported. MAC moves are supported as well, but they rely above all on the L2 extension solution deployed, the distance, the workflow (hair-pinning) and the whole network design including the DG. You should talk to the TAC; they can help you.

      A few thoughts you may want to consider if you haven’t already: egress optimisation (FHRP filtering) usually works in conjunction with an ingress path optimisation. Otherwise, the ingress traffic may hit one DC (usually the primary) and cross the L2 DCI to reach the destination endpoint. The endpoint replies to its DG, the data packets are forwarded via its local cluster member, which redirects (via the CCL) the return traffic to the FW that owns that session.
      There are also ways to improve this with LISP mobility (ingress path optimization) combined with LISP inspection (ASA cluster).
      – Which version of ASA cluster are you using?
      – MAC move: hot or cold move?
      – If you fixed the MAC move issue with an HSRP filter in transparent mode, check the latency between the two sites
      – Distances? How many members?
      – Which L2 DCI have you deployed?
      – Where does the first-hop router sit in your architecture (e.g. this diagram)?
      – Details of the FHRP filtering?
      – And more to ask; hence, I am not sure I can help troubleshoot this issue via this blog site. The best way to troubleshoot is to talk with the TAC.

      B. regards, yves
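      For reference, the FHRP isolation (egress optimisation) mentioned above typically looks like the sketch below at the DCI edge. This is IOS-style syntax with hypothetical names, and the exact form is platform-dependent, so treat it as an outline only. It drops HSRP v1 hellos (UDP/1985 to 224.0.0.2) and keeps the well-known HSRP vMAC range from being learned across the DCI:

      ```
      ! Block HSRP v1 hellos from crossing the DCI link
      ip access-list extended HSRP_FILTER
       deny udp any host 224.0.0.2 eq 1985
       permit ip any any
      !
      ! Filter the well-known HSRP vMAC range 0000.0c07.acXX
      mac access-list extended HSRP_VMAC_FILTER
       deny 0000.0c07.ac00 0000.0000.00ff any
       permit any any
      ```

      Applied on the DCI-facing interfaces, this keeps an active HSRP gateway alive in each DC so hosts always use a local default gateway.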

  5. extmode says:

    >>When the ASA cluster is running in routed mode, the members share the same IP >>address and same vMAC address.

    Maybe you meant to write transparent mode?

    • Yves says:


      In transparent mode there is no IP address configured for the data plane on the FW; it’s a Layer 2 bridge device, not a router (only the management IP address).
      ASA clustering can be deployed in 2 modes to distribute and receive inbound and outbound data packets: either using Layer 3 ECMP, called Individual Interface mode, or using Layer 2 LACP, called Spanned EtherChannel mode. In Individual Interface mode it can only “route” the traffic; in Spanned EtherChannel mode it can either “route” or “bridge” the traffic.
      When the data interfaces distribute the traffic using Layer 2 LACP spanned among all ASA members and all devices are in routed mode, all the members of the FW cluster share the same vIP and the same vMAC address. All the tricky configuration actually stems from this vMAC being duplicated across all devices.
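      Since 9.5(1), the site-specific MAC address feature mitigates that duplication. As a rough sketch (hypothetical values, written from memory; verify the exact syntax in the 9.5(1) release notes), each site presents its own MAC locally so the same MAC is not seen simultaneously from two DCI-connected locations:

      ```
      cluster group DCI-CLUSTER
       site-id 1
      !
      interface Port-channel1.100
       ! Global vMAC shared by the whole cluster
       mac-address aaaa.1111.1234
       ! Per-site MACs used locally by the members of each site
       mac-address aaaa.1111.0001 site-id 1
       mac-address aaaa.1111.0002 site-id 2
      ```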

      Does that clarify ?



  6. xanz0r says:

    We’re currently testing an inter-site, East-West transparent FTD cluster implementation using BGP EVPN as the DCI. We stretched the inside and outside VLANs across the data centers and are currently running into loop-like symptoms that seem hard to eliminate using BGP EVPN in an active/active data center setup.

    Unfortunately, virtually all the cluster documentation we can find on the internet uses OTV instead of BGP EVPN, which makes it hard to find the correct BGP EVPN implementation parameters to support clustering. What’s worse, solving this loop is hardly mentioned at all, since OTV seems to eliminate the problem ‘out of the box’.

    We strongly suspect that BGP EVPN can be used perfectly well to fulfill the DCI role in an ASA/FTD clustered environment, but that it does need tweaking to properly support this.

    Did you happen to experiment with ASA/FTD clusters in combination with BGP EVPN? (Or any readers, for that matter? 😉) Because we could definitely use some pointers to get this working…… =)

    • Yves says:

      Thank you for the comment,
      Currently this is not yet supported with VXLAN EVPN; it should come in an upcoming release with Multi-Site.
      I am planning to run additional testing based on ASA/FTD clustering in the coming weeks.
      I will post the results when I can find the spare cycles.
      Best regards, yves

      • xanz0r says:

        Thank you very much for your response! We’ll stop troubleshooting BGP-EVPN and start building an OTV-environment instead then. 😉

        I’m looking forward to reading your next cluster article!

        Kind regards,


  7. mastedar says:

    Frankly said, this whole solution is like asking for trouble. Let’s assume we have such a cluster stretched across three or four different DCs. What will happen if one or two of my DCs are disconnected from the rest of the cluster? Besides, all of those gimmicks to filter HSRP here and the vMAC there are far from a decent and clean solution. This is so confusing that even old-fashioned advertising of host routes to ensure symmetry, with a pair of “island” firewalls per DC, looks like a better option.

    • Yves says:

      I don’t disagree with your comment, but IMO the decision must be balanced between efficiency and complexity.
      As said several times, the best option would be to not extend Layer 2 and keep the DCs isolated from an L2 point of view. However, there are some situations where L2 must be extended. As a result, the question is how to maintain the statefulness of each session while reducing the impact of hitting the owner of the session.
      As mentioned in this post in the context of virtual machine mobility, due to the extended L2 domain it is not rare to see 10 to 20 round trips between the two DCs for the same active session that must hit the stateful devices (e.g. FW). Assume a live migration for an application tier (say the distance is 5 ms of latency, roughly 500 km); the response time for the application of interest may then reach 100 ms or more due to the ping-pong effect of maintaining the session stateful.
      Hence, there are 2 options:
      1) Don’t stretch the Active/Active cluster beyond a single DC
      1-1) Don’t extend Layer 2 across distant locations.
      1-2) Allow only warm/cold migration (leverage local FWs)
      1-3) For hot live migration, accept traffic hair-pinning, hence higher latency

      2) Extend the Active/Active cluster across distant locations
      2-1) Like any geographically distributed HA cluster, the risk is a split-brain situation (to answer your question)
      2-2) Leverage advanced functions such as the Site Specific Identifier and MAC address
      2-3) To improve efficiency by reducing hair-pinning, enable LISP Host Mobility ESM and enable LISP Inspection for Inter-site Flow Mobility in the FW cluster (maintaining the session stateful via the local FW)
      2-4) Configure the ASA cluster in Individual Interface mode (Layer 3) using ECMP or PBR (removes the risk of duplicate MACs)
