26 – Bis – VxLAN VTEP GW: Software versus Hardware-based

Just a short note to clarify some VxLAN deployment options for a hybrid network (intra-DC).

As discussed in the previous post, with software-based VxLAN only a single VTEP L2 Gateway can be active for a given VxLAN instance.

This means that all end-systems connected to the VLAN mapped to a particular VNID must be confined to the same leaf switch where the VTEP GW is attached. Other end-systems connected to the same VLAN but on different leaf switches, isolated by the layer 3 fabric, cannot communicate with the VTEP L2 GW. This may be a concern in a hybrid network where servers supporting the same application are spread over multiple racks.

Bridging between a VNID and a VLAN implies that the L2 network domain spans from the active VTEP L2 Gateway to all servers of interest that share the same VLAN ID. Among other improvements, VxLAN also aims to contain the layer 2 failure domain to its smallest diameter, relying on layer 3 for the transport instead. Although it is certainly a bit antithetical to the purpose of VxLAN, if all leafs are concerned by the same VNID-to-VLAN ID mapping, it is nonetheless feasible to extend layer 2 across the fabric using a layer 2 multi-pathing protocol such as FabricPath.

In the following example, the server 4 attached to leaf 4 cannot communicate with the VTEP L2 GW located on leaf 1. As a result, VM-1 cannot communicate with server 4.

Fortunately, the hardware-based approach solves this. The great added value of enabling the VTEP L2 gateway on the hardware switch (ToR) is that it is distributed and active on each leaf. Communication between the VTEPs on each switch is thus handled over the VxLAN tunnel. Hence, VNID 5000 can be bridged with VLAN 100 on leaf 4, and therefore VM-1 can communicate with server 4.

The other interesting added value of the hardware-based anycast L2 gateway is VLAN translation using VLAN stitching, which can be useful for some migration purposes. Each leaf can map the same VNID to a different VLAN on its own side. In the following example, VNID 5000 can be bridged with VLAN 100 on leaf 1 and with VLAN 200 on leaf 6. Consequently, VLAN 100 and VLAN 200 now share the same broadcast domain.
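The stitching described above can be sketched in a few lines of Python. This is only an illustrative model, not switch configuration: the table `LEAF_VLAN_TO_VNID` and the helper names are hypothetical, and the values are simply those of the example (VLAN 100 on leaf 1 and VLAN 200 on leaf 6, both bridged to VNID 5000).

```python
from collections import defaultdict

# Hypothetical per-leaf VLAN -> VNID tables, using the values from the example.
LEAF_VLAN_TO_VNID = {
    "leaf1": {100: 5000},
    "leaf6": {200: 5000},
}


def broadcast_domains(leaf_tables):
    """Group (leaf, vlan) pairs by the VNID they are bridged to."""
    domains = defaultdict(set)
    for leaf, table in leaf_tables.items():
        for vlan, vnid in table.items():
            domains[vnid].add((leaf, vlan))
    return dict(domains)


def same_broadcast_domain(leaf_tables, a, b):
    """Return True if the (leaf, vlan) pairs a and b are bridged to the same VNID."""
    (leaf_a, vlan_a), (leaf_b, vlan_b) = a, b
    vnid_a = leaf_tables.get(leaf_a, {}).get(vlan_a)
    vnid_b = leaf_tables.get(leaf_b, {}).get(vlan_b)
    return vnid_a is not None and vnid_a == vnid_b


domains = broadcast_domains(LEAF_VLAN_TO_VNID)
stitched = same_broadcast_domain(LEAF_VLAN_TO_VNID, ("leaf1", 100), ("leaf6", 200))
```

In this model the VLAN ID only has meaning together with the leaf it lives on; the VNID is the one identifier shared across the fabric.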

While software-based VxLAN is a flexible solution in a fully virtualised environment, it is not always so well adapted to a hybrid network built with a mix of virtual and physical devices spread over unorganised racks.

Hope that clarifies the choice of VxLAN mode that you wish to deploy.



6 Responses to 26 – Bis – VxLAN VTEP GW: Software versus Hardware-based

  1. Erezli says:

    How is the mapping between VLAN and VNI performed?
    If a 12b VLAN ID is mapped to a VNI, then how can we use the entire 24b VNI range?


    • Yves says:

      Hi Erez,

      It is a 1:1 mapping, hence the maximum number of VXLAN segment IDs that can be translated to a VLAN ID is still limited to 4k (12 bits, as you mentioned).
      However, the VLAN ID is locally significant to the leaf switch where the VTEP L2 Gateway is initiated. With the ToR-based VXLAN implementation, the VTEP L2 GW is distributed and active on each leaf. Therefore you can have 4k VXLAN-to-VLAN translations on each leaf independently.

      If you have only one VTEP L2 gateway active in your VXLAN domain (e.g. host-based VXLAN), then the max is 4k VXLAN-to-VLAN translation mapping (and no VLAN-to-VLAN translation allowed).
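The arithmetic behind this answer can be written out as a small Python sketch. It is a theoretical upper bound only: the function name `max_bridged_vnis` is hypothetical, and the calculation ignores reserved VLAN IDs and any real platform scalability limits.

```python
VLAN_ID_BITS = 12  # width of the dot1q VLAN ID field
VNI_BITS = 24      # width of the VXLAN segment ID (VNI) field

VLAN_IDS = 2 ** VLAN_ID_BITS  # 4096 VLAN IDs per gateway
VNIS = 2 ** VNI_BITS          # 16777216 VNIs in the whole VXLAN domain


def max_bridged_vnis(active_gateways):
    """Upper bound on distinct VNIs bridged to a VLAN, assuming each active
    VTEP L2 gateway holds its own independent 1:1 VLAN-to-VNI table."""
    return min(active_gateways * VLAN_IDS, VNIS)


single_gateway = max_bridged_vnis(1)   # host-based: one active gateway
four_leaves = max_bridged_vnis(4)      # ToR-based: one gateway per leaf
```

With a single active gateway the bound stays at 4096, whereas with a distributed gateway it grows with the number of leaves until the 24-bit VNI space itself is exhausted.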

      Thanks, yves

      • Erezli says:

        Thanks for the reply.

        I still don't understand how each ToR can have an independent VXLAN-to-VLAN translation. Assuming the VM places the VLAN tag, this VLAN tag also survives when a VM migrates to a server connected to a different ToR, so the new ToR must maintain the same VXLAN-to-VLAN mapping as the source ToR.


        • Yves says:

          Hi Erez,

          Actually the TOR VXLAN switch will strip off the dot1q tag before the VXLAN encapsulation.
          That’s the reason I mentioned that the VLAN is ToR significant. At the egress VTEP the dot1q tag is added back.

          VM-1 on VLAN 10 resides on Host-A. Host-A is dot1Q connected to ToR-1 VTEP_1. VLAN 10 is mapped to VNI 5000 on ToR-1 VTEP_1.
          VM-2 on VLAN 10 resides on Host-B. Host-B is dot1Q connected to ToR-2 VTEP_2. VLAN 10 is mapped to VNI 6000 on ToR-2 VTEP_2.
          VM-3 on VLAN 30 resides on Host-C. Host-C is dot1Q connected to ToR-2 VTEP_2. VLAN 30 is mapped to VNI 5000 on ToR-2 VTEP_2.

          As a result, VM-1 cannot communicate at L2 with VM-2: although they use the same VLAN ID, they sit behind different ToR VTEP switches that map it to different VNIs.
          However, VM-1 on VLAN 10 is L2 adjacent with VM-3 on VLAN 30.
          Consequently, VLAN 10 on ToR-1 and VLAN 30 on ToR-2 belong to the same broadcast domain (hence the reason I said that ToR-based VXLAN also offers VLAN translation).
          This could be very useful during a transition stage.
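The three-VM example above can be modelled with a short Python sketch. It is a simplified illustration rather than a description of any switch implementation: `VTEP_TABLES` and the helper functions are hypothetical names, and the model only captures the tag strip at ingress, the VNI lookup, and the re-tag at egress.

```python
# Hypothetical VLAN -> VNI tables per ToR VTEP, copied from the example.
VTEP_TABLES = {
    "ToR-1": {10: 5000},
    "ToR-2": {10: 6000, 30: 5000},
}


def encapsulate(tor, vlan):
    """Ingress VTEP: the dot1q tag is stripped and only the VNI is carried.
    Returns None if the VLAN is not mapped on this ToR."""
    return VTEP_TABLES[tor].get(vlan)


def decapsulate(tor, vni):
    """Egress VTEP: return the local VLAN used to re-tag the frame,
    or None if this ToR has no mapping for the VNI."""
    for vlan, mapped_vni in VTEP_TABLES[tor].items():
        if mapped_vni == vni:
            return vlan
    return None


def l2_adjacent(src, dst):
    """src and dst are (tor, vlan) pairs. True if a frame sent from src
    is delivered on dst's VLAN at dst's ToR."""
    vni = encapsulate(*src)
    return vni is not None and decapsulate(dst[0], vni) == dst[1]


vm1, vm2, vm3 = ("ToR-1", 10), ("ToR-2", 10), ("ToR-2", 30)
egress_vlan = decapsulate("ToR-2", encapsulate(*vm1))
```

Here `l2_adjacent(vm1, vm3)` holds and `l2_adjacent(vm1, vm2)` does not, because the frame from VM-1 travels as VNI 5000 and ToR-2 re-tags it with VLAN 30, not VLAN 10.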

          However, the illustration above is accurate only when VXLAN/VTEP is initiated on the ToR switch, because the VTEP L2 Gateway can then be distributed and active on each physical leaf. If you initiate the VTEP L2 Gateway on a host-based solution, unfortunately you can have only one active VTEP L2 gateway at a time, so the theoretical maximum number of VNID <=> VLAN_ID translations is 4096 (this is without counting the real scalability and performance figures of the host-based solution).

          Does that make sense ?

          I’m a bit busy these days, but as soon as I can I will post a detailed article to describe this.

          thank you, yves

  2. Erezli says:

    Thanks again …

    In your example, what will happen if VM-1 migrates to Host-B on ToR-2? How will ToR-2 know to map VM-1 to VNI 5000 and VM-2 to VNI 6000, since both will use VLAN 10?

    • Yves says:

      Hi Erez

      With the current implementation, if you use a DVS to map a VM to a VLAN, live mobility inside a single domain is limited to 4k L2 segments. It depends on whether you want to allow mobility across a set of racks or across the whole DC infrastructure. Otherwise you may want to manually attach the VM of interest to the target VLAN that maps to the same VNID used at its original placement. Many organisations' deployment models match the multi-tenant implementation (a tenant can be any logical function or subsidiary within an organisation), for which a VMM domain can be dedicated per tenant. Multiple logical zones are created for the tenants, in which you can re-use the same VLANs among multiple tenants on top of the same infrastructure.

