This post discusses about design considerations when interconnecting two tightly coupled fabrics using dark fibers or DWDM, but not limited to Metro distances. If we think very long distances, the point-to-point links can be also established using a virtual overlay such as EoMPLS port x-connect; nonetheless the debate will be the same.
Notice that this discussion is not limited to one type of network fabric transport, but any solutions that use multi-pathing is concerned, such as FabricPath, VxLAN or ACI.
Assuming the distance between DC-1 and DC-2 is about 100 km; if the following two design options sound quite simple to guess which one might be the most efficient, actually it’s not as obvious as we could think of, and a bad choice may have a huge impact for some applications. I met several networkers discussing about the best choice between full-mesh and partial-mesh for interconnecting 2 fabrics. Some folks think that full-mesh is the best solution. Actually, albeit it depends on the distances between network fabrics, this is certainly not the most efficient design option for interconnecting them together.
For this Thread, we assume the same distance between the two locations exists with both option designs (let’s take 100 km for a comprehensive math). On the left side, we have a partial-mesh design with transit leafs to interconnect the two fabrics. On the right side, we have a full-mesh established between the two fabrics.
With a full-mesh approach the inter-fabric workflow will take one hop to communicate with a remote host. As the result the total latency induced here with full-mesh will be theoretically 1 ms (100 km) between two remote end-points + the few micro-secs for traversing the spine in DC-2 (one hop).
However, if the 2 locations are interconnected using a transit leaf, the traffic between the two tiers will take 3 hops. In this case, it will add theoretically few micro-secs for traffic crossing the spine switch in DC-1 + 1ms (100km) + few micro-secs (remote transit leaf + spine switch in DC-2 ).
With the partial-mesh, each transit leaf interconnects the remote Spine. Thanks to the Equal Cost Multi-Pathing (ECMP) algorithm used to distribute the traffic among all possible equal paths. The server to server workflows (E-W) are therefore established between hosts and confined within the same location (e.g Web/App tier and DB tiers) with a maximum of one hop. A flow sent to the remote location via the DCI connection would consume two additional hops. As the result, ECMP will never elect the link toward the remote spine and the traffic between the application-tiers will be contained inside the same location (DC-1).
However if the two locations were interconnected using a full-mesh design, nothing would prevent the ECMP hashing to hit a remote spine, as the number of hops is equal across the two locations.
As the result a full-mesh will add theoretically 2 ms (2 x 100km) for a portion of the traffic.
If we compare the Full-mesh and the Partial-mesh design, we have the same latency induces by the distances between sites (we can ignore the few micro-secs due to multi-hops ). In Partial-mesh the local communication between two local end-points is contained within the same fabric, while with full mesh, although it saves some few micro-seconds with a maximum of one hop between two end-points, it is not guaranteed that the traffic between two local end-points will be always contained within the same location. Risks are that for the same application, some flows will take a local path (with a negligible latency) and some will take the remote path (with n msec due to the hair-pining via the remote spine switch).
- With long distances between two fabrics, E-W communication between two local end-points may have an impact on the Application performances with a full-mesh design.
- Any network transport that uses multi-pathing is concerned, such as FabricPath, VxLAN or ACI.
- A full-mesh will provide a single hop between the 2 remote end-points. However it will allow distributing the E-W flows equally between the 2 sites. As the result, half of the traffic may be impacted by the latency induced by the distance.
- A Transit leaf design will impose a 3 hops between the two remote end-points, however the latency induced by the local hop in each site is insignificant compared to the DCI distance. A Transit Leafs design (partial mesh) will contain the E-W flows within a fabric locally to each site.
As a global conclusion, for a single logical fabric geographically stretched across two physical distant location, when distances go beyond the campus distances, it may be important to deploy a transit leaf model.
All that said, it is certainly unlikely that several enterprises deploy a full-mesh interconnecting fabric for long distances, but nothing is impossible, it depends who owns the fiber 🙂