2 – Active/Active DC

Active-Standby versus Active-Active

A hot standby data center can be used either for application recovery or to relieve the primary data center of part of its workload. The latter case, in which the secondary site actively shares the workload, is usually referred to as Active-Active DR mode.

Active-Active DR Mode

 

One example of Active/Active DR mode involves an application that is active on a single physical server or VM while the network and compute stacks are active in two locations. Exceptions to this definition include specific software frameworks such as grid computing, distributed databases (e.g. Oracle RAC®), and some cases of server load balancing (SLB). When resources are spread over multiple locations running in Active-Active mode, some software functions are active in one location and on standby in the other. Active applications can be located at either site, which distributes the workload across several data centers.

It is important to clarify these Active/Active modes by considering the different component layers, all of which are involved in the final recovery process:

• The network is running at each site, interconnecting all compute, network, and security services and advertising the local subnets outside each data center. Applications active in the remote data center are therefore reachable from the traditional routed network without changing or restarting any IP processes.

• All physical compute components are up and running with their respective bare metal operating system or hypervisor software stack.

• Storage is replicated across locations and can appear Active/Active to certain software frameworks. In practice, however, a write to a given storage volume is usually committed at one location at a time, while the same data is mirrored to the remote location (see the sketch below).
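As a purely illustrative aid for the storage point above, here is a minimal Python sketch of a volume that accepts writes at a single owner site and mirrors them to the remote site; the class, method names, and failover logic are assumptions for this example, not any vendor's replication API.

```python
# Illustrative model only: a volume accepts writes at its "owner" site and
# synchronously mirrors each write to the remote site before acknowledging.
# Class and method names are invented for this sketch, not a vendor API.

class MirroredVolume:
    def __init__(self, volume_id, owner_site, remote_site):
        self.volume_id = volume_id
        self.owner_site = owner_site    # the only site that accepts writes
        self.remote_site = remote_site  # passive copy, kept in sync
        self.data = {owner_site: {}, remote_site: {}}

    def write(self, site, block, payload):
        if site != self.owner_site:
            raise PermissionError(
                f"{site} is standby for {self.volume_id}; "
                f"writes must go to {self.owner_site}")
        # Commit locally, then mirror synchronously to the remote site.
        self.data[self.owner_site][block] = payload
        self.data[self.remote_site][block] = payload  # stands in for replication
        return "ack"

    def failover(self):
        """Swap ownership, e.g. when the primary data center is lost."""
        self.owner_site, self.remote_site = self.remote_site, self.owner_site


vol = MirroredVolume("LUN-42", owner_site="Paris", remote_site="London")
vol.write("Paris", block=0, payload=b"order #1001")   # accepted and mirrored
vol.failover()                                        # London becomes the writer
vol.write("London", block=1, payload=b"order #1002")  # now accepted in London
```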

Let’s take a deeper look at a service offered by multiple data centers in Active/Active mode. Assume application A in data center 1 (e.g. Paris) offers an e-commerce web portal for a specific set of items. The same e-commerce portal, offering the same items, can also be available and active in a different location (e.g. London), but with a different IP identifier. For the end user the service is unique and the location transparent, yet requests can be distributed by the network services based on proximity criteria established between the end user and the data center hosting the application. The application therefore looks Active/Active, but the software running on each compute system operates autonomously in the front-end tier; the two instances are unrelated except from a database point of view. Finally, each session is maintained on the same servers, in the same location, until it is closed.
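The proximity-based distribution and session stickiness described above can be sketched as follows; the site names, VIPs, latency table, and resolver logic are invented for illustration and are not a description of any particular GSLB or LISP implementation.

```python
# Toy global load-balancing resolver: send a new client to the closest active
# site, then keep it pinned to that site until its session is closed.
# Site names, VIPs, and RTT figures are invented for illustration.

SITES = {
    "paris":  {"vip": "192.0.2.10",    "active": True},
    "london": {"vip": "198.51.100.10", "active": True},
}

# Pretend proximity table: measured RTT (ms) from client networks to each site.
PROXIMITY = {
    "10.1.0.0/16": {"paris": 4,  "london": 18},
    "10.2.0.0/16": {"paris": 21, "london": 6},
}

_sessions = {}  # client network -> site chosen when its session started

def resolve(client_net):
    """Return the VIP a client should use for the e-commerce portal."""
    if client_net in _sessions:                      # stickiness: an open session
        return SITES[_sessions[client_net]]["vip"]   # stays on its original site
    candidates = {site: rtt for site, rtt in PROXIMITY[client_net].items()
                  if SITES[site]["active"]}
    best = min(candidates, key=candidates.get)       # closest active site wins
    _sessions[client_net] = best
    return SITES[best]["vip"]

print(resolve("10.1.0.0/16"))   # 192.0.2.10   -> served from Paris
print(resolve("10.2.0.0/16"))   # 198.51.100.10 -> served from London
```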


8 Responses to 2 – Active/Active DC

  1. santsboy says:

    Hi Yves,

    Just to confirm I have understood correctly: there are two types of A/A DCs:
    1- DR A/A DC –> where the applications active in one DC are on standby at the other DC. Users access the different DCs depending on the application they want to use.

    2- A/A DC –> where the applications are active on both DCs and users end up at one DC or the other depending on LISP (for example).

    Is this correct, or is there only one A/A DC type?

    Thanks a lot.

    Regards,

    Santsboy

    • Yves says:

      Hi Santsboy

      1- That’s correct; you don’t want to leave a backup DC in standby mode doing nothing :). Each DC is active for different applications and each one backs up the other on a per-application basis:
      DC_1 is active for Apps_A and standby for Apps_B
      DC_2 is active for Apps_B and standby for Apps_A
      Usually you don’t need a LAN extension; it’s a cold restart. You need to redirect the traffic to the new location manually (DNS, routing metric, etc.) or dynamically using LISP ASM.

      2 – There are at least two models:
      – (old fashioned) the server farm (real servers) is spread over the two DCs, and the SLB engine facing the server farm uses different weights for each site, or probes each real server to distribute the load in a clever fashion (e.g. by response time). You can also use some scripting from the hypervisor manager to dynamically change the weights when a machine is live migrated (see the sketch below).
      – Metro distance (consider 5 ms max), where your application hot-live-migrates statefully to the remote DC (manually or dynamically) with zero interruption. This is usually called twin-DC or tightly coupled DC, meaning that from a software framework and management point of view it is seen as a single logical DC. Thus you need LAN extension between the two sites. For such a scenario you also need to take the storage into consideration: 5 ms of latency to access the original volume (shared storage) may have a bad impact on performance (e.g. databases). There are some good solutions to improve this scenario (caching or virtual volumes); both have advantages and limitations.
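      A minimal sketch of the weight-adjustment idea in the first model above; the hook name, the weight values, and the selection logic are assumptions for illustration only, not any load balancer's or hypervisor manager's actual API.

```python
# Toy model of per-site SLB weights that a hypervisor-manager hook could adjust
# when a VM live migrates between sites. Names and numbers are illustrative.
import random

weights = {"dc1": 70, "dc2": 30}   # initial traffic split between the two DCs

def on_vm_migrated(vm_name, src_site, dst_site, shift=20):
    """Hypothetical hook called after a live migration: move part of the
    traffic weight from the source site to the destination site."""
    moved = min(shift, weights[src_site])
    weights[src_site] -= moved
    weights[dst_site] += moved
    print(f"{vm_name}: {src_site} -> {dst_site}, new weights {weights}")

def pick_site():
    """Weighted choice, the way an SLB engine spreads new connections."""
    return random.choices(list(weights), weights=list(weights.values()))[0]

on_vm_migrated("web-01", "dc1", "dc2")    # after migration, dc2 takes more load
print([pick_site() for _ in range(10)])   # new connections follow the new weights
```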

  2. santsboy says:

    Hi Yves,

    Thank you very much for the explanation, much appreciated. Now I understand the different meanings.

    Regards,

    Santsboy

  3. santsboy says:

    Hi Yves,

    Sorry to ask again about this topic, but I have a doubt. Are these comments correct?

    – Only synchronous storage replication can accomplish Twin-DCs? (1 logical DC)
    – If this is the case, will we only be able to have Twin-DCs if we use dark fiber/DWDM as the DCI medium?

    Thanks.

    Regards,

    Santsboy

    • Yves says:

      Hi Santsboy,

      >> – Only synchronous storage replication can accomplish Twin-DCs? (1 logical DC)
      You can run asynchronous data replication too if you wish, but usually a Twin-DC is used to offer a single logical DC for elasticity on both the network and the storage side (dynamically move the VM to where the resources are available).
      – On the network side you extend the LAN for stateful hot live migration.
      – On the storage side you can use shared storage for short distances (not a rule, but it is generally accepted for distances < 10 km), which you can improve using a caching method (e.g. FlexCache from NetApp) or Active/Active storage using virtual volumes (e.g. EMC VPLEX Metro).

      >> – If this is the case, will we only be able to have Twin-DCs if we use dark fiber/DWDM as the DCI medium?
      No, it’s more flexible than that. I was describing the trend of Twin-DC for hot live migration versus a backup DC for disaster recovery over long distances, but you can offer DR for the Twin-DC too, and if you don’t want to support stateful hot live migration, you can interconnect the DCs with Layer 3 only, even over dark fiber.

      Let me know if not clear, yves

      • santsboy says:

        Hi Yves,

        Sorry for all these questions, but here is where it is getting confusing for me.

        We say that for vMotion we have an RTT of 5 msec, so between DCs we can have a distance of 50 km/1 msec RTT x 5 msec = 250 km.

        But if we want to extend this distance we use Active/Active storage (EMC VPLEX Metro), and we get a maximum of 100 km (200 km in the latest version).

        I don’t get it… we extend the distance by improving how replicated data travels from one DC to another, and yet we get less distance… I am a network guy, so I am new to storage; apologies if this is a stupid concept, but it is making me confuse everything. Where am I wrong?

        Thank you very much for your help.

        regards,

        Santsboy

        • Yves says:

          Hi Santsboy

          Yes, that’s where the confusion often comes from. This is not crystal clear.

          Long story short, the speed of light in fiber gives roughly 1 msec of RTT per 100 km. Note that the 1 msec per 50 km figure is for synchronous storage replication, due to the two round trips per write.
          The maximum latency between the two VMware vSphere servers cannot exceed 5 msec (in vSphere 4.x) and goes up to 10 msec starting with vSphere 5.
          However, 5 ms is too far to support shared storage: when you migrate a machine, the volume to which it is directly attached is now 5 ms away, which impacts the I/O of its disk volume. Think of an SQL database server, for example (with plenty of WRITE commands and dynamic objects); assuming it is used at 100%, you may lose almost 10% of TPS with 5 msec of latency on the storage side (why would the latency on the storage side be different from the latency on the network side over the same distance?).
          For some applications (e.g. a web server) you can certainly improve the latency with I/O acceleration features from the SAN fabric and/or with a caching mechanism (READ only) for static objects (e.g. FlexCache from NetApp).
          Thus the other option is to offer a distributed virtual volume, like EMC VPLEX Metro, so that the virtual machine attaches to a virtual volume (same LUN ID) local to it; hence the latency becomes close to zero after the machine migrates.
          However, the way VPLEX Metro works requires a direct synchronous FC link between the two VPLEX clusters stretched across the two locations, meaning 100 km max.
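          To make the arithmetic above explicit, here is a small sketch using the common rule of thumb of roughly 5 µs of one-way propagation per km of fiber; the helper function and its numbers are illustrative and simply restate the 100 km-per-msec RTT and 50 km-per-msec synchronous-write figures.

```python
# Rule-of-thumb figures only: light in fiber propagates at roughly 5 µs per km
# one way, i.e. about 0.01 ms of round-trip time (RTT) per km.
RTT_MS_PER_KM = 0.01

def max_km(budget_ms, round_trips=1):
    """Distance that fits in a latency budget, given how many round trips each
    operation needs (a synchronous write is commonly counted as 2)."""
    return budget_ms / (RTT_MS_PER_KM * round_trips)

print(max_km(1))                 # ~100 km for 1 ms of plain RTT
print(max_km(1, round_trips=2))  # ~50 km per 1 ms for a synchronous write
print(max_km(5))                 # ~500 km for a 5 ms network RTT budget...
print(max_km(5, round_trips=2))  # ...but ~250 km once each write needs 2 RTTs
```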

          Think of it this way: when you migrate a machine there are two sides, each with its own requirements, the network side and the storage side. We often mix these two sides when we talk about migration, hence the confusion for many folks.
          IMHO you first need to focus on the storage side:
          – What is my storage mode (shared, distributed)?
          – What is the maximum latency supported?
          – What is the application type? ==> Can I improve this with a caching tool or with virtual volumes?
          – Can I afford to move my VM with an asynchronous data replication method? ==> This is very interesting with Microsoft Hyper-V in conjunction with EMC VPLEX Geo clusters, where you can offer stateless migration up to 2,000 km.

          Maybe this link can better help http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021215

          yves

  4. santsboy says:

    Hi Yves,

    Thank you very much for the explanation, it really solved my confusion.

    Now I see the picture, but for the implementation I have the following queries:

    – Even if I use OTV (allowing me to pass Layer 2 traffic over a dark fiber IP connection), if I use VPLEX Metro, will I be forced to have an independent FC connection over dark fiber between the DCs in order to interconnect both VPLEX engines? Is that correct?

    – In case I use VPLEX Metro, would I be able to use Nexus switches only, or will I be forced to use the MDS platform? In case I have to use MDS, why would I need to?

    Thank you very much for your help, it is much appreciated.

    Regards,

    Santsboy
