Workload Mobility over distance with VMware, VPLEX & OTV

I have had the pleasure of configuring VCE's Workload Mobility demo for EMCworld, CiscoLive! and VMworld. We have a number of Vblock™ Systems at each conference in the Cisco, EMC and VCE booths. It made perfect sense to connect them together and show off our Workload Mobility Solution. Besides, isn't cloud all about the ability to offer services from anywhere? Sure, it's just across the trade show floor in our demo, but it could easily be miles apart.


We have three Vblock 300 systems located in the VCE, EMC and Cisco booths. An additional network aggregation rack has been added to each Vblock system to house Nexus 7010 switches, EMC RecoverPoint appliances and EMC VPLEX engines. Panduit provided 1000 feet of fiber trunk cable containing six pairs of fiber, which has been hung from the ceiling between booths.


The Nexus 7010 switches are providing our core network services, making each booth its own data center. RecoverPoint and VMware Site Recovery Manager are handling traditional long-haul disaster recovery. VPLEX Metro is providing Active-Active storage clustering: the ability to stretch a VMware vSphere cluster between two sites today, and up to four in the future. VPLEX Metro provides storage array block-level LUN consistency and data availability, while OTV on the Nexus 7000 series switches provides Layer 2 network services.


Diagram: VCE Vblock WLM plan


Let's take a step back for a moment and look at what makes this "cool". Traditionally, migrating data and applications within or between data centers involves manual steps and data copies: IT would either make physical backups or use data replication services to get the data from site A to site B.


VMware clusters operate local to one data center, so moving VMs between data centers typically requires an outage, an IP address change and time to bring up the services. To enable elastic cloud computing, IT must find new ways to deploy applications across distance: maintenance with little or no downtime, disaster avoidance, workload migration, consolidation and growth. IT also wants to be able to actively run workloads across multiple locations using vSphere stretched clusters.


In comes the VCE Workload Mobility Solution

Excerpt from our solution document: "The VCE Workload Mobility Solution (WLM Solution), an integrated solution that enables fast and simple application and infrastructure migration in real time with no downtime or disruption, can help overcome these migration challenges.


Utilizing the Vblock Infrastructure Platforms, EMC VPLEX Metro, and VMware vMotion, the WLM Solution removes the physical barriers within, across, and between data centers to facilitate data center resizing, consolidation and relocation, technology refresh, and workload balancing. The WLM Solution provides mobility to virtual applications, virtual machines, virtual networks, and virtual storage.”


Diagram: WLM High level drawing


Our live demonstrations at EMCworld, CiscoLive! and VMworld show how this works and what to expect in a production environment. We have three presentations that show what happens when a Vblock system protected by WLM experiences a failure (complete power off), and how to recover from that failure. We have set up two independent VPLEX clusters: Cluster 1 spans the VCE booth (site A) and EMC booth (site B), and Cluster 2 spans the VCE booth (site A) and Cisco booth (site B). Using Storage vMotion you can still move VMs between all three sites, but they will only be protected between two sites.


Workload Mobility

We are featuring Workload Mobility in the Cisco booth. It comes into play when you need to proactively vMotion VMs between sites, letting you utilize precious memory, CPU and storage IOPS at both sites for the same workload at the same time. Maybe you want to update firmware, make changes or refresh hardware at one site: move all the VMs to the other side and perform the maintenance with no outage. VPLEX keeps the VMware datastore in sync at both sites, and the Nexus switches make sure network traffic to your VMs keeps flowing.


Outage / EPO – Emergency Power Off

In the VCE booth presentation we are showing what happens when we kill the power on the Vblock system in the EMC booth, and how VMs running on that system use VMware HA to restart on the surviving Vblock system. VPLEX Metro has been keeping the datastore in sync, and the Nexus switches now send network traffic to these VMs in the VCE booth. It takes about a minute for vCenter to register the VMs as down and bring them back up. Not a bad recovery time for a complete outage. VMware Fault Tolerance can take this a step further and keep a VM up through a host failure.



In the EMC booth we are demonstrating Vblock system restoration, highlighting how VPLEX resumes distributed datastore replication, and how DRS, affinity rules and vMotion can be used to automatically move VMs back to a "recovered" Vblock system. VPLEX maintains a journal of writes performed while one site was down. This delta set of I/O transactions is sent over the WAN once the VPLEX link is restored; VPLEX shows how large the delta set is and the estimated time to completion. Even if your data center experienced a long outage, you can still move VMs back to this Vblock system!
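As a rough illustration of what that "time to completion" depends on, the sketch below estimates how long it takes to drain the journal from the delta size and the inter-site bandwidth. All the numbers here are hypothetical, not measurements from the demo:

```python
# Back-of-envelope sketch (hypothetical numbers): estimate the time needed to
# ship a journaled delta set across the WAN once the inter-site link is restored.

def resync_time_seconds(delta_gb: float, wan_mbps: float,
                        efficiency: float = 0.7) -> float:
    """Time to transfer delta_gb over a wan_mbps link at a given utilization."""
    delta_megabits = delta_gb * 8 * 1024  # GB -> megabits
    return delta_megabits / (wan_mbps * efficiency)

# e.g. a 50 GB delta over a 1 Gb/s inter-site link at 70% effective throughput
secs = resync_time_seconds(delta_gb=50, wan_mbps=1000)
print(f"~{secs / 60:.0f} minutes to drain the journal")
```

The takeaway is simply that resync time scales with how long the site was down (journal size) and with the WAN link you sized between sites.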


Demo Environment Detail

Each Vblock system has up to ten B200 M2 blades installed in the UCS chassis, which are broken up into clusters to support demo workloads for each booth. Our VMware environment has one vCenter Server running on the management pod in the VCE booth, which manages three clusters of four hosts each. Two hosts come from the primary Vblock system and the other two come from the secondary, forming stretched VMware clusters between the booths. HA and DRS are enabled on the clusters. Each booth has demo workstations connected to 1Gb Ethernet ports on the Nexus 7Ks with access to vCenter in the VCE booth.


On the Vblock systems themselves we have a number of “demo pods”. We have a CRM application simulator using Swingbench and an Oracle 12g database, as well as Microsoft Exchange, VMware vCloud Director, Cisco Unified Communication and VMware View environments running live.


Each cluster has storage protected by VPLEX Metro, which presents it as what is called a distributed volume, leveraging EMC's AccessAnywhere technology. Multiple VPLEX engines can be installed to protect and distribute these volumes. The engines are based on the same hardware used in the VMAX, with the same cache, modules and reliability numbers for uptime.


Two directors exist in each VPLEX engine. Each director contains four back-end Fibre Channel (FC) ports, which are zoned to the VNX storage arrays, and four front-end FC ports, which are zoned to the hosts. Two 10Gb Ethernet ports or four FC ports are used for WAN communication to the remote VPLEX. Note that using FC for WAN/COM requires external FCIP switches such as the Cisco MDS 9200 series. Each VPLEX cluster has a management server which maintains a VPN connection to the management server at the remote data center. Up to 8,000 volumes can be protected by a single VPLEX. Your data centers will need to provide WAN latency of less than 10ms RTT, which limits distance to about 100km, though the practical limit depends on the quality of the link (it could be shorter or longer).
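To put the 10ms RTT budget in perspective, here is a quick back-of-envelope calculation. The 200 km/ms figure for light in fiber is an approximation and the distances are illustrative; it shows how much of the budget raw propagation consumes, with the rest going to switching, FCIP conversion and protocol round trips:

```python
# Rough sketch: fiber propagation RTT vs. a 10ms inter-site latency budget.
# 200 km/ms is an approximate speed of light in glass (about 2/3 of c in vacuum).

SPEED_IN_FIBER_KM_PER_MS = 200.0

def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS

budget_ms = 10.0  # the RTT limit cited above

for km in (100, 500, 1000):
    rtt = propagation_rtt_ms(km)
    print(f"{km:>5} km -> {rtt:.1f} ms propagation RTT, "
          f"{budget_ms - rtt:.1f} ms left for switches, FCIP and protocol turns")
```

At 100km, raw propagation uses only about 1ms of the budget; the rest of the headroom is consumed by equipment and by synchronous writes needing multiple round trips, which is why the practical distance is well short of what propagation alone would allow.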


A number of VLANs are set up on the switches to support the demos: management, vMotion, ESXi host, NFS, Workloads. These VLANs are all extended via OTV overlay interfaces between the Nexus 7Ks.


OTV runs over a Layer 3 network, which could be provided by the Nexus 7Ks or another Layer 3 router. Multiple VDCs (Virtual Device Contexts) are required to run OTV and Layer 3 on the same Nexus switch. A VDC separates the switch into multiple logical switches, with physical ports allocated to each VDC. To pass traffic between VDCs on the same Nexus 7K, we cross-connected ports from line cards 2 and 3. For redundancy, port channels are used to bind multiple ports together for Layer 2 trunks and Layer 3 interfaces. Switched virtual interfaces (SVIs) have been created on each switch for each VLAN, using HSRP for gateway redundancy. We enabled EIGRP (Enhanced Interior Gateway Routing Protocol), which automatically maintains the routing topology.
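For a feel of what extending VLANs with OTV looks like on the 7Ks, here is a minimal configuration sketch. The interface names, VLAN IDs, site identifier and multicast groups are made-up illustrative values, not the actual demo config:

```
! OTV VDC: the join interface faces the Layer 3 core,
! the overlay interface extends the demo VLANs between booths
feature otv
otv site-vlan 99
otv site-identifier 0x1

interface Ethernet2/1
  description OTV join interface toward the L3 core
  ip address 10.1.1.1/30
  ip igmp version 3
  no shutdown

interface Overlay1
  otv join-interface Ethernet2/1
  otv control-group 239.1.1.1
  otv data-group 232.1.1.0/28
  otv extend-vlan 10, 20, 30
  no shutdown
```

Each site gets its own site VLAN and site identifier, and the list under `otv extend-vlan` is where the management, vMotion, NFS and workload VLANs would be stretched between data centers.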


Diagram: Nexus 7010 switches, trunk connections and VDCs


A team of five engineers built all three Vblock systems from the ground up and configured VPLEX and the Nexus 7K OTV/L3 network in about 15 days. We will have this demo running at each show this year, then move it to our RTP lab.



Photo: Vblock racks at the Staging lab in San Jose, CA


**Originally posted on CiscoDC Blog for CiscoLive! 2012