Archive for July 30th, 2008
VMware’s Site Recovery Manager and RecoverPoint: The Perfect Vision Now
To expand on my previous post, Tivo for the DataCenter, EMC’s Recoverpoint, we will now discuss the prerequisites for VMware’s Site Recovery Manager (SRM) and its integration with RecoverPoint.
What is SRM?
It is the grand orchestrator, a conduit of sorts, between your Virtual Infrastructure and your storage array. It’s VirtualCenter (VC) integrated software that handles the testing and failover of entire virtual datacenters via administratively built virtual runbooks. RTOs and RPOs are not just misunderstood acronyms but obtainable by-products of cost and risk reduction. In short, it simplifies your virtual business continuity world.
Note: What follows is really the cookbook for integrating these two products, as well as some personal notes, so excuse the delivery of the content.
Assuming core replication is in place between your Primary or Protected site and your Secondary Site, the following items are foundational to every SRM implementation:
- Two ESX hosts (version 3.0.2/3.5 Update 1 or greater) representing primary and recovery sites – installed and configured. Check the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
- Two VirtualCenter Servers (version 2.5 Update 1 or greater) representing primary and recovery sites. Each VirtualCenter should be managing one of the above ESX hosts
- Two Site Recovery Manager Servers (can be same as VirtualCenter Servers) representing primary and recovery sites
- Separate VirtualCenter database for primary and recovery sites
- Creation of a new Microsoft SQL Server or Oracle database with DBA privileges to that database for SRM. Separate SRM database for primary and recovery sites
- SRM compatible shared fibre channel or iSCSI SAN with replication. Check the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
- Storage Replication Adapter (SRA) for storage used for SRM. Check the storage model and the software/firmware level supported in the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
- Sample virtual machines (VMs) on the replicated datastore running on ESX host at primary site
- One non-replicated datastore on the infrastructure representing the recovery site, to store placeholder virtual machines
- Additional storage space is required for clones or snapshots at the recovery site in order to run the SRM failover test. This will be a storage vendor specific implementation (e.g. BCV for EMC Symmetrix, clone/snapshots for CLARiiON/NetApp etc.)
OK, that takes care of the SRM side. What do we need on the RP side (continuous local and remote)?
- FC SAN: ideally you will need 4 FC ports per RecoverPoint Appliance (RPA) for multipathing and redundancy
- Eth int linked to the WAN for replication
- Eth int linked to the LAN for management
- SAN volumes required for RPA
- Repository volume: accessed as part of the installation process; needed at both the primary and remote sites.
- It must be accessible to all RPAs and highly available.
- The repository holds configuration info and some replication metadata. Size: 2G per consistency group (CG). Oversize up front, as reconfiguring later will require a new activation license; consider increasing it to 5 to 10G per CG.
- Journal volume: each consistency group needs a journal volume, and it must be large enough to hold the amount of change (delta) to that group for a given day. Typically you should size it at 10 to 20% of your consistency group size.
- To support failover you must maintain a journal volume for the production copy as well as for the local copy, the remote copy, or both.
- The minimum journal volume size for each copy is 4.5G. Keeping the journal volumes around the same size makes more efficient use of capacity, but that runs counter to the 10-20% rule; stick with the rule.
- Replication volume: each production volume must be paired up with a copy. These pairs are the replication sets for the consistency groups.
- Volumes do not need to be the same size, although a production volume cannot be larger than the volume used for its copy. Excess space on a copy will not be used.
- Here are your replication options:
- CDP: Continuous Data Protection, local same-site replication to another array. No WAN interface specified.
- CRR: Continuous Remote Replication, async replication over the WAN to a remote site.
- CLR: Concurrent Local and Remote, protection spans locally and remotely. CDP for local, CRR for remote.
- Use NTP server for time sync on the RPA’s. Set it up on the first RPA in a site, and the other RPA’s will sync to it.
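The journal sizing notes above boil down to a little arithmetic, so here is a rough sketch of it in Python. It only encodes the 10 to 20% rule of thumb, the 4.5G per-copy minimum, and the rule that a copy cannot be smaller than its production volume. The function names and sample sizes are my own for illustration and are not part of any RecoverPoint tool.

```python
# Rough sizing helper based on the rules of thumb in these notes.
# All names here are hypothetical; nothing below talks to RecoverPoint.

MIN_JOURNAL_GB = 4.5  # minimum journal volume size per copy


def journal_size_gb(group_size_gb, percent=20):
    """Suggest a journal size (GB) for one copy of a consistency group.

    Applies the 10 to 20% rule and never goes below the 4.5G minimum.
    """
    if not 10 <= percent <= 20:
        raise ValueError("percent should stay within the 10 to 20 rule of thumb")
    return max(MIN_JOURNAL_GB, group_size_gb * percent / 100)


def copy_is_large_enough(production_gb, copy_gb):
    """A production volume cannot be larger than the volume for its copy."""
    return copy_gb >= production_gb


if __name__ == "__main__":
    for size in (20, 500):  # sample consistency group sizes in GB
        print(size, journal_size_gb(size, 10), journal_size_gb(size, 20))
```

For a small 20G group the 4.5G floor wins; for a 500G group the rule gives a journal somewhere between 50G and 100G per copy.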
OK, here is a fun one: ZONING. Let’s break it up based on what type of splitter you will be using…
Note: there is a lot of pertinent information within the installation guide specifically pertaining to QLogic QLE2400 and QLA2300 HBAs; make sure you read it if it applies.
On a side note, RPAs are recognized by the 5001248 prefix on their WWNs, which is helpful indeed.
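If you have a long list of WWNs dumped from your switches, that prefix makes the RPA ports easy to pick out. A minimal sketch, assuming the WWNs arrive as strings with or without colon or dash separators; the sample WWNs are made up for illustration.

```python
# Pick RPA ports out of a list of WWNs using the 5001248 prefix.
# The sample WWNs below are invented; substitute your own switch output.

RPA_WWN_PREFIX = "5001248"


def normalize_wwn(wwn):
    """Strip common separators and lowercase the WWN."""
    return wwn.replace(":", "").replace("-", "").strip().lower()


def is_rpa_wwn(wwn):
    """True when the WWN starts with the RPA prefix."""
    return normalize_wwn(wwn).startswith(RPA_WWN_PREFIX)


if __name__ == "__main__":
    wwns = [
        "50:01:24:80:00:aa:bb:cc",  # made-up RPA-style WWN
        "10:00:00:00:c9:12:34:56",  # made-up host HBA WWN
    ]
    for wwn in wwns:
        print(wwn, "RPA" if is_rpa_wwn(wwn) else "other")
```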
- Host based splitters:
- One initiator, one target per zone. For simplicity, zone host port 0 with RPA port 0, and zone host port 1 with RPA port 2. Always ensure that the two ports are not on the same HBA.
- Array based splitters: this is where the JUICE is, the most efficient means to replication nirvana
- The CLARiiON array must have access to the RPAs for both reading and writing. For RPAs with QLE2400 HBAs, you should zone all RPA HBA ports with all CLARiiON array ports.
- For RPAs with QLA2300 HBAs, it is critical to zone at least two RPA HBA ports with each CLARiiON array port so that the system will be able to automatically dedicate one RPA port for initiator functions and one RPA port for target functions. This requirement is in conflict with the zoning requirements of QLA2300-based deployments that use non-CLARiiON splitters (that is, host-based and fabric-based splitters). Therefore, QLA2300-based deployments using the CLARiiON splitter can only support consistency groups that replicate CLARiiON volumes. Consistency groups that replicate only non-CLARiiON volumes can use other types of splitters. Note: Traffic initiated by the CLARiiON array cannot flow through the front-end ports dedicated for CLARiiON’s MirrorView. As such, an RPA must be zoned as a target with at least one of the ports not dedicated to MirrorView.
- Using Navisphere, create a RecoverPoint storage group that includes:
- All RPAs. All RPAs must be masked to at least one LUN on the relevant CLARiiON array.
- RecoverPoint replication volumes: LUNs that belong to the storage group for the relevant host servers. Note: When you attempt to add a LUN to a second storage group, you are warned that this will allow multiple hosts to access the LUN. You can disregard this warning.
- RecoverPoint journal volumes
- RecoverPoint repository volumes
- Ensure that the following are installed:
- CLARiiON storage processor running FLARE 26 patch level 14 or later
- RecoverPoint enabler-array installed
- Fabric based splitters, SANTap requirements:
- Cisco Multilayer Director Switch with Storage Service Module and SANTap services installed at the primary site.
- License for SANTap service for each Storage Services Module (SSM).
- Port 23 between RPAs and the SAN switches must be available.
- All relevant hosts, storage ports, and RPA ports must be connected via Fibre Channel to the same VSAN. In a multiswitch environment, all connections must be made to the same VSAN on every switch.
- Hosts and RPAs must be zoned so that each can see the storage it must access. RPAs can be zoned only after the RecoverPoint software has been loaded.
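To make the zoning rules a bit more concrete, here is a small planning sketch in Python. It only generates the pairings described above: one initiator and one target per zone for host based splitters (host port 0 with RPA port 0, host port 1 with RPA port 2), and all RPA ports with all array ports for QLE2400 RPAs on the CLARiiON splitter. The zone naming scheme and the port labels are my own assumptions; it does not talk to any switch.

```python
# Planning sketch for the zoning pairs described in these notes.
# Zone names and port labels are hypothetical placeholders.

from itertools import product

# Host port to RPA port pairing suggested for host based splitters.
HOST_TO_RPA_PORTS = [(0, 0), (1, 2)]


def host_splitter_zones(host, rpa):
    """One initiator, one target per zone: (zone_name, host_port, rpa_port)."""
    return [
        (f"{host}_p{h}__{rpa}_p{r}", f"{host}:port{h}", f"{rpa}:port{r}")
        for h, r in HOST_TO_RPA_PORTS
    ]


def array_splitter_pairs(rpa_ports, array_ports):
    """QLE2400 case: every RPA HBA port paired with every array port."""
    return list(product(rpa_ports, array_ports))


if __name__ == "__main__":
    for zone in host_splitter_zones("esx01", "rpa1"):
        print(zone)
    pairs = array_splitter_pairs(["rpa1:port0", "rpa1:port1"],
                                 ["spa:0", "spa:1", "spb:0", "spb:1"])
    print(len(pairs), "RPA-to-array pairs")
```

Treat the output as a checklist to compare against what is actually configured on the fabric, not as a replacement for the installation guide.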
Well, I’m dizzy, how about you guys? As you can see, a lot goes into a healthy implementation of these two products, so attention to detail is of the utmost importance. Lucky for you and me, those fine individuals at EMC and VMware have littered the channels with documentation. So get to reading; these products aren’t going to install themselves!!!