blog.virtualtacit.com

Root Down in a 2009 World

Archive for the ‘vmware’ Category

How To: Minimizing OS Heartbeat Timeouts within Site Recovery Manager during Test Failovers

without comments

During my time with Site Recovery Manager I came across timeout issues relating to OS heartbeat delays that make even a successful test recovery look downright error-ridden. Specifically, it is the portion of your recovery plan that waits for the VMware Tools heartbeat. This timeout is a configurable value; the default was 300 seconds in SRM 1.0 and is 600 seconds in Update 1. However, no value I set, lower or higher, changed the outcome.

So after talking to the extremely helpful Lee Dilworth over at the blog Uptime (VMware and Business Continuity), a couple of items were brought to my attention. First, and I don’t want to mislead you, most of the time this is more of an annoyance than anything else. It doesn’t imply that your test recovery went down in a ball of flames. But to keep these recovery steps visually accurate, it is necessary in my opinion to make this change. That said, it raises the question of why the OS heartbeat check is not part of JUST the actual recovery, as I had thought the VMware Tools heartbeat check was a network operation.

Here are the steps to curb this outcome. Again, thanks to Lee, and please keep the great info coming our way. Link to the original post: http://communities.vmware.com/message/1130385.

  • Edit your hostd config.xml on all recovery site hosts. The path is /etc/vmware/hostd/config.xml. The <vmsvc> section should look as follows:
    • <vmsvc>
      <enabled>true</enabled>
      <heartbeatDelayInSecs>40</heartbeatDelayInSecs>
      </vmsvc>
    • Note: As Lee suggests, the 40 is a configurable value; 20 was the previous value prior to ESX 3.5 U3.
  • Restart hostd or the ESX host. To restart hostd, issue the following command:
    • service mgmt-vmware restart
  • Reset your original heartbeat timeout values to the default, 300 seconds for 1.0, 600 seconds for Update 1.
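
If you have a pile of recovery hosts to touch, the config.xml edit above can be scripted. What follows is only a rough sketch of the idea in Python, not anything from Lee or VMware: the function name is made up, and the sample string assumes the <vmsvc> section sits under a <config> root element. Note that ElementTree drops XML comments when it rewrites a document, so back up the original file before trying anything like this on a real host.

```python
import xml.etree.ElementTree as ET


def set_heartbeat_delay(config_xml: str, seconds: int) -> str:
    """Return config_xml with <vmsvc>/<heartbeatDelayInSecs> set to `seconds`.

    Hypothetical helper for illustration; it works on a string so it can be
    tried out without touching /etc/vmware/hostd/config.xml.
    """
    root = ET.fromstring(config_xml)
    # Locate the <vmsvc> section, whether it is the root or nested below it.
    vmsvc = root if root.tag == "vmsvc" else root.find(".//vmsvc")
    if vmsvc is None:
        raise ValueError("no <vmsvc> section found")
    delay = vmsvc.find("heartbeatDelayInSecs")
    if delay is None:
        # The element is missing, so add it to the section.
        delay = ET.SubElement(vmsvc, "heartbeatDelayInSecs")
    delay.text = str(seconds)
    return ET.tostring(root, encoding="unicode")


# Assumed sample input: a <config> root wrapping the <vmsvc> section.
sample = "<config><vmsvc><enabled>true</enabled></vmsvc></config>"
updated = set_heartbeat_delay(sample, 40)
print(updated)
```

You would still need to write the result back to each host and restart hostd as described in the steps above.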

Written by Joe Kelly

December 23rd, 2008 at 2:47 pm

Posted in srm, vmware

VMware’s Site Recovery Manager and RecoverPoint: The Perfect Vision Now

with one comment

To expand on my previous post, Tivo for the DataCenter, EMC’s Recoverpoint, we will now discuss the prerequisites for VMware’s Site Recovery Manager and its integration with RecoverPoint.

What is SRM?

It is the grand orchestrator, a conduit of sorts, between your Virtual Infrastructure and your storage array. It is VC-integrated software that handles the testing and failover of entire virtual datacenters via administratively built virtual runbooks. RTOs and RPOs are no longer just misunderstood acronyms but obtainable by-products of cost and risk reduction. In short, it simplifies your virtual business continuity world.

Note: What follows is really the cookbook for integrating these two products, along with some personal notes, so please excuse the delivery of the content.

Assuming core replication is in place between your Primary or Protected site and your Secondary Site, the following items are foundational to every SRM implementation:

  • Two ESX hosts (version 3.0.2/3.5 Update 1 or greater) representing primary and recovery sites – installed and configured. Check the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
  • Two VirtualCenter Servers (version 2.5 Update 1 or greater) representing primary and recovery sites. Each VirtualCenter should be managing one of the above ESX hosts
  • Two Site Recovery Manager Servers (can be same as VirtualCenter Servers) representing primary and recovery sites
  • Separate VirtualCenter database for primary and recovery sites
  • Creation of a new Microsoft SQL Server or Oracle database with DBA privileges to that database for SRM. Separate SRM database for primary and recovery sites
  • SRM compatible shared fibre channel or iSCSI SAN with replication. Check the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
  • Storage Replication Adapter (SRA) for storage used for SRM. Check the storage model and the software/firmware level supported in the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
  • Sample virtual machines (VMs) on the replicated datastore running on ESX host at primary site
  • One non-replicated datastore on the infrastructure representing the recovery site to store placeholder virtual machines
  • Additional storage space is required for clones or snapshots at the recovery site in order to run the SRM failover test. This will be a storage vendor specific implementation (e.g. BCV for EMC Symmetrix, clone/snapshots for CLARiiON/NetApp etc.)

OK, that takes care of the SRM side. What do we need on the RP side (continuous local and remote)?

  • FC SAN; ideally you will need 4 FC ports per RecoverPoint Appliance (RPA) for multipathing and redundancy
  • Ethernet interface linked to the WAN for replication
  • Ethernet interface linked to the LAN for management
  • SAN volumes required for RPA
    • Repository volume: accessed as part of the installation process; needed at both the primary and remote sites.
      • Must be accessible to all RPAs and highly available.
      • The repository holds configuration information and some replication metadata. Size: 2 GB per consistency group. Oversize up front, as reconfiguring later will require a new activation license; I would increase to 5 to 10 GB per CG.
    • Journal volume: each consistency group needs a journal volume, and it must be large enough to hold the amount of change (delta) to that group for a given day. Typically you should size it at 10 to 20% of your consistency group size.
      • To support failover, you must maintain a journal volume for the production copy as well as for the local copy, the remote copy, or both.
      • The minimum journal volume size for each copy is 4.5 GB. Keeping the journal volumes around the same size uses capacity more efficiently, but that runs counter to the 10-20% rule; stick with the rule.
    • Replication volume: production volumes must be paired up with a copy. These pairs are the replication sets for the consistency groups.
      • Volumes do not need to be the same size, although the production volume cannot be larger than the volume for a copy. Excess space on a copy will not be used.
  • Here are your replication options:
    • CDP: Continuous Data Protection, local same-site replication to another array. No WAN interface specified.
    • CRR: Continuous Remote Replication, async replication over the WAN to a remote site.
    • CLR: Concurrent Local and Remote, protection spans locally and remotely. CDP for local, CRR for remote.
  • Use an NTP server for time sync on the RPAs. Set it up on the first RPA in a site, and the other RPAs will sync to it.
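
To make the sizing notes above a bit more concrete, here is a small back-of-the-napkin calculator in Python. It is only a sketch built from the rules of thumb in this post (10 to 20% of the consistency group size for the journal, a 4.5 GB minimum per copy, 2 GB of repository per consistency group); the function names are my own invention, and you should check the current EMC documentation before carving any real LUNs.

```python
MIN_JOURNAL_GB = 4.5        # minimum journal size per copy, per the notes above
REPOSITORY_GB_PER_CG = 2.0  # baseline repository size per consistency group


def journal_range_gb(cg_size_gb: float) -> tuple[float, float]:
    """Return the (low, high) journal size in GB for one copy.

    Applies the 10 to 20% rule of thumb and never goes below the minimum.
    """
    low = max(MIN_JOURNAL_GB, cg_size_gb * 0.10)
    high = max(MIN_JOURNAL_GB, cg_size_gb * 0.20)
    return low, high


def repository_gb(num_cgs: int, gb_per_cg: float = REPOSITORY_GB_PER_CG) -> float:
    """Return the repository volume size in GB for num_cgs consistency groups."""
    return num_cgs * gb_per_cg


if __name__ == "__main__":
    # Hypothetical example: a 500 GB consistency group, 8 groups in total.
    low, high = journal_range_gb(500)
    print(f"journal per copy: {low:.1f} to {high:.1f} GB")
    print(f"repository at 2 GB per CG: {repository_gb(8):.1f} GB")
    print(f"repository oversized to 10 GB per CG: {repository_gb(8, 10.0):.1f} GB")
```

Remember that the journal figure is per copy, so a CLR setup with production, local, and remote copies needs it three times over.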

 

OK, here is a fun one: ZONING. Let’s break it up based on the type of splitter you will be using…

Note: there is a lot of pertinent information in the installation guide specifically pertaining to QLogic QLE2400 and QLA2300 HBAs; make sure you read it if it applies.

On a side note, RPAs are recognized by the 5001248 prefix of their WWNs, which is helpful indeed.
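
That prefix makes it easy to pick the RPA ports out of a long list of fabric logins. Here is a minimal Python sketch of the idea; the helper names and the sample WWNs are made up for illustration, and the only fact taken from the notes above is the 5001248 prefix.

```python
RPA_WWN_PREFIX = "5001248"  # prefix noted above for RecoverPoint Appliance WWNs


def normalize_wwn(wwn: str) -> str:
    """Strip common separators and lowercase a WWN so prefixes compare cleanly."""
    return wwn.replace(":", "").replace("-", "").strip().lower()


def is_rpa_wwn(wwn: str) -> bool:
    """Return True if the WWN starts with the RPA prefix."""
    return normalize_wwn(wwn).startswith(RPA_WWN_PREFIX)


if __name__ == "__main__":
    # Made-up WWNs, as you might copy them from a switch name server listing.
    logins = [
        "50:01:24:82:00:6a:1b:01",
        "21:00:00:e0:8b:05:05:04",
        "50-01-24-82-00-6A-1B-02",
    ]
    for wwn in logins:
        label = "RPA" if is_rpa_wwn(wwn) else "other"
        print(f"{wwn}  {label}")
```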

    • Host-based splitters
      • One initiator, one target per zone. For simplicity, zone host port 0 with RPA port 0 and host port 1 with RPA port 2. Always ensure that the two ports are not on the same HBA.
    • Array-based splitters: this is where the JUICE is, the most efficient means to replication nirvana
      • The CLARiiON array must have access to the RPAs for both reading and writing. For RPAs with QLE2400 HBAs, you should zone all RPA HBA ports with all CLARiiON array ports.
      • For RPAs with QLA2300 HBAs, it is critical to zone at least two RPA HBA ports with each CLARiiON array port so that the system can automatically dedicate one RPA port for initiator functions and one RPA port for target functions. This requirement conflicts with the zoning requirements of QLA2300-based deployments that use non-CLARiiON splitters (that is, host-based and fabric-based splitters). Therefore, QLA2300-based deployments using the CLARiiON splitter can only support consistency groups that replicate CLARiiON volumes. Consistency groups that replicate only non-CLARiiON volumes can use other types of splitters. Note: Traffic initiated by the CLARiiON array cannot flow through the front-end ports dedicated to CLARiiON’s MirrorView. As such, an RPA must be zoned as a target with at least one of the ports not dedicated to MirrorView.
      • Using Navisphere, create a RecoverPoint storage group that includes:
        • All RPAs. All RPAs must be masked to at least one LUN on the relevant CLARiiON array.
        • RecoverPoint replication volumes (LUNs that belong to the storage group for the relevant host servers). Note: When you attempt to add a LUN to a second storage group, you are warned that this will allow multiple hosts to access the LUN. You can disregard this warning.
        • RecoverPoint journal volumes
        • RecoverPoint repository volumes
      • Ensure that the following are installed:
        • CLARiiON storage processor running FLARE 26 patch level 14 or later
        • RecoverPoint enabler-array installed
    • Fabric-based splitters, SANTap requirements
      • Cisco Multilayer Director Switch with Storage Service Module and SANTap services installed at the primary site.
      • License for SANTap service for each Storage Services Module (SSM).
      • Port 23 between RPAs and the SAN switches must be available.
      • All relevant hosts, storage ports, and RPA ports must be connected via Fibre Channel to the same VSAN. In a multiswitch environment, all connections must be made to the same VSAN on every switch.
      • Hosts and RPAs must be zoned so that each can see the storage it must access. RPAs can be zoned only after the RecoverPoint software has been loaded.

Well, I’m dizzy, how about you guys? As you can see, a lot goes into a healthy implementation of these two products, so attention to detail is of the utmost importance. Lucky for you and me, those fine individuals at EMC and VMware have littered the channels with documentation. So get to reading; these products aren’t going to install themselves!!!

Written by Joe Kelly

July 30th, 2008 at 1:03 am

Posted in vmware

My Hypervisor is better than yours…and here’s why

without comments

So Laverick beat me to the punch on this one, as I was preparing a post on this exact blog, but anywho, I thought I would comment on it….

Straight from the horse’s mouth comes an interesting post from the blog titled VMware: Virtual Reality, which explains some of the major architectural differences between Hyper-V, Xen and ESX. A bit marketing-tainted, the post covers such items as the reasoning behind the “Direct Driver” (ESX) architecture as opposed to the “Indirect Driver” (Xen, Hyper-V) architecture, hypervisor sizing, memory management and overcommitment, and shared storage. There really is nothing like a few netperf graphs, an “Uptime” taunt, and the smell of gunpowder in the air to kick off the 4th!!!

Written by Joe Kelly

July 5th, 2008 at 3:37 am

Posted in vmware

VMware gets serious-DMZ Virtualization Best Practices

without comments

For any engineer who has deployed virtualized solutions, DMZ configurations are par for the course. Customers live and die by their presence and availability on the web, and you as an engineer should be versed in proper DMZ deployments, not only to protect the customer’s internal data, which matters most, but also to keep your name and your company’s name in good standing. As part of this new initiative toward security awareness and preventative maintenance, VMware has published a quick read, DMZ Virtualization with VMware Infrastructure (download here). What follows is a brief synopsis of the three most typical DMZ deployments in a virtualized environment:

  • Partially Collapsed DMZ with Separate Physical Trust Zones
    • Zone separation achieved through independent clusters
    • Firewalls, IDS, IPS’s, etc. are physical devices requiring no change
    • All servers within each zone are virtual servers
    • Most common approach I have seen, as it is the easiest. Network isolation is completely physical, removing the need for VLANs.
    • Typically the approach that most larger organizations use, although you lose the benefits of resource consolidation, reduced power and cooling, and all the other salubrious ends of virtualization
  • Partially Collapsed DMZ with Virtual Separation of Trust Zones
    • This approach is a hybrid of sorts, layered between the SPTZ deployment and the DMZ in a box.
    • The virtual software is now a participant in the separation of security zones. Virtual switches corral which virtual servers can see which zones. The physical network devices are gatekeepers, controlling the security and communications between each zone.
    • Complexity level rises in such configurations although there is a greater balance between cost and resource utilization
  • Fully Collapsed DMZ
    • Full DMZ in a box, or host that is. Complete virtualization of all entities involved (i.e., vServers, vSecurity appliances, vFirewalls, etc.)
    • Certainly the most complex of all configurations, user appropriation a must
    • Again full utilization of resources and low cost is a huge driver for this approach
    • Full auditing suggested across firewall and switches to maintain VM availability, especially in tandem with the advanced features of your VI (DRS, VMotion, etc)

To conclude, choosing the most apt DMZ design should ideally take into consideration the number of physical NICs available, the customer’s internal security practices, and their tolerance for complexity. With this information in your back pocket and the development of such programs as VMsafe, together we can snuff out the negativity built up around virtualization and security.

In addition, make sure to check out the following references as there is a lot of useful information geared toward securing virtual infrastructure.

Written by Joe Kelly

June 30th, 2008 at 7:33 pm

Posted in vmware

64-Bit VI Client Support in Update 1

without comments

For all of you individuals out there who were devising workarounds for running the VI Client on a 64-bit OS, VMware finally decided to add support for this in Update 1. In short, you will need to install .NET 2.0 (x64) prior to installing the client. Here is the link for your reference.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004093

Written by Joe Kelly

May 12th, 2008 at 6:51 pm

Posted in vmware
