blog.virtualtacit.com

Root Down in a 2009 World

Archive for July, 2008

VMware’s Site Recovery Manager and RecoverPoint: The Perfect Vision Now

with one comment

To expand on my previous post, TiVo for the DataCenter-EMC’s RecoverPoint, we will now discuss the prerequisites for VMware’s Site Recovery Manager and its integration with RecoverPoint.

What is SRM?

It is the grand orchestrator, a conduit of sorts, between your Virtual Infrastructure and your storage array. It is VirtualCenter-integrated software that handles the testing and failover of entire virtual datacenters via administratively built virtual runbooks. RTOs and RPOs are no longer just misunderstood acronyms but obtainable by-products of cost and risk reduction. In short, it simplifies your virtual business continuity world.

Note: What follows is really a cookbook for integrating these two products, plus some personal notes, so excuse the delivery of the content.

Assuming core replication is in place between your Primary or Protected site and your Secondary Site, the following items are foundational to every SRM implementation:

  • Two ESX hosts (version 3.0.2/3.5 Update 1 or greater) representing primary and recovery sites – installed and configured. Check the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
  • Two VirtualCenter Servers (version 2.5 Update 1 or greater) representing primary and recovery sites. Each VirtualCenter should be managing one of the above ESX hosts
  • Two Site Recovery Manager Servers (can be same as VirtualCenter Servers) representing primary and recovery sites
  • Separate VirtualCenter databases for the primary and recovery sites
  • A new Microsoft SQL Server or Oracle database for SRM, with DBA privileges on that database. Use a separate SRM database for the primary and recovery sites
  • SRM compatible shared fibre channel or iSCSI SAN with replication. Check the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
  • Storage Replication Adapter (SRA) for storage used for SRM. Check the storage model and the software/firmware level supported in the VMware Site Recovery Manager Compatibility Matrix document at http://www.vmware.com/pdf/srm_10_compat_matrix.pdf
  • Sample virtual machines (VMs) on the replicated datastore, running on the ESX host at the primary site
  • One non-replicated datastore at the recovery site to store placeholder virtual machines
  • Additional storage space is required for clones or snapshots at the recovery site in order to run the SRM failover test. This will be a storage vendor specific implementation (e.g. BCV for EMC Symmetrix, clone/snapshots for CLARiiON/NetApp etc.)

OK, that takes care of the SRM side. What do we need on the RP side (continuous local and remote)?

  • FC SAN: ideally you will need 4 FC ports per RecoverPoint Appliance (RPA) for multipathing and redundancy
  • Ethernet interface linked to the WAN for replication
  • Ethernet interface linked to the LAN for management
  • SAN volumes required for the RPAs
    • Repository volume: accessed as part of the installation process, and needed at both the primary and remote sites.
      • Must be accessible to all RPAs and highly available
      • Holds configuration info and some replication metadata. Size: 2 GB per consistency group. Oversize up front, since reconfiguring later will require a new activation license; I would increase it to 5 to 10 GB per CG.
    • Journal volume: each consistency group needs a journal volume, and it must be large enough to hold the amount of change (delta) to that group for a given day. Typically you should size it at 10 to 20% of your consistency group size.
      • To support failover you must maintain a journal volume for the production copy as well as for the local copy, the remote copy, or both.
      • The minimum journal volume size for each copy is 4.5 GB. Keeping the journal volumes around the same size makes for more efficient capacity use, but that runs counter to the 10-20% rule; stick with the rule.
    • Replication volumes: each production volume must be paired with a copy. These pairs are the replication sets for the consistency groups.
      • Volumes do not need to be the same size, although the production volume cannot be larger than its copy. Excess space on a copy will not be used.
  • Here are your replication options:
    • CDP: Continuous Data Protection, local same-site replication to another array. No WAN interface specified.
    • CRR: Continuous Remote Replication, async replication over the WAN to a remote site.
    • CLR: Concurrent Local and Remote, protection spans locally and remotely. CDP for local, CRR for remote.
  • Use an NTP server for time sync on the RPAs. Set it up on the first RPA in a site, and the other RPAs will sync to it.
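The sizing rules of thumb above (journal at 10 to 20% of the consistency group, a 4.5 GB floor per copy, and a production volume no larger than its copy) are easy to turn into a quick sanity check. What follows is only a minimal sketch in Python; the function names and the choice of three copies are my own assumptions for illustration, not anything shipped with RecoverPoint.

```python
# Hypothetical helpers for the sizing rules of thumb in this post.
# None of these names come from RecoverPoint itself.

MIN_JOURNAL_GB = 4.5  # minimum journal volume size per copy


def journal_size_gb(group_size_gb, change_ratio=0.20):
    """Size one journal volume as a fraction of the consistency group."""
    if not 0.10 <= change_ratio <= 0.20:
        raise ValueError("the rule of thumb is 10% to 20% of the group size")
    # Never go below the per-copy minimum, however small the group is.
    return max(MIN_JOURNAL_GB, group_size_gb * change_ratio)


def journals_for_group(group_size_gb,
                       copies=("production", "local", "remote"),
                       change_ratio=0.20):
    """Return one journal size per copy that needs a journal."""
    size = journal_size_gb(group_size_gb, change_ratio)
    return {copy: size for copy in copies}


def replica_is_valid(production_gb, copy_gb):
    """A production volume cannot be larger than the volume for its copy."""
    return production_gb <= copy_gb
```

For example, a 500 GB consistency group at the 10% end of the rule works out to a 50 GB journal per copy, while a tiny 10 GB group still gets the 4.5 GB minimum.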

 

OK, here is a fun one: ZONING. Let’s break it up based on what type of splitter you will be using…

Note: there is a lot of pertinent information in the installation guide specifically pertaining to QLogic QLE2400 and QLA2300 HBAs, so make sure you read it if it applies.

On a side note, RPAs can be recognized by the 5001248 prefix on their WWNs. Helpful indeed.
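That prefix makes it simple to pick RPA ports out of a name server dump. Here is a small Python sketch of the idea; the sample WWNs are made up for illustration, and the helper names are my own.

```python
# WWN prefix that identifies RecoverPoint appliances, per the note above.
RPA_WWN_PREFIX = "5001248"


def normalize_wwn(wwn):
    """Strip separators and case so 50:01:24:... and 5001248... compare equal."""
    return wwn.replace(":", "").replace("-", "").strip().lower()


def is_rpa_wwn(wwn):
    """True when the WWN starts with the RPA prefix."""
    return normalize_wwn(wwn).startswith(RPA_WWN_PREFIX)


def split_rpa_ports(wwns):
    """Partition a list of WWNs into (RPA ports, everything else)."""
    rpa = [w for w in wwns if is_rpa_wwn(w)]
    other = [w for w in wwns if not is_rpa_wwn(w)]
    return rpa, other


# Made-up sample values, not taken from a real fabric.
sample = ["50:01:24:80:00:6a:1b:01", "10:00:00:00:c9:12:34:56"]
rpa_ports, other_ports = split_rpa_ports(sample)
```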

    • Host-based splitters
      • One initiator, one target per zone. For simplicity, zone host port 0 to RPA port 0, and zone host port 1 to RPA port 2. Always ensure that the two ports are not on the same HBA.
    • Array-based splitters: this is where the JUICE is, the most efficient means to replication nirvana
      • The CLARiiON array must have access to the RPAs for both reading and writing. For RPAs with QLE2400 HBAs, you should zone all RPA HBA ports with all CLARiiON array ports.
      • For RPAs with QLA2300 HBAs, it is critical to zone at least two RPA HBA ports with each CLARiiON array port so that the system will be able to automatically dedicate one RPA port for initiator functions and one RPA port for target functions. This requirement is in conflict with the zoning requirements of QLA2300-based deployments that use non-CLARiiON splitters (that is, host-based and fabric-based splitters). Therefore, QLA2300-based deployments using the CLARiiON splitter can only support consistency groups that replicate CLARiiON volumes. Consistency groups that replicate only non-CLARiiON volumes can use other types of splitters. Note: Traffic initiated by the CLARiiON array cannot flow through the front-end ports dedicated for CLARiiON’s MirrorView. As such, an RPA must be zoned as a target with at least one of the ports not dedicated to MirrorView.
      • Using Navisphere, create a RecoverPoint storage group that includes:
        • All RPAs
        • RecoverPoint replication volumes (LUNs that belong to the storage group for the relevant host servers)
          All RPAs must be masked to at least one LUN on the relevant CLARiiON array. Note: When you attempt to add a LUN to a second storage group, you are warned that this will allow multiple hosts to access the LUN. You can disregard this warning.
        • RecoverPoint journal volumes
        • RecoverPoint repository volumes
        • Ensure that the following are installed:
          • CLARiiON storage processor running FLARE 26 patch level 14 or later
          • RecoverPoint enabler-array installed
    • Fabric-based splitters (SANTap requirements)
      • Cisco Multilayer Director Switch with Storage Service Module and SANTap services installed at the primary site.
      • License for SANTap service for each Storage Services Module (SSM).
      • Port 23 between RPAs and the SAN switches must be available.
      • All relevant hosts, storage ports, and RPA ports must be connected via Fibre Channel to the same VSAN. In a multiswitch environment, all connections must be made to the same VSAN on every switch.
      • Hosts and RPAs must be zoned so that each can see the storage it must access. RPAs can be zoned only after the RecoverPoint software has been loaded.
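Going back to the host-based splitter rule above (one initiator, one target per zone, host port 0 to RPA port 0 and host port 1 to RPA port 2), the pairing can be sketched in a few lines of Python. This is only an illustration of the pairing logic: the zone naming scheme and the port labels are assumptions of mine, and it does not generate real switch commands.

```python
# Host port -> RPA port pairing from the host-based splitter rule above.
HOST_TO_RPA_PORT = {0: 0, 1: 2}


def build_zones(host_name, host_ports, rpa_name, rpa_ports):
    """Build single-initiator, single-target zones.

    host_ports and rpa_ports map a port number to its WWN (or any label).
    Returns {zone_name: (host_member, rpa_member)}.
    """
    zones = {}
    for host_port, rpa_port in HOST_TO_RPA_PORT.items():
        # Hypothetical naming convention, purely for readability.
        zone_name = f"{host_name}_p{host_port}__{rpa_name}_p{rpa_port}"
        zones[zone_name] = (host_ports[host_port], rpa_ports[rpa_port])
    return zones


# Placeholder labels stand in for real WWNs here.
zones = build_zones(
    "esx01", {0: "host-wwn-0", 1: "host-wwn-1"},
    "rpa1", {0: "rpa-wwn-0", 1: "rpa-wwn-1", 2: "rpa-wwn-2", 3: "rpa-wwn-3"},
)
```

Each resulting zone has exactly two members, which keeps the one initiator, one target rule easy to audit.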

Well, I’m dizzy, how about you guys? As you can see, a lot goes into a healthy implementation of these two products, so attention to detail is of the utmost importance. Lucky for you and me, those fine individuals at EMC and VMware have littered the channels with documentation. So get to reading, these products aren’t going to install themselves!!!

Written by Joe Kelly

July 30th, 2008 at 1:03 am

Posted in vmware

TiVo for the DataCenter-EMC’s RecoverPoint

with 3 comments

There has been a lot of buzz in the air about CDP or Continuous Data Protection and what it brings to the table. So I thought it would be beneficial to spell out the capabilities of this technology and expand on how the title of this post (thanks BS for your Enlightenment) is spot on for describing the functionality of these appliances, specifically in the way of data rewind or PIT rollback.

To begin, what exactly is CDP?

In short, it is continuous backup: the ability to split every write headed for a protected volume (LUN), instantaneously. This process is performed out-of-band, that is, out of the data path, to an appliance (in this case a RecoverPoint Appliance). One item to note here: the concept of out-of-band is important because introducing any device into the data path has a tendency to impede I/O. Impeding I/O affects application performance, poor application performance leads to loss of productivity, etc., etc. See where I am going with this?

Let’s continue. This “write splitter” can reside on the host (Windows only), on the array (CX and CX3 series arrays only, iSCSI and FC, with support for AIX, HP-UX, Linux, Solaris, VMware and Windows) or within the fabric (i.e. Cisco’s SANTap or Brocade’s Fabric Application Platform; RP version only). A couple of things here: there are actually two versions of RP, as noted below. Both offer local and remote replication, but RP seems to provide the added benefit of fabric-based write splits and multi-array functionality, as well as the following: bandwidth reduction, heterogeneous multipathing support, and up to 8 appliances per site, to name a few.

RecoverPoint: Geared toward multi-storage vendor environments. Comes with the CDP module for local synchronous replication within the array or to another same-site array, and the CRR (Continuous Remote Replication) module for remote asynchronous replication between multiple arrays. Both allow for data rewind (think TiVo) for point-in-time recovery.

RecoverPoint/SE (Single Edition? Array to Array implication): Geared specifically toward the CX and CX3 series arrays noted above. The CDP and CRR modules exist as well, but purely for CLARiiON to CLARiiON replication, with the added limitation of an 8TB cap. In addition, there is support for block-based storage on the NS20, 40 and 80 FC models, but no support for the fabric write splitter, among other things.

We all know a picture is worth a thousand words, so let’s look at a typical RecoverPoint (forget RP/SE for now) environment and an explanation of each component (images by way of EMC)…

[Image: typical RecoverPoint environment]

The RP Splitter Drivers

  • Out of Band write mirroring to the RP appliance
  • Function can exist on the Host, on the Array, or within the Fabric

imageThe RP Appliance-Thanks Kashya

  • Runs intelligent RP software
  • Handles Bi-directional replication
  • Adherence to write order, Consistency Groups
  • Maintains complete management and monitoring capabilities

The Importance of the Journal

  • Delta tracker for all protected volumes, stored in a compressed format for Point-In-Time rollback or rewind (did I mention TiVo?)
  • Bookmark Tracker for application aware recovery
  • Maintains reserved pool to track changes to PIT (Point in Time) copies that have been recovered (target side processing space)
  • SAN based LUN, easily expandable via concatenation or striped LUN expansions
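To make the rewind idea concrete, here is a toy Python model of a journal that records every split write in order and can rebuild the volume image at any earlier point or at a named bookmark. It is purely a conceptual sketch under my own assumptions: it says nothing about how RecoverPoint actually lays out, compresses, or replays its journal.

```python
class ToyJournal:
    """Conceptual model only: an ordered log of writes plus named bookmarks."""

    def __init__(self):
        self.entries = []    # ordered list of (block, data) writes
        self.bookmarks = {}  # bookmark name -> number of writes at that moment

    def record(self, block, data):
        """Log one split write and return its position in the journal."""
        self.entries.append((block, data))
        return len(self.entries)

    def bookmark(self, name):
        """Tag the current point in time, e.g. just before an app upgrade."""
        self.bookmarks[name] = len(self.entries)

    def image_at(self, point):
        """Rebuild the volume as of a bookmark name or a write count."""
        count = self.bookmarks[point] if isinstance(point, str) else point
        image = {}
        # Replay writes in order; later writes to a block win.
        for block, data in self.entries[:count]:
            image[block] = data
        return image


journal = ToyJournal()
journal.record(0, "a")
journal.record(1, "b")
journal.bookmark("before_patch")
journal.record(0, "c")  # a later write that we may want to rewind past
```

Asking for the image at "before_patch" gives back the volume as it looked before the third write, which is the TiVo-style rewind in miniature.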

Advanced Networking Capabilities

  • Pre-Flight data compression
  • Eliminates the need for pricey FC/IP converters

The “Spread the Love” Support Banner

  • Heterogeneous third party storage support, but please check the EMC Support Matrix
  • Storage agnostic, any to any replication

 

Here is another great image depicting RP/SE, Clariion to Clariion Remote Replication…

 

[Image: RP/SE, CLARiiON to CLARiiON remote replication]

 

What about licensing, you ask? Important question, so here is the answer. Be sure to talk to your local EMC rep about specifics.

  • RecoverPoint is licensed per replicated capacity
  • RecoverPoint/SE is licensed per array and ultimately that depends on what model of Clariion

Beyond this basic run through, which I hope was helpful, there are numerous integration points with this product to talk about. But there is one that I personally am very excited about: VMware Site Recovery Manager. So hold on to your hats, as the next post should be a doozy…;)

Written by Joe Kelly

July 22nd, 2008 at 1:36 pm

Posted in disaster recovery

Who knew?

without comments

As I was tooling around Powerlink today I noticed some analysis tools under Support > Product and Diagnostic Tools > Environment Analysis Tools. Specifically, they were oriented toward the following: Celerra Health Check, Host Environment Analysis Tools, and Switch Analysis Tools.

What’s nice about these tools is that you can upload pertinent support script outputs and generate a complete analysis of that device or host. On top of that, it alerts you to potential problems and provides recommendations where needed. Here are the specifics around each. Note: all require a logon to Powerlink.

  • Celerra Health Check
    • Method used to generate output file -
      • collect_support_materials, talked about here
    • Specifics-
      • Displays information about the Celerra Linux operating system, NAS version and model, control station configuration and file systems, data mover configuration, filesystems, interfaces and utilization, and backend checks.
      • Provides notices and warnings of potential problem areas where appropriate
      • Provides recommendations, where appropriate
  • Host Environment Analysis Tool (HEAT)
    • Method used to generate output file -
      • EMC Grab and EMCReports depending on the OS
    • Specifics
      • Processes the output of the EMCReports script for Windows 2000 and Windows 2003 hosts and performs the following functions:
        • Displays information about the host, memory details, IRQ levels, Windows services, network adapters, disk drives, file system alignment, SCSI, drivers, host bus adapters, installed software and hot-fixes, EMC PowerPath and Solutions Enabler, Symmetrix, CLARiiON, Celerra software, device mapping and application and event log checking.
        • Checks versions of system drivers, HBA drivers and firmware, EMC PowerPath and Solutions Enabler software, volume management software, EMC Disk Array software against the latest versions that are EMC Supported.
        • Provides notices and warnings of potential problem areas where appropriate.
        • Provides recommendations where appropriate.
      • Processes the output of the EMCGrab scripts for AIX, HP-UX, Linux, Tru-64/OSF1, and Solaris hosts and performs the following functions:
        • Displays information about the host, OS, OS patches, host bus adapters, multipathing, drivers, file systems, installed volume management software, EMC PowerPath and Solutions Enabler software, Symmetrix, CLARiiON, Celerra software, device mapping and application and event log checking.
        • Checks versions of system drivers, HBA drivers and firmware, EMC PowerPath and Solutions Enabler software, volume management software, EMC Disk Array software against the latest versions that are EMC Supported.
        • Provides notices and warnings of potential problem areas where appropriate.
        • Provides recommendations where appropriate.
  • Switch Analysis Tool (SWAT)
    • Method used to generate output file -
      • Depending on the Switch Vendor
          • a supportshow from a Brocade switch, or
          • a show tech-support details from a Cisco switch, or
          • a Data collection from a McData switch
    • Specifics
      • Displays information about the switch properties, effective configuration, name server entries, port statistics, fabric OS file system, zone checks, environment, memory, licensing, VSAN and some logging checks.
      • Provides notices and warnings of potential problem areas where appropriate.
      • Provides recommendations where appropriate.

Since this whole process is web based, you simply browse to the output file and select upload. The output is analyzed, and an HTML report is generated and emailed to you. How is that for service!

Written by Joe Kelly

July 19th, 2008 at 11:02 pm

Posted in knowledge

Unified I/O-InfiniBandana

without comments

As a follow-up to the post Unified I/O-FCoEasy does it…Successor to FC?, I thought it would be fair to drop some info on InfiniBand and its place in the storage ecosystem.

IB from a raw throughput perspective is no doubt head of the class. Low latency, high bandwidth, and the ability to collapse both FC and Ethernet onto a single transport are all contributing factors to IB’s push toward becoming the de facto standard for high speed I/O interconnects. Although most notably a staple of HPC (High Performance Computing) environments, it is quickly targeting its efforts toward virtualization. And why shouldn’t it? The inevitable build up or build out of pure physical cabling to support even the simplest designs makes this consolidated technology a perfect fit. VMware has already announced support for Mellanox HCAs in its ESX 3.5 release, signaling that the tides are now split between IB and 10GigE as viable unified options. But wait a minute, let’s look under the covers, shall we?

[Image: comparison of InfiniBand and Ethernet]

Comparison of both competing technologies above (notice FCoE and IBoE are not listed, but both are defining technologies in development).

So where do we stand with IB and Ethernet as transports? The good, the bad, and the ugly are noted here: http://www.nowmicro.com/NM_PDF/wp-cisco-interconnects.pdf

Advantages of IB

  • Reduced CPU and memory overheads by utilizing specialized HCA hardware
  • Remote Direct Memory Access (RDMA) allows for the placement of information from one computer system into the memory of another computer system, reducing latency and minimizing the processing overhead demands at the host system
  • Low end-to-end system latency (from 5 to 10 microseconds, depending on the application)
  • Multiple vendors supporting and shipping products
  • Consolidated I/O with network, management, and storage all on one interface
  • Implements various protocols, such as general-purpose remote-procedure call (RPC), direct-access transports (as part of RDMA), sockets direct protocol for TCP/IP, and more

Disadvantages of IB

  • There is only one provider of an InfiniBand core chipset (Mellanox), constituting a risk of supply disruption. (And this is huge in my book: no free market in effect, not good.)
  • This lack of diversity also generally reduces innovation and over time drives prices higher.
  • There is limited expertise in the field with the protocol.
  • It is a new networking protocol and relies heavily on gateway functionality (storage area network and IP) to get the full benefits of a unified fabric.
  • Limited diagnostic and troubleshooting tools are available in shipping products.
  • It has had limited use in enterprise environments, which is the true test for sustaining long-term viability of the protocol. (Here we are talking about significant up-front cost: you are essentially upfitting your entire environment with gear to support this technology.)

Ok, not nearly as sexy as first discussed…how about Ethernet..

Advantages of Ethernet

  • Long networking history
  • Knowledge base of the protocol and tools available. Come on, who doesn’t know how to configure a NIC? What about an HCA? That’s what I thought.
  • Based on standards and available from multiple vendors

Downside of Ethernet

  • RDMA offload network interface cards are new and relatively expensive
  • 10 Gigabit Ethernet is still relatively expensive….and dropping as we speak
  • End-to-end MPI latency may be longer in Ethernet networks (testing is still in progress)
  • Other interconnects are more competitive in price performance at this time….give it time

Are we looking at both hardware and soft costs with IB when comparing price/performance per port to Ethernet? I am not so sure…this is definitely a topic worth revisiting…thanks to JN for kicking up this topic.

Written by Joe Kelly

July 18th, 2008 at 3:38 am

Posted in storage


Unified I/O-FCoEasy does it…Successor to FC?

with one comment

There has been a lot of news over the last several months about FCoE and its applicability in the data center. With the advent of 10GbE and the premise of data center unification I can understand why this technology has gotten legs. Joining the ranks of iSCSI, iFCP, and FCIP, FCoE offers the consumer yet another means to shove block based data traffic over an Ethernet network.  But unlike the former protocols,  FCoE is purely touted as a “Data Center” protocol, non-routable and connection oriented much like FC.

The practicality of this protocol comes into play with the use of a CNA or Converged Network Adapter, ultimately replacing the need for separate Ethernet NICs and FC HBAs. Fewer adapters equate to less cabling, less power and less complexity. Paradisiacal, huh???

Well hold on…undeniably the landscape of Ethernet will need to change to support this convergence of data and storage traffic. To further this unification, Converged Enhanced Ethernet  (or DCE, Data Center Ethernet) is under development to allow for FC to pipe over Ethernet networks.

What follows are the modifications that need to be made to Ethernet to support Fibre Channel.

  • Encapsulation of native Fibre Channel frames within an Ethernet frame.
  • Extensions to the Ethernet protocol itself to enable a lossless Ethernet fabric
  • Replacing the Fibre Channel link with MAC addresses in a lossless Ethernet
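The first item, encapsulation, can be pictured with a short Python sketch that wraps an opaque FC frame behind an Ethernet header carrying the FCoE EtherType (0x8906). This is deliberately simplified and rests on my own assumptions: it omits the FCoE header and trailer fields (version, start-of-frame and end-of-frame delimiters), padding, and the Ethernet FCS, and it does nothing about the lossless extensions. The MAC addresses and payload are made up.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    return raw


def encapsulate(dst_mac, src_mac, fc_frame):
    """Prefix an opaque FC frame with a 14-byte Ethernet header (simplified)."""
    header = (mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
              + struct.pack("!H", FCOE_ETHERTYPE))
    return header + fc_frame


def decapsulate(frame):
    """Check the EtherType and hand back the FC frame untouched."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    return frame[14:]


# Made-up addresses and payload, for illustration only.
frame = encapsulate("0e:fc:00:01:02:03", "00:11:22:33:44:55", b"\x01\x02")
```

The point of the sketch is that the FC frame rides inside Ethernet unchanged, which is what lets FCoE sit alongside existing FC environments rather than translate the protocol.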

Fibre Channel, it seems, is architecturally damned. With Ethernet extending into the 10G range and beyond (40G and 100G under development), how can FC ever compete? By switching transports to Ethernet, that’s how. Protecting FC investments is the key to FCoE’s adoption, and all vendors with business interests in FC know that. Cisco, of particular interest, is leading this march. With its buy-in to Nuova Systems in 2006 and its most recent buy-out in early 2008, it is well positioned to capitalize on future FC ecosystem upheavals. The Nexus 5000 series of DC class switches, released in April ‘08, is the first real step toward Unified I/O, and you better believe it will support FCoE, with a few license buy-ins of course.

With that said, the adoption of FCoE will be slow going despite its backing. Companies in general will not be quick to uproot their existing native FC environments for the “new kid” technology on the block until it is tested and proven from soup to nuts.

What’s exciting about the introduction of FCoE to the market is that it will force enhancements to Ethernet that will ultimately benefit iSCSI and network storage protocols in general, further driving the server consolidation and virtualization cause. Its proliferation will be built on the back of DCE, and its adoption will be in the hands of you and me. So educate yourself, it’s coming…

For more info on FCoE and the developments that are happening in this space, please visit http://www.fcoe.com/

Also be sure to check out the compelling video describing the hurdles of explosive data growth at CNBC and how they will tackle their challenges with FCoE, http://video.computerworld.com/services/link/bcpid1351827287/bctid1410385428.

 

****Updated****

Thanks to Omar for commenting and correcting my inaccuracies, as FCoE does not require an upheaval but is complementary and non-disruptive to existing FC environments. He also noted, the Nexus 5000 is currently being certified in EMC E-Labs and should be customer implemented by year’s end. I may add that the current MDS product line does support FCoE in their third generation modules, further protecting your current investment.

Comments are always welcome.

Written by Joe Kelly

July 14th, 2008 at 1:45 am

Posted in storage
