Archive for November, 2008
The King coming to Durham
For those of you who have the opportunity and happen to be in the area, I would suggest going to see B.B. King in Durham on Sunday night. He truly is one of the most influential musicians and storytellers of his time. For those who have never gone to a show, it is a mix of great blues, deep Southern roots, uplifting stories and general crowd interaction and participation with a legend. My goal for the show? And yes, I am going…is to get him to sign my Strat. Maybe he will look beyond the fact that it’s a Fender, I wouldn’t want to make Lucille jealous : )
Recoverpoint: Fabric Manager Rue <Please may I have another>
Like everything else out there, knowing where to turn when you need answers is critical to your success as an IT professional. Some rely on technical drivel from the vendors themselves, useful in many respects, but sometimes you have to let the air out of the tires to get to the core content. Still others rely on the backs of peer professionals who have trenched it out, placed the proverbial toothpicks between their eyelids and burned the midnight oil to uncover tidbits of knowledge that result in a moment of Eureka!!! Well, consider this a lob, my friends….
Enter Cisco Bug IDs CSCsr49954 and CSCsu02826. If these were men I would punch them, as they caused me hours of delay over the past few days. So what happened? Well, on the Gen 3 RPAs <remember I talked a little bit about them here> the QLogic 2462s have the ability to function as both initiators and targets, dual mode capabilities if you will. This is where using Fabric Manager bites you; the symptoms are listed below (Primus emc198785)…In short, don’t believe your eyes…
Yeah, I know it goes against all that your mother told you was true (sorry, didn’t mean to bring mothers into this)…but facts are facts, and the fact is you should be zoning your RPAs from the command line. It greatly simplifies this intermediary step, which is a one-time core requirement. So here are your options other than the CLI:
- Upgrade to Fabric Manager 4.1 (1b) or higher <first hand, this is not 100%: it fixes the RPA switch entries, but FLOGI for your devices still shows as NPV and in some instances iSCSI for the port info, what??> Toss it, get over your laziness (myself included) and spend the extra 10 mins in the CLI.
- Upgrade to RecoverPoint 3.1. Wait a minute, if you are using SRM (Site Recovery Manager), this isn’t a supported option. Check your HCLs!!!
And here are the symptoms for your pleasure…
Symptom: Under the switch tab in FM, you see entries that show ‘No IP Address’ under the status column and ‘Kashya’ under the vendor column. The Topology map shows a red line through the RPA device. This, by far, should be your biggest tip-off to this issue.
Symptom: In the output of show fcns database detail, the fc4-types:fc4_features line shows all the features, including npv and in some cases virtual:
VSAN:1 FCID:0x3b0017
------------------------
port-wwn (vendor) :50:01:24:82:00:13:9e:61
[RP1_N2_I2_VSAN1]
node-wwn :50:01:24:82:00:13:9e:60
class :3
node-ip-addr :0.0.0.0
ipa :ff ff ff ff ff ff ff ff
fc4-types:fc4_features :scsi-fcp:both hippi-fp 66 70 73 74 78 bbl-ctrl 86 fc-vi 89 94 fc-av 99 102 131 132 133 134 135 136 138 141 145 146 147 149 151 152 153 154 155 163 164 165 166 167 168 170 173 177 178 179 181 183 184 185 186 187 195 196 197 198 199 200 202 205 209 210 211 213 215 216 217 218 219 227 sdv npv 230 232 233 234 236 237 239 240 241 242 244 252
symbolic-port-name :
symbolic-node-name :
port-type :N
port-ip-addr :0.0.0.0
fabric-port-wwn :20:02:00:0d:ec:69:e1:80
hard-addr :0x000000
permanent-port-wwn (vendor) :50:01:24:82:00:13:9e:61
Symptom: fc4-types:fc4_features :scsi-fcp:both hippi-fp 66 70 73 74 78 bbl-ctrl 86 fc-vi 89 94 fc-av 99 102 131 132 133 134 135 136 138 141 145 146 147 149 151 152 153 154 155 163 164 165 166 167 168 170 173 177 178 179 181 183 184 185 186 187 192 197 198 201 202 203 204 207 208 209 212 213 216 virtual
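If you would rather not eyeball that wall of numbers, here is a minimal Python sketch that scans saved output for the two tell-tale tokens. It assumes the output looks like the sample above; the function name and the idea of flagging only npv and virtual are mine, not something from Cisco or EMC.

```python
import re

# Tokens that, per the symptoms above, hint at the Fabric Manager bug.
SUSPECT_FEATURES = {"npv", "virtual"}


def suspect_fcns_entries(output: str) -> dict[str, list[str]]:
    """Map each port-wwn to the suspect fc4 features listed for it."""
    findings: dict[str, list[str]] = {}
    current_pwwn = None
    for raw in output.splitlines():
        line = raw.strip()
        # re.match anchors at the start, so "permanent-port-wwn" is skipped.
        m = re.match(r"port-wwn \(vendor\)\s*:(\S+)", line)
        if m:
            current_pwwn = m.group(1)
            continue
        if current_pwwn and line.startswith("fc4-types:fc4_features"):
            # Drop the two labels; what remains is the feature list.
            features = line.split(":", 2)[2].split()
            hits = sorted(SUSPECT_FEATURES & set(features))
            if hits:
                findings[current_pwwn] = hits
    return findings
```

Paste the switch output into a string (or read it from a file) and pass it in; an empty dict means neither token showed up.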
Thanks for listening. It’s been a long week…can’t wait to drown myself in turkey : )
Sunshine with INQ…
You know, this may be common knowledge, and there may be a thousand other utilities that serve this function, but I thought I would further this utility’s cause, especially in the virtual world. INQ is the tool I speak of and it brings a whole lot of goodness. OK, it’s not that great, but it does allow you, from a host perspective, to map the LUN ID of a specific volume. Useful? I think so.
Here is what you need, the binaries…located here, ftp://ftp.emc.com/pub/symm3000/inquiry/latest. Run the utility from the command line on any OS for which a build is available. Below is what you will see…
If multiple arrays are attached to the host then verify the correct vendor. Here particularly we are focused on the DGC VEND which as you know is an EMC CLARiiON array. What we want to focus on is the first 2 digits which represent the LUN ID in Hex. Simply convert that number to decimal and you have your LUN ID.
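The conversion itself is a one-liner. Here is a small Python sketch, assuming you have copied the relevant field from the INQ output into a string; the function name and the sample value are made up for illustration.

```python
def lun_id_from_inq_field(field: str) -> int:
    """Read the first two hex digits of the field as a decimal LUN ID."""
    prefix = field.strip()[:2]
    return int(prefix, 16)


# Hypothetical field value, not taken from real INQ output.
print(lun_id_from_inq_field("1F000123"))  # prints 31
```

Two hex digits top out at FF, so this approach cannot express anything above 255.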
Now here’s the downside: I haven’t figured out how to recognize LUN IDs beyond 255 (FF), as the last two LUNs represent LUN IDs 257 and 258. See what I mean? They are not so recognizable from this utility anymore, but I assume I am missing something, as most Symm LUN IDs run well past 255. Maybe this is just a limitation of arrays other than Symms, anybody know?
EMC RecoverPoint-RPA Volumes (Bit II)
Wow…there is so much to RecoverPoint. The overwhelming content of it has sparked a fire in me and gotten me extremely jazzed about learning this product and CDP in general. So let’s continue…
The three volumes I am about to mention within RecoverPoint are everything. Not only are they required as part of the installation, but they set the stage for how RP will perform and function. Remember, it’s all about planning with RP: do it right up front, size it properly up front, and you will experience quite a remarkable product.
So the 3 classifications of volumes that exist in a RecoverPoint implementation are user volumes, journals and the repository. What follows is a breakdown of each:
- Journal-This is nothing more than a container for snapshot images for a particular replicated user LUN. But let’s not stop there, as it most certainly serves other purposes. What follows are the percentages of the journal allocated to each function.
- 75% (variable) is dedicated to snapshot images and will hold as many as its capacity allows. Assuming the images have been distributed to the remote copy in the case of CRR and CLR, it follows a FIFO process, or First In First Out. As new images come in the old ones are removed to house the new ones.
- 20% (variable) is used for the sole purpose of logged image access (physical and virtual). Image access is as it implies: accessing a PIT in order to read or write to that replica volume, with all changes logged to this area of the journal. This area can be resized using space from the snapshot image area (the 75%), but doing so will require an outage and loss of PITs within the volume. Keep in mind that image access is temporary and prolonged access could cause replication to cease.
- 5% (fixed) is system partitioned space for RP. It holds the virtual pointers needed to bring physical and virtual image stitched access to fruition.
The type of replication you are using, whether it’s CDP, CLR, or CRR, determines how many journal volumes are needed.
- Repository-This volume is key to housing the configuration for all clustered RPAs as well as consistency group, replication set, policy settings and group sets, among other things. The idea here is that by maintaining all config info on the array, you seamlessly allow all replication activities on a failing RPA to fail over to another RPA(s). Only one volume per cluster (both the local and the remote side) is needed, and both RPAs must be able to see it. The minimum size for the repository is 4G, however it SHOULD be 124G, that is what is realistic. Here’s what I mean..
- The first 4G is used for the aforementioned cluster configuration information. Outside of that, each consistency group created is earmarked 2G of the 120G left. Simple math tells us that the limit for CGs is 60 per cluster or 30 per RPA. This 2G is used within the replication process during points of WAN flap or drops, a temp caching location of sorts.
- Furthermore, replication marking data is stored here which creates the grounds for more efficient resynchronization of replicated volumes.
Don’t skimp on this volume. It should be sized up front (remember 124G), as resizing on the fly is not practical. A resize will require disabling all CGs, clearing all journals, a full sweep of your environment, and a new activation license. Not only that, be prudent and give this volume the juice: it should live on fast spinning disk, as its role is quite demanding. Takeaway…give it 124G up front and save yourself a lot of pain on the back end.
- User-This refers to the production source, the local copy (CDP) and the remote copy (CRR). Every production volume has a copy volume, which is defined in what is called a replication set. Replication sets define a mapping between the prod volume and the local or remote copy. Every write on the production volume is replicated to the remote or local journal and then copied to the remote or local copy; consistency is the name of the game here. The replica volumes should be the same size as the production source: bigger and you are wasting space, smaller and errors will occur.
Alright, so enough of that, what’s next? Design considerations or maybe clarification on some of the terminology I used…it’s getting interesting, agreed?
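The sizing arithmetic in the bullets above can be sketched in a few lines of Python. This is only my reading of the numbers in this post (4G of cluster config plus 2G per CG for the repository, and a 75/20/5 journal split where the image access share is carved out of the snapshot share); the function names are mine and nothing here comes from an EMC tool.

```python
CLUSTER_CONFIG_GB = 4   # first 4G of the repository: cluster configuration
PER_CG_GB = 2           # 2G earmarked per consistency group
SYSTEM_PCT = 5.0        # fixed system share of every journal


def repository_size_gb(consistency_groups: int) -> int:
    """Repository space implied by a given number of consistency groups."""
    return CLUSTER_CONFIG_GB + PER_CG_GB * consistency_groups


def max_consistency_groups(repository_gb: int) -> int:
    """How many 2G CG slots fit once the 4G of config is set aside."""
    return max(0, (repository_gb - CLUSTER_CONFIG_GB) // PER_CG_GB)


def journal_split_gb(journal_gb: float, image_access_pct: float = 20.0) -> dict[str, float]:
    """Split a journal into snapshot, image access and system areas."""
    snapshot_pct = 100.0 - SYSTEM_PCT - image_access_pct
    return {
        "snapshots": journal_gb * snapshot_pct / 100,
        "image_access": journal_gb * image_access_pct / 100,
        "system": journal_gb * SYSTEM_PCT / 100,
    }


print(repository_size_gb(60))        # prints 124
print(max_consistency_groups(124))   # prints 60
print(journal_split_gb(100))
```

With the default split, a 100G journal leaves 75G for snapshots, 20G for image access and 5G for the system area.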
EMC RecoverPoint-Hardware Awareness (Bit I)
Here is a quick rundown of what makes up an RPA, or RecoverPoint appliance. Under the covers it’s nothing more than a Dell 1950 (BTW, EOL’d Q2 2008). Corresponding to the 2.4 and the 3.0 releases, there exist two generations of RPAs, easily recognizable by the HBAs in use: Gen1 has 2Gb HBAs, Gen3 has 4Gb HBAs. Here is the breakdown..
- Gen 1:
- Phase 2 1950
- PCI-X bus architecture
- 2 2Gb Qlogic 2342-Dual ported
- 2 sockets Dual Core, Intel Xeon Woodcrest
- Compatible with RP V2.4 or V3.0 (V2.4 will not work with Gen 3 hardware)
- Gen 3 (No I didn’t skip 2, huh?):
- Phase 3 1950
- PCI-E bus architecture
- 2 4Gb Qlogic 2462-Dual ported
- 2 sockets Quad Core, Intel Xeon Harpertown
- Compatible with RP V3.0 and above only
Note–the sheer fact that these are commodity servers implies that with the RP software and a Linux 2.6 kernel in hand you could effectively build your own RP appliance for a lab environment, more on this in another post…
The HBAs on the Gen3 hardware are auto-sensing dual mode adapters, meaning they act as either an initiator or a target depending on what they are connected to. This, I imagine, would be a welcome change from the Gen1 HBAs, where only ports 0 and 2 were designated as targets and ports 1 and 3 as initiators.
In addition, there are two GigE ports per node: one for management and LAN replication and one for WAN replication traffic. Each RPA needs an IP address, plus one VIP for floating IP management.
The Linux kernel and RP software are installed locally on the RPAs. Each appliance has 2 73G drives configured as a mirror set. Furthermore, the local identity of the RPA (name, IP address, etc.) is also stored here. All consistency groups, replication sets, bit mapping, etc. are stored within the repository (SAN based, expansion in future posts).
Something to keep in mind: EMC sells RP in a minimum 2 node configuration. This doesn’t imply that a single node will not work, only that HA is no longer possible. If CRR or CLR is in play, your destination site hardware must mirror your source site hardware. If you have a 2 node cluster at Site A, you must have a 2 node cluster at Site B; collectively this is known as a System.
How about that for a quickie? Next post…an explanation of the volume type functions within RecoverPoint and perhaps a tad bit more..stay with me.
