Celerra Deduplication – Addressing Points of Conversation
To highlight a bold new feature brought to the Celerra line, the following are relevant points that may be questioned during an average “what does what” 3D conversation. And of course all of this is spelled out clearly here, so don’t take my word for it. Onward…
You know it's funny, never say never in this industry. Here I shunned the use of 3D on primary storage, yet I find myself completely drawn in by such a use case. NetApp has obviously long been the pacesetter in defining this trend, and for that I commend them. But what makes their solution any better than the next? Always on vs. scheduled, compression vs. no compression, fixed vs. variable vs. file-level, file system size limitations, operability with existing feature sets, etc. are all approaches and design considerations that must be addressed and, more importantly, must mesh with your environment. That being said, any comments on why one approach works better than another (based on your experiences or testing) would be greatly appreciated.
- DART Code needed - 5.6.43 or later. Free upgrade assuming your support agreements are up to date.
- CD (Celerra Dedup) is an “always on” function as opposed to scheduled
- Celerra Replicator V1 in play for a file system? Then no dedup. Dedup on a file system? Then no Replicator V1. Of course V2 lifts these restrictions, but it implies the destination Celerra is also running the above DART code or later.
- CD will not process files less than 24KB.
- A filesystem enabled for dedup must have 1MB worth of free space
- CD minimizes CIFS timeouts by not modifying deduped files that are over 200MB. This behavior, however, is tied to CIFS and not NFS.
- It goes without saying that dedup, like any dedup solution, has the potential to negatively impact the system as a whole. To mitigate this risk, the Celerra throttles the process once the processor (X-Blade) hits 75% usage. Additionally, dedup by nature focuses on files that have not been accessed recently, typically selected by the following criteria:
  - Last accessed/modified time
  - Min/max size
  - File type
  - Directory name
- Max volume size with 3D is 16TB from the NX4 all the way to the NS-G8
- Configured on a per-filesystem basis, entirely GUI based, with one-click enablement
- iSCSI? No Dedup. CIFS and NFS only. No block.
- Deduplication in the Celerra sense equates to file-level single instance storage plus compression (greatly simplified). This ultimately extends the reach of available space savings beyond what 3D can provide alone, presumably up to 40%. Both of these methods were chosen to decrease storage consumption while minimizing resource overhead.
- Read operations do not require the dedup’d file to be decompressed on disk, only in memory
- SHA-1 hashing is how identical files are detected as part of the single-instancing process
- NDMP Volume Based Backups (VBB) do NOT cause an inflation or reduplication of dedup’d data, as this operation occurs at the block level. However, non-VBB NDMP backups that occur over the network are reduplicated.
- 3D immediately releases blocks within the PFS (production file system). But in doing so, these released blocks get copied to the SavVol. The Celerra, by default, will terminate any 3D activity to prevent automatic SavVol extension.
- Full Celerra functionality support for the following: Celerra Manager, SnapSure, Replicator, NDMP, FileMover, and FLR
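
To make the file-level single instancing idea a bit more concrete, here is a rough Python sketch of how identical files can be spotted with SHA-1 while skipping anything under the 24KB floor mentioned above. To be clear, this is not how DART implements it internally; the function names (`sha1_of_file`, `find_duplicates`) and the 1MB read chunk are my own assumptions for illustration, and it only reports duplicate candidates rather than releasing any blocks.

```python
import hashlib
import os
from collections import defaultdict

# Files smaller than 24KB are not processed, per the list above.
MIN_SIZE = 24 * 1024


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=MIN_SIZE):
    """Group files under root by SHA-1 and keep only groups with more than one file.

    Illustration only: a real single-instancing engine would go on to store
    one copy of the data and point the other files at it.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything below the size floor.
            if os.path.islink(path) or os.path.getsize(path) < min_size:
                continue
            by_hash[sha1_of_file(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Pointing `find_duplicates` at a test share gives a quick feel for how much file-level duplication is sitting there before compression even enters the picture.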


Sunday, April 12, 2009 at 6:05PM

