Scott over at EMC recently posted his thoughts about deduplication ratios and how they vary widely. I agree with his assessment that compression ratios, change rates and retention are key ingredients in deduplication ratios. However, he makes a global statement, “If you don’t know those three things, you simply cannot state a deduplication ratio with any level of honesty….It is impossible”, and uses this point to suggest that SEPATON’s Exchange guarantee program is “ridiculous”. Obviously the blogger, being an EMC employee, brings his own perspectives as do I, a SEPATON employee. Let’s dig into this a bit more.
As the original author mentioned, the key metrics for deduplication include compression, change rate and retention. Clearly these can vary by data types; however, certain data types provide more consistent deduplication results. As you can imagine, these are applications that are backed up fully every night, have fixed data structures and relatively low data change rates. Some examples include Exchange, Oracle, VMware and others.