There is an interesting blog discussion between Dipash Patel of CommVault and W. Curtis Preston of Backup Central and TruthinIT about how much benefit higher deduplication ratios actually deliver. They take different perspectives on the question; I will highlight their points and add one more to consider.
Patel argues that increasing deduplication ratios beyond 10:1 provides only a marginal benefit. He calculates that going from 10:1 to 20:1 raises capacity efficiency by only 5% (from 90% to 95% space savings). He adds that vendors who claim that doubling the deduplication ratio doubles cost savings are using “sleight of hand.” He makes an interesting point, but I disagree with his core claim that increasing deduplication ratios beyond 10:1 provides only marginal savings.
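Patel's 5% figure is presumably the difference in space savings, where savings at a ratio of r:1 is 1 - 1/r. The minimal sketch below (the function name `space_savings` is hypothetical) shows where the number comes from.

```python
def space_savings(ratio: float) -> float:
    """Fraction of logical capacity saved by deduplication at ratio:1 (hypothetical helper)."""
    return 1.0 - 1.0 / ratio

for r in (10, 20):
    print(f"{r}:1 saves {space_savings(r):.0%}")
```

Measured this way, doubling the ratio moves savings from 90% to 95%, which is why the gain looks small when expressed as a percentage of logical capacity.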
Preston responds that there is a real cost to purchasing, managing, powering, and cooling disk systems. Increasing the deduplication ratio from 10:1 to 20:1 cuts the required disk storage in half (e.g., 10 TB of data requires 1 TB of disk at 10:1 but only 0.5 TB at 20:1). He argues that this yields real savings in management, power, and cooling. I believe Preston makes a good point, but there is another element worth considering.
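Preston's framing looks at the physical disk that remains rather than the percentage saved. A minimal sketch, assuming physical capacity is simply logical capacity divided by the ratio (the helper `physical_tb` is hypothetical):

```python
def physical_tb(logical_tb: float, ratio: float) -> float:
    """Disk capacity (TB) needed to hold logical_tb at a dedup ratio of ratio:1 (hypothetical helper)."""
    return logical_tb / ratio

print(physical_tb(10, 10))  # 1.0
print(physical_tb(10, 20))  # 0.5
```

The same doubling that adds only five points of "savings" halves the disk that must be bought, powered, and cooled.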
Most end users purchase deduplication with the end goal of replicating their data for disaster recovery. The benefit of deduplication is not just about retaining data locally but also about reducing the bandwidth required for replication. Going from 10:1 to 20:1 can have a major impact on replication and disaster recovery, and in some cases can make the difference between meeting and missing SLAs. Using the same example, at 10:1 the 10 TB shrinks to 1 TB, which takes 49.4 hours to replicate over a T-3; at 20:1 the same transfer takes 24.7 hours. (The model assumes that the T-3 delivers 45 Mb/sec and can be fully utilized for replication.) In this scenario, if the customer’s requirement is to get data offsite within 24 hours, they barely miss it at 20:1 and miss it completely at 10:1. If the deduplication ratio increased by one point to 21:1, the customer could replicate the data in 23.5 hours and meet the window. In this case, data reduction ratios really matter; they are critical to meeting the customer’s SLAs.
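The replication figures can be reproduced with a short calculation. This is a sketch under stated assumptions: decimal terabytes (10^12 bytes), a T-3 delivering a flat 45 Mb/sec that is fully available for replication, and no protocol overhead. The function names `replication_hours` and `min_ratio_for_window` are hypothetical.

```python
T3_MBPS = 45.0  # assumed T-3 throughput from the post, fully available for replication

def replication_hours(logical_tb: float, ratio: float, link_mbps: float = T3_MBPS) -> float:
    """Hours to send the deduplicated copy of logical_tb over a link of link_mbps."""
    bits = logical_tb / ratio * 1e12 * 8       # decimal TB -> bits after deduplication
    return bits / (link_mbps * 1e6) / 3600     # seconds -> hours

def min_ratio_for_window(logical_tb: float, window_hours: float, link_mbps: float = T3_MBPS) -> float:
    """Smallest dedup ratio that lets replication finish within window_hours."""
    link_bits = link_mbps * 1e6 * window_hours * 3600  # bits the link can carry in the window
    return logical_tb * 1e12 * 8 / link_bits

for r in (10, 20, 21):
    print(f"{r}:1 -> {replication_hours(10, r):.1f} h")
print(f"ratio needed for a 24 h window: {min_ratio_for_window(10, 24):.2f}:1")
```

Under these assumptions the break-even ratio for a 24-hour window falls between 20:1 and 21:1, which matches the post's conclusion that one extra point of reduction decides whether the SLA is met.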
In summary, I believe Patel is wrong in his assessment of deduplication ratios. Increasing ratios can and will have a meaningful impact on customer environments, and suggesting that the benefits amount to only 5% savings is misleading.