I periodically peruse the blogosphere looking for interesting articles on storage, data protection and deduplication. As you can imagine, blog content varies from highly product centric (usually from vendors) to product agnostic (usually from analysts). I recently ran across a post over at the Data Domain blog, Dedupe Matters. This is a corporate blog where it appears that the content is carefully crafted by the PR team and is updated infrequently. Personally, I find canned blogs like this boring. That said, I wanted to respond to a post entitled “Keeping it Real” by Brian Biles, VP of Product Management. As usual, I will be quoting the original article.
A year or more later, Data Domain is scaling as promised, but the bolt-ons are struggling to meet expectations in robustness and economic impact.
When the author refers to Data Domain scalability, he is missing a major point. To this day, the company’s biggest and fastest single appliance scales to 380 MB/sec. Any large enterprise customer that evaluates the solution will laugh at the number of units it would have to purchase and manage. (Note that I exclude the DDX, which is essentially a screen-scraped GUI spanning multiple units; it is not a true appliance, since it still comprises 16 separate devices with separate performance metrics and separate deduplication domains.) I am sure the author is referring to the scalability required in the SMB/SME markets, which have been Data Domain’s historic strength.
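To make the scaling point concrete, here is a minimal sizing sketch. The 380 MB/sec per-appliance figure is from the discussion above; the dataset size and backup window in the example are illustrative assumptions of mine, not vendor specifications.

```python
import math

def appliances_needed(dataset_tb, window_hours, per_appliance_mb_s=380):
    """Minimum appliance count to ingest a backup within a fixed window.

    Assumes ideal, even load distribution across units (generous to the
    vendor, since each unit is a separate deduplication domain).
    """
    dataset_mb = dataset_tb * 1024 * 1024              # TB -> MB
    required_mb_s = dataset_mb / (window_hours * 3600)  # aggregate rate needed
    return math.ceil(required_mb_s / per_appliance_mb_s)

# e.g. a hypothetical 100 TB nightly backup in an 8-hour window:
print(appliances_needed(100, 8))  # -> 10 separate appliances
```

Ten independent appliances means ten deduplication domains to balance and manage, which is exactly the management burden an enterprise buyer would object to.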
Finally, Data Domain’s strategy for solving the performance bottleneck described above is a clustered solution of multiple boxes, which brings us to the ultimate irony: Data Domain needs a bolt-on solution to meet enterprise requirements! Their appliances are simply not designed for enterprise performance or capacity, so they must resort to bolt-ons to overcome their own limitations.
A post-process system will fill up much faster than expected and/or get controller-bound and slow if it gets too far behind… Data Domain systems are easy: with inline dedupe, the only throughput is dedupe throughput.
In the first sentence, he is making generic assumptions about the speed of post-process deduplication. This gross generalization certainly does not apply to SEPATON’s DeltaStor software, and I doubt it applies to most other post-process solutions either. Perhaps he is really referring to an individual vendor’s solution (Quantum?). If so, why not just say so instead of making a bogus generalization?
The second point is classic Data Domain misdirection. They love to point out that with inline dedupe you only need to worry about a single throughput measure. I respectfully disagree. What really matters is protecting the data. If it takes 21 hours to back up and dedupe with a Data Domain solution, versus 2 hours to back up to an S2100-ES2 and 5 more to dedupe, which approach is safer for your data? For an SMB with limited requirements, inline is simple; in an enterprise environment with large backups, however, inline processing offers little benefit. Data Domain promotes inline as inherently better, but what really matters is data protection and restoration: you must get your data protected as rapidly as possible so that a secure copy is available for immediate restore.
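The time-to-protection argument above can be sketched in a few lines. The 21-hour inline figure and the 2-hour backup / 5-hour dedupe post-process figures are the illustrative numbers from the paragraph above, not benchmarks.

```python
def inline_protected_at(total_hours):
    # With inline dedupe, the backup is not complete (and therefore not
    # restorable) until deduplication of the whole stream has finished.
    return total_hours

def post_process_protected_at(backup_hours, dedupe_hours):
    # With post-process dedupe, a restorable copy exists as soon as the
    # backup lands on disk; dedupe runs afterward, off the critical path.
    # (Dedupe completes at backup_hours + dedupe_hours.)
    return backup_hours

print(inline_protected_at(21))          # restorable copy after 21 hours
print(post_process_protected_at(2, 5))  # restorable copy after 2 hours
```

The post-process dedupe pass does finish later in wall-clock terms than the backup itself, but the window of exposure, during which no protected copy exists, is the backup time alone.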
As expected, Data Domain’s post is highly centered on their products and their view of the world. I have highlighted some of its misleading claims here. As always, customers must do their own research to ensure they are choosing the right solution for their environment.