DeltaStor Deduplication, cont.

Scott from EMC, author of the backup blog, responded to my previous post on DeltaStor. First, thank you for welcoming me to the world of blogdom. This blog is brand new, and it is always interesting to engage in educated debate.

I do not want this to go down a “mine is better than yours” route. That just becomes annoying and can lead to a fight that benefits no one. I am particularly concerned since Scott, judging by his picture on EMC’s site, looks much tougher than me! 🙂

The discussion really came down to a few points. For the sake of simplicity I will quote him directly.

So, putting hyperbole aside, the support situation (and just as importantly, the mandate to test every one of those configurations) is a pretty heavy burden.

DeltaStor takes a different approach to deduplication than hash-based solutions like EMC/Quantum and Data Domain. It requires SEPATON to do some additional testing for different applications and modules. The real question under debate is how much additional work. In his first post, Scott characterized this as being entirely unmanageable. (My words, not his.) I continue to disagree with this assessment. Like most things, the Pareto Principle (otherwise known as the 80-20 rule) applies here. Will we support every possible combination of every application? Maybe not. Will we support the applications and environments that our customers and prospects use? Absolutely.

I think that certain vendors have a vested interest in their approach to deduplication succeeding, and alternative approaches failing. Would those vendors deliberately change tape format to ensure that? Well, who knows?

This comment is awesome. I can just imagine these executives in a dark room strategizing how they are going to ruin SEPATON and other data deduplication vendors by changing their tape format: “We will get them, Muhhhaaaaaahhhhhh”. The reality is that tape formats certainly could change, and that would impact DeltaStor, but it could also impact EMC/Quantum and Data Domain. (Did you know that Data Domain makes you choose the backup application when configuring the system? Why would they need that if they were truly agnostic? Is it the same with EMC/Quantum?) We are beta testers for most of the enterprise applications out there, so even if they did change the format, we would have plenty of advance notice. Additionally, major changes like this don’t typically happen mid-cycle but are usually reserved for .0 releases of backup software. How many customers do you know who jump on the first .0 release of any new application?

At the end of the day, I think you can fairly choose between a deduplication appliance that supports data from any source and solutions that only support data from some sources, some of the time. Everything else being equal, I think we would all choose the former.

What really matters is the benefit to the customer. Each deduplication algorithm is different and will often provide different ratios on the same data. Saying “Everything else being equal” does not make sense. All things are never equal: every environment and customer situation is different, and each deduplication algorithm will handle the data differently. End users need to look at each solution in their own environment, and only then can they decide what works best.

More importantly, I don’t think it [deduplication ratios beyond 20:1] matters?… So how much storage does this [deduplication ratio of 50:1 vs 25:1] save us? About 2 TB per 100 TB of data backed up. That’s right, 2%.

I checked your math on this one and agree that the difference is about 2%. However, data deduplication is typically used to retain more data online. If we take your 100 TB model and assume retention of 30 days, that equates to 3 PB of undeduplicated storage. Following your math, the savings would be 60 TB (2% of 3 PB). Is a 60 TB reduction in storage meaningful? I would argue that it is, although Scott might argue otherwise. Readers can decide for themselves, and they should test the different deduplication algorithms with their own data to better understand what benefit they will see. (All ratios are hypothetical at best until you test the technology in their environment.)
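
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function and variable names are my own, and it makes two simplifying assumptions: the same ratio applies uniformly to everything retained (real ratios vary with the data and the retention period), and 1 PB is counted as 1,000 TB.

```python
def stored_tb(logical_tb: float, ratio: float) -> float:
    """Physical capacity (TB) needed to hold logical_tb at a given dedup ratio."""
    return logical_tb / ratio


def savings_tb(logical_tb: float, low_ratio: float, high_ratio: float) -> float:
    """Extra capacity (TB) saved by moving from low_ratio to high_ratio."""
    return stored_tb(logical_tb, low_ratio) - stored_tb(logical_tb, high_ratio)


# Figures taken from the discussion above; the uniform ratio is an assumption.
backup_tb = 100        # Scott's 100 TB model
retention_days = 30    # assumed retention period
retained_tb = backup_tb * retention_days  # 3,000 TB, i.e. 3 PB undeduplicated

single_backup_savings = savings_tb(backup_tb, 25, 50)   # 25:1 vs 50:1 on 100 TB
retained_savings = savings_tb(retained_tb, 25, 50)      # same ratios on 3 PB

print(f"Savings on one 100 TB backup: {single_backup_savings:.0f} TB")
print(f"Savings across {retention_days} days retained: {retained_savings:.0f} TB")
print(f"Savings as a share of logical data: {retained_savings / retained_tb:.0%}")
```

The percentage stays at 2% either way; what changes with retention is the absolute amount of disk, which is the number you actually have to buy, power and cool.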

The other place where deduplication ratios really matter is replication; the more you can reduce the footprint, the less you have to send over the wire.

In summary, EMC has partnered with Quantum for deduplication, and Scott thinks that their solution is the best. (Duh! He’s from EMC.) I think otherwise, for obvious reasons. We can argue about facts and figures and support this and performance that, but what really matters is how the technology works in customer environments. I have 100% confidence that DeltaStor will outperform competing solutions in scalability, performance, and deduplication, and that support will not be an issue. Scott doubts this and thinks that the DL3D is superior. In the end the market will decide…
