The email below appeared in my inbox yesterday. The EDL/DL3D 1500/3000 has officially been discontinued. It was obvious from the moment EMC purchased Data Domain that the Quantum stuff was dead, but it took time for EMC to finally admit this. The strongest statement came in Frank Slootman’s TechTarget interview. Clearly the EMC/QTM relationship was a rocky one from the beginning and so the outcome is not surprising.
Scott from EMC posted about the EMC DL3D 4000 today. He was responding to some questions by W. Curtis Preston regarding the product and GA. I am not going to go into detail about the post, but wanted to clarify one point. He says:
Restores from this [undeduplicated data] pool can be accomplished at up to 1,600 MB/s. Far faster than pretty much any other solution available today, from anybody. At 6 TB an hour, that is certainly much faster than any deduplication solution.
(Text in brackets added by me for clarification)
As recently discussed in this post, SEPATON restores data at up to 3,000 MB/sec (11.0 TB/hr) with deduplication either enabled or disabled. Scott insinuates that only EMC is capable of the performance he mentions, and I wanted to clarify for the record that SEPATON is almost twice as fast as the fastest EMC system.
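For readers who want to check the arithmetic behind these numbers, here is a minimal sketch of the unit conversion. It assumes decimal units (1 TB = 1,000,000 MB), which is my assumption rather than something either vendor states; on that basis the two figures come out to 10.8 and 5.76 TB/hr, which round to the roughly 11 and 6 TB/hr quoted above.

```python
# Convert a sustained MB/sec rate into TB/hr.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def mb_per_sec_to_tb_per_hour(mb_per_sec: float) -> float:
    """Return the hourly throughput in TB for a given MB/sec rate."""
    return mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB


sepaton_tb_hr = mb_per_sec_to_tb_per_hour(3000)  # SEPATON restore figure
emc_tb_hr = mb_per_sec_to_tb_per_hour(1600)      # EMC undeduplicated-pool figure
speed_ratio = 3000 / 1600                        # how many times faster

print(f"SEPATON: {sepaton_tb_hr:.2f} TB/hr")
print(f"EMC:     {emc_tb_hr:.2f} TB/hr")
print(f"Ratio:   {speed_ratio:.3f}x")
```

The ratio of 1.875 is where "almost twice as fast" comes from.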
Scott from EMC has challenged SEPATON’s advertised performance for backup, deduplication, and restore. As industry analyst W. Curtis Preston so succinctly put it, “do you really want to start a ‘we have better performance than you’ blog war with one of the products that has clustered dedupe?” Even so, I wanted to clarify the situation in this post.
Let me answer the questions specifically:
1. The performance data you refer to with the link in his post three words in is both four months old, and actually no data at all.
SEPATON customers want to know how much data they can back up and deduplicate in a given day. That is what matters in real-life usage of the product. The answer is 25 TB per day per node. If a customer has five nodes and a twenty-four-hour day, that’s 125 TB of data backed up and deduplicated. This figure has been accurate for four months and is still accurate today.
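The sizing arithmetic above can be sketched in a few lines. This is only an illustration: the 25 TB per day per node figure and the five-node example come from the paragraph above, while the linear scaling across nodes, the pro-rating for a shorter backup window, and the decimal units (1 TB = 1,000,000 MB) are simplifying assumptions of mine.

```python
# Daily backup-plus-deduplication capacity from the per-node figure.
# Assumptions: capacity scales linearly with node count, a shorter
# backup window is pro-rated, and 1 TB = 1,000,000 MB.
TB_PER_NODE_PER_DAY = 25
HOURS_PER_DAY = 24
MB_PER_TB = 1_000_000


def daily_capacity_tb(nodes: int, window_hours: float = HOURS_PER_DAY) -> float:
    """TB backed up and deduplicated within the given window."""
    return TB_PER_NODE_PER_DAY * nodes * window_hours / HOURS_PER_DAY


def sustained_mb_per_sec_per_node() -> float:
    """Average per-node rate implied by 25 TB over a full day."""
    return TB_PER_NODE_PER_DAY * MB_PER_TB / (HOURS_PER_DAY * 3600)


five_node_full_day = daily_capacity_tb(5)         # the example in the text
five_node_half_day = daily_capacity_tb(5, 12)     # hypothetical 12-hour window
per_node_rate = sustained_mb_per_sec_per_node()

print(f"5 nodes, 24 h: {five_node_full_day:.0f} TB")
print(f"5 nodes, 12 h: {five_node_half_day:.1f} TB")
print(f"Implied per-node rate: {per_node_rate:.1f} MB/sec")
```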
I have posted in the past about the challenges of restoring data from a reverse referenced deduplication solution. In short, the impact can be substantial. You might wonder whether I am the only one pointing out this issue, and what the impact really is.
An EMC blogger recently posted on this topic and provided insights on the reduction in restore performance he sees from both the DL3D and Data Domain. He said, “I will have to rely on what customers tell me: data reads from a DD [Data Domain] system are typically 25-33% of the speed of data writes.” He then goes on to confirm that “…the DL3D performs very similarly to a Data Domain box”. He is referring to restore performance on deduplicated data in a reverse-referenced environment. (Both Data Domain and EMC/Quantum rely on reverse referencing.) He recommends that you maintain a cache of undeduplicated data on the DL3D to avoid this penalty. Of course, this raises a range of additional questions: how much extra storage will the holding area require, how many days of data should you retain there, and what does this do to deduplication ratios?
The simplest solution to the above problem is to use forward referencing, but neither DD nor EMC/Quantum supports this technology. EMC’s workaround is to force the customer to use more disk to store undeduplicated data, which adds to the management burden and cost.
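To put the quoted 25-33% figure in perspective, here is a rough sketch of what it does to restore time. The 400 MB/sec write rate and the 10 TB restore size are hypothetical numbers chosen only for illustration, not measurements from any product, and the calculation assumes decimal units (1 TB = 1,000,000 MB).

```python
# Estimate restore time when reads run at a fraction of write speed.
# Assumptions: the write rate and data size below are hypothetical,
# and 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def restore_hours(data_tb: float, write_mb_per_sec: float, read_fraction: float) -> float:
    """Hours to restore data_tb when reads run at read_fraction of write speed."""
    read_mb_per_sec = write_mb_per_sec * read_fraction
    return data_tb * MB_PER_TB / read_mb_per_sec / 3600


DATA_TB = 10      # hypothetical restore size
WRITE_RATE = 400  # hypothetical write rate in MB/sec

at_write_speed = restore_hours(DATA_TB, WRITE_RATE, 1.0)
best_case = restore_hours(DATA_TB, WRITE_RATE, 0.33)   # top of the quoted range
worst_case = restore_hours(DATA_TB, WRITE_RATE, 0.25)  # bottom of the quoted range

print(f"At write speed: {at_write_speed:.1f} h")
print(f"At 33% of it:   {best_case:.1f} h")
print(f"At 25% of it:   {worst_case:.1f} h")
```

Whatever the starting numbers, a read rate of one quarter to one third of the write rate means a restore takes three to four times as long as the original backup did.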
This reminds me of a classic quote from John “Hannibal” Smith from the A-Team:
I love it when a plan comes together!
What more confirmation do you need?
I have fond memories from my childhood of Rube Goldberg contraptions. I was always amazed at how he would creatively use common elements to implement these crazy machines. By using everyday items for complicated contraptions, he made even the simplest process look incredibly complex and difficult. But that was the beauty of it: no one would ever use the devices in practice, and it was the whimsical and complex nature of his drawings that made them so fun to look at.
Image courtesy of rubegoldberg.com
It is in the context of Rube Goldberg that I find myself thinking about the EMC DL3D 4000 virtual tape library. Like Goldberg, EMC has taken an approach to VTL and deduplication that revolves around adding complexity to what should be a relatively simple process. Unfortunately, I don’t think that customers will view the solution with the same sense of whimsy and fun that they brought to Goldberg’s machines.
You may think that this is just sour grapes from an EMC competitor, but I am not the only one questioning the approach. Many industry analysts and backup administrators are confused and left scratching their heads just like this author. Why the confusion? Let me explain.
There is an interesting discussion on The Backup Blog related to deduplication and EMC’s DL3D. The conversation relates to performance, and the two participants are W. Curtis Preston, the author of the Mr. Backup Blog, and The Backup Blog’s author, Scott from EMC. Here are some excerpts that I find particularly interesting, with my commentary included. (Note that I am directly quoting Scott below.)
VTL performance is 2,200 MB/sec native. We can actually do a fair bit better than that…. 1,600 MB/sec with hardware compression enabled (and most people do enable it for capacity benefits.)
The 2,200 MB/sec figure is not new; it is what EMC specifies on its datasheet. However, it is interesting that performance declines to 1,600 MB/sec with hardware compression enabled, which suggests that the hardware compression card is a performance bottleneck. Is a performance reduction of roughly 27% meaningful? It depends on the environment, but it is certainly worth noting, especially for datacenters where backup and restore performance are the primary concern.
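As a quick check on the size of that drop, here is the calculation using only the two rates Scott quotes. Going from 2,200 to 1,600 MB/sec is a reduction of a little over 27 percent. The function name is mine, chosen for illustration.

```python
# Percentage throughput lost when hardware compression is enabled,
# using the two rates quoted above.
NATIVE_MB_PER_SEC = 2200      # native VTL rate
COMPRESSED_MB_PER_SEC = 1600  # rate with hardware compression enabled


def percent_reduction(before: float, after: float) -> float:
    """Return the drop from before to after as a percentage of before."""
    return (before - after) / before * 100


drop = percent_reduction(NATIVE_MB_PER_SEC, COMPRESSED_MB_PER_SEC)
print(f"Throughput reduction: {drop:.1f}%")
```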