Deduplication, Restore Performance and the A-Team

I have posted in the past about the challenges of restoring data from a reverse referenced deduplication solution. In short, the impact can be substantial. You might wonder whether I am the only one pointing out this issue, and what the impact really is.

An EMC blogger recently posted on this topic and provided insights on the reduction in restore performance he sees from both the DL3D and Data Domain. He said, “I will have to rely on what customers tell me: data reads from a DD [Data Domain] system are typically 25-33% of the speed of data writes.” He then goes on to confirm that “…the DL3D performs very similarly to a Data Domain box”. He is referring to restore performance on deduplicated data in a reverse referenced environment. (Both Data Domain and EMC/Quantum rely on reverse referencing.) He recommends that you maintain a cache of undeduplicated data on the DL3D to avoid this penalty. Of course, this brings up a range of additional questions: how much extra storage will the holding area require, how many days of data should you retain, and what does this do to deduplication ratios?
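To put the quoted 25-33% figure in perspective, here is a rough back-of-the-envelope sketch in Python. Only the read-to-write ratio comes from the quote above; the 10 TB restore size, the 400 MB/s write speed, and the function name `restore_hours` are illustrative assumptions and not numbers from any vendor.

```python
def restore_hours(data_tb: float, write_mb_s: float, read_ratio: float) -> float:
    """Hours needed to read back data_tb terabytes when reads run at
    read_ratio times the write speed (read_ratio=1.0 means no penalty)."""
    read_mb_s = write_mb_s * read_ratio
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / read_mb_s / 3600


# Hypothetical inputs, chosen only for illustration.
DATA_TB = 10.0
WRITE_MB_S = 400.0

# Time to move the same data at full write speed, for comparison.
backup = restore_hours(DATA_TB, WRITE_MB_S, 1.0)

# The two ends of the quoted 25-33% range.
for ratio in (0.25, 0.33):
    hours = restore_hours(DATA_TB, WRITE_MB_S, ratio)
    print(f"reads at {ratio:.0%} of write speed: "
          f"{hours:.1f} h to restore vs {backup:.1f} h to back up")
```

Whatever inputs you choose, a read speed of 25-33% of write speed means a restore takes roughly three to four times as long as the backup did, which is the number to check against your restore SLAs.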

The simplest solution to the above problem is to use forward referencing, but neither DD nor EMC/Quantum supports this technology. EMC’s workaround is to force the customer to use more disk to store undeduplicated data, which adds to the management burden and cost.

This reminds me of a classic quote from John “Hannibal” Smith from the A-Team:

I love it when a plan comes together!

What more confirmation do you need?

4 Responses to “Deduplication, Restore Performance and the A-Team”

  1. So publish some performance numbers for Sepaton then. Meaningful ones.

    And I didn’t recommend anything other than you pay attention to your SLAs and size appropriately. 🙂

    Honestly, the “forward reference” approach of Sepaton is just marketing unless you are willing to back it up with real world numbers, like EMC has done.

    You claim that you get 600 MB/s ingest per SRE. What about during deduplication? What about during replication? Compare a restore from one day ago to a restore from seven days ago. During simultaneous ingest and replication too, please.

    See what I mean? Mr. T has an appropriate quote for this situation too: “Quit yo jibber jabber!”
    Without data that fits to, well, a T.

  2. Our performance numbers are available on our website and in our datasheets; I have only referred to them indirectly here, figuring that people could find them elsewhere. Given your confusion, I will provide them. I am not going to embed them in this comment, but will create a dedicated blog post in the next couple of days. Stay tuned.

  3. Gonna have to ding you on this.

    What I heard in my head when I read Scott’s original blog entry you’re referring to was “people tell me my competitor stinks!” Now I’m reading your blog and hearing “that guy said that people tell him his competitor stinks!” Wow. Now THERE’S two reliable sources of information.

    I commented on Scott’s blog that this didn’t reflect what I’ve heard from EVERY Data Domain customer I’ve talked to. So I’ll say the same on your blog. While I’m not saying they have no issues at all, I’m saying I don’t see this 75% drop in restore performance that Scott is talking about. It is apparently a problem with DL3D/Quantum if restoring from truncated, deduped data (that information is coming directly from Scott — who works at EMC). But I’ve never had a single DD customer tell me that, and actually have quite a bit of evidence to the contrary.

    Focus on talking about why you think SEPATON is awesome. Stop these “why our competitor stinks” posts.

    As to Scott’s comment about performance numbers and your statement that they’re on your website… Um, where would that be? You state only one performance number that I could find on your website — 34.5 TB/hr, but it’s talking about your ingest speed. You say nothing about your dedupe speed on any page I could find. Your DeltaStor product brief mentions nothing about throughput. The ESG whitepaper does talk about performance, but that’s not where anyone’s going to look.

    I’m waiting with bated breath for your upcoming post. Although it will still not be official SEPATON numbers any more than Scott’s posts are official EMC numbers. (Until they appear on your actual corporate website…) FWIW, this is what EMC is publishing on their website: http://www.emc.com/collateral/hardware/comparison/emc-disk-library.htm

  4. Wow, who would have thought that such a short post would generate such discussion! Perhaps it is a deep rooted love of the A-Team. 🙂

    The point of my post was simply to highlight that reverse referencing negatively impacts restore performance. Scott’s numbers were there, so I quoted them. Are they 100% accurate? Probably not, but they do make the point that restore performance can be materially slower than backup performance. I have seen this in practice with Data Domain and it sounds like it could be even worse with the EMC systems.

    Regarding performance numbers, we have long quoted ingest numbers, and our concurrent approach means that this does not change when deduplication is enabled. Your point about deduplication performance is a good one. The data is available on our website, although not in an obvious location. I will include this in my post and will be sure that it is stated clearly when we update our collateral and website.
