Categories: Deduplication, Restore

Deduplication and restore performance redux

A week ago, I wrote an article highlighting how deduplication can impact restore performance and the difference between forward and reverse referencing. Many people are not familiar with these two deduplication technologies or why they matter. SEPATON is the only vendor to implement forward referencing in a large-scale enterprise appliance, and it is important to understand why we did that.
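
For readers who have not seen the two approaches side by side, here is a toy sketch in Python. It is only a conceptual illustration under simplified assumptions (whole-block duplicates, two nightly backups); it is not SEPATON's, HP's, or any other vendor's actual implementation, and the names and structures are hypothetical.

    # Toy illustration of reverse vs. forward referencing.
    # Monday's and Tuesday's backups share blocks A and B; the question is
    # which copy stays stored whole and which becomes a pointer.

    # Reverse referencing: the OLD backup keeps the real blocks and the NEW
    # backup stores pointers back to them. Every additional backup adds
    # another layer of back-references, so restoring the most recent backup
    # (the restore you are most likely to need) means chasing pointers
    # scattered across older backups.
    reverse_store = {
        "monday":  ["A", "B", "C"],                                    # real data
        "tuesday": [("ref", "monday", 0), ("ref", "monday", 1), "D"],  # pointers + new block
    }

    # Forward referencing: the NEW backup keeps the real blocks stored
    # contiguously, and the OLD backup is updated to point forward to them.
    # The most recent backup always reads sequentially, so its restore
    # speed does not degrade as the pool ages.
    forward_store = {
        "monday":  [("ref", "tuesday", 0), ("ref", "tuesday", 1), "C"],
        "tuesday": ["A", "B", "D"],                                    # real data
    }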

Lauren Whitehouse from the Enterprise Strategy Group posted an article on a similar topic on Searchstorage.com on 8/11/08. It is gratifying to know that I am not the only one focused on the importance of deduplication and restore performance!

Categories: Deduplication, Restore

Deduplication and restore performance

One of the hidden landmines of deduplication is its impact on restore performance. Most vendors gloss over this issue in their quest to sell bigger and faster systems. Credit goes to Scott from EMC, who acknowledged that restore performance declines on deduplicated data in the DL3D. We have seen other similar solutions suffer restore performance degradation of greater than 60% over time. Remember, the whole point of backing up is to be able to restore when necessary. If you are evaluating deduplication solutions, you must consider several questions:

  1. What are the implications of decreasing restore performance for your business?
  2. What is it about deduplication technology that hurts restore performance?
  3. Can you reduce the impact on restore performance?
  4. Is there a solution that does not have this limitation?
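
To make the second question concrete, here is a rough toy simulation, not a measurement of any product: with hash-based deduplication and reverse-style referencing, each nightly backup stores only its changed blocks, so restoring the most recent backup must gather blocks written across many earlier sessions instead of reading one contiguous image. The block count and change rate below are made up purely for illustration.

    import random

    random.seed(0)

    NUM_BLOCKS  = 10_000   # blocks in one full backup image (illustrative)
    CHANGE_RATE = 0.05     # fraction of blocks that change each night (illustrative)
    SESSIONS    = 30       # nightly deduplicated backups

    # block_owner[i] = the backup session whose container actually holds block i.
    # A block is rewritten only when it changes; otherwise the newest backup
    # simply references an older session's copy.
    block_owner = [0] * NUM_BLOCKS

    for session in range(1, SESSIONS + 1):
        for i in range(NUM_BLOCKS):
            if random.random() < CHANGE_RATE:
                block_owner[i] = session   # changed block lands in tonight's container

    # Restoring the latest backup now touches every container that still owns
    # at least one referenced block, instead of one sequential image.
    print("containers a restore must read from:", len(set(block_owner)))
    print("blocks still read from the original full backup:",
          block_owner.count(0), "of", NUM_BLOCKS)
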
Categories: Backup, Deduplication

6 Reasons not to Deduplicate Data

Deduplication is a hot buzzword these days. I previously posted about how important it is to understand your business problems before evaluating data protection solutions. Here are six reasons why you might not want to deduplicate data.

1. Your data is highly regulated and/or frequently subpoenaed
The challenge with these types of data is whether deduplicated data meets compliance requirements. John Toigo over at Drunken Data has numerous posts on this topic, including feedback from a corporate compliance user group. In short, the answer is that companies need to carefully review deduplication in the context of their regulatory requirements. The issue is not actual data loss, but the risk of someone challenging the validity of subpoenaed data that was stored on deduplicated disk. The defendant would then face the added burden of proving the validity of the deduplication algorithm. (Many large financial institutions have decided that they will never deduplicate certain data for this reason.)

2. You are deduplicating at the client level
Products like PureDisk from Symantec, Televaulting from Asigra, or Avamar from EMC deduplicate data at the client level. With these solutions, the client bears the burden of deduplication and transfers only deduplicated (i.e., net-new) data across the LAN. The master server maintains a disk repository containing only deduplicated data. Trying to deduplicate the already deduplicated repository will not yield additional storage savings.
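
As a rough sketch of why that is the case: source-side products like those above fingerprint data on the client and send only chunks the repository has never seen, so the repository already holds a single copy of each unique chunk. The chunking scheme, hashing, and names below are hypothetical and purely for illustration, not any of these products' actual protocols.

    import hashlib
    import os

    CHUNK_SIZE = 4096  # hypothetical fixed-size chunking, purely for illustration

    def chunks(data: bytes):
        """Split a byte stream into fixed-size chunks."""
        return (data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE))

    repository = {}  # master server's store: one copy of each unique chunk, keyed by hash

    def client_backup(data: bytes) -> int:
        """Client-side deduplication: fingerprint locally, send only net-new chunks."""
        sent = 0
        for c in chunks(data):
            fingerprint = hashlib.sha256(c).hexdigest()
            if fingerprint not in repository:  # only unseen chunks cross the LAN
                repository[fingerprint] = c
                sent += 1
        return sent

    data = os.urandom(1_000_000)   # stand-in for a client's file data
    print(client_backup(data))     # first backup: ~245 net-new chunks cross the LAN
    print(client_backup(data))     # unchanged data: zero chunks cross the LAN

    # The repository now holds only unique chunks, so deduplicating it a second
    # time (for example, with a target-side appliance) yields little or no
    # additional savings.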

Categories: Backup, Deduplication, Restore, Virtual Tape

DeltaStor Deduplication, cont….

Scott from EMC, author of the backup blog, responded to my previous post on DeltaStor. First, thank you for welcoming me to the world of blogdom. This blog is brand new, and it is always interesting to engage in educated debate.

I do not want this to go down a “mine is better than yours” route. That just becomes annoying and can lead to a fight that benefits no one. I am particularly concerned since Scott, judging by his picture on EMC’s site, looks much tougher than me! 🙂

The discussion really came down to a few points. For the sake of simplicity, I will quote him directly.

So, putting hyperbole aside, the support situation (and just as importantly, the mandate to test every one of those configurations) is a pretty heavy burden.

DeltaStor takes a different approach to deduplication than hash-based solutions like EMC/Quantum and Data Domain. It requires SEPATON to do some additional testing for different applications and modules. The real question under debate is how much additional work. In his first post, Scott characterized this as being entirely unmanageable. (My words, not his.) I continue to disagree with this assessment. Like most things, the Pareto Principle (otherwise known as the 80-20 rule) applies here. Will we support every possible combination of every application? Maybe not. Will we support the applications and environments that our customers and prospects use? Absolutely.

Categories: Backup, Deduplication, Restore, Virtual Tape

DeltaStor Deduplication

I track numerous sites focused on data protection and storage and am always interested in what other industry participants are discussing. The blogroll on this site includes links to some informative and interesting blogs. However, as with anything else on the web, everyone brings a point of view that inevitably colors what they write.

This brings me to my next point: I recently ran across a blog post in which an EMC guy bad-mouths SEPATON's DeltaStor technology, which is used in HP's VLS platform. The obvious conclusion is "This is an EMC guy, so what do you expect?" which is true. However, I think it deserves a brief response, and I wanted to address his major point. To summarize, he is commenting on the fact that DeltaStor requires an understanding of each supported backup application and data type, and that this makes testing difficult. Here is an excerpt:

Given that there are at least 5 or 6 common backup applications, each with at least 2 or 3 currently supported versions, and probably 10 or 15 applications, each with at least 2 or 3 currently supported version, the number of combinations approaches a million pretty rapidly

This is a tremendous overstatement. HP's (and SEPATON's) implementation of DeltaStor is targeted at the enterprise. If you look at enterprise datacenters, there are actually very few backup applications in use. You will typically see NetBackup, TSM, and, much less frequently, Legato; this narrows the scope of testing substantially.

This is not the case in smaller environments, where you see many more applications such as BackupExec, ARCserve, and others, which is why HP sells a backup-application-agnostic product for these environments. (SEPATON is focused on the enterprise, so our solution is not targeted there.)

He then talks about how there are many different versions of supported backup applications and many different application modules. This is true; however, it is misleading. The power of backup applications is their ability to maintain tape compatibility across versions. This means that tape and file formats change infrequently; in the case of NetBackup, the format has not changed substantially since 4.5. The result is that qualification requirements for a new application version are typically minimal. (Clearly, if the tape format changes radically, the process is more involved.) The situation is similar with backup application modules.
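
To put those two points into rough numbers, here is a back-of-the-envelope sketch. The figures are illustrative only, taken from the high end of the quoted excerpt and from the enterprise scenario described above; this is not an actual qualification plan.

    # Illustrative qualification-matrix arithmetic; not an actual test plan.

    # High-end figures from the quoted excerpt.
    apps, app_versions       = 6, 3    # backup applications x supported versions
    modules, module_versions = 15, 3   # application modules x supported versions

    full_matrix = apps * app_versions * modules * module_versions
    print("naive cross-product:", full_matrix)        # 810 combinations

    # Narrow to the applications actually seen in enterprise datacenters
    # (NetBackup, TSM, and occasionally Legato)...
    enterprise_apps = 3

    # ...and collapse versions whose tape/file formats have not changed,
    # since those need little or no requalification.
    formats_per_app    = 1
    formats_per_module = 1

    enterprise_matrix = enterprise_apps * formats_per_app * modules * formats_per_module
    print("enterprise scope:", enterprise_matrix)     # 45 combinations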

The final question in his entry is:

do you want to buy into an architecture that severely limits what you can and cannot deduplicate? Or do you want an architecture that can deduplicate anything?

I would suggest an alternative question: do you want a generic deduplication solution that supports all applications but reduces your performance by 90% (200 MB/sec vs. 2,200 MB/sec) and provides mediocre deduplication ratios, or do you want an enterprise-focused solution that provides the fastest performance, the most scalability, and the most granular deduplication, and is optimized for your backup application?

You decide…

Oh, and as an afterthought: stay tuned to this blog if you are interested in more detail on the first part of my question above.