Categories: Backup, Deduplication, Restore, Virtual Tape

Keeping it Factual

I periodically peruse the blogosphere looking for interesting articles on storage, data protection and deduplication. As you can imagine, blog content varies from highly product-centric (usually from vendors) to product-agnostic (usually from analysts). I recently ran across a post over at the Data Domain blog, Dedupe Matters. This is a corporate blog where the content appears to be carefully crafted by the PR team and is updated infrequently. Personally, I find canned blogs like this boring. That said, I wanted to respond to a post entitled “Keeping it Real” by Brian Biles, VP of Product Management. As usual, I will be quoting the original article.

A year or more later, Data Domain is scaling as promised, but the bolt-ons are struggling to meet expectations in robustness and economic impact.

Categories: Deduplication, Restore

Deduplication and restore performance redux

A week ago, I wrote an article highlighting how deduplication can impact restore performance and the difference between forward and reverse referencing. Many people are not familiar with these two approaches to deduplication or why they matter. SEPATON is the only vendor to implement forward referencing in a large-scale enterprise appliance, and it is important to understand why we did that.
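
For readers unfamiliar with the distinction, here is a toy Python sketch of the two approaches. It is my own simplification for illustration only, not SEPATON's (or anyone else's) actual implementation; the chunk store, catalog and layout structures are invented for the example.

    # Toy model of reverse vs. forward referencing in deduplication.
    # This is a deliberate simplification, not any vendor's real design.

    def reverse_referencing(backups):
        """New backups store only new chunks and point back at chunks
        written by earlier backups, so the newest backup ends up
        scattered across everything written before it."""
        store, catalog, layouts = [], {}, []
        for backup in backups:
            positions = []
            for chunk in backup:
                if chunk not in catalog:
                    catalog[chunk] = len(store)
                    store.append(chunk)
                positions.append(catalog[chunk])  # may point far back in the store
            layouts.append(positions)
        return layouts

    def forward_referencing(backups):
        """The newest backup is always written contiguously; older
        backups are rewritten as references that point forward into it,
        so restoring the latest backup is a sequential read."""
        layouts = []
        for backup in backups:
            layouts = [["ref"] * len(old) for old in layouts]  # older backups become pointers
            layouts.append(list(range(len(backup))))           # contiguous layout for the newest
        return layouts

    full_backup = ["A", "B", "C", "D"]
    incremental = ["A", "B", "C", "E"]  # one changed chunk
    print(reverse_referencing([full_backup, incremental])[-1])  # [0, 1, 2, 4] -- newest is scattered
    print(forward_referencing([full_backup, incremental])[-1])  # [0, 1, 2, 3] -- newest is contiguous

The point of the toy example is simply that, with reverse referencing, the backup you are most likely to restore (the newest one) is the one whose chunks are most widely dispersed.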

Lauren Whitehouse from the Enterprise Strategy Group posted an article on a similar topic on Searchstorage.com on 8/11/08. It is gratifying to know that I am not the only one focused on the importance of deduplication and restore performance!

Categories: Backup, Restore, Virtual Tape

Rube Goldberg reborn as a VTL

I have fond memories from my childhood of Rube Goldberg contraptions. I was always amazed at how he would creatively use common elements to implement these crazy machines. By using everyday items for complicated contraptions, he made even the simplest process look incredibly complex and difficult. But that was the beauty of it: no one would ever use the devices in practice; it was the whimsical and complex nature of his drawings that made them so fun to look at.

[Image: Rube Goldberg definition, courtesy of rubegoldberg.com]

It is in the context of Rube Goldberg that I find myself thinking about the EMC DL3D 4000 virtual tape library. Like Goldberg, EMC has taken an approach to VTL and deduplication that revolves around adding complexity to what should be a relatively simple process. Unfortunately, I don’t think customers will view the solution with the same whimsical and fun perspective that they bring to Goldberg’s machines.

You may think that this is just sour grapes from an EMC competitor, but I am not the only one questioning the approach. Many industry analysts and backup administrators are confused and left scratching their heads just like this author. Why the confusion? Let me explain.

Categories: Deduplication, Restore

Deduplication and restore performance

One of the hidden landmines of deduplication is its impact on restore performance. Most vendors gloss over this issue in their quest to sell bigger and faster systems. Credit goes to Scott from EMC, who acknowledged that restore performance declines on deduplicated data in the DL3D. We have seen other similar solutions suffer restore performance degradation of greater than 60% over time. Remember, the whole point of backing up is to restore when/if necessary. If you are evaluating deduplication solutions, you must consider several questions (a back-of-the-envelope illustration of the impact follows the list below).

  1. What are the implications of decreasing restore performance for your business?
  2. What is it about deduplication technology that hurts restore performance?
  3. Can you reduce the impact on restore performance?
  4. Is there a solution that does not have this limitation?
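
To make the first question concrete, here is a back-of-the-envelope calculation using the 60% degradation figure mentioned above. The 5 TB dataset size and the 1,000 MB/sec baseline restore rate are hypothetical numbers chosen purely for illustration.

    # What a 60% decline in restore throughput means for restore time.
    # The dataset size and baseline rate below are hypothetical.
    dataset_mb = 5 * 1000 * 1000          # 5 TB expressed in MB (decimal)
    baseline_rate = 1000.0                # MB/sec before fragmentation sets in
    degraded_rate = baseline_rate * 0.4   # after a 60% decline

    print(f"baseline restore: {dataset_mb / baseline_rate / 3600:.1f} hours")  # ~1.4 hours
    print(f"degraded restore: {dataset_mb / degraded_rate / 3600:.1f} hours")  # ~3.5 hours

If that restore sits on the critical path of an application recovery, the difference between 1.4 and 3.5 hours is exactly the kind of business implication the first question is asking about.
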
Categories: Backup, Deduplication, Restore

DL3D Discussion

There is an interesting discussion on The Backup Blog related to deduplication and EMC’s DL3D. The conversation centers on performance, and the two participants are W. Curtis Preston, author of the Mr. Backup Blog, and Scott from EMC, author of The Backup Blog. Here are some excerpts that I find particularly interesting, with my commentary included. (Note that I am directly quoting Scott below.)

VTL performance is 2,200 MB/sec native. We can actually do a fair bit better than that…. 1,600 MB/sec with hardware compression enabled (and most people do enable it for capacity benefits.)

The 2,200 MB/sec figure is not new; it is what EMC specifies on its datasheet. It is interesting, however, that performance declines with hardware compression; the hardware compression card must be a performance bottleneck. Is the roughly 27% reduction in throughput meaningful? It depends on the environment, and it is certainly worth noting, especially for datacenters where backup and restore performance are the primary concern.
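
As a quick sanity check on the numbers Scott gives (2,200 MB/sec native vs. 1,600 MB/sec with hardware compression), the drop works out as follows:

    # Throughput drop implied by the figures quoted above.
    native, compressed = 2200, 1600   # MB/sec
    drop = (native - compressed) / native
    print(f"{drop:.1%}")              # 27.3%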

Categories: Backup, Deduplication, Restore, Virtual Tape

DeltaStor Deduplication, cont….

Scott from EMC, author of The Backup Blog, responded to my previous post on DeltaStor. First, thank you for welcoming me to the world of blogdom. This blog is brand new, and it is always interesting to engage in educated debate.

I do not want this to go down a “mine is better than yours” route. That just becomes annoying and can lead to a fight that benefits no one. I am particularly concerned since Scott, judging by his picture on EMC’s site, looks much tougher than me! 🙂

The discussion really came down to a few points. For the sake of simplicity I will quote him directly.

So, putting hyperbole aside, the support situation (and just as importantly, the mandate to test every one of those configurations) is a pretty heavy burden.

DeltaStor takes a different approach to deduplication than hash-based solutions like EMC/Quantum and Data Domain. It requires SEPATON to do some additional testing for different applications and modules. The real question under debate is how much additional work. In his first post, Scott characterized this as being entirely unmanageable. (My words, not his.) I continue to disagree with this assessment. Like most things, the Pareto Principle (otherwise known as the 80-20 rule) applies here. Will we support every possible combination of every application? Maybe not. Will we support the applications and environments that our customers and prospects use? Absolutely.

Categories: Backup, Deduplication, Restore

Deduplication, do I really need it?

I’m always puzzled when a customer tells me that they “need deduplication to run my backups better.” This drives me nuts. Deduplication in and of itself doesn’t make your backups run better. In fact, some technologies make backups and restores take a lot longer. These customers aren’t really thinking about the root causes of their problems. They are like patients who see an ad for a prescription medication on TV and decide they need some. When the doctor asks why, the response is “because the ad sounded like something that will make me feel better.” Sounds ludicrous, doesn’t it? Well, that is no different from the dedupe statement above.

The simple reality is that, like the prescription drugs on TV, dedupe is not a silver bullet. It solves specific problems such as retention and reduction in $/GB. If that is your problem, then by all means please look at dedupe. But please, understand the problem you are trying to solve. Trust me, you don’t want to be the guy taking the drugs just because they sound good on TV.

Categories: Backup, Deduplication, Restore, Virtual Tape

DeltaStor Deduplication

I track numerous sites focused on data protection and storage and am always interested in what other industry participants are discussing. The blogroll on this site includes links to some informative and interesting blogs. However, as with anything else on the web, everyone brings their own view of the world, which inevitably colors their perspective.

This brings me to my next point: I recently ran across a blog post where an EMC guy bad-mouths SEPATON’s DeltaStor technology, which is used in HP’s VLS platform. The obvious conclusion is “This is an EMC guy, so what do you expect?” which is true. However, I think it deserves a brief response, and I wanted to respond to his major point. To summarize, he is commenting on the fact that DeltaStor requires an understanding of each supported backup application and data type, and that this makes testing difficult. Here is an excerpt:

Given that there are at least 5 or 6 common backup applications, each with at least 2 or 3 currently supported versions, and probably 10 or 15 applications, each with at least 2 or 3 currently supported version, the number of combinations approaches a million pretty rapidly

This is a tremendous overstatement. HP’s (and SEPATON’s) implementation of DeltaStor is targeted at the enterprise. If you look at enterprise datacenters, there is actually a very small number of backup applications in use. You will typically see NetBackup, TSM and, much less frequently, Legato; this narrows the scope of testing substantially.
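
Taking the figures in the quote at face value, the arithmetic does not get anywhere near a million. My reading of the quote is 6 backup applications x 3 versions x 15 application modules x 3 module versions; that interpretation is mine, and the quote may intend additional dimensions (operating systems, data types) that it does not spell out.

    # Combination count using the upper bounds from the quote above.
    apps, app_versions = 6, 3
    modules, module_versions = 15, 3
    print(apps * app_versions * modules * module_versions)  # 810

Even being generous with the inputs, 810 combinations is a long way from "approaches a million."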

This is not the case in small environments, where you see many more applications such as BackupExec, ARCserve and others, which is why HP sells a backup-application-agnostic product for these environments. (SEPATON is focused on the enterprise, so our solution is not targeted here.)

He then talks about how there are many different versions of supported backup applications and many different application modules. This is true; however, it is misleading. The power of backup applications is their ability to maintain tape compatibility across versions. This feature means that tape and file formats change infrequently. In the case of NetBackup, the format has not changed substantially since version 4.5. The result is that qualification requirements for a new application version are typically minimal. (Clearly, if the tape format changes radically, the process is more involved.) The situation is similar with backup application modules.

The final question in his entry is:

do you want to buy into an architecture that severely limits what you can and cannot deduplicate? Or do you want an architecture that can deduplicate anything?

I would suggest an alternative question: do you want a generic deduplication solution that supports all applications but reduces your performance by 90% (200 MB/sec vs. 2,200 MB/sec) and provides mediocre deduplication ratios, or do you want an enterprise-focused solution that provides the fastest performance, the most scalability, the most granular deduplication, and is optimized for your backup application?

You decide…

Oh, and as an afterthought: stay tuned to this blog if you are interested in more detail on the first part of my question above.

Categories: Backup, General, Restore

Why is this blog called About Restore?

You might be wondering why I chose the name About Restore for this blog. The primary objective of data protection is restore. Sure, you back up data every night, but you do this so you can restore the data. A recent situation reminded me of this.

I manage numerous programs, and one of my responsibilities is an internal portal. One morning, I found that the portal was inaccessible, and the primary culprit seemed to be database corruption. I had a manual backup, but it was old, so I called my IT department to ask about restoring the data. As it turned out, my server was older, had never held critical data and so was not being backed up. (My data is important, but certainly the company won’t come screeching to a halt without it…) Uggghhh. The good news is that I later realized that the problem was due to a minor misconfiguration and my data was intact.

I bring up the above story to illustrate a simple point. Restore is what’s most important. Sure, you need to back up data every night, but you do it to enable restores. So as you go through your daily data protection activities, remember: it is about restore!

BTW, have you conducted a restore test recently?