
InformationWeek on NEC HYDRAstor

Howard Marks recently posted an interesting article about NEC’s HYDRAstor over on his blog at InformationWeek. He discusses the product and how the device is targeted at backup and archiving applications. He makes some interesting points and mentions SEPATON. I wanted to respond to some of the points he raised.

…[the system starts with] a 1-accelerator node – 2-storage node system at $180,000…


Keeping it Factual

I periodically peruse the blogosphere looking for interesting articles on storage, data protection and deduplication. As you can imagine, blog content varies from highly product centric (usually from vendors) to product agnostic (usually from analysts). I recently ran across a post over at the Data Domain blog, Dedupe Matters. This is a corporate blog where it appears that the content is carefully crafted by the PR team and is updated infrequently. Personally, I find canned blogs like this boring. That said, I wanted to respond to a post entitled “Keeping it Real” by Brian Biles, VP of Product Management. As usual, I will be quoting the original article.

A year or more later, Data Domain is scaling as promised, but the bolt-ons are struggling to meet expectations in robustness and economic impact.


Analyst Commentary on VTL

I often peruse industry-related sites to find what people are saying about disaster recovery and data protection. Most of these sites rely on independent contributors to provide the content. Given the myriad viewpoints and experience levels, it is not uncommon to see a wide range of commentaries, some consistent with industry trends, and others not. I keep this in mind when reading these articles and generally ignore inconsistencies; however, once in a while an article is so egregiously wrong that I feel a response is necessary.

In this case, I am referring to an article appearing in eWeek where the author makes gross generalizations about VTL that are misleading at best. Let’s walk through his key points:

VTLs are complex

I completely disagree. The reason most people purchase VTLs is that they simplify data protection and can be implemented with almost no change in tape policies or procedures. This means that companies do not have to relearn procedures after implementing a VTL, so the implementation is relatively simple, not complex as he suggests.

He also mentions that most VTLs use separate VTL software and storage. This is true for solutions from some of the big storage vendors, but is not the case with the SEPATON S2100-ES2. We manage the entire appliance including storage provisioning and performance management.

Finally, he complains about the complexity of configuring Fibre Channel (FC). While it is true that FC can be more complex than Ethernet, it really depends on how you configure the system. One option is to direct-connect the VTL, which requires none of the FC complexities he harps on. He also glosses over the fact that FC is much faster than the alternatives, which is an important benefit. (My guess is that he is comparing the VTL to Ethernet-based alternatives, but he never clearly states this.)


Tape is not dead!

I am amazed when I hear some vendors aggressively promote that tape is dead. It seems that hyping the demise of tape is in vogue these days, but the reality is quite different. Even so, there is no stopping them from sharing their message with anyone who will listen. If you ask large enterprises, many of them are looking at alternatives to tape, but telling them that tape is completely dead and that they should rip out all tape hardware is ludicrous. Ironically, this is the approach of some deduplication vendors. Jon Toigo states this succinctly in his blog.

The problem with tape is that it has become the whipping boy in many IT shops.
Courtesy: Drunken Data

The simple reality is that tape has been an important component of data protection for years and is likely to maintain a role far into the future. The reader should remember that in today’s highly regulated environments, companies often face strict requirements about data retention. For example, medical institutions can face some of the most stringent requirements:

HIPAA’s Privacy Rule, in effect since 2003 or 2004 depending on the size of the organization, requires confidentiality of patient records on paper and sets retention periods for some kinds of medical information, regardless of media. These retention requirements can stretch from birth to 21 years of age for pediatric records, or beyond the lifetime of the patient for other medical records.
Courtesy: Directory M
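To make the retention horizon concrete, here is a minimal sketch of what a retain-until-age-21 rule implies for a pediatric record. This is a simplified illustration, not legal guidance; the 21-year figure comes from the quoted passage, and everything else (the function name, the example date) is assumed for the example.

```python
from datetime import date

def pediatric_retention_end(birth: date, retention_years: int = 21) -> date:
    """Earliest date a pediatric record could be purged under a simplified
    retain-until-age-21 rule. Illustration only, not legal guidance.
    (A Feb 29 birth date would need special handling.)"""
    return birth.replace(year=birth.year + retention_years)

# A record for a child born in mid-1990 must be kept until mid-2011.
print(pediatric_retention_end(date(1990, 6, 15)))  # 2011-06-15
```

At that horizon, the media holding the record will likely outlive several generations of backup hardware, which is exactly why retention planning matters.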

With this in mind, let's look at the evolution of tape.


Rube Goldberg reborn as a VTL

I have fond memories from my childhood of Rube Goldberg contraptions. I was always amazed at how he would creatively use common elements to implement these crazy machines. By using everyday items for complicated contraptions, he made even the simplest process look incredibly complex and difficult. But that was the beauty of it; no one would ever use the devices in practice, but it was the whimsical and complex nature of his drawings that made them so fun to look at.

[Image: Rube Goldberg definition]

It is in the context of Rube Goldberg that I find myself thinking about the EMC DL3D 4000 virtual tape library. Like Goldberg, EMC has taken an approach to VTL and deduplication that revolves around adding complexity to what should be a relatively simple process. Unfortunately, I don't think that customers will treat the solution with the same whimsical and fun perspective as they did Goldberg's machines.

You may think that this is just sour grapes from an EMC competitor, but I am not the only one questioning the approach. Many industry analysts and backup administrators are confused and left scratching their heads just like this author. Why the confusion? Let me explain.


DeltaStor Deduplication, cont….

Scott from EMC and author of the backup blog responded to my previous post on DeltaStor. First, thank you for welcoming me to the world of blogdom. This blog is brand new and it is always interesting to engage in educated debate.

I do not want this to go down a “mine is better than yours” route. That just becomes annoying and can lead to a fight that benefits no one. I am particularly concerned since Scott, judging by his picture on EMC’s site, looks much tougher than me! 🙂

The discussion really came down to a few points. For the sake of simplicity I will quote him directly.

So, putting hyperbole aside, the support situation (and just as importantly, the mandate to test every one of those configurations) is a pretty heavy burden.

DeltaStor takes a different approach to deduplication than hash-based solutions like EMC/Quantum and Data Domain. This approach requires SEPATON to do some additional testing for different applications and modules. The real question under debate is how much additional work. In his first post, Scott characterized this as being entirely unmanageable. (My words, not his.) I continue to disagree with this assessment. Like most things, the Pareto Principle (otherwise known as the 80-20 rule) applies here. Will we support every possible combination of every application? Maybe not. Will we support the applications and environments that our customers and prospects use? Absolutely.


DeltaStor Deduplication

I track numerous sites focused on data protection and storage and am always interested in what other industry participants are discussing.  The blogroll on this site includes links to some informative and interesting blogs.  However, as with anything else on the web, everyone brings their view of the world which inevitably influences their perspective.

This brings me to my next point; I recently ran across a blog post where an EMC guy bad-mouths SEPATON's DeltaStor technology, which is used in HP's VLS platform. The obvious conclusion is "This is an EMC guy so what do you expect?" which is true. However, I think it deserves a brief response, and I wanted to respond to his major point. To summarize, he is commenting on the fact that DeltaStor requires an understanding of each supported backup application and data type, and that this makes testing difficult. Here is an excerpt:

Given that there are at least 5 or 6 common backup applications, each with at least 2 or 3 currently supported versions, and probably 10 or 15 applications, each with at least 2 or 3 currently supported version, the number of combinations approaches a million pretty rapidly

This is a tremendous overstatement.  HP’s (and SEPATON’s) implementation of DeltaStor is targeted at the enterprise.  If you look at enterprise datacenters there is actually a very small number of backup applications in use.  You will typically see NetBackup, TSM and much less frequently Legato; this narrows the scope of testing substantially.
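Taking the commenter's own figures at face value, the arithmetic does not come close to a million. The counts below are his stated upper bounds (and my reading that the second group refers to application modules), not measured data:

```python
# Combination count implied by the quoted figures, using the upper bounds.
backup_apps = 6       # "at least 5 or 6 common backup applications"
app_versions = 3      # "at least 2 or 3 currently supported versions"
modules = 15          # "probably 10 or 15" application modules
module_versions = 3   # "at least 2 or 3 currently supported versions"

all_combinations = backup_apps * app_versions * modules * module_versions
print(all_combinations)  # 810 -- three orders of magnitude short of a million

# Narrowed to the three enterprise applications named above
# (NetBackup, TSM, Legato), the matrix shrinks further:
enterprise_combinations = 3 * app_versions * modules * module_versions
print(enterprise_combinations)  # 405
```

Even the unconstrained matrix is in the hundreds, and the enterprise-focused subset is half that.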

This is not the case in small environments where you see many more applications such as BackupExec, ARCserve and others which is why HP is selling a backup application agnostic product for these environments.  (SEPATON is focused on the enterprise and so our solution is not targeted here.)

He then talks about how there are many different versions of supported backup applications and many different application modules. This is true; however, it is misleading. The power of backup applications is their ability to maintain tape compatibility across versions. This feature means that tape and file formats change infrequently. In the case of NetBackup, the format has not changed substantially since 4.5. The result is that qualification requirements for a new application version are typically minimal. (Clearly, if the tape format radically changes, the process is more involved.) The situation is similar with backup application modules.

The final question in his entry is:

do you want to buy into an architecture that severely limits what you can and cannot deduplicate? Or do you want an architecture that can deduplicate anything?

I would suggest an alternative question: do you want a generic deduplication solution that supports all applications but reduces your performance by 90% (200 MB/sec vs. 2,200 MB/sec) and provides mediocre deduplication ratios, or do you want an enterprise-focused solution that provides the fastest performance, the most scalability, the most granular deduplication, and is optimized for your backup application?
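For the record, the 90% figure follows directly from the two throughput numbers quoted above; a quick check:

```python
generic_mb_s = 200    # quoted throughput of the generic dedupe solution
sepaton_mb_s = 2200   # quoted throughput of the enterprise solution

reduction = (sepaton_mb_s - generic_mb_s) / sepaton_mb_s
print(f"{reduction:.0%}")  # 91%, i.e. roughly the 90% cited
```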

You decide…

Oh, and as an afterthought: stay tuned to this blog if you are interested in more detail on the first part of my question above.