Categories
Backup Deduplication

Inline Deduplication: What Your Mother Never Told You

I was recently attending a show and enjoyed speaking with a variety of end users with different levels of interest and knowledge. One of the things that I found was that attendees were obsessed with the question of inline vs post process vs concurrent process deduplication. Literally, people would come up and say “Do you do inline or post process dedupe?” This is crazy. Certainly there are differences between the approaches, but the real issue should be about data protection not arcane techno speak.

Before I go into details, let me start with the basics, inline deduplication means that deduplication occurs in the primary data path. No data is written to disk until the deduplication process is complete. The other two approaches post process and concurrent process, first store data on disk and then deduplicate. As the name suggests, post process approaches do not begin the deduplication process until all backups are complete. The concurrent process approach begins deduplication can start before the backups are completed and can backup and deduplicate concurrently. Let’s look at each of these in more detail.

Categories
Backup Restore

Data protection and natural disasters – Part 1

Hurricane Ike has been in the news lately and my sympathy goes out to all those affected. It is events like these that test IT resiliency. The damage can range from slight to severe and we invest in reliable and robust data protection processes to protect from disasters like this. The unfortunate reality is that, no matter how much you plan for it, the recovery process often takes longer and is more difficult than expected.

In many respects, data protection is an insurance policy. You hate to pay your homeowners premium every month, you do it because you know that it is your only protection if major damage ever happens to your house. In the case of data protection, you invest hours managing your backup environment to enable recovery from incidents like this. The unfortunate reality is that even with the best planning and policies things still may not turn out as expected. Four of the most common pitfalls I hear from customers include:

Categories
Backup D2D Restore Virtual Tape

InformationWeek on NEC HYDRAstor

Howard Marks recently posted an interesting article about NEC’s HYDRAstor over on his blog at InformationWeek. He discusses the product and how the device is targeted at backup and archiving applications. He makes some interesting points and mentions SEPATON. I wanted to respond to some of the points he raised.

…[the system starts with] a 1-accelerator node – 2-storage node system at $180,000…

Categories
Backup Deduplication

IBM Storage Announcement

As previously posted, I was confused about the muted launch of IBM’s XIV disk platform. Well, the formal launch finally occurred at IBM Storage Symposium in Montpelier, France. Congratulations to IBM, although I am still left scratching my head why they informally announced the product a month ago!

Another part of the announcement was the TS7650G which is Diligent’s software running on an IBM server. Surprisingly, there is not much new; it appears that they are banking on the IBM brand and salesforce to jumpstart Diligent’s sales. Judging by the lack of success in selling the TS75xx series, it will be interesting to see whether they will have any more success with this platform.

From a VTL perspective, IBM has backed themselves into a box. Like EMC, they have a historic relationship with FalconStor and have chosen a different supplier for deduplication. This creates an interesting dichotomy. Let’s look at the specs of their existing FalconStor-based VTL and newly announced technology.

Categories
Backup Deduplication Restore Virtual Tape

Keeping it Factual

I periodically peruse the blogosphere looking for interesting articles on storage, data protection and deduplication. As you can imagine, blog content varies from highly product centric (usually from vendors) to product agnostic (usually from analysts). I recently ran across a post over at the Data Domain blog, Dedupe Matters. This is a corporate blog where it appears that the content is carefully crafted by the PR team and is updated infrequently. Personally, I find canned blogs like this boring. That said, I wanted to respond to a post entitled “Keeping it Real” by Brian Biles, VP of Product Management. As usual, I will be quoting the original article.

A year or more later, Data Domain is scaling as promised, but the bolt-ons are struggling to meet expectations in robustness and economic impact.

Categories
Backup Restore Virtual Tape

Rube Goldberg reborn as a VTL

I have fond memories from my childhood of Rube Goldberg contraptions. I was always amazed at how he would creatively use common elements to implement these crazy machines. By using every day items for complicated contraptions, he made even the simplest process look incredibly complex and difficult. But that was the beauty of it, no one would ever use the devices in practice, but it was the whimsical and complex nature of his drawings that made them so fun to look it.

Rube Goldberg Definition
Image courtesy of rubegoldberg.com

It is the in the context of Rube Goldberg that I find myself thinking about the EMC DL3D 4000 virtual tape library. Like, Goldberg, EMC has taken an approach to VTL and deduplication that revolves around adding complexity to what should be a relatively simple process. Unfortunately, I don’t think that customers will treat the solution with the same whimsical and fun perspective as they did with Goldberg’s machines.

You may think that this is just sour grapes from an EMC competitor, but I am not the only one questioning the approach. Many industry analysts and backup administrators are confused and left scratching their heads just like this author. Why the confusion? Let me explain.

Categories
Backup Deduplication Restore

DL3D Discussion

There is an interesting discussion on The Backup Blog related to deduplication and EMC’s DL3D. The conversation relates to performance and the two participants are W. Curtis Preston the author of the Mr. Backup Blog and the The Backup Blog’s author, Scott from EMC.  Here are some excerpts that I find particularly interesting with my commentary included. (Note that I am directly quoting Scott below.)

VTL performance is 2,200 MB/sec native. We can actually do a fair bit better than that…. 1,600 MB/sec with hardware compression enabled (and most people do enable it for capacity benefits.)

The 2200 MB/sec is not new, it is what EMC specifies on their datasheet; however, it is interesting that performance declines with hardware compression. The hardware compression card must be a performance bottleneck. Is the reduction in performance of 28% meaningful? It depends on the environment and is certainly worth noting especially for datacenters where backup and restore performance are the primary concern.

Categories
Backup Deduplication

6 Reasons not to Deduplicate Data

Deduplication is a hot buzzword these days. I previously posted about how important it is to understand your business problems before evaluating data protection solutions. Here are six reasons why you might not want to deduplicate data.

1. Your data is highly regulated and/or frequently subpoenaed
The challenge with these types of data is the question of whether deduplicated data meets compliance requirements. John Toigo over at Drunken Data has numerous posts on this topic including feedback from a corporate compliance usergroup. In short, the answer is that companies need to carefully review deduplication in the context of their regulatory requirements. The issue is not of actual data loss, but the risk of someone challenging the validity of subpoenaed data that was stored on deduplicated disk. TThe defendent would then face the added burden of proving the validity of the deduplication algorithm. (Many large financial institutions have decided that they will never deduplicate certain data for this reason.)

2. You are deduplicating at the client level
Products like PureDisk from Symantec, Televaulting from Asigra or Avamar from EMC deduplicate data at the client level. With these solutions, the client bears burden of deduplication and only transfers deduplicated (e.g. net new) data across the LAN. The master server maintains a disk repository containing only deduplicated data. Trying to deduplicate the already deduplicated repository will not result in storage savings.

Categories
Backup Deduplication Restore Virtual Tape

DeltaStor Deduplication, cont….

Scott from EMC and author of the backup blog responded to my previous post on DeltaStor. First, thank you for welcoming me to the world of blogdom. This blog is brand new and it is always interesting to engage in educated debate.

I do not want this to go down a “mine is better than yours” route. That just becomes annoying and can lead to a fight that benefits no one. I am particularly concerned since Scott, judging by his picture on EMC’s site, looks much tougher than me! 🙂

The discussion really came down to a few points. For the sake of simplicity I will quote him directly.

So, putting hyperbole aside, the support situation (and just as importantly, the mandate to test every one of those configurations) is a pretty heavy burden.

DeltaStor takes a different approach to deduplication than hash-based solutions like EMC/Quantum and Data Domain. It requires SEPATON to do some additional testing for different applications and modules. The real question under debate is how much additional work. In his first post, Scott characterized this as being entirely unmanageable. (My words, not his.) I continue to disagree with this assessment. Like most things, the Pareto Principle applies here (Otherwise known as the 80-20 rule.).  Will we support every possible combination of every application, maybe not.  Will we support the applications and environments that our customers and prospects use? Absolutely.

Categories
Backup Deduplication Restore

Deduplication, do I really need it?

I’m always puzzled when a customer tells me that they “need deduplication to run my backups better.” This drives me nuts. Deduplication in and of itself doesn’t make your backups run better. In fact some technologies make backups and restores take a lot longer. These customers aren’t really thinking about the root causes of their problems. They are like patients who see an ad for a prescription medication on TV and decide they need some. When the doctor asks why, the response is “because the ad sounded like something that will make me feel better.” Sounds ludicrous, doesn’t it? Well, that is no different from the dedupe statement above.

The simple reality is that, like the prescription drugs on TV, dedupe is not a silver bullet. It solves specific problems such as retention and reduction in $/GB. If that is your problem, then by all means please look at dedupe. But please, understand the problem you are trying to solve. Trust me, you don’t want to be the guy taking the drugs just because they sound good on TV.