Trials and Tribble-lations of Deduplication

November 3, 2008 – 1:49 pm by JL

One of my favorite episodes of Star Trek is “The Trouble with Tribbles.” In the episode, Uhura adopts a creature called a tribble, only to find that it immediately starts to reproduce uncontrollably, resulting in an infestation of the Enterprise’s critical business (err, spaceship) systems. You can read a synopsis of the episode here or, even better, watch it here. What does this have to do with restoration and deduplication? I’m glad you asked.

As I previously posted, the key driver in sizing deduplication environments and solutions is performance. This is because most solutions are performance constrained by deduplication. Like the tribbles from Star Trek, the risk end-users run is rapid growth in the number of deduplication appliances. It may seem easy to size the environment initially, but what happens if your data growth is faster than expected or stricter SLAs require you to reduce your backup and/or restore windows? The inevitable answer in most cases is more deduplication appliances. All of a sudden what seemed like one cute tribble (err, deduplication appliance) becomes a massive quantity of independent devices with different capacity and performance metrics. This large growth in machines will add complexity to your environment and will dramatically reduce any cost savings that you may have originally expected.
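To make the tribble effect concrete, here is a rough back-of-the-envelope sketch. The function name, the 400 MB/sec per-appliance throughput, and the data sizes are illustrative assumptions rather than figures for any particular product, and the sketch uses decimal units (1 TB = 1,000,000 MB).

```python
import math


def appliances_needed(data_tb, window_hours, throughput_mb_s):
    """Appliances required to back up data_tb within window_hours.

    Hypothetical helper: assumes each appliance sustains throughput_mb_s
    for the whole window and that load spreads evenly across appliances.
    """
    per_appliance_tb = throughput_mb_s * 3600 * window_hours / 1_000_000
    return math.ceil(data_tb / per_appliance_tb)


# One appliance covers 10 TB in an 8 hour window at 400 MB/sec.
print(appliances_needed(10, 8, 400))
# Triple the data and the count grows to three.
print(appliances_needed(30, 8, 400))
# Tighten the window to 6 hours and a fourth appliance appears.
print(appliances_needed(30, 6, 400))
```

Under these assumptions, either data growth or a shorter window adds appliances, which is exactly how one cute tribble becomes many.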

To avoid these issues, you need to think about your needs not just today but into the future. The ideal solution is a system you can purchase today that will still meet your needs as they grow. That makes performance scalability critical, and you must understand how far any given solution can scale.

In the world of Star Trek, Scotty easily beamed the excess tribbles to a nearby Klingon vessel. In the world of the data center, we are not so lucky. Besides, who would be the unwilling recipient? Perhaps you could beam them to Data Domain?

NetApp Dedupe: The Worst of Inline and Post-process Deduplication

October 30, 2008 – 2:22 pm by JL

NetApp has finally entered the world of deduplication in data protection. While they have supported a flavor of the technology in their filers since May 2007, they had never launched it for their VTL. Why? Because their VTL does not use any of the core filer IP. It relies on an entirely separate software architecture that they acquired from Alacritus. Thus none of the features of ONTAP apply to their VTL. However, I digress from the topic at hand.

I posted recently about three different approaches to deduplication timing: inline, post process and concurrent process. I talked about the benefits of each. Post process and concurrent process offer the fastest backup performance, since deduplication occurs outside of the primary data path. Inline uses the least disk space, since undeduplicated data is never written to disk. Now comes NetApp with a whole new take. Their model combines the worst of post process and inline by requiring a disk holding area and reducing backup performance. After all this time developing the product, this is what they come up with? Hmmm, maybe they should stick to filers.
Read the rest of this entry »

Deduplication: It’s About Performance

October 23, 2008 – 1:42 pm by JL

I have recently been thinking about the real benefits of deduplication. Although the technology is all about capacity, when you analyze the cost and benefits in the real world, the thing that jumps out at you is performance.

Performance is the key driver in sizing and assessing the number of units required. That means it also drives cost. Deduplication enables longer retention but usually reduces backup and restore performance. For example, a 40 TB system can hold 800 TB of data assuming a ratio of 20:1. This is a large number, but it soon becomes clear that the amount of data the system can actually protect is limited by backup speed. The graph below shows the relationship between data protected and backup window assuming performance of 400 MB/sec.


[Graph: data protected versus backup window at 400 MB/sec]
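The relationship between data protected and backup window can be approximated with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and a sustained rate of 400 MB/sec; the function names are hypothetical.

```python
def logical_capacity_tb(physical_tb, dedup_ratio):
    """Data a system can retain at a given deduplication ratio."""
    return physical_tb * dedup_ratio


def backup_window_hours(data_tb, throughput_mb_s=400):
    """Hours needed to ingest data_tb at a sustained throughput."""
    return data_tb * 1_000_000 / throughput_mb_s / 3600


# 40 TB of disk at 20:1 retains 800 TB of backups.
print(logical_capacity_tb(40, 20))

# Backup window for a range of data sizes at 400 MB/sec.
for data_tb in (5, 10, 20, 40):
    print(data_tb, "TB ->", round(backup_window_hours(data_tb), 1), "hours")
```

At this rate, 10 TB takes roughly 7 hours to ingest, so the nightly backup window, not the 800 TB of retained capacity, determines how much data a single unit can protect.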

Read the rest of this entry »

Tradeshow giveaway gone bad: the video

October 20, 2008 – 1:30 pm by JL

Tradeshow marketers spend hours trying to scheme up new and unique programs to drive booth traffic and these often include free giveaways. Ironically, the simplest things such as t-shirts or bags can be good traffic generators, and it is amazing that people can get so excited about tchotchkes that cost $2 or less.

One common approach is a two tiered program where you hand out an inexpensive item (like a t-shirt) and tell booth visitors that they must be wearing it to be eligible for a future drawing for a more expensive item. In order for this to work, the vendor must have an ample supply of the initial giveaway and the final item must be of high enough value to encourage participation. As you can imagine, marketers spend a ton of time and money putting together these programs.

Fast forward to the recent VMworld show, where FalconStor used a two-tier program: they offered free t-shirts at their booth and then held a drawing for a Segway scooter. The program stipulated that attendees must be wearing the FalconStor t-shirt at the time of the drawing to be eligible.

Well, in a classic case of sales people ignoring the marketing people, the sales folks at the booth picked a winner who was not wearing a t-shirt and decided to give him the Segway anyway. This contradicted the terms of the program and the audience did not react favorably. This is a marketer’s worst nightmare; their carefully orchestrated program has been ruined and it is clear that many booth visitors left feeling angry. Click more to see the YouTube video which shows what happened; it is quite humorous and makes you wonder “what were they thinking?”
Read the rest of this entry »

HIFN – Commoditizing hash-based deduplication?

October 17, 2008 – 3:46 pm by JL

HIFN recently announced a card that accelerates hash-based deduplication. For those unfamiliar with HIFN, they provide infrastructure components that accelerate CPU intensive processes such as compression, encryption and now deduplication. The products are primarily embedded inside appliances, and you may be using one of their products today.

The interesting thing about the HIFN card is that they are positioning it as an all-in-one hash deduplication solution. Here are the key processes that the device performs:

  1. Hash creation
  2. Hash database creation and management
  3. Hash lookups
  4. Write to disk
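The four steps above can be sketched in software as follows. This is only an illustration of hash-based deduplication in general, not a description of how the HIFN card works: the fixed-size chunking, the SHA-256 hash, the in-memory dictionary standing in for the hash database, and the `DedupStore` class itself are all assumptions.

```python
import hashlib


class DedupStore:
    """Toy hash-based deduplication store (hypothetical, in memory)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.index = {}  # hash database: digest -> position in self.disk
        self.disk = []   # stand-in for unique chunks written to disk

    def write(self, data):
        """Store data and return the list of digests needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()  # 1. hash creation
            if digest not in self.index:                # 3. hash lookup
                self.index[digest] = len(self.disk)     # 2. hash database management
                self.disk.append(chunk)                 # 4. write to disk (new chunks only)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its list of digests."""
        return b"".join(self.disk[self.index[d]] for d in recipe)


store = DedupStore(chunk_size=4)
recipe = store.write(b"AAAABBBBAAAA")
print(len(recipe), "chunks referenced,", len(store.disk), "chunks stored")
```

The repeated `AAAA` chunk is stored once and referenced twice, which is where the capacity savings come from. Every chunk still has to be hashed and looked up, which is the CPU-intensive work an accelerator would target.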

Read the rest of this entry »

How a lack of innovation put Overland under water

October 14, 2008 – 4:56 pm by JL

I wanted to post a quick commentary on Overland Data.

I recently ran across this post over at The Register that discusses the fact that Overland Data is at risk of being delisted from the NASDAQ due to a stock price below $1. (Ticker: OVRL, currently $0.45)

In a past life, I sold Overland products and was very familiar with their tape and disk systems. They were one of the first companies to provide a cost effective D2D solution targeted at data protection. In 2003, they unveiled the REO 2000 product and 7 months later, they released the REO 4000 which provided greater capacity and scalability. Overland was on a roll with the new REO appliances, generating industry buzz and excitement while their tape library business remained strong.

Fast forward five years, and Overland’s situation looks bleak. Their D2D products have stagnated and their tape business has collapsed. Along the way, they have made a number of false starts including the purchase of Zetta Systems and the launch of the Ultamus array, which they later silently pulled from the market.

Situations like these make you realize the importance of innovation. Initially, Overland was very successful with their disk products, but they were unable to maintain their position. As the market innovated, they did not, and their financial and business performance suffered. Their current situation is a reminder that you must innovate or risk suffering a similar fate. Steve Jobs said this eloquently:

Innovation distinguishes between a leader and a follower.

I feel fortunate to be working for a company that has a long history of innovation in data protection and there are more exciting things to come…

Blog commenting

October 10, 2008 – 3:51 pm by JL

W. Curtis Preston from Glasshouse and the author of the Mr. Backup Blog recently voiced his frustration with certain bloggers censoring visitor comments. He was annoyed that some folks from EMC configured their blogs for comment moderation (all comments must be approved before they appear on the site) and used the power to delete certain responses. He contrasted this to NetApp whose blogs are not moderated. (As a point of clarification, AboutRestore.com’s comments are not moderated; reader comments are posted immediately.) Whether you believe in comment moderation or not, at least these blogs all provide a mechanism for the visitor to respond.
Read the rest of this entry »

Inline Deduplication: What Your Mother Never Told You

October 7, 2008 – 2:54 pm by JL

I was recently attending a show and enjoyed speaking with a variety of end users with different levels of interest and knowledge. One of the things that I found was that attendees were obsessed with the question of inline vs post process vs concurrent process deduplication. Literally, people would come up and say “Do you do inline or post process dedupe?” This is crazy. Certainly there are differences between the approaches, but the real issue should be about data protection not arcane techno speak.

Before I go into details, let me start with the basics: inline deduplication means that deduplication occurs in the primary data path. No data is written to disk until the deduplication process is complete. The other two approaches, post process and concurrent process, first store data on disk and then deduplicate. As the name suggests, post process approaches do not begin the deduplication process until all backups are complete. With the concurrent process approach, deduplication can start before the backups are completed, so the system can back up and deduplicate concurrently. Let’s look at each of these in more detail.
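A toy model can make the timing differences easier to compare. It is a deliberately simplified sketch: the rates are made-up numbers, the `model` function and `Timeline` fields are hypothetical, inline ingest is assumed to run at the slower of the backup and deduplication rates, and concurrent deduplication is assumed to start as soon as the backup does.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    backup_done_h: float    # when the backup finishes
    dedup_done_h: float     # when deduplication finishes
    peak_staging_tb: float  # most undeduplicated data held on disk


def model(approach, data_tb, backup_tb_h, dedup_tb_h):
    """Estimate timing and staging space for one deduplication approach."""
    if approach == "inline":
        # Deduplication sits in the data path, so nothing is staged.
        done = data_tb / min(backup_tb_h, dedup_tb_h)
        return Timeline(done, done, 0.0)
    backup_done = data_tb / backup_tb_h
    if approach == "post":
        # Deduplication waits until all backups are complete.
        return Timeline(backup_done, backup_done + data_tb / dedup_tb_h, data_tb)
    if approach == "concurrent":
        # Deduplication overlaps the backup but cannot outrun arriving data.
        rate = min(dedup_tb_h, backup_tb_h)
        backlog = max(data_tb - rate * backup_done, 0.0)
        return Timeline(backup_done, data_tb / rate, backlog)
    raise ValueError(f"unknown approach: {approach}")


for name in ("inline", "post", "concurrent"):
    print(name, model(name, data_tb=10, backup_tb_h=2, dedup_tb_h=1))
```

With these assumed rates, inline finishes the backup last but stages nothing, post process finishes the backup first but stages everything, and concurrent process sits in between on staging while finishing deduplication sooner than post process.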
Read the rest of this entry »

Tradeshow perspectives

October 3, 2008 – 9:41 am by JL

I spent last week at a tradeshow in New York. These events are interesting because of the various end user perspectives. Those of us in the industry often get embroiled in the minutiae of products and features, and so it is very useful to understand the views of the end users on the show floor. Storage Decisions is a show that prides itself on highly qualified attendees.

One of the most curious things about the show was attendees’ obsession with inline vs post process deduplication. Numerous end users stopped by asking only about when DeltaStor deduplicates data. In the rush of the show, there was little time to discuss the question in much detail. It struck me as odd that these attendees focused on this question, which in my opinion is the wrong question to ask. I can only surmise that they had gotten an earful from competing vendors who swore that inline is the best approach.
Read the rest of this entry »

AboutRestore.com recognition

October 1, 2008 – 11:45 am by JL

W. Curtis Preston from Glasshouse and author of the Mr. Backup Blog recently posted an article about the blogs that he frequents. I was honored that he recognized AboutRestore.com along with blogs from other major vendors.

Curtis mentioned his frustration with the comment filtering policies on some blogs and I wanted to clarify AboutRestore.com’s policy. (A synopsis of the policy is contained in the disclaimer in the sidebar.) Comments are not moderated; whatever you post appears on the site instantly. I have little interest in censorship; however, I reserve the right to delete comments containing abuse or personal attacks. I hope I never have to use my power of deletion, but as Uncle Ben said to Peter Parker/Spider-Man:

With great power comes great responsibility.

Now back to regularly scheduled programming…
