Categories
Deduplication Virtual Tape

Choosing a Data Protection Solution in a Down Economy

I hate to turn on the TV these days because it is full of bad news. There always seems to be some pundit talking about troubles in the housing market, credit markets, automotive industry, consumer confidence and so many other areas. It does not take a rocket scientist to recognize that the economy is in tough shape right now. As a reader of this blog, you are likely feeling some of the pain in your budget. This brings up an important question: how do you justify IT purchases in an environment like this?

In situations like these, IT departments must go back to basics. Purchases must be all about ROI. You must look beyond acquisition cost alone and consider how a given solution can save your organization money both at purchase and into the future.

Categories
Backup Restore

SEPATON S2100-ES2 Performance

The SEPATON S2100-ES2 was designed for speed. Our solution is built around the concept of a unified appliance that provides one GUI for managing and monitoring all embedded hardware. We also automate disk provisioning and configuration to provide consistent scalability and performance. The result is an appliance that can easily be managed by an administrator who understands tape and wants to avoid the traditional complexities of disk.

Our performance is quite simple to understand. We use a grid architecture, which means that all nodes can see all storage and can access the same deduplication repository; you can back up and restore from any node. Today we support five nodes with DeltaStor, while the VTL supports 16, and we will be adding support for larger node counts in the near future. Each node adds 2.2 TB/hr of ingest and 25 TB/day of deduplication performance. The appliance deduplicates concurrently, which means that backups and deduplication occur simultaneously with no performance impact. Let’s look at the actual numbers.
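To make the math concrete, here is a quick back-of-the-envelope sketch using the per-node figures above. It is illustrative only; real-world throughput depends on data type, network and backup application behavior.

```python
# Aggregate throughput for a grid of N nodes, using the per-node
# figures quoted above (2.2 TB/hr ingest, 25 TB/day deduplication).
# Illustrative only; actual results vary with workload.
INGEST_TB_PER_HR = 2.2
DEDUPE_TB_PER_DAY = 25

for nodes in (1, 2, 5):
    print(f"{nodes} node(s): {nodes * INGEST_TB_PER_HR:.1f} TB/hr ingest, "
          f"{nodes * DEDUPE_TB_PER_DAY} TB/day deduplication")
# A five-node system delivers 11.0 TB/hr of ingest and 125 TB/day of dedupe.
```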

Categories
Deduplication Restore

Deduplication, Restore Performance and the A-Team

I have posted in the past about the challenges of restoring data from a reverse-referenced deduplication solution. In short, the impact can be substantial. You might wonder whether I am the only one pointing out this issue, and what the impact really is.

An EMC blogger recently posted on this topic and provided insights on the reduction in restore performance he sees from both the DL3D and Data Domain. He said, “I will have to rely on what customers tell me: data reads from a DD [Data Domain] system are typically 25-33% of the speed of data writes.” He then goes on to confirm that “…the DL3D performs very similarly to a Data Domain box”. He is referring to restore performance on deduplicated data in a reverse-referenced environment. (Both Data Domain and EMC/Quantum rely on reverse referencing.) He recommends that you maintain a cache of undeduplicated data on the DL3D to avoid this penalty. Of course, this brings up a range of additional questions: how much extra storage will the holding area require, how many days of backups should you retain, and what does this do to deduplication ratios?
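To put that 25-33% figure in perspective, consider a quick back-of-the-envelope calculation. The 400 MB/sec write speed here is a hypothetical baseline for illustration, not a published spec.

```python
# If restores run at 25-33% of write speed, a system that ingests at a
# hypothetical 400 MB/sec restores at only 100-133 MB/sec. The impact
# on a 10 TB restore:
WRITE_MB_S = 400              # hypothetical ingest rate
DATA_TB = 10

for fraction in (0.25, 0.33):
    hours = DATA_TB * 1e6 / (WRITE_MB_S * fraction) / 3600
    print(f"At {fraction:.0%} of write speed: {hours:.1f} hours to restore {DATA_TB} TB")
# vs. roughly 7 hours if restores ran at the full write speed.
```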

The simplest solution to the above problem is to use forward referencing, but neither Data Domain nor EMC/Quantum supports this technology. EMC’s workaround is to force the customer to use more disk to store undeduplicated data, which adds to the management burden and cost.

This reminds me of a classic quote from John “Hannibal” Smith of The A-Team:

I love it when a plan comes together!

What more confirmation do you need?

Categories
Backup D2D Restore

The Fallacy of Faster Tape

I often talk about disk-based backup and virtual tape libraries (VTL) and wanted to discuss physical tape. While VTLs are popular these days, tape is still in widespread use. LTO tape, the market share leader, continues to highlight increased density and performance. Do not be fooled by these claims. In the real world, faster tape often provides little or no improvement in backup or restore performance. Ironically, faster tape increases (not decreases) the need for high performance disk devices like VTLs. Let me explain.

Modern tape drives use a linear technology where the tape head is stationary and the tape moves at high speed above it. With each generation of LTO, the tape speed is largely unchanged while tape density doubles. At the same time, LTO drives have not expanded their ability to vary the speed of the tape. Thus if you go from LTO-3 to LTO-4, you have doubled the density of your tape, and you must double the throughput of data fed to the drive to keep tape speed unchanged. Why does tape speed matter? Because LTO drives have only a limited ability to throttle tape speed, your performance will suffer terribly if you cannot meet the drive’s minimum streaming requirement.

If you are unable to stream enough data to your tape drives, the drive will go into a condition called “shoe shining” where it is constantly stopping and starting. The drive tries to stop when its buffer empties, but the tape is moving so fast that it overshoots its stopping point and must slow to a stop, rewind to where it stopped writing and begin writing again. The tape moves forward and backward like a shoe-shine cloth. This process causes a massive reduction in performance and excessive wear on both the drive and the media. The table below comes from a Quantum whitepaper entitled “When to Choose LTO-3” and highlights the real world performance requirements of LTO-2 and LTO-3. I have estimated LTO-4 requirements for completeness.
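To see why faster tape raises the bar, here is a rough sketch of the streaming math. The native transfer rates are the published LTO figures; the speed-matching floor and the 2:1 compression ratio are illustrative assumptions, since the exact throttling range varies by drive.

```python
# Minimum host data rate needed to keep an LTO drive streaming.
# Native rates are published specs; MIN_STREAM_FRACTION is an assumed
# speed-matching floor below which the drive starts to shoe-shine.
NATIVE_MB_S = {"LTO-2": 40, "LTO-3": 80, "LTO-4": 120}
MIN_STREAM_FRACTION = 0.5     # assumption: drive can throttle to half speed
COMPRESSION = 2.0             # 2:1 compressible data doubles the host rate
                              # needed to keep the tape moving

for drive, native in NATIVE_MB_S.items():
    required = native * MIN_STREAM_FRACTION * COMPRESSION
    print(f"{drive}: backup host must sustain >= {required:.0f} MB/sec")
# The bar rises with every generation, which is exactly the problem.
```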

Categories
Backup Deduplication Restore

Trials and Tribble-lations of Deduplication

One of my favorite episodes of Star Trek was “The Trouble with Tribbles.” In the episode, Uhura adopted a creature called a tribble, only to find that it immediately started to reproduce uncontrollably, resulting in an infestation of the Enterprise’s critical business, er, spaceship systems. You can read a synopsis of the episode here or, even better, watch it here. What does this have to do with restoration and deduplication? I’m glad you asked.

As I previously posted, the key driver in sizing deduplication environments and solutions is performance, because most solutions are performance constrained by deduplication. Like the tribbles from Star Trek, deduplication appliances have a way of multiplying. It may seem easy to size the environment initially, but what happens if your data grows faster than expected or stricter SLAs require you to reduce your backup and/or restore windows? The inevitable answer in most cases is more deduplication appliances. All of a sudden, what seemed like one cute tribble (er, deduplication appliance) becomes a massive collection of independent devices with different capacity and performance metrics. This growth in machines will add complexity to your environment and dramatically reduce any cost savings you may have originally expected.

To avoid these issues, you need to think about your needs not just today but into the future. The ideal solution is to purchase a system today that can meet your needs going forward. This underscores the importance of performance scalability, and you must understand how it applies to any given solution.

In the world of Star Trek, Scotty easily beamed the excess tribbles to a nearby Klingon vessel. In the world of the data center, we are not so lucky. Besides, who would be the unwilling recipient? Perhaps you could beam them to Data Domain?

Categories
Deduplication Virtual Tape

NetApp Dedupe: The Worst of Inline and Post-process Deduplication

NetApp has finally entered the world of deduplication in data protection. While they have supported a flavor of the technology in their filers since May 2007, they had never brought it to their VTL. Why? Because their VTL does not use any of the core filer IP; it relies on an entirely separate software architecture that they acquired from Alacritus. Thus none of the features of ONTAP apply to their VTL. However, I digress from the topic at hand.

I posted recently about three different approaches to deduplication timing: inline, post-process and concurrent process. I talked about the benefits of each. Post-process and concurrent process deliver the fastest backup performance because deduplication occurs outside of the primary data path, while inline requires the smallest possible disk space because undeduplicated data is never written to disk. Now comes NetApp with a whole new take. Their model combines the worst of post-process and inline by requiring a disk holding area while also reducing backup performance. After all this time developing the product, this is what they come up with? Hmmm, maybe they should stick to filers.
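For readers who missed that earlier post, here is a minimal sketch of the timing models in code. It is a conceptual illustration under simplified assumptions, not any vendor’s implementation.

```python
# Conceptual sketch of deduplication timing models. "repo" is a dict
# standing in for the dedup repository; "landing" is the
# undeduplicated holding area on disk.
import hashlib

def chunks(data, size=4096):
    for i in range(0, len(data), size):
        yield data[i:i + size]

def inline_backup(data, repo):
    # Inline: dedupe sits in the primary data path. No holding area,
    # but every chunk waits on a hash lookup during the backup itself.
    for c in chunks(data):
        repo.setdefault(hashlib.sha256(c).hexdigest(), c)

def post_process_backup(data, landing):
    # Post-process: ingest raw data at full speed to a landing area...
    landing.append(bytes(data))

def post_process_dedupe(landing, repo):
    # ...and dedupe later, after the backup completes. The cost is the
    # disk needed to hold undeduplicated data in the meantime.
    while landing:
        for c in chunks(landing.pop()):
            repo.setdefault(hashlib.sha256(c).hexdigest(), c)

# Concurrent process runs the same two stages as post-process, but the
# dedupe stage drains the landing area while backups are still being
# ingested, rather than waiting for the backup window to end.
repo, landing = {}, []
data = b"sample" * 2048
inline_backup(data, repo)             # dedupe in the data path
post_process_backup(data, landing)    # land raw, then...
post_process_dedupe(landing, repo)    # ...dedupe after the backup
```

The point of the sketch is the tradeoff: inline never needs the landing area, post-process never slows the backup, and an approach that demands the landing area while still slowing the backup gets the worst of both.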

Categories
Backup Deduplication Restore

Deduplication: It’s About Performance

I have recently been thinking about the real benefits of deduplication. Although the technology is ostensibly all about capacity, when you analyze the costs and benefits in the real world, the thing that jumps out at you is performance.

Performance is the key driver in sizing and assessing the number of units required, which means it also drives cost. Deduplication enables longer retention but usually reduces backup and restore performance. For example, a 40 TB system can hold 800 TB of backup data assuming a deduplication ratio of 20:1. That is a large number, but it soon becomes clear that the system’s usable capacity is limited by backup speed. The graph below shows the relationship between data protected and backup window assuming throughput of 400 MB/sec.


[Figure: data protected vs. backup window, assuming 400 MB/sec throughput]
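As a quick sanity check on the graph’s point, using the 400 MB/sec figure and the 20:1 example above:

```python
# Throughput, not capacity, limits how much data you can protect.
# A 40 TB system at 20:1 "holds" 800 TB, but at 400 MB/sec it can
# only move about 34.6 TB/day.
THROUGHPUT_MB_S = 400
print(f"Sustained rate: {THROUGHPUT_MB_S * 86_400 / 1e6:.1f} TB/day")

for protected_tb in (100, 400, 800):
    hours = protected_tb * 1e6 / THROUGHPUT_MB_S / 3600
    print(f"{protected_tb} TB protected -> {hours:.0f} hour backup window "
          f"({hours / 24:.1f} days)")
# Backing up the full 800 TB once would take roughly 23 days.
```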

Categories
Marketing

Tradeshow giveaway gone bad: the video

Tradeshow marketers spend hours trying to dream up new and unique programs to drive booth traffic, and these often include free giveaways. Ironically, the simplest things, such as t-shirts or bags, can be good traffic generators, and it is amazing that people can get so excited about tchotchkes that cost $2 or less.

One common approach is a two-tiered program where you hand out an inexpensive item (like a t-shirt) and tell booth visitors that they must be wearing it to be eligible for a future drawing for a more expensive item. For this to work, the vendor must have an ample supply of the initial giveaway, and the final item must be of high enough value to encourage participation. As you can imagine, marketers spend a ton of time and money putting together these programs.

Now fast forward to the recent VMworld show, where FalconStor used a two-tiered program: they offered free t-shirts at their booth and then held a drawing for a Segway scooter. The program stipulated that attendees must be wearing the FalconStor t-shirt at the time of the drawing to be eligible.

Well, in a classic case of sales people ignoring the marketing people, the sales folks at the booth picked a winner who was not wearing a t-shirt and decided to give him the Segway anyway. This contradicted the terms of the program, and the audience did not react favorably. This is a marketer’s worst nightmare: a carefully orchestrated program ruined, and it is clear that many booth visitors left feeling angry. Click more to see the YouTube video of what happened; it is quite humorous and makes you wonder, “what were they thinking?”

Categories
Deduplication

HIFN – Commoditizing hash-based deduplication?

HIFN recently announced a card that accelerates hash-based deduplication. For those unfamiliar with HIFN, they provide infrastructure components that accelerate CPU-intensive processes such as compression, encryption and now deduplication. Their products are primarily embedded inside appliances, and you may be using one of them today.

The interesting thing about the HIFN card is that they are positioning it as an all-in-one hash deduplication solution. Here are the key processes the device performs (a rough software sketch of these steps follows the list):

  1. Hash creation
  2. Hash database creation and management
  3. Hash lookups
  4. Write to disk
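For those who want to see how these steps fit together, here is a minimal sketch of hash-based deduplication in general terms; the fixed chunk size and SHA-256 hash are illustrative choices, not details of HIFN’s design.

```python
import hashlib

CHUNK_SIZE = 4096   # illustrative fixed-size chunking
hash_db = {}        # step 2: the hash database (digest -> location on disk)
disk = []           # stands in for the backing store

def write_stream(data: bytes) -> None:
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()  # step 1: hash creation
        if digest not in hash_db:                   # step 3: hash lookup
            disk.append(chunk)                      # step 4: write unique data
            hash_db[digest] = len(disk) - 1         # step 2: update the database
        # A duplicate chunk is stored only as a reference to hash_db[digest].

write_stream(b"A" * 10000 + b"B" * 10000 + b"A" * 10000)
print(f"{len(disk)} unique chunks stored; {len(hash_db)} entries in the hash database")
```

Offloading the hashing and lookups to dedicated hardware matters because, in a pure software implementation, those steps dominate the CPU cost.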
Categories
General Marketing

How a lack of innovation put Overland under water

I wanted to post a quick commentary on Overland Data.

I recently ran across this post over at The Register discussing how Overland Data is at risk of being delisted from the NASDAQ due to a stock price below $1 (ticker: OVRL, currently $0.45).

In a past life, I sold Overland products and was very familiar with their tape and disk systems. They were one of the first companies to provide a cost-effective D2D solution targeted at data protection. In 2003, they unveiled the REO 2000 product, and seven months later they released the REO 4000, which provided greater capacity and scalability. Overland was on a roll with the new REO appliances, generating industry buzz and excitement while their tape library business remained strong.

Fast forward five years, and Overland’s situation looks bleak. Their D2D products have stagnated and their tape business has collapsed. Along the way, they have made a number of false starts, including the purchase of Zetta Systems and the launch of the Ultamus array, which they later quietly pulled from the market.

Situations like these make you realize the importance of innovation. Initially, Overland was very successful with their disk products but was unable to maintain that position. As the market innovated, they did not, and their financial and business performance suffered. Their current situation is a reminder that you must innovate or risk suffering a similar fate. Steve Jobs said it eloquently:

Innovation distinguishes between a leader and a follower.

I feel fortunate to be working for a company that has a long history of innovation in data protection and there are more exciting things to come…