Categories
Backup Restore

SEPATON Performance Revisited

In this post, I highlighted SEPATON’s S2100-ES2 performance both with and without deduplication enabled. In a comment, I also indicated that we would be adding more performance information to our website and collateral, and I am happy to report that the update is complete. You can find our deduplication performance numbers in multiple locations on SEPATON’s website, including:

The DeltaStor Datasheet
DeltaStor Page

These documents now highlight deduplication performance of 25 TB per node per day.

Happy New Year!

Categories
Backup Restore

SEPATON S2100-ES2 Performance

The SEPATON S2100-ES2 was designed for speed. Our solution is built around the concept of a unified appliance that provides a single GUI for managing and monitoring all embedded hardware. We also automate disk provisioning and configuration to deliver consistent scalability and performance. The result is an appliance that can easily be managed by an administrator who understands tape and wants to avoid the traditional complexities of disk.

Our performance is quite simple to understand. We use a grid architecture, which means that all nodes can see all storage and access the same deduplication repository; you can back up to and restore from any node. Today we support five nodes with DeltaStor, while the VTL supports 16, and we will add support for larger node counts in the near future. Each node provides an additional 2.2 TB/hr of ingest and 25 TB/day of deduplication performance. The appliance deduplicates data concurrently, meaning that backups and deduplication occur simultaneously with no performance impact. Let’s look at the actual numbers.
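To make the scaling arithmetic concrete, here is a minimal Python sketch using the per-node figures above. It assumes the linear scaling this post describes; the node counts chosen for the loop are just illustrative.

```python
# Minimal sketch of the grid scaling arithmetic, assuming linear
# scaling of the per-node figures quoted above (2.2 TB/hr ingest,
# 25 TB/day deduplication per node).

INGEST_TB_PER_HR_PER_NODE = 2.2
DEDUPE_TB_PER_DAY_PER_NODE = 25

def grid_throughput(nodes: int) -> tuple[float, float]:
    """Return (ingest TB/day, dedupe TB/day) for a grid of `nodes` nodes."""
    ingest_tb_per_day = nodes * INGEST_TB_PER_HR_PER_NODE * 24
    dedupe_tb_per_day = nodes * DEDUPE_TB_PER_DAY_PER_NODE
    return ingest_tb_per_day, dedupe_tb_per_day

for n in (1, 2, 5):
    ingest, dedupe = grid_throughput(n)
    print(f"{n} node(s): {ingest:.0f} TB/day ingest, {dedupe} TB/day dedupe")
```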

Categories
Deduplication Restore

Deduplication, Restore Performance and the A-Team

I have posted in the past about the challenges of restoring data from a reverse-referenced deduplication solution. In short, the impact can be substantial. You might wonder whether I am the only one pointing out this issue, and what the impact really is.

An EMC blogger recently posted on this topic and provided insight into the reduction in restore performance he sees from both the DL3D and Data Domain. He said, “I will have to rely on what customers tell me: data reads from a DD [Data Domain] system are typically 25-33% of the speed of data writes.” He then goes on to confirm that “…the DL3D performs very similarly to a Data Domain box.” He is referring to restore performance on deduplicated data in a reverse-referenced environment. (Both Data Domain and EMC/Quantum rely on reverse referencing.) He recommends that you maintain a cache of undeduplicated data on the DL3D to avoid this penalty. Of course, this raises a range of additional questions: how much extra storage will the holding area require, how many days of backups should you retain, and what does this do to deduplication ratios?
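To put the quoted 25-33% figure in perspective, here is a rough Python illustration. The 400 MB/sec write rate and 10 TB job size below are assumptions chosen for the example, not measured numbers.

```python
# Rough illustration of the restore penalty quoted above: if reads run
# at 25-33% of write speed, a restore takes 3-4x as long as the
# corresponding backup. Both input figures below are assumptions.

WRITE_MB_PER_SEC = 400  # assumed backup (write) throughput
DATA_TB = 10            # assumed size of the restore job

backup_hours = DATA_TB * 1_000_000 / WRITE_MB_PER_SEC / 3600
for fraction in (0.25, 0.33):
    restore_hours = backup_hours / fraction
    print(f"At {fraction:.0%} of write speed: {backup_hours:.1f} h backup "
          f"becomes a {restore_hours:.1f} h restore")
```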

The simplest solution to the above problem is to use forward referencing, but neither Data Domain nor EMC/Quantum supports this technology. EMC’s workaround is to force the customer to use more disk to store undeduplicated data, which adds to the management burden and cost.

This reminds me of a classic quote from John “Hannibal” Smith of The A-Team:

I love it when a plan comes together!

What more confirmation do you need?

Categories
Backup D2D Restore

The Fallacy of Faster Tape

I often talk about disk-based backup and virtual tape libraries (VTLs), so I wanted to discuss physical tape. While VTLs are popular these days, tape is still in widespread use. LTO tape, the market share leader, continues to highlight increased density and performance with each generation. Do not be fooled by these claims. In the real world, faster tape often provides little or no improvement in backup or restore performance. Ironically, faster tape increases (not decreases) the need for high-performance disk devices like VTLs. Let me explain.

Modern tape drives use a linear recording technology in which the tape head is stationary and the tape moves across it at high speed. With each generation of LTO, the tape speed is largely unchanged while the tape density doubles. At the same time, LTO drives have only a limited ability to vary the speed of the tape. Thus, if you move from LTO-3 to LTO-4, you have doubled the density of your tape, and you must double the throughput of data delivered to the drive to keep the tape streaming. Why does tape speed matter? Because LTO drives can throttle tape speed only within a narrow range, your performance will suffer terribly if you cannot meet the drive’s minimum streaming requirement.

If you are unable to stream enough data to your tape drives, the drive goes into a condition called “shoe-shining,” in which it constantly stops and starts. The drive tries to stop when its buffer empties, but the tape is moving so fast that it overshoots the stopping point and must slowly decelerate, rewind to where it stopped writing, and begin writing again. The tape moves back and forth like a shoe-shine cloth. This process causes a massive reduction in performance and excessive wear on both the drives and the media. The table below comes from a Quantum whitepaper entitled “When to Choose LTO-3” and highlights the real-world performance requirements of LTO-2 and LTO-3; I have estimated the LTO-4 requirements for completeness.
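As a rough stand-in for that table, the streaming requirement can be sketched from the published LTO native transfer rates. The 2:1 compression factor below is a common rule of thumb, not a measured value, and actual minimum streaming thresholds vary by drive model.

```python
# Hedged sketch of per-generation streaming requirements, using the
# published LTO native transfer rates and an assumed 2:1 hardware
# compression ratio. Actual drive thresholds vary by model.

NATIVE_MB_PER_SEC = {"LTO-2": 40, "LTO-3": 80, "LTO-4": 120}
ASSUMED_COMPRESSION = 2.0  # typical rule-of-thumb compression ratio

for generation, native in NATIVE_MB_PER_SEC.items():
    # With compressible data, the host must feed the drive roughly
    # native_rate * compression_ratio to keep the tape streaming.
    host_rate = native * ASSUMED_COMPRESSION
    print(f"{generation}: {native} MB/s native -> ~{host_rate:.0f} MB/s "
          f"required from the backup host")
```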

Categories
Backup Deduplication Restore

Trials and Tribble-lations of Deduplication

One of my favorite episodes of Star Trek was “The Trouble with Tribbles.” In the episode, Uhura adopted a creature called a tribble, only to find that it immediately started to reproduce uncontrollably, resulting in an infestation of the Enterprise’s critical business (err, spaceship) systems. You can read a synopsis of the episode here or, even better, watch it here. What does this have to do with restoration and deduplication? I’m glad you asked.

As I previously posted, the key driver in sizing deduplication environments is performance, because most solutions are performance constrained by deduplication. Like the tribbles from Star Trek, the risk end users run is rapid growth in the number of deduplication appliances. It may seem easy to size the environment initially, but what happens if your data grows faster than expected, or stricter SLAs require you to shrink your backup and restore windows? The inevitable answer in most cases is more deduplication appliances. All of a sudden, what seemed like one cute tribble (err, deduplication appliance) becomes a massive quantity of independent devices with different capacity and performance characteristics, as the sketch below illustrates. This growth in machines adds complexity to your environment and dramatically reduces any cost savings you may have originally expected.
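Here is a hedged sizing sketch of that tribble effect. Every input below is an illustrative assumption, not a vendor figure, and it assumes the full data set must be ingested within the nightly window.

```python
# Hedged sketch of appliance sprawl when sizing is driven by
# performance. All inputs are illustrative assumptions.

import math

data_tb_today = 50          # protected data today (assumed)
annual_growth = 0.5         # 50% data growth per year (assumed)
window_hours = 10           # nightly backup window (assumed)
appliance_mb_per_sec = 400  # per-appliance ingest rate (assumed)

tb_per_appliance = appliance_mb_per_sec * 3600 * window_hours / 1_000_000
for year in range(5):
    data_tb = data_tb_today * (1 + annual_growth) ** year
    appliances = math.ceil(data_tb / tb_per_appliance)
    print(f"Year {year}: {data_tb:.0f} TB protected -> "
          f"{appliances} appliance(s)")
```

Under these assumptions, four appliances in year zero become eighteen by year four: the tribble problem in miniature.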

To avoid these issues, you need to think about your needs not just for today but into the future. The ideal solution is to purchase a system today that can scale to meet your needs going forward. This underscores the importance of performance scalability, and you must understand how it applies to any given solution.

In the world of Star Trek, Scotty easily beamed the excess tribbles to a nearby Klingon vessel. In the world of the data center, we are not so lucky. Besides, who would be the unwilling recipient? Perhaps you could beam them to Data Domain?

Categories
Backup Deduplication Restore

Deduplication: It’s About Performance

I have recently been thinking about the real benefits of deduplication. Although the technology is ostensibly all about capacity, when you analyze the costs and benefits in the real world, the thing that jumps out at you is performance.

Performance is the key driver in sizing and assessing the number of units required, which means it also drives cost. Deduplication enables longer retention but usually reduces backup and restore performance. For example, a 40 TB system can hold 800 TB of backup data assuming a deduplication ratio of 20:1. That is a large number, but it soon becomes clear that the system’s usable capacity is limited by backup speed. The graph below shows the relationship between data protected and backup window, assuming performance of 400 MB/sec.


[Graph: data protected vs. backup window, assuming 400 MB/sec ingest]
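For readers who want the arithmetic behind the graph, here is a minimal sketch; the inputs are exactly the figures quoted above, and the data points chosen for the loop are arbitrary.

```python
# The arithmetic behind the graph: at a fixed ingest rate, the backup
# window grows linearly with the amount of data protected. The inputs
# match the example in this post (400 MB/sec, 20:1, 40 TB usable).

RATE_MB_PER_SEC = 400
DEDUPE_RATIO = 20
USABLE_TB = 40

logical_tb = USABLE_TB * DEDUPE_RATIO  # 800 TB of protected data
for data_tb in (10, 50, 100, 400, logical_tb):
    window_hours = data_tb * 1_000_000 / RATE_MB_PER_SEC / 3600
    print(f"{data_tb:>4} TB protected -> {window_hours:,.1f} hour window")
```

Filling the full 800 TB of logical capacity at 400 MB/sec would take more than 550 hours of backup time, which is exactly why performance, not capacity, constrains the system.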

Categories
Deduplication Restore

The hidden cost of deduplicated replication

On the surface, the idea of deduplicated replication is compelling. By replicating only the deltas, the technology dramatically reduces the bandwidth required to send data across a WAN. Many customers are looking to this technology to allow them to move to a tapeless environment in the future. However, there is a major challenge that most vendors gloss over.
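Before getting to that challenge, a back-of-the-envelope sketch shows why the bandwidth savings are so attractive in the first place. The backup size, unique-data fraction, and replication window below are all illustrative assumptions.

```python
# Back-of-the-envelope WAN sizing for deduplicated replication.
# All inputs are illustrative assumptions, not vendor figures.

daily_backup_tb = 5      # nightly backup size (assumed)
unique_fraction = 0.05   # ~5% of each backup is new after dedupe (assumed)
replication_hours = 12   # time allowed for replication to finish (assumed)

full_mbit_per_sec = daily_backup_tb * 8_000_000 / (replication_hours * 3600)
delta_mbit_per_sec = full_mbit_per_sec * unique_fraction
print(f"Replicating everything: ~{full_mbit_per_sec:,.0f} Mb/s of WAN")
print(f"Replicating deltas only: ~{delta_mbit_per_sec:,.0f} Mb/s of WAN")
```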

The most common approach to deduplication in use today is hash-based technology, which relies on reverse referencing. I covered the implications of this approach in another post. To summarize, restore performance degrades as data is retained in a reverse-referenced environment. Now let’s look at how this affects deduplicated replication.

Categories
Restore Virtual Tape

Data protection and natural disasters – Part 2

In part 1, I touched on four of the most common challenges with data restoration in a disaster scenario. In this post, I will review some other key considerations. These examples focus on the infrastructure required after a disaster has occurred.

Categories
Backup Restore

Data protection and natural disasters – Part 1

Hurricane Ike has been in the news lately, and my sympathy goes out to all those affected. It is events like these that test IT resiliency. The damage can range from slight to severe, and we invest in reliable, robust data protection processes to guard against disasters like this. The unfortunate reality is that, no matter how much you plan for it, the recovery process often takes longer and is more difficult than expected.

In many respects, data protection is an insurance policy. You hate paying your homeowner’s premium every month, but you do it because you know it is your only protection if major damage ever happens to your house. In the case of data protection, you invest hours managing your backup environment to enable recovery from incidents like these. Even with the best planning and policies, however, things still may not turn out as expected. Four of the most common pitfalls I hear about from customers are:

Categories
Backup D2D Restore Virtual Tape

InformationWeek on NEC HYDRAstor

Howard Marks recently posted an interesting article about NEC’s HYDRAstor over on his blog at InformationWeek. He discusses the product and how it is targeted at backup and archiving applications. He makes some interesting points and mentions SEPATON, and I want to respond to several of them.

…[the system starts with] a 1-accelerator node – 2-storage node system at $180,000…