StorageMojo recently wrote a blog post discussing the results of a TwinStrata study comparing the costs and availability of Google Apps and Microsoft Office/Exchange. The study showed that Google Apps was cheaper than MS Office/Exchange for a 20-person firm and that the two options were comparable for a 50-person company. The challenge for the larger company was the increased cost of data loss and downtime. This is an informative finding, and I wanted to highlight it in the context of data protection.
As I have posted before, IBM/Diligent requires Fibre Channel drives due to the highly I/O-intensive nature of its deduplication algorithm. I recently came across a situation that provides an interesting lesson and an important data point for anyone considering IBM/Diligent technology.
A customer was backing up about 25 TB nightly and was searching for a deduplication solution. Most vendors, including IBM/Diligent, initially specified systems in the 40 – 80 TB range using SATA disk drives.
Initial pricing from all vendors was around $500k. However, as discussions continued and final performance and capacity metrics were defined, the IBM/Diligent configuration changed dramatically. The system went from 64 TB to 400 TB, resulting in a price increase of over 2x and a capacity increase of over 6x. The added disk capacity was not due to increased storage requirements (none of the other vendors changed their configurations); it was due to performance requirements. In short, IBM/Diligent could not deliver the required performance with 64 TB of SATA disk and was forced to include more.
The key takeaway is that anyone considering IBM/Diligent must be cognizant of disk configuration. The I/O-intensive nature of ProtecTIER makes it highly sensitive to disk technology, which is why Fibre Channel drives are the standard requirement for Diligent solutions. End users should always request Fibre Channel disk systems for the best performance, and SATA configurations must be scrutinized. Appliance-based solutions can help avoid this situation by providing known disk configurations and performance guarantees.
One of the questions I often get asked is “how do your products compare to Data Domain’s?” In my opinion, we really don’t compare because we play in different market segments. Data Domain’s strength is in the low end of the market (think SMB/SME), while SEPATON plays in the enterprise segment. These two segments have very different needs, which are reflected in the fundamentally different architectures of the SEPATON and Data Domain products. Here are some of the key differences to consider.
W. Curtis Preston recently wrote an article on the state of physical tape for SearchDataBackup. He discusses the technologies that backup software vendors have created to stream tape drives more effectively. As I have posted before, if you cannot stream your tape drives, their performance declines dramatically.
In enterprise environments, performance is the key driver of data protection. You must ensure that you can back up and recover massive amounts of data in prescribed windows, and tape’s inconsistent performance and complex manageability make it difficult to use as a primary backup target. This fact can also make tape a challenging solution in small environments.
The problem with tape drive streaming is a common one and Preston agrees that it is the key reason for adopting disk-based backup technologies. Our customers typically see a dramatic improvement in performance with SEPATON’s VTL solutions since they are no longer limited by the streaming requirements of tape.
Even with new disk and deduplication technologies, most customers are still using tape today and will do so into the future. However, tape will likely be used more for archiving than for secondary storage. Deduplication enables longer retention, but most customers are probably not going to retain more than a year online. Tape is a good medium for deep archive where you store data for years, but is complex and costly as a target for enterprise backup.
Scott from EMC has challenged SEPATON’s advertised performance for backup, deduplication, and restore. As industry analyst W. Curtis Preston so succinctly put it, “do you really want to start a ‘we have better performance than you’ blog war with one of the products that has clustered dedupe?” However, I wanted to clarify the situation in this post.
Let me answer the questions specifically:
1. The performance data you refer to, linked three words into his post, is both four months old and, in fact, not performance data at all.
SEPATON customers want to know how much data they can back up and deduplicate in a given day; that is what matters in real-life use of the product. The answer is 25 TB per day per node. If a customer has five nodes and a twenty-four-hour day, that is 125 TB of data backed up and deduplicated. This information has been true and accurate for four months and is still true today.
In this post, I highlighted SEPATON’s S2100-ES2 performance both with and without deduplication enabled. In a comment, I had also indicated that we would be adding additional performance information to our website and collateral, and I am happy to report that the update is complete. You can find our deduplication performance numbers in several locations on SEPATON’s website, including:
These documents now highlight per node deduplication performance of 25 TB per node per day.
Happy New Year!
The SEPATON S2100-ES2 was designed for speed. Our solution is based around the concept of a unified appliance which provides one GUI for managing and monitoring all embedded hardware. We also automate the disk provisioning and configuration to provide consistent scalability and performance. The result is an appliance that can easily be managed by an administrator who understands tape and wants to avoid the traditional complexities of disk.
Our performance is quite simple to understand. We use a grid architecture, which means that all nodes can see all storage and access the same deduplication repository; you can back up and restore from any node. Today we support five nodes with DeltaStor, while the VTL supports 16, and we will be adding support for larger node counts in the near future. Each node adds 2.2 TB/hr of ingest and 25 TB/day of deduplication performance. The appliance deduplicates data concurrently, meaning that backups and deduplication occur simultaneously with no performance impact. Let’s look at the actual numbers.
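As a quick sanity check, the aggregate numbers scale linearly with node count. A minimal sketch, using only the per-node figures quoted in this post:

```python
# Per-node figures from the post: 2.2 TB/hr ingest, 25 TB/day deduplication.
INGEST_TB_PER_HR_PER_NODE = 2.2
DEDUPE_TB_PER_DAY_PER_NODE = 25

def aggregate_performance(nodes: int):
    """Return (ingest TB/hr, dedupe TB/day) for a grid of `nodes` nodes.

    In a grid architecture every node sees the same repository, so
    throughput adds linearly as nodes are added.
    """
    return (nodes * INGEST_TB_PER_HR_PER_NODE,
            nodes * DEDUPE_TB_PER_DAY_PER_NODE)

ingest, dedupe = aggregate_performance(5)
print(f"5 nodes: {ingest:.1f} TB/hr ingest, {dedupe} TB/day deduplicated")
# 5 nodes: 11.0 TB/hr ingest, 125 TB/day deduplicated
```

The five-node, 125 TB/day figure matches the number quoted earlier in this post.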
I often talk about disk-based backup and virtual tape libraries (VTLs) and wanted to discuss physical tape. While VTLs are popular these days, tape is still in widespread use. LTO tape, the market share leader, continues to highlight increased density and performance. Do not be fooled by these claims. In the real world, faster tape often provides little or no improvement in backup or restore performance. Ironically, faster tape increases (not decreases) the need for high-performance disk devices like VTLs. Let me explain.
Modern tape drives use a linear technology where the tape head is stationary and the tape moves at high speed across it. With each generation of LTO, the tape speed is largely unchanged while tape density doubles. At the same time, LTO drives have not expanded their ability to vary the speed of the tape. Thus if you go from LTO-3 to LTO-4, you have doubled the density of your tape, and you must double the throughput of data sent to the drive to keep tape speed unchanged. Why does tape speed matter? Because LTO drives have only a limited ability to throttle tape speed, your performance will suffer terribly if you cannot meet the drive's minimum streaming requirement.
If you cannot stream enough data to your tape drives, as mentioned above, the drive goes into a condition called “shoe-shining,” in which it is constantly stopping and starting. The drive tries to stop when its buffer empties, but the tape is moving so fast that it overshoots its stopping point and must decelerate, rewind to where it stopped writing, and begin writing again. The tape moves forward and backward like a shoe-shine cloth. This process causes a massive reduction in performance and excessive wear on the drives and media. The table below comes from a Quantum whitepaper entitled “When to Choose LTO-3” and highlights the real-world performance requirements of LTO-2 and LTO-3. I have estimated LTO-4 requirements for completeness.
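To make the streaming requirement concrete, here is a rough sketch of the math. It uses the published LTO native transfer rates rather than the whitepaper's exact table, and it approximates the host-side requirement as the native rate multiplied by the compression ratio the drive achieves on the data:

```python
# Published native (uncompressed) transfer rates for each LTO generation.
NATIVE_MB_PER_SEC = {"LTO-2": 40.0, "LTO-3": 80.0, "LTO-4": 120.0}

def required_host_rate(drive: str, compression_ratio: float = 2.0) -> float:
    """Approximate MB/sec the backup host must sustain to keep `drive`
    streaming. Compression happens inside the drive, so compressible data
    multiplies the rate the host must feed it."""
    return NATIVE_MB_PER_SEC[drive] * compression_ratio

for drive, native in NATIVE_MB_PER_SEC.items():
    print(f"{drive}: {native:.0f} MB/s native, "
          f"~{required_host_rate(drive):.0f} MB/s needed with 2:1 data")
```

Note how an LTO-4 drive fed 2:1-compressible data needs roughly 240 MB/sec sustained from the host to avoid shoe-shining, a rate few backup servers can deliver from a single stream.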
One of my favorite episodes from Star Trek was “Trouble with Tribbles.” In the episode, Uhura adopted a creature called a tribble only to find that it immediately started to reproduce uncontrollably, resulting in an infestation in the Enterprise’s critical business err spaceship systems. You can read a synopsis of the episode here or even better, watch it here. What does this have to do with restoration and deduplication? I’m glad you asked.
As I previously posted, the key driver in sizing deduplication environments and solutions is performance. This is because most solutions are performance constrained by deduplication. Like the tribbles from Star Trek, the risk end-users run is rapid growth in the number of deduplication appliances. It may seem easy to size the environment initially, but what happens if your data growth is faster than expected or stricter SLAs require you to reduce your backup and/or restore windows? The inevitable answer in most cases is more deduplication appliances. All of a sudden what seemed like one cute tribble (err, deduplication appliance) becomes a massive quantity of independent devices with different capacity and performance metrics. This large growth in machines will add complexity to your environment and will dramatically reduce any cost savings that you may have originally expected.
To avoid these issues, you need to think about your needs not just today but into the future. The ideal solution is to purchase a system today that can meet your needs going forward. This stresses the importance of performance scalability, and you must understand how it applies to any given solution.
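To illustrate the sprawl problem, here is a hypothetical sizing sketch; the per-appliance throughput (1.5 TB/hr), backup window (8 hours), and 30% annual growth rate are illustrative assumptions, not vendor figures:

```python
import math

# Hypothetical sketch: how a fleet of fixed-performance dedupe appliances
# multiplies when data growth outpaces the original sizing plan.

def appliances_needed(nightly_tb: float, window_hr: float,
                      per_appliance_tb_per_hr: float) -> int:
    """Appliances required to fit a nightly backup into the window."""
    return math.ceil(nightly_tb / (window_hr * per_appliance_tb_per_hr))

nightly_tb = 25.0  # assumed starting nightly backup volume
for year in range(5):
    count = appliances_needed(nightly_tb, window_hr=8,
                              per_appliance_tb_per_hr=1.5)
    print(f"year {year}: {nightly_tb:.1f} TB/night -> {count} appliance(s)")
    nightly_tb *= 1.3  # assume 30% annual data growth
```

Under these assumptions the fleet doubles from three appliances to six within four years, each one an independent device with its own capacity and performance metrics to manage.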
In the world of Star Trek, Scotty easily beamed the excess tribbles to a nearby Klingon vessel. In the world of the data center, we are not so lucky. Besides who would be the unwilling recipient? Perhaps you could beam them to Data Domain?
I have recently been thinking about the real benefits of deduplication. Although the technology is all about capacity, when you analyze the cost and benefits in the real world, the thing that jumps out at you is performance.
Performance is the key driver in sizing and assessing the number of units required, which means it also drives cost. Deduplication enables longer retention but usually reduces backup and restore performance. For example, a 40 TB system can hold 800 TB of backup data, assuming a 20:1 deduplication ratio. This is a large number, but it soon becomes clear that the system’s usable capacity is limited by backup speed. The graph below shows the relationship between data protected and backup window, assuming throughput of 400 MB/sec.
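The back-of-envelope math behind that relationship can be sketched as follows, using the 400 MB/sec and 40 TB / 20:1 figures from the example above:

```python
# Decimal TB, as storage capacities are typically quoted.
MB_PER_TB = 1_000_000

def hours_to_backup(tb: float, mb_per_sec: float = 400.0) -> float:
    """Hours needed to write `tb` terabytes at `mb_per_sec`."""
    return tb * MB_PER_TB / mb_per_sec / 3600

def tb_per_window(window_hr: float, mb_per_sec: float = 400.0) -> float:
    """Terabytes that fit through the pipe in a given backup window."""
    return mb_per_sec * 3600 * window_hr / MB_PER_TB

print(f"{tb_per_window(8):.2f} TB fits in an 8-hour window")   # 11.52 TB
print(f"{hours_to_backup(800):.0f} hours to write the full 800 TB")  # 556
```

At 400 MB/sec an 8-hour window moves under 12 TB a night, so the 800 TB of logical capacity is reachable only over many months of incremental backups; ingest speed, not disk space, is the binding constraint.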