Categories
Deduplication

Software and Hardware Deduplication

CA recently announced the addition of deduplication to ARCserve. Every time an ISV releases deduplication technology, I get inundated with questions about hardware (i.e. appliance-based) vs. software (i.e. software-only, where separate hardware is required) deduplication. In this post, I will discuss the difference between these two models when using target-based deduplication (i.e. deduplication happens at the media server or virtual tape appliance). Client-based deduplication (i.e. deduplication happens at the client) is another option offered by some vendors and will be covered in another post.

Most backup software ISVs offer target-based deduplication in one form or another. In some cases, it is a separate application, like PureDisk from Symantec; in other cases, it is a plugin, as with CommVault, IBM TSM, or the new ARCserve release. In all cases, it is packaged as a software option and does not include server or storage infrastructure. Contrast this with appliance-based solutions, like those from SEPATON, that include hardware and storage.

Categories
Backup Deduplication Virtual Tape

War Stories: Diligent

As I have posted before, IBM/Diligent requires Fibre Channel drives due to the highly I/O intensive nature of their deduplication algorithm. I recently came across a situation that provides an interesting lesson and an important data point for anyone considering IBM/Diligent technology.

A customer was backing up about 25 TB nightly and was searching for a deduplication solution. Most vendors, including IBM/Diligent, initially specified systems in the 40 – 80 TB range using SATA disk drives.

Initial pricing from all vendors was around $500k. However, as discussions continued and final performance and capacity metrics were defined, the IBM/Diligent configuration changed dramatically. The system grew from 64 TB to 400 TB, resulting in a price increase of over 2x and a capacity increase of over 6x. The added disk capacity was not due to increased storage requirements (none of the other vendors changed their configurations) but to performance requirements. In short, IBM/Diligent could not deliver the required performance with 64 TB of SATA disk and was forced to include more.
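The arithmetic behind that configuration change is worth spelling out. A minimal sketch, assuming a final price of exactly 2x (the post says only "over 2x"; the capacity figures are from the post):

```python
# Rough sizing arithmetic from the example above. The final price is an
# assumed lower bound ("over 2x"); the capacity figures come from the post.
initial_tb, final_tb = 64, 400
initial_price = 500_000                    # ~$500k initial quote

capacity_increase = final_tb / initial_tb  # 6.25x -- "over 6x" in round numbers
final_price = initial_price * 2            # lower bound implied by "over 2x"

print(f"capacity increase: {capacity_increase:.2f}x")
print(f"price floor: ${final_price:,}")
```

Note that the extra 336 TB buys performance, not retention, so the customer pays for disk they never needed for capacity.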

The key takeaway is that if you are considering IBM/Diligent, you must be cognizant of the disk configuration. The I/O-intensive nature of ProtecTIER makes it highly sensitive to disk technology, which is why Fibre Channel drives are the standard requirement for Diligent solutions. End users should always request Fibre Channel disk systems for the best performance, and SATA configurations must be scrutinized. Appliance-based solutions can help avoid this situation by providing known disk configurations and performance guarantees.

Categories
Marketing

Now on Twitter

For those of you unfamiliar with it, Twitter is a micro-blogging application with posts of 140 characters or less. It is a great forum for discussion and interaction with others in the industry. I particularly enjoy the real-time updates from various analysts and members of the press. You can view my profile here and see my posts in my new sidebar widget.

I am curious about Twitter usage and created the small survey below.

Do you use Twitter?

Categories
Backup Deduplication Restore

SEPATON Versus Data Domain

One of the questions I am often asked is “how do your products compare to Data Domain’s?” In my opinion, we really don’t compare because we play in different market segments. Data Domain’s strength is in the low end of the market (think SMB/SME), while SEPATON plays in the enterprise segment. These two segments have very different needs, which are reflected in the fundamentally different architectures of the SEPATON and Data Domain products. Here are some of the key differences to consider.

Categories
Backup Deduplication Restore

W. Curtis Preston on physical tape

W. Curtis Preston recently wrote an article on the state of physical tape for SearchDataBackup. He discusses the technologies that backup software vendors have created to stream tape drives more effectively. As I have posted before, if you cannot stream your tape drives, their performance will decline dramatically.

In enterprise environments, performance is the key driver of data protection. You must ensure that you can back up and recover massive amounts of data in prescribed windows, and tape’s inconsistent performance and complex manageability make it difficult to use as a primary backup target. The same issues can make tape a challenging solution in small environments.

The problem with tape drive streaming is a common one and Preston agrees that it is the key reason for adopting disk-based backup technologies. Our customers typically see a dramatic improvement in performance with SEPATON’s VTL solutions since they are no longer limited by the streaming requirements of tape.

Even with new disk and deduplication technologies, most customers are still using tape today and will continue to do so for the foreseeable future. However, tape will likely be used more for archiving than for secondary storage. Deduplication enables longer retention, but most customers are probably not going to retain more than a year of data online. Tape is a good medium for deep archive, where you store data for years, but it is complex and costly as a target for enterprise backup.

Categories
Deduplication Replication

Recent Comment

Recently, an end user commented that the replication performance on his DL3D 1500 was lower than expected. As he retained more data online, his replication speed decreased substantially, and EMC support responded that this is normal behavior. This is a major challenge, since slow replication increases replication windows and can make DR goals unachievable.

The key takeaway from the comment is that testing is vital. When considering any deduplication solution, you must test it thoroughly with both limited and extended retention. In this case, the degradation appeared only when data was retained and would not have been found if the solution had been tested with limited retention alone. The key elements you should test include:

  1. Backup performance
    1. On the first backup
    2. With retention
  2. Restore performance
    1. On the first backup
    2. With retention
  3. Replication performance
    1. On the first backup
    2. With retention
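The checklist above is really a three-by-two matrix. A minimal sketch that just enumerates the six scenarios (the harness that would actually drive the appliance and measure throughput is assumed, not shown):

```python
# Enumerate the test matrix from the checklist above: three metrics,
# each measured on the first backup and again with retention built up.
from itertools import product

metrics = ["backup", "restore", "replication"]
conditions = ["on the first backup", "with extended retention"]

test_plan = [f"{metric} performance, {condition}"
             for metric, condition in product(metrics, conditions)]

for case in test_plan:
    print(case)
```

The point of writing it out this way is that skipping the "with extended retention" column, as in the DL3D case above, silently drops half the matrix.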
Categories
Deduplication

Data Domain Announcement

Data Domain recently announced that their new OS release dramatically improved appliance performance. On the surface, the announcement seems compelling, but upon further review, it creates a number of questions.

Performance Improvement
Deduplication software such as Data Domain’s is complex and can contain hundreds of thousands of interrelated lines of code. As products mature, companies fine-tune their code for greater efficiency and performance. You would expect performance improvements of about 20-30% from these changes. Clearly, if an application is coded very inefficiently, you will see greater gains. However, larger improvements like those quoted in the release are usually only achieved with major changes to the product architecture and coincide with a major new software release.

In this case, I am not suggesting that Data Domain’s software is bad, but rather that the stated performance improvement is suspect. They positioned this as a point release, so it is not a major product re-architecture. Additionally, if it were a major architecture update, they would have highlighted it in the announcement.

To summarize, the stated performance gains in the release are too large to attribute to a simple code tweak, and I believe that the gains are only attainable in very specific circumstances. Data Domain appears to have optimized their appliances for Symantec’s OST and is trumpeting the resulting numbers. However, OST represents only a small fraction of Data Domain’s customer base, and customers using non-Symantec backup applications are likely to see far less predictable improvements. Read on to learn more.

Categories
Deduplication Restore

Restore Performance

Scott from EMC posted about the EMC DL3D 4000 today. He was responding to some questions by W. Curtis Preston regarding the product and GA. I am not going to go into detail about the post, but wanted to clarify one point. He says:

Restores from this [undeduplicated data] pool can be accomplished at up to 1,600 MB/s. Far faster than pretty much any other solution available today, from anybody. At 6 TB an hour, that is certainly much faster than any deduplication solution.
(Text in brackets added by me for clarification)

As recently discussed in this post, SEPATON restores data at up to 3,000 MB/sec (11.0 TB/hr) with deduplication enabled or disabled. Scott insinuates that only EMC is capable of the performance he mentions, so I want to clarify for the record that SEPATON is almost twice as fast as the fastest EMC system.
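The MB/s and TB/hr figures quoted above are just unit conversions of each other, which is easy to check:

```python
def mb_per_s_to_tb_per_hr(mb_per_s: float) -> float:
    """Convert MB/s to decimal TB/hr (3600 s/hr; 1 TB = 1,000,000 MB)."""
    return mb_per_s * 3600 / 1_000_000

print(mb_per_s_to_tb_per_hr(1600))   # EMC figure: 5.76 TB/hr, "6 TB an hour"
print(mb_per_s_to_tb_per_hr(3000))   # SEPATON figure: 10.8 TB/hr, ~11 TB/hr
```

Both vendors round up slightly (5.76 to "6", 10.8 to "11.0"), but the roughly 2x ratio between the raw MB/s numbers holds either way.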

Categories
Backup Deduplication Restore Uncategorized Virtual Tape

SEPATON Performance — Again

Scott from EMC has challenged SEPATON’s advertised performance for backup, deduplication, and restore. As industry analyst W. Curtis Preston so succinctly put it, “do you really want to start a ‘we have better performance than you’ blog war with one of the products that has clustered dedupe?” However, I want to clarify the situation in this post.

Let me answer the questions specifically:

1. The performance data you refer to with the link in his post three words in is both four months old, and actually no data at all.

SEPATON customers want to know how much data they can back up and deduplicate in a given day. That is what matters in real-life use of the product. The answer is 25 TB per day per node. If a customer has five nodes and a twenty-four-hour day, that is 125 TB of data backed up and deduplicated. This information has been true and accurate for four months and is still true today.
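The scaling claim above is simple multiplication, sketched here under the assumption (as the post describes) that throughput scales near-linearly with node count:

```python
TB_PER_NODE_PER_DAY = 25   # the per-node figure from the post

def daily_capacity_tb(nodes: int) -> int:
    # Assumes near-linear scaling across nodes over a 24-hour day.
    return nodes * TB_PER_NODE_PER_DAY

print(daily_capacity_tb(5))   # -> 125 TB backed up and deduplicated per day
```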

Categories
General Marketing

W. Curtis Preston Now with TechTarget

About a week ago, Curtis posted on his blog that he is joining TechTarget as an Executive Editor, which means, among other things, that he will continue to present at various events. He remains an independent consultant and can keep working on his other projects, including his Mr. Backup Blog and BackupCentral.

In my opinion, this is a great outcome for both TechTarget and Curtis. The Backup/Deduplication schools will benefit from Curtis’s continued tenure as a featured speaker. He is an engaging presenter and provides a balanced perspective. It is also beneficial for Curtis because he is free to pursue his personal and business interests.

A big congratulations to both TechTarget and Curtis!