TSM Target Deduplication: You Get What You Pay For

I was recently pondering TSM’s implementation of target deduplication and decided to review ESG’s Lab Validation report on IBM TSM 6.1. There is quite a bit of good information in the paper, and some really interesting data about TSM’s target deduplication.

Before discussing the results, it is important to understand the testing methodology. Enterprise Strategy Group clearly states that the report was based on “hands-on testing [in IBM’s Tucson, AZ labs], audits of IBM test environments, and detailed discussions with IBM TSM experts.” (page 5) In other words, IBM installed and configured the environment and allowed ESG to test the systems and review the results. IBM engineers are clearly TSM experts, so you would expect any systems they provided to be optimally configured for performance and deduplication. ESG’s results therefore likely represent a best-case scenario, since the average customer may not have the flexibility (or knowledge) to configure a similar system. This is not a problem per se, but readers should keep it in mind.


TSM and Deduplication: 4 Reasons Why TSM Deduplication Ratios Suffer

TSM presents unique deduplication challenges due to its progressive incremental backup strategy and architectural design, which contrast with the traditional full/incremental model used by competing backup software vendors. As a result, TSM users will see lower deduplication ratios than their counterparts using NetBackup, NetWorker or Data Protector. This post explores four key reasons why TSM data is difficult to deduplicate.
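To make the ratio argument concrete, here is a toy model in Python. It is a minimal sketch under deliberately simplified assumptions, not a model of how TSM or any VTL actually chunks data: the protected dataset is a fixed number of chunks, a fixed number of chunks is rewritten with brand-new content each day, and the deduplication ratio is the number of chunks the backup application sends divided by the number of unique chunks the target stores. The function name `dedup_ratio` and all of its parameters are hypothetical.

```python
from typing import Optional, Set


def dedup_ratio(days: int, dataset_chunks: int, changed_per_day: int,
                full_every: Optional[int]) -> float:
    """Toy model: ratio of chunks sent to unique chunks stored.

    full_every=None models progressive incremental (one initial full, then
    only changed chunks). full_every=7 models weekly fulls with daily
    incrementals in between.
    """
    current = list(range(dataset_chunks))   # chunk IDs currently on the client
    next_id = dataset_chunks                # next never-seen chunk ID
    store: Set[int] = set()                 # unique chunks held by the dedup target
    sent_total = 0                          # logical chunks sent by the backup app

    for day in range(days):
        if day == 0:
            sent = list(current)            # initial full backup
        else:
            changed = []
            for i in range(changed_per_day):
                # Overwrite chunks in a rotating window with new, unique content.
                pos = (day * changed_per_day + i) % dataset_chunks
                current[pos] = next_id
                changed.append(next_id)
                next_id += 1
            if full_every is not None and day % full_every == 0:
                sent = list(current)        # scheduled full: resend everything
            else:
                sent = changed              # incremental: only what changed
        sent_total += len(sent)
        store.update(sent)

    return sent_total / len(store)


weekly_fulls = dedup_ratio(28, 1000, 20, full_every=7)
progressive = dedup_ratio(28, 1000, 20, full_every=None)
print(f"weekly full + incremental: {weekly_fulls:.2f}:1")  # 2.91:1
print(f"progressive incremental:   {progressive:.2f}:1")   # 1.00:1
```

Both strategies end up storing the same 1,540 unique chunks; the weekly fulls simply resend data the target already holds, which inflates the ratio. In this model the progressive incremental stream contains no redundancy at all, so its ratio is exactly 1:1. Real environments typically still contain some redundancy that this model ignores, so treat the numbers as an illustration of the mechanism rather than a prediction.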


TSM Deduplication

IBM recently announced the addition of deduplication technology to its Tivoli Storage Manager (ITSM) backup application. ITSM is a powerful application that uses a progressive incremental approach to data protection, which differs fundamentally from the approach taken by most other backup applications. Adding deduplication to ITSM improves disk space utilization, but it also creates some new challenges.

The first challenge for many TSM environments is that administrators are already overburdened with managing numerous discrete processes to ensure that backup operations meet their business requirements. The deduplication functionality within ITSM adds another process to an already complex backup environment. In addition to scheduling and managing processes such as reclamation, migration and expiration as part of daily operations, administrators now have to manage deduplication as well. This management may involve activities as disparate as capacity planning, fine-tuning, and system optimization. The alternative is to use a VTL-based deduplication solution like a SEPATON® S2100®-ES2 VTL with DeltaStor® software, which provides deduplication benefits without requiring a new process to be created and managed.

DeltaStor Deduplication

I track numerous sites focused on data protection and storage and am always interested in what other industry participants are discussing. The blogroll on this site includes links to some informative and interesting blogs. However, as with anything else on the web, every author brings their own view of the world, which inevitably shapes their perspective.

This brings me to my next point: I recently ran across a blog post in which an EMC guy badmouths SEPATON’s DeltaStor technology, which is used in HP’s VLS platform. The obvious reaction is “This is an EMC guy, so what do you expect?”, which is true. However, I think his post deserves a brief response, and I want to address his major point. To summarize, he argues that DeltaStor requires an understanding of each supported backup application and data type, and that this makes testing difficult. Here is an excerpt:

Given that there are at least 5 or 6 common backup applications, each with at least 2 or 3 currently supported versions, and probably 10 or 15 applications, each with at least 2 or 3 currently supported version, the number of combinations approaches a million pretty rapidly

This is a tremendous overstatement. HP’s (and SEPATON’s) implementation of DeltaStor is targeted at the enterprise. If you look at enterprise datacenters, there is actually a very small number of backup applications in use. You will typically see NetBackup, TSM and, much less frequently, Legato; this narrows the scope of testing substantially.
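Taking the quoted figures at face value, the arithmetic can be checked with a few lines of Python. This sketch assumes that a test combination means one version of one backup application paired with one version of one protected application; the excerpt does not say how its combinations are counted, so that pairing rule, the function name `support_matrix_size`, and the three-application enterprise case are all illustrative assumptions.

```python
def support_matrix_size(backup_apps: int, versions_per_backup_app: int,
                        applications: int, versions_per_application: int) -> int:
    """Count pairings of one backup-app version with one protected-app version.

    Hypothetical counting rule: the quoted excerpt does not define what a
    "combination" is, so this simply multiplies the four quoted factors.
    """
    return (backup_apps * versions_per_backup_app
            * applications * versions_per_application)


low = support_matrix_size(5, 2, 10, 2)       # lower end of the quoted ranges
high = support_matrix_size(6, 3, 15, 3)      # upper end of the quoted ranges
# Assumed enterprise case: NetBackup, TSM and Legato only, upper-end versions.
enterprise = support_matrix_size(3, 3, 15, 3)
print(low, high, enterprise)  # 200 810 405
```

Under this counting rule, even the upper end of the quoted ranges yields hundreds of combinations rather than a million, and restricting the backup applications to the three typically seen in enterprise datacenters halves the upper figure.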

This is not the case in small environments, where you see many more backup applications, such as BackupExec, ARCserve and others, which is why HP is selling a backup-application-agnostic product for these environments. (SEPATON is focused on the enterprise, so our solution is not targeted at this market.)

He then points out that there are many different versions of supported backup applications and many different application modules. This is true; however, it is misleading. A core strength of backup applications is their ability to maintain tape compatibility across versions, which means that tape and file formats change infrequently. In the case of NetBackup, the format has not changed substantially since version 4.5. As a result, qualification requirements for a new application version are typically minimal. (Clearly, if the tape format changes radically, the process is more involved.) The situation is similar with backup application modules.

The final question in his entry is:

do you want to buy into an architecture that severely limits what you can and cannot deduplicate? Or do you want an architecture that can deduplicate anything?

I would suggest an alternative question: do you want a generic deduplication solution that supports all applications but reduces your performance by 90% (200 MB/sec vs. 2200 MB/sec) and provides mediocre deduplication ratios, or do you want an enterprise-focused solution that provides the fastest performance, the greatest scalability and the most granular deduplication, and is optimized for your backup application?

You decide…
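The throughput figures in the question above translate into backup windows as follows. This is a simple sketch that assumes decimal units (1 TB = 1,000,000 MB), a constant sustained rate, and a hypothetical 50 TB backup; the helper names are illustrative, and the 200 and 2200 MB/sec figures are the ones quoted in the question, not new measurements.

```python
def throughput_reduction(slow_mb_s: float, fast_mb_s: float) -> float:
    """Fractional throughput loss when moving from the fast to the slow system."""
    return 1 - slow_mb_s / fast_mb_s


def backup_hours(terabytes: float, mb_per_sec: float) -> float:
    """Hours to move `terabytes` at a constant rate (decimal units assumed)."""
    return terabytes * 1_000_000 / mb_per_sec / 3600


loss = throughput_reduction(200, 2200)
print(f"throughput reduction: {loss:.1%}")                          # 90.9%
print(f"50 TB at 200 MB/sec:  {backup_hours(50, 200):.1f} hours")   # 69.4 hours
print(f"50 TB at 2200 MB/sec: {backup_hours(50, 2200):.1f} hours")  # 6.3 hours
```

The reduction works out to just over 90% (1 - 200/2200 ≈ 0.909), consistent with the figure in the question; at a constant rate, the backup window scales inversely with throughput.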

Oh, as an afterthought: stay tuned to this blog if you are interested in more detail on the first part of my question above.