TSM Target Deduplication: You Get What You Pay For

I was recently pondering TSM’s implementation of target deduplication and decided to review ESG’s Lab Validation report on IBM TSM 6.1. There is quite a bit of good information in the paper, and some really interesting data about TSM’s target deduplication.

Before discussing the results, it is important to understand the testing methodology. Enterprise Strategy Group clearly states that the report was based on “hands-on testing [in IBM’s Tucson, AZ labs], audits of IBM test environments, and detailed discussions with IBM TSM experts.” (page 5) This means that IBM installed and configured the environment and allowed ESG to test the systems and review the results. IBM engineers are experts in TSM, so it is reasonable to assume that the systems they provided were optimally configured for performance and deduplication. The results ESG observed are therefore likely a best-case scenario, since the average customer may not have the flexibility (or knowledge) to configure a similar system. This is not a problem, per se, but readers should keep it in mind.

The whitepaper highlights the data reduction realized using TSM and mentions capacity savings of 19:1. However, if you look carefully, you see that the space savings calculations are based on the combined capacity reduction from TSM’s proprietary progressive incremental technology and deduplication. Progressive incremental technology reduces the amount of storage required by bypassing full backups. The really interesting question is “what additional benefits are gained by using TSM’s target deduplication?” Fortunately, ESG provides an answer.

The idea behind deduplication is that it provides capacity savings by removing redundancies within backup data. Theoretically, IBM should have an advantage deduplicating TSM backups since they are intimately familiar with the application and its data formats. However, ESG’s results do not support this assertion. The paper states, “Data deduplication enhanced data reduction in TSM nearly 50% over progressively incremental backup schemes alone.” (page 8) Halving the stored data is equivalent to a space savings of only 2:1! Wow, talk about a minimal benefit; you could get close to the same results with hardware compression.
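For readers who want to check the arithmetic, here is a minimal sketch of the conversion between a percentage reduction and an N:1 ratio. It assumes that "nearly 50%" means stored capacity is roughly halved (treated as exactly 50% here) and that the two reduction techniques multiply to give the combined 19:1 figure. Both are my reading of the numbers, not something the ESG paper spells out, and the function names are purely illustrative.

```python
def reduction_to_ratio(reduction_fraction: float) -> float:
    """Convert fractional space savings (0.5 means 50%) to an N:1 ratio."""
    if not 0 <= reduction_fraction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


def ratio_to_reduction(ratio: float) -> float:
    """Convert an N:1 ratio back to fractional space savings."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1.0 - 1.0 / ratio


# Assumption: "nearly 50%" is treated as exactly 50%.
dedup_ratio = reduction_to_ratio(0.50)

# Assumption: the combined 19:1 figure is the product of the two effects,
# so dividing out deduplication leaves the progressive incremental share.
combined_ratio = 19.0
implied_incremental_ratio = combined_ratio / dedup_ratio

print(f"Deduplication alone: {dedup_ratio:.1f}:1")
print(f"Implied progressive incremental share: {implied_incremental_ratio:.1f}:1")
print(f"19:1 expressed as a reduction: {ratio_to_reduction(combined_ratio):.1%}")
```

Under those assumptions, most of the headline 19:1 number would come from progressive incremental backups rather than from deduplication itself.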

In summary, TSM deduplication appears to provide minimal capacity savings while creating management challenges. As an end user concerned about backup and recovery, you should carefully evaluate your options. The improved manageability, performance and data reduction of dedicated target deduplication appliances like SEPATON’s S2100 product family make them a better option for all but the smallest environments. TSM includes target deduplication for free, but remember, in this case, you get what you pay for!

6 replies on “TSM Target Deduplication: You Get What You Pay For”

TSM doesn’t take full backups, which results in much less redundant data as it is. The ROI on TSM over NetBackup or other full/incremental vendors is incredible, and that’s with their progressive incremental only. If the same data isn’t backed up week after week after week, it stands to reason dedupe ratios may be less. Also, the IBM algorithm is the same as most other vendors, so if there was more data to be deduplicated, I would assume it would have done it. It all depends on the type of data, because one product sees changed blocks the same as any other. You can’t trust vendor numbers. My buddy does DM for a huge bank in NY, and they use one of EMC’s products. EMC consistently told them they’d see 50:1; the best he’s ever seen is 3:1.

Hi and thank you for your comment. I agree that deduplication results can vary and often are overstated. However, the 2:1 number included in the ESG whitepaper seems low even when including the impact of TSM’s progressive incremental technology.

Regarding ROI, it varies depending on numerous factors. TSM certainly improves backup windows, but that does not mean that it provides an “incredible” ROI. In fact, some argue that the overhead processing associated with progressive incremental backups offsets the ROI benefits.

Strange global channeling going on here. I had the exact same discussion with Chris Mellor at (now) TheReg over the same issue just yesterday. The technology in question was FusionI/O, who must have a corker of a PR agency. I said, wait a minute, the same reasons people put storage into arrays will be the reason they put something like EFD in an array: they want to share it, manage it, protect it, etc., like, well, storage! Now, if you position the same NAND stuff as cheaper DRAM, sure, it makes sense to put it in the server, where it belongs. Strange to read the same discussion here. Chuck

Amin: No one is the symbol of the Persian web. Suppose a few blogs had written this; would they be the symbol of the Persian web? I wrote in general terms, not about anyone specific. My point was that these ideas exist on the Persian web, and I tried to use words like “some of us” or “a group.” In any case, the title can’t be made too long either. Thanks for your comment.
