TSM Deduplication

IBM recently announced the addition of deduplication technology to its Tivoli Storage Manager (TSM) backup application. TSM is a powerful application whose progressive incremental approach to data protection is completely different from most other backup applications. The addition of deduplication to TSM provides a benefit in disk space utilization, but it also creates some new challenges.

The first challenge for many TSM environments is that administrators are already over-burdened with managing numerous discrete processes to ensure that backup operations meet their business requirements. The deduplication functionality within TSM adds another process to an already complex backup environment. In addition to scheduling and managing processes such as reclamation, migration and expiration as part of daily operations, administrators now have to manage deduplication as well. This management may involve activities as disparate as capacity planning, fine-tuning, and system optimization. The alternative is a VTL-based deduplication solution like a SEPATON® S2100®-ES2 VTL with DeltaStor® software, which provides deduplication benefits without requiring administrators to create and manage a new process.
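As a rough illustration of the extra process involved, the administrative commands below sketch how deduplication is enabled and scheduled in TSM 6.x. The pool name is hypothetical, and the parameter syntax is recalled from the TSM 6.x command set rather than taken from this post, so verify it against the Administrator's Reference for your server level:

```
/* Enable deduplication on a FILE-device-class storage pool.       */
/* (Pool name DEDUPPOOL is hypothetical.)                          */
UPDATE STGPOOL DEDUPPOOL DEDUPLICATE=YES

/* Duplicate identification runs as yet another server process     */
/* that must be scheduled and tuned, alongside reclamation,        */
/* migration and expiration.                                       */
IDENTIFY DUPLICATES DEDUPPOOL DURATION=120 NUMPROCESS=2
```

Even in this minimal form, the administrator now owns one more process whose duration, process count and scheduling window need ongoing attention.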


SEPATON Announcement

SEPATON recently announced fourth quarter results in this release. I am not going to repeat the content here, but wanted to highlight that the company had a record Q4 and achieved an important milestone.

I am excited about the prospects for SEPATON in 2009. Don’t get me wrong, 2009 is likely to be a tough year for all vendors, but those companies with compelling products and value propositions will fare better in these difficult times. SEPATON is uniquely positioned with our core focus on Scale-Out Deduplication™ for the enterprise.


IBM Deduplication Appliances

I have been on hiatus as of late and apologize for my tardiness in blogging.

IBM released its new deduplication appliances based on the technology acquired from Diligent. At first glance, these might appear to be a competitive alternative to SEPATON, but a closer look quickly makes it apparent that this is not the case.

IBM previously sold a single product, the TS7650G gateway, which it now targets at the enterprise. The new appliance products use similar server hardware and a de-featured version of the DS4700 disk array. As with all Diligent installations, the solution uses Fibre Channel drives, which reduce density and add cost. These systems will never be price leaders. The configurations are as follows:

Capacity    Nodes
7 TB        One
18 TB       One
36 TB       One
36 TB       Two

You can’t move beyond the configurations listed above. If you want to grow the system beyond 36 TB, you are out of luck. Your only choice is a forklift upgrade to the TS7650G gateway. What if you want dual nodes and less than 36TB? Same answer. How about replication? Same answer. (That is, if you can consider the array-based approach in the TS7650G a realistic replication option.)

The ultimate irony is that by creating appliance VTLs, IBM has actually made their customers’ lives more difficult. Customers now have to choose whether to purchase a gateway (which adds complexity and cost) or a simple bounded appliance (which has limited configurations). Why should a customer have to make this trade-off? Why not offer an appliance that is simple, cost-effective AND scalable? Well, the simple answer to the question is to get a SEPATON S2100-ES2!

Deduplication Virtual Tape

Customer perspectives on SEPATON, IBM and Data Domain

SEPATON issued a press release on Monday that is worth mentioning here on the blog. SearchStorage also published a related article here. The release highlights MultiCare, a SEPATON customer that uses DeltaStor deduplication software in a two-node VTL.

In the release, the customer characterizes their testing of solutions from Diligent/IBM (now IBM TS7650G) and Data Domain. Specifically, they mention that the TS7650G was difficult to configure and get running and that the gateway head nature of the product also made it difficult for them to scale capacity. These difficulties illustrate the challenges of implementing the TS7650G’s head only design. With this solution, the burden of integrating and managing the deduplication software and disk subsystem falls on the end user. Contrast this with a SEPATON appliance that manages the entire device in a fully integrated, completely automated fashion.

They had a typical Data Domain experience. That is, their initial purchase looked simple and cost effective but rapidly became complex and costly. In this case, MultiCare hit the Data Domain scalability wall, requiring them to purchase multiple separate units. The result was that MultiCare had to perform two costly upgrades and rip and replace their Data Domain solutions with newer, faster units. Scalability is the challenge with Data Domain solutions, and it is not uncommon for customers to purchase one unit to meet their initial needs and then be forced to add additional units or perform a forklift upgrade.

As MultiCare found, customers must thoroughly understand their requirements when considering deduplication solutions. They tested the head-only approach and found it to be too complex to operate and manage to meet their needs. They tried the small appliance approach and found that they outgrew their initial system and were forced to pursue costly upgrades. In the end, they recognized that the best solution for their environment was a highly scalable S2100-ES2 solution which provided the performance and scalability that could not be achieved with either the TS7650G or Data Domain.

Backup Restore

SEPATON Performance Revisited

In this post, I highlighted SEPATON’s S2100-ES2 performance both with and without deduplication enabled. In a comment, I also indicated that we would be adding more performance information to our website and collateral, and I am happy to report that the update is complete. You can find our deduplication performance numbers in multiple locations on SEPATON’s website, including:

The DeltaStor Datasheet
DeltaStor Page

These documents now highlight per node deduplication performance of 25 TB per node per day.
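To put that figure in perspective, here is a quick back-of-the-envelope conversion into sustained throughput. It assumes decimal terabytes and a full 24-hour processing window; both are simplifications on my part, not numbers from the datasheet:

```python
# Convert the quoted 25 TB/node/day deduplication rate into sustained
# throughput. Assumes decimal terabytes (10**12 bytes) and a full
# 24-hour window -- both simplifying assumptions, not vendor claims.
TB = 10**12
rate_bytes_per_day = 25 * TB
seconds_per_day = 24 * 60 * 60
mb_per_s = rate_bytes_per_day / seconds_per_day / 10**6
print(f"{mb_per_s:.0f} MB/s per node")  # roughly 289 MB/s sustained
```

In other words, each node is sustaining close to 300 MB/s of deduplication work around the clock under these assumptions.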

Happy New Year!


RIP: Storage Magazine

I was disappointed to see this announcement from TechTarget. As one would expect, their actions are in response to the current economic situation. I can understand the cutbacks in headcount, but I am disappointed with the cancellation of the print version of Storage Magazine.

Storage is one of the best publications focused on the storage and data protection industry. The periodical contains high quality commentary from numerous industry pundits and provides an opportunity to advertise in print to a targeted audience. Personally, I think that it provided strong value on both fronts.

Those from TechTarget will argue that the content will still be available on the Web, a medium that is more in line with their readers’ requirements. I disagree; I find strong value in physical print. The ability to take the magazine with me and read it on a plane or during a moment of downtime is valuable, and that experience is not duplicated on the web. Yes, I have a “smart phone,” but the experience is still not the same. Additionally, I believe that advertising in a physical magazine is very different from advertising on the Web. Sadly, the print option is no longer available.

In short, these are tough times in the economy, and I am saddened that the printed Storage has been canned. It was a great publication, and its value (at least to me) will decline when it goes web-only. Of course, TechTarget needs to manage its own business and this was a business decision. Personally, I think that it is a mistake.


TS7650G and Fibre Channel Drives

The IBM/Diligent TS7650G uses a pattern-matching approach to deduplication, which is different from the hash-based solutions used by many vendors and from the ContentAware™ approach pioneered by SEPATON.

Diligent’s technology requires Fibre Channel (FC) drives for the best performance because pattern matching is highly I/O intensive and needs the additional I/O that FC drives provide. FC drives, in turn, negatively affect disk density, require more power and dramatically increase the price of the system.

The pattern matching technology used in the TS7650G is an inline process; therefore, all duplicate data has to be identified before data is committed to disk. Pattern matching only provides an approximate match on redundant data and requires a byte-level compare to verify the redundancy. All byte-level compares must be completed before any data is written to disk and the next piece of data accepted. FC drives are required because they provide the random I/O performance needed to handle inline byte-level comparisons. Diligent specified a 110-disk FC array for the ESG performance whitepaper that it sponsored back in July of 2006. (Local copy of the ESG whitepaper.) This is not to say that the algorithm will not work with SATA, but these drives will dramatically reduce performance.
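To illustrate the contrast, the toy sketch below shows a generic hash-index approach (a simplified illustration of my own, not SEPATON’s ContentAware or Diligent’s pattern matching): because a hash match is exact, a duplicate chunk can be discarded on an index lookup alone, with no need to read data back from disk for a byte-level compare.

```python
import hashlib

def dedupe(chunks, store=None):
    """Toy hash-based deduplication: index each chunk by its SHA-256
    digest and store only unseen chunks. An exact hash match means
    the chunk is a duplicate -- no byte-level compare against data
    already on disk is required, which is the key contrast with an
    approximate pattern-matching approach."""
    store = {} if store is None else store
    written = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # unique chunk: write to disk
            written += 1
    return store, written

# Two "backups" sharing most of their data
backup1 = [b"block-A", b"block-B", b"block-C"]
backup2 = [b"block-A", b"block-B", b"block-D"]  # only block-D is new
store, w1 = dedupe(backup1)
store, w2 = dedupe(backup2, store)
print(w1, w2)  # 3 chunks written on the first pass, only 1 on the second
```

Real products layer chunking, indexing and verification strategies on top of this idea, but the sketch shows why an exact-match index trades random disk I/O for memory lookups.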

If you are considering the TS7650G, you must carefully evaluate the associated disk sub-system. It is not clear what disk system and capacity IBM/Diligent used when generating their performance specifications. As part of the evaluation, you should also test single stream and aggregate backup performance because, as previously mentioned, single stream performance may be a challenge.

Deduplication Virtual Tape

Falconstor, SIR and OEMs

This article highlights enhancements to FalconStor’s SIR deduplication platform, but I have to wonder whether anyone cares. FalconStor was a big player in providing VTL software to OEMs, but its deduplication software has been largely ignored.

FalconStor had their heyday in VTL. They aggressively pursued OEM deals with large vendors including EMC, IBM, and Sun. EMC was the most successful with their EDL family of products. As the market moved to deduplication, you would think that FalconStor would be the default OEM supplier of deduplication software as well. You would be wrong.

Ironically, FalconStor’s VTL success was their downfall in deduplication. Their OEMs realized that they were all selling the same VTL software and did not want to repeat the situation with deduplication. EMC and IBM have already announced that they are using alternative deduplication providers.

Deduplication General Marketing

Surviving A Down Economy – A Vendor Perspective

The outlook on the economy continues to be less than stellar. The National Bureau of Economic Research formally declared that we are in a recession. Thanks guys for stating the obvious! Tough times create difficulties for everyone. We have already seen vendors including NetApp, Quantum and Copan announcing cutbacks. Sequoia Capital added to the bleak forecast with their gloomy outlook slide deck. The big question is what does this mean to technology vendors?

In these difficult times, companies must focus on their bottom line. Every technology purchase will be scrutinized and the payback must be clearly quantified. As I posted previously, ROI is vital.

The good news for data protection companies is that data volumes do not go down in a recession and retention times do not shorten. The current difficulties in the financial sector suggest that we may see even stricter regulations and longer retentions. Deduplication-enabled solutions can still thrive in this environment because they provide compelling value. They reduce backup administration time and cost while dramatically lowering acquisition cost. However, remember that not all systems are alike, and you must consider future performance and capacity requirements. Adding multiple independent systems will negatively impact ROI. The result is that scalable deduplication solutions like those sold by SEPATON can provide strong ROI and thus weather the storm of a tough economy better than technologies with weaker value propositions.
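As a back-of-the-envelope example of that ROI argument, consider the disk cost avoided by deduplication. Every figure below (weekly backup size, retention, a 10:1 reduction ratio, and cost per TB) is an illustrative assumption of mine, not a vendor number:

```python
# Illustrative dedup ROI sketch; every figure here is an assumption.
raw_backup_tb = 200       # full backup size protected each week
retention_weeks = 12      # weeks of backups kept on disk
dedup_ratio = 10          # assumed 10:1 data reduction
cost_per_tb = 1000        # assumed disk cost in dollars per TB

capacity_without = raw_backup_tb * retention_weeks   # 2,400 TB raw
capacity_with = capacity_without / dedup_ratio       # 240 TB deduped
savings = (capacity_without - capacity_with) * cost_per_tb
print(f"disk cost avoided: ${savings:,.0f}")  # prints $2,160,000
```

The point is not the specific numbers but the shape of the math: savings scale with retention, so the payback grows as regulations push retention periods longer.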

Recently, an independent market research firm that reviews the purchasing trends of companies of all sizes told us that its research indicates companies over-purchased primary storage in the first half of 2008 and that the outlook for this sector was gloomy. In contrast, deduplication technology was the one bright spot. So far, our experience suggests that their analysis is accurate.

A difficult economy is a test of everyone’s staying power. Companies are scrutinizing every purchase and focusing only on those technologies that provide truly compelling value. Deduplication-enabled solutions are fortunate because of the value they bring. This is not to say that these technologies are immune, but rather that they will fare better than most.


10 Things I Am Thankful For

The Thanksgiving holiday is a time to reflect on things that you are thankful for and so I figured that this would be a great topic for my one blog post this week.

1. That the Somali pirates have not hijacked SEPATON although Bloomberg suggests in a humorous press release that Citibank may be in their sights.
2. That the backup guy is no longer treated like an ugly stepchild and locked in the tape silo when naughty.
3. That data retention requirements are likely to get even stricter thanks to our friends on Wall Street.
4. I have a job.
5. My job is not selling Rube Goldberg contraptions.
6. Data Domain has spent millions educating the market on why dedupe matters but only offers solutions for SMBs.
7. That all those cubicle gophers are still jacking up their company’s capacity requirements by downloading, sharing and storing all of their personal MP3s, videos and photos.
8. That gas prices have declined so I no longer have to skateboard to work.
9. Our VTL is so easy to install and operate that a consultant with no SEPATON experience set it up and got it running in 15 minutes.
10. That the loud CS guy who sat across from me was relocated to the broom closet. 🙂

Feel free to post what you are thankful for in the comments. Have a great Thanksgiving.