Categories: Deduplication

Data Domain Announcement

Data Domain recently announced that their new OS release dramatically improved appliance performance. On the surface, the announcement seems compelling, but upon closer review it raises a number of questions.

Performance Improvement
Deduplication software such as Data Domain’s is complex and can contain hundreds of thousands of interrelated lines of code. As products mature, companies fine-tune and improve their code for greater efficiency and performance, and you would expect to see performance improvements of about 20-30% from these changes. Clearly, if an application was inefficiently coded to begin with, you will see greater gains. However, larger improvements like those quoted in the release are usually only achieved with major product architecture updates and coincide with a major new software release.

In this case, I am not suggesting that Data Domain’s software is bad, but rather that the stated performance improvement is suspect. They positioned this as a dot release, so it is not a major product re-architecture. Additionally, if it were a major architecture update, they would have highlighted it in the release.

To summarize, the stated performance gains are too large to attribute to a simple code tweak, and I believe they are only attainable in very specific circumstances. Data Domain appears to have optimized their appliances for Symantec’s OST and is trumpeting the resulting gains. However, OST represents only a small fraction of Data Domain’s customer base, and it seems that customers using non-Symantec backup applications will see uncertain performance improvements at best. Read on to learn more.

Categories: Deduplication, Restore

Restore Performance

Scott from EMC posted about the EMC DL3D 4000 today. He was responding to some questions from W. Curtis Preston regarding the product and its general availability (GA). I am not going to go into detail about the post, but I wanted to clarify one point. He says:

Restores from this [undeduplicated data] pool can be accomplished at up to 1,600 MB/s. Far faster than pretty much any other solution available today, from anybody. At 6 TB an hour, that is certainly much faster than any deduplication solution.
(Text in brackets added by me for clarification)

As recently discussed in this post, SEPATON restores data at up to 3,000 MB/sec (11.0 TB/hr), both with deduplication enabled and with it disabled. Scott insinuates that only EMC is capable of the performance he mentions, so I wanted to clarify for the record that SEPATON is almost twice as fast as the fastest EMC system.
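For anyone who wants to check the arithmetic behind the MB/s and TB/hr figures quoted above, here is a minimal Python sketch of the conversion, assuming the usual decimal convention of 1 TB = 1,000,000 MB; the marketing figures are simply these results rounded up.

```python
# Sanity-check the quoted throughput figures, assuming decimal units
# (1 TB = 1,000,000 MB), which is how vendors typically quote capacity.

def mb_per_sec_to_tb_per_hr(mb_per_sec: float) -> float:
    """Convert a sustained rate in MB/s to TB/hr."""
    return mb_per_sec * 3600 / 1_000_000

for label, rate in [("EMC DL3D 4000, undeduplicated pool", 1600),
                    ("SEPATON, dedupe enabled or disabled", 3000)]:
    print(f"{label}: {rate} MB/s = {mb_per_sec_to_tb_per_hr(rate):.2f} TB/hr")

# Prints:
#   EMC DL3D 4000, undeduplicated pool: 1600 MB/s = 5.76 TB/hr
#   SEPATON, dedupe enabled or disabled: 3000 MB/s = 10.80 TB/hr
```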

Categories: Backup, Deduplication, Restore, Uncategorized, Virtual Tape

SEPATON Performance — Again

Scott from EMC has challenged SEPATON’s advertised performance for backup, deduplication, and restore. As industry analyst W. Curtis Preston so succinctly put it, “do you really want to start a ‘we have better performance than you’ blog war with one of the products that has clustered dedupe?” Nonetheless, I wanted to clarify the situation in this post.

Let me answer the questions specifically:

1. The performance data Scott refers to (the link three words into his post) is supposedly both four months old and not really data at all.

SEPATON customers want to know how much data they can back up and deduplicate in a given day; that is what matters in real-life use of the product. The answer is 25 TB per day per node. If a customer has five nodes and a twenty-four-hour day, that is 125 TB of data backed up and deduplicated. This information has been true and accurate for four months and is still true today.
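As a quick illustration of that arithmetic, here is a small sketch; the 25 TB/day/node figure is the one stated above, and linear scaling across nodes is assumed as described.

```python
# Sketch of the arithmetic above: per-node daily ingest scaled across a
# multi-node system, assuming linear scaling and a full 24-hour backup window.

TB_PER_NODE_PER_DAY = 25  # backed up and deduplicated, per the figure above

def daily_ingest_tb(nodes: int, window_hours: float = 24.0) -> float:
    """Total data backed up and deduplicated per day across all nodes."""
    return TB_PER_NODE_PER_DAY * nodes * (window_hours / 24.0)

print(daily_ingest_tb(5))        # 125.0 TB/day with five nodes and a 24-hour window
print(daily_ingest_tb(5, 12.0))  # 62.5 TB/day if the backup window is only 12 hours
```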

Categories: Deduplication

TSM Deduplication

IBM recently announced the addition of deduplication technology to their Tivoli Storage Manager (TSM) backup application. TSM is a powerful application whose progressive incremental approach to data protection is completely different from that of most other backup applications. The addition of deduplication to TSM provides a benefit in disk space utilization, but it also creates some new challenges.

The first challenge for many TSM environments is that administrators are already overburdened with managing the numerous discrete processes required to ensure that backup operations meet their business requirements. The deduplication functionality within TSM adds yet another process to an already complex backup environment. In addition to scheduling and managing processes such as reclamation, migration, and expiration as part of daily operations, administrators now have to manage deduplication as well. This management may involve activities as disparate as capacity planning, fine-tuning, and system optimization. The alternative is to use a VTL-based deduplication solution such as a SEPATON® S2100®-ES2 VTL with DeltaStor® software, which provides deduplication benefits without requiring administrators to create and manage a new process.

Categories: Deduplication

IBM Deduplication Appliances

I have been on hiatus as of late and apologize for my tardiness in blogging.

IBM released their new deduplication appliances based on the technology they acquired from Diligent. At first glance, it might appear that these could be a competitive alternative to SEPATON, but on closer inspection it quickly becomes apparent that this is not the case.

IBM previously sold one product, the TS7650G gateway, which they now target at the enterprise. The new appliance products use similar server hardware and a de-featured version of the DS4700 disk array. As with all Diligent installations, the solution uses Fibre Channel drives, which reduce density and add cost; these systems will never be price leaders. The configurations are as follows:

Capacity    Nodes
7 TB        One
18 TB       One
36 TB       One
36 TB       Two


You can’t move beyond the configurations listed above. If you want to grow the system beyond 36 TB, you are out of luck; your only choice is a forklift upgrade to the TS7650G gateway. What if you want dual nodes and less than 36 TB? Same answer. How about replication? Same answer. (That is, if you can consider the array-based approach in the TS7650G a realistic replication option.)

The ultimate irony is that by creating appliance VTLs, IBM has actually made their customers’ lives more difficult. Customers now have to choose whether to purchase a gateway (which adds complexity and cost) or a simple bounded appliance (which has limited configurations). Why should a customer have to make this trade-off? Why not offer an appliance that is simple, cost-effective AND scalable? Well, the simple answer to the question is to get a SEPATON S2100-ES2!

Categories: Deduplication, Virtual Tape

Customer perspectives on SEPATON, IBM and Data Domain

SEPATON issued a press release on Monday that is worth mentioning here on the blog. SearchStorage also published a related article here. The release highlights MultiCare, a SEPATON customer that uses DeltaStor deduplication software in a two-node VTL.

In the release, the customer describes their testing of solutions from Diligent/IBM (now the IBM TS7650G) and Data Domain. Specifically, they mention that the TS7650G was difficult to configure and get running, and that the gateway-head nature of the product also made it difficult for them to scale capacity. These difficulties illustrate the challenges of implementing the TS7650G’s head-only design: the burden of integrating and managing the deduplication software and the disk subsystem falls on the end user. Contrast this with a SEPATON appliance, which manages the entire device in a fully integrated, completely automated fashion.

They had a typical Data Domain experience. That is, their initial purchase looked simple and cost-effective but rapidly became complex and costly. In this case, MultiCare hit the Data Domain scalability wall, requiring them to purchase multiple separate units. The result is that MultiCare had to perform two costly upgrades and had to rip and replace their Data Domain systems with newer, faster units. Scalability is the challenge with Data Domain solutions, and it is not uncommon for customers to purchase one unit to meet their initial needs and then be forced to add additional units or perform a forklift upgrade.

As MultiCare found, customers must thoroughly understand their requirements when considering deduplication solutions. They tested the head-only approach and found it too complex to operate and manage for their needs. They tried the small appliance approach and found that they outgrew their initial system and were forced into costly upgrades. In the end, they recognized that the best solution for their environment was a highly scalable S2100-ES2, which provided performance and scalability that could not be achieved with either the TS7650G or Data Domain.

Categories: Deduplication

TS7650G and Fibre Channel Drives

The IBM/Diligent TS7650G uses a pattern-matching approach to deduplication, which is different from both the hash-based solutions used by many vendors and the ContentAware™ approach pioneered by SEPATON.

Diligent’s technology requires Fibre Channel (FC) drives for best performance because pattern matching is highly I/O intensive and needs the additional I/O that FC drives provide. FC drives, in turn, reduce disk density, require more power, and dramatically increase the price of the system.

The pattern-matching technology used in the TS7650G is an inline process; therefore, all duplicate data has to be identified before data is committed to disk. Pattern matching only provides an approximate match on redundant data and requires a byte-level compare to verify the redundancy. All byte-level compares must be completed before any data is written to disk and the next piece of data is accepted. FC drives are required because they provide the random I/O performance needed to handle these inline byte-level comparisons. Diligent specified a 110-disk FC array for the ESG performance whitepaper that they sponsored back in July of 2006. (Local copy of the ESG whitepaper.) This is not to say that the algorithm will not work with SATA, but SATA drives will dramatically reduce performance.
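To make the I/O argument concrete, here is a deliberately simplified Python sketch of an inline, similarity-based deduplication loop. It is not IBM/Diligent’s implementation (the fingerprint function and data structures are hypothetical placeholders); it only illustrates why every approximate match forces a read-back and byte-level compare before the incoming data can be acknowledged, which is exactly the random-read workload that favors FC drives.

```python
import hashlib

# Deliberately simplified sketch -- NOT IBM/Diligent's implementation. It only
# illustrates why inline, similarity-based deduplication is read-intensive:
# every approximate match must be read back and byte-compared before the
# incoming data can be acknowledged.

store = {}             # chunk_id -> bytes already committed (stand-in for the disk array)
similarity_index = {}  # coarse fingerprint -> chunk_id of a candidate match

def coarse_fingerprint(chunk: bytes) -> int:
    # Hypothetical similarity signature built from sampled bytes; it only needs
    # to find *approximate* matches, not prove identity.
    return hash(chunk[::max(1, len(chunk) // 16)])

def ingest(chunk: bytes) -> str:
    fp = coarse_fingerprint(chunk)
    candidate = similarity_index.get(fp)
    if candidate is not None:
        # Approximate match: read the stored chunk back and compare byte for
        # byte before declaring a duplicate. This random read on every
        # candidate is what demands high-IOPS (FC) spindles in an inline design.
        if store[candidate] == chunk:
            return candidate                     # duplicate -- store a reference only
    chunk_id = hashlib.sha1(chunk).hexdigest()   # unique data -- commit it to disk
    store[chunk_id] = chunk
    similarity_index[fp] = chunk_id
    return chunk_id

first = ingest(b"backup stream block A" * 100)
assert ingest(b"backup stream block A" * 100) == first  # second copy resolves to the same chunk
```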

If you are considering the TS7650G, you must carefully consider the associated disk subsystem. It is not clear what disk system and capacity were used when IBM/Diligent generated their performance specifications. As part of the evaluation, you should also test single-stream and aggregate backup performance because, as previously mentioned, single-stream performance may be a challenge.

Categories: Deduplication, Virtual Tape

FalconStor, SIR and OEMs

This article on Byteandswitch.com highlights enhancements to FalconStor’s SIR deduplication platform, but I have to wonder whether anyone cares. FalconStor was a big player in providing VTL software to OEMs, but their deduplication software has been largely ignored.

FalconStor had their heyday in VTL. They aggressively pursued OEM deals with large vendors including EMC, IBM, and Sun. EMC was the most successful with their EDL family of products. As the market moved to deduplication, you would think that FalconStor would be the default OEM supplier of deduplication software as well. You would be wrong.

Ironically, FalconStor’s VTL success was their downfall in deduplication. Their OEMs realized that they were all selling the same VTL software and did not want to repeat the situation with deduplication. EMC and IBM have already announced that they are using alternative deduplication providers.

Categories: Deduplication, General, Marketing

Surviving A Down Economy – A Vendor Perspective

The outlook on the economy continues to be less than stellar. The National Bureau of Economic Research formally declared that we are in a recession. Thanks, guys, for stating the obvious! Tough times create difficulties for everyone. We have already seen vendors including NetApp, Quantum, and Copan announce cutbacks. Sequoia Capital added to the bleak forecast with their gloomy outlook slide deck. The big question is: what does this mean for technology vendors?

In these difficult times, companies must focus on their bottom line. Every technology purchase will be scrutinized and the payback must be clearly quantified. As I posted previously, ROI is vital.

The good news for data protection companies is that data volumes do not go down in a recession and retention times do not shorten. The current difficulties in the financial sector suggest that we may see even stricter regulations and longer retention periods. Deduplication-enabled solutions can still thrive in this environment because they provide compelling value: they reduce backup administration time and cost while dramatically lowering acquisition cost. However, remember that not all systems are alike, and you must consider future performance and capacity requirements; adding multiple independent systems will negatively impact ROI. The result is that scalable deduplication solutions like those sold by SEPATON can provide strong ROI and thus weather the storm of a tough economy better than technologies with weaker value propositions.

Recently, an independent market research firm that tracks the purchasing trends of companies of all sizes told us that their research indicates companies over-purchased primary storage in the first half of 2008 and that the outlook for this sector is gloomy. In contrast, deduplication technology was the one bright spot. So far, our experience suggests that their analysis is accurate.

A difficult economy is a test of everyone’s staying power. Companies are scrutinizing every purchase and focusing only on those technologies that provide truly compelling value. Deduplication-enabled solutions are fortunate because of the value they bring. This is not to say that these technologies are immune, but rather that they will fare better than most.

Categories: Deduplication, Virtual Tape

Choosing a Data Protection Solution in a Down Economy

I hate to turn on the TV these days because it is full of bad news. There always seems to be some pundit talking about troubles in the housing market, credit markets, automotive industry, consumer confidence, and so many other areas. It does not take a rocket scientist to recognize that the economy is in tough shape right now. As a reader of this blog, you are likely feeling some of the pain in your budget. This brings up an important question: how do I justify IT purchases in this environment?

In situations like these, IT departments must go back to the basics. Purchases must be all about ROI. You must look beyond just acquisition cost and consider how a given solution can save your organization money both upon acquisition and into the future.
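To make that concrete, here is a minimal sketch of the kind of multi-year comparison I am describing. The dollar amounts are placeholders, not vendor quotes; the point is simply that ongoing administration, capacity growth, and any forced rip-and-replace belong in the calculation alongside the purchase price.

```python
# Illustrative only: placeholder figures, not vendor quotes. The point is to
# compare cost over a planning horizon, not just the up-front purchase price.

def total_cost(acquisition, annual_admin, annual_growth, years=3, forklift_upgrade=0.0):
    """Rough multi-year cost: purchase + ongoing admin + capacity growth + any rip-and-replace."""
    return acquisition + years * (annual_admin + annual_growth) + forklift_upgrade

scalable = total_cost(acquisition=500_000, annual_admin=40_000, annual_growth=60_000)
bounded  = total_cost(acquisition=350_000, annual_admin=70_000, annual_growth=60_000,
                      forklift_upgrade=300_000)

print(f"Scalable system, 3-year cost:   ${scalable:,.0f}")   # $800,000
print(f"Bounded appliance, 3-year cost: ${bounded:,.0f}")    # $1,040,000
```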