Categories: Deduplication

Deduplication Strategy and Dell/Ocarina

This week, Dell acquired Ocarina, a provider of primary storage deduplication.  The acquisition gives Dell technology that it can integrate with existing storage platforms such as EqualLogic.  However, Dell also sells deduplication technology from EMC/Data Domain, CommVault and Symantec.  Dave West at CommVault suggests that these technologies are complementary, and I agree. However, the announcement raises a significant strategic question: which is the better deduplication strategy, “one size fits all” or “best of breed”?

Deduplication is an important technology in the datacenter that reduces power, footprint, and cooling requirements.  However, it typically brings a performance trade-off during read or write operations due to the additional processing required to re-hydrate or deduplicate data.  The benefits of the technology are compelling, and we have seen multiple large companies promote different deduplication strategies.  Their approaches fall into two broad categories: “best of breed” (BoB) or “one size fits all” (OSFA), and the choice of approach has a major impact.  Let’s look at each strategy individually.
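
To illustrate where that extra processing comes from, here is a minimal, hypothetical sketch of block-level deduplication in Python. It is not any vendor's implementation (real products use variable-length chunking, persistent indexes and compression), but it shows the hashing work added on every write and the re-hydration step added on every read.

```python
import hashlib

BLOCK_SIZE = 4096      # assumed fixed block size for this sketch
chunk_store = {}       # fingerprint -> unique block bytes

def dedupe_write(data: bytes) -> list:
    """Split data into blocks, store only unseen blocks, return a recipe of fingerprints."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()   # hashing is the extra work on every write
        chunk_store.setdefault(fp, block)        # duplicate blocks are stored only once
        recipe.append(fp)
    return recipe

def rehydrate(recipe: list) -> bytes:
    """Reassemble (re-hydrate) the original data from its recipe on read or restore."""
    return b"".join(chunk_store[fp] for fp in recipe)

# Writing the same payload twice stores only one set of unique blocks.
payload = b"example data " * 1000
first, second = dedupe_write(payload), dedupe_write(payload)
assert rehydrate(first) == payload
assert len(chunk_store) == len(set(first))
```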

Categories: Deduplication

Storage pools and why they matter

Today SEPATON announced the addition of Storage Pools to our data protection platform.  The technology marks a major step on the path to data protection lifecycle management, and I am excited about the new functionality and wanted to share some brief thoughts.

To summarize, storage pooling allows data to be segmented into discrete pools that do not share deduplication.  Data sent to one pool will be deduplicated only against information in that pool and will not co-mingle with other data; a quick sketch of this isolation appears below.  Additionally, pools provide configuration flexibility by supporting different types of disks with different performance profiles.  Pools also benefit from SEPATON’s DeltaScale architecture, which allows for dynamic capacity and performance scalability.  Pools are a no-cost option with our latest software release, and customers can implement them in the way that best meets their business requirements.  Some of the benefits include:
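
To make that isolation concrete, here is a minimal, hypothetical Python sketch of pools that each keep a private deduplication index. The class and pool names are illustrative only and are not SEPATON's implementation.

```python
import hashlib

class StoragePool:
    """Toy pool that deduplicates only against its own private chunk index."""

    def __init__(self, name: str, block_size: int = 4096):
        self.name = name
        self.block_size = block_size
        self.chunks = {}   # fingerprint -> block, never shared with other pools

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.chunks.setdefault(fp, block)   # dedupe happens only within this pool

# Two pools receiving identical data each store their own copy of the unique blocks,
# because their indexes do not co-mingle.
finance, engineering = StoragePool("finance"), StoragePool("engineering")
payload = b"shared backup content " * 1000
finance.write(payload)
engineering.write(payload)
assert finance.chunks.keys() == engineering.chunks.keys()   # same fingerprints, stored twice
```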

Categories: Backup, Deduplication, Replication

Deduplication ratios and their impact on DR cost savings

There is an interesting blog discussion between Dipash Patel from CommVault and W. Curtis Preston from Backup Central and TruthinIT about whether increasing deduplication ratios deliver diminishing benefits. They take different perspectives on the value of higher deduplication ratios; I will highlight their points and add one more to consider.

Patel argues that increasing deduplication ratios beyond 10:1 provides only a marginal benefit. He calculates that going from 10:1 to 20:1 yields just a 5% increase in capacity efficiency. He adds that vendors who suggest that a doubling of deduplication ratios will result in a doubling of cost savings are using a “sleight of hand.” He makes an interesting point, but I disagree with his core claim that increasing deduplication ratios beyond 10:1 provides only marginal savings.
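
The arithmetic behind both views is worth spelling out. A ratio of R:1 leaves 1/R of the logical data on disk, so the percentage saved barely moves past 10:1, while the footprint that must still be stored and replicated keeps shrinking. A quick back-of-the-envelope calculation with a hypothetical 100 TB of backup data:

```python
# A ratio of R:1 leaves 1/R of the data on disk: the savings percentage flattens out,
# but the residual footprint that must be stored and replicated keeps halving.
logical_tb = 100  # hypothetical amount of protected backup data

for ratio in (5, 10, 20, 40):
    stored_tb = logical_tb / ratio
    saved_pct = (1 - 1 / ratio) * 100
    print(f"{ratio:>2}:1 -> {saved_pct:.1f}% saved, {stored_tb:5.1f} TB left to store and replicate")

# 10:1 -> 90.0% saved, 10.0 TB left
# 20:1 -> 95.0% saved,  5.0 TB left  (5 points more savings, but half the residual footprint)
```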

Categories: Deduplication

TSM Target Deduplication: You Get What You Pay For

I was recently pondering TSM’s implementation of target deduplication and decided to review ESG’s Lab Validation report on IBM TSM 6.1. There is quite a bit of good information in the paper, including some really interesting data about TSM’s target deduplication.

Before discussing the results, it is important to understand the testing methodology. Enterprise Strategy Group clearly states that the report was based on “hands-on testing [in IBM’s Tucson, AZ labs], audits of IBM test environments, and detailed discussions with IBM TSM experts.” (page 5) This means that IBM installed and configured the environment and allowed ESG to test the systems and review the results. Clearly, IBM engineers are experts in TSM, so you would assume that any systems provided were optimally configured for performance and deduplication. The results experienced by ESG are likely a best-case scenario, since the average customer may not have the flexibility (or knowledge) to configure a similar system. This is not a problem, per se, but readers should keep it in mind.

Categories: Deduplication

TSM and Deduplication: 4 Reasons Why TSM Deduplication Ratios Suffer

TSM presents unique deduplication challenges due to its progressive incremental backup strategy and architectural design. This contrasts with the traditional full/incremental model used by competing backup software vendors. The result is that TSM users will see smaller deduplication ratios than their counterparts using NetBackup, NetWorker or Data Protector. This post explores four key reasons why TSM is difficult to deduplicate.
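
A back-of-the-envelope model helps show why the ratios differ. The numbers below are hypothetical (10 TB of primary data, roughly 2% daily change, four weeks of retention), and the model is deliberately simple: both policies end up storing roughly the same unique data, but progressive incremental sends far less redundant data to the deduplication engine, so the reported ratio is much lower.

```python
# Hypothetical environment: 10 TB primary data, ~2% daily change, 4 weeks of backups.
primary_tb, daily_change_tb, days = 10.0, 0.2, 28

# Unique data that actually lands on disk is roughly the same for both policies.
unique_tb = primary_tb + daily_change_tb * days

# Traditional policy: 4 weekly fulls plus 24 daily incrementals.
full_incremental_ingest = 4 * primary_tb + daily_change_tb * (days - 4)

# TSM progressive incremental: one initial full, then incrementals forever.
progressive_ingest = primary_tb + daily_change_tb * days

print(f"full/incremental       : {full_incremental_ingest / unique_tb:.1f}:1 reported ratio")
print(f"progressive incremental: {progressive_ingest / unique_tb:.1f}:1 reported ratio")
# ~2.9:1 versus ~1.0:1, with similar disk consumption but very different-looking ratios.
```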

Categories: Deduplication

Four Must Ask Questions About Metadata and Deduplication

When backing up data to a deduplication system, two types of data are generated. The first comprises the objects being protected, such as Word documents, databases or Exchange message stores. These files will be deduplicated, and for simplicity I will call this “object storage.” The second type of data is metadata: the information the deduplication software uses to recognize redundancies and to re-hydrate data during a restore. Both types of data are critical; they are typically required to write data to the system and to read it back. Here are four key questions that you should ask about protecting metadata.
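
As a way to picture the split, here is a small, purely illustrative Python sketch (my own structure, not any vendor's) of the two kinds of data a deduplication system persists: the deduplicated object blocks and the metadata needed to find and reassemble them.

```python
# The deduplicated object blocks themselves ("object storage" in my terminology).
object_store = {
    "fp-a1f3": b"<block 0>",
    "fp-9c77": b"<block 1>",
}

# The metadata: an index used at write time to spot redundancies, and per-backup
# recipes used at read time to re-hydrate data. Lose this and the blocks above
# cannot be reassembled into anything usable.
metadata = {
    "index": set(object_store),
    "recipes": {"backup-001": ["fp-a1f3", "fp-9c77", "fp-a1f3"]},
}

def restore(backup_id: str) -> bytes:
    """Re-hydrate a backup; impossible without its recipe, even with every block intact."""
    return b"".join(object_store[fp] for fp in metadata["recipes"][backup_id])

print(restore("backup-001"))   # b'<block 0><block 1><block 0>'
```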

Categories: Deduplication, Marketing

SNW Recap

I returned from SNW in Phoenix last night and wanted to recap the event.  I had more than ten meetings at the show and attended several sessions, so I am providing my perspective on the event in general and on the sessions I attended.

Deduplication remains hot and still confuses many
I attended five different sessions on deduplication.  The content overlapped quite a bit, and yet all but one of them was full.  In every case, the presentation focused primarily on deduplication and data protection.  I heard that there was a great panel discussion on primary storage deduplication, which I unfortunately missed. Clearly, primary storage dedupe was not ignored, but data protection remained the focus of the dedupe sessions.

Anecdotally, the most common deduplication question concerned the difference between target and source deduplication.  It also appeared that deduplication adoption was limited.  When asked who was using some form of deduplication, about 50% of the audience raised their hands, but when queried about system size, hands went down rapidly at around 10-15 TB.

The key takeaway is that deduplication remains a strong point of interest.  It appears that end users are still trying to understand the technology and how to implement it on a larger scale.

Categories: Deduplication

Global Deduplication Explained

W. Curtis Preston recently authored an article on Searchstorage.com.au explaining global deduplication.  This is an important topic that frequently causes confusion.  Curtis does a good job explaining the technology and what it means to end users, and I recommend the article.

A quick summary is that global deduplication means that a common deduplication repository is shared by multiple nodes in a system.  In these environments, a customer can back up data to any node in the system, and it will be deduplicated against all related data already in the repository.  This provides improved ease of use and scalability.
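
A tiny, hypothetical Python sketch of the idea (node and function names are mine, not any product's): all nodes write into one shared fingerprint index, so identical data ingested on different nodes is stored only once.

```python
import hashlib

shared_index = {}   # fingerprint -> block; the common repository behind every node

def node_write(node: str, data: bytes, block_size: int = 4096) -> int:
    """Ingest data on any node; return how many blocks were genuinely new."""
    # Which node received the data is irrelevant to what gets stored.
    new_blocks = 0
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in shared_index:      # deduplicated against data from *all* nodes
            shared_index[fp] = block
            new_blocks += 1
    return new_blocks

payload = b"backup stream " * 1000
print(node_write("node-1", payload))   # first node stores the unique blocks
print(node_write("node-2", payload))   # 0: the second node finds everything already stored
```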

Categories: Deduplication

Deduplication 2.0

The folks over at the Online Storage Optimization blog recently wrote a post entitled Get Ready for Dedupe 2.0, in which they outline their vision for the future of deduplication.  I read the post and was amazed at the similarity between their vision and SEPATON’s core VTL architecture. I thought that it would be useful to address each of their points and indicate how they apply to SEPATON’s DeltaScale Architecture.

Categories: Deduplication, Restore

Defragmentation, rehydration and deduplication

W. Curtis Preston recently blogged about The Rehydration Myth. In his post he discusses how restore performance on deduplicated data declines because of the method used to reassemble the fragmented deduplicated data on disk. He also addresses the ways various technologies attempt to overcome these issues, including disk caching, forward referencing (used by SEPATON’s DeltaStor technology) and built-in defrag. In this post I wanted to discuss the last option because it is a widely used approach for inline deduplication that has some little-known pitfalls.
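
To see why fragmentation matters, here is a toy Python model (my own simplification, not any vendor's behavior): as backups age, more of their chunks point back to blocks written long ago, so a logically sequential restore turns into scattered disk reads, which is exactly what defragmentation, caching or forward referencing tries to mitigate.

```python
def seeks_to_restore(chunk_locations):
    """Count the jumps to non-adjacent on-disk locations during a sequential restore."""
    return sum(1 for a, b in zip(chunk_locations, chunk_locations[1:]) if b != a + 1)

# A freshly written backup tends to have its chunks laid out contiguously...
fresh_backup = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# ...while an aged backup references chunks scattered across earlier backups.
aged_backup = [0, 57, 2, 91, 4, 33, 6, 88, 8, 12]

print(seeks_to_restore(fresh_backup))   # 0 seeks: fast, sequential rehydration
print(seeks_to_restore(aged_backup))    # 9 seeks: the restore is dominated by random I/O
```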