Deduplication Strategy and Dell/Ocarina

This week, Dell acquired Ocarina, a provider of primary storage deduplication. The acquisition gives Dell technology that it can integrate with its existing storage platforms, such as EqualLogic. However, Dell also sells deduplication technology from EMC/Data Domain, CommVault and Symantec. Dave West at CommVault suggests that these technologies are complementary, and I agree. Still, the announcement raises a significant strategic question: which is the better deduplication strategy, “one size fits all” or “best of breed”?

Deduplication is an important datacenter technology: by reducing the storage capacity required, it also reduces power, footprint and cooling requirements. However, it typically brings a performance trade-off on reads or writes because of the additional processing required to deduplicate data on write and re-hydrate it on read. The benefits are compelling, and several large companies now promote different deduplication strategies. Their approaches fall into two broad categories, “best of breed” (BoB) and “one size fits all” (OSFA), and the choice between them has a major impact. Let’s look at each strategy individually.
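To make the trade-off concrete, here is a minimal Python sketch of block-level deduplication, assuming fixed-size chunks and SHA-256 fingerprints. The `DedupStore` class and its methods are hypothetical and purely illustrative; commercial products, including those discussed here, use their own and often far more sophisticated chunking and indexing schemes. On write, every chunk is hashed and only new chunks are stored; on read, the data is re-hydrated by reassembling it from the stored chunks.

```python
import hashlib


class DedupStore:
    """Hypothetical toy block-level deduplicating store.

    Assumes fixed-size chunking and SHA-256 fingerprints; illustrative only.
    """

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> unique chunk bytes, stored once

    def write(self, data: bytes) -> list:
        """Deduplicate on write: keep each unique chunk once, return the file's recipe."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()  # extra work on every write
            self.chunks.setdefault(fp, chunk)       # store only if not seen before
            recipe.append(fp)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Re-hydrate on read: reassemble the original bytes from referenced chunks."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Physical capacity actually consumed by unique chunks."""
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)       # tiny chunks so the example is easy to follow
data = b"ABCDABCDABCDWXYZ"
recipe = store.write(data)
assert store.read(recipe) == data      # re-hydration returns the original bytes
print(len(data), store.stored_bytes())  # 16 8
```

The hashing on write and the chunk lookups on read are work that a plain store never does, which is the source of the performance trade-off described above.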

One size fits all

This approach is best exemplified by EMC’s Viper project and HP’s StoreOnce initiative. The idea is to create one global deduplication technology that is used across all applications and hardware. In theory, this delivers consistent deduplication results and allows deduplicated data to move from one system to another without being re-hydrated. It is a bold idea, but one that faces real implementation challenges.

The fundamental problem is that both data characteristics and read/write performance requirements vary widely. For example, compare the performance and SLA requirements of an Oracle database with those of a user file backup. Not only are these data types completely unrelated, but they are also written to storage in very different ways. Given these variations, how can one deduplication algorithm serve both well? Deduplication technology is complex and must be designed to solve specific problems. A generic algorithm is just that, generic, and will likely deliver mediocre results across the board.

These vendors have chosen a difficult path: their challenge is to create a flexible deduplication algorithm that works effectively across all data types and business SLAs. Having seen deduplication development first hand, I can tell you that it is extremely complex. Creating a generic solution that spans all tiers of storage and all types of data while still providing compelling results is extraordinarily challenging, if not impossible. It will be interesting to see how these vendors address this.

Best of breed

This approach is best exemplified by Dell and its recent acquisition of Ocarina. In this model, a vendor assembles a range of data reduction options to meet different needs. For example, a supplier might offer an in-band compression option that minimizes the performance impact and provides an adequate but limited data reduction benefit for performance-sensitive customers. It would also offer a backup-specific deduplication solution to minimize backup windows, recovery times and storage footprint. By offering these options, the vendor can meet a range of customer needs.

The challenge with the BoB approach is that you do not gain the interoperability benefits of OSFA. Moving data from one environment to another requires re-hydrating the data and deduplicating it again, which can impact performance. Although this creates inefficiencies, it does provide the benefit of optimized data reduction that best aligns with business requirements.
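The interoperability point can be illustrated with a second toy model, again assuming fixed-size chunking with SHA-256 fingerprints; the function names are hypothetical. Chunk size stands in for “deduplication scheme” here, whereas real systems differ in many more ways. When source and target share the same scheme (the OSFA premise), only chunks the target lacks need to move. When the schemes differ, as between two BoB products, fingerprints from one system mean nothing to the other, so the source must re-hydrate the full data and the target re-deduplicates it on arrival.

```python
import hashlib


def fingerprints(data: bytes, chunk_size: int) -> list:
    """Hypothetical helper: fixed-size chunking + SHA-256, the 'recipe' a dedup engine keeps."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def bytes_to_transfer(data: bytes, src_chunk_size: int,
                      dst_chunk_size: int, dst_index: set) -> int:
    """Estimate bytes sent when migrating `data` to a target holding `dst_index` fingerprints.

    Same scheme on both sides: ship only chunks the target does not already have.
    Different schemes: fingerprints are incompatible, so all data is re-hydrated and sent.
    """
    if src_chunk_size != dst_chunk_size:
        return len(data)  # full re-hydration; the target re-deduplicates on arrival
    sent, seen = 0, set(dst_index)
    for i in range(0, len(data), src_chunk_size):
        chunk = data[i:i + src_chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in seen:  # target lacks this chunk, so it must be shipped
            sent += len(chunk)
            seen.add(fp)
    return sent


data = b"ABCD" * 3 + b"WXYZ"
target_index = set(fingerprints(b"ABCD", 4))  # target already holds the "ABCD" chunk
print(bytes_to_transfer(data, 4, 4, target_index))  # same scheme: 4
print(bytes_to_transfer(data, 4, 8, target_index))  # different schemes: 16
```

In this simplified model the shared scheme moves a quarter of the bytes, while mismatched schemes move everything; that gap is the re-hydration cost the BoB approach accepts in exchange for per-workload optimization.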

Summary

Based on my experience with deduplication in the secondary storage arena, the BoB approach is the way to go. The OSFA model promises simplicity, with one algorithm used universally; however, it also creates substantial development and management complexity because data and business requirements vary so widely. The vendors promoting this approach position it as a long-term strategy, so it is unlikely to arrive any time soon. It is unclear if or when the vision will come to fruition, and it may never live up to expectations.

The BoB approach provides an immediate time-to-market benefit. Suppliers already offer a range of data reduction options that fit nicely into this model, and Dell appears to be pursuing this approach, with Ocarina for primary storage and various partners for backup deduplication. The challenge is that BoB can add management complexity and force re-hydration as data moves between solutions. Even with these limitations, I believe it is the best approach available today. It provides the flexibility and optimized performance that customers need to meet their varying business requirements.

The Future

Long term, I believe that we will see a combination of these two models. The simplicity of OSFA is beneficial, but business requirements will demand the better performance and data reduction of BoB. The best strategy is to define two or three broad tiers of data and use appropriate data reduction methods for each. The algorithms will likely vary by tier, ranging from simple compression to sophisticated deduplication. Manageability and scalability of each tier are important considerations, and SEPATON will play an important role by providing a highly scalable platform for protecting, storing and retaining secondary data. SEPATON has spent a great deal of time solving the deduplication conundrums of secondary storage. With our ContentAware approach, we can deliver both the speed and simplicity of the OSFA approach and the flexibility and efficiency of the BoB approach. Take my advice: BoB is the way to go.
