I was recently reading this document from CommVault that highlights their deduplication technology and was surprised by their use of the term “forward referencing”. Forward referencing is a common term in deduplication with a generally agreed-upon definition. CommVault appears to have redefined the term and promoted their version as a feature. This is confusing and possibly misleading, because a reader might not realize that the definition of “forward referencing” in this document is completely different from the one used everywhere else in the industry.
I have posted in the past about the challenges of restoring data from a reverse-referenced deduplication solution. In short, the impact can be substantial. You might wonder whether I am the only one pointing out this issue, and what the impact really is.
An EMC blogger recently posted on this topic and provided insights on the reduction in restore performance he sees from both the DL3D and Data Domain. He said, “I will have to rely on what customers tell me: data reads from a DD [Data Domain] system are typically 25-33% of the speed of data writes.” He then goes on to confirm that “…the DL3D performs very similarly to a Data Domain box”. He is referring to restore performance on deduplicated data in a reverse-referenced environment. (Both Data Domain and EMC/Quantum rely on reverse referencing.) He recommends that you maintain a cache of undeduplicated data on the DL3D to avoid this penalty. Of course, this brings up a range of additional questions: how much extra storage will the holding area require, how many days of data should you retain there, and what does this do to deduplication ratios?
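To put the quoted 25-33% figure in perspective, here is a rough back-of-the-envelope sketch. The 10 TB restore size and the 400 MB/s write rate are hypothetical numbers chosen purely for illustration; only the 25-33% read-to-write ratio comes from the quote above.

```python
def restore_hours(data_tb, write_mb_per_s, read_fraction):
    """Estimate restore time when reads run at a fraction of write speed.

    data_tb         -- amount of data to restore, in decimal terabytes (assumed)
    write_mb_per_s  -- sustained write (backup) rate in MB/s (assumed)
    read_fraction   -- read speed as a fraction of write speed (0.25-0.33 per the quote)
    """
    read_mb_per_s = write_mb_per_s * read_fraction
    data_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal units)
    return data_mb / read_mb_per_s / 3600


if __name__ == "__main__":
    # Hypothetical system: 10 TB to restore, 400 MB/s write rate.
    for fraction in (1.0, 0.33, 0.25):
        hours = restore_hours(10, 400, fraction)
        print(f"reads at {fraction:.0%} of write speed: {hours:.1f} hours")
```

Under these assumed numbers, a restore that would take about 7 hours at full write speed stretches to roughly 21 to 28 hours.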
The simplest solution to the above problem is to use forward referencing, but neither DD nor EMC/Quantum supports this technology. EMC’s workaround is to force the customer to use more disk to store undeduplicated data, which adds to the management burden and cost.
This reminds me of a classic quote from John “Hannibal” Smith from the A-Team:
I love it when a plan comes together!
What more confirmation do you need?
A week ago, I wrote an article highlighting how deduplication can impact restore performance and the difference between forward and reverse referencing. Many people are not familiar with these two deduplication technologies and their importance. SEPATON is the only vendor to implement forward referencing technology in a large-scale enterprise appliance, and it is important to understand why we did that.
Lauren Whitehouse from the Enterprise Strategy Group posted an article on a similar topic on Searchstorage.com on 8/11/08. It is gratifying to know that I am not the only one focused on the importance of deduplication and restore performance!
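The difference between forward and reverse referencing mentioned above can be illustrated with a toy model. This is only a sketch, and it assumes the meaning the terms usually carry: with reverse referencing, the first stored copy of a chunk stays where it is and later backups point back to it; with forward referencing, the chunk is kept with the newest backup and older backups point forward to it. The function names, the chunk labels, and the "remote reads" metric are all hypothetical and do not represent any vendor's implementation.

```python
def ingest_reverse(home, backup_id, chunks):
    """Reverse referencing: a chunk stays with the FIRST backup that stored it."""
    for chunk in chunks:
        home.setdefault(chunk, backup_id)


def ingest_forward(home, backup_id, chunks):
    """Forward referencing: a chunk moves to the NEWEST backup that contains it."""
    for chunk in chunks:
        home[chunk] = backup_id


def remote_reads(home, backup_id, chunks):
    """Count chunks that must be fetched through a pointer to another backup."""
    return sum(1 for chunk in chunks if home[chunk] != backup_id)


def simulate(ingest, backups):
    """Ingest backups in order; return the chunk -> owning-backup map."""
    home = {}
    for backup_id, chunks in backups:
        ingest(home, backup_id, chunks)
    return home


# Three nightly backups; each letter stands for one chunk of data (illustrative only).
backups = [
    ("mon", ["a", "b", "c", "d"]),
    ("tue", ["a", "b", "c", "e"]),
    ("wed", ["a", "b", "f", "e"]),
]

if __name__ == "__main__":
    for name, ingest in (("reverse", ingest_reverse), ("forward", ingest_forward)):
        home = simulate(ingest, backups)
        for backup_id, chunks in backups:
            n = remote_reads(home, backup_id, chunks)
            print(f"{name}: restoring {backup_id} follows {n} pointers")
```

In this model the most recent backup, which is the one most likely to be restored, needs no pointer lookups under forward referencing, while under reverse referencing it is the oldest backup that restores without them.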