I have posted in the past about the challenges of restoring data from a reverse referenced deduplication solution. In short, the impact can be substantial. You might wonder whether I am the only one pointing out this issue, and what the impact really is.
An EMC blogger recently posted on this topic and provided insights on the reduction in restore performance he sees from both the DL3D and Data Domain. He said, “I will have to rely on what customers tell me: data reads from a DD [Data Domain] system are typically 25-33% of the speed of data writes.” He then goes on to confirm that “…the DL3D performs very similarly to a Data Domain box”. He is referring to restore performance on deduplicated data in a reverse referenced environment. (Both Data Domain and EMC/Quantum rely on reverse referencing.) He recommends that you maintain a cache of undeduplicated data on the DL3D to avoid this penalty. Of course, this raises a range of additional questions: how much extra storage will the holding area require, how many days of data should you retain there, and what does this do to deduplication ratios?
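To put those percentages in perspective, here is a quick back-of-the-envelope sketch. The 10 TB data set and the 400 MB/s write rate are hypothetical numbers I picked for illustration, not figures from either vendor; only the 25-33% range comes from the quote above. The function name `restore_hours` is likewise my own.

```python
def restore_hours(data_tb: float, write_mb_s: float, read_fraction: float) -> float:
    """Hours needed to read back data_tb terabytes when reads run at
    read_fraction of the write speed (1.0 means reads match writes)."""
    read_mb_s = write_mb_s * read_fraction
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / read_mb_s / 3600


# Hypothetical inputs, chosen only to make the arithmetic concrete.
DATA_TB = 10.0
WRITE_MB_S = 400.0

backup_hours = restore_hours(DATA_TB, WRITE_MB_S, 1.0)
print(f"backup window: {backup_hours:.1f} h")

# The quoted range: reads at 33% and 25% of write speed.
for fraction in (0.33, 0.25):
    hours = restore_hours(DATA_TB, WRITE_MB_S, fraction)
    print(f"restore at {fraction:.0%} of write speed: {hours:.1f} h "
          f"({hours / backup_hours:.1f}x the backup window)")
```

Whatever numbers you plug in, the ratio is what matters: a restore that reads at 25-33% of write speed takes roughly three to four times as long as the backup that wrote the data.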
The simplest solution to the above problem is to use forward referencing, but neither DD nor EMC/Quantum supports this technology. EMC’s workaround is to force the customer to use more disk to store undeduplicated data, which adds to the management burden and cost.
This reminds me of a classic quote from John “Hannibal” Smith from the A-Team:
I love it when a plan comes together!
What more confirmation do you need?