I was recently reading this document from CommVault that highlights their deduplication technology, and I was surprised by their use of the term “forward referencing”. Forward referencing is a common term in deduplication with a generally agreed-upon definition. CommVault appears to have redefined the term and promoted their version as a feature. This is confusing and possibly misleading, because a reader might not realize that the definition of “forward referencing” in this document is completely different from the one used everywhere else in the industry.
To summarize, all deduplication solutions need to store at least one complete copy of the data to use as a target for pointers. As new data is ingested, redundancies are identified and replaced with pointers. In the reverse-referenced model, the first backup is held in an undeduplicated state and all future backups are made up of pointers that point backward to earlier backups. The result is that the more data you retain, the slower your restore performance becomes, because you need to follow more and more pointers to reconstitute the complete data.
Forward referencing is the opposite of the reverse-referenced model. In this approach, the latest data is maintained in its entirety and the older data is made up of pointers forward to the newest data. As a new backup is ingested, the previous backup is replaced with pointers forward to the latest backup. The result is that the newest data is always maintained in its entirety on disk (i.e., non-deduplicated), thus enabling the fastest restore performance on the newest backups.
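To make the difference concrete, here is a minimal Python sketch of the two models. It is a toy illustration under my own assumptions, not any vendor's implementation: backups are lists of chunk strings, and the names `ingest_reverse`, `ingest_forward`, and `restore` are hypothetical. The `restore` function counts how many pointers it has to follow, which stands in for restore cost.

```python
# Toy model (hypothetical): a store is a list of backups; each backup is a
# list of entries, either ("data", chunk) or ("ref", (backup_id, position)).

def ingest_reverse(store, chunks):
    """Reverse referencing: the new backup points backward to earlier data."""
    index = {}  # chunk content -> location of a stored full copy
    for b, entries in enumerate(store):
        for i, (kind, value) in enumerate(entries):
            if kind == "data":
                index[value] = (b, i)
    new_id, new = len(store), []
    for chunk in chunks:
        if chunk in index:
            new.append(("ref", index[chunk]))      # pointer to older data
        else:
            index[chunk] = (new_id, len(new))
            new.append(("data", chunk))
    store.append(new)

def ingest_forward(store, chunks):
    """Forward referencing: older data is replaced by pointers to the new backup."""
    new_id = len(store)
    position = {}
    for i, chunk in enumerate(chunks):
        position.setdefault(chunk, i)
    for entries in store:
        for i, (kind, value) in enumerate(entries):
            if kind == "data" and value in position:
                entries[i] = ("ref", (new_id, position[value]))
    store.append([("data", chunk) for chunk in chunks])  # newest stays whole

def restore(store, backup_id):
    """Rebuild a backup; return (chunks, number of pointers followed)."""
    chunks, hops = [], 0
    for kind, value in store[backup_id]:
        while kind == "ref":
            b, i = value
            kind, value = store[b][i]
            hops += 1
        chunks.append(value)
    return chunks, hops

backups = [["a", "b", "c"], ["a", "b", "d"], ["a", "e", "d"]]
reverse_store, forward_store = [], []
for backup in backups:
    ingest_reverse(reverse_store, backup)
    ingest_forward(forward_store, backup)

print("reverse, newest:", restore(reverse_store, 2))
print("forward, newest:", restore(forward_store, 2))
```

In this toy model, restoring the newest backup from the forward-referenced store follows no pointers at all, while the reverse-referenced store has to resolve a pointer for every chunk it has seen before. The cost moves to the oldest backups in the forward model, which is the trade-off described above.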
It is in the context of the above that CommVault introduces their concept of “Automatic forward referencing.” Their datasheet indicates that this term refers to a reverse-referenced model in which they limit the number of pointers; when that limit is reached, they store a new full backup (that is not deduplicated) and deduplication starts over.
There is a misunderstanding here. No matter how many full backups CommVault lays down, their technology is still reverse referenced. Reverse referencing defines how pointers are created and where they point; it has nothing to do with the number of deduplicated or non-deduplicated full backups. CommVault appears to have decided to ignore this definition and created their own, inconsistent one.
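The point can be illustrated with another small Python sketch. This is only a toy model of the behavior as I read it in the datasheet (a pointer limit followed by a fresh full backup), not CommVault's actual implementation; `run_capped_reverse`, `pointer_directions`, and the `max_refs` parameter are hypothetical names of my own.

```python
def run_capped_reverse(backups, max_refs):
    """Reverse referencing with a pointer cap (hypothetical toy model).

    When the next backup would push the pointer count past max_refs,
    it is stored in full and deduplication starts over from it.
    """
    store = []
    index = {}       # chunk content -> (backup_id, position) of a full copy
    refs_used = 0
    for chunks in backups:
        backup_id = len(store)
        needed = sum(1 for chunk in chunks if chunk in index)
        if refs_used + needed > max_refs:
            index = {}       # forget older copies: dedup starts over
            refs_used = 0
        entries = []
        for pos, chunk in enumerate(chunks):
            if chunk in index:
                entries.append(("ref", index[chunk]))
                refs_used += 1
            else:
                entries.append(("data", chunk))
                index[chunk] = (backup_id, pos)
        store.append(entries)
    return store

def pointer_directions(store):
    """Return the set of directions in which pointers in the store point."""
    directions = set()
    for backup_id, entries in enumerate(store):
        for kind, value in entries:
            if kind == "ref":
                target_backup = value[0]
                directions.add("forward" if target_backup > backup_id else "backward")
    return directions

store = run_capped_reverse([["a", "b"]] * 5, max_refs=4)
full_backups = [i for i, entries in enumerate(store)
                if all(kind == "data" for kind, _ in entries)]
print("full backups:", full_backups)
print("pointer directions:", pointer_directions(store))
```

In this model the cap produces extra full backups, but every pointer still points backward to an earlier backup. The cap changes how many pointers exist, not the direction in which they point.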
CommVault has chosen to highlight their ability to automatically limit pointers and store more undeduplicated backups. It appears that this would lower the overall deduplication ratio, but it is their product and they can do what they want. I find it troubling and misleading that they are misusing an industry-standard term and promoting a feature that they do not have.
Am I the only one confused about this?