The IBM/Diligent TS7650G uses a pattern-matching approach to deduplication, which differs from the hash-based solutions used by many vendors and from the ContentAware™ approach pioneered by SEPATON.
Diligent’s technology requires Fibre Channel (FC) drives for the best performance because pattern matching is highly I/O intensive and depends on the higher random I/O rates that FC drives deliver. FC drives, in turn, reduce disk density, draw more power and dramatically increase the price of the system.
The pattern-matching technology used in the TS7650G is an inline process, so all duplicate data has to be identified before data is committed to disk. Pattern matching provides only an approximate match on redundant data and requires a byte-level compare to verify the redundancy. All byte-level compares must be completed before any data is written to disk and before the next piece of data is accepted. FC drives are required because they provide the random I/O performance needed to handle inline byte-level comparisons. Diligent specified a 110-disk FC array for the ESG performance whitepaper that it sponsored in July 2006. (Local copy of the ESG whitepaper.) This is not to say that the algorithm will not work with SATA drives, but SATA will dramatically reduce performance.
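To make the inline flow described above concrete, here is a minimal Python sketch. It is not Diligent's algorithm: the `signature` function is a deliberately crude, hypothetical stand-in for the proprietary pattern matching, and `InlineStore`, `ingest` and `verify_reads` are names invented for illustration. It only shows the general shape of the process: an approximate match nominates a candidate, a byte-level compare against the stored copy confirms or rejects it, and the chunk is not resolved until that compare has finished.

```python
from typing import Dict, List, Tuple

Signature = Tuple[int, bytes]


def signature(chunk: bytes) -> Signature:
    """Cheap, approximate fingerprint (hypothetical): length plus every 16th byte.

    Different chunks can share a signature, so a match is only a hint.
    """
    return (len(chunk), chunk[::16])


class InlineStore:
    """Toy inline deduplicating store; the list `disk` stands in for the disk array."""

    def __init__(self) -> None:
        self.disk: List[bytes] = []
        self.index: Dict[Signature, List[int]] = {}  # signature -> stored chunk ids
        self.verify_reads = 0  # reads spent on byte-level verification

    def ingest(self, chunk: bytes) -> int:
        """Resolve one chunk and return the id it is stored under.

        The call returns only after every candidate has been verified,
        which mirrors the inline constraint: no new data is accepted
        until the current chunk is settled.
        """
        sig = signature(chunk)
        for chunk_id in self.index.get(sig, []):
            self.verify_reads += 1            # candidate must be read back from disk
            if self.disk[chunk_id] == chunk:  # byte-level compare confirms the match
                return chunk_id               # duplicate: nothing new is written
        # No verified duplicate: write the chunk and index its signature.
        self.disk.append(chunk)
        new_id = len(self.disk) - 1
        self.index.setdefault(sig, []).append(new_id)
        return new_id
```

In this sketch every approximate hit costs a read of previously stored data, whether or not the compare succeeds. On a real system those reads land at scattered locations on disk, which is the random I/O load the paragraph above attributes to inline byte-level comparison.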
If you are considering the TS7650G, you must carefully consider the associated disk subsystem. It is not clear what disk system and capacity were used when IBM/Diligent generated their performance specifications. As part of the evaluation, you should also test both single-stream and aggregate backup performance because, as previously mentioned, single-stream performance may be a challenge.