The hidden cost of deduplicated replication

On the surface, the idea of deduplicated replication is compelling. By replicating only deltas, the technology dramatically reduces the bandwidth required to send data across a WAN. Many customers are looking to this technology to allow them to move to a tapeless environment in the future. However, there is a major challenge that most vendors gloss over.

The most common approach to deduplication in use today is hash-based technology which uses reverse referencing. I covered the implications of this approach in another post. To summarize, the issue is that restore performance is impacted as data is retained in a reverse referenced environment. Now let’s look at how this impacts deduplicated replication.

Most vendors position the technology as a tape replacement. The customer first backs up their data to a local device and then the data is replicated to a remote site. In the typical use case, customers will retain a smaller amount of data locally and will use the remote site for long-term retention. Thus the remote system will include substantially more disk capacity than the local device.

During normal operations, you would expect to access the device at the remote data center only if data has expired locally. If this is a frequent occurrence, then you should increase local retention to reduce your usage of the remote system. The most important use of the remote site is to protect against a complete disaster. In a disaster situation, the end user must rely on the remote copy of their data and begin a massive restore process. Unlike a traditional restore, where you only need to access a few files or at most a few servers, in a DR scenario you need to restore multiple servers and huge amounts of data.

The pitfall with deduplication and replication revolves around the importance of restoration. In a DR scenario, your business depends on your ability to restore data at high speed. The longer the restore time, the longer the outage and the greater the cost. The irony is that with reverse referencing, restore performance declines as retention grows, and with the long retention periods typical at the DR site, the restore impact becomes substantial. Thus data that you backed up at 300 MB/sec will be restored at closer to 100 MB/sec. For example, a 10 TB backup that takes about 10 hours to protect will take about 30 hours to restore. This is a major bottleneck and will substantially lengthen the time that it will take to fully recover your data and return to normal operation. In this scenario, your restore performance is not much better than tape!
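
If you want to check the arithmetic for your own environment, here is a minimal sketch in Python. The function name transfer_hours is just something I made up for illustration, and it assumes decimal units (1 TB = 1,000,000 MB) and a sustained, constant throughput, which real systems rarely deliver.

```python
def transfer_hours(size_tb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate.

    Assumes decimal units (1 TB = 1,000,000 MB) and constant throughput.
    """
    size_mb = size_tb * 1_000_000
    seconds = size_mb / rate_mb_per_sec
    return seconds / 3600


# Figures from the example above: a 10 TB data set,
# backed up at 300 MB/sec and restored at 100 MB/sec.
backup_hours = transfer_hours(10, 300)
restore_hours = transfer_hours(10, 100)

print(f"backup:  {backup_hours:.1f} hours")
print(f"restore: {restore_hours:.1f} hours")
```

With these inputs the backup works out to roughly 9.3 hours and the restore to roughly 27.8 hours, which is where the rounded 10 and 30 hour figures come from. Plug in your own data size and measured restore rate on retained data to see what a full DR restore might look like.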

The point here is not that deduplicated replication is a bad thing, but that, like any technology, it involves trade-offs. Restore performance is a key metric that should be evaluated in all solutions. The impact of reverse referencing should not be underestimated, particularly in the context of DR, where long-term retention is the norm. SEPATON’s forward referencing technology is designed to minimize this impact. When evaluating solutions, you must be cognizant of the impact of deduplication on restore performance, and any test plans you create should include testing of restore performance on retained data.
