Defragmentation, rehydration and deduplication

W. Curtis Preston recently blogged about The Rehydration Myth. In his post he discusses how restore performance on deduplicated data declines because of the way fragmented deduplicated data must be reassembled from disk. He also addresses the ways various technologies attempt to overcome these issues, including disk caching, forward referencing (used by SEPATON’s DeltaStor technology) and built-in defrag. In this post I want to discuss the last option, because it is a widely used approach for inline deduplication that has some little-known pitfalls.

Defrag is resource intensive
Despite the benefits of defragmentation technology highlighted by vendors, the actual restore-time improvement it delivers depends on the deduplication technology it is paired with, and it always requires substantial processing resources. Defrag typically requires numerous reads and writes to the filesystem as well as frequent access to the deduplication database, causing significant processing overhead and slowing system performance. Some technologies throttle the process, which reduces the impact but lengthens the time that the system is affected. It forces you to choose between very slow performance for a shorter time and less slow performance for a longer time.
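That trade-off can be put in rough numbers. The sketch below is a toy model of the throttling choice; all figures (total I/O needed, system throughput, throttle fractions) are illustrative assumptions, not measurements from any vendor’s system.

```python
# Toy model of the defrag throttling trade-off: the more I/O the defrag
# process is allowed to consume, the sooner it finishes, but the less
# bandwidth is left for backups and restores while it runs.

def defrag_impact(total_io_gb, system_io_gbps, throttle_fraction):
    """Return (defrag_duration_hours, io_fraction_left_for_backups).

    throttle_fraction is the share of system I/O the defrag process
    is allowed to consume.
    """
    defrag_throughput = system_io_gbps * throttle_fraction  # GB/s for defrag
    duration_hours = total_io_gb / defrag_throughput / 3600
    return duration_hours, 1.0 - throttle_fraction

# Unthrottled: the window is short, but backups crawl while it runs.
fast = defrag_impact(total_io_gb=10_000, system_io_gbps=1.0, throttle_fraction=0.9)

# Throttled: backups keep most of their bandwidth, but the window stretches.
slow = defrag_impact(total_io_gb=10_000, system_io_gbps=1.0, throttle_fraction=0.3)

print(f"unthrottled: {fast[0]:.1f} h, {fast[1]:.0%} I/O left for backups")
print(f"throttled:   {slow[0]:.1f} h, {slow[1]:.0%} I/O left for backups")
```

Either way the same total work gets done; throttling only changes how long the system spends in a degraded state.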

The other challenge is that defragmentation activities are typically scheduled. Most systems require you to set the window for when the defragmentation operation happens. Data Domain defaults to 6:00 AM on Tuesdays. (Do they have something against Tuesdays? 😀 ) The customer must understand when the operation is scheduled and ideally minimize system activity during the window. This can be a challenge in rapidly changing enterprise environments. This process is typically performed in conjunction with a high-overhead deduplication process that Data Domain calls housekeeping; other inline systems have similar processes. During housekeeping – which can take as long as 14 hours – the deduplication software acknowledges expired data, updates pointers, eliminates duplicate data from the disk, and, as the name suggests, cleans up its databases.
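The housekeeping steps described above can be sketched in miniature as garbage collection over a reference-counted chunk store: drop expired backups’ pointers, then reclaim chunks no surviving backup references. The class and method names below are hypothetical illustrations, not Data Domain’s implementation.

```python
# Minimal sketch of inline-dedupe "housekeeping": expire backups,
# decrement chunk reference counts, reclaim unreferenced chunks.

from collections import Counter

class ChunkStore:
    def __init__(self):
        self.chunks = {}            # chunk_id -> stored data
        self.backups = {}           # backup_id -> list of chunk_ids
        self.refcounts = Counter()  # chunk_id -> number of referencing backups

    def write_backup(self, backup_id, data_chunks):
        pointers = []
        for chunk in data_chunks:
            cid = hash(chunk)       # stand-in for a content fingerprint
            if cid not in self.chunks:
                self.chunks[cid] = chunk   # new data: store it once
            self.refcounts[cid] += 1       # duplicate: just add a pointer
            pointers.append(cid)
        self.backups[backup_id] = pointers

    def housekeeping(self, expired_backup_ids):
        # 1. Acknowledge expired data: drop the expired backups' pointers.
        for bid in expired_backup_ids:
            for cid in self.backups.pop(bid, []):
                self.refcounts[cid] -= 1
        # 2. Eliminate data that no surviving backup references.
        for cid in [c for c, n in self.refcounts.items() if n == 0]:
            del self.chunks[cid]
            del self.refcounts[cid]

store = ChunkStore()
store.write_backup("mon", [b"a", b"b", b"c"])
store.write_backup("tue", [b"b", b"c", b"d"])   # b and c deduplicate
store.housekeeping(["mon"])                      # only chunk a is reclaimed
print(len(store.chunks), "chunks remain")
```

Even in this toy version, the cleanup has to walk every expired backup’s pointer list and scan reference counts, which hints at why the real process is so I/O- and metadata-intensive.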

Defrag may not be beneficial
This seems counter-intuitive, but in some implementations defrag will actually slow restore performance, particularly when it is performed after cleaning and housekeeping are completed. I know of an enterprise customer that ran into this in its test environment. The customer performed 20 weeks of full and incremental backups with the defrag or “clean” process disabled and then tested restore performance. They then ran the same restore test after running “clean” three times. The results are shown in the graph below.

Restore Performance After Clean

The customer’s restore performance declined by about 40 percent after cleaning. This is because the clean process replaced data with pointers, increasing the number of pieces of data that had to be retrieved and reassembled during a restore. In effect, the process actually adds to the data fragmentation. Its benefit is that it increased deduplication ratios, but at the cost of a dramatic reduction in restore performance.
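A toy illustration of that effect: when “clean” replaces a backup’s duplicate chunks with pointers to older copies scattered elsewhere on disk, a restore must visit more distinct on-disk regions, and each break between regions costs a seek. The chunk positions below are made up for illustration; they are not the customer’s measurements.

```python
# Count how many contiguous on-disk regions a restore must read;
# every break between regions costs an extra disk seek.

def regions_to_read(chunk_locations):
    """Count contiguous runs in an ordered list of on-disk positions."""
    runs = 1
    for prev, cur in zip(chunk_locations, chunk_locations[1:]):
        if cur != prev + 1:   # gap between chunks -> a new region (a seek)
            runs += 1
    return runs

# Before clean: the latest full backup was written sequentially.
before = list(range(100, 110))            # 10 chunks, one contiguous run

# After clean: duplicates now point at older copies spread across disk.
after = [100, 3, 101, 47, 102, 5, 103, 81, 104, 22]

print(regions_to_read(before), "region(s) to read before clean")
print(regions_to_read(after), "regions to read after clean")
```

Same ten chunks of data either way; the only thing that changed is where they live, and that alone is enough to turn a streaming read into a seek-bound one.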

The key takeaway is that end users must carefully review their deduplication solutions and understand how backup and restore performance changes over time. Testing a single backup and restore is not an accurate representation of the long-term performance of the device. The concepts of disk caching, forward referencing and defragmentation may seem unimportant, but they can have a major long-term impact on the manageability and performance of a device.


10 Responses to “Defragmentation, rehydration and deduplication”

  1. Test everything, believe nothing.

And copying 300 GB to and from a dedupe device is a worthless test. They ALL look good when you do that. You have to mimic what you’re actually going to do with it in real life in terms of throughput, capacity, and retention. Anything less isn’t testing; it’s testing your patience. 😉

  2. And BTW, the point of my post was not that defragging from any particular vendor fixed anything, but that the core problem is fragmentation, not rehydration. Rehydration is a myth. All a dedupe system is doing is reading a bunch of blocks on disk when it is restoring. They’re either getting those blocks from a few places (low fragmentation, mostly disk reads and few disk seeks, good restore speeds), or a lot of places (high fragmentation, lots of disk seeks, bad restore speeds). And the only way you’re going to see the difference is to test, test, test.

  3. Thank you for your comments. I agree with you. My point was simply that just because a system “defragments” or “cleans” does not mean that performance will improve. In many cases, these processes are used for housekeeping activities that have nothing to do with improving backup or recovery speeds. The benefit of defrag (or lack thereof) will vary depending on deduplication solution and it is imperative for end users to test this as part of any evaluation.

  4. Defragmentation is a process that reduces the amount of fragmentation in file systems. It does this by physically organizing the contents of the disk to store the pieces of each file close together and contiguously. It also attempts to create larger regions of free space using compaction to impede the return of fragmentation. Some defragmenters also try to keep smaller files within a single directory together, as they are often accessed in sequence. The movement of the hard drive’s read/write heads over different areas of the disk when accessing fragmented files is slower compared to accessing a non-fragmented file in sequence, without moving the read/write heads.

