When backing up data to a deduplication system, two types of data are generated. The first comprises the objects being protected, such as Word documents, databases, or Exchange message stores. These files will be deduplicated, and for simplicity I will call this “object storage”. The second type is metadata: the information the deduplication software uses to recognize redundancies and to re-hydrate data during a restore. Both types are critical; metadata is typically required both for writing data to the system and for reading it back. Here are four key questions that you should ask about protecting metadata.
- How is metadata stored?
- How is metadata protected?
- How do metadata storage requirements grow over time?
- What happens if the metadata is corrupted?
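To make the two kinds of data concrete, here is a minimal sketch of content-addressed deduplication. It is illustrative only, not any vendor's implementation: the chunk index and the per-file recipes are the metadata, while the unique chunk contents are the object storage.

```python
import hashlib

# Metadata: maps chunk fingerprint -> location in object storage.
chunk_index = {}
# Metadata: per-file recipe listing the fingerprints needed to re-hydrate it.
file_recipes = {}
# Object storage: the unique chunk contents themselves.
object_store = []

def backup(name, data, chunk_size=4):
    """Split data into fixed-size chunks, storing only unseen chunks."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in chunk_index:          # new chunk: consumes object space
            chunk_index[fp] = len(object_store)
            object_store.append(chunk)
        recipe.append(fp)                  # every chunk consumes metadata
    file_recipes[name] = recipe

def restore(name):
    """Re-hydrate a file from its recipe using the chunk index."""
    return b"".join(object_store[chunk_index[fp]] for fp in file_recipes[name])

backup("a.doc", b"AAAABBBBAAAA")
assert restore("a.doc") == b"AAAABBBBAAAA"
assert len(object_store) == 2  # "AAAA" is stored once; the repeat is metadata only
```

Note that a restore is impossible without the metadata, even though every byte of the file still sits in object storage.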
It is important to understand how the storage system is configured to protect this data. Simple questions like “how much storage is allocated to metadata?” can be vital, since running out of metadata space is essentially equivalent to running out of object space. (E.g., if you run out of metadata capacity, you cannot add any more items to object storage, because every new item creates metadata.) I have heard of numerous scenarios with competing systems where customers ran out of metadata space while still having ample object space available, which results in inefficient space utilization.
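That failure mode can be sketched as follows. The capacities here are hypothetical values chosen to show a full metadata index alongside nearly empty object storage:

```python
META_CAPACITY = 3       # hypothetical: index can hold only 3 entries
OBJECT_CAPACITY = 100   # hypothetical: room for 100 chunks

chunk_index = {}        # metadata: fingerprint -> slot in object storage
object_store = []       # object storage: unique chunk contents

def write_chunk(fp, chunk):
    if fp in chunk_index:
        return                      # duplicate: consumes no new space
    if len(chunk_index) >= META_CAPACITY:
        raise RuntimeError("metadata full: cannot accept new data")
    if len(object_store) >= OBJECT_CAPACITY:
        raise RuntimeError("object storage full")
    chunk_index[fp] = len(object_store)
    object_store.append(chunk)

for fp in ["a", "b", "c"]:
    write_chunk(fp, fp.encode())

# Object storage is 97% empty, yet the next unique chunk is rejected.
try:
    write_chunk("d", b"d")
except RuntimeError as e:
    print(e)   # metadata full: cannot accept new data
```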
Metadata is a critical part of the deduplication process, and its loss or corruption can be very problematic. You should understand how metadata is protected. Are backup copies made? If so, how are they created, stored, and recovered?
As mentioned above, running out of metadata space results in poor capacity utilization. You should understand how metadata is created and how it grows over time. In many implementations, its size depends on deduplication efficiency, so the more data you back up that cannot be deduplicated (e.g., compressed or encrypted data), the more metadata space you will use. Replication may also require metadata space, and you should ascertain this impact as well.
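As a rough back-of-envelope model of this effect (the chunk size, entry size, and deduplication ratios below are illustrative assumptions, not any product's actual figures): if metadata costs a fixed number of bytes per unique chunk, then data that barely deduplicates creates far more unique chunks and therefore far more metadata.

```python
def metadata_bytes(data_bytes, chunk_size, dedup_ratio, entry_bytes=64):
    """Estimate index metadata for a workload.

    dedup_ratio: e.g. 10.0 means only 1 in 10 chunks is unique.
    entry_bytes: assumed size of one index entry (fingerprint + location).
    """
    unique_chunks = (data_bytes / chunk_size) / dedup_ratio
    return unique_chunks * entry_bytes

TB = 1024 ** 4
# 100 TB of ordinary backups deduplicating 10:1 ...
typical = metadata_bytes(100 * TB, chunk_size=8 * 1024, dedup_ratio=10.0)
# ... versus 100 TB of encrypted data that barely deduplicates (1.05:1).
encrypted = metadata_bytes(100 * TB, chunk_size=8 * 1024, dedup_ratio=1.05)
print(typical / TB, encrypted / TB)
```

Under these assumptions the encrypted workload needs roughly 9.5 times as much metadata as the well-deduplicating one, for the same amount of source data.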
Since this data is critical to the operation of deduplication, you should understand what happens if you permanently lose the metadata repository due to corruption or another problem. In extreme cases this could result in a complete system loss, while in other cases the impact may be a reduction in deduplication ratios. The impact will vary by deduplication algorithm.
In summary, the creation, management, and handling of metadata is a key component of every deduplication algorithm. Many vendors prefer to gloss over the importance of metadata and focus on other areas such as deduplication ratios or post-process vs. inline processing. The reality is that the creation and management of metadata is critical, and each end user should evaluate how it affects various solutions and their business SLAs. In a future blog post, I will discuss how SEPATON addresses each of these four questions.