Scott from EMC has challenged SEPATON’s advertised performance for backup, deduplication, and restore. As industry analyst W. Curtis Preston so succinctly put it, “do you really want to start a ‘we have better performance than you’ blog war with one of the products that has clustered dedupe?” Even so, I want to clarify the situation in this post.
Let me respond to his points one at a time:
1. The performance data you refer to with the link in his post three words in is both four months old, and actually no data at all.
SEPATON customers want to know how much data they can back up and deduplicate in a given day. That is what matters in real-life use of the product. The answer is 25 TB per day per node. If a customer has five nodes and a twenty-four-hour day, that’s 125 TB of data backed up and deduplicated. This information has been true and accurate for four months and is still true today.
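For anyone who wants to plug in their own node count, the arithmetic above can be sketched in a few lines of Python. The function name `daily_capacity_tb` is purely illustrative, and the sketch assumes throughput scales linearly with the number of nodes, as the five-node example does.

```python
def daily_capacity_tb(nodes: int, tb_per_node_per_day: float = 25.0) -> float:
    """TB backed up and deduplicated in one 24-hour day.

    Assumes linear scaling across nodes (an assumption of this sketch).
    The 25 TB default is the per-node, per-day figure quoted in the post.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * tb_per_node_per_day


if __name__ == "__main__":
    # Five nodes over a twenty-four-hour day, as in the example above.
    print(daily_capacity_tb(5))  # 125.0
```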
2. Sepaton claims the same performance for deduplication on and off. Which is not realistic. They make no mention of what happens when you do both tasks (ingest and deduplication) simultaneously.
Our numbers are conservative and include ingest, deduplication, and house-cleaning tasks. Backup alone is even faster than that. We use 25 TB per node per day as our metric so our customers get the bottom-line information they need to make a sound business decision about how fast we can get their data backed up and deduplicated. They don’t care if it backs up slightly faster than that.
As for dedupe off: a SEPATON VTL with 16 nodes can back up at 34.5 TB per HOUR.
3. Just like TSM deduplication. Coming soon.
TSM deduplication is GA.
4. They do have an issue restoring from deduplicated data, they just don’t want to discuss it.
Not sure where you get this information. We use a process called forward referencing, so our restore speed from deduplicated data is the fastest in the industry. We store the most current data in full and deduplicate older data with pointers forward in time to it. The processing load of restoring day-old or week-old data is minuscule. Think about it: the newer the data, the less reconstitution is necessary. That is just the opposite of the EMC solution. In fact, data would have to be ready for cold storage (six months) before it was old enough to slow a SEPATON VTL down to EMC’s restore speed. Even then, it would likely be faster.
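To make the idea of forward referencing concrete, here is a toy Python model. It is only a sketch of the general technique and not a description of how a SEPATON VTL is implemented: the class `ForwardRefStore`, its methods, and the fixed-size block layout are all hypothetical. The newest backup is kept in full, and each older backup keeps only the blocks that differ from the next newer one; everything else is an implicit pointer forward in time.

```python
class ForwardRefStore:
    """Toy forward-referencing store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # One dict per backup generation, oldest first: block index -> data.
        # A missing index means "same as the next newer generation".
        self.generations: list[dict[int, bytes]] = []
        self.block_count = 0

    def backup(self, blocks: list[bytes]) -> None:
        if self.generations:
            if len(blocks) != self.block_count:
                raise ValueError("this sketch assumes a fixed block count")
            previous = self.generations[-1]
            # Demote the previous newest backup: drop every block that the
            # new backup also contains, leaving a pointer forward instead.
            self.generations[-1] = {
                i: data for i, data in previous.items() if blocks[i] != data
            }
        else:
            self.block_count = len(blocks)
        # The newest generation is always stored in full.
        self.generations.append(dict(enumerate(blocks)))

    def restore(self, generation: int) -> tuple[list[bytes], int]:
        """Return the blocks of a generation and the pointer hops needed."""
        restored, hops = [], 0
        for i in range(self.block_count):
            g = generation
            # Follow pointers forward until a stored copy of the block is found.
            while i not in self.generations[g]:
                g += 1
                hops += 1
            restored.append(self.generations[g][i])
        return restored, hops


if __name__ == "__main__":
    store = ForwardRefStore()
    store.backup([b"a", b"b", b"c"])
    store.backup([b"a", b"x", b"c"])
    store.backup([b"a", b"x", b"y"])
    print(store.restore(2))  # newest backup: no pointers to follow
    print(store.restore(0))  # oldest backup: some blocks resolved forward
```

In this toy model, restoring the newest backup follows no pointers at all, and the reconstitution work grows only as the requested backup gets older, which is the behavior described above.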
5. We have the ability to size how big the cached data pool is.
You highlight that EMC’s restore performance from the deduplication repository is a whopping 75% slower than ingest. The DL3D has the ability to size the cached data pool because it needs a cached data pool. SEPATON doesn’t. We restore as fast as we back up. Period. Even if the data is months old, we restore it nearly as fast as we backed it up. The slowdown is smaller than a rounding error.