Interesting Performance Data
NetApp announced some interesting SPC-1 benchmark data today. They compared their mid-level FAS 3040 with the EMC Clariion CX 3 Model 40. The results are published here:
Interesting points:
- NetApp's configuration ran RAID-6. I'm not sure I've ever seen another storage benchmark where someone ran anything but RAID 1 or 10, because those are the highest performing. That also means that by default they get much better utilization of the space, which plays into a lower overall cost (though all that's listed are list prices, and lord knows you have to be from Pluto to pay list price).
- Both systems ran roughly equivalent natively – though NetApp did outperform the Clariion, 31,000 to 25,000 IOPS. Neither is setting the world on fire, but with fewer disks running RAID-6 they were able to beat a pure block device – in a pure block test. I've heard them say it before, but I guess I always had my doubts that it was possible to do that running through a file system.
- If you use the systems the way they were intended – with snapshots – NetApp squished the Clariion. On their own box they created a new snap every 15 minutes, keeping a rolling set of three and deleting the oldest as they went. Their performance stayed pretty much the same. The Clariion created only 2 snapshots during the 3 hour test, and deleted 1 – and lost over 60% of its performance – going from 25,000 to 9,000 IOPS. Yikes.
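A quick back-of-the-envelope check on the numbers above, as a minimal sketch. It uses only the rounded IOPS figures quoted in this post, and it assumes the NetApp run lasted the same 3 hours as the Clariion test, with one snapshot taken at the end of each 15 minute interval. The function names are mine, not anything from the SPC report.

```python
from collections import deque


def percent_drop(before: float, after: float) -> float:
    """Percentage of performance lost going from `before` to `after`."""
    return (before - after) * 100.0 / before


def percent_lead(winner: float, loser: float) -> float:
    """How far ahead `winner` is, as a percentage of `loser`."""
    return (winner - loser) * 100.0 / loser


def rolling_snapshots(test_minutes: int, interval_minutes: int, keep: int):
    """Simulate a rolling snapshot schedule.

    Assumes one snapshot at the end of every interval and deletion of the
    oldest snapshot whenever more than `keep` are retained.
    Returns (created, deleted, minutes at which retained snapshots were taken).
    """
    retained = deque()
    deleted = 0
    for minute in range(interval_minutes, test_minutes + 1, interval_minutes):
        retained.append(minute)
        if len(retained) > keep:
            retained.popleft()  # drop the oldest snapshot
            deleted += 1
    return len(retained) + deleted, deleted, list(retained)


if __name__ == "__main__":
    # Rounded figures quoted in the post.
    print(f"Clariion drop with snapshots: {percent_drop(25_000, 9_000):.0f}%")
    print(f"NetApp lead without snapshots: {percent_lead(31_000, 25_000):.0f}%")

    created, deleted, kept = rolling_snapshots(180, 15, keep=3)
    print(f"NetApp snapshots created: {created}, deleted: {deleted}, kept: {kept}")
```

By these rounded figures the Clariion lost 64% of its throughput, and the NetApp box started out 24% ahead. Under the schedule assumed here, the NetApp run would have created 12 snapshots and deleted 9 of them, against 2 created and 1 deleted on the Clariion.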
Two takeaways:
A: NetApp appears to have legit block performance, and shouldn't be dismissed because people (like me) presume it can't be true. It looks like it's true. Better yet, they don't lose anything by creating, using, and deleting snapshots, one of the most important functions any storage manager has at their disposal.
B: EMC has some work to do on their own snapshots.
Disclaimers: I don't know the configuration and use case and fully expect EMC to pooh-pooh it in some way. I expect you'll see commentary or questions as to how full the NetApp filer was during the test, as EMC will tell everyone that if the file system is 80% full then the performance of the NetApp box will stink. I don't know if that's true, but it is what they will say. I welcome the logic and argument.
There is one guy who is the only authorized "independent" auditor of these tests, and I have no idea who he is or what his motivation is, other than the monopoly he appears to have created for himself. I'm jealous. I don't care so much about the overall numbers; what I do care about is finding out that NetApp can really run block services – not only without sucking pond water but beating a pretty darn good machine at its own game. I always knew the NetApp snapshot was a killer feature, but also sort of assumed it would take more of a toll than it apparently does.
EMC doesn't participate in performance benchmarking – and I can't say that I blame them. Someone can always build a better mousetrap designed to win a benchmark. It's chasing ghosts. Anytime a vendor can do their own benchmarking, people will be suspicious. Having said that, I don't think NetApp cares about "beating" the CX – I'm sure they are happy just to show that popular opinion (or at least mine) is completely wrong – they clearly have legitimate block performance. They want a seat at the table, and this clearly shows they deserve that shot. The fact that they are demonstrating that their snap technology is 99% efficient and the Clariion's so inefficient is the most interesting thing to come out of this (I'll happily let EMC show me how I'm interpreting this incorrectly). It was my friend Chuck Hollis of EMC who blogged on the silliness of SPC and other benchmarks back in August 2007, saying, "We've never done an SPC test, and probably will never do one. Anyone is free, however, to download the SPC code, lash it up to their CLARiiON, and have at it." I happen to agree with the overall assertion of the blog, but don't think Chuck could have figured those tricky NetApp folks would come at it from this angle. If performance isn't really an issue, then the focus moves to functionality, and snap is one of the most used (for good reason) functions in the storage world.
Finally, my own guys are going to stick their noses into this one. Tony Palmer, an ESG Lab guy and former EMCer, has already been telling me that the Clariion snapshot needs to be tuned to get good performance, but my point is that if that is true, why not make it known? This was a customer-installable configuration, I assume.
So let's go to work, people. I need a little pre-Super Bowl excitement.



Hi Steve -- glad you found this as "interesting" as I did.
There are different levels to this that I think most people would find interesting.
The big question in our mind is "what trick did they pull?". We know EMC products, we know NetApp's. Inside and out.
Just like the magician on stage, we know there's probably some sleight-of-hand involved, we're just not sure what they did.
And, of course, no one's talking.
Anything you could do to shed some light on the matter from ESG's impartial stance would be useful to the industry and to customers, I'd offer.
Of course, I blogged my initial reaction, but -- again -- I'm looking at it from my own (and EMC's) perspective.
Even maybe getting into the business of running the SPCs yourself just to get the vendor creativity out of the mix.
Thanks!
Posted by: Chuck Hollis | January 30, 2008 at 10:45 AM
Hi Steve,
Your earlier perceptions around NetApp performance were precisely why we went ahead and conducted these head to head tests using the most neutral and respected 3rd party in the industry (SPC).
Your point #2 above reminds me I need to blog on the difference between our block I/O control path for LUN management (using a convenient & familiar filesystem namespace) and our data path for block I/O which has no filesystem semantics or "overhead" at all in the way - hence our excellent performance proven here.
We look forward to working with Tony Palmer to answer all your collective questions.
Finally, as per my own blog on Tuesday the 29th (http://blogs.netapp.com/exposed/2008/01/a-brief-history.html) - none of this is surprising to NetApp customers, partners or staff. That's because these SPC results are consistent with all other SAN or NAS performance results / benchmarks we have published over the past decade I have been with NetApp.
Posted by: Val Bercovici | January 30, 2008 at 10:49 AM
"EMC doesn't participate in performance benchmarking"
Steve - an interesting point - EMC makes this claim a lot but it's untrue. They are an active participant in SPEC for NAS. So they just don't put Clariion to the test...
See: http://www.spec.org/sfs97r1/results/sfs97r1.html
Posted by: Taylor Allis | January 30, 2008 at 11:40 AM
Taylor brings up a good point...
If EMC submits SPEC NFS benchmarks for Celerra kit and ESRP benchmarks to impress would-be Exchange customers;
See: http://technet.microsoft.com/en-us/exchange/bb412165.aspx
Then I'm more than a little confused with EMC's stance on benchmarks!?!?
Posted by: Chris Banes | January 31, 2008 at 08:55 PM
OK, so I have a couple of questions and a comment on this topic.
1. Were the NetApp snapshots read-only, or read-write? There's a big difference so it would be interesting to know.
2. Were the snapshots actually used for anything while the SPC benchmarks were running?
Why do I ask? Because the NetApp model for snapshots (read/write snapshots especially) means that when I take a snapshot of a production system, mount it on a test system, and start pounding the database on that test system they are probably going to feel it in a big way on the production system. EMC's approach doesn't suffer nearly as much from that issue. They have plenty of other issues, but at least killing my production system because of something that's happening on a test system isn't one of them.
--joerg
Posted by: Joerg Hallbauer | February 01, 2008 at 06:38 PM
If you consider snapshots, RAID-6, performance, and cost, Sun's X4500 with ZFS is very interesting. I haven't benchmarked it but it has provided a 100 MB video stream to Windows-based servers at a cost about an order of magnitude cheaper. After having laughed at Sun's storage for so long, it at least looks worth a try.
Posted by: Frank Palone | February 02, 2008 at 04:25 PM
Disclaimer: I'm a NetApp employee.
In reply to Joerg:
1. Were the NetApp snapshots read-only, or read-write? There's a big difference so it would be interesting to know.
There's no difference in performance or capacity between a read-only snapshot (as used in the SPC test) and writeable snapshots; unless (of course) the snapshot is used. And because of the NetApp system's architecture, there's no additional performance overhead when reading or writing to the snapshot as compared to the underlying LUN.
2. Were the snapshots actually used for anything while the SPC benchmarks were running?
No. Just as well in the case of the EMC box; where the NetApp system incurred a 3% overhead, with multiple 15 minute snapshots, the EMC system fell off a cliff with only 1 snapshot, and only 1 per hour.
Killing your production system with EMC is easy; just turn on snapshots in the first place. Using them becomes an academic exercise. I would have loved to have seen a test where the snapshots -- writeable or otherwise --- were used; but I suspect that the numbers would have been further confirmation that the CLARiiON's copy-on-first-write snapshots aren't enterprise strength technology.
Posted by: Alex McDonald | February 03, 2008 at 08:59 AM
Disclaimer - I am a NetApp employee, my opinions are not impartial, I believe in the superiority of our technology.
Joerg states - "the NetApp model for snapshots (read/write snapshots especially) means that when I take a snapshot of a production system, mount it on a test system, and start pounding the database on that test system they are probably going to feel it in a big way on the production system. EMC's approach doesn't suffer nearly as much from that issue."
1. When you say pounding the database, what workload are you talking about? Writes in a NetApp environment don't generally suffer from spindle contention, and read hotspots are generally cached very effectively. If you _really_ hammered the database, both arrays would probably degrade to unacceptable levels because of controller limitations rather than spindle contention.
2. If you want to have your test/dev data on completely different spindles, you can do that with NetApp too by splitting the FlexClone, or by using SnapMirror to a different disk aggregate. Nobody I know of does this; not because of cost, but because it doesn't seem to be necessary.
3. In my experience with both EMC and NetApp environments like the ones you describe where you hammer a mirrored data set, this is generally done using Test/DR equipment on remotely mirrored data. SnapMirror makes this process easy and much less expensive than EMC equivalents.
4. Pointer-based snapshots are generally much more useful than split mirrors, not just because of the costs ($$, power, cooling, rackspace etc), but because of their instantaneous nature. With NetApp snapshots you can make hundreds of snapshots in no time to accelerate application Test/Dev and DR testing.
I'm sure there are edge cases where a Clariion will be a better solution for a given workload, but most of the time, you'll be better off with NetApp.
Posted by: John Martin | February 26, 2008 at 11:00 PM