Initial ZFS Performance Stats

Some people have been asking for performance stats from our zpool. We’re still working on a rigorous testing methodology that will produce reliable and relevant numbers, but for those of you who cannot wait, here are a few tidbits!

We’ve chosen FileBench for our initial benchmarking because it seems to be pretty popular, and is relatively easy to set up and get results from quickly. All of these tests were run with the default settings.
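
For reference, here is a rough sketch of how these workloads are typically driven from the FileBench interactive shell (the dataset path below is a placeholder, not our actual mount point):

    filebench> load varmail            # or "load randomrw" for the random read/write test
    filebench> set $dir=/tank/bench    # point the workload at the pool under test
    filebench> run 60                  # run for 60 seconds and print the IO Summary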

20-Drive Mirrored Vdevs (10 two-drive mirrors striped together)
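
A pool laid out this way is just a series of two-way mirror groups on the zpool create line; a minimal sketch with placeholder pool and device names, showing only three of the ten mirror pairs:

    zpool create tank \
        mirror c0t0d0 c0t1d0 \
        mirror c0t2d0 c0t3d0 \
        mirror c0t4d0 c0t5d0
    # ...repeat "mirror diskA diskB" for the remaining seven pairs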

Random Read/Write (randomrw test)
IO Summary: 1581822 ops 158173.0 ops/s, 98617/59556 r/w 1235.6mb/s, 15uscpu/op

Varmail
IO Summary: 214671 ops 5366.0 ops/s, 825/826 r/w 19.4mb/s, 140uscpu/op

20-Drive Mirrored Vdevs with Two Intel X25-E ZIL Drives (Mirrored)

Random Read/Write (randomrw test)
IO Summary: 6230755 ops 155758.9 ops/s, 96568/59190 r/w 1216.8mb/s, 16uscpu/op

Varmail
IO Summary: 1500998 ops 37522.5 ops/s, 5773/5773 r/w 135.8mb/s, 93uscpu/op

20-Drive Mirrored Vdevs with Two Intel X25-E ZIL Drives (Mirrored) and Two Intel X25-M G2 Cache Drives

Random Read/Write (randomrw test)
IO Summary: 6295204 ops 157373.8 ops/s, 97506/59868 r/w 1229.5mb/s, 15uscpu/op

Varmail
IO Summary: 1489818 ops 37243.4 ops/s, 5730/5730 r/w 134.5mb/s, 93uscpu/op
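
For anyone recreating these configurations, the ZIL and cache devices in the last two setups are attached to the pool roughly like this (device names are placeholders for the Intel SSDs):

    zpool add tank log mirror c1t0d0 c1t1d0    # mirrored X25-E pair as the log (ZIL) device
    zpool add tank cache c1t2d0 c1t3d0         # X25-M G2 drives as L2ARC cache devices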

We’re planning more in-depth testing, along with some interesting tweaks to the test setup. Just a hint: some of the random read/write tests are hitting almost 3000MB/sec. More to follow!

Monday, May 24th, 2010 Benchmarks

21 Comments on Initial ZFS Performance Stats

  • rens says:

    Wow, I already think the 1235.6mb/s is a lot for 20 drives 🙂 With 7 drives (+1 hot spare) in hardware RAID 10, I’m getting 290mb/s (same WD drives). Scaling that up, 20/7 * 290 = 829mb/s, so ZFS is performing about 50% better in this small test. The SSD benchmarks are crazy, but the performance in practice will be more important.

  • admin says:

    Yeah, the ARC and L2ARC really seem to help. 158k IOPS is really awesome, and there is no way 20 SATA drives could do that without the caching built into ZFS. On a standard hardware RAID controller in a non-ZFS server, those same 20 drives would probably only be able to deliver around 2k IOPS.
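
    If you want to see how much the ARC is actually contributing on your own box, the hit/miss counters are exposed through kstat on OpenSolaris; a quick example:

        # dump the ARC hit/miss counters and the current cache size
        kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses zfs:0:arcstats:size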

  • rens says:

    Yes, but even without the SSDs you are hitting 1200mb/s, if I read it correctly. That is much better than our hardware RAID (Areca) setups would get with 20 disks.

    How are you planning to back up the ZFS SAN, by the way? Are you going to mirror it to a cheaper setup?

  • admin says:

    We will probably mirror to a cheaper setup, since the backup target won’t need nearly as much performance. For example, we won’t need SSD drives in the backup target and that will save some money.

  • rens says:

    Hope you can write a post on that when it’s ready. If I remember correctly, ZFS has some kind of push function to send all updates, but you can’t keep it 100% in sync (like DRBD can).

    Thanks again for all the great posts!

  • admin says:

    One way to mirror is to do snapshots and then push the changes to another node using SSH. It is not perfect, but it is cheap and easy. We are still considering other options for mirroring between multiple ZFS nodes. We plan to post an article about the topic at some point.
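
    A minimal sketch of that snapshot-and-push approach, with made-up pool, dataset, and host names:

        # take a snapshot and seed the backup node with a full copy
        zfs snapshot tank/data@monday
        zfs send tank/data@monday | ssh backuphost zfs receive backup/data

        # later, send only the blocks that changed since the previous snapshot
        zfs snapshot tank/data@tuesday
        zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs receive backup/data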

  • […] you are curious about what kind of performance to expect from this configuration, check out our initial FileBench benchmarks. Thursday, June 3rd, 2010 RAID, […]

  • Matt says:

    Would you mind showing a varmail test over InfiniBand? In the process of setting up our own ZFS box, we’re getting 33k IOPS for the varmail test on the filer, but only 4k over iSCSI (at a transfer rate of 13MB/s). All other tests saturate our gig-E links. I am curious to see if you also see drastically reduced varmail results over iSCSI.

  • admin says:

    We are still working on the InfiniBand network; it is not yet working as well as we had hoped. Once we have it working better, we’ll run some additional tests.

    As a general rule, we won’t be able to obtain performance levels over iSCSI (even with IB) that we see locally. The numbers posted on this page are from FileBench running directly on the OpenSolaris box.

  • screamingservers says:

    Are those big or little M’s? You seem to be using MB and mb interchangeably.

  • admin says:

    screamingservers: The M’s and B’s in the output are copied and pasted directly from the output of the FileBench benchmark tool. I think the FileBench output is in MBytes/s, but feel free to look through the FileBench docs to confirm.

    We posted the FileBench results because a few people asked us to. We ran FileBench directly on the ZFS box, so these results are not really applicable as SAN benchmarks. All of the other results we have posted on this site were gathered from another server over the network in order to get a SAN benchmark.

  • […] dedicated raid controllers vs. current software raid controller, I cannot agree. Have a read here. Sun have been running software arrays for years on their servers. The advantage with software is […]

  • I just ran some FileBench tests on my older dual Xeon 2.8 box with 4GB of RAM running OpenIndiana, with a SAT2-MV8 controller and 4 x 2TB RE4 drives in RAIDZ, no SSD or ZIL.

    varmail: IO Summary: 248656 ops, 4144.1 ops/s, (637/638 r/w) 14.5mb/s, 355us cpu/op, 12.4ms latency

    randomrw: IO Summary: 2551364 ops, 42520.2 ops/s, (24654/17499 r/w) 329.6mb/s, 61us cpu/op, 0.1ms latency

  • 3piece says:

    I was wanting to set up a similar SAN using Linux; however, I read that ZFS doesn’t support parallel/simultaneous file reads. Is this correct, and if so, can you suggest a file system that dynamically manages tiered storage and offers parallel access? I was looking at Lustre and MooseFS, but they seem complex to manage.

  • admin says:

    ZFS for Linux is not really production ready at this point. If you want to use ZFS, you should look at OpenIndiana (a fork of the latest build of OpenSolaris).

    By parallel/simultaneous file reads, do you mean multiple processes connecting to various files within one iSCSI target, or do you mean multiple physical servers connecting to one iSCSI target? ZFS can easily do a lot of parallel IO, but if you need to have multiple physical servers connect to the same iSCSI target, you will want to format that iSCSI target with a cluster-aware file system.

  • 3piece says:

    Yes, the second option, "multiple physical servers connect to the same iSCSI target"; sorry I wasn’t clearer.

    From my understanding, ZFS is NOT a cluster-aware file system; please correct me if I’m wrong.

    Can you suggest a cluster-aware file system that dynamically manages tiered storage (open source, of course) and is relatively easy to manage?

  • admin says:

    3piece: It is not really a ZFS issue. What you do is set up the ZFS pool and share a volume from the pool as an iSCSI target, then mount that iSCSI target from the clients, most likely Linux or Windows servers. It is on the Linux or Windows servers that you would need to worry about formatting the iSCSI target with a cluster-aware file system. The Linux or Windows servers just see the iSCSI target as a big block-level drive; they do not see the ZFS pool directly.
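
    A rough sketch of that flow on the OpenSolaris side using COMSTAR; the volume name, size, and GUID are placeholders, and the stmf and iscsi/target services need to be enabled first:

        # carve a volume (zvol) out of the pool and register it as a SCSI logical unit
        zfs create -V 500g tank/vmstore
        sbdadm create-lu /dev/zvol/rdsk/tank/vmstore

        # expose the logical unit and create an iSCSI target for the initiators to log in to
        stmfadm add-view 600144f0aabbccdd    # use the GUID that sbdadm printed
        itadm create-target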

  • 3piece says:

    Thank you for the info. So after setting up this SAN with ZFS on Solaris, I could then format it from an initiator as GFS2? Would it then be possible to have a diskless server that PXE boots from an image on the SAN and serves XEN/KVM virtual machines from the SAN? I’m asking because I’m about to build a 2-node HA cluster and was going to put 120GB RevoDrive SSDs in each node; however, if I can network boot from the SAN, then I could use those drives in the SAN instead and come closer to my budget.

  • admin says:

    Yes, you could mount the iSCSI target from Linux and format it with the file system of your choice (including GFS2). You can probably get PXE or even iSCSI-based booting set up if your BIOS supports it. I still prefer to run local boot drives in physical nodes and store the virtualized drives on the SAN; I don’t personally like booting physical nodes from the SAN, but a lot of guys do it with good results. I certainly would not use 120GB SSDs as boot drives, though. I like 40GB SSDs for boot drives.
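
    On the Linux side that looks roughly like the following; the portal address, cluster name, and device name are placeholders, and GFS2 assumes a working cluster stack (DLM) on both nodes:

        # discover and log in to the target with the open-iscsi initiator
        iscsiadm -m discovery -t sendtargets -p 192.168.1.50
        iscsiadm -m node --login

        # format the new block device as GFS2 with two journals (one per cluster node)
        mkfs.gfs2 -p lock_dlm -t mycluster:vmstore -j 2 /dev/sdb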

  • 3piece says:

    Owing to your willingness to share your knowledge, I’m now researching the best hardware to use for a similar SAN setup. I plan on using a dual-processor AMD socket G34 motherboard and am still looking at options.

    I do have another query that I’ve been trying to find the answer to. I’m looking at buying 2 Supermicro H8DGT-HIBQF motherboards to use in the HA cluster to serve virtual machines. They have onboard "Mellanox ConnectX-2 IB with single QSFP connector" support, which I’m led to understand is a 4X QDR port capable of 40Gbps. I don’t have the budget to buy a managed QDR switch (16 ports and up; I couldn’t find any 8-port models). Can I purchase a cheap SDR switch (e.g. the Flextronics 8-port 4X SDR InfiniBand switch, part ID F-X430066) for the time being and connect the QDR ports on the motherboards and the SAN to it?

  • admin says:

    3piece: You should ask Mellanox that question. InfiniBand can be tricky to get working properly, especially if you mix and match things.
