HowTo: Our Zpool configuration

We’ve decided to go with striped mirrored vdevs (similar to RAID10) for our ZFS configuration. It gives us the best combination of performance and fault tolerance for how we use the system. To reproduce our ZFS configuration, you would use all of the commands in the image below (assuming your drives were named the same way ours were):

[Image: Our ZFS Configuration]

We have striped together nine mirrored vdevs (like RAID 10 – 18 drives total) consisting of Western Digital 1TB RE3 SATA drives. Once they were mirrored and striped together, we added a mirrored pair of Intel X25-E 32GB SSD drives as the ZIL (ZFS Intent Log) for enhanced write performance, two Intel X25-M G2 160GB SSD drives for the L2ARC (read) cache, and two more Western Digital 1TB RE3 SATA drives as hot spares.
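Since the original command listing is an image, here is a rough text sketch of what those commands would look like. The pool name "tank" and the cXtYd0 device names are placeholders rather than our actual device names; substitute whatever your controller reports (the format utility will list them):

    # Create the pool as nine striped two-way mirrors (18 data drives).
    # "tank" and the c1tXd0 device names below are placeholders.
    zpool create tank \
      mirror c1t0d0 c1t1d0 \
      mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0 \
      mirror c1t6d0 c1t7d0 \
      mirror c1t8d0 c1t9d0 \
      mirror c1t10d0 c1t11d0 \
      mirror c1t12d0 c1t13d0 \
      mirror c1t14d0 c1t15d0 \
      mirror c1t16d0 c1t17d0

    # Mirrored pair of X25-E SSDs as a separate ZIL (log) device.
    zpool add tank log mirror c1t18d0 c1t19d0

    # Two X25-M G2 SSDs as L2ARC cache devices (cache devices are never mirrored).
    zpool add tank cache c1t20d0 c1t21d0

    # Two 1TB drives as hot spares.
    zpool add tank spare c1t22d0 c1t23d0

After that, zpool status should show nine mirror vdevs plus a mirrored log, two cache devices, and two spares.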

We believe that this will be the best-performing ZFS configuration for our workload. We plan to benchmark this configuration and compare it against a few other possible configurations (RAID10 vs RAID50).

If you are curious about what kind of performance to expect from this configuration, check out our initial FileBench benchmarks.

Thursday, June 3rd, 2010 · RAID, ZFS

12 Comments to HowTo: Our Zpool configuration

  • Benji says:

    Great series of articles!

    Will you elaborate on what happens when a drive in a mirrored vdev fails? How is it reported to you? How do you identify which physical drive has failed? What zfs commands are required to replace it?

    Thanks!

  • admin says:

    We are actually planning to post an entire article that answers those questions. It is well beyond the scope of a simple comment post.

  • intel says:

    Hi

    Can you please post how you are monitoring performance and how you spot a bottleneck when one appears? Also, have you found a graphical tool that lets you determine this? Sun Fishworks requires a license: http://www.youtube.com/watch?v=tDacjrSCeq4

    I would also like to know if you have found a way to set up automated failover, iSNS, or something similar?

    Thanks

  • admin says:

    If you are referring to the Initial FileBench Benchmarks (http://www.zfsbuild.com/2010/05/24/initial-zfs-performance-stats/), those numbers are from FileBench running on the OpenSolaris box itself. While those numbers are impressive, they are not as useful as benchmarks run from nodes within our bladecenters. We are currently running benchmarks from nodes in our bladecenters and from virtual machines running under various hypervisors on those nodes. Those benchmarks will be a lot more useful.

  • intel says:

    No, I wasn’t referring to that kind of benchmark – more like ARC usage and disk I/O (hardware utilization, read/write ops). Nexenta.org offers a storage appliance based on ZFS that has a web GUI, though it has no ZFS replication and the free edition is limited to 12TB of storage. I got it working, but I want to try setting up OpenSolaris and getting the same features, or something close to that.

  • admin says:

    OK, I understand what you are asking about now. I have played around with the Nexenta stuff a bit and it is slick. I am not sure what the easiest way would be to reproduce some of those graphs on a plain OpenSolaris deployment. One option might be to grab SNMP metrics and graph them with MRTG.
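    If you just want the raw numbers on a stock OpenSolaris box (for example, to feed into MRTG), a couple of starting points would be the built-in zpool and kstat tools:

        # Per-vdev read/write operations and bandwidth, refreshed every 5 seconds.
        zpool iostat -v 5

        # Raw ARC statistics (size, hits, misses) straight from the kernel.
        kstat -p zfs:0:arcstats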

  • rens says:

    New posts expected soon? :)

  • admin says:

    Yes, we are planning to post some additional articles soon.

  • avatar says:

    Is it possible to use the TRIM function on an SSD in an OpenSolaris installation?

  • admin says:

    avatar: TRIM support was definitely not implemented in earlier builds. I have heard that it has been added in later builds, but I have not personally checked the source code to confirm. Either way, ZFS is already copy-on-write (CoW), which reduces the need for TRIM support.

  • jcdmacleod says:

    It looks like we will be going with a similar disk setup. I was going to load the chassis on day one, but finance nixed that idea with a “we don’t have the data, why do we need the drives?” comment.

    That brought up a good question. What is the best practice when growing/adding devices to the zpool? As I understand it, ZFS will immediately span writes over all available spindles, but reads will still be done from the original set.

    Is there a way to force zfs to balance the existing data over all of the devices?

  • admin says:

    There is no built-in re-balance function at this time. Depending on how you are sharing the data out, and what your maintenance windows look like, you could offline the system and copy or move the data to a new folder on the system (see the sketch below). That would read the data out and then write it back to disk, balancing it across all of the disks, including the new ones.

    If you just add new disks to the pool, ZFS will write across all of the disks, concentrating more on the new ones. If you have a large, dynamic set of data, it should rebalance on its own. If you put 20TB of static data on there that never gets edited, and then add additional data, it is never going to rebalance properly.
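
    As a rough illustration of that copy-based approach, here is a minimal sketch using ZFS snapshots and send/receive. The dataset name tank/data is hypothetical, and you would want to stop clients from writing during the switchover (use zfs send -R if you also need to preserve older snapshots and properties):

        # Hypothetical example: rewrite an existing dataset so its blocks
        # are reallocated across all vdevs, including newly added ones.
        zfs snapshot tank/data@rebalance
        zfs send tank/data@rebalance | zfs receive tank/data-rebalanced

        # After verifying the copy, retire the original and swap the names.
        zfs destroy -r tank/data
        zfs rename tank/data-rebalanced tank/data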
