Why We Chose InfiniBand Instead of 10GigE

For years we have successfully connected all of our blade centers to our storage area networks using 1GigE. Each time we needed more bandwidth, we simply added more networking ports. For our ZFS Build project, we decided to break from that tradition and try a higher performance interconnect in place of the 1GigE networking.

The obvious upgrades to consider would be Fibre Channel, direct SAS, 10GigE, and various flavors of InfiniBand. Each of those options can deliver significantly better performance per port than 1GigE, in terms of both bandwidth and latency. We also had to factor in some limitations of our specific situation. All of our blade centers are SuperMicro SBE-710E units. The SBE-710E cannot connect to Fibre Channel or SAS, so we had to rule those options out. There are other blade centers available that do have options for connecting to Fibre Channel and SAS, but we did not want to replace all of our existing blade centers and blade modules.

With the SBE-710E, you can run multiple 1GigE switches, and you have the option to install InfiniBand or 10GigE. We were already using dual 1GigE switches, with all of our iSCSI traffic segregated onto the second switch. The next logical upgrade for us was either 10GigE or InfiniBand.

Naturally, we assumed we would ultimately choose 10GigE for two reasons. We figured InfiniBand would cost a lot more money, and we actually did not have any immediate need for InfiniBand levels of performance. InfiniBand simply seemed like an unnecessary luxury.

Then we realized something very interesting about how 10GigE is implemented on the SuperMicro SBE-710E units. To install 10GigE into the existing blade modules in each blade center, we would actually need to install an InfiniBand mezzanine card into each blade. Since the exact same mezzanine cards are used for both 10GigE and InfiniBand, there was no cost difference to upgrade the blade modules to InfiniBand instead of 10GigE. This is significant, because there are ten blade modules in each blade center; any cost difference would have been multiplied by ten for each blade center.

The other major upgrade would be the new networking switch that would need to be installed in each blade center to let the new mezzanine cards talk to each other and to the ZFS server located outside of the blade centers. If we chose 10GigE, we would need to install the 10GigE pass-through module, part number SBM-XEM-002. In addition to the pass-through module, we would also need an external 10GigE switch to let all of the blade modules share the ZFS server.

If we went with InfiniBand, we would need to install an InfiniBand switch, part number SBM-IBS-001. While the InfiniBand switch was more expensive than the 10GigE pass-through module, it eliminated the need for any additional external network switch. In our specific situation, the total cost of upgrading to InfiniBand was actually less than that of 10GigE. Both 10GigE and InfiniBand would have worked quite well; we chose InfiniBand simply because it cost less in our case. We were surprised, since we had always assumed InfiniBand would cost more. It is not common to see the clearly higher performance technology actually cost less. To demonstrate the performance difference between common current datacenter interconnects, check out the graph below.

[Graph: Datacenter Interconnects]
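To make the dollars-and-cents side of the decision concrete, here is a minimal sketch of the comparison, written in Python. Every price in it is a hypothetical placeholder rather than an actual quote; the point is only that the per-blade mezzanine cost is identical on both paths, so the decision reduces to the SBM-XEM-002 pass-through plus an external 10GigE switch versus the SBM-IBS-001 switch on its own.

```python
# Hypothetical per-chassis cost comparison -- every price below is a made-up
# placeholder, not an actual quote; only the structure of the math is real.
BLADES_PER_CHASSIS = 10

# The same mezzanine card serves either 10GigE or InfiniBand, so this term
# is identical on both sides and effectively cancels out of the comparison.
mezzanine_card = 400           # hypothetical price per blade

# 10GigE path: SBM-XEM-002 pass-through module plus an external 10GigE switch.
xem_passthrough = 4000         # hypothetical
external_10gige_switch = 8000  # hypothetical

# InfiniBand path: SBM-IBS-001 switch module, no external switch required.
ibs_switch = 6000              # hypothetical

cost_10gige = BLADES_PER_CHASSIS * mezzanine_card + xem_passthrough + external_10gige_switch
cost_infiniband = BLADES_PER_CHASSIS * mezzanine_card + ibs_switch

print(f"10GigE total per chassis:     ${cost_10gige}")
print(f"InfiniBand total per chassis: ${cost_infiniband}")
```

With any set of real quotes the same structure applies: the blade-side cost is a wash, so whichever switch path is cheaper wins.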

Specifics about InfiniBand
There are several flavors of InfiniBand. The InfiniBand that we ended up using is 4x DDR, which is commonly known as 20Gbps InfiniBand. That means each network port can run 20Gbps. If we had been using the SuperMicro SBE-710Q, then we would have had the option to run 4x QDR, which would have been 40Gbps per network port. Our SBE-710E blade centers are “limited” to 20Gbps per network port, or a total of 960Gbps for the entire switch. This limitation did not bother us in the least, since we could have gotten by with 10Gbps just fine.

On each ZFS server, we have two InfiniBand 4x DDR ports for a combined total bandwidth of 40Gbps. To get that much bandwidth with conventional 1GigE, we would have needed 40 ports on each ZFS server. I can only imagine the cabling mess that would have been.
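For anyone who wants to check the arithmetic, here is a small sketch of where those numbers come from. The 8b/10b encoding overhead is a general property of SDR/DDR InfiniBand signaling rather than anything specific to our build, and the 1GigE comparison uses the headline rates on both sides.

```python
# Back-of-the-envelope bandwidth math for 4x DDR InfiniBand vs. 1GigE.
LANES = 4                   # the "4x" link width
DDR_GBPS_PER_LANE = 5       # DDR signaling rate per lane

signal_rate = LANES * DDR_GBPS_PER_LANE  # 20 Gbps per port (the marketing number)
data_rate = signal_rate * 8 / 10         # ~16 Gbps usable after 8b/10b encoding

ports_per_zfs_server = 2
total_per_server = ports_per_zfs_server * signal_rate  # 40 Gbps combined

gige_ports_needed = total_per_server // 1  # one 1GigE port per Gbps of headline rate

print(f"Per port: {signal_rate} Gbps signaling, ~{data_rate:.0f} Gbps data")
print(f"Per ZFS server: {total_per_server} Gbps across {ports_per_zfs_server} ports")
print(f"Equivalent 1GigE ports: {int(gige_ports_needed)}")
```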

In addition to using InfiniBand to connect the blade centers to the ZFS servers, another exciting use for it is Live Migration of virtual machines. Now when we need to quickly live migrate a VM to another blade module, the InfiniBand network can be used instead of the 1GigE network, which means virtual machines can be live migrated even more quickly than before.

Thursday, April 15th, 2010 | InfiniBand

3 Comments to Why We Chose InfiniBand Instead of 10GigE

  • LB@SDS says:

    Now that I have read every word of this site, I have some definite plans for how I would approach our first stab at building a custom high-performance SAN target from commodity hardware.

    Would you be interested in discussing with me the myriad of questions I have about my plan? I really could use a few answers, and I fear I may not get all the questions into a post here.

    Since there is no obvious way to email you that I can see, I will put a few initial ones here and see what you say.

    1) I think a mixed 10Gb/InfiniBand filer head is our target now.

    I would like to use a separate 1U, 2U, or 3U chassis with external SAS cabling (in/out) to JBOD boxes.

    This would make moving and upgrading the system easier I think.

    Thoughts on this physical setup?

    2) I see that Nexenta Core outperformed everything, and it's not available anymore, so what is the option?

    3) I do want a GUI to manage the SAN. (I'm not opposed to the command line, but I avoid it if I can.)

    So, can napp-it on top of Nexenta Core solve the problem of not having a GUI?

    I seriously have a lot of questions, so some phone time would be optimal.

    Are you open to that?

  • I’m open to conversations over Skype or email if you’re interested in that. You can contact me at matt.breitbach.

    As far as a split 10GbE/IB setup goes, it would work fine, but it may make management a little more involved.
    Using a dedicated head and SAS cables is the preferred method for future upgradability, and a requirement for HA.

    As far as your other questions go: no, Nexenta Core is no longer available. Regular Illumos builds would be your best bet if you want command line only. If you want a GUI and you are using under 18TB, Nexenta Community Edition is a good choice, but you don't get the advanced LUN mapping/masking that the Enterprise Edition gets you. Napp-It looks promising, and could be a fine product for your needs.

    Personally, in bigger environments we've gone with Nexenta Enterprise. Support is fantastic, and the performance characteristics are wonderful. I hope to post some information about one of the large HA systems we've worked with in the near future.

  • […] If you want iSCSI or NFS then minimally you'll want a few 10/40Gb ports or InfiniBand, which is the cheapest option by far, but native storage solutions for InfiniBand seem to be limited. The issue will be the module for the blade center and what its options are, usually 8Gb FC or 10GbE and maybe InfiniBand. Note that InfiniBand can be used with NFS and nothing comes close to it in terms of performance/price. If the blade center supports QDR InfiniBand I'd do that with a Linux host of some kind with a QDR InfiniBand TCA via NFS. Here's a good link describing this: http://www.zfsbuild.com/2010/04/15/why-we-chose-infiniband-instead-of-10gige […]
