What we’ve learned – RAM

This post is possibly the most important lesson that we learned.  RAM is of MAJOR importance to Nexenta.  You can’t have enough of it.  Our original Nexenta deployment had 12GB of RAM.  It seemed like a silly amount of RAM just a year ago.  Today we’re looking at it as barely a starting point.  Consider these facts:

1 – RAM is an order of magnitude (or more) faster than Flash.

2 – RAM is getting cheaper every day.

3 – You can put silly amounts of RAM in a system today.

4 – Data ages, goes cold, and gets accessed less and less, reducing your hot data footprint.

Let’s go through these statements one by one.

1 – RAM is an order of magnitude (or more) faster than Flash.  Flash will deliver, on average, between 2,000 and 5,000 IOPS, depending on the type of SSD, the wear on the SSD, and garbage collection routines.  RAM can deliver hundreds of thousands of IOPS.  It doesn’t wear out, and there’s no garbage collection.

2 – RAM is getting cheaper every day.  When we built this platform last year, we paid over US $200 per 6GB of RAM.  Today you can buy 8GB Registered ECC DIMMs for under US $100, and 16GB DIMMs are hovering around US $300-$400.  Given the trends, I’d expect those prices to drop significantly over the next year or two.

3 – You can put silly amounts of RAM in a system today.  Last year, we were looking at reasonably priced boards that could fit 24GB of RAM.  Today we’re looking at reasonably priced barebones systems that can fit 288GB of RAM.  Insane systems (8-socket Xeon) support 2TB of RAM.  Wow.

4 – Data ages, goes cold, and doesn’t get accessed as much.  Even with only 12GB of RAM and 320GB of SSD, much of our working set is cached.  With 288GB of RAM, you greatly expand your ability to add L2ARC (remember, L2ARC consumes some main memory for its headers) and increase your ARC capacity.  If your working set was 500GB, on our old system you’d be serving at least 200GB of it from spinning disk.  A new system configured with nearly 300GB of ARC and a reasonable amount of L2ARC (say 1TB) would cache that entire working set.  You’d see much of the working set served from RAM (delivering hundreds of thousands of IOPS), part of it from Flash (delivering maybe 10,000 IOPS), and only very old, cold data served up from disk.  Talk about a difference in capabilities.  This also allows you to leverage larger, slower disks for older data.  If the data isn’t being accessed, who cares if it’s on slow 7200RPM disks?  That PowerPoint presentation from 4 years ago isn’t getting looked at every day, but you’ve still got to save it.  Why not put it on the slowest disk you can find?
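To put rough numbers on that, here is a quick Python sketch of how a 500GB working set would split across the cache tiers on the old and new configurations.  The 8KB average record size and ~180 bytes of ARC header per L2ARC record are rule-of-thumb assumptions rather than measured values, and the split is optimistic because it assumes the caches hold nothing but your working set.

    # Back-of-the-envelope cache sizing. Assumptions: 8KB average records and
    # ~180 bytes of ARC header per L2ARC record (rough rules of thumb, not
    # measured values); the split is optimistic because it assumes the caches
    # hold nothing but the working set.

    def cache_breakdown(working_set_gb, arc_gb, l2arc_gb):
        """Split a working set across ARC (RAM), L2ARC (SSD), and spinning disk."""
        from_ram = min(working_set_gb, arc_gb)
        from_ssd = min(working_set_gb - from_ram, l2arc_gb)
        from_disk = working_set_gb - from_ram - from_ssd
        return from_ram, from_ssd, from_disk

    def l2arc_header_ram_gb(l2arc_gb, avg_record_kb=8, header_bytes=180):
        """RAM the ARC spends just tracking what lives in the L2ARC."""
        records = l2arc_gb * 1024 * 1024 / avg_record_kb
        return records * header_bytes / 1024 ** 3

    # Old box: 12GB of RAM, 320GB of SSD cache. New box: ~300GB ARC, ~1TB L2ARC.
    for label, arc, l2arc in (("old", 12, 320), ("new", 300, 1024)):
        ram, ssd, disk = cache_breakdown(500, arc, l2arc)
        print(f"{label}: {ram}GB from RAM, {ssd}GB from SSD, {disk}GB from disk; "
              f"~{l2arc_header_ram_gb(l2arc):.0f}GB of RAM spent on L2ARC headers")

Even as a rough model, it shows the shape of the win: the new configuration serves the whole working set from RAM and Flash, while the old one leaves a big chunk of it on spinning disk, and it also shows that the L2ARC itself eats a noticeable slice of main memory.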

That being said, our new Nexenta build is going to have boatloads of RAM.  Maybe not 288GB (16GB DIMMs are still expensive compared to 8GB DIMMs), but I’d put 144GB out there as a high probability.


Tuesday, November 15th, 2011 Configuration, Hardware

6 Comments to What we’ve learned – RAM

  • marrtins says:

    What happens when the system goes down with writes cached in RAM? I’ve read there is no 100% guarantee, even for SSDs, that writes complete under high load in case of power loss. Is this cache intended only for caching reads?

  • mbreitbach says:

    The SSDs used in our build have been certified by Nexenta to properly flush their writes to stable storage when required to do so.

    There are also two types of writes: synchronous and asynchronous. Synchronous writes are flushed to stable storage before the write is acknowledged to the application. This means that if the application requests a synchronous write, the data goes to stable storage before the application moves on to the next piece of data, so the application knows whether that data has been written or not.

    If the application issues asynchronous writes, those writes can be lost if there is a power failure or system crash. Typically, for important data, an application will issue synchronous writes to protect it. SQL databases are a great example of this.
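    Here’s a minimal sketch of that difference using plain POSIX file I/O in Python (nothing Nexenta- or ZFS-specific): the first write may still be sitting in OS caches if the power goes out, while the second isn’t considered done until it has been pushed to stable storage.

        import os

        data = b"important transaction record\n"

        # Asynchronous-style write: the data can linger in OS caches and be
        # lost if the box loses power before it is flushed.
        with open("async.log", "ab") as f:
            f.write(data)

        # Synchronous-style write: fsync() does not return until the data is
        # on stable storage (on ZFS, sync writes are what the ZIL accelerates),
        # so the application knows the record is safe before moving on.
        with open("sync.log", "ab") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())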

    As to your other question – most of the SSD capacity that we had was used for read caching. The data is never permanently stored on those SSDs, so whether the writes complete to the cache SSDs during a power failure doesn’t really matter. After a reboot, those devices are treated as empty and “cold”, and the cache is rebuilt from scratch.

  • shotel says:

    Not only has RAM become ridiculously cheap (in contrast to past prices), but SSD cost per GB is now equal to 15K SAS2 spinning rust as well (see: OCZ OCTANE SATA III 2.5″ SSD).

    With that said, we are considering an 8-drive/8TB raw RAIDZ volume composed entirely of SSDs… not for ZIL or L2ARC, but as a main production volume for multiple I/O-hog VM databases (ZIL/L2ARC be damned).

    Conversely, $8 grand will buy more than 512GB of DDR3/1333 ECC memory (16GB sticks in a SM X8QB6-LF).

    We’re hosting primarily DB and custom applications (WS2K8/Citrix/ESXi), so petabytes of capacity isn’t what we need; what our Solaris ZFS/Nexenta environment needs is TBs of I/O that can saturate 10GbE.

    Which would you consider the better investment?

  • mbreitbach says:

    I would tend to lean towards more RAM and then strategic use of SSD for L2ARC. The OCZ Octane is not on the HSL yet, and may never make it there. I tested an OCZ device that behaved very poorly and caused SCSI bus resets; that device was not on the HSL, and will likely never get there. Since then I have been very cautious picking hardware for builds.

    Secondly, the use of SSDs as your primary storage does not negate the need for a ZIL device. If you put your ZIL on those SSDs, not only will it be slower than a dedicated ZIL device, it will also put excessive wear on the SSDs you’re using for primary storage.

    I would look at a system that is RAM-heavy (512GB may be overkill) with 1-2TB of certified SSD. I would look at the OCZ Talos line of SSDs.

    You’ll also want to find out what kind of working set you’re looking at. It’s likely that a lot of the data in your DB isn’t accessed all of the time. What you’ll probably find is that 10-20% of your DB is active, and the rest has aged to the point that it doesn’t get accessed nearly as often.
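    One crude way to get a feel for that, assuming the data sits in files on a filesystem with atime updates enabled (the mount point below is a hypothetical placeholder), is to total up how much of it has actually been touched recently; your database’s own access statistics will give a much finer-grained picture.

        import os
        import time

        def hot_fraction(root, window_days=7):
            """Return (hot_bytes, total_bytes) for files accessed within the window."""
            cutoff = time.time() - window_days * 86400
            hot = total = 0
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    try:
                        st = os.stat(os.path.join(dirpath, name))
                    except OSError:
                        continue
                    total += st.st_size
                    if st.st_atime >= cutoff:
                        hot += st.st_size
            return hot, total

        # "/volumes/dbpool" is a placeholder path, not a real mount point.
        hot, total = hot_fraction("/volumes/dbpool")
        print(f"{hot / max(total, 1):.0%} of {total / 1024**3:.0f}GB touched in the last week")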

  • jcdmacleod says:

    How are you finding the CPU usage on this? With the QC 2.0GHz?

  • mbreitbach says:

    The CPU usage is OK for iSCSI traffic, but for NFS traffic it needed a little more oomph. We upgraded to a 6-core 2.4GHz proc to give it some additional horsepower. I believe there are some tuning parameters on the NFS side that we can probably tweak to get a little bit better performance out of it. My recommendation today would be to buy as much CPU as you can afford. It will really help with compression and if you use NFS.
