When is enough memory too much? Part 2

So one of the Nexenta systems that I’ve been working on had its memory quadrupled, and ever since then it has been having some issues (as detailed in the previous post, which was actually supposed to go live a few weeks ago).  Lots of time spent on Skype with Nexenta support has led us in a few directions.  Yesterday, we made a breakthrough.

We have been able to successfully correlate VMware activities with the general wackiness of our Nexenta system.  This occurs at the end of a snapshot removal, or at the end of a Storage vMotion.  Yesterday, we stumbled across something that we hadn’t noticed before.  After running the Storage vMotion, the Nexenta freed up the same amount of RAM from the ARC cache as the size of the VMDK that had just been moved.  This told us something very interesting.

1 – There is no memory pressure at all.  The entire VMDK got loaded into the ARC cache as it was being read out.  And it wasn’t replaced.

2 – Even after tuning the arc_shrink_shift variable, we were still freeing up GOBS of memory.  50GB in this case.  (There’s a quick note on that tunable just below this list.)

3 – When we free up that much RAM, Nexenta performs some sort of cleanup, and gets _very_ busy.
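
As an aside on item 2: as I understand the arc.c code, each reclaim pass frees roughly arc_c right-shifted by arc_shrink_shift bytes, so a larger shift means smaller chunks per pass.  A quick way to inspect and nudge it on a live illumos-based system is sketched below.  The symbol name is what I’d expect on these builds, but it can differ between releases, so verify before writing anything.

    # show the current reclaim shift (a 4-byte int, printed in decimal)
    echo "arc_shrink_shift/D" | mdb -k

    # raise it so each reclaim pass frees a smaller slice of arc_c
    # (0t marks a decimal value in mdb; this writes the live kernel variable)
    echo "arc_shrink_shift/W 0t11" | mdb -kw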

After reviewing the facts of the case, we started running some DTrace and monitoring scripts that I’ve come across.  arcstat.pl (from Mike Harsch) showed that as the data was being deleted from disk, ARC usage was plummeting, and as soon as it settled down, the ARC target size was reduced by the same amount.  When that target size was reduced, bad things happened.
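
If you want to watch the same thing on your own box, the counters arcstat.pl reads are ordinary kstats.  A minimal way to follow them during a Storage vMotion looks something like the line below; the zfs:0:arcstats names are what I’d expect on an illumos-based system, so adjust if your release reports them differently.

    # print the ARC size and target (c) every 5 seconds; during the vMotion the
    # size climbs with the data being read, then c drops by roughly the VMDK size
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:c 5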

At the same time, I ran mpstat to show what was going on with the CPU.  While this was happening, we consistently saw millions of cross-calls from one processor core to another, and 100% system time.  The system was literally falling over trying to free up RAM.
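
For reference, this is roughly what we were looking at.  The DTrace one-liner is a sketch rather than the exact script we ran, but the sysinfo provider’s xcalls probe should give a rough idea of where the cross-calls are coming from.

    # watch the xcal column explode into the millions while sys pegs at 100
    mpstat 1

    # aggregate the kernel stacks responsible for cross-calls (Ctrl-C to print)
    dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }'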

Currently, the solution that we have put into place is setting arc_c_min to arc_max minus 1GB.  This has so far prevented arc_c (the target size) from shrinking aggressively and causing severe outages.
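
For anyone wondering what that looks like in practice, the sketch below is roughly how values like these get checked and pinned on an illumos-based appliance.  The numbers are illustrative only (about 127GB, assuming an ARC max of around 128GB), and you should run anything like this past Nexenta support first.

    # dump the current ARC parameters (size, target c, c_min, c_max, p)
    echo "::arc" | mdb -k

    # make the floor persist across reboots via /etc/system (value is in bytes);
    # 0x1FC0000000 is roughly 127GB and is only an example number
    echo "set zfs:zfs_arc_min = 0x1FC0000000" >> /etc/system

    # a live change is also possible with mdb -kw, but the exact symbol path
    # varies between builds, so get the incantation blessed by support first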

There still appears to be a bit of a hiccup when we do those Storage vMotions, but the settings that we are using now seem to at least be preventing the worst of the outages.

Monday, March 5th, 2012 Hardware

9 Comments to When is enough memory too much? Part 2

  • mysidia says:

    I would probably say that more than 32GB of RAM or so per CPU socket is possibly getting to be a bit much, depending on the architecture.  A high arc_c_min and arc max seems like a reasonable workaround when there are stupid amounts of RAM available.

    I would call freeing up huge chunks of memory like that a bug. I’m sure freeing up the memory is helpful on a memory-bound system with applications running on it, but there should be sanity checks.

    The workload generated by such cleanup activity ideally ought to be a background activity, scheduled like any other process: when it’s taking too many CPU cycles, free up smaller chunks of memory at a time, let other kernel threads have a chance to run, then resume later.

    In the same way, the workload generated by ZFS operations ought to be rate-throttled background activity when it’s taking too much IO.  It’s a bit bothersome when a background snapshot or clone destroy operation takes a long time, and then operations such as “zfs list” block, filesystem I/Os sometimes block, or NFS I/O is severely degraded or frozen entirely, all for what should be low-priority cleanup maintenance.

    It’s a pattern I think I see over and over again.
    The FS doesn’t always do a good job of making sure low-priority/bulk maintenance tasks don’t block things or kill performance.

  • This system previously had 48GB of RAM total, and was memory-constrained with only 48 spindles attached to it. The expansion of the system includes the possibility of another 96+ spindles, so while there was an opportunity, the RAM in the system was upgraded.

    I would tend to agree with you that on a system that has other applications running (a SQL server, or a virtual machine), RAM should be freed as quickly as possible.  But on a storage-specific appliance, there should not be anything running that would compete for that memory, and therefore those operations should be pushed to the background. Freeing 50+GB of RAM in one fell swoop and tying up the CPU for 10 seconds or more, causing problems with iSCSI and NFS, should be avoided whenever possible.

  • haj says:

    Did you ever find a solution for this problem?
    I’m seeing it too, and a migration of a large VM can literally hang my 196GB fileserver for minutes when it’s freeing cache. Not sure if arc_shrink_shift set to 11 is actually making it worse.

  • We never found a great end-all solution to this. We’ve been seeing fewer problems as memory pressure increased (since a one-time read wouldn’t end up parked in the ARC). You may want to actually try shrinking the amount of RAM available to your server. I know it’s a silly thing to try, but dropping your RAM back to 96GB may actually be a good thing. If you’re seeing that much data sitting in the ARC that’s only been accessed once, you’ve got too much RAM. I know that all of the documentation says that you can never have too much RAM, but the reality is that right now, with that much memory sitting mostly idle, it’s hurting you during Storage vMotions more than having less RAM would.

    I would do this as a software switch, though (set arc_c to about half of the RAM available), rather than physically removing the RAM. This way you can gradually increase the RAM available to the ARC cache without a reboot.
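
    For what it’s worth, here is the rough shape of that software switch on an illumos-based build.  Treat it as a sketch: the structure and field names below are what I’d expect from the arc.c source I’ve looked at (the ARC ceiling lives inside the arc_stats kstat structure rather than as a standalone symbol), so verify them on your own release before writing anything, and 0x1800000000 is just 96GB as an example value.

        # find where the live ARC ceiling is kept (prints an address and the value)
        echo "arc_stats::print -a arc_stats_t arcstat_c_max.value.ui64" | mdb -k

        # write the new ceiling at that address (8-byte value, in bytes; 96GB here)
        echo "<address-from-above>/Z 0x1800000000" | mdb -kw

        # sanity-check that the ARC parameters now look the way you expect
        echo "::arc" | mdb -k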

  • haj says:

    I was actually thinking that one solution could be to lower the arc max by a gig every ten minutes in a loop before doing any large Storage vMotions. 😉 It just seems a bit silly.

  • I think that’s a possible solution, but I think a better solution would be for the ZFS developers to fix this in the code. I understand that releasing memory should be a priority, but releasing 50GB or 100GB of RAM all at once shouldn’t bring a storage system to a standstill for minutes at a time. This should be a lazy cleanup of RAM somehow. Unfortunately, I am not nearly well-versed enough in coding _anything_ to tackle this.
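
    That said, for anyone who does want to script the ramp-down you describe, the rough shape of it would be something like the sketch below.  This is purely hypothetical (I haven’t run it): <addr> stands in for the address of arcstat_c_max’s value found with mdb as in the earlier comment, and 96GB is just an example floor.

        #!/bin/sh
        # Hypothetical pre-vMotion ramp-down: shave 1GB off the ARC ceiling every
        # ten minutes so the reclaim never has to free one huge chunk at once.
        # ADDR is the kernel address of arcstat_c_max's value (look it up with
        # mdb as shown above); TARGET is the floor you want, in bytes.
        ADDR="<addr>"                     # placeholder -- find this on your system
        GIG=$((1024 * 1024 * 1024))
        TARGET=$((96 * GIG))              # example floor: 96GB

        CUR=$(kstat -p zfs:0:arcstats:c_max | awk '{ print $2 }')
        while [ "$CUR" -gt "$TARGET" ]; do
            CUR=$((CUR - GIG))
            printf '%s/Z 0x%x\n' "$ADDR" "$CUR" | mdb -kw
            sleep 600
        done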

  • […] 64GB RAM – Generic Kingston ValueRAM.  The original ZFSBuild was based on 12GB of memory, which 2 years ago seemed like a lot of RAM for a storage server.  Today we’re going with 64 GB right off the bat using 8GB DIMM’s.  The motherboard has the capacity to go to 256GB with 32GB DIMM’s.  With 64GB of RAM, we’re going to be able to cache a _lot_ of data.  My suggestion is to not go super-overboard on RAM to start with, as you can run into issues as noted here : http://www.zfsbuild.com/2012/03/05/when-is-enough-memory-too-much-part-2/ […]

  • jwarnier says:

    Is this not all about Large Pages (related to the TLB)?

    Freeing a lot of memory pages can be a time-consuming process on most CPUs, so any way to significantly reduce their number might help.

  • I am not sure if this is related to Large Pages or not. I believe that this is being worked on from the Nexenta/Illumos side of things.
