When is enough memory too much?

Good question.  One would think that there’s never too much memory, but in some cases you’d be dead wrong (at least without some tuning).  I’m battling that exact issue today.  On a system that I’m working with, we upgraded the RAM from 48GB to 192GB.  The ZFS Evil Tuning Guide says don’t worry, we auto-tune better than Chris Brown.  I’m starting to not believe that.  We’ve been intermittently seeing the system go dark (literally dropping portchannels to Cisco Nexus 5010 switches), then roaring back to life.  Standard logging doesn’t appear to give much insight, but after digging through ZenOSS logs and multiple DTrace scripts, I think we’ve found a pattern.

It appears as though, by default, Nexenta will de-allocate a certain percentage of your memory when it does cleanup related to the ARC.  On larger-memory systems, the amount of memory it frees grows.  I monitored an event where it freed something to the tune of 8GB of RAM.  That happened to coincide with a portchannel dropping.
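To see why the size of those reclaims scales with RAM, here’s a rough back-of-the-envelope sketch.  This is not Nexenta’s actual code; it just assumes the shift-based behavior described in the comments below (a shrink event reclaims roughly arc_max >> arc_shrink_shift, with arc_max close to physical RAM and the default shift of 5):

    # Illustrative sketch only -- assumes a shrink event reclaims
    # arc_max >> arc_shrink_shift and that arc_max is close to physical RAM.
    GIB = 1024 ** 3

    def shrink_bytes(arc_max_bytes, arc_shrink_shift=5):
        """Approximate amount a single ARC shrink event tries to reclaim."""
        return arc_max_bytes >> arc_shrink_shift

    for ram_gib in (48, 192):
        freed = shrink_bytes(ram_gib * GIB)
        print(f"{ram_gib:>3}GB RAM -> ~{freed / GIB:.1f}GB reclaimed per shrink event")

    #  48GB RAM -> ~1.5GB reclaimed per shrink event
    # 192GB RAM -> ~6.0GB reclaimed per shrink event

That ~6GB default is in the same ballpark as the ~8GB event I caught above.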

Through all of this, support has been great.  We’ve been tuning the amount of memory it frees up.  We’ve tuned the minimum amount of RAM to free up (in an effort to get it to free memory more often).  We’ve allocated more memory to ARC metadata.  Pretty much we’ve thrown the kitchen sink at it.  The last tweak was done today, and I’m monitoring the system to see if we continue to see problems.  Hopefully, once this is all done, I can post some tunables for larger memory systems.

Friday, March 2nd, 2012 · Hardware

4 Comments to When is enough memory too much?

  • Eugene says:

    What system variables did you tune?

  • We’ve modified a few things on the Nexenta platform to get this under control.

    1 – arc_shrink_shift – This variable controls the amount of RAM that an ARC shrink will try to reclaim. By default this is set to 5, which equates to shrinking by 1/32 of arc_max. We tuned this to 11, which is 1/2048 of arc_max. Based on that, we would be shrinking the ARC by about 100MB per shrink event, rather than 6GB of RAM.

    2 – arc_p_min_shift – This variable sets the minimum size of the MRU (most recently used) cache. If most cache hits come from the MFU (most frequently used) side, then reducing the MRU can improve the MFU hit rate. The default value is 4 (1/16th of arc_max); we changed it to 9 (1/512th of arc_max). A quick sketch of the arithmetic for both shifts is below.

    We’re exploring some other options as well, and I’m putting together a full post of what we’ve found. Hope to have more info later this week.
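
    For concreteness, here’s a rough sketch of what those shifts work out to. It assumes arc_max is roughly the 192GB of physical RAM (in reality arc_max sits a bit lower), and it only models the shift arithmetic, not the actual kernel code:

        # Rough arithmetic only -- assumes arc_max ~= 192GB of physical RAM.
        GIB = 1024 ** 3
        arc_max = 192 * GIB

        tunables = (
            ("arc_shrink_shift", 5, 11),  # per-shrink reclaim target: arc_max >> shift
            ("arc_p_min_shift", 4, 9),    # minimum MRU size: arc_max >> shift
        )
        for name, default_shift, tuned_shift in tunables:
            before = (arc_max >> default_shift) / GIB
            after = (arc_max >> tuned_shift) / GIB
            print(f"{name}: default {before:.2f}GB -> tuned {after:.3f}GB")

        # arc_shrink_shift: default 6.00GB -> tuned 0.094GB  (~96MB per shrink)
        # arc_p_min_shift: default 12.00GB -> tuned 0.375GB  (~384MB minimum MRU)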

  • sor says:

    Did you ever find a resolution to this?

  • sor says:

    ahh, found part 2. You should link to it here.
