Dtrace broken with SRP targets?

Anyone that’s using Infiniband, SRP targets, Dtrace, and version 3.1.3.5 of Nexenta Community edition, please raise your hands.

Nobody?  Not surprising 🙂  I pulled some dtrace scripts off another system to evaluate performance on the ZFSBuild2012 system, and got a very weird error:

 

#./arcreap.d

dtrace: failed to compile script ./arcreap.d: “/usr/lib/dtrace/srp.d”, line 49: translator member ci_local definition uses incompatible types: “string” = “struct hwc_parse_mt”

 

I’ve never seen this before, and the exact same script works flawlessly on ZFSBuild2010.  My guess is that something in SRP is to blame: the error comes from srp.d, and we aren’t using SRP on the ZFSBuild2010 system.  If anyone at Nexenta or anyone working on the Illumos project sees anything here that makes sense, I’d love to hear about it.

Thursday, December 27th, 2012 ZFS

4 Comments to Dtrace broken with SRP targets?

  • Nick Bryant says:

    Probably doesn’t help your existing issue, but is it worth testing with the Nexenta 4 beta milestone 20 release? I understand IB and SRP are ‘officially’ supported in that version.

    http://nexentastor.org/boards/13/topics/8785

  • ZFSBuild2012 is in production already, so unfortunately we cannot upgrade to the 4.0 beta to test. Based on some conversations I’ve had, though, I suspect that the SRP stuff is still broken in 4.0.

  • eMiz0r says:

    “Raising our hand” 😉

    Yes we are (or should I say were) using Infiniband with SRP targets on Nexenta 3.1.3.5 and had dtrace broken after the update 🙂

    Unfortunately… this was a production environment. After experiencing some serious issues (after about 8 months) with randomly losing LUNs and/or locked datastores, we switched back to IPoIB. The only way to recover from those missing LUNs (and hanging VMs) was to reboot the Nexenta box, which caused us a lot of trouble.

    Are you using SRP in combination with ESXi? If so, is it still possible for you to review the ESXi logs in /var/log/messages? During those outages we saw massive amounts of H:0x7, H:0x2 and Storage Initiator errors in the logs. After switching to IPoIB, they all disappeared. We’re not sure yet what was causing our issues, but we think it could be related to bugs in COMSTAR in combination with SRP. Mellanox and Nexenta are currently reviewing our logs.

    As soon as we have another test setup back online, we’ll be able to investigate the dtrace issues. Every Nexenta box we’ve upgraded to 3.1.3.5 also has broken gauges, by the way. No idea where that comes from, but after some reboots they magically work… and after another reboot they’re broken again. We’ve got no clue yet 🙂

  • We are using SRP with Windows 2008R2 and Hyper-V, so unfortunately we cannot comment on ESXi logs. As far as the gauges go, we’ve had that problem on and off with every version of Nexenta that we’ve used. I’ve also had them work in some browsers, but not others. I think it’s more of a browser bug than a bug on the Nexenta side.
