Frequent NFS hangs?

Post by **SMFII** » 2012/03/16 14:19:22

I have CentOS 6.2 installed, and I'm seeing frequent NFS hangs. I haven't found any reports of a general issue, so I'm at a loss as to where to look next...

My /etc/exports file is ridiculously simple:
[font=Courier]$ cat /etc/exports
/ftproot/evidence *(rw,all_squash,anonuid=10101,anongid=10)[/font]

I'm mounting the exported /ftproot/evidence locally on /evidence, and have it listed in fstab:
[font=Courier]$ grep evidence /etc/fstab
UUID=82ae4073-a720-4a28-a14c-1c91172dc540 /ftproot/evidence ext4 defaults 1 2
localhost:/ftproot/evidence /evidence nfs intr 0 0
[/font]
There is nothing in dmesg or /var/log/messages to suggest any problem. I can mount everything just fine, but after a few minutes all attempts to access the share via NFS hang and never come back. Any attempt to 'ls' exiles processes into the dreaded D state, never to return...
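To see which tasks are stuck in D state and where, something like the following could be used. This is only a sketch: the function names `filter_dstate` and `list_dstate` are made up for illustration, and it assumes a procps-style `ps` that accepts the `wchan:32` column width.

```shell
# Keep only rows whose STAT column (field 2) starts with "D",
# i.e. tasks in uninterruptible sleep. Reads ps-style rows on stdin.
filter_dstate() {
    awk '$2 ~ /^D/'
}

# List D-state tasks together with WCHAN, the kernel function each
# task is sleeping in (when the kernel exposes it).
# Usage: list_dstate
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Running `list_dstate` a few times while the mount is hung should show whether the same PIDs stay blocked and whether they all sleep in the same kernel function.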

The same file system is shared via Samba, and Samba access continues without issue, and I can always get to the data under the /ftproot/evidence mount point. So clearly, SOMETHING is going on with NFS - but what??

Thanks for any help!

Post by **toracat** » 2012/03/16 17:18:05

Without any info in /var/log/messages, it is difficult to troubleshoot. Try enabling NFS debugging with:

[code]
echo 1 > /proc/sys/sunrpc/nfs_debug
[/code]

and see if something turns up.
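The same directory also holds an `rpc_debug` switch for the RPC layer underneath NFS. A hedged sketch of setting both at once is below. The helper name `set_sunrpc_debug` and its optional directory argument are hypothetical, added only so the function can be tried against a scratch directory. It needs root to write the real files, and the value is a bitmask, so 1 enables only the lowest debug flag.

```shell
# Hypothetical helper: write a debug mask to the sunrpc debug sysctls.
# $1 = mask value (0 turns debugging off again)
# $2 = sysctl directory, defaulting to /proc/sys/sunrpc
set_sunrpc_debug() {
    mask=$1
    dir=${2:-/proc/sys/sunrpc}
    for name in nfs_debug rpc_debug; do
        if [ -w "$dir/$name" ]; then
            echo "$mask" > "$dir/$name"
        else
            echo "cannot write $dir/$name (not root, or sunrpc not loaded?)" >&2
            return 1
        fi
    done
}

# Usage, as root:
#   set_sunrpc_debug 1    # enable
#   set_sunrpc_debug 0    # disable when done; the output is very noisy
```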

Post by **SMFII** » 2012/03/19 12:42:25

I rebooted & added debug, and for a while things were good. An 'ls' of my mount would generate a lot of info in /var/log/messages, so I sat back to wait. Sure enough, I tail -f /var/log/messages and issue an ls, and the ls hangs...but there's absolutely nothing going into messages. Le sigh.

Post by **pschaff** » 2012/03/19 13:10:51

Check other network-related things. Can you still access the server by other means? Any possibility of an IP address conflict?

Post by **SMFII** » 2012/03/19 19:35:58

No, the network is small. IPv6 is disabled, and there are no signs of any network-related issues. Samba access continues to work just fine; only NFS seems affected.

There are 4 processes in ?? state:
[code]
1430 1427 2 c1b08ab0 ?? 0.0 5228 1736 bash
1515 1 2 c18ee570 ?? 0.0 2912 1324 rpc.mountd
13254 13187 2 c1fc0030 ?? 0.0 4496 788 ls
26916 28898 2 f316bab0 ?? 0.0 4496 808 ls
[/code]

They all go into nfs3_rpc_wrapper.clone.0 and get stuck in __wait_on_bit at c082f7c2:
[font=Courier]
crash> bt 13254
PID: 13254 TASK: c1fc0030 CPU: 2 COMMAND: "ls"
#0 [f312fd28] schedule at c082e833
#1 [f312fdec] rpc_wait_bit_killable at f87498ff [sunrpc]
#2 [f312fdf0] __wait_on_bit at c082f7c2
#3 [f312fe08] out_of_line_wait_on_bit at c082f853
#4 [f312fe3c] __rpc_execute at f8749dc7 [sunrpc]
#5 [ec5ade6c] rpc_run_task at f87437cc [sunrpc]
#6 [ec5ade78] rpc_call_sync at f87438e4 [sunrpc]
#7 [ec5adea0] nfs3_rpc_wrapper.clone.0 at f89cd816 [nfs]
[/font]

Some got there through __nfs_revalidate_inode, some through nfs3_proc_access.

Any suggestions for tracking down what they're waiting on?
