CentOS 6.2 CIFS Network Freeze (df, mount, ls)
Posted: 2012/01/06 18:06:38
Our CentOS 6.2 system on a blade server (hostname: Blade1) sporadically freezes when accessing drives mounted with mount.cifs. When this happens, df hangs as soon as it reaches the CIFS mounts, and ls hangs when run in the directory that contains the CIFS mount points. After several minutes the system responds again and carries on normally. All of our work is done on Windows network drives mounted over CIFS, and we cannot reach them during a freeze, so this is a huge hindrance.
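To check for a freeze without locking up the shell, something like the following might work. It is only a minimal sketch: the helper name probe and the mount point /mnt/share are made up for illustration, and a process stuck in uninterruptible sleep (state D) may not actually die when killed. The probe simply stops waiting for it.

```python
import subprocess
import sys
import time


def probe(cmd, timeout=5.0, poll_interval=0.1):
    """Run cmd and return "ok", "error", or "hung" without waiting past timeout.

    Popen plus polling is used instead of subprocess.run(timeout=...),
    because run() waits for the child after killing it, and that wait
    could itself block if the child is stuck in uninterruptible I/O.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = proc.poll()
        if code is not None:
            return "ok" if code == 0 else "error"
        time.sleep(poll_interval)
    # Best effort only: the signal is not acted on while the child is in state D.
    proc.kill()
    return "hung"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical mount point): python probe.py stat -f /mnt/share
    print(probe(sys.argv[1:]))
```

Running it from cron against each CIFS mount point and logging the result would at least give timestamps to compare against /var/log/messages.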
I've done Google searches on the topic, and nothing seems to resolve the issue.
The kernel is 2.6.32-220.2.1.el6.x86_64. This is a minimal install with no GUI, plus additional networking packages for Samba integration with our Windows Server network. The CIFS shares are hosted by a Windows SBS server.
Output of modinfo /lib/modules/2.6.32-220.2.1.el6.x86_64/kernel/fs/cifs/cifs.ko:
version: 1.68
Whenever the issue occurs, /var/log/messages contains messages of the following type at the same time:
Jan 6 11:40:38 Blade1 kernel: CIFS VFS: Unexpected lookup error -512
And you can see where the df command hangs:
Jan 6 11:46:19 Blade1 kernel: INFO: task df:2387 blocked for more than 120 seconds.
Jan 6 11:46:19 Blade1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 6 11:46:19 Blade1 kernel: df D 000000000000000d 0 2387 2373 0x00000080
Jan 6 11:46:19 Blade1 kernel: ffff882ffbea7bc8 0000000000000086 0000000000000000 0000000000000880
Jan 6 11:46:19 Blade1 kernel: 0000000000000000 ffff882ffbea7b88 ffff882ffbea7bc8 ffffffff8122b9e4
Jan 6 11:46:19 Blade1 kernel: ffff882ffb885078 ffff882ffbea7fd8 000000000000f4e8 ffff882ffb885078
Jan 6 11:46:19 Blade1 kernel: Call Trace:
Jan 6 11:46:19 Blade1 kernel: [] ? context_struct_compute_av+0x324/0x420
Jan 6 11:46:19 Blade1 kernel: [] __mutex_lock_slowpath+0x13e/0x180
Jan 6 11:46:19 Blade1 kernel: [] ? find_nls+0x59/0x100
Jan 6 11:46:19 Blade1 kernel: [] mutex_lock+0x2b/0x50
Jan 6 11:46:19 Blade1 kernel: [] cifs_reconnect_tcon+0x15a/0x340 [cifs]
Jan 6 11:46:19 Blade1 kernel: [] ? mntput_no_expire+0x30/0x110
Jan 6 11:46:19 Blade1 kernel: [] ? avc_has_perm+0x71/0x90
Jan 6 11:46:19 Blade1 kernel: [] ? __link_path_walk+0x768/0x1030
Jan 6 11:46:19 Blade1 kernel: [] smb_init+0x39/0x70 [cifs]
Jan 6 11:46:19 Blade1 kernel: [] CIFSSMBQFSInfo+0x64/0x250 [cifs]
... and it goes on.
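To line up the CIFS errors with the hung-task reports, the log can be scanned with a short script. This is a rough sketch that assumes the syslog line layout shown above; the function name scan and the event labels are my own, not part of any tool.

```python
import re

# Both patterns assume the "Mon D HH:MM:SS host kernel: ..." layout seen above.
STAMP = r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
CIFS_MSG = re.compile(STAMP + r"CIFS VFS: (.*)$")
HUNG_TASK = re.compile(
    STAMP + r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds\."
)


def scan(lines):
    """Return (kind, timestamp, detail) tuples for CIFS and hung-task lines."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        match = HUNG_TASK.match(line)
        if match:
            stamp, task, pid, secs = match.groups()
            events.append(("hung", stamp, f"{task} (pid {pid}) blocked > {secs}s"))
            continue
        match = CIFS_MSG.match(line)
        if match:
            events.append(("cifs", match.group(1), match.group(2)))
    return events


if __name__ == "__main__":
    sample = [
        "Jan 6 11:40:38 Blade1 kernel: CIFS VFS: Unexpected lookup error -512",
        "Jan 6 11:46:19 Blade1 kernel: INFO: task df:2387 blocked for more than 120 seconds.",
        "Jan 6 11:46:19 Blade1 kernel: Call Trace:",
    ]
    for event in scan(sample):
        print(event)
```

Against the real log it would be called as scan(open("/var/log/messages")), which needs read access to that file.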
Based on web searches, I've tried decreasing wsize in the mount options and changing parameters in /proc/fs/cifs/, yet the issue still occurs. I believe this issue may have been reported on other Linux distros, where the resolutions seem to rely on updating the kernel and the cifs module, but I am hoping there is a workaround I can use in the meantime.
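To confirm which options (wsize included) are actually in effect after remounting, the CIFS entries in /proc/mounts can be listed. The sketch below assumes the usual whitespace-separated /proc/mounts layout; the function name cifs_mounts and the share //sbs/share on /mnt/share in the sample are invented for illustration.

```python
def cifs_mounts(mounts_text):
    """Return (source, mount_point, options) for every cifs entry.

    mounts_text is the content of /proc/mounts: one mount per line with
    the fields "source target fstype options dump pass".
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cifs":
            continue
        source, target, _fstype, raw_options = fields[:4]
        options = {}
        for item in raw_options.split(","):
            # Flags such as "rw" have no value and map to an empty string.
            key, _, value = item.partition("=")
            options[key] = value
        result.append((source, target, options))
    return result


if __name__ == "__main__":
    # Hypothetical sample; on the real system read /proc/mounts instead:
    #     with open("/proc/mounts") as handle:
    #         text = handle.read()
    text = (
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
        "//sbs/share /mnt/share cifs rw,relatime,rsize=16384,wsize=4096 0 0\n"
    )
    for source, target, options in cifs_mounts(text):
        print(source, target, "wsize =", options.get("wsize"))
```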