CentOS 7.7 - Frequent kernel panics on long-running I/O processes?
Posted: 2019/11/12 11:42:25
Hi,
I am having huge problems with kernel panics and hanging systems on CentOS 7.7 (1908) on some new hardware (Intel Xeon Silver 4214).
I do not get these issues on CentOS 7.2 with older hardware, but I cannot downgrade to that OS version on the new systems as the CPUs are not supported by 7.2.
Does anyone have any insight into kernel panics on long-running I/O processes?
I have tried mounting the network shares via both autofs and fstab, with various SMB versions, and none of this reduced the issue (we are outputting data to image sequences on our network). The system will run for some amount of time, then panic and hang forever; only a reboot lets it run again for a while. I have plenty of available RAM and swap. I am also running a 10Gb connection, and the network does not seem to be taxed beyond its limit.
Below is a dump, which will hopefully prompt some help as a starting point.
This has been driving me mad for months. CentOS 8 is not suitable due to the number of package changes in it, and it is also not a VFX-supported platform.
Thanks,
--------------------------------
Nov 6 11:28:37 RENDER_NODE1L kernel: INFO: task mantra-bin:37395 blocked for more than 120 seconds.
Nov 6 11:28:37 RENDER_NODE1L kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 6 11:28:37 RENDER_NODE1L kernel: mantra-bin D ffff8da4572f20e0 0 37395 37248 0x00000080
Nov 6 11:28:37 RENDER_NODE1L kernel: Call Trace:
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d7eb09>] schedule+0x29/0x70
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d804f5>] rwsem_down_read_failed+0x105/0x1c0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff919913f8>] call_rwsem_down_read_failed+0x18/0x30
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d7dc90>] down_read+0x20/0x40
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffffc0c5184b>] cifs_has_mand_locks+0x1b/0x80 [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffffc0c533fd>] cifs_reopen_file+0x53d/0x840 [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffffc0c54005>] cifs_readpage_worker+0x195/0x630 [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91986f04>] ? __radix_tree_lookup+0x84/0xf0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffffc0c54765>] cifs_readpage+0x85/0x240 [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff917bdb10>] generic_file_aio_read+0x3f0/0x790
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffffc0c5ab79>] cifs_strict_readv+0x149/0x180 [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91848353>] do_sync_read+0x93/0xe0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91848d8f>] vfs_read+0x9f/0x170
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91849c4f>] SyS_read+0x7f/0xf0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d8bede>] system_call_fastpath+0x25/0x2a
Nov 6 11:28:37 RENDER_NODE1L kernel: INFO: task kworker/29:1:38802 blocked for more than 120 seconds.
Nov 6 11:28:37 RENDER_NODE1L kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 6 11:28:37 RENDER_NODE1L kernel: kworker/29:1 D ffff8da455ce1070 0 38802 2 0x00000080
Nov 6 11:28:37 RENDER_NODE1L kernel: Workqueue: cifsoplockd cifs_oplock_break [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: Call Trace:
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d7eb09>] schedule+0x29/0x70
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d80245>] rwsem_down_write_failed+0x215/0x3c0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91991427>] call_rwsem_down_write_failed+0x17/0x30
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d7dcdd>] down_write+0x2d/0x3d
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffffc0c528ab>] cifs_oplock_break+0xdb/0x390 [cifs]
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff916bd1df>] process_one_work+0x17f/0x440
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff916be2f6>] worker_thread+0x126/0x3c0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff916be1d0>] ? manage_workers.isra.26+0x2a0/0x2a0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff916c51b1>] kthread+0xd1/0xe0
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff916c50e0>] ? insert_kthread_work+0x40/0x40
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff91d8bd1d>] ret_from_fork_nospec_begin+0x7/0x21
Nov 6 11:28:37 RENDER_NODE1L kernel: [<ffffffff916c50e0>] ? insert_kthread_work+0x40/0x40
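
For anyone triaging a log like the one above, here is a minimal Python sketch that groups the "blocked for more than N seconds" reports by task and lists which kernel modules show up in each stack (in the dump above, both stacks go through cifs). The function name `summarize_hung_tasks` and the default log path `/var/log/messages` are assumptions made for illustration; adjust them to wherever your syslog actually lands. It only reads the log and changes nothing on the system.

```python
import re
import sys

# Header line written by the kernel's hung-task detector, e.g.
# "INFO: task mantra-bin:37395 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Stack frame, e.g. "[<ffffffffc0c5184b>] cifs_has_mand_locks+0x1b/0x80 [cifs]".
# The optional "?" marks a frame the unwinder was not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def summarize_hung_tasks(lines):
    """Return one dict per hung-task report found in an iterable of log lines."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            # A new report starts; frames that follow belong to it.
            current = {
                "task": header.group("name"),
                "pid": int(header.group("pid")),
                "seconds": int(header.group("secs")),
                "frames": [],
                "modules": set(),
            }
            reports.append(current)
            continue
        if current is None:
            continue
        frame = FRAME_RE.search(line)
        if frame:
            current["frames"].append(frame.group("func"))
            if frame.group("module"):
                current["modules"].add(frame.group("module"))
    return reports


if __name__ == "__main__":
    # Assumed default path; pass a different file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as log:
        for report in summarize_hung_tasks(log):
            modules = ", ".join(sorted(report["modules"])) or "none"
            print(f'{report["task"]} (pid {report["pid"]}): '
                  f'blocked > {report["seconds"]}s, modules in stack: {modules}')
```

Run over a full log, this makes it easier to see whether every hang involves the same module and code path, which is useful detail to include in a bug report.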