BUG: soft lockup - CPU#7 stuck

Post by LF42 » 2018/02/11 05:59:17


I have been receiving CPU stuck messages from the kernel, after which the machine appears to lock up.

The messages are of the form:
Feb 10 19:03:09 nas kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]
Feb 10 19:03:09 nas kernel: Call Trace:
Feb 10 19:03:09 nas kernel: native_flush_tlb_others+0x7d/0x130
Feb 10 19:03:09 nas kernel: flush_tlb_mm_range+0xab/0x110
Feb 10 19:03:09 nas kernel: zap_page_range+0xcd/0x140
Feb 10 19:03:09 nas kernel: SyS_madvise+0x40e/0x8f0
Feb 10 19:03:09 nas kernel: ? __audit_syscall_entry+0xac/0xf0
Feb 10 19:03:09 nas kernel: ? syscall_trace_enter+0x1cd/0x2b0
Feb 10 19:03:09 nas kernel: do_syscall_64+0x74/0x1b0
Feb 10 19:03:09 nas kernel: entry_SYSCALL_64_after_hwframe+0x21/0x86

Any ideas on how to troubleshoot the problem further?

I have attached the messages file. It starts with the soft lockup messages, followed by everything logged after I hard reset the computer.
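
In case it helps anyone going through the attached file, here is a minimal sketch for tallying the soft lockup reports by CPU and process. It assumes the watchdog lines look like the one quoted above; the function name summarize_lockups is just something made up for illustration, not part of any tool.

```python
import re
from collections import Counter

# Pattern based on the watchdog line quoted in this post, e.g.
# "BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]".
# The task name is matched greedily so names containing ':' still parse.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft lockup reports per (CPU number, task name) pair."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(int(match.group("cpu")), match.group("task"))] += 1
    return counts
```

It can be fed an open log file, for example summarize_lockups(open("/var/log/messages", errors="replace")), and the resulting counter shows whether the lockups always hit the same CPU or the same process.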

Thank you.

[root@nas log]# uname -a
Linux nas.local 4.15.2-1.el7.elrepo.x86_64 #1 SMP Wed Feb 7 17:26:44 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@nas log]# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen 7 1700 Eight-Core Processor
stepping : 1
microcode : 0x8001129
cpu MHz : 1374.304
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2
bogomips : 5988.20
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Product Name: PRIME X370-PRO
Version: Rev X.0x
Serial Number: 170296198101354
Asset Tag: Default string
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
