BUG: soft lockup - CPU#7 stuck


Post by LF42 » 2018/02/11 05:59:17


I have been receiving CPU stuck messages from the kernel, after which the machine appears to lock up.

The messages are of the form:
Feb 10 19:03:09 nas kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]
Feb 10 19:03:09 nas kernel: Call Trace:
Feb 10 19:03:09 nas kernel: native_flush_tlb_others+0x7d/0x130
Feb 10 19:03:09 nas kernel: flush_tlb_mm_range+0xab/0x110
Feb 10 19:03:09 nas kernel: zap_page_range+0xcd/0x140
Feb 10 19:03:09 nas kernel: SyS_madvise+0x40e/0x8f0
Feb 10 19:03:09 nas kernel: ? __audit_syscall_entry+0xac/0xf0
Feb 10 19:03:09 nas kernel: ? syscall_trace_enter+0x1cd/0x2b0
Feb 10 19:03:09 nas kernel: do_syscall_64+0x74/0x1b0
Feb 10 19:03:09 nas kernel: entry_SYSCALL_64_after_hwframe+0x21/0x86

Any ideas on how to troubleshoot the problem further?

I have attached the messages file. The messages file starts with the soft lockup messages and then has all of the logs after I hard reset the computer.

Thank you.

[root@nas log]# uname -a
Linux nas.local 4.15.2-1.el7.elrepo.x86_64 #1 SMP Wed Feb 7 17:26:44 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@nas log]# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen 7 1700 Eight-Core Processor
stepping : 1
microcode : 0x8001129
cpu MHz : 1374.304
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2
bogomips : 5988.20
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Product Name: PRIME X370-PRO
Version: Rev X.0x
Serial Number: 170296198101354
Asset Tag: Default string
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0


Re: BUG: soft lockup - CPU#7 stuck

Post by aks » 2018/02/25 14:40:26

Stop running Splunk. If the problem goes away, then it's Splunk's problem and you can ask them why their software doesn't yield the CPU appropriately (they'll probably come back with something like "your disks are too slow").
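
Before and after stopping it, it may help to tally which processes the watchdog actually names, to see whether splunkd is the only one involved. A minimal sketch in Python, assuming the log lines look like the one quoted in the first post; the function name `tally_lockups` is made up for illustration, and the pattern may need adjusting for other kernel versions:

```python
import re
from collections import Counter

# Matches watchdog lines such as:
#   ... kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]
# The process name may itself contain colons (e.g. kworker/3:1), so the
# PID is taken as the digits after the last colon inside the brackets.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def tally_lockups(lines):
    """Count soft lockup reports per (process name, CPU number)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("cpu")))] += 1
    return counts
```

Pass it any iterable of log lines, for example an open handle on /var/log/messages, and print `tally_lockups(fh).most_common()`. If names other than splunkd show up, or lockups continue with Splunk stopped, the problem is probably not specific to Splunk.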