Random reboots of CentOS 7 on a Ryzen 5

giuliom_95
Posts: 8
Joined: 2016/06/27 21:24:26

Random reboots of CentOS 7 on a Ryzen 5

Post by giuliom_95 » 2018/05/27 21:13:59

Hi all,
Sometimes CentOS just reboots without warning. At first I thought this was an incompatibility with new-generation AMD CPUs, especially since I noticed this message in dmesg:

Code:

[    0.000000] Detected CPU family 17h model 1
[    0.000000] Warning: AMD Processor - this hardware has not undergone upstream testing. Please consult http://wiki.centos.org/FAQ for more information
But today, when it rebooted abnormally, I noticed something odd flashing by during boot, and after further investigation I found this in dmesg:

Code:

[    0.119374] smpboot: CPU0: AMD Ryzen 5 1600 Six-Core Processor (fam: 17, model: 01, stepping: 01)
[    0.221156] Performance Events: Fam17h core perfctr, AMD PMU driver.
[    0.221160] ... version:                0
[    0.221160] ... bit width:              48
[    0.221161] ... generic registers:      6
[    0.221162] ... value mask:             0000ffffffffffff
[    0.221163] ... max period:             00007fffffffffff
[    0.221163] ... fixed-purpose events:   0
[    0.221164] ... event mask:             000000000000003f
[    0.222198] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.237470] mce: [Hardware Error]: Machine check events logged
[    0.237526] mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
[    0.222243] smpboot: Booting Node   0, Processors  #1 #2
[    0.237530] mce: [Hardware Error]: TSC 0 
[    0.237532] ADDR 1ffff89f16396 MISC d012000101000000 SYND 4d000000 IPID 500b000000000 
[    0.237534] mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1527449868 SOCKET 0 APIC 2 microcode 8001129
[    0.237534]  #3 #4 #5 #6 #7 #8 #9 #10 #11
[    0.306145] Brought up 12 CPUs
So I read up on MCE (machine check exceptions) and found that to read these messages and gather further information I need mcelog. I installed it, but when I try to run it I get:

Code:

mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor.  Please use the edac_mce_amd module instead.
CPU is unsupported
I want to find out more about this error because I need to understand whether my CPU is faulty or damaged. Strangely, I have never experienced these abnormal reboots on Windows...
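While mcelog refuses to run on this CPU, the raw status value from the Bank 5 line (bea0000000000108) can still be read by hand. The following is only a minimal sketch: it assumes the architectural x86 machine-check status layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, error code in bits 15:0) and does not decode any AMD-specific fields. The function name decode_mce_status is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the architectural flag bits of a machine-check status value.
# Assumption: bits 63..57 are VAL, OVER, UC, EN, MISCV, ADDRV, PCC and
# bits 15:0 hold the error code. Vendor-specific bits are ignored.
# decode_mce_status is a hypothetical helper name, not part of any tool.
decode_mce_status() {
    local hex=${1#0x}              # accept the value with or without "0x"
    local status=$(( 16#$hex ))    # bash arithmetic is 64-bit; the value wraps but the bits are kept
    local names=(VAL OVER UC EN MISCV ADDRV PCC)
    local bit=63
    local name
    for name in "${names[@]}"; do
        printf '%-5s (bit %d) = %d\n' "$name" "$bit" $(( (status >> bit) & 1 ))
        bit=$(( bit - 1 ))
    done
    printf 'error code (bits 15:0) = 0x%04x\n' $(( status & 0xffff ))
}

# Example with the value from the log above:
#   decode_mce_status bea0000000000108
```

For the value in this thread, the sketch reports VAL, UC, EN, MISCV, ADDRV and PCC as set and OVER as clear, with error code 0x0108. That only says the record is valid and was logged as uncorrected; it does not by itself tell whether the CPU is defective.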

hunter86_bg
Posts: 2019
Joined: 2015/02/17 15:14:33
Location: Bulgaria

Re: Random reboots of CentOS 7 on a Ryzen 5

Post by hunter86_bg » 2018/06/09 18:12:03

Why don't you load the module?

Code:

echo edac_mce_amd > /etc/modules-load.d/edac_mce_amd.conf && modprobe edac_mce_amd
If you really want to stress-test the CPU, run Prime95 with small FFTs (this exercises the cache rather than the RAM).
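Once the module is loaded, it can help to confirm that it is present and to pull the machine-check records out of the kernel log in a compact form. This is a rough sketch only: the helper name summarize_mce is hypothetical, and the pattern assumes the exact "Machine Check" line format quoted earlier in this thread, which other kernel versions may print differently.

```shell
#!/usr/bin/env bash
# Sketch: summarize machine-check records found in kernel log text.
# summarize_mce is a hypothetical helper. It reads log text on stdin and
# prints one "cpu=... bank=... status=..." line per "Machine Check" record.
# Assumption: lines look like the ones quoted in this thread.
summarize_mce() {
    sed -n 's/.*\[Hardware Error\]: CPU \([0-9]\{1,\}\): Machine Check: [0-9]\{1,\} Bank \([0-9]\{1,\}\): \([0-9a-fA-F]\{1,\}\).*/cpu=\1 bank=\2 status=\3/p'
}

# Typical use on a live system (as root):
#   lsmod | grep edac_mce_amd    # confirm the module is loaded
#   dmesg | summarize_mce        # list the logged machine-check records
```

Keeping the parsing in a small function means the same filter can be run on a saved copy of the log after a reboot, not only on live dmesg output.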

kbocek
Posts: 242
Joined: 2005/05/30 15:40:15
Location: Benicia CA, USA

Re: Random reboots of CentOS 7 on a Ryzen 5

Post by kbocek » 2018/06/24 17:39:28

Hey giuliom_95, I don't want to hijack your thread. Would you look at my thread:

viewtopic.php?f=49&t=67426

How did you install in the first place?

Thanks
