Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Issues related to hardware problems
bvardham
Posts: 5
Joined: 2018/03/24 15:39:57

Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by bvardham » 2018/03/24 16:13:14

Hello all,
We are trying to install the latest CentOS 7.4 image, "CentOS-7-x86_64-DVD-1708_fromODC.iso", on a server based on the "AMD EPYC 7351 16-Core Processor".

However, we are getting the error messages shown in the attached screenshot.

Could someone help me understand whether the above image is supported on the AMD EPYC 7351 16-Core Processor?
If not, when will CentOS officially start supporting it?

Thanks in advance.
Brahamaprakash Vardhaman
Attachments
centos-kernel-panic.jpeg (125.64 KiB) Viewed 4080 times

TrevorH
Site Admin
Posts: 33191
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by TrevorH » 2018/03/24 17:05:10

"CentOS-7-x86_64-DVD-1708_fromODC.iso".
That is not a CentOS-provided ISO image filename. What is it, and where did you get it from? CentOS ISOs are at http://mirror.centos.org/centos-7/7.4.1708/isos/
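
If it helps to rule out a modified or corrupted download, here is a minimal sketch for checking an ISO against a published SHA-256 value. It assumes the mirror directory carries a checksum file next to the ISOs and that you copy the expected hex string from it by hand; the function names `sha256_of_file` and `matches_published_sum` are made up for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a multi-GB ISO never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sum(path, published_hex):
    """Compare the computed digest with a published one (case/space tolerant)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A mismatch would suggest the image is not the one the project published; a match only shows the download is intact, not that the hardware is supported.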

bvardham
Posts: 5
Joined: 2018/03/24 15:39:57

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by bvardham » 2018/03/28 09:32:34

Thanks, we will test the image and let you know the results.

bvardham
Posts: 5
Joined: 2018/03/24 15:39:57

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by bvardham » 2018/03/28 12:00:51

We are getting the same error even after using the image from the above link.
I used the image below.

CentOS-7-x86_64-DVD-1708.iso

Regards
Brahamaprakash Vardhaman

bvardham
Posts: 5
Joined: 2018/03/24 15:39:57

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by bvardham » 2018/04/03 09:43:55

Hi, can anyone help here?

Regards
Brahamaprakash Vardhaman

hunter86_bg
Posts: 2019
Joined: 2015/02/17 15:14:33
Location: Bulgaria

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by hunter86_bg » 2018/04/10 19:35:15

Is this an HPE-based server? I used to have issues with HPE servers and vanilla CentOS/RHEL 7.

bvardham
Posts: 5
Joined: 2018/03/24 15:39:57

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by bvardham » 2018/05/04 08:44:58

Hi all,
any help on this would be appreciated.

hunter86_bg
Posts: 2019
Joined: 2015/02/17 15:14:33
Location: Bulgaria

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by hunter86_bg » 2018/05/06 18:00:14

What is the server model?
Do you boot in UEFI or BIOS mode?
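
In case it is not obvious how to tell, here is a small sketch for a Linux system that does boot on that hardware (for example a rescue or live image). It relies on the kernel exposing `/sys/firmware/efi` only when it was started through UEFI; the helper name `boot_mode` is just for illustration.

```python
import os


def boot_mode(efi_dir="/sys/firmware/efi"):
    """Return "UEFI" if the kernel exposes EFI firmware data, else "BIOS"."""
    # The directory is created by the kernel only on a UEFI boot; on a
    # legacy/CSM boot it is absent.
    return "UEFI" if os.path.isdir(efi_dir) else "BIOS"
```

Calling `print(boot_mode())` on the running system prints the detected mode. If the installer itself panics, the firmware setup screen is the other place to check.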

bobx001
Posts: 6
Joined: 2018/03/24 15:59:41

Re: Kernel Panic on " AMD EPYC 7351 16-Core Processor" server

Post by bobx001 » 2018/05/25 16:44:40

Did you solve your kernel panic?

I am running CentOS 7.4 on a dual-EPYC Supermicro box; the following is the inxi output:


[root@boa default]# inxi -Fxmz
System:    Host: boa Kernel: 3.10.0-862.3.2.el7.x86_64 x86_64 bits: 64 gcc: 4.8.5
           Console: tty 1 Distro: CentOS Linux release 7.5.1804 (Core)                                                                            
Machine:   Device: kvm System: Supermicro product: AS -2023US-TR4 v: 0123456789 serial: <filter>                                                  
           Mobo: Supermicro model: H11DSU-iN v: 1.02A serial: <filter>                                                                            
           UEFI [Legacy]: American Megatrends v: 1.1 date: 02/07/2018                                                                             
CPU(s):    2 16 core AMD EPYC 7351s (-MCP-SMP-) arch: Zen rev.2 cache: 16384 KB                                                                   
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm) bmips: 153581                                                               
           clock speeds: max: 2400 MHz 1: 1200 MHz 2: 1200 MHz 3: 2400 MHz 4: 2400 MHz 5: 1200 MHz 6: 1200 MHz                                    
           7: 1200 MHz 8: 1200 MHz 9: 1800 MHz 10: 1200 MHz 11: 1200 MHz 12: 1200 MHz 13: 1200 MHz 14: 1200 MHz                                   
           15: 1200 MHz 16: 1200 MHz 17: 1200 MHz 18: 1200 MHz 19: 1200 MHz 20: 1200 MHz 21: 1200 MHz                                             
           22: 1200 MHz 23: 1200 MHz 24: 1200 MHz 25: 1200 MHz 26: 1800 MHz 27: 1200 MHz 28: 1200 MHz                                             
           29: 1200 MHz 30: 1200 MHz 31: 1200 MHz 32: 1200 MHz 33: 1200 MHz 34: 1200 MHz 35: 2400 MHz                                             
           36: 1200 MHz 37: 1200 MHz 38: 1200 MHz 39: 1200 MHz 40: 1200 MHz 41: 1800 MHz 42: 1800 MHz                                             
           43: 1200 MHz 44: 1200 MHz 45: 1200 MHz 46: 1200 MHz 47: 2400 MHz 48: 1200 MHz 49: 1200 MHz                                             
           50: 1200 MHz 51: 2400 MHz 52: 2400 MHz 53: 1200 MHz 54: 1200 MHz 55: 1200 MHz 56: 1200 MHz                                             
           57: 1800 MHz 58: 1200 MHz 59: 1200 MHz 60: 1200 MHz 61: 1200 MHz 62: 1200 MHz 63: 1200 MHz                                             
           64: 1200 MHz                                                                                                                           
Memory:    Array-1 capacity: 2 TB (check) devices: 32 EC: Multi-bit ECC                                                                           
           Device-1: P0_Node0_Channel0_Dimm0 size: No Module Installed type: N/A                                                                  
           Device-2: P0_Node0_Channel0_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                         
           Device-3: P0_Node0_Channel1_Dimm0 size: No Module Installed type: N/A                                                                  
           Device-4: P0_Node0_Channel1_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                         
           Device-5: P0_Node0_Channel2_Dimm0 size: No Module Installed type: N/A                                                                  
           Device-6: P0_Node0_Channel2_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                         
           Device-7: P0_Node0_Channel3_Dimm0 size: No Module Installed type: N/A                                                                  
           Device-8: P0_Node0_Channel3_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                         
           Device-9: P0_Node0_Channel4_Dimm0 size: No Module Installed type: N/A                                                                  
           Device-10: P0_Node0_Channel4_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                        
           Device-11: P0_Node0_Channel5_Dimm0 size: No Module Installed type: N/A                                                                 
           Device-12: P0_Node0_Channel5_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                        
           Device-13: P0_Node0_Channel6_Dimm0 size: No Module Installed type: N/A                                                                 
           Device-14: P0_Node0_Channel6_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                        
           Device-15: P0_Node0_Channel7_Dimm0 size: No Module Installed type: N/A                                                                 
           Device-16: P0_Node0_Channel7_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                        
           Device-17: P1_Node0_Channel0_Dimm0 size: No Module Installed type: N/A                                                                 
           Device-18: P1_Node0_Channel0_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                        
           Device-19: P1_Node0_Channel1_Dimm0 size: No Module Installed type: N/A                                                                 
           Device-20: P1_Node0_Channel1_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1                                        
           Device-21: P1_Node0_Channel2_Dimm0 size: No Module Installed type: N/A
           Device-22: P1_Node0_Channel2_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1
           Device-23: P1_Node0_Channel3_Dimm0 size: No Module Installed type: N/A
           Device-24: P1_Node0_Channel3_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1
           Device-25: P1_Node0_Channel4_Dimm0 size: No Module Installed type: N/A
           Device-26: P1_Node0_Channel4_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1
           Device-27: P1_Node0_Channel5_Dimm0 size: No Module Installed type: N/A
           Device-28: P1_Node0_Channel5_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1
           Device-29: P1_Node0_Channel6_Dimm0 size: No Module Installed type: N/A
           Device-30: P1_Node0_Channel6_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1
           Device-31: P1_Node0_Channel7_Dimm0 size: No Module Installed type: N/A
           Device-32: P1_Node0_Channel7_Dimm1 size: 8 GB speed: 2667 MHz type: DDR4 part: 9ASF1G72PZ-2G6D1
Graphics:  Card: ASPEED ASPEED Graphics Family bus-ID: 06:00.0
           Display Server: X.org 1.19.5 drivers: modesetting (unloaded: fbdev,vesa)
           tty size: 146x38 Advanced Data: N/A for root out of X
Network:   Card-1: Intel I350 Gigabit Network Connection driver: igb v: 5.4.0-k port: 3060 bus-ID: 11:00.0
           IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
           Card-2: Intel I350 Gigabit Network Connection driver: igb v: 5.4.0-k port: 3040 bus-ID: 11:00.1
           IF: eno2 state: up speed: 1000 Mbps duplex: full mac: <filter>
           Card-3: Intel I350 Gigabit Network Connection driver: igb v: 5.4.0-k port: 3020 bus-ID: 11:00.2
           IF: eno3 state: down mac: <filter>
           Card-4: Intel I350 Gigabit Network Connection driver: igb v: 5.4.0-k port: 3000 bus-ID: 11:00.3
           IF: eno4 state: down mac: <filter>
Drives:    HDD Total Size: 34425.1GB (60.2% used)
           ID-1: /dev/nvme0n1 model: INTEL_SSDPE7KX500G7 size: 500.1GB
           ID-2: /dev/nvme1n1 model: INTEL_SSDPE7KX500G7 size: 500.1GB
           ID-3: /dev/sda model: INTEL_SSDSC2BB96 size: 960.2GB temp: 26C
           ID-4: /dev/sdb model: INTEL_SSDSC2BB96 size: 960.2GB temp: 25C
           ID-5: /dev/sdc model: Logical_Volume size: 24003.0GB temp: 0C
           ID-6: /dev/sdd model: ST8000NM0075 size: 8001.6GB temp: 33C
Partition: ID-1: / size: 766G used: 19G (3%) fs: xfs dev: /dev/sdb2
           ID-2: swap-1 size: 137.44GB used: 0.00GB (0%) fs: swap dev: /dev/sdb1
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: No active sensors found. Have you configured your sensors yet? mobo: N/A
Info:      Processes: 735 Uptime: 7:57 Memory: 12907.3/128827.4MB Init: systemd runlevel: 3 Gcc sys: 4.8.5
           Client: Shell (bash 4.2.462) inxi: 2.3.56 

And it "seems" to run fine. However, at random, when the machine is well loaded (a couple of KVM/QEMU guests on an SSD, PostgreSQL on an NVMe drive, lots of HTTP/PHP requests doing image conversion with convert, and NFS being served from a partition), it reboots. There is nothing in /var/log/messages about it either.
For example, this is a snippet from the messages file during one of those random reboots:


May 25 08:20:01 boa systemd: Started Session 164 of user root.
May 25 08:20:01 boa systemd: Starting Session 164 of user root.
May 25 08:30:01 boa systemd: Started Session 165 of user root.
May 25 08:30:01 boa systemd: Starting Session 165 of user root.
May 25 08:41:10 boa kernel: Initializing cgroup subsys cpuset
May 25 08:41:10 boa kernel: Initializing cgroup subsys cpu
May 25 08:41:10 boa kernel: Initializing cgroup subsys cpuacct
May 25 08:41:10 boa kernel: Linux version 3.10.0-862.3.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Mon May 21 23:36:36 UTC 2018
May 25 08:41:10 boa kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.3.2.el7.x86_64 root=UUID=6ad54fc2-ac3e-4e51-b159-0d161718dc21 ro rhgb selinux=0
May 25 08:41:10 boa kernel: e820: BIOS-provided physical RAM map:
May 25 08:41:10 boa kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009bfff] usable
May 25 08:41:10 boa kernel: BIOS-e820: [mem 0x000000000009c000-0x000000000009ffff] reserved
.
.
.
EDIT2: removed speculation.


And I haven't the slightest clue as to why. The server is certified by Supermicro to run CentOS 7.4, and the performance is usually great. Lovely box, if it weren't for the darn reboots...
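
For anyone wanting to count how often this happens, here is a rough sketch that scans a syslog-style file such as /var/log/messages for kernel boot banners that are not preceded by a shutdown message. The boot marker is taken from the snippet above; the shutdown marker (rsyslogd logging "exiting on signal 15") is an assumption about what a clean shutdown leaves behind on this setup, and the function name is made up.

```python
from collections import deque

# Boot banner as it appears in the log snippet above.
BOOT_MARKER = " kernel: Linux version "
# Assumption: a clean shutdown leaves this rsyslogd message in the log.
CLEAN_MARKERS = ("exiting on signal 15",)


def find_unclean_boots(lines, lookback=50, clean_markers=CLEAN_MARKERS):
    """Return (last_line_before_boot, boot_line) for boots with no shutdown marker.

    Only the `lookback` lines before each boot banner are inspected. The
    first boot in a freshly rotated log has no earlier lines and is
    therefore reported with None as its first element.
    """
    recent = deque(maxlen=lookback)
    suspects = []
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER in line:
            clean = any(m in old for old in recent for m in clean_markers)
            if not clean:
                stamp = line[:15]  # e.g. "May 25 08:41:10"
                # Skip early kernel lines logged in the same second as the banner.
                before = [old for old in recent if old[:15] != stamp]
                suspects.append((before[-1] if before else None, line))
            recent.clear()
        recent.append(line)
    return suspects
```

Run against the snippet above, this would flag the 08:41:10 boot and report the 08:30:01 session line as the last thing logged before it. It only shows when the box went down without a shutdown sequence, not why.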
