RHEL6/CentOS 6 on an AMD EPYC server running VMware ESXi 6.5U1

Issues related to hardware problems


Post by majun » 2018/03/20 16:24:59

We need to replace our current VMware servers and are seriously considering a switch from our current Xeon servers (Sandy Bridge and even Westmere) to Dell PowerEdge R7425 AMD EPYC-based servers running VMware ESXi 6.5U1. The attraction is their higher core count and support for up to 1 TB RAM per CPU (compared to only 768 GB for Intel's current line-up). We have several virtualized Linux servers running on these VMware clusters, among them some CentOS 5 and even SuSE 9 legacy servers. Those aren't mission critical, and we might be able to either replace them or keep one or two Westmere hosts around for these specific machines until we can get rid of them for good.

What worries me most are our production servers running CentOS 6. According to Red Hat, EPYC is not and will not be supported on RHEL6: https://access.redhat.com/articles/65431. The latest Xeon processors (Skylake-SP), on the other hand, are supported but require at least RHEL 6.9: https://access.redhat.com/support/policy/intel

As far as I was able to find out, this only applies to RHEL installed directly on bare metal, without any virtualization layer between the OS and the hardware. In my understanding, the VMware hypervisor layer should take care of the hardware incompatibility issue, but there's a little bit of uncertainty in the back of my head, mainly because VMware seems to pass most of the CPUID flags through to the guest operating system (as in: the virtual machines can see exactly what CPU type and architecture the host has installed), which could cause problems with older kernels.

On the other hand, our 32-bit SuSE 9 servers once ran on Pentium 4 based hardware, were virtualized a couple of years ago, and are now doing perfectly fine on Sandy Bridge with an old 2.6.19 kernel. I don't really expect any problems, since VMware puts that entire virtual layer between the OS and the hardware, and the PowerEdge R7425 servers are fully supported and certified by VMware. My understanding is that the old operating systems will obviously not support the latest features and will most likely ignore unknown CPUID flags, but should still run just fine given their x86/x86_64 compatibility.

Also, VMware doesn't pass through every single flag, only a limited subset. At least that's what I see when comparing the CPUID flags reported by /proc/cpuinfo on two identical servers, one with CentOS 6 installed on bare metal and one with a virtualized CentOS 6 running on top of VMware ESX 5.5:

virtualized:


processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping	: 2
cpu MHz		: 2394.270
cache size	: 12288 KB
physical id	: 1
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat epb dts
bogomips	: 4788.54
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

bare metal:


processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping	: 2
microcode	: 21
cpu MHz		: 1596.000
cache size	: 12288 KB
physical id	: 1
siblings	: 8
core id		: 10
cpu cores	: 4
apicid		: 53
initial apicid	: 53
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4788.36
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
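
The difference between the two flag lists above is easier to see as a set difference than by eye. The following is only a rough sketch in Python: the function names parse_flags and diff_flags are made up for illustration, and the two strings are simply the flags lines copied from the listings above. On a live system one would read /proc/cpuinfo on each machine instead.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def diff_flags(guest_text, host_text):
    """Return (flags only on the host, flags only in the guest), both sorted."""
    guest = parse_flags(guest_text)
    host = parse_flags(host_text)
    return sorted(host - guest), sorted(guest - host)


# Flags line from the virtualized CentOS 6 listing above.
VIRTUALIZED = (
    "flags\t\t: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx rdtscp lm "
    "constant_tsc arch_perfmon xtopology tsc_reliable nonstop_tsc aperfmperf "
    "unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes "
    "hypervisor lahf_lm ida arat epb dts"
)

# Flags line from the bare-metal CentOS 6 listing above.
BARE_METAL = (
    "flags\t\t: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx "
    "pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology "
    "nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est "
    "tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida "
    "arat epb dtherm tpr_shadow vnmi flexpriority ept vpid"
)

if __name__ == "__main__":
    host_only, guest_only = diff_flags(VIRTUALIZED, BARE_METAL)
    print("only on bare metal:", " ".join(host_only))
    print("only in the guest: ", " ".join(guest_only))
```

For these two listings, the guest-only set is hypervisor, tsc_reliable and unfair_spinlock, while the flags missing from the guest include the virtualization-related ones such as vmx, ept and vpid. Running the same comparison between an EPYC host and a CentOS 6 guest would show which flags the guest kernel actually gets to see.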

Needless to say, I'm a bit confused right now. Does anyone already have EPYC (or Ryzen or Threadripper) servers up and running?
