Hey guys,
We have been having an issue with an application not responding at certain times, and we are seeing a bunch of these in the log:
CPU 6 THERMAL EVENT TSC 2d2c4389660f7db
TIME 1497995064 Tue Jun 20 14:44:24 2017
Processor 6 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 88000bc3 MCGSTATUS 0
MCGCAP 1000c14 APICID 6 SOCKETID 0
CPUID Vendor Intel Family 6 Model 45
Hardware event. This is not a software error.
MCE 3
CPU 6 THERMAL EVENT TSC 2d2c43896610d3a
TIME 1497995064 Tue Jun 20 14:44:24 2017
Processor 6 below trip temperature. Throttling disabled
STATUS 88010a82 MCGSTATUS 0
MCGCAP 1000c14 APICID 6 SOCKETID 0
CPUID Vendor Intel Family 6 Model 45
Hardware event. This is not a software error.
MCE 0
CPU 16 THERMAL EVENT TSC 2d2c761f3b7a9ff
TIME 1497996644 Tue Jun 20 15:10:44 2017
Processor 16 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 88000bc3 MCGSTATUS 0
MCGCAP 1000c14 APICID 1 SOCKETID 0
CPUID Vendor Intel Family 6 Model 45
Hardware event. This is not a software error.
MCE 1
CPU 16 THERMAL EVENT TSC 2d2c761f3d6c88f
TIME 1497996644 Tue Jun 20 15:10:44 2017
Processor 16 below trip temperature. Throttling disabled
STATUS 88010a82 MCGSTATUS 0
MCGCAP 1000c14 APICID 1 SOCKETID 0
CPUID Vendor Intel Family 6 Model 45
Hardware event. This is not a software error.
Dell has looked at this with us and does not see any sort of overheating issue. Given that throttling is enabled and then disabled within the same second, I'm starting to think this is a bug?
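To put a number on "within the same second", the TSC values in the log can be paired up per CPU. The following is a minimal sketch, not a finished tool: the function name `throttle_durations` is made up for illustration, and the TSC frequency is an assumption (2.0 GHz is only a placeholder, substitute the real TSC rate of the machine).

```python
import re

# The two enable/disable pairs from the log above, trimmed to the lines the parser needs.
LOG = """\
CPU 6 THERMAL EVENT TSC 2d2c4389660f7db
Processor 6 heated above trip temperature. Throttling enabled.
CPU 6 THERMAL EVENT TSC 2d2c43896610d3a
Processor 6 below trip temperature. Throttling disabled
CPU 16 THERMAL EVENT TSC 2d2c761f3b7a9ff
Processor 16 heated above trip temperature. Throttling enabled.
CPU 16 THERMAL EVENT TSC 2d2c761f3d6c88f
Processor 16 below trip temperature. Throttling disabled
"""

EVENT_RE = re.compile(r"^CPU (\d+) THERMAL EVENT TSC ([0-9a-f]+)$")


def throttle_durations(log_text, tsc_hz):
    """Return a list of (cpu, tsc_ticks, seconds) for each enabled/disabled pair."""
    started = {}      # cpu -> TSC value recorded when throttling was enabled
    results = []
    cpu = tsc = None  # header of the record currently being read
    for line in log_text.splitlines():
        line = line.strip()
        match = EVENT_RE.match(line)
        if match:
            cpu, tsc = int(match.group(1)), int(match.group(2), 16)
        elif cpu is not None and "Throttling enabled" in line:
            started[cpu] = tsc
        elif cpu is not None and "Throttling disabled" in line and cpu in started:
            ticks = tsc - started.pop(cpu)
            results.append((cpu, ticks, ticks / tsc_hz))
    return results


# Assumption: placeholder TSC frequency, not taken from the log.
ASSUMED_TSC_HZ = 2.0e9

for cpu, ticks, seconds in throttle_durations(LOG, ASSUMED_TSC_HZ):
    print(f"CPU {cpu}: throttled for {ticks} TSC ticks (~{seconds * 1e6:.1f} us)")
```

For the two pairs above the tick counts come out as 5471 and 2039440, so under the assumed clock rate the throttling lasted from a few microseconds up to roughly a millisecond.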
The only other reference I found was this:
viewtopic.php?t=47989
Could this be a bug in CentOS?
CPU THERMAL EVENT TSC - Dell Poweredge
Re: CPU THERMAL EVENT TSC - Dell Poweredge
The other thread ends with:
"This problem has been fixed. It was due to a Dell hardware issue only; we just did the firmware update on the servers."
I'd be pretty sure this is not a software problem.
What hardware is it? What BIOS do you run?
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke
Re: CPU THERMAL EVENT TSC - Dell Poweredge
This is a Dell PowerEdge R730xd. I'm not sure what you mean about the BIOS - did you mean the version?
I didn't think it was software either, but looking closer, the Dell iDRAC hardware monitoring (built into the motherboard) never sees the CPU temperature go above ~52°C.
It's a dual-CPU machine with 8 cores per CPU.
I'm just trying to get some guidance. Dell says there is nothing wrong with the hardware, and while I didn't agree with them at first, the fact that these are tripping and throttling/un-throttling within the same second doesn't make any sense, does it?
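One way to compare the iDRAC reading with what the core itself reported is to decode the STATUS field of each event. The sketch below assumes that STATUS is the raw value of the IA32_THERM_STATUS MSR (which is what the kernel's thermal throttle handler is understood to log) and uses the bit layout from the Intel SDM. The function name and dictionary keys are made up for illustration.

```python
def decode_therm_status(status):
    """Decode a raw IA32_THERM_STATUS value (assumed bit layout per the Intel SDM)."""
    return {
        "throttling_now": bool(status & (1 << 0)),         # bit 0: thermal status
        "throttled_since_clear": bool(status & (1 << 1)),  # bit 1: sticky log bit
        "prochot_now": bool(status & (1 << 2)),            # bit 2: PROCHOT#/FORCEPR#
        "critical_now": bool(status & (1 << 4)),           # bit 4: critical temperature
        "degrees_below_tjmax": (status >> 16) & 0x7F,      # bits 22:16: digital readout
        "resolution_c": (status >> 27) & 0xF,              # bits 30:27: resolution
        "reading_valid": bool(status & (1 << 31)),         # bit 31: readout is valid
    }


# The two STATUS values that appear in the log.
for raw in (0x88000BC3, 0x88010A82):
    print(hex(raw), decode_therm_status(raw))
```

If the assumption holds, the "Throttling enabled" events carry a valid readout of 0 degrees below TjMax and the "Throttling disabled" events a readout of 1 degree below it, which would mean the core's own sensor briefly reported being at the trip point even though the iDRAC never showed more than ~52°C.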
Re: CPU THERMAL EVENT TSC - Dell Poweredge
You don't get machine checks from software though.
I'd check for a BIOS update for your machine. I have R430 and R630 machines and they have current BIOS revisions 2.4.2 and 2.4.3 respectively so I suspect that the 730 should have a similarly numbered current release too (dmidecode will tell you what it is now).
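As an alternative to dmidecode, the same DMI strings are usually exposed by the kernel under /sys/class/dmi/id. This is a small sketch under that assumption; `read_dmi` is a hypothetical helper name, and fields that are missing or unreadable are simply reported as None.

```python
from pathlib import Path


def read_dmi(root="/sys/class/dmi/id"):
    """Read a few DMI identification strings from sysfs; missing fields become None."""
    fields = ("sys_vendor", "product_name", "bios_version", "bios_date")
    info = {}
    for name in fields:
        try:
            info[name] = (Path(root) / name).read_text().strip()
        except OSError:
            info[name] = None  # not present or not readable on this system
    return info


if __name__ == "__main__":
    for key, value in read_dmi().items():
        print(f"{key}: {value}")
```

The bios_version value can then be compared against the current release listed on Dell's support page for the machine.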
I'm not sure how often the iDRAC updates its sensor readings. You could modprobe coretemp on the machine and then run sensors when it starts doing this and see what that reports.
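Once coretemp is loaded, the per-core readings that sensors prints can also be read straight from sysfs, which makes it easy to log them from a script while the problem is happening. This is a rough sketch, assuming a kernel that puts the `name` file directly under /sys/class/hwmon/hwmonN (older kernels may keep it under hwmonN/device instead). The function name `read_coretemp` is made up for illustration.

```python
import glob
import os


def read_coretemp(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN/label": degrees_C} for every coretemp sensor found."""
    readings = {}
    for hwmon in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon, "name")) as f:
                if f.read().strip() != "coretemp":
                    continue  # some other sensor chip
        except OSError:
            continue
        for input_path in sorted(glob.glob(os.path.join(hwmon, "temp*_input"))):
            label_path = input_path[: -len("_input")] + "_label"
            try:
                with open(label_path) as f:
                    label = f.read().strip()
            except OSError:
                label = os.path.basename(input_path)  # fall back to the file name
            try:
                with open(input_path) as f:
                    millidegrees = int(f.read().strip())
            except (OSError, ValueError):
                continue
            # hwmon reports temperatures in millidegrees Celsius
            readings[f"{os.path.basename(hwmon)}/{label}"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for sensor, temp_c in read_coretemp().items():
        print(f"{sensor}: {temp_c:.1f} C")
```

Running this in a loop with a short sleep and a timestamp would give something to line up against the TIME fields of the thermal events.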