
Issue with CPU in centos7

Posted: 2018/06/14 15:08:56
by mohit.solanki
Got an issue with a frozen filesystem, with these logs:

Jun 11 15:47:30 laxasperacla1 kernel: Code: 48 63 35 26 45 a2 00 89 c2 39 f0 0f 8d 85 fe ff ff 48 98 49 8b 0f 48 03 0c c5 e0 fd b0 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 <f6> 41 20 01 75 f8 48 63 35 f5 44 a2 00 eb b7 0f b6 4d cc 4c 89
Jun 11 15:47:39 laxasperacla1 kernel: audit_log_start: 62 callbacks suppressed
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_lost=50863 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:39 laxasperacla1 kernel: audit: backlog limit exceeded
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_lost=50864 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:39 laxasperacla1 kernel: audit: backlog limit exceeded
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_lost=50865 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:39 laxasperacla1 kernel: audit: backlog limit exceeded
Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit_log_start: 41 callbacks suppressed
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_lost=50880 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit: backlog limit exceeded
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_lost=50881 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit: backlog limit exceeded
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_lost=50882 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:50 laxasperacla1 kernel: audit: backlog limit exceeded
Jun 11 15:47:50 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:55 laxasperacla1 kernel: audit_log_start: 74 callbacks suppressed
Jun 11 15:47:55 laxasperacla1 kernel: audit: audit_backlog=8193 > audit_backlog_limit=8192
Jun 11 15:47:55 laxasperacla1 kernel: audit: audit_lost=50908 audit_rate_limit=0 audit_backlog_limit=8192
Jun 11 15:47:55 laxasperacla1 kernel: audit: backlog limit exceeded
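
The audit lines above repeat the same three messages, so the quickest way to read them is to follow the audit_lost counter, which is the kernel's running count of audit records it had to drop. A minimal Python sketch, assuming the log text is available as a list of lines; the function name summarize_audit_lost and the sample list are illustrative only, with the sample lines copied from the log above:

```python
import re

# Matches lines such as:
#   "... kernel: audit: audit_lost=50863 audit_rate_limit=0 audit_backlog_limit=8192"
LOST_RE = re.compile(
    r"audit: audit_lost=(\d+) audit_rate_limit=(\d+) audit_backlog_limit=(\d+)"
)


def summarize_audit_lost(lines):
    """Return (first_lost, last_lost, backlog_limit) seen in the lines, or None."""
    first = last = limit = None
    for line in lines:
        m = LOST_RE.search(line)
        if not m:
            continue  # skip "backlog limit exceeded" and unrelated lines
        lost = int(m.group(1))
        if first is None:
            first = lost
        last = lost
        limit = int(m.group(3))
    if first is None:
        return None
    return first, last, limit


# Three lines taken from the log excerpt in this post.
sample = [
    "Jun 11 15:47:39 laxasperacla1 kernel: audit: audit_lost=50863 audit_rate_limit=0 audit_backlog_limit=8192",
    "Jun 11 15:47:39 laxasperacla1 kernel: audit: backlog limit exceeded",
    "Jun 11 15:47:55 laxasperacla1 kernel: audit: audit_lost=50908 audit_rate_limit=0 audit_backlog_limit=8192",
]

first, last, limit = summarize_audit_lost(sample)
print(f"audit_lost went from {first} to {last} (+{last - first}), backlog limit {limit}")
```

On the excerpt, the counter moves from 50863 to 50908 between 15:47:39 and 15:47:55, so records were still being dropped throughout the window shown.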

Dell support says the hardware is fine. It could be an issue with the OS.

Jun 11 15:47:02 laxasperacla1 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [chronyd:1343]
Jun 11 15:47:02 laxasperacla1 kernel: Modules linked in: cvfs(POE) bonding iTCO_wdt iTCO_vendor_support dcdbas skx_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev ipmi_ssif sg mei_me mei lpc_ich i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod sd_mod cdrom mgag200 lpfc drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crc32c_intel drm igb i40e crc_t10dif crct10dif_generic tg3 crct10dif_pclmul ahci megaraid_sas scsi_transport_fc libahci dca libata scsi_tgt i2c_algo_bit crct10dif_common ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
Jun 11 15:47:02 laxasperacla1 kernel: CPU: 4 PID: 1343 Comm: chronyd Tainted: P OEL ------------ 3.10.0-693.el7.x86_64 #1
with Hardware error on CPU 4

Re: Issue with CPU in centos7

Posted: 2018/06/14 15:15:22
by TrevorH
3.10.0-693.el7.x86_64
Run yum update to get up to date. That is the original 7.4 kernel from August 2017, and there have been numerous updates to 7.4 since then, including all the fixes for Meltdown/Spectre et al. You'll also find that 7.5 was released in April this year, and at that point 7.4 became unsupported and will receive no more updates.
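
To check whether a machine is still on that original build, the release string (what uname -r prints) can be split into its base version and build number and compared. A rough Python sketch, not an official tool: the helper names kernel_build and is_original_74_kernel are made up for illustration, and the 3.10.0-862.el7 string used for comparison is an assumption about the 7.5 release kernel, not something quoted in this thread.

```python
import re


def kernel_build(release):
    """Split a release like '3.10.0-693.el7.x86_64' into ((3, 10, 0), (693,))."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    base = tuple(int(g) for g in m.group(1, 2, 3))
    build = tuple(int(p) for p in m.group(4).split("."))
    return base, build


def is_original_74_kernel(release):
    """True if the release is exactly the 3.10.0-693 build quoted in the post."""
    return kernel_build(release) == ((3, 10, 0), (693,))


# Release string taken from the "CPU: 4 PID: 1343" line in the original post.
posted = "3.10.0-693.el7.x86_64"
# Assumed 7.5 release kernel, used only to illustrate the comparison.
assumed_75 = "3.10.0-862.el7.x86_64"

print(is_original_74_kernel(posted))
print(kernel_build(posted) < kernel_build(assumed_75))
```

Tuples compare element by element, so an errata build such as 693.21.1 also sorts after the plain 693 build without any special handling.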
with Hardware error on CPU 4
Is that part of the kernel oops errors? If so, can you quote it verbatim, without editing, please?