160 bytes file content corruption in CentOS 7.4

Post by ngurumahesh » 2018/08/03 08:18:50

Hi All,
We recently moved our product from CentOS 6.7 to CentOS 7.4, and we are seeing an intermittent issue in which 160 bytes of a file's content get corrupted. It has happened multiple times; each time a different file is affected, but the corrupted region is always exactly 160 bytes.
We have added some rpm checks and cron jobs to catch the problem, but we have not been able to find the root cause.
Can you please give us some pointers?

The kernel version is:
Linux version 3.10.0-693.17.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jan 25 20:13:58 UTC 2018
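One way to confirm that a damaged file really differs in a single 160-byte run, and to see where that run starts, is to compare it byte by byte with a known-good copy. This is a minimal sketch, assuming a pristine copy is available (for an RPM-owned file it could be extracted from the original package with "rpm2cpio package.rpm | cpio -idm"). The helper name corrupt_span is hypothetical.

```shell
# Hypothetical helper: corrupt_span KNOWN_GOOD SUSPECT
# Compares the two files byte by byte with "cmp -l" and prints the first and
# last differing byte numbers (1-based, as cmp reports them), the length of
# the span between them, and how many bytes inside that span actually differ.
# If the files differ in length, cmp stops at the end of the shorter one (its
# EOF notice is discarded), so only the common prefix is examined.
corrupt_span() {
    cmp -l "$1" "$2" 2>/dev/null | awk '
        NR == 1 { first = $1 }          # first differing byte number
        { last = $1; n++ }              # track the last one and count them
        END {
            if (n == 0) { print "no differing bytes"; exit }
            printf "first=%d last=%d span=%d differing=%d\n", first, last, last - first + 1, n
        }'
}
```

The "differing" count can come out slightly below "span" even for a single 160-byte overwrite, because some overwritten bytes may happen to equal the original ones. Subtracting 1 from "first" gives the 0-based file offset, which may help when checking whether the damaged region lines up with any block or page boundary.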

Post by TrevorH » 2018/08/03 08:48:17

First step: run yum update. 7.4 is out of date and has been replaced by 7.5, which has a newer kernel and many, many fixes.

What filesystem are you using both on the old system and the new? How are you detecting this "corruption"? What hardware are you using? Do you use ECC RAM?

Post by ngurumahesh » 2018/08/07 04:19:24

We use ext4 on both the old and the current CentOS version.
We run the following command to identify the corrupted files:

rpm -Va --nodeps --noscripts --nolinkto --nosize --nouser --nogroup --nomode --nordev --nocaps | grep -e '^..5....\..'
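For reference, each line of rpm -Va output starts with a 9-character result string, one column per test: S (size), M (mode), 5 (digest), D (device), L (link), U (user), G (group), T (mtime), P (capabilities), with "." meaning the test passed. As we read it, the grep pattern above keeps lines where column 3 is "5" (the file digest differs) and column 8 is "." (the mtime still matches), i.e. content that changed without its timestamp changing. A minimal awk sketch of the same filter that prints only the file paths is below; the function name silent_digest_changes is hypothetical, and it assumes every reported path is absolute.

```shell
# Hypothetical helper: reads "rpm -Va" output on stdin and prints the paths
# whose digest differs (column 3 is "5") while the mtime still matches
# (column 8 is "."), i.e. the same lines the grep pattern selects.
silent_digest_changes() {
    awk 'substr($0, 3, 1) == "5" && substr($0, 8, 1) == "." {
        # Drop the result string and attribute marker; keep the absolute path.
        print substr($0, index($0, "/"))
    }'
}
```

It would be used by piping the same rpm -Va command into it instead of into grep, which leaves a plain list of paths that can be passed on to other tools (for example rpm -qf to find the owning packages).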

The hardware is an HPE ProLiant server.

Yes, we are using ECC RAM.

We are already working on moving to CentOS 7.5, but we need to maintain CentOS 7.4 for some time yet, so we are looking for help triaging this issue.

Post by hunter86_bg » 2018/08/07 17:51:41

It sounds like corrupted memory... but you are using ECC.
Is there anything in the AHS (Active Health System) logs indicating a hardware issue?
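Alongside the AHS logs, it may be worth checking whether the kernel itself has counted any corrected (ce) or uncorrected (ue) memory errors. This is a minimal sketch assuming an EDAC driver is loaded, which exposes per-controller counters under /sys/devices/system/edac/mc/mcN/. On ProLiant servers memory errors may instead be reported only through the firmware and the iLO, so finding no counters here does not rule out a hardware problem. The function name edac_counts and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: edac_counts [BASE_DIR]
# Prints corrected (ce) and uncorrected (ue) memory error counts for each
# memory controller exposed by the Linux EDAC sysfs interface. BASE_DIR
# defaults to the standard EDAC location and can be overridden for testing.
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    found=0
    for mc in "$base"/mc*; do
        # Skips the unexpanded glob too, when no controllers exist.
        [ -r "$mc/ce_count" ] || continue
        found=1
        printf '%s ce=%s ue=%s\n' "$(basename "$mc")" "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
    [ "$found" -eq 1 ] || echo "no EDAC memory controllers found under $base"
}
```

A non-zero ce count that grows over time would point towards failing DIMMs even though ECC is correcting the errors.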

Post by ngurumahesh » 2018/08/09 03:59:38

I don't have AHS logs for this run, so I need to wait for the next occurrence; the problem does not always reproduce. I will check whether the AHS logs show anything.

Post by hunter86_bg » 2018/08/09 20:37:15

AHS logs can be obtained from the iLO, and if you have an HPE account you can analyze them yourself. They are written continuously and are available via the following guide (the procedure depends on the HPE ProLiant version and age).