File System corruption - ext3

Issues related to hardware problems
uwotter
Posts: 2
Joined: 2013/12/23 01:33:34

File System corruption - ext3

Post by uwotter » 2013/12/23 02:08:38

Hello.

I wanted to post this issue here to get feedback and ideas on possible causes.

We're running CentOS 5.6 on a Postgres database server (Postgres 9.1.11).

We are seeing intermittent disk corruption. Postgres reports finding zero pages in index and table files where it expects data.

We are also seeing the following in our messages file:
kernel: EXT3-fs error (device dm-10): ext3_journal_start_sb: Detected aborted journal
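
To pull these events out of the messages file and see which device-mapper device they hit, something like the sketch below could be used. It is only an illustration: the regular expression is modelled on the single line quoted above, and the function name `find_ext3_errors`, the host name `dbhost` and the sample timestamps are made up for the example.

```python
import re
from collections import Counter

# Pattern modelled on the one kernel line quoted above; other EXT3-fs
# messages may be formatted differently (assumption).
EXT3_ERROR = re.compile(
    r"kernel: EXT3-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+): (?P<message>.*)"
)


def find_ext3_errors(lines):
    """Return (device, function, message) for each matching log line."""
    hits = []
    for line in lines:
        match = EXT3_ERROR.search(line)
        if match:
            hits.append((
                match.group("device"),
                match.group("function"),
                match.group("message").strip(),
            ))
    return hits


# Invented sample lines (hypothetical host name and timestamps).
sample = [
    "Dec 22 14:03:11 dbhost kernel: EXT3-fs error (device dm-10): "
    "ext3_journal_start_sb: Detected aborted journal",
    "Dec 22 14:03:12 dbhost crond[123]: unrelated line",
]

hits = find_ext3_errors(sample)
per_device = Counter(device for device, _, _ in hits)
print(hits)
print(per_device)
```

On a real system the list of lines would come from the messages file instead of `sample`, and the per-device counts show whether the errors always land on the same dm device.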

We have checked our hardware pretty extensively, since this normally indicates a disk problem. We are running on a NetApp filer via a Brocade switch (4GB SAN). NetApp has reviewed the filer and cannot find any errors, and none of the other hosts (approximately 8) that share the same filer head and SAN switches are showing any issues. We have also tried running on a different server with a freshly installed OS and still see the issue. It occurs every 2-48 hours, most often on weekdays.

We have upgraded from Postgres 9.1.2 to 9.1.11 to pick up fixes for data corruption bugs in Postgres. This has not resolved the issue.

We have run full checks of the Postgres files with the Postgres utilities and found no errors. We have also taken the file systems offline and run fsck, which found no errors.

We have reviewed all crontabs and found nothing running that aligns with the outages. (The outage count is about 7 now.)
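
One way to compare outage times against scheduled jobs is to bucket them by weekday and hour. The following is a minimal sketch under stated assumptions: the timestamps are invented placeholders in the classic syslog format, not the real outage times, and `weekday_hour_histogram` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_hour_histogram(timestamps, year):
    """Count events per (weekday, hour).

    Classic syslog timestamps carry no year, so the caller supplies one.
    """
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        counts[(WEEKDAYS[when.weekday()], when.hour)] += 1
    return counts


# Hypothetical outage times, not taken from the real logs.
example = ["Dec 16 09:15:02", "Dec 18 09:40:51", "Dec 21 23:05:10"]
histogram = weekday_hour_histogram(example, 2013)
for (day, hour), count in sorted(histogram.items()):
    print(day, hour, count)
```

If the buckets cluster around particular hours, that narrows down which scheduled or batch activity (on this host or on anything else sharing the storage) is worth a closer look.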

Any advice/ideas/suggestions/pointers to known issues we haven't found yet would be much appreciated.

Thanks.

Regards.

Brian


Detailed data from getinfo.sh for the affected system.
Information for disk problems.


== BEGIN uname -rmi ==
2.6.18-238.el5 x86_64 x86_64
== END   uname -rmi ==

== BEGIN rpm -qa \*-release\* ==
centos-release-5-6.el5.centos.1
epel-release-5-4
centos-release-notes-5.6-0
== END   rpm -qa \*-release\* ==

== BEGIN cat /etc/redhat-release ==
CentOS release 5.6 (Final)
== END   cat /etc/redhat-release ==

== BEGIN getenforce ==
Permissive
== END   getenforce ==

== BEGIN free -m ==
             total       used       free     shared    buffers     cached
Mem:        774019     771717       2301          0       1759     758094
-/+ buffers/cache:      11863     762155
Swap:            0          0          0
== END   free -m ==

== BEGIN cat /etc/fstab ==
/dev/vg01/lvol2         /                       ext3    defaults        1 1
/dev/vg01/lvol3         /var                    ext3    defaults        1 2
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
== END   cat /etc/fstab ==

== BEGIN df -h ==
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg01-lvol2
                       20G  8.0G   11G  44% /
/dev/mapper/vg01-lvol3
                      522G   16G  480G   4% /var
/dev/cciss/c0d0p1      99M   13M   82M  14% /boot
tmpfs                 378G     0  378G   0% /dev/shm
store20.prod.iad1.foo.com:/export/nethome/grakala
                      200G  130G   71G  65% /nethome/grakala
store20.prod.iad1.foo.com:/export/nethome/cdhaneku
                      200G  130G   71G  65% /nethome/cdhaneku
/dev/mapper/pg_archive-pg_archive
                      493G   93G  375G  20% /var/lib/pgsql/foo/archive
/dev/mapper/pg_data-pg_data
                       99G  1.2G   93G   2% /var/lib/pgsql/foo/data
/dev/mapper/pg_sitedb-pg_sitedb
                      985G  593G  342G  64% /var/lib/pgsql/foo/tablespaces/site
/dev/mapper/pg_sitedb2-pg_sitedb2
                      985G  479G  456G  52% /var/lib/pgsql/foo/tablespaces/site2
store20.prod.iad1.foo.com:/export/nethome/kkumar
                      200G  130G   71G  65% /nethome/kkumar
== END   df -h ==

== BEGIN fdisk -lu ==
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sdh doesn't contain a valid partition table
Disk /dev/sdi doesn't contain a valid partition table
Disk /dev/sdj doesn't contain a valid partition table
Disk /dev/sdn doesn't contain a valid partition table
Disk /dev/sdo doesn't contain a valid partition table
Disk /dev/sdp doesn't contain a valid partition table
Disk /dev/sdt doesn't contain a valid partition table

Disk /dev/cciss/c0d0: 600.0 GB, 600093712384 bytes
255 heads, 63 sectors/track, 72957 cylinders, total 1172058032 sectors
Units = sectors of 1 * 512 = 512 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *          63      208844      104391   83  Linux
/dev/cciss/c0d0p2          208845  1172054204   585922680   8e  Linux LVM

Disk /dev/sda: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdc: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdd: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sde: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdf: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdg: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdh: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdi: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdj: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdk: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdl: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdm: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdn: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdo: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdp: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdq: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdr: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sds: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdt: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdu doesn't contain a valid partition table
Disk /dev/sdv doesn't contain a valid partition table
Disk /dev/dm-3 doesn't contain a valid partition table
Disk /dev/dm-4 doesn't contain a valid partition table
Disk /dev/dm-5 doesn't contain a valid partition table

Disk /dev/sdu: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdv: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/sdw: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdx: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/dm-2: 67 MB, 67108864 bytes
255 heads, 63 sectors/track, 8 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes

     Device Boot      Start         End      Blocks   Id  System

Disk /dev/dm-3: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/dm-4: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/dm-5: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes


Disk /dev/dm-6: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes

     Device Boot      Start         End      Blocks   Id  System

Disk /dev/dm-7: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes

     Device Boot      Start         End      Blocks   Id  System
== END   fdisk -lu ==

== BEGIN blkid ==
/dev/mapper/vg01-lvol3: UUID="e875e5eb-3bf0-4ece-b4fe-dd59be8a96ea" TYPE="ext3"
/dev/mapper/vg01-lvol2: UUID="d314ac24-ff4e-4445-b757-517c9da5b0b8" TYPE="ext3"
/dev/cciss/c0d0p1: LABEL="/boot" UUID="f6d76c5b-c65d-4950-90f3-4f59b1b7809f" TYPE="ext3" SEC_TYPE="ext2"
/dev/vg01/lvol2: UUID="d314ac24-ff4e-4445-b757-517c9da5b0b8" TYPE="ext3"
/dev/mapper/pg_sitedb2-pg_sitedb2: UUID="180b0460-83c4-431a-9ea8-dc2bb2853a06" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/pg_data-pg_data: UUID="c837b153-5f2f-4b53-b188-235f44347606" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/pg_archive-pg_archive: UUID="d66f03c8-c8ec-410f-9228-3f60a298ac59" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/pg_sitedb-pg_sitedb: UUID="aa271131-36e8-4aba-8d44-7ffa9219b8d2" SEC_TYPE="ext2" TYPE="ext3"
== END   blkid ==

== BEGIN cat /proc/mdstat ==
Personalities :
unused devices: <none>
== END   cat /proc/mdstat ==

== BEGIN pvs ==
  PV                 VG         Fmt  Attr PSize    PFree
  /dev/cciss/c0d0p2  vg01       lvm2 a-    558.75G    0
  /dev/mapper/mpath1 pg_data    lvm2 a-    100.00G    0
  /dev/mapper/mpath2 pg_sitedb  lvm2 a-   1000.03G    0
  /dev/mapper/mpath3 pg_sitedb2 lvm2 a-   1000.15G    0
  /dev/mapper/mpath4 pg_archive lvm2 a-    500.01G    0
== END   pvs ==

== BEGIN vgs ==
  VG         #PV #LV #SN Attr   VSize    VFree
  pg_archive   1   1   0 wz--n-  500.01G    0
  pg_data      1   1   0 wz--n-  100.00G    0
  pg_sitedb    1   1   0 wz--n- 1000.03G    0
  pg_sitedb2   1   1   0 wz--n- 1000.15G    0
  vg01         1   2   0 wz--n-  558.75G    0
== END   vgs ==

== BEGIN lvs ==
  LV         VG         Attr   LSize    Origin Snap%  Move Log Copy%  Convert
  pg_archive pg_archive -wi-ao  500.01G
  pg_data    pg_data    -wi-ao  100.00G
  pg_sitedb  pg_sitedb  -wi-ao 1000.03G
  pg_sitedb2 pg_sitedb2 -wi-ao 1000.15G
  lvol2      vg01       -wi-ao   20.00G
  lvol3      vg01       -wi-ao  538.75G
== END   lvs ==


avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: File System corruption - ext3

Post by avij » 2013/12/23 19:53:27

You could try updating to a newer kernel. CentOS 5.6 was released some 2.5 years ago. I'd guess a number of device drivers have received updates in the meantime, and some of them might be relevant to your problem.

uwotter
Posts: 2
Joined: 2013/12/23 01:33:34

Re: File System corruption - ext3

Post by uwotter » 2013/12/26 19:26:26

We have upgraded the affected system to CentOS 6.3 and reverted the file systems to ext2 since we were also seeing issues with the journal running out of blocks. So far the issue has not resurfaced.
