I wanted to post this issue here to get feedback and ideas on possible causes.
We're running CentOS 5.6 on a Postgres database server (Postgres 9.1.11).
We are seeing intermittent disk corruption. Postgres reports finding zero pages in index and table files where it expects data.
We are also seeing the following in our messages file:
kernel: EXT3-fs error (device dm-10): ext3_journal_start_sb: Detected aborted journal
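For anyone who wants to pull the same lines out of their own logs, here is a rough sketch of how the affected devices can be counted. The function name `ext3_error_devices` is made up for illustration, and it assumes kernel messages land in /var/log/messages (the CentOS default) in the format shown above.

```shell
# Hypothetical helper: list the devices that logged EXT3-fs errors, with a count each.
# Assumption: the log uses the "EXT3-fs error (device NAME): ..." format shown above.
ext3_error_devices() {
    logfile="${1:-/var/log/messages}"
    # Print only the name inside "(device ...)" for each EXT3-fs error line,
    # then collapse duplicates into "count name" pairs.
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' "$logfile" |
        sort | uniq -c
}
```

Running `ext3_error_devices /var/log/messages` prints one line per device. A name such as dm-10 can then be matched to a volume by comparing its minor number against the output of `dmsetup ls` or `ls -l /dev/mapper`.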
We have checked our hardware pretty extensively, since this error normally indicates a disk problem. We are running on a NetApp filer via a Brocade switch (4Gb SAN). We have had NetApp review the filer and they cannot find any errors, and all of the other hosts (approximately 8) that share the same filer head and SAN switches are not showing any issues. We have also tried running on a different server with a freshly installed OS and still see the issue. It occurs every 2-48 hours, most often on weekdays.
We upgraded from Postgres 9.1.2 to 9.1.11 to pick up fixes for data corruption bugs in Postgres. This has not resolved the issue.
We have done full tests of the Postgres files with the Postgres utilities and found no errors. We have also taken the file systems offline and run fsck, and found no errors.
We have reviewed all crontabs and found nothing running that aligns with the outages. (The outage count is about 7 now.)
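In case it helps anyone doing the same comparison, this is a minimal sketch of bucketing the error lines by hour of day so they can be lined up against scheduled jobs. The function name `errors_by_hour` is hypothetical, and it assumes the traditional syslog timestamp layout ("Mon DD HH:MM:SS host ...") in /var/log/messages.

```shell
# Hypothetical helper: count EXT3-fs error lines per hour of day.
# Assumption: the third whitespace-separated field is the HH:MM:SS timestamp.
errors_by_hour() {
    logfile="${1:-/var/log/messages}"
    grep 'EXT3-fs error' "$logfile" |
        # Keep only the hour part of the timestamp.
        awk '{ split($3, t, ":"); print t[1] }' |
        sort | uniq -c
}
```

`errors_by_hour /var/log/messages` prints "count hour" pairs. With only about 7 outages the sample is small, so any clustering it shows is a hint at best.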
Any advice/ideas/suggestions/pointers to known issues we haven't found yet would be much appreciated.
Thanks.
Regards.
Brian
Detailed data from getinfo.sh for the affected system.
Information for disk problems.
== BEGIN uname -rmi ==
2.6.18-238.el5 x86_64 x86_64
== END uname -rmi ==
== BEGIN rpm -qa \*-release\* ==
centos-release-5-6.el5.centos.1
epel-release-5-4
centos-release-notes-5.6-0
== END rpm -qa \*-release\* ==
== BEGIN cat /etc/redhat-release ==
CentOS release 5.6 (Final)
== END cat /etc/redhat-release ==
== BEGIN getenforce ==
Permissive
== END getenforce ==
== BEGIN free -m ==
             total       used       free     shared    buffers     cached
Mem:        774019     771717       2301          0       1759     758094
-/+ buffers/cache:      11863     762155
Swap:            0          0          0
== END free -m ==
== BEGIN cat /etc/fstab ==
/dev/vg01/lvol2 / ext3 defaults 1 1
/dev/vg01/lvol3 /var ext3 defaults 1 2
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
== END cat /etc/fstab ==
== BEGIN df -h ==
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg01-lvol2
20G 8.0G 11G 44% /
/dev/mapper/vg01-lvol3
522G 16G 480G 4% /var
/dev/cciss/c0d0p1 99M 13M 82M 14% /boot
tmpfs 378G 0 378G 0% /dev/shm
store20.prod.iad1.foo.com:/export/nethome/grakala
200G 130G 71G 65% /nethome/grakala
store20.prod.iad1.foo.com:/export/nethome/cdhaneku
200G 130G 71G 65% /nethome/cdhaneku
/dev/mapper/pg_archive-pg_archive
493G 93G 375G 20% /var/lib/pgsql/foo/archive
/dev/mapper/pg_data-pg_data
99G 1.2G 93G 2% /var/lib/pgsql/foo/data
/dev/mapper/pg_sitedb-pg_sitedb
985G 593G 342G 64% /var/lib/pgsql/foo/tablespaces/site
/dev/mapper/pg_sitedb2-pg_sitedb2
985G 479G 456G 52% /var/lib/pgsql/foo/tablespaces/site2
store20.prod.iad1.foo.com:/export/nethome/kkumar
200G 130G 71G 65% /nethome/kkumar
== END df -h ==
== BEGIN fdisk -lu ==
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sdh doesn't contain a valid partition table
Disk /dev/sdi doesn't contain a valid partition table
Disk /dev/sdj doesn't contain a valid partition table
Disk /dev/sdn doesn't contain a valid partition table
Disk /dev/sdo doesn't contain a valid partition table
Disk /dev/sdp doesn't contain a valid partition table
Disk /dev/sdt doesn't contain a valid partition table
Disk /dev/cciss/c0d0: 600.0 GB, 600093712384 bytes
255 heads, 63 sectors/track, 72957 cylinders, total 1172058032 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/cciss/c0d0p1 * 63 208844 104391 83 Linux
/dev/cciss/c0d0p2 208845 1172054204 585922680 8e Linux LVM
Disk /dev/sda: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdc: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdd: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sde: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdf: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdg: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdh: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdi: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdj: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdk: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdl: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdm: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdn: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdo: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdp: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdq: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdr: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sds: 67 MB, 67108864 bytes
3 heads, 43 sectors/track, 1016 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdt: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdu doesn't contain a valid partition table
Disk /dev/sdv doesn't contain a valid partition table
Disk /dev/dm-3 doesn't contain a valid partition table
Disk /dev/dm-4 doesn't contain a valid partition table
Disk /dev/dm-5 doesn't contain a valid partition table
Disk /dev/sdu: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdv: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sdw: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/sdx: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/dm-2: 67 MB, 67108864 bytes
255 heads, 63 sectors/track, 8 cylinders, total 131072 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/dm-3: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/dm-4: 1073.7 GB, 1073775378432 bytes
255 heads, 63 sectors/track, 130545 cylinders, total 2097217536 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/dm-5: 1073.9 GB, 1073905401856 bytes
255 heads, 63 sectors/track, 130561 cylinders, total 2097471488 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/dm-6: 536.8 GB, 536887689216 bytes
255 heads, 63 sectors/track, 65272 cylinders, total 1048608768 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
Disk /dev/dm-7: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
== END fdisk -lu ==
== BEGIN blkid ==
/dev/mapper/vg01-lvol3: UUID="e875e5eb-3bf0-4ece-b4fe-dd59be8a96ea" TYPE="ext3"
/dev/mapper/vg01-lvol2: UUID="d314ac24-ff4e-4445-b757-517c9da5b0b8" TYPE="ext3"
/dev/cciss/c0d0p1: LABEL="/boot" UUID="f6d76c5b-c65d-4950-90f3-4f59b1b7809f" TYPE="ext3" SEC_TYPE="ext2"
/dev/vg01/lvol2: UUID="d314ac24-ff4e-4445-b757-517c9da5b0b8" TYPE="ext3"
/dev/mapper/pg_sitedb2-pg_sitedb2: UUID="180b0460-83c4-431a-9ea8-dc2bb2853a06" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/pg_data-pg_data: UUID="c837b153-5f2f-4b53-b188-235f44347606" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/pg_archive-pg_archive: UUID="d66f03c8-c8ec-410f-9228-3f60a298ac59" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/pg_sitedb-pg_sitedb: UUID="aa271131-36e8-4aba-8d44-7ffa9219b8d2" SEC_TYPE="ext2" TYPE="ext3"
== END blkid ==
== BEGIN cat /proc/mdstat ==
Personalities :
unused devices: <none>
== END cat /proc/mdstat ==
== BEGIN pvs ==
PV VG Fmt Attr PSize PFree
/dev/cciss/c0d0p2 vg01 lvm2 a- 558.75G 0
/dev/mapper/mpath1 pg_data lvm2 a- 100.00G 0
/dev/mapper/mpath2 pg_sitedb lvm2 a- 1000.03G 0
/dev/mapper/mpath3 pg_sitedb2 lvm2 a- 1000.15G 0
/dev/mapper/mpath4 pg_archive lvm2 a- 500.01G 0
== END pvs ==
== BEGIN vgs ==
VG #PV #LV #SN Attr VSize VFree
pg_archive 1 1 0 wz--n- 500.01G 0
pg_data 1 1 0 wz--n- 100.00G 0
pg_sitedb 1 1 0 wz--n- 1000.03G 0
pg_sitedb2 1 1 0 wz--n- 1000.15G 0
vg01 1 2 0 wz--n- 558.75G 0
== END vgs ==
== BEGIN lvs ==
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
pg_archive pg_archive -wi-ao 500.01G
pg_data pg_data -wi-ao 100.00G
pg_sitedb pg_sitedb -wi-ao 1000.03G
pg_sitedb2 pg_sitedb2 -wi-ao 1000.15G
lvol2 vg01 -wi-ao 20.00G
lvol3 vg01 -wi-ao 538.75G
== END lvs ==