[SOLVED] SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

jaytech
Posts: 8
Joined: 2010/05/23 13:20:19

Postby jaytech » 2010/05/23 14:10:22

Greetings,

I've been browsing these forums for years and have always found solutions to my problems with CentOS - until today. I've looked high and low but I haven't been able to find any example of what I'm currently experiencing with my hard disks.

First off, I'm running CentOS as a Samba file server on a Soltek SL-K8TPro-939 with an AMD 64 3200+ (all the rage of five years ago). Here's my disk setup:

Drive #1 (80 GB)
- Boot partition
- LVM partition (this drive holds the root filesystem)

Drive #2 (750 GB)
- NTFS filesystem

Drive #3 (1 TB)
- ext3 filesystem


OK, so I got a notification in my system mail yesterday:


Code: Select all

The following warning/error was logged by the smartd daemon:

Device: /dev/sda, unable to open device

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.


/var/log/messages displays many errors:

Code: Select all

May 23 01:09:03 SERVER kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 23 01:09:03 SERVER kernel: ata3.00: cmd 35/00:f8:47:8e:88/00:03:4c:00:00/e0 tag 0 dma 520192 out
May 23 01:09:03 SERVER kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 23 01:09:03 SERVER kernel: ata3.00: status: { DRDY }
May 23 01:09:03 SERVER kernel: ata3: hard resetting link
May 23 01:09:08 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:09:13 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:09:13 SERVER kernel: ata3: hard resetting link
May 23 01:09:17 SERVER smbd[26347]: [2010/05/23 01:09:17, 0] lib/util_sock.c:write_data(562)
May 23 01:09:17 SERVER smbd[26347]:   write_data: write failure in writing to client 192.168.1.100. Error Connection reset by peer
May 23 01:09:17 SERVER smbd[26347]: [2010/05/23 01:09:17, 0] lib/util_sock.c:send_smb(761)
May 23 01:09:17 SERVER smbd[26347]:   Error writing 4 bytes to client. -1. (Connection reset by peer)
May 23 01:09:18 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:09:23 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:09:23 SERVER kernel: ata3: hard resetting link
May 23 01:09:28 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:09:58 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:09:58 SERVER kernel: ata3: limiting SATA link speed to 1.5 Gbps
May 23 01:09:58 SERVER kernel: ata3: hard resetting link
May 23 01:10:03 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:10:03 SERVER kernel: ata3: reset failed, giving up
May 23 01:10:03 SERVER kernel: ata3.00: disabled
May 23 01:10:03 SERVER kernel: ata3: EH complete
May 23 01:10:03 SERVER kernel: sd 2:0:0:0: SCSI error: return code = 0x00040000
May 23 01:10:03 SERVER kernel: end_request: I/O error, dev sda, sector 1284017735
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502209
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502210
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502211
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502212
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502213
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502214
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502215
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502216
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502217
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502218
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: sd 2:0:0:0: SCSI error: return code = 0x00040000

... continues similar errors...


May 23 01:10:03 SERVER kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
May 23 01:10:03 SERVER kernel: ata3: hotplug_status 0x11
May 23 01:10:03 SERVER kernel: ata3: hard resetting link
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER ntfs-3g[1769]: Failed to write buffer to bitmap (-1 != 8192). Leaving inconsistent metadata: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Cluster deallocation failed (159925600, 604841): Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Failed to free clusters.  Leaving inconsistent metadata.
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER kernel: printk: 9512 messages suppressed.
May 23 01:10:08 SERVER kernel: Buffer I/O error on device sda1, logical block 786431
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Reading of last byte failed (-1). Leaving inconsistent metadata: Input/output error
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Reading of last byte failed (-1). Leaving inconsistent metadata: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Failed to free base MFT record.  Leaving inconsistent metadata.
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read index block: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device

... continues similar errors...

May 23 01:14:03 SERVER kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t1
May 23 01:14:03 SERVER kernel: ata3: hotplug_status 0x11
May 23 01:14:03 SERVER kernel: ata3: hard resetting link
May 23 01:14:09 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:14:13 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:14:13 SERVER kernel: ata3: hard resetting link
May 23 01:14:19 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:14:23 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:14:23 SERVER kernel: ata3: hard resetting link
May 23 01:14:29 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:14:58 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:14:58 SERVER kernel: ata3: limiting SATA link speed to 1.5 Gbps
May 23 01:14:58 SERVER kernel: ata3: hard resetting link
May 23 01:15:03 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:15:03 SERVER kernel: ata3: reset failed, giving up
May 23 01:15:03 SERVER kernel: ata3: EH pending after 5 tries, giving up
May 23 01:15:03 SERVER kernel: ata3: EH complete
May 23 01:15:03 SERVER kernel: ata3.00: detaching (SCSI 2:0:0:0)
May 23 01:37:51 SERVER smartd[16932]: Device: /dev/sda, No such device, open() failed
May 23 01:37:51 SERVER smartd[16932]: Sending warning via mail to root ...
May 23 01:37:52 SERVER smartd[16932]: Warning via mail to root: successful
May 23 01:42:54 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:42:54 SERVER kernel: printk: 200 messages suppressed.
May 23 01:42:54 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:42:54 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:42:54 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:42:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:42:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:42:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:42:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:42:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:42:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:42:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:42:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:54 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:54 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:54 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:54 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:54 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:54 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:54 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:54 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433


I'll attach the rest of the log on request.
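To see at a glance which ATA ports are producing these messages, the kernel lines can be tallied per port. This is only a sketch: `summarize_ata_errors` is a made-up helper name, and it assumes the `kernel: ataN` message format seen in the excerpt above.

```shell
# Tally kernel libata messages per ATA port in a syslog-style file.
# Hypothetical helper; defaults to /var/log/messages, or pass a file as $1.
summarize_ata_errors() {
    awk '
        # Lines such as "... kernel: ata3.00: ..." or "... kernel: ata3: ..."
        match($0, /kernel: ata[0-9]+/) {
            # Skip the 8 characters of "kernel: " to keep only "ataN"
            port = substr($0, RSTART + 8, RLENGTH - 8)
            count[port]++
        }
        END { for (p in count) print p, count[p] }
    ' "${1:-/var/log/messages}" | sort
}

# Usage (as root, since the log is usually not world-readable):
#   summarize_ata_errors /var/log/messages
```

If every message lands on a single port, that points at one link (drive, cable, or socket); messages spread over several ports would make the controller a stronger suspect.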

So, my partition /dev/sda1 was unreadable. I pulled the drive and put it into an external enclosure on another PC (Windows, since it's NTFS), where it works great with no errors. I rebooted the CentOS system twice while troubleshooting and saw nothing unusual. Then I put the drive back in and booted up, and now the games begin: /dev/sdb1 won't mount (error on boot).
A manual mount produces the following:

Code: Select all

mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so


/var/log/messages shows:

Code: Select all

May 23 03:30:48 SERVER kernel: VFS: Can't find ext3 filesystem on dev sdb1.


Interesting to note: my main disk is now throwing errors on fdisk -l (it wasn't like that previously).
Here's my getinfo output:

Code: Select all

== BEGIN uname -rmi ==
2.6.18-164.6.1.el5 i686 i386
== END   uname -rmi ==

== BEGIN rpm -q centos-release ==
centos-release-5-4.el5.centos.1
== END   rpm -q centos-release ==

== BEGIN getenforce ==
Disabled
== END   getenforce ==

== BEGIN cat /etc/fstab ==
/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
LABEL=/boot             /boot                    ext3    defaults        1 2
tmpfs                   /dev/shm                 tmpfs   defaults        0 0
devpts                  /dev/pts                 devpts  gid=5,mode=620  0 0
sysfs                   /sys                     sysfs   defaults        0 0
proc                    /proc                    proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0
/dev/sda1               /mnt/backup              ntfs-3g rw,umask=0000,defaults 0 0
/dev/sdb1               /mnt/largebackup         ext3    defaults        0 0
== END   cat /etc/fstab ==

== BEGIN df -h ==
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       61G   26G   32G  45% /
/dev/hda1              99M   72M   23M  77% /boot
tmpfs                 505M     0  505M   0% /dev/shm
== END   df -h ==

== BEGIN fdisk -l ==

Disk /dev/hda: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         428      215523   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/hda2   *         428      155061    77935189+  8e  Linux LVM
Partition 2 does not end on cylinder boundary.

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      121601   976760001   83  Linux

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       91201   732572001    7  HPFS/NTFS
== END   fdisk -l ==

== BEGIN blkid ==
/dev/mapper/VolGroup00-LogVol01: TYPE="swap"
/dev/mapper/VolGroup00-LogVol00: UUID="91edfc1b-c241-4d6a-b05c-ed853de88a8c" TYPE="ext3"
/dev/sda1: UUID="d34bf517-f5f0-43f6-bdc0-0c7211fac79c" SEC_TYPE="ext2" TYPE="ext3"
/dev/hda1: LABEL="/boot" UUID="efba9db1-549f-4457-b65c-89f0063553cc" TYPE="ext3"
/dev/VolGroup00/LogVol00: UUID="91edfc1b-c241-4d6a-b05c-ed853de88a8c" SEC_TYPE="ext2" TYPE="ext3"
/dev/VolGroup00/LogVol01: TYPE="swap"
/dev/sdb1: TYPE="ntfs"
== END   blkid ==



At the moment, only the main LV mounts. Very, very interesting... possibly a controller issue? I can't figure out why there would be problems with THREE separate drives.

I would appreciate any insight into this very strange issue. Thanks!

AlanBartlett
Forum Moderator
Posts: 9311
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby AlanBartlett » 2010/05/23 15:09:51

"... possibly a controller issue?"

The fact that both SATA drives are now showing problems seems to indicate that is a distinct possibility. Even more so, as you report that the NTFS drive is alive and well when connected to a different controller, in a Windoze system.

"You can also use the smartctl utility for further investigation."

I wonder what its status report tells about sd[a|b]1?

You say "problems with THREE separate drives". Surely some mistake? Isn't it only two drives? :-? Or have I misread the information?

For completeness, how have you configured smartd on the system?

Hint: grep -v ^# /etc/smartd.conf

jlehtone
Posts: 1887
Joined: 2007/12/11 08:17:33
Location: Finland

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby jlehtone » 2010/05/23 16:07:45

I counted three disks, one PATA and two SATA.

You don't run smartctl on a partition, only on the whole disk. But yes, running

Code: Select all

# smartctl -a /dev/hda
# smartctl -a /dev/sda
# smartctl -a /dev/sdb

could reveal something. And I bet two of those deserve a "-d" parameter too, which one might find in /etc/smartd.conf or in the output of smartctl.

The /etc/smartd.conf might be the default from setup, and in that case it is not informative.
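The full `smartctl -a` report is long, so it can help to pull out only a few attributes. A rough sketch follows; `flag_smart_attrs` is a hypothetical helper name, and the choice of these four attributes is an assumption about what is worth a first look, not an official list. It expects the attribute table layout that `smartctl -A` prints, with the raw value in the tenth column.

```shell
# Print a few SMART attributes whose raw value is non-zero.
# Hypothetical helper; reads "smartctl -A" (or "-a") output on stdin.
flag_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" ||
        $2 == "UDMA_CRC_Error_Count" {
            # Column 10 is RAW_VALUE in the attribute table
            if ($10 + 0 != 0) print $2, $10
        }
    '
}

# Usage (as root):
#   smartctl -A /dev/sda | flag_smart_attrs
```

A non-zero value here is a prompt to look closer, not proof of a failing disk; no output simply means none of the selected counters has moved.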

TrevorH
Forum Moderator
Posts: 21762
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby TrevorH » 2010/05/23 16:54:06

You've plugged the disk you took out into a different socket and it has been renamed from sda to sdb so the disks are reversed.
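One way to confirm a swap like this is to compare the filesystem type each /etc/fstab entry expects with what blkid detects on that device. A minimal sketch, assuming the blkid output has been saved to a file first; `check_fstab_types` is a hypothetical helper name, and treating `ntfs-3g` as equivalent to `ntfs` is an assumption of this sketch.

```shell
# Report fstab entries whose declared filesystem type disagrees with blkid.
# Hypothetical helper. Arguments: <fstab file> <file holding blkid output>
check_fstab_types() {
    fstab="$1"
    blkid_out="$2"
    awk '
        # First file: blkid output, e.g.  /dev/sdb1: TYPE="ntfs"
        FILENAME == ARGV[1] {
            dev = $1
            sub(/:$/, "", dev)
            # Leading space keeps SEC_TYPE="..." from matching
            if (match($0, / TYPE="[^"]*"/))
                detected[dev] = substr($0, RSTART + 7, RLENGTH - 8)
            next
        }
        # Second file: fstab; skip comments and blank lines
        $1 ~ /^#/ || NF == 0 { next }
        ($1 in detected) {
            want = $3
            sub(/-3g$/, "", want)   # assumption: ntfs-3g mounts an ntfs volume
            if (want != detected[$1])
                print $1 ": fstab says " $3 ", blkid says " detected[$1]
        }
    ' "$blkid_out" "$fstab"
}

# Usage (as root):
#   blkid > /tmp/blkid.out
#   check_fstab_types /etc/fstab /tmp/blkid.out
```

Run against the fstab and blkid output posted earlier in the thread, this would report /dev/sda1 as listed ntfs-3g but detected ext3, and /dev/sdb1 as listed ext3 but detected ntfs. Mounting by LABEL= (as the /boot entry already does), or by UUID= where blkid reports one, avoids depending on which socket a disk is plugged into.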

jaytech
Posts: 8
Joined: 2010/05/23 13:20:19

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby jaytech » 2010/05/23 16:54:31

Thank you for your replies. Yes, I have three separate drives: the main PATA and two SATA.

If it is the controller, I'm hoping it won't permanently corrupt the data; it's probably just having trouble reading the drives.

I'm going to drop the ext3 drive in another CentOS system to see if I can read it.

Meanwhile, here's the output of smartctl. I haven't had time to read it myself because I'm on the road at the moment; I'll get to it later this evening.

Code: Select all

smartctl -a /dev/hda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD800JB-00JJC0
Serial Number:    WD-WCAM9F771736
Firmware Version: 05.01C05
User Capacity:    80,026,361,856 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun May 23 12:36:53 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (2460) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   159   159   021    Pre-fail  Always       -       3008
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2217
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       20640
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2199
194 Temperature_Celsius     0x0022   118   095   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Interrupted (host reset)      70%      5593         -
# 2  Conveyance offline  Completed without error       00%      5593         -
# 3  Conveyance offline  Completed without error       00%      2341         -
# 4  Conveyance offline  Completed without error       00%      2338         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code: Select all

smartctl -a /dev/sda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST31000528AS
Serial Number:    9VP0W9WE
Firmware Version: CC35
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun May 23 12:38:25 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 196) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   100   006    Pre-fail  Always       -       225042961
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       8087362
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6646
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       20
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   078   071   045    Old_age   Always       -       22 (Lifetime Min/Max 22/23)
194 Temperature_Celsius     0x0022   022   040   000    Old_age   Always       -       22 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   023   013   000    Old_age   Always       -       225042961
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       24936580127243
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       403142538
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       893135541

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code: Select all

smartctl -a /dev/sdb
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD753LJ
Serial Number:    S13UJ1MPC22237
Firmware Version: 1AA01107
User Capacity:    750,156,374,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Sun May 23 12:40:37 2010 EDT

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (10245) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 172) minutes.
Conveyance self-test routine
recommended polling time:        (  19) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   068   068   011    Pre-fail  Always       -       10310
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       114
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       18466
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       114
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0037   100   100   099    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   082   066   000    Old_age   Always       -       18 (Lifetime Min/Max 18/20)
194 Temperature_Celsius     0x0022   082   065   000    Old_age   Always       -       18 (Lifetime Min/Max 18/20)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1149
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   200   200   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

AlanBartlett
Forum Moderator
Posts: 9311
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby AlanBartlett » 2010/05/23 17:14:44

I counted three disks, one PATA and two SATA.

Very true -- but only two of them, the SATA disks, seem to have a problem. ;-)

You don't "smartctl partition".

Agreed. Wetware malfunction here.

Looking at the smartctl output, everything seems to be o.k.

I recall having to add something like --

Code: Select all

/dev/sda -d sat -H -m root

-- into my smartd.conf file to replace the default . . .

Between the pair of you, J & Trevor, I'm sure you'll get to the bottom of the OP's issue. (Which will be good for me, as I'm currently not at home and have very limited facilities to hand. :-) )

jlehtone
Posts: 1887
Joined: 2007/12/11 08:17:33
Location: Finland

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby jlehtone » 2010/05/23 19:05:54

Trevor is right: there is a mismatch between fstab and blkid. You could mount the
ext3 filesystem by label or UUID, regardless of which device name it gets. I'm not
sure whether the same applies to the NTFS one.
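
A minimal sketch of the mount-by-UUID idea, assuming the ext3 partition shows up as /dev/sdb1 and is mounted at /mnt/backup (both are placeholders, not values from this thread). The helper name fstab_entry_by_uuid is made up for illustration; it only prints a candidate /etc/fstab line for review and does not edit the file:

```shell
# Print an /etc/fstab entry that mounts a filesystem by UUID.
# Arguments: UUID, mount point, filesystem type (all supplied by the caller).
fstab_entry_by_uuid() {
  printf 'UUID=%s\t%s\t%s\tdefaults\t1 2\n' "$1" "$2" "$3"
}

# Example use (as root): "blkid -s UUID -o value" prints only the UUID.
# /dev/sdb1 and /mnt/backup are placeholders; substitute your own.
# fstab_entry_by_uuid "$(blkid -s UUID -o value /dev/sdb1)" /mnt/backup ext3
```

With an entry like that in place, it should no longer matter whether the disk comes up as sda or sdb after a cable swap.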

The Seagate has read_errors, unlike the others. You could run tests with smartctl.
And naturally, do have backups.
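
As a rough aid for reading the attribute tables, here is a small sketch that filters `smartctl -A` output. The function name check_smart_attrs is hypothetical, and the choice to watch attributes 5, 197 and 198 is an assumption based on common practice, not something the drive vendors specify:

```shell
# check_smart_attrs: read `smartctl -A` output on stdin and report
#   FAILING  attributes whose normalized VALUE is at or below a non-zero THRESH
#   WATCH    reallocated/pending/uncorrectable counters (5, 197, 198) with a non-zero raw value
check_smart_attrs() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      id = $1; name = $2; value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
      if (thresh > 0 && value <= thresh)
        printf "FAILING %s %s value=%d thresh=%d\n", id, name, value, thresh
      else if ((id == 5 || id == 197 || id == 198) && raw > 0)
        printf "WATCH %s %s raw=%d\n", id, name, raw
    }'
}

# Example use (as root; /dev/sda is a placeholder device name):
# smartctl -A /dev/sda | check_smart_attrs
# smartctl -t short /dev/sda      # start a short self-test
# smartctl -l selftest /dev/sda   # read the self-test log afterwards
```

Fed the attribute table posted above, this would print only `WATCH 5 Reallocated_Sector_Ct raw=1`, which fits the view that the drive itself looks healthy.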

jaytech
Posts: 8
Joined: 2010/05/23 13:20:19

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby jaytech » 2010/05/24 00:45:51

Trevor was correct: reversing the cables solved the mounting issue. I'm fairly certain the original FailedOpenDevice error was due to a loose cable (at the motherboard header), which would also explain why swapping the cables the first time did not solve the problem. I replaced both SATA cables for good measure. It's also possible that the Samsung drive malfunctioned and is on its way out; it has been powered on for over 10000 hours. Whatever the case, all the drives are back up and running until I return from my vacation and replace them.

The drives were only backup drives, but it would still have been annoying to lose several months of backups. Thankfully, all the data is intact. The problem with the overrun cylinder boundary on the Western Digital PATA drive is still a mystery, but at the moment it does not seem to be causing any issues to speak of.

Thank you all for your fast and helpful replies. You guys certainly know your stuff! Maybe when I get some free time this summer I'll contribute to the forums a bit myself.

AlanBartlett
Forum Moderator
Posts: 9311
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: [SOLVED] SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Postby AlanBartlett » 2010/05/24 13:28:32

Thank you all for your fast and helpful replies. You guys certainly know your stuff!

You're welcome.

Maybe when I get some free time this summer I'll contribute to the forums a bit myself.

Please do -- everyone is welcome. :-)

For posterity, this thread is now marked as [SOLVED].