[SOLVED] SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

jaytech
Posts: 8
Joined: 2010/05/23 13:20:19

[SOLVED] SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by jaytech » 2010/05/23 14:10:22

Greetings,

I've been browsing these forums for years and have always found solutions to my problems with CentOS - until today. I've looked high and low but I haven't been able to find any example of what I'm currently experiencing with my hard disks.

First off, I'm running CentOS as a Samba file server on a Soltek SL-K8TPro-939 with an AMD 64 3200+ (all the rage five years ago). Here's my disk setup:

Drive #1 (80 GB)
-Boot partition
-LVM partition (this drive holds the root filesystem)

Drive #2 (750 GB)
NTFS filesystem

Drive #3 (1 TB)
ext3 Filesystem


OK, so I got a notification in my system mail yesterday:


[code]The following warning/error was logged by the smartd daemon:

Device: /dev/sda, unable to open device

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.[/code]

/var/log/messages displays many errors:

[code]
May 23 01:09:03 SERVER kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 23 01:09:03 SERVER kernel: ata3.00: cmd 35/00:f8:47:8e:88/00:03:4c:00:00/e0 tag 0 dma 520192 out
May 23 01:09:03 SERVER kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 23 01:09:03 SERVER kernel: ata3.00: status: { DRDY }
May 23 01:09:03 SERVER kernel: ata3: hard resetting link
May 23 01:09:08 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:09:13 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:09:13 SERVER kernel: ata3: hard resetting link
May 23 01:09:17 SERVER smbd[26347]: [2010/05/23 01:09:17, 0] lib/util_sock.c:write_data(562)
May 23 01:09:17 SERVER smbd[26347]: write_data: write failure in writing to client 192.168.1.100. Error Connection reset by peer
May 23 01:09:17 SERVER smbd[26347]: [2010/05/23 01:09:17, 0] lib/util_sock.c:send_smb(761)
May 23 01:09:17 SERVER smbd[26347]: Error writing 4 bytes to client. -1. (Connection reset by peer)
May 23 01:09:18 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:09:23 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:09:23 SERVER kernel: ata3: hard resetting link
May 23 01:09:28 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:09:58 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:09:58 SERVER kernel: ata3: limiting SATA link speed to 1.5 Gbps
May 23 01:09:58 SERVER kernel: ata3: hard resetting link
May 23 01:10:03 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:10:03 SERVER kernel: ata3: reset failed, giving up
May 23 01:10:03 SERVER kernel: ata3.00: disabled
May 23 01:10:03 SERVER kernel: ata3: EH complete
May 23 01:10:03 SERVER kernel: sd 2:0:0:0: SCSI error: return code = 0x00040000
May 23 01:10:03 SERVER kernel: end_request: I/O error, dev sda, sector 1284017735
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502209
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502210
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502211
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502212
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502213
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502214
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502215
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502216
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502217
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: Buffer I/O error on device sda1, logical block 160502218
May 23 01:10:03 SERVER kernel: lost page write due to I/O error on sda1
May 23 01:10:03 SERVER kernel: sd 2:0:0:0: SCSI error: return code = 0x00040000

... continues similar errors...


May 23 01:10:03 SERVER kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
May 23 01:10:03 SERVER kernel: ata3: hotplug_status 0x11
May 23 01:10:03 SERVER kernel: ata3: hard resetting link
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER ntfs-3g[1769]: Failed to write buffer to bitmap (-1 != 8192). Leaving inconsistent metadata: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Cluster deallocation failed (159925600, 604841): Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Failed to free clusters. Leaving inconsistent metadata.
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER kernel: printk: 9512 messages suppressed.
May 23 01:10:08 SERVER kernel: Buffer I/O error on device sda1, logical block 786431
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Reading of last byte failed (-1). Leaving inconsistent metadata: Input/output error
May 23 01:10:08 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:08 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Reading of last byte failed (-1). Leaving inconsistent metadata: Input/output error
May 23 01:10:08 SERVER ntfs-3g[1769]: Failed to free base MFT record. Leaving inconsistent metadata.
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read index block: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device
May 23 01:10:09 SERVER ntfs-3g[1769]: Failed to read vcn 0x4: Input/output error
May 23 01:10:09 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:10:09 SERVER kernel: sd 2:0:0:0: rejecting I/O to offline device

... continues similar errors...

May 23 01:14:03 SERVER kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t1
May 23 01:14:03 SERVER kernel: ata3: hotplug_status 0x11
May 23 01:14:03 SERVER kernel: ata3: hard resetting link
May 23 01:14:09 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:14:13 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:14:13 SERVER kernel: ata3: hard resetting link
May 23 01:14:19 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:14:23 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:14:23 SERVER kernel: ata3: hard resetting link
May 23 01:14:29 SERVER kernel: ata3: link is slow to respond, please be patient (ready=-19)
May 23 01:14:58 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:14:58 SERVER kernel: ata3: limiting SATA link speed to 1.5 Gbps
May 23 01:14:58 SERVER kernel: ata3: hard resetting link
May 23 01:15:03 SERVER kernel: ata3: COMRESET failed (errno=-16)
May 23 01:15:03 SERVER kernel: ata3: reset failed, giving up
May 23 01:15:03 SERVER kernel: ata3: EH pending after 5 tries, giving up
May 23 01:15:03 SERVER kernel: ata3: EH complete
May 23 01:15:03 SERVER kernel: ata3.00: detaching (SCSI 2:0:0:0)
May 23 01:37:51 SERVER smartd[16932]: Device: /dev/sda, No such device, open() failed
May 23 01:37:51 SERVER smartd[16932]: Sending warning via mail to root ...
May 23 01:37:52 SERVER smartd[16932]: Warning via mail to root: successful
May 23 01:42:54 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:42:54 SERVER kernel: printk: 200 messages suppressed.
May 23 01:42:54 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:42:54 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:42:54 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:42:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:42:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:42:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:42:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:42:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:42:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:42:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:42:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:54 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:54 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:54 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:54 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:54 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:54 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:54 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:54 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:55 SERVER ntfs-3g[1769]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 23 01:43:55 SERVER ntfs-3g[1769]: Failed to read of MFT, mft=5 count=1 br=-1: Input/output error
May 23 01:43:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
May 23 01:43:55 SERVER kernel: scsi 2:0:0:0: rejecting I/O to dead device
May 23 01:43:55 SERVER kernel: Buffer I/O error on device sda1, logical block 786433
[/code]

I'll attach the rest of the log on request.
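A log like the one above can be condensed by counting link-reset failures per ATA port. The following is only a minimal sketch: the function name `summarize_ata_resets` and its output format are made up for illustration, and it looks for nothing beyond the two kernel messages quoted above ("COMRESET failed" and "reset failed, giving up").

```bash
# Count link-reset failures per ATA port in a syslog file.
# Hypothetical helper; the log path is an argument, /var/log/messages is just the default.
summarize_ata_resets() {
    local logfile="${1:-/var/log/messages}"
    awk '
        # Return the bare port name (e.g. "ata3") found on the current line.
        function port_of(    i, p) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ata[0-9]+:$/) { p = $i; sub(/:$/, "", p); return p }
            return ""
        }
        /kernel: ata[0-9]+: COMRESET failed/         { failed[port_of()]++ }
        /kernel: ata[0-9]+: reset failed, giving up/ { gave_up[port_of()]++ }
        END {
            for (p in failed)
                printf "%s comreset_failed=%d gave_up=%d\n", p, failed[p], gave_up[p] + 0
        }
    ' "$logfile"
}
```

Calling `summarize_ata_resets /var/log/messages` prints one line per port. A port that keeps failing COMRESET and then gives up suggests a problem with the link (cable, socket, controller) or the drive on that port, rather than with the filesystem.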

So, my partition /dev/sda1 was unreadable. I pulled the drive and put it into an external enclosure on another PC (Windows, since it's NTFS), where it works great, with no errors. I rebooted the CentOS system twice while troubleshooting a bit and saw nothing unusual. Then I put the drive back in and booted up, and now the games begin: my /dev/sdb1 won't mount (error on boot).
A manual mount produces the following:
[code]mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
[/code]

messages log:
[code]May 23 03:30:48 SERVER kernel: VFS: Can't find ext3 filesystem on dev sdb1.[/code]

Interesting to note: my main disk is now throwing errors on fdisk -l (it wasn't like that previously).
Here's my getinfo:

[code]== BEGIN uname -rmi ==
2.6.18-164.6.1.el5 i686 i386
== END uname -rmi ==

== BEGIN rpm -q centos-release ==
centos-release-5-4.el5.centos.1
== END rpm -q centos-release ==

== BEGIN getenforce ==
Disabled
== END getenforce ==

== BEGIN cat /etc/fstab ==
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0
/dev/sda1 /mnt/backup ntfs-3g rw,umask=0000,defaults 0 0
/dev/sdb1 /mnt/largebackup ext3 defaults 0 0
== END cat /etc/fstab ==

== BEGIN df -h ==
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
61G 26G 32G 45% /
/dev/hda1 99M 72M 23M 77% /boot
tmpfs 505M 0 505M 0% /dev/shm
== END df -h ==

== BEGIN fdisk -l ==

Disk /dev/hda: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

Device Boot Start End Blocks Id System
/dev/hda1 1 428 215523 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/hda2 * 428 155061 77935189+ 8e Linux LVM
Partition 2 does not end on cylinder boundary.

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 121601 976760001 83 Linux

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 91201 732572001 7 HPFS/NTFS
== END fdisk -l ==

== BEGIN blkid ==
/dev/mapper/VolGroup00-LogVol01: TYPE="swap"
/dev/mapper/VolGroup00-LogVol00: UUID="91edfc1b-c241-4d6a-b05c-ed853de88a8c" TYPE="ext3"
/dev/sda1: UUID="d34bf517-f5f0-43f6-bdc0-0c7211fac79c" SEC_TYPE="ext2" TYPE="ext3"
/dev/hda1: LABEL="/boot" UUID="efba9db1-549f-4457-b65c-89f0063553cc" TYPE="ext3"
/dev/VolGroup00/LogVol00: UUID="91edfc1b-c241-4d6a-b05c-ed853de88a8c" SEC_TYPE="ext2" TYPE="ext3"
/dev/VolGroup00/LogVol01: TYPE="swap"
/dev/sdb1: TYPE="ntfs"
== END blkid ==

[/code]

At the moment, only the main LV mounts. Very, very interesting... possibly a controller issue? I can't figure out why there would be problems with THREE separate drives.

I would appreciate any insight into this very strange issue. Thanks!

AlanBartlett
Forum Moderator
Posts: 9319
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

[SOLVED] SMART error (FailedOpenDevice) - Unreadable partiti

Post by AlanBartlett » 2010/05/23 15:09:51

[quote]
... possibly a controller issue?
[/quote]
The fact that both SATA drives are now showing problems seems to indicate that is a distinct possibility. Even more so, as you report that the NTFS drive is alive and well when connected to a different controller, in a [i]Windoze[/i] system.

[quote]
You can also use the smartctl utility for further investigation.
[/quote]
I wonder what its status report tells about [i]sd[a|b]1[/i]?

You say "[i]problems with THREE separate drives[/i]". Surely some mistake? Isn't it only two drives? :-? Or have I misread the information?

For completeness, how have you configured [i]smartd[/i] on the system?

Hint: [b]grep -v ^# /etc/smartd.conf[/b]

jlehtone
Posts: 1944
Joined: 2007/12/11 08:17:33
Location: Finland

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by jlehtone » 2010/05/23 16:07:45

I counted three disks, one PATA and two SATA.

You don't "smartctl [i]partition[/i]". But yes, running
[code]# smartctl -a /dev/hda
# smartctl -a /dev/sda
# smartctl -a /dev/sdb[/code]
could reveal something. And I bet two of those deserve a "-d" parameter too,
which one might find in /etc/smartd.conf or in the output of smartctl.

The /etc/smartd.conf might be the default from setup, in which case it is not
informative.

TrevorH
Forum Moderator
Posts: 23000
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by TrevorH » 2010/05/23 16:54:06

You've plugged the disk you took out into a different socket and it has been renamed from sda to sdb so the disks are reversed.
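A swap like this can be spotted by comparing the filesystem type each /etc/fstab entry expects with the type blkid actually detects on that device. This is a rough sketch only: `check_fstab_types` is a hypothetical helper, it reads saved `blkid` output from a file, and it assumes fstab's `ntfs-3g` corresponds to blkid's `ntfs`.

```bash
# Report fstab entries whose filesystem type disagrees with what blkid detects.
# Hypothetical helper. Args: fstab file, file holding saved `blkid` output.
check_fstab_types() {
    local fstab="$1" blkid_out="$2"
    awk '
        # First file: blkid output, e.g.  /dev/sdb1: TYPE="ntfs"
        NR == FNR {
            dev = $1; sub(/:$/, "", dev)
            # The leading space keeps SEC_TYPE="..." from matching.
            if (match($0, / TYPE="[^"]*"/))
                actual[dev] = substr($0, RSTART + 7, RLENGTH - 8)
            next
        }
        # Second file: fstab. Skip comments and blank lines.
        /^[ \t]*(#|$)/ { next }
        ($1 in actual) {
            wanted = $3; sub(/-3g$/, "", wanted)   # assumption: ntfs-3g means ntfs
            if (wanted != actual[$1])
                printf "%s: fstab expects %s, blkid reports %s\n", $1, $3, actual[$1]
        }
    ' "$blkid_out" "$fstab"
}
```

For example, `blkid > /tmp/blkid.txt; check_fstab_types /etc/fstab /tmp/blkid.txt`. With the fstab and blkid output posted above, it should flag both /dev/sda1 (fstab expects ntfs-3g, blkid reports ext3) and /dev/sdb1 (fstab expects ext3, blkid reports ntfs).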

jaytech
Posts: 8
Joined: 2010/05/23 13:20:19

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by jaytech » 2010/05/23 16:54:31

Thank you for your replies. Yes I have three separate drives, the main PATA and two SATA.

I'm hoping if it is the controller that I'm not going to permanently corrupt the data, but it's probably just having an issue reading the drives.

I'm going to drop the ext3 drive into another CentOS system to see if I can read it.

Meanwhile, here's the output of smartctl. I haven't had time to read it myself because I'm on the road at the moment; I'll get to it later this evening.

[code]smartctl -a /dev/hda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar SE family
Device Model: WDC WD800JB-00JJC0
Serial Number: WD-WCAM9F771736
Firmware Version: 05.01C05
User Capacity: 80,026,361,856 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun May 23 12:36:53 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (2460) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 35) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 159 159 021 Pre-fail Always - 3008
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2217
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 200 200 051 Pre-fail Always - 0
9 Power_On_Hours 0x0032 072 072 000 Old_age Always - 20640
10 Spin_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 2199
194 Temperature_Celsius 0x0022 118 095 000 Old_age Always - 25
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x0009 200 200 051 Pre-fail Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Interrupted (host reset) 70% 5593 -
# 2 Conveyance offline Completed without error 00% 5593 -
# 3 Conveyance offline Completed without error 00% 2341 -
# 4 Conveyance offline Completed without error 00% 2338 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[/code]

[code]
smartctl -a /dev/sda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: ST31000528AS
Serial Number: 9VP0W9WE
Firmware Version: CC35
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun May 23 12:38:25 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 196) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 100 006 Pre-fail Always - 225042961
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 40
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 8087362
9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 6646
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 20
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 078 071 045 Old_age Always - 22 (Lifetime Min/Max 22/23)
194 Temperature_Celsius 0x0022 022 040 000 Old_age Always - 22 (0 16 0 0)
195 Hardware_ECC_Recovered 0x001a 023 013 000 Old_age Always - 225042961
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 24936580127243
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 403142538
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 893135541

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[/code]

[code]
smartctl -a /dev/sdb
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD753LJ
Serial Number: S13UJ1MPC22237
Firmware Version: 1AA01107
User Capacity: 750,156,374,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 3b
Local Time is: Sun May 23 12:40:37 2010 EDT

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (10245) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 172) minutes.
Conveyance self-test routine
recommended polling time: ( 19) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 068 068 011 Pre-fail Always - 10310
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 114
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 1
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 18466
10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 114
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0037 100 100 099 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 066 000 Old_age Always - 18 (Lifetime Min/Max 18/20)
194 Temperature_Celsius 0x0022 082 065 000 Old_age Always - 18 (Lifetime Min/Max 18/20)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 1149
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 200 200 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 253 000 Old_age Always - 0
202 TA_Increase_Count 0x0032 100 100 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[/code]

AlanBartlett
Forum Moderator
Posts: 9319
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by AlanBartlett » 2010/05/23 17:14:44

[quote]
I counted three disks, one PATA and two SATA.
[/quote]
Very true -- but only two of them, the SATA disks, seem to have a problem. ;-)

[quote]
You don't "smartctl partition".
[/quote]
Agreed. Wetware malfunction here.

Looking at the [i]smartctl[/i] output, everything seems to be o.k.

I recall having to add something like --

[code]
/dev/sda -d sat -H -m root
[/code]
-- into my [i]smartd.conf[/i] file to replace the default . . .

Between the pair of you, [b]J[/b] & [b]Trevor[/b], I'm sure you'll get to the bottom of the OP's issue. (Which will be good for me, as I'm currently not at home and have very limited facilities to hand. :-) )

jlehtone
Posts: 1944
Joined: 2007/12/11 08:17:33
Location: Finland

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by jlehtone » 2010/05/23 19:05:54

[b]Trevor[/b] is right, there is a mismatch between fstab and blkid. You could mount the
ext3 filesystem by label or UUID, regardless of which device name it gets. Not sure if the
same applies to the ntfs.

The Seagate shows a nonzero Raw_Read_Error_Rate raw value, unlike the others. You could run self-tests with smartctl.
And, naturally, do keep backups.
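As an illustration of mounting by UUID, an fstab line can be built from the UUID that blkid reported for the ext3 filesystem. This is a sketch under assumptions: `fstab_line_by_uuid` is a made-up helper, and the mount point, type and options are simply copied from the fstab posted earlier in the thread.

```bash
# Print an fstab line that mounts a filesystem by UUID instead of by device name.
# Hypothetical helper. Args: UUID, mount point, filesystem type, [mount options].
fstab_line_by_uuid() {
    local uuid="$1" mountpoint="$2" fstype="$3" options="${4:-defaults}"
    printf 'UUID=%s %s %s %s 0 0\n' "$uuid" "$mountpoint" "$fstype" "$options"
}

# Example call, using the UUID blkid showed for the ext3 filesystem:
#   fstab_line_by_uuid d34bf517-f5f0-43f6-bdc0-0c7211fac79c /mnt/largebackup ext3
```

The resulting line would replace the `/dev/sdb1 /mnt/largebackup ...` entry, so the mount no longer depends on which socket the disk is plugged into. The blkid output above lists no UUID for the NTFS partition, so this approach may not carry over to it as-is. For the self-tests, `smartctl -t short /dev/sda` starts a short test and `smartctl -l selftest /dev/sda` shows the results afterwards.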

jaytech
Posts: 8
Joined: 2010/05/23 13:20:19

Re: SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by jaytech » 2010/05/24 00:45:51

Trevor was correct: reversing the cables solved the mounting issue. I'm fairly certain the original FailedOpenDevice was due to a loose cable (at the motherboard header), which would also explain why swapping the cables the first time did not solve the problem. I replaced both SATA cables for good measure. It's also possible that the Samsung drive malfunctioned and is on its way out; it has been powered on for over 18,000 hours. Whatever the case, all the drives are back up and running until I return from my vacation and replace them.

The drives were only backup drives, but it would still have been annoying to lose several months of backups. Thankfully, all the data is intact. The problem with the overrun cylinder boundary on the Western Digital PATA drive is still a mystery, but at the moment it does not seem to be causing any issues to speak of.
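On the cylinder boundary message, the arithmetic can at least be checked against the geometry fdisk printed (16 heads, 63 sectors/track). This is a sketch, not a diagnosis: `end_offset_in_cylinder` is a hypothetical helper, and the start sector of 63 is an assumption, since the fdisk output above does not show it.

```bash
# Offset, in sectors, of a partition's end within its cylinder (0 = ends on a boundary).
# Hypothetical helper. Args: size in 1 KiB blocks, start sector, heads, sectors per track.
end_offset_in_cylinder() {
    local blocks="$1" start="$2" heads="$3" spt="$4"
    local sectors=$((blocks * 2))      # one 1 KiB block = two 512-byte sectors
    local per_cyl=$((heads * spt))     # 16 * 63 = 1008 sectors per cylinder here
    echo $(( (start + sectors) % per_cyl ))
}

# /dev/hda1 from the fdisk output: 215523 blocks; start sector 63 is ASSUMED.
#   end_offset_in_cylinder 215523 63 16 63
```

With those numbers the result is 693 rather than 0, so under that geometry the partition ends partway through cylinder 428, which matches fdisk showing hda1 ending and hda2 starting in the same cylinder. As far as I know, this warning usually just means the partition table was written under a different assumed geometry, and it tends to be harmless because Linux addresses the disk by LBA.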

Thank you all for your fast and helpful replies. You guys certainly know your stuff! Maybe when I get some free time this summer I'll contribute to the forums a bit myself.

AlanBartlett
Forum Moderator
Posts: 9319
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: [SOLVED] SMART error (FailedOpenDevice) - Unreadable partitions on multiple drives

Post by AlanBartlett » 2010/05/24 13:28:32

[quote]
Thank you all for your fast and helpful replies. You guys certainly know your stuff!
[/quote]
You're welcome.

[quote]
Maybe when I get some free time this summer I'll contribute to the forums a bit myself.
[/quote]
Please do -- everyone is welcome. :-)

For posterity, this thread is now marked as [SOLVED].
