
smartctl errors being emailed to root

Posted: 2017/12/07 03:07:02
by hoodcanaljim
Hi

If this isn't the place to bring this up, please tell me where I should post it.
The computer is an older Dell OptiPlex GX270 with a single 200 GB hard drive. It is the firewall machine in my system and uses only a small part of the hard drive.
uname -a
Linux nub 2.6.32-696.16.1.el6.i686 #1 SMP Wed Nov 15 16:16:47 UTC 2017 i686 i686 i386 GNU/Linux
This email was generated by the smartd daemon running on:

host name: nub
DNS domain: [Unknown]
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors


For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
I never found the SYSLOG they reference.

I have run short and long tests and I am not sure what the problem is. One web post suggested using smartctl -q noserial -a /dev/sda to find the sectors that have issues, but I don't get anything I recognise.
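The sectors a self-test tripped over are reported in the LBA_of_first_error column of the self-test log, which appears in the long-test output further down (smartctl -l selftest /dev/sda prints just that log; -a includes it). The Python sketch below pulls those LBAs out of saved smartctl text. The function name failing_lbas and the regular expression are illustrative assumptions fitted to the smartctl 5.43 layout shown in this thread, not anything provided by smartmontools.

```python
import re

# One entry of the ATA self-test log as printed by smartctl 5.43, e.g.
# "# 1 Extended offline Completed: read failure 80% 8762 83532641"
# Groups: entry number, description + status, remaining %, lifetime hours, LBA.
ENTRY = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)$")


def failing_lbas(log_text):
    """Return (entry, description/status, lifetime hours, LBA) for every
    self-test log entry that reports a failing LBA instead of '-'."""
    results = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # header, blank or unrelated line
        num, desc, _remaining, hours, lba = match.groups()
        if lba != "-":
            results.append((int(num), desc, int(hours), int(lba)))
    return results


# A few entries copied from the self-test log in this thread.
SAMPLE = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 80% 8762 83532641
# 2 Extended captive Interrupted (host reset) 90% 8762 -
# 7 Extended offline Completed: read failure 70% 8729 126261593
#11 Extended offline Completed without error 00% 8653 -
"""

for entry in failing_lbas(SAMPLE):
    print(entry)
```

On this sample it reports two failing LBAs: 83532641 (entry #1) and 126261593 (entry #7).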

Here's the messages data from the last startup:
Dec 6 15:41:51 nub smartd[2383]: smartd 5.43 2016-09-28 r4347 [i686-linux-2.6.32-696.16.1.el6.i686] (local build)
Dec 6 15:41:51 nub smartd[2383]: Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
Dec 6 15:41:51 nub smartd[2383]: Opened configuration file /etc/smartd.conf
Dec 6 15:41:51 nub smartd[2383]: Configuration file /etc/smartd.conf parsed.
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda [SAT], opened
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda [SAT], WDC WD2000JB-00REA0, S/N:WD-WMANL1335413, FW:20.00K20, 200 GB
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda [SAT], found in smartd database: Western Digital Caviar SE
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda [SAT], enabled SMART Attribute Autosave.
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda [SAT], enabled SMART Automatic Offline Testing.
Dec 6 15:41:51 nub smartd[2383]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Dec 6 15:41:51 nub smartd[2383]: Monitoring 1 ATA and 0 SCSI devices
Dec 6 15:41:52 nub smartd[2383]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Dec 6 15:41:52 nub smartd[2383]: Sending warning via mail to root ...
Dec 6 15:41:53 nub smartd[2383]: Warning via mail to root: successful
Dec 6 15:41:53 nub smartd[2390]: smartd has fork()ed into background mode. New PID=2390.
du -sh
6.3M .
fdisk -l
Disk /dev/sda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002d7a6

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          64      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              64       24322   194847744   8e  Linux LVM

Disk /dev/mapper/vg_nub-lv_root: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/mapper/vg_nub-lv_swap: 3187 MB, 3187671040 bytes
255 heads, 63 sectors/track, 387 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/mapper/vg_nub-lv_home: 142.6 GB, 142648279040 bytes
255 heads, 63 sectors/track, 17342 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Long test results:
smartctl 5.43 2016-09-28 r4347 [i686-linux-2.6.32-696.16.1.el6.i686] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar SE
Device Model: WDC WD2000JB-00REA0
Serial Number: WD-WMANL1335413
Firmware Version: 20.00K20
User Capacity: 200,049,647,616 bytes [200 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Dec 6 18:57:46 2017 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 120) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 ( 6780) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        ( 2) minutes.
Extended self-test routine
recommended polling time:        ( 80) minutes.
Conveyance self-test routine
recommended polling time:        ( 6) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   196   047   021    Pre-fail  Always       -       5200
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       503
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       74299
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       500
194 Temperature_Celsius     0x0022   117   099   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure   80%        8762             83532641
# 2  Extended captive    Interrupted (host reset)  90%        8762             -
# 3  Extended captive    Interrupted (host reset)  90%        8762             -
# 4  Extended captive    Interrupted (host reset)  90%        8760             -
# 5  Short offline       Completed without error   00%        8747             -
# 6  Extended captive    Interrupted (host reset)  90%        8737             -
# 7  Extended offline    Completed: read failure   70%        8729             126261593
# 8  Short offline       Completed without error   00%        8723             -
# 9  Short offline       Completed without error   00%        8699             -
#10  Short offline       Completed without error   00%        8675             -
#11  Extended offline    Completed without error   00%        8653             -
#12  Short offline       Completed without error   00%        8651             -
#13  Short offline       Completed without error   00%        8627             -
#14  Short offline       Completed without error   00%        8603             -
#15  Short offline       Completed without error   00%        8579             -
#16  Short offline       Completed without error   00%        8555             -
#17  Short offline       Completed without error   00%        8531             -
#18  Short offline       Completed without error   00%        8507             -
#19  Extended offline    Completed without error   00%        8485             -
#20  Short offline       Completed without error   00%        8483             -
#21  Short offline       Completed without error   00%        8459             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Perhaps #7 above is where the error occurs?
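To see which partition the failing LBAs from self-tests #1 and #7 fall in, the cylinder ranges in the fdisk -l output above can be converted to sectors, since fdisk reports one cylinder as 16065 sectors (255 heads * 63 sectors). A rough Python sketch under that assumption: because fdisk rounds to cylinders (note the "does not end on cylinder boundary" warning), these ranges are approximate, and fdisk -lu /dev/sda would print exact sector numbers instead. The partition list is copied by hand and partitions_for_lba is a hypothetical helper.

```python
# fdisk -l reports "Units = cylinders of 16065 * 512", i.e. 255 heads * 63 sectors.
SECTORS_PER_CYLINDER = 255 * 63

# (device, start cylinder, end cylinder), copied from the fdisk -l table above.
PARTITIONS = [
    ("/dev/sda1", 1, 64),
    ("/dev/sda2", 64, 24322),
]


def partitions_for_lba(lba):
    """Return the partitions whose approximate sector range contains lba.

    Cylinder c (1-based) spans sectors (c - 1) * 16065 .. c * 16065 - 1.
    fdisk rounds partition boundaries to cylinders, so the ranges are only
    approximate and adjacent partitions may appear to overlap slightly.
    """
    hits = []
    for device, start_cyl, end_cyl in PARTITIONS:
        first = (start_cyl - 1) * SECTORS_PER_CYLINDER
        last = end_cyl * SECTORS_PER_CYLINDER - 1
        if first <= lba <= last:
            hits.append(device)
    return hits


# LBA_of_first_error values from self-tests #1 and #7.
for lba in (83532641, 126261593):
    print(lba, partitions_for_lba(lba))
```

Under these assumptions both LBAs land in /dev/sda2, the LVM physical volume, so narrowing it down to vg_nub-lv_root or vg_nub-lv_home would additionally need the LVM extent map (for example pvdisplay -m).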

AHA
Jim

Re: smartctl errors being emailed to root

Posted: 2017/12/07 04:39:30
by Whoever
This is the issue:

5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 6
6 reallocated sectors isn't much to worry about, but if it continues to increase, get ready to replace the hard drive.
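Checking whether the count keeps increasing amounts to comparing RAW_VALUE between runs of smartctl -A /dev/sda, which prints just the attribute table. A minimal Python sketch, assuming two saved copies of that table; the helper names and the choice of watched attributes (5, 196, 197, 198) are illustrative, and the "later" snapshot is made-up data for demonstration, not a reading from this drive.

```python
# Attributes that track bad or remapped sectors (IDs as printed by smartctl -A).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def raw_values(table_text):
    """Map attribute ID -> integer RAW_VALUE from smartctl -A style text."""
    values = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                values[int(fields[0])] = int(fields[9])
            except ValueError:
                pass  # non-numeric raw value: skipped in this sketch
    return values


def grown(old_text, new_text):
    """Return {name: (old, new)} for watched attributes whose raw value rose."""
    old, new = raw_values(old_text), raw_values(new_text)
    return {
        name: (old[i], new[i])
        for i, name in WATCHED.items()
        if i in old and i in new and new[i] > old[i]
    }


# Rows from this drive's Dec 6 attribute table.
TODAY = """\
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 6
196 Reallocated_Event_Count 0x0032 198 198 000 Old_age Always - 2
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
"""

# HYPOTHETICAL later snapshot, invented only to demonstrate the comparison.
LATER = """\
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 8
196 Reallocated_Event_Count 0x0032 198 198 000 Old_age Always - 2
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
"""

print(grown(TODAY, LATER))
```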

In fact, since it must be quite an old drive, get ready to replace it anyway; it's probably getting towards the end of its life.

Re: smartctl errors being emailed to root

Posted: 2017/12/07 07:23:30
by TrevorH
You've got a few problems appearing in that list:
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       74299
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       2
So the drive has already reallocated 6 sectors and currently has 2 sectors marked as bad, which it is waiting to have rewritten so that it can replace them with spares from its internal store. It has also been powered up for 74,299 hours, which at 24 hours/day and 365 days/year works out to 8.48 years; that is quite old for a hard disk.
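The age figure is plain arithmetic; a quick Python check of the same calculation, using the post's assumption of 24 hours/day and 365 days/year (hours_to_years is just an illustrative name):

```python
HOURS_PER_YEAR = 24 * 365  # 24 h/day, 365 days/year, as assumed in the post


def hours_to_years(hours):
    """Convert power-on hours to years of continuous running."""
    return hours / HOURS_PER_YEAR


print(round(hours_to_years(74299), 2))  # 8.48
```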

However, judging by the failed self-test at the top of the list and the power-on-hours figure at which it occurred, it may well have been this way for some considerable time: the log says it happened at 8762 PoH, which is almost exactly 365 days.
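One thing worth double-checking before reading 8762 as "about a year of power-on time": the ATA self-test log stores each entry's LifeTime(hours) timestamp as a 16-bit value, so on a drive past 65,535 power-on hours it wraps around. If that applies here, 74,299 - 65,536 = 8,763, which would put failing self-test #1 about an hour before the attribute snapshot (Power_On_Hours 74299) and #7 roughly 34 hours before it, i.e. the read failures would be recent rather than a year old. A small Python sketch of that arithmetic; the helper names are illustrative, and hours_ago assumes fewer than 65,536 hours have passed since the test.

```python
# The LifeTime(hours) stamp in each ATA self-test log entry is a 16-bit
# counter, so it wraps back to 0 after 65535 hours.
WRAP = 65536


def log_timestamp(power_on_hours):
    """LifeTime value a self-test run at this Power_On_Hours would show."""
    return power_on_hours % WRAP


def hours_ago(current_poh, logged_lifetime):
    """Hours between a logged self-test and now (assumes < 65536 elapsed)."""
    return (log_timestamp(current_poh) - logged_lifetime) % WRAP


print(log_timestamp(74299))    # 8763
print(hours_ago(74299, 8762))  # 1  (self-test #1, read failure)
print(hours_ago(74299, 8729))  # 34 (self-test #7, read failure)
```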

Re: smartctl errors being emailed to root

Posted: 2017/12/07 16:11:02
by hoodcanaljim
Well, I was already thinking that this winter I needed to upgrade three of my computers to CentOS 7. Since I don't believe this computer can support 7, it may be time to upgrade the whole machine.
Or did I see somewhere that one version of 7 does support older i386 machines? Yes, I found it and am downloading it now.

I don't know about others, but most of my computers run 24/7, as I got accustomed to at work, where most upgrades and other maintenance were done late at night. My yum updates run around 3 am.

The first indication in any of the /var/log/messages files was on Dec 4.

Some of the larger drives (2/4 TB) have 61 reallocated sectors. I have no idea how many reallocations in total this drive can take.
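The drive does not report how many spare sectors remain, but smartctl flags an attribute as failing (WHEN_FAILED column) once its normalized VALUE drops to or below THRESH. For Reallocated_Sector_Ct on this drive that is 199 against a threshold of 140. How the vendor maps the raw count (6) onto the normalized scale is not shown in the output, so the margin below is not a count of spare sectors. A minimal Python sketch that lists VALUE - THRESH per attribute, assuming smartctl -A style text; the function name is illustrative, and skipping THRESH 000 rows is a simplification of this sketch.

```python
def margins(table_text):
    """Return {attribute name: VALUE - THRESH} from smartctl -A style text.

    smartctl reports an attribute as failing once its normalized VALUE is
    at or below THRESH, so the margin is how far VALUE can still drop.
    Rows with THRESH 000 are skipped (treated here as having no failure
    threshold, an assumption of this sketch).
    """
    result = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            name, value, thresh = fields[1], int(fields[3]), int(fields[5])
            if thresh > 0:
                result[name] = value - thresh
    return result


# Rows copied from the attribute table earlier in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 6
10 Spin_Retry_Count 0x0013 100 100 051 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 2
"""

print(margins(SAMPLE))
```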


Thanks for your help

Jim