Configure 'smartmontools' to send email notifications when disk failure

Issues related to hardware problems
Zorba
Posts: 75
Joined: 2016/03/05 13:23:59

Post by Zorba » 2017/03/29 19:42:00

Hello

My server has no RAID, so I installed 'smartmontools' to monitor the health of the disks. Now I need 'smartmontools' to send a notification to my email address if any disk fails.

My server has 4 disks (sda, sdb, sdc, sdd). I created a 'smartd.conf' file for them at '/etc/smartd.conf', and it contains exactly the following lines:

Code:

DEVICESCAN -H -m root
/dev/sda -a -s (S/../../(1|3|6)/01|L) -m xxxxxxx@yahoo.com -M diminishing
/dev/sdb -a -s (S/../../(1|3|6)/01|L) -m xxxxxxx@yahoo.com -M diminishing
/dev/sdc -a -s (S/../../(1|3|6)/01|L) -m xxxxxxx@yahoo.com -M diminishing
/dev/sdd -a -s (S/../../(1|3|6)/01|L) -m xxxxxxx@yahoo.com -M diminishing
Did I write this correctly, or is something missing or wrong? Also, after restarting 'smartd', how can I make it send a test notification to make sure the config file is working?

Thank you

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by TrevorH » 2017/03/29 20:50:41

The CentOS version uses /etc/smartmontools/smartd.conf and has lots of commented examples.
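
One thing to watch in the file from the first post: according to the comments in the packaged sample smartd.conf, an active DEVICESCAN line causes any remaining lines in the file to be ignored, so the four /dev/sdX lines placed after it would presumably never be read. A rough sketch of a check for that, in shell. The function name is made up for illustration, and the path argument is whichever file your smartd actually reads:

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): print the active config
# lines that sit below an uncommented DEVICESCAN line and would be ignored.
ignored_after_devicescan() {
    awk '
        /^[[:space:]]*(#|$)/ { next }      # skip comments and blank lines
        seen { print; next }               # active line after DEVICESCAN
        $1 == "DEVICESCAN" { seen = 1 }    # remember that we passed it
    ' "$1"
}

# Example: ignored_after_devicescan /etc/smartmontools/smartd.conf
```

If this prints nothing, no active lines follow DEVICESCAN. It does not handle '\' line continuations, so treat it as a quick sanity check only.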
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

Zorba
Posts: 75
Joined: 2016/03/05 13:23:59

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by Zorba » 2017/03/30 06:57:53

TrevorH wrote:The CentOS version uses /etc/smartmontools/smartd.conf and has lots of commented examples.
So I don't need to create a new 'smartd.conf' file.

Well, I checked '/etc/smartmontools/smartd.conf' and here is its content:

Code:

# Sample configuration file for smartd.  See man smartd.conf.

# Home page is: http://smartmontools.sourceforge.net

# $Id: smartd.conf 3651 2012-10-18 15:11:36Z samm2 $

# smartd will re-read the configuration file if it receives a HUP
# signal

# The file gives a list of devices to monitor using smartd, with one
# device per line. Text after a hash (#) is ignored, and you may use
# spaces and tabs for white space. You may use '\' to continue lines.

# You can usually identify which hard disks are on your system by
# looking in /proc/ide and in /proc/scsi.

# The word DEVICESCAN will cause any remaining lines in this
# configuration file to be ignored: it tells smartd to scan for all
# ATA and SCSI devices.  DEVICESCAN may be followed by any of the
# Directives listed below, which will be applied to all devices that
# are found.  Most users should comment out DEVICESCAN and explicitly
# list the devices that they wish to monitor.
DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q

# Alternative setting to ignore temperature and power-on hours reports
# in syslog.
#DEVICESCAN -I 194 -I 231 -I 9

# Alternative setting to report more useful raw temperature in syslog.
#DEVICESCAN -R 194 -R 231 -I 9

# Alternative setting to report raw temperature changes >= 5 Celsius
# and min/max temperatures.
#DEVICESCAN -I 194 -I 231 -I 9 -W 5

# First (primary) ATA/IDE hard disk.  Monitor all attributes, enable
# automatic online data collection, automatic Attribute autosave, and
# start a short self-test every day between 2-3am, and a long self test
# Saturdays between 3-4am.
#/dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

# Monitor SMART status, ATA Error Log, Self-test log, and track
# changes in all attributes except for attribute 194
#/dev/hdb -H -l error -l selftest -t -I 194 

# Monitor all attributes except normalized Temperature (usually 194),
# but track Temperature changes >= 4 Celsius, report Temperatures
# >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
# Send mail on SMART failures or when Temperature is >= 55 Celsius.
#/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com

# An ATA disk may appear as a SCSI device to the OS. If a SCSI to
# ATA Translation (SAT) layer is between the OS and the device then
# this can be flagged with the '-d sat' option. This situation may
# become common with SATA disks in SAS and FC environments.
# /dev/sda -a -d sat

# A very silent check.  Only report SMART health status if it fails
# But send an email in this case
#/dev/hdc -H -C 0 -U 0 -m admin@example.com

# First two SCSI disks.  This will monitor everything that smartd can
# monitor.  Start extended self-tests Wednesdays between 6-7pm and
# Sundays between 1-2 am
#/dev/sda -d scsi -s L/../../3/18
#/dev/sdb -d scsi -s L/../../7/01

# Monitor 4 ATA disks connected to a 3ware 6/7/8000 controller which uses
# the 3w-xxxx driver. Start long self-tests Sundays between 1-2, 2-3, 3-4, 
# and 4-5 am.
# NOTE: starting with the Linux 2.6 kernel series, the /dev/sdX interface
# is DEPRECATED.  Use the /dev/tweN character device interface instead.
# For example /dev/twe0, /dev/twe1, and so on.
#/dev/sdc -d 3ware,0 -a -s L/../../7/01
#/dev/sdc -d 3ware,1 -a -s L/../../7/02
#/dev/sdc -d 3ware,2 -a -s L/../../7/03
#/dev/sdc -d 3ware,3 -a -s L/../../7/04

# Monitor 2 ATA disks connected to a 3ware 9000 controller which
# uses the 3w-9xxx driver (Linux, FreeBSD). Start long self-tests Tuesdays
# between 1-2 and 3-4 am.
#/dev/twa0 -d 3ware,0 -a -s L/../../2/01
#/dev/twa0 -d 3ware,1 -a -s L/../../2/03

# Monitor 2 SATA (not SAS) disks connected to a 3ware 9000 controller which
# uses the 3w-sas driver (Linux). Start long self-tests Tuesdays
# between 1-2 and 3-4 am.
# On FreeBSD /dev/tws0 should be used instead
#/dev/twl0 -d 3ware,0 -a -s L/../../2/01
#/dev/twl0 -d 3ware,1 -a -s L/../../2/03

# Same as above for Windows. Option '-d 3ware,N' is not necessary,
# disk (port) number is specified in device name.
# NOTE: On Windows, DEVICESCAN works also for 3ware controllers.
#/dev/hdc,0 -a -s L/../../2/01
#/dev/hdc,1 -a -s L/../../2/03

# Monitor 3 ATA disks directly connected to a HighPoint RocketRAID. Start long
# self-tests Sundays between 1-2, 2-3, and 3-4 am. 
#/dev/sdd -d hpt,1/1 -a -s L/../../7/01
#/dev/sdd -d hpt,1/2 -a -s L/../../7/02
#/dev/sdd -d hpt,1/3 -a -s L/../../7/03

# Monitor 2 ATA disks connected to the same PMPort which connected to the
# HighPoint RocketRAID. Start long self-tests Tuesdays between 1-2 and 3-4 am
#/dev/sdd -d hpt,1/4/1 -a -s L/../../2/01
#/dev/sdd -d hpt,1/4/2 -a -s L/../../2/03

# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
#   -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
#   -T TYPE set the tolerance to one of: normal, permissive
#   -o VAL  Enable/disable automatic offline tests (on/off)
#   -S VAL  Enable/disable attribute autosave (on/off)
#   -n MODE No check. MODE is one of: never, sleep, standby, idle
#   -H      Monitor SMART Health Status, report if failed
#   -l TYPE Monitor SMART log.  Type is one of: error, selftest
#   -f      Monitor for failure of any 'Usage' Attributes
#   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
#   -M TYPE Modify email warning behavior (see man page)
#   -s REGE Start self-test when type/date matches regular expression (see man page)
#   -p      Report changes in 'Prefailure' Normalized Attributes
#   -u      Report changes in 'Usage' Normalized Attributes
#   -t      Equivalent to -p and -u Directives
#   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
#   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
#   -i ID   Ignore Attribute ID for -f Directive
#   -I ID   Ignore Attribute ID for -p, -u or -t Directive
#   -C ID   Report if Current Pending Sector count non-zero
#   -U ID   Report if Offline Uncorrectable count non-zero
#   -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
#   -v N,ST Modifies labeling of Attribute N (see man page)
#   -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
#   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
#   -P TYPE Drive-specific presets: use, ignore, show, showall
#    #      Comment: text after a hash sign is ignored
#    \      Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# except for -C and -U, where ID = 0 turns them off.
# All but -d, -m and -M Directives are only implemented for ATA devices
#
# If the test string DEVICESCAN is the first uncommented text
# then smartd will scan for devices /dev/hd[a-l] and /dev/sd[a-z]
# DEVICESCAN may be followed by any desired Directives.

Which part of this text do I have to uncomment to make it active, and where do I list all my drives and my email address for notifications?

Is it here:

Code:

# A very silent check.  Only report SMART health status if it fails
# But send an email in this case
#/dev/hdc -H -C 0 -U 0 -m admin@example.com
So it will be:

Code:

#A very silent check.  Only report SMART health status if it fails
#But send an email in this case
/dev/sda -H -C 0 -U 0 -m xxxxxx@yahoo.com
/dev/sdb -H -C 0 -U 0 -m xxxxxx@yahoo.com
/dev/sdc -H -C 0 -U 0 -m xxxxxx@yahoo.com
/dev/sdd -H -C 0 -U 0 -m xxxxxx@yahoo.com
I did that and restarted smartd. Now how can I test whether notifications work?
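
In case it helps others reading along: the smartd.conf man page describes a '-M test' directive that sends a single test email when smartd starts up. A minimal sketch, assuming the device names from this thread and a placeholder address. The helper name smartd_test_lines is made up, and it only prints lines for you to review and paste into the config:

```shell
#!/bin/sh
# Hypothetical helper: print one smartd.conf line per device with '-M test'
# added, so smartd sends a test email for each device when it starts.
smartd_test_lines() {
    addr=$1
    shift
    for dev in "$@"; do
        printf '%s -H -C 0 -U 0 -m %s -M test\n' "$dev" "$addr"
    done
}

# Example: review the output, put it in the smartd.conf your smartd reads,
# then restart the service (on CentOS 7: systemctl restart smartd).
smartd_test_lines admin@example.com /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Once the test mail arrives, take '-M test' out again, otherwise a test mail goes out on every restart. Also check that the DEVICESCAN line earlier in the file is commented out, since the sample file's own comments say that lines after it are ignored.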

hunter86_bg
Posts: 2019
Joined: 2015/02/17 15:14:33
Location: Bulgaria

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by hunter86_bg » 2017/04/05 20:41:42

Have you configured postfix on the machine?
I doubt that smartmontools can act as both a SMART monitor and an e-mail client. Most utilities use the mail command to send mail locally or remotely, but in either case you will need postfix properly configured.
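
A quick way to check the mail path on its own, before involving smartd at all, is to send yourself a message with the mail command. A rough sketch, assuming the mailx package is installed and using a placeholder address. MAIL_CMD and send_test_mail are names made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: pipe a short message to the local 'mail' command.
# MAIL_CMD can be overridden (for example with a stub) for a dry run.
send_test_mail() {
    to=$1
    printf 'Test message from %s\n' "$(uname -n)" |
        "${MAIL_CMD:-mail}" -s "smartd mail test" "$to"
}

# Example (uncomment to actually send):
# send_test_mail admin@example.com
```

If nothing arrives, /var/log/maillog on CentOS usually shows what postfix did with the message.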

anoop121
Posts: 2
Joined: 2017/06/21 12:28:40

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by anoop121 » 2017/06/21 12:33:16

Did you fix this issue? I am facing the same problem. Could you help me fix it?

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by TrevorH » 2017/06/22 08:56:19

You could start by reading the previous replies...

anoop121
Posts: 2
Joined: 2017/06/21 12:28:40

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by anoop121 » 2017/06/22 11:14:30

I have added the entries below to the smartd.conf file, but unfortunately I am not getting any emails, even though /dev/sdd has currently failed in the server. How can I test it and make sure emails are sent when a disk fails? I have also configured postfix, and I am receiving emails from the server.

# A very silent check. Only report SMART health status if it fails
# But send an email in this case
#/dev/hdc -H -C 0 -U 0 -m admin@example.com
/dev/sda -H -C 0 -U 0 -m anoopcvarghese@xxxx.com
/dev/sdb -H -C 0 -U 0 -m anoopcvarghese@xxxx.com
/dev/sdc -H -C 0 -U 0 -m anoopcvarghese@xxxx.com
/dev/sdd -H -C 0 -U 0 -m anoopcvarghese@xxxx.com
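
For anyone with the same problem: going by the directive list in the sample file, '-H' only reports when the drive's own SMART health status fails, and '-C 0 -U 0' turns the pending and uncorrectable sector checks off, so a disk the OS considers failed may still not trigger a mail. A minimal sketch to see what each drive reports, assuming the usual 'smartctl -H' result lines for ATA ("... self-assessment test result: PASSED") and SCSI ("SMART Health Status: OK"). The function name is made up:

```shell
#!/bin/sh
# Hypothetical helper: classify 'smartctl -H' output read from stdin.
# Prints the reported status word, or UNKNOWN if no status line is found.
smart_health() {
    awk '
        /self-assessment test result:/ { print $NF; found = 1 }
        /SMART Health Status:/ {
            sub(/^.*Status:[[:space:]]*/, "")
            print
            found = 1
        }
        END { if (!found) print "UNKNOWN" }
    '
}

# Example (needs root and real disks):
# for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
#     printf '%s: %s\n' "$d" "$(smartctl -H "$d" | smart_health)"
# done
```

Per the smartd man page, running 'smartd -q onecheck' should also show in the foreground which devices get registered from the config before it exits, which helps confirm that these lines are being read at all.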

simon_lefisch
Posts: 92
Joined: 2017/07/12 21:02:02

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by simon_lefisch » 2017/08/03 23:21:31

I actually have a question about smartmontools... I have a RAID 1 array, and I know mdadm should monitor the array. Is there any point in having smartmontools monitor the individual disks of the array, or is mdadm good enough?

Maybe @TrevorH can chime in?
Hardware:
Supermicro X10SRi-F mobo
E5-2683v4 16-core CPU
112GB ECC RAM
2x 250GB SSD RAID1 (current CentOS 7 version)
2x 500GB SSD RAID1 (VM Disk Image Storage)
2x 4TB HDD RAID1 (Backup Storage via FreeNAS VM)
2X 6TB HDD RAID1 (Data Storage via FreeNAS VM)

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by TrevorH » 2017/08/03 23:51:06

AFAIK mdadm will only tell you about the drive after it fails but smartmon will tell you about anything in SMART data that might be showing it about to fail.

simon_lefisch
Posts: 92
Joined: 2017/07/12 21:02:02

Re: Configure 'smartmontools' to send email notifications when disk failure

Post by simon_lefisch » 2017/08/03 23:54:10

TrevorH wrote:AFAIK mdadm will only tell you about the drive after it fails but smartmon will tell you about anything in SMART data that might be showing it about to fail.
That's what I figured. I think I'll run both just to be safe. Thanks for clearing that up, Trevor!
