  •  Defected
mismatch_cnt
#1
Newbie
Joined: 2009/11/8
From
Posts: 6
Hi everybody. I have a server running CentOS 5.4 (updated from 5.3 a couple of days ago) with cPanel on it. The machine has 2x 750GB HDDs in software RAID 1. Last night I saw that the machine had been under high load for about 2 hours for no obvious reason, so I started looking into why that was happening.
First I saw a mail alert from cPanel:
WARNING: mismatch_cnt is not 0 on /dev/md6

Then I checked /var/log/messages and found this:
Nov 8 04:02:01 phoebe syslogd 1.4.1: restart.
Nov 8 04:22:01 phoebe kernel: md: syncing RAID array md0
Nov 8 04:22:01 phoebe kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Nov 8 04:22:01 phoebe kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Nov 8 04:22:01 phoebe kernel: md: using 128k window, over a total of 16779776 blocks.
Nov 8 04:22:01 phoebe kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Nov 8 04:22:01 phoebe kernel: md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
Nov 8 04:22:01 phoebe kernel: md: delaying resync of md4 until md0 has finished resync (they share one or more physical units)
Nov 8 04:22:01 phoebe kernel: md: delaying resync of md6 until md0 has finished resync (they share one or more physical units)
Nov 8 04:22:01 phoebe kernel: md: delaying resync of md5 until md0 has finished resync (they share one or more physical units)
Nov 8 04:24:44 phoebe kernel: md: md0: sync done.
Nov 8 04:24:44 phoebe kernel: md: delaying resync of md5 until md2 has finished resync (they share one or more physical units)
Nov 8 04:24:44 phoebe kernel: md: delaying resync of md4 until md2 has finished resync (they share one or more physical units)
Nov 8 04:24:44 phoebe kernel: md: delaying resync of md1 until md2 has finished resync (they share one or more physical units)
Nov 8 04:24:44 phoebe kernel: md: delaying resync of md6 until md2 has finished resync (they share one or more physical units)
Nov 8 04:24:44 phoebe kernel: md: delaying resync of md2 until md4 has finished resync (they share one or more physical units)
Nov 8 04:24:44 phoebe kernel: md: syncing RAID array md6
Nov 8 04:24:44 phoebe kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Nov 8 04:24:45 phoebe kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Nov 8 04:24:45 phoebe kernel: md: using 128k window, over a total of 417963008 blocks.
Nov 8 04:24:46 phoebe kernel: md: delaying resync of md4 until md6 has finished resync (they share one or more physical units)
Nov 8 04:24:46 phoebe kernel: RAID1 conf printout:
Nov 8 04:24:46 phoebe kernel: --- wd:2 rd:2
Nov 8 04:24:47 phoebe kernel: disk 0, wo:0, o:1, dev:sda1
Nov 8 04:24:47 phoebe kernel: disk 1, wo:0, o:1, dev:sdb1
Nov 8 06:03:58 phoebe kernel: md: md6: sync done.
Nov 8 06:03:58 phoebe kernel: md: delaying resync of md4 until md5 has finished resync (they share one or more physical units)
Nov 8 06:03:58 phoebe kernel: md: delaying resync of md2 until md4 has finished resync (they share one or more physical units)
Nov 8 06:03:58 phoebe kernel: md: delaying resync of md1 until md2 has finished resync (they share one or more physical units)
Nov 8 06:03:58 phoebe kernel: md: syncing RAID array md5
Nov 8 06:03:58 phoebe kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Nov 8 06:03:58 phoebe kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Nov 8 06:03:58 phoebe kernel: md: using 128k window, over a total of 31463232 blocks.
Nov 8 06:03:58 phoebe kernel: RAID1 conf printout:
Nov 8 06:03:58 phoebe kernel: --- wd:2 rd:2
Nov 8 06:03:58 phoebe kernel: disk 0, wo:0, o:1, dev:sda8
Nov 8 06:03:58 phoebe kernel: disk 1, wo:0, o:1, dev:sdb8

I'm also getting this when running fdisk -l:
Disk /dev/md6 doesn't contain a valid partition table

Right now I'm running echo check > /sys/block/md6/md/sync_action to see whether the data on both drives matches. The system looks to be working fine so far. I never had an error like this in about 2 months of running CentOS 5.3. Is there something to worry about? (Sorry for my bad English.)
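
For reference, a minimal sketch of the kind of test that sits behind a mail alert like the one above: it reads mismatch_cnt for every md array and prints a warning for any non-zero value. The function name report_mismatches and its optional directory argument are made up for illustration (the argument only exists so the function can be pointed at a scratch directory instead of /sys/block), and the count in parentheses is an addition to the alert text quoted above.

```shell
# Sketch only: warn about every md array whose mismatch_cnt is non-zero.
# report_mismatches and its argument are hypothetical names; on a real
# system the default /sys/block is what you would want.
report_mismatches() {
    sysfs_root="${1:-/sys/block}"
    for cnt_file in "$sysfs_root"/md*/md/mismatch_cnt; do
        # Skip the unexpanded glob when there are no md arrays at all.
        [ -r "$cnt_file" ] || continue
        count=$(cat "$cnt_file")
        # The path looks like <root>/md6/md/mismatch_cnt; pull out "md6".
        dev=$(basename "$(dirname "$(dirname "$cnt_file")")")
        if [ "$count" -ne 0 ]; then
            echo "WARNING: mismatch_cnt is not 0 on /dev/$dev ($count)"
        fi
    done
}
```

Calling report_mismatches with no argument after a check has finished should list only the arrays that need a closer look; the counter is only refreshed by a check or repair run, so the value is as old as the last one.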


Thanks in advance.
Posted on: 2009/11/8 16:26
  •  Defected
Re: mismatch_cnt
#2
Newbie
Joined: 2009/11/8
From
Posts: 6
Any ideas?
Posted on: 2009/11/9 9:35
  •  gulikoza
Re: mismatch_cnt
#3
Peeking in the Member Window
Joined: 2007/5/6
From
Posts: 19
5.4 added a RAID check script. Before 5.4, mismatch_cnt was never checked...
How big a problem this is, I don't know. I read that a small mismatch_cnt is nothing to worry about, as it usually occurs on an unused part of the filesystem. But a repair basically just copies the block to the other disk, without actually knowing if you're copying good over bad or the other way around...
Posted on: 2009/11/9 11:30
  •  Defected
Re: mismatch_cnt
#4
Newbie
Joined: 2009/11/8
From
Posts: 6
Thanks a lot for your reply. I'm getting a bit confused now about repairing the RAID. If I do so, might it replace the "good" data with the corrupted data?
Posted on: 2009/11/9 13:42
  •  logan
Re: mismatch_cnt
#5
Jr Board Member
Joined: 2008/8/1
From
Posts: 28
Quote:
Also getting this when running fdisk -l :
Disk /dev/md6 doesn't contain a valid partition table

Usually the md device is built from partitions on the hard drives (e.g. /dev/sda4 + /dev/sdb4 = /dev/md3), so the partition table lives on the drives, not on the md device itself. Try fdisk -l /dev/sd? instead.

Quote:
5.4 added a raid check script. Before 5.4, mismatch_cnt was never checked...

I don't think the RAID was checked at all. Debian has had a job to check md arrays for a while, but I've never seen any such scripts on RedHat/Fedora/CentOS systems until now. I've been using software mdadm RAID 1 for a while and have always done my own monthly checks, but mismatch_cnt is new to me... There's some stuff in Debian's bug tracker and on mailing lists that seems to downplay it a bit, but I'm still fuzzy. I added a second job yesterday to check for a non-zero mismatch_cnt and issue a repair if one is found. I've heard that smaller mismatches (<=128) may be alright, but I don't know if that's fact, so I'm repairing anything > 0.

You can issue a 'repair' on the affected md device with "echo repair > /sys/block/md6/md/sync_action". This will attempt to fix the problem, but it won't actually update /sys/block/md*/md/mismatch_cnt. A follow-up 'check' will, which you could just let run at the normal interval from cron.
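
The repair-then-check sequence described above could be sketched roughly like this. The function names (md_start, md_wait_idle, repair_then_check) are invented for the example, and it assumes that sync_action reads "idle" again once md has finished the requested pass; treat it as a starting point, not a tested tool.

```shell
# Sketch only. md_dir is the array's sysfs directory, e.g. /sys/block/md6/md.

# Write an action ("check" or "repair") to the array's sync_action file.
md_start() {
    echo "$2" > "$1/sync_action"
}

# Poll sync_action until it reads "idle"; give up after max_polls tries.
md_wait_idle() {
    md_dir="$1"; max_polls="${2:-100000}"; interval="${3:-10}"
    polls=0
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        polls=$((polls + 1))
        [ "$polls" -ge "$max_polls" ] && return 1
        sleep "$interval"
    done
    return 0
}

# Repair first, then check so that mismatch_cnt reflects the repaired state.
repair_then_check() {
    md_dir="$1"
    md_start "$md_dir" repair && md_wait_idle "$md_dir" || return 1
    md_start "$md_dir" check  && md_wait_idle "$md_dir" || return 1
    cat "$md_dir/mismatch_cnt"
}
```

Something like repair_then_check /sys/block/md6/md would then print the fresh mismatch_cnt at the end. On a large array each pass can take hours (the check of md6 in the first post's log ran from 04:24 to 06:03), so this is better run from cron or inside screen than in a login session.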
Posted on: 2009/11/11 19:21
  •  simpfeld
Re: mismatch_cnt
#6
Peeking in the Member Window
Joined: 2008/7/11
From
Posts: 13
As far as I know, if there is swap on the md device you will potentially see this issue (also if swap is part of an LVM volume on the md device).
We may also have seen this with a VMware VM on an md device (maybe because the guest's swap is inside the VM), but I can't totally confirm that.

I believe repair followed by check is the best option, but the paranoid part of me wanted to fsck the filesystems after they were repaired to help make sure everything was happy (touching /forcefsck and rebooting).
Posted on: 2009/11/13 16:33
  •  mAcRoS
Re: mismatch_cnt
#7
Newbie
Joined: 2009/11/20
From
Posts: 2
We also have problems with mismatch_cnt on our servers. One is a VMware server (RAID 10), and another is a mail server, which is also used for some backups (RAID 1).

These mismatches appear every week, and it sounds like running a repair is not a good idea (md cannot tell which copy of a block is the "good" one).

So the question is, how serious is this? (We already asked on the linux-raid list: some say it is corrupting files, while others say that nobody will "feel" these mismatches at the filesystem level.)
Posted on: 2009/11/20 9:23
  •  mAcRoS
Re: mismatch_cnt
#8
Newbie
Joined: 2009/11/20
From
Posts: 2
Any updates on this issue?
Posted on: 2009/11/24 13:13
  •  pschaff
Re: mismatch_cnt
#9
Moderator
Joined: 2006/12/13
From Tidewater, Virginia, North America
Posts: 7185
Seems to be a feature rather than an issue. Have you tried the repair and check procedure?
http://lists.centos.org/pipermail/centos/2009-October/084510.html

Might want to follow with
touch /forcefsck
shutdown -r now
_________________
Phil
Required reading: FAQ & Readme first ; Search hint: google "your topic site:centos.org"; Smart Questions
Posted on: 2009/11/24 15:15
  •  Defected
Re: mismatch_cnt
#10
Newbie
Joined: 2009/11/8
From
Posts: 6
Quote:

pschaff wrote:
Seems to be a feature rather than an issue. Have you tried the repair and check procedure?
http://lists.centos.org/pipermail/centos/2009-October/084510.html

Might want to follow with
touch /forcefsck
shutdown -r now


Well, I did repair the RAID and checked it after that... and it returned 0 unsynchronized blocks. But every Sunday, just after the weekly cron job, I'm getting 128 unsynchronized blocks. No more, no less... every week the same thing.
Posted on: 2009/11/26 7:28
  •  pschaff
Re: mismatch_cnt
#11
Moderator
Joined: 2006/12/13
From Tidewater, Virginia, North America
Posts: 7185
That does sound problematic. At least it is deterministic. What is running weekly, or do you just mean that 99-raid-check is regularly finding the problem? A bug report seems in order unless you can pin it on something in cron.weekly.
Posted on: 2009/11/26 17:54
  •  Defected
Re: mismatch_cnt
#12
Newbie
Joined: 2009/11/8
From
Posts: 6
Quote:

pschaff wrote:
That does sound problematic. At least it is deterministic. What is running weekly, or do you just mean that 99-raid-check is regularly finding the problem? A bug report seems in order unless you can pin it on something in cron.weekly..


cat /sys/block/md6/md/mismatch_cnt always reports 128 after the weekly 99-raid-check run. I repaired the RAID twice and checked it right afterwards, and the result was 0. I also ran a check last Saturday, 2 hours before 99-raid-check was due to start, and the result was still 0. I thought OK, looks fine, but on Sunday morning I again got WARNING: mismatch_cnt is not 0 on /dev/md6, and again /sys/block/md6/md/mismatch_cnt reports 128 unsynchronized blocks. I'm a bit confused. I'm not going to do any check or repair this week, just to see what the weekly cron job reports this Sunday morning... I bet it will be 128 again.

Anyway, my server works fine, but it would be good to find out what is causing this (if there is something causing it).

Sorry for my poor English.
Posted on: 2009/11/27 2:41
  •  BartSimpson
Re: mismatch_cnt
#13
Peeking in the Member Window
Joined: 2007/3/3
From
Posts: 18
I have the same problem: after the script runs, the array gets rebuilt.
It only happens on the first of 3 RAID 1 arrays. All the arrays are built on the same disks.
Posted on: 2009/12/7 7:13
  •  pjwelsh
Re: mismatch_cnt
#14
Professional Board Member
Joined: 2007/1/7
From Central IL USA
Posts: 2195
What is the rebuild command that you are running? The 99-raid-check check does not rebuild, it just reports. You could (and *SHOULD*) run something like:
echo "check" > /sys/block/md?/md/sync_action
where the "?" is the number you need. To emphasize, however: you should have some cron job in place to validate your RAID (e.g. the "check" command noted above) on a periodic basis! I check weekly. You could choose monthly instead, depending on your needs. This operation is also called a "parity scrub" and is (IMHO) critical to maintaining operational and physical consistency and not getting screwed by some disk error surprise.
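
For anyone without a packaged check job, a periodic scrub along those lines might look like the sketch below, dropped into something like /etc/cron.weekly (the location, the function name check_all_arrays and the directory argument are assumptions for the example). It starts a check on every array that is currently idle and leaves busy ones alone. Judging by the "delaying resync of md1 until md0 has finished" lines in the log in the first post, md itself serializes arrays that share physical disks, so starting them all at once should be fine.

```shell
# Sketch only: start a 'check' on every md array that is currently idle.
# check_all_arrays and its argument are hypothetical; the argument exists
# only so the function can be tried against a scratch directory.
check_all_arrays() {
    sysfs_root="${1:-/sys/block}"
    for action_file in "$sysfs_root"/md*/md/sync_action; do
        # Skip the unexpanded glob or anything we cannot write to.
        [ -w "$action_file" ] || continue
        # Leave arrays alone if they are already checking or resyncing.
        if [ "$(cat "$action_file")" = "idle" ]; then
            echo check > "$action_file"
        fi
    done
}
```

The check only updates mismatch_cnt; whether to follow up with a repair is a separate decision, as discussed earlier in the thread.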
Posted on: 2009/12/7 15:04
  •  pschaff
Re: mismatch_cnt
#15
Moderator
Joined: 2006/12/13
From Tidewater, Virginia, North America
Posts: 7185
Looks to me like the action of 99-raid-check depends on what's defined in /etc/sysconfig/raid-check and that it should check weekly if configured to do so. Am I missing something?
Posted on: 2009/12/7 16:01
  •  pjwelsh
Re: mismatch_cnt
#16
Professional Board Member
Joined: 2007/1/7
From Central IL USA
Posts: 2195
Quote:

pschaff wrote:
Looks to me like the action of 99-raid-check depends on what's defined in /etc/sysconfig/raid-check and that it should check weekly if configured to do so. Am I missing something?


Learned something new... "REPAIR_DEVS a space delimited list of devs that the user specifically wants to run a repair on." is in /etc/sysconfig/raid-check, as @pschaff says.
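
To make the idea of per-device lists concrete, here is a rough sketch of how a script could pick an action for one array from settings like these. Only REPAIR_DEVS is quoted from the config file above; SKIP_DEVS and CHECK_DEVS, their precedence, and the function names are assumptions made for the example, and this is not the code of the shipped 99-raid-check script.

```shell
# Sketch only: choose "check", "repair" or "skip" for one md device.
# REPAIR_DEVS comes from the quoted config comment; SKIP_DEVS, CHECK_DEVS
# and the order in which they are consulted are assumptions.

# Succeeds if $1 appears as a whole word in the space delimited list $2.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

pick_action() {
    dev="$1"
    if in_list "$dev" "${SKIP_DEVS:-}"; then
        echo skip
    elif in_list "$dev" "${REPAIR_DEVS:-}"; then
        echo repair
    elif [ -z "${CHECK_DEVS:-}" ] || in_list "$dev" "${CHECK_DEVS:-}"; then
        # With no explicit check list, every remaining array gets a check.
        echo check
    else
        echo skip
    fi
}
```

With REPAIR_DEVS="md6" and the other two lists empty, pick_action md6 prints repair and any other array gets check.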
Posted on: 2009/12/7 16:41
  •  pschaff
Re: mismatch_cnt
#17
Moderator
Joined: 2006/12/13
From Tidewater, Virginia, North America
Posts: 7185
Just looked back at the comments in /etc/sysconfig/raid-check and it looks like CHECK will actually repair what it finds automatically, so now I'm a bit confused as to the utility of the REPAIR option, as check seems to do both. Found a FAQ, but it is still not totally clear to me what is meant by "personalities" being "taught" about check. Seems others have suffered similar confusion. Any enlightenment welcome.
Posted on: 2009/12/7 18:30
  •  pjwelsh
Re: mismatch_cnt
#18
Professional Board Member
Joined: 2007/1/7
From Central IL USA
Posts: 2195
If the check is supposed to also repair, it did not work that way for me on the box with those messages. I needed to run the sync operation (or maybe the explicit REPAIR_DEVS) in order for the message to stop after the upgrade from 5.3 -> 5.4.
Posted on: 2009/12/8 13:29
  •  pjwelsh
Re: mismatch_cnt
#19
Professional Board Member
Joined: 2007/1/7
From Central IL USA
Posts: 2195
Just to make sure everyone who has or has had this issue understands:
http://lists.centos.org/pipermail/centos/2009-December/086667.html

and to quote the part that makes me sleep better:
Quote:
On 12/1/2009 8:05 AM, Paul Bijnens wrote:
> I have the problem on 2 servers, and both of those servers are also running
> a VMware image (very small, but constantly used) under VMware Server 2.
> Could it be that the .vmem file, or even the virtual disk is constantly
> written to, and the raid is constantly out of sync because of that?
> (All my other VMware servers have hardware raid cards; or are still on
> Centos4.)

... that fills me with dread. The whole point of RAID-1 is supposed to
be that data that gets written to one drive also gets written to the
other drive. But yes, apparently will see this on systems where the
file is being constantly written to.

http://bergs.biz/blog/2009/03/01/startled-by-component-device-mismatches-on-raid1-volumes/

http://www.issociate.de/board/goto/1675787/mismatch_cnt_worries.html
(this is a post from 2007 that discusses the issue)

http://forum.nginx.org/read.php?24,16699

Apparently, a non-zero number is common on RAID-1 and RAID-10 due to
various (harmless?) issues like aborted writes in a swap file.

http://www.centos.org/modules/newbb/viewtopic.php?topic_id=23164&forum=37

Also mentions that it can happen with VMWare VM files.

And lastly, "please explain mismatch_cnt so I can sleep better at night".

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=405919

So my take on all of that is, if you see it on RAID-5 or RAID-6, you
should worry. But if it's on an array with memory mapped files or swap
files/partitions that is RAID-1 or RAID-10, it's less of a worry.
Posted on: 2009/12/17 17:11
  •  simpfeld
Re: mismatch_cnt
#20
Peeking in the Member Window
Joined: 2008/7/11
From
Posts: 13
Sorry to resurrect an old thread, but just to update: this has now become a Bugzilla entry, with an updated mdadm package already in the "pending" state on Fedora 12. There is also a separate RH5 Bugzilla entry, which I'd hope has mdadm updates pending too.

The patch basically stops the script from checking mismatch_cnt on RAID 1 devices, where it isn't really meaningful, but it still runs the check itself, which is essential for all RAID levels.

Fedora 12 Bugzilla Entry

RH5 Bugzilla Entry
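
I haven't reproduced the actual patch here, but the idea as described (keep running the check everywhere, only stop warning about mismatch_cnt where it isn't meaningful) could be sketched like this. It reads the array's RAID level from the level file next to mismatch_cnt in sysfs. QUIET_LEVELS and should_warn are names invented for the example; raid1 comes from the description above, while including raid10 is an assumption based on the mailing list summary quoted earlier in the thread.

```shell
# Sketch only, not the real mdadm patch. md_dir is e.g. /sys/block/md6/md.
# QUIET_LEVELS is a hypothetical setting: levels where a non-zero
# mismatch_cnt is not reported.
QUIET_LEVELS="raid1 raid10"

# Succeeds (returns 0) only if a warning should be sent for this array.
should_warn() {
    md_dir="$1"
    level=$(cat "$md_dir/level")
    count=$(cat "$md_dir/mismatch_cnt")
    # Nothing to report when the counter is zero.
    [ "$count" -ne 0 ] || return 1
    # Stay quiet for levels listed in QUIET_LEVELS.
    case " $QUIET_LEVELS " in
        *" $level "*) return 1 ;;
    esac
    return 0
}
```

A report loop would then only mail out arrays for which should_warn succeeds, so a non-zero count on RAID 5 or RAID 6 still raises the alarm.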
Posted on: 2010/2/20 0:51