
Re: software RAID-1 data recovery

Posted: 2012/02/27 21:43:30
by TrevorH
Can you also post the output from `cat /proc/mdstat` and `mdadm --detail /dev/md0` on this same machine?

The good news is that, according to that mdadm output, your two disks are in a RAID 1 mirrored configuration, so /dev/sda ought to be an exact copy of /dev/sdb. Unless the RAID previously failed, data was then written only to /dev/sdb, and then the two disks miraculously switched health (the first one dying while the other revived), I find it difficult to work out how you could have lost anything.
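If /dev/sda really is an exact copy of /dev/sdb, the data areas of the two disks should compare equal. Below is a minimal read-only sketch using GNU `cmp -n`. It assumes the version 00.90 metadata shown in the mdadm output in this thread, which keeps its superblock near the end of each member. Because of that, the start of both disks is plain mirrored data, while the superblock region legitimately differs between members, so only a leading prefix is compared. It also assumes the array is idle while it runs. The function name `compare_prefix` and the 1 GiB default are hypothetical choices for illustration.

```shell
#!/bin/bash
# Hypothetical helper: compare the first N bytes of two block devices (or files).
# Read-only: cmp only opens its inputs for reading.
# Returns 0 if the compared region is identical, 1 otherwise.
compare_prefix() {
    local a="$1" b="$2"
    local bytes="${3:-$((1024 * 1024 * 1024))}"   # default: first 1 GiB
    # -n limits the comparison to the first $bytes bytes; on a mismatch
    # cmp prints the position of the first differing byte.
    if cmp -n "$bytes" "$a" "$b"; then
        echo "first $bytes bytes identical"
        return 0
    else
        echo "difference found within first $bytes bytes" >&2
        return 1
    fi
}
```

Usage, as root: `compare_prefix /dev/sda /dev/sdb`. A mismatch this early on the disks would suggest the mirrors have diverged.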

Re: software RAID-1 data recovery

Posted: 2012/02/28 23:40:55
by pschaff
The data could have been lost if the RAID was synced with the disk that was missing the data acting as the "master". Hopefully that is not the case. The best I can suggest is to try mounting each disk, /dev/sda and /dev/sdb, independently and see what is recoverable. The original setup was odd, as the disks were not properly partitioned.
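A minimal sketch of mounting each disk independently. It assumes /dev/md0 has already been stopped, because otherwise the members are reported as busy. It also assumes the filesystem is ext3 and that the array uses the 00.90 metadata shown in this thread: that superblock sits at the end of each member, so each disk can be mounted directly as a filesystem. The options `ro,noload` are real ext3 mount options. `noload` skips journal replay, which ext3 would otherwise perform even on a read-only mount, so the view may be inconsistent if the filesystem was not cleanly unmounted. The helper names and the `/mnt/recover` base directory are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: mount each former RAID-1 member read-only on its own
# directory, e.g. /mnt/recover/sda and /mnt/recover/sdb.
# Assumes /dev/md0 is already stopped and the filesystem is ext3. Run as root.

# Pure helper: map a device path to its mount point under a base directory.
mount_point_for() {
    local base="$1" dev="$2"
    echo "${base%/}/$(basename "$dev")"
}

mount_members_ro() {
    local base="$1"; shift
    local dev mp
    for dev in "$@"; do
        mp=$(mount_point_for "$base" "$dev")
        mkdir -p "$mp"
        # ro: no writes through the filesystem.
        # noload: do not replay the ext3 journal, which would write to the disk.
        if mount -t ext3 -o ro,noload "$dev" "$mp"; then
            echo "mounted $dev on $mp"
        else
            echo "could not mount $dev" >&2
        fi
    done
}
```

Usage, as root: `mount_members_ro /mnt/recover /dev/sda /dev/sdb`. The contents of the two mount points can then be compared.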

Re: software RAID-1 data recovery

Posted: 2012/02/29 03:34:34
by markmh
Trevor, here are the results of the commands:

1) cat /proc/mdstat =>
[code]
Personalities : [raid1]
md0 : active raid1 sda[0] sdb[1]
1465138496 blocks [2/2] [UU]

unused devices: <none>
[/code]

2) /sbin/mdadm --detail /dev/md0 =>
[code]
/dev/md0:
Version : 00.90.01
Creation Time : Tue Jan 13 12:03:05 2009
Raid Level : raid1
Array Size : 1465138496 (1397.26 GiB 1500.30 GB)
Device Size : 1465138496 (1397.26 GiB 1500.30 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sun Jan 15 16:49:07 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
UUID : 89cd271b:f66ca0cc:0d4ef2da:9877d7ef
Events : 0.27518273
[/code]



Phil,
How specifically should we try to mount them? Or is there something we need to do to prepare? When we try to mount them right now, it says it can't be done because they are busy. We think this is because the RAID is still running.

Thanks.

Re: software RAID-1 data recovery

Posted: 2012/02/29 11:42:36
by TrevorH
Everything you've posted so far seems to show that your RAID array is synced and performing normally except for the output from

[code]
/sbin/mdadm --examine /dev/sdb
[/code]
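One way to act on the `--examine` output is to compare the `Events` counter that mdadm prints for each member's superblock. The member with the higher count has the more recently updated superblock, and a mismatch would fit the "failed, then written to one disk" scenario above. A minimal sketch follows. It assumes the `Events : ...` line format printed by `mdadm --examine` for these 0.90 superblocks. The function names are hypothetical, and the counters are meant to be compared by eye.

```shell
#!/bin/bash
# Hypothetical helper: print the "Events" value from `mdadm --examine`
# output read on stdin (e.g. "0.27518273").
extract_events() {
    awk -F':' '/^[ \t]*Events[ \t]*:/ { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v; exit }'
}

# Print each member's event counter side by side (needs root).
compare_members() {
    local dev
    for dev in "$@"; do
        printf '%s events=%s\n' "$dev" "$(mdadm --examine "$dev" | extract_events)"
    done
}
```

Usage, as root: `compare_members /dev/sda /dev/sdb`.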

You could try running

[code]
echo "check" > /sys/block/md0/md/sync_action
[/code]

This will check that every block on the first device is identical to the corresponding block on the second. Results are logged to syslog. It will take a while to run on 1.5TB; I'd guess you should allow more than 5 hours.
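A minimal sketch that wraps the check. It first confirms the kernel exposes `/sys/block/md0/md/sync_action`, because older kernels may not. It then waits until `sync_action` reads `idle` again and prints `mismatch_cnt`, the md sysfs count of sectors found to differ during the check, where 0 means the mirrors matched. The function name, the 60-second polling interval and the return code 2 are hypothetical choices. Run it as root.

```shell
#!/bin/bash
# Hypothetical wrapper: start an md "check" pass if the running kernel exposes
# the md sysfs interface, wait for it to finish, then report mismatch_cnt.
# Argument: array name (default md0). Must be run as root.
run_md_check() {
    local md="${1:-md0}"
    local dir="/sys/block/$md/md"
    if [ ! -e "$dir/sync_action" ]; then
        echo "$dir/sync_action not available on this kernel" >&2
        return 2
    fi
    echo check > "$dir/sync_action"
    # sync_action reads "check" while the pass runs and "idle" when it is done.
    while [ "$(cat "$dir/sync_action")" != "idle" ]; do
        sleep 60
    done
    # Sectors found to differ between the mirrors during the check.
    echo "mismatch_cnt: $(cat "$dir/mismatch_cnt")"
}
```

Usage, as root: `run_md_check md0`.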

Re: software RAID-1 data recovery

Posted: 2012/03/01 01:02:56
by markmh
This should be run from the head node, the same machine that /compute is connected to, right? And do I need to be root when I do this?
Thanks

Re: software RAID-1 data recovery

Posted: 2012/03/01 01:03:37
by pschaff
Yes, and yes.

Re: software RAID-1 data recovery

Posted: 2012/03/01 17:16:40
by markmh
The command gives:

[code]
[root@lcpp-cluster ~]# echo "check" > /sys/block/md0/md/sync_action
bash: /sys/block/md0/md/sync_action: No such file or directory
[/code]

And if I go to the /sys/block/md0/ directory and do ls -l, I find this:

[code]
-r--r--r-- 1 root root 4096 Mar 1 10:11 dev
-r--r--r-- 1 root root 4096 Mar 1 10:11 range
-r--r--r-- 1 root root 4096 Mar 1 10:11 removable
-r--r--r-- 1 root root 4096 Mar 1 10:11 size
-r--r--r-- 1 root root 4096 Mar 1 10:11 stat
[/code]

So there is no sync_action.

Re: software RAID-1 data recovery

Posted: 2012/03/01 19:01:16
by TrevorH
Oh, that must be a feature enabled only in the CentOS 5 (and later) kernels.

I'm not sure where to go from here. Perhaps it would help if you told us what data you are missing and how you think it happened?

Re: software RAID-1 data recovery

Posted: 2012/03/02 01:37:04
by markmh
Thanks for the quick reply, Trevor!
I think what happened is that the disk /dev/sdb is broken, and that's why the data on the RAID is not available. From the analysis I posted earlier, it looks like /dev/sda still has all the data.

I would like to access it somehow. To do this, I tried to mount /dev/sda on some spare directory after unmounting /compute, but it says the disk is busy. So I guess I have to stop the RAID somehow, since it is still running, and then access the data from there.

Re: software RAID-1 data recovery

Posted: 2012/03/02 09:15:34
by TrevorH
The trouble is that I think you are already accessing the data on /dev/sda by way of /dev/md0.
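To confirm that, you can check where /dev/md0 is mounted and whether /dev/sda is listed as an active member in /proc/mdstat. While the md driver holds a disk as an array member, a direct mount of that disk fails as "busy". The sketch below reads the standard `/proc/mounts` and `/proc/mdstat` formats from stdin, which keeps the helpers easy to try on saved output. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical diagnostics for the "device is busy" error.

# Print the mount point(s) of a device, reading /proc/mounts-format lines on stdin.
mount_points_of() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Succeed if the /proc/mdstat text on stdin lists the disk (e.g. "sda") as a member,
# i.e. it appears as "sda[N]" on an array line.
is_md_member() {
    grep -Eq "(^|[[:space:]])$1\[[0-9]+\]"
}
```

Usage: `mount_points_of /dev/md0 < /proc/mounts` should print `/compute` while the array is mounted. `is_md_member sda < /proc/mdstat && echo "sda is held by md"` confirms the disk is still claimed by the array.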

However, you can stop the array by using

[code]
mdadm --stop /dev/md0
[/code]

That should allow you to access /dev/sda directly, but if you do this I would be tempted to mount it read-only if you can, e.g. `mount -t ext3 -o ro /dev/sda /mnt/whatever`.
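Once you have finished looking at /dev/sda, the mirror can be put back together. Below is a minimal sketch, assuming the temporary mount point `/mnt/whatever` from the command above, the two whole-disk members /dev/sda and /dev/sdb, and /compute as the array's usual mount point. The `restore_array` and `run` helpers and the `DRY_RUN` switch are hypothetical. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/bash
# Hypothetical sketch: undo the temporary read-only mount and reassemble md0.
# Run as root. With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

restore_array() {
    local tmp="${1:-/mnt/whatever}"
    run umount "$tmp" &&
    # Reassemble the mirror from both whole-disk members.
    run mdadm --assemble /dev/md0 /dev/sda /dev/sdb &&
    run mount /dev/md0 /compute
}
```

Try it first with `DRY_RUN=1 restore_array /mnt/whatever`. If anything was written to /dev/sda while it was mounted outside the array, the two halves may no longer match, which is one more reason to mount it read-only.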