software RAID-1 data recovery

TrevorH
Forum Moderator
Posts: 24085
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Post by TrevorH » 2012/02/27 21:43:30

Can you also post the output from `cat /proc/mdstat` and `mdadm --detail /dev/md0` on this same machine?

The good news is that, according to that mdadm output, your two disks are in a RAID 1 mirrored configuration, so /dev/sda ought to be an exact copy of the other one. Unless the RAID previously failed, data was written only to /dev/sdb, and then the two disks miraculously switched health so that the first one died while the other revived, I find it difficult to work out how you could have lost anything.
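
For reference when reading `/proc/mdstat`: in the status field at the end of the "blocks" line, each `U` is a member that is up and an `_` marks a missing or failed one, so `[UU]` is a healthy two-disk mirror. A minimal sketch of that check follows. The function name `degraded_arrays` is made up for illustration, and it reads the file given as its argument (defaulting to `/proc/mdstat`).

```shell
# degraded_arrays [FILE]: print the name of every md array whose status
# field (for example [UU]) contains "_", i.e. has a missing/failed member.
# FILE defaults to /proc/mdstat; pass a saved copy to inspect it offline.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        match($0, /\[[U_]+\]/) {               # status field such as [UU] or [U_]
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

With no output, every array listed has all of its members present.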

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Post by pschaff » 2012/02/28 23:40:55

The data could have been lost if the RAID was synced with the disk that was missing the data as the "master". Hopefully that is not the case. The best I can suggest is to try mounting each disk, /dev/sda and /dev/sdb, independently and see what is recoverable. The original setup was odd, as the disks were not partitioned properly.

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/29 03:34:34

Trevor, here are the results of the commands:

1) cat /proc/mdstat =>
[code]
Personalities : [raid1]
md0 : active raid1 sda[0] sdb[1]
1465138496 blocks [2/2] [UU]

unused devices: <none>
[/code]

2) /sbin/mdadm --detail /dev/md0 =>
[code]
/dev/md0:
Version : 00.90.01
Creation Time : Tue Jan 13 12:03:05 2009
Raid Level : raid1
Array Size : 1465138496 (1397.26 GiB 1500.30 GB)
Device Size : 1465138496 (1397.26 GiB 1500.30 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sun Jan 15 16:49:07 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0


Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
UUID : 89cd271b:f66ca0cc:0d4ef2da:9877d7ef
Events : 0.27518273
[/code]



Phil,
How specifically should we try to mount them? Or is there something we need to do to prepare? When we try to mount them right now, it says it can't be done because they are busy. We think this is because the RAID is still running.

Thanks.


Re: software RAID-1 data recovery

Post by TrevorH » 2012/02/29 11:42:36

Everything you've posted so far seems to show that your RAID array is synced and performing normally except for the output from

[code]
/sbin/mdadm --examine /dev/sdb
[/code]

You could try running

[code]
echo "check" > /sys/block/md0/md/sync_action
[/code]

This will check that all blocks on the first device are identical to those on the second. Results are logged to syslog, and the check will take a while to run on 1.5TB; allow more than 5 hours, I'd guess.
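
A hedged sketch of the same step as a small script, in case it helps. It only writes `check` if the `sync_action` file is actually there (older kernels may not expose it), and afterwards the count of inconsistent sectors can be read from `mismatch_cnt` in the same sysfs directory. The function names `md_check` and `md_mismatches` are invented for this example, and the directory argument defaults to the md0 path used above.

```shell
# md_check [MD_SYSFS_DIR]: request a consistency check, but only if the
# kernel exposes sync_action. Must be run as root on a real array.
md_check() {
    md_dir="${1:-/sys/block/md0/md}"
    if [ ! -e "$md_dir/sync_action" ]; then
        echo "sync_action not found in $md_dir (kernel may be too old)" >&2
        return 1
    fi
    echo check > "$md_dir/sync_action" || return 1
    echo "check requested; follow progress with: cat /proc/mdstat"
}

# md_mismatches [MD_SYSFS_DIR]: print the mismatch count once the check
# has finished; 0 means no differences were found between the mirrors.
md_mismatches() {
    md_dir="${1:-/sys/block/md0/md}"
    cat "$md_dir/mismatch_cnt"
}
```

Run `md_check` first, wait until `/proc/mdstat` no longer shows a check in progress, then run `md_mismatches`.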


Re: software RAID-1 data recovery

Post by markmh » 2012/03/01 01:02:56

This should be run from the head node, the same machine that /compute is connected to, right? And do I need to be root when I do this?
Thanks


Re: software RAID-1 data recovery

Post by pschaff » 2012/03/01 01:03:37

Yes, and yes.


Re: software RAID-1 data recovery

Post by markmh » 2012/03/01 17:16:40

The command gives:

[code]
[root@lcpp-cluster ~]# echo "check" > /sys/block/md0/md/sync_action
bash: /sys/block/md0/md/sync_action: No such file or directory
[/code]

And if I go to the /sys/block/md0/ directory and do ls -l, I find this:

[code]
-r--r--r-- 1 root root 4096 Mar 1 10:11 dev
-r--r--r-- 1 root root 4096 Mar 1 10:11 range
-r--r--r-- 1 root root 4096 Mar 1 10:11 removable
-r--r--r-- 1 root root 4096 Mar 1 10:11 size
-r--r--r-- 1 root root 4096 Mar 1 10:11 stat
[/code]

So there is no sync_action.


Re: software RAID-1 data recovery

Post by TrevorH » 2012/03/01 19:01:16

Oh, that must be a feature enabled only in the CentOS 5 (and later) kernels.

I'm not sure where to go from here. Perhaps it would help if you told us what data you are missing and how you think it happened?


Re: software RAID-1 data recovery

Post by markmh » 2012/03/02 01:37:04

Thanks for the quick reply, Trevor!
I think what happened is that the disk /dev/sdb is broken and that's why the data on the RAID is not available. From the analysis that I posted earlier, it looks like /dev/sda still has all the data.

I would like to access it somehow. To do this, I tried to mount /dev/sda on some random directory after I had unmounted it from /compute, but it says the disk is busy. So I guess I have to somehow stop the RAID, which is still running. Then I would like to access the data from there.


Re: software RAID-1 data recovery

Post by TrevorH » 2012/03/02 09:15:34

The trouble is that I think you are already accessing the data on /dev/sda by using it via /dev/md0.

However, you can stop the array by using

[code]
mdadm --stop /dev/md0
[/code]

That should allow you to access /dev/sda directly, but if you do this, I would be tempted to mount it read-only if you can, i.e. `mount -t ext3 -o ro /dev/sda /mnt/whatever`.
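
Put together, the stop-and-mount steps might look like the sketch below. It is only an illustration: the helper names `run` and `recover_ro` are made up, it assumes the array has already been unmounted, and by default it just prints each command so you can review it. Set `RUN=1` (as root) to actually execute them.

```shell
# run CMD...: print the command, or execute it when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

# recover_ro [ARRAY] [MEMBER] [MOUNTPOINT]: stop the array, then mount one
# member read-only. Assumes ARRAY is not mounted anywhere at this point.
recover_ro() {
    array="${1:-/dev/md0}"
    member="${2:-/dev/sda}"
    mnt="${3:-/mnt/whatever}"
    run mdadm --stop "$array" || return 1     # release the member disks
    run mkdir -p "$mnt" || return 1           # make sure the mount point exists
    run mount -t ext3 -o ro "$member" "$mnt"  # read-only, as suggested above
}
```

Calling `recover_ro` with no arguments prints the three commands for /dev/md0, /dev/sda and /mnt/whatever without touching anything.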
