software RAID-1 data recovery

TrevorH
Forum Moderator
Posts: 21174
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/02/27 21:43:30

Can you also post the output from `cat /proc/mdstat` and `mdadm --detail /dev/md0` on this same machine?

The good news is that, according to that mdadm output, your two disks are in a RAID 1 mirrored configuration, so /dev/sda ought to be an exact copy of /dev/sdb. Unless the RAID previously failed, data was written to /dev/sdb, and then the two disks miraculously switched health (the first one died while the other revived), I find it difficult to work out how you could have lost anything.

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Postby pschaff » 2012/02/28 23:40:55

The data could have been lost if the RAID was synced with the disk that was missing the data as the "master". Hopefully that is not the case. The best I can suggest is to try mounting each disk, /dev/sda and /dev/sdb, independently and see what is recoverable. The original setup was odd, as the disks were not partitioned properly.
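If both disks can be mounted separately, comparing the two trees would show whether one half of the mirror is missing files. A rough sketch, assuming the disks are already mounted read-only at two mount points of your choosing (the function name `compare_mirror_trees` and the paths in the usage line are placeholders, not anything from this setup):

```shell
# Hypothetical helper: report files that exist on only one side or differ.
# $1 and $2 are the two mount points (placeholders, chosen by you).
compare_mirror_trees() {
    # -r recurses into subdirectories, -q prints only which files differ.
    # diff exits non-zero when the trees are not identical.
    diff -rq "$1" "$2"
}
```

It would be called as `compare_mirror_trees /mnt/sda /mnt/sdb`. Expect this to be slow on disks of this size, since every file is read from both sides.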

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/29 03:34:34

Trevor, here are the results of the commands:

1) cat /proc/mdstat =>

Code: Select all

Personalities : [raid1]
md0 : active raid1 sda[0] sdb[1]
      1465138496 blocks [2/2] [UU]
     
unused devices: <none>


2) /sbin/mdadm --detail /dev/md0 =>

Code: Select all

/dev/md0:
        Version : 00.90.01
  Creation Time : Tue Jan 13 12:03:05 2009
     Raid Level : raid1
     Array Size : 1465138496 (1397.26 GiB 1500.30 GB)
    Device Size : 1465138496 (1397.26 GiB 1500.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jan 15 16:49:07 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
           UUID : 89cd271b:f66ca0cc:0d4ef2da:9877d7ef
         Events : 0.27518273




Phil,
How specifically should we try to mount them? Or is there something we need to do to prepare? When we try to mount them right now, it says it can't be done because they are busy. We think this is because the RAID is still running.

Thanks.

TrevorH
Forum Moderator
Posts: 21174
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/02/29 11:42:36

Everything you've posted so far seems to show that your RAID array is synced and performing normally except for the output from

Code: Select all

/sbin/mdadm --examine /dev/sdb


You could try running

Code: Select all

echo "check" > /sys/block/md0/md/sync_action


This will check that all blocks on the first device are identical to those on the second. Results are logged to syslog. The check will take a while to run on 1.5TB; allow more than 5 hours, I'd guess.
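A minimal sketch of how the check could be started and its result read back, assuming the kernel exposes the md sysfs directory. The `MD_SYSFS` variable and the function names are made up for illustration; `sync_action` and `mismatch_cnt` are the md driver's own sysfs files.

```shell
# Hypothetical helper names; MD_SYSFS defaults to the array from this thread.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

start_check() {
    # Ask the md driver to compare the two mirror halves (needs root).
    echo check > "$MD_SYSFS/sync_action"
}

check_state() {
    # Reads back the current action: "check" while running, "idle" when done.
    cat "$MD_SYSFS/sync_action"
}

mismatch_count() {
    # Count of sectors found to differ; 0 means the halves agree.
    cat "$MD_SYSFS/mismatch_cnt"
}
```

Run `start_check` as root, then poll `check_state` until it prints `idle` and read `mismatch_count`. Progress should also be visible in `cat /proc/mdstat` while the check runs.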

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/01 01:02:56

This should be run from the head node, the same machine that /compute is connected to, right? And do I need to be root when I do this?
Thanks

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Postby pschaff » 2012/03/01 01:03:37

Yes, and yes.

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/01 17:16:40

The command gives:

Code: Select all

[root@lcpp-cluster ~]# echo "check" > /sys/block/md0/md/sync_action
bash: /sys/block/md0/md/sync_action: No such file or directory

And if I go to the /sys/block/md0/ directory and do ls -l I find this:

-r--r--r--  1 root root 4096 Mar  1 10:11 dev
-r--r--r--  1 root root 4096 Mar  1 10:11 range
-r--r--r--  1 root root 4096 Mar  1 10:11 removable
-r--r--r--  1 root root 4096 Mar  1 10:11 size
-r--r--r--  1 root root 4096 Mar  1 10:11 stat


So there is no sync_action.

TrevorH
Forum Moderator
Posts: 21174
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/01 19:01:16

Oh, that must be a feature enabled only in the CentOS 5 (and later) kernels.
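Before writing to the file, a script could test whether the running kernel exposes it at all. A small sketch (the function name is made up; the default path is the one from this thread):

```shell
# Hypothetical helper; the optional argument overrides the md sysfs directory.
md_supports_check() {
    md_dir="${1:-/sys/block/md0/md}"
    if [ -e "$md_dir/sync_action" ]; then
        echo "sync_action present: a check can be requested"
    else
        # Report the kernel version to help explain why the file is absent.
        echo "no sync_action under $md_dir (kernel $(uname -r))"
        return 1
    fi
}
```

Calling `md_supports_check` with no argument looks at /dev/md0; it returns non-zero when the file is absent, so it can guard the `echo "check"` step.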

I'm not sure where to go from here. Perhaps it would help if you told us what data you are missing and how you think it happened?

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/02 01:37:04

Thanks for the quick reply, Trevor!
I think what happened is that the disk /dev/sdb is broken and that's why the data on the RAID is not available. From the analysis that I posted earlier, it looks like /dev/sda still has all the data.

I would like to access it somehow. To do this, I tried to mount /dev/sda on some random directory after unmounting it from /compute, but it says the disk is busy. So I guess I have to somehow stop the RAID, which is still running, and then I would like to access the data from there.

TrevorH
Forum Moderator
Posts: 21174
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/02 09:15:34

The trouble is that I think you are already accessing the data on /dev/sda by using it via /dev/md0.

However you can stop the array by using

Code: Select all

mdadm --stop /dev/md0


That should allow you to access /dev/sda directly, but if you do this, I would be tempted to mount it read-only if you can, i.e. `mount -t ext3 -o ro /dev/sda /mnt/whatever`.
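Put together, the sequence might look like the sketch below. It only prints the commands unless `DRY_RUN=0` is set, so it can be reviewed before anything is touched. The `run` wrapper and the function name are made up for illustration, /compute is the mount point mentioned earlier in this thread, and ext3 is an assumption about the filesystem.

```shell
# Sketch: prints each command by default; set DRY_RUN=0 to execute as root.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mount_member_readonly() {
    dev="$1"    # member disk, e.g. /dev/sda
    mnt="$2"    # mount point, e.g. /mnt/whatever (placeholder)
    # The array must not be mounted before it can be stopped.
    run umount /compute || return 1
    # Stop the array so the member disk is no longer busy.
    run mdadm --stop /dev/md0 || return 1
    run mkdir -p "$mnt" || return 1
    # Mount the member read-only; ext3 is assumed here.
    run mount -t ext3 -o ro "$dev" "$mnt"
}
```

For example, `mount_member_readonly /dev/sda /mnt/whatever` prints the four commands in order. Mounting the bare disk should work here because the version 0.90 superblock sits at the end of the device, so the filesystem starts at the beginning. Note that ext3 may still replay its journal on a plain `ro` mount; `-o ro,noload` is the usual way to avoid that if no writes at all are wanted.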