software RAID-1 data recovery

TrevorH
Forum Moderator
Posts: 21209
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/02 09:15:34

I think the trouble is that you are already accessing the data on /dev/sda through /dev/md0.

However, you can stop the array with:

mdadm --stop /dev/md0


That should allow you to access /dev/sda directly. If you do this, I would be tempted to mount it read-only if you can, i.e. `mount -t ext3 -o ro /dev/sda /mnt/whatever`.
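
Before stopping the array, it may help to confirm which member devices the kernel currently lists for it. The following is only a rough sketch: `md_members` is a made-up helper name, and the optional file argument is an assumption added so it can be tried against a saved copy of /proc/mdstat.

```shell
# List each md array and the member devices the kernel reports for it.
# $1 = mdstat-format file (optional, defaults to /proc/mdstat).
# Members are printed as they appear, so a failed one keeps its (F) marker.
md_members() {
    awk '/^md[0-9]+ :/ {
        printf "%s:", $1
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) printf " %s", $i
        printf "\n"
    }' "${1:-/proc/mdstat}"
}

# Only query the live system if the file is actually there.
if [ -r /proc/mdstat ]; then
    md_members
fi
```

A line such as `md0: sdb[1](F) sda[0]` would mean that /dev/sda is still an active member of /dev/md0 and /dev/sdb has been marked faulty.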

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/03 01:56:41

Unfortunately, stopping the RAID didn't work. After running your command I got:

[root@lcpp-cluster /]# /sbin/mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy


However, I was able to deactivate the faulty device /dev/sdb with:

[root@lcpp-cluster mario]# /sbin/mdadm --manage --set-faulty /dev/md0 /dev/sdb
mdadm: set /dev/sdb faulty in /dev/md0


Stopping the array still didn't work, though, and gave the same message. How can I do it? Any ideas?

TrevorH
Forum Moderator
Posts: 21209
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/03 02:57:43

You'd have to unmount the file system first and, before that, make sure that no one is using it.

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/03 19:38:39

Thanks again, Trevor!

Could you be more specific about the commands:

1) How to unmount the file system?

2) How to make sure that nobody is using it?

TrevorH
Forum Moderator
Posts: 21209
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/04 13:48:21

The command to unmount a filesystem is `umount /where-ever`. To check if it's in use you need to use something like `lsof | grep -i /where-ever` or `fuser -m /where-ever`. In all cases, /where-ever is the place where the filesystem is mounted (is it /compute?).
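
As a rough sketch of checking this before trying `mdadm --stop`, the mount table can be searched for the array device. `mount_points_of` is a made-up helper name, and the optional second argument is an assumption added so the function can be tried on a saved copy of /proc/mounts.

```shell
# Print the mount point(s) of a device, one per line; prints nothing if
# the device does not appear in the mount table.
# $1 = device (e.g. /dev/md0)
# $2 = mounts-format file (optional, defaults to /proc/mounts)
mount_points_of() {
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# /dev/md0 is the array discussed in this thread.
if [ -r /proc/mounts ] && [ -n "$(mount_points_of /dev/md0)" ]; then
    echo "/dev/md0 is still mounted; unmount it before stopping the array"
fi
```

One caveat: a file system that was detached with a lazy unmount no longer shows up in the mount table even while processes are still using it, so an empty result here does not prove the device is free.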

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/04 14:57:41

Thanks, but I have already unmounted /compute on the head node, where the discs /dev/sda and /dev/sdb physically are. (See post #20: we unmounted /compute using `umount -l`.)

Now I want to mount /dev/sda in order to access the data, but:
1) the RAID is still running although I have tried to stop it
2) the device is busy if I try to mount it

Maybe it is because NFS is still running? Do I have to log in on each node and stop NFS there?

TrevorH
Forum Moderator
Posts: 21209
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/04 17:07:17

`umount -l` does a lazy unmount: it detaches the file system from the directory tree straight away but only releases it once nothing is using it any more, so the chances are that it is still there and in use. Yes, you need to stop all NFS clients from using the share and then stop the NFS server on this machine before you can umount the shared directory.

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/04 18:41:52

I have found the command

/sbin/service nfs stop


But since several devices are shared over NFS, I was wondering whether I can stop the export of just one of them?

Also, are the commands for stopping the NFS server and client identical? Do I have to run the same command on the clients first and then on the server?

I have also tried the two commands suggested for /compute ("To check if it's in use you need to use something like `lsof | grep -i /where-ever` or `fuser -m /where-ever`").


The first one gives no output but the second is printing:

/compute/: 1r 1c 1e 2r 2c 3r 3c 4r 4c 5r 5c 6r 6c 7r 7c 8r 8c 9r 9c 10r 10c 11r 11c 12r 12c 13r 13c 14r 14c 15r 15c 62r 62c 63r 63c 64r 64c 65r 65c 66r 66c 95r 95c 96r 96c 97r 97c 98r 98c 99r 99c 243r 243c 366r 366c 367r 367c 368r 368c 369r 369c 375r 375c 376r 376c 377r 377c 378r 378c 379r 379c 380r 380c 398r 398c 419r 419c 1741r 1741c 1741e 2619r 2619c 2744r 2744c 2772r 2772c 2773r 2773c 3296 3296r 3296c 3296e 3473r 3473c 3473e 3474r 3474c 3474e 3476r 3476c 3476e 3619r 3619c 3619e 3620r 3620c 3620e 3622r 3622c 3622e 3832r 3832c 3832e 3845 3845r 3845c 3845e 3849r 3849c 3849e 3860r 3860c 3860e 3870r 3870c 3870e 3890 3890r 3890c 3890e 3914r 3914c 3914e 3915r 3915c 3915e 3917r 3917c 3917e 3923r 3923c 3923e 3939r 3939c 3939e 4024 4024r 4024c 4024e 4067r 4067c 4067e 4068r 4068c 4068e 4070r 4070c 4070e 4198r 4198c 4198e 4209r 4209c 4209e 4210r 4210c 4210e 4212r 4212c 4212e 4239r 4239c 4239e 4263r 4263c 4263e 4264r 4264c 4264e 4265r 4265c 4265e 4275 4275r 4275c 4275e 4311r 4311c 4311e 4343r 4343c 4343e 4358r 4358c 4358e 4542 4542r 4542c 4542e 4577 4577r 4577c 4577e 4614r 4614c 4614e 4674 4674r 4674c 4674e 4683 4683r 4683c 4683e 4699 4699r 4699c 4699e 4709 4709r 4709c 4709e 4740r 4740c 4740e 4752r 4752c 4752e 4753r 4753c 4753e 4755r 4755c 4755e 4759 4759r 4759c 4759e 4775r 4775c 4775e 4785r 4785c 4785e 4796 4796r 4796c 4796e 4828r 4828c 4828e 4884r 4884c 4884e 4885r 4885c 4885e 4887r 4887c 4887e 4969r 4969c 4969e 5271r 5271c 5271e 5346r 5346c 5346e 5351r 5351c 5351e 5360r 5360c 5360e 5366 5366r 5366c 5366e 5371r 5371c 5371e 5372r 5372c 5372e 5373r 5373c 5373e 5376r 5376c 5376e 5377r 5377c 5377e 5378r 5378c 5378e 5935r 5935e 5936r 5936e 5941r 5941c 5941e 5952 5952r 5952c 5952e 5977r 5977c 5977e 6102 6102r 6102c 6102e 6113r 6113c 6113e 6114 6114r 6114c 6114e 6160r 6160c 6160e 6177r 6177c 6177e 6178r 6178c 6178e 6182 6182r 6182c 6182e 6185r 6185c 6185e 6187r 6187c 6187e 6189r 6189c 6189e 6195 6195r 6195c 6195e 6227r 6227c 6227e 6231r 6231c 6231e 6233r 6233c 6233e 6235r 6235c 
6235e 6241r 6241c 6241e 6243r 6243c 6243e 6245 6245r 6245c 6245e 6248r 6248c 6248e 6251r 6251c 6251e 6260r 6260c 6260e 6263r 6263c 6263e 6272r 6272c 6272e 6274r 6274c 6274e 6276r 6276c 6276e 7809 7809r 7809c 7809e 7810 7810r 7810c 7810e 7811 7811r 7811c 7811e 7812 7812r 7812c 7812e 7813 7813r 7813c 7813e 7814 7814r 7814c 7814e 7815 7815r 7815c 7815e 7816 7816r 7816c 7816e 7825 7825r 7825c 7825e 9427r 9427c 9434r 9434c 9599r 9599c 9599e 9603r 9603c 9604r 9604c 9605r 9605c 9606r 9606c 9607r 9607c 9608r 9608c 9609r 9609c 9610r 9610c 9611r 9611c 9612r 9612c 9613r 9613c 9614r 9614c 9615r 9615c 9616r 9616c 9617r 9617c 9618r 9618c 9619r 9619c 9620r 9620c 9621r 9621c 9622r 9622c 9623r 9623c 9624r 9624c 9625r 9625c 9626r 9626c 9627r 9627c 9628r 9628c 9629r 9629c 9630r 9630c 9631r 9631c 9632r 9632c 9633r 9633c 9634r 9634c 9636r 9636c 9637r 9637c 9640r 9640c 9640e 10929r 10929c 12197r 12197c 12197e 12223 12223r 12223c 12223e 12530r 12530c 12530e 12532r 12532c 12532e 12762r 12762e 16994 16994r 16994c 16994e 18592r 18592c 25302 25302r 25302c 25302e 25313 25313r 25313c 25313e 25318 25318r 25318c 25318e 25348r 25348c 25348e 25350r 25350c 25350e 25351 25351r 25351c 25351e 25352r 25352c 25352e 26809 26809r 26809e


What does it mean? Is this drive still in use?

TrevorH
Forum Moderator
Posts: 21209
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/04 20:50:42

Those numbers in the fuser output are the process IDs of the (many!) processes on that system that are using that mount point. You'll need to stop all of those before you can umount that file system.
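
For reference, the letter after each PID in fuser's output is the access type: `c` means the process has its current directory on that file system, `e` that it is running an executable from it, and `r` that its root directory is there. As a rough sketch (the `fuser_pids` helper name is made up for illustration), pasted output like this can be reduced to a list of unique PIDs so that each one can be looked up before anything is stopped:

```shell
# Reduce pasted fuser output to a sorted list of unique PIDs.
# Reads the text on stdin; tokens that are not "digits plus optional
# access letters" (such as the leading "/compute/:") are ignored.
fuser_pids() {
    tr -s ' \t' '\n\n' |
        sed -n 's/^\([0-9][0-9]*\)[a-zA-Z]*$/\1/p' |
        sort -n -u
}

# Example with a shortened sample of the output posted above.
echo "/compute/: 1r 1c 1e 2r 2c 3296 3296r 3296c 3296e" | fuser_pids
```

PID 1 is init, so a list that starts at 1 is a sign to inspect the processes (for example with `ps -p <pid>`) rather than stop them all in one go.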

To stop only a single NFS export, I guess you can edit /etc/exports, comment out any lines in there that refer to the mount, save it and then run `exportfs -ra` to have it take effect. However, with all those processes using it, I think it's unlikely to work. Perhaps you can reboot the system and come up in single user mode to do what you're after?
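
The /etc/exports edit could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure for this cluster: `disable_export` is a made-up helper name, the optional file argument exists only so it can be tried on a copy first, and it assumes each export line starts with the plain directory path.

```shell
# Comment out every export line for one directory in an exports-format file.
# $1 = exported directory (e.g. /compute)
# $2 = file to edit (optional, defaults to /etc/exports)
# The original file is kept next to it with a .bak suffix.
disable_export() {
    dir=$1
    file=${2:-/etc/exports}
    cp "$file" "$file.bak" || return 1
    awk -v dir="$dir" '$1 == dir { print "#" $0; next } { print }' \
        "$file.bak" > "$file"
}

# Intended use, as root on the NFS server:
#   disable_export /compute
#   exportfs -ra
```

Trying it on a copy of the file first (`disable_export /compute /tmp/exports.copy`, where the path is just an example) shows exactly which lines would be commented out.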

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/04 21:26:09

Ok, I have tried to kill those processes using

fuser -km /compute

The connection closed and now I'm not able to log in again. Is it possible that I killed some important processes? I'm a little bit worried now :-(