software RAID-1 data recovery

TrevorH
Forum Moderator
Posts: 21162
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/05 01:17:15

With all those processes using the mount? Yes, I'm almost sure you killed something that should normally be running, but you're probably only at the equivalent of runlevel 1 now anyway, so I wouldn't worry too much about it!

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/05 21:51:41

TrevorH wrote: "you're probably only at the equivalent of runlevel 1 now anyway so I'd not worry too much about it!"

I executed the command as root. When I then went to the cluster, the head node didn't respond to the connected keyboard or monitor, so I rebooted it.

Now it shows the following:

PXE-NFS: Exiting Intel Boot Agent
Operating System not found


Should I be worried now? Is it possible that some hardware was destroyed?

TrevorH
Forum Moderator
Posts: 21162
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/05 23:46:34

Should you be worried? Yes, in my opinion that looks decidedly fatal, but I cannot see how killing a bunch of processes could cause a physical disk failure, which is what that looks like. It's also possible that your BIOS settings have become corrupted and it's trying to boot from the network instead of the hard disk. I also have to ask: are you sure that's the head node, and not one of the clients attempting to PXE boot from the head node, on which you've killed everything that might have allowed a diskless boot?

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/06 00:01:04

Unfortunately, I cannot access the head node over the network anymore, so I have to go to the cluster tomorrow and connect the monitor again. But today I definitely connected the monitor and keyboard to the head node. There was no response, not even the Num Lock light, so I decided to reboot it. Rebooting takes some time, longer than on a normal computer, so what you say might be possible.

But how can I check whether one of the clients is trying to boot from the head node? Sorry, I'm seriously afraid of doing something wrong now. What would you recommend I do?

In post #13 you can see the disk configuration of the head node. /dev/sdc1 is a partition on /dev/sdc, which is where the computer boots from. I was thinking of inserting a bootable disk in the DVD drive tomorrow. Do you agree?

TrevorH
Forum Moderator
Posts: 21162
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/03/06 00:17:25

Booting the system from the DVD in rescue mode can't do any harm and will allow you to look around and see what you can find.
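
As a rough illustration of what "looking around" could involve, here is a minimal sketch of read-only checks from a rescue shell. It assumes the rescue media ships `fdisk` and `mdadm`, which not every live DVD does. The `run` and `inspect` helpers and the `DRY_RUN` variable are made up for this example; by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Read-only look around from a rescue shell; none of these commands write to the disks.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 and run as root to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the inspection steps, in order.
inspect() {
    run cat /proc/partitions      # block devices the kernel currently sees
    run fdisk -l                  # partition tables of all detected disks
    run cat /proc/mdstat          # state of any assembled software RAID arrays
    run mdadm --examine --scan    # RAID superblocks found on member devices
}

inspect
```

Device names in a rescue environment may differ from those on the installed system, so it is worth checking sizes and partition layouts before assuming which disk is which.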

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/03/06 22:28:41

Today, I went to the cluster head node and I still couldn't boot it. I got these error messages:

"BMC system error log (SEL) Full"

"PXE-NFS: Exiting Intel Boot Agent
Operating System not found"

Then I inserted a DVD with Puppy Linux. I was able to boot, and the good news is that I could mount /dev/sda and rescue all the data from /compute! That was the goal of this thread. But I cannot recover the head node: the hard disk /dev/sdc, which holds its /home directory (for details see post #13 in this thread), seems to be gone.

I tried "hdparm -I /dev/sdb", a disk I know is broken, and got:
"HDIO_DRIVE_CMD(identify) failed: Input/output error"

But the same command for /dev/sdc gave:
"HDIO_DRIVE_CMD(identify) failed: Invalid exchange"

To me this suggests that /dev/sdc is not broken yet.

1) But how can I access it?

2) How can I rule out that it is one of the client nodes trying to boot from the head node, rather than the head node itself?
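
On question 1, a different error string does not by itself show that the disk is healthy ("Invalid exchange" is simply the system's text for the EBADE error code). One low-risk first step may be to check whether the rescue kernel has registered the disk at all. The following is a minimal sketch, assuming the names sdb and sdc from the posts above still apply under the live DVD, which may not be the case. `disk_visible` is a hypothetical helper written for this example, not an existing tool.

```shell
#!/bin/sh
# disk_visible is a hypothetical helper, not an existing tool.
# It only reads sysfs and never touches the disk itself.
# $1: device name without /dev/ (for example sdc)
# $2: sysfs block directory, overridable for testing (default /sys/block)
disk_visible() {
    if [ -e "${2:-/sys/block}/$1" ]; then
        echo "$1: registered by the kernel"
    else
        echo "$1: not registered by the kernel"
    fi
}

# Device names are assumptions taken from the thread.
for dev in sdb sdc; do
    disk_visible "$dev"
done
```

If the disk is registered, `dmesg | grep -i sdc` shows what the kernel logged while probing it, and mounting a partition with `mount -o ro` keeps the attempt read-only.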