Can't find root filesystem after conversion to software RAID

pdavis
Posts: 2
Joined: 2011/02/23 23:09:41

Post by pdavis » 2011/02/24 00:05:16

A CentOS 5.5 system's hard drive was starting to get I/O errors, so I
converted it to use software RAID 1. Briefly, I did:

- installed one of the two new drives, the system only has 2 SATA ports
- partitioned the first new drive
- run: mdadm --create /dev/md0 --raid-devices=2 --level=1 /dev/sdb2 missing # for /boot
- run: mdadm --create /dev/md1 --raid-devices=2 --level=1 /dev/sdb3 missing # for root
- run: mdadm --create /dev/md2 --raid-devices=2 --level=1 /dev/sdb6 missing # for /home
- ran mke2fs -j on each of /dev/md0, /dev/md1, /dev/md2
- mounted each in turn and used cpio to copy relevant content from the old drive
- shut down, removed old drive, installed 2nd new drive
- booted from a standalone CD (System Rescue CD 1.6.3)
- partitioned the second new drive
- run: mdadm --add /dev/md0 /dev/sda2
- run: mdadm --add /dev/md1 /dev/sda3
- run: mdadm --add /dev/md2 /dev/sda6
- checked /proc/mdstat - looked ok
- ran grub and did a 'setup' on both hard drives
- mounted the new root and boot on /mnt/custom and /mnt/custom/boot
- did: chroot /mnt/custom /bin/bash to chroot into the system
- edited /etc/fstab to change devices for boot, / and home to /dev/md0,/dev/md1,/dev/md2
- created a new initial ramdisk:
cd /boot ; rm initrd-2.6.18-194.17.4.el5.img ; mkinitrd initrd-2.6.18-194.17.4.el5.img 2.6.18-194.17.4.el5
- edited grub.conf to change root=/dev/md1
- rebooted

GRUB finds the kernel, and the kernel sees both hard drives. I can see the dm
modules being loaded, then there is a short pause while the system waits for the
SCSI subsystem to stabilize, and then the kernel panics because it can't find /dev/root.

After lots of fiddling around, I ran out of ideas.

To get the system booting again, from the System Rescue CD, I removed the software
raid for the root filesystem:
- mdadm --zero-superblock /dev/sda3
- changed /etc/fstab and /etc/grub.conf to put root back to /dev/sda3
- rebuilt initial ramdisk

And the system boots and runs fine. The software raid /boot and /home are
mounted fine. /proc/mdstat shows:

Personalities : [raid1]
md1 : active raid1 sdb3[0]
97667072 blocks [2/1] [U_]

md2 : active raid1 sdb6[0] sda6[1]
877960128 blocks [2/2] [UU]

md0 : active raid1 sdb2[0] sda2[1]
104320 blocks [2/2] [UU]

unused devices: <none>

The remains of the root RAID 1 array, /dev/md1, are still there, and it mounts correctly when I mount it by hand.

df shows:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda3             96132876   7733996  83515528   9% /
/dev/md2             864182880 114422024 705862852  14% /home
/dev/md0                101018     31285     64517  33% /boot
tmpfs                   257408         0    257408   0% /dev/shm

Any ideas on what I missed with getting the root filesystem set up under raid 1?
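
In the /proc/mdstat output above, the `[2/1] [U_]` line under md1 is what marks that array as degraded: two members are expected, one is present. As a minimal sketch, the same check can be scripted. The function name `degraded_arrays` is made up for illustration, and the parsing assumes the two-line-per-array layout shown above.

```shell
#!/bin/sh
# List md arrays whose member-status field (e.g. [U_]) contains "_",
# which means at least one member is missing.
# Usage: degraded_arrays [FILE]   (FILE defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember which array is being described
        /\[[U_]+\]/ && name != "" {      # the status line that follows it
            if ($NF ~ /_/) print name    # "_" marks a missing member
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling `degraded_arrays` with no argument reads the live /proc/mdstat; passing a file name lets it run against saved output.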

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Post by pschaff » 2011/02/24 00:16:46

Welcome to the CentOS fora. Reading [url=https://www.centos.org/modules/newbb/viewforum.php?forum=47]FAQ & Readme First[/url] is recommended for new users.

Nothing I can see to explain the issue, but have you read [url=http://wiki.centos.org/HowTos/CentOS5ConvertToRAID]Convert a CentOS System to RAID 1[/url]? Perhaps that might generate some clues.

jlehtone
Posts: 4530
Joined: 2007/12/11 08:17:33
Location: Finland

Post by jlehtone » 2011/02/24 07:16:24

[quote]pdavis wrote:
- mounted the new root and boot on /mnt/custom and /mnt/custom/boot
- did: chroot /mnt/custom /bin/bash to chroot into the system
- edited /etc/fstab to change devices for boot, / and home to /dev/md0,/dev/md1,/dev/md2
- created a new initial ramdisk:
cd /boot ; rm initrd-2.6.18-194.17.4.el5.img ; mkinitrd initrd-2.6.18-194.17.4.el5.img 2.6.18-194.17.4.el5
...
SCSI system to stabilize and then panics because it can't find /dev/root.
...
Any ideas on what I missed with getting the root filesystem set up under raid 1?[/quote]
Looks like your procedure might not have been exactly as in section 4.4.4 of the HowTo
linked by [b]pschaff[/b]. There is also a [url=http://wiki.centos.org/TipsAndTricks/CreateNewInitrd]Tip&Trick[/url] about it.

It is quite easy to end up with wrong commands in the 'init' file of the initrd image. It is also
typical that the first (and most descriptive) error message has already scrolled off
the screen by the time "can't find /dev/root" is shown.


Another (less important) detail is that you apparently did the cpio sync from a running system.
A running system has files (particularly in /var/run) that are not present once it is shut
down. They are session-specific, so to speak. Copying them is an unnecessary risk.

But a cpio sync while the system is still running, followed by an rsync refresh from rescue mode,
is probably a good approach.

gerald_clark
Posts: 10642
Joined: 2005/08/05 15:19:54
Location: Northern Illinois, USA

Post by gerald_clark » 2011/02/24 14:28:20

You also need to label the new filesystems.
This can be a problem when those labels are already in use.
You can either pull the original drives and boot the DVD in rescue mode to do the labeling,
or use different labels and edit fstab and the grub entries.
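
To see whether labels matter on a given system, it can help to list which fstab entries mount by label in the first place. The sketch below assumes the usual `LABEL=name` syntax in the first fstab field; the function name `fstab_labels` is hypothetical.

```shell
#!/bin/sh
# Print "label mountpoint" for every fstab entry that mounts by label.
# Usage: fstab_labels [FILE]   (FILE defaults to /etc/fstab)
fstab_labels() {
    # Commented-out entries start with "#", so they never match ^LABEL=.
    awk '$1 ~ /^LABEL=/ { sub(/^LABEL=/, "", $1); print $1, $2 }' "${1:-/etc/fstab}"
}
```

Each label it prints can then be compared by hand with what `e2label /dev/md1` (and so on for the other arrays) reports for the new filesystems.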

pdavis
Posts: 2
Joined: 2011/02/23 23:09:41

Post by pdavis » 2011/02/24 17:56:53

I did read the guide to migrating to software RAID; perhaps there was something subtle I did differently.

I even tried adding the raid1 kernel module (using --preload) when I built the initial ramdisk and I could see the raid1 module being loaded while the init script ran, but it didn't make any difference.

An rsync with --delete running from the CD would have been a good idea, but I don't think it would have helped with the problem.

Under software raid, the kernel can't find /dev/md1 or it is silently refusing to mount it. It's not getting far enough that any content on the root partition could cause it problems.

The fact that a trivial conversion back to non-software RAID allowed the system to boot without complaint indicates that the contents of the root filesystem are ok. The fact that I can now mount the degraded /dev/md1 that only has the one drive in it and it looks ok seems to indicate that the RAID was set up correctly for the root partition.

Here is some additional information about the system that shouldn't have made any difference:
- The system started life as Centos 5.2 and had been upgraded incrementally to 5.5.
- The system had another PATA drive in it that CentOS called /dev/hda, but when booted under the System Rescue CD, it was called /dev/sda moving the other drive names up a letter. The partitions were created and grub booting set up while this was true, but I removed it before adding the 2nd new drive to the RAID set. The fact that it boots and runs from the same partition outside of RAID seems to indicate that any differences didn't matter.

OK, sometimes these thought processes help track down what is going on.

I examined the init script in the initial ramdisk and compared it to the one from another software RAID installation: the 'raidautorun' command is missing, which means the setup of the RAID got skipped. The mkinitrd script puts this command into the init script if it detects any software RAID in the currently running system. I bet that when I ran mkinitrd in the chroot environment from the rescue CD, /proc wasn't mounted and the script didn't notice that there was software RAID. :-o

I just reran mkinitrd under the live system and it still didn't put 'raidautorun' in the init script, so something else is going on too, but I'm sure the missing 'raidautorun' is the answer.
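
Whether or not the unmounted /proc was the actual cause here, it is cheap to rule out before running mkinitrd in a chroot. The following is only a sketch: the function name `chroot_ready` is invented, and the two paths it checks (`proc/mdstat` and `sys/block`) are an assumption about what a RAID-aware script would want to see, not something taken from the mkinitrd source.

```shell
#!/bin/sh
# Succeed only if the tree at ROOT looks like it has /proc and /sys mounted.
# Typical preparation from a rescue CD, using the mount point from this thread:
#   mount -t proc  proc  /mnt/custom/proc
#   mount -t sysfs sysfs /mnt/custom/sys
#   mount --bind   /dev  /mnt/custom/dev
chroot_ready() (
    root="${1:?usage: chroot_ready ROOT}"
    # Assumed indicators: kernel md state and the block device tree are visible.
    [ -r "$root/proc/mdstat" ] && [ -d "$root/sys/block" ]
)
```

For example, `chroot_ready /mnt/custom && chroot /mnt/custom /bin/bash` would refuse to enter the chroot until the mounts are in place.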

For those wondering how to look at the contents of the init script, do:
cd /tmp ; mkdir img ; cd img ; zcat /boot/initrd-2.6.18-53.1.21.el5.img | cpio -ivc
and this will extract the contents of the initial ramdisk
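
The extract-and-inspect step above can be wrapped into two small helpers. This is a sketch, not a tested recipe for every release: it assumes the initrd is a gzip-compressed cpio archive (as the zcat/cpio command above implies) and that a RAID-aware init script contains a line beginning with `raidautorun`. Both function names are hypothetical.

```shell
#!/bin/sh
# Unpack a gzip-compressed cpio initrd into DEST.
# IMG must be an absolute path, because the function changes directory.
extract_initrd() (
    img="${1:?usage: extract_initrd IMG DEST}"
    dest="${2:?usage: extract_initrd IMG DEST}"
    mkdir -p "$dest" && cd "$dest" && zcat "$img" | cpio -id
)

# Succeed if the given init script has a raidautorun line.
has_raidautorun() {
    grep -q '^[[:space:]]*raidautorun' "${1:?usage: has_raidautorun INIT_SCRIPT}"
}
```

After `extract_initrd /boot/initrd-2.6.18-194.17.4.el5.img /tmp/img`, running `has_raidautorun /tmp/img/init || echo "no raidautorun in init"` gives a quick yes/no answer without reading the script by eye.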

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Post by pschaff » 2011/02/24 18:14:59

[quote]
pdavis wrote:
...
Here is some additional information about the system that shouldn't have made any difference:
- The system started life as Centos 5.2 and had been upgraded incrementally to 5.5.
- The system had another PATA drive in it that CentOS called /dev/hda, but when booted under the System Rescue CD, it was called /dev/sda moving the other drive names up a letter. The partitions were created and grub booting set up while this was true, but I removed it before adding the 2nd new drive to the RAID set. [/quote]
I wonder if the operations performed while using the [url=http://www.sysresccd.org/Main_Page]"System Rescue CD"[/url] may be the source of the problem - seems to be based on Gentoo. It might be better to boot from installation media in rescue mode, or to use the CentOS LiveCD.

It might be useful to see the output file from [b]./getinfo.sh disk[/b] as explained in [url=http://www.centos.org/modules/newbb/viewtopic.php?topic_id=25128&forum=47]How to provide information about your system[/url].

TrevorH
Site Admin
Posts: 33215
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Post by TrevorH » 2011/02/24 20:14:19

It would appear that the mkinitrd man page is out of date compared with the options the script actually supports. Running mkinitrd --help lists a number of undocumented options that may help. One of them is --force-raid-probe.
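
As an untested sketch of how that option might be combined with the --preload attempt mentioned earlier in the thread, the helper below only prints the command so it can be reviewed before being run. The function name `mkinitrd_raid_cmd` is made up, and whether --force-raid-probe actually restores the 'raidautorun' line on this system has not been confirmed here.

```shell
#!/bin/sh
# Print (do not run) a mkinitrd command line for the given kernel version.
# -f overwrites an existing image, so the old one need not be removed first.
mkinitrd_raid_cmd() (
    kver="${1:?usage: mkinitrd_raid_cmd KERNEL_VERSION}"
    echo "mkinitrd -f --force-raid-probe --preload=raid1 /boot/initrd-$kver.img $kver"
)
```

For the kernel in this thread, `mkinitrd_raid_cmd 2.6.18-194.17.4.el5` prints the full command, which can then be pasted into a root shell once it looks right.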
