mdadm stopped working since motherboard was replaced

petan
Posts: 5
Joined: 2015/04/21 00:04:22

Post by petan » 2015/04/21 00:17:11

I replaced the motherboard with a Supermicro X10SLL-F with a Xeon E3 processor. Since then, CentOS has not been able to reassemble the RAID array on boot.

The hardware is fine: when I boot Debian on the same box, it assembles all md arrays on boot, even though they aren't listed anywhere in its configs. They can be mounted, and I can see the CentOS rootfs.

As I see it, the problem is somewhere in the initrd. This is the init script from the initrd; unfortunately I have no idea how nash works:

#!/bin/nash

mount -t proc /proc /proc
setquiet
echo Mounting proc filesystem
echo Mounting sysfs filesystem
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mkdir /dev/pts
mount -t devpts -o gid=5,mode=620 /dev/pts /dev/pts
mkdir /dev/shm
mkdir /dev/mapper
echo Creating initial device nodes
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mknod /dev/urandom c 1 9
mknod /dev/systty c 4 0
mknod /dev/tty c 5 0
mknod /dev/console c 5 1
mknod /dev/ptmx c 5 2
mknod /dev/mem c 1 1
mknod /dev/tty0 c 4 0
mknod /dev/tty1 c 4 1
mknod /dev/tty2 c 4 2
mknod /dev/tty3 c 4 3
mknod /dev/tty4 c 4 4
mknod /dev/tty5 c 4 5
mknod /dev/tty6 c 4 6
mknod /dev/tty7 c 4 7
mknod /dev/tty8 c 4 8
mknod /dev/tty9 c 4 9
mknod /dev/tty10 c 4 10
mknod /dev/tty11 c 4 11
mknod /dev/tty12 c 4 12
mknod /dev/ttyS0 c 4 64
mknod /dev/ttyS1 c 4 65
mknod /dev/ttyS2 c 4 66
mknod /dev/ttyS3 c 4 67
mknod /dev/fb0 c 29 0
/bin/fbi  -f /etc/lat1-16.psfu -q  /etc/background.png
echo Setting up hotplug.
hotplug
echo Creating block device nodes.
mkblkdevs
echo "Loading ehci-hcd.ko module"
insmod /lib/ehci-hcd.ko 
echo "Loading ohci-hcd.ko module"
insmod /lib/ohci-hcd.ko 
echo "Loading uhci-hcd.ko module"
insmod /lib/uhci-hcd.ko 
echo "Loading md-mod.ko module"
insmod /lib/md-mod.ko 
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko 
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko 
echo "Loading libata.ko module"
insmod /lib/libata.ko 
echo "Loading ata_generic.ko module"
insmod /lib/ata_generic.ko 
echo "Loading ata_piix.ko module"
insmod /lib/ata_piix.ko 
echo "Loading pata_acpi.ko module"
insmod /lib/pata_acpi.ko 
echo "Loading dm-mod.ko module"
insmod /lib/dm-mod.ko 
echo "Loading dm-log.ko module"
insmod /lib/dm-log.ko 
echo "Loading dm-region-hash.ko module"
insmod /lib/dm-region-hash.ko 
echo "Loading scsi_dh.ko module"
insmod /lib/scsi_dh.ko 
echo "Loading scsi_dh_alua.ko module"
insmod /lib/scsi_dh_alua.ko 
echo "Loading scsi_dh_emc.ko module"
insmod /lib/scsi_dh_emc.ko 
echo "Loading scsi_dh_hp_sw.ko module"
insmod /lib/scsi_dh_hp_sw.ko 
echo "Loading scsi_dh_rdac.ko module"
insmod /lib/scsi_dh_rdac.ko 
echo "Loading raid1.ko module"
insmod /lib/raid1.ko 
echo Waiting for driver initialization.
stabilized --hash --interval 1000 /proc/scsi/scsi
mkblkdevs
echo "Scanning and configuring dmraid supported devices"
mdadm -vvAs # --auto=yes
echo "mdadm finished"
sleep 10
echo "trying recover by hand"
mdadm -A /dev/md0 /dev/sda2 /dev/sdb2
sleep 10
echo "raid autorun is here"
raidautorun /dev/md0
resume /var/swap/swap.001
sleep 10
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/md0
sleep 10
echo Mounting root filesystem.
mount /sysroot
sleep 10
echo Setting up other filesystems.
setuproot
sleep 10
echo Switching to new root and running init.
sleep 10
switchroot

You can ignore the sleeps; they are there just for debugging, so that I can read the output before it disappears. Is there a way to get a terminal during boot, in nash, so that I can enter commands myself instead of having to change the init script, reboot, see the error, change the init script again, reboot again, and so on? It's very annoying. Some recovery mode that drops me to a terminal would be nice.

The error I see during boot is simply that mdadm -As doesn't find anything and no /dev/md0 is created. raidautorun fails as well, with /dev/md0 not found, and when I tried to run mdadm -A /dev/md0 /dev/sda2 /dev/sdb2 I got an error that /dev/sda2 doesn't exist. I don't even know which devices do exist, because when I ran ls /dev in nash, it didn't print anything. How do I display the contents of /dev in nash?
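
One way to look inside /dev without an ls binary is to let the shell expand a glob. This is only a minimal sketch, and it assumes a POSIX shell (for example a static busybox sh) has been copied into the initrd, since nash itself is not a general-purpose shell. The function name list_dir is made up for illustration.

```shell
# List directory entries using only shell globbing (no external ls needed).
# Assumption: this runs in a POSIX shell such as a static busybox sh,
# not in nash itself.
list_dir() {
    dir=${1:-/dev}
    for entry in "$dir"/*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # Strip the directory part and print just the entry name.
        echo "${entry##*/}"
    done
}
```

Calling list_dir /dev from that shell would then print one device node name per line; if no sd* names show up, the kernel has not created any disk nodes.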

The disks are connected using AHCI; both are SATA, and they are plugged into the motherboard in an order that should make them sda, sdb and sdc, where sda and sdb are the array members and sdc is the Debian recovery disk. When I boot into recovery, I see the disks in this order just fine. I have no idea how they look on CentOS, because I don't get far enough to be able to examine the system.

Even if you don't know how to fix this issue, I would be happy just to understand how to debug nash a little bit. I would like to see the contents of /dev; maybe it would help? Being able to run the mdadm autoscan with direct feedback would also be nice. Is there a way to spawn a shell during boot directly from the initrd?

gerald_clark
Posts: 10642
Joined: 2005/08/05 15:19:54
Location: Northern Illinois, USA

Re: mdadm stopped working since motherboard was replaced

Post by gerald_clark » 2015/04/21 02:04:23

It sounds like the initrd does not contain the drivers for your new hard drive controller.
Boot the DVD in rescue mode. The RAID should auto-assemble.
You should be able to chroot and rebuild the initrd.
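
A rough sketch of what the rebuild could look like once the installed system is chrooted from rescue mode. It relies on several assumptions: that the controller driver needed in AHCI mode is the ahci module, that this kernel ships it (and raid1) as loadable modules, and that the image lives at /boot/initrd-<version>.img. The helper name mkinitrd_cmd is hypothetical; it only prints the command so it can be reviewed before being run.

```shell
# Print (not run) a mkinitrd command line that forces extra modules
# into the image via --with=<module>.
# Usage: mkinitrd_cmd KERNEL_VERSION MODULE...
# Assumption: the image path /boot/initrd-<version>.img; adjust as needed.
mkinitrd_cmd() {
    kver=$1
    shift
    cmd="mkinitrd -f -v --fstab=/etc/fstab"
    for mod in "$@"; do
        cmd="$cmd --with=$mod"
    done
    echo "$cmd /boot/initrd-$kver.img $kver"
}
```

For example, mkinitrd_cmd '3.10.0+2' ahci raid1 prints a command with --with=ahci --with=raid1. Since -f overwrites an existing image, keeping a copy of the old initrd first is a sensible precaution.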

petan
Posts: 5
Joined: 2015/04/21 00:04:22

Re: mdadm stopped working since motherboard was replaced

Post by petan » 2015/04/21 07:08:42

Hello, thanks for reply

I tried this, and the image produced by mkinitrd contained even fewer modules than the previous one. It still fails to boot, and I still THINK that it doesn't see the disks. I CAN'T CONFIRM this, because I have absolutely no way to examine the system as it is at the moment it's booting. Debian has a busybox fallback in case the initrd fails to mount the root filesystem. It would be EXTREMELY HELPFUL if there was such an option in CentOS. Is there any way to run any kind of shell from that initrd script, so that I can examine the system and find out what is actually wrong? Right now I just see a stream of text, then a kernel panic and a reboot, which is totally useless for any debugging.

petan
Posts: 5
Joined: 2015/04/21 00:04:22

Re: mdadm stopped working since motherboard was replaced

Post by petan » 2015/04/21 07:11:21

Here is a log from mkinitrd

[root initrd_recover]# mkinitrd -v --fstab=/etc/fstab initrd-3.10.0+2-raid.img '3.10.0+2'
Creating initramfs
Modulefile is /etc/modprobe.d/blacklist-bridge /etc/modprobe.d/blacklist-compat /etc/modprobe.d/blacklist-vfunc-drivers /etc/modprobe.d/blacklist.conf /etc/modprobe.d/bnx2x /etc/modprobe.d/bonding /etc/modprobe.d/cifs.conf /etc/modprobe.d/disable-ipv6 /etc/modprobe.d/igb /etc/modprobe.d/ixgbe /etc/modprobe.d/lpfc /etc/modprobe.d/mlx4 /etc/modprobe.d/modprobe.conf.dist /etc/modprobe.d/mptbase
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID array md0
Looking for driver for device /var/swap/swap.001
error opening /sys/block: No such file or directory
Looking for deps of module ide-disk
Looking for deps of module dm-mem-cache
Looking for deps of module dm-region-hash: dm-mod dm-log 
Looking for deps of module dm-mod
Looking for deps of module dm-log: dm-mod 
Looking for deps of module dm-message
Looking for deps of module dm-raid45
Looking for deps of module scsi_dh_alua: scsi_mod scsi_dh 
Looking for deps of module scsi_mod
Looking for deps of module sd_mod: scsi_mod 
Looking for deps of module scsi_dh: scsi_mod 
Looking for deps of module scsi_dh_emc: scsi_mod scsi_dh 
Looking for deps of module scsi_dh_hp_sw: scsi_mod scsi_dh 
Looking for deps of module scsi_dh_rdac: scsi_mod scsi_dh 
Using modules:   /lib/modules/3.10.0+2/kernel/drivers/usb/host/ehci-hcd.ko /lib/modules/3.10.0+2/kernel/drivers/usb/host/ohci-hcd.ko /lib/modules/3.10.0+2/kernel/drivers/usb/host/uhci-hcd.ko /lib/modules/3.10.0+2/kernel/drivers/md/dm-mod.ko /lib/modules/3.10.0+2/kernel/drivers/md/dm-log.ko /lib/modules/3.10.0+2/kernel/drivers/md/dm-region-hash.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/scsi_mod.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/sd_mod.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko /lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
/sbin/nash -> /tmp/initrd.jU8290/bin/nash
/sbin/insmod.static -> /tmp/initrd.jU8290/bin/insmod
copy from `/lib/modules/3.10.0+2/kernel/drivers/usb/host/ehci-hcd.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/ehci-hcd.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/usb/host/ohci-hcd.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/ohci-hcd.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/usb/host/uhci-hcd.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/uhci-hcd.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/md/dm-mod.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/dm-mod.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/md/dm-log.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/dm-log.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/md/dm-region-hash.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/dm-region-hash.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/scsi_mod.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/scsi_mod.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/sd_mod.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/sd_mod.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/scsi_dh.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/scsi_dh_alua.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/scsi_dh_emc.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/scsi_dh_hp_sw.ko' [elf64-x86-64]
copy from `/lib/modules/3.10.0+2/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko' [elf64-x86-64] to `/tmp/initrd.jU8290/lib/scsi_dh_rdac.ko' [elf64-x86-64]
/sbin/dmraid.static -> /tmp/initrd.jU8290/bin/dmraid
/sbin/kpartx -> /tmp/initrd.jU8290/bin/kpartx
/lib64/libc.so.6 -> /tmp/initrd.jU8290//lib64/libc.so.6
/lib64/libdevmapper.so.1.02 -> /tmp/initrd.jU8290//lib64/libdevmapper.so.1.02
/lib64/libdl.so.2 -> /tmp/initrd.jU8290//lib64/libdl.so.2
/lib64/libselinux.so.1 -> /tmp/initrd.jU8290//lib64/libselinux.so.1
/lib64/libsepol.so.1 -> /tmp/initrd.jU8290//lib64/libsepol.so.1
Adding module ehci-hcd
Adding module ohci-hcd
Adding module uhci-hcd
Adding module dm-mod
Adding module dm-log
Adding module dm-region-hash
Adding module scsi_mod
Adding module sd_mod
Adding module scsi_dh
Adding module scsi_dh_alua
Adding module scsi_dh_emc
Adding module scsi_dh_hp_sw
Adding module scsi_dh_rdac
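
Note that the "Adding module" list above contains no ahci, libata, md-mod or raid1, while the original init script loaded libata, ata_piix, md-mod and raid1. A small sketch for checking a mkinitrd -v log against a list of wanted modules follows. The function name missing_modules is made up, and which modules are actually required (or built into the kernel rather than loadable) is an assumption that has to be verified for this kernel.

```shell
# Report which of the wanted modules are missing from `mkinitrd -v` output.
# Reads the log on stdin and prints one missing module name per line.
missing_modules() {
    log=$(cat)
    for mod in "$@"; do
        # -x matches the whole line, -F treats the pattern as a fixed string,
        # so "scsi_dh" does not accidentally match "scsi_dh_alua".
        printf '%s\n' "$log" | grep -qxF "Adding module $mod" || echo "$mod"
    done
}
```

It could be used as mkinitrd -v ... 2>&1 | missing_modules ahci libata raid1, which prints only the names that never appeared in an "Adding module" line.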

petan
Posts: 5
Joined: 2015/04/21 00:04:22

Re: mdadm stopped working since motherboard was replaced

Post by petan » 2015/04/21 07:59:07

I managed to copy a static busybox into that initrd, and now I can see that indeed no disks were detected. /dev contains 100 loop devices (no idea why), tty0-12 and some other devices like mem and null, but nothing that looks like a hard drive, so I suppose it really is a wrong or missing module? I will keep investigating...
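
Besides looking at /dev, the kernel's own view of block devices can be read from /proc/partitions (the init script mounts /proc early on). The following is a sketch using only shell builtins, again assuming a busybox-style POSIX shell inside the initrd; list_block_devices is a hypothetical helper name.

```shell
# Print the block device names the kernel currently knows about.
# Reads /proc/partitions-style input using only shell builtins.
list_block_devices() {
    src=${1:-/proc/partitions}
    while read -r major minor blocks name; do
        # Skip the header line and the blank line that follows it.
        case "$major" in
            ''|major) continue ;;
        esac
        echo "$name"
    done < "$src"
}
```

If this prints only loop devices and no sd* entries, the disks were never registered by any driver, which points at the controller module rather than at mdadm.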

BTW this CentOS is the Dom0 of a Xen hypervisor, so it doesn't boot on bare metal, but as I said, it /did/ work in the past, before I switched the motherboard.

petan
Posts: 5
Joined: 2015/04/21 00:04:22

Re: mdadm stopped working since motherboard was replaced

Post by petan » 2015/04/21 09:22:42

I solved the issue with a temporary hackish change in the BIOS, where I disabled AHCI and turned on legacy IDE mode. This kind of sucks, because I would like to use AHCI, but it seems that this combination of Xen and CentOS 5.x doesn't handle it. I don't even know whether that is a Xen kernel or a Linux kernel issue :/

Booting Linux (Debian at least, although with an older kernel) on bare metal with AHCI and all modern features works fine; no idea how.
