boot failure - timed out waiting for device
Posted: 2017/10/18 15:58:33
Dear all,
I have a weird voodoo-like issue that started out of the blue.
After a standard maintenance reboot, one of two identical VM hosts (PROD and DEV servers, CentOS Linux release 7.4.1708) refused to start and ended up in emergency mode:
Error from journalctl output (relevant lines only):
Unfortunately, the PROD system is the one failing. I found a workaround to bring the system back up (log in, run pvscan, activate the missing logical volume and mount it). For future reboots, a "noauto" mount option did the trick, although that is obviously not the preferred solution, since it requires manual intervention after every startup (pvscan, activating the missing LV, mounting /opt, starting services).
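For reference, the manual steps described above can be sketched roughly as follows. This is only an illustration under assumptions: the volume group name datavg comes from this post, but the LV name optlv is a hypothetical placeholder, and the run/recover_opt helpers are made up for the example. By default the script only prints the commands; set DRY_RUN=0 (as root) to actually execute them.

```shell
#!/bin/sh
# Sketch of the manual recovery after the host has booted with /opt set to noauto.
VG=datavg     # volume group behind /opt (name taken from the post)
LV=optlv      # hypothetical LV name, replace with the real one
MNT=/opt

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rescan physical volumes, activate the missing LV, then mount it via its fstab entry.
recover_opt() {
    run pvscan &&
    run lvchange -ay "$VG/$LV" &&
    run mount "$MNT"
}
```

Calling recover_opt after sourcing the file lists the three commands in order; the services that depend on /opt still have to be started afterwards.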
/etc/fstab:
To me it looks as if, during the boot process, the initial scan (pvscan) that should detect both physical volumes (sda + sdb, where sdb is the datavg = "/opt" that fails) times out for the 2nd device, so that the corresponding systemd job "dev-disk-by\x2duuid-1b1e5c6a\x2da824\x2d4ec5\x2d951d\x2d977e883a12e6.device/start" times out and fails.
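As a cross-check that the timed-out job really belongs to the /opt entry, the device unit name can be derived from the fstab UUID. The sketch below is a simplified stand-in that only handles the characters occurring in a by-uuid path ("-" becomes "\x2d", "/" becomes "-"); the function name uuid_to_device_unit is made up for this example. On a systemd host, systemd-escape -p --suffix=device /dev/disk/by-uuid/<uuid> should give the same result.

```shell
#!/bin/sh
# Map a filesystem UUID to the name of the systemd device unit that
# systemd waits for when fstab refers to the filesystem by UUID=...
uuid_to_device_unit() {
    # Escape literal dashes first, then turn path separators into dashes.
    printf '%s' "dev/disk/by-uuid/$1" | sed -e 's/-/\\x2d/g' -e 's|/|-|g'
    printf '.device\n'
}

uuid_to_device_unit 1b1e5c6a-a824-4ec5-951d-977e883a12e6
```

The printed name is the unit that appears in the "timed out" journal lines, so the job that fails is the wait for the filesystem behind /opt to show up under /dev/disk/by-uuid.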
What puzzles me most is that the identical DEV system does not show this problem, and I am not aware of having changed anything that might cause it.
I have now spent more than 2 days searching the web for hints about a possible cause and solution, unfortunately without result, so any suggestion would be greatly appreciated.
Thanks a lot in advance. Best regards,
-Rainer
Error from journalctl output (relevant lines only):
Oct 18 17:18:54 localhost.localdomain systemd[1]: Mounted /tmp.
Oct 18 17:18:55 localhost.localdomain systemd[1]: Started Flush Journal to Persistent Storage.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job dev-disk-by\x2duuid-1b1e5c6a\x2da824\x2d4ec5\x2d951d\x2d977e883a12e6.device/start timed out.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-1b1e5c6a\x2da824\x2d4ec5\x2d951d\x2d977e883a12e6.device.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Dependency failed for /opt.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Dependency failed for Local File Systems.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Dependency failed for Migrate local SELinux policy changes from the old store structure to the new structure.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job selinux-policy-migrate-local-changes@targeted.service/start failed with result 'dependency'.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Dependency failed for Mark the need to relabel after reboot.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job rhel-autorelabel-mark.service/start failed with result 'dependency'.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Dependency failed for Relabel all filesystems, if necessary.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job rhel-autorelabel.service/start failed with result 'dependency'.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job local-fs.target/start failed with result 'dependency'.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job opt.mount/start failed with result 'dependency'.
Oct 18 17:20:21 localhost.localdomain systemd[1]: Job dev-disk-by\x2duuid-1b1e5c6a\x2da824\x2d4ec5\x2d951d\x2d977e883a12e6.device/start failed with result 'timeout'.
/etc/fstab:
/dev/mapper/sysvg-rootlv                    /      xfs   defaults         0 0
UUID=f3f0652f-a23d-4438-810d-203b54d40b09   /boot  xfs   defaults         0 0
/dev/mapper/sysvg-tmplv                     /tmp   xfs   defaults         0 0
/dev/mapper/sysvg-varlv                     /var   xfs   defaults         0 0
/dev/mapper/sysvg-swaplv                    swap   swap  defaults         0 0
UUID=1b1e5c6a-a824-4ec5-951d-977e883a12e6   /opt   xfs   defaults,noauto  0 0
#UUID=1b1e5c6a-a824-4ec5-951d-977e883a12e6  /opt   xfs   defaults         0 0