Kernel update halts the system on boot

Post by **SS2000** » 2019/10/10 14:07:10

Hi,
My CentOS 7.7.1908 system hangs on boot after the latest kernel update to 3.10.0-1062.1.2.el7.x86_64.
It boots without problems if I select the previous version (1062.1.1).
How can I fix this? Where should I look? [Is it time to go for CentOS 8?]
Any help will be greatly appreciated.
Thanks

Post by **tunk** » 2019/10/10 14:13:34

In the grub menu screen, press "e" to edit and then remove "rhgb quiet".
This may give you more info.

Post by **TrevorH** » 2019/10/10 14:18:56

Probably the first thing to check is whether the initramfs file for that kernel version exists in /boot and is also referenced in grub.cfg. There is a known problem if yum is interrupted mid-run between the install and cleanup phases: the package can be installed, but the initramfs file is generated later and the grub.cfg amendment happens at a different stage, so either can be missing. If this is the case, the easiest solution is to run yum reinstall kernel-3.10.0-1062.1.2.el7.x86_64
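The two checks described above can be sketched as a small shell function. This is a minimal sketch, not an official tool: the function name `check_kernel_boot_files` is made up for illustration, and the default grub.cfg path assumes a BIOS install (EFI systems usually keep it under /boot/efi/EFI/centos/).

```shell
# Sketch: report whether the initramfs image for a kernel version exists
# and whether grub.cfg references it.
# check_kernel_boot_files is a hypothetical helper name, not a CentOS command.
check_kernel_boot_files() {
    kver="$1"                               # e.g. 3.10.0-1062.1.2.el7.x86_64
    bootdir="${2:-/boot}"
    grubcfg="${3:-/boot/grub2/grub.cfg}"    # assumption: BIOS layout
    status=0

    if [ -f "$bootdir/initramfs-$kver.img" ]; then
        echo "initramfs: present"
    else
        echo "initramfs: MISSING"
        status=1
    fi

    # -F matches the file name literally, so the dots are not treated as wildcards
    if grep -qF "initramfs-$kver.img" "$grubcfg" 2>/dev/null; then
        echo "grub.cfg: entry found"
    else
        echo "grub.cfg: entry MISSING"
        status=1
    fi

    return "$status"
}
```

Running `check_kernel_boot_files 3.10.0-1062.1.2.el7.x86_64` as root returns non-zero if either piece is missing, which is the situation where the yum reinstall is likely to help.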

Post by **PCrighton2** » 2019/10/10 18:07:54

I have the same problem and I don't have the initramfs file for the new kernel.
I have tried the previous kernel from the boot menu, but that doesn't boot properly either. Although I can get to the command line, the network does not come up (in the output of journalctl -xb there is a line "3x59x: module verification failed: signature and/or require", which might mean that the network driver is not loading properly), so my attempt to reinstall the kernel fails as well.

Is there a way to re-install the kernel without network? Maybe a USB stick?

Post by **SS2000** » 2019/10/10 23:04:05

TrevorH wrote: ↑
2019/10/10 14:18:56
Probably the first thing to check is whether the initramfs file for that kernel version exists in /boot and is also referenced in grub.cfg. There is a known problem if yum is interrupted mid-run between the install and cleanup phases: the package can be installed, but the initramfs file is generated later and the grub.cfg amendment happens at a different stage, so either can be missing. If this is the case, the easiest solution is to run yum reinstall kernel-3.10.0-1062.1.2.el7.x86_64

Thanks.
The file exists and it is also referenced in grub.cfg. I tried your suggestion of reinstalling, but no joy.
[My updates are handled by a cron job anyway, so there was no interruption, accidental or otherwise.]
So, what next?
Do you think I should try to rebuild the initramfs file as described here:
https://wiki.centos.org/TipsAndTricks/CreateNewInitrd
NB: I have also tried to install a newer kernel from ELRepo, version 5.3.5-1.el7.elrepo, which does not boot either. So I am stuck now.

Post by **TrevorH** » 2019/10/11 00:02:49

SS2000: I think you should try booting it again, only this time, when you see the grub menu that lets you choose which kernel to boot, hit 'e' to edit the entry and find the line that loads the kernel; it will start with either "linux16" or "linuxefi". The parameter list there probably includes 'rhgb' and 'quiet' at the moment. Remove them both, then hit ctrl-x to boot using the amended params.

This won't fix the problem but it will now show you the boot as it happens and you can watch for what the error is. You might want to video it as it boots so you can play it back more slowly.

Post by **TrevorH** » 2019/10/11 00:04:49

there is a line "3x59x: module verification failed: signature and/or require

Do you have a really ancient 3Com network card in the machine? If not then it's not related. If it's complaining about the signature then it sounds like you are using secure boot and haven't imported the ELRepo key into your secure boot list of keys. ELRepo have a help page about how to do so.

Post by **SS2000** » 2019/10/12 01:07:18

TrevorH wrote: ↑
2019/10/11 00:02:49
SS2000: I think you should try booting it again, only this time, when you see the grub menu that lets you choose which kernel to boot, hit 'e' to edit the entry and find the line that loads the kernel; it will start with either "linux16" or "linuxefi". The parameter list there probably includes 'rhgb' and 'quiet' at the moment. Remove them both, then hit ctrl-x to boot using the amended params.

This won't fix the problem but it will now show you the boot as it happens and you can watch for what the error is. You might want to video it as it boots so you can play it back more slowly.

Thank you. The option to edit the entry is not there on my system [...which is strange, as I've seen and used it before].
Anyhow, I was able to remove the quiet/rhgb bit in the /etc/default/grub file instead. After regenerating the config [as per https://wiki.centos.org/HowTos/Grub2] I rebooted and captured the journal. It contains over 3000 lines; admittedly, not all of them relate to this error.
However, I don't have the knowledge or expertise to know what to look for.
Helpppppppppp please?
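For anyone following the same route, the /etc/default/grub change can be sketched as below. This is a hedged sketch rather than the exact steps used in this thread: `strip_quiet_params` is a hypothetical helper name, it only prints the modified file, and it assumes 'rhgb' and 'quiet' are preceded by a space on the GRUB_CMDLINE_LINUX line (it will not remove a word that comes directly after the opening quote).

```shell
# Sketch: print /etc/default/grub (or any file given) with 'rhgb' and 'quiet'
# removed from the GRUB_CMDLINE_LINUX line. Nothing is modified in place.
# strip_quiet_params is a hypothetical helper name.
strip_quiet_params() {
    sed -E \
        -e '/^GRUB_CMDLINE_LINUX=/ s/[[:space:]]+rhgb([[:space:]"])/\1/g' \
        -e '/^GRUB_CMDLINE_LINUX=/ s/[[:space:]]+quiet([[:space:]"])/\1/g' \
        "$1"
}
```

A cautious way to use it is to review the output of `strip_quiet_params /etc/default/grub` first, save it over the original only after keeping a backup copy, and then regenerate the config with `grub2-mkconfig -o /boot/grub2/grub.cfg` (the output path differs on EFI systems).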

Post by **SS2000** » 2019/10/14 00:58:10

I've got a snapshot of the journal here, with help from this site:
https://www.vultr.com/docs/centos-7-and ... leshooting
However, I still have no idea what to look for or fix:
------------------------------------
-- Logs begin at Sat 2019-10-12 11:17:06 BST, end at Mon 2019-10-14 01:47:59 BST. --
Oct 14 02:21:31 case99 kernel: sd 4:0:0:0: [sdc] No Caching mode page found
Oct 14 02:21:31 case99 kernel: sd 4:0:0:0: [sdc] Assuming drive cache: write through
Oct 14 02:21:32 case99 kernel: sd 5:0:0:0: [sdd] No Caching mode page found
Oct 14 02:21:32 case99 kernel: sd 5:0:0:0: [sdd] Assuming drive cache: write through
Oct 14 01:21:33 case99 systemd-udevd[628]: invalid key/value pair in file /usr/lib/udev/rules.d/59-fc-wwpn-id.rules on line 10, starting at character 26 (';')
Oct 14 01:21:33 case99 systemd-udevd[628]: invalid key/value pair in file /usr/lib/udev/rules.d/59-fc-wwpn-id.rules on line 11, starting at character 29 (';')
Oct 14 01:21:33 case99 systemd-udevd[628]: invalid key/value pair in file /usr/lib/udev/rules.d/59-fc-wwpn-id.rules on line 12, starting at character 25 (';')
Oct 14 01:21:37 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-17-23:16:31' is not a problem directory
Oct 14 01:21:38 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-15-01:44:49.new' is not a problem directory
Oct 14 01:21:38 case99 smartd[1161]: Device: /dev/sdd [SAT], no ATA CHECK POWER STATUS support, ignoring -n Directive
Oct 14 01:21:38 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2017-08-19-23:14:02' is not a problem directory
Oct 14 01:21:39 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-15-20:44:42' is not a problem directory
Oct 14 01:21:39 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-16-15:28:37' is not a problem directory
Oct 14 01:21:40 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-18-12:32:30' is not a problem directory
Oct 14 01:21:40 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-14-17:43:06' is not a problem directory
Oct 14 01:21:41 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-17-15:23:32' is not a problem directory
Oct 14 01:21:41 case99 systemd[1548]: Failed at step EXEC spawning /home/acunetix/.acunetix_trial/start.sh: No such file or directory
Oct 14 01:21:41 case99 rsyslogd[1559]: imjournal: fscanf on state file `/var/lib/rsyslog/imjournal.state' failed [v8.24.0-41.el7_7 try http://www.rsyslog.com/e/2027 ]
Oct 14 01:21:41 case99 rsyslogd[1559]: imjournal: ignoring invalid state file [v8.24.0-41.el7_7]
Oct 14 01:21:41 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-17-13:06:09' is not a problem directory
Oct 14 01:21:41 case99 nmbd[1571]: [2019/10/14 01:21:41.923209, 0] ../lib/util/become_daemon.c:138(daemon_ready)
Oct 14 01:21:41 case99 nmbd[1571]: daemon_ready: STATUS=daemon 'nmbd' finished starting up and ready to serve connections
Oct 14 01:21:41 case99 systemd[1]: Failed to start Dovecot IMAP/POP3 email server.
Oct 14 01:21:42 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2017-09-08-01:00:31' is not a problem directory
Oct 14 01:21:42 case99 smbd[1657]: [2019/10/14 01:21:42.458967, 0] ../lib/util/become_daemon.c:138(daemon_ready)
Oct 14 01:21:42 case99 smbd[1657]: daemon_ready: STATUS=daemon 'smbd' finished starting up and ready to serve connections
Oct 14 01:21:42 case99 abrtd[1058]: '/var/spool/abrt/vmcore-127.0.0.1-2018-09-23-12:51:12' is not a problem directory
Oct 14 01:21:45 case99 dnsmasq[2118]: bad address at /etc/hosts line 5
lines 1-29/341 5%
-----------------------------
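The excerpt above is only the first pager screen of 341 lines, so a filter can make the rest easier to scan. On a running system, `journalctl -b -p err` limits the output to the current boot at priority "err" and above. For a journal that has been saved to a text file, a rough keyword filter could look like the sketch below; `journal_failures` is a hypothetical helper name and the keyword list is an assumption, not a complete set of failure markers.

```shell
# Sketch: print lines from a saved journal dump that look like failures.
# journal_failures is a hypothetical helper; the keyword list is a guess.
journal_failures() {
    grep -Ei 'fail|error|invalid|panic' "$1"
}
```

For example, after `journalctl -b > boot.log`, calling `journal_failures boot.log` would surface lines such as the "Failed to start Dovecot" and "Failed at step EXEC" entries while skipping the repeated abrtd notices.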

Post by **TrevorH** » 2019/10/14 01:06:30

I'd start by fixing this:

Oct 14 01:21:45 case99 dnsmasq[2118]: bad address at /etc/hosts line 5

and this one looks like you deleted a directory and left its systemd unit file and/or sysv startup script in place.

Oct 14 01:21:41 case99 systemd[1548]: Failed at step EXEC spawning /home/acunetix/.acunetix_trial/start.sh: No such file or directory

You need to find out where it comes from, maybe in /etc/init.d/, maybe in /etc/systemd/system or /usr/lib/systemd/system.

None of that looks like it would stop a system from booting though.
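One way to track down where the missing start.sh is referenced is a recursive search of the directories mentioned above. This is only a sketch; `find_script_refs` is a hypothetical helper name, and the reference could also live somewhere other than these three directories.

```shell
# Sketch: list files under the given directories that mention a path.
# find_script_refs is a hypothetical helper name.
# Usage: find_script_refs <path-to-look-for> <dir> [<dir> ...]
find_script_refs() {
    needle="$1"
    shift
    # -r recurse, -l print only file names, -F match the path literally
    grep -rlF -- "$needle" "$@" 2>/dev/null
}
```

Here that would be `find_script_refs /home/acunetix/.acunetix_trial/start.sh /etc/init.d /etc/systemd/system /usr/lib/systemd/system`, run as root so unreadable files are not skipped.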
