CentOS 7 fails to boot after updating to kernel-3.10.0-693.11.6.el7.x86_64

Post by desertcat » 2018/01/17 12:20:09

paco wrote: Hi desertcat,
861MB of free space.

Hmmm. Curiouser and curiouser. This is starting to sound like a boot problem. For this next part you are going to need a utility disk such as KNOPPIX that will let you mount the /boot and / (root) partitions; /etc lives on the root partition. There are three places you need to LOOK at: /boot/grub2/grub.cfg, /etc/default/grub, and the /etc/grub.d directory. A quick place to start is to list /etc/grub.d. You *should* see these files, correctly named and numbered:
*00_header
*00_tuned
*01_users
*10_linux
*20_linux_xen
*20_ppc_terminfo
*30_os-prober
*40_custom
*41_custom


Sometimes these file numbers get out of whack. Here is where you will find things moved, renumbered, duplicated, or omitted. These files should have mode 755 (-rwxr-xr-x) and be owned by root:root. I just had this problem trying to dual boot one machine into the other and then vice versa. If you see this list out of order or renumbered, then the chances are good your boot config file has been toasted.
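
A minimal sketch of that check, in shell. The expected names are the ones listed above; the function name check_grub_d and the example mount point /mnt/sysroot are made up for illustration, and a README in the directory is assumed to be harmless and is skipped. It only reports, it changes nothing.

```shell
#!/bin/sh
# Names taken from the list quoted in this post.
EXPECTED="00_header 00_tuned 01_users 10_linux 20_linux_xen 20_ppc_terminfo 30_os-prober 40_custom 41_custom"

# check_grub_d DIR: report missing, non-executable, or unexpected entries.
# Returns 0 when DIR matches the expected list, 1 otherwise.
check_grub_d() {
    dir="${1:-/etc/grub.d}"
    rc=0
    for name in $EXPECTED; do
        if [ ! -f "$dir/$name" ]; then
            echo "MISSING: $name"
            rc=1
        elif [ ! -x "$dir/$name" ]; then
            echo "NOT EXECUTABLE: $name"
            rc=1
        fi
    done
    # Anything present that is not on the list (renumbered copies, backups...)
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        [ "$name" = "README" ] && continue
        case " $EXPECTED " in
            *" $name "*) ;;
            *) echo "UNEXPECTED: $name"; rc=1 ;;
        esac
    done
    return $rc
}
```

From a rescue disk with the root partition mounted at, say, /mnt/sysroot, this would be called as check_grub_d /mnt/sysroot/etc/grub.d. Running ls -l on the same directory shows the mode and owner of each file.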

See if you have a file or directory called backup in /boot, /etc/default, or /etc/grub.d.

Apparently this is an old problem, inasmuch as /etc/default/backup tells you, under RESTORE_INSTRUCTIONS:

How to restore this backup
--------------------------
* make sure you have root permissions (`gksu nautilus` or `sudo -s` on command line) otherwise you won't be able to copy the files
* to fix an unbootable configuration, just copy:
* '/etc/grub.d/backup/boot_grub' to '/boot/grub2'
* to reset the whole configuration (if it cannot be fixed by using grub customizer), also copy these files:
* '/etc/grub.d/backup/etc_grub_d' to '/etc/grub.d'
* '/etc/grub.d/backup/default_grub' to '/etc/default/grub'

Most of these files are throwbacks to a freshly installed system and lack any customizations you may have made between install and the point where you became toast. The files most likely to become mangled are the scripts in /etc/grub.d, which grub2-mkconfig runs in order of their numeric prefixes to generate /boot/grub2/grub.cfg, referenced above.

Referencing the above list, which shows the config file names in the order in which they should appear, see if yours matches. If you see, say, *10_linux sporting a new name, renumbered to something like *31_linux, or worse still duplicated, the procedure is to eliminate one of the two copies IF THEY ARE EXACT DUPLICATES OF EACH OTHER, then rename the misidentified file back to *10_linux. Do the same for each config file until you have replicated the list. WARNING: If you have NOT worked with these config files and do not know what their contents should look like, DO NOT ATTEMPT THIS!!!! Once you have the list back in order, run grub2-mkconfig -o /boot/grub2/grub.cfg. After that, reboot the machine and cross your fingers.
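
A cautious way to do that last step is to keep a copy of the old grub.cfg before regenerating it. This is a sketch only: regen_grub_cfg and the MKCONFIG override are illustrative names, not part of grub2. The default path assumes a BIOS install; on UEFI installs of CentOS 7 the file normally lives at /boot/efi/EFI/centos/grub.cfg instead.

```shell
#!/bin/sh
# regen_grub_cfg [CFG]: back up CFG with a timestamp, then regenerate it.
# MKCONFIG may be set to use a different generator command (assumption:
# defaults to grub2-mkconfig, as used elsewhere in this thread).
regen_grub_cfg() {
    cfg="${1:-/boot/grub2/grub.cfg}"
    backup="$cfg.bak.$(date +%Y%m%d%H%M%S)"
    # Stop here if the backup cannot be written; never regenerate blind.
    cp -p "$cfg" "$backup" || return 1
    echo "saved previous config as $backup"
    "${MKCONFIG:-grub2-mkconfig}" -o "$cfg"
}
```

Run as root with no arguments, it backs up and rewrites /boot/grub2/grub.cfg. If the new config turns out worse, the timestamped copy can be put back from the rescue disk.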

If however your list matches the supplied list you have eliminated one suspect.

Two final questions:

1) Can you still boot into the previous kernel or is that too whacked?

2) Did you change anything?!? Pulled cards, inserted cards, etc. If so, just for fun run grub2-mkconfig -o /boot/grub2/grub.cfg. It should tell you whether the machine is fine or it sees ERRORS. If it sees ERRORS, that may -- or may NOT (mine were very cryptic and did not make a lot of sense) -- point you in a direction.

Post by TrevorH » 2018/01/17 13:27:15

How about ... ignore all that and try something simple. Does it boot in command line only mode? Interrupt the boot at the grub menu and hit 'e' to edit the current kernel, scroll down to find the linux16/linuxefi line for the current 3.10.0-693.11.6 kernel, append a space followed by the digit 3 to the end of that line, and then hit Ctrl-x to boot using it. If you get to a command line login prompt then your problem is video drivers.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

Post by desertcat » 2018/01/17 19:24:08

TrevorH wrote: How about ... ignore all that and try something simple. Does it boot in command line only mode? Interrupt the boot at the grub menu and hit 'e' to edit the current kernel, scroll down to find the linux16/linuxefi line for the current 3.10.0-693.11.6 kernel, append a space followed by the digit 3 to the end of that line, and then hit Ctrl-x to boot using it. If you get to a command line login prompt then your problem is video drivers.
Trevor, you might be RIGHT and it is his video drivers, which would be my next suspect. Still, it does NOT make sense WHY a simple kernel update should take a working system and make it non-working. I have encountered this only once, when the video driver and the kernel release were out of sync, and it was fixed as soon as new kernel and video driver updates were released. The solution was to drop back one kernel release, punt, and wait for the next round of updates. Yet according to him BOTH the kernel AND the video driver updates were released simultaneously. Buggy driver release? Possible. Buggy kernel release? Possible. OTOH if this is a boot problem it will be readily seen by looking at /etc/grub.d. I just went through this last week. That the system booted at all was amazing, but it was seriously jacked. The problem was apparent as soon as I looked at the /etc/grub.d table; it was that scrambled. My great "Ah HA!!" moment. Now what was the cause of the scrambling?? No idea. Once I fixed the table, it worked like a fine Swiss watch.

Post by paco » 2018/01/17 21:59:22

TrevorH wrote: How about ... ignore all that and try something simple. Does it boot in command line only mode? Interrupt the boot at the grub menu and hit 'e' to edit the current kernel, scroll down to find the linux16/linuxefi line for the current 3.10.0-693.11.6 kernel, append a space followed by the digit 3 to the end of that line, and then hit Ctrl-x to boot using it. If you get to a command line login prompt then your problem is video drivers.
Hey Trevor,
Yes I did get to a command line login prompt at the end. Great, thanks!
I want to remind you that we are talking only about the rescue mode, which is the only kernel that never went black past the grub menu.
So I did:

yum remove nvidia-x11-drv.x86_64 nvidia-detect.x86_64 kmod-nvidia.x86_64 yum-plugin-nvidia.noarch
Now the rescue mode is working fine :) I tried to start the other 2 kernels, but no luck: they still go black after the grub menu stage.
Great, this is a good step.
Any other suggestions?

Thanks a lot guys!!

Post by TrevorH » 2018/01/17 23:29:29

If you're using the latest kmod-nvidia then it will only work with the 7.4 kernel series. However the "rescue" kernel is put there at the time of the initial install and will probably not be a 7.4 series kernel (3.10.0-693*) so will not work with kmod-nvidia for 7.4.

First, how often will you want to use the rescue kernel at all? Second, will you ever need it in graphical mode? If the answers are - as I suspect - "not very often" and "no" then you could probably just edit /boot/grub2/grub.cfg and scroll down until you find the stanza there for the rescue kernel and append a space followed by the digit 3 so that it boots into command line only mode. That'll default the rescue kernel to booting in command line only mode and you don't need to worry about the mismatched kernel/video driver problem.
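
A rough sketch of that grub.cfg edit, done with awk so the result can be inspected before anything is overwritten. The function name rescue_to_runlevel3 is invented here, and it assumes the rescue entry's menuentry title contains the word "rescue", as the stock CentOS 7 one does. It writes the modified config to stdout and leaves the input file alone.

```shell
#!/bin/sh
# rescue_to_runlevel3 CFG: print CFG with " 3" appended to the
# linux16/linuxefi line of every menuentry whose title mentions "rescue".
rescue_to_runlevel3() {
    awk '
        # A new menuentry starts: remember whether it is the rescue one.
        /^[ \t]*menuentry / { in_rescue = ($0 ~ /rescue/) }
        # Kernel line inside a rescue entry, not already ending in " 3".
        in_rescue && /^[ \t]*(linux16|linuxefi)[ \t]/ && !/ 3$/ { $0 = $0 " 3" }
        { print }
    ' "$1"
}
```

One way to use it is to write the output to a scratch file, diff it against /boot/grub2/grub.cfg, and copy it into place only if the single changed line looks right. Keep in mind that a later grub2-mkconfig run regenerates grub.cfg and would drop the change.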

If you need a GUI in rescue mode then I think you have to do some stuff that I haven't found any doc for (though that may just mean I didn't look very hard) and that's to rebuild the rescue kernel with a 7.4 version.

Post by desertcat » 2018/01/18 08:06:54

paco wrote: Now the rescue mode is working fine :) I tried to start the other 2 kernels, but no luck: they still go black after the grub menu stage.
Great, this is a good step.
Any other suggestions?

Thanks a lot guys!!
So.... let's see:

You tried to boot the OLD kernel -- the one that *used* to work before you did the kernel update, but now no longer works ("goes to black"), correct?

Question: You say it "hangs" at a certain point, correct? Just to make sure we've covered all bases, how long do you give it before you give it the old heave ho? It might be a timeout problem, i.e. something has not yet timed out, which can take 15-20 minutes before the rest of the system continues booting. ntpd, which handles Network Time Protocol, is started at boot. It is frequently a villain in hangs; after it times out the computer goes on to boot as normal. If you conclude that it is "hung" after only a few minutes, it might be worthwhile to give it 30 minutes to see whether it is still hung or has timed out and carried on booting.

You removed the latest version kmod-nvidia, correct?

Question: Did kernel-3.10.0-693.11.1.el7.x86_64 boot the system before you updated to 693.11.6 with the latest kmod-nvidia update? Just making sure we are on the same page here. TrevorH might have some ideas, but the idea that comes to mind -- and I have no idea *how* it could be done -- is to re-install the *previous* kmod-nvidia and try to boot both 693.11.1 and 693.11.6. You are *probably* not booting the machine currently because you are lacking the kmod-nvidia driver, and you cannot simply run yum install kmod-nvidia, since that will install the latest version of kmod-nvidia, which is the one you just removed. In *theory* at least the 693.11.1 kernel would boot the machine; if it also boots the .11.6 kernel so much the better, and the problem would be a buggy nvidia driver. To update the machine and not have a repeat performance you would run from the CLI yum update --exclude=kmod-nvidia (I think that is the command).
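
For what it's worth, --exclude is a real yum option, so that command is right. Below is a small sketch of a wrapper that takes a list of packages to hold back. The names yum_update_excluding and DRY_RUN are illustrative only, and it assumes exact package names with no wildcards. Excluding nvidia-x11-drv alongside kmod-nvidia is an assumption on my part that the two are meant to stay at matching versions.

```shell
#!/bin/sh
# yum_update_excluding PKG...: run "yum update" while skipping each PKG.
# With DRY_RUN set to a non-empty value, only print the command.
yum_update_excluding() {
    args=""
    for pkg in "$@"; do
        args="$args --exclude=$pkg"
    done
    if [ -n "${DRY_RUN:-}" ]; then
        echo "yum update$args"
    else
        # Intentionally unquoted so each --exclude becomes its own argument.
        yum update $args
    fi
}
```

For example, DRY_RUN=1 yum_update_excluding kmod-nvidia nvidia-x11-drv prints the command without running it. To see whether an older driver is still available, yum --showduplicates list kmod-nvidia lists every version the enabled repos carry, and yum install kmod-nvidia-VERSION (with a version from that list) asks for a specific one.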

Again, ask TrevorH whether, if there is no way to roll back to the previous kmod-nvidia, it is possible to install it from your install disk, and whether there *might* be a way to install *just* that particular driver from the install disk. I am not sure that can be done.

If desperate enough, and as a last resort, assuming there is nothing of value on the install, bite the bullet and do a total reinstall of the OS. Make *sure* it boots and that it boots the default kernel -- which is likely to be 693.1.1 or 1.2. Once you are *sure* you can boot the base system, run yum update --exclude=kmod-nvidia, which should update all the packages *except* for kmod-nvidia.

If you have access to the CLI it is still worth a quick peek of /etc/grub.d as well as /boot/grub2/grub.cfg to make sure it is NOT a scrambled table. I tend to now agree with TrevorH the problem now *seems* to be the kmod-nvidia driver. At this point I'm tapped out of ideas, your problem is way beyond my pay grade. Sorry.

On a POSITIVE note: Think how much FUN you are having in learning Linux. If everything went smooth as silk you'd learn nothing, you only learn something in trying to figure out the occasional head scratching problem. This is indeed a head scratcher. Again HAVE FUN!!

Best,

Desertcat

Post by TrevorH » 2018/01/18 09:38:48

CentOS kernels use the RHEL "stable KABI" so that modules can be built against one kernel and run with a newer one. That allows kmod packages to survive kernel updates. All of which works fine until Red Hat changes the "stable KABI"; then it all stops working and the kmod needs rebuilding against the newer kernel. This usually only happens at point releases, though it has happened in mid-flight too.

A kmod built for one kernel KABI will not work and will lead to a black screen if it's used on a kernel with a different KABI.

The rescue kernel is installed, and the initramfs is built for it, at the time of the original install, so if you first installed 7.0.1406 then the rescue kernel will be the one from that install. A kmod built for 7.4.1708 will not work on a 7.0.1406 kernel and vice versa. Using the kmod on the current kernel requires the latest kmod-nvidia, and that kmod will not work on a 7.0 kernel.

You cannot have your cake and eat it. You have to pick which one you want to work - the latest kernel or the rescue one. The only way around this would be to update the rescue kernel to a 7.4 one.
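
A quick way to tell which point-release series a kernel belongs to is the build number right after the first hyphen in its version string: 3.10.0-693.* is the 7.4 series, 3.10.0-123.* is 7.0. The sketch below compares two version strings on that number. The function names are made up, and matching series is only a heuristic, since the KABI can occasionally change within a series.

```shell
#!/bin/sh
# kernel_series VERSION: print the build series of a kernel version string,
# e.g. 3.10.0-693.11.6.el7.x86_64 -> 693
kernel_series() {
    rel="${1#*-}"        # drop everything up to the first hyphen
    echo "${rel%%.*}"    # keep what precedes the first dot
}

# same_series A B: succeed if both kernels are from the same series.
same_series() {
    [ "$(kernel_series "$1")" = "$(kernel_series "$2")" ]
}
```

Booted into the rescue entry, same_series "$(uname -r)" 3.10.0-693.11.6.el7.x86_64 reports whether the rescue kernel is from the same series as the current one. If it is not, a kmod built for the current kernel should not be expected to load there.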

Post by paco » 2018/01/18 10:03:43

Hi guys,
once I removed nvidia-x11-drv.x86_64, nvidia-detect.x86_64, kmod-nvidia.x86_64 and yum-plugin-nvidia.noarch, I didn't install them again. The rescue kernel is working without any kmod-nvidia; those packages are no longer installed.
Thank you!!
