Can I please has a working install? wtf

General support questions
Something_Witty
Posts: 4
Joined: 2018/03/20 18:46:01

Post by Something_Witty » 2018/03/20 18:50:04

I've done this hundreds of times, so yeah, why should it just keep working?

Trying to install a fresh copy of CentOS 7 on a server here.
Server: Intel NVMe SSD, NVIDIA Titan Xp, 50GbE Mellanox.
I've installed this with both XFS and ext4. As far as I can tell the install goes great and asks for a reboot, and when it comes back up it eats feces.

xfs corrupt
memory errors
nouveau errors (nvidia drivers, but that never stopped my other installs)
no controller found (oh wow, really? nice!)
metadata errors
failed to start udev kernel device manager
dependency failed
python failed? lol nice one!

ohh.. but the best part was when I finally got it to a shell, I logged in, but no command was found.


Woot woot!!!!!!!!

and a slew of other garbage.

I've already run through forum posts going back years, and nothing too helpful popped up. Can anyone propose something?


Thanks.

avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: Can I please has a working install? wtf

Post by avij » 2018/03/20 19:04:14

This smells like faulty hardware. Let's start with the easiest diagnostics first. You mentioned "memory errors". You can launch memtest86+ from the CentOS 7 install media. Select Troubleshooting from the main menu, then Run a memory test. You should let this run overnight.

Something_Witty
Posts: 4
Joined: 2018/03/20 18:46:01

Re: Can I please has a working install? wtf

Post by Something_Witty » 2018/03/20 20:22:41

memtest is fine
hardware is fine

This is a one-week-old Supermicro server that came with Windows pre-installed, and I wiped it clean to put CentOS 7 on it.

I just installed CentOS with ext4, without any xfs crap, and it booted up after reboot (although it took lots of dumps and diarrhea, screen flashing, had a few heart-attacks, also coughed up a ton of errors, and then finally made its way to the OS login prompt). Great, right? Wrong...

I ssh'd in, provisioned the server, ran updates, set all my configs, installed drivers, and rebooted the server.

Bam, everything gone, and the only thing that pops up is the UEFI shell. So, what happened? Where is my OS? Umm.. yeah, so it's not anywhere in UEFI (so it got installed as legacy?), and it won't boot as legacy or UEFI. Great... so...

Now...

I'm trying another USB drive, installing in legacy mode with all UEFI stuff off in the BIOS atm, but this really has to be one of the absolute worst processes I've seen in 15 years of working in IT.

avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: Can I please has a working install? wtf

Post by avij » 2018/03/20 20:44:12

Perhaps you should pay closer attention to the error messages you see during installation and first boot. They may give you some clues for what might be wrong with the system.

Something_Witty
Posts: 4
Joined: 2018/03/20 18:46:01

Re: Can I please has a working install? wtf

Post by Something_Witty » 2018/03/20 21:01:10

Here are some highlights

[FAILED] Failed to start udev Coldplug all Devices.
See 'systemctl status systemd-udev-trigger.service' for details

[FAILED] Failed to start udev Kernel Device Manager.
...
...



Pre-Install from the USB drive, I get a quick message that says
"no controller found"


Keep in mind, I have no shell, no access to even GRUB, nothing. The install goes well, then all hell breaks loose with error messages... If it's a hardware compatibility issue, it's strange that it boots the USB installer, installs everything fine, and then on reboot acts like nothing got installed.

Second... if it was the hdd controller (Intel NVMe), why would it be able to install TO it, but not boot from it? If it was the nvidia nouveau issue, why would it prevent the OS from booting?


Looks like it's time to call the folks at Thinkmate and ask them if they know of any hardware compatibility issues with CentOS 7...

avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: Can I please has a working install? wtf

Post by avij » 2018/03/20 21:19:18

The "no controller found" may be referring to the 8042 controller, which is the PS/2 keyboard controller. Perhaps you don't have a PS/2 port on your system. This is likely harmless.

You could try a minimal install (don't install a GUI) to see if the problems are related to the display adapter.

Something_Witty
Posts: 4
Joined: 2018/03/20 18:46:01

Re: Can I please has a working install? wtf

Post by Something_Witty » 2018/03/20 21:36:11

That's interesting, because the only time it did boot after install was when I did a minimal install. But after that, when I went in and ran yum updates and installed drivers, on reboot it stopped booting to the OS and instead gave me the Supermicro UEFI shell.

Will try that again and not run updates, will try to see what the heck it is actually complaining about.

I'm at a loss here, spent my whole day on this and other things have piled up. :shock:

avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: Can I please has a working install? wtf

Post by avij » 2018/03/20 21:52:40

For troubleshooting purposes, try skipping the "install drivers" part for now and don't enable any additional repositories. Chances are that this is caused by some third-party package.

If you, for example, installed some odd driver for your NVMe storage after a successful installation, don't be surprised if your storage starts acting up.
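One way to confirm that no additional repositories are enabled before re-testing is to read the repo definitions directly. The following is only a minimal sketch, assuming Python is available on the installed system and that the repo files live in the usual /etc/yum.repos.d directory. The function name `enabled_repos` is made up for illustration, and the treatment of a missing `enabled` key as "enabled" is an assumption about yum's default.

```python
import configparser
import glob
import os


def enabled_repos(repo_dir="/etc/yum.repos.d"):
    """Return sorted (repo_id, file_name) pairs for repos that are not disabled.

    Hypothetical helper for illustration; it is not part of yum.
    """
    found = []
    for path in sorted(glob.glob(os.path.join(repo_dir, "*.repo"))):
        # .repo files are INI-style. Interpolation is turned off so that a
        # '%' inside a URL is read literally.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        try:
            parser.read(path)
        except configparser.Error:
            continue  # skip files this simple parser cannot handle
        for repo_id in parser.sections():
            # Assumption: a repo with no 'enabled' key counts as enabled.
            value = parser.get(repo_id, "enabled", fallback="1").strip().lower()
            if value not in ("0", "no", "false"):
                found.append((repo_id, os.path.basename(path)))
    return sorted(found)


if __name__ == "__main__":
    for repo_id, file_name in enabled_repos():
        print(f"{repo_id:30} {file_name}")
```

Anything in the output that does not come from the stock CentOS repo files is a candidate for the third-party package mentioned above.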

desertcat
Posts: 843
Joined: 2014/08/07 02:17:29
Location: Tucson, AZ

Re: Can I please has a working install? wtf

Post by desertcat » 2018/03/21 00:17:51

Something_Witty wrote:
memtest is fine
hardware is fine

Bam, everything gone, and the only thing that pops up is the UEFI shell. So, what happened? Where is my OS? Umm.. yeah, so it's not anywhere in UEFI (so it got installed as legacy?), and it won't boot as legacy or UEFI. Great... so...

Now...

I'm trying another USB drive, installing in legacy mode with all UEFI stuff off in the BIOS atm, but this really has to be one of the absolute worst processes I've seen in 15 years of working in IT.

WARNING!! WARNING!! WARNING!! The second you mentioned UEFI, that sent up BIG RED FLAGS. A trivia question: can you choose a STRAIGHT BIOS (Legacy BIOS), or are you stuck with UEFI?? If you can select Legacy BIOS, pick that. You will also probably have to do either a custom install or a modified custom install and create /boot, NOT /boot/efi, and make sure that /boot is at least 1 GB (this is based on RHEL's recommendation of at least 1 GB).

There is something about UEFI and Linux and CentOS 7 in particular that does not play well with others. Once you have it installed and before you boot the machine you are going to have to go into BIOS and set the default device to a BIOS device rather than a UEFI device which is the DEFAULT. After that SAVE the configuration and reboot the machine and it *should* come up just fine. If it sounds like I've been there and done that you would be correct -- pulled my hair out for two days before I stumbled across the solution.

Now *if* you are stuck with UEFI... "vaya con Dios"... you really might need Divine Help.
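Whenever the system does reach a shell, it is possible to check which firmware mode it actually booted in, which helps with the "so it got installed as legacy?" question. This is a minimal sketch, assuming Python is available there. It relies on the Linux kernel exposing the /sys/firmware/efi directory only when it was started through UEFI. The function name `boot_mode` and its `sysfs_root` parameter are made up for illustration.

```python
import os


def boot_mode(sysfs_root="/sys"):
    """Return "UEFI" if the running kernel exposes EFI data, else "legacy BIOS".

    Hypothetical helper for illustration only.
    """
    # /sys/firmware/efi is present only when the kernel was booted via UEFI.
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "UEFI" if os.path.isdir(efi_dir) else "legacy BIOS"


if __name__ == "__main__":
    print("Booted in", boot_mode(), "mode")
```

If the installer was booted in one mode and the firmware is later set to the other, the installed bootloader will not be found, which would match the symptom of landing in the UEFI shell.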

avij
Retired Moderator
Posts: 3046
Joined: 2010/12/01 19:25:52
Location: Helsinki, Finland

Re: Can I please has a working install? wtf

Post by avij » 2018/03/21 07:24:28

No, I don't think this is caused by UEFI.
