GUI gone after nvidia driver install - startx/xinit both fail: (EE) No screens detected

Post by Anner » 2018/02/01 10:21:32

On my clean install of Rocks 7 with CentOS 7, screen/desktop output works fine from the VGA outlet with the nouveau drivers.
I have 4 NVIDIA GeForce GTX 1080 Ti cards, so I am trying to install the NVIDIA drivers (I haven't installed CUDA etc. yet).
I tried several different installation procedures, some of which seem to install fine and load the drivers on reboot (see lsmod below).

methods tried:

Current install is last option in above list.
However, the GUI breaks every time (Ctrl+Alt+Fx still works and gives me a normal terminal).
The screen shows a loading bar with CentOS printed next to it, but when the bar is almost full it drops to what I think is the boot log (the lines with [ OK ] loaded/loading/started and so on).
There are no clear errors there.

I believe the nouveau drivers are properly blacklisted.
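
One way to double-check this is to look for an active blacklist line and to confirm that the module is not loaded. A minimal sketch, assuming the blacklist lives in a .conf file under /etc/modprobe.d (the helper name nouveau_blacklisted is made up for illustration):

```shell
# Hypothetical helper: succeeds if any *.conf file in the given directory
# (default: /etc/modprobe.d) has an uncommented "blacklist nouveau" line.
nouveau_blacklisted() {
    dir=${1:-/etc/modprobe.d}
    grep -Eqs '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "$dir"/*.conf
}

if nouveau_blacklisted; then
    echo "nouveau: blacklist entry found"
else
    echo "nouveau: no blacklist entry found"
fi

# The module should also be absent from the running kernel.
if lsmod 2>/dev/null | grep -q '^nouveau'; then
    echo "nouveau: module is loaded"
else
    echo "nouveau: module is not loaded"
fi
```

A commented-out line such as "# blacklist nouveau" is deliberately not counted as a blacklist entry.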

I realized that startx and xinit both fail, with different output depending on the contents of /etc/X11/xorg.conf.
With the startx command and no xorg.conf file:

Code:

Fatal server error:
(EE) no screens found(EE)
##in the log:
[  3144.214] (EE) [drm] Failed to open DRM device for (null): -2 ### Sometimes the (null) is replaced by a correct NVIDIA PCI BusID
[  3144.240] EGL_MESA_drm_image required.
[  3144.241] (EE) modeset(0): glamor initialization failed
[  3144.256] (EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)

Sometimes this appears in the terminal as well: "Failed to assign any connected display devices to X screen 0"
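
To pull only the error and warning lines out of a long Xorg log, something like the following can help. This is a sketch: the helper name xorg_errors is made up, and the default path assumes the usual location of the first X server's log.

```shell
# Hypothetical helper: print only the timestamped (EE) and (WW) lines
# from an Xorg log. The default path is an assumption; pass another
# file as the first argument if the log lives elsewhere.
xorg_errors() {
    log=${1:-/var/log/Xorg.0.log}
    grep -E '^\[ *[0-9.]+\] \((EE|WW)\)' "$log"
}

# Usage:
#   xorg_errors
#   xorg_errors /path/to/another/Xorg.0.log
```

The pattern requires the marker to follow the timestamp directly, so the legend line at the top of the log, which also mentions (EE) and (WW), is skipped.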

Following https://bbs.archlinux.org/viewtopic.php?id=223581 I also tried editing my ~/.xinitrc (it was empty before):

Code:

xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto


Thanks anybody for your help.

Reference:

Code:

[root@server ~]# uname -a
Linux server.frontend 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@server ~]# rpm -qa kernel\* | sort
kernel-3.10.0-693.5.2.el7.x86_64
kernel-devel-3.10.0-693.5.2.el7.x86_64
kernel-doc-3.10.0-693.5.2.el7.noarch
kernel-headers-3.10.0-693.5.2.el7.x86_64
kernel-tools-3.10.0-693.5.2.el7.x86_64
kernel-tools-libs-3.10.0-693.5.2.el7.x86_64

[root@server etc]# lsmod | grep nouveau

[root@server etc]# lsmod | grep nvidia
nvidia_drm             39700  0
nvidia_modeset       1087441  1 nvidia_drm
nvidia              14328234  1 nvidia_modeset
drm_kms_helper        159169  2 ast,nvidia_drm
drm                   370825  5 ast,ttm,drm_kms_helper,nvidia_drm
i2c_core               40756  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
ipmi_msghandler        46608  3 ipmi_devintf,nvidia,ipmi_si

[root@server etc]# lspci | grep VGA
02:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
82:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)




Attachment: NOT_Allow_Empty.log.zip (/var/log/Xorg.0.log, 3.02 KiB)

The attached log file was produced with the following /etc/X11/xorg.conf:

Code:

# /etc/X11/nvidia-xorg.conf provided by http://elrepo.org

Section "Device"
        Identifier  "Device0"
        Driver      "nvidia"
        BusID       "PCI:2:0:0"
EndSection
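
Note that lspci prints PCI addresses in hexadecimal, while the BusID in xorg.conf is written in decimal. For 02:00.0 the two coincide, but for the cards at 82:00.0 and 83:00.0 they do not. A small conversion sketch (the function name lspci_to_busid is made up for illustration):

```shell
# Convert an lspci address such as "82:00.0" (hexadecimal) into the
# decimal "PCI:bus:device:function" form used for BusID in xorg.conf.
lspci_to_busid() {
    addr=${1#0000:}      # drop an optional "0000:" PCI domain prefix
    bus=${addr%%:*}      # text before the first ':'
    rest=${addr#*:}      # "device.function"
    dev=${rest%%.*}
    fn=${rest#*.}
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_busid 02:00.0    # prints PCI:2:0:0
lspci_to_busid 82:00.0    # prints PCI:130:0:0
```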
Last edited by Anner on 2018/02/02 05:22:17, edited 2 times in total.

Post by toracat » 2018/02/02 00:42:42

according to: Postby agonda » 2017/02/02 08:16:11


I hope you did not cut/paste the commands from that post. Your card needs the current driver, not 375xx. It's easiest if you follow the instructions here:

http://elrepo.org/tiki/nvidia-detect

Post by Anner » 2018/02/02 04:12:05

Hi toracat, thanks for your reply.
I should have been more specific, sorry: I did check versions with nvidia-detect.
It did say that the current 384 version is required. However, if I install the package that nvidia-detect outputs, 'kmod-nvidia', it installs the 390 version. I just assumed that nvidia-detect wasn't up to date with the latest 'current' version yet.

Should I reinstall with the 384 version?

Code:

[root@server ~]# nvidia-detect --xorg
kmod-nvidia

Checking ABI compatibility with Xorg Server...
Xorg Video Driver ABI detected: 23
ABI compatibility check passed

[root@server ~]# nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 384.111 NVIDIA driver kmod-nvidia
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 384.111 NVIDIA driver kmod-nvidia
[1a03:2000] ASPEED Technology, Inc. ASPEED Graphics Family
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 384.111 NVIDIA driver kmod-nvidia
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 384.111 NVIDIA driver kmod-nvidia

[root@server ~]# yum install kmod-nvidia
Loaded plugins: fastestmirror, langpacks, nvidia
Rocks-7.0                                                | 3.6 kB     00:00     
elrepo                                                   | 2.9 kB     00:00     
elrepo/primary_db                                          | 497 kB   00:00     
Loading mirror speeds from cached hostfile
 * elrepo: ftp.yz.yamagata-u.ac.jp
Package kmod-nvidia-390.25-1.el7_4.elrepo.x86_64 already installed and latest version
Nothing to do

[root@server ~]# yum install kmod-nvidia-3
kmod-nvidia-304xx.x86_64  kmod-nvidia-340xx.x86_64 

Post by chemal » 2018/02/02 05:12:52

ASPEED: That's the graphics chip in your BMC. It doesn't work with the OpenGL stack that the NVIDIA driver installs. But Gnome needs OpenGL these days.

A simple solution is to let the ASPEED drive a text console only. You can use the elrepo package in this case. If you want X11 on the NVIDIA cards, you need a custom xorg.conf.
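
A minimal sketch of such a custom xorg.conf, printed by a small shell helper so the BusID can be swapped easily. The helper name nvidia_xorg_conf, the identifiers Layout0/Screen0/Device0 and the AllowEmptyInitialConfiguration option are assumptions for illustration, not something confirmed to work on this machine; PCI:2:0:0 corresponds to the GPU at 02:00.0 in the lspci output from the first post.

```shell
# Hypothetical helper: print a minimal xorg.conf that binds X to one
# NVIDIA GPU. $1 is the BusID in Xorg's decimal form, e.g. PCI:2:0:0.
nvidia_xorg_conf() {
    busid=${1:-PCI:2:0:0}
    cat <<EOF
Section "ServerLayout"
    Identifier "Layout0"
    Screen     0 "Screen0"
EndSection

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    BusID      "$busid"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device     "Device0"
    # Assumption: only relevant when no monitor is plugged into this GPU.
    Option     "AllowEmptyInitialConfiguration" "true"
EndSection
EOF
}

# Inspect the output first, then save it as /etc/X11/xorg.conf if it looks right.
nvidia_xorg_conf "PCI:2:0:0"
```

With a layout like this, the monitor would normally have to be connected to the NVIDIA card named in BusID rather than to the BMC's VGA port.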

If you want to keep the stock OpenGL stack so that the ASPEED can do X11 and drive your monitor in graphics mode, and use the NVIDIA cards only for CUDA, you cannot use the elrepo package because it replaces the stock OpenGL stack. It's all or nothing. NVIDIA's installer is also all or nothing (they even replace by overwriting).

Look at RPM Fusion; they've split the whole thing into pieces. This should allow you to keep the stock OpenGL stack and still get everything needed for CUDA. I've never tried it. I'm using the simple solution.
Last edited by chemal on 2018/02/02 16:56:27, edited 1 time in total.

Post by Anner » 2018/02/02 07:11:01

Hi chemal, thank you very much for your reply.
I had truly no idea about that.
I don't need high-end graphics at all, but it would be nice to have GNOME working.

Is this assumption correct?
In the current situation, GNOME should work if I just instruct the kernel NOT to use the ASPEED but to use one of the NVIDIA GPUs for GUI rendering? If so, how would I go about bypassing the ASPEED?

Post by Anner » 2018/02/02 10:47:01

RESOLVED!

Thanks to chemal for putting me on the right path!
Used a combo of these two procedures:


Most importantly, what resolved the issue is at the bottom of that first URL:
Use the correct nvidia_driver_xx.run file, INCLUDING the flag --no-opengl-files
(and select "no" when prompted whether you want an NVIDIA-generated xorg.conf file).
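
As a rough sketch of that step: the flag --no-opengl-files asks NVIDIA's installer to skip its OpenGL files, so the stock OpenGL stack stays in place. The wrapper name install_nvidia_no_opengl is made up, and the installer path is left to the caller because the exact .run file depends on the driver version.

```shell
# Hypothetical wrapper around NVIDIA's .run installer.
# $1: path to the downloaded installer (not named here on purpose).
install_nvidia_no_opengl() {
    installer=$1
    if [ ! -f "$installer" ]; then
        echo "installer not found: $installer" >&2
        return 1
    fi
    # --no-opengl-files: do not install the OpenGL-related driver files,
    # leaving the distribution's OpenGL libraries untouched.
    sh "$installer" --no-opengl-files
}

# Usage (as root, from a text console with the X server stopped):
#   install_nvidia_no_opengl /path/to/the/downloaded/installer.run
```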

Post by chemal » 2018/02/02 16:18:44

Resolved is resolved, but installing

- kmod-nvidia
- xorg-x11-drv-nvidia-cuda
- xorg-x11-drv-nvidia-cuda-libs

from RPM Fusion should give you the same in a properly packaged way which will also survive kernel updates.