nvidia Kernel Module Loading Issue

dhungyel
Posts: 8
Joined: 2011/12/09 15:30:36
Location: Bhutan

Post by dhungyel » 2011/12/09 16:07:02

Hello All

This is my first post, and unfortunately it is one that requests assistance in solving an issue with the nvidia drivers. I have an old Dell Precision 650 machine lying around and thought of making it useful for my kids to play around with by loading CentOS 6. Here are the details:

Hardware: nVidia Quadro4 900 XGL
CentOS 6 with Kernel:
Linux localhost.localdomain 2.6.32-131.17.1.el6.i686 #1 SMP Thu Oct 6 17:25:25 BST 2011 i686 i686 i386 GNU/Linux

As per the information provided by nVidia, the card is supported through the legacy drivers. So I installed the following from ELRepo:

kmod-nvidia-96xx & nvidia-x11-drv-96xx (as suggested here)

After installing the nvidia drivers from ELRepo-testing, I noticed that the kernel options were changed to blacklist nouveau (rdblacklist=nouveau). Also, "nouveau.modeset=0" was added to disable modesetting for the nouveau driver. The xorg.conf file had also been modified. However, the module fails to load. Logs attached:

Xorg.0.log

X.Org X Server 1.7.7
Release Date: 2010-05-04
X Protocol Version 11, Revision 0
Build Operating System: c6b5 2.6.32-44.2.el6.x86_64
Current Operating System: Linux localhost.localdomain 2.6.32-131.17.1.el6.i686 #1 SMP Thu Oct 6 17:25:25 BST 2011 i686
Kernel command line: ro root=UUID=f7813f0d-c92b-46d6-bcbb-1a932212f3c3 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet nouveau.modeset=0 rdblacklist=nouveau
Build Date: 20 July 2011 10:58:48AM
Build ID: xorg-x11-server 1.7.7-29.el6
Current version of pixman: 0.18.4
Before reporting problems, check https://www.redhat.com/apps/support/
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Fri Dec 9 21:19:43 2011
(==) Using config file: "/etc/X11/xorg.conf"
(==) ServerLayout "Layout0"
(**) |-->Screen "Screen0" (0)
(**) | |-->Monitor "Monitor0"
(**) | |-->Device "Device0"
(**) |-->Input Device "Keyboard0"
(**) |-->Input Device "Mouse0"
(==) Automatically adding devices
(==) Automatically enabling devices
(**) FontPath set to:
/usr/share/fonts/default/Type1,
catalogue:/etc/X11/fontpath.d,
built-ins
(**) ModulePath set to "/usr/lib/xorg/modules/extensions/nvidia,/usr/lib/xorg/modules"
(WW) AllowEmptyInput is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
(WW) Disabling Keyboard0
(WW) Disabling Mouse0
(II) Loader magic: 0x821a560
(II) Module ABI versions:
X.Org ANSI C Emulation: 0.4
X.Org Video Driver: 6.0
X.Org XInput driver : 7.0
X.Org Server Extension : 2.0
(++) using VT number 1

(--) PCI:*(0:1:0:0) 10de:0258:10de:0138 nVidia Corporation NV25GL [Quadro4 900 XGL] rev 163, Mem @ 0xfc000000/16777216, 0xf0000000/134217728, 0xeff80000/524288, BIOS @ 0x????????/131072
(II) LoadModule: "extmod"
(II) Loading /usr/lib/xorg/modules/extensions/libextmod.so
(II) Module extmod: vendor="X.Org Foundation"
compiled for 1.7.7, module version = 1.0.0
Module class: X.Org Server Extension
ABI class: X.Org Server Extension, version 2.0
(II) Loading extension SELinux
(II) Loading extension MIT-SCREEN-SAVER
(II) Loading extension XFree86-VidModeExtension
(II) Loading extension XFree86-DGA
(II) Loading extension DPMS
(II) Loading extension XVideo
(II) Loading extension XVideo-MotionCompensation
(II) Loading extension X-Resource
(II) LoadModule: "dbe"
(II) Loading /usr/lib/xorg/modules/extensions/libdbe.so
(II) Module dbe: vendor="X.Org Foundation"
compiled for 1.7.7, module version = 1.0.0
Module class: X.Org Server Extension
ABI class: X.Org Server Extension, version 2.0
(II) Loading extension DOUBLE-BUFFER
(II) LoadModule: "glx"
(II) Loading /usr/lib/xorg/modules/extensions/nvidia/libglx.so
(II) Module glx: vendor="NVIDIA Corporation"
compiled for 4.0.2, module version = 1.0.0
Module class: X.Org Server Extension
(II) NVIDIA GLX Module 96.43.19 Wed Oct 27 19:20:11 PDT 2010
(II) Loading extension GLX
(II) LoadModule: "record"
(II) Loading /usr/lib/xorg/modules/extensions/librecord.so
(II) Module record: vendor="X.Org Foundation"
compiled for 1.7.7, module version = 1.13.0
Module class: X.Org Server Extension
ABI class: X.Org Server Extension, version 2.0
(II) Loading extension RECORD
(II) LoadModule: "dri"
(II) Loading /usr/lib/xorg/modules/extensions/libdri.so
(II) Module dri: vendor="X.Org Foundation"
compiled for 1.7.7, module version = 1.0.0
ABI class: X.Org Server Extension, version 2.0
(II) Loading extension XFree86-DRI
(II) LoadModule: "dri2"
(II) Loading /usr/lib/xorg/modules/extensions/libdri2.so
(II) Module dri2: vendor="X.Org Foundation"
compiled for 1.7.7, module version = 1.1.0
ABI class: X.Org Server Extension, version 2.0
(II) Loading extension DRI2
(II) LoadModule: "nvidia"
(II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
(II) Module nvidia: vendor="NVIDIA Corporation"
compiled for 4.0.2, module version = 1.0.0
Module class: X.Org Video Driver
(II) NVIDIA dlloader X Driver 96.43.19 Wed Oct 27 19:07:40 PDT 2010
(II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
(II) Primary Device is: PCI 01@00:00:0
(II) Loading sub module "fb"
(II) LoadModule: "fb"
(II) Loading /usr/lib/xorg/modules/libfb.so
(II) Module fb: vendor="X.Org Foundation"
compiled for 1.7.7, module version = 1.0.0
ABI class: X.Org ANSI C Emulation, version 0.4
(II) Loading sub module "ramdac"
(II) LoadModule: "ramdac"
(II) Module "ramdac" already built-in
(**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
(==) NVIDIA(0): RGB weight 888
(==) NVIDIA(0): Default visual is TrueColor
(==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
(**) NVIDIA(0): Enabling RENDER acceleration
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
(II) NVIDIA(0): enabled.
(EE) NVIDIA(0): Failed to load the NVIDIA kernel module!
(EE) NVIDIA(0): *** Aborting ***
(II) UnloadModule: "nvidia"
(II) UnloadModule: "fb"
(EE) Screen(s) found, but none have a usable configuration.

Fatal server error:
no screens found
-----------------------------------------------------------------------------------------------
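
The failure is easy to miss in a log this long. A minimal sketch, assuming a POSIX shell with grep, that keeps only the warning and error lines. `xorg_problems` is a made-up helper name, and the default path is simply the one the log above reports:

```shell
#!/bin/sh
# Sketch only: xorg_problems is a hypothetical helper, not an X.Org tool.
# It prints the lines of an Xorg log that start with the (EE) or (WW) markers.
xorg_problems() {
    # $1: log file to scan; defaults to the path reported in the log above
    grep -E '^\((EE|WW)\)' "${1:-/var/log/Xorg.0.log}"
}
```

Running `xorg_problems` with no argument scans the default log. The second line of the marker legend near the top of the log also starts with (WW), so it will show up in the output as well.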

xorg.conf

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 1.0 (buildmeister@builder75) Wed Oct 27 19:20:23 PDT 2010

Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0" 0 0
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
EndSection

Section "Files"
ModulePath "/usr/lib/xorg/modules/extensions/nvidia"
ModulePath "/usr/lib/xorg/modules"
FontPath "/usr/share/fonts/default/Type1"
EndSection

Section "InputDevice"

# generated from default
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/input/mice"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

# generated from data in "/etc/sysconfig/keyboard"
Identifier "Keyboard0"
Driver "kbd"
Option "XkbLayout" "us"
Option "XkbModel" "pc105"
EndSection

Section "Monitor"
Identifier "Monitor0"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 30.0 - 110.0
VertRefresh 50.0 - 150.0
Option "DPMS"
EndSection

Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
EndSection

Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
SubSection "Display"
Depth 24
Modes "1600x1200" "1024x768" "800x600" "640x480"
EndSubSection
EndSection
-------------------------------------------------------------------------------

I know this topic is discussed in other threads, but mine seems to be a unique problem. Any help would be much appreciated.

Thanks.

--
Dhungyel

NedSlider
Forum Moderator
Posts: 2897
Joined: 2005/10/28 13:11:50
Location: UK

Post by NedSlider » 2011/12/09 23:13:27

Hi,

I maintain the nvidia drivers at elrepo.org.

They are still in the testing repository as no one has ever provided feedback that they actually work on EL6.

After installing the packages, did you reboot or simply try to restart X (a reboot is preferable)?

The error says the kernel module failed to load. Can you load it manually (although that shouldn't be necessary)? As root:

[code]
# modprobe nvidia
[/code]
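
After a manual load attempt it can help to confirm which of the two drivers the kernel actually has loaded. A minimal sketch, assuming the usual /proc/modules listing format (module name first, followed by a space); `module_loaded` and `report_drivers` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not standard commands.
# module_loaded reports whether a module name appears in a /proc/modules style listing.
module_loaded() {
    # $1: module name, e.g. nvidia or nouveau
    # $2: listing to read; defaults to the running kernel's /proc/modules
    grep -q "^$1 " "${2:-/proc/modules}"
}

# report_drivers prints the state of both candidate graphics drivers.
report_drivers() {
    # $1: optional listing file, passed through to module_loaded
    for mod in nvidia nouveau; do
        if module_loaded "$mod" "$1"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
}
```

Calling `report_drivers` with no argument reads the live system. If nouveau still shows as loaded after the blacklist is in place, that would be worth mentioning in the bug report.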

I wonder if the drivers need updating. A recent bug report indicates that the 173xx legacy drivers need updating to work with RHEL-6.2:

http://elrepo.org/bugs/view.php?id=207

I could try updating them to the latest version (96.43.20) later this weekend.

It would be helpful if you could open a bug at http://elrepo.org/bugs/ and I'll track the issue there.

Post by dhungyel » 2011/12/10 03:28:36

Hi NedSlider

Thanks for the reply and great work you guys are doing.

After loading the drivers, I rebooted. Yes, I did try loading the driver manually, but it fails to load. I had to uninstall the drivers to at least get Nouveau running. I will reload the drivers, try again, and post the log. Is it a kernel-related issue? I had the CentOS 6 Continuous Release repository installed and enabled to get the updates for the 6.1 release. The kernel definitely got updated before I loaded the drivers.

I will try and file a bug report as suggested.

Thanks and look forward to an update.

--
Dhungyel

Post by dhungyel » 2011/12/10 10:18:34

Loading the driver manually with "modprobe nvidia" gives the following:


FATAL: Error inserting nvidia (/lib/modules/2.6.32-131.17.1.el6.i686/weak-updates/nvidia-96xx/nvidia.ko): Accessing a corrupted shared library

Not sure which library is corrupted. It was a clean install of CentOS 6 with updates. I have enabled the CentOS-CR repo besides the base repo, and protection is enabled for this. ELRepo was enabled just for installing the nvidia drivers.

--
Dhungyel

Post by NedSlider » 2011/12/10 17:00:22

Hmm, that doesn't sound too promising.

Anyway, I've built the latest packages (96.43.20) and they should appear on the mirrors shortly:

http://elrepo.org/bugs/view.php?id=208

although I'm not convinced this is going to solve your issue.

If these updated elrepo packages still don't work, then the next step I'd suggest (after uninstalling the elrepo driver packages) would be to see if you can get the official nvidia installer to work.

Post by dhungyel » 2011/12/11 06:26:41

Hi

No luck with the latest packages (96.43.20) either. :-(

Some snippets from the log files below:

*********
(--) PCI:*(0:1:0:0) 10de:0258:10de:0138 nVidia Corporation NV25GL [Quadro4 900 XGL] rev 163, Mem @ 0xfc000000/16777216, 0xf0000000/134217728, 0xeff80000/524288, BIOS @ 0x????????/131072
.......
......

(II) Loading /usr/lib/xorg/modules/extensions/nvidia/libglx.so
(II) Module glx: vendor="NVIDIA Corporation"
compiled for 4.0.2, module version = 1.0.0
Module class: X.Org Server Extension
(II) NVIDIA GLX Module 96.43.20 Sun Jul 17 23:48:16 PDT 2011
........
.......
(EE) NVIDIA(0): Failed to load the NVIDIA kernel module!
(EE) NVIDIA(0): *** Aborting ***
(II) UnloadModule: "nvidia"
(II) UnloadModule: "fb"
(EE) Screen(s) found, but none have a usable configuration.

Fatal server error:

I tried to load the driver manually but got the same FATAL error about "Accessing a corrupted shared library".

Then I uninstalled the packages from ELRepo and tried the binary package from nVidia. No luck there either.
:-?

The driver module builds fine but fails during the installation. The nvidia-installer log is attached.

Module fails ELF checks and does not load as per the nvidia installer log. While booting, dmesg shows the same issue as:

Verify ELF error [sec 33 note 12] (assertion 286)
Module failed ELF checks
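
To see which part of the module the kernel objects to, the section number can be pulled out of the message. A minimal sketch, assuming "sec" in the message is an ELF section index (the message itself does not spell that out); `elf_error_section` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: elf_error_section is a hypothetical helper name.
# Reads kernel log text on stdin and prints the number after "sec" from each
# "Verify ELF error [sec N note M]" line.
elf_error_section() {
    sed -n 's/.*Verify ELF error \[sec \([0-9][0-9]*\) note [0-9][0-9]*\].*/\1/p'
}
```

Typical use would be `dmesg | elf_error_section`. If the assumption about the index holds, `readelf -S -W` on the nvidia.ko file lists the section headers by number and `readelf -n` prints the note sections, which might show what is unusual about that section.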

Looks like I am stuck with nouveau for the time being. It seems my ten-year-old Dell is too old to run anything properly :) By the way, I also tried Fedora 16 with the drivers from rpmfusion. Same story there, and the same ELF check failure message with the binary from nVidia.

--
Dhungyel

Post by dhungyel » 2011/12/11 06:34:41

Looks like the nvidia installer log was rejected from my earlier post. Here are some snippets from that log:

-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that differs
from the one used to build the target kernel, or if a driver such as
rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel
module from obtaining ownership of the NVIDIA graphics device(s).

Please see the log entries 'Kernel module load error' and 'Kernel
messages' at the end of the file '/var/log/nvidia-installer.log' for
more information.
-> Kernel module load error: insmod: error inserting './usr/src/nv/nvidia.ko':
-1 Accessing a corrupted shared library

Verify ELF error [sec 33 note 12] (assertion 286)
Module failed ELF checks
ERROR: Installation has failed. Please see the file
'/var/log/nvidia-installer.log' for details. You may find suggestions
on fixing installation problems in the README available on the Linux

Additional Info:

# lspci | grep VGA
01:00.0 VGA compatible controller: nVidia Corporation NV25GL [Quadro4 900 XGL] (rev a3)


# cat /proc/version
Linux version 2.6.32-131.21.1.el6.i686 (mockbuild@c6b6.bsys.dev.centos.org) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Tue Nov 22 18:21:07 GMT 2011


# gcc -v
Using built-in specs.
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC)
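
The installer message lists a compiler mismatch as one possible cause, and the two outputs above can be compared mechanically. A minimal sketch, assuming /proc/version contains the text "gcc version X.Y.Z" as it does here (other kernels may word it differently); `kernel_gcc_version` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: kernel_gcc_version is a hypothetical helper name.
# Prints the gcc version recorded in a /proc/version style string.
kernel_gcc_version() {
    # $1: text to parse; defaults to the running kernel's /proc/version
    printf '%s\n' "${1:-$(cat /proc/version)}" |
        sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Sample taken from the /proc/version output quoted above.
sample='Linux version 2.6.32-131.21.1.el6.i686 (mockbuild@c6b6.bsys.dev.centos.org) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Tue Nov 22 18:21:07 GMT 2011'
kernel_gcc_version "$sample"   # prints 4.4.5
```

On the live system the result could be compared with `gcc -dumpversion`. Going by the outputs above, both report 4.4.5 (Red Hat 4.4.5-6), so a compiler mismatch does not look like the cause here.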

# cat grub.conf | grep kernel
# all kernel and initrd paths are relative to /boot/, eg.
# kernel /vmlinuz-version ro root=/dev/sda8
kernel /vmlinuz-2.6.32-131.21.1.el6.i686 ro root=UUID=98149cb5-bc47-43b8-9e14-097ee67c3567 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet crashkernel=auto nouveau.modeset=0 rdblacklist=nouveau

# cat /etc/modprobe.d/blacklist-nouveau.conf
# ELRepo.org (http://elrepo.org)
# Blacklist file for the nouveau driver in el6
# If the nouveau continues to load, run as root:
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

blacklist nouveau

-------------------
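
grub.conf only shows what the next boot is configured to use, so it may be worth confirming that the running kernel really received both nouveau options. A minimal sketch, assuming a POSIX shell; `has_kernel_param` and `check_nouveau_params` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not standard tools.
# has_kernel_param succeeds if a whole word on the kernel command line equals $1.
has_kernel_param() {
    # $1: parameter to look for, e.g. rdblacklist=nouveau
    # $2: command line text; defaults to the running kernel's /proc/cmdline
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_nouveau_params reports on the two options seen in grub.conf above.
check_nouveau_params() {
    # $1: optional command line text, passed through to has_kernel_param
    for p in nouveau.modeset=0 rdblacklist=nouveau; do
        if has_kernel_param "$p" "$1"; then
            echo "$p: present"
        else
            echo "$p: missing"
        fi
    done
}
```

Calling `check_nouveau_params` with no argument reads /proc/cmdline. If both options are present and nouveau still loads, the comment in blacklist-nouveau.conf above suggests rebuilding the initramfs with dracut.
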
I could post the output for the linked libraries from 'ldconfig' if that would help. They looked okay to me.

--
Dhungyel

Post by NedSlider » 2011/12/11 12:13:30

The ELF errors are worrying. What does 'file' report for the module on your system? On my test box I get:

[code]
$ file /lib/modules/2.6.32-71.el6.x86_64/extra/nvidia-96xx/nvidia.ko
/lib/modules/2.6.32-71.el6.x86_64/extra/nvidia-96xx/nvidia.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
[/code]

Unfortunately I can't load the module on my RHEL-6 test box because I don't have supporting hardware, but the module looks normal to me.

I'm beginning to think the issue is at your end rather than with the driver / packaging.

Post by dhungyel » 2011/12/11 16:19:24

Hmm! I had booted with 2.6.32-131.21.1.el6.i686 Kernel and loaded the ELRepo packages. However, /lib/modules/2.6.32-131.21.1.el6.i686/extra directory is empty. The nvidia.ko module is in /lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/ directory and a symlink is created from this to /lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko.

Is that hardcoded into the ELRepo package?

Just to be clear, I uninstalled the kmod packages from ELRepo, removed the previous kernel (2.6.32-71.el6.i686), rebooted with the 2.6.32-131.21.1.el6.i686 kernel, and reinstalled the nvidia packages from ELRepo. Again, /lib/modules/2.6.32-131.21.1.el6.i686/extra is empty. The nvidia.ko is under the /lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/ directory and a symlink is created from this to /lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko

The ELF info on the nvidia.ko is as follows:


# file /lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/nvidia.ko

/lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/nvidia.ko: symbolic link to `/lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko'
/lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
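
One quick sanity check on this output is whether the module's word size matches the running kernel. A minimal sketch, assuming `file` prints "ELF 32-bit" or "ELF 64-bit" as it does above; `elf_bits` and `machine_bits` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not standard tools.
# elf_bits extracts the word size from one line of `file` output.
elf_bits() {
    # $1: a line of `file` output; prints 32 or 64, nothing if neither appears
    printf '%s\n' "$1" | sed -nE 's/.*ELF (32|64)-bit.*/\1/p'
}

# machine_bits maps a `uname -m` machine name to a word size.
machine_bits() {
    # $1: machine name; defaults to the running system's `uname -m`
    case "${1:-$(uname -m)}" in
        x86_64) echo 64 ;;
        i?86)   echo 32 ;;
    esac
}
```

On a live system the two could be compared with something like `[ "$(elf_bits "$(file -L /path/to/nvidia.ko)")" = "$(machine_bits)" ]`, where the path is a placeholder. Here the module is 32-bit and the kernel is i686, so a word-size mismatch does not appear to be the problem.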


I made a symlink to the module in /lib/modules/2.6.32-131.21.1.el6.i686/extra/nvidia-96xx/ but it did not help.

--
Dhungyel

Post by NedSlider » 2011/12/11 16:35:52

[quote]
dhungyel wrote:
Hmm! I had booted with 2.6.32-131.21.1.el6.i686 Kernel and loaded the ELRepo packages. However, /lib/modules/2.6.32-131.21.1.el6.i686/extra directory is empty. The nvidia.ko module is in /lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/ directory and a symlink is created from this to /lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko.

Is that hardcoded into the ELRepo package?

Just to be clear, I uninstalled the kmod packages from ELRepo, removed the previous kernel (2.6.32-71.el6.i686), rebooted with the 2.6.32-131.21.1.el6.i686 kernel, and reinstalled the nvidia packages from ELRepo. Again, /lib/modules/2.6.32-131.21.1.el6.i686/extra is empty. The nvidia.ko is under the /lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/ directory and a symlink is created from this to /lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko
[/quote]

That's all correct. The elrepo module is built (in this case) against kernel-2.6.32-71.el6 so it is installed to /lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko.

It is then weak-linked against each kernel that it is kABI-compatible with by the /sbin/weak-modules script, which creates a symlink at /lib/modules//weak-updates/nvidia-96xx/nvidia.ko. This is the magic that allows the driver to work seamlessly across kernel updates.
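
For anyone wanting to see this layout on their own system, here is a minimal sketch that follows a weak-updates symlink and prints the kernel version directory the module really lives under. It assumes GNU `readlink -f` and the /lib/modules layout described above; `module_build_kernel` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: module_build_kernel is a hypothetical helper name.
# Resolves a module path (possibly a weak-updates symlink) and prints the
# kernel version directory that holds the real file.
module_build_kernel() {
    # $1: path to a .ko file
    real="$(readlink -f "$1")" || return 1
    # Drop everything up to and including the first ".../lib/modules/".
    rest="${real#*/lib/modules/}"
    # If nothing was removed, the path is not under a lib/modules tree.
    [ "$rest" != "$real" ] || return 1
    # The first remaining path component is the kernel version.
    printf '%s\n' "${rest%%/*}"
}
```

Given the weak-updates path from this thread, the function should print 2.6.32-71.el6.i686, the kernel the package was built against. `modinfo -F vermagic` on the same file reports the version string embedded in the module itself, which makes a useful cross-check.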

[quote]
The ELF info on the nvidia.ko is as follows:


# file /lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/nvidia.ko

/lib/modules/2.6.32-131.21.1.el6.i686/weak-updates/nvidia-96xx/nvidia.ko: symbolic link to `/lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko'
/lib/modules/2.6.32-71.el6.i686/extra/nvidia-96xx/nvidia.ko: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
[/quote]

No errors there so 'file' thinks it's of genuine ELF format.
