Drivers in dracut.conf Not Loaded at Boot

Quantum`
Posts: 19
Joined: 2015/05/15 18:50:42


Post by Quantum` » 2018/05/12 22:09:59

I am setting up to do GPU passthrough in KVM to a VM using OVMF, but it's got a cray in its craw.

I've added to /etc/dracut.d/vfio.conf:

Code: Select all

add_drivers+=" vfio_pci vfio vfio_iommu_type1 vfio_virqfd "

... and to /etc/modprobe.d/vfio.conf:

Code: Select all

options vfio-pci ids=10de:1c02,10de:10f1

... then # dracut --force

... which should add those drivers and their options to initramfs. And examining initramfs they are indeed there:

Code: Select all

# lsinitrd |grep vfio
-rw-r--r--   1 root     root           41 May 12 14:23 etc/modprobe.d/carls-vfio.conf
drwxr-xr-x   3 root     root            0 May 12 14:53 usr/lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio
drwxr-xr-x   2 root     root            0 May 12 14:53 usr/lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio/pci
-rwxr--r--   1 root     root        82336 May 12 14:53 usr/lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio/pci/vfio-pci.ko
-rwxr--r--   1 root     root        30800 May 12 14:53 usr/lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio/vfio_iommu_type1.ko
-rwxr--r--   1 root     root        55080 May 12 14:53 usr/lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio/vfio.ko
-rwxr--r--   1 root     root        11072 May 12 14:53 usr/lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio/vfio_virqfd.ko
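
One wrinkle when checking a listing like this: the file on disk is vfio-pci.ko, while the module name used in the config is vfio_pci, so grepping the listing for vfio_pci finds nothing even when the module is present. A minimal sketch of a check that treats "-" and "_" as the same character follows. The function name missing_modules is hypothetical, and it assumes the listing arrives on stdin in the lsinitrd format shown above.

```shell
# Hypothetical helper: print the modules (given as arguments) that have no
# matching .ko file in an initramfs listing read from stdin.
# "-" and "_" are interchangeable in module names, so both sides are
# normalised to "_" before comparing.
missing_modules() {
    listing=$(sed 's/-/_/g')
    for mod in "$@"; do
        name=$(printf '%s' "$mod" | sed 's/-/_/g')
        # Match ".../<name>.ko", optionally followed by a compression suffix.
        if ! printf '%s\n' "$listing" | grep -Eq "/${name}\.ko(\.[a-z0-9]+)?\$"; then
            printf '%s\n' "$name"
        fi
    done
}

# Intended use (not run here):
#   lsinitrd | missing_modules vfio_pci vfio vfio_iommu_type1 vfio_virqfd
```

No output means every listed module was found in the image.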


But on boot the drivers are simply not loading. They do not appear in lsmod, and the normal default drivers are assigned to the video and sound cards.

Code: Select all

IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
IOMMU Group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)


Code: Select all

lspci -k
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 85b9
        Kernel driver in use: nouveau
        Kernel modules: nouveau
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 85b9
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel


I see no mention of vfio in dmesg, journalctl, or messages. How is this getting overlooked?

Quantum`
Posts: 19
Joined: 2015/05/15 18:50:42

Re: Drivers in dracut.conf Not Loaded at Boot

Post by Quantum` » 2018/05/14 15:13:48

Gee... am I doing new science?

TrevorH
Forum Moderator
Posts: 22574
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Drivers in dracut.conf Not Loaded at Boot

Post by TrevorH » 2018/05/14 16:23:26

Everyone who replies here is a volunteer and we all have our areas of expertise. Your problem is with something that I would guess most people do not use. To get any good replies, you'll need to wait for someone who knows this area. Hopefully there will be someone.

Also, you posted late on Saturday evening here in the UK. Most people will not have returned to work until today, and the USA probably isn't even properly awake yet...

hunter86_bg
Posts: 1101
Joined: 2015/02/17 15:14:33
Location: Bulgaria

Re: Drivers in dracut.conf Not Loaded at Boot

Post by hunter86_bg » 2018/05/15 03:54:52

Sadly, my only IOMMU experience is with AMD GPUs, and I never had to add 'extra' drivers to the initramfs.
Let's start fresh by answering the following:
1. What type and manufacturer is your IOMMU device?
2. Have you enabled IOMMU in the BIOS/UEFI?
3. Have you checked the following guide?

Quantum`
Posts: 19
Joined: 2015/05/15 18:50:42

Re: Drivers in dracut.conf Not Loaded at Boot

Post by Quantum` » 2018/05/16 05:03:01

Thanks all. I've been without TV for a week, is it.

1. It's an nVidia GeForce GTX 1060 3GB PCIe GPU
2. Yes

Code: Select all

# dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 0x00000000A4018530 0000A8 (v01 INTEL  SKL      00000001 INTL 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.001000] DMAR: Host address width 39
[    0.001000] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.001000] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 7e3ff0505e
[    0.001000] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.001000] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.001000] DMAR: RMRR base: 0x000000b1856000 end: 0x000000b1875fff
[    0.001000] DMAR: RMRR base: 0x000000b3800000 end: 0x000000b7ffffff
[    0.001000] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.001000] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.001000] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.002000] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.544647] DMAR: No ATSR found
[    0.544681] DMAR: dmar0: Using Queued invalidation
[    0.544685] DMAR: dmar1: Using Queued invalidation
[    0.544935] DMAR: Hardware identity mapping for device 0000:00:00.0
[    0.544938] DMAR: Hardware identity mapping for device 0000:00:01.0
[    0.544942] DMAR: Hardware identity mapping for device 0000:00:02.0
[    0.544945] DMAR: Hardware identity mapping for device 0000:00:14.0
[    0.544947] DMAR: Hardware identity mapping for device 0000:00:16.0
[    0.544949] DMAR: Hardware identity mapping for device 0000:00:17.0
[    0.544951] DMAR: Hardware identity mapping for device 0000:00:1b.0
[    0.544952] DMAR: Hardware identity mapping for device 0000:00:1b.4
[    0.544955] DMAR: Hardware identity mapping for device 0000:00:1c.0
[    0.544956] DMAR: Hardware identity mapping for device 0000:00:1d.0
[    0.544959] DMAR: Hardware identity mapping for device 0000:00:1f.0
[    0.544961] DMAR: Hardware identity mapping for device 0000:00:1f.2
[    0.544963] DMAR: Hardware identity mapping for device 0000:00:1f.3
[    0.544965] DMAR: Hardware identity mapping for device 0000:00:1f.4
[    0.544967] DMAR: Hardware identity mapping for device 0000:00:1f.6
[    0.544970] DMAR: Hardware identity mapping for device 0000:01:00.0
[    0.544972] DMAR: Hardware identity mapping for device 0000:01:00.1
[    0.544975] DMAR: Hardware identity mapping for device 0000:03:00.0
[    0.544977] DMAR: Hardware identity mapping for device 0000:03:00.1
[    0.544980] DMAR: Hardware identity mapping for device 0000:03:00.2
[    0.544982] DMAR: Hardware identity mapping for device 0000:03:00.3
[    0.544983] DMAR: Setting RMRR:
[    0.544985] DMAR: Ignoring identity map for HW passthrough device 0000:00:02.0 [0xb3800000 - 0xb7ffffff]
[    0.544987] DMAR: Ignoring identity map for HW passthrough device 0000:00:14.0 [0xb1856000 - 0xb1875fff]
[    0.544989] DMAR: Prepare 0-16MiB unity mapping for LPC
[    0.544991] DMAR: Ignoring identity map for HW passthrough device 0000:00:1f.0 [0x0 - 0xffffff]
[    0.545014] DMAR: Intel(R) Virtualization Technology for Directed I/O
[   40.411872] DMAR: 64bit 0000:03:11.1 uses identity mapping
[   40.587790] DMAR: 64bit 0000:03:12.1 uses identity mapping
[   40.976340] DMAR: 64bit 0000:03:10.0 uses identity mapping
[   41.000373] DMAR: 64bit 0000:03:10.1 uses identity mapping
[   41.188460] DMAR: 64bit 0000:03:10.4 uses identity mapping
[   41.378421] DMAR: 64bit 0000:03:12.5 uses identity mapping
[   41.576544] DMAR: 64bit 0000:03:11.5 uses identity mapping

3. Actually I'm working from the newer Arch guide. Sure, it's not CentOS, but the CentOS guide doesn't know from OVMF. It's for CentOS v5.

Groups:

Code: Select all

# iommuTOpci
...
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
IOMMU Group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
...

Code: Select all

find /sys/kernel/iommu_groups/ -type l
...
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
...


- Most motherboards have PCIe slots provided by both the CPU and the PCH. Depending on the CPU, it is possible that the processor-based PCIe slot does not support isolation properly, in which case the PCI slot itself will appear to be grouped with the device that is connected to it. This is fine so long as only your guest GPU is included in here, but additional devices within the same group must be passed through too.
- The device and all those sharing the same IOMMU group must have their driver replaced by a stub driver or a VFIO driver in order to prevent the host machine from interacting with them.
- Due to their size and complexity, GPU drivers do not tend to support dynamic rebinding very well, so bind those placeholder drivers manually before starting the VM.
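
Since everything in a device's IOMMU group has to be accounted for, it helps to list the group's other members directly. A minimal sketch, assuming the standard sysfs layout in which /sys/bus/pci/devices/<address>/iommu_group points at the group directory; the function name group_peers and the optional sysfs-root argument are hypothetical.

```shell
# Hypothetical helper: print the other devices that share an IOMMU group
# with the given PCI device.
#   $1  full PCI address, e.g. 0000:01:00.0
#   $2  optional sysfs root (defaults to /sys)
group_peers() {
    root="${2:-/sys}"
    for dev in "$root/bus/pci/devices/$1/iommu_group/devices"/*; do
        # Skip the unexpanded pattern if the group directory is missing.
        [ -e "$dev" ] || continue
        name=$(basename "$dev")
        [ "$name" = "$1" ] || printf '%s\n' "$name"
    done
}

# Intended use (not run here):
#   group_peers 0000:01:00.0
```

On the system described here, this would be expected to list the PCI bridge and the GPU's audio function for the GPU at 0000:01:00.0.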

Code: Select all

# modinfo vfio-pci
filename:       /lib/modules/4.16.7-1.el7.elrepo.x86_64/kernel/drivers/vfio/pci/vfio-pci.ko
description:    VFIO PCI - User Level meta-driver
author:         Alex Williamson <alex.williamson@redhat.com>
license:        GPL v2
version:        0.2
srcversion:     285B406AFDCA2E25E1FFCE6
depends:        vfio,irqbypass,vfio_virqfd
retpoline:      Y
intree:         Y
name:           vfio_pci
vermagic:       4.16.7-1.el7.elrepo.x86_64 SMP mod_unload modversions
parm:           ids:Initial PCI IDs to add to the vfio driver, format is "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]" and multiple comma separated entries can be specified (string)
parm:           nointxmask:Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag. (bool)
parm:           disable_idle_d3:Disable using the PCI D3 low power state for idle, unused devices (bool)


/etc/modprobe.d/vfio.conf

Code: Select all

options vfio-pci ids=10de:1c02,10de:10f1

(This is taken up by dracut)
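
The ids= value can be derived from the group listing rather than typed by hand. A minimal sketch, assuming input lines in the `lspci -nn` style shown earlier (one device per line, with the [vendor:device] pair last among the bracketed fields); the function name vfio_ids is hypothetical. It drops PCI bridges (class 0604), which stay with the host.

```shell
# Hypothetical helper: build the comma-separated "ids=" value for vfio-pci
# from `lspci -nn` style lines on stdin, skipping PCI bridges (class 0604).
# Do not feed it `lspci -k` output: the Subsystem lines also carry an
# [xxxx:xxxx] pair and would be picked up as well.
vfio_ids() {
    grep -v '\[0604\]' |
        sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' |
        paste -sd, -
}

# Intended use (not run here):
#   lspci -nn -s 01:00 | vfio_ids
```

The result can then be pasted after `options vfio-pci ids=` in the modprobe.d file.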

- If a PCI root port (bridge) is part of your IOMMU group, you must not pass its ID to vfio-pci, as it needs to remain attached to the host to function properly. Any other device within that group, however, should be left for vfio-pci to bind with.
- This does not guarantee that vfio-pci will be loaded before other graphics drivers, though. To ensure that, force the modules to load early, in this order, and ahead of any video drivers loaded this way:
/etc/dracut.conf.d/vfio.conf

Code: Select all

force_drivers+=" vfio_pci vfio vfio_iommu_type1 vfio_virqfd "
# dracut -f --regenerate-all

And reboot.

Is vfio-pci loaded properly and bound to the right devices?

Code: Select all

# dmesg | grep -i vfio
#


Code: Select all

# lspci -nnk -d 10de:1c02
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device [1043:85b9]
        Kernel driver in use: nouveau
        Kernel modules: nouveau

# lspci -nnk -d 10de:10f1
01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device [1043:85b9]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel


- Ensure that the following are installed: qemu-system-x86.x86_64, libvirt.x86_64, OVMF, and virt-manager
- Add the path to the OVMF firmware image and runtime variables template to the libvirt config so virt-install or virt-manager can find them:
/etc/libvirt/qemu.conf

Code: Select all

nvram = [
        "/usr/share/OVMF/OVMF_CODE.secboot.fd:/usr/share/OVMF/OVMF_VARS.fd"
]

(File confirmed)
# systemctl status libvirtd
active (running)
# systemctl status virtlogd
active (running)


Code: Select all

# lsinitrd
...
-rw-r--r--   1 root     root           41 May 12 14:23 etc/modprobe.d/carls-vfio.conf
...
# nano dracut.conf.d/carls-vfio.conf
# lsinitrd |grep vfio_pci
# lsinitrd |grep vfio
-rw-r--r--   1 root     root           41 May 12 14:23 etc/modprobe.d/vfio.conf
# lsinitrd |grep vfio_iommu_type1
# lsinitrd |grep vfio_virqfd


Well for the love of Saint Peter, dracut isn't picking up my drivers!

Is this another ridiculously common case where I properly put a modification in the *.d directory and it gets ignored?
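
One thing worth ruling out is the location of the config file itself. According to the dracut.conf man page, dracut reads /etc/dracut.conf and /etc/dracut.conf.d/*.conf; a file under /etc/dracut.d (the path given in the first post, which may only be a typo) or one without a .conf suffix would not be picked up. A minimal sketch that prints every driver setting dracut would see under /etc follows. The function name dracut_driver_settings and the optional directory argument are hypothetical.

```shell
# Hypothetical helper: print each add_drivers / force_drivers line found in
# the config files dracut reads from /etc, prefixed with the file name.
#   $1  optional replacement for /etc (useful for trying this on a copy)
dracut_driver_settings() {
    etc="${1:-/etc}"
    for f in "$etc/dracut.conf" "$etc"/dracut.conf.d/*.conf; do
        [ -f "$f" ] || continue
        # A file without a match is not an error.
        grep -HE '^[[:space:]]*(add_drivers|force_drivers)\+?=' "$f" || true
    done
}

# Intended use (not run here):
#   dracut_driver_settings
```

If this prints nothing, dracut has no driver settings to act on, whatever else is on disk.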