Trouble updating igb driver for diskless clients

Post by rleto » 2013/08/15 21:59:44

We have several compute clusters used for CFD and are using CentOS 5.9 on a new build.

2.6.18-348.6.1.el5 x86_64 x86_64
centos-release-5-9.el5.centos.1
CentOS release 5.9 (Final)

While troubleshooting some TCP errors from OpenMPI, I wanted to test updating the igb driver.
The originally installed version of the driver is:

[root@hill2 ~]# ethtool -i eth0
driver: igb
version: 4.0.1-k1-1
firmware-version: 0.1470, 0x06b28000
bus-info: 0000:03:00.0

I installed the latest kmod-igb via yum on the head node:

[root@hill ~]# ethtool -i eth0
driver: igb
version: 4.3.0
firmware-version: 0.93, 0x800006b2
bus-info: 0000:03:00.0

FYI:
[root@hill ~]# lspci -nn |grep Ethernet
03:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
03:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)

Our diskless setup was done as described in the wiki http://wiki.centos.org/HowTos/DisklessClients

I used yumdownloader to get the kmod-igb rpm and tried installing it inside a chroot at /diskless/root/

This method produced a grub-related error, but igb.ko was copied to /lib/modules/2.6.18-348.el5/extra/igb/igb.ko
and the soft link to it was created in /lib/modules/2.6.18-348.6.1.el5/weak-updates/igb
The kmod-igb.conf file was also placed in /etc/depmod.d/
Thinking the soft link might need to be relative, I changed it to ../../../2.6.18-348.el5/extra/igb/igb.ko just to be sure (even though I know the nodes see their root at /diskless/root, as exported over NFS).
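
For reference, the symlink question above can be checked from the head node without booting a client. The following is only a sketch: it assumes the link is named igb.ko inside the weak-updates/igb directory, and check_weak_link is a hypothetical helper name, not part of any package. It reads the link target and tests whether it exists once the client's / is mapped to the NFS root.

```shell
#!/bin/sh
# Sketch: check how the weak-updates igb symlink resolves inside an NFS root.
# Assumption: the link is <root>/lib/modules/<kver>/weak-updates/igb/igb.ko.
# Nothing here modifies the tree.

check_weak_link() {
    root=$1    # NFS root as seen on the head node, e.g. /diskless/root
    kver=$2    # kernel version the clients boot, e.g. 2.6.18-348.6.1.el5
    link="$root/lib/modules/$kver/weak-updates/igb/igb.ko"

    if [ ! -L "$link" ]; then
        echo "missing: $link is not a symlink"
        return 1
    fi

    target=$(readlink "$link")
    case $target in
        # Absolute target: the client resolves it against its own /,
        # which is $root when viewed from the head node.
        /*) resolved="$root$target" ;;
        # Relative target: resolved against the directory holding the link.
        *)  resolved="$(dirname "$link")/$target" ;;
    esac

    if [ -f "$resolved" ]; then
        echo "ok: $target"
    else
        echo "dangling: $target"
        return 1
    fi
}
```

It could be called as check_weak_link /diskless/root 2.6.18-348.6.1.el5. Either an absolute or a relative link should report ok as long as the target file exists under the exported root.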

Anyway, the nodes don't pick up this version of the driver after installing it there.
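
One thing that can be inspected here is which igb.ko the module dependency file for the client kernel points at. This is a rough sketch, assuming modules.dep under the NFS root lists one module path per line followed by a colon; dep_entries is a hypothetical helper name. If the grub error interrupted the rpm scriptlets, the file may not have been regenerated for 2.6.18-348.6.1.el5, in which case running depmod for that kernel version inside the chroot would be the thing to try.

```shell
#!/bin/sh
# Sketch: show which path modules.dep records for a given module.
# Assumption: modules.dep lines start with the module path followed by ':'.

dep_entries() {
    root=$1    # NFS root, e.g. /diskless/root
    kver=$2    # kernel version the clients boot
    mod=$3     # module name without the .ko suffix, e.g. igb
    dep="$root/lib/modules/$kver/modules.dep"

    if [ ! -r "$dep" ]; then
        echo "no modules.dep for $kver under $root"
        return 1
    fi

    # Print every entry for the module; a weak-updates path here suggests
    # the kmod version is the one modprobe would load.
    grep "/$mod\.ko:" "$dep"
}

# Possible follow-up on the head node (not run by this sketch):
#   chroot /diskless/root depmod -a 2.6.18-348.6.1.el5
```

If dep_entries /diskless/root 2.6.18-348.6.1.el5 igb prints only a kernel/drivers/... path, the stock driver is still the one registered for that kernel.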

It occurred to me that maybe I needed to create a new initrd.img to place in /tftpboot/linux-install/

So I used the system-config-netboot GUI and made a new OSName from the 2.6.18-348.6.1.el5 kernel.

If I try to PXE boot a node from that image now, it does not even load the igb driver, so eth0 is not found... kernel panic.
It finds the correct initrd.img and starts the boot, but without the eth0 interface DHCP fails and the kernel can't sync.
I next thought that, because of the way kmod-igb uses ../weak-updates/, ../extra/ and linking, I might need to use just the 2.6.18-348.el5 kernel.
Same result: no igb driver loaded.
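
A quick way to see whether a generated image contains the driver at all is to list its contents. This is a minimal sketch, assuming the image is a gzip-compressed cpio archive (which is what mkinitrd normally produces on EL5) and that cpio is installed; initrd_list_module is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list entries in an initrd image whose path contains a module name.
# Assumption: the image is a gzip-compressed cpio archive.

initrd_list_module() {
    img=$1     # path to the initrd image
    name=$2    # string to look for, e.g. igb

    # gzip -dc decompresses to stdout, cpio -it lists the archive without
    # extracting it, and grep sets the exit status (non-zero if not found).
    gzip -dc "$img" | cpio -it 2>/dev/null | grep "$name"
}
```

Running it against the new image and the original working one, with igb as the second argument, would show whether igb.ko went missing when the image was rebuilt. No output and a non-zero exit status means nothing matching was found in the archive.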

Two real questions:
1. What would be the best way to update the igb driver for the diskless clients? (Did I miss something simple about how to install at /diskless/root/?)
2. Why would the new diskless initrd.img I created not work?

As I mentioned, the kmod-igb install on the head node works fine and the new driver is the one loaded.

I was trying to avoid manually unpacking my original initrd.img and directly replacing the igb.ko in there.

I am an aero engineer way over my head in the IT department, but since we are a small company I try to do most of this myself (our clusters have mostly hummed along nicely crunching numbers for the last 4+ years)... I rely heavily on reading this forum and the wiki(!)... maybe one day we'll grow to the point where I can hire an IT guy or gal (I hope!)

Thanks for any and all suggestions, and apologies up front if I've left out any relevant info.
