CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Issues related to configuring your network
paix
Posts: 6
Joined: 2009/06/19 13:16:13

Post by paix » 2009/06/19 13:46:14

Hi all.

I have a strange problem with the network card.
Under high network traffic the NIC freezes the server.
You can see a screenshot from the IP KVM here: http://paix.org.ua/tmp/panik_260509.jpg

I am running an OpenVZ kernel, which is based on the current RHEL5 kernel.

[code]
# uname -a
Linux domain 2.6.18-128.1.1.el5.028stab062.3 #1 SMP Sun May 10 18:54:51 MSD 2009 x86_64 x86_64 x86_64 GNU/Linux
[/code]

from dmesg:
[code]
00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
[/code]

[code]
# lspci
00:00.0 RAM memory: nVidia Corporation MCP67 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP67 ISA Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP67 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2)
00:04.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2)
00:04.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation MCP67 IDE Controller (rev a1)
00:07.0 Audio device: nVidia Corporation MCP67 High Definition Audio (rev a1)
00:08.0 PCI bridge: nVidia Corporation MCP67 PCI Bridge (rev a2)
00:09.0 IDE interface: nVidia Corporation MCP67 AHCI Controller (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
00:0b.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0c.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0d.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:10.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:11.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:12.0 VGA compatible controller: nVidia Corporation GeForce 7050 PV / nForce 630a (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
[/code]

Recently I got two panics while testing the network with iperf.

kernel booted with
[code]
irqpoll nousb noapic
[/code]

screen from ipkvm: http://paix.org.ua/tmp/panic_190609.jpg

kernel booted with
[code]
nousb noapic
[/code]
and

[code]
alias eth0 forcedeth
options forcedeth optimization_mode=1
[/code]

screen from ipkvm: http://paix.org.ua/tmp/panic2_190609.jpg

There is also one interesting oops in log/messages:

[code]
kernel: skb_over_panic: text:ffffffff881bf46f len:15398 put:15398 head:ffff8100a25c5800 data:ffff8100a25c5810 tail:ffff8100a25c9436 end:ffff8100a25c5e80 dev:eth0
kernel: ----------- [cut here ] --------- [please bite here ] ---------
kernel: Kernel BUG at net/core/skbuff.c:96
[/code]
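
The numbers in the skb_over_panic line can be checked with plain shell arithmetic. This is a minimal sketch that uses only the addresses printed above, with the shared ffff8100 prefix dropped so the values stay small; the helper name skb_overrun is purely illustrative.

```shell
#!/bin/sh
# Values copied from the skb_over_panic line, minus the shared ffff8100 prefix.
data=0xa25c5810
tail=0xa25c9436
end=0xa25c5e80

# skb_overrun is a hypothetical helper: bytes written past the buffer end.
skb_overrun() {
    echo $(( $1 - $2 ))
}

room=$(( end - data ))              # space available after the data pointer
over=$(skb_overrun "$tail" "$end")  # how far the tail ran past the end
echo "room=$room overrun=$over total=$(( room + over ))"
```

This prints room=1648 overrun=13750 total=15398, which matches len:15398 in the log: a 15398-byte frame was put into a buffer with only 1648 bytes of room. That reading is an interpretation of the log line, not a diagnosis of the driver.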

I also booted the kernel with the crashkernel=128M@16M option and have kdump running, but unfortunately no core was saved. I couldn't reboot the server via the IP KVM either, so I asked support to do a hardware reboot.
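
When a crash dump does not appear, one cheap first check is whether the running kernel really received the crashkernel= parameter. A minimal sketch, assuming the command line is read from /proc/cmdline; has_boot_param is a hypothetical helper name, not part of kdump.

```shell
#!/bin/sh
# has_boot_param is a hypothetical helper: succeeds if the given kernel
# command line contains the named parameter, either bare or as name=value.
has_boot_param() {
    cmdline=$1
    name=$2
    for word in $cmdline; do
        case $word in
            "$name"|"$name"=*) return 0 ;;
        esac
    done
    return 1
}

# Usage on a live system (reads the running kernel's command line):
#   has_boot_param "$(cat /proc/cmdline)" crashkernel && echo "crashkernel set"
```

This only confirms that the parameter was passed; it says nothing about whether the memory reservation succeeded or whether the capture kernel was loaded.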

dmesg:
http://paix.org.ua/tmp/dmesg_190609.txt

PS. I understand that this driver ("forcedeth.c: Reverse Engineered nForce ethernet driver") is very unstable and unpredictable, but I have the same kind of server running FreeBSD 7, and there the NIC works normally (with the nfe driver). Also, this server is in a 1U case, so at the moment I can't switch to a normal external NIC...

Thanks in advance

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by AlanBartlett » 2009/06/19 14:24:49

[quote]I am running an OpenVZ kernel[/quote]
That is unfortunate. If you were running the distributed kernel-2.6.18-128.1.14.el5 I would suggest that you install the kABI tracking [url=http://elrepo.org/tiki/kmod-forcedeth]kmod-forcedeth[/url] package from the [url=//elrepo.org]ELRepo[/url] repository which should then resolve your problem.

As you are using a modified kernel that does not preserve [i]Red Hat[/i]'s established kABI, this solution is not open to you.

paix
Posts: 6
Joined: 2009/06/19 13:16:13

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by paix » 2009/06/19 15:17:40

[quote]
AlanJBartlett wrote:
[quote]I am running an OpenVZ kernel[/quote]
That is unfortunate. If you were running the distributed kernel-2.6.18-128.1.14.el5 I would suggest that you install the kABI tracking [url=http://elrepo.org/tiki/kmod-forcedeth]kmod-forcedeth[/url] package from the [url=//elrepo.org]ELRepo[/url] repository which should then resolve your problem.
As you are using a modified kernel that does not preserve [i]Red Hat[/i]'s established kABI, this solution is not open to you.[/quote]

Thanks for the reply.
I have downloaded the src.rpm of kmod-forcedeth and rebuilt it with mock for my kernel version.
The procedure is quite simple:
[code]
mock -r centos-5-x86_64 --no-clean --rebuild forcedeth-kmod-0.62-1.25.1.el5.elrepo.src.rpm
[/code]

and now I have
http://paix.org.ua/tmp/kmod-forcedeth-0.62-1.25.1.el5.elrepo.x86_64.rpm

I'm wondering what I should do to activate this module.
Is it activated automatically after installation?

And what is the procedure for updating?
I.e. after installing a new OpenVZ kernel, should I also rebuild forcedeth-kmod and only then reboot the server?

Thanks!

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by AlanBartlett » 2009/06/19 15:43:30

[quote]I have downloaded the src.rpm of kmod-forcedeth and rebuilt it with mock for my kernel version.[/quote]
Very resourceful. :-D

[quote]
I'm wondering what I should do to activate this module.
Is it activated automatically after installation?

And what is the procedure for updating?
I.e. after installing a new OpenVZ kernel, should I also rebuild forcedeth-kmod and only then reboot the server?
[/quote]
After installing it ([b]rpm -ivh kmod-forcedeth-*.rpm[/b]), check that the driver module is present under your [b]/lib/modules/`uname -r`/extra/[/b] directory and that a [b]/etc/depmod.d/forcedeth.conf[/b] file now exists.

In general, you should stop the network service ([b]service network stop[/b]), unload the older version of the driver from the kernel ([b]modprobe -r forcedeth[/b]) and then restart the network ([b]service network start[/b]). Often, people prefer to take the simpler option of rebooting the system. ;-)

You will need to manually update -- i.e. do all of the above -- for each and every new kernel that is installed. That is [i]unless[/i] those [i]openvz[/i] kernels do retain their own ABI consistency. (You could check for that by executing a [b]find /lib/modules -name forcedeth.ko[/b] command after a new kernel has been installed. If there is a symbolic link from the [i]new[/i] kernel's [i]weak-updates/[/i] directory to the [i]extra/[/i] directory of this, your current kernel, it would appear that the [i]openvz[/i] kernels do retain some degree of ABI consistency.)
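
The find check described above can be wrapped so that symbolic links stand out. This is a minimal sketch only: list_forcedeth_modules is a hypothetical helper name, and the default path assumes the standard /lib/modules layout.

```shell
#!/bin/sh
# list_forcedeth_modules is a hypothetical helper. It prints every
# forcedeth.ko under a modules tree and marks symbolic links, which is how a
# weak-updates entry pointing at another kernel's extra/ directory shows up.
list_forcedeth_modules() {
    root=${1:-/lib/modules}
    find "$root" -name forcedeth.ko | sort | while read -r path; do
        if [ -L "$path" ]; then
            echo "link $path -> $(readlink "$path")"
        else
            echo "file $path"
        fi
    done
}

# Usage on a live system:
#   list_forcedeth_modules
```

A "link" line under the new kernel's weak-updates/ directory would suggest the module was carried over; only "file" lines under older kernels would suggest a rebuild is still needed.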

paix
Posts: 6
Joined: 2009/06/19 13:16:13

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by paix » 2009/06/19 16:07:52

Alan, thanks for the help.

[code]
storm ~ # rpm -ql kmod-forcedeth
/lib/modules/2.6.18-128.1.1.el5.028stab062.3
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra/forcedeth
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra/forcedeth/forcedeth.ko
[/code]

[code]
storm ~ # cat /etc/depmod.d/forcedeth.conf
override forcedeth * weak-updates/forcedeth
[/code]

[code]
storm ~ # modprobe -r forcedeth; service network restart
Shutting down interface venet0: Shutting down interface venet0:
[ OK ]
Shutting down loopback interface: [ OK ]
Disabling IPv4 packet forwarding: net.ipv4.ip_forward = 0
[ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]
Bringing up interface venet0: Bringing up interface venet0:
Configuring interface venet0:
net.ipv4.conf.venet0.send_redirects = 0
[ OK ]
[/code]

[code]
#last dmesg:
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:0a.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
[/code]

So, it looks like all is fine :-)

Also I have
[code]
storm ~ # cat /etc/modprobe.conf
alias eth0 forcedeth
options forcedeth optimization_mode=1
[/code]

What are your suggestions regarding the forcedeth optimization_mode?

I'm planning to do a stress test on Monday.
It's Friday evening here (in central Europe), so it's time to drink beer ;-)
Thanks a lot!

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by AlanBartlett » 2009/06/19 16:59:20

[quote]Thanks a lot![/quote]
You're welcome. :-)

[quote]It's Friday evening here (in central Europe), so it's time to drink beer ;-)[/quote]
What a good idea! :pint:

[quote]
What are your suggestions regarding the forcedeth optimization_mode?
[/quote]
We, the ELRepo Admin team, did not write the actual driver code -- we have just packaged the most recent version that is available.

As I do not have a system with such a NIC, I've never used, let alone experimented, with that driver.

Let's have a look at the configurable parameters --

[code]
$ modinfo -F parm forcedeth.ko
tagging_8021pq:802.1pq tagging is enabled by setting to 1 and disabled by setting to 0.
wol:Wake-On-Lan is enabled by setting to 1 and disabled by setting to 0.
dma_64bit:High DMA is enabled by setting to 1 and disabled by setting to 0.
rx_flow_control:Rx flow control is enabled by setting to 1 and disabled by setting to 0.
tx_flow_control:Tx flow control is enabled by setting to 1 and disabled by setting to 0.
rx_ring_size:Rx ring size. Maximum value of 1024 or 16384 depending on hardware.
tx_ring_size:Tx ring size. Maximum value of 1024 or 16384 depending on hardware.
rx_checksum_offload:Rx checksum offload is enabled by setting to 1 and disabled by setting to 0.
tx_checksum_offload:Tx checksum offload is enabled by setting to 1 and disabled by setting to 0.
mtu:MTU value. Maximum value of 1500 or 9100 depending on hardware.
tso_offload:TCP Segmentation offload is enabled by setting to 1 and disabled by setting to 0.
scatter_gather:Scatter gather is enabled by setting to 1 and disabled by setting to 0.
autoneg:PHY autonegotiate is enabled by setting to 1 and disabled by setting to 0.
speed_duplex:PHY speed and duplex settings. Auto = 0, 10mbps half = 1, 10mbps full = 2, 100mbps half = 3, 100mbps full = 4, 1000mbps full = 5.
msix:MSIX interrupts are enabled by setting to 1 and disabled by setting to 0.
msi:MSI interrupts are enabled by setting to 1 and disabled by setting to 0.
poll_interval:Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535.
optimization_mode:In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer.
max_interrupt_work:forcedeth maximum events handled per interrupt
lowpowerspeed:Low Power State Link Speed enable by setting to 1 and disabled by setting to 0
[/code]
Quite a few to experiment with. :-?
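
Before experimenting, it may help to see which values the loaded driver is actually using. A minimal sketch, assuming the driver exports its parameters under /sys/module/forcedeth/parameters (that depends on how the module declares them, so the directory may be missing or incomplete); show_module_params is a hypothetical helper name.

```shell
#!/bin/sh
# show_module_params is a hypothetical helper: print name=value for each
# readable parameter file in a directory such as
# /sys/module/forcedeth/parameters (assumed location; it only exists if the
# loaded driver exports its parameters through sysfs).
show_module_params() {
    dir=${1:-/sys/module/forcedeth/parameters}
    [ -d "$dir" ] || { echo "no parameter directory: $dir" >&2; return 1; }
    for f in "$dir"/*; do
        [ -r "$f" ] && echo "$(basename "$f")=$(cat "$f")"
    done
    return 0
}

# Usage on a live system:
#   show_module_params
```

Comparing this output before and after editing the options line in /etc/modprobe.conf and reloading the module is one way to confirm that a setting took effect.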

[quote]I'm planning to do a stress test on Monday.[/quote]
Good luck.

paix
Posts: 6
Joined: 2009/06/19 13:16:13

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by paix » 2009/06/22 11:02:47

This driver (0.62-Driver) looks better than CentOS's default forcedeth,
but I got a panic all the same. The server froze completely.

The screenshot from the IP KVM is here: http://paix.org.ua/tmp/panic_220609.jpg
Unfortunately, it doesn't contain any useful information for identifying the problem.

I stressed the NIC with the iperf package (from EPEL. Description: Iperf is a tool to measure maximum TCP bandwidth).
The tests were run in both directions. On the fourth test the NIC froze the server.
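
A repeatable version of such a stress run could look like the sketch below. It assumes iperf 2 on both ends with "iperf -s" already running on the peer. The helper name run_iperf_rounds, the default of four rounds, and the 60-second duration are illustrative choices, not the settings used in the tests above.

```shell
#!/bin/sh
# run_iperf_rounds is a hypothetical helper: run N bidirectional iperf tests
# against a host that is already running "iperf -s". IPERF can be overridden
# (for example IPERF=echo) to print the commands instead of running them.
run_iperf_rounds() {
    host=$1
    rounds=${2:-4}
    secs=${3:-60}
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "round $i of $rounds"
        # -c: client mode, -d: test both directions at once, -t: duration
        ${IPERF:-iperf} -c "$host" -d -t "$secs" || return 1
        i=$(( i + 1 ))
    done
}

# Usage (192.0.2.10 is a placeholder address):
#   run_iperf_rounds 192.0.2.10 4 60
```

Printing the round number first means the last line seen on the console or IP KVM shows how far the run got before a freeze.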

The kernel was booted with

[code] kernel /vmlinuz-2.6.18-128.1.1.el5.028stab062.3 ro root=/dev/VolGroupSys/LogVolRoot crashkernel=128M@16M nousb noapic debug=2[/code]

and
[code]options forcedeth optimization_mode=1[/code]


Looks like I should buy a new case and an Intel NIC.....
I'm very tired of debugging a working server with this nForce NIC. It was a big mistake to build a server on nVidia hardware :-?

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by AlanBartlett » 2009/06/22 14:30:34

I had a suspicion that even with the latest available version of the [i]forcedeth[/i] driver, you would probably not be satisfied with the NIC's performance.

[quote]Looks like I should buy a new case and an Intel NIC.....[/quote]
In all seriousness, that would be the best solution.

[quote]I'm very tired of debugging a working server with this nForce NIC. It was a big mistake to build a server on nVidia hardware :-? [/quote]
I would agree with your latter statement. The [i]nVidia[/i] GPUs and their proprietary driver software are acceptable but the other [i]nVidia[/i] chip-sets leave a lot to be desired.

paix
Posts: 6
Joined: 2009/06/19 13:16:13

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by paix » 2009/08/26 12:57:00

Just in case it may be of interest.

The driver from ELRepo works normally.
The system froze under stress tests, but under normal load it works.

Updating the driver for a custom kernel (an OpenVZ kernel in my case):

[code]
# uname -r
2.6.18-128.1.1.el5.028stab062.3

yum update
shutdown -r now
[/code]

The system boots and runs with the standard forcedeth driver:
[code]forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.[/code]

Then log in to the system and rebuild the forcedeth driver:
[code]
# uname -r
2.6.18-128.2.1.el5.028stab064.4
[/code]

I've written a simple script for updating:

[code]
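#!/bin/sh
# Update script for the forcedeth (nForce) driver on a custom CentOS kernel.
# Downloads the forcedeth SRPM from ELRepo and rebuilds it with mock.

drv="forcedeth"
url="http://elrepo.org/linux/elrepo/el5/SRPMS/"

# Pick the SRPM file name from the directory listing (first matching entry).
drvver=$(curl -s "$url" | grep "$drv" | awk -F'"' '{print $2}' | head -n 1)
echo "$drvver"

# Stop if the listing could not be fetched or parsed.
[ -n "$drvver" ] || { echo "no $drv SRPM found at $url" >&2; exit 1; }

# Download the SRPM only if it is not already present.
if [ ! -f "$drvver" ]; then
    wget "$url$drvver"
    echo
fi

#mock -r centos-5-x86_64 --no-clean --rebuild "$drvver"
mock -r centos-5-x86_64 --debug --no-clean --rebuild "$drvver"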

This script downloads the latest forcedeth driver from ELRepo and rebuilds it.

Installing the driver:
[code]
rpm -Uvh /vz/root/1000/var/lib/mock/centos-5-x86_64/result/kmod-forcedeth-0.62-1.25.2.el5.elrepo.x86_64.rpm
[/code]
Note: I built the driver in a VE (Virtual Environment), but installed it on the HN (Hardware Node).

Load the new driver and restart the network:
[code]
modprobe -r forcedeth; service network restart
[/code]


dmesg:
[code]
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:0a.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[/code]
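
To confirm after a reload that the rebuilt driver (and not the stock 0.60 one) is bound, the version can be pulled out of the banner line shown above. A minimal sketch; forcedeth_version is a hypothetical helper name, and it relies on the banner wording staying exactly as printed in this thread.

```shell
#!/bin/sh
# forcedeth_version is a hypothetical helper: read kernel log text on stdin
# and print the version from the last forcedeth banner line found, with the
# trailing full stop removed.
forcedeth_version() {
    sed -n 's/^.*forcedeth\.c: Reverse Engineered nForce ethernet driver\. Version \(.*[^.]\)\.*$/\1/p' | tail -n 1
}

# Usage on a live system:
#   dmesg | forcedeth_version
```

Taking the last banner matters because the stock driver's banner from boot is still in the log after the new module has been loaded.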


PS. Thanks to Alan and the ELRepo team!

paix
Posts: 6
Joined: 2009/06/19 13:16:13

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Post by paix » 2009/08/26 13:00:23

[b]PS. NEVER use nVidia NICs on production servers.[/b]
