CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)


Postby paix » 2009/06/19 13:46:14

Hi all.

I have a strange problem with the network card.
Under high network traffic the NIC freezes the server.
You can see a screenshot from the IP-KVM here: http://paix.org.ua/tmp/panik_260509.jpg

I am running an OpenVZ kernel, which is based on the current RHEL5 kernel.

Code: Select all
#uname -a
Linux domain 2.6.18-128.1.1.el5.028stab062.3 #1 SMP Sun May 10 18:54:51 MSD 2009 x86_64 x86_64 x86_64 GNU/Linux


From dmesg:
Code: Select all
00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0


Code: Select all
# lspci
00:00.0 RAM memory: nVidia Corporation MCP67 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP67 ISA Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP67 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2)
00:04.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2)
00:04.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation MCP67 IDE Controller (rev a1)
00:07.0 Audio device: nVidia Corporation MCP67 High Definition Audio (rev a1)
00:08.0 PCI bridge: nVidia Corporation MCP67 PCI Bridge (rev a2)
00:09.0 IDE interface: nVidia Corporation MCP67 AHCI Controller (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
00:0b.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0c.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0d.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:10.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:11.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:12.0 VGA compatible controller: nVidia Corporation GeForce 7050 PV / nForce 630a (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control


Recently I got two panics while testing the network with iperf.

The kernel was booted with
Code: Select all
irqpoll nousb noapic


Screenshot from the IP-KVM: http://paix.org.ua/tmp/panic_190609.jpg

The kernel was booted with
Code: Select all
nousb noapic

and

Code: Select all
alias eth0 forcedeth
options forcedeth optimization_mode=1


Screenshot from the IP-KVM: http://paix.org.ua/tmp/panic2_190609.jpg

There is also one interesting oops in /var/log/messages:

Code: Select all
kernel: skb_over_panic: text:ffffffff881bf46f len:15398 put:15398 head:ffff8100a25c5800 data:ffff8100a25c5810 tail:ffff8100a25c9436 end:ffff8100a25c5e80 dev:eth0
kernel: ----------- [cut here ] --------- [please bite here ] ---------
kernel: Kernel BUG at net/core/skbuff.c:96


The kernel is also booted with the crashkernel=128M@16M option and kdump is running, but unfortunately no core was saved. I couldn't reboot the server via the IP-KVM either, so I asked support to hard-reboot the server.
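For reference, kdump only captures a vmcore if /etc/kdump.conf points at a writable target and the kdump service is enabled. A minimal local-disk configuration might look like this (a sketch with example values, not the configuration actually used on this server):

```
# /etc/kdump.conf -- minimal local-disk example (values are illustrative)
path /var/crash
core_collector makedumpfile -c
```

After changing it, run service kdump restart so the kdump initrd can be rebuilt.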

dmesg:
http://paix.org.ua/tmp/dmesg_190609.txt

PS. I understand that this driver ("forcedeth.c: Reverse Engineered nForce ethernet driver") is very unstable and unpredictable, but I have the same server under FreeBSD 7, and the NIC on FreeBSD 7 (with the nfe driver) works normally. Also, this server has a 1U case, so at the moment I can't switch to a separate add-in NIC...

Thanks in advance
paix
 
Posts: 6
Joined: 2009/06/19 13:16:13

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby AlanBartlett » 2009/06/19 14:24:49

I am running an OpenVZ kernel

That is unfortunate. If you were running the distributed kernel-2.6.18-128.1.14.el5 I would suggest that you install the kABI tracking kmod-forcedeth package from the [url=//elrepo.org]ELRepo[/url] repository which should then resolve your problem.

As you are using a modified kernel that does not preserve Red Hat's established kABI, this solution is not open to you.
AlanBartlett
Forum Moderator
 
Posts: 8975
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby paix » 2009/06/19 15:17:40

AlanBartlett wrote:
I am running an OpenVZ kernel

That is unfortunate. If you were running the distributed kernel-2.6.18-128.1.14.el5 I would suggest that you install the kABI tracking kmod-forcedeth package from the [url=//elrepo.org]ELRepo[/url] repository which should then resolve your problem.
As you are using a modified kernel that does not preserve Red Hat's established kABI, this solution is not open to you.


Thanks for the reply.
I have downloaded the src.rpm of kmod-forcedeth and rebuilt it with mock for my kernel version.
The procedure is quite simple:
Code: Select all
mock -r centos-5-x86_64 --no-clean  --rebuild forcedeth-kmod-0.62-1.25.1.el5.elrepo.src.rpm


and now I have
http://paix.org.ua/tmp/kmod-forcedeth-0 ... x86_64.rpm

I'm interested in what I should do to activate this module.
Does it activate automatically after installation?

And what is the procedure for updating?
I.e. after installing a new OpenVZ kernel, should I also rebuild the forcedeth kmod and only then reboot the server?

Thanks!
paix
 

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby AlanBartlett » 2009/06/19 15:43:30

I have downloaded the src.rpm of kmod-forcedeth and rebuilt it with mock for my kernel version.

Very resourceful. :-D

I'm interested in what I should do to activate this module.
Does it activate automatically after installation?

And what is the procedure for updating?
I.e. after installing a new OpenVZ kernel, should I also rebuild the forcedeth kmod and only then reboot the server?

After installing it (rpm -ivh kmod-forcedeth-*.rpm), check that the driver module is present under your /lib/modules/`uname -r`/extra/ directory and that a /etc/depmod.d/forcedeth.conf file now exists.

In general, you should stop the network service (service network stop), unload the older version of the driver from the kernel (modprobe -r forcedeth) and then restart the network (service network start). Often, people prefer to take the simpler option of rebooting the system. ;-)

You will need to manually update -- i.e. do all of the above -- for each and every new kernel that is installed. That is unless those openvz kernels do retain their own ABI consistency. (You could check for that by executing a find /lib/modules -name forcedeth.ko command after a new kernel has been installed. If there is a symbolic link from the new kernel's weak-updates/ directory to the extra/ directory of this, your current kernel, it would appear that the openvz kernels do retain some degree of ABI consistency.)
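The weak-updates check described above can be sketched as a small shell helper (MODROOT and needs_rebuild are made-up names for illustration; the underlying test is just the find command given above):

```shell
#!/bin/sh
# Sketch: after installing a new kernel, decide whether kmod-forcedeth
# must be rebuilt for it, using the weak-updates check described above.
# MODROOT defaults to /lib/modules but can be overridden for testing.
MODROOT="${MODROOT:-/lib/modules}"

# needs_rebuild KERNELVER
# Prints "no" when a forcedeth.ko is reachable under that kernel's
# module tree (an extra/ copy or a weak-updates/ symlink), "yes" otherwise.
needs_rebuild() {
    if find "$MODROOT/$1" -name forcedeth.ko 2>/dev/null | grep -q .; then
        echo no
    else
        echo yes
    fi
}
```

If it prints yes for the new kernel, rebuild and install the kmod package before rebooting into it.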

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby paix » 2009/06/19 16:07:52

Alan, thanks for a help.

Code: Select all
storm ~ # rpm -ql kmod-forcedeth
/lib/modules/2.6.18-128.1.1.el5.028stab062.3
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra/forcedeth
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra/forcedeth/forcedeth.ko


Code: Select all
storm ~ # cat /etc/depmod.d/forcedeth.conf
override forcedeth * weak-updates/forcedeth


Code: Select all
storm ~ # modprobe -r forcedeth; service network restart
Shutting down interface venet0:  Shutting down interface venet0:
                                                           [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Disabling IPv4 packet forwarding:  net.ipv4.ip_forward = 0
                                                           [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface venet0:  Bringing up interface venet0:
Configuring interface venet0:
net.ipv4.conf.venet0.send_redirects = 0
                                                           [  OK  ]


Code: Select all
#last dmesg:
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:0a.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present


So, it looks like everything is fine :-)

I also have
Code: Select all
storm ~ # cat /etc/modprobe.conf
alias eth0 forcedeth
options forcedeth optimization_mode=1


What are your suggestions for the forcedeth optimization_mode?

I'm planning to do a stress test on Monday.
Because today is Friday evening (in central Europe), it's time to drink beer ;-)
Thanks a lot!
paix
 

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby AlanBartlett » 2009/06/19 16:59:20

Thanks a lot!

You're welcome. :-)

Because today is Friday evening (in central Europe), it's time to drink beer ;-)

What a good idea! :pint:

What are your suggestions for the forcedeth optimization_mode?

We, the ELRepo Admin team, did not write the actual driver code -- we have just packaged the most recent version that is available.

As I do not have a system with such a NIC, I've never used, let alone experimented with, that driver.

Let's have a look at the configurable parameters --

Code: Select all
$ modinfo -F parm forcedeth.ko
tagging_8021pq:802.1pq tagging is enabled by setting to 1 and disabled by setting to 0.
wol:Wake-On-Lan is enabled by setting to 1 and disabled by setting to 0.
dma_64bit:High DMA is enabled by setting to 1 and disabled by setting to 0.
rx_flow_control:Rx flow control is enabled by setting to 1 and disabled by setting to 0.
tx_flow_control:Tx flow control is enabled by setting to 1 and disabled by setting to 0.
rx_ring_size:Rx ring size. Maximum value of 1024 or 16384 depending on hardware.
tx_ring_size:Tx ring size. Maximum value of 1024 or 16384 depending on hardware.
rx_checksum_offload:Rx checksum offload is enabled by setting to 1 and disabled by setting to 0.
tx_checksum_offload:Tx checksum offload is enabled by setting to 1 and disabled by setting to 0.
mtu:MTU value. Maximum value of 1500 or 9100 depending on hardware.
tso_offload:TCP Segmentation offload is enabled by setting to 1 and disabled by setting to 0.
scatter_gather:Scatter gather is enabled by setting to 1 and disabled by setting to 0.
autoneg:PHY autonegotiate is enabled by setting to 1 and disabled by setting to 0.
speed_duplex:PHY speed and duplex settings. Auto = 0, 10mbps half = 1, 10mbps full = 2, 100mbps half = 3, 100mbps full = 4, 1000mbps full = 5.
msix:MSIX interrupts are enabled by setting to 1 and disabled by setting to 0.
msi:MSI interrupts are enabled by setting to 1 and disabled by setting to 0.
poll_interval:Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535.
optimization_mode:In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer.
max_interrupt_work:forcedeth maximum events handled per interrupt
lowpowerspeed:Low Power State Link Speed enable by setting to 1 and disabled by setting to 0

Quite a few to experiment with. :-?
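As an illustration only (the option names come from the modinfo output above, but these particular values are guesses to experiment with, not recommendations from this thread), a couple of the parameters could be combined in /etc/modprobe.conf like this:

```
alias eth0 forcedeth
# timer-moderated interrupts plus a larger rx ring -- experimental values
options forcedeth optimization_mode=1 rx_ring_size=512
```

Unload and reload the module (or reboot) for new option values to take effect.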

I'm planning to do a stress test on Monday.

Good luck.

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby paix » 2009/06/22 11:02:47

This driver (0.62) looks better than CentOS's default forcedeth,
but I got a panic all the same. The server froze completely.

The screenshot from the IP-KVM is here: http://paix.org.ua/tmp/panic_220609.jpg
Unfortunately there isn't any useful info in it to identify the problem.

I stressed the NIC with the iperf package (from EPEL; description: "Iperf is a tool to measure maximum TCP bandwidth").
The tests ran in both directions. On the fourth test the NIC froze the server.

The kernel was loaded with

Code: Select all
 kernel /vmlinuz-2.6.18-128.1.1.el5.028stab062.3 ro root=/dev/VolGroupSys/LogVolRoot crashkernel=128M@16M nousb noapic debug=2


and
Code: Select all
options forcedeth optimization_mode=1



Looks like I should buy a new case and an Intel NIC...
I'm very tired of debugging a production server with this nForce NIC. It was a big mistake to build a server on nVidia hardware :-?
paix
 

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby AlanBartlett » 2009/06/22 14:30:34

I had a suspicion that even with the latest available version of the forcedeth driver, you would probably not be satisfied with the NIC's performance.

Looks like I should buy a new case and an Intel NIC...

In all seriousness, that would be the best solution.

I'm very tired of debugging a production server with this nForce NIC. It was a big mistake to build a server on nVidia hardware :-?

I would agree with your latter statement. The nVidia GPUs and their proprietary driver software are acceptable but the other nVidia chip-sets leave a lot to be desired.

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby paix » 2009/08/26 12:57:00

Just in case it may be interesting:

The driver from ELRepo works normally.
The system froze under stress tests, but in normal operation it works.

Here is how to update the driver in the case of a custom kernel (an OpenVZ kernel, in my case).

Code: Select all
# uname -r
2.6.18-128.1.1.el5.028stab062.3

yum update
shutdown -r now


The system boots and works with the standard forcedeth driver:
Code: Select all
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.


Then log in to the system and rebuild the forcedeth driver:
Code: Select all
# uname -r
2.6.18-128.2.1.el5.028stab064.4


I've written a simple script for updating:

Code: Select all
#!/bin/sh
# Download and rebuild the latest kmod-forcedeth SRPM from ELRepo.

drv="forcedeth"
baseurl="http://elrepo.org/linux/elrepo/el5/SRPMS/"

# Grab the directory listing and extract the SRPM file name
# from the first matching href="..." attribute.
link=`curl "$baseurl" 2>/dev/null | grep "$drv"`
drvver=`echo "$link" | awk -F\" '{print $2}'`
echo "$drvver"

# Download the SRPM if we do not already have it.
if [ ! -f "$drvver" ]; then
        wget "$baseurl$drvver"
fi

# Rebuild it for the currently installed kernel.
mock -r centos-5-x86_64 --debug --no-clean --rebuild "$drvver"


This script downloads the latest forcedeth driver SRPM from ELRepo and rebuilds it.

Installing the driver:
Code: Select all
rpm -Uvh  /vz/root/1000/var/lib/mock/centos-5-x86_64/result/kmod-forcedeth-0.62-1.25.2.el5.elrepo.x86_64.rpm

Note: I built the driver in a VE (Virtual Environment) but installed it on the HN (Hardware Node).

Load the new driver and restart the network:
Code: Select all
modprobe -r forcedeth; service network restart



dmesg:
Code: Select all
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:0a.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready



PS. Thanks to Alan and the ELRepo team!
paix
 

Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

Postby paix » 2009/08/26 13:00:23

PS. NEVER use nVidia NICs on production servers.
paix
 
