  CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)

 

  •  paix
CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#1
Newbie
Joined: 2009/6/19
From
Posts: 6
Hi all.

I have a strange problem with the network card.
Under high network traffic the NIC freezes the server.
You can see a screenshot from the IP KVM here: http://paix.org.ua/tmp/panik_260509.jpg

I am running an OpenVZ kernel, which is based on the current RHEL5 kernel.

# uname -a
Linux domain 2.6.18-128.1.1.el5.028stab062.3 #1 SMP Sun May 10 18:54:51 MSD 2009 x86_64 x86_64 x86_64 GNU/Linux


from dmesg:
00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0


# lspci 
00:00.0 RAM memory: nVidia Corporation MCP67 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP67 ISA Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP67 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2)
00:04.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2)
00:04.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation MCP67 IDE Controller (rev a1)
00:07.0 Audio device: nVidia Corporation MCP67 High Definition Audio (rev a1)
00:08.0 PCI bridge: nVidia Corporation MCP67 PCI Bridge (rev a2)
00:09.0 IDE interface: nVidia Corporation MCP67 AHCI Controller (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
00:0b.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0c.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0d.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:10.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:11.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2)
00:12.0 VGA compatible controller: nVidia Corporation GeForce 7050 PV / nForce 630a (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control


Recently I got two panics while testing the network with iperf.

kernel booted with
irqpoll nousb noapic


screen from ipkvm: http://paix.org.ua/tmp/panic_190609.jpg

kernel booted with
nousb noapic

and

alias eth0 forcedeth
options forcedeth optimization_mode=1


screen from ipkvm: http://paix.org.ua/tmp/panic2_190609.jpg

Also there is one interesting oops in log/messages:

kernel: skb_over_panic: text:ffffffff881bf46f len:15398 put:15398 head:ffff8100a25c5800 data:ffff8100a25c5810 tail:ffff8100a25c9436 end:ffff8100a25c5e80 dev:eth0
kernel: ----------- [cut here ] --------- [please bite here ] ---------
kernel: Kernel BUG at net/core/skbuff.c:96


I also booted the kernel with the crashkernel=128M@16M option and have kdump running, but unfortunately no core was saved. I couldn't reboot the server via the IP KVM either, so I had to ask support for a hardware reboot.

dmesg:
http://paix.org.ua/tmp/dmesg_190609.txt

PS. I understand that this driver ("forcedeth.c: Reverse Engineered nForce ethernet driver") is very unstable and unpredictable, but I have the same kind of server running FreeBSD 7, and there the NIC works normally with the nfe driver. Also, this server is in a 1U case, so at the moment I can't switch to a proper add-in NIC...

Thanks in advance
Posted on: 2009/6/19 13:46
  •  AlanBartlett
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#2
Moderator
Joined: 2007/10/22
From ~/Earth/UK/England/Suffolk
Posts: 9188
Quote:
I am running an OpenVZ kernel

That is unfortunate. If you were running the distributed kernel-2.6.18-128.1.14.el5 I would suggest that you install the kABI tracking kmod-forcedeth package from the ELRepo repository which should then resolve your problem.

As you are using a modified kernel that does not preserve Red Hat's established kABI, this solution is not open to you.
_________________
Alan

100% Unix & Linux. Co-founder of the ELRepo Project.
Posted on: 2009/6/19 14:24
  •  paix
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#3
Newbie
Joined: 2009/6/19
From
Posts: 6
Quote:

AlanJBartlett wrote:
Quote:
I am running an OpenVZ kernel

That is unfortunate. If you were running the distributed kernel-2.6.18-128.1.14.el5 I would suggest that you install the kABI tracking kmod-forcedeth package from the ELRepo repository which should then resolve your problem.
As you are using a modified kernel that does not preserve Red Hat's established kABI, this solution is not open to you.


Thanks for the reply.
I downloaded the src.rpm of kmod-forcedeth and rebuilt it with mock for my kernel version.
The procedure is quite simple:
mock -r centos-5-x86_64 --no-clean  --rebuild forcedeth-kmod-0.62-1.25.1.el5.elrepo.src.rpm


and now I have
http://paix.org.ua/tmp/kmod-forcedeth-0.62-1.25.1.el5.elrepo.x86_64.rpm

I'm interested in what I should do to activate this module.
Does it activate automatically after installation?

And what is the procedure for updating?
I.e. after installing a new OpenVZ kernel, should I also rebuild the forcedeth-kmod and only then reboot the server?

Thanks!
Posted on: 2009/6/19 15:17
  •  AlanBartlett
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#4
Moderator
Joined: 2007/10/22
From ~/Earth/UK/England/Suffolk
Posts: 9188
Quote:
I downloaded the src.rpm of kmod-forcedeth and rebuilt it with mock for my kernel version.

Very resourceful.

Quote:
I'm interested in what I should do to activate this module.
Does it activate automatically after installation?

And what is the procedure for updating?
I.e. after installing a new OpenVZ kernel, should I also rebuild the forcedeth-kmod and only then reboot the server?

After installing it (rpm -ivh kmod-forcedeth-*.rpm), check that the driver module is present under your /lib/modules/`uname -r`/extra/ directory and that a /etc/depmod.d/forcedeth.conf file now exists.

In general, you should stop the network service (service network stop), unload the older version of the driver from the kernel (modprobe -r forcedeth) and then restart the network (service network start). Often, people prefer to take the simpler option of rebooting the system.
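
The three steps above could be wrapped as in the sketch below. The run helper and the DRY_RUN switch are hypothetical additions for illustration; by default the commands are only printed, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (hypothetical switch).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Stop the network, unload the old module, start the network again.
# Each step runs only if the previous one succeeded.
reload_driver() {   # usage: reload_driver MODULE
    run service network stop &&
    run modprobe -r "$1" &&
    run service network start
}

reload_driver forcedeth
```

On a remote machine the first step is likely to drop an SSH session, so the whole sequence would need to run as one script or from a console such as the IP KVM.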

You will need to manually update -- i.e. do all of the above -- for each and every new kernel that is installed. That is unless those openvz kernels do retain their own ABI consistency. (You could check for that by executing a find /lib/modules -name forcedeth.ko command after a new kernel has been installed. If there is a symbolic link from the new kernel's weak-updates/ directory to the extra/ directory of this, your current kernel, it would appear that the openvz kernels do retain some degree of ABI consistency.)
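
The symbolic-link check described above can be sketched as follows. The function name is made up, and readlink (from coreutils) is assumed to be available. The second argument exists only so the function can be tried against a directory other than /lib/modules.

```shell
#!/bin/sh
# List every symlinked copy of MODULE.ko under ROOT together with its target.
# A weak-updates/ link pointing into another kernel's extra/ directory would
# suggest that the two kernels are ABI-compatible for this module.
list_weak_links() {   # usage: list_weak_links MODULE [ROOT]
    mod="$1"
    root="${2:-/lib/modules}"
    find "$root" -type l -name "$mod.ko" | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

list_weak_links forcedeth
```

No output means no symlinked copy was found, in which case the module would have to be rebuilt for the new kernel.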
_________________
Alan

100% Unix & Linux. Co-founder of the ELRepo Project.
Posted on: 2009/6/19 15:43
  •  paix
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#5
Newbie
Joined: 2009/6/19
From
Posts: 6
Alan, thanks for the help.

storm ~ # rpm -ql kmod-forcedeth
/lib/modules/2.6.18-128.1.1.el5.028stab062.3
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra/forcedeth
/lib/modules/2.6.18-128.1.1.el5.028stab062.3/extra/forcedeth/forcedeth.ko


storm ~ # cat /etc/depmod.d/forcedeth.conf
override forcedeth * weak-updates/forcedeth


storm ~ # modprobe -r forcedeth; service network restart
Shutting down interface venet0:  Shutting down interface venet0: 
                                                           [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Disabling IPv4 packet forwarding:  net.ipv4.ip_forward = 0
                                                           [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface venet0:  Bringing up interface venet0: 
Configuring interface venet0: 
net.ipv4.conf.venet0.send_redirects = 0
                                                           [  OK  ]


#last dmesg:
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:0a.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present


So it looks like all is fine.
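
One way to confirm which driver version is active is to pull the version out of the banner line the driver prints when it loads. This is a sketch that relies only on the banner format shown in the dmesg excerpts in this thread; the function name is made up.

```shell
#!/bin/sh
# Print the version from the last forcedeth banner line read on stdin.
forcedeth_banner_version() {
    sed -n 's/.*Reverse Engineered nForce ethernet driver\. Version \(.*\)\.$/\1/p' | tail -n 1
}

# On a live system: dmesg | forcedeth_banner_version
# Here the two banner lines from this thread are fed in instead.
printf '%s\n' \
    'forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.' \
    'forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.' |
    forcedeth_banner_version
```

Because only the last banner is reported, the result reflects the most recently loaded copy of the driver.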

Also I have
storm ~ # cat /etc/modprobe.conf
alias eth0 forcedeth
options forcedeth optimization_mode=1


What are your suggestions for the forcedeth optimization_mode?

I'm planning to do a stress test on Monday.
It is Friday evening here (in central Europe), so it's time to drink beer.
Thanks a lot!
Posted on: 2009/6/19 16:07
  •  AlanBartlett
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#6
Moderator
Joined: 2007/10/22
From ~/Earth/UK/England/Suffolk
Posts: 9188
Quote:
Thanks a lot!

You're welcome.

Quote:
It is Friday evening here (in central Europe), so it's time to drink beer.

What a good idea!

Quote:

What are your suggestions for the forcedeth optimization_mode?

We, the ELRepo Admin team, did not write the actual driver code -- we have just packaged the most recent version that is available.

As I do not have a system with such a NIC, I've never used, let alone experimented with, that driver.

Let's have a look at the configurable parameters --

$ modinfo -F parm forcedeth.ko
tagging_8021pq:802.1pq tagging is enabled by setting to 1 and disabled by setting to 0.
wol:Wake-On-Lan is enabled by setting to 1 and disabled by setting to 0.
dma_64bit:High DMA is enabled by setting to 1 and disabled by setting to 0.
rx_flow_control:Rx flow control is enabled by setting to 1 and disabled by setting to 0.
tx_flow_control:Tx flow control is enabled by setting to 1 and disabled by setting to 0.
rx_ring_size:Rx ring size. Maximum value of 1024 or 16384 depending on hardware.
tx_ring_size:Tx ring size. Maximum value of 1024 or 16384 depending on hardware.
rx_checksum_offload:Rx checksum offload is enabled by setting to 1 and disabled by setting to 0.
tx_checksum_offload:Tx checksum offload is enabled by setting to 1 and disabled by setting to 0.
mtu:MTU value. Maximum value of 1500 or 9100 depending on hardware.
tso_offload:TCP Segmentation offload is enabled by setting to 1 and disabled by setting to 0.
scatter_gather:Scatter gather is enabled by setting to 1 and disabled by setting to 0.
autoneg:PHY autonegotiate is enabled by setting to 1 and disabled by setting to 0.
speed_duplex:PHY speed and duplex settings. Auto = 0, 10mbps half = 1, 10mbps full = 2, 100mbps half = 3, 100mbps full = 4, 1000mbps full = 5.
msix:MSIX interrupts are enabled by setting to 1 and disabled by setting to 0.
msi:MSI interrupts are enabled by setting to 1 and disabled by setting to 0.
poll_interval:Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535.
optimization_mode:In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer.
max_interrupt_work:forcedeth maximum events handled per interrupt
lowpowerspeed:Low Power State Link Speed enable by setting to 1 and disabled by setting to 0

Quite a few to experiment with.
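
As a worked example of the poll_interval formula in the list above, the sketch below converts a desired timer period in microseconds into a parameter value. It assumes the result is truncated to an integer, which the parameter description does not state; the function name is made up.

```shell
#!/bin/sh
# poll_interval = (time_in_micro_secs * 100) / 2^10, clamped to 0..65535.
# Assumption: the result is truncated to an integer.
poll_interval_for_us() {   # usage: poll_interval_for_us MICROSECONDS
    v=$(( $1 * 100 / 1024 ))
    if [ "$v" -lt 0 ]; then
        v=0
    fi
    if [ "$v" -gt 65535 ]; then
        v=65535
    fi
    echo "$v"
}

poll_interval_for_us 1000
```

For 1000 microseconds this gives 97 (100000 / 1024, truncated). Whether any particular value helps with the freezes is not something this thread establishes.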

Quote:
I'm planning to do a stress test on Monday.

Good luck.
_________________
Alan

100% Unix & Linux. Co-founder of the ELRepo Project.
Posted on: 2009/6/19 16:59
  •  paix
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#7
Newbie
Joined: 2009/6/19
From
Posts: 6
This driver (0.62-Driver) looks better than CentOS's default forcedeth,
but I got a panic all the same. The server froze completely.

The screenshot from the IP KVM is here: http://paix.org.ua/tmp/panic_220609.jpg
Unfortunately it contains nothing useful for identifying the problem.

I stressed the NIC with the iperf package (from EPEL. Description: Iperf is a tool to measure maximum TCP bandwidth).
The tests were run in both directions. On the fourth test the NIC froze the server.
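
The exact iperf invocation is not given in the post. The following is only a sketch of what a repeated two-way run could look like, assuming iperf 2 (where -d requests a simultaneous bidirectional test and -t sets the duration in seconds) and an iperf -s server already running on the peer. The host name, run count and duration are placeholders, and the function name is made up.

```shell
#!/bin/sh
# Print the client command lines for RUNS bidirectional tests against HOST.
# The commands are only printed, so they can be reviewed before being run.
iperf_plan() {   # usage: iperf_plan HOST RUNS SECONDS
    i=1
    while [ "$i" -le "$2" ]; do
        echo "iperf -c $1 -d -t $3"
        i=$((i + 1))
    done
}

iperf_plan peer.example.org 4 60
```

Piping the output to sh would execute the plan; keeping a console such as the IP KVM open during the run makes it easier to catch the panic output.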

Kernel was loaded with

 kernel /vmlinuz-2.6.18-128.1.1.el5.028stab062.3 ro root=/dev/VolGroupSys/LogVolRoot crashkernel=128M@16M nousb noapic debug=2


and
options forcedeth optimization_mode=1



Looks like I should buy a new case and an Intel NIC.....
I'm very tired of debugging a working server with this nForce NIC. It was a big mistake to build a server on nVidia hardware.
Posted on: 2009/6/22 11:02
  •  AlanBartlett
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#8
Moderator
Joined: 2007/10/22
From ~/Earth/UK/England/Suffolk
Posts: 9188
I had a suspicion that even with the latest available version of the forcedeth driver, you would probably not be satisfied with the NIC's performance.

Quote:
Looks like I should buy a new case and an Intel NIC.....

In all seriousness, that would be the best solution.

Quote:
I'm very tired of debugging a working server with this nForce NIC. It was a big mistake to build a server on nVidia hardware.

I would agree with your latter statement. The nVidia GPUs and their proprietary driver software are acceptable but the other nVidia chip-sets leave a lot to be desired.
_________________
Alan

100% Unix & Linux. Co-founder of the ELRepo Project.
Posted on: 2009/6/22 14:30
  •  paix
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#9
Newbie
Joined: 2009/6/19
From
Posts: 6
Just in case it may be of interest.

The driver from ELRepo works normally.
The system froze under stress tests, but under normal load it works.

Updating the driver for a custom kernel (an OpenVZ kernel in my case):

# uname -r
2.6.18-128.1.1.el5.028stab062.3

yum update
shutdown -r now


The system boots and runs with the standard forcedeth driver:
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.


Then log in to the system and rebuild the forcedeth driver:
# uname -r
2.6.18-128.2.1.el5.028stab064.4


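I wrote a simple script for updating:

#!/bin/sh
# Rebuild the ELRepo forcedeth kmod source RPM for a custom CentOS kernel.

drv="forcedeth"
url="http://elrepo.org/linux/elrepo/el5/SRPMS/"

# Take the file name from the href="..." attribute of the matching index entry.
# If several versions are listed, the last one is used (this assumes the index
# is sorted in ascending order).
drvver=`curl -s "$url" | grep "$drv" | awk -F\" '{print $2}' | tail -n 1`
if [ -z "$drvver" ]; then
        echo "no $drv source RPM found at $url" >&2
        exit 1
fi
echo "$drvver"

# Download the source RPM unless it is already in the current directory.
if [ ! -f "$drvver" ]; then
        wget "$url$drvver" || exit 1
        echo
fi

#mock -r centos-5-x86_64 --no-clean --rebuild "$drvver"
mock -r centos-5-x86_64 --debug --no-clean --rebuild "$drvver"


This script downloads the latest forcedeth driver source from ELRepo and rebuilds it.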

Installing the driver:
rpm -Uvh  /vz/root/1000/var/lib/mock/centos-5-x86_64/result/kmod-forcedeth-0.62-1.25.2.el5.elrepo.x86_64.rpm

Note: I built the driver in a VE (Virtual Environment) but installed it on the HN (Hardware Node).

Load the new driver and restart the network:
modprobe -r forcedeth; service network restart



dmesg:
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.62-Driver Package V1.25.
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:0a.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:82b3 bound to 0000:00:0a.0
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready



PS. thanks to Alan and the elrepo team!
Posted on: 2009/8/26 12:57
  •  paix
Re: CentOS 5.3 nvidia nForce network bugs, kernel panics (forcedeth, MCP67)
#10
Newbie
Joined: 2009/6/19
From
Posts: 6
PS. NEVER use nVidia NICs on production servers.
Posted on: 2009/8/26 13:00
"Linux" is a registered trademark of Linus Torvalds. | All other trademarks are property of their respective owners. | All other content is Copyright @ 2004-2009 by the CentOS Project or "each individual contributor (forums, comments, etc.) unless otherwise assigned".| Theme based on a theme by 7dana.com