I'm trying to track down a problem with a set of machines. They are all using Trenton single-board computers, and they are all having issues. I will post what I have.
The first set are running Trenton XPTs with dual 3.06GHz Xeons and 4GB of RAM. They have dual onboard Intel Gigabit Ethernet interfaces (running the E1000 driver). Each also has an add-in Adaptec 2110S card which runs the RAID array (RAID 1 mirroring with a hot spare, using the i2o driver).
The system is running CentOS 4.2 with a custom-compiled 2.6.9 kernel built from the RPM.
At random times, when copying large amounts of data over to the drive(s), the networking will lock up, and ifconfig will show something similar to the following:
eth0      Link encap:Ethernet  HWaddr 00:10:6F:00:CE:1D
          inet addr:192.168.2.5  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:26895304 errors:4286972686 dropped:4292302426 overruns:4293634861 frame:4290969991
          TX packets:13258020 errors:4292302426 dropped:0 overruns:0 carrier:4290969991
          collisions:4293634861 txqueuelen:1000
          RX bytes:4089800891 (3.8 GiB)  TX bytes:960396945 (915.9 MiB)
          Base address:0xe880 Memory:fdda0000-fddc0000

eth1      Link encap:Ethernet  HWaddr 00:10:6F:00:CE:1C
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4294966531 errors:4294962706 dropped:4294965766 overruns:4294966531 frame:4294965001
          TX packets:4294966531 errors:4294965766 dropped:0 overruns:0 carrier:4294965001
          collisions:4294966531 txqueuelen:1000
          RX bytes:4294966531 (3.9 GiB)  TX bytes:4294966531 (3.9 GiB)
          Base address:0xec00 Memory:fddc0000-fdde0000
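One thing I noticed about the ifconfig output: the error counters all sit just below 2^32 (4294967296), so they look like small negative numbers that have wrapped around in an unsigned 32-bit field. Here is a rough Python sketch that reinterprets the RX error counters as signed 32-bit values and compares them against the overruns counter. The helper names (as_signed32, ratios_to_overruns) are my own, and this is only a way of reading the numbers, not a diagnosis.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit counter as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# RX error counters copied from the ifconfig output in this post.
eth0 = {"errors": 4286972686, "dropped": 4292302426,
        "overruns": 4293634861, "frame": 4290969991}
eth1 = {"errors": 4294962706, "dropped": 4294965766,
        "overruns": 4294966531, "frame": 4294965001}


def ratios_to_overruns(counters):
    """Express each signed counter as a multiple of the overruns counter."""
    signed = {name: as_signed32(v) for name, v in counters.items()}
    base = signed["overruns"]
    return {name: v / base for name, v in signed.items()}


if __name__ == "__main__":
    for name, counters in (("eth0", eth0), ("eth1", eth1)):
        signed = {k: as_signed32(v) for k, v in counters.items()}
        print(name, signed, ratios_to_overruns(counters))
```

Read this way, eth1's overruns counter is -765, and on both interfaces errors, dropped and frame come out as exactly 6, 2 and 3 times the overruns value. Both interfaces seem to be going wrong in the same way, which makes me think the numbers are not real traffic statistics.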
This has occurred on two boxes we are trying to bring up in production, so we set up a test box. The test box originally had an IDE drive and no RAID card, and I was unable to repeat the problem. As soon as we replaced the IDE drive with the 2110S and a SCSI drive, the problem occurred within minutes of my attempting to dump data to it (I was using approximately 7GB of test data).
Has anyone ever seen or heard of interactions between the E1000 and i2o drivers? I have been searching on Google for the past 2 or 3 days, and haven't found anything. Right now, we are going to put the IDE drive back in the test box and leave the Adaptec controller in there, but try writing to the IDE drive to see whether the problem is with the presence of the Adaptec or whether it is an issue of writing to the SCSI drive.
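To compare the IDE and SCSI runs without staring at ifconfig, something like the following could snapshot the interface counters before and after each copy. It is a minimal sketch that assumes the usual 16-column /proc/net/dev layout and a Python interpreter on the box; the function names and the 2^31 threshold are my own choices, not anything taken from the drivers.

```python
# Column names for /proc/net/dev: 8 receive fields, then 8 transmit fields.
FIELDS = (
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop",
    "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop",
    "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed",
)

# Counters that should stay small on a healthy link.
ERROR_FIELDS = ("rx_errs", "rx_drop", "rx_fifo", "rx_frame",
                "tx_errs", "tx_colls", "tx_carrier")


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {iface: {field: int}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        stats[name.strip()] = dict(zip(FIELDS, map(int, values.split())))
    return stats


def suspicious(stats, threshold=1 << 31):
    """Return sorted (iface, field) pairs whose error counter is implausibly large."""
    return sorted(
        (iface, field)
        for iface, counters in stats.items()
        for field in ERROR_FIELDS
        if counters.get(field, 0) >= threshold
    )


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        print(suspicious(parse_net_dev(f.read())))
```

Running it once before the copy and once after the lockup, on both the IDE and the SCSI setup, should show whether the counters only go bad when the data is written through the 2110S.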
Any suggestions would be deeply appreciated.
--Storm