10GbE MTU 9000 broken after recent update

Issues related to configuring your network
cekim
Posts: 9
Joined: 2014/12/02 22:55:02

Post by cekim » 2017/11/14 07:40:50

As I update various internal machines to the 3.10.0-5xx and newer kernels, I'm finding that MTU=9000 on 10GbE interfaces no longer works. I've had this network of 8 machines on a 10GbE switch working in this configuration for more than a year on older kernels and CentOS 7 installs. A yum update broke a working configuration.

The interface drops the connection and goes back to 1500, then tries to set 9000 again, and cycles endlessly without ever bringing the device (Intel 540T2) back up to full function.

The workaround is to reduce the MTU to 1500 to restore connectivity/function, but this degrades NFS performance (and I'm using NFS in a manner where this throughput drop is noticeable - fast disk arrays servicing highly parallelized code able to saturate the link with read and/or write).
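
For a rough sense of what dropping to MTU 1500 costs on the wire, here is a back-of-the-envelope sketch. The overhead constants are textbook assumptions (IPv4 and TCP headers without options, standard Ethernet framing), not measurements from this network, and the function names are made up for illustration.

```python
# Rough per-frame overhead comparison for TCP over Ethernet.
# All constants are assumptions (no IP/TCP options, no VLAN tag).
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, Ethernet header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 header + TCP header
LINK_BPS = 10_000_000_000        # 10GbE line rate in bits per second


def payload_efficiency(mtu):
    """Fraction of on-wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu, link_bps=LINK_BPS):
    """Full-size frames per second needed to saturate the link."""
    return link_bps / (8 * (mtu + ETH_OVERHEAD))


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
          f"{frames_per_second(mtu):,.0f} frames/s at line rate")
```

Under these assumptions the raw framing efficiency only moves from about 95% to about 99%, but the host has to handle roughly six times as many frames per second at 1500, which is likely where most of the NFS throughput difference comes from.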

Anyone know what's causing this and how to fix it?

pjsr2
Posts: 168
Joined: 2014/03/27 20:11:07

Re: 10GbE MTU 9000 broken after recent update

Post by pjsr2 » 2017/11/14 11:39:00

Current kernel is 3.10.0-693.5.2, not 3.10.0-5xx. Does this problem show on the current kernel version?

cekim

Re: 10GbE MTU 9000 broken after recent update

Post by cekim » 2017/11/14 15:45:12

pjsr2 wrote: Current kernel is 3.10.0-693.5.2, not 3.10.0-5xx. Does this problem show on the current kernel version?

Yes, as well as on 4.x (now 4.14) kernels from elrepo (4.14.0-1.el7.elrepo.x86_64).

I first noticed it around the 5xx kernel update, but to be honest, it might have happened earlier as these machines don't get regular updates. They live on an internal net with very limited access, no browsing, etc. So, they only get updates when there is a problem or a new feature is needed.

BTW, this is what it looks like in /var/log/messages:

Nov 14 02:23:35 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 9000 to 1500
Nov 14 02:23:46 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 1500 to 9000
Nov 14 02:23:51 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 9000 to 1500
Nov 14 02:23:53 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 1500 to 9000
Nov 14 02:23:57 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 9000 to 1500

and so on and so forth, with no other indication of failure. No dropped packets are reported in ifconfig; the link just never stays up long enough to function.
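
To see how regular the flapping is, the log lines above can be reduced to a list of gaps between MTU changes. This is only a sketch: the regular expression is written against the five lines quoted here, syslog timestamps carry no year so one is assumed, and the function names are invented for illustration.

```python
import re
from datetime import datetime

# Matches the ixgbe "changing MTU" kernel lines in the format quoted above.
MTU_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"ixgbe\s+\S+\s+(?P<iface>\S+):\s+"
    r"changing MTU from (?P<old>\d+) to (?P<new>\d+)"
)


def mtu_changes(lines, year=2017):
    """Yield (timestamp, iface, old_mtu, new_mtu) for each MTU change line.

    The year is an assumption because syslog timestamps do not include it.
    """
    for line in lines:
        m = MTU_LINE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            yield ts, m["iface"], int(m["old"]), int(m["new"])


def flap_intervals(changes):
    """Seconds between consecutive MTU changes."""
    times = [c[0] for c in changes]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample = [
    "Nov 14 02:23:35 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 9000 to 1500",
    "Nov 14 02:23:46 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 1500 to 9000",
    "Nov 14 02:23:51 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 9000 to 1500",
    "Nov 14 02:23:53 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 1500 to 9000",
    "Nov 14 02:23:57 xxxxx kernel: ixgbe 0000:01:00.0 enp1s0f0: changing MTU from 9000 to 1500",
]
changes = list(mtu_changes(sample))
print(len(changes), "MTU changes, gaps in seconds:", flap_intervals(changes))
```

On a real machine the same functions could be fed an open `/var/log/messages` file instead of `sample`. Lining the gaps up against other timestamps in the log may help narrow down what is triggering each reconfiguration.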

cekim

Re: 10GbE MTU 9000 broken after recent update

Post by cekim » 2017/11/14 15:55:18

A little more info - I have 2 of those machines still running MTU=9000 on the following kernel and yum update "vintage":
3.10.0-514.10.2.el7.x86_64

They are connected to the same switch, and I did swizzle all the cables on the switch at one point (as well as swap 10GbE cards in one case), so I don't think this is a hardware issue.

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
        inet xx.xx.xx.xx netmask 255.255.255.0 broadcast xx.xx.xx.255
        inet6 xxxx.xxxx.xxxx.xxx prefixlen 64 scopeid 0x20<link>
        ether xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
        RX packets 5123262 bytes 6837407152 (6.3 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8144582 bytes 43312233303 (40.3 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

In all cases, MTU=9000 was added to ifcfg-<ifname> as the means of setting this configuration.
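
A small sketch for comparing what the interface config file asks for with what the kernel currently reports. It assumes the usual CentOS 7 location for the file (/etc/sysconfig/network-scripts/ifcfg-<ifname>) and the standard sysfs path /sys/class/net/<ifname>/mtu; the function names are hypothetical.

```python
from pathlib import Path


def configured_mtu(ifname, scripts_dir="/etc/sysconfig/network-scripts"):
    """Return the MTU= value from ifcfg-<ifname>, or None if it is not set.

    The directory is an assumption (the CentOS 7 default location).
    """
    path = Path(scripts_dir) / f"ifcfg-{ifname}"
    for line in path.read_text().splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "MTU":
            # Values may be quoted, e.g. MTU="9000"
            return int(value.strip().strip("\"'"))
    return None


def live_mtu(ifname, sysfs_net="/sys/class/net"):
    """Return the MTU the kernel currently reports for the interface."""
    return int((Path(sysfs_net) / ifname / "mtu").read_text())


def mtu_mismatch(ifname, **paths):
    """True if the config file sets an MTU that differs from the live one."""
    wanted = configured_mtu(ifname, paths.get("scripts_dir", "/etc/sysconfig/network-scripts"))
    actual = live_mtu(ifname, paths.get("sysfs_net", "/sys/class/net"))
    return wanted is not None and wanted != actual
```

Calling `mtu_mismatch("enp1s0f0")` repeatedly while the loop is running would show whether the live value is really alternating or whether only the log messages are.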

TrevorH
Forum Moderator
Posts: 20996
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: 10GbE MTU 9000 broken after recent update

Post by TrevorH » 2017/11/14 16:07:57

All 3.10.0-514 kernels belong to 7.3 which is not the current version. Only the current version gets updates and fixes. You need to yum update to 7.4.

cekim

Re: 10GbE MTU 9000 broken after recent update

Post by cekim » 2017/11/14 16:15:07

TrevorH wrote: All 3.10.0-514 kernels belong to 7.3 which is not the current version. Only the current version gets updates and fixes. You need to yum update to 7.4.

514/7.3 is the last config I know that works.

All the machines I presently have on newer kernels and 7.4 are the ones that fail with MTU=9000.

Specifically:
3.10.0-693.5.2.el7.x86_64
AND
4.14.0-1.el7.elrepo.x86_64

both of those fail the same way on 3 machines so far.

P.S. Given the lack of errors reported, I'm thinking this is a systemd or configuration issue on my part causing a loop. I just don't know which or where.

pjsr2

Re: 10GbE MTU 9000 broken after recent update

Post by pjsr2 » 2017/11/14 19:10:53

Do you have a switch or router in your network that blocks ICMP messages?

The NFS protocol likes to have messages of around 8500 bytes for optimal performance. Normally path MTU discovery (PMTUD) should automatically increase the MTU, but that requires that the ICMP messages that signal that messages are too big and notify unwanted fragmentation can find their way back to the sender. When PMTUD works properly, there should be no need to tweak your MTU manually.

https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html gives a nice explanation, although a bit geared to IPSEC tunnels.

IPv6 handles fragmentation in a completely different manner. Is switching to IPv6 an option?
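
Independent of PMTUD, one common way to check whether 9000-byte frames actually make it between two hosts is a ping with fragmentation forbidden. The sketch below only builds the command line; it assumes the iputils `ping` shipped with CentOS (`-M do`, `-s`, `-c`), IPv4 without options, and a placeholder address from the documentation range.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host, mtu, count=3):
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    return ["ping", "-M", "do", "-s", str(max_ping_payload(mtu)),
            "-c", str(count), host]


# 192.0.2.10 is a placeholder address, not a host from this thread.
print(" ".join(df_ping_command("192.0.2.10", 9000)))
```

For MTU 9000 this gives a payload size of 8972 bytes. If that ping gets replies between two machines while one byte more fails, jumbo frames are passing end to end and the problem is more likely on the host configuration side.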

cekim

Re: 10GbE MTU 9000 broken after recent update

Post by cekim » 2017/11/14 19:55:53

pjsr2 wrote: Do you have a switch or router in your network that blocks ICMP messages?

The NFS protocol likes to have messages of around 8500 bytes for optimal performance. Normally path MTU discovery (PMTUD) should automatically increase the MTU, but that requires that the ICMP messages that signal that messages are too big and notify unwanted fragmentation can find their way back to the sender. When PMTUD works properly, there should be no need to tweak your MTU manually.

https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html gives a nice explanation, although a bit geared to IPSEC tunnels.

IPv6 handles fragmentation in a completely different manner. Is switching to IPv6 an option?

No blockage - it still works among the older-kernel machines that are presently connected. They and the server were fine with 9000 and empirically had higher throughput as a result. After updating, I saw 2 clients of that server start doing this config loop, so I dropped them back to 1500 to stop it and retain connectivity, while the clients with the older kernels continued to function and produce higher NFS throughput. The update was to address driver issues with megaraid (the 3.10.x driver spuriously resets itself every now and then, disrupting long, multi-day simulation runs while the NFS server is unresponsive as a result of the megaraid bug).

So, I updated the server and it did the same thing (loop on 9000/1500/9000/1500 config switch).

The infrastructure here is really simple - a netgear 8-port 10GbE prosafe switch and 8 machines connected directly to it.

As for v6 - yes, that is a possibility (see above - very simple system), but I was avoiding it as I'd have to stop ignoring v6 and set it all up. Until now, there was no known benefit to me, and it would still require verification that it does the right thing and produces the desired throughput.

Since this appears to be a configuration or systemd sequencing error, it seems like it should be easily remedied if I could figure out who/what is initiating the re-configuration.

pjsr2

Re: 10GbE MTU 9000 broken after recent update

Post by pjsr2 » 2017/11/14 20:51:00

Are you using DHCP? This may be related to https://github.com/coreos/bugs/issues/1827

cekim

Re: 10GbE MTU 9000 broken after recent update

Post by cekim » 2017/11/14 21:53:32

pjsr2 wrote: Are you using DHCP? This may be related to https://github.com/coreos/bugs/issues/1827

No, static IP. 10G traffic isn't routed out either - no bridge to other sub-nets or even DNS - just the Netgear switch and those machines.

Hmmm, I did not explicitly disable IPv6 - maybe that's what's changing things?
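
Whether IPv6 is involved at all is only a guess at this point, but the per-interface IPv6 state is easy to inspect. A minimal sketch, assuming the standard Linux sysctl files under /proc/sys/net/ipv6/conf/<ifname>/ (`disable_ipv6` and `mtu`); the function name is made up.

```python
from pathlib import Path


def ipv6_state(ifname, proc_root="/proc/sys/net/ipv6/conf"):
    """Return the IPv6 disable flag and IPv6 MTU for an interface.

    Returns None if the directory is missing (for example when the
    IPv6 module is not loaded or the interface name is wrong).
    """
    base = Path(proc_root) / ifname
    if not base.is_dir():
        return None
    return {
        "disabled": (base / "disable_ipv6").read_text().strip() == "1",
        "mtu": int((base / "mtu").read_text()),
    }
```

Comparing `ipv6_state("enp1s0f0")` on a failing machine with the same call on one of the working 3.10.0-514 machines would show whether the IPv6 settings differ between them.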