bonding issue 3.10.0-693.2.2.el7.x86_64

Issues related to configuring your network
patrick90
Posts: 5
Joined: 2017/09/20 10:48:27

bonding issue 3.10.0-693.2.2.el7.x86_64

Postby patrick90 » 2017/09/20 11:30:21

Hi!

I'm currently testing a new CentOS 7 server and configured two active-backup bonds, which worked quite well under kernel version 3.10.0-514.26.2.el7.x86_64. However, it seems that the update to kernel version 3.10.0-693.2.2.el7.x86_64 has broken both bonds. The bonds are configured as follows (both are configured similarly):

(I've also configured /etc/udev/rules.d/70-persistent-net.rules to rename my interfaces correctly)
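(Sketch of that rule file; the MAC addresses and interface names are placeholders, not my actual values:)

```
# /etc/udev/rules.d/70-persistent-net.rules (sketch; MACs are placeholders)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="xx:xx:xx:xx:xx:xx", NAME="link1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="yy:yy:yy:yy:yy:yy", NAME="link2"
```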

/etc/sysconfig/network-scripts/ifcfg-link1:

Code:

TYPE=Ethernet
HWADDR=XX:XX:XX:XX:XX:XX
DEVICE=link1
NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-link2:

Code:

TYPE=Ethernet
HWADDR=YY:YY:YY:YY:YY:YY
DEVICE=link2
NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-bond1:

Code:

DEVICE=bond1
BRIDGE=bridge1
NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=link1 primary_reselect=better resend_igmp=5"

NetworkManager is disabled, I'm using the network service directly.

If I boot the server with kernel 3.10.0-514.26.2.el7.x86_64, everything works as expected. The bonds start correctly, the first physical interface is used as the primary slave, and the second is checked for 2 minutes before it is considered up.

If I boot the server with kernel 3.10.0-693.2.2.el7.x86_64, the bonds are initialized and the first physical interface is used as the primary slave. Although the second interface is shown as up in ip addr, it is marked as down in /proc/net/bonding/bond1, and in /var/log/messages the message

Code:

Sep 20 12:19:52 server kernel: bond1: link status up for interface link2, enabling it in 120000 ms
Sep 20 12:19:52 server kernel: bond1: link status up for interface link2, enabling it in 120000 ms
Sep 20 12:19:52 server kernel: bond1: link status up for interface link2, enabling it in 120000 ms
Sep 20 12:19:52 server kernel: bond1: link status up for interface link2, enabling it in 120000 ms

is repeated over and over again. The message appears every 100 ms, which is the configured miimon interval. After 2 minutes, link2 is still down in /proc/net/bonding/bond1 and the message keeps repeating.
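The mismatch between ip addr and the bonding driver is easy to see by pulling each slave's MII status out of the proc file. A quick sketch (the sample text here is illustrative so the snippet is self-contained; on the server you would run the awk command against /proc/net/bonding/bond1 directly):

```shell
# Extract each slave's MII status from bonding proc output.
# Inline sample stands in for /proc/net/bonding/bond1; on a real
# system, replace the printf pipeline with:
#   awk '...' /proc/net/bonding/bond1
sample='Bonding Mode: fault-tolerance (active-backup)
MII Status: up
Slave Interface: link1
MII Status: up
Slave Interface: link2
MII Status: down'
printf '%s\n' "$sample" | awk '
  /^Slave Interface:/ { iface = $3 }
  /^MII Status:/ && iface != "" { print iface, $3; iface = "" }'
```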

Just in case this is a hardware/driver issue: I'm using an HP DL380 G9 with a Broadcom BCM5719 ethernet adapter.

Was there a change in the way the kernel handles bonds?

Many thanks in advance for all replies!

TrevorH
Forum Moderator
Posts: 21211
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby TrevorH » 2017/09/20 12:55:58

Please report upstream on bugzilla.redhat.com. I've seen a similar issue here at home with bonded connections using the Intel igb module, so that probably means it's not the interface driver that's wrong but the bonding module itself. To fix mine I switched to using teamd instead, but that has its own set of problems. My problems occurred with mode=4 (LACP), so it looks like bonding with multiple modes is affected. For me the problem also totally disabled one of the two interfaces, and the only way to get it back was to take down all interfaces, modprobe -r igb, then reload it and restart networking :-(
CentOS 5 died in March 2017 - migrate NOW!
Full time Geek, part time moderator. Use the FAQ Luke

patrick90
Posts: 5
Joined: 2017/09/20 10:48:27

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby patrick90 » 2017/09/22 08:34:29

Thanks for your reply TrevorH!
I've opened a bug report under
https://bugzilla.redhat.com/show_bug.cgi?id=1494410
(But it seems that it is currently only viewable by a "Private Group". This is my first bug report there.)

TrevorH
Forum Moderator
Posts: 21211
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby TrevorH » 2017/09/22 11:27:25

Yes, all kernel bugs raised on bugzilla are automatically made private. If you want specific other people to be able to read it, you can add their email addresses (the ones they use for their bugzilla accounts) to the list near the top right of the bz.

patrick90
Posts: 5
Joined: 2017/09/20 10:48:27

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby patrick90 » 2017/09/25 08:43:36

It was marked as a duplicate of bug 1469987. A fix was reported in kernel-3.10.0-697.el7.
Additionally, there is a solution in the Red Hat Knowledge Base at

https://access.redhat.com/solutions/3152981
Last edited by patrick90 on 2017/09/26 10:04:48, edited 1 time in total.

TrevorH
Forum Moderator
Posts: 21211
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby TrevorH » 2017/09/25 13:16:25

The workaround is to add updelay=0 to BONDING_OPTS=. I'll try that later.
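Applied to the ifcfg-bond1 posted above, that would look like this (a sketch; only updelay is changed from the original config):

```
# /etc/sysconfig/network-scripts/ifcfg-bond1 (workaround sketch: updelay=0)
DEVICE=bond1
BRIDGE=bridge1
NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=0 downdelay=0 primary=link1 primary_reselect=better resend_igmp=5"
```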

patrick90
Posts: 5
Joined: 2017/09/20 10:48:27

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby patrick90 » 2017/09/26 10:04:28

I suppose (untested) that it should work as long as updelay is smaller than miimon. But I don't think small values for updelay are recommended for production environments.

Let's see what kernel version 3.10.0-697.el7 changes about the situation when it is released for CentOS.

In the meantime I'm using kernel 3.10.0-514.26.2.el7.x86_64. That solves the problem completely for me.
Last edited by patrick90 on 2017/09/26 10:29:51, edited 1 time in total.

TrevorH
Forum Moderator
Posts: 21211
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby TrevorH » 2017/09/26 10:17:26

You won't see that kernel - it's a test one and will not be released under that version. If this gets fixed in 7.4 then it'll be in a 3.10.0-693.x kernel. I killed my teaming setup last night and reverted to old-style bonding using updelay=0, and it works for me. Please note that it says updelay must be exactly 0, not some other "small value".

patrick90
Posts: 5
Joined: 2017/09/20 10:48:27

Re: bonding issue 3.10.0-693.2.2.el7.x86_64

Postby patrick90 » 2017/09/26 10:31:21

Ok. Thanks for the information and for testing.