Bonding mode4/802.3ad/lacp behavior issue
Greetings!
I have a weird behavior on my CentOS 7.3.1611 server with my bonding mode (LACP) for a dual-port 1 Gbps NIC: an Intel 82546 Gigabit Ethernet Controller (PCI-X card in a PCI slot). The bonding is done in mode 4 (802.3ad), and yes, my switch also supports this mode; the trunk has been created on the switch and communication with my server works.
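For reference, this is roughly how such an 802.3ad bond gets declared on CentOS 7 — a minimal sketch assuming the usual ifcfg file layout; the option values shown are illustrative, not taken from the poster's actual config:

```shell
# Sketch of /etc/sysconfig/network-scripts/ifcfg-bond0 for an 802.3ad bond
# (values illustrative; written to /tmp here just to demonstrate the format).
cat > /tmp/ifcfg-bond0 <<'EOF'
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer2"
EOF
# mode=802.3ad is the same thing as mode=4:
grep -o 'mode=802.3ad' /tmp/ifcfg-bond0
```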
The weird part is that all traffic goes through only one port of the NIC, not both. When I look at the NIC stats, here's what I get:
Device Received Sent Err/Drop
enp6s2f0 836.77 GiB 2.74 TiB 0/58176
enp6s2f1 283.65 MiB 7.52 MiB 0/0
bond0 837.04 GiB 2.74 TiB 0/58179
Bond0 is made of enp6s2f0 and f1. All traffic goes through f0. The traffic you see on f1 was recorded before the two ports were bonded.
I'm expecting the traffic to be spread across both Ethernet ports, and yet it is not. Right beside my Linux server I have a Windows Server 2012 R2 box with two NICs in 802.3ad mode, and when I check the stats of each NIC, I can clearly see that the traffic on that trunk is spread across both ports. The only difference between my Linux server and my Windows server is that the Windows server uses two different NICs (onboard and PCI) to create the bonding/teaming.
I also have a NAS (Netgear ReadyNAS 104) on that same switch, with dual Ethernet ports supporting LACP mode. My switch clearly shows that the traffic from the NAS is also shared between the two ports.
So to recapitulate: I have a Windows server, a NAS, and a Linux server, each configured in LACP mode with dual 1 Gbps ports. Only the Linux server is not sharing the traffic between both Ethernet ports.
The only thing I can think of is a problem with the dual-port Intel NIC. I will try that card in a Windows server and create a team to see if the problem reproduces.
Or I'm also thinking about trying the teaming mode of CentOS instead of bonding.
Any suggestions for this behavior, or is this normal with CentOS?
Thx for all the input about this.
Re: Bonding mode4/802.3ad/lacp behavior issue
Can you post the contents of /proc/net/bonding/bond0 ?
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are dead, do not use them.
Use the FAQ Luke
Re: Bonding mode4/802.3ad/lacp behavior issue
Code: Select all
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: enp6s2f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1a:64:a8:22:c2
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
Slave Interface: enp6s2f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1a:64:a8:22:c3
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
Re: Bonding mode4/802.3ad/lacp behavior issue
Hey Trevor, after posting the info you asked for, I went and double-checked the MAC addresses and the port membership of the trunk on the switch. It seems that the dead port of bond0 is not connected to the proper switch port for the trunk membership. If that is so, I'll feel stupid; I'll double-check later, when I'm on site beside the switch and the server...
Re: Bonding mode4/802.3ad/lacp behavior issue
Yeah, that's not set up right. Any time you see two different Aggregator IDs listed on the links for the same bond, it's most likely a switch config problem.
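One quick way to spot that condition is to count the distinct aggregator IDs reported for the slaves. Shown here against a saved sample of the relevant lines rather than the live file:

```shell
# Count distinct Aggregator IDs; a healthy 802.3ad bond reports exactly one.
# Using a saved sample of the relevant lines (live file: /proc/net/bonding/bond0).
cat > /tmp/bond0.sample <<'EOF'
Slave Interface: enp6s2f0
Aggregator ID: 1
Slave Interface: enp6s2f1
Aggregator ID: 2
EOF
awk '/Aggregator ID/ {ids[$NF]=1} END {print length(ids)}' /tmp/bond0.sample
# prints 2 here, i.e. the LAG is split across two aggregators
```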
Re: Bonding mode4/802.3ad/lacp behavior issue
So yeah, I confirm that it was just a mix-up of the ports on the switch: the third port of my Linux server was in the wrong place in the trunk membership on the switch. After correcting it, I still had some weird issues with MAC addresses, so I simply deleted the original trunk and recreated it. The trunk membership issue is now resolved.
So when I check bond0, I now have the proper Aggregator IDs:
Code: Select all
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: enp6s2f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1a:64:a8:22:c2
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
Slave Interface: enp6s2f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1a:64:a8:22:c3
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
I just did a test copying a 15 GB file from the server, and I also launched a copy to the server at the same time. Here are the results:
Device Received Sent Err/Drop
enp6s2f0 2.34 MiB 27.07 GiB 0/0 ---> data being copied from the server
enp6s2f1 3.40 GiB 9.48 MiB 0/3 ---> data being copied on the server
bond0 3.40 GiB 27.07 GiB 0/3
I've got mixed feelings about this, because I was expecting that the copy from the server to the NAS would use both Ethernet ports, yet only one port was used. Still, while I was also copying something to the server at the same time, I could see that the server received the data on the second port, which was not happening before. So it's an improvement.
Is this behavior normal?
Re: Bonding mode4/802.3ad/lacp behavior issue
Bonding chooses which slave to use based on a hash policy. Read /usr/share/doc/kernel-doc-3.10.0/Documentation/networking/bonding.txt - part of the kernel-doc package and see if that helps.
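To illustrate why a single transfer sticks to one port: with the default layer2 policy, the kernel hashes the source and destination MAC addresses, so every frame for a given MAC pair lands on the same slave. A simplified sketch of that idea (an assumption-laden model of the (src XOR dst) mod slave-count scheme described in bonding.txt, not the kernel's exact code; the MACs are examples):

```shell
# Simplified layer2 transmit hash: XOR the last byte of the source and
# destination MACs, then take it modulo the number of slaves.
layer2_hash() {
  src_last=$((16#${1##*:}))   # last octet of source MAC
  dst_last=$((16#${2##*:}))   # last octet of destination MAC
  echo $(( (src_last ^ dst_last) % $3 ))
}
# The same src/dst MAC pair always maps to the same slave index:
layer2_hash 00:1a:64:a8:22:c2 a0:21:b7:01:02:03 2   # prints 1
```

Since the NAS presents a single MAC, every frame to it hashes the same way, so all traffic to that one peer rides one slave. A layer3+4 hash policy can spread different TCP connections across slaves, but a single connection still never exceeds one link's bandwidth.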
Re: Bonding mode4/802.3ad/lacp behavior issue
If you want "load-balancing for LACP support" - that is supported only in teaming.
You can check the differences here.
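For anyone following along, the teaming equivalent can be set up through NetworkManager — a sketch, assuming nmcli and teamd are installed; the connection and interface names are examples:

```shell
# Create a team device with the LACP runner and a layer3+4-style tx hash
# (connection/interface names are examples):
nmcli con add type team con-name team0 ifname team0 \
    config '{"runner": {"name": "lacp", "tx_hash": ["eth", "ipv4", "tcp"]}}'
nmcli con add type team-slave con-name team0-port1 ifname enp6s2f0 master team0
nmcli con add type team-slave con-name team0-port2 ifname enp6s2f1 master team0
nmcli con up team0
```

The `tx_hash` list is what gives the lacp runner its load-balancing knob, which is the difference being pointed at here.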