network bond has stopped working

jamesprescott
Posts: 10
Joined: 2009/03/13 22:59:15

network bond has stopped working

Post by jamesprescott » 2018/08/07 00:11:42

I have a newly installed CentOS 7 server. During the manual install I configured 2 ethernet ports as a bond. It worked fine. The network came up as expected, I could connect to things and was able to ssh into the machine. I did a "yum update" and rebooted and it continued to work fine.

A few weeks later I did another "yum update", and now the network doesn't work after a reboot.

ifconfig shows the bond0 interface as up with the right IP and settings; the default route is being set correctly. I can ping my own IP# but not the IP# of my router, or anything else.

ethtool shows link detected on the bond0 and both ethernet ports.

I've tried turning off firewalld and NetworkManager.

Unfortunately I only have IPMI access to the machine so I can't cut/paste. I've attached screenshots of some of the key info.

I also have an identical system. I tried installing it the same way but the network never came up. I assumed I had made some mistake but now I'm wondering if there is something wrong/fragile about how I am setting up the bond.

Any suggestions on what the problem is or how to diagnose it?
Attachments: badbond1.png (bond 1)

jamesprescott
Posts: 10
Joined: 2009/03/13 22:59:15

Re: network bond has stopped working

Post by jamesprescott » 2018/08/07 00:25:59

Here is some additional info. I tried to attach them to the initial post but they disappeared.
Attachments: badbond3.png, badbond2.png

TrevorH
Site Admin
Posts: 33191
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: network bond has stopped working

Post by TrevorH » 2018/08/07 08:05:45

Look in /proc/net/bonding/bond0 for info about the status of the bond.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

jamesprescott
Posts: 10
Joined: 2009/03/13 22:59:15

Re: network bond has stopped working

Post by jamesprescott » 2018/08/07 14:21:56

Here is /proc/net/bonding/bond0
Attachments: badbond4.png (/proc/net/bonding/bond0)

TrevorH
Site Admin
Posts: 33191
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: network bond has stopped working

Post by TrevorH » 2018/08/07 16:04:47

That output is different from anything I get on a CentOS 6 or 7 system. What is the output from uname -a? It does show, though, that both slaves are part of the bond and that there have been no link failures.

I get a section near the top that looks like this on CentOS 7

Code:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: fx:ex:dx:cx:fx:xx
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 15
        Partner Key: 32797
        Partner Mac Address: 0x:2x:0x:ex:bx:xx

jamesprescott
Posts: 10
Joined: 2009/03/13 22:59:15

Re: network bond has stopped working

Post by jamesprescott » 2018/08/07 16:35:28

The full uname and kernel info are at the top of the first screenshot. 3.10.0-862.9.1.el7.x86_64

Is "Active Aggregator" the switch? I wonder if that shortened section in the output is what you'd see if it was never able to talk with the switch?

I see link on the 2 bonded ports (and I don't see link on the other 2 ports) but I don't see anything to show I'm ever communicating with the switch.

According to "ip addr", bond0, eno3 & eno4 all have the same MAC. Is that expected?

larwood
Posts: 66
Joined: 2011/07/27 12:07:30
Location: Perth WA, Australia

Re: network bond has stopped working

Post by larwood » 2018/08/08 08:44:42

I would check with the Networks team to see if LACP is still configured on the switch for those two ports.

The Aggregator ID should match for the slaves. Yours doesn't and this means the bond is not aggregating.
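
The Aggregator ID comparison can be scripted. Below is a minimal sketch in Python, assuming the status text has the same layout as the /proc/net/bonding/bond0 output posted later in this thread. The function names and the mismatching sample (IDs 1 and 2) are made up for illustration; the matching sample is trimmed from the working output in this thread.

```python
import re

def aggregator_ids(status_text):
    """Parse bonding status text.

    Returns (active_id, slaves), where active_id is the Aggregator ID listed
    before the first "Slave Interface:" line (None if absent) and slaves maps
    each slave name to its own Aggregator ID.
    """
    active_id = None
    slaves = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        if m:
            if current is None:
                active_id = int(m.group(1))
            else:
                slaves[current] = int(m.group(1))
    return active_id, slaves

def is_aggregating(status_text):
    """True if every slave reports the same Aggregator ID (and it matches
    the active aggregator, when that section is present)."""
    active_id, slaves = aggregator_ids(status_text)
    ids = set(slaves.values())
    if len(ids) != 1:
        return False
    return active_id is None or active_id in ids

# Trimmed from the working output in this thread.
WORKING_EXCERPT = """\
Active Aggregator Info:
        Aggregator ID: 15
        Number of ports: 2

Slave Interface: eno3
MII Status: up
Aggregator ID: 15

Slave Interface: eno4
MII Status: up
Aggregator ID: 15
"""

# Hypothetical values, only to show what a mismatch looks like.
MISMATCH_EXCERPT = """\
Slave Interface: eno3
Aggregator ID: 1

Slave Interface: eno4
Aggregator ID: 2
"""

if __name__ == "__main__":
    print(is_aggregating(WORKING_EXCERPT))
    print(is_aggregating(MISMATCH_EXCERPT))
```

On a live system the text would come from reading /proc/net/bonding/bond0, preferably as root, since the file shows more detail that way.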

hunter86_bg
Posts: 2019
Joined: 2015/02/17 15:14:33
Location: Bulgaria
Contact:

Re: network bond has stopped working

Post by hunter86_bg » 2018/08/13 04:08:07

90% of the time when this happens, something is wrong on the switch. To test, you can stop bond0 and both slaves, then bring up the first slave as a standalone interface (a new NetworkManager profile) with the IP of the bond. If that works, stop the first slave and do the same for the second one. If they both work, then the VLANs are OK and only the LACP settings on the switch(es) are wrong.
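
A rough sketch of that isolation test as a list of commands, written in Python. It assumes NetworkManager is managing the interfaces and that the bond profile is named bond0. The function name, the profile name test-<slave>, the address 192.0.2.10/24 and the gateway 192.0.2.1 are placeholders, not values from this thread. The script only builds and prints the commands; it runs nothing and does not restore the bond afterwards.

```python
import shlex

def standalone_test_plan(slave, address, gateway, bond_profile="bond0"):
    """Build the commands for testing one bond slave as a plain interface.

    slave        -- interface name, e.g. "eno3"
    address      -- the bond's IP in CIDR form (placeholder in this sketch)
    gateway      -- router IP to ping (placeholder in this sketch)
    bond_profile -- NetworkManager profile name of the bond (assumed)
    """
    profile = f"test-{slave}"  # hypothetical temporary profile name
    return [
        # Take the bond down so the slave is free to use on its own.
        ["nmcli", "connection", "down", bond_profile],
        # Create a temporary standalone ethernet profile with the bond's IP.
        ["nmcli", "connection", "add", "type", "ethernet",
         "ifname", slave, "con-name", profile,
         "ip4", address, "gw4", gateway],
        ["nmcli", "connection", "up", profile],
        # If the router answers, this port and its VLAN are fine.
        ["ping", "-c", "3", gateway],
        # Remove the temporary profile before testing the next slave.
        ["nmcli", "connection", "delete", profile],
    ]

if __name__ == "__main__":
    for slave in ("eno3", "eno4"):
        for cmd in standalone_test_plan(slave, "192.0.2.10/24", "192.0.2.1"):
            print(" ".join(shlex.quote(arg) for arg in cmd))
        print()
```

Printing the plan rather than executing it seems safer on a machine reachable only through IPMI, where each command can be typed and checked by hand.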

jamesprescott
Posts: 10
Joined: 2009/03/13 22:59:15

Re: network bond has stopped working

Post by jamesprescott » 2018/08/13 17:34:17

I'm working with our network team now. I think our problem is that the switch ports configured for the bond don't match up with the ports we are actually using.

I'll update the thread when I learn more.

jamesprescott
Posts: 10
Joined: 2009/03/13 22:59:15

Re: network bond has stopped working

Post by jamesprescott » 2018/08/13 21:33:53

The problem was with our network cabling. Once that was resolved everything started working. I could bring up the bond on both systems and was able to see the expected performance changes by turning the slaves up/down with ifconfig.

For completeness, here is what /proc/net/bonding/bond0 looks like on the working system. The Aggregator IDs now match. Both slaves and the bond have the same MAC address.

Turns out that while "cat /proc/net/bonding/bond0" works as a regular user, it provides lots more info when run as root/sudo.

Thanks for everyone's help.

Code:

$ sudo cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: xx:xx:xx:xx:xx:42
Active Aggregator Info:
        Aggregator ID: 15
        Number of ports: 2
        Actor Key: 9
        Partner Key: 1
        Partner Mac Address: xx:xx:xx:xx:xx:c0

Slave Interface: eno3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: xx:xx:xx:xx:xx:42
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: xx:xx:xx:xx:xx:42
    port key: 9
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: xx:xx:xx:xx:xx:c0
    oper key: 1
    port priority: 32768
    port number: x3
    port state: 61

Slave Interface: eno4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: xx:xx:xx:xx:xx:43
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: xx:xx:xx:xx:xx:42
    port key: 9
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: xx:xx:xx:xx:xx:c0
    oper key: 1
    port priority: 32768
    port number: x7
    port state: 61
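
The "port state: 61" values in the output above can be read as a bit field. Here is a small Python sketch that decodes it, assuming the number is printed in decimal and follows the LACP port-state bit layout from IEEE 802.3ad. The function and table names are made up for this example.

```python
# LACP port-state bits, lowest bit first (per IEEE 802.3ad).
LACP_STATE_BITS = [
    (0x01, "LACP_Activity"),    # set = active LACP
    (0x02, "LACP_Timeout"),     # set = short timeout, clear = long timeout
    (0x04, "Aggregation"),      # port may be aggregated
    (0x08, "Synchronization"),  # port is in sync with its aggregator
    (0x10, "Collecting"),       # accepting incoming frames
    (0x20, "Distributing"),     # sending outgoing frames
    (0x40, "Defaulted"),        # using default partner info (no LACPDU seen)
    (0x80, "Expired"),          # partner info has timed out
]

def decode_port_state(value):
    """Return the names of the flags set in an LACP port-state value."""
    return [name for mask, name in LACP_STATE_BITS if value & mask]

if __name__ == "__main__":
    print(decode_port_state(61))
    # ['LACP_Activity', 'Aggregation', 'Synchronization', 'Collecting', 'Distributing']
```

With 61, both ends report in-sync, collecting and distributing, and neither Defaulted nor Expired is set, which is what a healthy LACP link should look like. The clear LACP_Timeout bit is consistent with "LACP rate: slow".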
