KVM + VXLAN + 3.10.0-957.10.1.el7

Issues related to configuring your network
ScootNBagNz
Posts: 3
Joined: 2019/03/26 15:59:57

KVM + VXLAN + 3.10.0-957.10.1.el7

Post by ScootNBagNz » 2019/03/26 16:54:54

Hello,

I could not find a bug report or a forum post related to my issue, so I am posting here. I do not know how to "dig deep" into the kernel to find out what is going on, so any troubleshooting help, or guidance on where else to post, is appreciated.

I am using OpenNebula, and I have two KVM hosts with a VLAN encapsulating VXLAN to facilitate communication between guests. Simply creating one guest on each host and attaching the VXLAN vNIC to the guests triggers the problem. The issue is unlikely to be with OpenNebula, as I have worked around it.

The issue is that my network is flooded with multicast packets when KVM brings up the guests, and the flood continues until I migrate my guests onto one host and bounce the subinterface on the empty host. The multicast flood continues even if I delete all guests from the hosts!

I resolved the issue by upgrading the kernel to 4.4.177-1.el7.elrepo.x86_64. This makes me suspect it is the kernel. Looking at the NIC drivers, on both kernels lspci -k output shows:
  • Kernel driver in use: mlx5_core
  • Kernel modules: mlx5_core
So yeah, I think there is a bug somewhere in 3.10. I can see the effect of using the "bugged" kernel, but I am not sure how to truly expose it in order to write a detailed bug report. Thanks for the help!

TrevorH
Forum Moderator
Posts: 26129
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: KVM + VXLAN + 3.10.0-957.10.1.el7

Post by TrevorH » 2019/03/26 17:18:38

What kernel were you running before the problem started to occur? Or is this a new setup that was installed with CentOS 7.6 to start with?

You can `yum downgrade kernel` to revert all the way back to the original 7.6 kernel-3.10.0-957.el7.x86_64. If you need to go back further than that to see if this is a new problem introduced with 7.6 then you can `yum --enablerepo=C7.5.1804\* downgrade kernel-3.10.0-862.14.4.el7.x86_64` (might also need --noplugins if you have configured repo priorities).
CentOS 5 died in March 2017 - migrate NOW!
Full time Geek, part time moderator. Use the FAQ Luke

ScootNBagNz
Posts: 3
Joined: 2019/03/26 15:59:57

Re: KVM + VXLAN + 3.10.0-957.10.1.el7

Post by ScootNBagNz » 2019/03/28 16:39:56

Hello,

Thanks for the suggestions. I have narrowed down the first kernel that shows the issue.

There is no issue with 3.10.0-514.16.1.el7.x86_64.
Kernels 3.10.0-514.21.1.el7.x86_64 and later have the multicast flood issue. The "and later" is partly an assumption: the later kernels I actually tested are 514.26.2.el7, 862.14.el7, 957.el7, 957.5 and 957.10, and I saw the flood on all of them.
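
For anyone repeating this kind of test, one way to keep track of the results is a small good/bad list that gets sorted by version. This is only a sketch: the function name `first_bad_kernel` and the two-column input format are made up for illustration, and it assumes GNU sort (for the `V` version ordering).

```shell
#!/bin/bash
# Reads lines of the form "<kernel-version> <good|bad>" on stdin and prints
# the lowest version marked "bad", using GNU sort's version ordering.
# (Hypothetical helper, not part of any CentOS tool.)
first_bad_kernel() {
    sort -k1,1V | awk '$2 == "bad" { print $1; exit }'
}

# Example, using some of the results reported in this thread:
#   printf '%s\n' \
#       '3.10.0-957.10.1.el7 bad' \
#       '3.10.0-514.16.1.el7 good' \
#       '3.10.0-514.21.1.el7 bad' | first_bad_kernel
```

The kernel just below the printed version in the sorted list is then the last known good one, which gives the pair to quote in a bug report.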

I have pasted the output of `rpm -q --changelog kernel-3.10.0-514.21.1.el7.x86_64 | grep -i vxlan` here:
https://pastebin.com/0g1evn96

There are a lot of vxlan changes. Can you help me narrow down this issue further?
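
One possible way to shrink that list: the RPM changelog of a kernel package is cumulative, so grepping the whole thing also returns vxlan entries from much older builds. Comparing the changelogs of the last good and first bad kernels should leave only the entries added in between. A rough sketch follows; the function name `changelog_delta` and the file names are hypothetical, and it assumes both kernel packages are still installed so that `rpm -q` can see them.

```shell
#!/bin/bash
# Print the lines that appear in the first changelog file but not in the
# second. Both files are expected to hold `rpm -q --changelog <pkg>` output.
# (Hypothetical helper for illustration.)
changelog_delta() {
    local newer="$1" older="$2"
    # -F: fixed strings, -x: match whole lines, -v: invert the match,
    # -f: read the patterns (here: every line of the older changelog) from a file
    grep -Fxv -f "$older" "$newer"
}

# Possible usage with the two kernels from this thread:
#   rpm -q --changelog kernel-3.10.0-514.21.1.el7.x86_64 > bad.txt
#   rpm -q --changelog kernel-3.10.0-514.16.1.el7.x86_64 > good.txt
#   changelog_delta bad.txt good.txt | grep -i -E 'vxlan|mlx5'
```

Including `mlx5` in the final grep is just a guess based on the NIC driver mentioned earlier; the filter can be widened or dropped entirely if the delta is short enough to read in full.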

The spammed packets that I see via tcpdump are:
  • IP <KVMHOST2IP>.filenet-tms > 239.0.0.2.otv: OTV, flags (0x08), overlay 0, instance 2
  • IP <KVMHOST3IP>.filenet-tms > 239.0.0.2.otv: OTV, flags (0x08), overlay 0, instance 2
  • IP truncated-ip - 50 bytes missing! <KVMHOST3IP>.41381 > 239.0.0.2.otv: OTV, flags (0x08), overlay 0, instance 2
  • IP truncated-ip - 50 bytes missing! <KVMHOST2IP>.38244 > 239.0.0.2.otv: OTV, flags (0x08), overlay 0, instance 2
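
To put numbers on a flood like the one above, the tcpdump text output can be tallied per sending host. The following is a minimal sketch, not a polished tool: `count_by_source` is a made-up name, and it assumes the usual tcpdump line layout where the source address and port come right before the `>`.

```shell
#!/bin/bash
# Reads tcpdump text output on stdin and prints "<count> <source>" per
# sending host, busiest first. (Hypothetical helper for illustration.)
count_by_source() {
    awk '{
        # Find the ">" separator; the field before it is "<source>.<port>".
        for (i = 2; i <= NF; i++) {
            if ($i == ">") {
                src = $(i - 1)
                sub(/\.[^.]*$/, "", src)   # drop the trailing .port
                count[src]++
                break
            }
        }
    }
    END { for (s in count) print count[s], s }' | sort -rn
}

# Possible usage (interface name is a placeholder; UDP port 8472 is an
# assumption based on tcpdump printing the destination port as "otv"):
#   tcpdump -n -c 1000 -i <interface> dst host 239.0.0.2 and udp port 8472 \
#       | count_by_source
```

Running the same fixed-size capture on a good kernel and on a bad kernel, and noting how long each takes to collect, would give concrete numbers to attach to a bug report.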


Further semi-related thoughts:
I am not sure what VXLAN communicates in the background with these OTV packets to facilitate communication between guests (surely something analogous to MAC address / route discovery). Communication between guests still works during the multicast flood.

TrevorH
Forum Moderator
Posts: 26129
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: KVM + VXLAN + 3.10.0-957.10.1.el7

Post by TrevorH » 2019/03/28 17:45:11

I would suggest raising a ticket on bugzilla.redhat.com about this. It'll be marked as a private bug as it's a kernel problem. Make sure you mention the two kernel versions that bracket the problem: the last one you checked where it did not happen and the first one where it did. There were no public kernel versions between 3.10.0-514.16.1.el7 and 3.10.0-514.21.1.el7, so those are consecutive releases.

ScootNBagNz
Posts: 3
Joined: 2019/03/26 15:59:57

Re: KVM + VXLAN + 3.10.0-957.10.1.el7

Post by ScootNBagNz » 2019/03/28 17:47:55

Ok thanks!
