I could not find a bug report or forum post related to my issue, so I am posting here. I do not know how to "dig deep" into the kernel to find out what is going on, so any troubleshooting help, or guidance on where else to post, is appreciated.
I am using OpenNebula, and I have two KVM hosts with a VLAN encapsulating VXLAN to facilitate communication between guests. Simply creating one guest on each host and attaching the VXLAN vNIC to the guests triggers the issue. The problem is unlikely to be in OpenNebula itself, as I have worked around it.
The issue is that my network is flooded with multicast packets when KVM brings up the guests, and the flood continues until I migrate my guests onto one host and bounce the subinterface on the empty host. The multicast flood continues even if I delete all guests from the hosts!
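When confirming a flood like this from a packet capture, it can help to separate multicast traffic from ordinary unicast. Below is a minimal Python sketch, assuming the destination addresses have already been extracted from a capture by some other tool. The function names and sample addresses are made up for illustration and are not taken from my hosts.

```python
import ipaddress


def is_multicast_mac(mac: str) -> bool:
    """Return True if the group bit of a colon-separated MAC is set.

    The group bit is the least significant bit of the first octet.
    Note that the broadcast address ff:ff:ff:ff:ff:ff also has it set.
    """
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def is_multicast_ip(addr: str) -> bool:
    """Return True for IPv4 (224.0.0.0/4) or IPv6 (ff00::/8) multicast."""
    return ipaddress.ip_address(addr).is_multicast


if __name__ == "__main__":
    # Hypothetical destination addresses, for illustration only.
    for mac in ("01:00:5e:00:00:01", "52:54:00:12:34:56"):
        print(mac, is_multicast_mac(mac))
    for ip in ("239.1.1.1", "192.168.1.10"):
        print(ip, is_multicast_ip(ip))
```

Counting how many captured destinations satisfy these checks before and after bouncing the subinterface would give a rough measure of whether the flood has actually stopped.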
I resolved the issue by upgrading the kernel to 4.4.177-1.el7.elrepo.x86_64, which makes me suspect the original kernel. The NIC driver is the same on both kernels; the lspci -k output shows:
- Kernel driver in use: mlx5_core
- Kernel modules: mlx5_core
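
To compare the driver lines between the two kernels without reading the full output by eye, something like the following Python sketch could be used. It assumes the usual lspci -k layout, where each device starts on an unindented line and its details follow on indented lines. The function name and the sample device line are hypothetical; only the two driver lines come from my actual output.

```python
def drivers_in_use(lspci_output: str) -> dict:
    """Map each device line to its driver-related fields from lspci -k text."""
    wanted = ("Kernel driver in use", "Kernel modules")
    result = {}
    current = None
    for line in lspci_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new device entry.
            current = line.strip()
            result[current] = {}
        elif current is not None:
            # Indented line: a "Key: value" detail of the current device.
            key, sep, value = line.strip().partition(":")
            if sep and key in wanted:
                result[current][key] = value.strip()
    return result


# Hypothetical sample: the device line is invented for illustration.
SAMPLE = (
    "03:00.0 Ethernet controller: Example NIC (hypothetical)\n"
    "\tKernel driver in use: mlx5_core\n"
    "\tKernel modules: mlx5_core\n"
)

if __name__ == "__main__":
    for device, fields in drivers_in_use(SAMPLE).items():
        print(device, fields)
```

Running this on output saved under each kernel and comparing the two dictionaries would confirm that the driver name matches. The same module name on two kernels does not necessarily mean the same driver code, so ethtool -i on the interface, which reports the driver and firmware versions, may be a more telling comparison.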