InfiniBand IPoIB not working after OS/kernel update

NOPA
Posts: 6
Joined: 2015/09/29 13:35:33

InfiniBand IPoIB not working after OS/kernel update

Post by NOPA » 2015/09/29 13:48:29

After updating the OS to 6.6, IPoIB no longer works.

Output of cat /var/log/opensm.log:

OpenSM: Got signal 15 - exiting...
Exiting SM

Sep 29 14:57:11 118835 [3B68D700] 0x80 -> Exiting SM
Sep 29 14:57:15 653286 [610C7700] 0x03 -> OpenSM 3.3.17
OpenSM 3.3.17

Sep 29 14:57:15 653364 [610C7700] 0x80 -> OpenSM 3.3.17
Sep 29 14:57:15 680874 [610C7700] 0x01 -> subn_validate_neighbor: ERR 7518: neighbor does not point back at us (guid: 0x003048ffffa10132, port 1)
Sep 29 14:57:15 680904 [610C7700] 0x01 -> subn_validate_neighbor: ERR 7518: neighbor does not point back at us (guid: 0x003048ffffa10029, port 17)
Using default GUID 0x3048ffffa10416
Entering DISCOVERING state

Sep 29 14:57:15 681048 [610C7700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Sep 29 14:57:15 696033 [610C7700] 0x80 -> Entering DISCOVERING state
Entering STANDBY state

Sep 29 14:57:15 696212 [610C7700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x3048ffffa10416
Sep 29 14:57:15 720252 [610C7700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0x3048ffffa10416
Sep 29 14:57:15 720360 [610C7700] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0x3048ffffa10416
Sep 29 14:57:15 720475 [610C7700] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0x3048ffffa10416
Sep 29 14:57:15 720566 [610C7700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x003048ffffa10416
Sep 29 14:57:15 723948 [598CE700] 0x80 -> Entering STANDBY state

Output of ibqueryerrors:

Errors for "MT25218 InfiniHostEx Mellanox Technologies"
   GUID 0x3048ffffa10152 port 1: [PortXmitDiscards == 2] [VL15Dropped == 2]
Errors for "MT25218 InfiniHostEx Mellanox Technologies"
   GUID 0x3048ffffa10426 port 1: [PortXmitDiscards == 2] [VL15Dropped == 2]
Errors for 0x3048ffffa10029 "MT47396 Infiniscale-III Mellanox Technologies"
   GUID 0x3048ffffa10029 port ALL: [LinkErrorRecoveryCounter == 2] [LinkDownedCounter == 3] [PortRcvSwitchRelayErrors == 55] [PortXmitDiscards == 1064]
   GUID 0x3048ffffa10029 port 11: [PortXmitDiscards == 45]
   GUID 0x3048ffffa10029 port 12: [LinkErrorRecoveryCounter == 2] [LinkDownedCounter == 2] [PortRcvSwitchRelayErrors == 24] [PortXmitDiscards == 1019]
   GUID 0x3048ffffa10029 port 13: [LinkDownedCounter == 1] [PortRcvSwitchRelayErrors == 31]
Errors for "MT25218 InfiniHostEx Mellanox Technologies"
   GUID 0x3048ffffa10416 port 1: [PortXmitDiscards == 2] [VL15Dropped == 3]

## Summary: 4 nodes checked, 4 bad nodes found
##          27 ports checked, 6 ports have errors beyond threshold
## Thresholds: 
## Suppressed:

The ib_mad and ib_umad modules are present.

What could have gone wrong?

ChubaDuba
Posts: 45
Joined: 2015/08/16 06:06:41
Location: Rostov-on-Don

Re: InfiniBand IPoIB not working after OS/kernel update

Post by ChubaDuba » 2015/09/29 16:18:08

I think the reason is that these modules stopped working after the upgrade. Most likely you need to reinstall them. Check what the lsmod command shows: if the "Used by" count for these modules is zero, the kernel is not using them.
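
The lsmod check suggested above could be scripted roughly as follows. This is only a sketch: it assumes Python 3 is available on the machine, and it adds ib_ipoib to the module list on the assumption that this is the IPoIB driver module on your system (the thread itself only names ib_mad and ib_umad). The helper names parse_lsmod, report and read_lsmod are made up for this example.

```python
import subprocess

# ib_mad and ib_umad come from the thread; ib_ipoib is an assumption
# (the usual name of the IPoIB driver module).
IB_MODULES = ("ib_mad", "ib_umad", "ib_ipoib")


def parse_lsmod(text):
    """Map module name -> use count from lsmod-style text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like: name size use_count [users].
        # The "Module Size Used by" header has no numeric third field.
        if len(fields) >= 3 and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def report(counts, wanted=IB_MODULES):
    """Return one status line per module of interest."""
    lines = []
    for name in wanted:
        if name not in counts:
            lines.append(f"{name}: not loaded")
        else:
            lines.append(f"{name}: loaded, use count {counts[name]}")
    return lines


def read_lsmod():
    """Run lsmod and return its output as text."""
    return subprocess.check_output(["lsmod"], universal_newlines=True)
```

On the affected host, something like print("\n".join(report(parse_lsmod(read_lsmod())))) would print the status of each module. Note that a use count of 0 only means nothing currently holds a reference to the module; a "not loaded" line is the stronger hint that the module needs to be reinstalled or loaded for the new kernel.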
