Background:
I have a 6-year-old, 10-node i86 cluster running Solaris 10, and each node has a Mellanox Infiniband card. All nodes are connected to a Qlogic 9024 Infiniband switch. Nine of the nodes were purchased initially and have (what I would call) 'Infiniband only' cards, in that the connectors on the back are the CX4 style, so those run a CX4 cable to a CX4 connector on the Qlogic. The 10th node was purchased a few years later (refurbished, so I don't know its age). It has a QSFP+ (I believe) connector on the card and a cable that converts QSFP+ to CX4 and plugs into the Qlogic. Everything works as expected on the current setup.
The Project:
I have been tasked with adding three new nodes to the cluster and, at the same time, changing everything from Solaris to CentOS. The new nodes are Dell R620s with Mellanox QSFP+ style Infiniband cards. Over the past few weeks I have used the three nodes to test installing CentOS and to document the process.
What I know:
All of the non-Infiniband stuff works fine, but when I moved the three new nodes into the server room and connected them to the Qlogic switch for the first time, I had problems. I don't get a blue light on the Qlogic port like I do with the first 10 (operational) nodes. In CentOS, running 'ip link' shows the state of both Infiniband ports as DOWN. From what I have read, if I am connected properly to the Qlogic, the output of 'ip link' should at minimum show an INIT state for the one port that is connected (it is a dual-port card and I am using just one port/cable). I have the "Infiniband Support" group installed, and I can see p1p1 and p1p2 listed in ifconfig or ip link.
Code:
6: p1p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether e4:1d:2d:06:d6:f0 brd ff:ff:ff:ff:ff:ff
7: p1p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether e4:1d:2d:06:d6:f1 brd ff:ff:ff:ff:ff:ff
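For anyone who wants to check the same thing across several nodes, here is a minimal sketch that pulls the interface name, flags, state, and link type out of 'ip link' text like the listing above. The function name parse_ip_link and the regular expressions are my own illustration, not part of any tool. It relies only on 'ip link' printing link/ether for Ethernet-type interfaces and link/infiniband for IPoIB interfaces.

```python
import re

# Sample text copied from the 'ip link' output in this post.
SAMPLE = """\
6: p1p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether e4:1d:2d:06:d6:f0 brd ff:ff:ff:ff:ff:ff
7: p1p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether e4:1d:2d:06:d6:f1 brd ff:ff:ff:ff:ff:ff
"""

# "6: p1p1: <FLAGS> ... state DOWN ..." -> name, flags, state
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)[^:]*:\s+<([^>]*)>.*?\bstate\s+(\S+)")
# "link/ether e4:1d:..." -> link type, hardware address
LINK = re.compile(r"^\s*link/(\S+)\s+(\S+)")


def parse_ip_link(text):
    """Return {ifname: {"flags", "state", "link_type", "address"}} (hypothetical helper)."""
    result = {}
    current = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            result[current] = {
                "flags": m.group(2).split(","),
                "state": m.group(3),
                "link_type": None,
                "address": None,
            }
            continue
        m = LINK.match(line)
        if m and current is not None:
            result[current]["link_type"] = m.group(1)
            result[current]["address"] = m.group(2)
    return result


if __name__ == "__main__":
    for name, info in parse_ip_link(SAMPLE).items():
        print(name, info["state"], info["link_type"])
```

Run against my output, this reports a link type of ether for both p1p1 and p1p2, which may itself be a clue about how the ports are currently presenting themselves to CentOS.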
Perhaps the new Infiniband cards are not compatible with the Qlogic 9024, or perhaps they run at a higher speed by default and I need to configure the switch for some 12X ports.
Perhaps I don't have them configured properly in CentOS, and the switch sees a misconfigured card and doesn't activate the port. I know you can get different modules that plug into the back of a QSFP+ Infiniband card and turn it into an Ethernet port, and perhaps there are other options as well. Do I need to tell CentOS what type of module or connection I'm using (Ethernet, IB, etc.)? I didn't need to specify the connection type in any of the Solaris installations. I believe that once I had updated the firmware and run devfsadm -C, I was able to assign an IP address and it worked.
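On the question of connection type, here is a minimal sketch of how the per-port state and link layer could be read on the CentOS side. It assumes the Linux RDMA drivers are loaded and expose the usual sysfs tree under /sys/class/infiniband, with state, phys_state, link_layer and rate files for each port. The function name read_ib_ports is hypothetical, and missing files are reported as None rather than treated as errors.

```python
from pathlib import Path


def read_ib_ports(root="/sys/class/infiniband"):
    """List per-port attributes found under the RDMA sysfs tree (hypothetical helper).

    Returns an empty list when the directory does not exist, e.g. when
    no RDMA driver is loaded.
    """
    ports = []
    base = Path(root)
    if not base.is_dir():
        return ports
    for dev in sorted(base.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            entry = {"device": dev.name, "port": port.name}
            # Attribute names assumed from the kernel's sysfs layout.
            for attr in ("state", "phys_state", "link_layer", "rate"):
                f = port / attr
                entry[attr] = f.read_text().strip() if f.is_file() else None
            ports.append(entry)
    return ports
```

Calling read_ib_ports() on one of the new nodes and printing each entry would show whether each port reports a link layer of InfiniBand or Ethernet, and whether its state is DOWN, INIT or ACTIVE.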
I could use any help in getting pointed in the right direction or maybe someone has done this before and can guide me through the process.
Tron