No SAN connection with the latest kernel

Post by silvio » 2018/05/18 07:51:13

Hi,

we have some BL465c G7 blades with Emulex CNAs.
These devices use the be2net and lpfc drivers for the network and FC connections.
I reinstalled one of the blades from a 1511 ISO (I know it's old, but that DVD is in the blade center, and installing systems over iLO is a ...) and everything worked.
After a yum upgrade to the latest version, I lost the connection to the SAN.
In the boot log I can see this:

May 17 17:52:52 blade-server-16 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.2.3.el7.x86_64 root=UUID=c02e6a02-88ed-4757-a57d-b36a581b85ea ro rhgb quiet LANG=de_DE.UTF-8
...
May 17 17:52:55 blade-server-16 kernel: lpfc 0000:04:00.2: 0:1412 Failed to set up driver resource.
May 17 17:52:55 blade-server-16 kernel: lpfc 0000:04:00.2: Driver probe function unexpectedly returned 16
May 17 17:52:55 blade-server-16 kernel: lpfc 0000:04:00.3: 0:1412 Failed to set up driver resource.
May 17 17:52:55 blade-server-16 kernel: lpfc 0000:04:00.3: Driver probe function unexpectedly returned 16
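To pull the failing PCI functions and the probe return code out of a boot log like this one, a small Python sketch can help. It assumes the message format shown above. The PCI core prints this warning when a driver's probe returns a positive value instead of 0 or a negative errno; reading that 16 as an errno number gives EBUSY ("Device or resource busy") on Linux, but that reading is an assumption, and the log does not say what the driver found busy.

```python
import errno
import re

# Matches the PCI-core warning exactly as it appears in the boot log above.
PROBE_RE = re.compile(
    r"lpfc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Driver probe function unexpectedly returned (?P<rc>-?\d+)"
)


def probe_failures(log_text):
    """Return (pci_address, return_code, errno_name) for each failed probe."""
    results = []
    for line in log_text.splitlines():
        match = PROBE_RE.search(line)
        if match:
            rc = int(match.group("rc"))
            # Assumption: interpret the (positive) value as an errno number.
            name = errno.errorcode.get(abs(rc), "unknown")
            results.append((match.group("pci"), rc, name))
    return results


if __name__ == "__main__":
    sample = (
        "lpfc 0000:04:00.2: 0:1412 Failed to set up driver resource.\n"
        "lpfc 0000:04:00.2: Driver probe function unexpectedly returned 16\n"
        "lpfc 0000:04:00.3: 0:1412 Failed to set up driver resource.\n"
        "lpfc 0000:04:00.3: Driver probe function unexpectedly returned 16\n"
    )
    for pci, rc, name in probe_failures(sample):
        print(f"{pci}: probe returned {rc} ({name})")
```

Feeding it the kernel messages of each boot makes it easy to see whether both FCoE functions (04:00.2 and 04:00.3) fail the same way.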

I don't know if this only happens with this kernel version, because I switched directly from 3.10.0-514 to 3.10.0-862.2.3.
Has anyone seen the same problem, and is there a solution?

lspci:
04:00.0 Ethernet controller: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE (rev 02)
04:00.1 Ethernet controller: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE (rev 02)
04:00.2 Fibre Channel: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE CNA (rev 02)
04:00.3 Fibre Channel: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE CNA (rev 02)
04:00.4 Ethernet controller: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE (rev 02)
04:00.5 Ethernet controller: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE (rev 02)
04:00.6 Ethernet controller: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE (rev 02)
04:00.7 Ethernet controller: Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE (rev 02)


Silvio

Post by silvio » 2018/05/28 12:20:49

I tested all official kernels between 3.10.0-514 and 3.10.0-862.3.2, and the problem starts with 3.10.0-862.
The system is now running 3.10.0-693.21.1, which should be the last CentOS 7.4 kernel.
In the kernel changelog I see a lot of changes between these two versions, but I found no information about my problem.
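Testing every kernel one after another works, but with many builds in a range a binary search reaches the first bad version in fewer reboots. The sketch below assumes el7-style release strings and a hypothetical `is_good()` callback standing in for "boot this kernel and check whether the lpfc ports come up". It also assumes the problem, once introduced, stays in every later build.

```python
import re


def version_key(release):
    """Turn '3.10.0-693.21.1.el7' into (3, 10, 0, 693, 21, 1) for numeric sorting."""
    # Keep only the leading numeric part; drop '.el7', '.x86_64' and similar.
    core = re.match(r"[\d.\-]+", release).group(0).rstrip(".-")
    return tuple(int(part) for part in re.split(r"[.\-]", core))


def first_bad(releases, is_good):
    """Binary-search kernel releases for the first one that fails.

    Assumes the oldest release is good, the newest is bad, and that the
    problem, once introduced, stays in every later release.
    """
    ordered = sorted(releases, key=version_key)
    lo, hi = 0, len(ordered) - 1  # invariant: ordered[lo] good, ordered[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(ordered[mid]):
            lo = mid
        else:
            hi = mid
    return ordered[hi]


if __name__ == "__main__":
    candidates = ["3.10.0-862.2.3", "3.10.0-514", "3.10.0-693.21.1", "3.10.0-862"]

    def simulated_is_good(release):
        # Hypothetical stand-in for a real boot test, mirroring this thread.
        return version_key(release) < (3, 10, 0, 862)

    print(first_bad(candidates, simulated_is_good))
```

Sorting numerically matters here: a plain string sort would place 3.10.0-862 before 3.10.0-93 if such a build existed.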

Silvio

Post by TrevorH » 2018/05/28 12:56:32

If you have time and inclination to debug this then you could try experimenting with the lpfc_log_verbose parameter that modinfo lpfc says it supports. Perhaps using that you can get more information from it about what it was trying to do at the time...

Any fix will need to come from Red Hat, though, so if you find out more, raise a ticket on bugzilla.redhat.com.
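Before writing a config file, it may help to confirm the exact parameter name the installed module advertises. The sketch below runs `modinfo -p lpfc` via `subprocess` and splits each line at the first colon; it assumes the usual `name:description` layout of that output, which may differ slightly between kmod versions.

```python
import subprocess


def parse_modinfo_params(text):
    """Parse `modinfo -p` output ('name:description') into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, desc = line.partition(":")
        if sep and name.strip():
            params[name.strip()] = desc.strip()
    return params


def module_params(module="lpfc"):
    """Run `modinfo -p <module>` and return its parameters."""
    result = subprocess.run(
        ["modinfo", "-p", module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_modinfo_params(result.stdout)


if __name__ == "__main__":
    try:
        params = module_params("lpfc")
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not query lpfc: {exc}")
    else:
        print(params.get("lpfc_log_verbose", "lpfc_log_verbose is not advertised"))
```

Running this against each kernel under test also shows whether the parameter set changed between 3.10.0-693 and 3.10.0-862.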

Post by silvio » 2018/05/28 13:38:33

I hope I can check it in about 3 weeks, after the SAN migration :-)

Post by silvio » 2018/10/18 13:27:28

After a while I had a little time.
I set up another system with the current kernel (3.10.0-862.14.4.el7.x86_64) and got the same problem.
For debugging I created an lpfc.conf file in modprobe.d with this option:
options lpfc lpfc_log_verbose=0xffff

After a reboot I see... nothing.
Only the same messages:

[ 4.738655] lpfc 0000:04:00.2: 0:1412 Failed to set up driver resource.
[ 4.739364] lpfc 0000:04:00.2: Driver probe function unexpectedly returned 16
[ 4.869673] lpfc 0000:04:00.3: 0:1412 Failed to set up driver resource.
[ 4.870340] lpfc 0000:04:00.3: Driver probe function unexpectedly returned 16

I was not sure if 0xffff is the correct value, so I also tried lpfc_log_verbose=1, but it changed nothing.

Any idea what I did wrong with my config file?

Silvio

With rmmod lpfc I crash the kernel...
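One possible explanation, not confirmed in this thread: if lpfc is loaded from the initramfs (the ~4.7 s timestamps suggest it loads early), a new file in modprobe.d may not take effect until the initramfs is rebuilt with `dracut -f`, because dracut copies the modprobe.d configuration only when it builds the image. Passing the option on the kernel command line as `lpfc.lpfc_log_verbose=0xffff` avoids that dependency. The value is a bit-mask, so `1` enables only the lowest bit, while `0xffff` enables the low 16 bits. The sketch below checks both places for the option and lists the bits a value sets; the file name `lpfc.conf` is the one used above, and the check is a heuristic.

```python
from pathlib import Path

PARAM = "lpfc_log_verbose"


def cmdline_value(cmdline):
    """Return the lpfc_log_verbose value from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "lpfc." + PARAM:
            return value
    return None


def modprobe_value(conf_text):
    """Return the value set by an 'options lpfc ...' line, or None."""
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == "lpfc":
            for opt in fields[2:]:
                key, sep, value = opt.partition("=")
                if sep and key == PARAM:
                    return value
    return None


def mask_bits(value):
    """List the bit positions set in a mask given as '0xffff' or '1'."""
    mask = int(value, 0)  # base 0 accepts both hex (0x...) and decimal
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    cmd = Path("/proc/cmdline")
    conf = Path("/etc/modprobe.d/lpfc.conf")
    print("kernel command line:", cmdline_value(cmd.read_text()) if cmd.exists() else None)
    print("modprobe.d file:", modprobe_value(conf.read_text()) if conf.exists() else None)
```

If the option shows up in the modprobe.d file but the driver still logs nothing extra, rebuilding the initramfs or using the command-line form is a cheap next test.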

Post by alex1010 » 2018/11/07 00:37:08

Hi Silvio,

did you manage to troubleshoot it further?

BTW, which versions of be2net and lpfc do you have, and which firmware?

In my case the storage sees the login, but VLAN discovery fails. In the logs I see:
Nov 6 19:41:38 localhost kernel: lpfc 0000:08:00.2: 0:0373 FCP complete error: status=x1, hw_status=x0, total_data_specified=0, parameter=x2e, word3=x80010000
Nov 6 19:41:38 localhost kernel: lpfc 0000:08:00.2: 0:(0):9030 FCP cmd xa3 failed <0/0> status: x1 result: x90 sid: x10302 did: x10100 oxid: x6b Data: x2 x808
Nov 6 19:41:38 localhost kernel: lpfc 0000:08:00.2: 0:(0):9024 FCP command xa3 failed: x2 SNS x70000600 x29000000 Data: xa x90 x16 x0 x0
Nov 6 19:41:38 localhost kernel: lpfc 0000:08:00.2: 0:(0):0710 Iodone <0/0> cmd ffff880636e72140, error x2 SNS x60070 x29 Data: x0 x90
Nov 6 19:41:38 localhost kernel: lpfc 0000:08:00.2: 0:0373 FCP complete error: status=x1, hw_status=x0, total_data_specified=40, parameter=x18, word3=x80010000
Nov 6 19:41:38 localhost kernel: lpfc 0000:08:00.2: 0:(0):9024 FCP command xa3 failed: x0 SNS x0 x0 Data: x8 x68 x0 x0 x0

If I unload lpfc, VLAN discovery happens, but the port stays offline.
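The 9024 lines can be decoded by hand. Assuming the two `SNS` words are sense bytes 0-3 and 12-15 of fixed-format SCSI sense data printed big-endian (consistent with the 0710 line, where the same bytes appear little-endian as `x60070 x29`), `x70000600 x29000000` reads as sense key 6 (UNIT ATTENTION) with ASC/ASCQ 29h/00h (power on, reset, or bus device reset occurred). The `x2` before it would be SCSI status CHECK CONDITION, and opcode `xa3` is MAINTENANCE IN. A unit attention right after a login is usually harmless. A minimal decoder under those assumptions:

```python
import re

SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}
SCSI_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
               0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}

MSG_9024 = re.compile(
    r"FCP command x(?P<op>[0-9a-f]+) failed: x(?P<status>[0-9a-f]+) "
    r"SNS x(?P<w0>[0-9a-f]+) x(?P<w3>[0-9a-f]+)"
)


def decode_sense_words(word0, word3):
    """Decode sense bytes 0-3 (word0) and 12-15 (word3) given as big-endian ints."""
    b0 = word0.to_bytes(4, "big")
    b12 = word3.to_bytes(4, "big")
    key = b0[2] & 0x0F  # sense key = low nibble of byte 2
    return {
        "response_code": b0[0] & 0x7F,  # 0x70 = current error, fixed format
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc": b12[0],
        "ascq": b12[1],
    }


def decode_9024(line):
    """Return opcode, SCSI status and decoded sense from a 9024 line, or None."""
    m = MSG_9024.search(line)
    if not m:
        return None
    status = int(m.group("status"), 16)
    return {
        "opcode": int(m.group("op"), 16),  # 0xa3 = MAINTENANCE IN
        "status": SCSI_STATUS.get(status, hex(status)),
        "sense": decode_sense_words(int(m.group("w0"), 16), int(m.group("w3"), 16)),
    }


if __name__ == "__main__":
    line = ("lpfc 0000:08:00.2: 0:(0):9024 FCP command xa3 failed: x2 "
            "SNS x70000600 x29000000 Data: xa x90 x16 x0 x0")
    print(decode_9024(line))
```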

Post by alex1010 » 2018/11/07 09:38:10

Hi,

https://access.redhat.com/solutions/2206921 suggests these messages can be ignored.

Anyway, the FCoE port is still offline, although the storage shows the server as logged in.

I keep trying different driver software:

Kernel 3.10.0-693.17.1
[root@localhost ~]# fcoeadm -i
Description: OneConnect 10Gb NIC (be3)
Revision: 01
Manufacturer: Emulex Corporation
Serial Number: 009C023DBBF8

Driver: be2net 11.4.0.0r
Number of Ports: 1

Symbolic Name: fcoe v0.1 over ens2f0.1780-fco
OS Device Name: host9
Node Name: 0x1000009c023dbbf8
Port Name: 0x2000009c023dbbf8
Fabric Name: 0x0 <<<<< sometimes this is filled in
Speed: 10 Gbit
Supported Speed: 10 Gbit
MaxFrameSize: 1452 bytes
FC-ID (Port ID): 0x000000 <<<<< this too
State: Offline <- but it still stays offline

[root@localhost ~]# modinfo lpfc
filename: /lib/modules/3.10.0-693.17.1.el7.x86_64/kernel/drivers/scsi/lpfc/lpfc.ko.xz
version: 0:11.2.0.6
author: Emulex Corporation - tech.support@emulex.com
description: Emulex LightPulse Fibre Channel SCSI driver 11.2.0.6


If someone knows, could you advise?
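When switching between driver versions, it can help to capture the `fcoeadm -i` fields that matter here (State, FC-ID, Fabric Name) in a comparable form. The sketch below parses the `Key: Value` layout shown above for a single port and treats anything after a `<` as an annotation. Treating an online state plus a non-zero FC-ID as "fabric login succeeded" is an assumption drawn from this output, not a documented fcoeadm rule.

```python
import subprocess


def parse_fcoeadm(text):
    """Parse `fcoeadm -i` 'Key: Value' lines into a dict (one port only).

    Anything after a '<' is treated as a hand-written annotation and dropped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.split("<")[0].strip()
    return info


def port_logged_in(info):
    """Heuristic: an online state and a non-zero FC-ID mean fabric login worked."""
    state_ok = info.get("State", "").lower() == "online"
    fcid = int(info.get("FC-ID (Port ID)", "0") or "0", 16)
    return state_ok and fcid != 0


if __name__ == "__main__":
    try:
        out = subprocess.run(["fcoeadm", "-i"], stdout=subprocess.PIPE,
                             universal_newlines=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"fcoeadm not available: {exc}")
    else:
        info = parse_fcoeadm(out)
        print(info.get("State"), info.get("FC-ID (Port ID)"), port_logged_in(info))
```

With several FCoE ports, later ports would overwrite earlier keys, so the output would need to be split per port first.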

Post by silvio » 2018/11/20 10:39:49

Hi Alex,

first, no, I have no solution for this problem yet.
The last working kernel version has this be2net version:
be2net 0000:04:00.0: be2net version is 11.4.0.0r
and the card has:
be2net 0000:04:00.1: FW version is 4.9.416.15

This is the latest firmware version I can get for my cards...
The card is identified as an Emulex OneConnect OCe10100, and I found that these cards are marked as deprecated in the 7.5 release notes.
I'm not sure if deprecated means "not working anymore", because the blade is on Red Hat's certified hardware list...

Do you connect through a FlexFabric switch, and can you see any errors there?

Silvio

Post by avij » 2018/11/20 10:43:42

Have you tried this with the 7.6 kernel? Try yum update --enablerepo=cr to get the new packages.

Post by silvio » 2018/11/20 15:05:36

Hi Avij,

thanks for the hint.
I installed the 3.10.0-957 kernel and got a new error message:

Nov 20 14:29:34 blade-server-15 kernel: be2net 0000:04:00.7: Emulex OneConnect: PF FLEX10 port 2
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2574 IO channels: irqs 4 fcp 4 nvme 0 MRQ: 16
Nov 20 14:29:34 blade-server-15 kernel: scsi host4: Emulex OneConnect OCe10100, FCoE Initiator on PCI bus 04 device 02 irq 49
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2501 CQ_CREATE mailbox failed with status x3 add_status x0, mbx status x10
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:6086 Failed setup of CQ (0), rc = 0xfffffffa
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:0535 Failed to setup fastpath FCP WQ/CQ (0), rc = 0xfffffffa
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2507 MQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2510 RQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:34 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:0381 Error -6 during queue setup.
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.2: 0:1421 Failed to set up hba
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2574 IO channels: irqs 4 fcp 4 nvme 0 MRQ: 16
Nov 20 14:29:35 blade-server-15 kernel: scsi host5: Emulex OneConnect OCe10100, FCoE Initiator on PCI bus 04 device 03 irq 51
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2501 CQ_CREATE mailbox failed with status x3 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:6086 Failed setup of CQ (0), rc = 0xfffffffa
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:0535 Failed to setup fastpath FCP WQ/CQ (0), rc = 0xfffffffa
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2507 MQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2509 RQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2508 WQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:2506 CQ_DESTROY mailbox failed with status x5 add_status x0, mbx status x10
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:0381 Error -6 during queue setup.
Nov 20 14:29:35 blade-server-15 kernel: lpfc 0000:04:00.3: 0:1421 Failed to set up hba
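The `rc = 0xfffffffa` in the 6086/0535 lines looks like a negative errno printed as an unsigned 32-bit value: 0xfffffffa is -6, which matches the "Error -6 during queue setup" line and is ENXIO ("No such device or address") on Linux. The helper below does that conversion and counts mailbox failures per PCI function, command and status, which makes logs from different kernels easier to compare. It assumes the message format shown in this log.

```python
import errno
import re
from collections import Counter

MBOX_RE = re.compile(
    r"lpfc (?P<pci>\S+): \S+ (?P<cmd>\w+) mailbox failed "
    r"with status x(?P<status>[0-9a-f]+)"
)


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value (e.g. 0xfffffffa) as signed."""
    return value - (1 << 32) if value & 0x80000000 else value


def errno_name(rc):
    """Name a negative kernel return code via the errno module."""
    return errno.errorcode.get(-rc, "unknown") if rc < 0 else "not an error"


def mailbox_failures(log_text):
    """Count failed mailbox commands per (PCI function, command, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = MBOX_RE.search(line)
        if m:
            counts[(m.group("pci"), m.group("cmd"), m.group("status"))] += 1
    return counts


if __name__ == "__main__":
    rc = to_signed32(0xFFFFFFFA)
    print(rc, errno_name(rc))
    line = ("lpfc 0000:04:00.2: 0:2501 CQ_CREATE mailbox failed "
            "with status x3 add_status x0, mbx status x10")
    print(mailbox_failures(line))
```

In this log the first failure is CQ_CREATE with status x3; the DESTROY failures with status x5 that follow look like cleanup after that first error rather than separate problems, though that is an interpretation.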

Additionally, I lost my network connection on this blade, but that is a different problem...

Silvio