Kernel Issues: LSI SAS3080x (SAS1068) HBA On CentOS 7

General support questions
MadIRQ
Posts: 3
Joined: 2017/09/09 14:27:40

Postby MadIRQ » 2017/09/09 16:15:17

Hello all, hoping somebody can point me in the right direction as I seem to be having multiple kernel-related issues...
(Mods, if this is more suitable for the hardware forum please move as appropriate)

I'm working with a small low-power server that uses a SAS1068 based HBA for its data disks.

Full boot console logs available for any configuration below.

CentOS 5.11 (2.6.18-398.el5) Boots/works fine.
CentOS 6.9 (2.6.32-696.el6.x86_64) Boots/works fine.
CentOS 7 (3.10.0-514.el7.x86_64) "Boots" - however, the log is filled with APIC/IRQ errors/oddities related specifically to the HBA.
CentOS 7 (4.9.39-29.el7.x86_64) Same as above but with total HBA failure and system instability. System also fails to reboot.

A quick snippet:

CentOS 6.9 (2.6.32-696.el6.x86_64) Console:

Fusion MPT base driver 3.04.20
Copyright (c) 1999-2008 LSI Corporation
Fusion MPT SAS Host driver 3.04.20
ACPI: PCI Interrupt Link [LNEC] enabled at IRQ 18
mptsas 0000:03:04.0: PCI INT A -> Link[LNEC] -> GSI 18 (level, low) -> IRQ 18
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1068 B0: Capabilities={Initiator}
scsi8 : ioc0: LSISAS1068 B0, FwRev=01210000h, Ports=1, MaxQ=483, IRQ=18
mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 6, phy 1, sas_addr 0xd93c3f15cf1fa418

The above loads quickly and cleanly; everything works.

CentOS 7 (3.10.0-514.el7.x86_64) Console:

[20.029134] ACPI: PCI Interrupt Link [LNEA] enabled at IRQ 18
[20.029279] ata7: SATA link down (SStatus 0 SControl 300)
[20.029652] Fusion MPT SAS Host driver 3.04.20
[20.029762] mptsas 0000:03:04.0: PCI IRQ 0 -> rerouted to legacy IRQ 16
[20.029764] ACPI: Invalid index 16
[20.029766] mptsas 0000:03:04.0: PCI INT A: no GSI - using ISA IRQ 5
[20.029877] mptbase: ioc0: Initiating bringup
[21.929020] ioc0: LSISAS1068 B0: Capabilities={Initiator}

[22.573281] irq 5: nobody cared (try booting with the "irqpoll" option)

[22.573292] Call Trace:
[22.573298]  <IRQ>  [<ffffffff81685fac>] dump_stack+0x19/0x1b
[22.573302]  [<ffffffff81132702>] __report_bad_irq+0x32/0xd0
[22.573304]  [<ffffffff81132b22>] note_interrupt+0x132/0x1f0
[22.573307]  [<ffffffff810f39a1>] ? tick_nohz_idle_enter+0x41/0x70
[22.573309]  [<ffffffff81130201>] handle_irq_event_percpu+0xe1/0x1e0
[22.573311]  [<ffffffff8113033d>] handle_irq_event+0x3d/0x60
[22.573313]  [<ffffffff811337da>] handle_fasteoi_irq+0x5a/0x100
[22.573315]  [<ffffffff8102d26f>] handle_irq+0xbf/0x150
[22.573317]  [<ffffffff810f3cb6>] ? tick_check_idle+0xb6/0xd0
[22.573319]  [<ffffffff81698bef>] do_IRQ+0x4f/0xf0
[22.573322]  [<ffffffff8168dd6d>] common_interrupt+0x6d/0x6d
[22.573324]  <EOI>  [<ffffffff810f39a1>] ? tick_nohz_idle_enter+0x41/0x70
[22.573326]  [<ffffffff810f399d>] ? tick_nohz_idle_enter+0x3d/0x70
[22.573329]  [<ffffffff810e7af3>] cpu_startup_entry+0xa3/0x290
[22.573332]  [<ffffffff8104f12a>] start_secondary+0x1ba/0x230
[22.573333] handlers:
[22.573342] [<ffffffffa00e8bb0>] mpt_interrupt [mptbase]
[22.573343] Disabling IRQ #5

[OK] Reached target Basic System.

[34.697154] scsi host8: ioc0: LSISAS1068 B0, FwRev=01210000h, Ports=1, MaxQ=483, IRQ=5
[40.497025] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 6, phy 1, sas_addr 0xd93c3f15cf1fa418

As you can see, it eventually sort of "works", but something is definitely wrong. The stability of this configuration is also in question.
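
To compare boots side by side, the routing lines can be pulled out of a saved console log with a short script. The following is a minimal Python sketch, not a diagnostic tool: the function name `summarize_irq_routing` and the regular expressions are assumptions modelled only on the messages quoted in this thread, so other kernels may word them differently.

```python
import re

# Patterns modelled on the log lines quoted in this thread (an assumption;
# other kernel versions may phrase these messages differently).
GSI_OK = re.compile(r"PCI INT (\w) -> .*GSI (\d+) .*-> IRQ (\d+)")
NO_GSI = re.compile(r"PCI INT (\w): no GSI(?: - using ISA IRQ (\d+))?")
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")


def summarize_irq_routing(log_text, device="0000:03:04.0"):
    """Summarize how `device` was routed and which IRQs the kernel disabled."""
    summary = {
        "routed_irq": None,    # IRQ obtained through a valid GSI
        "no_gsi": False,       # True if the kernel reported "no GSI"
        "fallback_irq": None,  # ISA IRQ used as a fallback, if any
        "unhandled_irqs": [],  # IRQs reported as "nobody cared"
    }
    for line in log_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            summary["unhandled_irqs"].append(int(m.group(1)))
            continue
        if device not in line:
            continue
        m = GSI_OK.search(line)
        if m:
            summary["routed_irq"] = int(m.group(3))
            continue
        m = NO_GSI.search(line)
        if m:
            summary["no_gsi"] = True
            if m.group(2):
                summary["fallback_irq"] = int(m.group(2))
    return summary
```

Saving the `dmesg` output from each kernel to a file and passing the text to this function makes the difference easy to spot: for the 6.9 log `routed_irq` comes back as 18, while for the 3.10 log `no_gsi` is set and IRQ 5 appears in `unhandled_irqs`.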

CentOS 7 (4.9.39-29.el7.x86_64 as from Xen4) Console:

[24.265640] Fusion MPT base driver 3.04.20
[24.328262] Copyright (c) 1999-2008 LSI Corporation
[24.451803] Fusion MPT SAS Host driver 3.04.20
[24.452063] pps_core: LinuxPPS API ver. 1 registered
[24.452064] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[24.725813] PTP clock support registered
[24.927387] mptsas 0000:03:04.0: PCI IRQ 0 -> rerouted to legacy IRQ 16
[25.363447] ACPI: Invalid index 16
[25.404326] mptsas 0000:03:04.0: PCI INT A: no GSI

[25.681543] mptbase: ioc0: Initiating bringup

[28.287019] ioc0:
[28.287020] LSISAS1068 B0:
[28.287020] Capabilities={
[28.287020] Initiator
[28.287021] }

[55.776202] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[68.382020] mptbase: ioc0: Attempting Retry Config request type 0x1, page 0x1, action 0
[83.936020] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[84.034009] mptbase: ioc0: Initiating recovery
[Messages repeat to infinity]

The system at this point is virtually useless.

Anybody have suggestions on where to start?

Thank you all,
MI

SIGBUS
Posts: 13
Joined: 2007/05/28 15:39:06

Re: Kernel Issues: LSI SAS3080x (SAS1068) HBA On CentOS 7

Postby SIGBUS » 2017/09/13 20:56:53

Unfortunately, mptsas is deprecated in 7.x. You can pretty well count on the driver being removed completely in 8.x.
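
To check whether a given installed kernel still ships the module, one rough approach is to look for it in that kernel's `modules.dep`. This is only a sketch under stated assumptions: the helper names `module_listed` and `driver_available` are made up for illustration, the file is assumed to live at `/lib/modules/<release>/modules.dep`, and a driver compiled directly into the kernel would not be listed there at all.

```python
import os
import re
from pathlib import Path


def module_listed(modules_dep_text, name):
    """True if modules.dep text has an entry for module `name`.

    Matches `name.ko` with an optional compression suffix (e.g. `.xz`),
    and only at the start of an entry, not in a dependency list.
    """
    pattern = re.compile(r"(?:^|/)" + re.escape(name) + r"\.ko(?:\.\w+)?:")
    return any(pattern.search(line) for line in modules_dep_text.splitlines())


def driver_available(name, release=None):
    """Check the modules.dep of `release` (default: the running kernel)."""
    release = release or os.uname().release
    dep = Path("/lib/modules") / release / "modules.dep"
    return dep.is_file() and module_listed(dep.read_text(), name)
```

Calling `driver_available("mptsas")` checks the running kernel; passing a second argument such as the release string of another installed kernel checks that one instead.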

I ran into the driver deprecation/removal issue this spring as well, while working at a client. We were going to use an old Dell server with a PERC/4e RAID controller as a development box, and much to my consternation, CentOS 7 wouldn't detect the RAID controller. Ubuntu Server worked fine on the same hardware, but we were using a proprietary database that didn't support Ubuntu. We ended up buying a cheap refurbished desktop as a development box, though I suppose I could have just stuck with Ubuntu and run CentOS as a KVM guest.

This is eventually going to bite me at home as well. I have an Intel SASUC8I (also SAS1068-based) reflashed with IT firmware in my home fileserver. In my case, 3.10.0-514.26.2.el7.x86_64 has been stable for me, but the writing is on the wall. I'm torn on whether to eat the cost of a new HBA or to switch distros.

MadIRQ
Posts: 3
Joined: 2017/09/09 14:27:40

Re: Kernel Issues: LSI SAS3080x (SAS1068) HBA On CentOS 7

Postby MadIRQ » 2017/09/17 14:57:55

Thanks for the reply SIGBUS.

I was completely unaware that mptsas had been moved to deprecated status. That is rather unfortunate to hear. Interestingly enough, in all my attempts to get this working the HBA has never actually failed to be detected; it just failed to function as expected.

That being said I did do some more testing for anyone that may come across this problem in the future:

3.10.0-123.1.2.el7.centos.plus.x86_64. This behaves the same as 3.10.0-514.el7.x86_64 described above.
Using ELRepo, I then upgraded to 4.13.1-1.el7.elrepo.x86_64. This worked perfectly, much to my surprise. Performance was the same as, if not better than, when the system was running 2.6.18-398.el5 or 2.6.32-696.el6.x86_64.

So the good news is that the cutting-edge kernels work great.
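
For anyone repeating this test, one way to confirm the HBA ended up on a sensible interrupt line after booting a different kernel is to look it up in `/proc/interrupts`. The sketch below is hedged accordingly: `irqs_for` is a hypothetical helper, and it assumes the mptbase handler shows up under the controller name (`ioc0` in the logs above), which is worth verifying on the machine in question.

```python
def irqs_for(interrupts_text, action):
    """Return the numeric IRQ lines whose handler list includes `action`."""
    found = []
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the CPU header and named rows such as NMI
        # Handlers sharing a line are listed at the end, comma separated.
        handlers = [tok.strip(",") for tok in rest.split()]
        if action in handlers:
            found.append(int(label))
    return found


def irqs_for_running_system(action="ioc0"):
    """Read /proc/interrupts on the current machine and look up `action`."""
    with open("/proc/interrupts") as fh:
        return irqs_for(fh.read(), action)
```

On the working 6.9 boot shown earlier the expectation would be IRQ 18, whereas the 3.10 log reports the controller on IRQ 5 after the fallback.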

Unfortunately for me, this machine is also needed for virtualization and with xen4centos only at 4.9.39-29.el7.x86_64 I'm being forced to go distro shopping for the time being.

-MI

TrevorH
Forum Moderator
Posts: 21171
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Kernel Issues: LSI SAS3080x (SAS1068) HBA On CentOS 7

Postby TrevorH » 2017/09/17 15:15:18

Latest xen4centos kernel is 4.9.44-29 so you are way behind the current release.

http://mirror.centos.org/centos-7/7.4.1 ... x86_64.rpm

MadIRQ
Posts: 3
Joined: 2017/09/09 14:27:40

Re: Kernel Issues: LSI SAS3080x (SAS1068) HBA On CentOS 7

Postby MadIRQ » 2017/09/29 00:11:02

That's good to know.

I didn't try manually updating the xen kernel and assumed the one being provided automatically by the centos-release-xen/xen packages was the latest. As of my original posting that was 4.9.39-29.el7.x86_64.
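
A quick way to sanity-check whether an installed kernel is behind a newer release is to compare the numeric parts of the release strings. This is a deliberately naive sketch: `release_key` and `is_older` are illustrative names, the comparison is not the algorithm rpm itself uses, and `yum` or `rpm` remain the authority on what is actually newer.

```python
import re


def release_key(release):
    """Turn '4.9.39-29.el7.x86_64' into (4, 9, 39, 29) for a rough ordering."""
    base = release.split(".el")[0]  # drop the dist tag and architecture
    return tuple(int(n) for n in re.findall(r"\d+", base))


def is_older(installed, candidate):
    """True if `installed` sorts before `candidate` by numeric components."""
    return release_key(installed) < release_key(candidate)
```

With the versions mentioned in this thread, `is_older("4.9.39-29.el7.x86_64", "4.9.44-29")` evaluates to `True`.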