Newer mpt2sas LSI driver ?

Post by allan34 » 2014/12/09 09:15:01

Hi,

CentOS 7 ships with an older version of the mpt2sas driver than the one LSI support makes available.
The current LSI driver is P20. Is there a good reason why such an old version of the driver is shipped?

CentOS 7 (DVD 1406): modinfo shows
version: 16.100.00.00
(yum update keeps the same driver)

Curiously, CentOS 6.6 ships with a slightly different one:
version: 16.101.00.00
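
For anyone comparing driver versions across several machines, the version field can be pulled out of the modinfo output and compared numerically. This is only a minimal sketch in Python: the helper names `driver_version` and `version_tuple` are made up for illustration, and the sample text is hand-written from the two version strings quoted above, not a real capture.

```python
import re

def driver_version(modinfo_text):
    """Return the value of the 'version:' field from modinfo output, or None."""
    for line in modinfo_text.splitlines():
        # Anchor at line start so 'srcversion:' and 'vermagic:' do not match.
        match = re.match(r"^version:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None

def version_tuple(version):
    """Turn '16.100.00.00' into (16, 100, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Hand-written samples based on the version strings quoted above.
centos7 = "version:16.100.00.00\n"
centos66 = "version: 16.101.00.00\n"

v7 = driver_version(centos7)
v66 = driver_version(centos66)
print(v7, v66, version_tuple(v7) < version_tuple(v66))
```

On a live system the text could come from `subprocess.run(["modinfo", "mpt2sas"], capture_output=True, text=True).stdout`, assuming the module is installed under that name.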

It turns out that flashing the P20 firmware was a bad idea. (OK, running an older driver with newer firmware may be an obviously bad idea.)

This is a new build:
I flashed the P20 firmware to a 9201 HBA card.
I installed CentOS 7 (minimal) and got the errors shown below while reading from the attached WD40EFRX drives with dd.
The same error occurs across all drives.

Since I initially suspected the driver, I experimented a bit with CentOS 6.6 and another distro, but still got the same errors.
Sanity only prevailed after downgrading the card's firmware to P16.

Maybe this card and the P20 firmware have a problem? I have not had time to try P19 etc. I thought it best to start with the P16 firmware to match the driver, and there is some pressure to get this system working ASAP.

However there is that nagging feeling that LSI release firmware for a good reason and it would be better to be using later firmware.

Any advice on how to proceed?

Attempting to install the P20 driver from LSI was not straightforward, possibly due to my ignorance.

However, the P20 driver (confirmed with modinfo) currently still produces errors with the P20 firmware.

Regards
A

PS: for anyone interested, it's a 45drives Storinator case (http://www.45drives.com/)
Asus P8C WS motherboard
LSI 9201 HBA
WD40EFRX


Dec  5 16:00:45 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:00:45 localhost kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec  5 16:00:45 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:00:45 localhost kernel: Sense Key : Aborted Command [current] 
Dec  5 16:00:45 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:00:45 localhost kernel: Add. Sense: Information unit iuCRC error detected
Dec  5 16:00:45 localhost kernel: sd 0:0:1:0: [sdb] CDB: 
Dec  5 16:00:45 localhost kernel: Read(16): 88 00 00 00 00 00 00 19 50 00 00 00 01 00 00 00
Dec  5 16:00:45 localhost kernel: end_request: I/O error, dev sdb, sector 1658880
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207360
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207361
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207362
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207363
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207364
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207365
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207366
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207367
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207368
Dec  5 16:00:45 localhost kernel: Buffer I/O error on device sdb, logical block 207369
Dec  5 16:00:50 localhost kernel: mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec  5 16:00:51 localhost kernel: mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec  5 16:00:58 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:00:58 localhost kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec  5 16:00:58 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:00:58 localhost kernel: Sense Key : Aborted Command [current] 
Dec  5 16:00:58 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:00:58 localhost kernel: Add. Sense: Information unit iuCRC error detected
Dec  5 16:00:58 localhost kernel: sd 0:0:1:0: [sdb] CDB: 
Dec  5 16:00:58 localhost kernel: Read(16): 88 00 00 00 00 00 00 46 f7 00 00 00 01 00 00 00
Dec  5 16:00:58 localhost kernel: end_request: I/O error, dev sdb, sector 4650752
Dec  5 16:00:58 localhost kernel: quiet_error: 22 callbacks suppressed
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581344
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581345
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581346
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581347
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581348
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581349
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581350
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581351
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581352
Dec  5 16:00:58 localhost kernel: Buffer I/O error on device sdb, logical block 581353
Dec  5 16:01:01 localhost systemd: Starting Session 3 of user root.
Dec  5 16:01:01 localhost systemd: Started Session 3 of user root.
Dec  5 16:01:05 localhost kernel: mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec  5 16:01:06 localhost kernel: sd 0:0:1:0: [sdb]  
Dec  5 16:01:06 localhost kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
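
The numbers in this log are consistent with each other and can be decoded by hand. Here is a rough sketch in Python. It assumes the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13) and the log_info bit layout as I understand it from the mpt2sas driver source (bus type in the top 4 bits, originator in the next 4, then an 8-bit code and a 16-bit sub_code). The function names are illustrative, and dividing by 8 assumes 512-byte sectors and 4 KiB buffer blocks.

```python
def decode_read16(cdb_hex):
    """Split a READ(16) CDB given as a hex string into (LBA, transfer length)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks

def decode_log_info(value):
    """Break a 32-bit mpt2sas log_info value into its fields (assumed layout)."""
    originators = {0: "IOP", 1: "PL", 2: "IR"}
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": originators.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }

lba, blocks = decode_read16("88 00 00 00 00 00 00 19 50 00 00 00 01 00 00 00")
print(lba, blocks)   # LBA and number of blocks requested
print(lba // 8)      # first 4 KiB buffer block, assuming 512-byte sectors
print(decode_log_info(0x31080000))
```

The first CDB decodes to LBA 1658880, which matches the "sector 1658880" line, and 1658880 / 8 = 207360 matches the first "logical block" reported. The log_info value splits into originator PL, code 0x08, sub_code 0x0000, exactly as the driver prints it. This only restates what the log already says; it does not by itself identify the cause.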

Post by allan34 » 2014/12/09 16:52:45

The P18 HBA firmware works fine with the P18 driver.
I installed the P18 driver using kernel-ml (see http://elrepo.org/tiki/tiki-index.php and http://elrepo.org/tiki/kernel-ml).
I flashed the P18 firmware and everything works.

No surprise: the P18 driver with the P20 firmware produced the same errors.

The P20 driver with the P18 firmware works OK in a short test, but is this a good idea?

I will ask the good folks at ELRepo for P18 and P19 kmod-mpt2sas RPMs, unless anyone here says these are toxic and it is best to stick with P16.

Will see if LSI/Avago have anything to say on this.

Regards
A

Post by allan34 » 2014/12/11 23:10:02

LSI confirms:

The P20 firmware is broken for this card. (It would be nice if it were not listed on the 9201-16i download page.)

There is no need for the driver and firmware to be from the same release.

There are currently no known problems with the P16 driver that ships by default with the current CentOS 7 release.

For this installation I feel confident enough to run with the P16 driver and the P19 firmware.
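
Putting the results reported in this thread side by side makes the pattern easier to see. A small sketch in Python: the tuples simply transcribe the driver/firmware combinations described in the posts above (the P16 driver with P19 firmware is left out because no test result was reported for it), and the names `results` and `failing_values` are made up for illustration.

```python
# (driver, firmware, worked) as reported in the posts above.
results = [
    ("P16", "P20", False),
    ("P16", "P16", True),
    ("P20", "P20", False),
    ("P18", "P18", True),
    ("P18", "P20", False),
    ("P20", "P18", True),   # short test only
]

def failing_values(index):
    """Return the values at tuple position `index` for which no reported test worked."""
    outcomes = {}
    for row in results:
        outcomes.setdefault(row[index], []).append(row[2])
    return sorted(key for key, oks in outcomes.items() if not any(oks))

print("drivers that never worked: ", failing_values(0))
print("firmware that never worked:", failing_values(1))
```

Every driver release worked with at least one firmware, while the P20 firmware failed with every driver tried, which fits what LSI said.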

Cheers
Allan

Post by TexasFight » 2015/01/23 03:38:07

Wanted to chime in on this thread: I also ran into problems with an LSI 9211-4i controller running the P20 firmware.

The P20 firmware seems to work just fine with an older Hitachi 4TB Deskstar. I upgraded to a 6TB Deskstar NAS drive, and started seeing all kinds of weird - but not critical - errors.

The kernel logs (Debian Wheezy 3.2.0-4-amd64 with the stock mpt2sas 10.100.00.00) reported lots of "mpt2sas0: log_info(0x31080000)" and the like. The 6TB Deskstar started accumulating errors in its own SMART logs, too:

199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 49

...

Error 49 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 28 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 48 00 d8 f8 8d 40 00 00:09:27.113 READ FPDMA QUEUED
60 00 08 10 03 8e 40 00 00:09:27.113 READ FPDMA QUEUED
60 08 08 f0 8e d1 40 00 00:09:27.106 READ FPDMA QUEUED
60 10 00 00 b6 27 40 00 00:09:27.102 READ FPDMA QUEUED
60 10 00 20 58 d1 40 00 00:09:27.092 READ FPDMA QUEUED


I had none of these issues with P20 and the previous Hitachi 4TB drive.
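
The error register value in that SMART log entry can be decoded bit by bit. A minimal sketch in Python, assuming the ATA error register bit assignments as I understand them (bit 7 ICRC, bit 6 UNC, bit 4 IDNF, bit 2 ABRT); any other bit is printed as a raw bit number rather than guessed at. The function names are illustrative, and `smart_raw_value` assumes the raw value is the last whitespace-separated column, which holds for the attribute line quoted above.

```python
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}

def decode_error_register(value):
    """List the flags set in an ATA error register byte, highest bit first."""
    flags = []
    for bit in range(7, -1, -1):
        if value & (1 << bit):
            flags.append(ERROR_BITS.get(bit, f"bit{bit}"))
    return flags

def smart_raw_value(attribute_line):
    """Return the raw value (last column) of a smartctl attribute line as an int."""
    return int(attribute_line.split()[-1])

print(decode_error_register(0x84))
print(smart_raw_value(
    "199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 49"
))
```

The ER value 84 has bits 7 and 2 set, which gives ICRC and ABRT, the same flags smartctl printed. The raw count of 49 matches the "Error 49" entry in the error log.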

In any case, after reading this thread and several others around the 'net about a bug in P20, I reflashed the LSI controller down to P19, and voila, the 6TB drive appears to be operating normally (except that its SMART logs are now permanently tainted).

Folks on the FreeBSD list report on a similar situation here:
http://lists.freebsd.org/pipermail/free ... 81014.html

It seems to be some interaction that only occurs with particular hard drives - I saw this exact problem about four years ago with an LSI RAID controller and some recently-released Hitachi A7K2000 2TB Ultrastar drives. The drives would fall out of the RAID, report UDMA CRC Errors, etc., whenever they were attached to a particular LSI RAID card. The drives worked fine on other SAS/SATA controllers.

For future reference, the older drive that worked with the LSI 9211-4i running firmware version P20:

4TB Deskstar HDS724040ALE640 P/N: 0F14681 (Firmware MJAOA3B0)

The drive that did not work with firmware version P20, but works just fine with P19:

6TB Deskstar "NAS" HDN726060ALE610 P/N: 0S03839 (Firmware APGNT517)

Figured I'd throw this out there for the next person Googling around for answers...
