SATA problems AMD + Samsung SSD

Issues related to hardware problems
altis0
Posts: 5
Joined: 2019/06/12 09:02:07

SATA problems AMD + Samsung SSD

Post by altis0 » 2019/06/12 09:32:35

I'm having intermittent problems with the SATA interface to the SSD.

OS : CentOS 7 with latest yum updates.
Motherboard : Asus M5A78L-M/USB3
CPU : AMD Opteron 3350HE
RAM : 4 x 8GB PC3L-12800
Southbridge : AMD SB710
SSD : 256 GB Samsung SSD 860 Pro

The errors appear to start in the host interface (SErr 0x800):

Jun 11 14:38:01 lintel kernel: ata1.00: ATA-11: Samsung SSD 860 PRO 256GB, RVM01B6Q, max UDMA/133
Jun 11 14:38:01 lintel kernel: ata1.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
Jun 11 14:38:01 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 11 14:38:01 lintel kernel: ata1.00: configured for UDMA/133
Jun 11 14:38:02 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 11 14:38:02 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 11 14:38:02 lintel kernel: ata1.00: Enabling discard_zeroes_data
<snip>
Jun 12 07:34:46 lintel kernel: ata1.00: exception Emask 0x50 SAct 0x1c000000 SErr 0x800 action 0x6 frozen
Jun 12 07:34:46 lintel kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Jun 12 07:34:46 lintel kernel: ata1: SError: { HostInt }
Jun 12 07:34:46 lintel kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 12 07:34:46 lintel kernel: ata1.00: cmd 61/20:d0:00:5a:96/00:00:17:00:00/40 tag 26 ncq 16384 out#012         res 40/00:d0:00:5a:96/00:00:17:00:00/40 Emask 0x50 (ATA bus error)
Jun 12 07:34:46 lintel kernel: ata1.00: status: { DRDY }
Jun 12 07:34:46 lintel kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 12 07:34:46 lintel kernel: ata1.00: cmd 61/20:d8:40:2c:b6/00:00:1a:00:00/40 tag 27 ncq 16384 out#012         res 40/00:d0:00:5a:96/00:00:17:00:00/40 Emask 0x50 (ATA bus error)
Jun 12 07:34:46 lintel kernel: ata1.00: status: { DRDY }
Jun 12 07:34:46 lintel kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 12 07:34:46 lintel kernel: ata1.00: cmd 61/02:e0:4f:99:af/00:00:1a:00:00/40 tag 28 ncq 1024 out#012         res 40/00:d0:00:5a:96/00:00:17:00:00/40 Emask 0x50 (ATA bus error)
Jun 12 07:34:46 lintel kernel: ata1.00: status: { DRDY }
Jun 12 07:34:46 lintel kernel: ata1: hard resetting link
Jun 12 07:34:46 lintel kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 12 07:34:46 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 12 07:34:46 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 12 07:34:46 lintel kernel: ata1.00: configured for UDMA/133
Jun 12 07:34:46 lintel kernel: ata1: EH complete
Jun 12 07:34:46 lintel kernel: ata1.00: Enabling discard_zeroes_data
In this example only 3 commands were queued. More commonly there are 31, the maximum queue depth in use here (the drive reports NCQ depth 31/32).
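
The SAct value on the exception line is a bitmask of the NCQ tags that were outstanding when the error hit, so the number of queued commands can be read straight off it. A minimal sketch in Python, assuming that reading of SAct; the helper name `active_tags` is made up for illustration and is not a kernel interface:

```python
def active_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct bitmask."""
    return [tag for tag in range(32) if sact & (1 << tag)]


# SAct value copied from the exception line in the log above.
tags = active_tags(0x1C000000)
print(tags)       # [26, 27, 28]
print(len(tags))  # 3
```

The three tags match the "tag 26", "tag 27" and "tag 28" entries on the failed-command lines. A value of 0x7fffffff would correspond to all 31 usable tags being in flight.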

The BIOS is configured to use IDE mode and not AHCI but, it seems, we still end up command queueing anyway.

Some unresolved incompatibility between AMD's AHCI implementation and Samsung SSDs is widely reported on the internet. Sadly, I only found these reports after building and installing the system.

TrevorH
Forum Moderator
Posts: 25825
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: SATA problems AMD + Samsung SSD

Post by TrevorH » 2019/06/12 10:38:41

altis0 wrote: "The BIOS is configured to use IDE mode and not AHCI but, it seems, we still end up command queueing anyway."

Why? SSDs basically require AHCI to work properly.

If you've not already done so, change the SATA cable.

altis0
Posts: 5
Joined: 2019/06/12 09:02:07

Re: SATA problems AMD + Samsung SSD

Post by altis0 » 2019/06/12 10:47:56

I've tried several SATA cables and even a different PSU. Eventually, I always see the same problems.

In the same system are two Seagate Barracuda HDDs arranged as a RAID array. I never see a problem with these.

Note the SErr value 0x800, which is bit 11 of the AHCI port SError register. The AHCI specification describes that bit as:

Internal Error (E):
The host bus adapter experienced an internal error
that caused the operation to fail and may have put the host bus adapter
into an error state. The internal error may include a master or target
abort when attempting to access system memory, an elasticity buffer
overflow, a primitive mis-alignment, a synchronization FIFO overflow,
and other internal error conditions. Typically when an internal error
occurs, a non-fatal or fatal status bit in the PxIS register will also be set
to give software guidance on the recovery mechanism required.


From:
https://www.intel.com/content/dam/www/p ... rev1_3.pdf
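
To show how 0x800 maps onto that description, here is a small Python sketch that decodes the low (error) bits of an SErr value. The bit table is my reading of the PxSERR error field in the AHCI specification and covers only those bits, not the diagnostic bits in the upper half; the function name `decode_serr_err` is illustrative:

```python
# Error bits of the AHCI PxSERR register (lower 16 bits only).
# Assumption: bit positions as I read them from the AHCI specification.
SERR_ERR_BITS = {
    0: "Recovered Data Integrity Error (I)",
    1: "Recovered Communications Error (M)",
    8: "Transient Data Integrity Error (T)",
    9: "Persistent Communication or Data Integrity Error (C)",
    10: "Protocol Error (P)",
    11: "Internal Error (E)",
}


def decode_serr_err(serr):
    """List the names of the error bits set in an SErr value."""
    return [name for bit, name in sorted(SERR_ERR_BITS.items())
            if serr & (1 << bit)]


print(decode_serr_err(0x800))  # ['Internal Error (E)']
```

Only bit 11 is set in 0x800, which is the bit the kernel log reports as "HostInt".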

stevemowbray
Posts: 469
Joined: 2012/06/26 14:20:47

Re: SATA problems AMD + Samsung SSD

Post by stevemowbray » 2019/06/12 11:33:27

We had similar problems with some Integral SSDs. We used the following kernel parameter to disable NCQ:
libata.force=noncq

I don't know whether that will work for your particular mix of hardware but it might be worth a try.

altis0
Posts: 5
Joined: 2019/06/12 09:02:07

Re: SATA problems AMD + Samsung SSD

Post by altis0 » 2019/06/12 11:58:44

Thanks, that sounds promising.

Bit of a CentOS novice here. Where do I put that parameter?

stevemowbray
Posts: 469
Joined: 2012/06/26 14:20:47

Re: SATA problems AMD + Samsung SSD

Post by stevemowbray » 2019/06/12 12:04:49

You can update your kernel command line with grubby (run it as root, then reboot for the change to take effect):

grubby --update-kernel=ALL --args="libata.force=noncq"
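
After rebooting, it may be worth confirming that the argument actually made it onto the running kernel's command line. A minimal sketch in Python; `has_kernel_arg` and the sample string are illustrative only and are not taken from this system:

```python
def has_kernel_arg(cmdline, arg):
    """Return True if arg appears as a whole word on a kernel command line."""
    return arg in cmdline.split()


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz ro quiet libata.force=noncq"
print(has_kernel_arg(sample, "libata.force=noncq"))  # True
print(has_kernel_arg("ro quiet", "libata.force=noncq"))  # False
```

On the machine itself the string to check would be the contents of /proc/cmdline rather than the sample above.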

altis0
Posts: 5
Joined: 2019/06/12 09:02:07

Re: SATA problems AMD + Samsung SSD

Post by altis0 » 2019/06/12 12:57:59

Thanks. I seem to have typed the right thing:

Jun 12 13:50:37 lintel kernel: ata1: SATA max UDMA/133 abar m1024@0xf9dffc00 port 0xf9dffd00 irq 22
Jun 12 13:50:37 lintel kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 12 13:50:37 lintel kernel: ata1.00: FORCE: horkage modified (noncq)
Jun 12 13:50:37 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 12 13:50:37 lintel kernel: ata1.00: ATA-11: Samsung SSD 860 PRO 256GB, RVM01B6Q, max UDMA/133
Jun 12 13:50:37 lintel kernel: ata1.00: 500118192 sectors, multi 1: LBA48 NCQ (not used)
Jun 12 13:50:37 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 12 13:50:37 lintel kernel: ata1.00: configured for UDMA/133
Jun 12 13:50:38 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 12 13:50:38 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 12 13:50:38 lintel kernel: ata1.00: Enabling discard_zeroes_data
Now I just have to wait and see if I get any more errors.
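
The quickest confirmation in the log is the text in brackets after "NCQ" on each device's sector line: "depth 31/32" before the change, "not used" after it. A rough Python sketch that pulls that status out of kernel log lines; the regular expression and the name `ncq_status` are my own and assume the line format shown in this thread:

```python
import re

# Matches e.g. "kernel: ata1.00: ... LBA48 NCQ (depth 31/32), AA"
NCQ_RE = re.compile(r"kernel: (ata\d+\.\d+): .*NCQ \(([^)]*)\)")


def ncq_status(lines):
    """Map each ATA device to the most recent NCQ status seen in the log."""
    status = {}
    for line in lines:
        match = NCQ_RE.search(line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


# Two lines copied from the logs in this thread (before and after the change).
log = [
    "Jun 11 14:38:01 lintel kernel: ata1.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 31/32), AA",
    "Jun 12 13:50:37 lintel kernel: ata1.00: 500118192 sectors, multi 1: LBA48 NCQ (not used)",
]
print(ncq_status(log[:1]))  # {'ata1.00': 'depth 31/32'}
print(ncq_status(log))      # {'ata1.00': 'not used'}
```

Later lines overwrite earlier ones, so running it over a whole log gives the state as of the last boot.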

altis0
Posts: 5
Joined: 2019/06/12 09:02:07

Re: SATA problems AMD + Samsung SSD

Post by altis0 » 2019/06/14 18:38:05

Over 48 hours in and there have been no more errors so it looks like that one's fixed. Many thanks for your help.

Jun 12 13:50:37 lintel kernel: ata1: SATA max UDMA/133 abar m1024@0xf9dffc00 port 0xf9dffd00 irq 22
Jun 12 13:50:37 lintel kernel: ata2: SATA max UDMA/133 abar m1024@0xf9dffc00 port 0xf9dffd80 irq 22
Jun 12 13:50:37 lintel kernel: ata3: SATA max UDMA/133 abar m1024@0xf9dffc00 port 0xf9dffe00 irq 22
Jun 12 13:50:37 lintel kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 12 13:50:37 lintel kernel: ata1.00: FORCE: horkage modified (noncq)
Jun 12 13:50:37 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 12 13:50:37 lintel kernel: ata1.00: ATA-11: Samsung SSD 860 PRO 256GB, RVM01B6Q, max UDMA/133
Jun 12 13:50:37 lintel kernel: ata1.00: 500118192 sectors, multi 1: LBA48 NCQ (not used)
Jun 12 13:50:37 lintel kernel: ata1.00: supports DRM functions and may not be fully accessible
Jun 12 13:50:37 lintel kernel: ata1.00: configured for UDMA/133
Jun 12 13:50:37 lintel kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 12 13:50:37 lintel kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 12 13:50:37 lintel kernel: ata2.00: FORCE: horkage modified (noncq)
Jun 12 13:50:37 lintel kernel: ata2.00: ATA-10: ST8000DM005-2EH112, DN03, max UDMA/133
Jun 12 13:50:37 lintel kernel: ata2.00: 15628053168 sectors, multi 16: LBA48 NCQ (not used)
Jun 12 13:50:37 lintel kernel: ata2.00: configured for UDMA/133
Jun 12 13:50:37 lintel kernel: ata3.00: FORCE: horkage modified (noncq)
Jun 12 13:50:37 lintel kernel: ata3.00: ATA-10: ST8000DM005-2EH112, DN03, max UDMA/133
Jun 12 13:50:37 lintel kernel: ata3.00: 15628053168 sectors, multi 16: LBA48 NCQ (not used)
Jun 12 13:50:37 lintel kernel: ata3.00: configured for UDMA/133
Jun 12 13:50:38 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 12 13:50:38 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 12 13:50:38 lintel kernel: ata1.00: Enabling discard_zeroes_data
Jun 12 13:50:43 lintel kernel: ata2.00: configured for UDMA/133
Jun 12 13:50:43 lintel kernel: ata2: EH complete
Jun 12 13:50:43 lintel kernel: ata3.00: configured for UDMA/133
Jun 12 13:50:43 lintel kernel: ata3: EH complete
