Advanced error reporting on CentOS 5.5

Post by pixpop » 2012/02/08 00:47:35

I'm having trouble getting PCIe Advanced Error Reporting (AER) to work on CentOS 5.5.

The info I need to start with is the following:

1) Is PCIe Advanced Error Reporting (including error injections) known to work in 5.5?

2) What configuration options do I need to select in order to enable it?

I'm using the following:

CONFIG_ACPI=y
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCIE_ECRC=y
CONFIG_PCIE_AER_INJECT=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_MSI=y
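
A quick way to confirm that the running kernel was really built with these options is to compare the list against the kernel's config file. The following is a minimal sketch, not a definitive tool: the option list is copied from this post (whether it is sufficient on 5.5 is the open question), and the helper names `parse_config` and `missing_options` are made up for illustration.

```python
import re

# Options listed in the post. Treat this as a checklist, not as a
# confirmed-complete set for a CentOS 5.5 kernel.
WANTED = [
    "CONFIG_ACPI", "CONFIG_PCI", "CONFIG_PCI_DIRECT", "CONFIG_PCI_MMCONFIG",
    "CONFIG_PCIEPORTBUS", "CONFIG_PCIEAER", "CONFIG_PCIE_ECRC",
    "CONFIG_PCIE_AER_INJECT", "CONFIG_PCI_DOMAINS", "CONFIG_PCI_MSI",
]


def parse_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line.

    Lines such as '# CONFIG_X is not set' start with '#' and are skipped.
    """
    opts = {}
    for line in text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def missing_options(text, wanted=WANTED):
    """List wanted options that are neither built in (=y) nor modular (=m)."""
    opts = parse_config(text)
    return [name for name in wanted if opts.get(name) not in ("y", "m")]
```

To use it, read the config file of the kernel you booted (commonly `/boot/config-<release>`, which is an assumption about where your build installs it) and pass its contents to `missing_options`; an empty list means every option on the checklist is enabled.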

I also have the kernel boot parameter aerdriver.forceload=y

With this config, I see that the relevant devices appear to support Advanced Error Reporting (visible in lspci -vvv output).

However, when I attempt to inject errors into them, the injections fail for various reasons. None of the roughly a dozen candidate devices succeeds. The aer_inject facility is the only means I have of testing this functionality.

Before I track down those various errors, I want to verify that I have the right kernel and that it's configured correctly. Note that on the same hardware with a RHEL 6-based (2.6.32) kernel, the injections work correctly. But I need them to work with 5.5 as well.

Once again, I need answers to these two specific questions:

1) Is PCIe Advanced Error Reporting (including error injections) known to work in 5.5?

2) What configuration options do I need to select in order to enable it?

Post by pschaff » 2012/02/09 16:33:14

Welcome to the CentOS fora. Please see the recommended reading for new users linked in my signature.

Don't use obsolete/unsupported releases - 5.7 is the current release. See the CentOS 5.7 Release Notes (http://wiki.centos.org/Manuals/ReleaseNotes/CentOS5.7) for details. By not updating you are implicitly accepting that you will live with numerous bugs and security issues (and associated known exploits) that have subsequently been fixed.

You don't explicitly say so, but you are apparently trying to build a custom kernel. CentOS kernels are based on the RHEL SRPMs, so whatever works for RHEL (version unspecified) should work for CentOS. Have you read the Wiki kernel articles (http://wiki.centos.org/?action=fullsearch&context=180&value=kernel&titlesearch=Titles)?

Post by hitlijinfeng » 2012/03/29 03:23:05

You said the injections work correctly on your hardware with a RHEL 6-based (2.6.32) kernel. Could you share the details of your hardware?

I'm testing AER with the "aer_inject" module, but on my hardware the module stops at

pos_cap_err = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
or
rp_pos_cap_err = pci_find_ext_capability(rpdev, PCI_EXT_CAP_ID_ERR);
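
For readers unfamiliar with what this lookup does: PCIe extended capabilities form a linked list that starts at config-space offset 0x100, and each 32-bit header carries the capability ID in bits 15:0 and the offset of the next entry in bits 31:20. AER has ID 0x0001. The following is a user-space sketch of that walk over a raw config-space dump; it is an illustration of the idea, not the kernel's implementation, and the function name `find_ext_capability` is chosen here only to mirror the kernel call.

```python
import struct

PCI_EXT_CAP_ID_ERR = 0x0001   # Advanced Error Reporting capability ID
PCI_CFG_SPACE_SIZE = 0x100    # extended capabilities begin at this offset


def find_ext_capability(cfg, cap_id):
    """Return the offset of cap_id in the config-space bytes, or 0 if absent."""
    if len(cfg) < PCI_CFG_SPACE_SIZE + 4:
        return 0  # only the 256-byte legacy config space is visible
    pos = PCI_CFG_SPACE_SIZE
    # Bound the walk so a corrupt (looping) list cannot hang the scan.
    remaining = (4096 - PCI_CFG_SPACE_SIZE) // 8
    while pos >= PCI_CFG_SPACE_SIZE and pos + 4 <= len(cfg) and remaining > 0:
        (header,) = struct.unpack_from("<I", cfg, pos)
        if header in (0, 0xFFFFFFFF):
            break  # empty list or unreadable config space
        if header & 0xFFFF == cap_id:
            return pos
        pos = (header >> 20) & 0xFFC  # next pointer, dword aligned
        remaining -= 1
    return 0
```

The input can be the bytes read from `/sys/bus/pci/devices/<address>/config`. Note the assumptions: reading beyond the first 64 bytes generally requires root, and the region above 0x100 is usually only present when the kernel can reach extended config space (on x86 that normally means MMCONFIG access). A result of 0 for either the endpoint or its root port corresponds to the failure points quoted above.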

On my hardware, "lspci -t && lspci -vvv" prints the following:
localhost ljf # lspci -t && lspci -vvv
-[0000:00]-+-00.0
           +-02.0
           +-02.1
           +-1b.0
           +-1c.0-[02]----00.0
           +-1c.1-[03]----00.0
           +-1c.2-[04-0b]--
           +-1c.3-[0c-13]--
           +-1d.0
           +-1d.1
           +-1d.2
           +-1d.3
           +-1d.7
           +-1e.0-[15-18]----00.0
           +-1f.0
           +-1f.2
           \-1f.3
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
Subsystem: Lenovo ThinkPad T60/R60 series
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
Kernel driver in use: agpgart-intel

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
Subsystem: Lenovo ThinkPad T60/R60 series
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- [disabled]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: i915

00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
Subsystem: Lenovo ThinkPad T60/R60 series
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #2, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #4, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Subsystem: Lenovo Device 2013

00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
Subsystem: Lenovo ThinkPad T60/R60 series
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-

00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7-M Family) SATA Controller [IDE mode] (rev 02) (prog-if 80 [Master])
Subsystem: Lenovo Thinkpad T60 model 2007
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- [disabled]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
Read-only fields:
[PN] Part number: BCM95751m
[EC] Engineering changes: 106679-15
[SN] Serial number: 0123456789
[MN] Manufacture ID: 31 34 65 34
[RV] Reserved: checksum bad, 27 byte(s) reserved
Read/write fields:
[YA] Asset tag: XYZ01234567
[RW] Read-write area: 107 byte(s) free
End
Capabilities: [58] MSI: Enable+ Count=1/8 Maskable- 64bit+
Address: 00000000fee0100c Data: 41b1
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
Kernel driver in use: tg3

03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
Subsystem: Intel Corporation Device 1012
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- Reset+ 16bInt+ PostWrite+
16-bit legacy interface ports at 0001
Kernel driver in use: yenta_cardbus

Only 02:00.0 and 03:00.0 have the Advanced Error Reporting capability. The corresponding root ports (00:1c.0 and 00:1c.1) do not, so aer_inject cannot proceed.

I'm looking for hardware that supports AER injection. Please help me.
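
The check described here (both the endpoint and its root port must expose AER) can be automated against saved `lspci -vvv` output. This is a rough sketch under stated assumptions: it relies on lspci printing a capability line containing the text "Advanced Error Reporting" under each device header, and the helper names `devices_with_aer` and `injectable` are hypothetical.

```python
import re

# A device header starts at column 0 with an address such as "02:00.0",
# optionally prefixed by a PCI domain such as "0000:".
DEVICE_LINE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def devices_with_aer(lspci_text):
    """Return the set of device addresses whose block lists the AER capability."""
    found = set()
    current = None
    for line in lspci_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            current = m.group(1)
        elif current and "Capabilities:" in line and "Advanced Error Reporting" in line:
            found.add(current)
    return found


def injectable(endpoint, root_port, aer_devices):
    """True only if both the endpoint and its root port expose AER."""
    return endpoint in aer_devices and root_port in aer_devices
```

With the topology from the tree above, one would call `injectable("02:00.0", "00:1c.0", aer)` and `injectable("03:00.0", "00:1c.1", aer)`, where `aer` is the set returned by `devices_with_aer`. The endpoint-to-root-port pairing is read off the `lspci -t` tree by hand in this sketch.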

Post by pschaff » 2012/03/29 11:14:34

Welcome to the CentOS fora. Please see the recommended reading for new users linked in my signature.

After reading those links you should realize why you should not hijack threads as you have done. Please start a new Topic (probably in a CentOS-6 forum) for your issue to get the attention you need, providing a link to this one if required for context.
