FB-DIMM configuration write error (RAM ISSUE)

jboy4
Posts: 5
Joined: 2011/08/11 20:31:47

Post by jboy4 » 2011/08/11 20:35:30

Hi there, I'm having an issue with my server. It is running CentOS 6 with 32 GB of RAM on an ASUS DSEB-DG motherboard.

I receive this error in my /var/log/messages:

EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
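
As a rough aid for reading that message, the sketch below (plain Python, not part of any EDAC tool; the helper names are hypothetical) pulls the key=value fields out of the UE line and lists which bits are set in the Err code. It only does arithmetic on the hex value: assuming Err is a bitmask, as its format suggests, 0x2000 has bit 13 set and nothing else, so a single error flag is being reported each time.

```python
import re

# Matches the key=value pairs in the EDAC UE message, e.g. "Branch=0",
# "DRAM-Bank=0", "Buffer ID = 0", "Err=0x2000". Spaces around "=" are optional.
FIELD_RE = re.compile(r"(Branch|DRAM-Bank|Buffer ID|RDWR|RAS|CAS|Err)\s*=\s*(\w+)")


def parse_edac_fields(line: str) -> dict:
    """Return the first occurrence of each known field in an EDAC UE line."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        fields.setdefault(key, value)
    return fields


def set_bits(err: str) -> list:
    """List the bit positions set in a hex error code such as '0x2000'."""
    value = int(err, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


line = ('EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL '
        'recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 '
        'NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error '
        'on first attempt))')
fields = parse_edac_fields(line)
print(fields)
print(set_bits(fields["Err"]))  # [13]
```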


PS: The RAM is Kingston KVR800D2D4F5 4G, which is on the motherboard's QVL (Qualified Vendor List):

http://www.ec.kingston.com/ecom/hyperx/partsinfo.asp?root=&ktcpartno=KVR800D2D4F5K2/8G


Any help would be appreciated!

jboy4

Re: FB-DIMM configuration write error (RAM ISSUE)

Post by jboy4 » 2011/08/12 20:02:08

I tried using memtester to track down the problem and got this output:

[quote][root@agsvirt2 ~]# memtester 512M 5
memtester version 4.2.0 (64-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 512MB (536870912 bytes)
got 512MB (536870912 bytes), trying mlock ...locked.
Loop 1/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Loop 2/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Loop 3/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Loop 4/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Loop 5/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.
[root@agsvirt2 ~]# grep EDAC /usr/log/messages
grep: /usr/log/messages: No such file or directory
[root@agsvirt2 ~]# grep EDAC /var/log/messages
Aug 8 09:24:06 agsvirt2 kernel: EDAC MC: Ver: 2.1.0 Jun 27 2011
Aug 8 09:24:06 agsvirt2 kernel: EDAC MC0: Giving out device to 'i5400_edac.c' 'I5400': DEV 0000:00:10.0
Aug 8 09:24:06 agsvirt2 kernel: EDAC PCI0: Giving out device to module 'i5400_edac' controller 'EDAC PCI controller': DEV '0000:00:10.0' (POLLED)
Aug 8 15:11:08 agsvirt2 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
Aug 9 10:17:54 agsvirt2 kernel: EDAC MC: Ver: 2.1.0 Jun 27 2011
Aug 9 10:17:54 agsvirt2 kernel: EDAC MC0: Giving out device to 'i5400_edac.c' 'I5400': DEV 0000:00:10.0
Aug 9 10:17:54 agsvirt2 kernel: EDAC PCI0: Giving out device to module 'i5400_edac' controller 'EDAC PCI controller': DEV '0000:00:10.0' (POLLED)
[root@agsvirt2 ~]# memtester 512M 5^C
[root@agsvirt2 ~]# man memtest
No manual entry for memtest
[root@agsvirt2 ~]# man memtester
[root@agsvirt2 ~]# memtester 20G 2
memtester version 4.2.0 (64-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 20480MB (21474836480 bytes)
got 20480MB (21474836480 bytes), trying mlock ...locked.
Loop 1/2:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : setting 136
Message from syslogd@agsvirt2 at Aug 9 16:11:33 ...
kernel:EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
testing 136
Message from syslogd@agsvirt2 at Aug 9 16:11:40 ...
kernel:EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
setting 137
Message from syslogd@agsvirt2 at Aug 9 16:11:47 ...
kernel:EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
setting 225
Message from syslogd@agsvirt2 at Aug 9 16:32:32 ...
kernel:EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Loop 2/2:
Stuck Address : setting 9
Message from syslogd@agsvirt2 at Aug 9 18:46:26 ...
kernel:EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.
[/quote]
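
To see how often the error actually lands in the log, a minimal sketch can keep only the "EDAC MC<n>: UE" lines, skip the driver start-up messages, and count them per day. It assumes the default syslog line format shown in the grep output above; the function and variable names are hypothetical, and the sample input is copied from that output with the UE line shortened.

```python
import re
from collections import Counter

# /var/log/messages lines look like "Aug 8 15:11:08 host kernel: <message>".
SYSLOG_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s*(.*)$"
)
# Uncorrected-error reports look like "EDAC MC0: UE row 0, ...".
UE_RE = re.compile(r"EDAC MC\d+: UE\b")


def ue_events(lines):
    """Yield (day, time) for every EDAC UE kernel message in `lines`.

    Driver start-up lines ("EDAC MC: Ver: ...", "Giving out device ...")
    do not match UE_RE and are skipped.
    """
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m and UE_RE.search(m.group(4)):
            yield f"{m.group(1)} {m.group(2)}", m.group(3)


# Sample copied from the grep output above (the UE line is truncated).
sample = [
    "Aug 8 09:24:06 agsvirt2 kernel: EDAC MC: Ver: 2.1.0 Jun 27 2011",
    "Aug 8 09:24:06 agsvirt2 kernel: EDAC MC0: Giving out device to 'i5400_edac.c' 'I5400': DEV 0000:00:10.0",
    'Aug 8 15:11:08 agsvirt2 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable',
    "Aug 9 10:17:54 agsvirt2 kernel: EDAC MC: Ver: 2.1.0 Jun 27 2011",
]
events = list(ue_events(sample))
per_day = Counter(day for day, _ in events)
print(events)  # [('Aug 8', '15:11:08')]
print(per_day)
```

On the server itself the function can be fed the log file directly, for example `Counter(d for d, _ in ue_events(open("/var/log/messages")))`, which usually has to be run as root.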
Any help would be great!

r_hartman
Posts: 711
Joined: 2009/03/23 15:08:11
Location: Netherlands

FB-DIMM configuration write error (RAM ISSUE)

Post by r_hartman » 2011/08/17 08:53:14

It would appear you have a failing memory module.
I'm not sure whether memtest86+ will run on that box, but since all the location fields in the error are reported as 0, the information available is likely invalid.
You may want to try memtest86+ (if you can take the box offline) and, depending on the results, swap the memory modules around to identify the failing one.

Alternatively, you could use memtest86+ on single modules only (take out all modules but one and cycle through them until you identify the failing one).
That is, if your hardware allows you to run single modules; otherwise you'll need to run module pairs.
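
The elimination procedure described above can be written down as a small bookkeeping step: any module that was installed during a clean run is treated as cleared, and whatever remains from the failing runs is the suspect list. This is only a sketch under two assumptions: that a clean run really means the installed modules are good (intermittent faults like the one in this thread can stay hidden for hours), and that the module labels A to F are hypothetical placeholders.

```python
def suspect_modules(results):
    """Return modules that appear only in failing test configurations.

    `results` maps a tuple of installed module labels to True (test passed,
    no errors) or False (errors seen). A module that was installed in at
    least one passing configuration is treated as cleared.
    """
    cleared = set()
    installed_in_failures = set()
    for modules, passed in results.items():
        if passed:
            cleared.update(modules)
        else:
            installed_in_failures.update(modules)
    return sorted(installed_in_failures - cleared)


# Hypothetical pair-wise runs; labels A..F are placeholders, not real slots.
runs = {
    ("A", "B"): True,
    ("C", "D"): False,
    ("E", "F"): True,
    ("C", "E"): True,
}
print(suspect_modules(runs))  # ['D']
```

Because each key is a tuple, the same bookkeeping works whether the board lets you test single modules or only pairs.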

You may find that simply removing and reseating the memory modules makes the issue disappear (contact problems).

jboy4

Re: FB-DIMM configuration write error (RAM ISSUE)

Post by jboy4 » 2011/08/25 16:04:59

I have run both memtest (booted into it) and memtester (within Linux), and no errors were found. The more I run on the server, the more often I receive the error: it can be as often as once every 10 seconds or as rarely as twice a day. I started to suspect the ECC settings on my motherboard, since the memory I have does not support ECC, but disabling those settings on the motherboard did not solve the issue.
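
To put numbers on how irregular the errors are, a sketch like the following computes the gaps between consecutive error timestamps. It assumes a year (syslog lines omit it) and that the timestamps are already in chronological order; the function name is hypothetical, and the input is the five syslogd alerts from the 20G memtester run in the earlier post.

```python
from datetime import datetime


def gaps_seconds(timestamps, year=2011):
    """Return the seconds between consecutive syslog timestamps.

    Syslog lines omit the year, so one is supplied (assumed 2011 here).
    Timestamps must be in chronological order, e.g. "Aug 9 16:11:33".
    """
    times = [datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")
             for ts in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# EDAC UE alert times printed by syslogd during the 20G memtester run.
stamps = ["Aug 9 16:11:33", "Aug 9 16:11:40", "Aug 9 16:11:47",
          "Aug 9 16:32:32", "Aug 9 18:46:26"]
gaps = gaps_seconds(stamps)
print(gaps)  # [7, 7, 1245, 8034]
print(min(gaps), max(gaps))
```

Those five alerts give gaps of 7 s, 7 s, about 21 minutes and about 2 hours 14 minutes, which fits the bursty pattern described here.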

Any other ideas or help?

A Google search shows plenty of people who have had the same issue after upgrading to later versions of Linux. Is it possible that an update to the EDAC driver now causes it to try to write an ECC memory configuration that our memory does not support?
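
Whatever the cause turns out to be, comparing kernels or BIOS settings is easier with a running total than by grepping logs. The EDAC core exposes cumulative counters in sysfs (ce_count and ue_count under each /sys/devices/system/edac/mc/mcN directory, counted since the driver was loaded), and the sketch below simply reads them. It does not tell you whether the i5400_edac driver is misreporting, and the function name is hypothetical.

```python
from pathlib import Path

# Standard EDAC sysfs location; each memory controller appears as mcN and
# holds cumulative ce_count (corrected) and ue_count (uncorrected) files.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {"mc0": {"ce_count": n, "ue_count": n}, ...} for each controller.

    Returns an empty dict if `root` does not exist (e.g. no EDAC driver loaded).
    """
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    # Print the current totals; run it periodically to see how fast they grow.
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```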

jboy4

Re: FB-DIMM configuration write error (RAM ISSUE)

Post by jboy4 » 2011/09/09 18:24:33

I'm still receiving this error: sometimes daily, other times it skips a day and then the next day is even worse.

Please help...
