(memory?) corruption

jlehtone
Posts: 4523
Joined: 2007/12/11 08:17:33
Location: Finland

Post by jlehtone » 2010/05/07 15:58:05

Hi,

OpenGL apps were almost freezing X unpredictably, so I tested several nvidia drivers.
X did not crash with any of them, but apps started to segfault. Even yum and rpm did.
smartctl said "all ok" about the HDD.
Consecutive calls to 'rpm -Va' produce different outputs.

The machine is almost new and hasn't been used much previously. An identical
machine has been reported to freeze a couple of times recently, so software (fc12) was
suspected first. A bunch of updates was rolled in before taking this machine into use.

But now I'm not so sure any more. memtest, perhaps?

Asus board, E8500, 4 GB, passive GF9600GT.

(duplicates removed):
[code]# rpm -Va
..5...... /usr/lib/nss/unsupported-tools/selfserv
..5...... /usr/lib/R/library/Biobase/help/Biobase.rdb
..5...... /usr/share/gnome/help/gedit/es/gedit.xml
..5...... /usr/share/i18n/charmaps/IBM918.gz
..5...... /usr/bin/ldb3add
..5...... /usr/bin/smbpasswd
..5...... /usr/lib/libnetapi.so.0
..5...... /usr/lib/libfftw3.so.3.2.4
missing c /usr/share/cups/templates/de/optign-pickmany.tmpl
..5...... /usr/share/gimp/2.0/help/en/gimp-xrefs.xml


# rpm -Va
..5...... d /usr/share/doc/perl-libwww-perl-5.834/Changes
..5...... d /usr/share/doc/perl-Tk-804.028/demos/widget_lib/floor.pl
..5...... /usr/lib/plt/collects/macro-debugger/stxclass/private/compiled/sc_ss.zo
..5...... /usr/lib/plt/collects/mrlib/compiled/hierlist_ss.zo
..5...... /usr/lib/plt/collects/mysterx/doc.txt
..5...... /usr/lib/plt/collects/rnrs/compiled/main_ss.zo
..5...... /usr/share/plt/doc/drscheme/interactions-window.html
..5...... /usr/share/plt/doc/gui/snip_.html
..5...... /usr/lib/libtag.so.1.6.1
..5...... /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi/auto/Date/Calc/Calc.so
..5...... /usr/share/backgrounds/nature/TwoWings.jpg
..5...... /usr/share/texmf/fonts/tfm/public/tex-gyre/qx-qtmbi.tfm
..5...... /usr/share/texmf/fonts/type1/vntex/vnr/vnssdc10.pfb
..5...... /usr/share/texmf/fonts/vf/urw35vf/helvetic/uhvr8tn.vf
..5...... /usr/share/texmf/tex/latex/dottex/dottex.sty
..5...... /usr/share/gok/singlekey-automatic-scanning.xam[/code]

The main problem is naturally that the user said she can tolerate a glitch or two, but that repeated problems ruining her work will make her really, really angry ...
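
The comparison of consecutive 'rpm -Va' runs above can be sketched as a small script. This is only a sketch: `unstable_lines` is a made-up helper name, and the report file names in the usage note are placeholders. It assumes both reports were saved on the same machine with no package changes in between.

```shell
#!/bin/sh
# unstable_lines is a hypothetical helper name, not part of rpm.
# It prints the lines found in only one of two saved `rpm -Va` reports.
unstable_lines() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # comm needs sorted input; use the C locale for both tools so the order agrees.
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # -3 hides lines common to both; lines only in the second report are tab-indented.
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (as root, on the suspect machine; file names are placeholders):
#   rpm -Va > run1.txt 2>&1
#   rpm -Va > run2.txt 2>&1
#   unstable_lines run1.txt run2.txt
```

If nothing changed on disk between the two runs, the helper should print nothing. Files that fail the checksum in one run but not in the other suggest that reads are not reproducible, which fits bad RAM better than a damaged file on disk.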

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk
Post by AlanBartlett » 2010/05/07 16:30:06

Whenever I see a segfault, I immediately think of either:

(1) poor code

or

(2) defective RAM

Although the machine is almost new, I would suggest that you plan to run [i]memtest86+[/i] for up to 24 hours. It may take that long for errors to start to show up. (As [b]toracat[/b] can confirm.)

Quite often, RAM of marginal quality may be shipped with a new system. If the hardware was designed / marketed primarily for use with [i]Redmond 'doze[/i], then [url=http://en.wikipedia.org/wiki/Blue_Screen_of_Death]BSoD[/url]s would be considered a normal occurrence and slightly "off" RAM can "slip through the net".

Post by jlehtone » 2010/05/07 17:04:36

[quote]AlanBartlett wrote:
Although the machine is almost new, I would suggest that you plan to run [i]memtest86+[/i] for up to 24 hours. It may take that long for errors to start to show up.[/quote]
I know, and I kick myself for not having (a) tested the RAM by default and (b) had time to start a run over the weekend. :-(

Post by jlehtone » 2010/05/10 08:53:20

As an update, scratch that last sentence. After a bit of testing I concluded that the following:
1. yum install memtest86+-4.00-2.fc12.i686
2. memtest-setup
3. Edit grub.conf to make memtest the default
4. Reboot

leads to memtest running automatically in my absence. And thus after the weekend there were 36k+
errors found in 57 hours.
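
Step 3 above can be sketched as follows. The helper name `set_default_entry` is made up, and the title text "Memtest86+" is an assumption about what memtest-setup writes; check the actual `title` line in your grub.conf first. The sketch relies on GRUB legacy numbering its `title` stanzas from 0 in file order, and it prints the modified file instead of editing it in place.

```shell
#!/bin/sh
# set_default_entry is a hypothetical helper, not part of grub or memtest86+.
# It prints the given grub.conf with default= set to the first title containing a string.
set_default_entry() {
    conf=$1
    pattern=$2
    # GRUB legacy counts "title" stanzas from 0, in the order they appear.
    idx=$(awk -v pat="$pattern" '
        BEGIN { n = 0 }
        /^[[:space:]]*title/ {
            if (index($0, pat)) { print n; exit }
            n++
        }
    ' "$conf")
    [ -n "$idx" ] || { echo "no title containing: $pattern" >&2; return 1; }
    # Only an existing default= line is rewritten; nothing is added if it is missing.
    sed "s/^default=.*/default=$idx/" "$conf"
}

# Usage (the path and the title text are assumptions, review the output before copying it back):
#   set_default_entry /boot/grub/grub.conf 'Memtest86+' > /tmp/grub.conf.new
```

Remember to set the default back to the normal kernel entry once the test run is over.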


PS. The fc12 package contains an ELF binary, which grub loads. However, neither the
memtest86+ (1.65) in CentOS 5.4 base nor the more recent version in rpmforge includes
that ELF, and therefore grub on my own machine fails with error 28: cannot fit.

Before you say: [url=http://bugs.centos.org/view.php?id=3104]3104[/url], I must point out that the instruction to "Use the ELF executable version"
does not help if the user neither possesses such a version nor knows how to generate one.

Post by AlanBartlett » 2010/05/10 11:56:08

[quote]
And thus after weekend there were 36k+ errors found in 57 hours.
[/quote]
I think you may have found the reason for the segfaults . . .

[quote]
Before you say: [url=http://bugs.centos.org/view.php?id=3104]3104[/url]
[/quote]
As if I would. :-P

Post by jlehtone » 2010/05/10 16:09:20

While you wouldn't, some forum regulars ... :roll:

Two 2 GiB Kingston DDR2 800 Valueram. One shows errors, the other does not.

These desktops happen to be first F12 installs here, and do show other peculiar
things too, but there I suspect the F12, for it could not possibly be the CentOS
(NFS) servers that make zombies out of Nautilus (users), right? :-)

Post by AlanBartlett » 2010/05/10 16:46:49

[quote]
2 GiB Kingston DDR2 800 Valueram
[/quote]
I always use RAM from [url=http://www.crucial.com/]Crucial[/url]. No problems . . . so far. ;-)

[quote]
These desktops happen to be first F12 installs here, and do show other peculiar
things too, but there I suspect the F12, for it could not possibly be the CentOS
(NFS) servers that make zombies out of Nautilus (users), right? :-)
[/quote]
Ah ha. You mention an F-number. You really need [b]Scott[/b] for some F-words of advice. :-P

If you have a mix of systems such as [i]F12[/i] and [i]CentOS[/i] then, of course, all issues [i][b]must[/b][/i] originate from the F-boxes. (An [i]enterprise class[/i] OS v a [i]hackers' delight[/i].) What else would you expect me to say? :-D
