Differential diagnosis with memtest v4.0: is it a memory fault or a buggy motherboard?

twistedbrain

Hi!

I have the following nasty problem: a PC with a Gigabyte
GA-G31M-ESL2 rev 1, F8 motherboard with 2 memory slots and 2 Corsair
TWIN2X 4096-6400C4DHX DIMMs (2 GB each). I have tried many times to
install different Linux flavours, Kubuntu and OpenSuse, but after some
tens of minutes the process freezes and the PC too, and I have to push
the power button to restart it. The memory should suit the mobo because
Corsair says so on its website, even though the Corsair modules are
rated for CAS 4-4-4-12 at 2.1 V, whereas the mobo's requirements for
DIMMs are CAS 5-5-5-18 at 1.8 V.

I ran memtest 4.0, which comes with OpenSuse, and after a few
minutes it reported some thousands of errors in test 3 (Moving
inversions, 8 bit pattern), starting at about 1800 MB, where the pattern
dfdfdfdf became dfdfdfdb (errbit 4). But if I test each memory stick on
its own in each memory slot, I don't get errors. There are errors only
when the modules work together in dual channel.
Now both the mb and the RAM are new and I'm entitled to the warranty,
but which one is the broken one, and why? (I don't have another mb or
other DIMMs to check with.)
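
For anyone reading the numbers above: the error is easiest to see by XORing the expected and the actual pattern. A minimal Python sketch, assuming "errbit 4" is the XOR mask that memtest shows (00000004); the helper name `failing_bits` is made up for illustration:

```python
def failing_bits(expected: int, actual: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


expected = 0xDFDFDFDF  # pattern written by the test
actual = 0xDFDFDFDB    # pattern read back
mask = expected ^ actual

print(f"error mask: {mask:08x}")
print("failing bit positions:", failing_bits(expected, actual))
```

With these two patterns the mask has a single bit set, bit 2 counting from 0, and that bit went from 1 to 0.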

Best regards to all, and thanks to those who reply,

Andrea
 
Paul

Run in dual channel, then swap the DIMMs. Where did the error go?
Did the errored bit move?

When a DIMM has a boosted voltage spec, you can interpret
that a couple of ways. When I see "CAS 4-4-4-12 at 2.1 V",
that tells me it really needed the voltage to hold
the tight timing. If you loosen up the timing, like your
5-5-5-18, then you may be able to use less voltage. The
industry standard voltage for DDR2 is 1.8V.
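
To put numbers on "tight" versus "loose" timing, the cycle counts can be converted to nanoseconds. This is only a sketch and it assumes the modules run at DDR2-800 (400 MHz memory clock), which is what the "6400" in the part number suggests; the function name is made up:

```python
def timings_ns(timings, clock_mhz):
    """Convert memory timings given in clock cycles to nanoseconds."""
    cycle_ns = 1000.0 / clock_mhz  # duration of one clock cycle
    return [cycles * cycle_ns for cycles in timings]


CLOCK_MHZ = 400  # assumption: DDR2-800, i.e. PC2-6400

tight = timings_ns((4, 4, 4, 12), CLOCK_MHZ)  # Corsair's rated timings
loose = timings_ns((5, 5, 5, 18), CLOCK_MHZ)  # what the motherboard asks for

print("4-4-4-12 in ns:", tight)
print("5-5-5-18 in ns:", loose)
```

At that clock, CAS 4 means the chips must answer in 10 ns, while CAS 5 gives them 12.5 ns, which is why the looser setting may get by on less voltage.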

I see some DIMMs, with specs like "1.9 to 2.1V" and I
don't know if that means "timing met at 1.9V" and
"can withstand 2.1V" or what to think.

In any case, you can start by repeating the dual channel
test, and swapping the DIMMs, so that if it was the DIMM,
the errored bit should move. Remember that both the
address reported and the bit location could be important,
because the size of the test location is 32 bits, while
the width of a DIMM is 64 bits. So be really careful
when examining the results, to make sure you're interpreting
it properly. Write down some addresses as well as the
errored bit pattern, before you swap DIMMs. Then do the
math when you run the test with the DIMMs swapped, and
with some luck, you'll be able to tell if the
error moved with the DIMM. In other words, the address
value is also important, in identifying what is bad.
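
The "math" can be sketched like this. It assumes the usual x86 little-endian layout, where address bit 2 selects the lower or upper 32-bit half of an aligned 64-bit word, and a straight mapping onto the DIMM data lines (board routing may differ). Which channel serves a given address in dual channel mode depends on the chipset, so that part is left out. The addresses are hypothetical values near the 1800 MB mark:

```python
def dimm_data_bits(address: int, err_mask: int) -> list[int]:
    """Map failing bits of a 32-bit test word onto a 64-bit data path.

    Assumption: address bit 2 picks the lower (0) or upper (1) 32-bit
    half of the aligned 8-byte word, with no bit swapping on the board.
    """
    half = (address >> 2) & 1
    return [bit + 32 * half for bit in range(32) if (err_mask >> bit) & 1]


# Hypothetical log entries: (failing address, error mask)
errors = [(0x70800000, 0x00000004), (0x70800004, 0x00000004)]

for address, mask in errors:
    print(f"{address:#010x}: data bit(s) {dimm_data_bits(address, mask)}")
```

The same error mask lands on data bit 2 or data bit 34 depending on the address, so write down both before swapping the DIMMs.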

Either the DIMM is defective, or a little extra voltage
will make it work, up to the stated limit. Some memories
don't tolerate excess voltage well, and a high voltage
may limit the operating life. It wouldn't help, for example,
if the DIMM gets really hot with the excess voltage. Some
people who apply lots of voltage, install a cooler over
top of the DIMMs. And some brands of DIMMs, don't need an
excuse to fail. I check the reviews on Newegg, before
I buy memory now, even if I end up buying the memory
from a Canadian E-tailer.

There is an alternative explanation. There is a difference
in optimization, between the BIOS setup for 1GB DIMMs and
the setup for 2GB DIMMs. Some older motherboard BIOS are
only set up to work well with 1GB DIMMs. The BIOS doesn't
have good settings for a 2GB DIMM. My own motherboard works
that way, and there are no plans for another BIOS release.
It means I can install 2GB DIMMs (and have tried them) but
they throw errors. And I'm not going to waste the rest of
my life experimenting with bus drive settings and other related
crap. If I had a digital storage scope, it might be
worth tuning them, but just randomly trying settings
is for idiots :) I'd have something like 256 combinations
to try. It means I continue to use 2x1GB in that system,
and those are unbelievably stable. (More stable than the
previous generation of PC3200 I was using.)

If your motherboard has a BIOS update available, you can try
that. Make sure the motherboard is stable - use your one DIMM
config, before attempting to update the BIOS. Select a stable
method for updating the BIOS. On my system, that is a
DOS boot floppy and flasher program. Examine your update
options, and select the one that is least likely to
freeze on you. Doing it from Windows, is just asking for
trouble. If you have no other choice, but to flash
from Windows, make sure the tool has downloaded the file
to your local disk, so a networking problem won't
cause the flash update to fail.

(I also like to archive the current BIOS file, using the
archive option on the flashing tool. It means, I always
take a copy of the original BIOS, before flashing. If
the flashing tool says it cannot complete the flash, I
immediately try to reflash the original BIOS in. To
increase the odds the board won't be bricked. So have
your archive file in hand, before doing it. Once you
push the reset button, at that point you're gambling
the flash was a success.)

If the memory error stays put, when you swap the DIMMs,
then you might blame the motherboard. Maybe bumping some
voltage can fix it. It depends on what options you have,
as to whether that might work out.

Paul
 
larry moe 'n curly

Can you try that memory in another mobo that supports it at full
speed? Because if it fails there, then you'll know the memory is
bad. OTOH bad memory can work with some mobos but not others, so I'd
check these things, in about this order:

1. memory timings in BIOS setup;

2. memory voltage (I think it always has to be done manually in the
BIOS setup, but I'm not sure);

3. power supply voltages during normal operation (with a digital
multimeter, not software);

4. motherboard for bulging capacitors (see www.BadCaps.net for info);

5. motherboard voltages (CPU, memory bus, north bridge).
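
For step 3, the multimeter readings can be compared against the ATX tolerance of plus or minus 5% on the positive rails. A small sketch; the readings below are invented for illustration, not measurements from this PC:

```python
# Nominal voltage and allowed relative tolerance for the positive ATX rails
ATX_RAILS = {"+3.3V": (3.3, 0.05), "+5V": (5.0, 0.05), "+12V": (12.0, 0.05)}


def check_rails(readings):
    """Return {rail: True/False} telling whether each reading is in tolerance."""
    results = {}
    for rail, measured in readings.items():
        nominal, tol = ATX_RAILS[rail]
        low, high = nominal * (1 - tol), nominal * (1 + tol)
        results[rail] = low <= measured <= high
    return results


# Hypothetical multimeter readings taken while the system is under load
readings = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.35}
print(check_rails(readings))
```

In this made-up example the +12V rail falls below the 11.4 V lower limit, which would point at the power supply.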

I have a feeling the Corsair memory is bad because the company admits
it uses UnTesTed (UTT) no-name chips, and any DDR2 memory specified
for 2.1V is either meant for overclocking or failed testing at the
normal 1.8V, because no chip maker (module makers are rarely chip
makers) specifies a normal operating voltage higher than 1.8V for DDR2.
 
twistedbrain

Paul said:
Run in dual channel, then swap the DIMMs. Where did the error go?
Did the errored bit move?

Thanks for your prompt, very good, encyclopedic answer, but I'm not a
woman :)
That is, I too might reply like that to feminine-looking names asking
for help in a newsgroup, but in Italy Andrea is a masculine name, and I
have a daughter, a son, a wife and a beard too :) Andrea probably comes
from the ancient Greek word andros, which means man; I don't know how
and why it became a feminine name in your countries (in Italy there is
also Luca, which ends in "a" and is a masculine name).

It's an evanescent error; that is, I repeated last night's test (same
DIMMs in the same slots), which had given errors, and now it runs
without trouble. Now I'll try to launch it after an installation, when
the PC is more stressed and more prone to give errors, but this time the
installation seems to be going well. Installation OK; the only
difference was the block size of the main ext4 filesystem, which
yesterday I set to 8 KB and now to 4 KB, its default.
I've seen that what makes the difference is whether the system has been
under load for some tens of minutes, as during an installation; in that
case the errors show up. I swapped the DIMMs and the memory position of
the error changed slightly (but not the test or the bit error pattern).
Then, after load, I tested the DIMMs singly and found that one of them
gives errors; then I tested the other in the same slot and didn't get
errors, and then I tested the first one again and didn't get errors
(why?!?!?!?). So for now I think it could be the memory, but I'm
not yet absolutely sure. And the voltage doesn't make a difference (I
tried both).
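
With an intermittent error like this, it can help to log every memtest run and then compare the failure rate per DIMM and per slot. A rough sketch; the run records below are invented placeholders, not the real results, and the labels "A"/"B" and the function name are assumptions:

```python
from collections import defaultdict


def failure_rates(runs, key):
    """Fraction of runs with errors, grouped by the given field."""
    totals = defaultdict(lambda: [0, 0])  # field value -> [failed, total]
    for run in runs:
        entry = totals[run[key]]
        entry[0] += run["errors"] > 0  # a bool counts as 0 or 1
        entry[1] += 1
    return {value: failed / total for value, (failed, total) in totals.items()}


# Hypothetical single-DIMM runs after load: which stick, which slot, error count
runs = [
    {"dimm": "A", "slot": 1, "errors": 1200},
    {"dimm": "B", "slot": 1, "errors": 0},
    {"dimm": "A", "slot": 1, "errors": 0},
    {"dimm": "A", "slot": 2, "errors": 800},
    {"dimm": "B", "slot": 2, "errors": 0},
]

print("by DIMM:", failure_rates(runs, "dimm"))
print("by slot:", failure_rates(runs, "slot"))
```

If the failures follow one stick across both slots, as in this made-up data, the DIMM is the likelier suspect; if they follow one slot, the motherboard is.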

Paul said:
There is an alternative explanation. There is a difference
in optimization, between the BIOS setup for 1GB DIMMs and
the setup for 2GB DIMMs. Some older motherboard BIOS are
only set up to work well with 1GB DIMMs. The BIOS doesn't
have good settings for a 2GB DIMM. My own motherboard works
that way, and there are no plans for another BIOS release.
It means I can install 2GB DIMMs (and have tried them) but
they throw errors. And I'm not going to waste the rest of
my life experimenting with bus drive settings and other related
crap. If I had a digital storage scope, it might be
worth tuning them, but just randomly trying settings
is for idiots :) I'd have something like 256 combinations
to try. It means I continue to use 2x1GB in that system,
and those are unbelievably stable. (More stable than the
previous generation of PC3200 I was using.)

That could be the clue, but in theory this mobo should support 2x2GB,
and Corsair also says so on its site for this mobo.

Paul said:
If your motherboard has a BIOS update available, you can try
that.

There are 2 updates, but they are unrelated to my problem.

Paul said:
Make sure the motherboard is stable - use your one DIMM
config, before attempting to update the BIOS. Select a stable
method for updating the BIOS. On my system, that is a
DOS boot floppy and flasher program. Examine your update
options, and select the one that is least likely to
freeze on you. Doing it from Windows, is just asking for
trouble. If you have no other choice, but to flash
from Windows, make sure the tool has downloaded the file
to your local disk, so a networking problem won't
cause the flash update to fail.

If possible I'll have to use a USB pen drive, because I don't have a
floppy drive. Gigabyte is safe from this point of view; that is, the
board has two copies of the BIOS, so if something goes wrong I only
have to choose to recover the BIOS at start-up.

Paul said:
(I also like to archive the current BIOS file, using the
archive option on the flashing tool. It means, I always
take a copy of the original BIOS, before flashing. If
the flashing tool says it cannot complete the flash, I
immediately try to reflash the original BIOS in. To
increase the odds the board won't be bricked. So have
your archive file in hand, before doing it. Once you
push the reset button, at that point you're gambling
the flash was a success.)

It won't be easy, because I don't have DOS, Windows or other
meaningless things :)

Paul said:
If the memory error stays put, when you swap the DIMMs,
then you might blame the motherboard. Maybe bumping some
voltage can fix it. It depends on what options you have,
as to whether that might work out.

Maybe it's the DIMMs' fault, but I'm not fully convinced.

Best regards and thanks again,

Andrea
 
twistedbrain

larry said:
Can you try that memory in another mobo that supports it at full
speed?

I can't, I don't have one.

larry said:
Because if it fails there, then you'll know the memory is
bad. OTOH bad memory can work with some mobos but not others, so I'd
check these things, in about this order:

1. memory timings in BIOS setup;

There are none. That is, in my BIOS setup I can't change CAS.

larry said:
2. memory voltage (I think it always has to be done manually in the
BIOS setup, but I'm not sure);

I can do that, increasing it by 0.1, 0.2, 0.3 or 0.4 V. If the normal
voltage is 1.8 V, +0.3 should do the trick, but I tried it and the PC
is more responsive and quicker, yet I get neither fewer nor more
memory errors with memtest.

larry said:
3. power supply voltages during normal operation (with a digital
multimeter, not software);

I don't have a multimeter, but at this point I suspect I'd better get
one. The PSU should be a good one, namely a Seasonic S12II-330 80+
Bronze with a 3-year warranty, but you never know.

larry said:
4. motherboard for bulging capacitors (see www.BadCaps.net for info);

They're OK.

larry said:
5. motherboard voltages (CPU, memory bus, north bridge).

Multimeter?

If I can, I'll manage to get replacements for both the mb and the
memory.

Thanks and best regards,

Andrea
 
invalid

Occasionally the problem is as simple as reseating the DIMMs. If you
move them and the errors go away, that may be all the solution you
need. I've had that happen when even boosting the voltage to 2.1
didn't help, or only helped for a short time.
 
larry moe 'n curly

twistedbrain said:
There are none. That is, in my BIOS setup I can't change CAS.

Try changing the memory bus speed. That's sometimes the only
parameter that eliminates the errors.

twistedbrain said:
I can do that, increasing it by 0.1, 0.2, 0.3 or 0.4 V. If the normal
voltage is 1.8 V, +0.3 should do the trick, but I tried it and the PC
is more responsive and quicker, yet I get neither fewer nor more
memory errors with memtest.

I don't have a multimeter, but at this point I suspect I'd better get
one. The PSU should be a good one, namely a Seasonic S12II-330 80+
Bronze with a 3-year warranty, but you never know.

Seasonic tends to be darn good, and I doubt that it's at fault here.

twistedbrain said:
They're OK.

Multimeter?

If I can, I'll manage to get replacements for both the mb and the
memory.

Considering that raising the memory voltage helps some, I'm 99% sure
that the memory is at fault. I've had a lot of trouble with Corsair
DDR2 modules, which is odd because I never saw a failure with their
PC133 and DDR memory. OTOH I had tons of problems with Kingston DDR,
from PC2100 to PC3200 (2/3 of the latter failed), while their DDR2
modules have all been good. That's not to say you should buy Kingston
DDR2, at least not out of state, because they also use no-name
chips. I'd insist that any replacements for the Corsair modules be
rated error-free at 1.8V.
 
twistedbrain

I already did that, many times by now, but it only works sometimes and
not for long. Then I identified a DIMM that, after being stressed,
almost always gives memtest errors, so I have the culprit.

Regards

 
