Gianni Mariani
I started seeing instability on my homebuilt dual Athlon 2400 on an
MSI K7D Master. I run Linux on the box and I started getting kernel panics.
The system has two 512MB sticks of Samsung ECC memory and the BIOS settings
are set to "Error Correct", so I thought that this would be adequate to
deal with memory issues.
The failures started a while ago with a program I wrote called
"cpulat", which measures the CPU->CPU memory latency. It would
crash in unexpected, nonsensical places, and after trying to debug it for
a while, I just gave up. Then a few months later, the machine
mysteriously hung, followed by a succession of kernel panics,
always with the same error message.
In the process of trying to diagnose the problem, the mainboard stopped
responding to the keyboard and mouse, which led to swapping out the
mainboard. The kernel panics still persisted. For a joke I swapped the
CPUs, and finally I pulled one of the memory sticks and bingo, the
machine was stable. I picked up a replacement stick and now it is
working properly, and I can't reproduce the cpulat errors either.
I've built probably 30+ PCs in my time and I have never seen this kind
of behaviour.
So the question that still remains for me is why didn't the ECC error
recovery/check pick this up?