Johanna said:
I don't really understand how dual channel RAM works, and how much
benefit it really brings me.
My (crazy) memory situation is this:
On my AMD motherboard there are 4 slots for memory.
Previously I had 2 x 512 MB Ram.
The BIOS (AMI) believed this was dual channel (or at least it said
it did at boot-up?)
But the two sticks were not the same make, which is a requirement for
dual channel according to the mobo manual.
I am not sure what to make of this.
Then I bought some more RAM and accidentally got a 1 GB stick, instead
of 2 x 512. *blush* :-(
This takes me up to 2 GB using 3 slots.
When I put the 1 GB stick in a free slot, the memory gets reported as
'single channel'!
I came across some information about dual channel memory saying that it
works 3 times faster than single channel!
So of course, I want that....
Will it switch over to dual channel again if I buy another identical 1
GB stick?
Or does it have to be 4 identical sticks?
This is all a bit theoretical since I actually have 2 GB of RAM now.
If my current setup is completely idiotic, and there is significant
benefit in changing, I'll buy another stick to balance things out.
I could use the 512 MB sticks for something else, like a computer I am
making for my grandmother.
What would you advise me to do?
JO
First off, dual channel is not three times faster than single
channel. I've measured my P4 dual channel system here:
memtest86+ reports 2732MB/sec in dual channel mode and
1584MB/sec in single channel mode. That is a ratio of
1.72x. And in real computing situations, the apparent
performance difference will not be as great as this. Compute
speed is not directly proportional to RAM speed, due to the
presence of that nice fat L2 cache on the processor. Many RAM
accesses hit the cache, and don't go to the memory controller,
and that is why the performance effect is not that exaggerated.
For small differences in RAM speed, a "1/3rd rule" is good
for performance estimates. In other words, if two RAM configs
differed by 6%, an "average" application would see a 2% speed
difference in the time needed to complete a calculation.
(Don't ask me what this means - this is simply an observation
I've made from looking at a bunch of benchmarks...)
This rule may not be appropriate for the memtest86+ result
above, and concluding the diff is 72%/3 = 24% could well be
wrong. The "1/3rd rule" applies to small diffs only.
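To put some numbers on that, here is a tiny Python sketch (just my
own back-of-the-envelope arithmetic, nothing official) restating the
figures above:

  # Back-of-the-envelope only - the "1/3rd rule" is an observation
  # from benchmarks, not a law of nature.

  dual_mb_s = 2732.0    # memtest86+ bandwidth, dual channel
  single_mb_s = 1584.0  # memtest86+ bandwidth, single channel

  ratio = dual_mb_s / single_mb_s
  print("Raw bandwidth ratio: %.2fx" % ratio)        # about 1.72x

  # "1/3rd rule" for a small RAM speed difference:
  # a 6% RAM difference -> roughly a 2% application difference.
  ram_diff = 0.06
  app_diff = ram_diff / 3.0
  print("Estimated app-level difference: %.0f%%" % (app_diff * 100))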
Here is an article, if you want to read someone else's findings:
http://www.tcmagazine.info/articles.php?action=show&id=128&perpage=1&pagenum=4
You have all of the necessary tools at hand to do your own
benchmarking. You could get a copy of SuperPI and set up your
RAM in a dual channel config (2x512) or one of several possible
single channel configurations (like 2x512+1GB if you want, since
that forces single channel mode, or even 1x1GB), then run SuperPI
in each case and see the diff in the time to calculate PI. You
can also boot with a memtest86+ memory test floppy or CD, and
view the bandwidth indicator in the upper left hand corner (third
indicator down).
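If you'd rather get a quick-and-dirty number from inside the OS
instead of booting a test disc, a rough Python sketch like the one
below times a big buffer copy. It is nowhere near as clean as
memtest86+ (the OS and caches get in the way), but running the same
test across your different RAM configs still shows the relative
difference:

  import time

  # Rough, cache-polluted bandwidth estimate: copy a buffer much
  # larger than the L2 cache and time it. Only meaningful as a
  # relative comparison on the same machine.
  SIZE = 256 * 1024 * 1024          # 256 MB source buffer
  src = bytearray(SIZE)

  best = None
  for _ in range(3):                # keep the best of a few runs
      t0 = time.time()
      dst = bytes(src)              # forces a full copy through RAM
      t1 = time.time()
      best = t1 - t0 if best is None else min(best, t1 - t0)

  # One copy reads SIZE bytes and writes SIZE bytes.
  print("Approx. copy bandwidth: %.0f MB/sec" % (2 * SIZE / best / 1e6))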
Dual channel fetches 128 bits at a time. Single channel
fetches 64 bits at a time. The size of memory transfer is
generally a single cache line, so while dual channel may
be twice as wide, the burst will be half as long. Shorter
bursts lead to lower bus efficiency. So the comparison
between the two modes is not a simple 2:1 proposition.
(I wish there were some engineering pictures of the bus
data burst, so I could see whether there are any diffs in
how the bursts are done, but unfortunately there just doesn't
seem to be this kind of info available.)
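The burst arithmetic for a 64-byte cache line fill looks roughly
like this (a sketch of the idea only, not a cycle-accurate model of
any particular controller):

  # How many data-bus beats a 64-byte cache line fill takes.
  # Real controllers add command/precharge overhead per burst,
  # which is why shorter bursts hurt bus efficiency.

  CACHE_LINE_BYTES = 64

  for name, bus_bits in (("single channel", 64), ("dual channel", 128)):
      bus_bytes = bus_bits // 8
      beats = CACHE_LINE_BYTES // bus_bytes
      print("%s: %d-bit bus -> burst of %d beats per line"
            % (name, bus_bits, beats))

  # single channel: 8 beats per line; dual channel: 4 beats per line.
  # Twice as wide, but half as long, so the fixed per-burst overhead
  # is spread over fewer beats.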
There can also be issues with bus widths in the system,
at least on an Intel P4. The FSB on the processor is 64
bits wide. The bus is quad pumped. The Northbridge has
two 64 bit wide DIMM channels. The system is "balanced" when two
DDR400 DIMMs deliver 3200MB/sec each, to a FSB800 processor
transferring 8 bytes at a time, for 6400MB/sec bandwidth.
If the memory were to be run faster than the FSB, then
the advantage of the speedup is not directly proportional
to the change in RAM speed. That is because the Northbridge
has to queue up a read burst, until the FSB can eat it.
There can still be a speedup, because the memory bus is
not 100% efficient, and has "holes" where there is no
activity, and so running the RAM at a bandwidth higher
than the FSB, can still have an effect.
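In headline numbers, the "balanced" case works out like this
(peak figures only; neither bus ever runs at 100% efficiency):

  fsb_clock_mhz = 200          # base clock
  fsb_pump = 4                 # quad pumped -> FSB800
  fsb_width_bytes = 8          # 64-bit FSB
  fsb_bw = fsb_clock_mhz * fsb_pump * fsb_width_bytes     # MB/sec
  print("FSB800 bandwidth:      %d MB/sec" % fsb_bw)      # 6400

  ddr400_per_channel = 400 * 8  # 400 MT/sec x 8 bytes = 3200 MB/sec
  dual_channel_bw = 2 * ddr400_per_channel
  print("2 x DDR400 dual ch.:   %d MB/sec" % dual_channel_bw)  # 6400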
The AMD architecture has better possibilities, as there is
no traditional FSB as such. But there are still clocks
involved inside the processor, still quantization
effects, and so the performance effect will not be
a simple-minded straight line.
In the new Conroe systems, some of the DDR2 results show no
or very small difference between running at DDR2-533 and
DDR2-667. Each computing solution seems to have its quirks.
And this is why the best way to determine the effect, is
to do your own benchmarks.
As I explained previously, your best config now is to
run 2x1GB. On an Athlon64 DDR system, that is the best
configuration. It should be able to run at DDR400. Four
sticks of RAM may run at DDR333. That is why overclockers
or enthusiasts would run a two stick config (one per channel)
and would not be caught dead with a four stick config.
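If four sticks really did drop you from DDR400 to DDR333, the rough
cost, using the same hand-wavy "1/3rd rule" as above, would be
something like:

  bw_ddr400 = 400 * 8      # 3200 MB/sec per channel
  bw_ddr333 = 333 * 8      # about 2664 MB/sec per channel

  ram_diff = (bw_ddr400 - bw_ddr333) / float(bw_ddr400)
  app_diff = ram_diff / 3.0
  print("RAM bandwidth drop:       %.0f%%" % (ram_diff * 100))  # ~17%
  print("Estimated app-level cost: %.0f%%" % (app_diff * 100))  # ~6%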
In terms of four stick configurations, there are some other,
minor effects to be considered. There is a difference between
running 4x512MB and running 2x512MB + 2x1GB. The 4x512MB
configuration allows an interleaving pattern to be used in
the memory controller. A couple of address bits are moved
around on the address bus. The objective of this kind of
interleaving, is to leave as many open "pages" of memory
as possible. This is because paged operation of memory
is an overhead, and opening and closing pages takes
valuable time on the bus that could be used for other things.
A certain interleave pattern between memory chips, and
certain expectations about how processors access memory,
mean that interleaving can be used as an optimisation. The diff
is an invisible couple of percent. But in case someone
asks, with the right memory controller, the 4x512MB
config could be slightly faster.
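To give a feel for what "moving a couple of address bits around"
means, here is a deliberately simplified sketch. The bit positions
are made up for illustration and do not match any real controller;
the point is only that, with interleaving, consecutive pages land
in different banks, so more pages can stay open at once:

  def decode(addr, interleaved):
      column = addr & 0x3FF             # low bits -> column in a page
      if interleaved:
          # Bank bits sit just above the column bits, so walking
          # through memory rotates across banks page by page.
          bank = (addr >> 10) & 0x3
          row = addr >> 12
      else:
          # Linear mapping: the bank only changes after a whole
          # bank's worth of addresses, so a sequential sweep keeps
          # hitting the same bank.
          row = (addr >> 10) & 0x3FFF
          bank = addr >> 24
      return bank, row, column

  for addr in (0x0000, 0x0400, 0x0800, 0x0C00):  # consecutive 1KB pages
      print("addr 0x%04x  linear bank %d  interleaved bank %d"
            % (addr, decode(addr, False)[0], decode(addr, True)[0]))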
(See "dynamic paging mode" of PDF page 7 here, for an
explanation every bit as incomprehensible as mine
ftp://download.intel.com/design/chipsets/applnots/25273001.pdf
On an Athlon64, page 71 shows some control bits that could be
doing the same thing as the Intel feature:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF
Paul