Multi-core and memory

Rui Pedro Mendes Salgueiro · May 20, 2008

Hello

Would it make sense to have multiple memory interfaces in multi-core CPUs?
Have Intel or AMD announced plans to have such a thing?

MitchAlsup · May 20, 2008

Opteron with its on-chip DRAM controller and on-board chip-to-chip
interconnect has had multiple memory controllers for (¿what?) 4.5
years now.....

Rui Pedro Mendes Salgueiro · May 21, 2008

In comp.arch MitchAlsup said:
Opteron with its on-chip DRAM controller and on-board chip-to-chip
interconnect has had multiple memory controllers for (¿what?) 4.5
years now.....

It has multiple memory controllers, if you have multiple chips.
When Opteron first appeared, each chip had only one core so it
had one memory bank per core.

Then with dual-core Opterons, each memory bank served 2 cores.

Now, with quad-core chips, I think you still have only one memory
controller per chip (I saw something in AMD's web site about that memory
being able to be used either as one 128-bit-wide memory or 2 64-bit-wide
memories, but that is not quite the same thing).

Since even one core can saturate one memory controller, it seems to me
that the systems are getting more and more imbalanced, and it could be
useful to have multiple memory controllers per chip. But maybe it would
make more sense to have wider memory instead.

And I suppose for the moment it is not practical to do either thing (pin
count, price for base configurations, other reasons ?).
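
A back-of-the-envelope sketch of the imbalance described above: if the controller's peak bandwidth stays fixed while the core count grows, each core's share shrinks in proportion. The 10.6 GB/s figure is an assumed round number for illustration, not a measurement from this thread, and `per_core_share` is a hypothetical helper.

```python
def per_core_share(controller_gb_s: float, cores: int) -> float:
    """Peak bandwidth available to each core if all cores stream at once."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return controller_gb_s / cores


# Assumed peak for one memory controller (illustrative only).
ASSUMED_PEAK_GB_S = 10.6

if __name__ == "__main__":
    for cores in (1, 2, 4):
        share = per_core_share(ASSUMED_PEAK_GB_S, cores)
        print(f"{cores} core(s): {share:.2f} GB/s each")
```

This ignores caches and assumes every core is bandwidth-bound at the same time, so it is a worst case rather than a prediction.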

Evandro Menezes · May 21, 2008

Now, with quad-core chips, I think you still have only one memory
controller per chip (I saw something in AMD's web site about that memory
being able to be used either as one 128-bit-wide memory or 2 64-bit-wide
memories, but that is not quite the same thing).

You're right about your concerns, but AMD's Barcelona has two memory
controllers that can be ganged together to control memory as a 128-bit
array (for greater bandwidth) or left independent to control separate
64-bit memory arrays (page-interleaved, IIRC).

HTH

Evandro Menezes · May 21, 2008

Opteron with its on-chip DRAM controller and on-board chip-to-chip
interconnect has had multiple memory controllers for (¿what?) 4.5
years now.....

Actually, K8 had a dual-channel controller, meaning that it would keep
track of RAM resources for both channels, such as open pages, etc.
For example, if a RAM page was open, it was open on both channels.

Barcelona, though, does have two independent controllers. For
example, a RAM page on one "channel" might not have a corresponding
open page on the other "channel".

HTH

MitchAlsup · May 21, 2008

Actually, K8 had a dual-channel controller, meaning that it would keep
track of RAM resources for both channels, such as open pages, etc.
For example, if a RAM page was open, it was open on both channels.

Yes, the later K8s did (rev ¿C? and above).
Although generally marketed as "allowing more 'stuffings' of the DRAM
arrays" it was, in effect two DRAM controllers hiding behind 1 memory
controller.

Barcelona, though, does have two independent controllers. For
example, a RAM page on one "channel" might not have a corresponding
open page on the other "channel".

Barcelona is an enhancement of the later K8 dual controllers, with a
much greater write buffer depth and a watermarked write back scheme,
and a much more clever DRAM scheduling scheme. Some prefetching is
done by the DRAM controller itself on cycles that are not otherwise
"in demand". Something that CPU-based and Cache-based prefetchers
cannot do--because they cannot figure out when the cycles are free.

The decision to run the DRAM banks as one channel of 128 bits (Plus
ECC as desired) or as two channels of 64-bits is done at boot time. If
all the DRAMs on both channels can comply with the same timings,
simulations showed that generally the 1 channel twice as wide
performed better. When a random mix of DRAM timings is "stuffed" into
the sockets, the controller manages both banks independently with the
slowest timings on that bank. Your particular BIOS may allow you to run
the DRAMs as 2 banks even if all the timings are the same.
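
A toy model of the two modes described above may help. It assumes a 64-byte cache line and, for the two-channel case, a hypothetical 4 KiB interleave granularity; the real interleave is set by the BIOS and controller and is not specified in this thread. The function names are illustrative only.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size


def beats_per_line(bus_bits: int) -> int:
    """Data beats needed to move one cache line over a bus of this width."""
    return CACHE_LINE_BYTES // (bus_bits // 8)


def channels_for(addr: int, ganged: bool, interleave_bytes: int = 4096):
    """Which channel(s) serve the line containing addr.

    Ganged: both channels act as one 128-bit channel, so every line
    touches both. Unganged: each line lives on exactly one 64-bit
    channel, chosen here by a hypothetical block interleave.
    """
    if ganged:
        return (0, 1)
    return ((addr // interleave_bytes) % 2,)


if __name__ == "__main__":
    print("ganged beats per line:  ", beats_per_line(128))
    print("unganged beats per line:", beats_per_line(64))
    print("unganged channel for 0x1000:", channels_for(0x1000, ganged=False))
```

In the ganged case each line transfers in half as many beats, but both channels are busy for every access; in the unganged case two unrelated accesses can proceed at once on different channels.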

Mitch

Nate Edel · May 21, 2008

In comp.sys.ibm.pc.hardware.chips Rui Pedro Mendes Salgueiro said:
Since even one core can saturate one memory controller, it seems to me
that the systems are getting more and more imbalanced, and it could be
useful to have multiple memory controllers per chip. But maybe it would
make more sense to have wider memory instead.

And I suppose for the moment it is not practical to do either thing (pin
count, price for base configurations, other reasons ?).

Intel went to wider memory (quad channel) with the current generation (5xxx
series) of dual socket Xeons, but that's still multiple sockets - AMD has
effectively been doing that since the Opterons came out - and that's also
only between the northbridge and memory.

Bandwidth and memory width do go up at times; Intel hasn't widened from 64
bits since the Pentium classic came out but they have upped the clock speed
many times and gone from a regular FSB to a QDR one (which is then split
into DDR dual channel by the north bridge).
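
The relationship between width, transfer rate, and peak bandwidth can be worked out directly. The sketch below uses a 64-bit bus at 1333 MT/s and two 64-bit channels at 667 MT/s as example figures; these numbers are assumptions chosen for illustration and do not come from this thread.

```python
def peak_bandwidth_gb_s(width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000.0


if __name__ == "__main__":
    # Assumed example: 64-bit quad-pumped bus, 333 MHz base clock -> 1333 MT/s.
    fsb = peak_bandwidth_gb_s(64, 1333)
    # Assumed example: two 64-bit DDR channels at 667 MT/s behind the north bridge.
    dram = peak_bandwidth_gb_s(128, 667)
    print(f"FSB peak:  {fsb:.3f} GB/s")
    print(f"DRAM peak: {dram:.3f} GB/s")
```

With these assumed figures the two sides come out roughly matched, which illustrates why a 64-bit bus can keep pace with a wider memory array only by raising its transfer rate.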

With AMD, it should be hypothetically possible to stick additional memory
controllers on the end of some of the HT links, but it would be slow
compared to memory off the onboard controller, and I don't know if anyone's
actually done it.

MitchAlsup · May 22, 2008

Rui Pedro Mendes Salgueiro said:

Hello

Would it make sense to have multiple memory interfaces in multi-core CPUs?
Have Intel or AMD announced plans to have such a thing?

[...]

I have always wondered why a multi-core CPU could not be
__directly_integrated__ into a memory card. IMHO, a 2GB mem-card should be
able to physically integrate with one or two multi-core CPU's. The memory
which resides on the same card as the CPU(s) could be accessed using a cache
coherent shared memory model. If one card needs to communicate with another,
then a message-passing interface would be utilized. Think of a single
desktop sized system that has 8 2GB cards with two 64-core CPU's per-card.
That's 16GB of total distributed memory running on 1024 cores...

Does anybody know of any experimental projects that are trying to accomplish
something even vaguely similar?

Of course, the chip vendor would need to be the memory vendor as well...
Humm...

IMVHO, drastically reducing the physical distance between the chip and its
local memory can be a very important factor wrt scalability concerns. It
should be ideal to merge the chip and a couple of GB of memory into a single
unit.

Intra-CPU to local memory communication would use shared memory, and
inter-CPU and remote memory communication would use message passing. It
seems the scheme could be made to work... What am I missing?

Heat dissipation. One could, in principle, design a DRAM daughtercard
that had an Opteron/Barcelona socket in the middle, ranks of DRAMs to
the left and right, and HT links through the pins. Dealing with the
100 watts of power at the center would make the spacing between
daughter cards pretty large, and cooling a little on the difficult
side.

BUT it is possible today, if you wanted to try and make a go of it.

Del Cecchi · Jun 2, 2008

Chris Thomasson said:
Do you know of _any_ venture capitalists who might be even remotely
interested in reducing the physical distance between processing
elements and their local/regional memory source(s)? My cousin's sister
is good friends with a guy from Google (his first name is Ray). Should
I ask him what he thinks? Would I be wasting his time? He spends
some of his hard-earned money in the form of donations to the South Lake
Tahoe Basin and Douglas County Region of Nevada. This is real. I have
several ideas, but I DO NOT like to WASTE people's TIME! Any advice,
Sir?!

;^/

Are there any integral caveats wrt state-of-the-art liquid and fan
integrated cooling systems?

Perhaps you ought to review IBM mainframe designs, high-end pSeries
designs, and Blue Gene/L and Blue Gene/P.

or are you guys just being snarky and I didn't notice?

del

Neal · Jun 6, 2008

Rui Pedro Mendes Salgueiro said:

Hello

Would it make sense to have multiple memory interfaces in multi-core CPUs?
Have Intel or AMD announced plans to have such a thing?

[...]

I have always wondered why a multi-core CPU could not be
__directly_integrated__ into a memory card. IMHO, a 2GB mem-card should be
able to physically integrate with one or two multi-core CPU's. The memory
which resides on the same card as the CPU(s) could be accessed using a cache
coherent shared memory model. If one card needs to communicate with another,
then a message-passing interface would be utilized. Think of a single
desktop sized system that has 8 2GB cards with two 64-core CPU's per-card.
That's 16GB of total distributed memory running on 1024 cores...

Does anybody know of any experimental projects that are trying to accomplish
something even vaguely similar?

Of course, the chip vendor would need to be the memory vendor as well...
Humm...

IMVHO, drastically reducing the physical distance between the chip and its
local memory can be a very important factor wrt scalability concerns. It
should be ideal to merge the chip and a couple of GB of memory into a single
unit.

Intra-CPU to local memory communication would use shared memory, and
inter-CPU and remote memory communication would use message passing. It
seems the scheme could be made to work... What am I missing?

With this type of setup, it seems like each card could be running a separate
operating system that is physically isolated from the other cards in the
system. Their only communication medium would be message passing. OS(a)
running on Card(a) could communicate with OS(b) running on Card(b) using
MPI. Card(a) intra-comm could use shared memory. This sure seems like it
would scale. Adding extra cards would not seem to be a problem. They might
even be able to be hot-swappable. Humm...

The programming model would be something like:

http://groups.google.com/group/comp.arch/msg/18dbf634f491f46b

http://groups.google.com/group/comp.arch/msg/2e5eeaecd0e69aed

Basically, intra-node communication is analogous to intra-card comms, and
inter-node comms would be similar to inter-card comms...

Any thoughts?
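
The model described in the quoted post (shared memory within a card, message passing between cards) can be sketched in a few lines. This is only an illustration under stated assumptions: threads stand in for cores, a lock-protected counter stands in for the card's local memory, and a `queue.Queue` stands in for the inter-card link. The `Card` class and `demo` function are hypothetical and are not taken from the linked posts.

```python
import queue
import threading


class Card:
    """Hypothetical card: its cores (threads) share one local total."""

    def __init__(self, name: str, cores: int):
        self.name = name
        self.cores = cores
        self.local_total = 0            # shared memory local to this card
        self.lock = threading.Lock()    # protects local_total
        self.inbox = queue.Queue()      # the only path other cards may use

    def run_local(self, items):
        """Split items across the cores; each core adds its partial sum."""
        def core(chunk):
            partial = sum(chunk)
            with self.lock:
                self.local_total += partial

        threads = [
            threading.Thread(target=core, args=(items[i::self.cores],))
            for i in range(self.cores)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def send(self, other: "Card", payload):
        """Message passing: no card touches another card's memory."""
        other.inbox.put((self.name, payload))


def demo() -> int:
    a, b = Card("a", 4), Card("b", 4)
    a.run_local(list(range(100)))
    b.run_local(list(range(100, 200)))
    b.send(a, b.local_total)            # card b reports its result to card a
    _sender, remote_total = a.inbox.get()
    return a.local_total + remote_total


if __name__ == "__main__":
    print(demo())
```

The point of the split is that coherence traffic stays inside a card, while the only thing crossing between cards is an explicit message.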

Are you talking about processor in memory (PIM) and intelligent RAM
(IRAM)? There are many academic projects which have implemented/
proposed what you are talking about.

Neal · Jun 7, 2008

Are you talking about processor in memory (PIM) and intelligent RAM
(IRAM)? There are many academic projects which have implemented/
proposed what you are talking about.

I am talking about integrating a multi-core processor with 2GB of DDR3
memory in a single pluggable card. Think if one could cram an N-core
processor directly into the following card:

http://www.corsair.com/products/dominator.aspx

I was just wondering if that could be possible, or if it's a pipe-dream...

:^o

You wouldn't solve the pin bandwidth problem by doing this. Meaning,
that bandwidth across DRAM<->CPU wouldn't be improved. What you would
have to do is either place the DRAM and CPU on the same die, or
connect the two dies together either via a multi-chip module or die
stacking (and then place them in the same package). There are of course
problems with doing this which I'm not discussing here.

In short, you can certainly gain what you are talking about in theory,
but at the chip level rather than the board level.

Neal

Robert Myers · Jun 7, 2008

What you would
have to do is either place the DRAM and CPU on the same die, or
connect the two dies together either via a multi-chip module or die
stacking (and then place them in the same package).

like this:

http://www.eetimes.com/showArticle.jhtml?articleID=208402316

slashdotted today.

Robert.

Del Cecchi · Jun 10, 2008

Chris Thomasson said:
Humm...... When you get some _really_ free time, go ahead and spend a
few minutes thinking along the line of:

http://groups.google.com/group/comp.arch/browse_frm/thread/0574295430deb430
(Please read entire thread...)

WOW! Nano/Bio-Gates... Very Slick! Some genius actually created
workable gates out of several molecules! :^O

Any Thoughts?

Sure. May be important and may not be. Remember tunnel diodes?
Josephson Junctions? High Temp superconductors? AI? Xray lithography?
VLIW? All were going to change the world.

Nick Maclaren · Jun 10, 2008

|> >
|> > WOW! Nano/Bio-Gates... Very Slick! Some genius actually created
|> > workable gates out of several molecules! :^O
|> >
|> Sure. may be important and may not be. Remember tunnel diodes?
|> Josephson Junctions? High Temp superconductors? AI? Xray lithography?
|> VLIW? All were going to change the world.

On the other hand, some technology fails when it first appears, only
to succeed years, decades or centuries later as enabling technology,
constraints or requirements change. One always needs to consider
timescale when planning any research or design project :-)

Regards,
Nick Maclaren.

Nick Maclaren · Jun 11, 2008

|> >> I am talking about integrating a multi-core processor with 2GB of DDR3
|> >> memory in a single pluggable card.
|> >
|> > I think the biggest problem you'd have with that would be thermal if
|> > you want all that in a DIMM-like form factor. If you need 30 in^3 of
|> > thermal management infrastructure for each one then you might as well
|> > have discrete processors and separate memory maybe.
|> >
|> >> I was just wondering if that could be possible, or if it's a pipe-dream...
|> >
|> > A heat-pipe-dream.
|>
|> :^D
|>
|> There has to be a way to address the heat issue. Perhaps with very clever
|> liquid cooling techniques and fans. Humm...

Perhaps by stepping back and thinking.

It is obviously impossible to keep ramping up the heat density, even
ignoring matters like machine room capacities and global warming.
So, as always in engineering when faced with an insoluble problem,
the solution is to change the problem.

We already know how to solve this for another 10-20 years, but the
political will is lacking. All it requires is gritting our teeth
and saying "no, you CAN'T have continually increasing performance
of your serial spaghetti codes - we are up against the physical
limits - now get on with delivering a software revolution."

Regards,
Nick Maclaren.
