Intel's FB-DIMM: will any kind of RAM work with your controller?


The little lost angel

The "FB" buffer on an FBdimm is also a bus repeater (aka "buffer") for the
"next" FBdimm in the chain of FBdimms that comprise a channel. The presence of
this buffer feature allows the channel to run at the advertised frequencies in
the face of LOTS of FBdimms on a single channel - frequencies that could not
be achieved if all those dimms were on the typical multi drop memory
interconnect (ala most multi-dimm SDR/DDR/DDR2 implementations).
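A toy model makes the shape of that argument concrete. Every number below - the per-hop buffer delay, the clocks, the derating factor - is an assumption picked for illustration, not anything from the FB-DIMM spec: point-to-point links hold their clock as DIMMs are added (but pay a per-hop latency), while a shared multi-drop bus loses clock speed with every extra electrical load.

# Toy Python model; all figures are illustrative assumptions, not spec values.
HOP_DELAY_NS = 5.0          # assumed pass-through delay per FB buffer
CHAIN_CLOCK_MHZ = 667.0     # assumed FB-DIMM channel clock (held constant)
MULTIDROP_BASE_MHZ = 400.0  # assumed multi-drop clock with one DIMM on the bus
MULTIDROP_DERATE = 0.85     # assumed clock derating per additional DIMM load

def chain_clock(n_dimms):
    return CHAIN_CLOCK_MHZ                  # point-to-point: no degradation

def chain_worst_latency_ns(n_dimms):
    return n_dimms * HOP_DELAY_NS           # request hops through each buffer

def multidrop_clock(n_dimms):
    return MULTIDROP_BASE_MHZ * MULTIDROP_DERATE ** (n_dimms - 1)

for n in (1, 2, 4, 8):
    print(f"{n} DIMMs: chain {chain_clock(n):.0f} MHz "
          f"(+{chain_worst_latency_ns(n):.0f} ns to the last DIMM), "
          f"multi-drop ~{multidrop_clock(n):.0f} MHz")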

Does this also mean that I could in theory put a very fast, say 1.6GHz,
buffer on the FBDIMM and sell it as, say, DDR3-1.6GHz because of that,
even though the actual RAM chips are only capable of, say, 200MHz?
:pPpPpP

--
L.Angel: I'm looking for web design work.
If you need basic to med complexity webpages at affordable rates, email me :)
Standard HTML, SHTML, MySQL + PHP or ASP, Javascript.
If you really want, FrontPage & DreamWeaver too.
But keep in mind you pay extra bandwidth for their bloated code
 

Yousuf Khan

The little lost angel said:
Does this also mean that I could in theory put a very fast, say 1.6GHz,
buffer on the FBDIMM and sell it as, say, DDR3-1.6GHz because of that,
even though the actual RAM chips are only capable of, say, 200MHz?

Wasn't there also some talk back in the early days of the K7 Athlon about
Micron coming out with an AMD chipset with a huge buffer built into its own
silicon? Micron went so far as to give it a cool codename, Samurai or Mamba
or something. But nothing else came of it after that.

Yousuf Khan
 

daytripper

Does this also mean that I could in theory put a very fast, say 1.6GHz,
buffer on the FBDIMM and sell it as, say, DDR3-1.6GHz because of that,
even though the actual RAM chips are only capable of, say, 200MHz?
:pPpPpP

The short answer is: certainly.

The longer answer is: this is *exactly* the whole point of this technology: to
make heaps of s l o w but cheap (read: "commodity") DRAMs look fast when
viewed at the memory channel, in order to accommodate large memory capacities
for server platforms (i.e., I doubt you'll be seeing FBdimms on conventional
desktop machines anytime soon).

Like the similar schemes that have gone before this one, it sacrifices some
latency at the transaction level for beaucoup bandwidth at the channel level.
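A back-of-envelope sketch of that trade, with all numbers assumed purely for illustration (not measured FB-DIMM figures): once several DIMMs sit behind the buffers, the channel rather than any single DRAM becomes the bandwidth limit, while the worst-case latency grows with the depth of the chain.

# Illustrative trade-off only; every figure below is an assumption.
DRAM_BW_GBS = 3.2        # assumed bandwidth of one commodity DDR2-400 rank
CHANNEL_BW_GBS = 4.0     # assumed sustainable FB-DIMM channel bandwidth
BASE_LATENCY_NS = 50.0   # assumed unbuffered DRAM access latency
HOP_DELAY_NS = 5.0       # assumed latency added per buffer hop

def channel_view(n_dimms):
    bandwidth = min(n_dimms * DRAM_BW_GBS, CHANNEL_BW_GBS)
    worst_latency = BASE_LATENCY_NS + n_dimms * HOP_DELAY_NS
    return bandwidth, worst_latency

for n in (1, 4, 8):
    bw, lat = channel_view(n)
    print(f"{n} FB-DIMMs: ~{bw:.1f} GB/s at the channel, ~{lat:.0f} ns worst case")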

No doubt everyone will have their favorite benchmark to bang against this to
see if the net effect is positive...

/daytripper (Mine would use rather nasty strides ;-)
 

The little lost angel

Wasn't there also some talk back in the early days of the K7 Athlon about
Micron coming out with an AMD chipset with a huge buffer built into its own
silicon? Micron went so far as to give it a cool codename, Samurai or Mamba
or something. But nothing else came of it after that.

Hmm, I don't remember that much. I only remember for sure the part you forgot:
it was Samurai :p

--
L.Angel: I'm looking for web design work.
If you need basic to med complexity webpages at affordable rates, email me :)
Standard HTML, SHTML, MySQL + PHP or ASP, Javascript.
If you really want, FrontPage & DreamWeaver too.
But keep in mind you pay extra bandwidth for their bloated code
 

Tony Hill

Wasn't there also some talk back in the early days of the K7 Athlon about
Micron coming out with an AMD chipset with a huge buffer built into its own
silicon? Micron went so far as to give it a cool codename, Samurai or Mamba
or something. But nothing else came of it after that.

I believe they even built a prototype. It never made it to market,
though. Either way, the chipset in question just had an L3 cache (8MB
of eDRAM, if my memory serves) built in - nothing really to do
with the buffers in Fully Buffered DIMMs. Buffer != cache.
 

KR Williams

Yousuf Khan said:
Not necessarily; a buffer is also meant to increase overall bandwidth, which
may be done at the expense of latency.

Jeez Yousuf, a "buffer" may be used simply to increase drive
(current, if you will). An INVERTER can be a "buffer"
(though most buffers are non-inverting, to avoid confusion). Then
again, there are unbuffered inverters (74xxU... ;-)

The point is that there are *many* uses of the term "buffer" and
most have nothing to do with any kind of a "cache". A "cache"
implies an addressed (usually with a directory) storage element.
A "buffer" implies no such thing.
 

KR Williams

Sure; you can think of it as a *really* small cache, which will
therefore have a terrible hit ratio, thus (most likely) increasing latency.

Ok, how about a *zero*-byte buffer (a.k.a. an amplifier)? Is that a
cache?? You're wrong. A cache is a buffer of sorts, but the term
"buffer" in no way implies a cache. It may simply be an
amplifier, which I believe is the case here. 'tripper has his
ear closer to the ground on this than anyone else here.
 

KR Williams

Yes, but not by making the RAM any faster, but by avoiding RAM accesses.
We add cache to the CPU because we admit our RAM is slow.

We "admit"?? Hell it's a known issue that RAM is *SLOW*. Caches
are there to improve apparent latency, sure.
That makes no sense. Everything between the CPU and the memory will
increase latency. Even caches increase worst case latency because some time
is spent searching the cache before we start the memory access. I think
you're confused.

Not necessarily. Addresses can be broadcast to the entire memory
hierarchy simultaneously. The first to answer wins. If it's
cached, it's fast. If not, there is no penalty in asking the
cache if it's there and being answered in the negative.
Except that we're talking about memory latency due to buffers. And by
memory latency we mean the most time it will take between when we ask the
CPU to read a byte of memory and when we get that byte.

Buffers <> caches. IIRC, the issue here was about buffers.
 

KR Williams


This particular buffer reduces the DRAM interface pinout by a factor
of 3 for CPU chips having the memory interface on-chip (such as
Opteron, the late and unlamented Timna, and future Intel CPUs). This
reduces the cost of the CPU chip while increasing the cost of the DIMM
(because of the added buffer chip).
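Rough pin arithmetic shows how a reduction of that order comes about; the signal counts below are approximate assumptions (a conventional parallel DDR2 channel versus FB-DIMM's 10 southbound plus 14 northbound differential lanes), not exact spec figures, and the ratio comes out somewhere between 2x and 3x depending on what you choose to count.

# Approximate, assumed signal counts - enough to show the ratio, no more.
ddr2_signals = 64 + 8 + 18 + 16 + 10 + 8   # data+ECC, strobes, address, cmd/ctl, clocks
fbdimm_signals = (10 + 14) * 2 + 4         # 10 SB + 14 NB lanes, differential, plus clock

print(f"DDR2 channel: ~{ddr2_signals} signals, FB-DIMM channel: ~{fbdimm_signals}, "
      f"ratio ~{ddr2_signals / fbdimm_signals:.1f}x")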

And yes, the presence of the buffer does increase the latency.

It may reduce it too! ;-) On-chip delay goes up with the square
of the length of the wire. Adding a *buffer* in the middle of the wire drops
this to twice (half the length) squared, plus the buffer delay. "Buffer"
has many meanings. Methinks CG doesn't "get it".
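As a minimal sketch of that arithmetic (the coefficient, length, and buffer delay are made-up values): an unbuffered wire costs roughly k*L^2, while splitting it with a repeater costs 2*k*(L/2)^2 plus the buffer delay, i.e. half the wire term.

# Made-up numbers; only the k*L^2 shape of the argument matters.
k = 1.0        # assumed wire-delay coefficient (ns per unit length squared)
L = 10.0       # assumed wire length (arbitrary units)
t_buf = 10.0   # assumed delay through the inserted buffer (ns)

unbuffered = k * L**2                        # one long wire
with_repeater = 2 * k * (L / 2)**2 + t_buf   # two half-length wires plus the buffer

print(f"unbuffered: {unbuffered:.0f} ns, with repeater: {with_repeater:.0f} ns")
# The repeater wins whenever t_buf < k*L**2 / 2, i.e. on sufficiently long wires.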
There are other tradeoffs, the main one being the ability to add lots
more DRAM into a server. Not important for desktops. YMMV.

In this specific instance, perhaps not. Memory is good though.
More is better, and an upgrade path is also goodness. ...at
least for the folks in this group. ;-)
 

David Schwartz

Not necessarily. Addresses can be broadcast to the entire memory
hierarchy simultaneously. The first to answer wins. If it's
cached, it's fast. If not, there is no penalty in asking the
cache if it's there and being answered in the negative.

Consider two back-to-back addresses. We start broadcasting the first
address on the memory bus but the cache answers first. Now we can't
broadcast the second address onto the memory bus until we can quiesce the
address bus from the first address, can we?
Buffers <> caches. IIRC, the issue here was about buffers.

Buffers must increase latency. Caches generally increase worst case
latency; however, unless you have a pathological load, they should improve
average latency.

DS
 

George Macdonald

Consider two back-to-back addresses. We start broadcasting the first
address on the memory bus but the cache answers first. Now we can't
broadcast the second address onto the memory bus until we can quiesce the
address bus from the first address, can we?

The look-aside vs. look-through cache question. It depends... on all the
relative timings. First, a cache does not have to be "searched" - from the
lookup you get a hit/miss answer in one cycle. Assuming a look-aside cache,
if the memory requests are queued to the memory controller, there's the
question of whether you can get a Burst Terminate command through to the
memory chips past, or before, the 2nd memory access.
Buffers must increase latency. Caches generally increase worst case
latency; however, unless you have a pathological load, they should improve
average latency.

Two points here. I don't think we're talking about data buffering - more
like "electrical" buffering, as in registered modules. If you have 4 (or
more) ranks of memory modules (per channel) operating at current speeds,
you need the registering/buffering somewhere. It makes sense to move it
closer to the channel interface of the collective DIMMs than to have it
working independently on each DIMM. I'm not sure there's necessarily any
increased latency for that situation.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 

David Schwartz

George Macdonald said:
On Wed, 21 Apr 2004 19:42:35 -0700, "David Schwartz"
<[email protected]>
wrote:
The look-aside vs. look-through cache question. It depends... on all the
relative timings. First, a cache does not have to be "searched" - from the
lookup you get a hit/miss answer in one cycle.

That would still be a one cycle delay while the cache was searched,
whether or not you found anything in it.
Assuming a look-aside cache, if
the memory requests are queued to the memory controller, there's the
question of whether you can get a Burst Terminate command through to the
memory chips past, or before, the 2nd memory access.

Even so, it takes some time to terminate the burst.
Two points here. I don't think we're talking about data buffering - more
like "electrical" buffering, as in registered modules.

No difference. There is not a data buffer in the world whose output
transitions before or at the same time as its input. They all add some delay
to the signals.
If you have 4 (or
more) ranks of memory modules (per channel) operating at current speeds,
you need the registering/buffering somewhere. It makes sense to move it
closer to the channel interface of the collective DIMMs than to have it
working independently on each DIMM. I'm not sure there's necessarily any
increased latency for that situation.

If you have so many modules per channel that you need buffering, then
you suffer a buffering penalty. That's my point. Whether that means you need
faster memory chips to keep the same cycle speed or you cycle more slowly,
you have a buffering delay.

I'm really not saying anything controversial. Buffers and caches
increase latency, at least in the worst case access.

DS
 

Felger Carbon

George Macdonald said:
Two points here. I don't think we're talking about data buffering - more
like "electrical" buffering, as in registered modules. If you have 4 (or
more) ranks of memory modules (per channel) operating at current speeds,
you need the registering/buffering somewhere. It makes sense to move it
closer to the channel interface of the collective DIMMs than to have it
working independently on each DIMM. I'm not sure there's necessarily any
increased latency for that situation.

I think the "increased latency" is with respect to the usual (in PCs)
one or two unbuffered DIMMs. In this case, the FB-DIMMs do indeed
have a greater latency.

Keep in mind that there will really be no choice once you've bought
your mobo. The CPU socket will either be for a CPU to use traditional
DIMMs or (with 66% fewer memory pins) to use FB-DIMMs. You will never
ever stand there with both types of memory modules in hand and have to
decide which to plug in.
 

chrisv

Felger Carbon said:
I think the "increased latency" is with respect to the usual (in PCs)
one or two unbuffered DIMMs. In this case, the FB-DIMMs do indeed
have a greater latency.

Keep in mind that there will really be no choice once you've bought
your mobo. The CPU socket will either be for a CPU to use traditional
DIMMs or (with 66% fewer memory pins) to use FB-DIMMs. You will never
ever stand there with both types of memory modules in hand and have to
decide which to plug in.

"OE quotefix", dude.
 

KR Williams

Consider two back-to-back addresses. We start broadcasting the first
address on the memory bus but the cache answers first. Now we can't
broadcast the second address onto the memory bus until we can quiesce the
address bus from the first address, can we?

You're assuming the time to access the caches is a significant
fraction of the time required to access main memory. It's
certainly not. Cache results are known *long* before the address
is broadcast to the mass memory. By the time the memory request
gets near the chip's I/O, the caches know whether they can deliver
the data. If so, the memory request is killed. There is no
additional latency here.
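A minimal sketch of that timing argument, with assumed, illustrative latencies: because the hit/miss answer comes back well before the request would reach the pins, a hit costs only the cache lookup and a miss costs only the memory access - the lookup adds nothing on top.

# Assumed latencies, for illustration only.
CACHE_LOOKUP_NS = 1.0    # time for the caches to answer hit/miss
ISSUE_TO_PINS_NS = 5.0   # time before the request would reach the chip I/O
MEMORY_NS = 60.0         # main-memory access time

def load_latency_ns(hit):
    # Dispatch to cache and toward memory in parallel; the hit/miss answer
    # arrives before the request leaves the chip, so a hit kills the memory
    # access and a miss pays no extra penalty for having asked the cache.
    assert CACHE_LOOKUP_NS < ISSUE_TO_PINS_NS
    return CACHE_LOOKUP_NS if hit else MEMORY_NS

print("hit :", load_latency_ns(True), "ns")
print("miss:", load_latency_ns(False), "ns")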
Buffers must increase latency.

Ok. You're right. Except that things don't work without
buffers. Does that mean they increase latency, or does it mean
that they allow things to *work*?
Caches generally increase worst case
latency; however, unless you have a pathological load, they should improve
average latency.

Again, BUFFERS <> CACHES! A buffer can be a simple amplifier
(thus no storage element at all). It's naive to say that a
buffer increases latency (particularly since many here don't seem
to understand the term).
 

KR Williams

The look-aside vs. look-through cache question. It depends... on all the
relative timings. First, a cache does not have to be "searched" - from the
lookup you get a hit/miss answer in one cycle. Assuming a look-aside cache,
if the memory requests are queued to the memory controller, there's the
question of whether you can get a Burst Terminate command through to the
memory chips past, or before, the 2nd memory access.

Sure. The cache is "searched" in less time than it takes the request
to get to the I/O. If it's satisfied by the caches, the storage
request can be canceled with no overhead. If not, the storage
request is allowed to continue.
Two points here. I don't think we're talking about data buffering - more
like "electrical" buffering, as in registered modules.

Bingo! ...though I thought this was clear.
If you have 4 (or
more) ranks of memory modules (per channel) operating at current speeds,
you need the registering/buffering somewhere. It makes sense to move it
closer to the channel interface of the collective DIMMs than to have it
working independently on each DIMM. I'm not sure there's necessarily any
increased latency for that situation.

I'm not either. It works one way, and not the other. Does that
mean the way it *works* is slower?
 

KR Williams

That would still be a one cycle delay while the cache was searched,
whether or not you found anything in it.

Oh, NO! The cache is *not* "searched". The answer is yes/no and
that answer is quick. In addition, the request can be sent in
parallel to the next level of the hierarchy and canceled if satisfied
at a lower level. The load/store queues must be coherent for
other reasons; this is a minor architectural complication.
Even so, it takes some time to terminate the burst.

The burst hasn't even started. Sheesh!
No difference. There is not a data buffer in the world whose output
transitions before or at the same time as its input. They all add some delay
to the signals.

Sure. If the signals don't get there they're hardly useful
though.
If you have so many modules per channel that you need buffering, then
you suffer a buffering penalty. That's my point. Whether that means you need
faster memory chips to keep the same cycle speed or you cycle more slowly,
you have a buffering delay.

I'm really not saying anything controversial. Buffers and caches
increase latency, at least in the worst case access.

Certainly anything that adds latency, adds latency (duh!), but
you're arguing that buffers == caches *and* that caches increase
latency. This is just not so!
 

George Macdonald

That would still be a one cycle delay while the cache was searched,
whether or not you found anything in it.

On current CPUs, a memory channel cycle is 10-15 or so cache cycles - get
things aligned right, call it a coupla cache clocks, and there's no need to
shove the address on the memory bus (AMD) or FSB (Intel). Accurate info is
elusive on this kind of thing, but I believe that look-aside caches are
just considered unnecessary now.
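The ratio is easy to check with assumed, period-typical clocks (nothing here comes from the thread itself): a core/cache clock around 3 GHz against a 200 MHz memory command clock gives about 15 cache cycles per memory channel cycle.

# Assumed clocks, purely for the arithmetic.
core_clock_mhz = 3000.0     # caches run at or near the core clock
channel_clock_mhz = 200.0   # e.g. a DDR-400 command clock

print(f"one memory channel cycle ~= {core_clock_mhz / channel_clock_mhz:.0f} cache cycles")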
Even so, it takes some time to terminate the burst.

I doubt that it's going to need to be terminated - IOW the 1st cache
hit/miss result (not necessarily the cache data) should be available before
the memory address has passed out of the CPU.
No difference.
D'oh!

There is not a data buffer in the world whose output
transitions before or at the same time as its input. They all add some delay
to the signals.

But it's not data that's being buffered - it's simply a (near)zero-gain
amplifier to keep all the modules talking in unison.
If you have so many modules per channel that you need buffering, then
you suffer a buffering penalty. That's my point. Whether that means you need
faster memory chips to keep the same cycle speed or you cycle more slowly,
you have a buffering delay.

That's what the damned thing is for - large memory systems. Currently you
put registered DIMMs, with their latency penalty, in such a system and even
there you run into problems with the multi-drop memory channel of DDR.
What I'm saying is that the buffering of FB-DIMMs is not necessarily any
worse and you get the DIMMs to "talk" consistently to the channel.
I'm really not saying anything controversial. Buffers and caches
increase latency, at least in the worst case access.

You seem to be stuck on data buffers!

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
