Rambus aims for 1 TeraByte per second memory bandwidth by 2010


AirRaid

http://www.realworldtech.com/page.cfm?ArticleID=RWT120307033606&p=1


Rambus Sets the Bandwidth Bar at a Terabyte/Second

Last week at the annual Developer's Forum in Japan, Rambus announced
an ambitious technology initiative that aims to create a 16 gigabit-
per-second memory signaling layer that can sustain 1TB/s of bandwidth
to a single memory controller by 2010. The Terabyte Bandwidth
Initiative is still in development, hence there are no shipping
products, but the goals are now public and Rambus will demonstrate a
test board that achieves 1TB/s of bandwidth over this signaling
technology. This article will provide an in-depth look at the history,
target market, technical innovations and test vehicle for Rambus'
Terabyte Bandwidth Initiative (TBI).

Target Market

The target markets for Rambus are the segments of the DRAM market that
prefer high bandwidth and are willing to sacrifice capacity to achieve
that bandwidth: graphics, consoles and possibly networking. Graphics
almost universally uses GDDR3 or GDDR4, with GDDR5 slated for 2H08.
Consoles use GDDR and XDR (in the case of the PS3), while networking
applications use DDR, SRAM and RLDRAM (Reduced Latency DRAM).

Motivation

Several trends within the computing industry have driven a tremendous
increase in the need for high bandwidth memory systems. The
exponential increases in graphics performance and display capabilities
require exponentially faster memory. The fierce competition between
NVIDIA and ATI for graphics performance is typified by extremely fast
product cycles. Each new product from either contender packs more
programmable pipelines running at higher frequencies, so graphics
memory bandwidth must increase proportionally to feed these highly
parallel graphics processors. On the display side, resolutions
increase to match the
capabilities of graphics processors, and the internal frame buffers
must be fast enough to transfer 30-60 frames per second.
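
As a rough illustration of the display side alone (the resolution,
color depth and refresh rate below are assumed figures, not ones from
this article), a minimal sketch of scan-out bandwidth in Python:

# Back-of-the-envelope scan-out bandwidth for a frame buffer.
# Resolution, color depth and refresh rate are illustrative assumptions.

def scanout_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second needed just to read the frame buffer out to the display."""
    return width * height * bytes_per_pixel * fps

# A 2560x1600 panel with 32-bit color at 60 frames per second:
bw = scanout_bandwidth(2560, 1600, 4, 60)
print(f"{bw / 1e9:.2f} GB/s for display refresh alone")  # ~0.98 GB/s

Actual rendering traffic (texture reads, overdraw, anti-aliasing)
multiplies that figure many times over, which is why graphics memory
bandwidth has to scale so aggressively.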

Multi-core processors have had a similar (although less extreme)
impact on the general purpose market. In theory, a dual core
processor needs twice the memory bandwidth of a single core
processor; in practice, processor architects typically use cache to
reduce the demand on the memory subsystem. While processors do not
require quite the same bandwidth as graphics applications, the
mismatch between execution capabilities and memory bandwidth (often
referred to as the Memory Wall) is growing quickly. In 1989, the 486
ran at 16-25MHz, with 8KB of on-die cache, and used 33MHz memory. In
2007, a Core 2 based processor features 2-4 cores running at 3GHz,
with 6-12MB of cache, and uses two channels of DDR3-1333 memory.
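
A crude way to see the mismatch, using the figures above plus bus
widths that are assumptions on my part (32 bits for the 486 memory
bus, 64 bits per DDR3 channel):

# Memory wall sketch based on the 486 vs. Core 2 figures quoted above.
# Bus widths are assumed standard values, not numbers from the article.

i486_clock_mhz = 25            # top of the 16-25MHz range
i486_mem_bw = 33e6 * 4         # 33MHz bus, 4 bytes wide -> ~0.13 GB/s peak

core2_clock_mhz = 3000         # ~3GHz
core2_cores = 2                # 4 for the quad-core parts
core2_mem_bw = 2 * 1333e6 * 8  # two channels of DDR3-1333, 8 bytes each -> ~21 GB/s

cycle_growth = (core2_clock_mhz * core2_cores) / i486_clock_mhz
bw_growth = core2_mem_bw / i486_mem_bw
print(f"raw core cycles grew ~{cycle_growth:.0f}x (ignoring large IPC gains)")
print(f"peak memory bandwidth grew ~{bw_growth:.0f}x")

Factor in the much higher per-clock throughput of a Core 2 core
versus a 486 and the gap between execution capability and memory
bandwidth is considerably wider than these two ratios suggest.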

These trends are also found in the gaming console market. Consoles
require both general purpose and graphics processors, with matching
high performance memory hierarchies. Figure 1 below shows the
bandwidth used by various gaming consoles from 1985 to 2006, and
Rambus' target: 1TB/s in 2010.

Figure 1: http://www.realworldtech.com/includes/images/articles/Rambus-TBI-1.jpg

The 1TB/s target was calculated by looking at the overall trend (10X
increase every 5 years, 50GB/s in 2006), and then doubling the
extrapolation for 2010. For products to ship in that 2010 timeframe,
Rambus' IP must be fully validated and verified for high volume 45nm
and 32nm processes well in advance, since system designers will
require time to integrate third party IP.
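
As a quick sanity check on that arithmetic, here is a sketch of the
extrapolation, assuming a smooth 10X-per-5-years exponential anchored
at 50GB/s in 2006:

# Console bandwidth trend line: ~10X every 5 years, anchored at 50 GB/s in 2006.
BASE_YEAR, BASE_BW_GBS = 2006, 50

def trend(year):
    return BASE_BW_GBS * 10 ** ((year - BASE_YEAR) / 5)

for year in (2010, 2011):
    print(f"{year}: trend ~{trend(year):.0f} GB/s, doubled ~{2 * trend(year):.0f} GB/s")
# 2010: trend ~315 GB/s, doubled ~631 GB/s
# 2011: trend ~500 GB/s, doubled ~1000 GB/s

Doubling the trend line for the 2010-2011 window puts the target in
the neighborhood of 1TB/s.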

The Rambus signaling technology operates at 16Gbps, and it is
envisioned that a single memory controller could connect to 16 DRAMs,
with each DRAM providing 4 bytes of data per cycle (16Gbps x 4B x 16
DRAMs = 1024GB/s, roughly 1TB/s). To reach the 1TB/s target, Rambus
is relying on three key techniques to increase bandwidth: 32X data
rates, full speed command and addressing, and a differential memory
architecture.
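
Spelling out the arithmetic (the 32-data-pin-per-DRAM figure is my
inference from "4 bytes of data per cycle", not something Rambus has
stated):

# Peak bandwidth of the envisioned 16-DRAM configuration.
transfers_per_sec_g = 16   # each data pin signals at 16 Gbps -> 16G transfers/s
bytes_per_transfer = 4     # 4 bytes per transfer implies 32 data pins per DRAM
num_drams = 16

per_dram_gbs = transfers_per_sec_g * bytes_per_transfer  # 64 GB/s per DRAM
total_gbs = per_dram_gbs * num_drams                     # 1024 GB/s, roughly 1 TB/s
print(f"{per_dram_gbs} GB/s per DRAM, {total_gbs} GB/s total")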



Data Tweaks and a Command and Addressing Overhaul

In general, the Terabyte Bandwidth Initiative is best viewed as the
logical successor to the XDR2 memory interface, since in many regards
it builds on that foundation but goes further towards being a narrow,
high speed signaling interface.

32X Data Rate

The memory interface operates at 16Gbps, 32X the 500MHz reference
clock. This requires an extremely accurate PLL that was designed
specifically for this purpose. The 500MHz reference clock was chosen
to reuse the infrastructure for the XDR, XDR2 and FlexIO interfaces,
all of which use 500MHz input clocks. The 32X data rate is an
evolutionary and predictable change to the data interface. With each
generation, Rambus has consistently increased the ratio between the
data interface and the reference clock. The original Direct Rambus
interface operated at twice the reference clock and later evolved to
four times the reference clock. The first generation of XDR
transferred data at 8X the reference clock, and XDR2 increased the
ratio to 16X.
Hence it should come as no surprise that this new interface runs at
32X the reference clock.
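
A minimal sketch of that progression for the interfaces the article
says share the 500MHz input clock (earlier Direct Rambus parts used
different reference clocks, so they are left out):

# Per-pin data rate as a multiple of the 500MHz reference clock.
REF_CLOCK_GHZ = 0.5
for name, ratio in [("XDR", 8), ("XDR2", 16), ("TBI", 32)]:
    print(f"{name:>4}: {ratio:>2}x * {REF_CLOCK_GHZ} GHz = "
          f"{ratio * REF_CLOCK_GHZ:.0f} Gbps per pin")
# XDR: 4 Gbps, XDR2: 8 Gbps, TBI (this initiative): 16 Gbps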

FlexLink and Differential Signaling

To reduce area and increase performance, Rambus totally redesigned the
command and addressing (C/A) link. Traditionally, commands and address
information are sent over a parallel, multi-drop bus, with a drop for
each DRAM - this is how XDR, DDR and GDDR all function. For instance,
the XDR in the CELL processor used a 12 bit wide C/A interface that
operated at 800Mbps, a quarter of the data rate. However, as data
rates increase it becomes more and more difficult to synchronize
multi-drop buses. To avoid this problem, Rambus moved away from the
old model of a 12 pin shared, parallel bus with slower single-ended
signaling and instead applies the same techniques to both the data
and address links. The new address link is a 2 pin, point-to-point
C/A link with
differential signaling that operates at the full 16Gbps data rate
and builds on all of Rambus' previous techniques for high performance
signaling (such as FlexPhase, a technique to compensate for skew).
Rambus refers to the narrow, full speed C/A link as FlexLink, and the
differential signaling for the C/A link as a Fully Differential
Memory Architecture (Fully Differential since all three components -
clock, data and C/A - will be differential).
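
Using the numbers above, a quick comparison of command/address
throughput between the old shared bus and the new per-DRAM link
(treating the 2 pins as one differential pair, which is my reading of
the description):

# Command/address throughput: CELL's XDR C/A bus vs. the FlexLink approach.
xdr_ca_gbps = 12 * 0.8       # 12 single-ended pins at 800 Mbps, shared by the DRAMs
flexlink_ca_gbps = 1 * 16.0  # one differential pair (2 pins) at 16 Gbps, per DRAM
print(f"XDR C/A bus: {xdr_ca_gbps:.1f} Gbps shared across the channel")
print(f"FlexLink C/A: {flexlink_ca_gbps:.0f} Gbps dedicated to each DRAM")

In other words, two pins running at the data rate deliver more
command bandwidth to each DRAM than the twelve-pin shared bus did,
while sidestepping the multi-drop synchronization problem.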

The two technologies are extremely complementary. Differential
signaling avoids the capacitance and inductance problems with single-
ended signals, so that the C/A link can operate at 16Gbps and achieve
the desired power profile at the transmitter. In turn, this frequency
headroom enables fewer pins for the link. In general, high performance
implementations of alternative technologies (XDR or GDDR) tend to use
one C/A bus (12 or ~20 pins) for 1-2 DRAMs. Rambus' new interface has
4-16 fewer C/A pins per DRAM than XDR and 16-32 fewer C/A pins than
GDDR.

Both of these changes are very consistent with the overall trends for
interconnect architectures in the semiconductor industry. Almost
universally, interconnects have shifted away from the slow and wide
buses that were favored in earlier days, such as SCSI or the front
side bus. Instead, physics and economics tend to favor interfaces
with the fewest pins, where more bandwidth comes from faster
signaling rather than from additional data pins. Rambus is somewhat
ahead of the curve, as their memory interface is the first to use all
differential signaling and narrow point-to-point links. Since Rambus
focuses primarily on the high bandwidth market, it is very likely that
these architectural changes will be mirrored by more mainstream
standards over the course of the next 2-4 years.


Rambus' Test Vehicle

Along with declaring their intentions to provide 1TB/s of bandwidth to
ASICs, GPUs or MPUs in the future, Rambus also demonstrated a test
vehicle for their future interconnect, shown below in Figure 2. The
data eye at the memory controller (with equalization) and the data
eye at the DRAM (without equalization) are both shown at 16Gbps.

Figure 2: http://www.realworldtech.com/includes/images/articles/Rambus-TBI-2.jpg

Rambus' test system is manufactured in TSMC's 65nm ASIC process and
two DRAM emulators were manufactured using their 65nm DRAM process.
The ASIC is flip chip packaged, while the DRAM emulators use wire bond
- which is consistent with overall industry practices. The demo system
does not use transmit equalization, which most high speed memory
interfaces would employ in a real world situation.

At this point, Rambus declined to discuss the power efficiency (as
measured in GB/s per mW) of this first implementation, or specific
targets for the initiative. However, they did state that they do not
believe the thermal envelope for memory interconnects has changed
significantly since the time when XDR or XDR2 debuted. This implies
that the power efficiency should increase by roughly the same factor
as the bandwidth of an individual link relative to XDR2 or XDR. One of
the advantages of setting a performance target twice as high as the
estimated needs of the target systems is that a system designer can
easily trade that extra performance for even lower power consumption.
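
The proportionality being implied is simple enough to sketch; the
per-link rates below follow from the 8X/16X/32X clock ratios
described earlier, and only the ratios matter here:

# If the interface power envelope stays roughly fixed while per-link bandwidth
# rises, GB/s-per-mW must rise by about the same factor. Rates in Gbps.
XDR2_RATE = 8.0
for name, rate in [("XDR", 4.0), ("XDR2", 8.0), ("TBI", 16.0)]:
    rel_efficiency = rate / XDR2_RATE
    print(f"{name:>4}: {rate:>4.0f} Gbps -> {rel_efficiency:.1f}x the XDR2 "
          f"efficiency at the same power")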

Rambus did not quantitatively describe the project targets or
implementation specific bit-error rates (BER), other than to say that
they will provide "commercially viable BERs". The achievable BER is
to a large extent implementation specific (depending on board
quality, etc.) and is not defined wholly by Rambus' IP. One challenge
that they are cognizant of is that maintaining acceptable failure
rates requires lower and lower BERs as bandwidth increases. While a
given error rate may be acceptable at 4Gbps and yield a 3-5 year
expected lifetime, at 16Gbps that same error rate will produce an
expected lifetime that is one fourth as long - below most commercial
requirements. Consequently, Rambus will design for lower BERs and
help customers achieve the desired level of reliability.
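
A sketch of why the same BER hurts four times as much at four times
the bit rate (the BER value itself is purely illustrative):

# Mean time between bit errors for a fixed BER at different link rates.
SECONDS_PER_YEAR = 3600 * 24 * 365
ber = 2e-18                  # hypothetical errors per bit
for rate_gbps in (4, 16):
    bits_per_second = rate_gbps * 1e9
    years_between_errors = 1 / (ber * bits_per_second) / SECONDS_PER_YEAR
    print(f"{rate_gbps:>2} Gbps: ~{years_between_errors:.1f} years between errors")
# 4 Gbps: ~4.0 years; 16 Gbps: ~1.0 year -> hence the push for lower BERs.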

Conclusion

As Rambus was very clear to point out, this announcement isn't about a
currently shipping product - it is about setting goals. These goals
are certainly aggressive, but Rambus has made substantial progress
already and there is little risk that they would fall short of their
targets. Rambus' work in this area is extremely relevant for the
console and graphics markets. When this technology is finally
productized, the initial design wins are most likely to be in next
generation consoles, particularly designs from Sony, given their
previous experience. Perhaps the most interesting aspect of the
technology Rambus has discussed is its implications for other
interconnects. It will be interesting to see when other interconnects
follow suit and transition command and address communication towards
narrow, high speed differential links instead of slower, wider
single-ended buses.
 
Have you ever heard of copyright infringement? Because you just
posted my article without attribution, and you have no rights to
reprint or re-use my article.

DK
 
Have you ever heard of copyright infringement? Because you just
posted my article without attribution, and you have no rights to
reprint or re-use my article.

DK

Ever hear of the "Fair Use Doctrine"? If not, you really ought to become
familiar with it. It clearly covers his ass quite well.

He *did* post a link to the original article, so it was obvious he wasn't
trying to claim originality.

It was clearly stupid of him to paste the content here when the link would
have sufficed - a senseless waste of bytes (Oh! The humanity!) but the
omission of full attribution here is a hand-slappable offense, nothing more...

/daytripper
 
daytripper said:
Ever hear of the "Fair Use Doctrine"? If not, you really ought to become
familiar with it. It clearly covers his ass quite well.

Not "quite well" IMHO.

Posting the link plus a short excerpt would have been fine, as it is
David possibly lost a _lot_ of page views which would have generated
real income for him.
He *did* post a link to the original article, so it was obvious he wasn't
trying to claim originality.

It was clearly stupid of him to paste the content here when the link would
have sufficed - a senseless waste of bytes (Oh! The humanity!) but the
omission of full attribution here is a hand-slappable offense, nothing more...

Consider him slapped then.

Terje
 
Terje Mathisen said:
Not "quite well" IMHO.

Posting the link plus a short excerpt would have been fine, as it is David
possibly lost a _lot_ of page views which would have generated real income
for him.

[...]

Good point.
 
David said:
Have you ever heard of copyright infringement? Because you just
posted my article without attribution, and you have no rights to
reprint or re-use my article.

The AirRaid/Radeon350/whatever spamming, cross-posting, nym-shifting
nitwit has been doing this kind of thing for years.
 
The AirRaid/Radeon350/whatever spamming, cross-posting, nym-shifting
nitwit has been doing this kind of thing for years.

Well I would not call his postings spamming, although I
rarely if ever read them.
As for this "copyright infringed" article, well, I would
never have read it (I did not read much of it now, either...)
had he not posted it here.
AirRaid's posts are easily recognizable and are among the
things to least worry about on the net, I suspect.
And BTW, getting some news down the throat from time
to time could be beneficial to those - like me - who are
not so keen news followers...
I'd rather have him around at his current posting
frequency & topics than not.

Dimiter
 
Have you ever heard of copyright infringement? Because you just
posted my article without attribution, and you have no rights to
reprint or re-use my article.

I *thought* it was more coherent than the usual GPU hype.
 
Chris Thomasson said:
Terje Mathisen said:
Not "quite well" IMHO.

Posting the link plus a short excerpt would have been fine, as it is
David possibly lost a _lot_ of page views which would have generated
real income for him.

[...]

Good point.
So I went and clicked on the link. I hope that helps.
 
Well I would not call his postings spamming, although I
rarely if ever read them.
As for this "copyright infringed" article, well, I would
never had read it (I did not read much of it now, either...)

Fair enough. That's a valid point.
had he not posted it here.
AirRaids posts are easily recognizable and are among the
things to least worry about on the net, I suspect.

He certainly at least is on topic, which I agree is better than a lot
of spam.
And BTW, getting some news down the throat from time
to time could be beneficial to those - like me - who are
not so keen news followers...
I'd rather have him around at his current posting
frequency & topics than not.

You should sign up for our RSS feed:
http://www.realworldtech.com/RWTfeed.xml

That's a fairly good way to keep yourself informed and get content
'pushed' out to you.

David
 
Not "quite well" IMHO.

Posting the link plus a short excerpt would have been fine, as it is
David possibly lost a _lot_ of page views which would have generated
real income for him.

Ah - I thought this was a legal/ethical issue and not a financial one...

If it helps, I had immediately forwarded - *just the link* - to my buddies on
the Jedec memory subgroup meeting in Hawaii this week. What with all the rain
keeping them off the golf courses they should have plenty of time to click
through ;-)

Of course, the first question they had was "what about latency?"

/daytripper
 
Ah - I thought this was a legal/ethical issue and not a financial one...
If it helps, I had immediately forwarded - *just the link* - to my buddies on
the Jedec memory subgroup meeting in Hawaii this week. What with all the rain
keeping them off the golf courses they should have plenty of time to click
through ;-)

Hah, one can hope...
Of course, the first question they had was "what about latency?"

I think the latency is pretty reasonable. I don't have an answer in
ns, but I was under the impression that XDR1/2 are comparable to GDDR,
and I regard this as XDR3 (without the official name).

DK
 
Ah - I thought this was a legal/ethical issue and not a financial one...

Copyright, as it is, is essentially a financially motivated legal
issue, isn't it? Otherwise, somebody ought to explain to certain
large groups busy harassing/suing young children and single parents
about having ethics.
 
Of course, the first question they had was "what about latency?"

/daytripper

Bandwidth is king. Said it long ago. Wider is the only way left to
go. We will see more and more of same and the only thing to do about
latency is to hide it.

Robert.
 
In comp.sys.ibm.pc.hardware.chips Robert Myers said:
Bandwidth is king. Said it long ago. Wider is the only

Uhm, err, for what sorts of problems/tasks? Had bandwidth
always been the governing factor, Rambus' first iteration
would have succeeded. Their execs obviously thought they
had technical advantages worth the commercial conditions.
The market disagreed.
way left to go. We will see more and more of same and the
only thing to do about latency is to hide it.

This has often been tried with only partial success (video).
Sometimes latency governs and cannot be hidden (databases).
It must be reduced, as AMD has done fairly successfully.

-- Robert
 
Uhm, err, for what sorts of problems/tasks? Had bandwidth
been always and overall governing, Rambus first iteration
would have succeeded. Their execs obviously thought they
had technical advantages worth the commercial conditions.
The market disagreed.
Rambus was hot and expensive.

To turn your argument over, if latency were king, Intel would be out
of business and/or have changed tactics drastically. Intel has taken
its own sweet time about moving away from its traditional memory
architecture and seems to be doing quite nicely.
This has often been tried with only partial success (video)
Sometimes latency governs and cannot be hidden (databases).
It must be reduced as AMD has done fairly successfully.
That's a one-time gain that has been known to be available at least
since the last editions of the Alpha. For latency, there is nowhere
left to go in terms of completely unpredictable reads from memory (or
disk). All the tactics that work (prefetch, hide, cache) depend on
the ability to foresee the future, another hobby horse of mine. Terje
might claim that improvements come from cache management.
Improvements in cache management come from more successfully
exploiting nonrandomness; that is to say, the ability to predict the
future.

Robert.
 
Rambus was hot and expensive.

To turn your argument over, if latency were king, Intel would be out
of business and/or have changed tactics drastically. Intel has taken
its own sweet time about moving away from its traditional memory
architecture and seems to be doing quite nicely.

That's a one-time gain that has been known to be available at least
since the last editions of alpha. For latency, there is nowhere left
to go in terms of completely unpredictable reads from memory (or
disk). All the tactics that work (prefetch, hide, cache) depend on
the ability to foresee the future, another hobby horse of mine. Terje
might claim that improvements come from cache management.
Improvements in cache management come from more successfully
exploiting nonrandomness; that is to say, the ability to predict the
future.

Robert.

So, in short, you don't think the biggest problem confronting processor design
and performance isn't important because "it's hard"...

/daytripper (well, that's one way to go, I guess ;-)
 
So, in short, you don't think the biggest problem confronting processor design
and performance isn't important because "it's hard"...

/daytripper (well, that's one way to go, I guess ;-)

If you have a need to make problems go dramatically faster, it isn't
going to happen through reducing latency. A good processor design is
one that doesn't make the situation worse. Within a factor of 2,
that's surely the best you can hope to do.

The only big knobs are bandwidth and predictability. As for latency,
"it takes all the running you can do just to stay in the same place."

Robert.
 
In comp.sys.ibm.pc.hardware.chips Robert Myers said:
To turn your argument over, if latency were king, Intel would be
out of business and/or have changed tactics drastically. Intel has
taken its own sweet time about moving away from its traditional
memory architecture and seems to be doing quite nicely.

Your argument assumes Intel and AMD are identical with respect
to market success. They are NOT! Intel is much larger and can
afford many mistakes. AMD's production capacity is too small to
be any sort of real threat, at least in the short and medium term.
That's a one-time gain that has been known to be available at
least since the last editions of alpha.

Sure. But why not grab it?
For latency, there is nowhere left to go in terms of
completely unpredictable reads from memory (or disk).

Sure there is -- SRAM and other designs which take more xtors
per cell. With the continually decreasing marginal cost
of xtors and a shortage of useful things to do with them,
I expect this transition to happen at some point.
All the tactics that work (prefetch, hide, cache) depend
on the ability to foresee the future, another hobby horse
of mine. Terje might claim that improvements come from
cache management. Improvements in cache management come
from more successfully exploiting nonrandomness; that is
to say, the ability to predict the future.

I agree with Terje and those things can be done in
addition to debottlenecking the circuit response.

-- Robert
 