AMD has no plans to push BTX boards

Tony Hill · Dec 4, 2004

No wait states are needed at all (other than for different speed
memory, perhaps). Simply divide the clock down to the desired
frequency. This sort of thing is done all the time.

Note that on the Opteron it's not even an issue for the different
speeds of memory, the chip doesn't support such things. You can ONLY
use whole integer dividers for the memory clock vs. the core
frequency. This is why you sometimes end up with DDR266 or DDR333
memory not running quite as fast as ideally it should. Eg. if you
plug DDR333 memory into a 1.6GHz Opteron it will only run at 160MHz
(10x divisor) instead of the 166MHz it is designed for.

Fortunately for DDR400 memory this is not an issue at any clock speed
as all Opteron/Athlon64 chips run at a whole integer multiple of
200MHz.

Astute readers among you, though, might notice that this could again
become a potential issue with DDR2, eg. a 2.4GHz Athlon64 could not
run DDR2-667 memory at its rated clock speed, only at DDR2-600 speed
(if I understand DDR2 correctly and am doing my math right)
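
The divider arithmetic in this post can be sketched in a few lines. This is only an illustration: it assumes the controller picks the smallest whole-number divider that keeps the memory at or below its rated clock, and `memory_clock` and the `RATED_CLOCK_MHZ` table are names made up for the sketch, not anything taken from AMD documentation.

```python
from fractions import Fraction
from math import ceil

# Nominal memory clocks in MHz (half the DDR data rate), kept exact as fractions.
RATED_CLOCK_MHZ = {
    "DDR266": Fraction(400, 3),
    "DDR333": Fraction(500, 3),
    "DDR400": Fraction(200),
    "DDR2-667": Fraction(1000, 3),
}

def memory_clock(core_mhz, grade):
    """Return (divider, actual memory clock in MHz) for a whole-integer divider.

    Assumption: the smallest integer divider that does not push the
    memory above its rated clock is the one that gets used.
    """
    rated = RATED_CLOCK_MHZ[grade]
    divider = ceil(Fraction(core_mhz) / rated)
    return divider, Fraction(core_mhz, divider)

for core_mhz, grade in [(1600, "DDR333"), (2000, "DDR400"), (2400, "DDR2-667")]:
    divider, actual = memory_clock(core_mhz, grade)
    print(f"{core_mhz} MHz core, {grade}: /{divider} -> {float(actual):.1f} MHz")
# 1600 MHz core, DDR333: /10 -> 160.0 MHz
# 2000 MHz core, DDR400: /10 -> 200.0 MHz
# 2400 MHz core, DDR2-667: /8 -> 300.0 MHz
```

Exact fractions are used instead of floats so that cases where the core clock is an exact multiple of the memory clock do not get rounded up to the next divider by accident.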

Tony Hill · Dec 4, 2004

I was just thinking the time frame for the "unified bus" corresponds with
the 2007 you mentioned for an on-chip memory controller, according to what
I've read. IOW the unified bus system framework would have to be something
similar to Hypertransport's - no?... a processor network or cross-bar, with
a pipe to peripherals

<ding ding> I believe you've just given the winning answer to all of
this George! That would indeed make sense, integrate the memory
controller on both the Itanium and Xeon line and then just have some
sort of HT-like connection to the outside world that could quite easily
be common between the two.

Rob Stow · Dec 4, 2004

Tony said:
Note that on the Opteron it's not even an issue for the different
speeds of memory, the chip doesn't support such things. You can ONLY
use whole integer dividers for the memory clock vs. the core
frequency. This is why you sometimes end up with DDR266 or DDR333
memory not running quite as fast as ideally it should. Eg. if you
plug DDR333 memory into a 1.6GHz Opteron it will only run at 160MHz
(10x divisor) instead of the 166MHz it is designed for.

Fortunately for DDR400 memory this is not an issue at any clock speed
as all Opteron/Athlon64 chips run at a whole integer multiple of
200MHz.

Not all of the AMD64 chips support DDR400. In the
Opterons, for example, DDR400 support started with the
146, 246, and 846. Slower Opterons will treat DDR400 as
DDR333.

keith · Dec 4, 2004

Note that on the Opteron it's not even an issue for the different
speeds of memory, the chip doesn't support such things. You can ONLY
use whole integer dividers for the memory clock vs. the core
frequency. This is why you sometimes end up with DDR266 or DDR333
memory not running quite as fast as ideally it should. Eg. if you
plug DDR333 memory into a 1.6GHz Opteron it will only run at 160MHz
(10x divisor) instead of the 166MHz it is designed for.

I was thinking more along the lines of the memory timings
(precharge/RAS/CAS/whatever).

Fortunately for DDR400 memory this is not an issue at any clock speed as
all Opteron/Athlon64 chips run at a whole integer multiple of 200MHz.

Astute readers among you, though, might notice that this could again
become a potential issue with DDR2, eg. a 2.4GHz Athlon64 could not run
DDR2-667 memory at its rated clock speed, only at DDR2-600 speed (if I
understand DDR2 correctly and am doing my math right)

Too much math for a weekend...

Rob Stow · Dec 4, 2004

keith said:
I was thinking more along the lines of the memory timings
(precharge/RAS/CAS/whatever).

Too much math for a weekend...

The math has all been done for you already.
Take a look at Table 2 on page 14 of
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31412.pdf

Yousuf Khan · Dec 4, 2004

David said:
Intel uses its N-1 generation capacity to build the system controllers
to amortize the costs of the fab line, not because there's any issues
in using the N th generation (the latest and greatest) process to
build memory/system controllers.

What does manufacturing generation technology have to do with
it? It's not at issue; Intel could use any process generation they want.
The issue is about how to build a memory controller into a piece of
silicon that keeps changing its frequency within a generation. Chipsets
are stable in their frequency, CPUs are not.

As I wrote above. The GHz/MHz thing is a red herring. I don't know
how you managed to convince yourself that such an explanation works,
but I doubt that it'll make sense to others. You can always operate
part of the chip at a lower frequency with a clock divider if that
makes you happy. Look at how AMD built the Opteron. The DMC is sitting
across a "northbridge" that's integrated into the die, and the DMC
is operating at a lower frequency.

It's not as simple as you seem to make it sound either. What you
describe is fine in a chipset context, which has a fixed main frequency;
a chipset will run at 66MHz, or 133MHz, or 200MHz or whatever for its
entire production lifetime/generation. A CPU, within a generation, might
have several speed grades, each of which would require different clock
dividers and waitstates. And if somebody were to try to overclock the
processor, the memory controller would have to be able to compensate
dynamically. There are bound to be more variables to look at when testing
any component inside a CPU context. I'm not saying that the principles
of designing a memory controller are different inside a CPU vs. a
chipset, just that the testing required is much more extensive.

An Opteron running at 2.6 GHz would probably require a different clock
divider than an Opteron running at 1.8 GHz. If AMD was smart, it
probably made the divider dynamic and semi-intelligent inside the
Opteron, so that it wouldn't need to keep redesigning (or even simply
lasering) the Opteron every time there was a different clock increment.
Plus there is no guarantee about what speed grade of RAM is going to be
used with an Opteron (eg. PC1600 to PC3200, or even higher), so each
memory controller would need to be able to compensate even further
depending on what type of memory people actually decide to stick onto
their own Opterons.

In the x86 line, Intel used to sell a processor with an integrated
DMC, I think it was the 486SL or some such thing. It cost more
to build, but Intel wasn't able to sell it for more money, so it
was discontinued.

I think that was an IBM design. Anyways, whoever made it, back in those
days CPU speed increments were also rare. The 486DX started at 25MHz,
33MHz, and then there were the DX2 versions at 50MHz and 66MHz; basically
two speed grades for that entire generation (not counting AMD versions).
I think the 486SL was even more fixed, somewhere around 25MHz with no
other choices.

Please stop this line of argument. You're twisting yourself in a knot
arguing a viewpoint that makes no sense. The memory controllers won't
run at GHz ranges even if you integrate it into the processor. The
memory controllers must necessarily run at the frequency of the DRAM
devices they control. i.e. the Opteron's DDR SDRAM memory controller
runs at 200 MHz to control DDR400 SDRAM devices.

The only thing I see is you twisting yourself in a knot trying to argue
against a point which any sane man would know is true: life inside a CPU
is much faster and more dynamic, than life inside a chipset. No chipset
approaches the speeds of CPUs, nor do they approach the speed
variations; therefore, slightly different measures have to be taken to
work with it. Are the overall principles different whether it's inside a
CPU or a chipset? No. But their implementation details are different
enough that you need to do a lot more initial testing for it. After
this, AMD will have all kinds of experience making memory controllers
running at multi-GHz ranges, so it'll have an easier time next time.
It's not as if AMD has no experience making memory controllers
itself; after all, it also does its own chipsets. Intel will now have to
walk the same path that AMD has already walked when it's time for it to
integrate its own memory controller; its own chipset-based memory
controller experience will only help it slightly.

Seriously, building a memory controller into the CPU is not rocket
science. Now, building a high performance memory controller, that
requires some serious architecture/engineering work, especially
since DRAM systems are becoming ever more finicky for engineers to
move data in and out of them.

Thank you for agreeing with me.

Intel built those D-RDRAM to SDRAM translation hubs with N-1 generation
silicon. It's not going to be difficult to naively integrate the
D-RDRAM to SDRAM translator onto the same piece of silicon and get
a "SDRAM controller" out of it.... Not that this is anyone's choice
of building an SDRAM controller.

Again, I fail to see what relevance the manufacturing process they
used has. So Intel used an older manufacturing node for its chipsets --
great, good for it, it got to keep using its investment in older
manufacturing technology, but so what?

But as far as I can remember about the Intel i820 Pentium 3 RDRAM
chipset MCH fiasco, the RDRAM to SDRAM translator hub was not built into
the i820 northbridge, but it was a separate chip hanging off of the
northbridge. "Data integrity" issues, with SDRAM but not with RDRAM.
(However, there were separate "performance" issues with that chipset, as
the Pentium 3 could not in any way take advantage of RDRAM to save its life.)

Weren't you arguing about quality control/signaling problems with
"cheap" memory modules above? D-RDRAM kind of requires you to
engineer some pretty clean signal paths. You can be sloppier with
SDRAM. Then again, DDR now gets to be pretty serious too.

No, I was arguing about the performance quality of the RAM modules
themselves, not their signal quality. However, signal quality would
obviously affect performance quality to some degree.

Yousuf Khan

George Macdonald · Dec 4, 2004

No, you simply divide the clock down to the frequency you wish to run
the dram controller. This is *no* different than dividing the
frequency down to run the FSB. ...or caches, or...

Does anyone know for sure where Opteron does the frequency "division"? Why
not keep the memory controller in the same clock domain as the rest of the
chip and "wait" for a memory bus clock edge? It would make sense for the
rest of the "north bridge" logic on-board, i.e. memory address arbitration
and routing, to be at the high clock rate, so as not to have dead cycles on
the accesses destined for the 800/1000MHz HT. The folklore on some Web
sites I've read has it that this is how it's done but... not err,
convincingly.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??

David Wang · Dec 4, 2004

What does manufacturing generation technology have to do with
it? It's not at issue; Intel could use any process generation they want.
The issue is about how to build a memory controller into a piece of
silicon that keeps changing its frequency within a generation. Chipsets
are stable in their frequency, CPUs are not.

Which is still not relevant.

You can easily have programmable clock dividers. How do you think that
the FSB stays at a "stable frequency" while the processor frequency
ramps? You can keep part of the CPU at a relatively constant frequency
while running the rest at a much higher frequency. You just have to
design the correct PLL and the (clock multiplied) interface logic.

It's not as simple as you seem to make it sound either.

It is as "simple" as I make it sound. Though I don't think I made
it sound "simple".

Here's a breakdown of "memory latency" in an AMD Opteron.

Notice the part that says "clock domain crossing"? It's there for
a reason.

http://mywebpages.comcast.net/davewang202/misc/K8-1.gif

What you
describe is fine in a chipset context, which has a fixed main frequency;
a chipset will run at 66MHz, or 133MHz, or 200MHz or whatever for its
entire production lifetime/generation. A CPU, within a generation might
have several speed grades, each of which would require different clock
dividers and waitstates.

..... And all of the dividers and waitstates are programmable.

You can and should design the flexibility into the system controller
whether that system controller sits on the CPU or on a separate die.
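
As a rough sketch of what "programmable" could mean here, a boot-time routine might convert DRAM timings given in nanoseconds into whole memory-clock cycles for whichever core clock and divider are in use. The function names and the timing values below are hypothetical and purely illustrative; they are not taken from any AMD or Intel register specification.

```python
def cycles_needed(t_ps, core_mhz, divider):
    """Whole memory-clock cycles needed to cover a timing of t_ps picoseconds.

    The memory clock is core_mhz / divider, so one memory cycle lasts
    divider * 1_000_000 / core_mhz picoseconds. Integer ceiling division
    avoids floating-point rounding surprises.
    """
    numerator = t_ps * core_mhz
    denominator = divider * 1_000_000
    return -(-numerator // denominator)

# Hypothetical DRAM timings in picoseconds (illustrative values only).
TIMINGS_PS = {"tRCD": 15_000, "tRP": 15_000, "tRAS": 40_000}

def timing_registers(core_mhz, divider):
    """Cycle counts a setup routine might program for one speed grade."""
    return {name: cycles_needed(t, core_mhz, divider)
            for name, t in TIMINGS_PS.items()}

print(timing_registers(2000, 10))  # 200 MHz memory clock
# {'tRCD': 3, 'tRP': 3, 'tRAS': 8}
print(timing_registers(1600, 10))  # 160 MHz memory clock
# {'tRCD': 3, 'tRP': 3, 'tRAS': 7}
```

The same logic serves every speed grade; only the two inputs change, which is the point about designing the flexibility in once.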

And if somebody were to try to overclock the
processor, the memory controller would have to be able to compensate
dynamically.

This is completely and utterly irrelevant.

As a processor design engineer, I would specify "bounds of operations"
and guarantee that the processor would operate correctly within those
bounds. If someone wants to "overclock", "overvoltage", "undervoltage"
or whatever they wish to do with the processor that is outside of the
operational boundaries that I have specified, then it is not my
responsibility to "compensate dynamically" for whatever utter
foolishness the end user wants to do with the processor once it
gets in his/her hands. I have no obligation to compensate for anything
that is outside of the bounds of specification of the processor.

There are bound to be more variables to look at when testing
any component inside a CPU context. I'm not saying that the principles
of designing a memory controller are different inside a CPU vs. a
chipset, just that the testing required is much more extensive.

And the "more extensive testing" is going to be a show stopper of
what sort?

An Opteron running at 2.6 GHz would probably require a different clock
divider than an Opteron running at 1.8 GHz. If AMD was smart, it
probably made the divider dynamic and semi-intelligent inside the
Opteron, so that it wouldn't need to keep redesigning (or even simply
lasering) the Opteron every time there was a different clock increment.

Amazing what programmability can do for you.

Plus there is no guarantee about what speed grade of RAM is going to be
used with an Opteron (eg. PC1600 to PC3200, or even higher), so each
memory controller would need to be able to compensate even further
depending on what type of memory people actually decide to stick onto
their own Opterons.

You do realize that you can specify that you will only support a
limited subset of all available DRAM technologies right? It's not
a sin not to support all DRAM technologies under the sun.

The only thing I see is you twisting yourself in a knot trying to argue
against a point which any sane man would know is true: life inside a CPU
is much faster and more dynamic, than life inside a chipset. No chipset
approaches the speeds of CPUs, nor do they approach the speed
variations; therefore, slightly different measures have to be taken to
work with it. Are the overall principles different whether it's inside a
CPU or a chipset? No. But their implementation details are different
enough that you need to do a lot more initial testing for it. After
this, AMD will have all kinds of experience making memory controllers
running at multi-GHz ranges, so it'll have an easier time next time.

No memory controller runs at multi-GHz ranges, regardless of whether
it's a separate piece of silicon or on the same piece of silicon as the
CPU. Even after integration, the DRAM controller will sit in a
different clock domain than the CPU processor clock domain. There
will be different clock distribution nets... Just as there are now...

It's not as if AMD has no experience making memory controllers
itself; after all, it also does its own chipsets. Intel will now have to
walk the same path that AMD has already walked when it's time for it to
integrate its own memory controller; its own chipset-based memory
controller experience will only help it slightly.

Sigh. Intel has already integrated memory controllers into its
products. It has done so in the past (486SL, Timna), is currently
doing so (XScale), and will likely do so in the future.

Thank you for agreeing with me.

How have I agreed with you?

The problem is with the DRAM interaction and that the complexity of
the DRAM controller will have to grow. This is a given regardless of
whether the DRAM controller stays off die or moves on die.

Again, I fail to see what relevance the manufacturing process they
used has. So Intel used an older manufacturing node for its chipsets --
great, good for it, it got to keep using its investment in older
manufacturing technology, but so what?

So putting it on the same generation die is easier to do (faster),
not harder.

No, I was arguing about the performance quality of the RAM modules
themselves, not their signal quality. However, signal quality would
obviously affect performance quality to some degree.

"performance quality of the RAM modules themselves" and

"not their signal quality". and

"signal quality would obviously affect performance quality to some degree."

So a contradiction followed by a limited qualification.

What exactly do you mean then?

What impacts the performance quality of RAM modules themselves, but
specifically excluding the signal quality aspect?

Who builds a memory module with good signal quality, but would have
difficulty interfacing with memory controllers? I'd surely like to
know what you are talking about here.

keith · Dec 5, 2004

The math has all been done for you already.

I know. Tony did it. ;-)

keith · Dec 5, 2004

Does anyone know for sure where Opteron does the frequency "division"? Why
not keep the memory controller in the same clock domain as the rest of the
chip and "wait" for a memory bus clock edge?

No. ...because that is a bad plan. When one widget needs service, simply
tell the other when it's needed then wait for the completion. It's far
easier to have each work in its own clock domain than forcing the issue on
another. Think of it this way, the logic in the slower clock domain
doesn't have to be tweaked as much, or can have more levels of logic
between latches. Why complicate things with clocks that are faster than
need be?

It would make sense for the
rest of the "north bridge" logic on-board, i.e. memory address arbitration
and routing, to be at the high clock rate, so as not to have dead cycles on
the accesses destined for the 800/1000MHz HT. The folklore on some Web
sites I've read has it that this is how it's done but... not err,
convincingly.

Seems obvious to me. The stuff that has to be clocked fast is; the stuff
that doesn't, isn't. Certainly the request/arbitration has to be done at the processor
speed, at least somewhere along the line. There is no reason to push that
constraint further down the pipe.
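
The hand-off described in this post, where fast logic posts a request and the slower domain picks it up on its own clock, can be modelled very roughly as follows. This is a toy model under stated assumptions: the slow clock is the core clock divided by a whole number, its edges line up with core cycles 0, N, 2N and so on, and synchronizer delays are ignored. The function names are invented for the illustration.

```python
def wait_for_slow_edge(core_cycle, divider):
    """Core cycles a request waits until the next slow-domain clock edge.

    Assumption: slow-domain edges fall on core cycles 0, divider,
    2 * divider, ... and a request arriving exactly on an edge is
    taken immediately.
    """
    return (-core_cycle) % divider

def average_wait(divider):
    """Mean wait in core cycles for requests spread evenly over one slow period."""
    return sum(wait_for_slow_edge(c, divider) for c in range(divider)) / divider

print(wait_for_slow_edge(13, 10))  # 7
print(average_wait(10))            # 4.5
```

In this model the crossing costs a few core cycles on average, while everything behind it is free to run at the slower clock.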

Tony Hill · Dec 5, 2004

Not all of the AMD64 chips support DDR400. In the
Opterons, for example, DDR400 support started with the
146, 246, and 846. Slower Opterons will treat DDR400 as
DDR333.

Quite true, but that is a totally separate issue, and actually one
that was solved over a year ago. It actually wasn't a speed thing but
rather a stepping thing, the first stepping of the Opteron did not
support DDR400 officially (it probably would have worked, but AMD didn't
qualify the chips for it). The second (or was it third?) stepping
corrected this.

George Macdonald · Dec 5, 2004

<ding ding> I believe you've just given the winning answer to all of
this George! That would indeed make sense, integrate the memory
controller on both the Itanium and Xeon line and then just have some
sort of HT-like connection to the outside world that could quite easily
be common between the two.

Don't tell me - it'll be called err, 4GIO.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
