Athlon 64s: a shared memory bus?

pigdos

Do the Athlon 64s feature a shared memory bus? Can the north bridge in
Athlon 64 systems directly read/write system memory (without utilizing the
A64's built-in memory controller)? I'm guessing the north bridge can, but
I'm not sure.
 
George Macdonald

> Do the Athlon 64s feature a shared memory bus? Can the north bridge in
> Athlon 64 systems directly read/write system memory (without utilizing the
> A64's built-in memory controller)? I'm guessing the north bridge can, but
> I'm not sure.

If by north bridge you mean the I/O chip(s), no. The Athlon64 contains a
fair amount of the functionality of what was traditionally the "north
bridge". IOW all system DMA has to pass through HyperTransport links to
the CPU's crossbar and memory controller.
 
pigdos

Are you sure about that? That would seem to me really inefficient and
would contribute to propagation delays. The north bridge (which still exists)
would have to receive all the data and forward it to/from the A64 memory
controller and the PCI/PCIe/AGP buses rather than handling all this itself? For
example, if we have an AGP access, say from the video card to system memory,
the AGP bus would xfer its address request to the North Bridge, the North
Bridge would then have to forward this request to the A64 memory controller,
which would then fetch the AGP memory data, send it back to the North Bridge,
and then the North Bridge sends it back to the video card. It would be even
worse if we experience a TLB miss in the North Bridge, because then we have
to request the GART entry from the A64 memory controller. DMA access would
similarly be degraded. No matter how fast the bus between the A64 memory
controller and the North Bridge, there would still be more latency than if
the North Bridge could handle it all itself.
 
George Macdonald

> Are you sure about that?

Yes.

> That would seem to me really inefficient and
> would contribute to propagation delays. The north bridge (which still exists)
> would have to receive all the data and forward it to/from the A64 memory
> controller and the PCI/PCIe/AGP buses rather than handling all this itself?

Call it a north bridge -- for convenience? -- if you wish, but it really
isn't one any longer and hasn't been for some time: Intel calls theirs an
MCH (Memory Controller Hub), which includes a memory controller, AGP/PCI-e
x16 and the bus to the I/O chip; for AMD64 systems, nVidia has a single
chip which includes the HT link and all I/O interfaces, including
AGP/PCI-e x16... IOW, no umm, south bridge!

> For
> example, if we have an AGP access, say from the video card to system memory,
> the AGP bus would xfer its address request to the North Bridge, the North
> Bridge would then have to forward this request to the A64 memory controller,
> which would then fetch the AGP memory data, send it back to the North Bridge,
> and then the North Bridge sends it back to the video card. It would be even
> worse if we experience a TLB miss in the North Bridge, because then we have
> to request the GART entry from the A64 memory controller. DMA access would
> similarly be degraded. No matter how fast the bus between the A64 memory
> controller and the North Bridge, there would still be more latency than if
> the North Bridge could handle it all itself.

Compared with current FSB (Intel) and previous AMD designs, the only place
where there's a latency hit is on video card DMA, since the memory
controller and video bus are obviously not on the same chip. In the
context of an I/O device, and the relative clock speeds, I believe the
effect is negligible. The current (1000MHz) HT peak bandwidth of 4GB/s in
both directions is sufficient to keep up with any current I/O device.
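
(Back-of-the-envelope, assuming the usual 16-bit-per-direction HT link: 16
bits is 2 bytes per transfer, and double-pumping a 1000MHz clock gives
2 GT/s, so 2 bytes x 2 GT/s = 4GB/s each way. The link width is my
assumption; some implementations run narrower links.)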

As I already noted, a fair portion of "north bridge" logic/activity --
address arbitration, MTRRs etc. -- is now on the CPU chip... including the
GART TLB. The system chip -- hub, north bridge, whatever -- just shuffles
and directs data to/from I/O devices.
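
(Since the GART TLB keeps coming up, here's a toy sketch of what a
GART-style lookup does - my own illustration, not AMD's implementation: a
contiguous "aperture" address is remapped to a scattered physical page
through a table, with a small TLB caching recent entries; a miss costs an
extra table walk in memory.)

/* Toy GART-style translation with a one-entry TLB. Table contents and
 * sizes are invented for illustration only. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12            /* 4KB pages */
#define APERTURE_PAGES 8         /* tiny aperture for the example */

static uint64_t gart[APERTURE_PAGES] = {  /* aperture page -> physical page */
    0x12345, 0x00042, 0x09f00, 0x00777,
    0x00001, 0x20000, 0x00abc, 0x00010,
};

static uint64_t tlb_tag = UINT64_MAX, tlb_data;  /* one-entry "TLB" */

static uint64_t gart_translate(uint64_t aperture_addr) {
    uint64_t vpage = aperture_addr >> PAGE_SHIFT;
    if (vpage != tlb_tag) {              /* TLB miss: walk the table */
        tlb_tag  = vpage;
        tlb_data = gart[vpage % APERTURE_PAGES];
    }
    return (tlb_data << PAGE_SHIFT) | (aperture_addr & ((1u << PAGE_SHIFT) - 1));
}

int main(void) {
    printf("0x%llx\n", (unsigned long long)gart_translate(0x2123));
    return 0;
}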

Bottom line: the priority for latency in the system as a whole is
CPU<->memory access - and that's where the on-die controller wins.
 
David Kanter

George, I don't think this guy actually understands the K8 system
architecture...

DK
 
pigdos

So you BELIEVE it's negligible, and then you go on to say EXACTLY what I was
saying regarding the latency hit. What happens if the AGP or DMA request
arrives when the bus to the A64 memory controller is in use? Yeah, that's
right, we get a nice, big stall, whereas in a North Bridge setup this would
never be an issue. If there are dedicated resources on the A64 memory
controller for the GART TLB, how are these resources utilized in a PCIe-only
environment? PCIe doesn't feature DIME. AGP system memory access, while
similar to DMA, is not DMA BTW.
 
Tony Hill

> Are you sure about that? That would seem to me really inefficient and
> would contribute to propagation delays. The north bridge (which still exists)
> would have to receive all the data and forward it to/from the A64 memory
> controller and the PCI/PCIe/AGP buses rather than handling all this itself? For
> example, if we have an AGP access, say from the video card to system memory,
> the AGP bus would xfer its address request to the North Bridge, the North
> Bridge would then have to forward this request to the A64 memory controller,
> which would then fetch the AGP memory data, send it back to the North Bridge,
> and then the North Bridge sends it back to the video card. It would be even
> worse if we experience a TLB miss in the North Bridge, because then we have
> to request the GART entry from the A64 memory controller. DMA access would
> similarly be degraded. No matter how fast the bus between the A64 memory
> controller and the North Bridge, there would still be more latency than if
> the North Bridge could handle it all itself.

There will be a (very) small latency hit on DMA and AGP memory data
accesses when compared to using an external memory controller.
However, that is *WAY* more than offset by the (comparatively large)
decrease in latency on CPU memory requests.

The high-bandwidth/low-latency design of HyperTransport, used to link
the I/O chips (call them a "north bridge" if you like, though it's not
particularly accurate), means that you're adding only a few
nanoseconds of latency to forward the request on, usually on the order
of 20-40ns (maybe even less). Remember that this link was designed
with this sort of forwarding of memory requests in mind, since that's
exactly what it does in a multiprocessor setup when accessing remote
memory. If we compare this to some common sources of DMA access, the
latency becomes pretty negligible. For example, hard drives have a
latency up in the millisecond range, so an extra 30 nanoseconds or so
is totally invisible. Same goes for network cards.
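
(To put rough numbers on that, using an assumed figure: a ~5ms disk access
is 5,000,000ns, so an extra 30ns is 30/5,000,000, or about 0.0006% of the
total - nothing you could ever measure.)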

The only place where this really comes into effect is video cards, and
in particular, shared-memory video cards. AGP/PCI-Express cards with
built-in memory don't really need to worry much about transferring
data to main memory on the fly since *MOST* of the important data is
kept in local memory on the card itself. The difference in latency
and bandwidth between local memory and remote memory is HUGE, so an
extra 30ns of latency and virtually no hit to bandwidth doesn't end up
changing things much. When you're looking at "really slow" vs. "the
tiniest bit slower", usually you don't worry too much.
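
(Rough orders of magnitude, and these are my assumptions rather than
anything from the thread: local video memory might answer in a few tens of
ns, while a trip across the bus to system memory is on the order of 100ns
or more - several times worse before the extra ~30ns hop even enters into
it.)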

However, with shared-memory video, things get a bit trickier. Here
you're ALWAYS dealing with remote memory, and you're always limited by
bandwidth and latency. Again, though, this is a bit of a comparison
between "bad" and "just slightly worse", and the goal in designing
integrated video is ALWAYS to reduce the bandwidth needs and hide
latency, regardless of what platform you're using.
 
George Macdonald

> So you BELIEVE it's negligible, and then you go on to say EXACTLY what I was
> saying regarding the latency hit.

What do you not understand about "negligible"?

> What happens if the AGP or DMA request
> arrives when the bus to the A64 memory controller is in use? Yeah, that's
> right, we get a nice, big stall, whereas in a North Bridge setup this would
> never be an issue.

For AGP or video PCI-e x16, there will be an unnoticeable increase in delay
- "big stall" is not even close. In an MCH/ICHx system (north bridge setup?),
all other DMA transactions, e.g. PCI, non-video PCI-e, come off an I/O
chip which is connected to the MCH by a serial link - no different from an
HT link in terms of conflicts and latency. Similarly, when the memory
controller is busy with another transaction, CPU or DMA to/from an I/O
device, a simultaneous 2nd request is always going to have to wait its turn.

BTW the "bus to the A64" is HyperTransport, which runs at 4GB/s in both
directions simultaneously - the only additional delay vs. your err,
"north bridge" is the request flight time over this "bus".

> If there are dedicated resources on the A64 memory
> controller for the GART TLB, how are these resources utilized in a PCIe-only
> environment? PCIe doesn't feature DIME.

I'm no expert but I believe the GART aperture can be used for mappings to
do with DMA as well as DIME. What the hell does it matter anyway? You're
the one who brought up the GART TLB - WHY... if you now want to argue it's
not necessary? If GART is not needed you umm, disable it.

> AGP system memory access, while
> similar to DMA, is not DMA BTW.

Depends what you mean by DMA: in the sense of traditional PC ISA
architecture DMA, yes it's different; in the sense of generic Direct Memory
Access, yes it's a valid use of the term. IIIR it's called DMA Mode in the
AGP docs. What do *you* want to call it?
 
George Macdonald

> Depends what you mean by DMA: in the sense of traditional PC ISA
> architecture DMA, yes it's different; in the sense of generic Direct Memory
> Access, yes it's a valid use of the term. IIIR it's called DMA Mode in the
> AGP docs. What do *you* want to call it?

That should be "IIRC it's called..........."
 
Scott Alfter

> Duh, I never said I did... That's why I was asking a QUESTION. Dumbfuck...

Yet another rude top-poster plonked into the killfile...

 _/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( http://alfter.us/          Top-posting!
 \_^_/ rm -rf /bin/laden         >What's the most annoying thing on Usenet?
 
David Kanter

pigdos said:
> Duh, I never said I did... That's why I was asking a QUESTION. Dumbfuck...

Then go look online at any number of technical websites. Or AMD's
site... they tend to explain it relatively well.

DK
 
pigdos

No, it isn't called DMA in the AGP docs. I've got the following document:
agp30SpecUpdate06-21.pdf. DMA isn't even mentioned in the document at all.
It's similar, but it's not DMA. If AGP is nothing more than DMA, then why do
they need a 129-page document to explain it?

Um, if the north bridge isn't connected to main memory, then every byte of
AGP data has to pass through the A64 memory controller. Despite the
bi-directional, wider/faster bus, how could this be more efficient? Any
other device on a bus introduces propagation delays. Even if each delay is
small, averaged over many megabytes it adds up. It would seem to me that for
AGP, you can't say the A64 architecture is any better at handling it, and it
might be a bit worse.
 
George Macdonald

> No, it isn't called DMA in the AGP docs. I've got the following document:
> agp30SpecUpdate06-21.pdf. DMA isn't even mentioned in the document at all.
> It's similar, but it's not DMA. If AGP is nothing more than DMA, then why do
> they need a 129-page document to explain it?

Who said it's "nothing more"? Read it to find out why they need 129 pages.
Your "document" is insufficient to cover everything about AGP: you'll
find that it does not mention DIME either, nor does it refer to the two
"usage models" the way previous AGP docs did. The AGP 2.0 docs say: "Two
usage models: 'Execute' and 'DMA'". Whether you like it or not, it *is* a
DMA mechanism, and you still haven't given me a name you want to use
instead of DMA.

> Um, if the north bridge isn't connected to main memory, then every byte of
> AGP data has to pass through the A64 memory controller. Despite the
> bi-directional, wider/faster bus, how could this be more efficient? Any
> other device on a bus introduces propagation delays. Even if each delay is
> small, averaged over many megabytes it adds up. It would seem to me that for
> AGP, you can't say the A64 architecture is any better at handling it, and it
> might be a bit worse.

When are you going to comprehend that the "north bridge" does not exist and
hasn't for several years? Nobody has suggested the A64 architecture is
more efficient than a memory hub implementation for AGP transactions; the
loss is *NEGLIGIBLE*. Your talk of "many megabytes" is totally irrelevant
when considering the transaction latency, which does *not* "add up" - there
is no accumulation of delay.

Go figure the "propagation delay" and transaction latency - show us the
numbers which are bothering you and tell us how they impact video
performance. The bottom line is that any additional delay is insignificant
and well justified by the performance gains on CPU<->memory transactions.
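
(A toy model of why it doesn't accumulate - my own sketch with assumed
numbers, not anything from AMD's documentation: a DMA engine keeps requests
in flight, so a fixed extra hop latency is paid roughly once per stream,
not once per byte.)

/* Toy model: streaming N bytes over a link of bandwidth B with an extra
 * fixed hop latency L. With requests pipelined, total ~= L + N/B, so the
 * hop cost does not grow with transfer size. All numbers are assumed. */
#include <stdio.h>

int main(void) {
    double n_bytes = 64.0 * 1024 * 1024;  /* 64MB of AGP texture traffic */
    double bw      = 4.0e9;               /* assumed 4GB/s HT, one direction */
    double hop     = 30e-9;               /* assumed ~30ns extra hop latency */

    double total = hop + n_bytes / bw;    /* latency paid once, not per byte */
    printf("transfer time: %.3f ms, hop adds %.5f%%\n",
           total * 1e3, 100.0 * hop / total);
    return 0;
}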

I'm tired of repeating myself here and I wish you'd learn to post properly
to Usenet.
 
pigdos

"Properly," according to whom? I was using Usenet before there were any such
standards. Show me convincing evidence as to WHY top-posting is so evil. Net
nazi. Where anywhere in the AGP 2.0 spec does it mention DMA mode? Nowhere.
Neither do any chipset datasheets I have. Cite your source please.

Then I'll be more specific: whatever chip has taken over the functions
of the north bridge w.r.t. AGP/PCI xfers.

I was never arguing that CPU<--->memory xfers were any worse; I was arguing
that AGP memory xfers (DIME?) could be slower. I'd like to see some proof
that A64s have dedicated hardware for the AGP GART TLB as well, because I
doubt it and I couldn't find any mention of such hardware in the datasheets
I've read for the A64.

If you were a PhD in EE I'd take your word as fact; since you're not, I
don't...
 
pigdos

The memory controller has to read every byte (which takes time) then set up
and execute a write of every byte. If this doesn't involve two separate
buses, then we still have to turn around the bus before we can even write
this data or read more. How wide is the data path between the A64 and the
BIU (it might not be a north bridge, but it sure as hell is a BIU of some
type)? For AGP writes this might not be an issue, but for AGP reads it
might.
 
Keith

> "Properly," according to whom?

Everyone else in the civilized Usenet. Only self-centered kidz top-post.

> I was using Usenet before

For what, toilet paper?

> there were any such standards.
> Show me convincing evidence as to WHY top-posting is so evil.

What is so hard to understand about writing for the reader?

> Net nazi.

Net loon.

> Where anywhere in the AGP 2.0 spec does it mention DMA mode?

Do you have any clue what you're talking about?

> Nowhere. Neither do any chipset datasheets I have. Cite your source
> please.

George already did, nutcase. If it walks like a duck...

> Then I'll be more specific: whatever chip has taken over the
> functions of the north bridge w.r.t. AGP/PCI xfers.

Who knows what the hell you're talking about, since you top-post. Loon!

> I was never arguing that CPU<--->memory xfers were any worse; I was
> arguing that AGP memory xfers (DIME?) could be slower. I'd like to see
> some proof that A64s have dedicated hardware for the AGP GART TLB as
> well, because I doubt it and I couldn't find any mention of such
> hardware in the datasheets I've read for the A64.

You don't get it. CPU performance is about, umm, CPU performance. Who
cares about AGP performance when graphics cards have more memory on them
than anyone cares about? BTW, that's always been the case. AGP has always
been a solution looking for its problem.

> If you were a PhD in EE I'd take your word as fact; since you're not, I
> don't...

No PhD here, only 30+ years in the biz. If you need to listen to a PhD
to move your ass, you won't get squat done.
 
Keith

> The memory controller has to read every byte (which takes time) then set up
> and execute a write of every byte.

Huh? Please don't top-post. It's impossible to figure out what the hell
you're talking about without some context.

> If this doesn't involve two separate
> buses, then we still have to turn around the bus before we can even write
> this data or read more.

You've never heard of DMA and burst transfers? Sheesh!

> How wide is the data path between the A64 and

32 bits at 1GHz, give or take.

> the BIU (it might not be a north bridge, but it sure as hell is a BIU of
> some type)? For AGP writes this might not be an issue, but for AGP reads
> it might.

Show me an AGP read that matters and I'll start top-posting.
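
(For the curious, a toy illustration of the burst point above - the numbers
are my own assumptions, not from any datasheet: paying a bus turnaround once
per 64-byte burst instead of once per byte changes the picture completely.)

/* Toy model: cycles to move one 64-byte chunk when a read/write bus
 * turnaround penalty is paid per byte vs. once per burst.
 * All figures below are illustrative assumptions, not measurements. */
#include <stdio.h>

int main(void) {
    double bytes_per_cycle = 4.0;   /* assumed 32-bit data path */
    double turnaround      = 10.0;  /* assumed cycles lost per turnaround */
    double burst           = 64.0;  /* bytes per burst */

    double per_byte  = burst * (1.0 / bytes_per_cycle + turnaround);
    double per_burst = burst / bytes_per_cycle + turnaround;
    printf("turnaround every byte : %.0f cycles\n", per_byte);   /* 656 */
    printf("one turnaround/burst  : %.0f cycles\n", per_burst);  /* 26 */
    return 0;
}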
 
