Xbox 360's graphics chip: massive information blowout



Today there has been a massive amount of new info released on ATI's graphics
processor for the Xbox 360.

Lots of reading. I'm going to copy and paste one article per post because each
article is several pages, so read any and all of my replies to my OP here.

first, the HardOCP article:

Xbox 360 GPU Features:

More facts make their way to the surface concerning the Xbox 360 graphics
processor, codenamed XENOS. This time ATI's Vice President of Engineering
chimes in on the unique technology that is powering the pixels behind
Microsoft's new gaming console.


Our goal here is to give you a better working knowledge of the video
technology inside the Xbox 360 in "plain English." While there is going to
be some technogeek babble, we will try to keep it to a minimum. There will
surely be more in-depth articles posted in the coming days and as usual, we
will be linking those on the HardOCP news page, so keep your eyes peeled if
we have only piqued your inner geek interests.

Earlier this week we got to speak candidly with Bob Feldstein, VP of
Engineering at ATI, and lead sled dog on the Xbox 360 GPU development team.
While Microsoft owns the technology that powers the graphics of the Xbox
360, ATI very much engineered the GPU internally codenamed XENOS. After a
3,200 mile round trip to the Microsoft offices last week, I came home a bit
stunned to spend the day with their Xbox 360 team and not see any sort of
gaming demo on their new gaming console. While there are tons of cool
features embodied in the Xbox 360, it is, at its heart, a gaming platform.
Thankfully, this week at E3 we have been given more than a few sneak peeks
at the upcoming games that will be available for the Xbox 360 platform.

Without a doubt, if you have broadband, and are a gamer or computer
enthusiast, you owe it to yourself to head over to FileShack and check out
their E3 2005 Hi Def Coverage. The level of graphic detail in these upcoming
titles looks to be generations ahead of the current Xbox and other aging
consoles. As for comparing it to PC titles, I will have to make that call
in November, when the Xbox 360 is due for release, and see what games and
hardware are out for the PC at that time. If I had to make a call now
though, I would have to say that the Xbox 360 graphics I have seen are just
as impressive as any 3D PC game title I have ever seen. Then again, we have
to remember that currently we are being thoroughly kicked in the head by the
size 15 Microsoft Marketing Boot, so of course we are not going to be shown
crappy clips of games. As for how much content, we are hearing from
several industry insiders that there will be between 25 and 45 titles
available for the Xbox 360 at launch. A lofty goal, say many insider
naysayers. However, let's get back on topic.

A GPU is a GPU, right?

If I have read it once, I have read it a thousand times, "The Xbox 360 is
just ATI's next generation (fill in favorite code name here) GPU." The
simple answer to that is, "No it is not." While most modern GPUs share many
architectural similarities, Bob Feldstein and Chris Evenden of ATI went out
of their way to explain to me, no matter how hard I tried to convince them
otherwise, that the Xbox 360 GPU is very much an original creation. While
some will try to tell you that it is simply a modified DirectX 9 GPU, you
might be interested to learn that the only API spec that the Xbox 360
hardware meets is its own API. That is correct, the Xbox 360 GPU only meets
its own Xbox 360 API specifications. While of course some lessons learned in
DX9 and upcoming DX10 were applied, the GPU of the Xbox 360 is very much its
own and comparing it directly to anything in the PC world is simply "not
right" according to Mr. Feldstein. Obviously, the Xbox 360 can be thought of
as a very innovative solution specifically for the Xbox and only the Xbox.

One interesting thing that was said to me during our conversation was that
3D game content developers were relied on along the way as the GPU was
designed. Not consulted with or talked to once in a while, but relied upon
for their GPU functionality requests and feedback as to the GPU's
implementation. Also keep in mind that Microsoft owns this technology and
while there is certainly a good amount of technology sharing between ATI and
Microsoft, Microsoft has the ability to make their own changes and take the
part anywhere in the world to be fabricated. So while it is ATI's design, it
is fundamentally Microsoft's GPU. All this and more indicates that the Xbox
360 GPU is truly a unique GPU.


About the Hardware

While we have tried our best to get better pictures of the Xbox 360
internals, we have pulled up short. While folks at Microsoft and ATI will
definitely not like to have the comparison made, there really is "just" a PC
inside the Xbox 360...that happens to be a Mac. You can find a full table of
Xbox 360 specifications here, but in the interest of speeding things up a
bit, below is the short list covering video.

GPU & Northbridge in One!

Many of you read that the Xbox 360 will have 512MB of GDDR3 RAM and that is
100% correct. But how exactly does this work with the CPU and GPU? Once you
learn that the Xbox 360 GPU also acts as the system's memory controller,
much like the Northbridge in an Intel PC, the picture becomes a bit clearer.
ATI has been making and designing chipsets for a good while now that use
GDDR3 RAM. Add to this that Joe Macri (go-kart racing fiend extraordinaire),
who was a pivotal factor in defining the GDDR3 RAM specification at JEDEC,
is also a big fish at ATI, and it only makes sense that ATI could possibly put
together one of the best GDDR3 memory controllers in the world. So while it
might seem odd that the Xbox 360 Power PC processor is using "graphics"
memory for its main system memory and a "GPU" as the "northbridge," once you
see the relationship between the three and the technology being used it is
quite simple. So, we have the 700MHz GDDR3 RAM acting as both system RAM and
as GPU RAM, connected to the GPU via a traditional GDDR3 bus interface that
can channel an amazing 25 Gigabytes per second of data.

Now between the GPU and the CPU things get a bit fuzzier. And by "fuzzier,"
I mean that they would not tell me much about it at all. The bus between the
CPU and GPU was characterized as unique and proprietary. Mr. Feldstein did
let on that the bus could shuttle up to 22 Gigabytes of data per second.
Much like GDDR3, this would be a full duplex bus, or one that "goes both
ways" at one time. Beyond that, not much was shared.
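The quoted figures can be sanity-checked with some rough arithmetic. This is a sketch, not official math: the 128-bit bus width is an assumption taken from the interview later in this thread (one of the other articles quotes 256-bit), and the full-duplex split follows from the "goes both ways at one time" description.

```python
# Rough arithmetic behind the bandwidth figures above. The 128-bit
# bus width is an assumption (the interview later in this thread
# quotes 128-bit at 700MHz; one of the articles says 256-bit).

clock_hz = 700_000_000   # GDDR3 base clock
ddr_factor = 2           # double data rate: two transfers per clock
bus_bytes = 128 // 8     # assumed 128-bit bus = 16 bytes per transfer

gddr3_bw = clock_hz * ddr_factor * bus_bytes
print(gddr3_bw / 1e9, "GB/s")   # 22.4 GB/s, in the ballpark of the quoted 25

# The CPU<->GPU link is full duplex, so its quoted 22GB/s total
# splits into two simultaneous directions.
cpu_gpu_total = 22.0
print(cpu_gpu_total / 2, "GB/s each way")  # 11.0 GB/s each way
```

The interview at the end of this thread quotes exactly that 11-in, 11-out split for the CPU/GPU bus.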

Viewing the Xbox 360 GPU as the "northbridge" should give you a better idea
of how the Xbox works and answer some of the overall architecture questions.
It is my own opinion that it is very likely that the CPU/GPU bus is very
similar to the GPU/RAM bus as it was stressed to me by Mr. Feldstein that
the CPU/RAM pathway was very free of any latency bottlenecks. The world may
never know for sure...till some crazy Mac guys hack the thing and run a
Stream benchmark.

GPU Features & Buzzwords

There are always tons of buzzwords flying around in the technology and
gaming communities, but what we wanted to dig into here is what exactly all
these Xbox 360 technologies do for you. The three that you are going to hear
the most are "Smart / Intelligent 3D Memory," "Adaptable / Unified Shader
Approach," and the "Modeling Engine." Another buzzword that you are going
to hear a lot of is "Fluid Reality." While this is not a new approach in the
PC world, it is new to consoles. This Fluid Reality refers to the way the
fabrics of clothing might flow with movement or how hairs on a character's
head fall into place or how a monster's fur may rustle as it stomps toward
you. It also refers to lifelike facial animations that have been recently
made famous by games like Half Life 2.

Smart / Intelligent 3D Memory

Notice the slash above? Not even ATI had a solid name for this technology
but for the sake of this explanation we are just going to call it "Smart 3D
Memory." Smart 3D Memory is the biggest standout and innovative feature I
saw inside the entire Xbox 360. To give you an idea of what it would look
like first hand, think of any normal GPU you might see, something much like
this Mobility Radeon X700 chipset. That is pretty much what any modern GPU
looks like. Now think of that same chipset as having a single piece of DRAM
sitting off to one side, much like that seen in the ATI slide below, but with
one less piece of RAM (and no arrows).

Keep in mind, ATI is not a stranger to adding memory to a chipset, but
remember this is "smart" memory.

The Xbox 360 Smart 3D Memory is a relatively small piece of DRAM sitting off
to the side of the GPU, yet on the same substrate. The Smart 3D Memory
weighs in at only 10MB. Now the first thing that you might think is, "Well
what the hell good is 10MB in the world of 512MB frame buffers?" And that
would be a good line of questioning. The "small" 10MB of Smart 3D memory
that is currently being built by NEC will have an effective bus rate between
it and the GPU of 2GHz. This is of course over 3X faster than what we see on
the high end of RAM today.

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This
is literally 192 Floating Point Unit processors inside our 10MB of RAM. This
logic unit will be able to exchange data with the 10MB of RAM at an
incredible rate of 2 Terabits per second. So while we do not have a lot of
RAM, we have a memory unit that is extremely capable in terms of handling
mass amounts of data extremely quickly. The most incredible feature that
this Smart 3D Memory will deliver is "antialiasing for free" done inside the
Smart 3D RAM at High Definition levels of resolution. (For more on just what
HiDef specs are, you can read here.) Yes, the 10MB of Smart 3D Memory can do
4X Multisampling Antialiasing at or above 1280x720 resolution without
impacting the GPU. So all of your games on Xbox 360 are not only going to be
in High Definition, but all will have 4XAA applied as well.

The Smart 3D Memory can also compute Z depths and occlusion culling, and
does a very good job at figuring stencil shadows. You know, the shadows in
games that will be using the DOOM 3 engine, like Quake 4 and Prey?

Now remember that all of these operations are taking place on the Smart 3D
Memory and having very little workload impact on the GPU itself. So what
exactly will the GPU be doing?

Adaptable / Unified Shader Approach

First off, we reported on page 2 in our chart that the capable "Shader
Performance" of the Xbox 360 GPU is 48 billion shader operations per second.
While that is what Microsoft told us, Mr. Feldstein of ATI let us know that
the Xbox 360 GPU is capable of doing two of those shaders per cycle. So yes,
if programmed for correctly, the Xbox 360 GPU is capable of 96 billion
shader operations per second. Compare this with ATI's current PC add-in
flagship card and the Xbox 360 more than doubles its abilities.

Now that we see a tremendous amount of raw shader horsepower, we have to
take into account that there are two different kinds of shader operations
that can be programmed by content developers. There are vertex shaders and
pixel shaders. These are really just what they sound like. Vertex shader
operations are used to move vertices, which shape polygons, which make up
most objects you see in your game, like characters or buildings or
vehicles. Pixel shader operations dictate what groups of pixels do like
bodies of water or clouds in the sky or maybe a layer of smoke or haze.

In today's world of shader hardware, we have traditionally had one hardware
unit to do pixel shaders and one hardware unit to do vertex shaders. The
Xbox 360 GPU breaks new ground in that the hardware shader units are
intelligent as well. Very simply, the Xbox 360 hardware shader units can do
either vertex or pixel shaders quickly and efficiently. Just think of the
Xbox 360 shaders as being agnostic SIMD shader units (Single Instructions
carried out on Multiple Data).

The advantage of this would not be a big deal if every game were split 50/50
in terms of pixel and vertex shaders. That is not the case though. While
most games are vertex shader bottlenecked, some others are pixel shader
bottlenecked. When you combine the Xbox 360 Unified Shader Approach and its
massive shader processing power, you end up with a GPU that is built to
handle gaming content far beyond what we see today in terms of visual quality.
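The dynamic-balancing idea can be sketched in a few lines: one pool of identical units is divided each cycle in proportion to pending vertex and pixel work. The 48-unit count comes from the other articles in this thread; the proportional policy below is purely illustrative, not ATI's actual hardware arbiter.

```python
# Toy sketch of a "unified shader" pool: the same units serve
# vertex or pixel work depending on demand, instead of fixed
# vertex-only and pixel-only hardware. Workload numbers are
# illustrative, not real figures.

UNITS = 48  # unit count quoted elsewhere in this thread

def schedule(vertex_jobs, pixel_jobs, units=UNITS):
    """Return (units_on_vertex, units_on_pixel) for one cycle."""
    total = vertex_jobs + pixel_jobs
    if total == 0:
        return 0, 0
    v = round(units * vertex_jobs / total)
    return v, units - v

# A vertex-heavy scene pulls most of the pool toward vertex work...
print(schedule(vertex_jobs=900, pixel_jobs=100))  # -> (43, 5)
# ...while a pixel-heavy scene shifts the same hardware the other way.
print(schedule(vertex_jobs=100, pixel_jobs=900))  # -> (5, 43)
```

With fixed-function hardware, each of those scenes would leave one set of shader units mostly idle; the point of the unified pool is that neither case wastes silicon.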

Modeling Engine

The Xbox 360 modeling engine is quite frankly something that is really not
explainable in layman's terms. At least I am not smart enough to explain
most of it. That said, I can share with you some of the things it does while
being able to endlessly loop and manipulate shader data.

Some of the global illumination effects you might have seen in years past at
SIGGRAPH have been put into motion on the Xbox 360 GPU in real time. For the
most part, global illumination is necessary to render a real world picture
quality 3D scene. The Xbox 360 also makes using curved surfaces possible
in-game, meaning that it can calculate the polygons from the proper curved
surface math in order to draw it on your screen "correctly." Much in line
with this is the ability to do high order surface deformation. And if you
are familiar with high order surfaces, you are likely familiar with what
gamers and hardware enthusiasts commonly refer to as "LOD" or Level of
Detail. Mr. Feldstein shared with us that the Xbox 360 GPU has some "really
novel LOD schemes." So we are thinking that the days of cars in the distance
with pentagonal-shaped wheels in new Grand Theft Auto titles are a thing of
the past.

GPU Power

ATI would not share with us the power dissipation numbers for the Xbox 360
GPU, but if you have seen the heatsink on it in our low quality picture
linked here (heatsink on the right), you will know that the total power
dissipation is not much by today's standards. In fact I would compare the
heatsink on that GPU to the ones we used to see on Voodoo 3 3000 video cards
back in 1999 that were running a whopping 200MHz when overclocked. ATI has
implemented "lots" of their mobile power saving features in this Xbox 360
GPU. Clock gating is even more efficient in this chipset than what we were
aware of being implemented in Mobility Radeons of the recent past. Sorry,
but here again we were light on specifics.


ATI took their Xbox 360 GPU project from the ground floor to a working piece
of silicon in 2 years. It is obvious that there is much more to this
GPU than what is being shared with us here, but even considering just the
features shared here, a 2 year project time seems amazing. This can easily
be considered a groundbreaking GPU in terms of 3D gaming, regardless of platform.

We should expect to see many of the Xbox 360 GPU technologies shared with
the PC desktop market. Unfortunately it does not look like Smart 3D Memory
will be one of the things that make the crossover, at least immediately. A
shame really, but the impact of a desktop Unified Shader Model will be
hugely beneficial to the PC gamer as well.

From what I have seen coming from this year's Electronic Entertainment Expo
coverage, there is no doubt that the future of the Xbox 360 in terms of pure
gaming fun looks very promising. But of course the marketing fluff always
does. The truth will be known sometime before the Christmas buying season.
It is highly likely though that we will not see the real fruits of ATI's and
Microsoft's labor till a couple of years have passed and game content
developers have had ample time to learn to exploit the incredible power of
the Xbox 360. Programmability and flexibility are two of ATI's Xbox 360 GPU
features that will not be realized immediately and it looks as if the GPU
has plenty of both.

I fully expect that we will see more details float to the surface throughout
the rest of the year as this is not the full Xbox 360 GPU story.





next, the AnandTech article:

E3 2005 - Day 2: More Details Emerge on Console GPUs
With a relatively light schedule thanks to the small size of the show, we
were able to spend quite a bit of time digging deeper on the two highlights
of this year's E3 - ATI's Xbox 360 GPU, and NVIDIA's RSX, the GPU powering
the PlayStation 3.

Given that both of the aforementioned GPU designs are very closely tied to
their console manufacturers, information flow control was dictated by the
console makers, not the GPU makers. And unfortunately, neither Microsoft nor
Sony was interested in giving away more information than their ridiculously
light press releases.

Never being satisfied with the norm, we've done some digging and this
article is what we've managed to put together. Before we get started, we
should mention a few things:

1) Despite our best efforts, information will still be light because of
the strict NDAs imposed by Microsoft and Sony on the GPU makers.

2) Information on NVIDIA's RSX will be even lighter because it is the more
PC-like of the two solutions and as such, a lot of its technology overlaps
with the upcoming G70 GPU, an item we currently can't talk about in great detail.

With those items out of the way, let's get started, first with what has
already been announced.

The Xbox 360 GPU, manufactured by ATI, is the least PC-like of the two GPUs
for a number of reasons, the most obvious being its 10MB of embedded DRAM.
Microsoft announced that the 10MB of embedded DRAM has 256GB/s of bandwidth
available to it; keep this figure in mind, as its meaning isn't as clear-cut
as it may sound.

The GPU operates at 500MHz and has a 256-bit memory interface to 512MB of
700MHz GDDR3 system memory (that is also shared with the CPU).

Another very prominent feature of the GPU is that it implements ATI's first
Unified Shader Architecture, meaning that there are no longer any discrete
pixel and vertex shader units, they are instead combined into a set of
universal execution units that can operate on either pixel shader or vertex
shader instructions. ATI is characterizing the width of the Xbox 360 GPU as
being 48 shader pipelines; we should caution you that these 48 pipelines
aren't directly comparable to current 16-pipeline GPUs, but rest assured
that the 360 GPU should be able to shade and texture more pixels per clock
than ATI's fastest present-day GPU.

Now let's move on to NVIDIA's RSX; the RSX is very similar to a PC GPU in
that it features a 256-bit connection to 256MB of local GDDR3 memory
(operating at 700MHz). Much like NVIDIA's Turbo Cache products, the RSX can
also render to any location in system memory, giving it access to the full
256MB of system memory on the PS3 as well.

The RSX is connected to the PlayStation 3's Cell CPU by a 35GB/s FlexIO
interface and it also supports FP32 throughout the pipeline.

The RSX will be built on a 90nm process and features over 300 million
transistors running at 550MHz.

Between the two GPUs there's barely any information contained within
Microsoft's and Sony's press launches, so let's see if we can fill in some
of the blanks.

More Detail on the Xbox 360 GPU
ATI has been working on the Xbox 360 GPU for approximately two years, and it
has been developed independently of any PC GPU. So despite what you may have
heard elsewhere, the Xbox 360 GPU is not based on ATI's R5xx architecture.

Unlike any of their current-gen desktop GPUs, the 360 GPU supports FP32 from
start to finish (as opposed to the current FP24 spec that ATI has
implemented). Full FP32 support puts this aspect of the 360 GPU on par with
NVIDIA's GeForce 6 line.

ATI was very light on details of their pipeline implementation on the 360's
GPU, but we were able to get some more clarification on some items. Each of
the 48 shader pipelines is able to process two shader operations per cycle
(one scalar and one vector), offering a total of 96 shader ops per cycle
across the entire array. Remember that because the GPU implements a Unified
Shader Architecture, each of these pipelines features execution units that
can operate on either pixel or vertex shader instructions.
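Those figures line up arithmetically with the clock speed given earlier: 48 pipelines issuing two ops each is 96 ops per cycle, and at 500MHz that works out to exactly the 48 billion shader ops per second Microsoft announced. A quick sanity check:

```python
# Reconciling the quoted shader throughput numbers: 48 pipelines,
# each able to issue 2 shader ops per cycle (one scalar and one
# vector), at the GPU's announced 500MHz clock.

pipelines = 48
ops_per_pipe_per_cycle = 2
clock_hz = 500_000_000

ops_per_cycle = pipelines * ops_per_pipe_per_cycle   # 96 ops/cycle
ops_per_second = ops_per_cycle * clock_hz            # 48 billion ops/s
print(ops_per_cycle, ops_per_second)
```

This also suggests the HardOCP article's "96 billion" figure came from doubling the per-second number rather than the per-cycle one.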

Both consoles are built on a 90nm process, and thus ATI's GPU is also built
on a 90nm process at TSMC. ATI isn't talking transistor counts just yet, but
given that the chip has a full 10MB of DRAM on it, we'd expect the chip to
be fairly large.

One thing that ATI did shed some light on is that the Xbox 360 GPU is
actually a multi-die design, referring to it as a parent-daughter die
relationship. Because the GPU's die is so big, ATI had to split it into two
separate die on the same package - connected by a "very wide" bus operating
at 2GHz.

The daughter die is where the 10MB of embedded DRAM resides, but there is
also a great deal of logic on the daughter die alongside the memory. The
daughter die features 192 floating point units that are responsible for a
lot of the work in sampling for AA among other things.

Remember the 256GB/s bandwidth figure from earlier? It turns out that that's
not how much bandwidth is between the parent and daughter die, but rather
the bandwidth available to this array of 192 floating point units on the
daughter die itself. Clever use of words, no?
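For reference, the two headline figures circulating are the same bandwidth expressed in different units: 256GB/s is roughly the "2 terabit" number (taking GB as 10^9 bytes and Tbit as 10^12 bits).

```python
# The 256GB/s and "2 terabit per second" figures are the same
# bandwidth in different units (GB = 10**9 bytes, Tbit = 10**12 bits).

bytes_per_sec = 256e9
bits_per_sec = bytes_per_sec * 8
print(bits_per_sec / 1e12, "Tbit/s")  # -> 2.048 Tbit/s
```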

Because of the extremely large amount of bandwidth available both between
the parent and daughter die as well as between the embedded DRAM and its
FPUs, multi-sample AA is essentially free at 720p and 1080p in the Xbox 360.
If you're wondering why Microsoft is insisting that all games will have AA
enabled, this is why.

ATI did clarify that although Microsoft isn't targeting 1080p (1920 x 1080)
as a resolution for games, their GPU would be able to handle the resolution
with 4X AA enabled at no performance penalty.

ATI has also implemented a number of intelligent algorithms on the daughter
die to handle situations where you need more memory than the 10MB of DRAM
on-die. The daughter die has the ability to split the frame into two
sections if the frame itself can't fit into the embedded memory. A z-pass is
done to determine the location of all of the pixels of the screen and the
daughter die then fetches only what is going to be a part of the scene that
is being drawn at that particular time.
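A rough buffer-size estimate shows why that splitting step is needed. The byte counts below are assumptions (4 bytes of color and 4 bytes of Z per sample; the article doesn't give exact formats), but they illustrate how a multisampled 720p frame can overflow 10MB of eDRAM:

```python
import math

# Why tiling exists: the multisampled frame can exceed the 10MB of
# embedded DRAM. Bytes-per-sample is an assumption (4B color + 4B Z);
# the article does not specify the actual formats.

EDRAM_BYTES = 10 * 2**20  # 10MB of embedded DRAM

def tiles_needed(width, height, samples, bytes_per_sample=8):
    frame_bytes = width * height * samples * bytes_per_sample
    return math.ceil(frame_bytes / EDRAM_BYTES)

print(tiles_needed(1280, 720, 1))  # -> 1: a plain 720p frame fits
print(tiles_needed(1280, 720, 2))  # -> 2: split in two, as described
print(tiles_needed(1280, 720, 4))  # -> 3: 4x multisampling needs more
```

Under these assumptions the z-pass described above is what lets the daughter die fetch only the geometry relevant to the tile currently being rendered.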

On the physical side, unlike ATI's Flipper GPU in the Gamecube, the 360 GPU
does not use 1T-SRAM for its on-die memory. The memory on-die is actually
DRAM. By using regular DRAM on-die, latencies are higher than SRAM or
1T-SRAM but costs should be kept to a minimum thanks to a smaller die than
either of the aforementioned technologies.

Remember that in addition to functioning as a GPU, ATI's chip must also
function as a memory controller for the 3-core PPC CPU in the Xbox 360. The
memory controller services both the GPU and the CPU's needs, and as we
mentioned before the controller is 256-bits wide and interfaces to 512MB of
unified GDDR3 memory running at 700MHz. The memory controller resides on the
parent die.

Scratching the Surface of NVIDIA's RSX
As we mentioned before, NVIDIA's RSX is the more PC-like of the two GPU
solutions. Unlike ATI's offering, the RSX is based on a NVIDIA GPU, the
upcoming G70 (the successor to the GeForce 6).

The RSX is a 90nm GPU weighing in at over 300 million transistors and fabbed
by Sony at two plants, their Nagasaki plant and their joint fab with Toshiba.

The RSX follows a more conventional dataflow, with discrete pixel and vertex
shader units. Sony has yet to announce the exact number of pixel and vertex
shader units, potentially because that number may change as time goes by
depending on yields. This time around Sony seems to be very careful not to
let too many specs out that are subject to change to avoid any sort of
backlash as they did back with the PS2. Given the transistor count and 90nm
process, you can definitely expect the RSX to feature more than the 16 pipes
of the present day GeForce 6800 Ultra. As for how many, we'll have to wait
for Sony on that.

NVIDIA confirmed that the RSX features full FP32 support, like the
current generation GeForce 6 as well as ATI's Xbox 360 GPU. NVIDIA did
announce that the RSX would be able to execute 136 shader operations per
cycle, a number that is greater than ATI's announced 96 shader ops per
cycle. Given that we don't know anything more about where NVIDIA derived
this value from, we can't be certain if we are able to make a direct
comparison to ATI's 96 shader ops per cycle.

Given that the RSX is based off of NVIDIA's G70 architecture, you can expect
to have a similar feature set later this year on the PC. In fact, NVIDIA
stated that by the time PS3 ships there will be a more powerful GPU
available on the desktop. This is in stark contrast to ATI's stance that a
number of the features of the Xbox 360 GPU won't make it to the desktop for
a matter of years (potentially unified shader architecture), while others
will never be seen on the desktop (embedded DRAM?).

There will definitely be some differences between the RSX GPU and future PC
GPUs, for a couple of reasons:

1) NVIDIA stated that they had never had as powerful a CPU as Cell, and
thus the RSX GPU has to be able to swallow a much larger command stream than
any of the PC GPUs as current generation CPUs are pretty bad at keeping the
GPU fed.

2) The RSX GPU has a 35GB/s link to the CPU, much greater than any desktop
GPU, and thus the turbo cache architecture needs to be reworked quite a bit
for the console GPU to take better advantage of the plethora of bandwidth.
Functional unit latencies must be adjusted, buffer sizes have to be changed,
and so on.

We did ask NVIDIA about technology like unified shader model or embedded
DRAM. Their stance continues to be that at every GPU generation they design
and test features like unified shader model, embedded DRAM, RDRAM, tiling
rendering architectures, etc... and evaluate their usefulness. They have
apparently done a unified shader model design and the performance just
didn't make sense for their architecture.

NVIDIA isn't saying that a unified shader architecture doesn't make sense,
but at this point in time, for NVIDIA GPUs, it isn't the best call. From
NVIDIA's standpoint, a unified shader architecture offers higher peak
performance (e.g. all pixel instructions, or all vertex instructions) but
getting good performance in more balanced scenarios is more difficult. The
other issue is that the instruction mix for pixel and vertex shaders are
very different, so the optimal functional units required for each are going
to be different. The final issue is that a unified shader architecture, from
NVIDIA's standpoint, requires a much more complex design, which will in turn
increase die area.

NVIDIA stated that they will eventually do a unified shader GPU, but before
then there are a number of other GPU enhancements that they are looking to
implement. Potentially things like a programmable ROP, programmable
rasterization, programmable texturing, etc...

Final Words
We're going to keep digging on both of these GPUs; as soon as we have more
information we'll be reporting it, but for now it's looking like this is the
best we'll get out of Microsoft and Sony.



last, the FiringSquad article:

Xbox360 graphics processor block diagram


The graphics chip inside Microsoft's Xbox 360 breaks new ground in several
ways. The console sports a unified shader architecture, 10MB of embedded
memory, and what ATI calls "48 perfectly efficient shaders" among its long
list of features. But we still had tons of questions.

In order to glean more about the graphics inside Xbox 360 and its
architecture, we recently had the chance to speak with Bob Feldstein, ATI's
VP of Engineering on Xbox 360:

ATI: We have 48 shaders. And each shader, every cycle can do 4
floating-point operations, so that gives you 192. There's a 192 number in
there too, so I'm just going to digress a little bit. The 192 is actually in
our intelligent memory, every cycle we have 192 processors in our embedded
intelligent memory that do things like z, alpha, stencil. So there are two
different numbers and they're kind of close to each other, which leads to
some confusion.

So we have a traditional shader, but it's not traditional at all though
because it's a unified shader. So you have the shader instruction set.
[pauses] In the past you had a vertex shader and a pixel shader, and the
instruction set was different and you couldn't, you know, one couldn't
operate on the other's data. Now we have one set of resources, these 48
shaders, and they naturally dynamically balance between whatever the problem
at hand is.

So if it's dominated by vertices you get more resources for vertices, but if
it's dominated by pixels you get more resources towards pixels, or any other
kind of problem. It's a general purpose, well not a general purpose
processor, but it is a processor with a good general instruction set and it
can operate on a variety of different kinds of data. So unified shader means
we have one set of shader hardware and it can operate on any problem.

So you have 64 threads, and it's all controlled by hardware so it's not like
the programmer knows one way or another about threading at all, and the
threads here are things like vertex buffers or pixel programs and the
hardware just keeps the same [inaudible] in a thread buffer and we can just
switch back and forth between the different threads. That way if we're
waiting for data from a vertex program or vertex array we can go ahead and
work on a pixel program or we can work on a second vertex or whatever, a
different instruction.
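That switching behavior can be modeled as a simple queue of hardware threads: a thread waiting on data is parked, and another ready thread issues in its place so the shader units never sit idle. The thread contents below are illustrative; the real hardware tracks 64 threads, per the quote above.

```python
from collections import deque

# Toy model of the hardware thread buffer: each "thread" is a list
# of steps, where 'stall' means it is waiting on data. The scheduler
# parks a stalled thread and issues from another, hiding the latency.
# Thread contents are illustrative, not real shader programs.

def execute(threads):
    order = []                        # which thread issued each cycle
    queue = deque(enumerate(threads))
    while queue:
        tid, steps = queue.popleft()
        if not steps:
            continue                  # thread finished
        if steps[0] == "stall":
            steps.pop(0)              # data arrives while parked
            queue.append((tid, steps))  # switch to another thread
        else:
            steps.pop(0)
            order.append(tid)
            queue.append((tid, steps))
    return order

# A vertex thread that stalls lets a pixel thread run in the gap.
print(execute([["work", "stall", "work"], ["work", "work"]]))  # -> [0, 1, 1, 0]
```

The key point from the quote is that this arbitration is entirely in hardware; the programmer never sees the thread switching.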

Embedded memory, daughter/parent die

ATI: So we have 48 shaders, each one of them does 4 floating-point ops per
cycle, so 192 floating-point ops per clock.

On the embedded memory, I call it intelligent embedded memory or intelligent
DRAM, because we actually have a lot of the [pauses] well the graphics
pipeline continues into the memory. So we actually have a parent die and a
daughter die on one substrate inside of the package. So if you look at the
Xbox 360, and you open it up you'll see that there's one VPU package, but
inside there are two different dies and the pipeline continues across those
two dies.

The part of the pipeline that's in the daughter die, that lives with the
memory is the z, the alpha, the stencil, the processors to do the resolve
from multisample AA, the one that gives you AA for free, well 4x or 2x
multisampling, and really what that does is it breaks one of the traditional
bandwidth problems inside traditional graphics architectures. It keeps all
that data so you don't have to do a read or a write. Well, you don't have to
read it into the general purpose, across the bus, process it and then write
it back out to the memory. It all happens within the memory, and this just
gives you phenomenally more bandwidth. And that's how we get the
anti-aliasing for free. For the anti-aliasing result, you have 4 samples
for every pixel, and the resolve is done right within that memory, so it
doesn't have to be read back into the parent die.

On the intelligent memory there's something that we call internally the,
well it's a memory export path. So anywhere down the pipeline that you put
data into the pipeline you can export that data out back to EDRAM or Level
Two cache and then use that data again. So, and we really haven't been able
to do that before, so it allows you to do things like higher order surfaces
for real this time, so it's just another feature that gives us a lot of
flexibility in the system.

FiringSquad: So the daughter die is where the embedded memory is?

ATI: Yes, the memory is there. There's 10MB of embedded memory surrounded by
"intelligence". That's the 192 component processors in there that do the
work on the embedded memory data. There's also a very high speed connection
between the parent and daughter die, in case you do have to get data
back and forth, and that connection is, well there's a 2GHz wide bus
connection between them.

The numbers game

FiringSquad: Where does the 2-terabit number that's been floating around
come from?

ATI: The 2-terabit (256GB/sec) number comes from within the EDRAM; that's
the kind of bandwidth inside that RAM, inside the chip, the daughter die.
But between the parent and daughter die there's a 236Gbit connection on a
bus that's running in excess of 2GHz. The bus is obviously more than one
bit wide.

Also, we're the memory controller for the system, so we have bandwidth
between the CPU and the graphics engine, and we have bandwidth to the
memory, the GDDR3 memory, which is the system memory. All these numbers are
sometimes close to each other, so things start to blur together.

FiringSquad: And then 22.4GB/sec to the system memory?

ATI: 25GB/sec to the system memory, and 22GB/sec between the CPU and the
GPU. In this case we're more than the GPU, we're the system memory
controller too, so the bus between the CPU and GPU is 22GB/sec. It's really
11 in each direction: 11 input and 11 output.

FiringSquad: And this is a 128-bit memory interface or 256-bit?

ATI: 128-bit, 700MHz interface.
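These figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes GDDR3's double data rate; everything else is just unit conversion from the numbers quoted in the interview:

```python
# Back-of-the-envelope check of the bandwidth figures in the interview.

# System memory: 128-bit bus at 700MHz, double data rate (GDDR3).
bus_bits = 128
clock_hz = 700e6
ddr_factor = 2  # GDDR3 transfers data on both clock edges

system_bw = bus_bits / 8 * clock_hz * ddr_factor  # bytes/sec
print(system_bw / 1e9)  # 22.4, matching the 22.4GB/sec figure in the question

# Internal eDRAM bandwidth: the "2-terabit" number converted to bytes/sec.
edram_bw = 2e12 / 8
print(edram_bw / 1e9)  # 250.0; the quoted 256GB/sec is 2.048 terabits, so
                       # "2-terabit" is a rounding of the same figure
```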

FiringSquad: What types of operations do the EDRAM's 192 processors perform?

ATI: Well, they do z-compares, they do alpha blends, they do blends of
samples to make a pixel, that kind of thing. They do stencil operations
also. And this is the first time memory has had access to something like
this, right in the memory, so it never leaves the memory die. The memory and
the logic are all built into one die. And it's a power savings too, by the
way.
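A rough sketch of those per-sample operations follows. The dict layout, values, and the less-than depth test are invented for illustration; the real logic works on packed hardware formats:

```python
# Sketch of the per-sample work attributed to the eDRAM logic: stencil test,
# z-compare, and alpha blend, done where the samples live instead of across
# an external bus.

def process_sample(sample, frag):
    """Apply stencil test, z-test, and alpha blend to one multisample."""
    if not frag["stencil_pass"]:
        return sample                      # stencil reject: sample unchanged
    if frag["z"] >= sample["z"]:
        return sample                      # z reject (less-than depth test)
    a = frag["alpha"]
    blended = tuple(a * f + (1 - a) * s    # classic source-over alpha blend
                    for f, s in zip(frag["rgb"], sample["rgb"]))
    return {"rgb": blended, "z": frag["z"]}

old = {"rgb": (0.2, 0.2, 0.2), "z": 0.9}
new = {"rgb": (1.0, 0.0, 0.0), "alpha": 0.5, "z": 0.5, "stencil_pass": True}
print(process_sample(old, new))  # rgb roughly (0.6, 0.1, 0.1), z 0.5
```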

One of the big uses of power is actually driving I/O pins. In this case, you
never have to go off chip so everything is just internal there. So power is
important you know, of course, it's not quite like the handheld or mobile
space but it's still important and you want to reduce it as much as possible
because we are going pretty fast. You know we have a lot of logic going
pretty fast in memory, a lot of CPU logic going fast, so you want to reduce
power wherever you can.

Anti-aliasing and unified memory architecture

FiringSquad: You said earlier that EDRAM gives you AA for free. Is that 2xAA
or 4x?

ATI: Both, and I would encourage all developers to use 4x FSAA. Well, I
should say there's a slight penalty, but it's not what you'd normally
associate with 4x multisample AA. We're at 95-99% efficiency, so it doesn't
degrade much, is what I should say, so I would encourage developers to use
it. You'd be crazy not to.

You know, even though we're in a high-def resolution world with Xbox 360, at
standard definition with today's TVs jaggies look bad, really bad. In hi-def
everything is so sharp that when there are aliasing issues you're really
going to see the jaggies, so I think anti-aliasing is just key, and we have
a great anti-aliasing story.

And of course, while we support hi-def we will still output to standard def.
Although Microsoft has said that hi-def is the target platform, don't think
"oh well, it won't work on my standard definition TV."
FiringSquad: How many textures per pixel are you performing per pass?

ATI: 4 textures. We have 4 texture units. And we don't have separate texture
instructions; the shader just goes and gets textures and applies them.

FiringSquad: Microsoft has announced 1080i support, but are there any plans
to add support for 1080p?

ATI: I think 720p and 1080i are the sweet spot that developers are going for
and that's what we're going to see in the next few years, for the next five
years really, as the main resolutions. It will be a while before 1080p
becomes standard. I think 720p would be the best to go for, and 1080i is
supported as well, of course. So hopefully developers will be doing 720p
with 4xAA; you'd get a terrific image there.

FiringSquad: With the unified memory architecture, do you feel the VPU and
CPU will be fighting over the available bandwidth?

ATI: We've optimized everything enough that I don't think it's going to be a
problem. You know, we've had silicon since November, so I can tell you it's
not going to be a problem. Obviously, when you integrate the unified shaders
and the EDRAM, we were a bit worried that there could be efficiency problems
in the unified shaders, that there could just be general problems with
EDRAM. And of course we're still looking, but so far it has all worked out;
it isn't a problem. So we've had this debate for a while. Not that we didn't
have bugs. [laughs] None of them were ours.


FiringSquad: How does Xbox 360 GPU compare in size to the RSX?

ATI: In terms of size, we're a bit smaller. Of course, I'm not sure if that's
a good way to compare things, and to be honest I can't talk about the number
of transistors for this design. Microsoft owns the IP and that has a lot to
do with their cost model and all that sort of stuff. But we're a very
efficient engine and we feel very good about our design. You know, the bang
for the buck is awesome. The power of the platform [pauses] we're going to
be the most powerful platform out there, we've got a lot of innovation in
there, we're not just a PC chip.

I think the Sony chip is going to be more expensive and awkward. We make
efficient use of our shaders, we have 64 threads that we can have on the
processor at once. We have a thread buffer inside to keep the [inaudible].
The threads consist of 64 vertices or 64 pixels. We have a lot of work that
we can do, a lot of sophistication that the developer never has to see.

FiringSquad: Looking at the parent die, what consumes the majority of the
die area?

ATI: The shaders consume a lot of the space. They don't own 50% of the chip,
but the largest single consumer would be the shaders. 48 shaders, but again,
they're not 50% of the chip. We have texture units, we have shaders, we have
caches, we have all kinds of things on the chip. We have a sequencer that
controls the threads, and we have lots of latency-reducing buffers. So the
chip is complex.

FiringSquad: Which feature, or I guess group of features, really sets the
Xbox 360 VPU apart from anything else?

ATI: Well, there can't be one, I've got to go with two. The unified shader
and the embedded DRAM are both unique and just really important to the
success of the platform. They're both powerful features that just allow you
to do things you couldn't otherwise do. They save bandwidth, they give you a
richness. Did we talk with you about fluid reality yesterday?

FiringSquad: No, I don't think so.

ATI: Well, what we've been trying to achieve in this particular go-around
is, well, realism. You can see it in games like Half-Life 2 where the walls,
the environment, it all looks really good and you get a good sense of
realism. But the next big hurdle is this fluid reality. The idea that with
characters in motion, let's say humans in motion, the joints look natural as
they move along. That involves a lot of vertex processing, and with this
unified shader we can put all these shaders towards vertex processing.
Cloth, as it's flying in the wind, like a flag for example, when it drops
down on top of something, how that looks as it ripples.

Fur and feathers as the wind blows through them, grass, all of that is where
this idea of fluid reality comes from. I'd say we've had static quality up
until now, but this fluid, in-motion quality is the next realism that we're
really bringing.

Again, lots of power to devote to vertex processing, generating lots of
pixels, to drive HD. HD is the platform of choice for this.


ATI: But yeah on the shaders we have 48 of what we call the adaptive shader
array. So in the past you've had to say well here's where I'm going to use
my pixel shaders, and here's how I'm going to use my vertex shaders, but in
this particular case the hardware actually figures it all out for you and
determines what the most efficient way to do that is.

So if you have a bunch of commands that are going to the VPU, and let's say
it requires a very light load of vertex shading, but a very heavy load of
pixel shading, the developers don't have to specify that. That is
intelligently figured out by the VPU. The VPU looks at what the workload is
and says, "okay, here's how I'm going to evenly distribute the workload," so
we call it the adaptive shader array.
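The behavior described here can be sketched as a toy allocation policy. The proportional split and the workload numbers are invented; only the pool of 48 units comes from the interview:

```python
# Toy model of the adaptive shader array: one pool of units is split each
# frame according to pending vertex vs. pixel work, instead of a fixed
# vertex/pixel partition.

TOTAL_UNITS = 48

def allocate(vertex_work, pixel_work):
    """Split the unified pool in proportion to outstanding work."""
    total = vertex_work + pixel_work
    vertex_units = round(TOTAL_UNITS * vertex_work / total)
    return vertex_units, TOTAL_UNITS - vertex_units

# A pixel-heavy frame: most units shift to pixel shading automatically.
print(allocate(vertex_work=100, pixel_work=900))   # (5, 43)
# A vertex-heavy frame (cloth, fur, terrain): the split reverses.
print(allocate(vertex_work=800, pixel_work=200))   # (38, 10)
```

The developer never calls anything like `allocate`; the point of the design is that the hardware does this rebalancing on its own.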

FiringSquad: Onto the video processor, is it an on-die TV encoder or
something like a Rage Theater-type chip? Would that be a third chip?

ATI: It is a third tiny chip, and actually Microsoft did that. Microsoft, if
you recall, acquired WebTV about five or six years ago. So the people in
Mountain View, CA that were a part of that group, and of course it's not
just those people anymore, but they did that chip, and they've done a good
job.

You know it's a good choice because it's a lot cheaper silicon, they're
using 90nm.

FiringSquad: Do you know if it supports dual HD displays?

ATI: No it doesn't. I know the NVIDIA chip does, but that's only because PC
products do. It doesn't seem to have a real use inside the living room, but
maybe you differ with me on that.

FiringSquad: Well, on the Sony console, I think they're looking at
applications that go beyond just a console in the living room, don't you
think?
ATI: Yeah I really think it's just an accident because, well you know, last
summer they had to change their plans. They found out that Cell didn't work
as well as they wanted for graphics. Remember, originally you had two or
three Cell processors doing everything and then in August last year they had
to take an NVIDIA PC chip. And as you know, all PC chips do this, and so it
[dual HD display outputs] just came for free.

FiringSquad: What features does Xbox 360 have that really set it apart from
what we know so far about RSX and PS3?

ATI: Well, it has a lot. I'll go through the main features. It has a great
anti-aliasing story, and it has a powerful shading story, not just in
performance but also in its rich instruction set, giving you great image
quality and flexibility.

It has lots of headroom. This is something that developers will find is easy
to program for and rich enough to last for years. It has the performance and
feature set to last for years.

We'd like to thank Bob Feldstein for taking the time out to answer our
questions about Xbox 360's graphics. As you can see, ATI's gone well beyond
what we see today in RADEON X850 XT. While Bob didn't want to project how
much more powerful the Xbox 360 VPU is over X850, clearly the chip sports
many features that we won't find in anything on the PC.

Guest article:

another block diagram of Xbox360's graphics processor:
" A block diagram of the Xbox 360 GPU. Source: ATI. "

Details of ATI's Xbox 360 GPU unveiled
A peek inside the monster new console GPU
by Scott Wasson - May 19, 2005

WITH MICROSOFT'S OFFICIAL announcement of the next-generation Xbox 360
console this week, ATI has decided to disclose some of the architectural
details of the graphics processor that it created for the system. I had a
brief but enlightening conversation with Bob Feldstein, Vice President of
Engineering at ATI, who helped oversee the Xbox 360 GPU project. He spelled
out some of the GPU's details for me, and they're definitely intriguing.
Feldstein said that ATI and Microsoft developed this chip together in the
span of two years, and that they worked "from the ground up" to do a console
product. He said that Microsoft was a very good partner with some good chip
engineers who understood the problems of doing a non-PC system design. Also,
because the part was custom created for a game console, it could be designed
specifically for delivering a good gaming experience as part of the Xbox 360
system.

Unified shaders
Feldstein cited several major areas of innovation where the Xbox 360 GPU
breaks new ground. The first of those is the chip's unified shader array,
which does away with separate vertex and pixel shaders in favor of 48
parallel shaders capable of operating on data for both pixels and vertices.
The GPU can dynamically allocate shader resources as necessary in order to
best address a computational constraint, whether that constraint is vertex-
or pixel-related.

This sort of graphics architecture has been rumored as a future possibility
for some time, but ATI worried that using unified shaders might cause some
efficiency loss. To keep all of the shader units utilized as fully as
possible, the design team created a complex system of hardware threading
inside the chip itself. In this case, each thread is a program associated
with the shader arrays. The Xbox 360 GPU can manage and maintain state
information on 64 separate threads in hardware. There's a thread buffer
inside the chip, and the GPU can switch between threads instantaneously in
order to keep the shader arrays busy at all times.
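This latency-hiding idea can be illustrated with a toy scheduler. The thread counts, work amounts, and the 4-cycle fetch latency below are all invented; only the switch-on-stall behavior mirrors the description above:

```python
# Toy model of switching between hardware threads to hide memory latency:
# when a thread stalls on a fetch, a ready thread runs in its place.

def run(threads, latency=4):
    """Count busy vs. idle cycles for a switch-on-stall scheduler."""
    busy = idle = 0
    while any(t["work"] > 0 for t in threads):
        for t in threads:                      # outstanding fetches progress
            if t["stall"] > 0:
                t["stall"] -= 1
        ready = [t for t in threads if t["work"] > 0 and t["stall"] == 0]
        if ready:
            t = ready[0]
            t["work"] -= 1                     # one cycle of useful ALU work
            busy += 1
            if t["work"] > 0:
                t["stall"] = latency           # next op waits on a fetch
        else:
            idle += 1                          # every thread is waiting

    return busy, idle

print(run([{"work": 2, "stall": 0}]))                    # (2, 3): one thread idles
print(run([{"work": 2, "stall": 0} for _ in range(4)]))  # (8, 0): switching hides it
```

With one thread, every fetch leaves the units idle; with four, another thread always has work ready, which is the effect the 64-thread buffer is after.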

This internal complexity allows for efficient use of the GPU's computational
resources, but it's also completely hidden from software developers, who
need only to write their shader programs without worrying about the details
of the chip's internal thread scheduling.

On chip, the shaders are organized in three SIMD engines with 16 processors
per unit, for a total of 48 shaders. Each of these shaders comprises four
ALUs that can each execute a single operation per cycle, so that each shader
unit can execute four floating-point ops per cycle.
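The counts in that organization multiply out as follows; this is a trivial check with no assumptions beyond the quoted figures:

```python
# Multiplying out the shader organization described in the article.
simd_engines = 3
processors_per_engine = 16
alus_per_shader = 4

shaders = simd_engines * processors_per_engine
ops_per_cycle = shaders * alus_per_shader
print(shaders)        # 48 unified shaders
print(ops_per_cycle)  # 192 floating-point ops per cycle across the array
```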

These shaders execute a new unified instruction set that incorporates
instructions for both vertex and pixel operations. In fact, Feldstein called
it a "very general purpose instruction set" with some of the same roots as
the DirectX instruction set. Necessarily, the shader language that
developers will use to program these shader units will be distinct from the
shader models currently used in DirectX 9, including Shader Model 3.0.
Feldstein described it as "beyond 3.0." This new shader language allows for
programs to contain an "infinite" number of instructions with features such
as branching, looping, indirect branching, and predicated indirect. He said
developers are already using shader programs with hundreds of instructions
in them.

I asked Feldstein whether the shaders themselves are, at the hardware level,
actually more general than those in current graphics chips, because I
expected that they would still contain a similar amount of custom logic to
speed up common graphics operations. To my surprise, he said that the
shaders are more general in hardware. At the outset of the project, he said,
ATI hired a number of compiler experts in order to make sure everything
would work right, and he noted that Microsoft is no slouch when it comes to
compilers, either. Feldstein said Microsoft "made a great compiler for it."

At this point, Feldstein paused quickly to note that this GPU was not a VLIW
machine, apparently reminded of all of the compiler talk surrounding a
certain past competitor. (The GeForce FX was, infamously, a VLIW machine
with some less-than-desirable performance characteristics, including an
extreme sensitivity to compiler instruction tuning.) He was quite confident
that the Xbox 360 GPU will not suffer from similar problems, and he claimed
the relative abundance of vertex processing power in this GPU should allow
objects like fur, feathers, hair, and cloth to look much better than past
technology had allowed. Feldstein also said that character skin should look
great, and he confirmed to me that real-time subsurface scattering effects
should be possible on the Xbox 360.

The Xbox 360 GPU's unified shader model pays dividends in other places, as
well. In traditional pixel shaders, he noted, any shader output is generally
treated as a pixel, and it's fed through the rest of the graphics pipeline
after being operated on by the shader. By contrast, the Xbox 360 GPU can
take data output by the shaders, unaltered by the rest of the graphics
pipeline, and reprocess it. This more efficient flow of data, combined with
a unified instruction set for vertex and pixel manipulation, allows easier
implementation of some important graphics algorithms in real time, including
higher-order surfaces and global illumination. I would expect to see fluid
animation of complex terrain and extensive use of displacement mapping in
Xbox 360 games. Feldstein also pointed out that this GPU should have
sufficient muscle to enable the real-time use of other complex shader
algorithms as they're invented.

System architecture

Now that we've delved into the shaders a bit, we should take a step back and
look at the bigger picture. The Xbox 360 GPU not only packs a lot of shader
power, but it's also the central hub in the Xbox 360 system, acting as the
main memory controller as well as the GPU. The Xbox 360 has 512MB of GDDR3
memory onboard running at 700MHz, with a 128-bit interface to ATI's memory
controller. The ATI GPU, in turn, has a very low latency path to the Xbox
360's three IBM CPU cores. This link has about 25GB/s of bandwidth.
Feldstein said the graphics portion of the chip has something of a crossbar
arrangement for getting to memory, but he didn't know whether the CPU uses a
similar scheme.

Embedded DRAM for "free" antialiasing
The GPU won't be using system memory itself quite as much as one might
expect, because it packs 10MB of embedded DRAM right on the package. In
fact, the Xbox 360 GPU is really a two-die design, with two chips in a
single package on a single substrate. The parent die contains the GPU and
memory controller, while the daughter die consists of the 10MB of eDRAM and
some additional logic. There's a high-speed 2GHz link between the parent and
daughter dies, and Feldstein noted that future revisions of the GPU might
incorporate both dies on a single piece of silicon for cost savings.

The really fascinating thing here is the design of that daughter die.
Feldstein called it a continuation of the traditional graphics pipeline into
memory. Basically, there's a 10MB pool of embedded DRAM, designed by NEC, in
the center of the die. Around the outside is a ring of logic designed by
ATI. This logic is made up of 192 component processors capable of doing the
basic math necessary for multisampled antialiasing. If I have it right, the
component processors should be able to process 32 pixels at once by
operating on six components per pixel: red, green, blue, alpha, stencil, and
depth. This logic can do the resolve pass for multisample antialiasing right
there on the eDRAM die, giving the Xbox 360 the ability to do 4X
antialiasing on a high-definition (1280x720) image essentially for
"free", i.e., with no appreciable performance penalty. The eDRAM holds the
contents of all of the back buffers, does the resolve, and hands off the
resulting image into main system memory for scan-out to the display.
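A minimal sketch of that arithmetic and of the resolve step follows. The sample values are invented; the 4x multisampling and the six components per pixel come from the article:

```python
# 192 component processors at 6 components per pixel => 32 pixels at once.
components_per_pixel = 6            # red, green, blue, alpha, stencil, depth
component_processors = 192
print(component_processors // components_per_pixel)  # 32

def resolve(samples):
    """Average a pixel's multisamples into its final color (the resolve pass)."""
    return tuple(sum(c) / len(samples) for c in zip(*samples))

# An edge pixel half-covered by a white triangle over a black background:
samples = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(resolve(samples))  # (0.5, 0.5, 0.5), the smoothed edge color
```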

Feldstein noted that this design is efficient from a power-savings
standpoint, as well, because there's much less memory I/O required when
antialiasing can be handled on the chip. He said ATI was very
power-conscious in the design of the chip, so that the Xbox 360 could be a
decent citizen in the living room.

My conversation with Bob Feldstein about the Xbox 360 GPU was quick but,
obviously, very compact, with lots of information. I hope that I've gotten
everything right, but I expect we will learn more and sharpen up some of
these details in the future. Nonetheless, ATI was very forthcoming about the
technology inside its Xbox 360 GPU, and I have to say that it all sounds
very promising.

For those of you wondering how the Xbox 360 GPU relates to ATI's upcoming PC
graphics chips, I wish I could tell you, but I can't. Feldstein said the
Xbox 360 GPU "doesn't relate" to a PC product. Some elements of the
design seem impractical for PC use, like the 10MB of embedded DRAM for
antialiasing; PCs don't use one single, standard resolution like HDTVs do.
Still, it's hard to imagine ATI having some of this technology in its
portfolio and not using it elsewhere at some point.


one of the most interesting facts about the Xbox360 GPU is that, within the
48 shader pipelines, there are 4 ALUs each, for a total of 192 ALUs
(Arithmetic Logic Units).

previously, it was thought that the 48 pipes themselves were just ALUs.

other threads with discussions about the GPU and the new articles:


You don't think that those web sites would like to get hits from those
interested to read their articles to support their work, do you? And that
for those here that _don't_ want to read those articles, your posting is
just noise?

A link to them, a quick summary or some original comments, that'd be a
service. What you're doing here is copyright infringement and stealing,
nothing else.





Jan said:
You don't think that those web sites would like to get hits from those
interested to read their articles to support their work, do you? And that
for those here that _don't_ want to read those articles, your posting is
just noise?

A link to them, a quick summary or some original comments, that'd be a
service. What you're doing here is copyright infringement and stealing,
nothing else.

X-Complaints-To: (e-mail address removed)
X-DMCA-Complaints-To: (e-mail address removed)
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your
complaint properly

A short message to (e-mail address removed) might be in order.

Can a third party claim copyright infringement on behalf of a
content producer?

NOTE: Follow-up set to comp.arch

