AMD to produce ATI GPUs in Dresden

David Kanter

Yousuf said:
Well, you are the one saying that GPUs aren't that complex, circuitry-wise,
compared to CPUs. Are you not?

Well if you read what I'm saying below you could find out, instead of
guessing.
In Keith's case, he's definitely not talking about ARM. He works for
IBM, and which processors do you think IBM makes?

I don't have a bloody clue. IBM designs a lot of stuff, he could be
working on the z9 MPU, which is hardly an aggressive implementation; he
could be doing POWER6/6+, he could be working on contract design for
CELL or Xbox, he could be doing the X3/X4 northbridge. Maybe he worked
on those older embedded cores that were sold to AMCC...
I would agree that porting a GPU to SOI should be easier than porting a CPU.

The guys who have adopted SOI (IBM, AMD, Freescale, etc.) had their
major initial problems in getting the process for laying down circuits
on SOI right. Once that was taken care of, they now just use SOI like bulk.

Right, and they just ignore the FB effect or any of the other
differences. Or the crappy yields that Nvidia complained about...

DK
 
David Kanter

Keith said:

I really wish you wouldn't snip attributions. It's a really bad
habit.
OK.
So you won't say whether or not it turned out to be a horrible idea?

It works fine, within the parameters I've mentioned (mature SOI
process, working design). Bringing up SOI is a PITA; going back is
trivial.
What constitutes a 'good design'?

No respins to fix bugs.
So what happens if your target frequency is, say, 600MHz, and it turns
out you can only hit 400MHz, but you have 3-4 months before tape out?
Do you just sit on your ass and let the circuits moulder? Or do you,
perhaps, start tweaking things to get improvements, track down critical
paths and fix them?

If you're off that far, you're dead. It's time to shitcan the
entire design, fire all the architects, and sell flowers on the
street corner because you aren't cut out for this business. For
lesser "oopses" (happens all the time) the designer isolates a
critical path at a time then restructures the logic, picks faster
(higher power) "books", diddles with wiring geometry, reduces
loading, moves gates closer together, cheats, steals...

That sounds like an awful lot more than 'pick a different book'; it sounds
like 'pick the best book, and if you need more, start playing around'.
Or perhaps I'm walking away with the wrong impression.
Yes, but unless it's an unmitigated disaster (fire the
architects...), you don't go back to the circuits guys; too late.


That's irrelevant to the issue at hand. I'm not letting the goal
posts move, this time.

I don't think that's moving the goal posts at all. AMD has a mature
process for SOI, with good yields for *THEIR* chips, which are all
relatively small and contain a lot of SRAM. GPUs are large and contain
little SRAM. Yield and design are both reasons why someone might opt
not to use SOI. Certainly, Nvidia was pissed about their yields from
IBM, which led to them going straight back to TSMC.

They are both important aspects of the equation.
No, my experience is with CPUs.

Your assumption (which I don't really believe, but no shock there):
CPU >> GPU
My statement:
CPU PD can be turned (bulk -> SOI) in a year (fact)

Therefore:
Turning a GPU from bulk to SOI is a walk in the park.

We're not talking ARM. We're talking modern high-end widgets.

OK, thank you for clarifying. Was it really that hard?
Three years is nuts.

Read what I write, not what you want to read. I said in my estimate a
GPU takes 3 years from start to finish, a CPU takes 5 years. I then
said "its likely they don't retarget once they start PD", which is
probably after 1 year. You have made a somewhat convincing argument
that retargeting for SOI is not a huge issue if you have the experience
lying around. You have yet to make an argument that designing a GPU
takes less than 3 years.
I told you, my experience is with CPUs.

There's a wide range of CPUs that were designed at IBM. There's the
older stuff that was sold to AMCC, there's the POWER4/5/6, there's the
CELL MPU, the Xbox MPU, the z9, etc. etc. and there are probably others
I didn't mention because I wasn't aware of them.

DK
 
Keith

Well if you read what I'm saying below you could find out, instead of
guessing.

You are a piece of work.
I don't have a bloody clue. IBM designs a lot of stuff, he could be
working on the z9 MPU, which is hardly an aggressive implementation,

You haven't a clue.
he
could be doing POWER6/6+, he could be working on contract design for
CELL or Xbox, he could be doing the X3/X4 northbridge.

Dunce, I said CPU.
Maybe he worked on those older embedded cores that were sold to AMCC...

Maybe you're just intentionally being a jackass. Since you think my
CV matters so much, the last seven years have been PPC750->
Nintendo->a few more 750 variations->PPC970->PPC970MP->PPC970FX->
[codename deleted]. Before that, 6X86, 6X86MX, and once upon a
time 3090 and ES900 crypto. That covers the last 20 years or so,
need more?

I've worked on both bulk and SOI projects (as well as bipolar),
both IBM and vendor microprocessors, logic design, analog,
verification, and who knows what else. ...and your experience is
exactly? Fill in the blank: -><-
Right, and they just ignore the FB effect or any of the other differences.

What an idiot. Of course you don't ignore FBE, you *USE* it.
....and the differences are invisible to the logic designer. Who
cares what the technology underneath is?
Or the crappy yields that Nvidia complained about...

Everyone has issues with yield now and then. I always thought the
first go at a new line, new technicians, 90nm, and 300mm (and, and,
and,...) was a tad risky, but I'm not about to discuss the details
of dirty laundry. Everyone has PHBs too.

You, on the other hand, have no problem criticizing what you've
never done and have no clue about, unless of course it's your pal
Intel. They can do no wrong.
 
Keith


I really wish you wouldn't snip attributions. It's a really bad
habit.

OK.

Indeed, you can't be trained.

That sounds like an awful lot more than 'pick a different book'; it sounds
like 'pick the best book, and if you need more, start playing around'.

No, synthesis usually picks what it thinks is the "best" book. If
it chooses poorly, the designer may pick a higher power book (or
one of the other knobs he has control over) to make timing. Timing
analysis shows the worst offenders and that's where the job starts.
If timing is close maybe the process guys can help out with a few
of their knobs (but that's going to cost yield).
Or perhaps I'm walking away with the wrong impression.

Much of it is "playing around" and it can get into a serious game
of "Whack-A-Mole", so experinece helps.
I don't think that's moving the goal posts at all.

Yes it is. It has nothing to do with how long it takes to convert
a design from bulk to SOI; it's completely irrelevant to the topic at
hand.
AMD has a mature
process for SOI, with good yields for *THEIR* chips, which are all
relatively small and contain a lot of SRAM. GPUs are large and contain
little SRAM. Yield and design are both reasons why someone might opt
not to use SOI. Certainly, Nvidia was pissed about their yields from
IBM, which led to them going straight back to TSMC.

They are both important aspects of the equation.

You're still searching for a clue.
OK, thank you for clarifying. Was it really that hard?

Sheesh, what in hell do you think we were discussing? Sheesh.
Read what I write, not what you want to read. I said in my estimate a
GPU takes 3 years from start to finish, a CPU takes 5 years.

I say you're nuts.
I then
said "its likely they don't retarget once they start PD", which is
probably after 1 year. You have made a somewhat convincing argument
that retargeting for SOI is not a huge issue if you have the experience
lying around. You have yet to make an argument that designing a GPU
takes less than 3 years.

I've never talked about the time it takes to design a GPU. I
wouldn't know; never worked on a GPU. The issue at hand was how
long it took to target a known design at SOI, nothing more, nothing
less. You try to move the goal posts, but I'm not letting you,
this time.
There's a wide range of CPUs that were designed at IBM. There's the
older stuff that was sold to AMCC, there's the POWER4/5/6, there's the
CELL MPU, the Xbox MPU, the z9, etc. etc. and there are probably others
I didn't mention because I wasn't aware of them.

There is a lot you're not aware of, apparently. Anyway, since
you're so interested in my CV (see other post).
 
YKhan

Keith said:
Just make sure the operator has the pizza recipe book open to the right
page. ;-) Things are pretty well automated these days and one can
do many different processes on the same tools in adjacent lots.
You really don't think it takes a complete line per process?

No, no, just wondering if SOI would require special SOI-only tools for
any stage in a production process. Let's say we've got a mythical 5-step
production line. Let's say steps 1,2,4,5 use the exact same tools but
step #3 has two different machines, one for SOI and one for bulk. In
other words, after it passes step #2, if it's bulk it goes to step #3a,
and if it's SOI it goes to step #3b? Then again after step #3a or #3b
the production line comes back to a common step #4?
I'm pretty sure it's bulk, but not 100%.

Well, even if it is, that would pretty much prove that you can do bulk
just as easily in SOI fabs.

What about the PowerPC chips that are made in Chartered for the
Xbox360? Bulk or SOI?

Yousuf Khan
 
krw

No, no, just wondering if SOI would require special SOI-only tools for
any stage in a production process. Let's say we've got a mythical 5-step
production line. Let's say steps 1,2,4,5 use the exact same tools but
step #3 has two different machines, one for SOI and one for bulk. In
other words, after it passes step #2, if it's bulk it goes to step #3a,
and if it's SOI it goes to step #3b? Then again after step #3a or #3b
the production line comes back to a common step #4?

Could easily be true. In fact this could also happen for
contamination reasons (copper vs. aluminum, for example). I don't
do clean rooms (if you ever saw my office... ;). AIUI there are
some interesting tool scheduling issues in a mixed line, but it's
done regularly with other process differences. For example, 130nm
(perhaps even some 250nm still) in the same line as 90nm and
they're not building a new line for 65nm or 45nm. Then there is
SiGe, analog, SRAM, eDRAM, and they used to run even DRAM in the
same line (there was only one line at the time). Intel and AMD are
the only ones who can dedicate a fab to one product/process (one
trick ponies ;).
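
[A minimal Python sketch of the branch-at-one-step idea under discussion:
bulk and SOI lots share most of the line and diverge at a single stage.
The step and tool names are hypothetical, not any real fab's routing.]

# Hypothetical 5-step line: steps 1, 2, 4, 5 share tools;
# step 3 branches by substrate. All names invented.
SHARED_FRONT = ["step1_litho", "step2_etch"]
SHARED_BACK = ["step4_metal", "step5_test"]
STEP3 = {"bulk": "step3a_bulk_tool", "soi": "step3b_soi_tool"}

def route_for(process):
    """Return the tool routing for a lot of the given process."""
    return SHARED_FRONT + [STEP3[process]] + SHARED_BACK

print(route_for("bulk"))  # shares everything except step 3a
print(route_for("soi"))   # shares everything except step 3b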
Well, even if it is, that would pretty much prove that you can do bulk
just as easily in SOI fabs.

That was proven a *long* time ago, which has been my contention
here all along. IBM's ASIC processes are bulk and its internal
process for processors is SOI and they're all done on the same
line, as is eDRAM, SiGe, and the whole enchilada.
What about the PowerPC chips that are made in Chartered for the
Xbox360? Bulk or SOI?

Thinking about it some more (WRT IBM's ASIC offerings), I'm almost
positive the graphics chip is bulk, but we were pretty sheltered
from the M$ details (since the group I'm in works on Nintendo chips
also). The processor may be too, but I can't remember. I'd have
to ask around and the details may still be (considered)
confidential.
 
David Kanter

I don't have a bloody clue. IBM designs a lot of stuff, he could be
You haven't a clue.

Care to elaborate? Do you mean to tell us that IBM is using cutting
edge MPU design techniques to achieve high performance in the Z900 and
other mainframe processors?

Mainframes are *not* in any way shape or form competitive based on
computational capabilities, as embodied by, say, SPECint/fp. They most
likely don't do well on TPC-C either. However, they are some of the
most reliable systems, with a vastly superior IO architecture to most
of what exists. Oh yea, they also probably have the most (or second
most) valuable set of existing applications.

If you wish to highlight innovative, bleeding edge architectural and
circuit techniques in the z9 systems, feel free. I certainly don't see
SMT being deployed in mainframes...at least for several years. I will
more than happily retract my statement if you can provide proof that
mainframes actually are aggressive architecturally and circuit wise.

There's nothing wrong with not being on the cutting edge, and I'm sure
that quite a few IBM customers prefer to be on systems that are
designed in a relatively conservative fashion.
Dunce, I said CPU.

Which eliminates one of the possibilities.
Maybe he worked on those older embedded cores that were sold to AMCC...

Maybe you're just intentionally being a jackass. Since you think my
CV matters so much, the last seven years have been PPC750->
Nintendo->a few more 750 variations->PPC970->PPC970MP->PPC970FX->
[codename deleted]. Before that, 6X86, 6X86MX, and once upon a
time 3090 and ES900 crypto. That covers the last 20 years or so,
need more?

A simple "I have worked on the PPC970 line" would have worked...
I've worked on both bulk and SOI projects (as well as bipolar),
both IBM and vendor microprocessors, logic design, analog,
verification, and who knows what else. ...and your experience is
exactly? Fill in the blank: -><-

My experience is with performance analysis, benchmarking,
microarchitectural analysis and statistical modelling.
What an idiot. Of course you don't ignore FBE, you *USE* it.
...and the differences are invisible to the logic designer. Who
cares what the technology underneath is?

You seem to have missed the point of my comment.
Everyone has issues with yield now and then. I always thought the
first go at a new line, new technicians, 90nm, and 300mm (and, and,
and,...) was a tad risky, but I'm not about to discuss the details
of dirty laundry. Everyone has PHBs too.

Fair enough, I wouldn't expect you to talk about sensitive things in a
public forum. My point is that you cannot simply *assume* that yields
will be good, based on results for relatively small full custom devices
that use a lot of SRAM, when talking about much larger semi-custom
devices that use very little SRAM.
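
[To put a rough number on the die-size point: under the textbook Poisson
defect model, yield falls exponentially with die area, so good results on
small cache-heavy dies don't automatically transfer to big logic-heavy
ones. The defect density and areas below are illustrative assumptions,
not real AMD or Nvidia figures, and the model ignores SRAM
redundancy/repair, which favors cache-heavy parts even more.]

import math

def poisson_yield(area_cm2, d0_per_cm2):
    """Textbook Poisson defect model: Y = exp(-A * D0). Real fabs use
    fancier models (Murphy, negative binomial), but the exponential
    dependence on area is the point here."""
    return math.exp(-area_cm2 * d0_per_cm2)

D0 = 0.5  # defects per cm^2: an assumed, illustrative number
print(f"~1 cm^2 CPU-sized die: {poisson_yield(1.0, D0):.0%}")  # ~61%
print(f"~3 cm^2 GPU-sized die: {poisson_yield(3.0, D0):.0%}")  # ~22%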
You, on the other hand, have no problem criticizing what you've
never done and have no clue about, unless of course it's your pal
Intel. They can do no wrong.

Right...I don't seem to recall having ever said that. I think Intel
does quite a lot wrong, like going from being a near or actual
monopolist to letting an upstart competitor into the market. Or
pursuing the P4 design 2 process generations too far (ironically, the
POWER6 appears to be following in the P4's footsteps in terms of
pursuit of clockspeed). Or creating a culture that does not allow
outsiders to succeed...

Intel has bumbled around just as much as anyone else in the industry.
Everyone makes mistakes, IBM, HP, Dell, Sun, Intel, AMD, nobody is
immune. In fact, ISTR pointing that out and being fiercely opposed by
some folks who think AMD can do no wrong...

DK
 
David Kanter

Keith said:
[snip]
That sounds like an awful lot more than 'pick a different book'; it sounds
like 'pick the best book, and if you need more, start playing around'.

No, synthesis usually picks what it thinks is the "best" book. If
it chooses poorly, the designer may pick a higher power book (or
one of the other knobs he has control over) to make timing. Timing
analysis shows the worst offenders and that's where the job starts.
If timing is close maybe the process guys can help out with a few
of their knobs (but that's going to cost yield).

Also, would the degree of interaction between logic and PD change for
ASIC versus semicustom versus full custom?
Much of it is "playing around" and it can get into a serious game
of "Whack-A-Mole", so experinece helps.

Yes, that much I've heard : )
Yes it is. It has nothing to do with how long it takes to convert
a design from bulk to SOI; it's completely irrelevant to the topic at
hand.

I don't think that's true. If someone ports a product to SOI (or from
SOI) and then discovers the yields suck for the new process, what's
going to happen?

You cannot produce parts with bad yields, unless you have something
else going on (system sales for instance). The original question was
how long (if at all) it will be before AMD is capable of moving ATI
business from TSMC or UMC to internal fabs. My answer was 'probably 2
years'.

The timeline for going to production does, to the best of my
knowledge, include achieving suitable yields.
You're still searching for a clue.

That's right, I am. Obviously with your vast experience you are much
better positioned than me to know that yields actually don't matter for
semiconductor companies.

Pass the Kool-Aid please...

[snip]
I say you're nuts.

OK, you care to elaborate then?
I've never talked about the time it takes to design a GPU. I
wouldn't know; never worked on a GPU. The issue at hand was how
long it took to target a known design at SOI, nothing more, nothing
less. You try to move the goal posts, but I'm not letting you,
this time.

Actually the original issue was if/when AMD could start producing GPUs
on their internal fabs rather than TSMC. That probably includes
achieving reasonable yields.

[snip]

DK
 
krw

Care to elaborate? Do you mean to tell us that IBM is using cutting
edge MPU design techniques to achieve high performance in the Z900 and
other mainframe processors?

Why does IBM *have* high-end fab capability? Why would IBM use
less than "cutting-edge" design techniques on products with the
*HIGHEST* margins? Think about it. For a wannabe anal-yst, you
sure haven't studied the market much.
Mainframes are *not* in any way shape or form competitive based on
computational capabilities, as embodied by, say, SPECint/fp. They most
likely don't do well on TPC-C either.

Dunno about silly benchmarks, but the z's are used for some of the
largest transaction processing systems in the world. They're not
expensive because they're pretty.
However, they are some of the
most reliable systems, with a vastly superior IO architecture to most
of what exists. Oh yea, they also probably have the most (or second
most) valuable set of existing applications.

Now you're getting the idea.
If you wish to highlight innovative, bleeding edge architectural and
circuit techniques in the z9 systems, feel free. I certainly don't see
SMT being deployed in mainframes...at least for several years. I will
more than happily retract my statement if you can provide proof that
mainframes actually are aggressive architecturally and circuit wise.

Good grief. Do you really think SMT is that advanced? Come on,
it's a simple tweak (a friend was granted the important patents on
it long ago). Yes, every design trick known is put in the z's. It
wasn't until the mid '90s that they could use CMOS at all, and it
took a generation for CMOS to catch up even then. Before '95, give
or take a little, it was all ECL.
There's nothing wrong with not being on the cutting edge, and I'm sure
that quite a few IBM customers prefer to be on systems that are
designed in a relatively conservative fashion.

Clue up!
Dunce, I said CPU.

Which eliminates one of the possibilities.
Maybe he worked on those older embedded cores that were sold to AMCC...

Maybe you're just intentionally being a jackass. Since you think my
CV matters so much, the last seven years have been PPC750->
Nintendo->a few more 750 variations->PPC970->PPC970MP->PPC970FX->
[codename deleted]. Before that, 6X86, 6X86MX, and once upon a
time 3090 and ES900 crypto. That covers the last 20 years or so,
need more?

A simple "I have worked on the PPC970 line" would have worked...

....a lot more than "I have worked on". Hell, I've worked on a
PDP11 too.
My experience is with performance analysis, benchmarking,
microarchitectural analysis and statistical modelling.

Study more.
You seem to have missed the point of my comment.

Nope. It's irrelevant to the discussion. AMD already has a mature
SOI process. That's no longer a variable.
Fair enough, I wouldn't expect you to talk about sensitive things in a
public forum. My point is that you cannot simply *assume* that yields
will be good, based on results for relatively small full custom devices
that use a lot of SRAM, when talking about much larger semi-custom
devices that use very little SRAM.

Good grief! Yes you can. If it works for CPUs it *WILL* work for
GPUs. Even you say GPU << CPU.
Right...I don't seem to recall having ever said that.

Your allegiance is as clear as Tom's.
I think Intel
does quite a lot wrong, like going from being a near or actual
monopolist to letting an upstart competitor into the market. Or
pursuing the P4 design 2 process generations too far (ironically, the
POWER6 appears to be following in the P4's footsteps in terms of
pursuit of clockspeed). Or creating a culture that does not allow
outsiders to succeed...
Intel has bumbled around just as much as anyone else in the industry.
Everyone makes mistakes, IBM, HP, Dell, Sun, Intel, AMD, nobody is
immune. In fact, ISTR pointing that out and being fiercely opposed by
some folks who think AMD can do no wrong...

....except kick Intel's donkey, eh?
 
David Kanter

Care to elaborate? Do you mean to tell us that IBM is using cutting
Why does IBM *have* high-end fab capability? Why would IBM use
less than "cutting-edge" design techniques on products with the
*HIGHEST* margins? Think about it. For a wannabe anal-yst, you
sure haven't studied the market much.

Yea, I thought about it. Here's what I thought:

POWER4 - IBM thickened the gate oxide for higher reliability and lower
performance.

One example of going to more conservative, less sexy, lower performance
technology for higher reliability.

zArch - Customers don't want to be using bleeding edge products, they
want stable, reliable, dependable, etc.

Just as an example, let's think about the "Foxton" technology that the
Ft. Collins design team at Intel was touting. That's bleeding edge,
neat and cool technology. Except it turned out to be so bleeding edge
that Intel ended up in a bad situation.

I don't think IBM would ever try a stunt like that with their
mainframes. They would probably test it out somewhere else and then
migrate it into zArch MPUs.
Dunno about silly benchmarks, but the z's are used for some of the
largest transaction processing systems in the world. They're not
expensive because they're pretty.

I'm quite aware of that. Mainframes are the best tools in the world
for certain tasks, such as:

1. Batch processing
2. Running software written for mainframes that is expensive to
migrate
3. Serious I/O and virtualization
Now you're getting the idea.

Why thank you. Could I have some relish with that condescension?
Good grief. Do you really think SMT is that advanced?

Not compared to DMT or other forms of *MT. But it's a lot more
advanced than superscalar OOO.
Come on,
it's a simple tweak (a friend was granted the important patents on
it long ago).

I'm assuming said friend worked on Pulsar (or was it northstar...), the
SoEMT PPC design?
Yes, every design trick known is put in the z's. It
wasn't until the mid '90s that they could use CMOS at all, and it
took a generation for CMOS to catch up even then. Before '95, give
or take a little, it was all ECL.

I'm not making judgments here, I'm just pointing out the facts. That
mainframes were late to the CMOS party was a *decision* made at IBM.

[snip]
Nope. It's irrelevant to the discussion. AMD already has a mature
SOI process. That's no longer a variable.

I still don't believe it.

Let me ask you this, when Nvidia started using IBM as a fab, wasn't
IBM's SOI process already mature? That was 2003 when they publicly
announced it...so they had already been shipping POWER4/4+ IIRC.

If NV was working on a mature process, then why were yields a problem?
That doesn't sound like a 'solved' problem.
Good grief! Yes you can. If it works for CPUs it *WILL* work for
GPUs. Even you say GPU << CPU.

I said GPUs were easier to design. I don't recall saying that GPUs
have higher yields than CPUs.
Your allegiance is as clear as Tom's.

Way to talk about technology and not sling mud.

DK
 
Del Cecchi

krw said:
Could easily be true. In fact this could also happen for
contamination reasons (copper vs. aluminum, for example). I don't
do clean rooms (if you ever saw my office... ;). AIUI there are
some interesting tool scheduling issues in a mixed line, but it's
done regularly with other process differences. For example, 130nm
(perhaps even some 250nm still) in the same line as 90nm and
they're not building a new line for 65nm or 45nm. Then there is
SiGe, analog, SRAM, eDRAM, and they used to run even DRAM in the
same line (there was only one line at the time). Intel and AMD are
the only ones who can dedicate a fab to one product/process (one
trick ponies ;).

That was proven a *long* time ago, which has been my contention
here all along. IBM's ASIC processes are bulk and its internal
process for processors is SOI and they're all done on the same
line, as is eDRAM, SiGe, and the whole enchilada.

Thinking about it some more (WRT IBM's ASIC offerings), I'm almost
positive the graphics chip is bulk, but we were pretty sheltered
from the M$ details (since the group I'm in works on Nintendo chips
also). The processor may be too, but I can't remember. I'd have
to ask around and the details may still be (considered)
confidential.

Processor is SOI. That could hardly be confidential since an Xbox could
be bought and taken apart.

As I understand it, the notion of a "line" as opposed to a collection of
tools and sectors is really not appropriate any more.
 
