CPU vs GPU


Dr Richard Cranium

cross posted.

valid question has he, however; no clear direction can the answer come from - so the man
with the answer doesn't quite know whenst or wherest he is answering or whom is responding
from where.

basically.

the guy just "shotgunned" the newsgroups he thought might answer his already made up
mind - a kinda guy you hope has no authority or money to carry out his whims. know what i
mean.

i would even venture to say - he is German - as he is having a difficult time reading and
interpreting the answer (hence my "already made up mind" inference to the poster).

** No Fate **

cheers,
dracman
Tomb Raider: Shotgun City
http://www.smokeypoint.com/tomb.htm

http://www.smokeypoint.com/3dfx.htm

http://www.smokeypoint.com/My_PC.htm





: Someone clue me in on all the attacks on this (mis)post.
: Seems so much energy was expended on bashing this guy's error of post. I'm
: new to newsgroups, so please explain the presidential-level error of such
: magnitude that it required such negative attention.
:
:
: : > Hi,
: >
: > Today's CPUs are in the 3000 to 4000 MHZ clock speed range.
: >
: > Today's GPU's are in the 250 to 600 MHZ clock speed range.
: >
: > I am wondering if GPU's are holding back game graphics performance because
: > their clock speed is so low ?
: >
: > For example suppose someone programs a game with the following properties:
: >
: > 3 GHZ for os, game, network, etc logic
: > 1 GHZ for graphics.
: >
: > Would that game run faster than any other game ?
: >
: > How about
: >
: > 2 GHZ for game
: > 2 GHZ for graphics.
: >
: > I know CPU's have generic instructions, so 4 GHz means a cpu can do about
: > 4000 million generic instructions.
: >
: > What does GPU 500 MHz mean ???? 500 million pixels ? 500 million
: > triangles ? 100 million T&L + 400 million pixels ? what ?
: >
: > Bye,
: > Skybuck.
: >
: >
:
:




 

RipFlex

You have serious issues. Period.

Newsgroups are a passing thing for me, a dying breed anyway. Netiquette...
sounds like a serious form of conduct? Just ****ing post, get replies, read the
ones you care about.. leave the rest of us alone. WTF netiquette?

:) *stirs up the anger*
 

Gerry Quinn

This oversimplifies the situation A LOT, and I could write all day long to
fill in the gaps to be more precise, for the sake of vanity and to avoid
prosecution by my peers and colleagues, but since those who know who I am know
what I know, I don't quite see the point.

There's one other benefit you didn't mention - even if the GPU were no
faster than the CPU, you could still gain by splitting up the work.

- Gerry Quinn
 

Gerry Quinn

the guy just "shotgunned" the newsgroups he thought might answer his already made up
mind - a kinda guy you hope has no authority or money to carry out his whims. know what i
mean.

It wasn't that stupid a question, even if he had apparently developed
some prejudice on the matter. And the newsgroups he chose were targeted
reasonably well, except for the language thing, which was probably an
honest mistake.

At least he's not one of those complete ****wits who complain about
cross-posting per se.

- Gerry Quinn
 

joe smith

You have serious issues. Period.
Newsgroups are a passing thing for me, a dying breed anyway. Netiquette...
sounds like a serious form of conduct? Just ****ing post, get replies, read the
ones you care about.. leave the rest of us alone. WTF netiquette?

Netiquette is just a form of good manners; some have it, some don't .. in a
world where no one cares to have good manners and everyone thinks only of their own
benefit, to hell with anyone else, interesting things begin to happen. I used
email for a decade as a very nice method of communication; now, I just
abandoned my decade+ old email address.. it was so spam-infested that it
became virtually useless, even with SpamAssassin and similar filter
software. Five. Five ****ing. Hundred. Five ****ing hundred spams a day.
That's it: a hotmail address which I *change* monthly and drop the old one like a
used condom; as can be observed I am beyond giving a rat's ass how
much spam I get now. :)

Usenet used to be something more than it is now, because in the old days the
content was king; now it's the posters. Now, well, still hanging around and
reading a post here, a post there.. but it seems the porn advertisements are
popping up everywhere. I will never fail to find a good bargain on Viagra,
when I might need it (hey, getting older).

Actually we are coming to a point where I would be willing to PAY for an email
service which forwards only mail from other humans who have actually
written the email to me because they have something to say. Not to receive 500 mails
which are mailed to 10 million other people (ie. spam). $20 a month feels like a
very reasonable price for such a service to me. Any takers? Oh, forgot.. no one
is offering such a service yet (hotmails and Gmails and shitnot with their
filtering are NOT the answer).. but when such a service is being offered, I'll
be the first one in line. The point is that this great, effective,
convenient form of communication is being rendered ****ing useless by
opportunists.. the same thing is happening on Usenet, and no one seems to give a
****. So, my proposal is for a system where people wouldn't have to give a
****; the technology should take care of the caring part and leave people
pissing in their own bed as much as they want w/o the smell reaching other
people's noses.. :)

</off-topic lunatic ranting>
 

Roland Schweiger

So to find out the truth we need test programs !
Test it on a P4

Test it on a GPU

And then see who's faster.

Sorry, but at least *I* don't really understand your question.
How could you write a machine-language programme that would run on a GPU as
well as a CPU ??
Are there any GPUs with MMX, SSE, SSE2 or another known command set of a
Pentium CPU ?

Don't you know that GPUs work completely differently from CPUs ?

For example: when you convert a DV video stream from any video camcorder
and you want to turn the large amount of data (about 14 GB per hour for
most Sony miniDV camcorders) into an
MPEG-2 video stream, a GPU will not help you.
You need a very fast processor that can *calculate*, and for instance a modern
Pentium IV with HyperThreading does such jobs very very well indeed.
In such a case, graphics processors do not help you at all.
And in many many games - you need both. You need lots of exact calculation
and also some work that can be done better by a GPU.

But you cannot reduce everything to clock speed.
The matter is too complex for that kind of generalization...

greetings from the city of Dresden.
:)

Roland Schweiger
 

Conor

Well I still have seen no real answer.

Can I safely conclude it's non-deterministic... in other words people don't
know shit.
No, it's already been proven on some graphics card review sites that
CPUs are now holding the graphics cards back.
 

Conor

Well I still have seen no real answer.
Run 3DMark2003. Notice the framerates in the game sections. Then notice
the framerates in the CPU test sections where the CPU is calculating
the rendering instead of the GPU.
 

Conor

icq2 said:
Someone clue me in on all the attacks on this (mis)post.
Seems so much energy was expended on bashing this guy's error of post. I'm
new to newsgroups, so please explain the presidential-level error of such
magnitude that it required such negative attention.
The problem wasn't the initial post but his complete inability to
accept the facts when shown to him.
 

J. Clarke

Skybuck said:
Absolutely not...

Since nowadays nvidia cards and maybe radeon cards have T&L... that means
transform and lighting.

That also means these cards can logically only do a limited amount of
transform and lighting.

Any DX compliant board is going to have hardware transform and lighting. As
for doing "a limited amount", I'm not sure what point you're trying to
make. They do it in dedicated hardware, not in software. What takes a
regular CPU a dozen or so operations happens in one on the GPU.
5 years ago I bought a PIII 450 MHz and a TNT2 at 250 MHz.

If you're comparing apples to apples that PIII was running 100 MHz with a
clock multiplier of 4.5 while the TNT2 was running an honest 125 MHz (not
250--nvidia didn't get to that speed until the Geforce2).
Today there are P4's and AMD's at 3000 MHz... most graphics cards today are
stuck at 500 MHz and overheating with big cooling stuff on them.

Those P4s and AMDs are running around 200 MHz with clock multipliers of
12-16. GPUs get speed by parallelism, not by a deep pipeline, and so
aren't clock-multiplied--a current GPU running 500 MHz is running that as
the primary clock, not the multiplied clock.
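To make the multiplier arithmetic concrete, here's a quick sketch (an editor's addition; `core_clock` is a made-up helper, the base clocks and multipliers are the ones quoted above):

```python
# Effective core clock = base (bus) clock * multiplier.
# A GPU's advertised clock is its primary clock: no multiplier.
def core_clock(base_mhz, multiplier):
    return base_mhz * multiplier

print(core_clock(100, 4.5))  # PIII "450": 450.0 MHz core from a 100 MHz bus
print(core_clock(200, 16))   # P4 "3200": 3200 MHz core from a 200 MHz bus
print(core_clock(125, 1))    # TNT2: an honest 125 MHz, no multiplication
```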
So it seems cpu's have become 6 times as fast... (if that is true) and
graphics cards maybe 2 or 3 times ?! they do have new functionality.

Nope, both have increased in performance at about the same rate. Clock
speed is not the whole story. If it was then the P4s would be walking all
over the AMD-64s.
Engines are expensive to build.

Look at the doom3 engine... or any other engine... Suppose that engine
uses T&L... Suppose 5 years from now cpu's are again 6 times faster... and
graphics cards only twice as fast (even the best, most expensive ones.. I doubt
that is going to happen any time soon, seeing the big heat problem).

That could mean the doom3 engine or any other engine is seriously held
back if all T&L is done in the graphics card...

Why? You're assuming that doubling the clock speed results in doubling the
T&L rate. Suppose at the same time the clock speed is being doubled the
number of clock cycles needed to perform a stage of T&L is also cut in
half.

You're also assuming that T&L is all that the board does. That assumption
is also not correct. Further, you're assuming that T&L is the bottleneck.
I think john carmack is really smart and codes flexible stuff... so maybe
he is smart enough to do T&L on the cpu as well... Actually doom3 might not
use T&L at all... since I was able to run the doom3 alpha on a TNT2 which
has no T&L... also, rumors are lol ;) that john carmack likes using opengl..
that's true, and doom3 only (?) uses opengl.

So for doom3 this might actually not be a problem... But for games like
call of duty, homeworld 2, halo... this could become a problem... a slow
problem that is =D

If your assumptions are correct. They are not.
 

J. Clarke

This is directed at "cranium"'s comment, which seems to have either gotten
lost in the ether or blocked by my filters.

Microsoft is "sitting on the 64 bit windows OS" only if making it available
as a free download to anybody who wants it constitutes "sitting on it".
 

Skybuck Flying

From what I can tell from this text it goes something like this:

Pentium 4's can split instructions into multiple little instructions and
execute them at the same time.

I know Pentium's in general can even pipeline instructions... heck a 80486
can do that ! ;)

Now you're saying... a gpu can do many pixel/color operations... like
add/div/multiply whatever at the same time.

( I believe this is what you or generally is called 'stages' )

And it can also pipeline these 'pixel instructions'.

And apparently the number of instructions that can be done in parallel or be
pipelined matters for performance.

So not much difference is there ?! :)

Pentium's have pipelines and parallel instructions and gpu's have pipelines
and parallel instructions.

Where is the difference ?! :)

Bye,
Skybuck.
 

Charles Banas

Skybuck said:
From what I can tell from this text it goes something like this:

Pentium 4's can split instructions into multiple little instructions and
execute them at the same time.
no. the pentium 4 splits those instructions into micro-ops that flow
serially through one pipeline. but there are 3 pipelines.
I know Pentium's in general can even pipeline instructions... heck a 80486
can do that ! ;)
no. the 80486 was a static processor. it did not decode instructions
nor use RISC-style architecture to execute instructions.
Now you're saying... a gpu can do many pixel/color operations... like
add/div/multiply whatever at the same time.
several unrelated operations across the screen. it does a *LOT* of work
in one cycle because it's processing so many different pixels
simultaneously.
( I believe this is what you or generally is called 'stages' )

And it can also pipeline these 'pixel instructions'.
each pipeline is dedicated to a single pixel. so in a sense, yes.
And apparently the number of instructions that can be done in parallel or be
pipelined matters for performance.
very much so. longer pipelines tend to greatly hurt performance on CPUs
because CPUs depend heavily these days on their caches and branch
prediction units. when the branch prediction fails, there is a pipeline
stall and 22 cycles are lost (on the P4 in some cases). on a GPU
pipeline stalls like that don't occur because each pipeline is dedicated
to a pixel and no branch prediction must be done. no pipeline stalls
are possible.

on the other hand, more pipelines greatly increase performance on GPUs.
on CPUs, 3 pipelines seems to be the "sweet spot". any more and there
will be too many pipeline stalls. any less and the pipelines become too
crowded. on CPUs, instructions can be dependent on other instructions
that may be in another pipeline - another source for pipeline stalls.
on GPUs, there is no dependence on other pipelines. each pipeline is
independent of the others.
So not much difference is there ?! :)
more than you think. a LOT more.
Pentium's have pipelines and parallel instructions and gpu's have pipelines
and parallel instructions.

Where is the difference ?! :)

Pentium 4s are limited in the degree to which they can execute parallel
operations. GPUs are, for all intents and purposes, *UNLIMITED* in the
degree to which they can be parallelized.
 

Skybuck Flying

Well

I just spent some time reading on about differences and similarities between
CPU's and GPU's and how they work together =D

Performing graphics requires assloads of bandwidth ! Like many GByte/sec.

Current CPU's can only do 2 GB/sec which is too slow.

GPU's are becoming more generic. GPU's could store data inside their own
memory. GPU's in the future can also have conditional jumps/program flow
control, etc.

So GPU's are starting to look more and more like CPU's.

It seems like Intel and AMD are a little bit falling behind when it comes to
bandwidth with the main memory.

Maybe that's a result of Memory wars :) Like Vram, Dram, DDR, DRRII,
Rambit(?) and god knows what =D

Though intel and amd probably have a little bit more experience with making
generic cpu's, or maybe lots of people have left and joined the GPU makers lol.

Or maybe amd and intel people are getting old and are going to retire soon
<- braindrain :)

However AMD and INTEL have always done their best to keep things
COMPATIBLE... and that is where ATI and NVidia fail horribly it seems.

My TNT2 was only 5 years old and it now can't play some games lol =D

There is something called Riva Tuner... it has NVxx emulation... maybe that
works with my TNT2... I haven't tried it yet.

The greatest asset of GPU's is probably that they deliver a whole graphical
architecture with them... though opengl and directx have that as well... this
gpu stuff explains how to do vertex and pixel shading and all the other
stuff around that level.

Though games still have to make sure to reduce the number of triangles that
need to be drawn... with bsp's, view frustum clipping, backface culling,
portal engines, and other things. Those can still be done fastest with
cpu's, since gpu's don't support/have it.

So my estimate would be:

1024x768x4 bytes * 70 Hz = 220.200.960 bytes per second = exactly 210
MB/sec
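That arithmetic checks out; a few lines of Python (an editor's addition, just verifying the numbers in the post):

```python
# Bandwidth for simply flipping a full frame to the card:
# resolution * bytes per pixel * refresh rate.
width, height = 1024, 768
bytes_per_pixel = 4  # 32-bit color
refresh_hz = 70

bytes_per_second = width * height * bytes_per_pixel * refresh_hz
print(bytes_per_second)                # 220200960 bytes per second
print(bytes_per_second / (1024 ** 2))  # exactly 210.0 MB/sec
```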

So as long as a programmer can simply draw to a frame buffer and have it
flipped to the graphics card this will work out just nicely...

So far no need for XX GB/Sec.

Of course the triangles still have to be drawn....

Take a beast like Doom III.

How many triangles does it have at any given time...

Thx to bsp's, (possible portal engines), view frustum clipping, etc...

Doom III will only need to draw maybe 4000 to 10000 triangles at any
given time. ( It could be more... I'll find out how many triangles later
;) )

Maybe even more than that...

But I am beginning to see where the problem is.

Suppose a player is 'zoomed' in or standing close to a wall...

Then it doesn't really matter how many triangles have to be drawn....

Even if only 2 triangles have to be drawn... the problem is as follows:

All the pixels inside the triangles have to be interpolated...

And apparently even interpolated pixels have to be shaded etc...

Which makes me wonder if these shading calculations can be interpolated...
maybe that would be faster ;)

But that's probably not possible otherwise it would already exist ?! ;) Or
somebody has to come up with a smart way to interpolate the shading etc for
the pixels ;)

So now the problem is that:

1024x768 pixels have to be shaded... = 786.432 pixels !

That's a lot of pixels to shade !

There are only 2 normals needed I think... for each triangle... and maybe
with some smart code... each pixel can now have its own normal.. or maybe
each pixel needs its own normal... how does bump mapping work at this point
?

In any case let's assume the code has to work with 24 bytes for a normal.
(x,y,z in 64 bit floating point ).

The color is also in r,g,b,a in 64 bit floating point another 32 bytes for
color.

Maybe some other color has to be mixed together I ll give it another 32
bytes...

Well maybe some other things so let's round it at 100 bytes per pixel ;)

786.432 pixels * 100 bytes = exactly 75 MB per frame * 70 Hz = 5250 MB /
sec.

So that's roughly 5.1 GB/sec that has to move through any processor just to
do my insane lighting per pixel ;)

Of course doom III or my insane game... uses a million fricking vertices (3d
points) plus some more stuff.

vertex x,y,z,
vertex normal x,y,z
vertex color r,g,b,a

So let's say another insane 100 bytes per vertex.

1 Million vertices * 100 bytes * 70 Hz = 7.000.000.000

Which is roughly another 7 GB/sec for rotating, translating, storing the
vertices etc.

So that's a lot of data moving through any processor/memory !
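Both estimates can be verified the same way (an editor's addition; the 100-bytes-per-pixel and 100-bytes-per-vertex figures are the post's own assumptions):

```python
# Per-pixel shading traffic: every pixel drags ~100 bytes of
# normals/colors/etc through the processor, 70 times per second.
pixels = 1024 * 768            # 786432 pixels
pixel_bw = pixels * 100 * 70
print(pixel_bw / (1024 ** 3))  # ~5.1 GB/sec

# Per-vertex traffic: a million vertices at ~100 bytes each, 70 Hz.
vertex_bw = 1_000_000 * 100 * 70
print(vertex_bw)               # 7000000000 bytes, roughly 7 GB/sec
```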

I still think that if AMD or Intel is smart... they will increase the
bandwidth with main memory... so it reaches the Terabyte age ;)

And I think these graphic cards will stop existing ;) just like windows
graphic accelerator cards stopped existing...

And then things will be back to normal =D

Just do everything via software on a generic processor <- much easier I hope
=D

Bye, Bye,
Skybuck.
 

joe smith

Pentium 4's can split instructions into multiple little instructions and
execute them at the same time.

Sort of.

I know Pentium's in general can even pipeline instructions... heck a 80486
can do that ! ;)

Yessir, true. :)

Now you're saying... a gpu can do many pixel/color operations... like
add/div/multiply whatever at the same time.

The net result of what I explained is that where a GPU does 1-16 pixels per
CLOCK CYCLE, the CPU needs 40-400 clock cycles PER PIXEL. The fact that
the CPU has 4-8 times more clock cycles is not enough to defeat this
architectural headroom by a long shot.

( I believe this is what you or generally is called 'stages' )

Now that we are familiar with the basics, let's take a more insightful look
at the issue at hand. The GPU is geared toward handling N pixels
simultaneously; to do this, obviously it has multiple units doing the same work.
But even if the GPU did a single fragment per clock cycle, it would still be
a lot faster than the CPU for this kind of work.

CPU is handling stream of instructions, ergo, it will do processing like
this:

instruction 1 (clock cycle 1)
instruction 2 (clock cycle 1)
instruction 3 (clock cycle 2)
....
instruction N (clock cycle ?)

Each of these instructions could be compared loosely to a "stage" in a GPU.
This is just an analogy, because in practice the GPU "stages" are just
different parts of silicon.. transistors in other words. Where the CPU is doing
these one-step-at-a-time, the GPU is also doing one-step-at-a-time, except for
the GPU these steps are parallel, they happen simultaneously. This is how a GPU is
"pipelined".

On the CPU side, "pipelining" means there are multiple instructions in flight
simultaneously.. ie. the parts of the microprocessor (transistors again!) which
do specific tasks are fed new jobs every clock cycle (or even faster, think
P4..). So, this achieves the result that a new instruction completes each
clock cycle (just giving a loose reference for what kind of performance we
should expect; architecture affects this in practice).

But since the CPU needs 40-400 clock cycles (or even more!) for each pixel, for
example, it means a GPU turning out new pixels each clock cycle has quite an
advantage there.. the above "table" could be written for a GPU:

pixel 1 (clock cycle 1)
pixel 2 (clock cycle 1)
pixel 3 (clock cycle 1)
pixel 4 (clock cycle 1)
pixel 5 (clock cycle 1)
pixel 6 (clock cycle 1)
pixel 7 (clock cycle 1)
pixel 8 (clock cycle 1)
pixel 9 (clock cycle 2)
pixel 10 (clock cycle 2)
.....

Let's rehash... the CPU would need hundreds of clock cycles for each pixel. For
trivial stuff like a "gouraud texture" filler with perspective correction and
simple modulation, z-buffering and other default stuff.. the clock cycle
count can be kept reasonably small, say 30 clock cycles (+ memory latency,
assuming we are using a cache-optimized tiling pattern already). The CPU just
isn't Good Enough for that kind of work, and real pixels are much, much more
complex than in the above example. The 30 clock cycles was for a very
uninteresting way to compute the color of the pixel (or fragment.. in
GPU terms "fragment" is more accurate, ask why if interested :) )
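The numbers in this post can be folded into a toy throughput model (an editor's addition; the 500 MHz / 8 pixels-per-clock and 3.2 GHz / 40 cycles-per-pixel figures are taken from the posts, the rest is illustration):

```python
# Toy model: a GPU retires several pixels per clock, a CPU burns
# many clocks per pixel. Compare fill rates despite the clock gap.
def gpu_pixels_per_sec(clock_hz, pixels_per_clock):
    return clock_hz * pixels_per_clock

def cpu_pixels_per_sec(clock_hz, cycles_per_pixel):
    return clock_hz / cycles_per_pixel

gpu = gpu_pixels_per_sec(500e6, 8)    # 500 MHz, 8 pixel pipelines
cpu = cpu_pixels_per_sec(3200e6, 40)  # 3.2 GHz, best-case 40 cycles/pixel
print(gpu / cpu)  # 50.0 -- the slower-clocked GPU fills 50x more pixels
```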

And it can also pipeline these 'pixel instructions'.

And apperently the number of instructions that can be done parallel or be
pipelined matters for performance.

The parallelism is just icing on the cake, really.. the idea is to defeat
other vendors' products, not CPU-based rendering; that battle CPU software
rendering lost years ago. :)

So not much difference is there ?! :)

No, not much.. only a 10-100 times performance difference for the same price..!

Pentium's have pipelines and parallel instructions and gpu's have pipelines
and parallel instructions.

Where is the difference ?! :)

In the number of instructions the CPU has to do to compute the color of a
fragment.

Now allow me to go further for your convenience. As you might have
understood by now, the difference is that the GPU has tons more multipliers,
adders and other transistors doing 'work'. There are diminishing returns
from adding functional units to a CPU, because the CPU works serially. There is
a dependency on previously executed instructions. The solution would be to add
a lot more registers to the design so that source code could compile into a
binary where the number of dependencies would be reduced, but because
programming languages are still serial it would require a heavy paradigm
shift to _at least_ programming practices with existing tools and languages
like Java, C, C++ for programmers to write code so that locally there would
be very little dependency on previous instructions.

Now, this would require very, very long pieces of code where the
results are brought together only at the end.. this means we wouldn't be
using function calls much (though the latest compilers do have global
link-time optimizations, that would help only to a degree.. it would leave a
lot of logic to the compiler, and the issue was that current languages and
tools precisely weren't so great for this kind of job). But let's humor you
and assume the tools could do it. This means we would want to, "sort of",
execute a lot of functions *simultaneously*. Because the CPU is using the "IP" model
(Instruction Pointer) and so are most compilers and tools today, it means
the compiler would have to UNROLL different functions to compile into code
which is executed parallel.. so that all these numerous ALU and FPU, etc.
parts could be put into doing computation all the time.

This is such a difficult problem that Intel invented Hyper-Threading to put the
existing parts of the processor to better use. Running multiple threads
means there are more consumers for the transistors which can do 'work'
inside the CPU. The problem is to efficiently use the computational
resources the CPU already has! The problem is NOT how to put MORE in there..
like I said, there are diminishing returns from putting more 'stuff' inside
CPU which works serially!

Now look at GPU. It has clearly defined standard "work" which can be
implemented in the GPU transistors. Different filters for sampler lookup,
modulating two 4xN packed values and what not. It is therefore possible to
put a lot of transistors in there which do 'work' all the time! Then like I
said, multiple fragment colors can be computed simultaneously.. this is
because each pixel in same primitive is running the same 'code' (combination
of renderstates or fragment program).

Like I said before, CPU has nothing to say to GPU for raw computing power.
Period.
 

joe smith

Though games still have to make sure to reduce the number of triangles that
need to be drawn... with bsp's, view frustum clipping, backface culling,
portal engines, and other things. Those can still be done fastest with
cpu's, since gpu's don't support/have it.

That sort of design would slow down the GPU-based rendering pipeline. It is
better to render primitives in large batches than to break the rendering into a lot
of smaller batches, even if a smaller number of primitives in total is
being rendered.

Frustum clipping with the CPU? Never. The ideal is that the data is in the GPU's
memory and rendered from there directly; if you clip with the CPU it means you
will have to transfer the data to the GPU, which is a major slowdown. If
you clip primitives you either render them individually (completely
braindead!) or you fill the data into a Vertex Buffer (Object) (and
optionally an Index Buffer, or an index list for glDrawElements() at least).

Sometimes when the vertex data must be synthesized and there is no feasible
mechanism to do the synthesis on vertex program, then with DirectX 9 a very
good way is to lock a vertex buffer with NOOVERWRITE flag (this tells the M$
API that you won't overwrite vertices which might be processed at the time
so this leaves the GPU free to do what it wants while you write into the
buffer, A Very Good Thing). Then fill the buffer, burning CPU time.. and
memory bandwidth. :) :)

When done, you unlock and then render primitives dereferencing vertices in
the region of buffer you did just fill. But it is still Order Of blablabla..
faster to fill from static buffers. The next generation shaders will enable
sampling from textures in the vertex program (thinking of VS 3.0 and
hardware that supports this profile). This means floating-point data can be
stored into textures and chosen dynamically from there. This means bones for
skinning can be stored into texture. Or height values for displacement
mapping can be stored into textures. Or anything the imagination can think
of.. this will allow a new level of programmability to the GPU.

This will also mean a slower pipeline, but, fear not, it won't come close to
the levels of a CPU-based geometry pipeline.. the base level of performance
with the latest ATI and NVIDIA cards has increased a lot (though only NVIDIA
for now can do VS30).

Back to the topic at hand, clipping individual primitives just to "render
less" is a BIG no-no. BSP trees are also a big no-no, because their job was
to sort primitives "quickly" from changing point of view. Sorting means
broken batching. Broken batching means ****ing slow rendering (relatively
speaking). It is one of the Big No's in realtime GPU graphics.

What is A Good Thing is to cull groups of primitives at a time. Say you
have an 'object'; give it a bounding box or bounding sphere. If you can
determine the box or sphere is not visible then you don't have to render the
whole 'object' at all. This is a Good Thing: no setup for the primitive
'collections', no setup for the transformations, etc. No primitives
submitted to the geometry pipeline. Et cetera. A good thing.
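The bounding-sphere test described above can be sketched in a few lines (an editor's illustration, not code from the thread; the plane layout `(nx, ny, nz, d)` with inward-facing normals and the function name are assumptions):

```python
# Cull a whole object at once: if its bounding sphere is entirely
# outside any frustum plane, skip the draw call for the object.
def sphere_in_frustum(center, radius, planes):
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        # signed distance from the sphere center to the plane
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False  # fully outside: don't submit the batch
    return True  # touching or inside every plane: render the batch

# One plane facing +x through the origin: anything with x < -radius culls.
planes = [(1.0, 0.0, 0.0, 0.0)]
print(sphere_in_frustum((5.0, 0.0, 0.0), 1.0, planes))   # True
print(sphere_in_frustum((-3.0, 0.0, 0.0), 1.0, planes))  # False
```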

Similarly, clipping to portals is redundant computation on the CPU. If
clipping MUST be done, stenciling can work.. but it is fillrate-consuming:
you have to fill the portal once into the stencil buffer (the stencil refvalue
can be increased every time a new portal is rendered so the stencil buffer doesn't
need to be cleared.. unless stenciling is used for something else, which is
obviously an architectural problem and should be resolved depending on the
engine and/or rendering goals, effects... blabla.. the rule of thumb is the more
generic the rendering system is going to be, the more performance it will eat to
cover all contingencies).

The lesson: Keep Them Batches Large. Saving in the wrong place can hurt
performance A LOT. 160 fps or 5 fps? The difference can be in a ****up the
rendering engine does. But when you know How Shit Works, such mistakes are made
less frequently, and when you know what the hardware can do but it isn't doing
it, at least you know there is a problem and can try to fix it.
 

Skybuck Flying

Hmmm I was wrong about the number of vertices and triangles doom 3 uses... ;)

The doom 3 alpha 0.02 uses about 130.000 vertices and 70.000 triangles
during the mars base levels.

All 6 monsters can mount up to 140.000 triangles !

Skybuck.
 

Skybuck Flying

joe smith said:
That sort of design would slow down the GPU-based rendering pipeline. It is
better to render primitives in large batches than to break the rendering into a lot
of smaller batches, even if a smaller number of primitives in total is
being rendered.

Frustum clipping with the CPU? Never. The ideal is that the data is in the GPU's
memory and rendered from there directly; if you clip with the CPU it means you
will have to transfer the data to the GPU, which is a major slowdown. If
you clip primitives you either render them individually (completely
braindead!) or you fill the data into a Vertex Buffer (Object) (and
optionally an Index Buffer, or an index list for glDrawElements() at least).

Well

Something has to determine what is visible and what is not...

Something has to determine what is in 'view' and what is not...

I know quake and other games used the cpu to do that, with bsp's and uhm
backface culling and god knows what =D

Now I believe you're saying that the GPU should be left to do that ?

How does the GPU do that ? :)

Skybuck.
 

J. Clarke

Skybuck said:
Well

I just spent some time reading on about differences and similarities
between CPU's and GPU's and how they work together =D

Performing graphics requires assloads of bandwidth ! Like many GByte/sec.

Current CPU's can only do 2 GB/sec which is too slow.

GPU's are becoming more generic. GPU's could store data inside their own
memory.

What exactly do you think they do with the 100+ meg they already _have_?
You think it's just sitting there to look pretty?
GPU's in the future can also have conditional jumps/program flow
control, etc.

What leads you to believe that they don't already do this?
So GPU's are starting to look more and more like CPU's.

A GPU is a special purpose CPU optimized for graphics use. There's no
reason why you couldn't port Linux to a Radeon or a GeforceFX except the
difficulty of connecting a disk to the thing.
It seems like Intel and AMD are a little bit falling behind when it comes
to bandwidth with the main memory.

Maybe that's a result of Memory wars :) Like Vram, Dram, DDR, DRRII,
Rambit(?) and god knows what =D

No, the "memory wars" are an attempt to get reasonably priced fast memory.
Though intel and amd probably have a little bit more experience with
making generic cpu's, or maybe lots of people have left and joined the GPU
makers lol.

Or maybe amd and intel people are getting old and are going to retire soon
<- braindrain :)

However AMD and INTEL have always done their best to keep things
COMPATIBLE... and that is where ATI and NVidia fail horribly it seems.

My TNT2 was only 5 years old and it now can't play some games lol =D

There is something called Riva Tuner... it has NVxx emulation... maybe
that works with my TNT2... I haven't tried it yet.

The greatest asset of GPU's is probably that they deliver a whole
graphical architecture with it... though opengl and directx have that as
well... these gpu stuff explain how to do like vertex and pixel shading
and all the other stuff around that level.

OpenGL and DirectX are feature-set standards. At one time GPUs were
designed in the absence of standards--that has changed--the current
consumer boards are optimized around the DirectX standard and the
workstation boards around OpenGL, however this is done with firmware--the
GPUs are the same and can be microcoded for either.
Though games still have to make sure to reduce the number of triangles
that need to be drawn... with BSPs, view frustum clipping, backface
culling, portal engines, and other things. Those can still be done fastest
with CPU's, since GPU's don't support/have it.

And you base this assessment on what information?
So my estimate would be:

1024x768x4 bytes * 70 Hz = 220.200.960 bytes per second = exactly 210
MB/sec

So as long as a programmer can simply draw to a frame buffer and have it
flipped to the graphics card this will work out just nicely...
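That framebuffer estimate checks out; redone in a few lines of Python under the same assumptions (1024x768, 4 bytes per pixel, 70 Hz refresh):

```python
# Framebuffer bandwidth for plain software rendering:
# resolution * bytes-per-pixel * refresh rate.

width, height = 1024, 768
bytes_per_pixel = 4
refresh_hz = 70

bytes_per_frame = width * height * bytes_per_pixel  # 3.145.728 bytes
bytes_per_sec = bytes_per_frame * refresh_hz        # 220.200.960 bytes

print(bytes_per_sec)                  # 220200960
print(bytes_per_sec / (1024 * 1024))  # 210.0 MB/sec, exactly
```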

If the programmer can do all the necessary calculations. If you're talking
about a P4-3200 then that would mean that it would have to do every pixel
in 64 calculations or less.
So far no need for XX GB/Sec.

Of course the triangles still have to be drawn....

Take a beast like Doom III.

How many triangles does it have at any given time...

Thx to bsp's, (possible portal engines), view frustum clipping, etc...

Doom III will only need to draw maybe 4000 to maybe 10000 triangles at
any given time. ( It could be more... I'll find out how many triangles
later ;) )

Maybe even more than that...

But I am beginning to see where the problem is.

Suppose a player is 'zoomed' in or standing close to a wall...

Then it doesn't really matter how many triangles have to be drawn....

Even if only 2 triangles have to be drawn... the problem is as follows:

All the pixels inside the triangles have to be interpolated...

And apparently even interpolated pixels have to be shaded etc...

Which makes me wonder if these shading calculations can be interpolated...
maybe that would be faster ;)

But that's probably not possible otherwise it would already exist ?! ;) Or
somebody has to come up with a smart way to interpolate the shading etc
for the pixels ;)
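It does exist, in fact: lighting only the vertices and interpolating the result across the triangle is classic Gouraud shading. A one-dimensional toy sketch of the idea (made-up intensity values):

```python
# Gouraud shading in miniature: compute lighting only at the
# triangle's vertices, then linearly interpolate the resulting
# intensities across the pixels in between, instead of running the
# full lighting math for every pixel.

def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

# Intensities computed (expensively) at the two ends of an edge:
intensity_v0 = 0.2
intensity_v1 = 1.0

# Cheap per-pixel step: pure interpolation, no lighting math.
span = [round(lerp(intensity_v0, intensity_v1, i / 4), 2) for i in range(5)]
print(span)  # [0.2, 0.4, 0.6, 0.8, 1.0]
```

The catch is exactly what the thread suspects: interpolated shading misses detail inside the triangle (highlights, bump maps), which is why per-pixel lighting exists at all.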

So now the problem is that:

1024x768 pixels have to be shaded... = 786.432 pixels !

That's a lot of pixels to shade !

There are only 2 normals needed I think... for each triangle... and maybe
with some smart code... each pixel can now have its own normal.. or maybe
each pixel needs its own normal... how does bump mapping work at this
point ?

In any case let's assume the code has to work with 24 bytes for a normal
(x,y,z in 64-bit floating point).

The color is also r,g,b,a in 64-bit floating point: another 32 bytes for
color.

Maybe some other color has to be mixed in too, I'll give it another 32
bytes...

Well, maybe some other things too, so let's round it to 100 bytes per pixel ;)

786.432 pixels * 100 bytes = exactly 75 MB per frame * 70 Hz = 5250 MB /
sec.

So that's roughly 5.1 GB/sec that has to move through any processor just
to do my insane lighting per pixel ;)

Of course Doom III or my insane game... uses a million fricking vertices
(3d points) plus some more stuff.

vertex x,y,z,
vertex normal x,y,z
vertex color r,g,b,a

So let's say another insane 100 bytes per vertex.

1 Million vertices * 100 bytes * 70 Hz = 7.000.000.000

Which is roughly another 7 GB/sec for rotating, translating, storing the
vertices etc.

So that's a lot of data moving through any processor/memory !
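Both back-of-the-envelope figures hold up; in Python, under the same assumed 100 bytes touched per pixel and per vertex:

```python
# Rough bandwidth for per-pixel shading and per-vertex transform,
# using the thread's generous assumption of 100 bytes touched each.

pixels = 1024 * 768      # 786.432 pixels per frame
vertices = 1_000_000     # a million vertices per frame
bytes_each = 100
refresh_hz = 70

pixel_bw = pixels * bytes_each * refresh_hz    # bytes/sec for shading
vertex_bw = vertices * bytes_each * refresh_hz # bytes/sec for transform

print(pixel_bw / (1024 ** 3))      # ~5.13 GB/sec for per-pixel lighting
print(vertex_bw / 1_000_000_000)   # 7.0 GB/sec for the vertex work
```

Together that is on the order of 12 GB/sec, well beyond the ~2 GB/sec of main-memory bandwidth mentioned earlier, which is exactly why this work lives in the GPU's local memory.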

I still think that if AMD or Intel is smart... they will increase the
bandwidth with main memory... so it reaches the Terabyte age ;)

Eventually that will happen. By that time the feature set of video
processors will likely be very thoroughly standardized and they'll be able
to handle any image at a few thousand frames a second and cost 50 cents.
And I think these graphic cards will stop existing ;) just like windows
graphic accelerator cards stopped existing...

Huh? What is a Radeon or a GeforceFX if not a "Windows graphic accelerator
card"? They're designed specifically to accelerate DirectX, which in case
you haven't checked recently you will find to be a part of Windows.
And then things will be back to normal =D

Just do everything via software on a generic processor <- much easier I
hope =D

Nope. Lots easier to tell the GPU "draw me a sphere at thus and so
coordinates" than it is to do all the calculations yourself.
 
M

Minotaur

Skybuck said:
Well

Something has to determine what is visible and what is not...

Something has to determine what is in 'view' and what is not...

I know Quake and other games used the CPU to do that, with BSPs and euhm
backface culling and god knows what =D

Now I believe you're saying that the GPU should be left to do that ?

They do now... those options in the ATI drivers ain't there for nothing.
 
