CPU vs GPU


Skybuck Flying

Minotaur said:
They do now... those options in the ATI drivers ain't there for nothing.


Magick

Well, such an answer of course won't do for any serious developer.

The developer has to know how fast a card does this.

So he can decide to use the card (gpu) or do it on the cpu :p :D

Skybuck.
 

joe smith

Well
Something has to determine what is visible and what is not...
Something has to determine what is in 'view' and what is not...

I know quake and other games used the cpu to do that, and bsp's and euhm
backface culling and god knows what =D

BSP trees are used for getting a perfectly sorted, zero-overlap set of primitives
for the renderer. The GPU does not need that; in fact, performance is hurt seriously
by such "optimization".. like I explained, sorting is a Big No. I must check:
do you know what a BSP tree is and what it is commonly used for? And what the
Quake engine uses it for, specifically? Or are you just throwing buzzwords
around?
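
For anyone following along, a minimal sketch (in C++) of the kind of sorted traversal a BSP tree buys you. The node layout and the plane test are illustrative, not Quake's actual code:

struct Plane { float nx, ny, nz, d; };            // plane equation: nx*x + ny*y + nz*z + d = 0

struct BspNode {
    Plane    split;                               // partitioning plane stored at this node
    BspNode* front = nullptr;                     // subtree in front of the plane
    BspNode* back  = nullptr;                     // subtree behind the plane
    // ... polygons lying on the plane would be stored here ...
};

// Visit nodes front-to-back relative to the eye position: the half-space that
// contains the eye is recursed into first, so nearer geometry is emitted before
// the geometry behind it. That is the "sorted, zero-overlap" property.
void TraverseFrontToBack(const BspNode* node, float ex, float ey, float ez) {
    if (!node) return;
    float side = node->split.nx * ex + node->split.ny * ey +
                 node->split.nz * ez + node->split.d;
    const BspNode* nearChild = (side >= 0.0f) ? node->front : node->back;
    const BspNode* farChild  = (side >= 0.0f) ? node->back  : node->front;
    TraverseFrontToBack(nearChild, ex, ey, ez);
    // ... emit the polygons stored at this node ...
    TraverseFrontToBack(farChild, ex, ey, ez);
}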
Now I believe you're saying that the GPU should be left to do that ?

You believe wrong. I am saying you cull away large batches at a time. Whole
objects and so on. Not individual primitives. That is da secret 2 sp3eD.

How does the GPU do that ? :)

There is functionality in the GPU which can query how many pixels pass the
visibility test.. but this is a poor approach because it has a lot of latency.
Doing visibility computation on the CPU is quite cheap, depending on the
system being implemented, of course. Got something specific in mind?
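
To make the latency point concrete, here is a rough sketch of how such a pixel-count query looks with Direct3D 9 occlusion queries. Assume d3d9.h, a valid IDirect3DDevice9* called device, and a hypothetical helper DrawBoundingVolume(); error handling is omitted:

// Count how many pixels of a bounding volume would pass the depth test.
IDirect3DQuery9* query = NULL;
device->CreateQuery(D3DQUERYTYPE_OCCLUSION, &query);

query->Issue(D3DISSUE_BEGIN);
DrawBoundingVolume(device);              // hypothetical helper: rasterize the volume
query->Issue(D3DISSUE_END);

// The result is not available immediately; this is where the latency bites.
DWORD visiblePixels = 0;
while (query->GetData(&visiblePixels, sizeof(DWORD), D3DGETDATA_FLUSH) == S_FALSE) {
    // spinning here stalls the CPU; real code would poll a frame or two later
}
bool objectMayBeVisible = (visiblePixels > 0);
query->Release();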
 

joe smith

So he can decide to use the card (gpu) or do it on the cpu :p :D

That is work the CPU is better at. Using the GPU to determine whether there are
potentially visible pixels requires 1x overdraw fill cost for the bounding
volume being rasterized. Then, we might get results back that yes, the
object is visible.. so now you have the luxury of filling most of these
pixels *again*, or you filled all these pixels just to find out that the
object isn't visible.

Doesn't sound very attractive to me.. does it sound very attractive to you?
Didn't think so. On the CPU, culling a bounding volume such as a box or sphere
against a volume defined by planes is very cheap. The most commonly used volume is the
frustum. Occluders can also be implemented using the same mathematics; just
which primitives make the best occluders is an interesting problem. Generally the best
approach is to recognize in a preprocess which surfaces are "large" in a
contributing way.. the level design and gameplay mechanics determine this
to a large degree. With a decent portal rendering system it is a moot point.
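
For illustration, a minimal sketch of such a test, a bounding sphere against six frustum planes (plain C++; the plane layout is an assumption, not any particular engine's):

struct Plane  { float nx, ny, nz, d; };   // normalized plane, normal pointing into the frustum
struct Sphere { float cx, cy, cz, r; };

// Cheap CPU-side cull: the sphere is rejected if it lies entirely behind any
// one of the six planes. Whole objects are accepted or rejected at once,
// with no per-primitive work.
bool SphereIntersectsFrustum(const Sphere& s, const Plane frustum[6]) {
    for (int i = 0; i < 6; ++i) {
        float dist = frustum[i].nx * s.cx + frustum[i].ny * s.cy +
                     frustum[i].nz * s.cz + frustum[i].d;
        if (dist < -s.r)
            return false;                 // completely behind this plane: cull it
    }
    return true;                          // potentially visible: draw the whole batch
}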

There are a lot of solutions to the visibility problem, more and less generic. I
don't think that you are genuinely interested in any of this, though..
 

joe smith

What leads you to believe that they don't already do this?

-- snip snip --

(vs_/ps_2_x profile)

Dynamic Flow Control
If D3DCAPS9.VS20Caps.DynamicFlowControlDepth > 0, then the following dynamic
flow control instructions are supported:

- if_comp
- break
- break_comp
If the D3DVS20CAPS_PREDICATION flag is also set on D3DCAPS9.VS20Caps.Caps,
then the following additional flow control instructions are supported:
- setp_comp
- if pred
- callnz pred
- breakp
The range of values for dynamic flow control depth is 0 to 24 and is equal
to the nesting depth of the dynamic flow control instructions (see Flow
Control Nesting Limits for details). If this cap is zero, the device does
not support dynamic flow control instructions.

-- snip snip --
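
In code, checking those caps at startup looks roughly like this. This is only a sketch; d3d9.h and an already created IDirect3DDevice9* called device are assumed:

D3DCAPS9 caps;
device->GetDeviceCaps(&caps);

// Dynamic flow control (if_comp / break / break_comp) in vertex shaders:
bool hasDynamicFlowControl = caps.VS20Caps.DynamicFlowControlDepth > 0;

// Predication (setp_comp / if pred / callnz pred / breakp):
bool hasPredication = (caps.VS20Caps.Caps & D3DVS20CAPS_PREDICATION) != 0;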
 

Ralf Hildebrandt

Skybuck Flying wrote:

Pentium's have pipelines and parallel instructions and gpu's have pipelines
and parallel instructions.

Where is the difference ?! :)

Just an addition to joe smith's comment:

The difference is the computed algorithm. If you have to add 3 numbers
and multiply them with a constant, you can build a combinational logic
block that does just this. (And indeed this block will be optimized to
do just this job really fast. (E.g. the multiplication by the constant
gives room for many optimizations.)) On a conventional CPU you have to
do 3 steps: 2 additions and one multiplication.

If you have some specific combinational logic to implement (e.g. invert
bit 7, XOR it with bit 4 and AND it with the OR-concatenation of bits 12
to 16 of one number), it is _very_ easy to implement it in hardware:

result <= ((not number(7)) xor number(4)) and (number(12) or number(13) or
number(14) or number(15) or number(16));

Now try to find a software algorithm on a general-purpose CPU that
computes this in similar time - no way!
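
For comparison, here is one plausible way the same expression comes out as ordinary CPU code (a C++ sketch; the point is the number of separate operations, not this exact sequence):

#include <cstdint>

// Same function as the hardware block above, spelled out as CPU work:
// shifts, masks, an XOR, an OR-reduction and an AND, i.e. several
// instructions where the dedicated circuit needs roughly one gate delay.
bool Compute(std::uint32_t number) {
    bool bit7   = (number >> 7) & 1u;
    bool bit4   = (number >> 4) & 1u;
    bool anySet = ((number >> 12) & 0x1Fu) != 0;   // OR of bits 12..16
    return ((!bit7) != bit4) && anySet;            // (NOT bit7) XOR bit4, then AND
}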


Conclusion: If you have a specific problem, you can design a specific
circuit that solves just this problem. This solution is highly
optimized and therefore fast, small and area-efficient - but highly
specialized and therefore incapable of doing other jobs.


f'up to poster set

Ralf
 

Stephen H. Westin

Skybuck Flying said:
Hi,

Today's CPUs are in the 3000 to 4000 MHZ clock speed range.

Today's GPU's are in the 250 to 600 MHZ clock speed range.

I am wondering if GPU's are holding back game graphics performance because
their clock speed is so low ?

Au contraire. GPU's may run at lower clock speeds, but they compute
far faster than CPU's. And that speed is increasing much faster than
for CPU's. Because of this, many are working on ways to implement
traditional CPU computations on GPU's. See, for example,
<http://www.gpgpu.org/>. The advantage of the GPU is that it doesn't
have to implement the traditional von Neumann model of a single
processor executing sequentially from a large memory that can all be
accessed at the same speed and that holds both program and data.

<snip>
 

Stephen H. Westin

Skybuck Flying said:
Well I still have seen no real answer.

Can I safely conclude it's non-deterministic... in other words people don't
know shit.

Let's see. There are two alternative hypotheses:

1. There is something you don't know, or don't understand.

2. Everyone else is stupid or ignorant.

You obviously have decided on #2. You might not want to jump to so
hasty a conclusion.
So the truth to be found out needs testing programs!

Test it on P4

Test it on GPU

And then see who's faster.

See http://www.gpgpu.org/.

But then, you're probably just trolling.

<snip>
 

Stephen H. Westin

Since nowadays nvidia cards and maybe radeon cards have T&L... that means
transform and lighting.

Ah, so you really don't know anything about modern graphics cards,
which can in fact perform non-trivial calculations at pixel rates. Any
reasonable mid-to-high end card does all shading on the card:
transform, lighting, texturing, shadows, fog, etc.
That also means these cards can logically only do a limited amount of
transform and lighting.

5 years ago I bought a PIII 450 mhz and a TNT2 at 250 mhz.

Today there are P4's and AMD's at 3000 mhz... most graphic cards today are
stuck at 500 mhz and overheating with big cooling stuff on it.

But they have just as many transistors. More of which are dedicated to
computation. As others have pointed out, only the very ignorant assume
that performance is determined by clock rate alone. Are you among
them?
So it seems cpu's have become 6 times as fast... (if that is true) and
graphics cards maybe 2 or 3 times ?! they do have new functionality.

No, GPU's have increased *much more* in speed over that time. Please
go get a clue, rather than arguing from false assumptions.

<snip>
 

Stephen H. Westin

Skybuck Flying said:
From what I can tell from this text it goes something like this:

Pentium 4's can split instructions into multiple little instructions and
execute them at the same time.

No, it can take an instruction and split its multi-clock execution
time over several units in the pipeline. An individual instruction
takes quite a long time, but several can overlap in time, giving
parallelism.
I know Pentium's in general can even pipeline instructions... heck a 80486
can do that ! ;)

Now you're saying... a gpu can do many pixel/color operations... like
add/div/multiply whatever at the same time.

( I believe this is what you or generally is called 'stages' )

No, you can execute operations *at the same time*.
And it can also pipeline these 'pixel instructions'.

And apparently the number of instructions that can be done in parallel or be
pipelined matters for performance.

So not much difference is there ?! :)

Pentium's have pipelines and parallel instructions and gpu's have pipelines
and parallel instructions.

Where is the difference ?! :)

To start with, the massive amount of logic needed to take a serial
instruction stream and break it down into parallelizable bits while
dealing with data dependency and variable latency of multi-level cache
schemes. A modern CPU is horribly complex because it has to maintain
the fiction that instructions are executed in series, and that there
is a >100MB memory that all operates at full CPU speed. The
parallelism in GPU operations makes things much simpler: a few bytes
come in on a fast parallel connection (probably not a general bus),
some computation is done in small local memory and with access to a
larger, but still specialized, texture memory, and a few different
bytes get shipped out the end.
 

Stephen H. Westin

Skybuck Flying said:
Well

I just spent some time reading on about differences and similarities between
CPU's and GPU's and how they work together =D

Performing graphics requires assloads of bandwidth ! Like many GByte/sec.

Current CPU's can only do 2 GB/sec which is too slow.

GPU's are becoming more generic. GPU's could store data inside their own
memory. GPU's in the future can also have conditional jumps/program flow
control, etc.

So GPU's are starting to look more and more like CPU's.

Yes. Do a Google search on "wheel of incarnation". This phenomenon has
occurred over and over again, and was first described in the 1960's.
It seems like Intel and AMD are a little bit falling behind when it comes to
bandwidth with the main memory.

Maybe that's a result of Memory wars :) Like Vram, Dram, DDR, DRRII,
Rambit(?) and god knows what =D

No, because they are solving a different problem. It's a bit like saying
Ferrari is falling behind because you can get a minivan that seats 7,
while their cars only seat 2 or 4.

However AMD and INTEL have always done their best to keep things
COMPATIBLE... and that is where ATI and NVidia fail horribly it seems.

Yup, that has always been the problem with high-speed special-purpose
hardware for any purpose: you are locked into one vendor, and have no
assurance of a compatible upgrade path. But now there are things like
Direct3D shaders that give a vendor-independent standard for programming
the GPU.
My TNT2 was only 5 years old and it now can't play some games lol =D

But games that are 5 years old will, if properly written, work fine on
the newest hardware. You aren't really asking game designers not to use
enhanced capabilities, are you?

The greatest asset of GPU's is probably that they deliver a whole graphical
architecture with it... though opengl and directx have that as well... these
gpu stuff explain how to do like vertex and pixel shading and all the other
stuff around that level.

No, people have noticed that GPU's offer tremendous processing power at
a shockingly low price, and that the gap between them and CPU's is
growing greater with every generation.
Though games still have to make sure to reduce the number of triangles that
need to be drawn... with bsp's, view frustum clipping, backface culling,
portal engines, and other things. Those can still be done fastest with
cpu's, since gpu's dont support/have it.

You are out of date, aren't you?
So my estimate would be:

1024x768x4 bytes * 70 Hz = 220.200.960 bytes per second = exactly 210
MB/sec

So as long as a programmer can simply draw to a frame buffer and have it
flipped to the graphics card this will work out just nicely...

Until I switch my monitor to 1600x1200.
So far no need for XX GB/Sec.

Again, you have absolutely no clue as to what goes on. For one thing,
you assume that each pixel is written only once.
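
As a back-of-the-envelope illustration of that point, here is the same arithmetic with an overdraw factor folded in. The factor of 4 below is an assumed example value, not a measured one:

#include <cstdio>

// Frame-buffer write traffic = width * height * bytes per pixel * refresh * overdraw.
// Overdraw is greater than 1 whenever pixels are written more than once per frame.
double FrameBufferTraffic(double w, double h, double bytesPerPixel,
                          double hz, double overdraw) {
    return w * h * bytesPerPixel * hz * overdraw;
}

int main() {
    const double MB = 1024.0 * 1024.0;
    // The figure quoted above: 1024x768x4 at 70 Hz, each pixel written once (~210 MB/sec)
    std::printf("%.0f MB/sec\n", FrameBufferTraffic(1024, 768, 4, 70, 1) / MB);
    // The same screen with an assumed overdraw of 4 (Z-buffer traffic still ignored)
    std::printf("%.0f MB/sec\n", FrameBufferTraffic(1024, 768, 4, 70, 4) / MB);
    // 1600x1200 with the same assumed overdraw of 4
    std::printf("%.0f MB/sec\n", FrameBufferTraffic(1600, 1200, 4, 70, 4) / MB);
    return 0;
}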
Ofcourse the triangles still have to be drawn....

And lit, and textured, and shadows generated, and possibly fog and glow...

<snip>
 

Stephen H. Westin

Skybuck Flying said:
Well, such an answer of course won't do for any serious developer.

We already figured out that you aren't one of those. Folks are
starting to get insulting because you keep posting without any current
knowledge of graphics hardware or game implementations.
The developer has to know how fast a card does this.

They do.
So he can decide to use the card (gpu) or do it on the cpu :p :D

So why do you think so many "serious developers" are moving more
computation to the GPU?
 

FSAA

Stephen H. Westin said:
We already figured out that you aren't one of those. Folks are
starting to get insulting because you keep posting without any current
knowledge of graphics hardware or game implementations.


They do.


So why do you think so many "serious developers" are moving more
computation to the GPU?

Steve,

I would save myself some effort and just ignore him; he is just trolling
and cross-posting to all the NGs.

FSAA
 

Charles Banas

Skybuck said:
Well

I just spent some time reading on about differences and similarities between
CPU's and GPU's and how they work together =D

Performing graphics requires assloads of bandwidth ! Like many GByte/sec.
you have NO idea.
Current CPU's can only do 2 GB/sec which is too slow.
in most cases, though, this is more than enough. there are
memory-intensive applications that need faster memory to keep from
stalling the CPU, but this is mostly a limitation of the front-side bus
and memory modules. the faster and bigger memory gets, the more
expensive it is.
GPU's are becoming more generic. GPU's could store data inside their own
memory. GPU's in the future can also have conditional jumps/program flow
control, etc.

So GPU's are starting to look more and more like CPU's.
inasmuch as GPUs now perform pixel-specific calculations on texture
and video information. GPUs now also perform a lot of vector processing
work, which helps immensely in such tasks as model animation and the like.
It seems like Intel and AMD are a little bit falling behind when it comes to
bandwidth with the main memory.

Maybe that's a result of Memory wars :) Like Vram, Dram, DDR, DRRII,
Rambit(?) and god knows what =D
no. those are different memory technologies that evolved to keep up
with the growing demand for faster memory. keep in mind CPUs are only
10x as fast as memory now, and that gap is shrinking FAST. (due to
increasing popularity of DDR and the introduction of DDR-II, which, by
the way, is very fast.)
Though intel and amd probably have a little bit more experience with making
generic cpu's and maybe lots of people have left and joined GPU's lol.

Or maybe amd and intel people are getting old and are going to retire soon
<- braindrain :)
i haven't the slightest clue how you got this inane idea. AMD and intel
aren't going anywhere except UP. CPUs are where the Operating System
runs, and they are where programs run. everything in a computer,
including its peripherals, depends on the CPU and its support hardware.
software, which depends on the CPU, has the capability to offload work to
specialized hardware - which ATi, nVidia, Creative, 3D Labs, et al.
manufacture. work can even be sent to the sound card for
post-processing and special effects! (this is audio processing that can
NOT be done on the CPU no matter how badly you want it to.)
However AMD and INTEL have always done their best to keep things
COMPATIBLE... and that is where ATI and NVidia fail horribly it seems.
ATi and nVidia are competing to set standards. what they are trying to
do is not create hardware that will work like other hardware. they are
trying to make hardware that is not only better in performance, but
better architecturally with more features and capabilities. all thanks
to the demands of the gaming industry.

you're not making an "apples-to-apples" comparison. you're comparing
two unrelated industries. things don't work that way.

nVidia and ATi write driver software to control their cards and pass
information from software to hardware. their *DRIVERS* provide two
interfaces to the hardware: DirectX and OpenGL. that is all the
standardisation nVidia, ATi, et. al. need. you can write an OpenGL game
and expect it to work on both brands of cards, for example.
My TNT2 was only 5 years old and it now can't play some games lol =D
that's because it doesn't have the capabilities that games expect these
days. games expect the hardware to handle all of the texturing,
lighting, and transformations, and rarely do it in software.
There is something called Riva Tuner... it has NVxx emulation... maybe that
works with my TNT2... I haven't tried it yet.
if memory serves, it activates a driver option. that emulation is in
nVidia's drivers. it's hidden and disabled by default but was made
available for developer testing.
The greatest asset of GPU's is probably that they deliver a whole graphical
architecture with it... though opengl and directx have that as well... these
gpu stuff explain how to do like vertex and pixel shading and all the other
stuff around that level.
again you've got it backwards. the GPU *PROVIDES* the graphical
services *THROUGH* OpenGL and DirectX. but in addition to this, OpenGL
and DirectX themselves provide a /separate/ software-only
implementation. that is a requirement of both standards. anything that
is not done in hardware must be done in software. *BY THE DRIVER.*

in addition, vertex shading is a convenience. nothing more. but it
happens to be a *FAST* convenience.

pixel shading can *only* be done in hardware. it is a technology that
requires so much computational power that ONLY a GPU can provide it. it
would take entirely too much CPU work to do all that.
Though games still have to make sure to reduce the number of triangles that
need to be drawn... with bsp's, view frustum clipping, backface culling,
portal engines, and other things. Those can still be done fastest with
cpu's, since gpu's dont support/have it.
you again have it wrong.

game engines use those techniques to reduce the amount of data that they
send to the GPU. vertices still have to be sent to the GPU before they
are drawn - and sent again for each frame. to speed THAT process up,
game engines use culling techniques so they have less data to send. not
so the GPU has less work to do.
So my estimate would be:

1024x768x4 bytes * 70 Hz = 220.200.960 bytes per second = exactly 210
MB/sec

So as long as a programmer can simply draw to a frame buffer and have it
flipped to the graphics card this will work out just nicely...
that's the way it was done before GPUs had many capabilities - like an
old TNT or the like.

but in those days, it was ALL done one pixel at a time.
So far no need for XX GB/Sec.

Ofcourse the triangles still have to be drawn....
for a triangle to be drawn, first the vertices must be transformed.
this involves a good deal of trigonometry to map a 3D point to a dot on
the screen. second, the vertices are connected and the area is filled
with a color. or:

1. vertices transformed (remember "T&L"?)
2. area bounded among the vertices
3. texture processing is performed (if necessary. this is done by pixel
shaders now, so it's all on the GPU. without pixel shaders, this is
done entirely on the CPU.) this includes lighting, coloring (if
applicable), or rendering (in the case of environment maps).
4. the texture map is transformed to be mapped to the polygon.
5. the texture map is then drawn on the polygon.

in an even more extreme case, we deal with the Z-buffer (or the DirectX
W-buffer), vertex transformations (with vertex shaders), screen
clipping, and screen post-processing (like glow, fades, blurring,
anti-aliasing, etc.)
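
as a rough sketch of step 1 in that list, here is the kind of per-vertex work involved: one matrix-vector multiply, a perspective divide and a viewport mapping. the pre-combined model-view-projection matrix and the conventions used are assumptions for illustration only:

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };            // row-major model-view-projection matrix

// Transform one object-space vertex into pixel coordinates.
Vec4 TransformVertex(const Mat4& mvp, const Vec4& v, float screenW, float screenH) {
    Vec4 clip;
    clip.x = mvp.m[0][0]*v.x + mvp.m[0][1]*v.y + mvp.m[0][2]*v.z + mvp.m[0][3]*v.w;
    clip.y = mvp.m[1][0]*v.x + mvp.m[1][1]*v.y + mvp.m[1][2]*v.z + mvp.m[1][3]*v.w;
    clip.z = mvp.m[2][0]*v.x + mvp.m[2][1]*v.y + mvp.m[2][2]*v.z + mvp.m[2][3]*v.w;
    clip.w = mvp.m[3][0]*v.x + mvp.m[3][1]*v.y + mvp.m[3][2]*v.z + mvp.m[3][3]*v.w;

    // Perspective divide, then map from [-1,1] normalized coordinates to the screen.
    Vec4 screen;
    screen.x = (clip.x / clip.w * 0.5f + 0.5f) * screenW;
    screen.y = (clip.y / clip.w * 0.5f + 0.5f) * screenH;
    screen.z =  clip.z / clip.w;           // depth value kept for the Z-buffer
    screen.w =  clip.w;
    return screen;
}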
Take a beast like Doom III.

How many triangles does it have at any given time...

Thx to bsp's, (possible portal engines), view frustum clipping, etc...

Doom III will only need to drawn maybe 4000 to maybe 10000 triangles for any
given time. ( It could be more... I'll find out how many triangles later
;) )

Maybe even more than that...
as you found out very quickly, it draws a LOT of polygons. how many did
Doom use? there were about 7 or 8 on the screen, on average. more for
complex maps. the monsters? single-polygon "imposters" (to use current
terminology).
But I am beginning to see where the problem is.

Suppose a player is 'zoomed' in or standing close to a wall...

Then it doesn't really matter how many triangles have to be drawn....

Even if only 2 triangles have to be drawn... the problem is as follows:

All the pixels inside the triangles have to be interpolated...

And apperently even interpolated pixels have to be shaded etc...

Which makes me wonder if these shading calculations can be interpolated...
maybe that would be faster ;)

But that's probably not possible otherwise it would already exist ?! ;) Or
somebody has to come up with a smart way to interpolate the shading etc for
the pixels ;)
during the texture shading step, that's irrelevant. most textures are
square - 256x256, 512x512, 1024x1024, etc. (there are even 3D textures,
but i won't go into that.) when those textures are shaded for lighting,
all of those pixels must be processed before the texture can be used.
pixel shaders change that and enable us to do those calculations at
run-time.

i won't explain how, since i'm not entirely sure. i haven't had the
chance to use them yet. they may be screen-based or texture-based. i
don't know. maybe both. i'll find out one of these days.
So now the problem is that:

1024x768 pixels have to be shaded... = 786.432 pixels !

That's a lot of pixels to shade !
really. you figured this out. how nice. imagine doing that on the CPU.

on second thought, go back up and re-read what i said about pixel
shaders being done in software.
There are only 2 normals needed I think... for each triangle... and maybe
with some smart code... each pixel can now have it's own normal.. or maybe
each pixel needs it's own normal... how does bump mapping work at this point
?
normals are only really required for texture orientation and lighting.
with software lighting, the texture is usually processed and then sent
to the GPU before the level begins. some games still do that. some use
vertex lighting, which lights a polygon based on the strength of
light at each vertex - rather than at each pixel within the polygon. as
you can imagine, that's quite ugly.

hardware lighting (implied by "T&L") gives that job to the GPU. it
allows the GPU to perform lighting calculations accurately across a
polygon while the CPU focuses on more important things. (you might see
this called "dynamic lighting".)

pixel shaders allow for even more amazing dynamic effects with lights in
real-time.

now, about bump-mapping. well, it is what its name implies. it is a
single-color texture map that represents height data. it is processed
based on the position of lights relative to the polygon it's mapped to.
it adds a lot to the realism of a game.

there are numerous algorithms for this, each with its advantages and
disadvantages. nVidia introduced something cool with one of the TNTs
(or maybe GeForce. my history is rusty.) called "register combiners".
this allows developers to do lots of fancy texture tricks like bump
mapping on the GPU.

the basic idea is that light levels are calculated based on whether the
bump map is rising or falling in the direction of the light. if you
want to know more, there are a lot of tutorials out there.
In any case let's assume the code has to work with 24 bytes for a normal.
(x,y,z in 64 bit floating point ).
it's not. 32-bit floats are more commonly used because of the speed
advantage. CPUs are quite slow when it comes to floating-point math.
(compared to GPUs or plain integer math.)
The color is also in r,g,b,a in 64 bit floating point another 32 bytes for
color.

Maybe some other color has to be mixed together I ll give it another 32
bytes...

Well maybe some other things so let's round it at 100 bytes per pixel ;)
now you're way off. the actual data involves 16 bytes per vertex (the
fourth float is usually 0.0), with usually 3 vertices per polygon, and a
plane normal (another 4-part vertex, sometimes not stored at all), with
texture coordinates. that's sent to the GPU in a display list.

the GPU performs all remaining calculations itself and /creates/ pixel
data that it places in the frame buffer. the frame buffer is then sent
to the monitor by way of a RAMDAC.
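
as a concrete illustration of how small the per-vertex data really is when stored as 32-bit floats, here is one plausible layout (position, normal and texture coordinates; not any particular card's native format):

// A typical 32-bit-float vertex layout: 32 bytes, nowhere near 100.
struct Vertex {
    float px, py, pz;      // position            (12 bytes)
    float nx, ny, nz;      // vertex normal       (12 bytes)
    float u, v;            // texture coordinates ( 8 bytes)
};
static_assert(sizeof(Vertex) == 32, "plain floats, no padding");

// One million such vertices resent 70 times a second would be about
// 1,000,000 * 32 * 70 = 2.24 GB/sec.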

So that's roughly 5.1 GB/sec that has to move through any processor just to
do my insane lighting per pixel ;)
assuming a situation that doesn't exist in game development on current CPUs.
Ofcourse doom III or my insane game... uses a million fricking verteces (3d
points) plus some more stuff.

vertex x,y,z,
vertex normal x,y,z
vertex color r,g,b,a

So let's say another insane 100 bytes per vertex.
let's not.
1 Million verteces * 100 bytes * 70 hz = 7.000.000.000

Which is rougly another 7 GB/sec for rotating, translating, storing the
verteces etc.
no. you're still assuming the video card stores vertices with 64-bit
precision internally. it doesn't. 32-bit is more common. 16-bit and
24-bit is also used on the GPU itself to varying degrees.
So that's a lot of data moving through any processor/memory !
it would be, but it's not.
I still think that if AMD or Intel is smart... though will increase the
bandwidth with main memory... so it reaches the Terabyte age ;)
you're misleading yourself now. it's not Intel or AMD's responsibility.
And I think these graphic cards will stop existing ;) just like windows
graphic accelerator cards stopped existing...
they still exist. what do you think your TNT2 does? it accelerates
graphics. on windows. imagine that.
And then things will be back to normal =D

Just do everything via software on a generic processor <- must easier I hope
=D
you still seem terribly confused about what even is stored in GPU memory.

the vertex data is actually quite small and isn't stored in video memory
for very long.

the reason video cards usually come with so much memory is for TEXTURE,
FRAME BUFFER, and AUXILIARY BUFFER storage.

the frame buffer is what you see on the screen.

the Z buffer is the buffer that keeps track of each rendered pixel's distance
from the screen. this is so overlapping surfaces are displayed in the correct
depth order.

auxiliary buffers can be a lot of things: stencil buffers are used for
shadow volume techniques, for example.

textures take up the majority of that storage space. a single 256x256
texture takes up 262,144 bytes. a 512x512 texture takes up 1MB.
1024x1024 is 4MB. textures as large as 4096x4096 are possible (though
not common) - that's 64MB.

and what of 3D textures? let's take a 64x64x64 texture. small, right?
that's 1MB all on its own.

so how big is the frame buffer? well, if there's only one, that's just
fine. but DirectX supports triple-buffering and OpenGL supports
double-buffering. that means 2 or 3 frames are stored at once. they
are flipped to the screen through the RAMDAC.
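
putting rough numbers on that, assuming 4 bytes per texel and 4 bytes per pixel as in the figures above:

#include <cstdio>

// Rough video-memory footprints at 4 bytes per texel / per pixel.
unsigned long long Texture2D(unsigned w, unsigned h)                { return 4ULL * w * h; }
unsigned long long Texture3D(unsigned w, unsigned h, unsigned d)    { return 4ULL * w * h * d; }
unsigned long long FrameBuffers(unsigned w, unsigned h, unsigned n) { return 4ULL * w * h * n; }

int main() {
    std::printf("256x256 texture:          %llu bytes\n", Texture2D(256, 256));       // 262,144
    std::printf("1024x1024 texture:        %llu MB\n", Texture2D(1024, 1024) >> 20);  // 4 MB
    std::printf("64x64x64 3D texture:      %llu MB\n", Texture3D(64, 64, 64) >> 20);  // 1 MB
    std::printf("1024x768 triple-buffered: %llu MB\n", FrameBuffers(1024, 768, 3) >> 20);
    return 0;
}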

and not only must the GPU store and keep track of all that data, but it
PROCESSES IT in real-time with each frame.

your proposal requires we go back to the 200+ instructions per pixel
that games once required. do you expect us to go back to mode 13h where
that kind of computation is still feasible with the same kind of graphic
quality we have now?

for us as developers, the GPU is a Godsend. it has saved us from doing
a lot of work ourselves. it has allowed us to expand our engines to the
point where we can do in real-time what once required vast
super-computer clusters to do over the course of MONTHS. the GeForce 4
(as i recall) rendered Final Fantasy: The Spirits Within at 0.4 frames
per second. it took several MONTHS to render the final movie on CPUs.

one final time: CPUs are general purpose. they are meant to do a lot
of things equally well. GPUs are specialized. they are meant to do one
thing and do it damned well. drivers for those GPUs are written to make
developers' lives easier, and let developers do what is otherwise impossible.

now, i'll close because lightning may strike the power grid at any
moment after the rain passes.
 

Charles Banas

Skybuck said:
Well, such an answer of course won't do for any serious developer.
and the serious developer is more worried about other things than the
implementation details of a particular card. we don't need to know how
the GPU does it, just that the GPU does it and how we can take advantage
of that fact.
The developer has to know how fast a card does this.
no he doesn't. the developer needs to know how much work he can
eliminate to make the whole thing work as efficiently as possible. it
just so happens that the GPU is much better at it than the CPU.
So he can decide to use the card (gpu) or do it on the cpu :p :D
the *ENTIRE* graphical side of an engine can be done on the GPU. ALL of
it. from vertex transformation to complex visual effects and anti-aliasing.

the CPU has other things to worry about. like input, data processing,
AI, audio, networking, task management, memory management, and a host of
other things you've never heard of.
 

joe smith

the basic idea is that light levels are calculated based on whether the
bump map is rising or falling in the direction of the light. if you
want to know more, there are a lot of tutorials out there.

The bump mapping is commonly done using dot3 or dot4, which returns the cosine of
the angle between two unit vectors.. namely the surface normal and the vector from
the surface point to the light source. What is stored in the normal map are the
normal vectors for the surface. It is common practice nowadays to compute the
normal map from higher-precision geometry, which is then discarded, and the
resulting normal map is mapped onto a lower-resolution trimesh.

Just want to clarify this bit so that it is more obvious that the source data
from which the normal map is generated is not input to the GPU dp3/dp4.

To make matters more interesting, the normal map can be in tangent or model
coordinates. In model coordinates it is more "straightforward", as the light
emitters/sources can be transformed to model coordinates and then the
interpolated normal map samples can be used directly for the lighting
computation. This approach lends itself poorly to animated data, say, a
skinned character, because when the trimesh is skinned the vertices move relative
to the model coordinate system. Therefore the normal map is stored inaccurately;
the solution is to store the normal map in "tangent space", which is space
relative to a 3x3 transformation matrix at each vertex of the primitives where the
normal map is, uh, mapped. Often only two basis vectors are stored, because
it is possible to synthesize the third with a cross product from the other two.

*phew*, so the input for the GPU is actually:

for ps:
- texture coordinate for normal map
- sampler for normal map
- light vector in tangent space
- tangent space transformation
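
A CPU-side sketch of what the dot3 ends up computing per pixel, given those inputs. This is only the math, not actual shader code, and the normal-map decode assumes the usual 0..255 to -1..1 packing:

#include <algorithm>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

static Vec3 Normalize(const Vec3& v) {
    float len = std::sqrt(Dot(v, v));
    Vec3 r = { v.x / len, v.y / len, v.z / len };
    return r;
}

// One texel's worth of dot3 bump lighting, everything already in tangent space.
// rgb is the sampled normal-map texel; lightTangent is the interpolated light
// vector that the vertex stage transformed into tangent space.
float Dot3Lighting(const std::uint8_t rgb[3], const Vec3& lightTangent) {
    // Unpack 0..255 into the -1..1 range used to store unit normals.
    Vec3 n = { rgb[0] / 127.5f - 1.0f,
               rgb[1] / 127.5f - 1.0f,
               rgb[2] / 127.5f - 1.0f };
    float ndotl = Dot(Normalize(n), Normalize(lightTangent));
    return std::max(ndotl, 0.0f);          // surfaces facing away from the light get none
}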

I played around with an optimization to the default way a while ago where I
never stored the light vector in tangent space, but rather embedded that
transformation into the tangent space transformation.. yessir, it worked - but
problem: it was more hassle than it was worth, because this way you then needed
multiple tangent space transformations, one for each light source, and
generally it was more efficient (no shit!) to pass on the light vector (and
other information like attenuation) separately instead. Just want to warn
anyone away from this shitty approach. It is pretty useless, but I wanted to
document my error; the point is that when in doubt, try! In that sense the
original poster IMHO is close to the truth..

A lot of design choices are based on information and experience, but
sometimes just trying shit is the only way to "click" new information into
the existing framework of experience.. if That Guy is interested in GPU
programming (why else would he hang around here asking these questions?) I
recommend he tries some GPU programming, it's easy & fun.. and you get
impressive results really easily. :)

MUCH easier than the software rendering days, much... and things work a hell of
a lot better "out of the box" nowadays than just 4-5 years ago! Biggest
problem is getting started.. maybe he'll ask about that.. if not.. I assume
he is already doing whatever he likes to do..
 

Skybuck Flying

joe smith said:
BSP trees are used for getting a perfectly sorted, zero-overlap set of primitives
for the renderer. The GPU does not need that; in fact, performance is hurt seriously
by such "optimization".. like I explained, sorting is a Big No. I must check:
do you know what a BSP tree is and what it is commonly used for? And what the
Quake engine uses it for, specifically? Or are you just throwing buzzwords
around?

Well, let's take a look at the doom 3 alpha... it runs slow... I don't know why it
runs slow.

I do know that id software's engines are some of the fastest engines on the
face of the planet for games :) with decent graphics =D

Looking at doom 3 I see the following:

+ portal engine
+ bsp trees
+ possibly backface culling.

Ok, now I have been under a rock for the past 5 years or so lol when it
comes to graphics.

I was amazed doom 3 still does all this on the cpu, but I could be wrong.

Some parts of these algorithms could be done inside the gpu.

I can imagine that a gpu can 'cut' triangles with the viewing frustum?
(Yet I think you say this is bad or something?)

Also, new vertices can not be created inside a gpu? or maybe they can?
because clipping a triangle with any plane can introduce new vertices.

I saw some other PowerPoint presentation... that vertices can be clipped
??? or maybe they mean projected onto a plane or something ???

How can vertices be clipped and be useful since now their coordinates are
totally different? seems weird to me... seems like objects would get
deformed...

I can see how a line can be clipped or a triangle... but a vertex? huh?

So one question is:

Can gpu's clip lines, triangles (maybe even vertices?) against the frustum
or any other plane or triangle?

I can also see how a gpu could do backface culling.

But I am guessing the portal engine and the bsp trees are still inside the
cpu.

I know that bsp trees are used to detect if a certain wall is behind another
certain wall so that the wall that is behind can be skipped/not drawn.

Portal engines are used to skip entire rooms.

It also looks like doom3 is using some technique to wrap objects inside
boxes... and then probably do a simple 'is box in view' test... does that
technique have a name which is easy to remember ;) ? :)

Bye,
Skybuck.
 

Skybuck Flying

Stephen H. Westin said:
Au contraire. GPU's may run at lower clock speeds, but they compute
far faster than CPU's. And that speed is increasing much faster than
for CPU's. Because of this, many are working on ways to implement
traditional CPU computations on GPU's. See, for example,
<http://www.gpgpu.org/>. The advantage of the GPU is that it doesn't
have to implement the traditional von Neumann model of a single
processor executing sequentially from a large memory that can all be
accessed at the same speed and that holds both program and data.

Are you suggesting or anybody else... that we just give up on cpu's and
start using gpu's ?

And just throw away 30 years of assembler/compiler/development environments
and software libraries ?

Why not rather increase cpu performance so that all this 30-year-old proven and
tested stuff can still be used and even run faster? =D

Bye,
Skybuck.
 

Stephen H. Westin

Skybuck Flying said:
Are you suggesting or anybody else... that we just give up on cpu's and
start using gpu's ?

No. But you were suggesting giving up on GPU's because you thought
they were slower, and developing more slowly, than CPU's. I was giving
concrete evidence that the opposite is true: GPU's are faster, and
developing faster, so people have an incentive to go the other
direction.
And just throw away 30 years of assembler/compiler/development environments
and software libraries ?

That would be over 50 years, actually. And GPU's use compilers,
too. An alumnus from this program visited a couple of weeks ago and
showed us nVidia's IDE for their just-announced chip.
Why not rather increase cpu performance so that all this 30-year-old proven and
tested stuff can still be used and even run faster? =D

Whoops, I got caught again, troller.
 

Ralf Hildebrandt

Skybuck said:
Are you suggesting or anybody else... that we just give up on cpu's and
start using gpu's ?

Yes - iff (if and only if) you have a problem to compute that fits a GPU
well.

In former times you had two major options, if you had a problem to compute:
1) you could take general purpose CPUs, or
2) build an individual ASIC (e.g. neural network processors, systolic
arrays...)

Option 1 is a mass-product, relatively cheap and flexible (it can be reused
even for a different problem).

Option 2 gives you a highly optimized (and therefore faster) solution,
but one that may be incapable of solving different problems.


Today GPUs are a mass-product, but they have structures that are better
optimized for a specific problem than a CPU. So they _may_ be a
compromise between the two mentioned options.

And just throw away 30 years of assembler/compiler/development environments
and software libraries ?

Is your world only black and white?



Ralf
 

Skybuck Flying

Stephen H. Westin said:
No. But you were suggesting giving up on GPU's because you thought
they were slower, and developing more slowly, than CPU's. I was giving
concrete evidence that the opposite is true: GPU's are faster, and
developing faster, so people have an incentive to go the other
direction.


That would be over 50 years, actually. And GPU's use compilers,
too. An alumnus from this program visited a couple of weeks ago and
showed us nVidia's IDE for their just-announced chip.

Seems like re-inventing the wheel to me dude =D

How long is it going to take before programming for gpu's faces the same
problems as cpu programming...

out of range errors, buffer overflows, exceptions etc... complexity
management.

Seems like the same old shit all over again =D

And it's (ughhhhly) C only :( :D ;)

Where is my delphi compiler for gpu's ? :)

I would say 10% trolling and 90% I have a point ;) :p =D
 
