RAMGATE in full swing. (GPU/Cuda Memory Bandwidth Performance Test now available) nice for GTX 970 t


Skybuck Flying

I had some hopes that maybe RAMGATE wasn't going to be so bad... but in
reality it's quite BAD.

Seems like NVIDIA's latest GTX 970 graphics chip is seriously bugged.

It has 8 processors packed in a 4x2 package, sort of.

One of those packages has one chip missing. So that chip is running at 50%
capacity.

This would normally reduce the entire system speed to just 4x instead of 8x.
Cause it would be the bottleneck.

Apparently they found some way to still use 7 out of 8 processors. So the GTX
970 is a 7x chip.

However they got a bit greedy... they also wanted that last, 8th 500 MB
segment.

I don't quite understand why they did not simply connect all of the 8x500 MB
ram chips to the 8 processors and 7 caches in such a way that the full
addressing range of 4 GB could be used.

I can vaguely understand that chip 4 out of 4 would be bottlenecked... but
by sending 1/7 of the workload there and the other 6/7 to the other 3x2
chips the system should be able to run somewhat decently at 7x the speed...
especially since processors are normally faster than memory chips.

Anyway let's look at it one more time:

http://www.pcper.com/reviews/Graphi...Full-Memory-Structure-and-Limitations-GTX-970

I suspect they made the decision to segment the space like ms-dos once
did... to prevent queuing/sandman/sand clock effect because of a missing L2
cache chip.

But the funny thing is... the cache chip is probably not even that important
for some things at least. Then again maybe it is but still...

Now they will probably have to limit the chip to just 3.5 GB to limit access
to the 0.5 GB slow part.

Odd thing is by accessing both segments at the same time, they claim it can
still access full speed, which is of course a bit cheesy... what software
would actually do that ? Probably none...

Their chip memory requests go something like 0,1,2,3,4,5,6,0,1,2,3,4,5,6,

instead of 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7

So one DRAM chip is probably being skipped completely most of the time
whenever something is being accessed below 3.5 GB ?!
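
To make the request pattern above concrete, here is a minimal Python sketch of the address-to-channel mapping as this post describes it. It is a toy model, not NVIDIA's actual scheme: the 1 KB stripe size and the names `dram_channel`, `STRIPE_BYTES` and `FAST_BYTES` are assumptions made up for illustration.

```python
GIB = 1024 ** 3
STRIPE_BYTES = 1024           # assumed stripe size, purely illustrative
FAST_CHANNELS = 7             # channels 0..6 serve the fast segment
FAST_BYTES = 7 * GIB // 2     # 3.5 GB fast segment
TOTAL_BYTES = 4 * GIB         # full 4 GB address range


def dram_channel(address: int) -> int:
    """Return the DRAM channel (0..7) serving a byte address in this toy model."""
    if not 0 <= address < TOTAL_BYTES:
        raise ValueError("address outside the 4 GB range")
    if address < FAST_BYTES:
        # Fast segment: consecutive stripes rotate over channels 0..6 only.
        return (address // STRIPE_BYTES) % FAST_CHANNELS
    # Slow segment: every access lands on the eighth channel.
    return 7


if __name__ == "__main__":
    # Channels touched by the first 14 stripes, then by the first slow byte.
    print([dram_channel(i * STRIPE_BYTES) for i in range(14)])
    print(dram_channel(FAST_BYTES))
```

In this model channel 7 is never touched below 3.5 GB, which is the "skipped chip" effect guessed at above.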

I find this a somewhat weird decision.

But maybe they tested it and came to the conclusion that the missing cache
chip has such a bad influence on total system performance they decided to
segment it.

Kinda funny... for my own corewars app the cache is almost useless so... I
guess they had games in mind... maybe they can re-configure the card for
either gaming or cuda.

I wonder if it's hard-wired or if it's somewhat flexible. If it's flexible
they'd be a little bit in luck... if it can be done in the driver then they're
lucky too... but if not... they got a big shit storm coming LOL.

I am touchy/feely about RAM cause it's the main bottleneck of
would-be-applications.

DON'T **** WITH RAM EVER ! =D

Well they ****ed it... now they gonna be ****ed LOL.

Their solution is interesting from a processing perspective but from a
gamer/stutter perspective it's currently horrible, unless they can fix the
stutter.

If they can't fix the stutter they ****ed.

Anyway what's worse is they now have an admin behaving like a NAZI or
something and deleting postings from RAVING nvidia customers.

What's even more funny is they deleted my posting about my new RAM BANDWIDTH
test as well ?!

I kinda wanted to write a RAM BANDWIDTH test like this for a long time
now...

But seeing this horrible situation I can no longer stick my head in the
sand.

I had to write a tool to help people figure this out, inspired by Nai's
benchmark.

Some interesting discoveries have been made by me with this tool.

Apparently NVIDIA may have more dirty secrets they don't want anybody to
know about, hence their possible reason for deleting most of the postings about
my new tool.

Anyway the discovery is: As more and more "cuda memory objects" are created
the cuda memory becomes slower and slower and slower.

I am not yet sure why this is ? Is the driver being "taxed" because of all
the objects ? Is it perhaps processing a "List of Memories" ?

I could instead allocate one gigantic block and treat it as if it were
multiple memory blocks... just to see if the multiple allocations are causing
the slowdown.
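
The one-gigantic-block experiment could be sketched roughly like this in Python. The helper `carve_blocks` is hypothetical and only does the offset bookkeeping; in the real test each (offset, size) pair would be applied to a single device allocation (for example one `cudaMalloc` call) instead of creating a separate memory object per block.

```python
MIB = 1024 ** 2


def carve_blocks(total_bytes: int, block_bytes: int) -> list:
    """Split one big allocation into (offset, size) pairs of equal-sized blocks."""
    if block_bytes <= 0:
        raise ValueError("block size must be positive")
    count = total_bytes // block_bytes   # any leftover tail is simply not used
    return [(i * block_bytes, block_bytes) for i in range(count)]


if __name__ == "__main__":
    # Example: treat a single 1 GB allocation as eight 128 MB test blocks.
    for offset, size in carve_blocks(1024 * MIB, 128 * MIB):
        print(f"block at offset {offset // MIB} MB, size {size // MIB} MB")
```

If the measured speed still drops at higher offsets with only one allocation, the number of memory objects is probably not the cause; if the drop disappears, the allocations themselves are the more likely suspect.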

Another theory could be that the higher memory addresses are being hosted on
slower and slower chips, maybe to save costs ?

However I think I can vaguely remember reading about higher addresses being
slower in some pdf some long time ago ? Perhaps I am being delusional ?

Well whatever the case may be... now that RAMGATE is in full swing and we
are talking about a 7/8 of a performance drop you can bet your ass more and
more and more people are going to jump on this and investigate graphics
cards memory... closer and closer...

Apparently other nvidia cards also had some dirty tricks... with "unbalanced
memory". It wasn't as bad that time...

I also have been wondering why cheaper graphics cards always have less
bandwidth... It's easy to say well... that's just because of less chips on
it etc... but is it really ?

Is it really that expensive to give more bandwidth to GPUs ? For now I will
believe it... but hmm... Why not feed 512 bits from a single DRAM chip
across 512 pins ? Why would that be bad ? Too much energy cost for 512 pins
? Weird... I guess it needs to be processed anyway... but at least there is
some caching effect that way... now it just seems to be 8x64 bits or so = 512 ?
Don't think so. Or perhaps 4x64 bits, or 8x32 bits = 256 bits.

It says: "32 bit memory controller segment" so apparently all these chips
are "32 bit" addressable. Perhaps older chips from the 32 bit addressing
era... Perhaps 64 bit addressable chips cost more... hmm...
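
For the bus-width arithmetic, a small Python sketch may help. As far as I understand it, the "32 bit" in that diagram is the width of each memory controller's data path rather than an addressing size, so 8 controllers x 32 bits gives the 256 bit bus. The 7 Gbps per pin data rate used below is the commonly quoted figure for the GTX 970's GDDR5 and is treated here as an assumption; `peak_bandwidth_gb_s` is a made-up helper name.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    DATA_RATE_GBPS = 7.0   # assumed effective data rate per pin
    for label, controllers in [("full bus", 8), ("3.5 GB segment", 7), ("0.5 GB segment", 1)]:
        width = controllers * 32   # each controller contributes a 32 bit data path
        print(f"{label}: {width} bits -> {peak_bandwidth_gb_s(width, DATA_RATE_GBPS):.0f} GB/s")
```

Under these assumptions the fast segment tops out at 7/8 of the full-bus figure and the slow segment at 1/8 of it.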

Well anyway... I don't like ADMIN NAZIS at all... and I am most certainly
not going to put up with ANY of that CRAP what so ever EVER.

So once again I will have to fall back on usenet to spread the message =D

There was a saying once: "Don't shoot the messenger" LOL.

There's a lot of "messenger shootings" nowadays on FORUMS LOL.

Here is my Test CUDA Memory Bandwidth Performance application, with nice
gui, block size setup, round setup, graph/chart, log/error messages, kernel
source and ptx source.

The packed folder contains a winrar file containing the 3 files, 2 of them
are necessary to run the application (*.exe and *.ptx).


http://www.skybuck.org/CUDA/BandwidthTest/


My GT520 is showing approx 1.5 GigaByte/sec bandwidth with this float4
kernel and a short running time. It should be able to achieve 9 GigaByte/Sec
so not sure why it's so low... (maybe kernel launch parameters could be
better) I am kinda curious what other graphics cards will show.
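
For anybody comparing numbers, here is a minimal Python sketch of how a bandwidth figure like this is usually derived from a timed run. It is not the tool's actual code: the function names and the example byte counts are assumptions picked only to reproduce the 1.5 and 9 GigaByte/sec figures mentioned above.

```python
def effective_bandwidth_gb_s(bytes_read: int, bytes_written: int, seconds: float) -> float:
    """Effective bandwidth in GB/s: all bytes moved divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (bytes_read + bytes_written) / seconds / 1e9


def fraction_of_peak(measured_gb_s: float, peak_gb_s: float) -> float:
    """How much of the expected peak a measurement reaches (0.0 to 1.0)."""
    return measured_gb_s / peak_gb_s


if __name__ == "__main__":
    # Hypothetical run: 750 MB read plus 750 MB written in one second.
    measured = effective_bandwidth_gb_s(750_000_000, 750_000_000, 1.0)
    print(f"{measured:.2f} GB/s, {fraction_of_peak(measured, 9.0):.0%} of the expected peak")
```

Two things that can push such a figure down, and might be worth ruling out: counting only the reads or only the writes, and timing a run so short that launch overhead dominates.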

Also this is a first version/release (0.03), maybe later I will update it a
little bit, so it has some better launch parameters/optimal calculation
support for newer graphics cards, for now this will have to do ! ;)

The unpacked folder contains the 3 files unpacked in case anybody is having
troubles with extracting them.

I just added a little "save chart to file" button, which saves the chart
into two files, one "bitmap" and one "wmf", which is a Windows vector graphics
format that is much smaller. Basically all windows systems should
be able to read that file, you could then open it in ms-paint and re-save it
as a jpg or so.

Here is an example of a single run:

Here is an example of multiple runs:


And finally I will convert the single one to jpg so it can be shown here:


I hope you enjoy it... maybe this little app will shine some more light on
things ;) :)

I'd be curious to see some charts of GTX 970 and perhaps other models like
that as well to see if there is indeed some truth to it all ?! ;) :)

Best of all check the setup tab of my app. It allows other block sizes to be
tested as well... most interesting graphs are then rendered.

(The "memory: 1 GB" title is rounded to whole numbers for now, so either 1 or 2 or
3 or 4 GB and so forth... no 3.5 GB or so will be mentioned... Didn't have
time yet to code that properly but it's a minor issue... just a title above
the graph... so if you do have a graphics card memory system with 0.x and
you are wondering why this tool doesn't state the full correct fractional
number... now you know.)
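
A fractional title is only a few lines of work. The Python sketch below shows one way it could look; `memory_label` is a hypothetical helper, not the tool's real code, and it assumes 1 GB means 1024^3 bytes.

```python
GIB = 1024 ** 3


def memory_label(total_bytes: int) -> str:
    """Format a memory size for the chart title with one decimal place."""
    return f"memory: {total_bytes / GIB:.1f} GB"


if __name__ == "__main__":
    print(memory_label(7 * GIB // 2))   # a 3.5 GB segment
    print(memory_label(1 * GIB))        # a plain 1 GB card
```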

I've been up for a while... did a lot of enjoyable reading into this
issue... and this posting is kinda long, mixed with text from the forum and
of course this new text.

And I am kinda feeling fuzzy... and you know me... my usenet postings are
often fuzzy... I like fuzzy... don't you just like fuzzy ?! =D

EEEEEEEhhhhhummmm what more can I write for your entertainment oh yeah
that's right... the fun I had reading about this RAMGATE !!!!!!!!!!!!!!!

I JUST LIKE TO SAY ONE THING AND ONE THING ONLY:

"BIG KUDOS/APPLAUSE to the people that figured this (one) OUT ! Cause this
was one hell of a bitch to discover.. IT WAS UNDOCUMENTED maybe COVERED
UP... it was questioned... it was disbelieved by some... certainly not me...
WHEN MY FELLOW GAMERS NAG ABOUT STUTTER I BELIEVE THEM.... CAUSE THESE GUYS
OWN THE GAMING WORLD YEAH... THEY CAN SPOT A SNUTTERY SNOT SNOT SNOT STUTTER
FROM FAAAAAR AWAY THE OTHER SIDE OF THE GALAXY !!! LOLOLOLOLOLOLOLOLOL =D
just like mmmmmmmmmmmmmmmmmmmmmmmmeeeeeeeeee YEAH LOL" =D

We gamers are really sensitive to "MICRO STUTTER" or any "STUTTER" for that
matter. When ya think about it, it's kinda funny that NVIDIA thought they
could hide this STUTTER ?!

It's a bit like trying to hide a SPOOK/GOON right in front of RAMBO's NOSE
LOLOLOLOLOL.

"HEY, ? HEY ? HEYYYYYY ? What's that I SMELL ?" Rambo goes.... "IS THAT A
GOON I SMELL" LOLOL.

And you know what... that guy at GURU 3D that had an ICON/avatar like CLINT
MOTHA ****ING EASTWOOD ! WAS FOKKING
BRILLIANT/HILARIOUS/AWESOME/UNFORGETTABLE !

It was like BIG MISTER CLINT EASTWOOD himself FROM THAT WILD WEST MOVIE
came to clear things up =D YEAH BABY =D

TITITITIT TUT TUT TUUU TITITITITI TU TU TUUUUU ! =D

And then add a bit of Dirty Harry entering a time machine into the wild west
movie: "The Good, The Bad, And The Ugly" and then saying with a real
bad/dark deep beer/smoke/raw out of bed cracked up voice:

"HRRREEYYY NVIDRRIA ?! Wrrhartt's THRRRAT I HEEEAARRRRR? ! Yourrrrrr
rrrsselling BAD GRAPHICS CARDS I HEAR ?!"

"HHMMRRR YOU FEEEEL LUCKY PUNK ??!!!

"I know whatrrr youurrr thinkingrrr ?!"

"Willll.... or willl they not discover the memory bug issue ?!"

"I know whatrrr yourr thinking ?!"

"Will they run at full speed 8x times... or will they drop back to 1x time
and notice it ?!"



"Well whatta ya think punks ?!" "Huhh" "You think we not gonna detect it !?"

This here is a bad ass bug detector... It will blow your cover up wide open
! =D HAHAHAHAHAHAHA =D

Oh yeah... just making fun of this whole situation and especially nvidia...
sorry can't resist... this is so bad... it must be published.

Meanwhile my outlook express editor is also behaving badly... probably
because of graphics/wmf or maybe an accidental key press somehow ****ed it
up.

Maybe something with shift or something

Anyway my Clint Eastwood impression was better the first few times or so...
actually I have it on video camera... when I was watching that forum with
it... cause it's something legendary... absolutely... this is going to be stuff
of legends... and you bet I got it on video tape ! =D HAHAHAHAHAHA =D

No matter what gets deleted... it's on my video tape you get it ! =D
OOOHHHHHHH yeah =D

Gonna sit this one out and watch it play out... but maybe meanwhile I'll even
improve my bandwidth test program.

It's just bbbbbbbb bbb bbbuuu buuu beautiful this whole thing... hopefully
this will cause more focus on RAAAAAMMMMMMMMMMMMMMM

We sure need better RRRRRRRRRRRRAAAAAAMMMMM now and for ever at least the
coming decades !!

MORRRRRRRRRRRRRREEE RRRRAAMMMM SPEED PLS !!!!!!!!!!!!!!!!!!!!!!!!

NOT LESS !

NVIDIA got it backwards this time !

It felt like a time jump back to ms-dos time.

No sorry, I certainly don't wanna experience a time machine in that way !
NO-NO-NO=NO=NOOooooo

THAT-WOULD-BE-BAD.

EVEN BILL G. SHUDDERS at the THOUGHT of THAT... I BET ! hahahahahahahahah.

No segment/offset or segmentation bullshit EVER AGAIN !

NNNNNNNNNNNNNOoooooooo

It's programmer horror.

Just when ya thought the segmentation horror days were over... IT GETS EVEN
WORSE !!!!!!!!!!! AAAAAAAAAAAAAAHHHH

Now you don't even know it's segmented PERFORMANCE WISE ?!

AAAAAHHH

THE RAM IS THERE... BUT THE SPEEEEEEEED IS NOT ?! NO WHERE TO BE FOUND ?!!!

At certain addresses ?!!!
AHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Just the idea of that makes me
go:AAAHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Just to be clear one more time ^

Yeah this BURN IS GOOD I SAY !

BURN THIS WITCH !

AND BURN IT FAST =D

MY RECOMMENDATION FOR THE GTX 970 DESIGN: BURN IT AND NEVER EVER LOOK BACK
AT IT EVER AGAIN =D NOT AS LONG AS YOU CANNOT GUARANTEE CONSISTENT
PERFORMANCE OF THE RAM SYSTEM !

There ya have it again folks: I hate inconsistency, that's my sentence as a
software programmer.

A hardware (nintendo or was it the cell chip designer ?) programmer/designer
once said: "I consider symmetry to be an esthetic".

It's basically the same thing:

8 processors, 8 caches, 8 ram chips.

That's symmetry.

8 processors, 7 caches, 8 ram chips is asking for disaster LOL.

And now NVIDIA is in BIG disaster LOL =D

It was a nice try... and a failed try.

LET'S BURN IT, AND LET'S BURN IT REAL GOOD.

If it was a terminator having to recover from 1 failed cache it might be
nice, but for a gaming system... NOPE.

NO STUTTERY ARNOLDS.

Break that piggy, them cents gonna be necessary for lawyers. Most probably
YES ! ;) =D


Total recall might be necessary too.

Bye,
Skyburn(the-witch-gtx-970.)
 

Skybuck Flying

And Skybuck wouldn't be Skybuck, if Skybuck didn't make up for the missing
image links.

Apparently they somehow got filtered out... I was kinda hoping outlook
express would auto-convert them to plain text links instead of images but
guess not.

So I will just give you a link which you could have found yourself anyway if
you followed the other link.

But just to be clear... here there will be pictures of charts...... charts
of performance... boooeyeah:


http://www.skybuck.org/CUDA/BandwidthTest/Charts/


Yeah and if you guys don't have webspace or don't want your name attached to
pictures send them to me and I'll host them for you.

[email protected].

Remove one dot after each symbol and you should be good to go:

(e-mail address removed)


One more time e-mail address in cryptic mode:

skybuck
2000
at
hotmail
dot
com

Send me stuff (hopefully your charts :)) and peace out mothers ! YEEEEEAAAH
=D

Bye,
Skybuck.
 

Skybuck Flying

Total recall might be necessary too.

"
You're a goddamned total retard.
"

Lol no, you're welcome to try and make me look like a retard, but the only
retard in this story is the person(s) that designed this stuttery RAM system
! LOLOLOLOLOLOL.

If this doesn't get you fired then I don't know what will ! ;)

Bye,
Skybuck.
 

Skybuck Flying

Hmmm...

Interestingly enough... I just came across a message in a thread on the
nvidia forum, the thread that has now been read 750,000+ times in just a couple of
days.

The message gave an explanation for the missing messages on the forum. An
explanation that I have not heard before.

Apparently the nvidia forum had a functionality, where people could file
"reports/complaints".

Just like in games.

And just like in games, apparently this functionality got abused, and people
started hitting "reports/complaints" just to piss people off and sabotage
the forum.

If enough reports/complaints were filed then the message would be
automatically hidden.

Whether this is the true reason for the removal of certain messages we will never
know, but it does sound a little bit plausible.

Apparently this functionality has now been disabled to prevent further
abuse.

Let's take all this information with a grain of salt.

But I do like mentioning it, because it's the first time I heard an
explanation like that on a forum !

And I am LOLLING at those people hitting those report buttons ! LOL.

Nice strategy LOL.

Bye,
Skybuck =D
 
