While we are discussing the new operator, different question about it...

Stephan Rose · Jul 7, 2006

I am currently working on an EDA app and working hard on squeezing
the last bits of performance out of it, going as far as sending
batches of geometry to the video card while still processing geometry
to get some parallelization going. At least on my hardware, though,
this does not buy me much. I luv my two 7800 GTs in SLI =) But for
users with a lower spec video card, this may actually be of help.

I am also going ahead and running two render threads each processing
half the geometry to make use of hyperthreading or dual core if
available. Bought me a few ms rendering speedup!
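
For anyone following along, here is a minimal sketch of the "two threads, half the geometry each" idea. It is not Stephan's code: SplitRender, ProcessRange and the int[] input are hypothetical stand-ins for the real per-item geometry work.

```csharp
using System;
using System.Threading;

public static class SplitRender
{
    // Hypothetical stand-in for the real per-item geometry processing.
    static long ProcessRange(int[] items, int start, int end)
    {
        long sum = 0;
        for (int i = start; i < end; i++) sum += items[i];
        return sum;
    }

    // Processes the second half on a worker thread and the first half on the caller.
    public static long ProcessInTwoHalves(int[] items)
    {
        int mid = items.Length / 2;
        long first = 0, second = 0;

        Thread worker = new Thread(() => second = ProcessRange(items, mid, items.Length));
        worker.Start();

        first = ProcessRange(items, 0, mid);

        worker.Join(); // wait for the worker before reading its result
        return first + second;
    }
}
```

Each half only writes to its own result variable, so no locking is needed in this sketch; a real renderer sharing an output buffer would need its own synchronization or separate buffers per thread.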

Next, instead of using a generic List<Vertex> to store my created
triangles, I now use a static Vertex[] array. Once it fills up, the
data is committed to the video hardware and the next batch is
processed. I was half expecting a speedup here already, since the
overhead of calling .Add on the List is gone, but it did not make
any significant measurable difference.
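
The fill-and-commit scheme described above might look roughly like this. A sketch under assumptions: VertexBatcher, its capacity, and FlushCount are made up for illustration, and the actual upload to the video hardware is left as a placeholder comment.

```csharp
using System;

public class VertexBatcher
{
    // Hypothetical two-field vertex; the real struct has more members.
    public struct Vertex
    {
        public float x, y;
        public Vertex(float x, float y) { this.x = x; this.y = y; }
    }

    private readonly Vertex[] polys;
    private int currentPoly;

    // Number of batches committed so far (for illustration only).
    public int FlushCount;

    public VertexBatcher(int capacity) { polys = new Vertex[capacity]; }

    public void Add(Vertex v)
    {
        polys[currentPoly++] = v;
        if (currentPoly == polys.Length) Flush(); // buffer full: commit the batch
    }

    public void Flush()
    {
        if (currentPoly == 0) return; // nothing pending
        // Placeholder: a real renderer would upload polys[0..currentPoly) here.
        FlushCount++;
        currentPoly = 0;
    }
}
```

With a capacity of 3, adding 7 vertices commits two full batches and leaves one vertex pending until the next Flush().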

So to finally get to my question of the new operator, the next thing I
am looking at is how I am assigning data to my vertex list.

Currently this looks as follows:

polys[currentPoly++] = new Vertex(...);
polys[currentPoly++] = new Vertex(...);
polys[currentPoly++] = new Vertex(...);

and so on...there are quite a few cases where I have multiple lines of
assignments like that, generally always in sets of 3. Triangles are
just weird that way =)

Now out of all my drawing functions, the one that gets called the
most is my function to render a triangulated line with round caps.
So I took this function apart and did the following:

polys[currentPoly].x = coordinate;
polys[currentPoly].y = coordinate;
polys[currentPoly].remaining parameters = values...;
currentPoly++;

polys[currentPoly].x = coordinate;
polys[currentPoly].y = coordinate;
polys[currentPoly].remaining parameters = values...;
currentPoly++;

polys[currentPoly].x = coordinate;
polys[currentPoly].y = coordinate;
polys[currentPoly].remaining parameters = values...;
currentPoly++;

Repeat as necessary for all the assignments. I was expecting that
eliminating the new operator and subsequent copy of the vertex
structure would give me a speed up if I assign the parameters
directly.
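
To make the two variants concrete, here is a small sketch that fills an array both ways. The x and y fields come from the post; the color field and the FillStyles class are hypothetical stand-ins for the "remaining parameters".

```csharp
public static class FillStyles
{
    public struct Vertex
    {
        public float x, y;
        public int color; // hypothetical stand-in for the remaining parameters

        public Vertex(float x, float y, int color)
        {
            this.x = x;
            this.y = y;
            this.color = color;
        }
    }

    // Variant 1: construct a Vertex and assign it to the array slot.
    public static int FillWithNew(Vertex[] polys, int currentPoly, float x, float y, int color)
    {
        polys[currentPoly++] = new Vertex(x, y, color);
        return currentPoly;
    }

    // Variant 2: write each field of the array slot directly.
    public static int FillFieldByField(Vertex[] polys, int currentPoly, float x, float y, int color)
    {
        polys[currentPoly].x = x;
        polys[currentPoly].y = y;
        polys[currentPoly].color = color;
        currentPoly++;
        return currentPoly;
    }
}
```

Both variants leave the array in the same state; whether one is faster depends on what the compiler and JIT generate, which is what the question below is about.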

I was rather surprised, both pleasantly and not, to see it made no
difference. It's nice because the new Vertex() way is more readable
code-wise. But...I would have really liked to have seen a performance
improvement.

So what exactly does the new operator do in this case? Does the
compiler somehow optimize the new operator away and generate code to
assign the values directly, the way I tried to by hand, to avoid
creating a struct and copying it? Technically it is a possibility,
since it is assigning identical value types to each other...so it
does know what's ultimately going to happen.

Just curious...

And damnit..I now need to find something else to do to get more
speed!! =)

--
Stephan
2003 Yamaha R6

kimi no koto omoidasu hi
nante nai no wa
kimi no koto wasureta toki ga nai kara

Guest · Jul 7, 2006

Very nice. Now, kindly post me a Mandelbrot vertex shader so I can make a
nice Fractal screensaver with DirectX 9c.
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


Jon Shemitz · Jul 7, 2006

Stephan said:
So what exactly does the new operator do in this case? Does the
compiler somehow optimize the new operator away

Reflector is your friend - <http://www.aisto.com/roeder/dotnet>. The
smart thing to do is to look at the generated code and see for
yourself, instead of asking questions and waiting for replies.

In this case, yes, calling new on a struct constructs in place. That
is, it does NOT create a temporary struct, initialize it, and then
copy it to the struct you're assigning. Rather, code like

Point P = new Point(1, 2);

allocates space for P, and passes a pointer to P to the Point
constructor, as the this reference. The constructor then sets this.X
and this.Y in the normal way.
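
As an illustration of the above, the two methods below should end up with the same Point. This is only a sketch: InPlaceDemo and its Point struct are made up here, and the exact IL is an assumption worth checking in a disassembler. As far as I know, a freshly declared local is typically initialized in place, while an assignment to an existing variable or an array element may go through a temporary that the JIT may or may not eliminate.

```csharp
public static class InPlaceDemo
{
    public struct Point
    {
        public int X, Y;

        // Inside a struct constructor, 'this' refers to the storage being initialized.
        public Point(int x, int y)
        {
            X = x;
            Y = y;
        }
    }

    // Initialization through the constructor.
    public static Point Make()
    {
        Point P = new Point(1, 2);
        return P;
    }

    // Roughly the same effect written by hand: assign every field of the local.
    public static Point MakeByHand()
    {
        Point P;
        P.X = 1;
        P.Y = 2;
        return P;
    }
}
```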

Greg Young · Jul 7, 2006

How are you measuring that it made "no difference"?

Cheers,

Greg


Stephan Rose · Jul 7, 2006

Greg Young said:
How are you measuring that it made "no difference"

The average measured amount of time (using QueryPerformanceCounter)
it took to render the identical data was no different than before. =)
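
For reference, a minimal sketch of averaging a timing over many runs in managed code. It uses System.Diagnostics.Stopwatch, which on Windows normally sits on top of the high-resolution performance counter. The Timing class, the warmup parameter, and the work delegate are hypothetical; the delegate stands in for the render call being measured.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Returns the average wall-clock time per call of 'work', in milliseconds.
    public static double AverageMilliseconds(Action work, int warmup, int runs)
    {
        // Warm-up calls so JIT compilation does not land in the measurement.
        for (int i = 0; i < warmup; i++) work();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) work();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

A couple of runs is enough to spot a 120ms versus 70ms change; smaller differences generally need many more runs before the average means much.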


Greg Young · Jul 7, 2006

average measured for 1 run, 5000, 1000?

Have you looked at the optimized JIT output yet?

Cheers,

Greg

Stephan Rose · Jul 7, 2006

Greg Young said:
average measured for 1 run, 5000, 1000?

Just a couple of runs. At the moment I am not really concerned about
speed increases in the sub-10% range, where I would need a lot of
timing accuracy to notice any improvement.

I am more concerned with improvements like the one I managed today:
reducing a rough 120ms average down to a roughly 70ms average by
finding a spot to optimize in my polygon triangulator. With a gap in
time that large, I know I made a significant improvement.

Greg Young said:
Have you looked at the optimized JIT output yet?

No, I haven't. I had actually been meaning to ask that one of these
days: where can I look at the JIT output?


Greg Young · Jul 7, 2006

The new operator is one of those places. It will not be a huge
performance gain, but it should offer a slight one.

Cheers,

Greg
