Question about function templates and performance

C

colin

Hi,
I profile my code and find its spending a lot of time doing implicit
conversions from similar structures.

the conversions are mainly things like this

class Point
{
implicit conversion to vector3; //this conversion just returns positon
Vector3 position;
..lots of stuff
}
struct vector3 ///part of library
{
float X,Y,Z;
}
struct lineV3
{
Vector3 A,B;
}

class Wire
{
implicit conversion to lineV3; //this conversion has to create a new
struct
Point A,B;
...lots of other stuff
}

Say I have want a function wich determines the angle between two lines or
wires,
how do I create a generic function wich avoids the implicit conversion of
wire to lineV3 ?
at the moment the function just takes lineV3 and when a wire is passed it
does the conversion
but this is slow, I could copy paste the function to have one for both but
how do I make it generic ? (without adding much code)

I could add an interface to lineV3 and Wire, but then id have to access the
vector3 via properties,
Am i right in thinking this would make it slower or would it get optimised
out if its just simply returns an element?


Colin =^.^=
 
P

Peter Duniho

I profile my code and find its spending a lot of time doing implicit
conversions from similar structures.

Really? What is "a lot of time"?
[...]
Say I have want a function wich determines the angle between two lines
or wires,
how do I create a generic function wich avoids the implicit conversion of
wire to lineV3 ?

I'm skeptical that the data conversion would consume a significant portion
of the time it takes to calculate that angle (I'm assuming it's doing the
usual dot-product, divide by product of vector magnitude, inverse cos).
What percentage are we really talking about here?
at the moment the function just takes lineV3 and when a wire is passed it
does the conversion
but this is slow, I could copy paste the function to have one for both
but
how do I make it generic ? (without adding much code)

It's difficult to answer the question without an actual example of the
code you're using now.

But, if you can come up with a way to represent the commonality between
the classes that you want to have share the same code, you could put that
commonality into a base class from which the using classes are derived,
and then write a generic method that constrains the type parameter to the
base class.

If you can't do this (for example, maybe the numerical types aren't
actually the same) then short of having a single method that checks the
type of the passed in data and then does the appropriate thing according
to the type, I think you're out of luck. The C# generic stuff doesn't
provide for generic application of arithmetic operators.

Again, with a complete example, it'd be easier to provide better advice.
I could add an interface to lineV3 and Wire, but then id have to access
the
vector3 via properties,
Am i right in thinking this would make it slower or would it get
optimised
out if its just simply returns an element?

Properties do in fact get inlined if they are simple enough. It's not
clear from your question that you could use properties to do this, but if
you can get around the issue with properties, that might be one way to
address the problem.

Pete
 
B

Bram

Hi Colin,

It's possible to create implicit casts like this:

public struct Vector3 {
public double X, Y, Z;
}

public struct Point {
public double X, Y;

public static implicit operator Vector3(Point p) {
return new Vector3() {
X = p.X, // (C# 3.0 syntax)
Y = p.Y
};
}
}

Not that this is not particularly fast, because a new object is
created for each conversion. Furthermore, using implicit operators can
be tricky because they allow you to use Points wherever you can use
Vector3s.

Regards,
Bram Fokke
 
C

colin

Peter Duniho said:
Really? What is "a lot of time"?

the implicit conversion was at the top of the list of the profiler !
the real hog is LineIntersectLine routine wich was just beneath that.
the xna Vector3.subtract operation was about the same amount of time.
[...]
Say I have want a function wich determines the angle between two lines
or wires,
how do I create a generic function wich avoids the implicit conversion of
wire to lineV3 ?

I'm skeptical that the data conversion would consume a significant portion
of the time it takes to calculate that angle (I'm assuming it's doing the
usual dot-product, divide by product of vector magnitude, inverse cos).
What percentage are we really talking about here?

well the data conversion is used all over the place,
so although it may be relativly quick compared to dot product etc, its
called so often ...
actually to be exact the function returns the cos of the angle so
inverse cos isnt used, so its just a handfull of floating point ops.
It's difficult to answer the question without an actual example of the
code you're using now.

But, if you can come up with a way to represent the commonality between
the classes that you want to have share the same code, you could put that
commonality into a base class from which the using classes are derived,
and then write a generic method that constrains the type parameter to the
base class.

If you can't do this (for example, maybe the numerical types aren't
actually the same) then short of having a single method that checks the
type of the passed in data and then does the appropriate thing according
to the type, I think you're out of luck. The C# generic stuff doesn't
provide for generic application of arithmetic operators.

Again, with a complete example, it'd be easier to provide better advice.


well some of the data items are structs not classes so I cant use a base
class unfortunatly,
If I use classes instead of structs then im adding a whole load of calls to
new.
also one of the data items is a library struct.
Properties do in fact get inlined if they are simple enough. It's not
clear from your question that you could use properties to do this, but if
you can get around the issue with properties, that might be one way to
address the problem.

if this is so I might wel try it. as I cant use base class in a struct but I
can use an interface,
but then I cant have variables in the interface so il have to use properties
....
the other option is to store the other type as duplicate information
but the problems of making sure it is kept upto date maybe too much.

for now ive just copied and pasted lots of functions and changed the types
lol....
//first just use a wrapper and convert it to use individual points rather
than line
public static Vector3 ClosestPointOnLineSegment(Vector3 point, LineVector3
line)
{
return ClosestPointOnLineSegment(point, line.A, line.B);
}
public static Vector3 ClosestPointOnLineSegment(Vector3 point, Line3d line)
{
return ClosestPointOnLineSegment(point, line.A.Position,
line.B.Position);
}
public static Vector3 ClosestPointOnLineSegment(Vector3 point, Vector3 A,
Vector3 B)
{
Vector3 result;
Vector3 lineVector = B - A;
float length = lineVector.Length();
Vector3 lineNormal = lineVector / length;
float distance = Vector3.Dot(lineNormal, point - A);
if (distance < 0)
result = A;
else
{
if (distance > length)
result=B;
else
result = A + lineNormal * distance;
}
return result;
}

// this is smaller so just cut and paste ...
public static bool PointOnLineSegment(Vector3 point, LineVector3 line)
{
Vector3 pointOnLine = ClosestPointOnLineSegment(point, line);
float err = Vector3.Distance(point, pointOnLine);
profile.FunctionLeave();
if (err > Math3d.MinDistance)
return false;
return true;
}
public static bool PointOnLineSegment(Vector3 point, Line3d line)
{
Vector3 pointOnLine = ClosestPointOnLineSegment(point, line);
float err = Vector3.Distance(point, pointOnLine);
profile.FunctionLeave();
if (err > Math3d.MinDistance)
return false;
return true;
}

this shaves about 10 seconds off the exectution time.
admitdely this is only about 5% percent,
but theres lots of other places where a similar conversion takes place,
and ofc lots of other data types but im just atacking anything at the top of
the profile list atm ...

is there anyway to control or even examine if functions are expanded inline
?

Colin =^.^=
 
P

Peter Duniho

the implicit conversion was at the top of the list of the profiler !
the real hog is LineIntersectLine routine wich was just beneath that.
the xna Vector3.subtract operation was about the same amount of time.

But what's the total time spent in the conversion versus the total time
recorded by the profiler?

If you have a method that is used by a wide variety of other methods, it's
not uncommon for it to show up at the top of a sorted list according to
time in that method. But that doesn't mean the method represents a
significant amount of your actual processing time. If it's used ten
different places and only takes up 10% of the time spent, its time could
still be larger than the time for any one of the users of the code. But
eliminating the cost of conversion entirely would still only improve your
performance by 10%.
[...]
this shaves about 10 seconds off the exectution time.
admitdely this is only about 5% percent,
but theres lots of other places where a similar conversion takes place,
and ofc lots of other data types but im just atacking anything at the
top of
the profile list atm ...

But be sure that you are using the profile information correctly. Just
because one method shows up at the top of the list, that doesn't mean
that's the method you need to fix, or that doing so will result in any
dramatic improvement in performance. It could be that for a really
significant performance improvement, you simply need to visit multiple
areas of the code and take out the fat in each area.

If the top item is only 10% of the total time, then your best improvement
even eliminating it altogether is only going to be 10%. But if you've got
10 other places representing 90% of your time cost, where you can reduce
the cost by half in each place, you can improve your total performance by
45%. Which one would you think is more important to focus on in that case?
is there anyway to control or even examine if functions are expanded
inline?

You can use the ildasm.exe tool to inspect your code after it's compiled.
There's an attribute that can prevent inlining, but I'm not aware of any
to force it. However, my understanding is that the rules for inlining
pretty much guarantee that a simple one-line return or assignment property
getter or setter will be inlined.

Pete
 
C

colin

Peter Duniho said:
But what's the total time spent in the conversion versus the total time
recorded by the profiler?

the time spent in the implicit conversion routine was about 5-10% of the
total process time
this varied becuase some of the other external libraries seem to
be doing something strange sometimes.

Im using AMD codeanalyst, as it seems to give me what I want quickly,
ive tried many of the other (free) profilers but just couldnt work
out how to get any sense out of them quickly.

it uses a timer/snapshot, so when it reports 400+ hits out of 3000 hits
it means its spending quite a bit of time actually in that function,
but doesnt register time spent in sub functions for that function.
If you have a method that is used by a wide variety of other methods, it's
not uncommon for it to show up at the top of a sorted list according to
time in that method. But that doesn't mean the method represents a
significant amount of your actual processing time. If it's used ten
different places and only takes up 10% of the time spent, its time could
still be larger than the time for any one of the users of the code. But
eliminating the cost of conversion entirely would still only improve your
performance by 10%.
[...]
this shaves about 10 seconds off the exectution time.
admitdely this is only about 5% percent,
but theres lots of other places where a similar conversion takes place,
and ofc lots of other data types but im just atacking anything at the
top of
the profile list atm ...

But be sure that you are using the profile information correctly. Just
because one method shows up at the top of the list, that doesn't mean
that's the method you need to fix, or that doing so will result in any
dramatic improvement in performance. It could be that for a really
significant performance improvement, you simply need to visit multiple
areas of the code and take out the fat in each area.

If the top item is only 10% of the total time, then your best improvement
even eliminating it altogether is only going to be 10%. But if you've got
10 other places representing 90% of your time cost, where you can reduce
the cost by half in each place, you can improve your total performance by
45%. Which one would you think is more important to focus on in that
case?

yep its 10% not much but hey you got to start somewhere !
might as well start at the top of the list and work down,
and if its just a case of avoiiding/deleting code rather than spending hours
trying to optimise an algoithm ...

it takes upto 3 minutes for a run wich is a long time realy.
this is before it displays aything on the screen,
thats not even on a particularly large model data set.

I KNOW theres a lot of room for improvment cos I didnt pay any attention to
performance untill I was fairly sure the algorithms actually worked in
principle.

the item on the top of the list now is LineIntersectLine wich is
quite mathmatically complex, still thats only 10%,
I cant realy reduce the execution time of that function I dont think,
but I did previously reduced the number of calls by doing a simple bounds
check first,
so instead of it being called 40million times
its now only called 10 million times.
(I actually put a counter in a few functions)

incidently the XNA Vector3 library functions
multiply,subtract,normalise,length,distance
together take up the majority of cpu cycles.

the total XNA library takes up 30% of the cpu,
my code takes up about 30% too.
another 30% is taken up by mscorwks wich I dont know what its doing there
as the symbols dont seem to work well ... but i think its todo with
memory management


ive gradually reduced the total time by a huge factor so far.
the algorithms for finding the intersection of two polyhedrons is
the most time consuming, within that there are many other algorithms
such as for polygon-polygon intersection etc...
You can use the ildasm.exe tool to inspect your code after it's compiled.
There's an attribute that can prevent inlining, but I'm not aware of any
to force it. However, my understanding is that the rules for inlining
pretty much guarantee that a simple one-line return or assignment property
getter or setter will be inlined.

il look for that ..
thanks
Colin =^.^=
 
C

colin

You can use the ildasm.exe tool to inspect your code after it's compiled.
There's an attribute that can prevent inlining, but I'm not aware of any
to force it. However, my understanding is that the rules for inlining
pretty much guarantee that a simple one-line return or assignment property
getter or setter will be inlined.

Ive run this and seen the assembler for my modules of interest,
however I notice it seems to be CLI assembly ,wich is a bit like P code
rather than x86 assembly am I right ?

I havnt got the hang of this yet so its hard to tel how wel its optimised.

Ive adjusted a few algorithms wich were sloppy with regard to efficiency,
it was hard work to get them to behave in the first place
so I intentionaly ignored performance till I at least got something that
displayed correctly

I now have one function wich predominates wich finds the closest polygon to
a point
wich is a pre requisite for determining if a point is inside a model.

if anyones interested in my solution its further down the post ...

I notice the ildasm dissasembly shows a call to my function
GetDistanceFromPlane,
this is a very simple function wich just calls Vector3.Dot and adds a number
to it.

does the ildasm listing reflect the actual aseembly code that will be
executed or
is there further optimisation wich will make these simple functions inline,
and is there anyway to tell ? is it possible to pre optimise some code with
an external tool ?

I also pass structs without ref but i am kinda hoping the optimizer will do
this for me
where possible but its hard to tell ...

I might well try expanding the functions to do the dot product and addition
directly, dot product is only the sum of the product.

~~~~~~~~~~~~~~~~~~~

this finds the distance to the closest point on the surface,
it also returns a 2nd distances so for 2 surfaces wich are equal distance
these 2nd distances are compared,
and if those 2 distances are the same also the distance from the plane is
compared
the further it is from the plane the more the surface is facing the point.
some of the necesary computations wich are constant for each surface are
cached in
the list 'Edges'

I intend to change this to find the distance to a wire first,
then do this for each surface on that wire rather than do this for every
surface.

public Vector4 Distance(Vector3 p, bool dumpit)
{
float distance1=Math3d.DistanceFromPlane(plane, p);
float minDistance2 = float.MaxValue;
Vector4 result;
result.W = distance1;
result.X =result.Y =result.Z = Math.Abs(distance1);
foreach (SurfEdge edge in Edges)
{
float distance2 = Math3d.DistanceFromPlane(edge.plane, p);
if (distance2 > Math3d.MinDistance)
{
float distance3 = Vector3.Dot(edge.line.Direction, p) +
edge.line.Distance;
if (distance3 > 0)
{
if (distance3 < edge.line.Length)
distance3 = 0;
else
distance3 -= edge.line.Length;
}
float distancex = (float)Math.Sqrt((distance1 * distance1) +
(distance2 * distance2) + (distance3 * distance3));
float diff_x = distancex - minDistance2;
if (diff_x> Math3d.MinDistance)
continue;
float distancey = result.Z;// (float)Math.Sqrt((distance1 *
distance1) + (distance2 * distance2));
if (diff_x > -Math3d.MinDistance)
{
distancey = (float)Math.Sqrt((distance1 * distance1) +
(distance2 * distance2));
float diff_y = distancey - result.Y;
if (diff_x - diff_y > 0)
continue;
}
result.X = minDistance2 = distancex;
result.Y = distancey;
}
}
return result;
}

thanks
Colin =^.^=
 
P

Peter Duniho

[...]
I notice the ildasm dissasembly shows a call to my function
GetDistanceFromPlane,
this is a very simple function wich just calls Vector3.Dot and adds a
number
to it.

If it has to call another function that can't be inlined, I suppose it's
possible the C# compiler figures there's no point in inline the calling
function. I'm not sure.
does the ildasm listing reflect the actual aseembly code that will be
executed or
is there further optimisation wich will make these simple functions
inline,
and is there anyway to tell ? is it possible to pre optimise some code
with
an external tool ?

The listing from ildasm.exe definitely is not the actual assembly code
that will be executed. As you said, it's the .NET IL (intermediate
language). When you run your program, this is compiled finally to native
machine code by the just-in-time compiler, and yes the JIT compiler does
its own optimizations.

I was under the impression that inlining was done at the IL level, but I
think I could be wrong about that. This should be easy enough to test:
create a simple property that you know should be inlined and compile that
and see what comes out in the IL. I did a little bit of searching on the
MSDN site, and found at least one forum post where someone said that
inlining is only done by the JIT.

So, I guess I was wrong...you probably can't tell whether something is
inlined by using ildasm.exe. The only way I know of to look at the actual
machine instructions the JIT generates is to debug at the assembly level
in the debugger when your application is running. Maybe you can use that
to verify inlining is occurring?
I also pass structs without ref but i am kinda hoping the optimizer will
do
this for me
where possible but its hard to tell ...

I am practically certain that the compiler will _never_ generate calls
that pass variables by reference unless you explicit use the "ref" or
"out" keywords. I would be very nervous about using a compiler that did.

Pete
 
C

colin

Peter Duniho said:
[...]
I notice the ildasm dissasembly shows a call to my function
GetDistanceFromPlane,
this is a very simple function wich just calls Vector3.Dot and adds a
number
to it.

If it has to call another function that can't be inlined, I suppose it's
possible the C# compiler figures there's no point in inline the calling
function. I'm not sure.
does the ildasm listing reflect the actual aseembly code that will be
executed or
is there further optimisation wich will make these simple functions
inline,
and is there anyway to tell ? is it possible to pre optimise some code
with
an external tool ?

The listing from ildasm.exe definitely is not the actual assembly code
that will be executed. As you said, it's the .NET IL (intermediate
language). When you run your program, this is compiled finally to native
machine code by the just-in-time compiler, and yes the JIT compiler does
its own optimizations.

I was under the impression that inlining was done at the IL level, but I
think I could be wrong about that. This should be easy enough to test:
create a simple property that you know should be inlined and compile that
and see what comes out in the IL. I did a little bit of searching on the
MSDN site, and found at least one forum post where someone said that
inlining is only done by the JIT.

So, I guess I was wrong...you probably can't tell whether something is
inlined by using ildasm.exe. The only way I know of to look at the actual
machine instructions the JIT generates is to debug at the assembly level
in the debugger when your application is running. Maybe you can use that
to verify inlining is occurring?
I also pass structs without ref but i am kinda hoping the optimizer will
do
this for me
where possible but its hard to tell ...

I am practically certain that the compiler will _never_ generate calls
that pass variables by reference unless you explicit use the "ref" or
"out" keywords. I would be very nervous about using a compiler that did.

erm yes actually it would have to be very carefull to ensure there was no
chance of it being modified
by the called function, not only that but also to make sure there was no
posibility of it being modified
via a reference somewhere else, but I think some C compilers are capable of
determining this,
maybe c# has issues with this ...

well I noticed it was spending 10% of module time in the implicit operator
wich just simply returned
an element, however it was a struct, so I took that out and labouriously
entered in the element name
directly.

Im begining to miss c++ style macros even though I think they can make a big
mess.

since I made the first post ive reduced the execution to about 10% of what
it was.
the top 2 items at the top of the list each acount for 10% of my code
but only about 2% overall.

im not sure what to tackle next, but it only takes 30 seconds now as opposed
to 300 seconds
wich means at least its a lot quicker to do a modify/debug cycle ...

I might even expand out the vector3.dot function if can identify somewhere
its being called the most.
if only there was a way to force inlining 'sigh'

thanks for you help :)
Colin =^.^=
 
P

Peter Duniho

[...]
if only there was a way to force inlining 'sigh'

I wouldn't worry about that if I were you. In general, I would expect the
JIT compiler to do a good job identifying scenarios where inlining will
actually help performance. If it's helpful to inline, it's already doing
it for you.

You need to keep in mind that the cost of a function call is not the only
thing that affects performance. Inlining code makes the code bigger,
causing cache issues, and it also creates duplicates of the same code,
making it harder for the CPU to take advantage of pipelining and branch
prediction. Since the CPU (x86 especially) is highly dependent on these
features to get best performance, it's very important to not defeat those
optimizations in the pursuit of a different one.

In other words, think twice before you try to second-guess the compiler.
And if after thinking twice, you still want to do it, think again.

The folks who wrote the JIT compiler aren't as dumb as some people seem to
think they are. Even with the best compilers, _someone_ can find an
optimization the compiler gets wrong. But the number of people in the
world who are capable of finding that optimization are far and few
between. I don't know you well enough to know if you're one of those
people, but from a purely statistical point of view, chances are
practically nil that you are. :)

Pete
 
C

colin

Peter Duniho said:
[...]
if only there was a way to force inlining 'sigh'

I wouldn't worry about that if I were you. In general, I would expect the
JIT compiler to do a good job identifying scenarios where inlining will
actually help performance. If it's helpful to inline, it's already doing
it for you.

You need to keep in mind that the cost of a function call is not the only
thing that affects performance. Inlining code makes the code bigger,
causing cache issues, and it also creates duplicates of the same code,
making it harder for the CPU to take advantage of pipelining and branch
prediction. Since the CPU (x86 especially) is highly dependent on these
features to get best performance, it's very important to not defeat those
optimizations in the pursuit of a different one.

In other words, think twice before you try to second-guess the compiler.
And if after thinking twice, you still want to do it, think again.

wouldnt that be third guesing ?
The folks who wrote the JIT compiler aren't as dumb as some people seem to
think they are. Even with the best compilers, _someone_ can find an
optimization the compiler gets wrong. But the number of people in the
world who are capable of finding that optimization are far and few
between. I don't know you well enough to know if you're one of those
people, but from a purely statistical point of view, chances are
practically nil that you are. :)

yeah I appreciate that, I find compilers are a heck of a lot more robust
than they were
1 or 2 decades ago,
I used to break them a lot way back then, but cant remember the last time I
did recently,
although ive found less than optimum code on more than a few occasions,
partucularly on microcontrollers with no hardware divide,
such as not always recognizing divide by constant 2 is a simple shift to the
right.

the problem here comes when a function is called in excess of 100 million
times,
yet may only be called often by a handfull of places in the code.
such a function compares two 3d points, to test if they are closer than
rounding errors.

this comes about by having to compare every object with every other object,
obviously there are ways to segregate them so fewer comparisons need to be
done but
this itself is quite diffucult, and I think I have persued that as far as
possible.

its a just question of being able to determine places that give a good
performance gain for effort spent optimising.

Colin =^.^=
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top