Hi!
You have not commented on whether the alternative workaround is fail-safe:
| I would implement another Vec1Container that would have the finalizer
| and a pointer to Vec1 and Vector1 would only point to Vec1Container.
| When struct Vector1 would be released from the stack, the finalizer
| will get its chance..
|
| I was thinking that structs are allocated on the stack and not on the
| heap, and the stack is not cleared up until the expression has been
| evaluated. This is not the same as for objects, where only the
| reference is stored on the stack...
But your finalizer does not release unmanaged memory; it modifies a shared
object instance, and this is not what you should do in a finalizer.
In my full source code it does..
This is not true, I suggest you read something about how the GC is
working.
Short-living objects aren't out-living the Gen0 part of the GC heap
(provided they don't have finalizers); the CLR tries to keep the size of
Gen0 more or less the same as the size of the L2 cache of the CPU the CLR
runs on (though there are other heuristics, like a high allocation
frequency, that make Gen0 grow beyond that size). Of course finalizable
objects (like yours) will survive the next GC collection and will end up
in Gen1; this way you are disturbing the optimum circumstances for the GC
to do its job.
Anyway, Gen0 is always empty after a GC run and follows the Gen1 and Gen2
parts in the heap. That means Gen0 is always at the start of a large
piece of contiguous free memory; new object allocations are adjacent in
Gen0 and in sequence of their allocation time, so they never span a
large area.
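A minimal sketch of that promotion behavior (my illustration, not part of the original post). A garbage object with a finalizer survives the first collection because it is queued for finalization; only a later collection reclaims its memory. A resurrection-tracking WeakReference lets us observe this:

```csharp
using System;

class Finalizable
{
    public static bool Finalized;
    ~Finalizable() { Finalized = true; }
}

class GenDemo
{
    // Allocate in a separate method so the JIT does not keep the object
    // alive in a local slot of Main.
    static WeakReference Allocate()
    {
        // trackResurrection: true keeps the weak reference valid until the
        // memory is actually reclaimed, not just until finalization is queued.
        return new WeakReference(new Finalizable(), trackResurrection: true);
    }

    static void Main()
    {
        WeakReference wr = Allocate();

        GC.Collect();
        // The object was garbage, but because it has a finalizer it was
        // promoted and queued for finalization instead of being reclaimed.
        Console.WriteLine("alive after 1st collect: " + wr.IsAlive);

        GC.WaitForPendingFinalizers();
        GC.Collect();
        // Only now is the memory gone.
        Console.WriteLine("alive after 2nd collect: " + wr.IsAlive);
        Console.WriteLine("finalizer ran: " + Finalizable.Finalized);
    }
}
```

Compiled in release mode, the object typically shows as alive after the first collection and dead after the second, which is exactly the extra survival being described above.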
In theory. I did my tests: I ran the math function with different vector
lengths (10, 100, 1000, 10000, etc.) and timed them.
If the CPU cache were being used, the speed at 10..10000 elements would
be up to 6x higher than with vectors 100000 elements long, which cannot
fit in the cache. But guess what... there was no dip, absolutely no gain.
The curve decayed exponentially towards longer vectors...
You can check that exponential decaying curve yourself. The source is
attached below...
And with my "weird" design the performance gain over the garbage
collected vectors was 400% for lengths between 500 and 3000.
So the GC and CPU cache pairing in this case gives 0.0.
| So, the idea is to allocate unmanaged memory and have that memory
| preallocated
| in a pool. Each time you allocate an object, it simply obtains a pointer
| from the pool. Each time you free an object (via finalizer), the pointer
| is released to the pool.
|
Not sure what you are talking about; managed objects cannot be allocated
in unmanaged memory, they always end up in the GC heap.
Unless you use Marshal.AllocHGlobal.
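For reference, here is a minimal sketch (my own, assuming the pool idea quoted above) of what keeping the buffer off the GC heap looks like: the doubles live in memory from Marshal.AllocHGlobal, only the thin wrapper is a managed object, and the finalizer's sole job is releasing the unmanaged block:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch: the double buffer lives in unmanaged memory; only this small
// wrapper object ends up on the GC heap.
sealed class UnmanagedBuffer : IDisposable
{
    public IntPtr Data { get; private set; }
    public int Length { get; private set; }

    public UnmanagedBuffer(int length)
    {
        Length = length;
        Data = Marshal.AllocHGlobal(length * sizeof(double));
    }

    public void Dispose()
    {
        if (Data != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Data);
            Data = IntPtr.Zero;
        }
        // No need for the finalizer once the memory is freed deterministically.
        GC.SuppressFinalize(this);
    }

    // Safety net: releasing unmanaged memory is the one legitimate job of
    // a finalizer (it touches no other managed objects).
    ~UnmanagedBuffer() { Dispose(); }
}
```

A pooled variant would hand the IntPtr back to a free list instead of calling FreeHGlobal, but handing pointers around from a finalizer runs into the same "don't touch shared managed state" problem discussed above.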
| I was running some tests and due to tighter memory pool, the
| performance gains of vectorized against plain function
| for the function posted are around 5x and
| that despite the fact that I call GC.Collect inside the
| constructor every now and then...
|
| The proper solution to this problem however would be to introduce
| proper reference counted classes that have their destructors called
| by the IL immediately after the function ends if their reference count
| is 0. But this is only something that Microsoft can do...
By the IL? You must be kidding, right?
If the SafeHandle class had its Dispose method called when its
compiler-managed reference count fell to zero, that would merge the best
of both worlds. Delphi implements a reference counter: when a procedure
exits, it checks whether the reference count for local variables is zero
and frees them... That would be a piece of cake to implement for
SafeHandle in IL.
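Short of CLR-level reference counting, the deterministic mechanism C# already has is the using statement, which guarantees Dispose at the closing brace. A small sketch (my illustration):

```csharp
using System;

sealed class PooledVector : IDisposable
{
    public bool Disposed;
    public void Dispose() { Disposed = true; }
}

class Demo
{
    static void Main()
    {
        PooledVector captured;
        using (PooledVector v = new PooledVector())
        {
            captured = v;
            Console.WriteLine("inside block, disposed: " + v.Disposed);  // False
        }
        // Dispose ran deterministically at the closing brace, with no GC
        // or finalizer involved.
        Console.WriteLine("after block, disposed: " + captured.Disposed); // True
    }
}
```

Of course this does not help for the anonymous temporaries created inside an expression like a * b * c, which is exactly where the operator-overloading style above produces its garbage; only the compiler or CLR could dispose those automatically.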
Regards!
Atmapuri
public class Vec1
{
private int fLength;
public double[] Data;
public Vec1()
{
Length = 100;
}
public int Length
{
get
{
return this.fLength;
}
set
{
Data = new double[value];
fLength = value;
}
}
}
public class Vector1
{
public Vec1 Data;
static private int VecIterCount = 0;
static private int GCInterval = 200;
static private int GCGeneration = 0;
protected virtual void Dispose(bool disposing)
{
if (Data != null)
{
Data.Length = 0;
}
}
public Vector1(int aLength)
{
Data = null;
Data = new Vec1();
Data.Length = aLength;
}
public Vector1(Vec1 Src)//: base(IntPtr.Zero, true)
{
Data = null;
Data = new Vec1();
Data.Length = Src.Length;
}
public static implicit operator Vec1 (Vector1 AValue)
{
return AValue.Data;
}
public int Length
{
get
{
return this.Data.Length;
}
set
{
this.Data.Length = value;
}
}
public static Vector1 operator *(Vector1 Left, double Right)
{
Vector1 vector = new Vector1(Left.Length);
double[] SrcLeft, Dst;
SrcLeft = Left.Data.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = SrcLeft[i] * Right;
}
return vector;
}
public static Vector1 operator *(double Left, Vector1 Right)
{
Vector1 vector = new Vector1(Right.Length);
double[] SrcRight, Dst;
SrcRight = Right.Data.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = Left * SrcRight[i];
}
return vector;
}
public static Vector1 operator *(Vector1 Left, Vec1 Right)
{
Vector1 vector = new Vector1(Left.Length);
double[] SrcLeft, SrcRight, Dst;
SrcLeft = Left.Data.Data;
SrcRight = Right.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = SrcLeft[i] * SrcRight[i];
}
return vector;
}
public static Vector1 operator *(Vec1 Left, Vector1 Right)
{
Vector1 vector = new Vector1(Left.Length);
double[] SrcLeft, SrcRight, Dst;
SrcLeft = Left.Data;
SrcRight = Right.Data.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = SrcLeft[i] * SrcRight[i];
}
return vector;
}
public static Vector1 operator *(Vector1 Left, Vector1 Right)
{
Vector1 vector = new Vector1(0);
vector.Length = Left.Length;
double[] SrcLeft, SrcRight, Dst;
SrcLeft = Left.Data.Data;
SrcRight = Right.Data.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = SrcLeft[i] * SrcRight[i];
}
return vector;
}
}
public class Vec1Math
{
public static Vector1 Exp(Vec1 x)
{
Vector1 vector = new Vector1(0);
vector.Length = x.Length;
double[] Src, Dst;
Src = x.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = Math.Exp(Src[i]);
}
return vector;
}
public static Vector1 Sqrt(Vec1 x)
{
Vector1 vector = new Vector1(0);
vector.Length = x.Length;
double[] Src, Dst;
Src = x.Data;
Dst = vector.Data.Data;
int Len = vector.Length;
for (int i = 0; i < Len; i++)
{
Dst[i] = Math.Sqrt(Src[i]);
}
return vector;
}
}
//The function benchmarked (x1 is a Vector1 field of the enclosing
//benchmark class, not shown here)
private double MaxwellExpression1(int Iterations)
{
double a = 1;
Vector1 xx, res1;
res1 = null;
int counter = Environment.TickCount;
for (int i = 0; i < Iterations; i++)
{
xx = x1 * x1;
res1 = Math.Sqrt(4 * Math387.INVTWOPI * a) * a * xx *
Vec1Math.Exp(-0.5 * a * xx);
}
int result = Environment.TickCount - counter;
return result;
}
The iteration counts were: 10, 100, 1000, 10000, 100000, 1000000.
Short vectors are not faster than long vectors, which proves that the
CPU cache is not being leveraged at all..