Atmapuri
Hi!
As you noticed, the difference only amounted to 2.5%, which is negligible.
The way I got the 75% difference was by pinning the arrays inside
each of the operators and passing them to an external SSE2-optimized
DLL. The cost of the math operations dropped drastically
and the GC cost landed at 75%. The relative cost of pinning
was negligible...
Furthermore, I also evaluated the formula by reusing the
same arrays (without the expression), and the results were:
1.) Plain C# function working on one element at a time: 1600ms
2.) Expression in C# on 1000 elements with math in SSE2: 1200ms
3.) Expression in C# with 1000 elements with math in C#: 1900ms
4.) Formula evaluation without releasing arrays and without
pinning inside the timed region: 300ms (no expression, just methods
passing pointers around, with the arrays pinned before the timer starts).
So, for case #2, the GC overhead as measured by
AQTime was 75%.
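
A rough sketch of the structural difference between cases #3 and #4 may make the comparison easier to follow. The names Vec and Kernels are hypothetical and chosen only for illustration; the sketch leaves out pinning and the SSE2 DLL entirely and is not a benchmark. It only shows that the expression style allocates a new array per operator, while the buffer-reuse style writes into arrays the caller already owns.

```csharp
using System;

// Hypothetical vector type: every operator allocates a fresh result
// array, as in the expression style of case #3.
sealed class Vec
{
    public readonly double[] Data;

    public Vec(int length) { Data = new double[length]; }

    public static Vec operator +(Vec a, Vec b)
    {
        var r = new Vec(a.Data.Length);          // new array per operation
        for (int i = 0; i < r.Data.Length; i++)
            r.Data[i] = a.Data[i] + b.Data[i];
        return r;
    }

    public static Vec operator *(Vec a, Vec b)
    {
        var r = new Vec(a.Data.Length);          // another new array
        for (int i = 0; i < r.Data.Length; i++)
            r.Data[i] = a.Data[i] * b.Data[i];
        return r;
    }
}

// Hypothetical helpers in the style of case #4: the caller owns all
// buffers, so nothing is allocated while the formula is evaluated.
static class Kernels
{
    public static void Add(double[] a, double[] b, double[] result)
    {
        for (int i = 0; i < result.Length; i++)
            result[i] = a[i] + b[i];
    }

    public static void Mul(double[] a, double[] b, double[] result)
    {
        for (int i = 0; i < result.Length; i++)
            result[i] = a[i] * b[i];
    }
}

static class Program
{
    static void Main()
    {
        const int n = 1000;
        var a = new Vec(n);
        var b = new Vec(n);
        var c = new Vec(n);
        for (int i = 0; i < n; i++)
        {
            a.Data[i] = i;
            b.Data[i] = 2.0;
            c.Data[i] = 0.5;
        }

        // Expression style: two temporary arrays are allocated here.
        Vec viaOperators = (a + b) * c;

        // Buffer-reuse style: tmp and result are allocated once and
        // could be reused for every later evaluation of the formula.
        var tmp = new double[n];
        var result = new double[n];
        Kernels.Add(a.Data, b.Data, tmp);
        Kernels.Mul(tmp, c.Data, result);

        // Both paths evaluate (a + b) * c for the same inputs.
        Console.WriteLine(viaOperators.Data[10] == result[10]);
    }
}
```

In a loop that evaluates the formula many times, the first style produces two garbage arrays per iteration, while the second produces none after the initial allocation.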
Interesting. How would you fix the design so that the expression
in case #3 reaches the 300ms of case #4?
The difference is huge. The application is very common and
practical, and technology should serve the people... and not
vice versa...
Thanks!
Atmapuri
> this gives as result = 7520 ticks (7.5 secs.) with an "average % in GC" = 10.4%
> but again you have a 'bug' in your code...
> Data.Length = aLength;
> Also, allocating arrays of doubles is expensive: an array of 1000 or more
> doubles ends up on the Large Object Heap; this heap is never compacted and
> only ...