ArrayList without Boxing and UnBoxing

P

Peter Olcott

Arne Vajhøj said:
You need to do more than 7000 boxing and unboxing every 1/10 seconds
with less overhead than 1/100 ?

I would call that rather unusual requirements.

Yes - generics will avoid boxing and unboxing, but it will
add some overhead too.

Arne

Take a look a the technology:
 
P

Peter Olcott

Arne Vajhøj said:
I would say that the two requirements:
1) be able to dynamically grow
2) must not have any more overhead than an unmanaged array
are in conflict.

Nothing is free.

Unmanaged array access costs no more than unmanaged array access, I think it is
about one clock.
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Peter said:
Unmanaged array access costs no more than unmanaged array access, I think it is
about one clock.

The functionality offered by a list will cost. Maybe only in adding
and not in getting. But it will cost.

That should be rather obvious.

Arne
 
P

Peter Olcott

Arne Vajhøj said:
The functionality offered by a list will cost. Maybe only in adding
and not in getting. But it will cost.

std::vector provides access time equal to the access time of unmanaged array
access, yet can dynamically grow.
Does .NET have anything with EXACTLY these performance characteristics?
 
J

Jon Skeet [C# MVP]

Peter Olcott said:
std::vector provides access time equal to the access time of unmanaged array
access, yet can dynamically grow.
Does .NET have anything with EXACTLY these performance characteristics?

Frankly, I doubt that std::vector provides that, to be honest. Assuming
it's an array + size, you need to do a size check before the array
access, so there's a performance penalty there, even if it's really
small.

Certainly I haven't seen anything in .NET which has *no* performance
penalty over using arrays. You could probably write a list which would
return potentially bad data rather than throwing the appropriate
exception if, say, you had a backing array of size 5, a list of size 2,
and asked for element 4. I'd rather have correctness with the tiny
performance penalty though...
 
I

ironpythonster

If you only store the valueType in ArraryList,may be useedthe template
would be help.
Out of this way ,i couldn't think out what way to fix

May be help for you
 
P

Peter Olcott

Jon Skeet said:
Frankly, I doubt that std::vector provides that, to be honest. Assuming
it's an array + size, you need to do a size check before the array
access, so there's a performance penalty there, even if it's really
small.

No there is no size check, and this cost is not be really small, it at least
doubles the access time.
 
W

Willy Denoyette [MVP]

|
| | >> > The functionality offered by a list will cost. Maybe only in adding
| >> > and not in getting. But it will cost.
| >>
| >> std::vector provides access time equal to the access time of unmanaged
array
| >> access, yet can dynamically grow.
| >> Does .NET have anything with EXACTLY these performance characteristics?
| >
| > Frankly, I doubt that std::vector provides that, to be honest. Assuming
| > it's an array + size, you need to do a size check before the array
| > access, so there's a performance penalty there, even if it's really
| > small.
|
| No there is no size check, and this cost is not be really small, it at
least
| doubles the access time.
|


This is definitely not true, consider following small sample:

#include <vector>
int main(int argc, char *argv[])
{
size_t size = 100;
std::vector<int> array(size); // make room for 10 integers,
// and initialize them to 0
// do something with them:
for(int i=0; i<size; ++i){
array = i;
}
// break here, disassembly follows
int v = array[55];
return 0;
}

comments added [n]

004025e8 85ff test edi,edi [1]
004025ea 740a je vect!main+0x56 (004025f6)
004025ec 2bdf sub ebx,edi [2]
004025ee c1fb02 sar ebx,2 [3]
004025f1 83fb37 cmp ebx,37h [4]
004025f4 7705 ja vect!main+0x5b (004025fb)
004025f6 e8f1070000 call vect!_invalid_parameter_noinfo (00402dec)
004025fb 8b8fdc000000 mov ecx,dword ptr [edi+0DCh]
ds:0023:00324c3c=00000037 [5]

[1] checks array base pointer for null (non initialized)
[2] calculate length of array in bytes (edi = base, edx points to past end
of array)
[3] convert to length in int's (divide by 4)
[4] index < length of array
[5] move int at edi+55 -> ecx

You see, there is a pointer validation check and a bounds check for each
access. This accounts for 6 instructions, supposed index is within bounds.
Exactly the same is done in .NET when accessing array elements.


Willy.
 
P

Peter Olcott

Willy Denoyette said:
|
| | >> > The functionality offered by a list will cost. Maybe only in adding
| >> > and not in getting. But it will cost.
| >>
| >> std::vector provides access time equal to the access time of unmanaged
array
| >> access, yet can dynamically grow.
| >> Does .NET have anything with EXACTLY these performance characteristics?
| >
| > Frankly, I doubt that std::vector provides that, to be honest. Assuming
| > it's an array + size, you need to do a size check before the array
| > access, so there's a performance penalty there, even if it's really
| > small.
|
| No there is no size check, and this cost is not be really small, it at
least
| doubles the access time.
|


This is definitely not true, consider following small sample:

#include <vector>
int main(int argc, char *argv[])
{
size_t size = 100;
std::vector<int> array(size); // make room for 10 integers,
// and initialize them to 0
// do something with them:
for(int i=0; i<size; ++i){
array = i;
}
// break here, disassembly follows
int v = array[55];
return 0;
}

comments added [n]

004025e8 85ff test edi,edi [1]
004025ea 740a je vect!main+0x56 (004025f6)
004025ec 2bdf sub ebx,edi [2]
004025ee c1fb02 sar ebx,2 [3]
004025f1 83fb37 cmp ebx,37h [4]
004025f4 7705 ja vect!main+0x5b (004025fb)
004025f6 e8f1070000 call vect!_invalid_parameter_noinfo (00402dec)
004025fb 8b8fdc000000 mov ecx,dword ptr [edi+0DCh]
ds:0023:00324c3c=00000037 [5]

[1] checks array base pointer for null (non initialized)
[2] calculate length of array in bytes (edi = base, edx points to past end
of array)
[3] convert to length in int's (divide by 4)
[4] index < length of array
[5] move int at edi+55 -> ecx

You see, there is a pointer validation check and a bounds check for each
access. This accounts for 6 instructions, supposed index is within bounds.


You are not supposed to get range checking unless you explicitly ask for it by
invoking the std::vector:at(n) member function.
 
W

Willy Denoyette [MVP]

|
| | >
| > | > |
| > | | > | >> > The functionality offered by a list will cost. Maybe only in
adding
| > | >> > and not in getting. But it will cost.
| > | >>
| > | >> std::vector provides access time equal to the access time of
unmanaged
| > array
| > | >> access, yet can dynamically grow.
| > | >> Does .NET have anything with EXACTLY these performance
characteristics?
| > | >
| > | > Frankly, I doubt that std::vector provides that, to be honest.
Assuming
| > | > it's an array + size, you need to do a size check before the array
| > | > access, so there's a performance penalty there, even if it's really
| > | > small.
| > |
| > | No there is no size check, and this cost is not be really small, it at
| > least
| > | doubles the access time.
| > |
| >
| >
| > This is definitely not true, consider following small sample:
| >
| > #include <vector>
| > int main(int argc, char *argv[])
| > {
| > size_t size = 100;
| > std::vector<int> array(size); // make room for 10 integers,
| > // and initialize them to 0
| > // do something with them:
| > for(int i=0; i<size; ++i){
| > array = i;
| > }
| > // break here, disassembly follows
| > int v = array[55];
| > return 0;
| > }
| >
| > comments added [n]
| >
| > 004025e8 85ff test edi,edi [1]
| > 004025ea 740a je vect!main+0x56 (004025f6)
| > 004025ec 2bdf sub ebx,edi [2]
| > 004025ee c1fb02 sar ebx,2 [3]
| > 004025f1 83fb37 cmp ebx,37h [4]
| > 004025f4 7705 ja vect!main+0x5b (004025fb)
| > 004025f6 e8f1070000 call vect!_invalid_parameter_noinfo
(00402dec)
| > 004025fb 8b8fdc000000 mov ecx,dword ptr [edi+0DCh]
| > ds:0023:00324c3c=00000037 [5]
| >
| > [1] checks array base pointer for null (non initialized)
| > [2] calculate length of array in bytes (edi = base, edx points to past
end
| > of array)
| > [3] convert to length in int's (divide by 4)
| > [4] index < length of array
| > [5] move int at edi+55 -> ecx
| >
| > You see, there is a pointer validation check and a bounds check for each
| > access. This accounts for 6 instructions, supposed index is within
bounds.
|
| You are not supposed to get range checking unless you explicitly ask for
it by
| invoking the std::vector:at(n) member function.
|

This is an implementation detail. The std library as supplied by MSFT
(Dinkumware's implementation)) since 7.1 does range checking, don't know
about other iplementations though. Point is that you should not rely on
this, the standard does not enforce operator[] to be 'unchecked' AFAIK.
The major difference between operator[] and at() access is that the former
calls a "crt debugger" hook followed by a 'drwatson' call, while the latter
throws an "out_of_range" exception.

Willy.
 
W

Willy Denoyette [MVP]

|
| ||
|| || >
|| > || > |
|| > | || > | >> > The functionality offered by a list will cost. Maybe only in
| adding
|| > | >> > and not in getting. But it will cost.
|| > | >>
|| > | >> std::vector provides access time equal to the access time of
| unmanaged
|| > array
|| > | >> access, yet can dynamically grow.
|| > | >> Does .NET have anything with EXACTLY these performance
| characteristics?
|| > | >
|| > | > Frankly, I doubt that std::vector provides that, to be honest.
| Assuming
|| > | > it's an array + size, you need to do a size check before the array
|| > | > access, so there's a performance penalty there, even if it's really
|| > | > small.
|| > |
|| > | No there is no size check, and this cost is not be really small, it
at
|| > least
|| > | doubles the access time.
|| > |
|| >
|| >
|| > This is definitely not true, consider following small sample:
|| >
|| > #include <vector>
|| > int main(int argc, char *argv[])
|| > {
|| > size_t size = 100;
|| > std::vector<int> array(size); // make room for 10 integers,
|| > // and initialize them to 0
|| > // do something with them:
|| > for(int i=0; i<size; ++i){
|| > array = i;
|| > }
|| > // break here, disassembly follows
|| > int v = array[55];
|| > return 0;
|| > }
|| >
|| > comments added [n]
|| >
|| > 004025e8 85ff test edi,edi [1]
|| > 004025ea 740a je vect!main+0x56 (004025f6)
|| > 004025ec 2bdf sub ebx,edi [2]
|| > 004025ee c1fb02 sar ebx,2 [3]
|| > 004025f1 83fb37 cmp ebx,37h [4]
|| > 004025f4 7705 ja vect!main+0x5b (004025fb)
|| > 004025f6 e8f1070000 call vect!_invalid_parameter_noinfo
| (00402dec)
|| > 004025fb 8b8fdc000000 mov ecx,dword ptr [edi+0DCh]
|| > ds:0023:00324c3c=00000037 [5]
|| >
|| > [1] checks array base pointer for null (non initialized)
|| > [2] calculate length of array in bytes (edi = base, edx points to past
| end
|| > of array)
|| > [3] convert to length in int's (divide by 4)
|| > [4] index < length of array
|| > [5] move int at edi+55 -> ecx
|| >
|| > You see, there is a pointer validation check and a bounds check for
each
|| > access. This accounts for 6 instructions, supposed index is within
| bounds.
||
|| You are not supposed to get range checking unless you explicitly ask for
| it by
|| invoking the std::vector:at(n) member function.
||
|
| This is an implementation detail. The std library as supplied by MSFT
| (Dinkumware's implementation)) since 7.1 does range checking, don't know
| about other iplementations though. Point is that you should not rely on
| this, the standard does not enforce operator[] to be 'unchecked' AFAIK.
| The major difference between operator[] and at() access is that the former
| calls a "crt debugger" hook followed by a 'drwatson' call, while the
latter
| throws an "out_of_range" exception.
|
| Willy.
|

Forgot to mention that this behavior can be turned of by means of
the -D_SECURE_SCL switch or in code using:
#define _SECURE_SCL 0
Note that by doing so, you throw away all security related enhancements.

Willy.
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Peter said:
std::vector provides access time equal to the access time of unmanaged array
access, yet can dynamically grow.
Does .NET have anything with EXACTLY these performance characteristics?

Hm.

GCC 3.2 with -O3:

integer array 0.06 0.03
vector 0.33 0.06

BCB 5.6:

integer array 0.06 0.03
vector 0.33 0.05

VC++ 7.1 with /Ox:

integer array 0.06 0.03
vector 0.44 0.05

Results fluctuate a bit, but at average integer array
is faster than vector.

Arne

PS: Ask for code if interested.
 
P

Peter Olcott

Arne Vajhøj said:
Hm.

GCC 3.2 with -O3:

integer array 0.06 0.03
vector 0.33 0.06

BCB 5.6:

integer array 0.06 0.03
vector 0.33 0.05

VC++ 7.1 with /Ox:

integer array 0.06 0.03
vector 0.44 0.05

Results fluctuate a bit, but at average integer array
is faster than vector.

Arne

PS: Ask for code if interested.

What are the two columns?
 
P

Peter Olcott

Arne Vajhøj said:
Storing and retrieving.

Arne

Is that storing using push_back(), or storing using operator[] ?
I would think that the former would be much slower than the latter.
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Is that storing using push_back(), or storing using operator[] ?
I would think that the former would be much slower than the latter.

push_back

[] will crash unless the array is preallocated - in which case
a simple array could just as well be used.

Arne
 
P

Peter Olcott

Arne Vajhøj said:
Is that storing using push_back(), or storing using operator[] ?
I would think that the former would be much slower than the latter.

push_back

[] will crash unless the array is preallocated - in which case
a simple array could just as well be used.

Arne

When I am talking about array access, I am talking about reading from the array.
If .NET takes more than 50% more time to read from an array of scalar types, it
would be unacceptable for my application. With Boxing and UnBoxing we have much
more than a 50% cost penalty. I am not sure about the cost penalty (if any) for
generics.
 
B

Bill Butler

Peter Olcott said:
When I am talking about array access, I am talking about reading from
the array. If .NET takes more than 50% more time to read from an array
of scalar types, it would be unacceptable for my application. With
Boxing and UnBoxing we have much more than a 50% cost penalty. I am
not sure about the cost penalty (if any) for generics.

As has been suggested in the past, You should set up a realistic proof
of concept to see if this is the issue that you think it is. If, after
testing, you determine that the performance hit is unacceptable, THEN
you should start working on ways around the issue.

Bill
 
P

Peter Olcott

Bill Butler said:
As has been suggested in the past, You should set up a realistic proof of
concept to see if this is the issue that you think it is. If, after testing,
you determine that the performance hit is unacceptable, THEN you should start
working on ways around the issue.

Bill
Its not worth spending the learning curve that is required. If I could know in
advance that benchmarks prove that the read access time to scalar types is
comparable to that to read access time of unmanaged array access, then it might
be worth the learning curve to pursue further.
 
B

Bill Butler

Its not worth spending the learning curve that is required. If I could
know in advance that benchmarks prove that the read access time to
scalar types is comparable to that to read access time of unmanaged
array access, then it might be worth the learning curve to pursue
further.

Is your current App running so close to the edge that you cannot afford
any spare clock cycles?

Doing a quick proof of concept shouldn't be THAT time consuming. Perhaps
you really can't afford the overhead, but you will never know if you
don't do a test

Bill
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top