Roll your own std::vector ???


Peter Olcott

Lucian Wischik said:
Peter, that's incorrect. Have you ever looked at the IL? It really has
lost very little information. That's why e.g. Microsoft's code
analysis FxCop is able to run and give useful results on IL.

http://www.osnews.com/img/5602/results.jpg

This was the benchmark you posted. It did not show what you claimed.
It showed both .net languages to have identical performance, except
that the VB version used a slower IO library.

You are not nearly precise enough in your interpretation. It actually showed
that Visual C++ .NET was the fastest of all the languages listed, except for I/O,
where C# beat it by 6%. C++ beat C# by 277% on double-precision math.
 

Peter Olcott

Lucian Wischik said:
I tested it on VS2003 and VS2005. I did the c++ version with
optimizations for speed, and profile-guided optimization.

In any case, the c++ runs just as fast with optimizations turned on as
with them turned off. That's because the benchmark is testing how fast
the library can copy a 40k block of memory (as it removes or inserts
the head element from a 10,000 element vector). This is sure as
anything going to be a ready-provided small machine code library
routine, e.g. memcpy, and outside the realm of compiler optimizations.

If you did not get the same results as the original benchmarks then you did not
match the original benchmark conditions correctly.
 

Lucian Wischik

Peter Olcott said:
If you did not get the same results as the original benchmarks then you did not
match the original benchmark conditions correctly.

Ah, okay, I see what happened. The original benchmark used vector<int>
in c++ and ArrayList in C#, and performance was comparable. That's
what he got, and it's what I got, so we agree.

For the later tests he switched to using a linked list for c++.
Unsurprisingly this was fast, since the benchmark was about
performance at inserting and removing the head element: O(n) overall
for the linked list. But he stuck to the vector-like ArrayList for C#,
which is O(n^2) overall.

(which means that his benchmark figures are irrelevant for your topic,
"roll your own std::vector"...)



Isn't that how it always goes. Instead of focusing on the efficiency
of different compiler switches, or different language constructs, or
of boxing and unboxing, we're better off checking that we've picked
the correct basic datastructure and algorithm!
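
To make the asymptotic point concrete, here is a minimal sketch of my own (not from the thread; the 10,000 count mirrors the benchmark, everything else is illustrative) contrasting head insertion with tail insertion on the .NET 2.0 List<int>:

// Sketch: head vs. tail insertion cost with List<int>.
// Assumes .NET 2.0; the element count mirrors the benchmark's 10,000.
using System;
using System.Collections.Generic;
using System.Diagnostics;

class InsertionSketch
{
    static void Main()
    {
        const int N = 10000;

        Stopwatch sw = Stopwatch.StartNew();
        List<int> head = new List<int>();
        for (int i = 0; i < N; i++)
            head.Insert(0, i);        // shifts every element: O(n) per insert, O(n^2) total
        Console.WriteLine("head inserts: " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        List<int> tail = new List<int>();
        for (int i = 0; i < N; i++)
            tail.Add(i);              // amortized O(1) per insert, O(n) total
        Console.WriteLine("tail inserts: " + sw.ElapsedMilliseconds + " ms");
    }
}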
 

Peter Olcott

Lucian Wischik said:
Ah, okay, I see what happened. The original benchmark used vector<int>
in c++ and ArrayList in C#, and performance was comparable. That's
what he got, and it's what I got, so we agree.

For the later tests he switched to using a linked list for c++.
Unsurprisingly this was fast, since the benchmark was about
performance at inserting and removing the head element: O(n) overall
for the linked list. But he stuck to the vector-like ArrayList for C#,
which is O(n^2) overall.

(which means that his benchmark figures are irrelevant for your topic,
"roll your own std::vector"...)



Isn't that how it always goes. Instead of focusing on the efficiency
of different compiler switches, or different language constructs, or
of boxing and unboxing, we're better off checking that we've picked
the correct basic datastructure and algorithm!

The main reason that generics were created was because the boxing/unboxing
overhead of ArrayList was too expensive.
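
For readers following along: storing an int in an ArrayList boxes it into a heap object on the way in and requires an unboxing cast on the way out, while the generic List<int> stores the ints directly. A minimal illustration (my snippet, not from the benchmark):

// Sketch: boxing with ArrayList vs. no boxing with List<int> (.NET 2.0).
using System.Collections;
using System.Collections.Generic;

class BoxingSketch
{
    static void Main()
    {
        ArrayList a = new ArrayList();
        a.Add(42);              // boxes the int into a heap object
        int x = (int)a[0];      // unboxes; the cast is mandatory

        List<int> l = new List<int>();
        l.Add(42);              // stored as a plain int, no boxing
        int y = l[0];           // no unboxing, no cast
    }
}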
 

Jon Skeet [C# MVP]

Peter Olcott said:
Since I always use the construct where C# did 500% worse, and good programming
practice would require always using such a construct, then C# is often (if not
typically) 500% slower.

No - it may be often/typically 500% slower *in your particular
application* (although I doubt that, too) but that's not the same as
your *general* statement.

I could make a similarly false claim that "houses typically exist in the UK
rather than the US". After all, nearly all the houses that *I* see are
in the UK, because I live there. That doesn't make the general
statement true, any more than it makes your general statement about the
speed of .NET true.
Peter Olcott said:
As it turns out, the solution derived from this thread directly
addresses this specific problem. The boxing and unboxing of ArrayList
simply costs way too much.

Except that boxing and unboxing have almost *no* cost in the benchmark
you keep talking about.

As Lucian has pointed out, most of the time in the List/Vector
benchmark is actually taken removing elements from the start of a list
or inserting them into the start. That has nothing to do with
boxing/unboxing *and* it's not the typical usage for a list in most
applications I've either written or seen. Does your application
typically spend *most* of its time adding/removing entries to/from the
start of a list? If so, you should consider using a LinkedList instead.
If not, you shouldn't be basing your conclusions on this benchmark.
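
For reference, here is a sketch of what that switch would look like with .NET 2.0's LinkedList<T>, where adding and removing at the head are O(1); the count just mirrors the benchmark:

// Sketch: head insertion/removal with LinkedList<int> is O(1) per operation.
using System.Collections.Generic;

class LinkedListSketch
{
    static void Main()
    {
        LinkedList<int> list = new LinkedList<int>();
        for (int i = 0; i < 10000; i++)
            list.AddFirst(i);        // O(1): no elements are shifted
        while (list.Count > 0)
            list.RemoveFirst();      // O(1): just unlinks the head node
    }
}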

Changing the benchmark code to use a reference type instead of a value
type (i.e. removing the boxing/unboxing entirely) makes very little
difference to the performance of the code. (In fact, on my box it
actually makes it worse - I'm not entirely sure why at the moment.)

However, changing the benchmark code to always add/remove from the
*end* of the list makes a *vast* difference to the performance - it
makes it about *100 times* faster on my box, despite it not changing
how much boxing and unboxing occurs. (There is a slight difference in
terms of what's going on, as the lists are always reversed instead of
sometimes being copied in order, but I don't think you can possibly
argue that that is responsible for the change in performance.)

In a nutshell, the benchmark you're using to show that boxing/unboxing
are too expensive for you isn't in any way, shape or form dominated by
the cost of boxing/unboxing. Your conclusion is completely invalid.
Peter Olcott said:
That is not the way that compiler optimization works. Too much semantic
information is lost after it has been translated into the intermediate code for
much optimization to be applied.

If I can decompile back from IL to the original C#, what information do
you claim has been lost, precisely? Last time I looked, comments and
local variable names aren't part of what an optimiser looks at - and
often that's the only difference between the decompiled code and the
original.
Peter Olcott said:
When I posted another benchmark across the different
.NET languages, there was a substantial difference.

That doesn't mean that the JIT doesn't do most of the optimisation. The
C++ compiler does more than the C# compiler in some cases, but I
believe the JIT is still the main optimization contributor.
 

Jon Skeet [C# MVP]

Lucian Wischik said:
Ah, okay, I see what happened. The original benchmark used vector<int>
in c++ and ArrayList in C#, and performance was comparable. That's
what he got, and it's what I got, so we agree.

For the later tests he switched to using a linked list for c++.
Unsurprisingly this was fast, since the benchmark was about
performance at inserting and removing the head element: O(n) overall
for the linked list. But he stuck to the vector-like ArrayList for C#,
which is O(n^2) overall.

What a cheat! I'm not entirely surprised, given the code - it's pretty
horrible C#, clearly not what anyone really familiar with the language
would use.
(which means that his benchmark figures are irrelevant for your topic,
"roll your own std::vector"...)

Isn't that how it always goes. Instead of focusing on the efficiency
of different compiler switches, or different language constructs, or
of boxing and unboxing, we're better off checking that we've picked
the correct basic datastructure and algorithm!

Indeed - and when checking the differences between languages, checking
that we've got reasonable benchmarks!
 

Peter Olcott

Jon Skeet said:
No - it may be often/typically 500% slower *in your particular
application* (although I doubt that, too) but that's not the same as
your *general* statement.

I could make a similarly false claim that "houses typically exist in the UK
rather than the US". After all, nearly all the houses that *I* see are
in the UK, because I live there. That doesn't make the general
statement true, any more than it makes your general statement about the
speed of .NET true.


Except that boxing and unboxing have almost *no* cost in the benchmark
you keep talking about.

As Lucian has pointed out, most of the time in the List/Vector
benchmark is actually taken removing elements from the start of a list
or inserting them into the start. That has nothing to do with
boxing/unboxing *and* it's not the typical usage for a list in most
applications I've either written or seen. Does your application
typically spend *most* of its time adding/removing entries to/from the
start of a list? If so, you should consider using a LinkedList instead.
If not, you shouldn't be basing your conclusions on this benchmark.

Changing the benchmark code to use a reference type instead of a value
type (i.e. removing the boxing/unboxing entirely) makes very little
difference to the performance of the code. (In fact, on my box it
actually makes it worse - I'm not entirely sure why at the moment.)

The most time critical aspect of my application spends about 99% of its time
comparing elements in one array to elements in other arrays. This subsystem can
tolerate no additional overhead at all, having a real-time constraint of 1/10
second.
 

Willy Denoyette [MVP]

Lucian Wischik said:
Jon Skeet said:
a) That's .NET 1.1 - it would be interesting to see the .NET 2.0
results
b) I haven't looked at the code yet, but the fact that Java is
significantly faster in a couple of the benchmarks suggests there's
further room for optimisation

Okay, here are my results for that benchmark:

c++ using vector: 28 seconds
C#.net1 using ArrayList: 19.3 seconds
C#.net2 using ArrayList: 19.5 seconds
C#.net2 using List<int>: 19.1 seconds
Also, VS2005 and Visual C# Express were identical in speed.

Of those 19 seconds, 6 of them are spent building a 10,000 element
list by repeatedly inserting a new member at the front, 10 of them are
spent clearing two 10,000 element lists by repeatedly removing the
element at the front, and the rest of the work is O(n) rather than
O(n^2).

So this benchmark is basically just doing memcpy() on a 40k block of
memory, repeatedly. It says nothing about the relative cost of boxing
or unboxing. That's why the generics are no faster than the ArrayList.

I don't know why I'm getting the exact opposite results from that
webpage -- I'm getting a moderately slower c++ vector, whereas the
webpage gets an extremely slower c#.net1. Obviously one of us has a
typo somewhere! I also don't know why ArrayList should be any faster
than the c++ vector.

Here, I've distilled out the essence (the O(n^2) part) of that
anomalous benchmark:

// c++ version
//
#include <cstdio>
#include <vector>
#include <windows.h>
using namespace std;

int main()
{
    int t = GetTickCount();
    for (int j = 0; j < 100; j++)
    {
        vector<int> v;
        for (int i = 0; i < 10000; i++) v.push_back(i);
        vector<int> v2 = v;
        vector<int> v3;
        for (int i = 0; i < 10000; i++) v3.insert(v3.begin(), i);
        while (!v.empty()) v.erase(v.begin());
        while (!v2.empty()) v2.erase(v2.begin());
        printf(".");
    }
    printf("\nTime: %i\n", (int)(GetTickCount() - t));
    return 0;
}


// c# version
//
using System;
using System.Collections;

class VectorBenchmark
{
    static void Main(string[] args)
    {
        DateTime startTime = DateTime.Now;
        for (int j = 0; j < 100; j++)
        {
            ArrayList v = new ArrayList();
            for (int i = 0; i < 10000; i++) v.Insert(v.Count, i);
            ArrayList v2 = new ArrayList(v);
            ArrayList v3 = new ArrayList();
            for (int i = 0; i < 10000; i++) v3.Insert(0, i);
            while (v2.Count > 0) v2.RemoveAt(0);
            while (v3.Count > 0) v3.RemoveAt(0);
            Console.Write(".");
        }
        Console.WriteLine("\nVector elapsed time: " +
            DateTime.Now.Subtract(startTime).TotalMilliseconds + " ms");
    }
}




The following code shows the VectorTest method I modified to use the generic List<T>
to get a fair comparison with the vector template class.
Compiled with csc /o filename.cs
and run on XP using version 8.00.50727 of the C# compiler and 14.00.50727.42 of the C++
compiler and framework....

result C#:

Vector elapsed time: 7107 ms - 10000

while the unmodified C++ code compiled with cl /EHsc /O2 filename.cpp

results in ...
Vector elapsed time: 7606 ms - 10000
(ArrayList result: Vector elapsed time: 17565 ms - 10000)
You see, C# is the clear winner.

Don't know why you got such bad figures when using List<>

[Modified C# code]
/* original code copyright 2004 Christopher W. Cowell-Shah
http://www.cowell-shah.com/research/benchmark/code */
/* other code portions copyright http://dada.perl.it/shootout/ and Doug Bagley
http://www.bagley.org/~doug/shootout */
/* combined, modified and fixed by Thomas Bruckschlegel - http://www.tommti-systems.com */

....
static public int VectorTest()
{
    // create a list of integers (Li1) from 1 to SIZE
    List<int> Li1 = new List<int>();
    for (int i = 1; i < lSIZE + 1; i++)
    {
        Li1.Insert(Li1.Count, i); //addlast
        //Li1.addLast(new Integer(i));
    }
    // copy the list to Li2 (not by individual items)
    List<int> Li2 = new List<int>(Li1);
    List<int> Li3 = new List<int>();
    // remove each individual item from left side of Li2 and
    // append to right side of Li3 (preserving order)
    while (Li2.Count > 0)
    {
        Li3.Insert(Li3.Count, Li2[0]); //addlast
        Li2.RemoveAt(0);
    }
    // Li2 must now be empty
    // remove each individual item from right side of Li3 and
    // append to right side of Li2 (reversing list)
    while (Li3.Count > 0)
    {
        Li2.Insert(Li2.Count, Li3[Li3.Count - 1]); //addlast
        Li3.RemoveAt(Li3.Count - 1);
    }
    // Li3 must now be empty
    // reverse Li1
    List<int> tmp = new List<int>();
    while (Li1.Count > 0)
    {
        tmp.Insert(0, Li1[0]); //addfirst
        Li1.RemoveAt(0);
    }
    Li1 = tmp;
    // check that first item is now lSIZE
    if ((int)(Li1[0]) != lSIZE)
    {
        Console.WriteLine("first item of Li1 != lSIZE");
        return (0);
    }
    // compare Li1 and Li2 for equality, element by element
    // (List<T> has no == operator for element-wise comparison)
    if (Li1.Count != Li2.Count)
    {
        Console.WriteLine("Li1 and Li2 differ");
        return (0);
    }
    for (int i = 0; i < Li1.Count; i++)
    {
        if (Li1[i] != Li2[i])
        {
            Console.WriteLine("Li1 and Li2 differ");
            return (0);
        }
    }
    // return the length of the list
    return (Li1.Count);
}

The only two tests where C++ is a clear winner are the "Exception" test and the nested loop
test (nl); exceptions are known to be slow in managed code, and for the nested loop test the
C++ compiler's optimizer has more time to spend hoisting the loop. For all other tests C# is
just as fast as, or somewhat faster than, C++.

Willy.
 

Jon Skeet [C# MVP]


I've finally (after a while) found the source code for these
benchmarks:

http://www.ocf.berkeley.edu/~cowell/research/benchmark/code/

Running the tests myself, the "double" result is only 50% faster in C++
than in C#, rather than the 100% faster shown. This could partly be
down to compiler options - but they aren't given, as far as I can tell.

Now, I looked at the generated code with Reflector in both cases, and
one of the interesting differences is that the C# and C++ forms are
doing different things - C++ doesn't specify the order of the
assignments in the statement:

x *= y++;

(for example)

and the C++ compiler is treating it as:

x *= y;
y++;

In C#, that's not allowed - the order is guaranteed to be:

tmp = y;
y++;
x *= tmp;
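
A tiny self-contained check of that C# guarantee (my snippet, not part of the benchmark):

// Sketch: C# guarantees the old value of y is read before the increment.
class EvaluationOrderSketch
{
    static void Main()
    {
        double x = 2.0, y = 3.0;
        x *= y++;
        // Always prints "6 4": x used the old y (3.0), then y became 4.0.
        System.Console.WriteLine(x + " " + y);
    }
}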

Changing the body of the loop in doubleArithmetic to this:

while (i < doubleMax)
{
    doubleResult -= i;
    i++;
    doubleResult += i;
    i++;
    doubleResult *= i;
    i++;
    doubleResult /= i;
    i++;
}

made the generated code much closer between the two.

There's still a significant timing difference though, even with the
generated IL being almost identical - and for a while I really couldn't
explain it. Then I noticed that the stacks were significantly
different, due to the reporting mechanism: the C# uses
Console.WriteLine and formats a string, whereas the C++ uses printf
directly.


At this point (after much jigging around), I redownloaded the source,
and *just* changed the reporting mechanisms for the doubleArithmetic
methods to the following:

C++:
System::Console::WriteLine(doubleResult);
System::Console::WriteLine(elapsedTime);

C#:
Console.WriteLine(doubleResult);
Console.WriteLine(elapsedMilliseconds);

The results, with them still doing different things in terms of the
post-increment operator:

C#: 8158ms
C++: 7744ms

Suddenly the C++ result looks only 5% better than C#.

Changing the i++ in each statement into a ++i (to remove the language
difference) makes the two methods run in exactly the same time, within
the boundaries of timing error (on some runs the C++ wins, on some runs
the C# wins).

Just goes to show the importance of making like-for-like comparisons.
 

Jon Skeet [C# MVP]

Peter Olcott said:
The main reason that generics were created was because the boxing/unboxing
overhead of ArrayList was too expensive.

Well, that was *one* reason. The other (principal, IMO) reasons were
type safety and better expression of ideas.

Generics are in Java 5+ as well, without removing the impact of
boxing/unboxing.

Of course, even if generics *had* primarily been invented to alleviate
the performance hit of boxing/unboxing, that still wouldn't make the
benchmark you're looking at any more relevant, because boxing/unboxing
simply isn't a significant factor in it.
 

Jon Skeet [C# MVP]

Peter Olcott said:
The most time critical aspect of my application spends about 99% of its time
comparing elements in one array to elements in other arrays. This subsystem can
tolerate no additional overhead at all, having a real-time constraint of 1/10
second.

a) arrays aren't the same as lists
b) comparing elements in an array isn't the same as adding/removing
elements to/from the start of a list

So the benchmark you've been basing your 500% figure on is irrelevant
to your code.

Try benchmarking your *actual code* rather than relying on external
benchmarks.
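
A minimal harness for doing that on .NET 2.0 would use System.Diagnostics.Stopwatch; in this sketch the Compare method is a hypothetical stand-in for whatever the real subsystem does:

// Sketch: time the actual workload, not a third-party benchmark.
using System;
using System.Diagnostics;

class BenchmarkSketch
{
    // Hypothetical stand-in for the real array-comparison work.
    static int Compare(int[] a, int[] b)
    {
        int matches = 0;
        for (int i = 0; i < a.Length && i < b.Length; i++)
            if (a[i] == b[i]) matches++;
        return matches;
    }

    static void Main()
    {
        int[] a = new int[1000000];
        int[] b = new int[1000000];

        Compare(a, b);                  // warm-up call so the JIT cost isn't timed

        Stopwatch sw = Stopwatch.StartNew();
        int result = Compare(a, b);
        sw.Stop();
        Console.WriteLine(result + " matches in " + sw.ElapsedMilliseconds + " ms");
    }
}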
 

Chris Mullins

Peter Olcott said:
The most time critical aspect of my application spends about 99% of its
time comparing elements in one array to elements in other arrays. This
subsystem can tolerate no additional overhead at all, having a real-time
constraint of 1/10 second.

So why are you so worried about the "adding into a list" performance case?
It seems as if you would be much more worried about traversal and
comparison, rather than just inserting.

Also, if you are inserting into an array (be it List<>, or something else)
you're always going to have hits while it resizes the array - as long as
you're doing insertions or deletions, this is going to be a problem. Use a
linked list. You gotta pick the right data structure for your problem.

To further confuse issues, if this is the most time critical piece of your
application, you need to spend some very serious time looking at a better
algorithm. You're going to want to partition your comparisons across
processors using multiple threads (possibly setting affinity), and do so in
a way that avoids locking in the general case, and is intelligent about use
of processor cache (aka: avoids blowing the cache constantly).

Parallel Algorithms are going to be (well, should be) a major part of your
solution. Depending on your use case and processor counts, you may also want
to look into the various lock-free list implementations.
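
To make the partitioning idea concrete, here is a minimal two-thread sketch of my own (C# 2.0 era, before Parallel.For existed); no locking is needed in the hot path because each thread writes only its own result slot:

// Sketch: split an array comparison across two threads.
using System;
using System.Threading;

class PartitionSketch
{
    static int[] a = new int[1000000], b = new int[1000000];
    static int[] partial = new int[2];

    static void ComparePart(int slot, int from, int to)
    {
        int matches = 0;
        for (int i = from; i < to; i++)
            if (a[i] == b[i]) matches++;
        partial[slot] = matches;        // each thread owns one slot: no locks
    }

    static void Main()
    {
        int mid = a.Length / 2;
        Thread t = new Thread(delegate() { ComparePart(0, 0, mid); });
        t.Start();
        ComparePart(1, mid, a.Length);  // second half on the calling thread
        t.Join();
        Console.WriteLine(partial[0] + partial[1]);
    }
}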

These problems are in no way language specific. C / C++ / Java / C# / VB.Net
will all do a fine job. It's really up to you to determine the best
algorithm given your problem.
 

Peter Olcott

Jon Skeet said:
Well, that was *one* reason. The other (principal, IMO) reasons were
type safety and better expression of ideas.

Generics are in Java 5+ as well, without removing the impact of
boxing/unboxing.

Of course, even if generics *had* primarily been invented to alleviate
the performance hit of boxing/unboxing, that still wouldn't make the
benchmark you're looking at any more relevant, because boxing/unboxing
simply isn't a significant factor in it.

If one must convert the form of the data for every access to this data, then
this must take much longer than the simple access by itself. The last time that
we had this discussion there was all kinds of extra overhead (such as bounds
checking) that was thrown in and assumed. I am talking about the bare minimum
simple access that is translated into native code as a simple memory comparison
such as CMP EAX, MEM[EBX].
 

Jon Skeet [C# MVP]

Peter Olcott said:
Jon Skeet said:
Generics are in Java 5+ as well, without removing the impact of
boxing/unboxing.

Of course, even if generics *had* primarily been invented to alleviate
the performance hit of boxing/unboxing, that still wouldn't make the
benchmark you're looking at any more relevant, because boxing/unboxing
simply isn't a significant factor in it.

If one must convert the form of the data for every access to this data, then
this must take much longer than the simple access by itself. The last time that
we had this discussion there was all kinds of extra overhead (such as bounds
checking) that was thrown in and assumed. I am talking about the bare minimum
simple access that is translated into native code as a simple memory comparison
such as CMP EAX, MEM[EBX].

I'm not sure where the relevance of that is to anything I wrote...

If you really want unsafe code that could dive into the middle of
*anywhere* if you give it an invalid array index, you should indeed
stick to unmanaged code.

If you want code that performs extremely well anyway, whilst being much
more robust and easier to develop, go with .NET.
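
It is also worth noting that the bounds checking Peter worries about is often eliminated anyway: the JIT recognizes loops bounded by the array's own Length and can drop the per-access check. A sketch of the pattern (mine, not from this thread):

// Sketch: the i < a.Length loop shape lets the JIT elide bounds checks.
class BoundsCheckSketch
{
    static long Sum(int[] a)
    {
        long sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i];                // no per-access bounds check emitted
        return sum;
    }

    static void Main()
    {
        System.Console.WriteLine(Sum(new int[] { 1, 2, 3 }));
    }
}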

I'm not sure why you're continuing this line of enquiry, to be honest -
you seem to have made up your mind long ago that .NET wouldn't perform
appropriately, which is presumably why you haven't accepted that the
benchmarks you've been using as a basis for rejecting C# are entirely
inappropriate.
 

Peter Olcott

Chris Mullins said:
So why are you so worried about the "adding into a list" performance case? It
seems as if you would be much more worried about traversal and comparison,
rather than just inserting.

Also, if you are inserting into an array (be it List<>, or something else)
you're always going to have hits while it resizes the array - as long as
you're doing insertions or deletions, this is going to be a problem. Use a
linked list. You gotta pick the right data structure for your problem.

To further confuse issues, if this is the most time critical piece of your
application, you need to spend some very serious time looking at a better
algorithm. You're going to want to partition your comparisons across
processors using multiple threads (possibly setting affinity), and do so in a
way that avoids locking in the general case, and is intelligent about use of
processor cache (aka: avoids blowing the cache constantly).

It works fine now in native C++, I just don't want the port to .NET to break
this performance. Unless I can know in advance that .NET will not break this
performance I don't have time to learn .NET.
Parallel Algorithms are going to be (well, should be) a major part of your
solution. Depending on your use case and processor counts, you may also want
to look into the various lock-free list implementations.

The simple thing to do is to use dual core processors and allocate one core to
this process, since this process must achieve its 1/10 second real time limit
concurrently with other processes. In actuality this probably will not be
required for most applications because most of the time the other process will
not be using very much CPU time during the execution of this process.
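
For what it's worth, pinning a process to one core is a one-liner in .NET; ProcessorAffinity is a bitmask, so (IntPtr)1 means CPU 0 only. Whether pinning actually helps is a separate question:

// Sketch: restrict the current process to CPU 0 via the affinity bitmask.
using System;
using System.Diagnostics;

class AffinitySketch
{
    static void Main()
    {
        Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)1;
    }
}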
 

Chris Mullins

Peter Olcott said:
It works fine now in native C++, I just don't want the port to .NET to
break this performance. Unless I can know in advance that .NET will not
break this performance I don't have time to learn .NET.

You decided long ago that .Net wasn't a platform you wanted to use. You
found a set of benchmarks that supported this stance, and have proceeded to
cling to them despite a number of people showing you huge and obvious flaws
in those benchmarks.

You've then changed tactics, and talked about how it's actually not just
those benchmarks, but other technologies like boxing and unboxing that are
really going to be your problem. Now it's not about memory allocation
anymore, but about traversing arrays and datastructures for comparisons.

At this point, honestly, you're just being stubborn.
The simple thing to do is to use dual core processors and allocate one
core to this process, since this process must achieve its 1/10 second real
time limit concurrently with other processes.

That's not at all the simplest thing to do. In fact, all that tells me is
that you've not written multi-threaded code before, aren't familiar with
parallel algorithms, and don't really know how to take advantage of multiple
CPUs. It also says you don't really know how threading works under Windows,
in terms of scheduling.

Given the problem you describe, these really are technologies you need to
master - regardless of C++, Java, .Net, or whatever other language you
choose.
 

Peter Olcott

Chris Mullins said:
You decided long ago that .Net wasn't a platform you wanted to use. You found
a set of benchmarks that supported this stance, and have proceeded to cling to
them despite a number of people showing you huge and obvious flaws in those
benchmarks.

You've then changed tactics, and talked about how it's actually not just those
benchmarks, but other technologies like boxing and unboxing that are really
going to be your problem. Now it's not about memory allocation anymore, but
about traversing arrays and datastructures for comparisons.

At this point, honestly, you're just being stubborn.

Actually it is just the opposite of your incorrect assumptions. .NET is a
technology that I have definitely decided to support. In theory there is no
fundamental reason why this technology would need to remain any slower than
traditional forms of native code execution.

Because of the truly brilliant software engineering that went into .NET there
will definitely be a point in time when the 500% productivity gains are worth
the learning curve. For most applications programming that point in time was a
few years ago.

For critical systems programming that point in time must wait until the
performance is up to snuff. I would tentatively estimate that this point in time
may have arrived at the same time that .NET 2.0 and generics arrived. I must
make sure of this before I can proceed.
 

Mark Wilden

Because of the truly brilliant software engineering that went into .NET
there will definitely be a point in time when the 500% productivity gains
are worth the learning curve.

The learning curve to go from C++ to something like C# just isn't all that
high - in my experience, at least.

Programmers should learn a new language every year, anyway.

///ark
 

Peter Olcott

Mark Wilden said:
The learning curve to go from C++ to something like C# just isn't all that
high - in my experience, at least.

Programmers should learn a new language every year, anyway.

///ark

Bjarne Stroustrup (the creator of C++) once said that he has not yet learned all
of C++. Trying to learn a new computer programming language every year would
then necessarily result in a very shallow understanding.
 

Mark Wilden

Peter Olcott said:
Bjarne Stroustrup (the creator of C++) once said that he has not yet
learned all of C++. Trying to learn a new computer programming
language every year would then necessarily result in a very shallow
understanding.

It's not necessary to learn "all" of anything. But being afraid to invest
time in a new skill is a sure way to make yourself extinct.

This year I learned Ruby. Do I have a shallow understanding of it? Perhaps.
But I bet it's deeper than yours. :)

///ark
 
