C# generic containers from a "C++ perspective"


Jon Harrop

Ben said:
That is a known issue which I spoke of in my first post in this thread.
C++0x fixes that too.

Has C++0x ever been implemented?
Generics, which are early bound, don't provide parametric polymorphism.

I suggest you read up on parametric polymorphism.
What part of "both double and decimal" didn't you understand?

"decimal" because it is meaningless.
And why would different algorithms be needed?

Because these types have completely different properties in this context:

- Ints do not accumulate round-off error. Floats do.

- The "/" operator you use is Euclidean quotient for ints and an
approximation to real division for floats.
In C++ the same algorithm
will work for both int and float (until overflow becomes a problem, but
the accumulator can be a different type) except for roundoff, which is
expected for int.

As I thought, your algorithm is broken. This is difficult to demonstrate
because you failed to provide working C++ code.
through the use of JIT?

FFTW precompiles the most common codelets and puts them in its standard
library. With a JIT, you can compile any codelet as and when you need it.
No, as I recall the comparison process is very slow (I may be thinking of
ATLAS rather than FFTW), and the compile time is only a small fraction of
that.

That is generally the case, yes.
Oh, F# defines the bitwise-and operation for all types it pretends to
implement INumeric for?

No, the INumeric interface is only used for the n.One and n.Zero. The
inlining makes this function generic over everything else just like a C++
template.
It's not part of the INumeric interface.
Yes.

It's not even part of the IIntegral interface.

Yes, of course.
And it must turn that into something pretty ugly in order for the CLR to
accept it.

I've no idea how it is compiled, but it provides strictly more functionality
than your C++ template.
If they are so trivial, how come .NET generics can't solve them?

How come a ripe banana can't solve them?

Nobody cares when modern languages are more expressive in every respect.
 

Marc Gravell

It depends on what you mean by neutered - plus in all the cases discussed
above (decimal vs double vs int etc) we are talking about value-types, not
reference types.
As far as the JIT is concerned, it doesn't *need* more than one
reference-type IL version, since every reference takes the same size and is
accessed the same (possibly losing out on some "sealed" optimisations, but
heh...)

As it happens, the generic-operator stuff will cope with either
reference-type or value-type without issue (and in particular without
boxing) - but if you clarify the question I might be able to give a better
answer.

Marc
 

Jon Skeet [C# MVP]

Marc Gravell said:
It depends on what you mean by neutered - plus in all the cases discussed
above (decimal vs double vs int etc) we are talking about value-types, not
reference types.
As far as the JIT is concerned, it doesn't *need* more than one
reference-type IL version, since every reference takes the same size and is
accessed the same (possibly losing out on some "sealed" optimisations, but
heh...)

As it happens, the generic-operator stuff will cope with either
reference-type or value-type without issue (and in particular without
boxing) - but if you clarify the question I might be able to give a better
answer.

I think what Ben's driving at is that the JIT doesn't inline code which
could be inlined for reference types. It might do for value types,
because it JITs on a per-type basis, but for reference types it has to
assume that any virtual method *might* be overridden, because it's
going to use the same native code for all type arguments which are
reference types.

There may well be other optimisations with similar issues.
 

Ben Voigt [C++ MVP]

For example, how much .NET code (pick your language) would you need
No, the INumeric interface is only used for the n.One and n.Zero. The
inlining makes this function generic over everything else just like a
C++ template.

Ok, now I understand.

The ironic thing is that you've proved my point, by showing that F# needed
to implement a feature similar to C++ templates in order to overcome the
deficiencies in .NET generics.
 

Ben Voigt [C++ MVP]

Marc said:
It depends on what you mean by neutered - plus in all the cases
discussed above (decimal vs double vs int etc) we are talking about
value-types, not reference types.
As far as the JIT is concerned, it doesn't *need* more than one
reference-type IL version, since every reference takes the same size
and is accessed the same (possibly losing out on some "sealed"
optimisations, but heh...)

As it happens, the generic-operator stuff will cope with either
reference-type or value-type without issue (and in particular without
boxing) - but if you clarify the question I might be able to give a
better answer.

I was referring to this line in the page you linked me to:

<quote>
the JIT-compiler is remarkably good at inlining the simple code that invokes
the methods; enough that runtime performance is within a few percent of
regular compiled code.
</quote>

AFAIK, this sort of inlining doesn't take place at all when the generic
argument is a reference type. I was wondering if maybe using the Expression
class is causing the JIT to specialize for each different reference type,
overcoming the problem.
 

Jon Harrop

Ben said:
We are posting in a C# group about .NET.

decimal is a C# keyword just like double.
System.Decimal is a standard .NET type.

My explanation of why your algorithm is fundamentally broken still stands.
If you implement the floating-point and integer algorithms properly you
will see that there is nothing worth factoring out between them.
Before you "pull rank"...

Out of context.
 

Jon Harrop

Ben said:
Ok, now I understand.

The ironic thing is that you've proved my point, by showing that F# needed
to implement a feature similar to C++ templates in order to overcome the
deficiencies in .NET generics.

Exactly. Templates were obsoleted by a variety of alternatives (for
parametric polymorphism, inlining and metaprogramming). Pretending that C++
is better because templates provide an unusably poor solution to problems
that generics did not even attempt to address is just deceitful propaganda.

As I said, templates can do more metaprogramming than a ripe banana. That
information is also of no use but at least it isn't deliberately
misleading.
 

Marc Gravell

I'm not sure that is a meaningful comparison to standard generics,
since operators are static and thus outside of what generics offers by
default. But yes; I think (from your and Jon's comments) that this
relates back to the "sealed" comment in my post above - it may well
miss some optimisations due to not being optimised by type. The
generic operator code *has* to be per-type (even for reference-types)
simply to exist, since it isn't driven by generics. Basically I
compile a delegate on the fly and cache it against the type.

In reality, the cost of invoking a delegate is probably (untested)
higher than the cost of a virtual call, so I don't think that the
delegate approach is going to be a blanket fix - but it would
/perhaps/ merit testing *if* there is a demonstrable case where the
virtual call is actually a bottleneck.

Marc
 

Ben Voigt [C++ MVP]

Jon said:
My explanation of why your algorithm is fundamentally broken still
stands. If you implement the floating-point and integer algorithms
properly you will see that there is nothing worth factoring out
between them.

The implementation would be:

Maintain the last N items in a ring buffer. (This consists of an array of
size N and the index of the oldest element).
Maintain the sum of all the items in an accumulator.
On arrival of a new data point, deduct the oldest datum from the sum,
overwrite it with the new one (updating the index), add the new one to the
sum, return sum / N.

This works very well for any numeric data type as long as the accumulator
doesn't overflow, but the accumulator doesn't need to be the same type as
each buffer element.
 

Jon Harrop

Ben said:
The implementation would be:

Maintain the last N items in a ring buffer. (This consists of an array of
size N and the index of the oldest element).
Maintain the sum of all the items in an accumulator.
On arrival of a new data point, deduct the oldest datum from the sum,
overwrite it with the new one (updating the index), add the new one to the
sum, return sum / N.

That is the fundamentally-broken algorithm that I was expecting you to
describe: you have implicitly assumed that subtracting a value exactly
undoes adding it, which is true for integers but not for floats.

Specifically, your algorithm will suffer from cumulative round-off errors
and will not handle special values (e.g. infinities) at all.
This works very well for any numeric data type as long as the accumulator
doesn't overflow,

That is not true. You only need one element 10^17x larger than any other
currently in the buffer and you lose all precision in your accumulator from
catastrophic round-off error, hundreds of orders of magnitude before any
overflow.
but the accumulator doesn't need to be the same type as each buffer
element.

I see. And what type would you recommend for the accumulator? An
arbitrary-precision rational, perhaps?
 

Ben Voigt [C++ MVP]

Jon said:
That is the fundamentally-broken algorithm that I was expecting you to
describe: you have implicitly assumed that subtracting a value exactly
undoes adding it, which is true for integers but not for floats.

Specifically, your algorithm will suffer from cumulative round-off
errors and will not handle special values (e.g. infinities) at all.


That is not true. You only need one element 10^17x larger than any
other currently in the buffer and you lose all precision in your
accumulator from catastrophic round-off error, hundreds of orders of
magnitude before any overflow.


I see. And what type would you recommend for the accumulator? An
arbitrary-precision rational, perhaps?

The issue you describe is theoretically valid, but not applicable to
real-world data created via analog-to-digital sampling and linear functions
thereof, which is probably somewhere upwards of 99% of all data sets. If it
is a problem, you can always recalculate the sum directly from the stored
elements.
 

Jon Harrop

Ben said:
The issue you describe is theoretically valid, but not applicable to
real-world data created via analog-to-digital sampling and linear
functions thereof, which is probably somewhere upwards of 99% of all data
sets.

I guess my experience has been tainted by my extensive use of really
esoteric mathematical functions like sqrt and even quadratics. Nobody would
want to use those so-called non-linear functions in the "real world". Or
big numbers. Nobody uses big numbers in the real world either. Or small
numbers.

Would you also say that positive numbers account for around 99% of all "real
world" numbers too?
 

Ben Voigt [C++ MVP]

Jon Harrop said:
I guess my experience has been tainted by my extensive use of really
esoteric mathematical functions like sqrt and even quadratics. Nobody would
want to use those so-called non-linear functions in the "real world". Or
big numbers. Nobody uses big numbers in the real world either. Or small
numbers.

The question isn't whether people use big numbers, or people use small
numbers. The question is whether people have both really big and really
small numbers in the same dataset. Not often. In datasets where you want
to talk about the moving average (via arithmetic mean)... almost never.
Would you also say that positive numbers account for around 99% of all
"real world" numbers too?

Probably not 99%. If I had to guess I'd say non-negative (as opposed to
strictly positive) are around 92%. Still a large enough fraction that an
algorithm which doesn't handle negative numbers at all is pretty doggone
useful. Which might be why most programming languages have "unsigned"
types.

Just because some mathematical number exists that an algorithm can't handle
doesn't mean the algorithm is useless.
 
