Boxing and Unboxing ??


Jon Skeet [C# MVP]

Peter Olcott said:
It is not taking very much effort to completely eliminate all factors
significant and otherwise. Certainly passing large aggregate data types by value
is significant. I want to make sure that I have a 100% understanding on how to
always avoid passing large aggregate data types by value. Boxing and Unboxing
can be comparable to passing a large aggregate data type by value.

Well, they're not really similar and boxing/unboxing is relatively rare
when generics are available. Even when boxing/unboxing *is* involved,
as shown in the benchmark you were worried about, the cost of the
boxing was negligible compared with the copying involved in the rest of
the benchmark (so neatly sidestepped by the "updated" C++ version which
made the comparison completely irrelevant).
Imagine passing an array with millions of elements by value, instead of by
reference. Imagine further still that the only need of this array was to do a
binary search. Now we have a function that is 100,000-fold slower than necessary
simply because the difference between boxing and unboxing was not completely
understood.

If I do:
int[] x = new int[100000];
DoSomething (x);

how many bytes do you think are copied?

You seem to imagine that arrays are value types. Arrays are reference
types, so you could never pass the contents by value. That's why I've
said repeatedly that you *really* need to know more about value types
and reference types (and the fact that you very, very rarely *get* big
value types) before going much further. You have latched on to one
particular aspect of .NET and want to go very deeply into it without
getting a reasonable understanding of the rest of it. Getting the
basics right across the board will help you work out where it *is*
worth getting deeper understanding, and help you in achieving that
understanding too.
 

Jon Skeet [C# MVP]

It also increases language complexity, which can increase programmer
effort.

It reduces complexity. The programmer would not even need to know what the terms
reference type and value type mean, much less explicitly distinguish when one is
more appropriate than the other. There would only be two types of parameters;
[out] I want to be able to change it in the function, and [in], I want to make
sure that it won't be changed in the function, and neither one of these ever has
to be passed by value. The underlying CLR can pass by value if it is quicker
than by reference for items of the size of [int] and smaller. The [ref]
parameter qualifier could be discarded.

There's a difference between passing a value type by reference and
passing a reference type value (i.e. a reference) by value.

Please read http://www.pobox.com/~skeet/csharp/parameters.html
 

Barry Kelly

Peter said:
It reduces complexity.

I disagree. Unless we're in fantasy land and we're talking about a
completely different language here, adding *anything* to C# is going to
increase its complexity, by definition. Any new feature needs to add
enough value to justify itself.
The programmer would not even need to know what the terms
reference type and value type mean, much less explicitly distinguish when one is
more appropriate than the other.

Which programmer are you talking about:

1) The guy instantiating types and calling methods? If the types are
well-designed, this guy typically doesn't need to know already.

2) The guy writing types and methods? This is the guy who needs to make
the choice, so unless the two become semantically identical, he needs to
know the difference. And if you're suggesting some kind of semantic
fusion, then you'll need to be a whole lot more specific about what
you're talking about.
There would only be two types of parameters;
[out] I want to be able to change it in the function
and [in], I want to make
sure that it won't be changed in the function,
and neither one of these ever has
to be passed by value.
The underlying CLR can pass by value if it is quicker
than by reference for items of the size of [int] and smaller.

I had a long reply composed, but I discarded it, because I realised that
your statements don't cohere into a fully-formed whole.

You need to expand much more on what you're talking about, with
precision and detail, and give example code.

And don't forget, you can't break any existing C# code or semantics.
The [ref] parameter qualifier could be discarded.

No it can't! You've just renamed it to 'out' above.

-- Barry
 

Jon Skeet [C# MVP]

Peter Olcott said:
When I say passed by reference, I mean this term literally, in other words by
memory address. The term in .NET has acquired somewhat of a figure-of-speech
meaning.

No, it hasn't. It's used in a somewhat woolly manner by some people,
particularly with respect to "pass by reference", which is a different
kettle of fish entirely, but the specs are pretty specific.
Literally passing by reference means passing by machine address and
nothing more.

No, "pass by reference" has a deeper semantic meaning.

See http://www.pobox.com/~skeet/csharp/parameters.html
and
http://www.pobox.com/~skeet/java/parameters.html

(The latter has pretty specific definitions in it.)
 

Barry Kelly

Peter said:
Imagine passing an array with millions of elements by value, instead of by
reference.

Arrays in C# are reference types. You can't pass an array by value (or
rather, to keep Jon happy, you can only pass a reference to the array by
value, you can't pass the array value itself by value).

-- Barry
 

Jesse McGrew

Peter said:
Barry Kelly said:
Peter Olcott wrote: [...]
Since my suggestion reduces programmer effort, AND increases program
performance, it is therefore an optimal improvement to the current design.

It also increases language complexity, which can increase programmer
effort.

It reduces complexity. The programmer would not even need to know what the terms
reference type and value type mean, much less explicitly distinguish when one is
more appropriate than the other. There would only be two types of parameters;
[out] I want to be able to change it in the function, and [in], I want to make
sure that it won't be changed in the function, and neither one of these ever has
to be passed by value. The underlying CLR can pass by value if it is quicker
than by reference for items of the size of [int] and smaller. The [ref]
parameter qualifier could be discarded.

The difference between out and ref is important, though. Out parameters
are basically extra return values - they don't have to be initialized
on the way in, but they do have to be initialized on the way out:

void DivMod(int a, int b, out int quotient, out int remainder) { ... }

If out were the same as ref, the locations passed for quotient and
remainder would have to be initialized before calling DivMod, which is
needless work for the programmer.

Also, out and ref perform differently when you're calling methods
remotely over a network or process boundary. Ref parameter values have
to be passed both ways; out parameter values only have to be passed
back to the caller.

Jesse
 

Peter Olcott

Jon Skeet said:
Peter Olcott said:
It is not taking very much effort to completely eliminate all factors
significant and otherwise. Certainly passing large aggregate data types by value
is significant. I want to make sure that I have a 100% understanding on how to
always avoid passing large aggregate data types by value. Boxing and Unboxing
can be comparable to passing a large aggregate data type by value.

Well, they're not really similar and boxing/unboxing is relatively rare
when generics are available. Even when boxing/unboxing *is* involved,
as shown in the benchmark you were worried about, the cost of the
boxing was negligible compared with the copying involved in the rest of
the benchmark (so neatly sidestepped by the "updated" C++ version which
made the comparison completely irrelevant).
Imagine passing an array with millions of elements by value, instead of by
reference. Imagine further still that the only need of this array was to do a
binary search. Now we have a function that is 100,000-fold slower than necessary
simply because the difference between boxing and unboxing was not completely
understood.

If I do:
int[] x = new int[100000];
DoSomething (x);

how many bytes do you think are copied?

You seem to imagine that arrays are value types. Arrays are reference
types, so you could never pass the contents by value. That's why I've
said repeatedly that you *really* need to know more about value types
and reference types (and the fact that you very, very rarely *get* big
value types) before going much further. You have latched on to one
particular aspect of .NET and want to go very deeply into it without
getting a reasonable understanding of the rest of it. Getting the
basics right across the board will help you work out where it *is*
worth getting deeper understanding, and help you in achieving that
understanding too.
Maybe it is getting to that point now. It was not to that point before I began
this thread.
 

Peter Olcott

Barry Kelly said:
I disagree.

Right, glance at a couple of words before forming the preconceived refutation.
Unless we're in fantasy land and we're talking about a
completely different language here, adding *anything* to C# is going to
increase its complexity, by definition. Any new feature needs to add
enough value to justify itself.

Add [in], remove [ref]: the net difference is no more elements. However, we are
adding one simple parameter qualifier and removing a complex one.
Which programmer are you talking about:

The one writing the programs in the C# language.
1) The guy instantiating types and calling methods? If the types are
well-designed, this guy typically doesn't need to know already.

2) The guy writing types and methods? This is the guy who needs to make
the choice, so unless the two become semantically identical, he needs to
know the difference. And if you're suggesting some kind of semantic
fusion, then you'll need to be a whole lot more specific about what
you're talking about.

There are programmers that only call methods and never write methods? That seems
like quite a stretch. Where do they put the code that calls the methods, if not
in another method?
There would only be two types of parameters;
[out] I want to be able to change it in the function
and [in], I want to make
sure that it won't be changed in the function,
and neither one of these ever has
to be passed by value.
The underlying CLR can pass by value if it is quicker
than by reference for items of the size of [int] and smaller.

I had a long reply composed, but I discarded it, because I realised that
your statements don't cohere into a fully-formed whole.

You need to expand much more on what you're talking about, with
precision and detail, and give example code.

int SomeMethod(in SomeType SomeName) // C#
Exactly Equals
int SomeMethod(const SomeType& SomeName) // C++
And don't forget, you can't break any existing C# code or semantics.
The [ref] parameter qualifier could be discarded.

No it can't! You've just renamed it to 'out' above.

I took two different existing parameter qualifiers and combined them into a
single parameter qualifier that accomplished the purpose of both. Like I said
[ref] can be discarded. Is there really a need to make sure that a parameter
that will be written to was initialized?

There is no useful distinction between [ref] and [out]. Unify [ref] and [out]
into [out], and add [in] as a read-only pass by address parameter qualifier. The
CLR can be free to pass by value if it would be faster for very small items,
because on a read-only parameter there is no semantic difference.
 

Peter Olcott

Jesse McGrew said:
Peter said:
Barry Kelly said:
Peter Olcott wrote: [...]
Since my suggestion reduces programmer effort, AND increases program
performance, it is therefore an optimal improvement to the current design.

It also increases language complexity, which can increase programmer
effort.

It reduces complexity. The programmer would not even need to know what the terms
reference type and value type mean, much less explicitly distinguish when one is
more appropriate than the other. There would only be two types of parameters;
[out] I want to be able to change it in the function, and [in], I want to make
sure that it won't be changed in the function, and neither one of these ever has
to be passed by value. The underlying CLR can pass by value if it is quicker
than by reference for items of the size of [int] and smaller. The [ref]
parameter qualifier could be discarded.

The difference between out and ref is important, though. Out parameters
are basically extra return values - they don't have to be initialized
on the way in, but they do have to be initialized on the way out:

void DivMod(int a, int b, out int quotient, out int remainder) { ... }

If out were the same as ref, the locations passed for quotient and
remainder would have to be initialized before calling DivMod, which is
needless work for the programmer.

Also, out and ref perform differently when you're calling methods
remotely over a network or process boundary. Ref parameter values have
to be passed both ways; out parameter values only have to be passed
back to the caller.

Jesse

That last subtle distinction is why discussing these things in a newsgroup is
much more effective than merely reading books. I have a very good 1,000 page
book that never bothers to mention this distinction. In any case it would still
seem that adding an [in] parameter qualifier might be an improvement.
 

Barry Kelly

Peter said:
There are programmers that only call methods and never write methods? That seems
like quite a stretch. Where do they put the code that calls the methods, if not
in another method?

There are two 'hats' for a programmer: programmer as User of types, and
programmer as Designer of types. It's useful to distinguish between
them, because they require two different ways of thinking. The User of
types solves algorithmic problems based on the spec / purpose of the
method whose body they're writing. The Designer of types creates
abstractions to model a problem or domain, but the primary purpose is to
simplify the job of the guy who Uses the types that he / she is
creating.
int SomeMethod(in SomeType SomeName) // C#
Exactly Equals
int SomeMethod(const SomeType& SomeName) // C++

What if SomeType is a reference type? If you're suggesting full C++
semantics for a deeper notion of const than simply 'pass value types by
reference for performance reasons', then I'll direct you to Anders
Hejlsberg's opinions on the matter. For example read this:

http://www.artima.com/intv/choicesP.html
There is no useful distinction between [ref] and [out].

I disagree. 'ref' means 'modify this location', while 'out' means
'return into this location'. 'out' is a mechanism to get around the fact
that C# can't return tuples. 'ref' is a means of passing by reference.
Unify [ref] and [out]
into [out], and add [in] as a read-only pass by address parameter qualifier. The
CLR can be free to pass by value if it would be faster for very small items,
because on a read-only parameter there is no semantic difference.

Effectively (ISTM) the upshot of what you're asking for is a C++-style
'const &' for value types, to avoid boxing overhead when passing large
value types.

What many other people have been trying to tell you on this thread is:

1) Large value types aren't a good idea
2) Large value types don't even occur very often (e.g. arrays are
reference types, as you've found out)
3) Reference types are good, you should try them!

In fact, the primary advantage of value types is that they're usually
copied wherever they go, and thus reduce GC overhead.

Basically, you're suggesting a feature that wouldn't be used a lot.
Again, I think it wouldn't be harmful or anything, just not very useful.

-- Barry
 

Jesse McGrew

Peter said:
Also, out and ref perform differently when you're calling methods
remotely over a network or process boundary. Ref parameter values have
to be passed both ways; out parameter values only have to be passed
back to the caller.

That last subtle distinction is why discussing these things in a newsgroup is
much more effective than merely reading books. I have a very good 1,000 page
book that never bothers to mention this distinction. In any case it would still
seem that adding an [in] parameter qualifier might be an improvement.

I don't see the advantage. If you leave the qualifiers off, the
parameter is already "in" by default. Optimizations like the one you're
proposing can just as well be handled by the CLR detecting that a large
value-type parameter is never modified, and deciding internally to pass
it by reference instead of copying. Adding an extra language keyword to
suggest that behavior is the kind of hint that might be common in C
(e.g. the "register" keyword) but doesn't have much of a place in C#.

Jesse
 

Peter Olcott

Larry Lard said:
You realise this means you are doomed to failure?

It means that I can't afford to waste any time, and must find shortcuts to
achieve the required end-results.
 

Peter Olcott

Jesse McGrew said:
Peter said:
Also, out and ref perform differently when you're calling methods
remotely over a network or process boundary. Ref parameter values have
to be passed both ways; out parameter values only have to be passed
back to the caller.

That last subtle distinction is why discussing these things in a newsgroup is
much more effective than merely reading books. I have a very good 1,000 page
book that never bothers to mention this distinction. In any case it would still
seem that adding an [in] parameter qualifier might be an improvement.

I don't see the advantage. If you leave the qualifiers off, the
parameter is already "in" by default. Optimizations like the one you're

It's pass by value, which is not the same thing as "in" by default. My suggestion
is to make [in] pass by address and read-only, exactly equivalent to:
ReturnType SomeMethod(const ValueType& VariableName) // C++
proposing can just as well be handled by the CLR detecting that a large
value-type parameter is never modified, and deciding internally to pass
it by reference instead of copying. Adding an extra language keyword to

Is it read-only? If it's not read-only, then it's not the same.
 

Peter Olcott

Barry Kelly said:
Peter said:
There are programmers that only call methods and never write methods? That seems
like quite a stretch. Where do they put the code that calls the methods, if not
in another method?

There are two 'hats' for a programmer: programmer as User of types, and
programmer as Designer of types. It's useful to distinguish between
them, because they require two different ways of thinking. The User of
types solves algorithmic problems based on the spec / purpose of the
method whose body they're writing. The Designer of types creates
abstractions to model a problem or domain, but the primary purpose is to
simplify the job of the guy who Uses the types that he / she is
creating.
int SomeMethod(in SomeType SomeName) // C#
Exactly Equals
int SomeMethod(const SomeType& SomeName) // C++

What if SomeType is a reference type? If you're suggesting full C++
semantics for a deeper notion of const than simply 'pass value types by
reference for performance reasons', then I'll direct you to Anders
Hejlsberg's opinions on the matter. For example read this:

http://www.artima.com/intv/choicesP.html
There is no useful distinction between [ref] and [out].

I disagree. 'ref' means 'modify this location', while 'out' means
'return into this location'. 'out' is a mechanism to get around the fact
that C# can't return tuples. 'ref' is a means of passing by reference.

So then [ref] could be named [io], for input and output.
Unify [ref] and [out] into [out], and add [in] as a read-only pass by address
parameter qualifier. The CLR can be free to pass by value if it would be faster
for very small items, because on a read-only parameter there is no semantic
difference.

Effectively (ISTM) the upshot of what you're asking for is a C++-style
'const &' for value types, to avoid boxing overhead when passing large
value types.

What many other people have been trying to tell you on this thread is:

1) Large value types aren't a good idea
2) Large value types don't even occur very often (e.g. arrays are
reference types, as you've found out)
3) Reference types are good, you should try them!

In fact, the primary advantage of value types is that they're usually
copied wherever they go, and thus reduce GC overhead.

Yet the language lacks the alternative capability even when it is needed.
Basically, you're suggesting a feature that wouldn't be used a lot.

It would only be used when one needs to pass aggregate data without wasting the
machine time of boxing and unboxing, or the programmer time of a user writing to
a parameter intended to be read-only. The case that I envision is something like
the C++ friend function: it can't be a class member, yet requires direct access
to internal data. Good design minimizes these cases, yet cannot eliminate them.
 

Bill Butler

Peter Olcott said:
It means that I can't afford to waste any time, and must find
shortcuts to achieve the required end-results.



You wish to switch to a dotnet based solution to reduce the development
time, but you don't want to put the time in to learning the
language (C#). Sounds to me like you are doomed to failure anyway.

I suggest sticking with C++, since that seems to be where your comfort
zone lies. Without putting in the time to learn C# and the dotnet
framework, there is no way that you will save any time on your
development. Or, if you did save time, it would perform like a pig,
because you have no desire to work with the framework, and you would
fight it instead.

For months now, you have been posting the same old recycled arguments.
Had you listened to us then, you might actually be showing progress now.

I am amazed at how much time you have to argue, when you tell us you
need to work 90 hours a week just to tread water.

Bill
 

Bruce Wood

Peter said:
Ah so we could create a new parameter qualifier that works like [out] and [ref]
yet in the opposite direction. We could have an [in] parameter qualifier that
allows all large objects (larger than int) to be passed by reference, yet these
are all read-only objects. The compiler does not allow writing to them. This way
we avoid the unnecessary overhead of making copies of large objects just to
avoid accidentally making changes to these large objects.

Funny you should mention that. So far as I know, the C# team has been
mulling over how to include the concept of "const" arguments (which, I
think, is really what you're proposing with "in"), whereby an object
can be passed-by-ref but be unchangeable. I'm not up on the technical
challenges facing them, but I know that they're thinking about it.

One thing I believe they do want to avoid is the C++ cascading-const
nightmare, whereby including one const declaration in a method forces
some other method's parameter to be const, which forces another
method's parameter to be const.... C++ solves this (as I recall) with a
special kind of cast to remove the "const" nature of an object, but
they're hoping to avoid that in C# should they ever introduce the
feature.
It might be possible to design a language that has essentially all of the
functional capabilities of the lower level languages, without the requirement
of ever directly dealing with pointers.

True, but one question I know the C# team constantly asks is whether a
feature is worth the additional complexity it adds to the language. (I
know this because they frequently cite that as a reason for not
including certain features.) What does it really buy you being able to
take the address of an arbitrary variable (in safe code... I know that
you can do it in unsafe code)? As I said, I think that Java (and now
C#) have demonstrated that it doesn't buy you much. You mentioned
boxing overhead, but in .NET 2.0 you can pretty-much avoid boxing...
all you have to do is learn a new idiom: a new way to do what you've
always done, but now in a new language.

That, in the end, is what it comes down to: C# works very well. It's
just that it does things differently than does C++, and you can't take
C++ idioms and concepts and start writing C# as though it were C++. In
a few domains, C++ is much better suited to the problems than is C#,
but in most domains C# gives you all the functionality you need while
helping keep you out of trouble.

If you really, really need to use pointers and arbitrary addressing, C#
has unsafe mode... but that's why it's called "unsafe". Like any other
"escape hatch" in a language, if you find yourself using unsafe code
you should be thinking long and hard about why you need it, and whether
you have a real need or just a mental disconnect with how you're
intended to use the language.
 

Peter Olcott

Bruce Wood said:
Peter said:
Ah so we could create a new parameter qualifier that works like [out] and [ref]
yet in the opposite direction. We could have an [in] parameter qualifier that
allows all large objects (larger than int) to be passed by reference, yet these
are all read-only objects. The compiler does not allow writing to them. This way
we avoid the unnecessary overhead of making copies of large objects just to
avoid accidentally making changes to these large objects.

Funny you should mention that. So far as I know, the C# team has been
mulling over how to include the concept of "const" arguments (which, I
think, is really what you're proposing with "in"), whereby an object
can be passed-by-ref but be unchangeable. I'm not up on the technical
challenges facing them, but I know that they're thinking about it.

One thing I believe they do want to avoid is the C++ cascading-const
nightmare, whereby including one const declaration in a method forces
some other method's parameter to be const, which forces another
method's parameter to be const.... C++ solves this (as I recall) with a
special kind of cast to remove the "const" nature of an object, but
they're hoping to avoid that in C# should they ever introduce the
feature.
It might be possible to design a language that has essentially all of the
functional capabilities of the lower level languages, without the requirement
of ever directly dealing with pointers.

True, but one question I know the C# team constantly asks is whether a
feature is worth the additional complexity it adds to the language. (I
know this because they frequently cite that as a reason for not
including certain features.) What does it really buy you being able to
take the address of an arbitrary variable (in safe code... I know that
you can do it in unsafe code)? As I said, I think that Java (and now
C#) have demonstrated that it doesn't buy you much. You mentioned
boxing overhead, but in .NET 2.0 you can pretty-much avoid boxing...
all you have to do is learn a new idiom: a new way to do what you've
always done, but now in a new language.

Are you referring to Generics? Does this address this issue of passing a struct
by (address) reference?
That, in the end, is what it comes down to: C# works very well. It's
just that it does things differently than does C++, and you can't take
C++ idioms and concepts and start writing C# as though it were C++. In
a few domains, C++ is much better suited to the problems than is C#,
but in most domains C# gives you all the functionality you need while
helping keep you out of trouble.

I think that it is possible to take the concept of C# further along. To be able
to provide every required feature of a language such as C++, yet to do this in
an entirely type safe way, with essentially no additional execution time
overhead, and drastically reduce the number of details that must be handled by
the programmer. I think that C# has done an excellent job of achieving these
goals up to this point. I think that there are two opportunities for
improvement:

(1) Little tweaks here and there to eliminate more of the additional execution
time overhead.

(2) Abstract out the distinction between reference types and value types so that
the programmer will not even need to know the difference. The underlying
architecture can still have separate reference types and value types, yet, this
can be handled entirely transparently by the compiler and CLR.

This last objective may be very difficult to achieve, yet will reap great
rewards in increased programmer productivity.
 

Jon Skeet [C# MVP]

Peter Olcott said:
It means that I can't afford to waste any time, and must find shortcuts to
achieve the required end-results.

No, if you need to achieve the impossible in order to succeed, then you
*are* doomed to failure. Whatever you do, you're not going to be able
to get 170 hours of work done in a week. You can reduce the amount of
work you do by working smarter, but that's not the same thing.

By the way, I'd consider trying to redesign .NET instead of going with
what has already been designed and trying to work with it as wasting
your time...
 

Peter Olcott

Bill Butler said:
You wish to switch to a dotnet based solution to reduce the development time,
but you don't want to put the time in to learning the language (C#). Sounds to
me like you are doomed to failure anyway.

I can only put in a little time to learn the language now. I am focusing this
time on assessing the feasibility of using C# and .NET for my future needs. It
looks like C# does have the performance that I need. In order to get this
performance, I must use relatively obscure nuances of C# that are barely
mentioned in even the most comprehensive textbooks. Specifically because of this
newsgroup thread, I was able to learn these relatively obscure nuances. It
definitely looks like C# and .NET will be the future of essentially ALL MS
Windows development.
 
