Boxing and Unboxing ??

Jesse McGrew · Jan 17, 2007

Peter Olcott wrote:
[...]

I am not saying that this would be easy, but, then the advances from machine
language to object oriented programming weren't easy either. The end result that
I am proposing is far simpler than the prior example.

I was thinking along the lines of treating everything as a reference type and
mostly doing away with value types under the covers. There would be three
parameter qualifiers: {in, out, io}, [out] would be the same as it already is,
and [io] would take the place of the deprecated [ref], [in] would be for
read-only input parameters.

Treating everything as a reference type is, as I understand it, the
Smalltalk approach. And it carries a big performance penalty, because
every integer, boolean, and character has to be a full-fledged garbage
collected heap object.

Furthermore, "out" and "ref", when used with reference types, mean that
the *reference itself* may be changed by the method, not the object
that the reference points to. By making "in" parameters read-only, you
create a situation where if you want to call a method and let it change
the contents of an object, you also have to let it change your
reference to point to a whole new object.

Jesse

Bruce Wood · Jan 17, 2007

Peter said:
That all progress comes from abstraction does not entail that all abstraction
results in progress; that is a simple non sequitur.

True, but that was the only substance I could find in your post: the
implication that unifying the type model would introduce a new
abstraction, abstraction is progress, and therefore unifying the type
model is progressive. If there was something I missed, please tell me
so.

I then went on to explain why I find C#'s type model easier to work
with than the unified model of C++. Perhaps if you could explain why
I'm wrong, we could get the discussion back on track....

Bruce Wood · Jan 17, 2007

OK.. this discussion is descending into silly territory. Perhaps this
will help. I make the following claims.

1. The price for the unified type model and additional expressive power
of C++ (where everything is a value unless you "manually" take a
reference to it) is that you are forced to take careful note of lots of
picky details. In particular, you have to litter your code with & and
*, and _know when to do so_ and when not to.

2. Far from liberating the programmer from "having to worry about value
versus reference types", C++ throws it right in your face and forces
you to deal with it in almost every line of code. By contrast, C# does
something conceptually ugly but practically beautiful. By dividing
types into value types and reference types, C# forces you to make the
choice once, up front. From then on the language normally does what you
would expect with the type.

3. C# is simpler to code in for what appears to be a trivial reason,
but turns out to be pivotal: in order to get the behaviour you want,
you don't have to say anything. You don't have to say &, or *, or ref,
or out. You just declare a variable, work with it, pass it to methods,
and everything works the way you would like it to. As with all
heuristic rules, this one is not absolute. Sometimes the language
_doesn't_ do what you want by default, and you have to say "ref" or
"out". However, look around the Framework classes and see how often you
see those two keywords. They're very, very rarely needed.

4. Point #3 means that C# is easier to learn for newbies. One just
writes code, and by and large it does "the right thing" without any
extra tweaking. The same cannot be said for C++. I was amazed in a
previous job by how many people were guessing at when to use & and when
to pass a variable straight into a method, and when they had to
dereference with * and when they didn't. These were experienced
programmers. The only time this happens in C# is in that
one-in-a-hundred case in which you need ref, out, or a clone.

5. Far from "improving programmer productivity," unifying the type
model in C# in the style of C++ would reduce programmer productivity
and increase programmer confusion. I saw it happen in C and C++. I see
no reason why the same change in C# would not produce the same results.

6. There is a price for the greater ease of use of C#: there are some
occasionally useful C++ idioms that simply can't be done in C#. One
that comes to mind is deciding to pass an object instance on the stack.
Another is storing a reference to an arbitrary variable for later
update. C# is certainly less powerful than C++. However, I think that
the C# team has done an excellent job of eliminating power where it's
the kind of power that usually gets you into big trouble and is only
occasionally legitimately useful. Others may disagree.

Ignacio Machin ( .NET/ C# MVP ) · Jan 17, 2007

| >
| Ah so the boxing and unboxing overhead that does not cost very much

It can cost a lot; if you box inside a loop, it can account for a big chunk of
the processing time.

| adds a lot of versatility to the underlying functionality while
| maintaining type safety.

Maintaining type safety how?
In reality you lose type safety. When you box an int, for example, you
get an object reference; you could then try to cast it to any other type and
get an error. Unfortunately, that error is detected at runtime and not
at compile time, so boxing does not help but does quite the opposite.

| It
| is comparable to the slight extra overhead that polymorphism requires yet
| providing much more versatile code.

Not at all, they are two very different things.

Ignacio Machin ( .NET/ C# MVP ) · Jan 17, 2007

Hi,

| >
| > But boxing/unboxing will still be there, forever and ever

| >
| I don't think so.

Why? Can you explain why you think un/boxing will go away?

Peter Olcott · Jan 17, 2007

Jesse McGrew said:
Peter Olcott wrote:
[...]

I am not saying that this would be easy, but, then the advances from machine
language to object oriented programming weren't easy either. The end result that
I am proposing is far simpler than the prior example.

I was thinking along the lines of treating everything as a reference type and
mostly doing away with value types under the covers. There would be three
parameter qualifiers: {in, out, io}, [out] would be the same as it already is,
and [io] would take the place of the deprecated [ref], [in] would be for
read-only input parameters.

Treating everything as a reference type is, as I understand it, the
Smalltalk approach. And it carries a big performance penalty, because
every integer, boolean, and character has to be a full-fledged garbage
collected heap object.

I said mostly doing away with value types, not completely doing away with them.
In other words almost never pass any parameters (unless integer or smaller) by
value.

Furthermore, "out" and "ref", when used with reference types, mean that
the *reference itself* may be changed by the method, not the object
that the reference points to. By making "in" parameters read-only, you
create a situation where if you want to call a method and let it change
the contents of an object, you also have to let it change your
reference to point to a whole new object.

No, this is not true; there are many ways around this. For example you could have
[local-input] meaning that this function can not change the variable, but,
subsequent functions that have this same parameter qualified differently (such
as [ref] or [io]) could change the variable.

This does not have to be enforced at run-time it could be enforced at compile
time. If it must be enforced at run-time, and must remain immutable, then there
is no possible way to avoid its cascading effect. However if it must be enforced
at run-time, but is not immutable, then the cascading effect can be eliminated.
Each function could change the parameter qualifier attribute of the data as the
data enters the function.

Peter Olcott · Jan 17, 2007

Bruce Wood said:
True, but that was the only substance I could find in your post: the
implication that unifying the type model would introduce a new
abstraction, abstraction is progress, and therefore unifying the type
model is progressive. If there was something I missed, please tell me
so.

I then went on to explain why I find C#'s type model easier to work
with than the unified model of C++. Perhaps if you could explain why
I'm wrong, we could get the discussion back on track....

It might very well be that the C# model is superior to the C++ model in that it
can accomplish the same end-result with less programming effort. It is not true
that C# is the epitome of the best possible computer language that can ever be
created.

In order to improve computer language design one must look for ways to reduce
the number of details that the programmer must keep track of. The distinction
between value types and reference types as separate types is one detail that
might be able to be eliminated. It might be able to be eliminated with no
degradation of performance in terms of increases in either time or space. (CPU
cycles or RAM).

Peter Olcott · Jan 17, 2007

Bruce Wood said:
OK.. this discussion is descending into silly territory. Perhaps this
will help. I make the following claims.

1. The price for the unified type model and additional expressive power
of C++ (where everything is a value unless you "manually" take a
reference to it) is that you are forced to take careful note of lots of
picky details. In particular, you have to litter your code with & and
*, and _know when to do so_ and when not to.

I am not recommending making C# more like C++, just the opposite. I am
recommending that C# reduce its complexity even more, and do this in a way that
neither reduces speed nor increases space very much. Polymorphism both reduces
speed and increases space, yet the benefits far outweigh the cost, because the
benefits are large and the cost is small.

Peter Olcott · Jan 17, 2007

Ignacio Machin ( .NET/ C# MVP ) said:
Hi,

| >
| > But boxing/unboxing will still be there, forever and ever
| >
| I don't think so.

Why? Can you explain why you think un/boxing will go away?

If you always keep everything in a box that needs to be in a box, and permit
access to internal data without taking it out of the box, then there is no need
to put it in a box, and take it out of a box, back and forth, just initialize it
in the box. For function parameters, the function can keep track of the type,
data does not need to keep track of itself. This could be done entirely at
compile-time, or run-time support could be added.

Bruce Wood · Jan 17, 2007

Peter said:
It might very well be that the C# model is superior to the C++ model in that it
can accomplish the same end-result with less programming effort.

But if that is true, doesn't that hint that making the C# model more
like the C++ model would result in more programming effort?

It is not true that C# is the epitome of the best possible computer language that can ever be
created.

Oh, I would never claim that it is. I just happen to think that they
got this detail (dividing types into "value" and "reference") right.
There are other areas where I think C# is lacking; this just doesn't
happen to be one of them.

In order to improve computer language design one must look for ways to reduce
the number of details that the programmer must keep track of. The distinction
between value types and reference types as separate types is one detail that
might be able to be eliminated.

Perhaps, but the only ways I can think of to eliminate this detail
create even more ugly details, or result in horrid inefficiencies. I
can't for the life of me figure out how to unify the C# type system
(and thus eliminate the value / reference distinction and the need to
pay any attention to it at all) without introducing far more details
for the programmer to keep track of. I freely admit that this may be a
lack of imagination on my part.

It might be able to be eliminated with no
degradation of performance in terms of increases in either time or space. (CPU
cycles or RAM).

AND while making things better (simpler) for the programmer, not worse.

Barry Kelly · Jan 17, 2007

| It
| is comparable to the slight extra overhead that polymorphism requires yet
| providing much more versatile code.

Not at all, they are two very different things.

I don't agree with this statement - polymorphism is exactly what boxing
is about. It allows an int, a "fundamental" type in most statically
typed languages, to be stored polymorphically in a location of type
object.

-- Barry

Peter Olcott · Jan 17, 2007

Bruce Wood said:
Perhaps, but the only ways I can think of to eliminate this detail
create even more ugly details, or result in horrid inefficiencies. I
can't for the life of me figure out how to unify the C# type system
(and thus eliminate the value / reference distinction and the need to
pay any attention to it at all) without introducing far more details
for the programmer to keep track of. I freely admit that this may be a
lack of imagination on my part.

These details might still exist under the covers, I am only proposing that the
distinction between value type and reference type be made entirely transparent
to the C# programmer.

Peter Olcott · Jan 17, 2007

Barry Kelly said:
I don't agree with this statement - polymorphism is exactly what boxing
is about. It allows an int, a "fundamental" type in most statically
typed languages, to be stored polymorphically in a location of type
object.

If you simply keep everything in a box, then there is no boxing and unboxing
overhead, merely boxing initialization. Small value types such as integer and
double can have the functions that use them serve as their box.

Bruce Wood · Jan 17, 2007

Peter said:
I am not recommending making C# more like C++, just the opposite. I am
recommending that C# reduce its complexity even more, and do this in a way that
neither reduces speed nor increases space very much. Polymorphism both reduces
speed and increases space, yet the benefits far outweigh the cost, because the
benefits are large and the cost is small.

Yes, but HOW? I can't for the life of me see how, or even see a way to
approach the problem. I've thought of only three alternatives so far,
and in all cases the cure is worse than the disease, as it were:

1. Unify in favour of reference types. Everything is (or appears to be)
a reference type. If everything really is a reference type, then
performance goes in the crapper as every int, bool, and double goes
onto the heap and requires a pointer dereference. Whether everything
appears to be or really is a reference type, whenever you want to pass
a value (like an int) to a method by value, you have to say something
special, like "val" (as opposed to the current "ref" which would become
the default). This would litter your code with "val" markers on method
headers, and if you forgot one, you might hose your caller. Some
languages take this approach (FORTRAN did, I'm not sure about
Smalltalk). It doesn't strike me as helping matters at all.

2. Unify in favour of value types. This is effectively what C++ does.
The problem is that then _almost_ every time I pass an object to a
method I have to remember to say "ref". This just ends up peppering my
method signatures with "ref" all over the place, and if I forget to add
a "ref" then I end up with a horribly inefficient call as some monster
instance is loaded onto the stack. Guess what every newbie out there
will be doing? All this gives me is a bunch more busywork (saying "ref"
all over the place) for no gain that I can see, other than the ability
to pass an object on the stack that one time out of 100 when it's
really what I want.

3. Keep the distinction, but hide it somehow. Besides not really
understanding how this would work, it still doesn't help me, because
the distinction is _important_. It really does matter which semantics I
choose for a type, just as it matters in C++ whether I choose to pass
an object/value by value or by reference. It deeply affects my program
and how it works. It matters a lot whether an assignment gives me a
copy or a reference to the same object instance. I'm not sure how to
abstract that away.

As I said, I freely admit that this may be lack of imagination on my
part. Feel free to propose another solution.

What would this "abstracting away" look like?

Jon Skeet [C# MVP] · Jan 17, 2007

These details might still exist under the covers, I am only proposing that the
distinction between value type and reference type be made entirely transparent
to the C# programmer.

If the distinction is entirely transparent, how am I meant to say that
I want one thing to be treated with value semantics and another with
reference semantics?

Note that when it comes to parameter passing, there are 4 options
(leaving "out" aside for the moment):

1) Pass value type argument by value
2) Pass reference type argument by value
3) Pass value type argument by reference
4) Pass reference type argument by reference

2) and 3) are quite similar (although not the same), but the others are
very different. In other words, I believe there are more semantics (all
of which are useful in some situations) than your proposal allows.

Peter Olcott · Jan 17, 2007

Jon Skeet said:
If the distinction is entirely transparent, how am I meant to say that
I want one thing to be treated with value semantics and another with
reference semantics?

Note that when it comes to parameter passing, there are 4 options
(leaving "out" aside for the moment):

1) Pass value type argument by value
2) Pass reference type argument by value
3) Pass value type argument by reference
4) Pass reference type argument by reference

Pass most everything by reference except items that are [int] or smaller and do
not need to be changed by the called function. Large items that need to be
protected from change would be passed by reference using the [in] parameter
qualifier indicating that they are read-only. When I am referring to the term
[passing by reference] I am only referring to the fact that the machine address
of the data is passed, and not the actual data itself.

Peter Olcott · Jan 17, 2007

Bruce Wood said:
Yes, but HOW? I can't for the life of me see how, or even see a way to
approach the problem. I've thought of only three alternatives so far,
and in all cases the cure is worse than the disease, as it were:

1. Unify in favour of reference types. Everything is (or appears to be)
a reference type. If everything really is a reference type, then
performance goes in the crapper as every int, bool, and double goes
onto the heap and requires a pointer dereference. Whether everything

A pointer dereference is not expensive. I just benchmarked it at only 16% more
total time, in a tight loop.

appears to be or really is a reference type, whenever you want to pass
a value (like an int) to a method by value, you have to say something
special, like "val" (as opposed to the current "ref" which would become

There is no [val] or [ref]; the idea is to remove these concepts from the
language domain. In their place are [in], (input read-only) [out] (output
write-only) and [io] (input/output read/write).

Jon Skeet [C# MVP] · Jan 17, 2007

1) Pass value type argument by value
2) Pass reference type argument by value
3) Pass value type argument by reference
4) Pass reference type argument by reference

Pass most everything by reference except items that are [int] or smaller and do
not need to be changed by the called function. Large items that need to be
protected from change would be passed by reference using the [in] parameter
qualifier indicating that they are read-only. When I am referring to the term
[passing by reference] I am only referring to the fact that the machine address
of the data is passed, and not the actual data itself.

Right. So how do I differentiate a method which changes the contents of
the "object I pass in" and a method which changes the value of the
variable to refer to a completely different object? They are different
semantics, and both are useful at times. How does your scheme allow
them to be differentiated?

Jesse McGrew · Jan 17, 2007

Peter said:
A pointer dereference is not expensive. I just benchmarked it at only 16% more
total time, in a tight loop.

First, "only 16%" is quite a significant performance hit for a feature
of questionable usefulness. The performance hit from virtual methods,
interfaces, delegates, etc. is at least one that you only have to take
when you use those features. This one would affect every single
operation.

Also, your benchmark ignores the effects of all these pointers on cache
performance, as well as the additional work the GC would have to
perform if *everything* were referred to by reference, increasing the
number of pointers and heap objects in the average program by a factor
of... ten or more?

Jesse

Barry Kelly · Jan 17, 2007

Peter Olcott wrote:

Peter, what you wrote doesn't make sense. It implies either you don't
know what boxing means, or you don't know what a reference type is, or
both.

If you simply keep everything in a box

Boxes are on the heap. They are on the heap to avoid lifetime issues. If
they were not on the heap, and were instead on the stack for locals,
then one could store such a local in a global structure and violate
memory safety. For example, permitting that would permit the following
(in C for your ease of understanding):

---8<---
#include <stdio.h>

static int *value;

void store(int *x)
{
    value = x;
}

int *retrieve(void)
{
    return value;
}

void do_store(void)
{
    int x = 42; // here's my local
    store(&x);  // here I am passing it by reference (boxed)
}

void recurse(int count)
{
    if (count > 0)
        recurse(count - 1);
}

int main(void)
{
    do_store();
    recurse(10); // trashing stack
    printf("%d\n", *retrieve()); // whups! CORRUPTED!
    return 0;
}
--->8---

We don't store fundamental types like 'int' on the heap for performance
reasons. That's why they are value types. Value types are usually copied
instead of passed by reference. When they are passed by reference (via
'ref'), then their usefulness is severely constrained in order to avoid
the above problem (demonstrated in the C program).

, then there is no boxing and unboxing
overhead, merely boxing initialization.

Small value types such as integer and
double can have the functions that use them serve as their box.

Functions cannot serve as a box. Functions are code. Boxes are objects
allocated on the heap. Functions are not mutable objects. You can't
store values inside functions.

-- Barry
