Boxing and Unboxing ??

Peter Olcott

According to Troelsen in "C# and the .NET Platform"
"Boxing can be formally defined as the process of explicitly converting a value
type into a corresponding reference type."

I think that my biggest problem with this process is that the terms "value type"
and "reference type" mean something entirely different than what they mean on
every other platform in every other language. Normally a value type is the
actual data itself stored in memory, (such as an integer) and a reference type
is simply the address of this data.

It seems that .NET has made at least one of these two terms mean something
entirely different. Can someone please give me a quick overview of what the
terms "value type" and "reference type" actually mean in terms of their
underlying architecture?
 
Bob Graham

Value types are stored on the "Stack" and go away, as it were, immediately
when they go out of scope. Mostly numeric types and structs.
Reference types are stored on the "Heap" and are garbage collected when
the system feels like it. References to ref types are passed normally as
a pointer to the address. Value types are passed a copy of the value.
I'm sure someone with more years under their Microsoft belt will chime
in here with a more explicit and concise answer, but this is basically how
it is.
Bob

Posted by NewsLook (Trial Licence) from http://www.ghytred.com/NewsLook/about.aspx
 
Peter Olcott

Bob Graham said:
Value types are stored on the "Stack" and go away, as it were, immediately
when they go out of scope. Mostly numeric types and structs.
Reference types are stored on the "Heap" and are garbage collected when
the system feels like it. References to ref types are passed normally as
a pointer to the address. Value types are passed a copy of the value.
I'm sure someone with more years under their Microsoft belt will chime
in here with a more explicit and concise answer, but this is basically how
it is.
Bob

What I am looking for is all of the extra steps that form what is referred to as
boxing and unboxing. In C/C++ converting a value type to a reference type is a
very simple operation and I don't think that there are any runtime steps at all.
All the steps are done at compile time. Likewise for converting a reference type
to a value type.

In C/C++:
int X = 56;
int *Y = &X;
Now both X and *Y hold 56, and Y is a reference to X.
 
Bob Graham

From Troelsen's Professional C#:

"Given that .NET defines two major categories of types (value based and
reference based), you may occasionally need to represent a variable of
one category as a variable of the other category. C# provides a very simple
mechanism, termed boxing, to convert a value type to a reference type.
Assume that you have created a variable of type short:
// Make a short value type.
short s = 25;
If, during the course of your application, you wish to represent this
value type as a reference type, you would "box" the value as follows:
// Box the value into an object reference.
object objShort = s;
Boxing can be formally defined as the process of explicitly converting
a value type into a corresponding reference type by storing the variable
in a System.Object. When you box a value, the CLR allocates a new object
on the heap and copies the value type's value (in this case, 25) into that
instance. What is returned to you is a reference to the newly allocated
object. Using this technique, .NET developers have no need to make use
of a set of wrapper classes used to temporarily treat stack data as heap-allocated
objects. The opposite operation is also permitted through unboxing. Unboxing
is the process of converting the value held in the object reference back
into a corresponding value type on the stack. The unboxing operation begins
by verifying that the receiving data type is equivalent to the boxed type,
and if so, it copies the value back into a local stack-based variable.
For example, the following unboxing operation works successfully, given
that the underlying type of the objShort is indeed a short (you'll examine
the C# casting operator in detail in the next chapter, so hold tight for
now):
// Unbox the reference back into a corresponding short.
short anotherShort = (short)objShort;"

I'll stop there due to my distaste for violating copyrights. You may want
to pick up this book for your language jump. It's more about the language
and makes a lot of comparisons to C/C++ and Java.
Bob
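
A minimal sketch of the box/unbox round trip described in the quoted passage. The class and method names (BoxingDemo, RoundTrip, UnboxAsIntFails) are made up for illustration and are not from the book. It also shows the type check mentioned in the quote: unboxing to a type other than the one that was boxed throws InvalidCastException.

```csharp
using System;

public static class BoxingDemo
{
    // Box a short, then unbox it back to the exact same type.
    public static short RoundTrip(short s)
    {
        object boxed = s;       // boxing: heap allocation plus a copy of the value
        return (short)boxed;    // unboxing: type check, then copy the value back out
    }

    // Unboxing to a different type than the one that was boxed fails at runtime.
    public static bool UnboxAsIntFails(short s)
    {
        object boxed = s;
        try
        {
            int wrong = (int)boxed;   // the box holds a short, not an int
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(25));        // 25
        Console.WriteLine(UnboxAsIntFails(25));  // True
    }
}
```

To convert a boxed short to an int, unbox to short first and then convert: (int)(short)boxed.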
 
Bob Graham

But Generics are a more powerful alternative that you may want to read
up on. They get rid of boxing and unboxing penalties.
Bob
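
A small sketch of the point about generics, assuming the usual comparison between the non-generic ArrayList (which stores object, so every int is boxed) and the generic List<int> (which stores the ints directly). The class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericsDemo
{
    // ArrayList holds object references, so each int is boxed on Add
    // and must be unboxed with a cast when read back.
    public static int SumNonGeneric(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
            list.Add(i);            // boxing happens here
        int sum = 0;
        foreach (object o in list)
            sum += (int)o;          // unboxing happens here
        return sum;
    }

    // List<int> stores the int values themselves; no boxing or casts.
    public static int SumGeneric(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
            list.Add(i);
        int sum = 0;
        foreach (int n in list)
            sum += n;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumNonGeneric(10));  // 45
        Console.WriteLine(SumGeneric(10));     // 45
    }
}
```

Both methods compute the same result; the difference is only in the per-element heap allocations and casts that the non-generic version needs.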
 
Dave Sexton

Hi Bob,
Value types are stored on the "Stack" and go away, as it were, immediately
when they go out of scope.

They can also be stored on the heap when they are fields of an object, for
instance. I like to think of value types as being in-line in terms of
memory. In other words, they can live anywhere since it's their "value"
that's important. On the contrary, reference types must live somewhere
where their "reference" can be used - the heap in .NET.

Mostly numeric types and structs.

If you include enums then you've named them all, although they are all
really structures (structs, if you want to use the term loosely). A value
type in the .NET framework is any object that derives from
System.ValueType. The C# compiler, though, requires you to specify the
struct keyword instead of class, but that just means your class derives from
System.ValueType.
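
The inheritance claim can be checked with reflection. This is only a sketch; Point, Color and Holder are hypothetical types invented for the example.

```csharp
using System;

public struct Point { public int X; public int Y; }
public enum Color { Red, Green }

// P is a value-type field, so it is stored inline inside the Holder
// object, which itself lives on the heap.
public class Holder { public Point P; }

public static class ValueTypeDemo
{
    public static void Main()
    {
        Console.WriteLine(typeof(Point).IsValueType);                    // True
        Console.WriteLine(typeof(Point).BaseType == typeof(ValueType));  // True
        Console.WriteLine(typeof(int).BaseType == typeof(ValueType));    // True
        Console.WriteLine(typeof(Color).BaseType == typeof(Enum));       // True
        Console.WriteLine(typeof(Enum).BaseType == typeof(ValueType));   // True
        Console.WriteLine(typeof(string).IsValueType);                   // False

        Holder h = new Holder();
        h.P.X = 5;                 // writes directly into the inline struct field
        Console.WriteLine(h.P.X);  // 5
    }
}
```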

<snip>
 
Dave Sexton

Hi Peter,
According to Troelsen in "C# and the .NET Platform"
"Boxing can be formally defined as the process of explicitly converting a
value type into a corresponding reference type."

I think that my biggest problem with this process is that the terms "value
type" and "reference type" mean something entirely different than what
they mean on every other platform in every other language. Normally a
value type is the actual data itself stored in memory, (such as an
integer) and a reference type is simply the address of this data.

It seems that .NET has made at least one of these two terms mean something
entirely different. Can someone please give me a quick overview of what
the terms "value type" and "reference type" actually mean in terms of
their underlying architecture?

Your definitions are correct even in .NET. The real difference between the
framework and some of the other platforms you may be accustomed to is in the
management of memory, i.e., garbage collection.
 
Peter Olcott

Bob Graham said:
From Troelsen's Professional C#:

"Given that .NET defines two major categories of types (value based and
reference based), you may occasionally need to represent a variable of
one category as a variable of the other category. C# provides a very simple
mechanism, termed boxing, to convert a value type to a reference type.
Assume that you have created a variable of type short:
// Make a short value type.
short s = 25;
If, during the course of your application, you wish to represent this
value type as a reference type, you would "box" the value as follows:
// Box the value into an object reference.
object objShort = s;
Boxing can be formally defined as the process of explicitly converting
a value type into a corresponding reference type by storing the variable
in a System.Object. When you box a value, the CLR allocates a new object
on the heap and copies the value type's value (in this case, 25) into that
instance. What is returned to you is a reference to the newly allocated
object. Using this technique, .NET developers have no need to make use
of a set of wrapper classes used to temporarily treat stack data as
heap-allocated objects. The opposite operation is also permitted through
unboxing. Unboxing
is the process of converting the value held in the object reference back
into a corresponding value type on the stack. The unboxing operation begins
by verifying that the receiving data type is equivalent to the boxed type,
and if so, it copies the value back into a local stack-based variable.
For example, the following unboxing operation works successfully, given
that the underlying type of the objShort is indeed a short (you'll examine
the C# casting operator in detail in the next chapter, so hold tight for
now):
// Unbox the reference back into a corresponding short.
short anotherShort = (short)objShort;"

So a reference type is not anything at all like what the term "reference type"
means everywhere outside of the .NET architecture. They probably should have
chosen different names such as Managed Heap Type and Stack Type; this would have
been far less misleading.

What I really want to see is the underlying architecture of Managed Heap Type
and Stack Type. In particular, is there a whole lot of extra baggage for this
"value type" (Stack Type) as there seems to be for the Managed Heap Type
(reference type)?
 
Peter Olcott

Dave Sexton said:
Hi Peter,


Your definitions are correct even in .NET. The real difference between the
framework and some of the other platforms you may be accustomed to is in the
management of memory, i.e., garbage collection.

It seems that .NET adds a whole lot of extra baggage to these otherwise very
simple terms.
int X = 56; // refers to 56 (value type)
int* Y = &X; // Y refers to the address of 56 (reference type)
That is all there is to it, no runtime cost involved at all, no complex
underlying infrastructure.
 
Jesse McGrew

Peter said:
According to Troelsen in "C# and the .NET Platform"
"Boxing can be formally defined as the process of explicitly converting a value
type into a corresponding reference type."

I think that my biggest problem with this process is that the terms "value type"
and "reference type" mean something entirely different than what they mean on
every other platform in every other language. Normally a value type is the
actual data itself stored in memory, (such as an integer) and a reference type
is simply the address of this data.

It seems that .NET has made at least one of these two terms mean something
entirely different. Can someone please give me a quick overview of what the
terms "value type" and "reference type" actually mean in terms of their
underlying architecture?

Well, if you're familiar with Delphi or Java, you've already seen
reference types. Class instances in those languages are always stored
as pointers to data on the heap, just like reference types in .NET, and
when you access an object's fields, you're implicitly dereferencing the
pointer. In Delphi, records are equivalent to value types; in Java,
primitives like int and double are.

A value type is a type that's normally passed by value, and whose
contents *can* (but don't have to) live on the stack. A reference type
is always passed by reference, and always lives on the heap. A variable
of a value type takes up the entire size of the type, and assigning one
such variable to another copies the contents; a variable of a reference
type only takes up the size of a pointer, and assigning one to another
simply makes both variables point to the same data.
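
The assignment behavior described above can be sketched as follows. ValuePoint and RefPoint are hypothetical types that differ only in being a struct versus a class.

```csharp
using System;

public struct ValuePoint { public int X; }
public class RefPoint { public int X; }

public static class CopyDemo
{
    // Assigning a struct copies its contents, so b is independent of a.
    public static int StructCopy()
    {
        ValuePoint a = new ValuePoint();
        a.X = 1;
        ValuePoint b = a;   // full copy of the data
        b.X = 2;            // changes only the copy
        return a.X;
    }

    // Assigning a class variable copies only the reference,
    // so a and b refer to the same heap object.
    public static int ClassAlias()
    {
        RefPoint a = new RefPoint();
        a.X = 1;
        RefPoint b = a;     // both variables now point at one object
        b.X = 2;            // visible through a as well
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(StructCopy());  // 1
        Console.WriteLine(ClassAlias());  // 2
    }
}
```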

Boxing means copying a value type onto the heap, along with some type
information, so that it can be used like any other instance of
System.Object. This is because even though all types in .NET derive
from System.Object (a reference type), value types are stored
differently. To keep polymorphism and garbage collection working, the
data has to be copied at runtime, because you can't just use a pointer
to a value type on the stack as a managed reference - for example, you
might store that pointer in a global variable, where it would have to
live on after the function returns and its stack frame is destroyed.

Unboxing is the reverse - copying the contents of a boxed value type
(from the heap) back onto the stack so you can work with it in its
usual form.
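
To illustrate that boxing copies the value rather than taking its address, here is a short sketch reusing the 56 from the earlier C example. The class and method names are hypothetical.

```csharp
using System;

public static class BoxCopyDemo
{
    // The box holds its own copy, so later changes to x do not affect it.
    public static int BoxIsIndependentCopy()
    {
        int x = 56;
        object boxed = x;    // copies 56 into a new heap object
        x = 99;              // modifies only the local variable
        return (int)boxed;   // unboxes the original copy
    }

    // Boxing the same variable twice allocates two separate objects.
    public static bool TwoBoxesShareIdentity()
    {
        int x = 56;
        object a = x;
        object b = x;
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(BoxIsIndependentCopy());   // 56
        Console.WriteLine(TwoBoxesShareIdentity());  // False
    }
}
```

This is the key difference from int *Y = &X in C, where *Y would see the change to X.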

Jesse
 
Peter Olcott

Jesse McGrew said:
Well, if you're familiar with Delphi or Java, you've already seen
reference types. Class instances in those languages are always stored
as pointers to data on the heap, just like reference types in .NET, and
when you access an object's fields, you're implicitly dereferencing the
pointer. In Delphi, records are equivalent to value types; in Java,
primitives like int and double are.

A value type is a type that's normally passed by value, and whose
contents *can* (but don't have to) live on the stack. A reference type
is always passed by reference, and always lives on the heap. A variable
of a value type takes up the entire size of the type, and assigning one
such variable to another copies the contents; a variable of a reference
type only takes up the size of a pointer, and assigning one to another
simply makes both variables point to the same data.

Boxing means copying a value type onto the heap, along with some type
information, so that it can be used like any other instance of
System.Object. This is because even though all types in .NET derive
from System.Object (a reference type), value types are stored
differently. To keep polymorphism and garbage collection working, the
data has to be copied at runtime, because you can't just use a pointer
to a value type on the stack as a managed reference - for example, you
might store that pointer in a global variable, where it would have to
live on after the function returns and its stack frame is destroyed.

Unboxing is the reverse - copying the contents of a boxed value type
(from the heap) back onto the stack so you can work with it in its
usual form.

Jesse

Well, that is a little clearer now, thanks. So the "value types" have less
baggage? I try to understand these things in the same way that I understand
their equivalents in C and C++. I try to understand them in terms of the
underlying machine operations in assembly language.

With .NET this is a little trickier because it has another layer in between, and
does not seem to be able to directly expose the actual platform-specific
assembly language of what it is doing. In C or C++ I simply tell the compiler to
output assembly language, then I can see everything.
 
Bruce Wood

Peter said:
It seems that .NET adds a whole lot of extra baggage to these otherwise very
simple terms.
int X = 56; // refers to 56 (value type)
int* Y = &X; // Y refers to the address of 56 (reference type)
That is all there is to it, no runtime cost involved at all, no complex
underlying infrastructure.

Yes, but you're comparing apples to oranges.

One of the explicit goals of C# (and Java) is to disallow the kind of
pointer aliasing that your example demonstrates, and all of the
security issues that that implies. In C# (and Java) you can't just
"take the address of" something. There is no "&" operator in either
language (unless, in C#, you resort to "unsafe" code).

Both languages are garbage collected, and both languages prevent us
(the programmers) from arbitrarily messing with memory.

This means that in both languages, you can't just take the address of a
value type (like your int X) and treat that as a reference type. If you
want to treat a value type as an object (a reference type), the runtime
must box it into a structure on the heap, like all other objects, and
then you can have a reference to it.

In brief, C# does _not_ allow you the same kind of low-level control
that C++ does. If you move from C++ to C# you lose expressive power. On
the other hand, you also lose a lot of constructs that allow you to
royally hose yourself. Using your example, you can't return the pointer
Y from a function and then later use that pointer into a
no-longer-valid part of the stack to hammer whatever might be there. No
can do in C# and Java, because neither language allows you to take the
address of an arbitrary variable.

C# is much more like Java than it is like C++, IMHO, which doesn't mean
that comparisons can't be made between C# and C++... just that many of
the concepts don't match up precisely.
 
Jesse McGrew

Peter said:
Well that is a little more clear now, thanks. So the "value types" have less
baggage? I try to understand these things in the same way that I understand
their equivalents in C and C++. I try to understand them in terms of the
underlying machine operations in assembly language.

Value types do have less baggage (in their unboxed form). For example,
int is a value type - you wouldn't want to have to dereference
pointers, call methods, etc. every time you added or compared two
integers. But they also have less functionality, because you can only
take full advantage of inheritance and polymorphism when you're using
reference types, just like you can only do it with pointers and
references in C++. You need reference types to get that kind of OOP
behavior, as well as to implement structures like trees and lists.

Take the following C# definitions:

struct Value {
public int foo;
}

class Ref {
public int bar;
}

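Value my_val;             // the struct's storage is the variable itself
Ref my_ref = new Ref();   // the variable holds a reference to a heap object

my_val.foo = my_ref.bar = 0;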

The equivalent in C++ would be:

class Value {
public:
int foo;
};

class Ref {
public:
int bar;
};

Value my_val;
Ref * my_ref = new Ref;   // the pointer must refer to a real object before use

my_val.foo = my_ref->bar = 0;

Every time you declare a variable or field of the type Ref, you're
really declaring a pointer; and when you call its methods or access its
fields, you still write "." in C#, but it works like "->".

With .NET this is a little trickier because it has another layer in-between, and
does not seem to be able to directly expose the actual platform specific
assembly language of what it is doing. In C or C++ I simply tell the compiler to
output assembly language, then I can see everything.

You can view the assembly code in Visual Studio 2005. Run the program,
hit pause to break into the debugger, then right-click on a source line
and choose "Go to Disassembly".

Jesse
 
Jon Skeet [C# MVP]

Bob Graham said:
Value types are stored on the "Stack" and go away, as it were, immediately
when they go out of scope.

Value types aren't always stored on the stack.

See http://www.pobox.com/~skeet/csharp/memory.html

Mostly numeric types and structs.
Reference types are stored on the "Heap" and are garbage collected when
the system feels like it. References to ref types are passed normally as
a pointer to the address. Value types are passed a copy of the value.

It's simpler than that - the value of the expression is always passed
by value unless you use ref/out - it's just that with reference types,
the value of the expression *is* a reference.

See http://www.pobox.com/~skeet/csharp/parameters.html for more
details.
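
A minimal sketch of this distinction, using StringBuilder as an arbitrary mutable reference type. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class ParameterDemo
{
    // The parameter is a copy of the caller's reference; reassigning it
    // has no effect on the caller's variable.
    static void Reassign(StringBuilder sb) { sb = new StringBuilder("new"); }

    // The copied reference still points at the caller's object, so
    // changes made through it are visible to the caller.
    static void Mutate(StringBuilder sb) { sb.Append(" world"); }

    // With ref, the variable itself is passed, so reassignment is visible.
    static void ReassignByRef(ref StringBuilder sb) { sb = new StringBuilder("new"); }

    public static string Run()
    {
        StringBuilder sb = new StringBuilder("hello");

        Reassign(sb);
        string afterReassign = sb.ToString();   // "hello"

        Mutate(sb);
        string afterMutate = sb.ToString();     // "hello world"

        ReassignByRef(ref sb);
        string afterRef = sb.ToString();        // "new"

        return afterReassign + "|" + afterMutate + "|" + afterRef;
    }

    public static void Main()
    {
        Console.WriteLine(Run());  // hello|hello world|new
    }
}
```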
 
Jon Skeet [C# MVP]

Peter Olcott said:
So a reference type is not anything at all like what the term "reference type"
means everywhere outside of the .NET. architecture.

"Reference type" means exactly the same in Java as it means in .NET.
 
Jon Skeet [C# MVP]

Jesse McGrew said:
A value type is a type that's normally passed by value, and whose
contents *can* (but don't have to) live on the stack. A reference type
is always passed by reference, and always lives on the heap.

Saying that reference types are "passed by reference" leads to
misunderstandings. Reference type instances are never passed at all -
there's no expression whose value is the instance itself, only the
reference. That reference is passed by value.

See http://www.pobox.com/~skeet/csharp/parameters.html for more details
of this distinction, and why it's an important one to make.
 
Peter Olcott

Jesse McGrew said:
Value types do have less baggage (in their unboxed form). For example,
int is a value type - you wouldn't want to have to dereference
pointers, call methods, etc. every time you added or compared two
integers. But they also have less functionality, because you can only

Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

take full advantage of inheritance and polymorphism when you're using
reference types, just like you can only do it with pointers and
references in C++. You need reference types to get that kind of OOP
behavior, as well as to implement structures like trees and lists.

Take the following C# definitions:

struct Value {
public int foo;
}

class Ref {
public int bar;
}

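Value my_val;             // the struct's storage is the variable itself
Ref my_ref = new Ref();   // the variable holds a reference to a heap object

my_val.foo = my_ref.bar = 0;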

The equivalent in C++ would be:

class Value {
public:
int foo;
};

class Ref {
public:
int bar;
};

Value my_val;
Ref * my_ref = new Ref;   // the pointer must refer to a real object before use

my_val.foo = my_ref->bar = 0;

Every time you declare a variable or field of the type Ref, you're
really declaring a pointer; and when you call its methods or access its
fields, you still write "." in C#, but it works like "->".


You can view the assembly code in Visual Studio 2005. Run the program,
hit pause to break into the debugger, then right-click on a source line
and choose "Go to Disassembly".

Is that actual Intel machine-specific assembly language, or the .NET virtual
machine assembly language?
 
Barry Kelly

Peter said:
Jesse McGrew said:
Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

Value types that are fields of a reference type are stored inline in the
memory for that object on the heap.

For example:

class A { int x; }

... can be imagined as being roughly equivalent (from a memory layout
perspective) to this in C:

typedef void *MethodTable; // CLR implementation detail
typedef struct A_ { MethodTable *mt; int x; } *A;

In fact, you can't add two boxed integers in C#, since it's got no way
to represent them as anything other than 'object'. You need to cast them
to 'int' to add them - and that unboxes them.

Example:

object x = 42; // x now contains a boxed int
object y = 10; // as does y
// Console.WriteLine(x + y); // compile error: can't add object to object

int unboxedX = (int) x;
int unboxedY = (int) y;
Console.WriteLine(unboxedX + unboxedY); // etc.

Is that actual Intel machine specific assembly language, or the .NET virtual
machine assembly language?

Why don't you try it and see, before asking this kind of question?

It's the actual Intel machine code. Be aware of the usual gotchas re
Debug and Release mode.

You can get a higher-quality disassembly, with more correct CLR symbols,
with the MS symbol server (SRV* etc.) combined with SOS.DLL (or use
WinDbg with SOS).

-- Barry
 
Barry Kelly

Barry said:
Peter said:
Jesse McGrew said:
Does that mean that you do have to call a method every time you add or compare
two integers that are stored in reference types?

From the point of view of C#, an integer (or any other value type) is
only boxed if it's been assigned to a location of type 'object' -
whether local variable, argument or field.

I should hasten to point out one thing though: when calling a method on
a value type (e.g. ToString() or GetHashCode()) that hasn't been
(re)declared or overridden in the value type, the value type needs to be
boxed to be passed as the 'this' argument (whether it be
System.Object::ToString(), System.Object::GetHashCode(), etc.)

It's the same principle ('this' in these cases is typically of type
'object'), but it is a little hidden.

-- Barry
 
