Boxing and Unboxing ??

Jon Skeet [C# MVP]

Peter Olcott said:
I carefully read it, yet did not fully understand the meaning of all of the
terminology that was used. For one thing, I don't see why there is ever any need
for boxing and unboxing. I know that there is no such need in C++. I also know
that it must somehow support GC, and that is why it is needed. I don't see how
it supports GC. Is it something like maintaining a chain of pointers indicating
who owns what?

It's not required for C++ because C++ doesn't have a single type
hierarchy. You can't treat an int as if it were an object.
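
For example (a minimal sketch, assuming the statements sit inside a method):

int i = 42;
object o = i;                   // boxing: a copy of i is wrapped in a heap object
Console.WriteLine(o.GetType()); // prints System.Int32 - the box knows its type
int j = (int)o;                 // unboxing: the value is copied back out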
 
Jon Skeet [C# MVP]

Peter Olcott said:
It is good to know that aggregate data can be passed by reference without the
boxing and unboxing overhead, if need be.

Normally aggregate data is stored in a reference type to start with,
where there's no boxing penalty anyway.

Last time you were concerned with the performance penalty of boxing and
unboxing, we proved that in the benchmark you were worried about, the
cost of boxing and unboxing was negligible. Now, do you have a
different evidence-based reason for worrying about the penalty? If not,
I'd suggest you get some more experience (and start profiling) before
worrying about it any more.
 
Barry Kelly

Peter said:
Couldn't there possibly be a way to create safe code that does not ever require
any extra runtime overhead? Couldn't all the safety checking somehow be done at
compile time?

What safety checking are you talking about? Work with the C# types (such
as 'int') and there won't be any boxing, the work will be done at
compile time, and you won't pay any costs. You only need to worry about
boxing if you need to store an int or other value type into a location
of type 'object' (which isn't too often, but occasionally useful, in my
experience).
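
A small illustration of the point (statements assumed to be inside a method):

int sum = 0;
for (int k = 0; k < 1000; k++)
{
    sum += k;           // pure value-type arithmetic: no boxing anywhere
}

object boxed = sum;     // only this line allocates a box on the heap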

If you're talking about creating a pointer to an int, and keeping it safe,
then the only verifiable way to do that is via the ref parameters we talked
about in another branch of this thread. That's because the compiler can
guarantee things relating to the flow of code. Storing such references
in other structures isn't allowed because it can trivially create
dangling references:

// theoretical field type
ref int _savedX;

void Foo(ref int x)
{
    _savedX = ref x;
}

void Bar(int x)
{
    Foo(ref x);
}

void Baz()
{
    Bar(42);
    Console.WriteLine(_savedX); // uh oh - reading from invalid location
}

That's why such references are restricted to parameters only.

-- Barry
 
Barry Kelly

Peter said:
A strongly typed language like C++ effectively prevents any accidental type
errors,

Not all errors. If you take the address of a variable, C++ doesn't do
anything to ensure that the variable you've taken the address of lives
longer than the variable which stores the taken address. This is closer to what
I mean by memory safety, as opposed to type safety. It's a far stronger commitment.

The mere existence of access violations in commercial programs is
evidence enough for this.

Peter said:
why bother with more than this?

There is also another class of error: intentional errors, to (e.g.)
violate security when running in a browser as another poster indicated,
or in some hosted process such as a web hosting provider's ASP.NET
context, or in a SQL Server 2005 process, etc.

-- Barry
 
Barry Kelly

Bruce said:
However, I don't see how a language that allowed one to take the
address of arbitrary data could implement garbage collection.

It's actually possible, albeit with conservative collection. The
Boehm-Demers-Weiser collector can be linked with C++ to give it GC, for
example.
Bruce said:
Once you open up the language to allow arbitrary addressing of objects
and the values within them, you create a nightmare situation for the
garbage collector.

Actually, it wouldn't be that much of a problem, except certain rules
would start applying. For example, if you ever took the address of a
local variable or parameter, that variable would have to be moved out to
the heap behind the scenes (a lot like variable capture in anonymous
delegates). Similarly, taking the address of a field would become an
interior pointer, and would keep the object alive.
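
To illustrate the capture analogy, a minimal sketch (the delegate type Proc is
declared just for this example):

delegate void Proc();

void CaptureDemo()
{
    int counter = 0;    // captured below, so the compiler hoists it into a
                        // heap-allocated object behind the scenes
    Proc increment = delegate { counter++; };
    increment();
    increment();
    Console.WriteLine(counter); // 2
}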
Bruce said:
Personally, I'm glad that arbitrary addressing was never put into Java
or C#. When I moved from C / C++ to Java I wondered how I would ever do
without the "&" operator

Don't forget that C# has a unary '&' operator in unsafe code.
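
For instance (a sketch only; must be compiled with the /unsafe switch):

unsafe void PointerDemo()
{
    int x = 10;
    int* p = &x;            // unary & - legal only in an unsafe context
    *p = 20;                // write through the pointer
    Console.WriteLine(x);   // prints 20
}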

-- Barry
 
Peter Olcott

Jesse McGrew said:
Peter said:
[1] By "memory-safe", I mean that it's provably impossible to violate the
language's memory model. See e.g. type safety on Wikipedia for more
info:

http://en.wikipedia.org/wiki/Type_safety

A strongly typed language like C++ effectively prevents any accidental type
errors; why bother with more than this?

Because this also prevents *intentional* type errors, which is
important for running code in a sandbox. Your web browser can guarantee
that a Java applet embedded into a page won't crash your system or
delete all your files, because Java enforces type safety at all levels;
this is the same sort of thing.

Jesse
Ah, I see. So now we can have safe ActiveX components that are embedded in
webpages. There is no longer a tradeoff between the safety of Java and the
functionality of ActiveX.
 
Peter Olcott

Jesse McGrew said:
If you disallow type casting, you neuter the language. You need to be
able to cast instances of derived classes to their bases and back. You
can do the first kind of cast without any runtime overhead, but you
need *some* runtime overhead to cast a base instance to its actual
derived class, even in C++ with dynamic_cast<>.

(The overhead in C++ isn't for performing the actual cast, but for
verifying that the cast is valid - that the object actually belongs to
the class you're casting it to. In C#, that's usually the case, but for
unboxing casts there's also overhead for copying the value out of its
box.)
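
A small sketch of both costs (statements assumed to be inside a method):

object o = 42;    // boxing: the int is copied into a heap-allocated box

// The unboxing cast first verifies at runtime that the box really contains
// an Int32 (the cast-validation overhead), then copies the value out.
int n = (int)o;

// If the runtime check fails, an InvalidCastException is thrown:
// double d = (double)o;   // would throw: the box holds an Int32, not a Double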

I was not referring to this kind of type casting. This is not literally type
casting from one entirely different type to another. It looks like the really
dangerous kind is casting from an integer to a pointer to a function;
this is the kind of type casting that allows malicious code such as viruses and
worms to exist and take control.
Jesse McGrew said:
The desire to avoid that overhead (as well as other problems with
reference counting) is, presumably, why .NET uses a garbage collector
instead.

That does not really eliminate reference counting; it merely delegates it to the
GC.
Jesse McGrew said:
No, that's what "static objects" refers to. In C#, you typically only
store global data by putting it in the static fields of a class. (There
are a couple other types of global data used with C++/CLI: bare global
variables and gcroots.)


The wrapper is there so that the int on the heap can be treated like
any other object, with a type pointer, virtual methods, etc. If it were
just stored on the heap as a plain integer, there'd be no way for your
code (and more importantly, the garbage collector) to tell it apart
from a float or an object reference at runtime.
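
For instance (a minimal sketch):

object a = 42;      // a box holding an Int32
object b = 3.14;    // a box holding a Double

// Both variables have the same static type, but each box carries a type
// pointer, so the runtime (and the GC) can still tell them apart:
Console.WriteLine(a.GetType()); // System.Int32
Console.WriteLine(b.GetType()); // System.Double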

Okay, now I am getting it.
Jesse McGrew said:
Boxing lets you write a method like this:

public static void PrintIt(object foo)
{
    Console.WriteLine("Thanks for this " + foo.GetType().Name + ": " + foo.ToString());
}

And then pass in *any* value, whether it's an integer, a structure, or
an object reference. An unboxed integer is just a number, with no type
information other than that stored in the compiler's internals; a boxed

Which cease to exist at runtime.
 
Peter Olcott

Bruce Wood said:
C# supports pass-by-reference using the "ref" keyword.
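
For example, a minimal sketch of "ref" in action (Swap is a made-up helper):

static void Swap(ref int a, ref int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

// The caller must also say 'ref', and no boxing is involved:
int x = 1, y = 2;
Swap(ref x, ref y);   // now x == 2 and y == 1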

However, I don't see how a language that allowed one to take the
address of arbitrary data could implement garbage collection. Even with
reference counting, the theory is that an _object_ counts references to
itself. An int, however, isn't an object. You're faced with the problem
of an object counting references to itself _or piece of data that it
holds_. How could you engineer a system whereby object A could keep
track of this sort of thing:

int *p = &(A.X);
int *q = p;

How does the object A now know that there are two references to it, p
and q, which point to a field inside A and not to A itself?

I don't see how you could automate this kind of reference counting,
even in C++, but then I'm no C++ guru.


No. Global data is allowed. That's what I meant by "static".


C# and Java don't do reference counting. They walk the network of
object references at garbage collection time. "Mark and sweep."

I guess a good summary would be to say that the more regular the
situation, the easier it is to write good code to deal with it. By
forcing every collectable object to be the same, and allowing
references only to objects on the heap (apart from pass-by-ref, which
doesn't enter into garbage collection), C# and Java make it easier on
the garbage collector, which allows the GC to be more efficient.

Ah, so we could create a new parameter qualifier that works like [out] and [ref]
yet in the opposite direction. We could have an [in] parameter qualifier that
allows all large objects (larger than int) to be passed by reference, yet these
are all read-only objects. The compiler does not allow writing to them. This way
we avoid the unnecessary overhead of making copies of large objects just to
avoid accidentally making changes to these large objects.
Bruce Wood said:
Once you open up the language to allow arbitrary addressing of objects
and the values within them, you create a nightmare situation for the
garbage collector. Not that a sufficiently clever team of people
couldn't do it, I suppose, but it adds a lot of additional complexity,
and one has to ask exactly what would be gained. Java has demonstrated
that you can write perfectly good code without the ability to take
arbitrary addresses, pointer arithmetic, and the other stuff that C and
C++ pointers provide. There are some domains where the power of C / C++
pointers is arguably a great boon, but for most programming problems it
isn't required. So, you don't lose very much, and you gain a much
simpler garbage collector and better run-time security.

And yes, in .NET 2.0 you can pretty much avoid boxing (and unboxing)
altogether. It was difficult in .NET 1.1 because all of the standard
collections were collections of Object, and so storing values in a
Hashtable or an ArrayList (aka Vector in C++) meant incurring boxing
overhead. Even in .NET 1.1, however, you could roll your own
collections that didn't box or unbox, but they had to be type-specific.
.NET 2.0's generics (aka templates in C++) eliminate this problem. I
wouldn't say that boxing is a thing of the past, but more than 90% of
boxing in .NET 1.1 was in collections, and that's no longer necessary.
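
A sketch of the difference (assumes using System.Collections and
System.Collections.Generic):

ArrayList oldStyle = new ArrayList();  // .NET 1.1 style: stores object
oldStyle.Add(42);                      // boxes the int
int a = (int)oldStyle[0];              // unboxes it again

List<int> newStyle = new List<int>();  // .NET 2.0 generics: stores int directly
newStyle.Add(42);                      // no boxing
int b = newStyle[0];                   // no unboxing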

So the runtime penalty is almost non-existent, assuming that you use
appropriate language constructs.

Personally, I'm glad that arbitrary addressing was never put into Java
or C#. When I moved from C / C++ to Java I wondered how I would ever do
without the "&" operator, but I quickly realized that for the type of
software I write (business software) it really isn't needed. If,
however, I ever go back to writing real-time switching systems, I will
no doubt want C++ back again. Each tool has its uses, and C# is, in my
opinion, better suited to most day-to-day programming problems than is
C++. However, there are places that C# won't take you, where C++ is
much better suited.
It might be possible to design a language that has essentially all of the
functional capabilities of the lower-level languages, without the requirement
of ever directly dealing with pointers. I myself have always avoided pointers
(since the early 1980s); they were always too difficult to debug. Instead of
using pointers I used static arrays; at least in that case I could print out the
subscripts. Now that I know C++, I can still avoid pointers by using the STL
constructs.
 
Peter Olcott

Jon Skeet said:
Normally aggregate data is stored in a reference type to start with,
where there's no boxing penalty anyway.

Last time you were concerned with the performance penalty of boxing and
unboxing, we proved that in the benchmark you were worried about, the
cost of boxing and unboxing was negligible. Now, do you have a
different evidence-based reason for worrying about the penalty? If not,
I'd suggest you get some more experience (and start profiling) before
worrying about it any more.
I want to fully understand exactly how the underlying architecture works so that
I can design it from the ground up using the best means. With C++ I already know
exactly what kind of machine code anything and everything will translate
into. I need to acquire this degree of understanding of .NET before I begin
using it.

The systems that I am developing are not business information systems where
something can be 10,000-fold slower than necessary and there is no way for
anyone to notice the difference. In some cases a two-fold difference in the
speed of an elemental operation can noticeably affect response time. I am not
comfortable switching to C# until I know every detail of exactly how to at least
match the performance of native code C++.
 
Peter Olcott

Barry Kelly said:
What safety checking are you talking about? Work with the C# types (such
as 'int') and there won't be any boxing, the work will be done at
compile time, and you won't pay any costs. You only need to worry about
boxing if you need to store an int or other value type into a location
of type 'object' (which isn't too often, but occasionally useful, in my
experience).

If you're talking about creating a pointer to an int, and keeping it safe,
then the only verifiable way to do that is via the ref parameters we talked
about in another branch of this thread. That's because the compiler can
guarantee things relating to the flow of code. Storing such references
in other structures isn't allowed because it can trivially create
dangling references:

// theoretical field type
ref int _savedX;

void Foo(ref int x)
{
    _savedX = ref x;
}

void Bar(int x)
{
    Foo(ref x);
}

void Baz()
{
    Bar(42);
    Console.WriteLine(_savedX); // uh oh - reading from invalid location
}

That's why such references are restricted to parameters only.

That would seem to be a fine restriction. Now if we could only add an [in]
parameter qualifier that passes all large objects by reference, yet makes them
read-only. Objects the size of [int] or smaller can be passed by value, yet
still as read-only. The compiler flags all write access to these [in] parameters
as an error, at compile time. We could do the same sort of thing for the [out]
parameter qualifier (now all data is passed by reference, and is read-write),
and thus have no need for the [ref] parameter qualifier.

Now the programmer could do the right thing with these parameters without even
the need to understand the underlying mechanisms of pass by value or pass by
reference. Now it becomes pass by I-want-to-change-it and pass by
I-want-to-make-sure-it-won't-be-changed.
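
(As it turns out, C# 7.2 did eventually add an 'in' parameter modifier along
exactly these lines. A sketch, where BigStruct is a hypothetical large struct:)

struct BigStruct
{
    public double X, Y;   // imagine many more fields here
}

static double Length(in BigStruct v)  // passed by reference, but read-only
{
    // v.X = 0;           // compile-time error: cannot assign to an 'in' parameter
    return Math.Sqrt(v.X * v.X + v.Y * v.Y);
}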
 
Peter Olcott

Barry Kelly said:
Not all errors. If you take the address of a variable, C++ doesn't do
anything to ensure that the variable you've taken the address of lives
longer than variable which stores the taken address. This is more what I
mean by memory safety, over type safety. It's a far stronger commitment.

The mere existence of access violations in commercial programs is
evidence enough for this.


There is also another class of error: intentional errors, to (e.g.)
violate security when running in a browser as another poster indicated,
or in some hosted process such as a web hosting provider's ASP.NET
context, or in a SQL Server 2005 process, etc.

I was originally thinking that it might be useless to make one set of languages
completely type safe as long as another set of languages exists that is not type
safe. The authors of malicious code simply would not migrate to the new
technology.
 
Barry Kelly

Peter said:
That would seem to be a fine restriction. Now if we could only add an [in]
parameter qualifier that passes all large objects by reference, yet makes them
read-only.

I don't mean to be harsh, but why don't you try programming in C# & .NET
for a year or two before you suggest ways to improve it?

The way I personally see it, you've got a myopic view of the world based
on a C++ perspective, and want to "fix" things to make you yourself feel
more comfortable.

Don't take that as me saying that I think a const by-ref for value type
parameters would be a bad thing (I don't). However, I don't think it's
badly needed either.

-- Barry
 
Barry Kelly

Peter said:
I want to fully understand exactly how the underlying architecture works so that
I can design it from the ground up using the best means. With C++ I already know
exactly what kind of machine code anything and everything will translate
into. I need to acquire this degree of understanding of .NET before I begin
using it.

I think learning about a system is most easily achieved *while* using
it, not *before* using it. Experimentation and experience are better
teachers than replies to questions on newsgroups.

Peter said:
I am not comfortable switching to C# until I know every detail of exactly how to
at least match the performance of native code C++.

I suggest you:

1) Make a first attempt at converting some algorithm you're concerned
about to C#.
2) Profile it to find out what's slower than your budget allows (a minimal
timing sketch follows below).
3) Then perhaps ask specific questions on the newsgroups about how to
improve and / or redesign a particular construct / technique.
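
For step 2, a minimal timing sketch (SomeAlgorithm is a hypothetical routine;
assumes using System.Diagnostics):

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
    SomeAlgorithm();    // the code under test - purely hypothetical here
}
sw.Stop();
Console.WriteLine("{0} ms for 1,000,000 calls", sw.ElapsedMilliseconds);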

-- Barry
 
Peter Olcott

Barry Kelly said:
I think learning about a system is most easily achieved *while* using
it, not *before* using it. Experimentation and experience are better
teachers than replies to questions on newsgroups.


I suggest you:

1) Make a first attempt at converting some algorithm you're concerned
about to C#.
2) Profile it to find out what's slower than your budget allows.
3) Then perhaps ask specific questions on the newsgroups about how to
improve and / or redesign a particular construct / technique.

I don't have time to do it this way. By working 90 hours a week, I am still 80
hours a week short of what I need to get done.
 
Barry Kelly

Peter said:
It might be possible to design a language that has essentially all of the
functional capabilities of the lower-level languages, without the requirement
of ever directly dealing with pointers. I myself have always avoided pointers
(since the early 1980s); they were always too difficult to debug. Instead of
using pointers I used static arrays; at least in that case I could print out the
subscripts.

You know, memory is just one big static array of bytes (albeit sparse
due to OS address space allocation), and pointers are just indexes into
the array. It follows that for every problem with pointers, there's an
analog with array indexes, though many of those problems seem contrived
unless one is really trying to replace all pointers with indexes.
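
To make the analogy concrete (an illustration only):

byte[] memory = new byte[1024];  // a miniature "address space"
int p = 100;                     // a "pointer" is just an index
memory[p] = 42;                  // the index analog of *p = 42

int dangling = 4096;             // an index past the allocation -
                                 // the analog of a wild pointer
// memory[dangling] = 1;         // would throw IndexOutOfRangeException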

I once wrote an in-memory database that used a .NET byte array for its
storage. "Dangling" indexes, incorrectly typed indexes, all the usual
problems had to be ironed out with diagnostic tools and integrity
checkers early on in the development.

-- Barry
 
Peter Olcott

Barry Kelly said:
Peter said:
That would seem to be a fine restriction. Now if we can only add an [in]
parameter qualifier that passes all large objects by reference, yet makes
them
read-only.

I don't mean to be harsh, but why don't you try programming in C# & .NET
for a year or two before you suggest ways to improve it?

The way I personally see it, you've got a myopic view of the world based
on a C++ perspective, and want to "fix" things to make you yourself feel
more comfortable.

Don't take that as me saying that I think a const by-ref for value type
parameters would be a bad thing (I don't). However, I don't think it's
badly needed either.

Although I do not have nearly the same degree of experience with C# as most C#
programmers, I do have more experience with computer language design than most
C# programmers. So if you are rating the quality of my suggestion on the basis
of credibility rather than validity, you are judging it on the wrong basis.
 
Arne Vajhøj

Peter said:
Couldn't there possibly be a way to create safe code that does not ever require
any extra runtime overhead? Couldn't all the safety checking somehow be done at
compile time?

Maybe.

I doubt that the final truth on language design is written yet.

But current versions of C# are as they are.

Arne
 
Arne Vajhøj

Peter said:
I want to fully understand exactly how the underlying architecture works so that
I can design it from the ground up using the best means. With C++ I already know
exactly what kind of machine code anything and everything will translate
into. I need to acquire this degree of understanding of .NET before I begin
using it.

The systems that I am developing are not business information systems where
something can be 10,000-fold slower than necessary and there is no way for
anyone to notice the difference. In some cases a two-fold difference in the
speed of an elemental operation can noticeably affect response time. I am not
comfortable switching to C# until I know every detail of exactly how to at least
match the performance of native code C++.

In most cases C# will be as fast as C++.

But if you want to understand the language -> machine instructions
stuff, then maybe C# is not for you.

It is a language designed to abstract that stuff away. You are not
supposed to care about it.

Arne
 
Arne Vajhøj

Peter said:
I don't have time to do it this way. By working 90 hours a week, I am still 80
hours a week short of what I need to get done.

90 hours of work per week and quality code do not get along very well.

Arne
 
Arne Vajhøj

Peter said:
A strongly typed language like C++ effectively prevents any accidental type
errors; why bother with more than this?

Effectively?

Ever seen someone forget a copy constructor and assignment
operator and get memory messed up?

Arne
 
