How does managed code work?

Siegfried Heintze

I'd like to pursue some questions I recently encountered in an
interview -- someone might ask them again in another interview. I was not
very satisfied with the answers I gave the interviewer.

(1) Can someone point me to a discussion of the managed stack? Does it work
the same way as the native (CPU vendor-implemented) stack with a frame
pointer that is the head of a linked list of stack frames, where each time we
enter a function we create a new stack frame in which new variables are
pushed, and each time we exit a function the entire stack frame is popped?

(2) Can someone point me to a discussion of the managed heap? How does it
work? Does it use counted pointers like COM often does? What, exactly,
happens when we use operator= to (shallow) copy a SqlDataReader object from a
stack local variable to a global variable? How does it prevent the memory
leaks that occur in COM when two objects reference each other and keep each
other's reference count nonzero? How is the managed heap different from the
native heap? I think the managed heap implements defragmentation
automatically, like Java. Does it use the mark-and-sweep algorithm or some
other algorithm?

(3) Why do we have non-deterministic destructors in C#? Asked differently:
why do some classes, like the SQL data reader, need to have their dispose
function called explicitly? Why didn't the language designers implement
deterministic destructors so we would not have to manually use the "using"
statement or (worse yet) manually call the dispose function when the object
goes out of scope? Couldn't the language designers design the C# language
so the compiler tells the runtime: "hey! this SqlDataReader is going out of
scope, so you'd better call the dispose function"?

(4) Is there any circumstance where I would NOT want to call dispose
(explicitly or via the "using" statement) on a function-local SqlDataReader
object when it is going out of scope?

(5) How is the structure of a managed DLL different from a native DLL?

(6) What choices of XML parser implementations do I have? I can call the
native MSXML via COM interop or PInvoke, and I can use the ones in System.Xml.
What is the difference? Is System.Xml just a wrapper for MSXML?

(7) What choices of XML parser types are there? There is SAX and DOM. Any
other choices?

(8) What is the difference between using PInvoke to manipulate a semaphore
or mutex and using System.Threading?

Thanks,
Siegfried
 
henk holterman

Some good questions, I will answer a few:

Siegfried Heintze wrote:

(2) Can someone point me to a discussion of the managed heap?

Simply a heap for managed objects. The real issue is the Garbage collector.

(3) Why do we have non-deterministic destructors in C#?

It's the only possibility with a GC. But well-written code will always use
the explicit (deterministic) approach, and the GC/finalizer method is a
safety net for situations that would be a resource leak in an unmanaged
language. The whole non-deterministic issue turns out to be largely
theoretical.

A few points in favor of GC:
- automatic cleanup of all non-resource-holding objects
- effective sharing of objects makes for much better OOP
- compacting means better use of the processor cache

(5) How is the structure of a managed DLL different from a native DLL?

It can be strong-named, and .NET allows side-by-side loading (different
versions loaded at the same time).

(8) What is the difference between using PInvoke to manipulate a semaphore
or mutex and using System.Threading?

Most objects in System.Threading are wrappers around Win32 synchronization
objects (the P/Invoke is already done for you), but Monitor is native .NET.

-HH-
 
Jeroen Mostert

Siegfried said:
I'd like to pursue some questions I recently encountered in an
interview -- someone might ask them again in another interview. I was not
very satisfied with the answers I gave the interviewer.

(1) Can someone point me to a discussion of the managed stack? Does it work
the same way as the native (CPU vendor-implemented) stack with a frame
pointer that is the head of a linked list of stack frames, where each time we
enter a function we create a new stack frame in which new variables are
pushed, and each time we exit a function the entire stack frame is popped?

(2) Can someone point me to a discussion of the managed heap? How does it
work? Does it use counted pointers like COM often does? What, exactly,
happens when we use operator= to (shallow) copy a SqlDataReader object from a
stack local variable to a global variable? How does it prevent the memory
leaks that occur in COM when two objects reference each other and keep each
other's reference count nonzero? How is the managed heap different from the
native heap? I think the managed heap implements defragmentation
automatically, like Java. Does it use the mark-and-sweep algorithm or some
other algorithm?
First of all, on the language level, there is no stack and there is no heap.
There are only reference types and value types, and instances of both are
objects. This focus on objects rather than memory is actually part of
languages like C and C++ too (my copy of the C standard contains no
instances at all of the words "heap" and "stack"), but since the mechanisms
are ubiquitous and people usually work much closer to the metal there, this
tends to be ignored. It bears mentioning because in managed languages it's
both easier and more convenient to not focus on this at all.

This helps avoid the misleading detail in questions like "What, exactly,
happens when we use operator= to (shallow) copy a SqlDataReader object from a
stack local variable to a global variable?" What *exactly* happens is that
the reference to the object is copied (not the object itself), so now there
are two references to the object instead of one, and that's all that
happens. The interesting part doesn't happen until later, when the garbage
collector draws up the set of objects which are still reachable. That can be
couched in terms of stacks and heap, but there's no need for that; local
variables and fields will do.
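
For instance, here's a minimal sketch of that point (using a made-up
Resource class in place of SqlDataReader, so it runs without a database):
assigning the local to a static field copies only the reference, so both
variables end up naming the same object.

using System;

class ReferenceCopyDemo
{
    // A simple reference type standing in for SqlDataReader in this sketch.
    class Resource
    {
        public string Name = "original";
    }

    static Resource global;               // a "global" (static field) reference

    static void Main()
    {
        Resource local = new Resource();  // one object, one reference
        global = local;                   // copies the reference, not the object

        global.Name = "changed";
        Console.WriteLine(local.Name);    // prints "changed": both refer to the same object
    }
}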

On the runtime instruction level it's more complicated: there's an
evaluation stack for expressions, there are "slots" for local variables, and
there are instructions for creating new objects, calling methods and
returning from them. The "evaluation stack", "call stack" and "heap" exist
only as concepts to facilitate this; they operate according to certain
abstract rules but how they're implemented is of no concern.
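
As a rough illustration of that evaluation-stack model, here is a trivial
method together with, in comments, approximately the IL the C# compiler
emits for it (what you'd see in ildasm, give or take some debug-build
housekeeping):

class EvaluationStackDemo
{
    // Approximate IL for this method:
    //   ldarg.0   // push 'a' onto the evaluation stack
    //   ldarg.1   // push 'b'
    //   add       // pop both, push the sum
    //   ret       // return the value on top of the stack
    static int Add(int a, int b)
    {
        return a + b;
    }

    static void Main()
    {
        System.Console.WriteLine(Add(2, 3));  // 5
    }
}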

Finally, when you get down to the implementation there is a heap and there
are stacks (plural since we can have multiple threads), and if you're
interested you can dig into the current implementation of the CLR, but it's
of little consequence when you're programming. Managed code has no access to
these details and unmanaged code only needs to bother if it's the CLR itself
(or possibly when you're debugging interop scenarios).

FWIW, the current Windows implementation of the CLR uses call stacks which
work the same as the native call stack (in fact, they use the same
mechanism, so managed and unmanaged stack frames are mixed). The managed
frames have their own format; they cannot be decoded like unmanaged frames.
The managed heap, on the other hand, works quite differently from the native
heap. http://msdn.microsoft.com/library/f144e03t explains it a lot better
than I can. Summarizing: .NET uses mark-and-sweep garbage collection with
generations (so no reference-counting issues like in COM).
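
If you want to poke at the generational behaviour yourself, here's a small
sketch; the exact generation numbers are an implementation detail, hence
the "typically" in the comments.

using System;

class GenerationDemo
{
    static void Main()
    {
        object o = new object();
        Console.WriteLine(GC.GetGeneration(o));   // typically 0: freshly allocated

        GC.Collect();                             // force a collection (demo only)
        Console.WriteLine(GC.GetGeneration(o));   // typically 1: survived and was promoted

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(o));   // typically 2: promoted again

        Console.WriteLine(GC.CollectionCount(0)); // how many gen-0 collections so far
    }
}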

If you are interested in the nitty-gritty of .NET and you have a C/C++/Win32
background (as you seem to do) I can recommend "CLR via C#" by Jeffrey
Richter; Jeffrey's an old hand at Win32 and he goes into much detail in a
familiar way.
(3) Why do we have non-deterministic destructors in C#?

For "destructor", read "finalizer". We have it because .NET uses the
philosophy that deterministic finalization is the minority case. There's
certainly something to say for this, namely that non-deterministic
finalization for memory, the most common resource, happens to be a good idea
-- this is garbage collection.
Asked differently:
why do some classes, like the SQL data reader, need to have their dispose
function called explicitly? Why didn't the language designers implement
deterministic destructors so we would not have to manually use the
"using" statement or (worse yet) manually call the dispose function when
the object goes out of scope?

Because objects don't go out of scope, they become unreachable. But when
they are determined to be unreachable is left undefined. In most cases this
additional freedom for the compiler and the runtime pays off.

There would be very little gain in forcing the compiler/runtime to treat the
case of an object to which only local variables hold references (the only
case to which we could realistically apply deterministic finalization) in a
special manner. For starters, this would force you to spell out the rules
for when a variable is live in the programming language itself (and the
programmer would have to know them). Even if you let this coincide with
scope (arguably the simplest way to do it), the case where a reference
accidentally "escapes" and ceases to be deterministically finalizable would
be trivial to overlook (in C++ this would be the classic "reference to local
variable" problem). A "using" scope makes these things explicit.
Couldn't the language designers design the C# language so the compiler
tells the runtime: "hey! this SqlDataReader is going out of scope, so you'd
better call the dispose function"?
But this is exactly what finalization does. As soon as the object "goes out
of scope" (is no longer reachable and is garbage collected) it's finalized
(well, technically, it's put on the list to be finalized Real Soon Now). The
issue is that in .NET, "going out of scope" does not have an exact timeframe
associated with it like in C++. An object is determined to be unreachable at
some unspecified time.

Associating an explicit timeframe (for purposes of cleaning up non-memory
resources, at least) is what you do in C# with "using", and what you do in
C++ by allocating the object on the stack (or wrapping a heap-allocated
object in a smart pointer).
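
For example (using a StreamWriter so the sketch runs without a database;
the same shape applies to a SqlDataReader), "using" is just shorthand for a
try/finally that calls Dispose, roughly like this:

using System;
using System.IO;

class UsingExpansionDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // With "using": the cleanup timeframe is explicit and tied to the block.
        using (StreamWriter w = new StreamWriter(path))
        {
            w.WriteLine("hello");
        }   // w.Dispose() runs here, whether or not an exception was thrown

        // Roughly what the compiler expands the "using" statement into:
        StreamWriter w2 = new StreamWriter(path);
        try
        {
            w2.WriteLine("hello again");
        }
        finally
        {
            if (w2 != null) w2.Dispose();
        }

        File.Delete(path);
    }
}
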
(4) Is there any circumstance where I would NOT want to call dispose
(explicitly or via the "using" statement) on a function-local SqlDataReader
object when it is going out of scope?
Assuming you don't pass the reference off to another object (so you can no
longer tell when it's "going out of scope"), which would probably be a bad
idea, the answer is "no". Actually, you might want to dispose it *before* it
goes out of scope, or better yet, combine the scope and the disposing...
which is precisely what "using" does. In C++, you might introduce an
artificial scope for this. In neither case can you afford to ignore what's
happening when those little objects blink in and out of life, since they're
using precious resources under the hood. The main difference is that C++
considers memory precious as well, while .NET treats it more like a
renewable resource.
(5) How is the structure of a managed DLL different from a native DLL?
Managed code is organized in the form of assemblies (whether .DLL or .EXE,
that's the same thing as far as managed code is concerned). As for the
differences between assemblies and DLLs, I couldn't possibly do it justice
here. Google is your friend. Structurally, the main difference is that
assemblies have complete metadata (types, methods, argument lists) while
DLLs have little to none (list of function names imported/exported, maybe
type names for a C++ DLL, that's about it).
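
One visible consequence of that metadata: managed code can enumerate an
assembly's types and method signatures at runtime through reflection. A
small sketch (here it reflects over the executing assembly itself, but any
managed DLL loaded with Assembly.LoadFrom would work the same way):

using System;
using System.Reflection;

class MetadataDemo
{
    static void Main()
    {
        Assembly asm = Assembly.GetExecutingAssembly();

        foreach (Type t in asm.GetTypes())
        {
            Console.WriteLine(t.FullName);
            foreach (MethodInfo m in t.GetMethods(
                BindingFlags.DeclaredOnly | BindingFlags.Public |
                BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static))
            {
                // Full signature information is available: return type,
                // name, and parameter list.
                Console.WriteLine("  " + m.ReturnType.Name + " " + m.Name +
                                  " (" + m.GetParameters().Length + " parameter(s))");
            }
        }
    }
}
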
(6) What choices of XML parser implementations do I have? I can call the
native MSXML via COM interop or PInvoke, and I can use the ones in System.Xml.
What is the difference? Is System.Xml just a wrapper for MSXML?
Using COM interop is a performance hit generally to be avoided, and no,
System.Xml is not a wrapper around unmanaged code but a ground-up
implementation in managed code. I can think of no scenario at all where
P/Invoking to MSXML would make sense, except perhaps stringent backwards
compatibility (but then, don't write managed code).
(7) What choices of XML parser types are there? There is SAX and DOM. Any
other choices?
The framework has no SAX-like parser for XML. It has a DOM parser, it has
lightweight pull-model parsers in the form of XPathNavigator and XmlReader
and with the advent of .NET 3.5 it has a non-DOM but still in-memory model
(albeit lazily evaluated) in the form of LINQ to XML.

This link explains why the framework has XmlReader but not SAX:
http://msdn.microsoft.com/library/sbw89de7
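
A minimal sketch of the pull model with XmlReader (parsing an in-memory
string so it's self-contained): the caller asks for the next node, rather
than receiving callbacks (SAX) or loading the whole tree (DOM).

using System;
using System.IO;
using System.Xml;

class XmlReaderDemo
{
    static void Main()
    {
        string xml = "<books><book title='CLR via C#'/><book title='Essential .NET'/></books>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())   // pull the next node on demand
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "book")
                    Console.WriteLine(reader.GetAttribute("title"));
            }
        }
    }
}
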
(8) What is the difference between using PInvoke to manipulate a semaphore
or mutex and using System.Threading?
Using the Semaphore class from System.Threading is portable (.NET isn't just
for Windows). Using P/Invoke is not. Other than that, as the class is at
present a wrapper around the unmanaged functionality, there's no difference.
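
For example, a minimal sketch using the managed Semaphore class; no
P/Invoke declarations are needed, and WaitOne/Release play the role that
WaitForSingleObject/ReleaseSemaphore would in native code.

using System;
using System.Threading;

class SemaphoreDemo
{
    static Semaphore gate = new Semaphore(2, 2);  // at most 2 threads inside at once

    static void Worker(object id)
    {
        gate.WaitOne();                 // acquire a slot
        try
        {
            Console.WriteLine("worker " + id + " in");
            Thread.Sleep(500);          // simulate work
        }
        finally
        {
            gate.Release();             // give the slot back
        }
    }

    static void Main()
    {
        for (int i = 0; i < 5; i++)
            new Thread(Worker).Start(i);
    }
}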
 
Jeroen Mostert

Peter said:
I agree with practically everything Jeroen wrote. But, there are a
couple of (admittedly nitpicky) points I'd like to touch on, hopefully
in a way that usefully clarifies the issues:

First of all, on the language level, there is no stack and there is no heap.

This isn't really literally true. While it's only touched on briefly,
the C# specification does in fact discuss local variables in the context
of a "stack". C# even has a "stackalloc" keyword (albeit for unsafe
code only).
Exaggeration under dramatic license -- although I will give you
"stackalloc", that's rather direct. In fact the C# spec actually specifies
that "stackalloc" allocates memory from the *call stack*, something I'm not
sure the spec has any business dictating. The underlying CLI uses the
"localloc" instruction for this and calls what's being allocated from the
"local memory pool", taking care not to throw the various memory areas
together prematurely. The C# spec speaking directly of the stack is a bit of
level confusion, based on the obvious implementation.
Frankly it would seem a bit disingenuous to me for anyone to try to
discuss a language where one has the ability for functions to call
functions without having the concept of a "stack".

Yes, conceptually the call stack, at least, is a real place. However, the
call stack is special because it's just the thing that makes nested function
calls possible -- this need not have anything to do with memory allocation
for objects. In particular, the familiar notion of stack frames directly
containing local variables is just one way to do it.
With respect to the phrase "determined to be unreachable", I think it
bears emphasizing that objects do become unreachable in a deterministic
way.

It's true, but I'm not sure emphasizing it is necessary, for the reason you
give: there is no practical use for a system that makes the determination
"at the same time" objects become unreachable, much less one that acts on it
at the same time. I'm not sure thinking about such a system at all doesn't
actually involve more confusion.

For practical purposes, all that matters is when an object is determined to
be unreachable. And although at any given time it's either reachable or not
(and the rules for this are deterministic), that is just what makes garbage
collection feasible in the first place.
For the very reason that "going out of scope" doesn't really relate to
the actual definition of "scope" as it applies in the language, I prefer
to avoid using that term when discussing memory management. Variables
have scope. Objects do not.
Hence the use of quotes; I'm trusting the OP to connect the dots in this
analogy -- and it is just an analogy. Being precise and describing how
variable scope and object lifetime are (not) connected would be tedious and
not illustrate the particular point I was making here, even though it is of
course what actually happens.
For example, when one writes "as soon as the object 'goes out of scope'
(is no longer reachable and is garbage collected)", that seems to imply
either that the object hasn't "gone out of scope" until it's actually
_determined_ to be unreachable and is collected, or that the
determination of unreachability is done in a deterministic way.
Obviously, neither of those are actually true.

Because objects, as you actually mentioned, do not *have* scope, you can
define object "scope" for the purpose of analogy with C++ in whatever way
you wish. I think the interpretation here was clear enough.

This is just a lie-to-children. Trust me, I don't talk of objects going out
of scope when going about my daily business. :)
A managed object is either reachable or not. If it's not reachable, it
became unreachable at the exact, deterministic moment at which there is
no longer any way to reach the object via a chain of references starting
at a root reference. At some undetermined time later, the framework
will eventually actually _discover_ that it's unreachable. But it was
always unreachable, from the moment that last reference was removed (or
itself made unreachable).

It's the finalization and collection that's non-deterministic (i.e. the
_known_ state of the object), not the state of the object itself.
In case this wasn't clear yet, it surely is now. :)
 
Joe Fawcett

I'll start by answering the XML questions. System.Xml is not a wrapper for
the COM classes; it's completely new managed code. As well as DOM and SAX
(rarely used in .NET) you also have the XmlReader, which offers a pull model,
XPathDocument and the newer LINQ to XML stuff (XElement etc.). If you really
want to upset me, you could do things like load the XML into a DataSet and
manipulate it from there.
The actual difference in usage between msxml2.DomDocument and
System.Xml.XmlDocument is not that great, as they both try to implement the
standard DOM methods, although XmlDocument has many more methods and
properties than the COM version.
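
A small side-by-side sketch of the two in-memory models mentioned above
(LINQ to XML needs .NET 3.5 and a reference to System.Xml.Linq):

using System;
using System.Xml;
using System.Xml.Linq;

class XmlModelsDemo
{
    static void Main()
    {
        string xml = "<order id='42'><item>widget</item></order>";

        // DOM style: System.Xml.XmlDocument (managed, not a wrapper over MSXML).
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        Console.WriteLine(doc.SelectSingleNode("/order/item").InnerText);  // widget

        // LINQ to XML: in-memory but not DOM, often more convenient.
        XElement order = XElement.Parse(xml);
        Console.WriteLine((string)order.Attribute("id"));                  // 42
        Console.WriteLine((string)order.Element("item"));                  // widget
    }
}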
 
Ben Voigt [C++ MVP]

Siegfried Heintze said:
I'd like to pursue some questions I recently encountered in an
interview -- someone might ask them again in another interview. I was not
very satisfied with the answers I gave the interviewer.

All these questions are quite reasonable. I will answer your immediate
questions, but don't expect to do well in an interview without learning
through experience.
(1) Can someone point me to a discussion of the managed stack? Does it
work the same way as the native (CPU vendor-implemented) stack with a
frame pointer that is the head of a linked list of stack frames, where each
time we enter a function we create a new stack frame in which new
variables are pushed, and each time we exit a function the entire stack
frame is popped?

It IS the native stack. Managed code is converted into machine code by the
JIT compiler. Just like with native code, the inlining optimization means
that there is not a stack frame created for every function call in the
source code.
(2) Can someone point me to a discussion of the managed heap? How does it
work? Does it use counted pointers like COM often does? What, exactly,
happens when we use operator= to (shallow) copy a SqlDataReader object from a
stack local variable to a global variable? How does it prevent the memory
leaks that occur in COM when two objects reference each other and keep each
other's reference count nonzero? How is the managed heap different from the
native heap? I think the managed heap implements defragmentation
automatically, like Java. Does it use the mark-and-sweep algorithm or some
other algorithm?

The managed heap isn't really a heap at all; it's a stack. The garbage
collector is generational. At each collection of Gen0, objects that are
still reachable are moved onto the end of the Gen1 stack, and Gen0 is reset
to empty. Reachability is determined using a few roots (static variables,
stack variables) so that mutual and circular references do not keep objects
alive.

Reference types aren't copied; just new references to them are made. Value
types are copied bitwise. So there are no user-defined copy constructors.
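
A small sketch of that difference, with made-up PointValue/PointRef types:

using System;

class CopySemanticsDemo
{
    struct PointValue { public int X; }   // value type: assignment copies the bits
    class  PointRef   { public int X; }   // reference type: assignment copies the reference

    static void Main()
    {
        PointValue v1 = new PointValue { X = 1 };
        PointValue v2 = v1;               // bitwise copy: two independent values
        v2.X = 99;
        Console.WriteLine(v1.X);          // 1

        PointRef r1 = new PointRef { X = 1 };
        PointRef r2 = r1;                 // reference copy: both name the same object
        r2.X = 99;
        Console.WriteLine(r1.X);          // 99
    }
}
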
(3) Why do we have non-deterministic destructors in C#? Asked differently:
why do some classes, like the SQL data reader, need to have their dispose
function called explicitly? Why didn't the language designers implement
deterministic destructors so we would not have to manually use the "using"
statement or (worse yet) manually call the dispose function when the
object goes out of scope? Couldn't the language designers design the C#
language so the compiler tells the runtime: "hey! this SqlDataReader is going
out of scope, so you'd better call the dispose function"?

That design is definitely possible because C++/CLI does provide "stack
semantics" where Dispose is called when the reference goes out of scope.
(4) Is there any circumstance where I would NOT want to call dispose
(explicitly or via the "using" statement) on a function-local SqlDataReader
object when it is going out of scope?

Yes: if you have another reference to that instance stored in a member
variable, or if the reader is your method's return value.
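
A sketch of the hand-off case (using a StreamReader stand-in so it runs
without a database; the same ownership rule applies when a helper returns a
SqlDataReader): the method that creates the object must not dispose it,
because the caller now owns it and disposes it instead.

using System;
using System.IO;

class HandOffDemo
{
    // The creating method hands the disposable object off to the caller,
    // so it must NOT dispose it here.
    static TextReader OpenGreeting(string path)
    {
        File.WriteAllText(path, "hello");
        return new StreamReader(path);    // ownership transfers to the caller
    }

    static void Main()
    {
        string path = Path.GetTempFileName();

        // The caller owns the object and is the one who wraps it in "using".
        using (TextReader reader = OpenGreeting(path))
        {
            Console.WriteLine(reader.ReadLine());
        }

        File.Delete(path);
    }
}
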
(5) How is the structure of a managed DLL different from a native DLL?

The managed DLL contains .NET metadata and MSIL code. It also has an entry
for mscoree.dll in its import table, forcing the .NET runtime to load before
the managed "assembly". The .NET runtime provides a JIT compiler which
converts the MSIL into native machine code.
(6) What choices of XML parser implementations do I have? I can call the
native MSXML via COM interop or PInvoke, and I can use the ones in System.Xml.
What is the difference? Is System.Xml just a wrapper for MSXML?

See Joe's answer.
(7) What choices of XML parser types are there? There is SAX and DOM. Any
other choices?

See Joe's answer.
(8) What is the difference between using PInvoke to manipulate a semaphore
or mutex and using System.Threading?

System.Threading is a trusted library, so the security checks are different.
P/Invoke requires UnmanagedCode permission, which is almost never
available in a partial-trust scenario.
 
