newbie trouble making array of instances

  • Thread starter: luke
Peter said:
G.Doten said:
Well, that just says it's documented - it doesn't explain *how* it
happens. Calling a constructor should logically (and even according
to the C# spec) create a new object. You certainly can't create a C#
class which has this behaviour, for instance.

Sure you can. All the char[] constructor is doing is looking to see if
the passed-in char array is null or if it has 0 elements, in which
case it sets its internal string storage to the string.Empty
instance. Any class can easily be written to do the same.

Please post some sample code in which you can instantiate the same class
twice, using the "new" operator, in which the _references_ of both
instances are identical.

I think you guys are right and I'm wrong on this one. At least I can't
get a class to do this (see my attempt below). I thought that
String.Intern would cause a single instance of my Empty field to be
created, but that doesn't seem to be the case, and I don't understand why.

The String class is integrated right into the CLR so that it knows the
exact layout of the class' fields (quoting Jeffrey Richter here), so I
bet if you look at the IL generated that there is special "string code"
that checks the one instance of the Empty field. But I'll admit I'm a
little puzzled on this one too.
No. The reference contained by the variable str1 is being replaced by a
new reference, which is what Jon wrote. However, the original string is
not replaced. For example:

string str1, str2;

str2 = "test string";
str1 = str2;
str1 = str1.Remove(0, str1.Length);

The original string "test string" still exists. It was not replaced. A
new string instance was created by the Remove() method, and the
reference to this new instance was assigned to str1.
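
To make the point above concrete, here is a minimal sketch that extends the snippet with a few checks. The class name RemoveDemo is only an illustrative assumption; everything else uses the same variables as the example above.

```csharp
using System;

static class RemoveDemo
{
    static void Main()
    {
        string str2 = "test string";
        string str1 = str2;   // both variables now hold the same reference

        Console.WriteLine(object.ReferenceEquals(str1, str2)); // True

        // Remove() does not alter the existing string; the variable str1
        // is simply assigned a reference to a different string.
        str1 = str1.Remove(0, str1.Length);

        Console.WriteLine(str2);                               // test string
        Console.WriteLine(str1.Length);                        // 0
        Console.WriteLine(object.ReferenceEquals(str1, str2)); // False
    }
}
```

Only the reference stored in str1 changes; str2 still refers to the untouched original string.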

I didn't say that the str2 was replaced or that str1's old string value
was replaced. Here's what Jon said:

and I said:
Well, the value of str1 was replaced with a reference to a different
instance. The string itself wasn't being replaced.

Indeed, str1 is replaced by a new instance of a string object, the one
returned by the call to str1.Remove. And the reference to the string
that str1 used to point to is decremented by 1 so when it goes to 0 the
GC can have at it.
None of this, however, explains why the Remove() method doesn't simply
return the string.Empty instance, rather than actually create a brand
new instance that happens to have data equivalent to string.Empty.

I see. I guess for the same reason that the char[] constructor behaves
the way it does, because it is documented to. This is in the Remove
documentation: "returns a new String that is equivalent to this instance
less count number of characters." Notice there is no verbiage describing
special-casing around a string that happens to come back as empty; it
says it *always* returns a new string. Which makes sense to me given the
immutable nature of strings. I don't think I'd want the Remove method to
have that special case, but I can see its usefulness (in terms of
performance) for the char[] constructor.

-----------------------------------------------------------------

Here's the code I tried:

using System;

namespace MyStringTest
{
    class Program
    {
        static void Main()
        {
            // Both case 1 checks print "failure": each "new MyString(...)"
            // yields a distinct object, so the references differ.
            MyString x = new MyString(new char[] { });
            MyString y = new MyString(new char[] { });
            Assert("case 1a", true, ReferenceEquals(x, y));
            Assert("case 1b", true, ReferenceEquals((object)x, (object)y));

            // The same test against System.String itself.
            string s1 = new string(new char[] { });
            string s2 = new string(new char[] { });
            Assert("case 2a", true, ReferenceEquals(s1, s2));

            string s = "Hello";
            Assert("case 2b", true, ReferenceEquals("Hello", s));

            // Concatenation at run time builds a new string; Intern maps it
            // back to the pooled literal.
            string h1 = "Hel";
            string h2 = "lo";
            Assert("case 3a", false, ReferenceEquals("Hello", h1 + h2));
            Assert("case 3b", true, ReferenceEquals("Hello", string.Intern(h1 + h2)));
        }

        static void Assert(string name, bool expected, bool actual)
        {
            Console.Write(name + " - ");
            Console.WriteLine(expected == actual ? "success" : "failure");
        }
    }

    public class MyString
    {
        public static readonly string Empty = string.Intern("");

        string _s;

        public MyString(char[] ch)
        {
            // Share one empty string between all empty MyString objects.
            if (ch == null || ch.Length == 0)
                _s = Empty;
            else
                _s = new string(ch);
        }

        public bool IsEmpty
        {
            get
            {
                return _s == Empty;
            }
        }
    }
}
 
G.Doten wrote:
[...]
I didn't say that the str2 was replaced or that str1's old string value
was replaced.

That is what you appeared to say. Call it a semantic misunderstanding
if you like.

Here's what Jon said:
and I said:


Indeed, str1 is replaced by a new instance of a string object, the one
returned by the call to str1.Remove.

Here you appear to say it again. IMHO, it is worthwhile to be very
careful about the language you use. The computer is very literal, and
so when talking about what the computer does, one should be very literal.

In particular: "str1" is a variable. It's not replaced at all, so
writing "str1 is replaced" doesn't make sense. There is a string to
which "str1" refers. That's not replaced at all either. There is a
reference value that is stored in "str1". This _is_ replaced, by a new
and different reference value.

And the reference to the string
that str1 used to point to is decremented by 1 so when it goes to 0 the
GC can have at it.

C# garbage collection doesn't use reference counting, so even if you
meant "the count of references to the string that str1 used to point to"
(clearly the reference itself shouldn't be decremented), the statement
would be incorrect.
None of this, however, explains why the Remove() method doesn't simply
return the string.Empty instance, rather than actually create a brand
new instance that happens to have data equivalent to string.Empty.

I see. I guess for the same reason that the char[] constructor behaves
the way it does, because it is documented to.

I don't see that as a reason. Things are documented to describe how
those things behave. Saying that things behave in a particular way
_because_ they are documented to do so doesn't explain anything.

This is in the Remove
documentation: "returns a new String that is equivalent to this instance
less count number of characters." Notice there is no verbiage describing
special-casing around a string that happens to come back as empty; it
says it *always* returns a new string.

Again, that's simply the documentation describing what it does. That
doesn't explain _why_ it does what it does.

Which makes sense to me given the
immutable nature of strings. I don't think I'd want the Remove method to
have that special case, but I can see its usefulness (in terms of
performance) for the char[] constructor.

Why and why not? Why is it okay to special case the string(char[])
constructor, but not the Remove() method? You get a memory performance
benefit out of both, at the (very tiny tiny) cost of execution performance.

Your code doesn't do anything to attempt to change the reference
returned by the "new" operator. All it does is ensure that each
instance of your MyString class contains a reference to a single "Empty"
string instance when possible.

The reference to the class itself is not at all the same as a reference
to some member of the class.
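
A minimal sketch of that distinction, assuming a hypothetical Wrapper class that mirrors the MyString attempt (the names Wrapper, Value and WrapperDemo are illustrative only): the two objects returned by "new" are different, even though the field inside each one holds the same string reference.

```csharp
using System;

// Hypothetical stand-in for the MyString class from the earlier post.
class Wrapper
{
    public static readonly string Empty = "";

    public readonly string Value;

    public Wrapper(char[] ch)
    {
        // The constructor can choose what the field refers to, but it has
        // no way to choose which object "new" hands back to the caller.
        Value = (ch == null || ch.Length == 0) ? Empty : new string(ch);
    }
}

static class WrapperDemo
{
    static void Main()
    {
        Wrapper x = new Wrapper(new char[0]);
        Wrapper y = new Wrapper(new char[0]);

        // Two calls to "new" on an ordinary class give two objects.
        Console.WriteLine(object.ReferenceEquals(x, y));             // False

        // The member inside each object refers to the one shared string.
        Console.WriteLine(object.ReferenceEquals(x.Value, y.Value)); // True
    }
}
```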

Pete
 
Peter said:
G.Doten wrote:
[...]
I didn't say that the str2 was replaced or that str1's old string
value was replaced.

That is what you appeared to say. Call it a semantic misunderstanding
if you like.

Fair enough.
Here's what Jon said:

Here you appear to say it again. IMHO, it is worthwhile to be very
careful about the language you use. The computer is very literal, and
so when talking about what the computer does, one should be very literal.

In particular: "str1" is a variable. It's not replaced at all, so
writing "str1 is replaced" doesn't make sense. There is a string to
which "str1" refers. That's not replaced at all either. There is a
reference value that is stored in "str1". This _is_ replaced, by a new
and different reference value.

I believe we are saying the same thing. I fully understand that str1 is
a reference to a string, but in the places I work "str1 is replaced," in the
context I said it, would be normal parlance for the more accurate
terminology "the reference in str1 is replaced with a value (reference)
that points to a different string instance." That's what I meant so,
again, we mean the same thing, I just wasn't being pedantic when I said
it. But as you rightly point out, that can lead to technical
miscommunication.
And the reference to the string

C# garbage collection doesn't use reference counting, so even if you
meant "the count of references to the string that str1 used to point to"
(clearly the reference itself shouldn't be decremented), the statement
would be incorrect.

You are absolutely correct. My point was that once there are no more
references to the instance of the string that str1 was originally
pointing to (as in Jon's example), the GC will be able to collect that
string (assuming nothing else has a reference to it).
None of this, however, explains why the Remove() method doesn't
simply return the string.Empty instance, rather than actually create
a brand new instance that happens to have data equivalent to
string.Empty.

I see. I guess for the same reason that the char[] constructor behaves
the way it does, because it is documented to.

I don't see that as a reason. Things are documented to describe how
those things behave. Saying that things behave in a particular way
_because_ they are documented to do so doesn't explain anything.

This is in the Remove
documentation: "returns a new String that is equivalent to this
instance less count number of characters." Notice there is no verbiage
describing special-casing around a string that happens to come back as
empty; it says it *always* returns a new string.

Again, that's simply the documentation describing what it does. That
doesn't explain _why_ it does what it does.

Which makes sense to me given the
immutable nature of strings. I don't think I'd want the Remove method
to have that special case, but I can see its usefulness (in terms of
performance) for the char[] constructor.

Why and why not? Why is it okay to special case the string(char[])
constructor, but not the Remove() method? You get a memory performance
benefit out of both, at the (very tiny tiny) cost of execution performance.

I guess I just don't understand the distinction here. Are you looking
for some sort of "philosophical" reason why the BCL writers decided to
write these two methods like that? Because otherwise these behaviors are
nothing more than implementation details, and they just are what they
are. To me the behavior makes sense (or at least it doesn't make me ask,
"huh?"). If Remove special-cased null/empty and returned Empty in such a
case I think, as far as I understand what you are asking, that you would
have just as legitimate a question. So I think I'm being thick here, to
be honest.
Your code doesn't do anything to attempt to change the reference
returned by the "new" operator. All it does is ensure that each
instance of your MyString class contains a reference to a single "Empty"
string instance when possible.

The reference to the class itself is not at all the same as a reference
to some member of the class.

Yes, you are correct again. That dawned on me after playing around with
the code some more; I was barking up the wrong tree. Again, the CLR has
special knowledge of the actual fields within a String object and that
has got to be how it does its magic allowing what you would normally
think are two separate instances of a string (because all the other
objects behave that way) to be the very same instance.

My understanding is that these special String hooks in the CLR are
totally for enhanced performance of strings. I think you just have to be
aware of this special behavior of strings when working with them and
that no other object behaves this way (AFAIK).
 
G.Doten wrote:
[...]
I guess I just don't understand the distinction here. Are you looking
for some sort of "philosophical" reason why the BCL writers decided to
write these two methods like that?

Essentially, yes.

Because otherwise these behaviors are
nothing more than implementation details, and they just are what they
are.

That's fine, but it still doesn't explain the _reason_ behind why they
are what they are.

To me the behavior makes sense (or at least it doesn't make me ask,
"huh?"). If Remove special-cased null/empty and returned Empty in such a
case I think, as far as I understand what you are asking, that you would
have just as legitimate a question.

Yes, I could still ask the question "why?". You can always ask the
question "why?" But I would be more inclined to be able to infer an
answer to that question if the behavior were always consistent. That
is, if the string class always converted empty strings to the single
empty instance, or if it never did.

The fact that sometimes it does and sometimes it doesn't is what raises
the question for me. So far, no one's offered an explanation, so
perhaps there isn't one. Sometimes, things are inconsistent due to
simple lack of consideration on the part of the designer.

But sometimes the designer is aware of the inconsistency, but lives with
it because of some other issue that is less obvious.

Pete
 
G.Doten said:
Well, that just says it's documented - it doesn't explain *how* it
happens. Calling a constructor should logically (and even according to
the C# spec) create a new object. You certainly can't create a C# class
which has this behaviour, for instance.

Sure you can. All the char[] constructor is doing is looking to see if
the passed-in char array is null or if it has 0 elements, in which case
it sets its internal string storage to the string.Empty instance.

No, that's not what it's doing. It's a constructor *to String* which
isn't returning a reference to a new object. That defies the C# spec
which states:

<quote>
The new operator implies creation of an instance of a type, but does
not necessarily imply dynamic allocation of memory.
</quote>

(The latter part is for value types, basically.)

In this case, the string constructor is *not* creating a new instance
of string.
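
A small sketch for checking this on your own runtime (the class name EmptyStringDemo is an assumption for illustration). The participants in this thread report that the first two checks print True; that is observed implementation behaviour rather than something the C# spec promises, so no expected output is marked on those lines.

```csharp
using System;

static class EmptyStringDemo
{
    static void Main()
    {
        string s1 = new string(new char[0]);
        string s2 = new string(new char[0]);

        // Does "new" hand back the same object twice for an empty char[]?
        Console.WriteLine(object.ReferenceEquals(s1, s2));

        // Is that object the string.Empty instance?
        Console.WriteLine(object.ReferenceEquals(s1, string.Empty));

        // For comparison, an ordinary class always gives distinct objects.
        object o1 = new object();
        object o2 = new object();
        Console.WriteLine(object.ReferenceEquals(o1, o2)); // False
    }
}
```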
Any class can easily be written to do the same.

No, it can't.
Well, yes, str1 is being replaced by a new string instance.

No. str1 isn't a string. It's a reference to a string.
The expression:

str1.Remove(0, str1.Length)

creates a new string object and it is this new instance that replaces
the old instance that str1 used to point to.

No, it's the *reference* to the new string that replaces the previous
value of str1. It's not a case of one string *instance* replacing
another.
Strings cannot be modified
in-place due to their immutable nature (unless unsafe code is used).

Indeed, and that doesn't go against what I was saying.
 
Oh now I've got no hope. I might have been returning the address of
the pointer but that was my only explanation for why updating any item
in the array updates all items. And I was kinda hoping I knew so
little about C# that I was just messing up my constructor. I suppose
I'll post my actual code tomorrow and see if anyone can spot my dumb
mistake. Thanks everyone. Glad to know that I'm not too far off
track and that C# isn't as esoteric as I was beginning to fear.

I woke this morning and without looking at my code I knew what I'd
done. I'd declared all the member variables of 'item' static. I
don't know why - retarded moment. Sorry for wasting your time. I
learned about posting short working code though. Thanks Jon.

I was thrown by the reference to each 'item' being the same. Had the
watch window displayed address for 'item' rather than contents of item
I'd not been compelled to use &item. I tried *item and that seems to
show the address - does this mean show item as a pointer?

Thanks All.
 
luke said:
I woke this morning and without looking at my code I knew what I'd
done. I'd declared all the member variables of 'item' static.

So you had separate instances of the class, but none of them contain any
data at all. :)
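
luke's actual code was never posted, so the following is only a guess at the shape of the bug, with hypothetical class names (StaticItem, InstanceItem). It shows why updating one element of the array appeared to update all of them: the instances are distinct, but a static field is a single storage slot shared by the whole class.

```csharp
using System;

class StaticItem
{
    static int _value;   // one slot shared by every StaticItem

    public int Value
    {
        get { return _value; }
        set { _value = value; }
    }
}

class InstanceItem
{
    int _value;          // one slot per InstanceItem object

    public int Value
    {
        get { return _value; }
        set { _value = value; }
    }
}

static class StaticFieldDemo
{
    static void Main()
    {
        StaticItem[] shared = { new StaticItem(), new StaticItem() };
        shared[0].Value = 42;
        Console.WriteLine(shared[1].Value);                                // 42
        Console.WriteLine(object.ReferenceEquals(shared[0], shared[1]));   // False

        InstanceItem[] separate = { new InstanceItem(), new InstanceItem() };
        separate[0].Value = 42;
        Console.WriteLine(separate[1].Value);                              // 0
    }
}
```

Dropping the static keyword from the fields gives each array element its own data.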
I
don't know why - retarded moment. Sorry for wasting your time. I
learned about posting short working code though. Thanks Jon.

I was thrown by the reference to each 'item' being the same. Had the
watch window displayed address for 'item' rather than contents of item
I'd not been compelled to use &item. I tried *item and that seems to
show the address - does this mean show item as a pointer?

A reference is a pointer, but the syntax when you use a reference is
different from using a pointer.

The & operator gives you the address of the variable, i.e. the address
of the pointer. If the reference is a local variable, this is an address
inside the stack frame for the method.

The * operator dereferences the reference and gives you the value of the
variable, which is the address of the object.


In your code, &i gives you the address of the local variable i, which of
course is the same regardless of what the variable contains.
 