System::String class design decisions


Edward Diener

I am curious about two of the major design decisions for the System::String
class and why they were made. Those two decisions are:

1) System::String is an immutable class, with System::Text::StringBuilder
being the class that allows a string to be modified internally.
2) System::String is a reference class rather than a value class.

OK, I will admit right off, rather than try to forestall my criticism of
these design decisions in some other way, that I am a C++ programmer and that
the std::string class in C++ is mutable and almost always stack-based in
actual use rather than dynamically allocated. Given that prior orientation,
and knowing that the Java string class, like the .NET System::String class,
is also immutable and reference-based, I am still curious why .NET chose to
follow this same path.

The practical reason I mention this is that it seems as if it would have
been much easier from the programmer's point of view to deal with
System::String as a value class and as a mutable class, rather than have to:

1) Make changes to a System::String, each of which returns another
dynamically allocated System::String, or use another class,
System::Text::StringBuilder, just to make changes, something that seems like
a kludge to me.
2) Write 'System::String * x = new System::String(...some constructor)' to
create a System::String, instead of simply writing 'System::String x(...some
constructor)'.

I do realize that any string class which can contain a variable-length
series of characters has to use dynamic memory internally, so my criticism
of the second design decision above is not based directly on using dynamic
memory rather than stack-based memory. My questioning of the design
decisions above, and my interest in any justifications for them, is based on
ease of use from the programmer's point of view. To me it is much easier to
be able to instantiate a string class directly as a stack-based object, and
change that particular object if necessary, than to have to instantiate a
string as a dynamically allocated object, and have changes made to that
object result in a new string being dynamically allocated and passed back to
me.
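
For readers coming from .NET, here is a minimal C++ sketch of the idiom described above: a std::string declared as a local object (no new), copied by value, and then modified in place without affecting the copy. The function name makeGreeting is hypothetical. As noted above, the characters themselves may still live in dynamically allocated memory behind the object.

```cpp
#include <string>

// Hypothetical helper illustrating std::string value semantics.
std::string makeGreeting(const std::string& name) {
    std::string greeting("Hello");   // local object, no explicit new
    std::string copy = greeting;     // independent copy of the characters
    greeting += ", ";                // modifies greeting in place
    greeting += name;
    // 'copy' is unaffected by the changes made to 'greeting'.
    return copy + " / " + greeting;
}
```

For example, makeGreeting("Ann") yields "Hello / Hello, Ann": the copy kept the original value while the original object was changed directly.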
 

Bruno Jouhier [MVP]

Interesting. Here is how I view it.

1) I think that this is an excellent design decision. The main benefit is
that 99% of the time, you read strings, and 1% of the time, you modify them
(at least in the kind of applications I write). So, with immutable strings,
you can directly return the strings that are inside your objects; nobody
will ever be able to modify them and corrupt the internal state of your
object. With C++ strings, you have the choice between returning a clone
(safe but inefficient) or returning a const string (not completely safe,
because C++ lets you cast the const away). Also, immutability is critical
for objects that you use as keys into hashtables, and strings are very good
candidates for this.
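
A minimal C++ sketch of the two options Bruno mentions, and of why he calls the const option not completely safe. The Account class and the renameBehindTheBack function are hypothetical illustrations. Writing through the result of const_cast is well-defined here only because the Account object itself was not defined const. For an object that was defined const, it would be undefined behavior.

```cpp
#include <string>
#include <utility>

// Hypothetical class exposing its internal string in two ways.
class Account {
public:
    explicit Account(std::string owner) : owner_(std::move(owner)) {}
    // "Return a const string": no copy, but the caller can cast const away.
    const std::string& owner() const { return owner_; }
    // "Return a clone": safe, at the cost of a copy.
    std::string ownerCopy() const { return owner_; }
private:
    std::string owner_;
};

// Compiles without complaint and silently changes the object's internal
// state (well-defined only because the Account was not defined const).
void renameBehindTheBack(const Account& a, const std::string& newOwner) {
    const_cast<std::string&>(a.owner()) = newOwner;
}
```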

2) Seems like a mistake to me. System.String should be a value type. The
only problem is that you then lose the null value, and this can create
problems with existing APIs, databases, etc. (not with Oracle, though, which
cannot tell the difference between "" and null).

My personal theory is that all values should be immutable, but it is too
long to explain here.

Bruno.
 

Derek Slager

> I am curious about two of the major design decisions for the
> System::String class and why they were made. Those two decisions are:
>
> 1) System::String is an immutable class, with
> System::Text::StringBuilder being the class that allows a string to be
> modified internally.

Like the Java platform, the .NET framework class library was designed to
make it more difficult to make common programming mistakes. Certainly
there are some advantages to making strings mutable, but I think Microsoft
and Sun both (wisely) chose the path which is less prone to programmer
error. A nice example can be found below:

http://www.churchillobjects.com/c/11027b.html

A good resource on the value of immutability in general is 'Effective
Java', by Joshua Bloch. Many, if not most, of the concepts in Bloch's book
are applicable to the .NET framework class library as well.

-Derek
 

Edward Diener

Bruno said:
> Interesting. Here is how I view it.
>
> 1) I think that this is an excellent design decision. The main
> benefit is that 99% of the time, you read strings, and 1% of the
> time, you modify them (at least in the kind of applications I write).

You are writing different applications than I am. All my apps and modules do
heavy modifications to strings and pass them around quite a bit. The idea
of strings that are set upon construction and rarely modified after that is
certainly not a paradigm I have met very often.
> So, with immutable strings, you can directly return the strings that
> are inside your objects; nobody will ever be able to modify them and
> corrupt the internal state of your object. With C++ strings, you have
> the choice between returning a clone (safe but inefficient) or
> returning a const string (not completely safe, because C++ lets you
> cast the const away).

You are technically right but practically wrong. Every good C++ programmer
knows that casting away const is something to be done only on an emergency
basis. A typical emergency is a function that is typed incorrectly as taking
a "std::string &" when it should take a "const std::string &", since the
string itself is never changed. But this is a real rarity. I would say that
passing stack-based string values by reference in C++, so that the value can
be changed, is one of the most common C++-isms in the language. When a string
is passed as a const reference, all C++ programmers know not to const_cast
the const away except in the type of emergency I specified. If C++ had no
const_cast, so that one could not cast const away, it might be "safer" in
your estimation, but it wouldn't be in the spirit of C++ and its practical,
rather than theoretical, considerations.
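
The emergency case described above might look like this minimal sketch; both function names are hypothetical. The const is cast away only to call a mistyped function that never writes to its argument, and that is the only thing that keeps the cast safe.

```cpp
#include <cstddef>
#include <string>

// Hypothetical legacy API, mistakenly declared to take a non-const
// reference even though it never modifies its argument.
std::size_t legacyCountSpaces(std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

// Correctly typed wrapper: callers pass a const reference, and const is cast
// away only to call the mistyped function, which is safe solely because
// legacyCountSpaces never writes to its argument.
std::size_t countSpaces(const std::string& s) {
    return legacyCountSpaces(const_cast<std::string&>(s));
}
```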
> Also, immutability is critical for objects that
> you use as keys into hashtables, and strings are very good candidates
> for this.

I agree that the example is common, but passing a mutable object to
something which expects an immutable object should never be a problem.
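
A C++ illustration of why this holds for value-type strings: std::unordered_map stores its own copy of each key (the key type inside the map is const), so a caller who later modifies its own string cannot corrupt the table. The function name is hypothetical. Note that this protection does not extend to containers keyed on pointers or references to mutable objects.

```cpp
#include <string>
#include <unordered_map>

// Inserts 'key' into the map, then modifies the caller's string.
// The map keeps its own copy of the key, so the later change has no
// effect on the stored entry.
void insertThenModify(std::unordered_map<std::string, int>& counts,
                      std::string& key) {
    counts[key] = 1;     // the map copies 'key' into its own node
    key += "-changed";   // modifies only the caller's string
}
```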
> 2) Seems like a mistake to me. System.String should be a value type.
> The only problem is that you then lose the null value, and this can
> create problems with existing APIs, databases, etc. (not with Oracle,
> though, which cannot tell the difference between "" and null).

Not having a 'null' value doesn't mean one couldn't check for an empty
string and put a null into a table if that were the case. I actually think
it is generally good etiquette to pass an empty string rather than a 'null'
string. Perhaps there is a reason, besides the database one, to distinguish
between a 'null' string and an empty string, but I have not seen it.
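
A minimal C++17 sketch of the approach suggested above, where the application works with plain, never-null string values and an empty string becomes "no value" only at the database boundary. toNullableColumn is a hypothetical name, and std::optional stands in for whatever nullable type a real database API would use.

```cpp
#include <optional>
#include <string>

// Hypothetical boundary helper: an empty string maps to "no value"
// (stored as NULL); any other string is passed through unchanged.
std::optional<std::string> toNullableColumn(const std::string& value) {
    if (value.empty()) {
        return std::nullopt;
    }
    return value;
}
```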
> My personal theory is that all values should be immutable, but it is
> too long to explain here.

Given that value types are normally passed by value in .NET, I can see
your point. But in other languages, such as C++, a value type, i.e. a
stack-based variable, is often passed by reference (or pointer), so I
certainly wouldn't agree with you for those languages.
 

Edward Diener

Derek said:
> Like the Java platform, the .NET framework class library was designed
> to make it more difficult to make common programming mistakes.

Is it generally a programming mistake to change a string value? If
System::String were a value type, one could declare a System::String
parameter by value in order not to affect the original value, or declare it
by reference in order to allow the original value to be changed. That is a
paradigm with which, I admit, I am much more comfortable. Evidently the .NET
designers found that specifying whether a parameter is a reference or not
was too error-prone, but I have to admit to laughing at that assumption.
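
The paradigm described above, sketched in C++ with std::string: a by-value parameter works on a copy, while a non-const reference parameter changes the caller's string. The function names are hypothetical.

```cpp
#include <string>

// By-value parameter: the function works on its own copy, so the
// caller's string is left unchanged.
std::string withSuffix(std::string s) {
    s += ".bak";
    return s;
}

// Non-const reference parameter: the caller's string is changed in place.
void addSuffix(std::string& s) {
    s += ".bak";
}
```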
> Certainly there are some advantages to making strings mutable, but I
> think Microsoft and Sun both (wisely) chose the path which is less
> prone to programmer error. A nice example can be found below:
>
> http://www.churchillobjects.com/c/11027b.html

Thanks for the link.
> A good resource on the value of immutability in general is 'Effective
> Java', by Joshua Bloch. Many, if not most, of the concepts in Bloch's
> book are applicable to the .NET framework class library as well.

I am well aware that the .NET System::String closely models the Java string
class as far as immutability and the use of a proxy class to change the
string internally are concerned.
 

Eric Newton

There's also the notion that we think of strings as one "thing" instead of a
group of chars in memory, all consecutively ordered next to each other...
whereas ints and even decimals are represented by a fixed number of memory
bytes... hence why strings aren't value types, as the concept goes...

Personally I like the immutability of it, and using StringBuilder isn't
internally modifying a string... it's modifying a char array that looks like
a string, but if you look closely, there's always some padding at the end of
the internal "string".
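
The padding mentioned above corresponds to spare capacity in a growable buffer. std::string exposes the same idea through capacity() and reserve(). The C++ sketch below shows the builder pattern of reserving once and appending into a single buffer instead of creating a new string for every concatenation. joinWords is a hypothetical name, and this is an analogy, not a description of how StringBuilder is implemented.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds one string by appending into a single buffer, builder-style,
// instead of creating a new string for every concatenation step.
std::string joinWords(const std::vector<std::string>& words, char sep) {
    std::size_t total = 0;
    for (const auto& w : words) {
        total += w.size() + 1;   // word plus separator (one spare at the end)
    }
    std::string out;
    out.reserve(total);          // capacity() >= size(): the spare "padding"
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) out += sep;
        out += words[i];
    }
    return out;
}
```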

As more seasoned developers, we just need to stress the importance of
encouraging string manipulation via StringBuilder or via char arrays, and
help the novice developers understand why this is a better practice.
Especially in light of C++ strings which were easy to overflow and hence
write arbitrary code...
 

Edward Diener

Eric said:
> There's also the notion that we think of strings as one "thing"
> instead of a group of chars in memory, all consecutively ordered next to
> each other... whereas ints and even decimals are represented by a fixed
> number of memory bytes... hence why strings aren't value types, as the
> concept goes...

That is true, but I like the analogy that value types are fairly simple as
far as the data they contain is concerned while ref types have much more
complicated data requirements. By this analogy System::String should be a
value type.
> Personally I like the immutability of it, and using StringBuilder isn't
> internally modifying a string... it's modifying a char array that
> looks like a string, but if you look closely, there's always some
> padding at the end of the internal "string".

I think it is a kludge, whereas modifying strings directly is a much clearer
and easier programming idiom.
> As more seasoned developers, we just need to stress the importance of
> encouraging string manipulation via StringBuilder or via char arrays,
> and help the novice developers understand why this is a better
> practice.

I do understand that manipulating strings in place as character arrays is
better, or more natural, so perhaps that is what you mean. Still, I would
have preferred System::String to have the same ability as
System::Text::StringBuilder to manipulate strings in place, as character
arrays, and to dispense with the latter.
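
In-place manipulation of a string as a character array, sketched in C++ with std::string. toUpperInPlace is a hypothetical name, and std::toupper's behavior here assumes the default "C" locale, so only ASCII letters are changed.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Uppercases the caller's string in place, treating it as a character
// array; no new string object is created.
void toUpperInPlace(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}
```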
> Especially in light of C++ strings which were easy to
> overflow and hence write arbitrary code...

This last doesn't make sense to me unless you are talking about C
null-terminated strings. I admit that because of std::string in C++ I
haven't used a C null-terminated string in many years. Of course one can try
to assign a character to a std::string position which doesn't exist, but I
think this usage of std::string is really minimal given all of the
functionality in the class for manipulating strings via member functions. I
find std::string to be much more usable than the
System::String/System::Text::StringBuilder combination, or any other string
class past or present.
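
A sketch of the two points above, assuming standard C++: std::string::at() checks the position and throws std::out_of_range, whereas operator[] does no checking (writing past the end is undefined behavior), and common edits can be done entirely with member functions such as find() and replace(). Both helper names are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Writes 'c' at position 'pos' using the bounds-checked at(), so a
// position that doesn't exist is reported instead of corrupting memory.
bool setCharChecked(std::string& s, std::string::size_type pos, char c) {
    try {
        s.at(pos) = c;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Replaces the first occurrence of 'from' with 'to' using member
// functions only; returns false if 'from' is not found.
bool replaceFirst(std::string& s, const std::string& from,
                  const std::string& to) {
    const auto pos = s.find(from);
    if (pos == std::string::npos) return false;
    s.replace(pos, from.size(), to);
    return true;
}
```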
 

Jon Skeet [C# MVP]

Eric Newton said:
> There's also the notion that we think of strings as one "thing" instead of a
> group of chars in memory, all consecutively ordered next to each other...
> whereas ints and even decimals are represented by a fixed number of memory
> bytes... hence why strings aren't value types, as the concept goes...
>
> Personally I like the immutability of it, and using StringBuilder isn't
> internally modifying a string... it's modifying a char array that looks like
> a string, but if you look closely, there's always some padding at the end of
> the internal "string".

No, it *is* modifying a string. Yes, there's some padding at the end of
the string, but it's still a string. In Java it is genuinely a char
array, but in .NET it's a string - strings themselves are mutable
within mscorlib.
> As more seasoned developers, we just need to stress the importance of
> encouraging string manipulation via StringBuilder or via char arrays, and
> help the novice developers understand why this is a better practice.

Agreed.
 