instantiating string

Bas

Hello,

I'm relatively new to C#.

I wonder why this statement is legal:

string str = "Hello World";

while string is a reference type.
I would expect something like this:

string str = new string();
str = "Hello World";

or

string str = new string ("Hello World");

As far as I know, every reference type must be instantiated with the keyword
'new'. Is string just an exception for convenience?


Bas from Holland
 
Göran Andersson

Strings are a special case handled by the compiler. All the string
literals in the code are created as interned strings. That means that a
literal's string is created only once, not each time you assign it to a
reference, and that literals that occur more than once are reused.

A code like this:

string a = "asdf";
string b = "asdf";
string c = "asdf";

is not compiled into something like:

string a = new String("asdf");
string b = new String("asdf");
string c = new String("asdf");

instead it's compiled to something similar to:

static string _s0001 = new String("asdf");

string a = _s0001;
string b = _s0001;
string c = _s0001;

So a string literal is not just a syntax simplification for creating a
string instance; the string instances are also pre-created and reused
for efficiency.
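
If you want to see the sharing for yourself, here is a minimal sketch.
The class and method names (LiteralInterningDemo, SameInstance,
BuildAtRunTime) are made up for the illustration. It compares
references rather than contents, and builds one string at run time so
the compiler cannot turn it into a literal.

```csharp
using System;

static class LiteralInterningDemo
{
    // True when both variables refer to the very same object in memory.
    public static bool SameInstance(string x, string y) => object.ReferenceEquals(x, y);

    // Builds "asdf" from characters at run time, so it is not a literal.
    public static string BuildAtRunTime() => new string(new[] { 'a', 's', 'd', 'f' });

    static void Main()
    {
        string a = "asdf";
        string b = "asdf";
        string c = BuildAtRunTime();

        Console.WriteLine(SameInstance(a, b)); // True: both literals share one instance
        Console.WriteLine(a == c);             // True: same characters
        Console.WriteLine(SameInstance(a, c)); // False: c is a separate object
    }
}
```

Note that == on strings compares the characters, which is why the
sharing normally goes unnoticed.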
 
RayLopez99

as far as I know every reference type must be instantiated with the keyword
'new'. Is string just an exception for convenience?

Bas from Holland

Goran's answer is good, I vote for it.

My advice is also: if you build up a string piece by piece in a loop,
use "StringBuilder" rather than string concatenation, because it can be
10x faster or more. Don't ask me why (Goran would know) but it is. Use
plain string concatenation only for a small number of pieces.
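
For what it's worth, the reason is that strings are immutable: every
+= has to allocate a new string and copy the old contents into it,
while StringBuilder appends into a buffer and creates the final string
once. A rough sketch of the two approaches follows; the names
(ConcatDemo, JoinWithConcat, JoinWithBuilder) are just for
illustration, and the actual speed-up depends on how many pieces you
append.

```csharp
using System;
using System.Text;

static class ConcatDemo
{
    // Each += allocates a brand-new string and copies the old contents over.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i + ",";
        }
        return result;
    }

    // StringBuilder appends into an internal buffer; the string is built once at the end.
    public static string JoinWithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i).Append(',');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinWithConcat(5));  // 0,1,2,3,4,
        Console.WriteLine(JoinWithBuilder(5)); // 0,1,2,3,4,
    }
}
```

Both methods return the same text; only the amount of copying differs.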

RL
 
Bas

Ok thanks, I was already afraid that this was the case!

Bas
 
Peter Duniho

Göran Andersson said:
Bas wrote:
[...]
I would expect something like this:

string str = new string();
str = "Hello World";

or

string str = new string ("Hello World");
[...]

Strings are a special case handled by the compiler. All the string
literals in the code are created as interned strings. That means that a
literal's string is created only once, not each time you assign it to a
reference, and that literals that occur more than once are reused.

Note also: the class System.String is immutable, making the OP's first
example impossible. So part of the reason for the above (the main
reason, IMHO) is that the alternative would be to require initialization
like this (as in the OP's second example):

string str = new string ("Hello World");

That would be pointless, of course, since the string literal itself has
to live somewhere. It makes a lot more sense for the language to let
the literals be used as instances of System.String directly.

In fact, it's not so much the string interning that justifies/requires
the syntax we get. The language could just as easily be inefficient and
store a new literal for every declaration in code, and it would still
make more sense to just allow literals to be instances of System.String
than to have them be something else passable to the constructor for
System.String.
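
To make the immutability point concrete, here is a small sketch (the
class and method names ImmutabilityDemo and Shout are invented for the
example). It shows that a string method hands back a new string instead
of changing the existing one, and which kind of constructor calls
System.String does offer.

```csharp
using System;

static class ImmutabilityDemo
{
    // Returns an upper-case copy; the argument itself is never modified.
    public static string Shout(string text) => text.ToUpperInvariant();

    static void Main()
    {
        string original = "Hello World";
        string loud = Shout(original);

        Console.WriteLine(original); // Hello World (unchanged)
        Console.WriteLine(loud);     // HELLO WORLD (a different string object)

        // string empty = new string();  // does not compile: no parameterless constructor

        // The constructors that do exist build a string from characters:
        string dashes = new string('-', 5);
        string fromChars = new string(new[] { 'H', 'i' });
        Console.WriteLine(dashes);    // -----
        Console.WriteLine(fromChars); // Hi
    }
}
```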

Pete
 
Bas

You are right.
If I understand well: a string is in fact sort of a constant object; every
string is a reference to a constant.

Bas
 
Göran Andersson

Bas said:
If I understand well: a string is in fact sort of a constant object;
every string is a reference to a constant.

Yes, the literal strings are created once by the runtime (when the
assembly loads, or at the latest when the literal is first used), so
they exist for as long as the application does.

However, calling strings constant objects is somewhat misleading, as
constants don't exist at all at runtime. A constant is just a name
defined at compile time; it never exists as a variable at runtime.

Also, you can have a constant of the type string, which is a bit of a
special case. It's the only reference type that can be a constant with
a value other than null, and it's only the string literal that exists
at runtime. The constant is still just a name that exists at compile
time.
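
A short sketch of what that looks like in code. The names (ConstDemo,
Greeting, Folded, Nothing) are only for illustration. Since the
compiler replaces each constant name with its literal, the two string
constants below end up as the same literal and therefore the same
string instance.

```csharp
using System;

static class ConstDemo
{
    // A string constant: the compiler substitutes the literal wherever the name is used.
    public const string Greeting = "Hello World";

    // A constant expression is folded at compile time into a single literal.
    public const string Folded = "Hello" + " " + "World";

    // Any other reference type can only be a constant if its value is null.
    public const object Nothing = null;

    // public const object Thing = new object(); // does not compile: not a compile-time constant

    static void Main()
    {
        Console.WriteLine(Greeting == Folded);                       // True
        Console.WriteLine(object.ReferenceEquals(Greeting, Folded)); // True: same literal
        Console.WriteLine(Nothing == null);                          // True
    }
}
```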
 
Peter Duniho

Göran Andersson said:
Yes, the literal strings are created when the assembly loads, so they
exist all the time the application does.

However, calling strings constant objects is somewhat misleading, as
constants don't exist at all. A constant is just a name defined at
compile time; it never exists as a variable at runtime.

I agree. It's important to distinguish between "constant" and
"immutable", mainly because C# has a specific concept of "constant"
(i.e. "const"). I mean, I suppose you could use the word "constant" to
describe a type that is immutable, but that can lead to communication
problems, since you'd be using different jargon from what everyone else
is using.

Göran Andersson said:
Also, you can have a constant of the type string, which is a bit of a
special case. It's the only reference type that can be a constant, and
it's only the string literal that exists.

In particular, this is allowed _because_ the object can be known at
compile time. Any constant has to have a known value at compile time,
so other reference types – requiring the execution of a constructor at
run-time in order to exist – cannot be used for constants. String
literals are treated as having been constructed at compile-time and so
there's a kind of reference that can be resolved at compile time for
string constants.

Basically, strings behave as much like a primitive value type as like
the reference type that they really are, and so in the language they get
special treatment to allow things that wouldn't be allowed for other
reference types. Of course, the immutability of the string type is a
very important aspect of this; a mutable string type (e.g.
StringBuilder) couldn't support the kind of compile-time features that
System.String can.

Göran Andersson said:
The constant is still just a name that exists at compile time.

I guess that depends on your definition of "exist". A public constant
declared in a given assembly is still visible in that assembly after
compilation. To me, that's "existence" of a sort.

But yes, inasmuch as there is no variable per se that is referenced when
your code uses a constant – the value of the constant is simply
hard-coded into the code – the constant doesn't exist after compilation.
But according to that definition of "exist", I'd say it doesn't exist
at compile time either. :)

IMHO, the important thing about constants – as opposed to worrying about
how they are implemented – is that they are "baked" into the code. Once
your code is compiled, if it used a constant declared in a different
assembly, even if that assembly is recompiled with a different value of
the constant, your own assembly will not use the new, different value.

It's that "non-variable" behavior that IMHO sets constants apart from
other value identifiers.
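
As a rough illustration of the "baked in" behavior, compare a const
with a static readonly field. Everything here (LibrarySettings,
BakedVersion, LiveVersion, Caller) is a hypothetical name, and for the
sketch to fit in one file both classes sit in the same assembly; the
difference only becomes visible when LibrarySettings lives in a
separate assembly that gets rebuilt on its own.

```csharp
using System;

// Imagine this class lives in a separate library assembly (hypothetical).
public static class LibrarySettings
{
    // Copied into every caller as a literal when the caller is compiled.
    public const string BakedVersion = "1.0";

    // A real field: callers read it from the library at run time.
    public static readonly string LiveVersion = "1.0";
}

static class Caller
{
    // Compiles to the literal "1.0"; a rebuilt library with a new value
    // is not seen until this code is recompiled too.
    public static string ReadConst() => LibrarySettings.BakedVersion;

    // Reads the field each time, so a rebuilt library's new value is
    // picked up without recompiling the caller.
    public static string ReadField() => LibrarySettings.LiveVersion;

    static void Main()
    {
        Console.WriteLine(ReadConst()); // 1.0
        Console.WriteLine(ReadField()); // 1.0
    }
}
```

So if the value might change between library versions, static readonly
is usually the safer choice for anything public.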

Pete
 
Jeff Gaines

I guess that depends on your definition of "exist". A public constant
declared in a given assembly is still visible in that assembly after
compilation. To me, that's "existence" of a sort.

It's life Jim, but not as we know it :)
 
RayLopez99

My advice is also: if you build up a string piece by piece in a loop,
use "StringBuilder" rather than string concatenation, because it can be
10x faster or more. Don't ask me why (Goran would know) but it is. Use
plain string concatenation only for a small number of pieces.

Further, if you create a lot of identical strings at run time, a way to
save memory is to use the "intern pool" (String.Intern). String literals
already go into it automatically, as Goran described. Google this.
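
A minimal sketch of what I mean, with made-up names (InternPoolDemo,
BuildAtRunTime, Pooled). Two strings built separately at run time are
different objects, but String.Intern returns one shared instance for
equal contents.

```csharp
using System;

static class InternPoolDemo
{
    // Builds "asdf" from characters at run time; each call gives a new object.
    public static string BuildAtRunTime() => new string(new[] { 'a', 's', 'd', 'f' });

    // Returns the pool's single instance for this text, adding it if needed.
    public static string Pooled(string text) => string.Intern(text);

    static void Main()
    {
        string first = BuildAtRunTime();
        string second = BuildAtRunTime();

        Console.WriteLine(first == second);                       // True: same characters
        Console.WriteLine(object.ReferenceEquals(first, second)); // False: two objects

        // After interning, both variables refer to one shared instance.
        Console.WriteLine(object.ReferenceEquals(Pooled(first), Pooled(second))); // True
    }
}
```

Keep in mind that interned strings normally stay in the pool for the
life of the application, so it only pays off for values that really do
repeat a lot.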

RL
 
