StringBuilder: how does it work internally?

Guest

I know how to use a StringBuilder, which supposedly does
not create a new copy of itself each time you modify its
contents by adding or removing text.
But I wonder how it does that internally.
I am planning to use a StringBuilder to hold large amounts
of text, several megabytes in size, to be read later at
fixed offsets, so I need to know whether it is suitable
for that.


Thanks,
 
Jon Skeet [C# MVP]

The *exact* details are quite tricky - I believe it internally has a
string which it changes, creating a new string of larger capacity when
it needs to.

The *basic* details, however, are that it's effectively like an
ArrayList but for chars. If you understand how ArrayList works, that
should give you a good feeling for StringBuilder.
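
To make the ArrayList comparison concrete, here is a minimal sketch of a
growable buffer of chars. It is an illustration only, not the actual
StringBuilder implementation: the names CharBuffer and CharBufferDemo are
hypothetical, and the doubling growth policy is an assumption.

```csharp
using System;

// Hypothetical illustration: an ArrayList-style growable buffer of chars.
// This is NOT how the framework's StringBuilder is written.
public class CharBuffer
{
    private char[] buffer;
    private int length;

    public CharBuffer(int capacity)
    {
        buffer = new char[Math.Max(capacity, 1)];
    }

    public int Length { get { return length; } }
    public int Capacity { get { return buffer.Length; } }

    public void Append(string value)
    {
        if (value == null) return;
        int needed = length + value.Length;
        if (needed > buffer.Length)
        {
            // Assumed policy: double the capacity until the text fits,
            // then copy the existing chars into the new array once.
            int newCapacity = buffer.Length;
            while (newCapacity < needed) newCapacity *= 2;
            char[] bigger = new char[newCapacity];
            Array.Copy(buffer, bigger, length);
            buffer = bigger;
        }
        // Write the new chars in place; no new array unless we had to grow.
        value.CopyTo(0, buffer, length, value.Length);
        length = needed;
    }

    public override string ToString()
    {
        return new string(buffer, 0, length);
    }
}

public static class CharBufferDemo
{
    public static void Main()
    {
        CharBuffer b = new CharBuffer(4);
        b.Append("Hello, ");
        b.Append("world");
        Console.WriteLine(b.ToString()); // prints Hello, world
        Console.WriteLine(b.Capacity);   // prints 16 (4 -> 8 -> 16)
    }
}
```

Most appends only copy the new chars into spare room at the end; the whole
buffer is copied only when the capacity runs out.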

In your case, if you're creating the StringBuilder, appending a lot of
text to it, and then *only* reading from it, I'd convert it to a String
(using ToString) before you start reading from it. That will make the
code easier to understand for one thing. (Most developers know about
reading from a string in various ways, but not reading from a
StringBuilder.)
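
A minimal sketch of that build-then-read pattern, under stated assumptions:
the fixed record width of 8 chars and the names FixedOffsetDemo, BuildText
and ReadRecord are made up for illustration and are not from the original
question.

```csharp
using System;
using System.Text;

public static class FixedOffsetDemo
{
    // Assumed layout: every record is exactly 8 chars wide.
    private const int RecordWidth = 8;

    // Build the text once with a StringBuilder, then hand back a string.
    public static string BuildText(int recordCount)
    {
        StringBuilder sb = new StringBuilder(recordCount * RecordWidth);
        for (int i = 0; i < recordCount; i++)
        {
            sb.Append(i.ToString().PadLeft(RecordWidth, '0'));
        }
        return sb.ToString();
    }

    // Read one record from the finished string at its fixed offset.
    public static string ReadRecord(string text, int index)
    {
        return text.Substring(index * RecordWidth, RecordWidth);
    }

    public static void Main()
    {
        string text = BuildText(1000);
        Console.WriteLine(ReadRecord(text, 42)); // prints 00000042
    }
}
```

After ToString, all reads go through ordinary string members such as
Substring or the indexer, which is the part most readers of the code will
already know.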
 
Nicholas Paldino [.NET/C# MVP]

Basically, the StringBuilder holds an array of bytes internally (I'm
guessing, but it can't hold strings, since strings are immutable, so it can
just store the byte representation of the string). When the capacity is
exceeded, a new buffer is allocated.

If you know that you are definitely going to have text of that size in
the StringBuilder, then for the best performance you might want to
pre-allocate the buffer that is used internally. Basically, set the
Capacity property to a value that is above what you are going to need on
average. This way, in most cases, the internal buffer will already be large
enough to hold what you need.
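
A rough sketch of the effect of pre-allocating, for illustration only. The
class and method names (CapacityDemo, CountCapacityChanges) are hypothetical,
and the exact growth pattern of an undersized builder is an implementation
detail that differs between framework versions, so only the pre-sized case
is predictable.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Appends `count` copies of `piece` to a builder created with the given
    // initial capacity and counts how often the Capacity property changes.
    public static int CountCapacityChanges(int initialCapacity, string piece, int count)
    {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int changes = 0;
        int last = sb.Capacity;
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
            if (sb.Capacity != last)
            {
                changes++;
                last = sb.Capacity;
            }
        }
        return changes;
    }

    public static void Main()
    {
        // Small initial capacity: the builder has to grow repeatedly.
        Console.WriteLine(CountCapacityChanges(16, "0123456789", 100000));

        // Capacity comfortably above the final size of 1,000,000 chars:
        // no growth is needed.
        Console.WriteLine(CountCapacityChanges(2000000, "0123456789", 100000));
    }
}
```

The capacity can be passed to the constructor, as here, or set through the
Capacity property before the appends start.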

Hope this helps.
 
Nicholas Paldino [.NET/C# MVP]

Jon,

It can't use a string internally, because of the immutable nature of
strings in .NET.

Unless of course by string, you don't mean an actual instance of
System.String, but rather, an array of characters.
 
Jon Skeet [C# MVP]

Nicholas Paldino said:
It can't use a string internally, because of the immutable nature of
strings in .NET.

Unless of course by string, you don't mean an actual instance of
System.String, but rather, an array of characters.

Nope, I mean a string.

Strings don't have to be *absolutely* immutable in .NET - they just
have to be *publicly* immutable, and only tampered with *very, very*
carefully within the assembly which defines the string type - and I
believe that's exactly the case.

If you look at the Rotor BCL source code, you'll see how it can be
done. Reading Don Box's "Essential .NET volume 1" book suggests that
it's how .NET does it too.

(There's also a web page about this kind of thing, but I can't find it
at the moment.)
 
Shawn B.

I recently created a set of classes that add low-level binary value types to
C# and MC++ (not VB). I wanted to make them immutable because it seems that
every time I overload an operator I have to return a new instance of the
object with the new value. The StringBuilder is mutable, so I studied its
internal workings for quite some time.
http://www.visualassembler.com/binary

There is a reason why we have to use Append instead of +=. Internally, the
StringBuilder is an array of bytes. When you Append or do any other
operation on the object, that array is manipulated rather than reallocating
the string and returning it. There is no special magic taking place. No
undocumented "API"s or anything like that. Just a plain, boring array of
chars. But it works. So there is the secret.


Thanks,
Shawn



 
Nicholas Paldino [.NET/C# MVP]

Shawn,

Actually, the string builder is not an array of bytes... See Jon Skeet's
post for more information.

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

 
Jon Skeet [C# MVP]

Shawn B. said:
I recently created a set of classes that add low-level binary value types to
C# and MC++ (not VB). I wanted to make them immutable because it seems that
every time I overload an operator I have to return a new instance of the
object with the new value. The StringBuilder is mutable, so I studied its
internal workings for quite some time.
http://www.visualassembler.com/binary

How did you go about studying those internal workings, out of interest?

There is a reason why we have to use Append instead of +=. Internally, the
StringBuilder is an array of bytes. When you Append or do any other
operation on the object, that array is manipulated rather than reallocating
the string and returning it. There is no special magic taking place. No
undocumented "API"s or anything like that. Just a plain, boring array of
chars. But it works. So there is the secret.

I beg to differ - and now I've found the page which describes a lot of
it in detail:

http://www.codeproject.com/dotnet/strings.asp

<quote>
StringBuilder will construct a string object (which, you thought, were
immutable) and modify it directly.
</quote>

Note that there's no reason why it would count as an "undocumented
API" for String to be modifiable internally to the defining assembly
any more than any other internal member would count as an "undocumented
API". It's unavailable for other assemblies to use, which is why we
don't get to see the documentation for it.
 
Shawn B.

Well, one need look no further than ILDASM on mscorlib.dll. Rotor. Mono
(appears to be very similar in its implementation to what I see in IL when
DASM'd). What is described in the article link you provide, I don't see it
in the ILDASM when looking at the System.Text.StringBuilder object.


Thanks,
Shawn
 
Jon Skeet [C# MVP]

Shawn B. said:
Well, one need look no further than ILDASM on mscorlib.dll. Rotor.

What member of Rotor are you looking at? I'm looking at the member
m_StringValue, which is what gets updated. (Look at Append(char) for
instance, calling String.AppendInPlace.)
Mono (appears to be very similar in its implementation to what I see in IL when
DASM'd). What is described in the article link you provide, I don't see it
in the ILDASM when looking at the System.Text.StringBuilder object.

With respect, I still think you're definitely wrong.

Out of interest, which version of the framework are you looking at, and
which particular member is it that holds the char array in StringBuilder?
 
Jon Skeet [C# MVP]

Shawn B. said:
Well, one need look no further than ILDASM on mscorlib.dll. Rotor. Mono
(appears to be very similar in its implementation to what I see in IL when
DASM'd). What is described in the article link you provide, I don't see it
in the ILDASM when looking at the System.Text.StringBuilder object.

I've just downloaded the Mono 0.28 source, and that does indeed use a
char array in StringBuilder. The Rotor source is entirely different.
 
Jon Skeet [C# MVP]

Shawn B. said:
Well, one need look no further than ILDASM on mscorlib.dll. Rotor. Mono
(appears to be very similar in its implementation to what I see in IL when
DASM'd). What is described in the article link you provide, I don't see it
in the ILDASM when looking at the System.Text.StringBuilder object.

Rather than use ILDASM, I've written the following program, which
probably isn't sailing quite as close to the wind in terms of the
EULA.

using System;
using System.Reflection;
using System.Text;

public class Test
{
    static void Main()
    {
        // List every instance field of StringBuilder, public or not,
        // together with its declared type.
        foreach (FieldInfo field in typeof(StringBuilder).GetFields
            (BindingFlags.NonPublic |
             BindingFlags.Public |
             BindingFlags.Instance))
        {
            Console.WriteLine("{0} ({1})",
                field.Name,
                field.FieldType);
        }
    }
}


On my machine (.NET v1.1) this produces:

m_currentThread (System.Int32)
m_MaxCapacity (System.Int32)
m_StringValue (System.String)

What does it produce on yours?

(This is my final reply to that particular post :)
 
William Stacey

StringBuilder uses an internal string, defined as:

internal string m_StringValue;

"string" ultimately uses an array of chars and keeps a pointer to the first
char and the length to manage it.
SB also uses internal methods of the "string" class to help it:

public StringBuilder Append(string value)
{
    ....
    // AppendInPlace
    // ReplaceTheString
    ....
}

--
William Stacey, MVP
