StringBuilder much, much faster and better than String for concatenation!

raylopez99

StringBuilder better and faster than string for adding many strings.

Look at the below. It's amazing how much faster StringBuilder is than
string.

The last loop below is telling: for adding 200000 strings of 8 char
each, string took over 25 minutes while StringBuilder took 40
milliseconds!

Can anybody explain such a radical difference?

The hardware running this program was a Pentium IV with 2 GB RAM.

RL

// stringbuilder much faster than string in concatenation

//////////////
using System;

namespace console1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("hi \n");
            // Number of concatenations to time; change it and rerun for each measurement.
            UpdateTime myUpdateTime = new UpdateTime(1000);
            myUpdateTime.UpdateTimeMethod();
            Console.WriteLine("times str,sb are: {0}, {1}",
                myUpdateTime.txtConcatTime, myUpdateTime.txtStringBTime);
        }
    }
}

/*
 * OUTPUT
 * results:
 * for 1000 iterations: string = 10.01 ms; StringBuilder = 0 ms
 * for 5000 iterations: string = 410.6 ms; StringBuilder = 0 ms
 * for 10k iterations:  string = 1772.5 ms; StringBuilder = 0 ms
 * for 50k iterations:  string = 79013 ms; StringBuilder = 0 ms
 * for 75k iterations:  string = 186237.8 ms; StringBuilder = 20.03 ms
 * for 100k iterations: string = 334.4k ms (5.6 min); StringBuilder = 20.03 ms
 * for 200k iterations: string = 1515.6k ms (25.3 min); StringBuilder = 40.06 ms
 */
//////////////////////////////
using System;
using System.Text;

namespace console1
{
    class UpdateTime
    {
        int txtInterations;
        public string txtConcatTime;
        public string txtStringBTime;

        public UpdateTime(int i)
        {
            txtInterations = i;
            txtConcatTime = "";
            txtStringBTime = "";
        }

        public void UpdateTimeMethod()
        {
            int iterations = txtInterations;

            // string: every += allocates a new string and copies the old one into it
            string theString = "MyString";
            DateTime strCall = DateTime.Now;
            string targetString = null;

            for (int x = 0; x < iterations; x++)
            {
                targetString += theString;
            }

            TimeSpan time = (DateTime.Now - strCall);
            txtConcatTime = time.TotalMilliseconds.ToString();

            // StringBuilder: appends into a single growing buffer
            DateTime inCall = DateTime.Now;
            string theString2 = "MyStrig2";
            StringBuilder sb = new StringBuilder(theString2);

            for (int x = 0; x < iterations; x++)
            {
                sb.Append(theString2);
            }

            time = (DateTime.Now - inCall);
            txtStringBTime = time.TotalMilliseconds.ToString();
        }
    }
}
/////////////////////
 
colin

I've noticed this, and have had to use it in various places,
usually after I've run a profiler to see where it's needed.

Basically there's a fair bit of work when you add two strings together,
as you're making a new object each time. StringBuilder is a single object,
and so doesn't have anywhere near the same overhead to append.

Colin.
 
Brian Gideon

raylopez99 said:
StringBuilder better and faster than string for adding many strings.

Look at the below.  It's amazing how much faster StringBuilder is than
string.

The last loop below is telling:  for adding 200000 strings of 8 char
each, string took over 25 minutes while StringBuilder took 40
milliseconds!

Can anybody explain such a radical difference?

The hardware running this program was a Pentium IV with 2 GB RAM.

RL

See the following article.

http://www.yoda.arachsys.com/csharp/stringbuilder.html
 
Jon Skeet [C# MVP]

raylopez99 said:
StringBuilder better and faster than string for adding many strings.

Look at the below. It's amazing how much faster StringBuilder is than
string.

The last loop below is telling: for adding 200000 strings of 8 char
each, string took over 25 minutes while StringBuilder took 40
milliseconds!

Can anybody explain such a radical difference?

Very easily. It's all to do with creating copies. This difference is
the whole point of StringBuilder existing in the first place.

See http://pobox.com/~skeet/csharp/stringbuilder.html
 
Marc Gravell

The "string table" you mention is the interner; by default, strings in
compiled code get interned, but not strings that you build at runtime
(for example, via concatenation); as such, the string "abc" never gets
collected, because it is interned. In fact, the compiler is too clever
by half, and actually does the "abcd" concatenation itself, so the
string "abcd" is interned too ;-p

StringBuilder *operates* like a list/array of characters, but is
actually implemented as a regular .NET string, which it abuses and
tortures to mutate at runtime.

Marc
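
For anyone who wants to see the interning behaviour described above, here is a minimal sketch. The class and method names (InternDemo, SameObject, BuildAtRuntime) are made up for illustration, and it assumes the default setup where literals in one assembly are interned.

```csharp
using System;

static class InternDemo
{
    // True when both variables refer to the very same string object.
    public static bool SameObject(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }

    // Builds "abcd" at run time so the compiler cannot fold it into a literal.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'a', 'b', 'c', 'd' });
    }

    static void Main()
    {
        string folded = "ab" + "cd";      // concatenated by the compiler into one literal
        string built = BuildAtRuntime();  // a fresh object created at run time

        // The folded constant and the literal are the same interned object.
        Console.WriteLine(SameObject(folded, "abcd"));
        // Same text, but a different object, because runtime strings are not interned.
        Console.WriteLine(SameObject(built, "abcd"));
        // Asking the interner explicitly hands back the literal's object.
        Console.WriteLine(SameObject(string.Intern(built), "abcd"));
    }
}
```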
 
Göran Andersson

raylopez99 said:
The last loop below is telling: for adding 200000 strings of 8 char
each, string took over 25 minutes while StringBuilder took 40
milliseconds!

Can anybody explain such a radical difference?

For each time you add eight characters to the string, the entire string
is copied into a new string along with the new characters.

In the first iteration you copy 16 bytes (8 characters, each two bytes).
In the second iteration you copy 32 bytes.
In the third iteration you copy 48 bytes.
And so on...

When you reach the 200000th iteration, you will have copied:

16*(1+2+3+4+5+...+200000) = 16*(100000*200001) = 320001600000 bytes

That is 320 GB. That is 160 times more than you have in your computer. To
create a string that is 1600000 characters, you have copied 100000 times
that much data.
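
The arithmetic above can be checked with a few lines of code. This is only a sketch of the same sum; the names CopyCost and NaiveConcatBytes are invented here, and it assumes 2 bytes per character and that every += copies the whole result so far.

```csharp
using System;

static class CopyCost
{
    // Bytes copied when a string grows by chunkChars characters per iteration
    // and every += copies the whole result into a new string:
    // bytesPerChunk * (1 + 2 + ... + iterations).
    public static long NaiveConcatBytes(long iterations, long chunkChars)
    {
        long bytesPerChunk = chunkChars * 2;   // 2 bytes per UTF-16 char
        return bytesPerChunk * iterations * (iterations + 1) / 2;
    }

    static void Main()
    {
        long copied = NaiveConcatBytes(200000, 8);
        long finalSize = 200000L * 8 * 2;      // bytes in the finished string

        Console.WriteLine(copied);             // 320001600000
        Console.WriteLine(copied / finalSize); // 100000
    }
}
```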

The StringBuilder has to grow its internal string several times during
the loop, but each time its size is doubled, so in the end the
StringBuilder will have copied about two times the size of the string.

So in this case the StringBuilder should be about 50000 times faster
than concatenating the strings, which corresponds to your result.

(If you specify the final size when creating the StringBuilder, the
internal string never has to be reallocated, so it will be twice as fast.)
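
A minimal sketch of specifying the final size up front, using the StringBuilder(int capacity) constructor. PresizedBuilder and Build are hypothetical names, and the exact growth strategy (and so the size of the speed-up) depends on the runtime version, so treat the "twice as fast" figure as an estimate rather than something this code proves.

```csharp
using System;
using System.Text;

static class PresizedBuilder
{
    // Appends piece to a builder count times; the builder is created with
    // room for the whole result, so it never has to grow during the loop.
    public static StringBuilder Build(string piece, int count)
    {
        var sb = new StringBuilder(piece.Length * count);  // capacity in characters
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        return sb;
    }

    static void Main()
    {
        StringBuilder sb = Build("MyString", 200000);
        Console.WriteLine(sb.Length);    // 1600000 characters appended
        Console.WriteLine(sb.Capacity);  // capacity currently reserved by the builder
    }
}
```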
raylopez99 said:
int txtInterations;

If you want to use Hungarian notation to specify the data type, you
should not use a prefix that contradicts the data type.

However, in a type safe language there isn't really any need to use
Hungarian notation to keep track of the data types.
 
raylopez99

Göran Andersson said:
When you reach the 200000th iteration, you will have copied:

16*(1+2+3+4+5+...+200000) = 16*(100000*200001) = 320001600000 bytes

That is 320 GB. That is 160 times more than you have in your computer. To
create a string that is 1600000 characters, you have copied 100000 times
that much data.

Well that's interesting Goran. But my PC did not crash, and I don't
have 320 GB of HD, so somehow it must be doing some fancy stuff in the
background to truncate.

The StringBuilder has to grow its internal string several times during
the loop, but each time its size is doubled, so in the end the
StringBuilder will have copied about two times the size of the string.

So in this case the StringBuilder should be about 50000 times faster
than concatenating the strings, which corresponds to your result.

(If you specify the final size when creating the StringBuilder, the
internal string never has to be reallocated, so it will be twice as fast.)

That's counterintuitive, if you're saying specifying the final size
will make StringBuilder *slower*. Very strange if true. Anyway I
never specify anything so I'm OK.

If you want to use Hungarian notation to specify the data type, you
should not use a prefix that contradicts the data type.

However, in a type safe language there isn't really any need to use
Hungarian notation to keep track of the data types.

And why is that? Anyhow, I just discovered this cool method for
runtime type conversion:

// using public static object ChangeType(object value, Type conversionType);
// example:

Type myTargetType = typeof(int);
object theSourceStr = "42";
object theResult = Convert.ChangeType(theSourceStr, myTargetType);
Console.WriteLine(theResult); // 42
Console.WriteLine(theResult.GetType()); // System.Int32

// pretty cool, eh? I bet it only works though for 'primitive' data
// types like int, etc.
// UPDATE: I see C# has no easy way of casting any object...or so it
// seems. I'll post in a separate thread on this...

RL
 
gerry

Assuming all the calculations here are correct, your computer shouldn't
crash: you have COPIED 320 GB of data, but the largest string actually
created is only 1.6 million characters (about 3.2 MB).
After a string has been copied it is available for garbage collection, and
part of the vast time difference you are seeing is the GC doing its job.

Re specifying the final size being 'twice as fast': that would be twice
as fast as not specifying a final size, or 100000 times faster than
concatenating strings.


 
Brian Gideon

Well that's interesting Goran.  But my PC did not crash, and I don't
have 320 GB of HD, so somehow it must be doing some fancy stuff in the
background to truncate.

All of that memory isn't in play simultaneously.

That's counterintuitive, if you're saying specifying the final size
will make StringBuilder *slower*. Very strange if true. Anyway I
never specify anything so I'm OK.

He said it would be twice as *fast*.
And why is that?  

I'll pass on that.
Anyhow, I just discovered this cool property for
runtime type checking:

// using public static object ChangeType(object value, Type conversionType);
// example:

Type myTargetType = typeof(int);
object theSourceStr = "42";
object theResult = Convert.ChangeType(theSourceStr, myTargetType);
Console.WriteLine(theResult); // 42
Console.WriteLine(theResult.GetType()); // System.Int32

// pretty cool, eh? I bet it only works though for 'primitive' data
// types like int, etc.
// UPDATE: I see C# has no easy way of casting any object...or so it
// seems. I'll post in a separate thread on this...

It does, but like you said it's better to post that question in
another thread.
 
raylopez99

Brian Gideon wrote:

OK, got it. And I understand now that Hungarian notation is not needed,
since the compiler will catch your error.

RL
 
Brian Gideon

Brian Gideon wrote:

OK, got it.  And I understood now Hungarian Notation not needed since
the compiler will catch your error.

RL

Basically. And for what it's worth I've adopted the "m_" prefix for
instance members and "s_" for static members.
 
Göran Andersson

Peter said:
Even in a type unsafe language, Hungarian's primary purpose isn't to
keep track of the data type. If and when the type is identical to the
semantic of the variable, then of course it will. But otherwise, the
type tag (not prefix) in Hungarian reflects the _semantic_ usage of the
data, not its literal type.

In fact, for variables that are typed as built-in types such as "int",
"char", etc. the tag will most often _not_ reflect the actual type of
the variable. For example, "x", "dx", and "cx" are common variable
names when dealing with the X coordinate in a Cartesian coordinate
space, but they can all refer to a variety of integral types: "int",
"short", "ushort", "long", etc. In the Hungarian philosophy, the naming
is there to ensure semantic correctness, not compiler correctness.

This is in fact why Hungarian is still valuable even when using a
strongly-typed language.

This is something that Microsoft's "Systems" version of Hungarian gets
very, very wrong. Unfortunately, that's the Hungarian most people are
exposed to.

Pete

I've done quite a bit of ASP/VBScript, and although it wasn't exactly the
original intention of Hungarian notation, using it to keep track of
the data type is very useful in that environment. Otherwise you could
easily get surprised by the results, like:

Dim page
page = Request.QueryString("page")
If page = 42 Then
    ' we don't get here even if we put 42 in the query string
    ' as the variable page contains "42", not 42.
End If
 
Göran Andersson

raylopez99 said:
Well that's interesting Goran. But my PC did not crash, and I don't
have 320 GB of HD, so somehow it must be doing some fancy stuff in the
background to truncate.

For each string that you create, the previous string is up for garbage
collection.

As you go through a lot more memory than there is in the computer, it
means that it has done more than 160 garbage collections during the
loop, and probably something closer to 1000.

That's counterintuitive, if you're saying specifying the final size
will make StringBuilder *slower*. Very strange if true. Anyway I
never specify anything so I'm OK.

If you find it counterintuitive, then perhaps you should read it again
to see if you got it right. In this case you got it backwards.

Anyhow, I just discovered this cool property for
runtime type checking:

// using public static object ChangeType(object value, Type conversionType);
// example:

Type myTargetType = typeof(int);
object theSourceStr = "42";
object theResult = Convert.ChangeType(theSourceStr, myTargetType);
Console.WriteLine(theResult); // 42
Console.WriteLine(theResult.GetType()); // System.Int32

// pretty cool, eh? I bet it only works though for 'primitive' data
// types like int, etc.

The type has to implement the IConvertible interface, which the
primitive types do.

The primitive types already have methods in the Convert class, so
instead of doing Convert.ChangeType(var, typeof(int)) you can use
Convert.ToInt32(var).

Casting to a type that you specify dynamically isn't very useful in a
strongly typed language. You can't do much with the data anyway without
casting the reference to the actual type.

// UPDATE: I see C# has no easy way of casting any object...or so it
// seems. I'll post in a separate thread on this...

Converting a string to an int is parsing. There are several methods for
doing that, like int.Parse(string), int.TryParse(string, out int),
Convert.ToInt32(string)...

For casting, C# uses the same syntax as C/C++:

int value = 42;
long bigValue = (long)value;
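
To make the parsing-versus-casting distinction concrete, here is a small sketch using the methods named above. ParseDemo and ParseOrDefault are made-up names for illustration.

```csharp
using System;

static class ParseDemo
{
    // Returns the parsed value, or fallback when the text is not a valid int.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }

    static void Main()
    {
        // Parsing: turning text into a number.
        Console.WriteLine(int.Parse("42"));                  // throws FormatException on bad input
        Console.WriteLine(Convert.ToInt32("42"));
        Console.WriteLine(ParseOrDefault("forty-two", -1));  // bad input, so the fallback is used

        // Casting: converting between numeric types.
        int small = 42;
        long big = small;      // widening is implicit, the cast is optional
        int back = (int)big;   // narrowing needs an explicit cast
        Console.WriteLine(back);
    }
}
```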
 
raylopez99

Göran Andersson said:
Casting to a type that you specify dynamically isn't very useful in a
strongly typed language. You can't do much with the data anyway without
casting the reference to the actual type.

Very good. It explains why C# doesn't have, like C++/CLI, this
function "safe_cast":
Object ^ obj = safe_cast <Object^> (anEnumHere->Current);

Or maybe not.

Anyway today, via FTM's and Jon Skeet's help, I learned this query,
for iterating through a list having mixed strings and ints and picking
out the first two letters of the strings:
List<object> words = new List<object> { "green", "blue", 3, "violet", 5 };

// Math.Min prevents an out-of-range exception; OfType avoids the string cast problem
IEnumerable<string> query =
    words.AsQueryable().OfType<string>().Cast<string>().Select(str =>
        str.Substring(0, Math.Min(str.Length, 2)));

Pretty cool eh? On top of doing my day job, which is heating up
now... multitasking big time...I gotta run play a game of online chess
then crash...

RL
 
Bill Butler

raylopez99 said:
StringBuilder better and faster than string for adding many strings.

Look at the below. It's amazing how much faster StringBuilder is than
string.

The last loop below is telling: for adding 200000 strings of 8 char
each, string took over 25 minutes while StringBuilder took 40
milliseconds!

Can anybody explain such a radical difference?

You got lots of answers as to why String performed so badly in this
situation compared to StringBuilder.
Many people misinterpret this to mean that you should always prefer
StringBuilder over String when doing concatenation.
But that is not a valid conclusion.

For straight concatenation like this

string foo = "A" + "B" + "C" + ...;

String is optimized to handle this case and it is very efficient.

Even for strings constructed by looping

String foo = "";
foreach(string str in Bar)
{
foo += str;
}

If the number of loops is under 10-20 this code can actually outperform
similar code using StringBuilder

Obviously you saw what can happen if the number gets large, so it is
good to understand how it works under the covers and use the right tool
for the job.
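
As a rough illustration of "the right tool for the job", the sketch below puts the looping += form next to a StringBuilder and the library's own string.Concat. The class and method names are invented for the example, and string.Concat(IEnumerable<string>) assumes .NET 4 or later. All three give the same result; only the cost as the collection grows differs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SmallJoins
{
    // Plain += in a loop: easy to read, fine for a handful of items.
    public static string WithOperator(IEnumerable<string> parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // StringBuilder: the version that keeps performing as the input grows.
    public static string WithBuilder(IEnumerable<string> parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    static void Main()
    {
        var bar = new List<string> { "A", "B", "C" };
        Console.WriteLine(WithOperator(bar));
        Console.WriteLine(WithBuilder(bar));
        Console.WriteLine(string.Concat(bar));  // one library call for the whole collection
    }
}
```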

Bill
 
Bill Butler

Peter Duniho said:
[...]
Even for strings constructed by looping

String foo = "";
foreach(string str in Bar)
{
foo += str;
}

If the number of loops is under 10-20 this code can actually
outperform
similar code using StringBuilder

For what it's worth, curious I did a quick test. I found that the two
techniques reach near-parity at just 5 concatenations, and
StringBuilder is definitively faster at 10 concatenations. At 20,
there's no contest.

Correct. That's what I get for trying to pull the number from memory.
Even with just two concatenations, the difference is "only" a factor
of 2. And of course, in that situation the performance of the actual
concatenation is unlikely to be significant in the overall algorithm.

To me, that means that it's "safe" to use plain string concatenation
as a _simplification_ of the code, when you're sure that the number
of concatenations will be small. I would even accept the performance
overhead up to 20 concatenations or so, as long as it wasn't a
critical difference, because I do feel the code is easier to read.
But only when I could be assured the number of concatenations
wouldn't ever be much greater than that.

It's hard to imagine a situation in which choosing string
concatenation over StringBuilder would be a legitimate real-world
performance optimization. Practically any situation in which you
have few enough concatenations for string concatenation to win, there
would be too few concatenations for the concatenation to matter much
at all.

Again, perfectly valid as a code maintenance/readability choice in
certain "safe" situations, but probably not something someone's going
to do as a performance enhancement. For example, I wouldn't create
code that has a special code-path for small numbers of concatenations
just to take advantage of that performance difference. The decrease
in maintainability isn't worth the marginal improvement in
performance, even in the best case.


I agree completely.
But I have seen folks that avoid string concatenation like the plague,
since they heard that it is slower.
In most cases readability triumphs.

Thanks for the correction
Bill
 
Göran Andersson

raylopez99 said:
Very good. It explains why C# doesn't have, like C++/CLI, this
function "safe_cast":
Object ^ obj = safe_cast <Object^> (anEnumHere->Current);

Or maybe not.

Actually it doesn't. The reason that there is no "safe_cast" in C# is
that every cast is safe. If you want an unsafe cast you have to do it in
an unsafe code block, and even then you might have to cast it to a void
pointer before casting to a different type to prevent the compiler from
telling you that you are doing something wrong.
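
A short sketch of what "every cast is safe" looks like in practice: a cast to the wrong reference type is rejected at run time with an InvalidCastException instead of reinterpreting memory, and the "as" operator gives null instead of throwing. CastDemo, AsText and CastThrows are hypothetical names.

```csharp
using System;

static class CastDemo
{
    // Returns the text if value is a string, otherwise null (no exception).
    public static string AsText(object value)
    {
        return value as string;
    }

    // True when a direct cast to string is rejected at run time.
    public static bool CastThrows(object value)
    {
        try
        {
            string s = (string)value;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        object boxed = 3;
        Console.WriteLine(CastThrows(boxed));      // a boxed int is not a string
        Console.WriteLine(CastThrows("green"));    // a real string casts fine
        Console.WriteLine(AsText(boxed) == null);  // "as" reports the mismatch with null
    }
}
```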
Anyway today, via FTM's and Jon Skeet's help, I learned this query,
for iterating through a list having mixed strings and ints and picking
out the first two letters of the strings:
List<object> words = new List<object> { "green", "blue", 3, "violet", 5 };

// Math.Min prevents an out-of-range exception; OfType avoids the string cast problem
IEnumerable<string> query =
    words.AsQueryable().OfType<string>().Cast<string>().Select(str =>
        str.Substring(0, Math.Min(str.Length, 2)));

The OfType extension already returns a typed enumerator, so you don't
need the Cast extension. Also, the OfType is an extension of
IEnumerable, so you don't need the extension AsQueryable to use it on
the List:

IEnumerable<string> query = words.OfType<string>().Select(str =>
str.Substring(0,Math.Min(str.Length, 2)));
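
Here is the simplified query wrapped in a complete program, as a sketch. The wrapper names PrefixDemo and FirstTwoLetters are made up, and string.Join over an IEnumerable<string> assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrefixDemo
{
    // Keeps only the strings and returns at most the first two characters of each.
    public static List<string> FirstTwoLetters(IEnumerable<object> items)
    {
        return items.OfType<string>()
                    .Select(str => str.Substring(0, Math.Min(str.Length, 2)))
                    .ToList();
    }

    static void Main()
    {
        var words = new List<object> { "green", "blue", 3, "violet", 5 };
        Console.WriteLine(string.Join(", ", FirstTwoLetters(words)));  // gr, bl, vi
    }
}
```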
 
Göran Andersson

Bill said:
Even for strings constructed by looping

String foo = "";
foreach(string str in Bar)
{
foo += str;
}

If the number of loops is under 10-20 this code can actually outperform
similar code using StringBuilder

Where the break-even is depends on the length of the strings in the Bar
collection, but it's somewhere in that neighborhood.

In most cases however, you want scalability. If you only have a few
strings it doesn't matter if it takes twice as long to use a
StringBuilder, as it's so little work anyway. It's when the data starts
to grow that you want it to perform well.


An interesting side track is that if you just concatenate the strings
in a different order, you get much better performance. By
concatenating them in pairs instead of accumulating them in a single
string, you get almost the same performance as a StringBuilder.

Like this:

// Bar is a string[]; each pass halves the number of strings.
while (Bar.Length > 1) {
    int len = (Bar.Length + 1) / 2;
    string[] s = new string[len];
    for (int i = 0; i < len; i++) {
        if (i * 2 + 1 < Bar.Length) {
            s[i] = Bar[i * 2] + Bar[i * 2 + 1];
        } else {
            s[i] = Bar[i * 2];  // odd one out is carried over unchanged
        }
    }
    Bar = s;
}
string foo = Bar[0];

Test run:

Accumulating: 16032 strings, 150968 characters in 1,849 ms.
Pair concat: 16032 strings, 150968 characters in 0,004 ms.
StringBuilder: 16032 strings, 150968 characters in 0,002 ms.

:)
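
The pairwise idea can also be packaged as a method and checked against string.Concat. This is just a sketch with invented names (PairConcat, Join); unlike the loop above, it leaves the input array untouched and returns an empty string for an empty array.

```csharp
using System;

static class PairConcat
{
    // Joins the items by repeatedly concatenating neighbours in pairs, so each
    // round halves the number of strings instead of growing one accumulator.
    public static string Join(string[] items)
    {
        if (items.Length == 0) return "";
        string[] current = items;
        while (current.Length > 1)
        {
            int len = (current.Length + 1) / 2;
            string[] next = new string[len];
            for (int i = 0; i < len; i++)
            {
                int left = i * 2;
                next[i] = left + 1 < current.Length
                    ? current[left] + current[left + 1]
                    : current[left];  // odd one out is carried over unchanged
            }
            current = next;
        }
        return current[0];
    }

    static void Main()
    {
        string[] bar = { "a", "b", "c", "d", "e" };
        Console.WriteLine(Join(bar));                        // abcde
        Console.WriteLine(Join(bar) == string.Concat(bar));  // True
    }
}
```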
 
raylopez99

Peter Duniho said:
For what it's worth, curious I did a quick test. I found that the two
techniques reach near-parity at just 5 concatenations, and StringBuilder
is definitively faster at 10 concatenations. At 20, there's no contest.

I'm curious how you did such a quick test, especially since the
DateTime structure is only accurate to at best 10 ms or greater.

You lying *again*, Peter Duniho?

Hahahaha.

RL
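
On the timer-resolution point: short operations are usually timed with System.Diagnostics.Stopwatch and many repetitions rather than with DateTime.Now. The sketch below is one way such a quick test could be done; it is not the test Peter ran, the names TimingDemo and AverageMilliseconds are made up, and the numbers it prints will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class TimingDemo
{
    // Runs action the given number of times and returns the average time per run in ms.
    public static double AverageMilliseconds(Action action, int repeats)
    {
        action();  // warm-up so JIT compilation is not measured
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repeats; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / repeats;
    }

    static void Main()
    {
        string[] parts = { "one", "two", "three", "four", "five" };

        double plus = AverageMilliseconds(() =>
        {
            string s = "";
            foreach (string p in parts) s += p;
        }, 100000);

        double builder = AverageMilliseconds(() =>
        {
            var sb = new StringBuilder();
            foreach (string p in parts) sb.Append(p);
            sb.ToString();
        }, 100000);

        Console.WriteLine("+=: {0:F6} ms, StringBuilder: {1:F6} ms", plus, builder);
        Console.WriteLine("High-resolution timer: {0}", Stopwatch.IsHighResolution);
    }
}
```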
 
