| Thanks all... I think i got it...
| Interestingly enough..
|
| If I do...
| StringBuilder sb = new StringBuilder(10000000);
| for (int i=0; i<100; i++)
| sb.Append("Hello World");
| string a1 = sb.ToString();
|
| and..
| StringBuilder sb = new StringBuilder();
| for (int i=0; i<100; i++)
| sb.Append("Hello World");
| string a2 = sb.ToString();
|
| in this case, a1 actually takes LESS physical memory then a2. a1 gets
| trimmed while a2 returns the internal string.
| i guess the lesson is to put a capacity whenever possible. but that
| also has drawbacks..
|
Right, what you see here is the result of an optimization.
Let me try to explain what's happening.
First you need to know what a SB and a String look like on the managed heap.
This is how a StringBuilder object looks:
<standard object header> // two IntPtr-sized values, not relevant here
IntPtr m_currentThread;
int m_maxCapacity;
string m_StringValue;
while a String looks like:
<object header>
int m_arrayLength;
int m_stringLength;
char m_firstChar;
In the first case, you create a SB with capacity 10000000. That means the
underlying String object (about 20 MB at 2 bytes per char) is far larger than
85 KB, so the String ends up on the Large Object Heap (LOH).
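To make the size argument concrete, here is a rough sketch of the arithmetic. LohSketch and its members are names I made up for illustration, and 85000 bytes is assumed to be the LOH threshold (the CLR default); the object header is ignored, so treat the numbers as estimates only.

```csharp
using System;

static class LohSketch
{
    // Assumption: objects of 85,000 bytes or more are placed on the LOH.
    public const int LohThresholdBytes = 85000;

    // Approximate payload of a buffer holding `chars` UTF-16 chars
    // (object header and length fields are ignored).
    public static long BufferBytes(int chars)
    {
        return (long)chars * sizeof(char);
    }

    // True when a buffer of this many chars would presumably land on the LOH.
    public static bool LikelyOnLoh(int chars)
    {
        return BufferBytes(chars) >= LohThresholdBytes;
    }
}
```

LohSketch.LikelyOnLoh(10000001) comes out true for the pre-sized buffer, while LohSketch.LikelyOnLoh(1101) is false for the trimmed result.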
Then you start filling the string buffer. The result at the end of the loop
looks like:
Your StringBuilder sb on the Gen0 heap:
m_currentThread = xxxx // not important here
m_maxCapacity = 2147483647 // 2GB
m_StringValue = 03271000 // reference - points to a string on the
LOH (sample value)
and the underlying String object on the LOH:
m_arrayLength = 10000001 // Buffer space (in no. of char)
m_stringLength = 1100 // actual string Length
m_firstChar = 'H' // First char in buffer (start of buffer)
.... // following chars
.... = 'd' // last char of string (buffer position 1100)
.... = 0x0000 // terminating null char (buffer position 1101)
Now when you execute ... sb.ToString();
the CLR rightfully decides that this String object doesn't belong on the LOH
(it is < 85 KB), so it creates a new String on the Gen0 heap and returns its
reference in a1. The new string object now looks like:
m_arrayLength = 1101 // Buffer space (in no. of char)
m_stringLength = 1100 // actual string Length
m_firstChar = 'H' // First char in buffer (start of buffer)
.... // following chars
.... = 'd' // last char of string (buffer position 1100)
.... = 0x0000
Notice the new m_arrayLength of 1101 ...
Note that the 10000001-char buffer has never been committed, only reserved;
that is why you don't see it allocated in physical memory.
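You can't look at the internal fields from your code, but the public Capacity and Length properties show the same picture. A small sketch follows (TrimSketch and Build are hypothetical names of mine). Note that it only observes the sizes; whether ToString() copies or hands out the internal buffer depends on the CLR version, and newer runtimes implement StringBuilder differently.

```csharp
using System;
using System.Text;

static class TrimSketch
{
    // Builds the test string with the given initial capacity and reports
    // the builder's capacity just before ToString() is called.
    public static string Build(int capacity, out int capacityAfter)
    {
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < 100; i++)
            sb.Append("Hello World");   // 11 chars per append
        capacityAfter = sb.Capacity;    // still the reserved size
        return sb.ToString();           // the string itself holds 1100 chars
    }
}
```

Calling TrimSketch.Build(10000000, out cap) gives a string of Length 1100 while cap is still 10000000.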
What's happening in the second case is:
A SB is created on the Gen0 heap and looks like:
m_currentThread = xxxx // not important here
m_maxCapacity = 2147483647 // 2GB
m_StringValue = 01274e34 // reference - points to a string on the Gen0
heap (sample value)
and the underlying String object at the end of the loop:
m_arrayLength = 2049 // Buffer space (in no. of char)
m_stringLength = 1100 // actual string Length
m_firstChar = 'H' // First char in buffer (start of buffer)
.... // following chars
.... = 'd' // last char of string (buffer position 1100)
.... = 0x0000
Notice the m_arrayLength ...
But before you get this final string, a number of temporary strings need to
be built. Remember that the SB starts with an m_arrayLength = 17 (16 + 1 for
the 0x0000 string termination char) and doubles its buffer whenever it runs
out of space.
That means that after the loop you have effectively allocated 8 string
buffers (16, 32, 64, ... 2048), of which the first 7 are thrown away.
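The doubling is easy to replay on paper. The following is only a simulation of the growth policy as I described it (start at 16, double when full); it does not touch a real StringBuilder, and GrowthSketch and Capacities are made-up names.

```csharp
using System;
using System.Collections.Generic;

static class GrowthSketch
{
    // Returns every buffer capacity that would be allocated while appending
    // `appends` pieces of `appendLength` chars, assuming a doubling policy.
    public static List<int> Capacities(int startCapacity, int appends, int appendLength)
    {
        List<int> capacities = new List<int>();
        int capacity = startCapacity;
        int length = 0;
        capacities.Add(capacity);
        for (int i = 0; i < appends; i++)
        {
            length += appendLength;
            if (length > capacity)
            {
                // Assumed policy: double, or grow to the needed length if larger.
                capacity = Math.Max(capacity * 2, length);
                capacities.Add(capacity);
            }
        }
        return capacities;
    }
}
```

GrowthSketch.Capacities(16, 100, 11) yields the 8 capacities 16, 32, 64, 128, 256, 512, 1024 and 2048. Everything but the last one is garbage for the GC.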
So you have wasted some memory and some CPU cycles, and you have also put
additional stress on the GC, which will have to clean up the intermediate
objects.
Conclusion: you should try to pre-allocate SBs whenever possible. Note that
this is especially important for server applications and for client
(WinForms) applications that need to run in Terminal Server environments.
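When the final size is known up front, you can pass exactly that size instead of a huge guess. A minimal sketch, with PreallocSketch and Repeat being names I invented for the example:

```csharp
using System;
using System.Text;

static class PreallocSketch
{
    // Repeats `piece` `count` times using a builder sized for the exact result.
    public static string Repeat(string piece, int count, out int finalCapacity)
    {
        StringBuilder sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        finalCapacity = sb.Capacity;   // unchanged if the buffer never had to grow
        return sb.ToString();
    }
}
```

With PreallocSketch.Repeat("Hello World", 100, out cap), cap stays at 1100, so no intermediate buffers are created and nothing is over-reserved.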
Hope this clears things up a bit
Willy.