PC Review



Maximum string length in C# and .NET

 
 
fawcett@gmail.com
 
      10th May 2006
Hi,

Having read this article:
http://www.codeproject.com/dotnet/st...&select=773966

I got curious about the limit of string lengths, so I wrote this
program:
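using System;
using System.Text;

class Program
{
    public static void Main(string[] args)
    {
        StringBuilder s = new StringBuilder();

        // Build the chunk that is appended on every pass.
        String adder = "A";
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";
        }

        try
        {
            // Keep appending until the runtime refuses to allocate more.
            while (true)
            {
                System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length);
                s.Append(adder);
            }
        }
        catch (Exception exp)
        {
            System.Console.Out.WriteLine(exp.ToString());
        }
        finally
        {
            // Final size in bytes (2 bytes per char) and in chars.
            System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length);
            System.Console.In.ReadLine();
        }
    }
}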

The program consistently ends with:

System.OutOfMemoryException: Exception of type
System.OutOfMemoryException was thrown.
327,732,770 - 163,866,385

Does this mean that the max length of a string is 163,866,385
characters? Isn't it interesting that the length of the string is not a
multiple of 10,000?

thanks,
fawce

 
Greg Young
 
      10th May 2006
The StringBuilder keeps doubling the size of its internal string.

You happen to be dying during one of those doublings, when it is trying to
allocate a new string of size 655465540. It's not that this is the maximum
size; it is that it can't re-grow the internal string.
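
As a rough check of the numbers in this thread, here is a minimal sketch. It assumes a plain "double the buffer" growth policy and 2 bytes per char; `GrowthMath` and its methods are made-up names for illustration, not framework APIs. It reproduces the 655465540 figure from the reported length of 163,866,385 chars, and it also shows that this length is a whole multiple of 10,001, which is the actual length of `adder` in the test program (one initial "A" plus 10,000 appended).

```csharp
using System;

public static class GrowthMath
{
    // Bytes requested when a buffer holding `lengthChars` chars is doubled,
    // assuming 2 bytes per char and a plain doubling policy (an assumption).
    public static long DoubledBufferBytes(long lengthChars)
    {
        return lengthChars * 2 * sizeof(char);
    }

    // Number of whole chunks in a length, or -1 if it is not a whole multiple.
    public static long WholeChunks(long lengthChars, long chunkChars)
    {
        return lengthChars % chunkChars == 0 ? lengthChars / chunkChars : -1;
    }
}
```

`GrowthMath.DoubledBufferBytes(163866385)` returns 655465540. `GrowthMath.WholeChunks(163866385, 10001)` returns 16385, while the same call with a chunk size of 10000 returns -1.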

By the way, if you use the following line for your constructor, you can get further:

StringBuilder s = new StringBuilder(455465540);

Cheers,

Greg Young
MVP - C#



 
fawcett@gmail.com
 
      10th May 2006
Greg,

Thanks for your reply.

I modified the test app to this:
using System;
using System.Text;

class Program
{
    public static void Main(string[] args)
    {
        // Preallocate the builder so it does not have to grow by doubling.
        int size = (int)Math.Pow(2, 29);
        StringBuilder s = new StringBuilder(size);

        // Build the chunk that is appended on every pass.
        String adder = "A";
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";
        }

        try
        {
            while (true)
            {
                System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length);
                s.Append(adder);
            }
        }
        catch (Exception exp)
        {
            System.Console.Out.WriteLine(exp.ToString());
        }
        finally
        {
            // Final size in bytes, length in chars, and the initial capacity.
            System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length + " - " + size);
            System.Console.In.ReadLine();
        }
    }
}


I experimented with the size variable, and I found that 2^29 was the
largest capacity I could request without getting an OutOfMemoryException
when the StringBuilder is created.

The result is now consistently:
System.OutOfMemoryException: Exception of type
System.OutOfMemoryException was thrown.
1073727362 - 536863681 - 536870912

The VMsize in task manager is 1.07G.

Using the default of 2^4, the result is:
System.OutOfMemoryException: Exception of type
System.OutOfMemoryException was thrown.
327732770 - 163866385 - 16

and a VMSize of 497M in task manager.

So, I guess this all just proves that the max string size is 2^29, but
that in practice, the replication of the string data required to grow
the StringBuilder can limit the usable string size to something much
smaller?

thanks,
fawce

 
Barry Kelly
 
      10th May 2006
(E-Mail Removed) wrote:

> So, I guess this all just proves that the max string size is 2^29, but
> that in practice, the replication of the string data required to grow
> the stringBuffer can limit the useable string size to something much
> smaller?


The maximum string size depends on a lot more in Win32, specifically,
memory fragmentation. If you've got lumps of memory pinned (possibly on
other threads, for example, or by allocations using VirtualAlloc by
unmanaged code), the compacting GC won't be able to make most of the
available address space contiguous the way it wants to, so you could end
up with a maximum string length a lot less than 0.5 billion characters.

The limits you're hitting are effectively 32-bit architectural limits,
rather than .NET limits, although it is true that all GC languages like
2x to 3x more memory for efficient GC behaviour compared to languages
using manual allocation. You'll see different results on 64-bit
hardware, but it won't be much longer before you start hitting the upper
limit of Int32.MaxValue, limiting the Length.

.NET strings aren't designed for this kind of use anyway - when you have
data this large, you need to custom-design your algorithms and in-memory
(and probably on-disk, for really huge data) structures around what you
want to do with it all. Probably the most important thing is changing
all algorithms that require whole-data access into ones that process the
data linearly, possibly with reference to dictionaries created in
earlier passes of the data - consider the way compilers worked back in
the '60s etc.
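
To make the "process the data linearly" idea concrete, here is a minimal sketch. It is only an illustration under assumptions: the input is line-oriented text, and `LinearPass.CountWords` is a hypothetical helper, not something from the framework. It reads one line at a time and keeps only a dictionary of counts, so memory use depends on the longest line and the number of distinct words rather than on the total size of the input.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LinearPass
{
    // Reads the input one line at a time and counts whitespace-separated
    // words. The whole input is never held in a single string.
    public static Dictionary<string, int> CountWords(TextReader reader)
    {
        var counts = new Dictionary<string, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] words = line.Split(new[] { ' ', '\t' },
                                        StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                int seen;
                counts.TryGetValue(word, out seen); // seen stays 0 if absent
                counts[word] = seen + 1;
            }
        }
        return counts;
    }
}
```

For a file on disk, the reader could come from `File.OpenText` with whatever path applies; a later pass over the same data can then consult the dictionary built here.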

-- Barry
 
Altaf Al-Amin Najwani
 
      10th May 2006
Interesting!
Well, let's look at the facts first. When you declare a string, it creates a
contiguous chunk of memory for the string. When I append characters to a
string and the result is longer than the previously declared length, .NET
creates a new in-memory string, copies the previous string into it, and
appends the new characters. If that is how strings work, then I don't think
there is a limit on string length. But I also wrote a program to check it and
found some very interesting facts.

    String a = new String('a', 10000);
    for (int loop = 0; loop < 10000; loop++)
        a += "a";

    string b = "";
    for (int loop = 0; loop < 10000; loop++)
    {
        b += a;
        Console.WriteLine(b.Length);
    }

    Console.ReadLine();

Results:
On my PC
First run: maximum size was 29340000
Second run: maximum size was 29340000

Then I ran it on another PC
First run: maximum size was 29340000
Second run: maximum size was 29340000

As you can see, the results are the same. I don't think there is a fixed
limit on string length. I got the same results on both PCs, but those two
PCs have exactly the same configuration. A string lives entirely in memory
chunks, so in a nutshell:

Length of String = when memory gets full!

I have also found some interesting facts. I would ask you to run my
code on your PC and post the results.

 
Willy Denoyette [MVP]
 
      10th May 2006
This has been discussed a number of times recently in this NG.
The maximum size of any reference type instance (such as a string) is limited
by the CLR to 2GB, which means that a string can hold a maximum of ~1G
characters.
While it's possible to reach that limit when running on a 64 bit OS, you
will never be able to create such large strings (or arrays) on a 32 bit OS.
The reason is that you won't have that amount of "contiguous" address space
available to create the backing store (a char array) for the string.
The size of the largest contiguous memory space depends heavily on how
modules are mapped (see: Win32 and framework DLLs' base addresses) into the
process address space. Some modules are laid out in such a way that the
largest chunk becomes something like 950.000Kb, and that is before you have
even created a single object.
Lesson learned: always be prepared to get some OOM exceptions thrown at you
if you don't care about your memory allocation patterns on 32 bit Windows
(also true for unmanaged code!).

Willy.


 
fawcett@gmail.com
 
      10th May 2006
Hi,

Thanks for the explanations about win32 memory, the allocation and
doubling effects make sense.

I wanted to point out a quote from the article I referenced in the
first post:
<quote>m_stringLength int,
This is the logical length of the string, the one returned by
String.Length.
Because a number of high bits are used for additional flags to enhance
performance, the maximum length of the string is constrained to a limit
much smaller than UInt32.Max for 32bit systems. Some of these flags
indicate the string contains simple characters such as plain ASCII and
will not required invoking complex UNICODE algorithms for sorting and
comparison tests.</quote>

In other words, the string object in .net has a member variable for its
length (makes sense, keeps length checks fast). This variable imposes a
natural limit to the string's length, because you can't track more than
what the length variable can hold. This would have no effect whatsoever
on a 32 bit system, because the max string length would exceed the max
object size.

However, the length variable is also used to hold some control bits.
From my test, I believe it is 3 bits, knocking the max string length
down to 2^29 chars. So, that would be 2^29 chars x 2 bytes/char / 1024
bytes per KB / 1024 KB per MB / 1024 MB per GB
or
1 Gig, which is almost exactly the result I got from my StringBuilder
test.
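
The arithmetic above can be double-checked with a few lines of C#. This is only a sketch: `SizeMath` and its methods are hypothetical names, and it assumes 2 bytes per char as in the calculation above, ignoring any per-object overhead.

```csharp
using System;

public static class SizeMath
{
    // Bytes needed for the character data alone (object header ignored).
    public static long BytesForChars(long charCount)
    {
        return charCount * sizeof(char);
    }

    // Converts a byte count to gibibytes (1024 * 1024 * 1024 bytes each).
    public static double ToGiB(long bytes)
    {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

`SizeMath.ToGiB(SizeMath.BytesForChars(1L << 29))` evaluates to exactly 1.0, and `SizeMath.BytesForChars(536863681)` gives 1073727362, the byte figure printed by the preallocated test.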

Even more interesting, is that the uninitialized StringBuilder couldn't
make it to the 1Gig limit. From Willy's posting, I postulate that the
limit of the uninitialized StringBuilder is capped because the Win32
memory manager can't easily allocate a very large, contiguous block of
memory, having run through many allocations in the growth from a 16
char string to a very large string.

So to summarize:
- There is a 2G byte limit for Win32 objects. This is never the
limiting factor for a string.
- There is a 2^29 length cap for a string from the construction of the
length member variable. This is rarely the limiting factor, because
StringBuilders are not often allocated with a size cap.
- There is a practical limit, dependent on the current system
conditions, that limits the growth of strings to the largest
contiguous block of memory available. This cap is deterministic, but
unpredictable.

Conclusion:
- It is a terrible idea to use huge strings in .net or pretty much
anywhere else. Data that large should be streamed to and from its
source (disk, network, etc).

thanks for all your explanations,
fawce

 
Ignacio Machin (.NET/C# MVP)
 
      10th May 2006
Hi,


"Willy Denoyette [MVP]" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> This has been discussed a number of times recently in this NG.
> The maximum size of all reference type (like a string) instances is
> limited
> by the CLR to 2GB, that means that a string can hold a maximum of ~1G
> characters.


Last week was the last time, as far as I remember.

Honestly, I do not see why the maximum length of a string is so interesting
to so many people. I would have other worries if I had to handle such a
huge piece of data.



--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation


 
Willy Denoyette [MVP]
 
      10th May 2006
See inline...

Willy.

<(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
| Hi,
|
| Thanks for the explanations about win32 memory, the allocation and
| doubling effects make sense.
|
| I wanted to point out a quote from the article I referenced in the
| first post:
| <quote>m_stringLength int,
| This is the logical length of the string, the one returned by
| String.Length.
| Because a number of high bits are used for additional flags to enhance
| performance, the maximum length of the string is constrained to a limit
| much smaller than UInt32.Max for 32bit systems. Some of these flags
| indicate the string contains simple characters such as plain ASCII and
| will not required invoking complex UNICODE algorithms for sorting and
| comparison tests.</quote>
|
| In other words, the string object in .net has a member variable for its
| length (makes sense, keeps length checks fast). This variable imposes a
| natural limit to the string's length, because you can't track more than
| what the length variable can hold. This would have no effect whatsoever
| on a 32 bit system, because the max string length would exceed the max
| object size.
|
| However, the length variable is also used to hold some control bits.
| From my test, I believe it is 3 bits, knocking the max string length
| down to 2^29 chars. So, that would be 2^29 chars x 2 bytes/char / 1024
| bytes per KB / 1024 KB per MB / 1024 MB per GB
| or
| 1 Gig, which is almost exactly the result I got from my StringBuilder
| test.
|

You mean 1 GB (bytes) I guess, which is roughly 500M chars.

| Even more interesting, is that the uninitialized StringBuilder couldn't
| make it to the 1Gig limit. From Willy's posting, I postulate that the
| limit of the uninitialized StringBuilder is capped because the Win32
| memory manager can't easily allocate a very large, contiguous block of
| memory, having run through many allocations in the growth from a 16
| char string to a very large string.
|

The reason is the way the CLR expands the SB: the underlying char array has
to be copied to a new contiguous memory block each time the SB expands. That
means that in the end you need a free contiguous block of memory twice the
size of the last SB object in order to be able to copy the contents to the
new, expanded SB.
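
The copy-on-expand behaviour can be illustrated with a toy buffer. This is a sketch, not the real StringBuilder: `GrowableBuffer` is a hypothetical class that assumes a simple "double or jump to the required size" policy, and it records how many chars were allocated at the same time while the old array is copied into the new one.

```csharp
using System;

// Toy model of a copy-on-grow character buffer (not the real StringBuilder).
public sealed class GrowableBuffer
{
    private char[] _chars;
    private int _length;

    // Largest number of chars that were allocated at the same moment.
    public long PeakCharsAllocated { get; private set; }

    public GrowableBuffer(int capacity)
    {
        _chars = new char[capacity];
        PeakCharsAllocated = capacity;
    }

    public int Length { get { return _length; } }
    public int Capacity { get { return _chars.Length; } }

    public void Append(string value)
    {
        int required = _length + value.Length;
        if (required > _chars.Length)
        {
            // Assumed policy: double the capacity, or jump straight to
            // the required size if doubling is not enough.
            int newCapacity = Math.Max(_chars.Length * 2, required);
            char[] bigger = new char[newCapacity];

            // The old and the new array are both alive during the copy.
            PeakCharsAllocated = Math.Max(
                PeakCharsAllocated, (long)_chars.Length + newCapacity);

            Array.Copy(_chars, bigger, _length);
            _chars = bigger;
        }
        value.CopyTo(0, _chars, _length, value.Length);
        _length = required;
    }
}
```

Starting at a capacity of 16 and appending a 10-char string four times leaves a length of 40 in a 64-char array, but the peak is 96 chars, because the 32-char and 64-char arrays coexist during the last copy. The toy ignores integer overflow for very large capacities.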

| So to summarize:
| - There is a 2G byte limit for Win32 objects. This is never the
| limiting factor for a string.

No, there is currently a 2GB limit for all CLR objects, no matter what OS
you are running on (32 or 64 bit).


| - There is a 2^29 length cap for a string from the construction of the
| length member variable. This is rarely the limiting factor, because
| StringBuilders are not often allocated with a size cap.

You'd better create a StringBuilder with the correct size (or somewhat bigger)
if you can; the exponential growth can put high pressure on the GC,
especially when using large strings.
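
A minimal sketch of that advice, assuming the final length can be computed before building. `Prealloc.Repeat` is a hypothetical helper; the only framework calls it relies on are the `StringBuilder(int capacity)` constructor and `Append`. How a given runtime version manages the buffer internally is not something this example depends on.

```csharp
using System;
using System.Text;

public static class Prealloc
{
    // Builds `count` copies of `chunk` in a StringBuilder sized up front,
    // so the builder is given the full capacity from the start.
    public static StringBuilder Repeat(string chunk, int count)
    {
        // checked: fail loudly instead of wrapping around if the total
        // length does not fit in an int.
        int total = checked(chunk.Length * count);

        StringBuilder sb = new StringBuilder(total);
        for (int i = 0; i < count; i++)
        {
            sb.Append(chunk);
        }
        return sb;
    }
}
```

For example, `Prealloc.Repeat("abc", 4).ToString()` yields "abcabcabcabc".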


| - There is a practical limit, dependent on the current system
| conditions, that limits the growth of strings to the largest
| contiguous block of memory available. This cap is deterministic, but
| unpredictable.
|
| Conclusion:
| - It is a terrible idea to use huge strings in .net or pretty much
| anywhere else. Data that large should be streamed to and from its
| source (disk, network, etc).
|

Yep, but this isn't limited to strings; the same applies to arrays and all
other containers using arrays as a backing store.


Willy.


 
fawcett@gmail.com
 
      10th May 2006
Ignacio,

The reason I find it interesting is that the practical limit for
default StringBuilder and string behavior is much smaller than the
theoretical limit. So a 100 MB string with default handling can
generate much larger memory swings than its size suggests. Understanding
why the VM size drastically exceeds the size of the strings is important.
In .NET, if you have a string in the megabyte range, you'll generate a lot
of objects that end up on the large object heap as residue as you grow the
string.

thanks,
fawce

 