Maximum string length in C# and .NET

fawcett

Hi,

Having read this article:
http://www.codeproject.com/dotnet/strings.asp?df=100&forumid=13838&exp=0&select=773966

I got curious about the limit of string lengths, so I wrote this
program:
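using System;
using System.Text;

public class Program
{
    public static void Main(string[] args)
    {
        StringBuilder s = new StringBuilder();

        // Build the chunk to append: "A" plus 10,000 more = 10,001 characters.
        String adder = "A";
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";
        }

        try
        {
            // Keep appending until the runtime can no longer grow the buffer.
            while (true)
            {
                // Prints the size in bytes (2 per char), then the length in chars.
                // 2L keeps the byte count from overflowing Int32.
                System.Console.Out.WriteLine(s.Length * 2L + " - " + s.Length);
                s.Append(adder);
            }
        }
        catch (Exception exp)
        {
            System.Console.Out.WriteLine(exp.ToString());
        }
        finally
        {
            System.Console.Out.WriteLine(s.Length * 2L + " - " + s.Length);
            System.Console.In.ReadLine();
        }
    }
}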

The program consistently ends with:

System.OutOfMemoryException: Exception of type
System.OutOfMemoryException was thrown.
327,732,770 - 163,866,385

Does this mean that the max length of a string is 163,866,385
characters? Isn't it interesting that the length of the string is not a
multiple of 10,000?
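
On the last question: adder starts as "A" and then gains 10,000 more characters in the loop, so each Append adds 10,001 characters rather than 10,000. A small check of that arithmetic, using only the numbers reported above (the class and method names are made up for illustration):

```csharp
using System;

public static class AdderLengthCheck
{
    // Length of "adder" after the loop in the program above:
    // one initial "A" plus 10,000 appended characters.
    public const int AdderLength = 1 + 10000;

    // Number of whole appends of adder that make up the given length,
    // or -1 if the length is not a multiple of AdderLength.
    public static long WholeAppends(long length)
    {
        return length % AdderLength == 0 ? length / AdderLength : -1;
    }

    public static void Main()
    {
        Console.WriteLine(AdderLength);               // 10001
        Console.WriteLine(WholeAppends(163866385));   // 16385
        Console.WriteLine(163866385 % 10000);         // 6385
    }
}
```

So the reported length is exactly 16,385 appends of 10,001 characters, which is why it is not a multiple of 10,000.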

thanks,
fawce
 
Greg Young

The StringBuilder keeps doubling the size of its internal string.

You happen to be dying during one of those doublings, when it is trying to
allocate a new string of size 655465540. It's not that that is the maximum
size; it is that it can't re-grow the internal string.

By the way, if you put the following line in for your constructor you can get
further :)
StringBuilder s = new StringBuilder(455465540);

Cheers,

Greg Young
MVP - C#
 
fawcett

Greg,

Thanks for your reply.

I modified the test app to this:
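using System;
using System.Text;

public class Program
{
    public static void Main(string[] args)
    {
        // Pre-size the builder so it does not have to grow while appending.
        int size = 1 << 29; // 2^29 = 536,870,912 chars
        StringBuilder s = new StringBuilder(size);

        // Build the chunk to append: "A" plus 10,000 more = 10,001 characters.
        String adder = "A";
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";
        }

        try
        {
            while (true)
            {
                // Prints the size in bytes (2 per char), then the length in chars.
                System.Console.Out.WriteLine(s.Length * 2L + " - " + s.Length);
                s.Append(adder);
            }
        }
        catch (Exception exp)
        {
            System.Console.Out.WriteLine(exp.ToString());
        }
        finally
        {
            // Bytes, length in chars, and the initial capacity requested.
            System.Console.Out.WriteLine(s.Length * 2L + " - " + s.Length + " - " + size);
            System.Console.In.ReadLine();
        }
    }
}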


I experimented with the size variable, and I found that 2^29 was the
max I could allocate without getting an out-of-memory error when
creating the StringBuilder.

The result is now consistently:
System.OutOfMemoryException: Exception of type
System.OutOfMemoryException was thrown.
1073727362 - 536863681 - 536870912

The VMsize in task manager is 1.07G.

Using the default of 2^4, the result is:
System.OutOfMemoryException: Exception of type
System.OutOfMemoryException was thrown.
327732770 - 163866385 - 16

and a VMSize of 497M in task manager.

So, I guess this all just proves that the max string size is 2^29, but
that in practice, the copying of the string data required to grow
the StringBuilder can limit the usable string size to something much
smaller?

thanks,
fawce
 
Barry Kelly

| So, I guess this all just proves that the max string size is 2^29, but
| that in practice, the copying of the string data required to grow
| the StringBuilder can limit the usable string size to something much
| smaller?

The maximum string size depends on a lot more in Win32, specifically,
memory fragmentation. If you've got lumps of memory pinned (possibly on
other threads, for example, or by allocations using VirtualAlloc by
unmanaged code), the compacting GC won't be able to make most of the
available address space contiguous the way it wants to, so you could end
up with a maximum string length a lot less than 0.5 billion characters.

The limits you're hitting are effectively 32-bit architectural limits,
rather than .NET limits, although it is true that all GC languages like
2x to 3x more memory for efficient GC behaviour compared to languages
using manual allocation. You'll see different results on 64-bit
hardware, but it won't be much longer before you start hitting the upper
limit of Int32.MaxValue, limiting the Length.

.NET strings aren't designed for this kind of use anyway - when you have
data this large, you need to custom-design your algorithms and in-memory
(and probably on-disk, for really huge data) structures around what you
want to do with it all. Probably the most important thing is changing
all algorithms that require whole-data access into ones that process the
data linearly, possibly with reference to dictionaries created in
earlier passes of the data - consider the way compilers worked back in
the '60s etc.
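
As a rough illustration of processing data linearly instead of holding it all in one string, the sketch below counts a character by reading fixed-size chunks from a TextReader. The class name, the method name and the 4096-char buffer size are arbitrary choices for this example, not anything from the posts above.

```csharp
using System;
using System.IO;

public static class LinearScan
{
    // Counts occurrences of a character by reading fixed-size chunks,
    // so memory use is bounded by the buffer, not by the input size.
    public static long CountChar(TextReader reader, char target, int bufferSize = 4096)
    {
        char[] buffer = new char[bufferSize];
        long count = 0;
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == target) count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // A StringReader stands in for a large input here; for real data a
        // StreamReader over a file or network stream would take its place.
        using (TextReader reader = new StringReader("AAABAAC"))
        {
            Console.WriteLine(CountChar(reader, 'A')); // 5
        }
    }
}
```

Only one buffer's worth of characters is in memory at any time, whatever the size of the input.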

-- Barry
 
Guest

Interesting!

Let's look at the facts first. When you declare a string, a contiguous
chunk of memory is allocated for it. When I append a character to a string
and the result is longer than the existing string, .NET creates a new
in-memory string, copies the previous string into it and appends the new
character. If that is how strings work, then I don't think there is a fixed
limit on string length. I also wrote a program to check this and found some
very interesting results.

String a = new String('a', 10000);
for (int loop = 0; loop < 10000; loop++)
    a += "a";

string b = "";
for (int loop = 0; loop < 10000; loop++)
{
    b += a;
    Console.WriteLine(b.Length);
}

Console.ReadLine();

Results:
On my PC
First run: maximum size was 29340000
Second run: maximum size was 29340000

Then I ran it on another PC
First run: maximum size was 29340000
Second run: maximum size was 29340000

The results above are the same. I don't think there is a limit on string
length. I got the same results on two PCs, but these two PCs have exactly
the same configuration. Strings depend entirely on the memory chunks
available, so in a nutshell:

Length of string = when memory gets full!

I would ask you to run my code on your PC and post the results.

 
Willy Denoyette [MVP]

This has been discussed a number of times recently in this NG.
The maximum size of all reference type (like a string) instances is limited
by the CLR to 2GB, that means that a string can hold a maximum of ~1G
characters.
While it's possible to reach that limit when running on a 64 bit OS, you
will never be able to create such large strings (or arrays) on a 32 bit OS.
The reason is that you won't have that amount of "contiguous" address space
available to create the backing store (a char array) for the string.
The size of the largest contiguous memory space highly depends on how
modules are mapped (see: Win32 and framework DLL's base addresses) into the
process address space. Some modules are laid out in such a way that the
largest chunk becomes something like 950,000 KB, and this is before you have
even created a single object.
Lesson learned, always be prepared to get some OOM exceptions thrown on you
if you don't care about your memory allocation patterns on 32 bit windows
(also true for unmanaged!).

Willy.

 
fawcett

Hi,

Thanks for the explanations about win32 memory, the allocation and
doubling effects make sense.

I wanted to point out a quote from the article I referenced in the
first post:
<quote>m_stringLength int,
This is the logical length of the string, the one returned by
String.Length.
Because a number of high bits are used for additional flags to enhance
performance, the maximum length of the string is constrained to a limit
much smaller than UInt32.Max for 32bit systems. Some of these flags
indicate the string contains simple characters such as plain ASCII and
will not required invoking complex UNICODE algorithms for sorting and
comparison tests.</quote>

In other words, the string object in .net has a member variable for its
length (makes sense, keeps length checks fast). This variable imposes a
natural limit to the string's length, because you can't track more than
what the length variable can hold. This would have no effect whatsoever
on a 32 bit system, because the max string length would exceed the max
object size.

However, the length variable is also used to hold some control bits.
From my test, I believe it is 3 bits, knocking the max string length
down to 2^29 chars. So, that would be 2^29 chars x 2 bytes/char / 1024
bytes per KB / 1024 KB per MB / 1024 MB per GB
or
1 GB, which is almost exactly the result I got from my StringBuilder
test.
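
The arithmetic can be checked with a short sketch. It counts only the character data at 2 bytes per char and ignores any per-object overhead, which is an assumption of this example; the names are illustrative.

```csharp
using System;

public static class StringSizeMath
{
    // Bytes needed for the character data alone (UTF-16, 2 bytes per char).
    // Object header and length field are ignored in this estimate.
    public static long CharDataBytes(long charCount)
    {
        return charCount * sizeof(char);
    }

    public static void Main()
    {
        long cap = 1L << 29;                          // 536,870,912 chars
        Console.WriteLine(CharDataBytes(cap));        // 1073741824
        Console.WriteLine(CharDataBytes(cap) / (1024.0 * 1024 * 1024)); // 1
        Console.WriteLine(CharDataBytes(536863681));  // 1073727362
    }
}
```

The last line reproduces the byte count printed by the pre-sized StringBuilder test, just under 2^30 bytes.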

Even more interesting is that the uninitialized StringBuilder couldn't
make it to the 1 GB limit. From Willy's posting, I postulate that the
limit of the uninitialized StringBuilder is capped because the Win32
memory manager can't easily allocate a very large, contiguous block of
memory, having run through many allocations in the growth from a 16
char string to a very large string.

So to summarize:
- There is a 2 GB limit for Win32 objects. This is never the
limiting factor for a string.
- There is a 2^29 length cap for a string from the construction of the
length member variable. This is rarely the limiting factor, because
StringBuilders are not often allocated with an explicit initial capacity.
- There is a practical limit, dependent on the current system
conditions, that limits the growth of strings to the largest
contiguous block of memory available. This cap is deterministic, but
unpredictable.

Conclusion:
- It is a terrible idea to use huge strings in .net or pretty much
anywhere else. Data that large should be streamed to and from its
source (disk, network, etc).

thanks for all your explanations,
fawce
 
Ignacio Machin \( .NET/ C# MVP \)

Hi,


Willy Denoyette said:
| This has been discussed a number of times recently in this NG.
| The maximum size of all reference type (like a string) instances is
| limited by the CLR to 2GB, that means that a string can hold a maximum
| of ~1G characters.

Last week was the last time I remember :)

Honestly, I do not see why the maximum length of a string is so interesting
to so many people. I would have other worries if I had to handle such a
huge piece of data.
 
Willy Denoyette [MVP]

See inline...

Willy.

| Hi,
|
| Thanks for the explanations about win32 memory, the allocation and
| doubling effects make sense.
|
| I wanted to point out a quote from the article I referenced in the
| first post:
| <quote>m_stringLength int,
| This is the logical length of the string, the one returned by
| String.Length.
| Because a number of high bits are used for additional flags to enhance
| performance, the maximum length of the string is constrained to a limit
| much smaller than UInt32.Max for 32bit systems. Some of these flags
| indicate the string contains simple characters such as plain ASCII and
| will not required invoking complex UNICODE algorithms for sorting and
| comparison tests.</quote>
|
| In other words, the string object in .net has a member variable for its
| length (makes sense, keeps length checks fast). This variable imposes a
| natural limit to the string's length, because you can't track more than
| what the length variable can hold. This would have no effect whatsoever
| on a 32 bit system, because the max string length would exceed the max
| object size.
|
| However, the length variable is also used to hold some control bits.
| From my test, I believe it is 3 bits, knocking the max string length
| down to 2^29 chars. So, that would be 2^29 chars x 2 bytes/char / 1024
| bytes per KB / 1024 KB per MB / 1024 MB per GB
| or
| 1 Gig, which is almost exactly the result I got from my StringBuilder
| test.
|

You mean 1 GB, I guess; that is about 500M chars.

| Even more interesting, is that the uninitialized StringBuilder couldn't
| make it to the 1Gig limit. From Willy's posting, I postulate that the
| limit of the uninitialized StringBuilder is capped because the Win32
| memory manager can't easily allocate a very large, contiguous block of
| memory, having run through many allocations in the growth from a 16
| char string to a very large string.
|

The reason is the way the CLR expands the SB: the underlying char array has
to be copied to a new contiguous memory block each time the SB expands. That
means that in the end you need a free contiguous block twice the size of the
last SB object in order to be able to copy the contents to the new,
expanded SB.
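
A minimal sketch of that copy-on-grow cost follows, under two assumptions that the thread does not confirm: that the new buffer is exactly twice the old capacity, and that the capacity at the moment of failure equalled the reported length. Other runtime versions may grow a StringBuilder differently, so treat this as a model only.

```csharp
using System;

public static class GrowthModel
{
    // Assumed model: growing copies the old buffer into a new one of twice
    // the capacity, so both buffers are alive while the copy takes place.
    public static long PeakCharsDuringGrowth(long currentCapacity)
    {
        long newCapacity = currentCapacity * 2;
        return currentCapacity + newCapacity;
    }

    public static void Main()
    {
        long length = 163866385; // length reported when the first test failed
        long peakChars = PeakCharsDuringGrowth(length);
        Console.WriteLine(peakChars);                 // 491599155
        Console.WriteLine(peakChars * sizeof(char));  // 983198310
    }
}
```

Under this model the old and new buffers together need about three times the current size during the copy, with the new buffer as a single contiguous block.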

| So to summarize:
| - There is a 2G byte limit for Win32 objects. This is never the
| limiting factor for a string.

No, there is currently a 2GB limit for all CLR objects, no matter what OS
you are running on (32 or 64 bit).


| - There is a 2^29 length cap for a string from the construction of the
| length member variable. This is rarely the limiting factor, because
| stringbuilders are not often allocated with a size cap.

You had better create a StringBuilder with the correct size (or somewhat
bigger) if you can; the exponential growth can put high pressure on the GC,
especially when using large strings.


| - There is a practical limit, dependent on the current system
| conditions, that limits the growth of strings to the largest
| contiguous block of memory available. This cap is deterministic, but
| unpredictable.
|
| Conclusion:
| - It is a terrible idea to use huge strings in .net or pretty much
| anywhere else. Data that large should be streamed to and from its
| source (disk, network, etc).
|

Yep, but this isn't limited to strings; the same applies to arrays and all
other containers using arrays as backing store.


Willy.
 
fawcett

Ignacio,

The reason I find it interesting is that the practical limit for
default StringBuilder and string behavior is much smaller than the
theoretical limit. So, a 100Mb string with default handling can
generate much more massive memory swings. Understanding why the vm size
drastically exceeds the size of strings is important. In .net, if you
have a string in the megabyte range, you'll generate a lot of objects
that end up on the large object heap as residue as you grow the string.

thanks,
fawce
 
Jon Skeet [C# MVP]

| The reason I find it interesting is that the practical limit for
| default StringBuilder and string behavior is much smaller than the
| theoretical limit. So, a 100Mb string with default handling can
| generate much more massive memory swings. Understanding why the vm size
| drastically exceeds the size of strings is important. In .net, if you
| have a string in the megabyte range, you'll generate a lot of objects
| that end up on the large object heap as residue as you grow the string.

That has nothing to do with the maximum size of strings though - it has
everything to do with the fact that strings are (publicly) immutable.

Using StringBuilder instead of string concatenation to concatenate several
strings (particularly when the number is variable) is a well known
optimisation (unfortunately less well understood). The benefits in
appropriate situations would be exactly the same even if strings had a
much larger or smaller theoretical maximum size.
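
For readers who have not seen the two approaches side by side, here is a small sketch. The names are illustrative; both methods produce the same string and differ only in how many intermediate strings are created along the way.

```csharp
using System;
using System.Text;

public static class JoinDemo
{
    // Repeated concatenation: strings are immutable, so every += creates a
    // new string and copies everything built so far.
    public static string WithConcatenation(string piece, int count)
    {
        string result = "";
        for (int i = 0; i < count; i++) result += piece;
        return result;
    }

    // StringBuilder: appends go into a mutable buffer and a single string
    // is produced at the end by ToString().
    public static string WithBuilder(string piece, int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) sb.Append(piece);
        return sb.ToString();
    }

    public static void Main()
    {
        string a = WithConcatenation("ab", 1000);
        string b = WithBuilder("ab", 1000);
        Console.WriteLine(a == b);    // True
        Console.WriteLine(b.Length);  // 2000
    }
}
```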
 
fawcett

Jon,

The complete independence of the practical string limit from the
theoretical is precisely what I find so interesting. Also note that
the program ran out of memory at widely differing string lengths using
different initializations for the StringBuilder. It wasn't just
comparing string concatenation to string building, but string building
under different initializations.

It is very difficult to intuit the cause of out of memory errors. The
strings in question were large but not absurdly large. Honestly, even
though it is clearly inefficient, I wouldn't expect an out of memory
from growing a stringBuilder to 325M or so. My first thought was, well
maybe the string max isn't very big. So that is where my questions
started.

The experiments show that the repeated doubling of the char array just
savages the memory manager. This limits the practical limit of the
string created by a stringbuilder to 1/3rd of its theoretical max.

So, the way you initialize the string builder is far more important
than the length of the string in terms of the memory management.


thanks,
fawce
 
Willy Denoyette [MVP]

Note that creating a SB with a capacity of > 750M characters (1.5 GB) may
well be possible when done very early in the process. This is what you
should do if you ever need to process very large strings: pre-allocate the
SB with a capacity large enough to hold the largest string you expect, and
don't let the SB extend dynamically until you run out of memory.
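
One way this advice might look in code, assuming the pieces to be joined are known up front so that their total length can be summed before allocating (JoinAll is a hypothetical helper, not a framework method):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Preallocate
{
    // Sums the piece lengths first, then allocates the builder once with
    // that capacity so that no growth is needed while appending.
    public static string JoinAll(IList<string> pieces)
    {
        long total = 0;
        foreach (string p in pieces) total += p.Length;
        if (total > int.MaxValue)
            throw new ArgumentException("Combined length exceeds Int32.MaxValue.");

        StringBuilder sb = new StringBuilder((int)total);
        foreach (string p in pieces) sb.Append(p);
        return sb.ToString();
    }

    public static void Main()
    {
        var pieces = new List<string> { "abc", "de", "fghi" };
        string joined = JoinAll(pieces);
        Console.WriteLine(joined);         // abcdefghi
        Console.WriteLine(joined.Length);  // 9
    }
}
```

The single up-front allocation can still fail with an OutOfMemoryException for very large totals, but it fails before any copying has been done.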

Willy.
 
Thanks for the informative post on the max length of a C# string. I am developing a file encryption utility, so knowing the maximum length a string can take is very important and interesting to me. On reflection, shouldn't the next generation of operating systems run entirely on virtual memory? That way, the computer's RAM, HDD and pen drive would all be organized by the OS and treated as its virtual memory. Also, when the OS finds its virtual memory is insufficient, it should pause and ask the user to add memory hardware, such as a pen drive or HDD, so it can continue to run without crashing!
Is anyone working on this project already?
 