Maximum string length in C# and .NET

Discussion in 'Microsoft C# .NET' started by fawcett@gmail.com, May 10, 2006.

  1. Guest

    Hi,

    Having read this article:
    http://www.codeproject.com/dotnet/strings.asp?df=100&forumid=13838&exp=0&select=773966

    I got curious about the limit of string lengths, so I wrote this
    program:
    public static void Main(string[] args)
    {
        StringBuilder s = new StringBuilder();
        String adder = "A";
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";
        }
        try
        {
            while (true)
            {
                System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length);
                s.Append(adder);
            }
        }
        catch (Exception exp)
        {
            System.Console.Out.WriteLine(exp.ToString());
        }
        finally
        {
            System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length);
            System.Console.In.ReadLine();
        }
    }

    The program consistently ends with:

    System.OutOfMemoryException: Exception of type
    System.OutOfMemoryException was thrown.
    327,732,770 - 163,866,385

    Does this mean that the max length of a string is 163,866,385
    characters? Isn't it interesting that the length of the string is not a
    multiple of 10,000?
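
    On the "not a multiple of 10,000" question: the posted loop starts adder
    at "A" and then adds 10,000 more characters, so each Append adds 10,001
    characters rather than 10,000. A minimal sketch that checks the arithmetic
    (the class and method names are made up for illustration):

```csharp
using System;

static class AdderLengthCheck
{
    // Rebuilds the "adder" string the same way the posted program does.
    public static string BuildAdder()
    {
        string adder = "A";              // starts with 1 character
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";                // plus 10,000 more
        }
        return adder;
    }

    static void Main()
    {
        int chunk = BuildAdder().Length;     // 10001
        const int reported = 163866385;      // final length reported in the post

        Console.WriteLine(chunk);
        Console.WriteLine(reported % 10000); // non-zero: not a multiple of 10,000
        Console.WriteLine(reported % chunk); // 0: a whole number of appends
        Console.WriteLine(reported / chunk); // number of completed appends
    }
}
```

    The reported length divides evenly by 10,001, which is consistent with the
    builder holding a whole number of appended chunks when the exception hit.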

    thanks,
    fawce
     
    , May 10, 2006
    #1

  2. Greg Young Guest

    StringBuilder keeps doubling the size of its internal string as it grows.

    You happen to be dying during one of those doublings, when it is trying to
    allocate a new string of size 655465540. It's not that this is the maximum
    size; it is that it can't re-grow the internal string.

    By the way, if you pass an initial capacity to the constructor you can get
    further :)
    StringBuilder s = new StringBuilder(455465540);

    Cheers,

    Greg Young
    MVP - C#
     
    Greg Young, May 10, 2006
    #2

  3. Guest

    Greg,

    Thanks for your reply.

    I modified the test app to this:
    public static void Main(string[] args)
    {
        int size = (int)Math.Pow(2, 29);
        StringBuilder s = new StringBuilder(size);
        String adder = "A";
        for (int i = 0; i < 10000; i++)
        {
            adder += "A";
        }
        try
        {
            while (true)
            {
                System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length);
                s.Append(adder);
            }
        }
        catch (Exception exp)
        {
            System.Console.Out.WriteLine(exp.ToString());
        }
        finally
        {
            System.Console.Out.WriteLine(s.Length * 2 + " - " + s.Length + " - " + size);
            System.Console.In.ReadLine();
        }
    }


    I experimented with the size variable, and I found that 2^29 was the
    largest capacity I could request without getting an OutOfMemoryException
    when the StringBuilder was created.

    The result is now consistently:
    System.OutOfMemoryException: Exception of type
    System.OutOfMemoryException was thrown.
    1073727362 - 536863681 - 536870912

    The VMsize in task manager is 1.07G.

    Using the default of 2^4, the result is:
    System.OutOfMemoryException: Exception of type
    System.OutOfMemoryException was thrown.
    327732770 - 163866385 - 16

    and a VMSize of 497M in task manager.

    So, I guess this all just proves that the max string size is 2^29
    characters, but that in practice, the copying of the string data required
    to grow the StringBuilder can limit the usable string size to something
    much smaller?

    thanks,
    fawce
     
    , May 10, 2006
    #3
  4. Barry Kelly Guest

    wrote:

    > So, I guess this all just proves that the max string size is 2^29
    > characters, but that in practice, the copying of the string data required
    > to grow the StringBuilder can limit the usable string size to something
    > much smaller?


    The maximum string size depends on a lot more in Win32, specifically,
    memory fragmentation. If you've got lumps of memory pinned (possibly on
    other threads, for example, or by allocations using VirtualAlloc by
    unmanaged code), the compacting GC won't be able to make most of the
    available address space contiguous the way it wants to, so you could end
    up with a maximum string length a lot less than 0.5 billion characters.

    The limits you're hitting are effectively 32-bit architectural limits,
    rather than .NET limits, although it is true that all GC languages like
    2x to 3x more memory for efficient GC behaviour compared to languages
    using manual allocation. You'll see different results on 64-bit
    hardware, but it won't be much longer before you start hitting the upper
    limit of Int32.MaxValue, limiting the Length.

    .NET strings aren't designed for this kind of use anyway - when you have
    data this large, you need to custom-design your algorithms and in-memory
    (and probably on-disk, for really huge data) structures around what you
    want to do with it all. Probably the most important thing is changing
    all algorithms that require whole-data access into ones that process the
    data linearly, possibly with reference to dictionaries created in
    earlier passes of the data - consider the way compilers worked back in
    the '60s etc.
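
    A minimal sketch of the linear-processing approach described above,
    assuming the data is text that can be consumed in fixed-size chunks. The
    CountChar helper and the 64K buffer size are illustrative choices, not
    anything from this thread:

```csharp
using System;
using System.IO;

static class LinearScan
{
    // Counts occurrences of a character while holding only one small
    // buffer in memory, instead of loading the whole input into a string.
    public static long CountChar(TextReader reader, char target)
    {
        char[] buffer = new char[64 * 1024]; // illustrative buffer size
        long count = 0;
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == target) count++;
            }
        }
        return count;
    }

    static void Main()
    {
        // A StringReader stands in for a file or network stream here.
        using (TextReader reader = new StringReader("AAABAA"))
        {
            Console.WriteLine(CountChar(reader, 'A')); // prints 5
        }
    }
}
```

    Because the method takes a TextReader, the same loop would work over a
    StreamReader on a file, so memory use stays at one buffer regardless of
    how large the input is.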

    -- Barry
     
    Barry Kelly, May 10, 2006
    #4
  5. Guest Guest

    Interesting!
    Let's look at the facts first. When you declare a string, .NET allocates a
    contiguous chunk of memory for it. When I append a character and the result
    is longer than the existing string, .NET creates a new in-memory string,
    copies the previous contents into it, and appends the new character. If that
    is how strings work, then I don't think there is a fixed limit on string
    length. I also wrote a program to check this and found some very
    interesting results.

    String a = new String('a', 10000);
    for (int loop = 0; loop < 10000; loop++)
        a += "a";

    string b = "";
    for (int loop = 0; loop < 10000; loop++)
    {
        b += a;
        Console.WriteLine(b.Length);
    }

    Console.ReadLine();

    Results:
    On my PC
    First run: maximum size was 29340000
    Second run: maximum size was 29340000

    Then I ran it on another PC
    First run: maximum size was 29340000
    Second run: maximum size was 29340000

    The results above are all the same. I still don't think there is a fixed
    limit on string length: I got the same results on two PCs, but those two PCs
    have exactly the same configuration. Strings depend entirely on the memory
    chunks that are available. So, in a nutshell:

    Length of string = whatever fits before memory gets full!

    I found this interesting, so I would ask you to run my code on your PC and
    post the results.

     
    Guest, May 10, 2006
    #5
  6. This has been discussed a number of times recently in this NG.
    The maximum size of any reference type instance (such as a string) is limited
    by the CLR to 2GB, which means that a string can hold a maximum of ~1G
    characters.
    While it's possible to reach that limit when running on a 64-bit OS, you
    will never be able to create such large strings (or arrays) on a 32-bit OS.
    The reason is that you won't have that amount of contiguous address space
    available to create the backing store (a char array) for the string.
    The size of the largest contiguous block depends heavily on how
    modules are mapped (see: Win32 and framework DLLs' base addresses) into the
    process address space. Some modules are laid out in such a way that the
    largest chunk becomes something like 950,000 KB, and this is before you have
    even created a single object.
    Lesson learned: always be prepared to have some OOM exceptions thrown at you
    if you don't care about your memory allocation patterns on 32-bit Windows
    (this is also true for unmanaged code!).

    Willy.

     
    Willy Denoyette [MVP], May 10, 2006
    #6
  7. Guest

    Hi,

    Thanks for the explanations about win32 memory, the allocation and
    doubling effects make sense.

    I wanted to point out a quote from the article I referenced in the
    first post:
    <quote>m_stringLength int,
    This is the logical length of the string, the one returned by
    String.Length.
    Because a number of high bits are used for additional flags to enhance
    performance, the maximum length of the string is constrained to a limit
    much smaller than UInt32.Max for 32bit systems. Some of these flags
    indicate the string contains simple characters such as plain ASCII and
    will not required invoking complex UNICODE algorithms for sorting and
    comparison tests.</quote>

    In other words, the string object in .net has a member variable for its
    length (makes sense, keeps length checks fast). This variable imposes a
    natural limit to the string's length, because you can't track more than
    what the length variable can hold. This would have no effect whatsoever
    on a 32 bit system, because the max string length would exceed the max
    object size.

    However, the length variable is also used to hold some control bits.
    From my test, I believe it is 3 bits, knocking the max string length
    down to 2^29 chars. So, that would be 2^29 chars x 2 bytes/char / 1024
    bytes per KB / 1024 KB per MB / 1024 MB per GB,
    or
    1 GB, which is almost exactly the result I got from my StringBuilder
    test.
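
    The byte arithmetic above can be checked with a few lines. This is only a
    sketch of the size math (it ignores object header overhead and says
    nothing about how the runtime actually lays out the length field), and
    the names are made up for illustration:

```csharp
using System;

static class StringSizeMath
{
    // Bytes needed for the character data of a string of the given length.
    // .NET chars are UTF-16 code units, so sizeof(char) is 2.
    public static long CharDataBytes(long length)
    {
        return length * sizeof(char);
    }

    static void Main()
    {
        long cap = 1L << 29;                           // 536,870,912 chars
        long bytes = CharDataBytes(cap);
        Console.WriteLine(bytes);                      // 1073741824
        Console.WriteLine(bytes / 1024 / 1024 / 1024); // 1 (GB)

        // For comparison: how many chars fit under a 2 GB object size limit.
        long twoGb = 2L * 1024 * 1024 * 1024;
        Console.WriteLine(twoGb / sizeof(char));       // 1073741824
    }
}
```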

    Even more interesting is that the uninitialized StringBuilder couldn't
    make it to the 1 GB limit. From Willy's posting, I postulate that the
    limit of the uninitialized StringBuilder is capped because the Win32
    memory manager can't easily allocate a very large, contiguous block of
    memory, having run through many allocations in the growth from a 16
    char string to a very large string.

    So to summarize:
    - There is a 2 GB limit for Win32 objects. This is never the
    limiting factor for a string.
    - There is a 2^29 length cap for a string, which comes from the layout of
    the length member variable. This is rarely the limiting factor, because
    StringBuilders are not often allocated with an explicit initial capacity.
    - There is a practical limit, dependent on the current system
    conditions, that limits the growth of strings to the largest
    contiguous block of memory available. This cap is deterministic, but
    unpredictable.

    Conclusion:
    - It is a terrible idea to use huge strings in .net or pretty much
    anywhere else. Data that large should be streamed to and from its
    source (disk, network, etc).

    thanks for all your explanations,
    fawce
     
    , May 10, 2006
    #7
  8. Hi,


    "Willy Denoyette [MVP]" <> wrote in message
    news:...
    > This has been discussed a number of times recently in this NG.
    > The maximum size of all reference type (like a string) instances is
    > limited
    > by the CLR to 2GB, that means that a string can hold a maximum of ~1G
    > characters.


    Last week was the last time I remember :)

    Honestly, I do not see why the maximum length of a string is so interesting
    to so many people. I would have other worries if I had to handle such a
    huge piece of data.



    --
    Ignacio Machin,
    ignacio.machin AT dot.state.fl.us
    Florida Department Of Transportation
     
    Ignacio Machin \( .NET/ C# MVP \), May 10, 2006
    #8
  9. See inline...

    Willy.

    <> wrote in message
    news:...
    | Hi,
    |
    | Thanks for the explanations about win32 memory, the allocation and
    | doubling effects make sense.
    |
    | I wanted to point out a quote from the article I referenced in the
    | first post:
    | <quote>m_stringLength int,
    | This is the logical length of the string, the one returned by
    | String.Length.
    | Because a number of high bits are used for additional flags to enhance
    | performance, the maximum length of the string is constrained to a limit
    | much smaller than UInt32.Max for 32bit systems. Some of these flags
    | indicate the string contains simple characters such as plain ASCII and
    | will not required invoking complex UNICODE algorithms for sorting and
    | comparison tests.</quote>
    |
    | In other words, the string object in .net has a member variable for its
    | length (makes sense, keeps length checks fast). This variable imposes a
    | natural limit to the string's length, because you can't track more than
    | what the length variable can hold. This would have no effect whatsoever
    | on a 32 bit system, because the max string length would exceed the max
    | object size.
    |
    | However, the length variable is also used to hold some control bits.
    | From my test, I believe it is 3 bits, knocking the max string length
    | down to 2^29 chars. So, that would be 2^29 chars x 2 bytes/char / 1024
    | bytes per KB / 1024 KB per MB / 1024 MB per GB
    | or
    | 1 Gig, which is almost exactly the result I got from my StringBuilder
    | test.
    |

    You mean 1 GB, I guess; that is roughly 500M chars.

    | Even more interesting, is that the uninitialized StringBuilder couldn't
    | make it to the 1Gig limit. From Willy's posting, I postulate that the
    | limit of the uninitialized StringBuilder is capped because the Win32
    | memory manager can't easily allocate a very large, contiguous block of
    | memory, having run through many allocations in the growth from a 16
    | char string to a very large string.
    |

    The reason is the way the CLR expands the SB: the underlying char array has
    to be copied to a new contiguous memory block each time the SB expands. That
    means that in the end you need a free contiguous block of memory twice the
    size of the last SB object in order to be able to copy the contents to the
    new, expanded SB.
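
    A rough sketch of what that doubling costs. This is a simplified model,
    assuming a builder that keeps one contiguous char buffer and doubles it on
    every expansion as described in this thread; a real StringBuilder
    implementation may grow differently. It also uses the reported final
    length as a stand-in for the buffer size. All names are made up:

```csharp
using System;

static class DoublingModel
{
    // Size in bytes of the new contiguous block that must be found when a
    // buffer of currentChars characters is doubled.
    public static long NewBlockBytes(long currentChars)
    {
        return currentChars * 2 * sizeof(char);
    }

    // The old and the new buffer are both alive while the contents are copied.
    public static long PeakBytes(long currentChars)
    {
        return currentChars * sizeof(char) + NewBlockBytes(currentChars);
    }

    static void Main()
    {
        long chars = 163866385; // length reported when the first test failed
        Console.WriteLine(NewBlockBytes(chars)); // 655465540
        Console.WriteLine(PeakBytes(chars));     // 983198310
    }
}
```

    Under these assumptions the new block comes out at 655465540 bytes, the
    same figure quoted earlier in the thread for the failed allocation, and
    old and new buffers together approach 1 GB.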

    | So to summarize:
    | - There is a 2G byte limit for Win32 objects. This is never the
    | limiting factor for a string.

    No, there is currently a 2GB limit for all CLR objects, no matter what OS
    you are running on (32 or 64 bit).


    | - There is a 2^29 length cap for a string from the construction of the
    | length member variable. This is rarely the limiting factor, because
    | stringbuilders are not often allocated with a size cap.

    You had better create a StringBuilder with the correct size (or somewhat
    bigger) if you can; the exponential growth can put high pressure on the GC,
    especially when using large strings.


    | - There is a practical limit, dependent on the current system
    | conditions, that limits the growth of strings to the largest
    | contiguous block of memory available. This cap is deterministic, but
    | unpredictable.
    |
    | Conclusion:
    | - It is a terrible idea to use huge strings in .net or pretty much
    | anywhere else. Data that large should be streamed to and from its
    | source (disk, network, etc).
    |

    Yep, but this isn't limited to strings; the same applies to arrays and all
    other containers that use arrays as a backing store.


    Willy.
     
    Willy Denoyette [MVP], May 10, 2006
    #9
  10. Guest

    Ignacio,

    The reason I find it interesting is that the practical limit for
    default StringBuilder and string behavior is much smaller than the
    theoretical limit. So, a 100Mb string with default handling can
    generate much more massive memory swings. Understanding why the vm size
    drastically exceeds the size of strings is important. In .net, if you
    have a string in the megabyte range, you'll generate a lot of objects
    that end up on the large object heap as residue as you grow the string.

    thanks,
    fawce
     
    , May 10, 2006
    #10
  11. <> wrote:
    > The reason I find it interesting is that the practical limit for
    > default StringBuilder and string behavior is much smaller than the
    > theoretical limit. So, a 100Mb string with default handling can
    > generate much more massive memory swings. Understanding why the vm size
    > drastically exceeds the size of strings is important. In .net, if you
    > have a string in the megabyte range, you'll generate a lot of objects
    > that end up on the large object heap as residue as you grow the string.


    That has nothing to do with the maximum size of strings though - it has
    everything to do with the fact that strings are (publicly) immutable.

    Using StringBuilder instead of string concatenation to concatenate several
    strings (particularly when the number is variable) is a well-known
    optimisation (unfortunately less well understood). The benefits in
    appropriate situations would be exactly the same even if strings had a
    much larger or smaller theoretical maximum size.
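
    For readers who have not seen the two approaches side by side, a minimal
    sketch (the class and method names are made up for illustration):

```csharp
using System;
using System.Text;

static class JoinDemo
{
    // Repeated concatenation: strings are immutable, so every += allocates
    // a brand new string and copies everything built so far.
    public static string WithConcat(string[] parts)
    {
        string result = "";
        foreach (string p in parts) result += p;
        return result;
    }

    // StringBuilder: appends into a growable buffer and produces the final
    // string once, in ToString().
    public static string WithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string p in parts) sb.Append(p);
        return sb.ToString();
    }

    static void Main()
    {
        string[] parts = { "max", "imum ", "string ", "length" };
        Console.WriteLine(WithConcat(parts) == WithBuilder(parts)); // True
    }
}
```

    Both methods return the same text; the difference is only in how many
    intermediate strings are created along the way.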

    --
    Jon Skeet - <>
    http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
    If replying to the group, please do not mail me too
     
    Jon Skeet [C# MVP], May 10, 2006
    #11
  12. Guest

    Jon,

    The complete independence of the practical string limit from the
    theoretical one is precisely what I find so interesting. Also note that
    the program ran out of memory at widely differing string lengths when
    using different initializations for the StringBuilder. I wasn't just
    comparing string concatenation to string building, but string building
    under different initializations.

    It is very difficult to intuit the cause of out of memory errors. The
    strings in question were large but not absurdly large. Honestly, even
    though it is clearly inefficient, I wouldn't expect an out of memory
    from growing a stringBuilder to 325M or so. My first thought was, well
    maybe the string max isn't very big. So that is where my questions
    started.

    The experiments show that the repeated doubling of the char array just
    savages the memory manager. This limits the practical limit of the
    string created by a stringbuilder to 1/3rd of its theoretical max.

    So, the way you initialize the string builder is far more important
    than the length of the string in terms of the memory management.


    thanks,
    fawce
     
    , May 12, 2006
    #12
  13. Note that creating a SB with a capacity of > 750M characters (1.5 GB) may
    well be possible when done very early in the process. This is what you
    should do if you ever need to process very large strings: pre-allocate the
    SB with a capacity large enough to hold the largest string you expect, and
    don't let the SB grow dynamically until you run out of memory.
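
    A minimal sketch of that pre-allocation advice, assuming the largest
    expected size is known up front. The TryCreate helper and the 50 million
    character figure are illustrative only, and whether catching
    OutOfMemoryException is appropriate depends on the application:

```csharp
using System;
using System.Text;

static class Preallocate
{
    // Tries to reserve the whole buffer up front. Returns null if the
    // runtime cannot find a large enough block of memory.
    public static StringBuilder TryCreate(int capacityInChars)
    {
        try
        {
            return new StringBuilder(capacityInChars);
        }
        catch (OutOfMemoryException)
        {
            return null;
        }
    }

    static void Main()
    {
        // 50 million chars is an arbitrary illustrative size (about 100 MB).
        StringBuilder sb = TryCreate(50000000);
        Console.WriteLine(sb == null
            ? "allocation failed"
            : "reserved " + sb.Capacity + " chars");
    }
}
```

    Doing this once, early in the process, means the failure (if any) happens
    at a known point rather than in the middle of a long run of appends.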

    Willy.
     
    Willy Denoyette [MVP], May 12, 2006
    #13
  14. bteck

    Thanks for the informative post on the max length of a C# string. I am developing a file encryption utility, so knowing the max length a string can take is very important and interesting to me. On reflection, shouldn't the next generation of operating systems run entirely on virtual memory? That way, the computer's RAM, HDD, and pen drive would all be organized by the OS and treated as its virtual memory. Also, when the OS finds its virtual memory is insufficient, it should pause and ask the user to add memory hardware, such as a pen drive or HDD, so it can continue to run without crashing!
    Anyone working on this project already?
     
    bteck, May 19, 2011
    #14
