How do I get string's functionality with the StringBuilder?

Dan Aldean · May 8, 2006

Hello,

I have a file with CR/LF separated text.
string.Trim() and string.Split() came in very handy for processing the content, but
because strings are immutable, memory is badly managed.
StringBuilder is a good alternative in string processing but it lacks at
least the two methods above.
Not to mention that I would also need EndsWith(), IndexOf(), LastIndexOf(),
and so on.
Does anyone know a workaround, other than writing the code myself?
In my opinion, the same methods should be there for StringBuilder too.

Thanks.
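StringBuilder really does lack these methods. If you want to keep working on a StringBuilder, the missing searches can be written as extension methods (C# 3.0 or later). The sketch below is only an illustration: the class StringBuilderExtensions and its methods are hypothetical helpers, not part of the .NET Framework, and they compare characters ordinally.

```csharp
using System.Text;

// Hypothetical helpers: StringBuilder itself has no IndexOf, LastIndexOf,
// EndsWith, TrimStart or TrimEnd.
public static class StringBuilderExtensions
{
    // Index of the first occurrence of value at or after startIndex, or -1.
    public static int IndexOf(this StringBuilder sb, char value, int startIndex = 0)
    {
        for (int i = startIndex; i < sb.Length; i++)
        {
            if (sb[i] == value)
                return i;
        }
        return -1;
    }

    // Index of the last occurrence of value, or -1.
    public static int LastIndexOf(this StringBuilder sb, char value)
    {
        for (int i = sb.Length - 1; i >= 0; i--)
        {
            if (sb[i] == value)
                return i;
        }
        return -1;
    }

    // True if the builder's contents end with value (ordinal comparison).
    public static bool EndsWith(this StringBuilder sb, string value)
    {
        if (value.Length > sb.Length)
            return false;
        int offset = sb.Length - value.Length;
        for (int i = 0; i < value.Length; i++)
        {
            if (sb[offset + i] != value[i])
                return false;
        }
        return true;
    }

    // Removes trailing whitespace in place and returns the same builder.
    public static StringBuilder TrimEnd(this StringBuilder sb)
    {
        int end = sb.Length;
        while (end > 0 && char.IsWhiteSpace(sb[end - 1]))
            end--;
        sb.Length = end;
        return sb;
    }

    // Removes leading whitespace in place and returns the same builder.
    public static StringBuilder TrimStart(this StringBuilder sb)
    {
        int start = 0;
        while (start < sb.Length && char.IsWhiteSpace(sb[start]))
            start++;
        sb.Remove(0, start);
        return sb;
    }
}
```

Each call walks the builder through its indexer. In current .NET versions a large StringBuilder is stored as a chain of chunks, so indexing it is slower than indexing a string; this is not automatically faster than the plain string methods.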

Barry Kelly · May 8, 2006

Dan Aldean said:
I have a file with CR/LF separated text.
string.Trim() and string.Split() came very handy to process the content, but

What about StreamReader?

-- Barry

Guest · May 8, 2006

How big is your text file, and how often do you manipulate its elements?
Keep in mind that the performance gap becomes significant if you are iterating
over 25,000 or more strings in a loop.

Usually, using "string" is an appropriate solution.

--
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour

Ignacio Machin ( .NET/ C# MVP ) · May 8, 2006

Hi,

What do you want to do?

The operations you mention, Split and Trim, create new strings and may have
some impact on performance.
EndsWith, IndexOf, etc. do not change the string at all and do not create
new strings, so they have no such impact.
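To illustrate the point, the sketch below locates the key in a `key : value` line and excludes the spaces around it using only IndexOf and index arithmetic; no string is created until (and unless) you call Substring on the final range. The class LineScan and the `key : value` layout are assumptions made for the example.

```csharp
public static class LineScan
{
    // Finds the bounds of the text before the first ':' with the surrounding
    // spaces excluded, using only IndexOf and index arithmetic (no new strings).
    // Returns false if the line contains no ':'.
    public static bool TryGetKeyBounds(string line, out int start, out int length)
    {
        start = 0;
        length = 0;
        int colon = line.IndexOf(':');
        if (colon < 0)
            return false;

        int s = 0;
        while (s < colon && line[s] == ' ')
            s++;
        int e = colon; // exclusive end of the key
        while (e > s && line[e - 1] == ' ')
            e--;

        start = s;
        length = e - s;
        return true;
    }
}
```

Only when the key is really needed does `line.Substring(start, length)` allocate a string, and then only for that one piece.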

Dan Aldean · May 8, 2006

Thanks for the reply, Barry.
I use StreamReader to read, but I need to process the content. For example, I
need to find trailing spaces before a '>' character and remove them.
Also, within the string I have to look for a ':' separator character and remove
the spaces before and after it.
Split() and then Trim() would have helped, but I cannot afford to use
strings, as the file can be big and I have to process every line I read.
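For comparison, the Split()/Trim() version of the ':' rule is only a few lines long. Each call allocates a handful of small, short-lived strings, which are reclaimed cheaply by generation 0 collections as long as you don't hold on to them. A minimal sketch; the names SimpleLineProcessing and NormalizeSeparator are hypothetical, and note that it also trims whitespace at both ends of the line.

```csharp
public static class SimpleLineProcessing
{
    // Removes the spaces around each ':' separator by splitting the line,
    // trimming every part, and joining the parts back together.
    public static string NormalizeSeparator(string line)
    {
        string[] parts = line.Split(':');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return string.Join(":", parts);
    }
}
```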

Dan Aldean · May 8, 2006

Thanks for the reply. Basically, the file is big, and streaming is not a
solution, as I do a lot of manipulation.

Dan Aldean · May 8, 2006

Thanks, Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line.
I use StreamReader to read the lines: myFile.ReadLine().

For example, I need to find trailing spaces before a '>' character and remove
them.
Also, within the string I have to look for a ':' separator character and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big,
strings are not recommended.
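Getting "the content between two separators" is a job for IndexOf and a single Substring, which allocates only the piece you actually want to save. The thread doesn't say which separators are used, so the method below takes them as parameters; SeparatorExtraction and ExtractBetween are hypothetical names.

```csharp
public static class SeparatorExtraction
{
    // Returns the text between the first occurrence of 'open' and the next
    // occurrence of 'close' after it, with surrounding whitespace trimmed.
    // Returns null if either separator is missing.
    public static string ExtractBetween(string line, char open, char close)
    {
        int start = line.IndexOf(open);
        if (start < 0)
            return null;
        int end = line.IndexOf(close, start + 1);
        if (end < 0)
            return null;
        return line.Substring(start + 1, end - start - 1).Trim();
    }
}
```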

Barry Kelly · May 9, 2006

If you have seriously long lines, I recommend that you use the
techniques of lexical analysis. Basically:

* Read your strings as System.String from StreamReader.ReadLine().
* Tokenize the strings using manual integer indexing and classify them
according to how you want to modify them.
* Write a loop which sucks in from your tokenizer and builds up a
resulting StringBuilder according to your modification rules.

This change would at least make the algorithm linear with respect to
input line length.
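A much-simplified sketch of that idea for the two rules Dan describes (drop the spaces before '>', drop the spaces before and after ':'). It folds the tokenizer and the building loop into a single pass with an integer index and a reusable StringBuilder, so it runs in time linear in the line length. LineRewriter and Rewrite are hypothetical names, and the rules come from the thread rather than from any real file format.

```csharp
using System.Text;

public static class LineRewriter
{
    // Rewrites one line in a single left-to-right pass using an integer index.
    // Runs of spaces are held back until the next non-space character decides
    // what happens to them:
    //   - before '>' or ':' they are dropped,
    //   - right after ':' they are skipped,
    //   - anywhere else they are copied unchanged.
    // The caller's StringBuilder is cleared and reused for every line.
    public static string Rewrite(string line, StringBuilder output)
    {
        output.Length = 0;
        int pendingSpaces = 0;     // spaces seen but not yet written
        bool afterColon = false;   // true while skipping spaces after ':'

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == ' ')
            {
                if (!afterColon)
                    pendingSpaces++;
                continue;
            }

            if (c != '>' && c != ':')
                output.Append(' ', pendingSpaces);
            pendingSpaces = 0;

            output.Append(c);
            afterColon = c == ':';
        }

        // Spaces at the very end of the line are kept (no rule removes them).
        output.Append(' ', pendingSpaces);
        return output.ToString();
    }
}
```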

If the lines are very long (i.e. something that's really going to fall
out of the CPU cache), you might consider working with some kind of
pooled char arrays, using array operations to copy ranges, and thus
reducing memory management overhead. That will really help if your strings
are bigger than 85,000 bytes (i.e. about 42,500 chars), since in that case
they'll be allocated on the large object heap, which is only collected
during generation 2 GCs.

To get the benefit from char arrays would mean using
TextReader.ReadBlock() instead of ReadLine(), and breaking into lines in
the tokenizer yourself.
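A rough sketch of the ReadBlock() approach: one char[] is reused for the whole file, and each line is reported as a range within it instead of as a new string. It assumes LF or CR/LF line endings (a lone CR is not treated as a line break, unlike ReadLine()), grows the buffer when a line doesn't fit, and the callback must finish with the range before returning, because the buffer is overwritten afterwards. BlockLineReader and ReadLines are hypothetical names.

```csharp
using System;
using System.IO;

public static class BlockLineReader
{
    // Reads the input in blocks into one reusable char[] and reports every line
    // as (buffer, start, length) instead of allocating a string per line.
    // Assumes LF or CR/LF line endings; a CR right before the LF is excluded.
    // The callback must not keep a reference to the buffer: it is overwritten.
    public static void ReadLines(TextReader reader, Action<char[], int, int> onLine,
                                 int bufferSize = 4096)
    {
        if (bufferSize < 1)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        char[] buffer = new char[bufferSize];
        int filled = 0; // number of valid chars at the front of the buffer

        while (true)
        {
            // A line longer than the buffer: grow it so ReadBlock has room.
            if (filled == buffer.Length)
                Array.Resize(ref buffer, buffer.Length * 2);

            int read = reader.ReadBlock(buffer, filled, buffer.Length - filled);
            filled += read;

            // Report every complete line currently in the buffer.
            int lineStart = 0;
            for (int i = 0; i < filled; i++)
            {
                if (buffer[i] != '\n')
                    continue;
                int end = i;
                if (end > lineStart && buffer[end - 1] == '\r')
                    end--;
                onLine(buffer, lineStart, end - lineStart);
                lineStart = i + 1;
            }

            if (read == 0)
            {
                // End of input: report a final line that has no line break.
                if (filled > lineStart)
                {
                    int end = filled;
                    if (buffer[end - 1] == '\r')
                        end--;
                    onLine(buffer, lineStart, end - lineStart);
                }
                return;
            }

            // Move the unfinished last line to the front of the buffer.
            Array.Copy(buffer, lineStart, buffer, 0, filled - lineStart);
            filled -= lineStart;
        }
    }
}
```

The tokenizer then works directly on `buffer[start]` through `buffer[start + length - 1]`, and only creates a string (for example with `new string(buffer, start, length)`) for the pieces it needs to keep.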

-- Barry

Dan Aldean · May 9, 2006

Thanks Barry, I think this will help me a great deal.

Jon Skeet [C# MVP] · May 10, 2006

Are you sure you're not misinterpreting advice for a different
situation? It's not advisable to read a file by doing:

string result = "";

using (StreamReader reader = ...)
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        result += line;
    }
}

but that's because the strings involved become large, so copying them
for each iteration becomes a problem.

It's not nearly so bad to keep a StringBuilder to collect any content
(if indeed you need to) and use normal string operations on any one
particular line.
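A minimal sketch of that pattern: each line is read and processed with ordinary string methods, and only the finished line is appended to the StringBuilder. ProcessLine is a hypothetical placeholder (here it just trims the line) standing in for the real rules. If the result is going to a file anyway, writing each processed line to a StreamWriter avoids keeping the whole output in memory.

```csharp
using System.IO;
using System.Text;

public static class LineByLine
{
    // Reads one line at a time, processes it with ordinary string methods,
    // and only then appends the finished line to the StringBuilder.
    public static string ProcessAll(TextReader reader)
    {
        StringBuilder result = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            result.AppendLine(ProcessLine(line));
        }
        return result.ToString();
    }

    // Hypothetical placeholder for the real per-line rules: here it only
    // trims leading and trailing whitespace.
    private static string ProcessLine(string line)
    {
        return line.Trim();
    }
}
```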

Have you tried the simplest solution (using strings) and found it too
slow? Have you profiled it?

Dan Aldean · May 11, 2006

Thanks, Jon.
I used
private StringBuilder line = ......
line.Append(sourceFile.ReadLine());

Then I iterated through line to identify the tokens, trim
whitespace, and build the identifiers.

I only used StringBuilder, no strings. Even though strings are more
flexible (IndexOf, Split, Trim), using them excessively comes at a
price. I do not know in advance how big the input file will be.

So the answer is no: I did not use strings, so I don't know how slow it would
be; the immutability was what scared me.

Jon Skeet [C# MVP] · May 11, 2006

Well, are you able to process a line at a time? If so, read the line,
process it, and *then* append it.

That's *definitely* worth trying before you start anything more
complicated (and thus error-prone).

Dan Aldean · May 11, 2006

I can process one line at a time. I also need to determine whether the next line
continues the current one.
I probably need Peek() for that.

Jon Skeet [C# MVP] · May 11, 2006

Dan Aldean said:
I can process one line at a time. I also need to determine if the next line
continues the current one.
Probably I need Peek for that

How often are there continuations? I would suggest keeping a "current
line", and when you read a line, if it's a continuation of the current
line, add it and keep going. If it's not a continuation, process the
"current line", then set the current line to the one you've just read.

Dan Aldean · May 12, 2006

There are quite often continuations, but I don't know what kind of line the next
one is until I find its tokens, which might be anywhere in the string. I might
use a second StringBuilder object for the next line until I determine what
type it is.
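Jon's "current line" approach could look roughly like this. The thread doesn't say how a continuation is recognized, so IsContinuation below assumes, purely for illustration, that a continuation line starts with a space or tab; the class and method names are hypothetical as well. Because a logical line is only processed once the following line has been read, there is no need to Peek() ahead.

```csharp
using System;
using System.IO;
using System.Text;

public static class ContinuationReader
{
    // Hypothetical rule: a line that starts with a space or tab continues
    // the previous logical line. Replace with the real test for your format.
    private static bool IsContinuation(string line)
    {
        return line.Length > 0 && (line[0] == ' ' || line[0] == '\t');
    }

    // Joins continuation lines onto the current logical line and hands each
    // complete logical line to 'process' once the next non-continuation line
    // (or the end of the input) shows that it is finished.
    public static void ReadLogicalLines(TextReader reader, Action<string> process)
    {
        StringBuilder current = new StringBuilder();
        bool haveCurrent = false;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (haveCurrent && IsContinuation(line))
            {
                current.Append(' ').Append(line.Trim());
                continue;
            }

            if (haveCurrent)
                process(current.ToString());

            current.Length = 0;
            current.Append(line);
            haveCurrent = true;
        }

        if (haveCurrent)
            process(current.ToString());
    }
}
```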
