How do I get string's functionality with the StringBuilder?

Dan Aldean

Hello,

I have a file with CR/LF separated text.
string.Trim() and string.Split() came in very handy for processing the content, but
because strings are immutable, memory is managed badly.
StringBuilder is a good alternative for string processing, but it lacks at
least the two methods above.
Not to mention that I would also need EndsWith(), IndexOf(), LastIndexOf(),
and so on.
Does anyone know a workaround, other than writing the code myself to do the
job?
In my opinion, the same methods should be there for StringBuilder too.

Thanks.
 
Barry Kelly

Dan Aldean said:
I have a file with CR/LF separated text.
string.Trim() and string.Split() came very handy to process the content, but

What about StreamReader?

-- Barry
 
Guest

How big is your text file, and how often do you manipulate its elements?
Take into account that the performance gap only becomes significant if you are
iterating over 25,000 or more strings in a loop.

Usually, using "string" is an appropriate solution.

--
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour

"At times one remains faithful to a cause only because its opponents do not
cease to be insipid." (c) Friedrich Nietzsche
 
Ignacio Machin ( .NET/ C# MVP )

Hi,

What do you want to do?

Some of the operations you mention (Split, Trim) create new strings and may have
some impact on performance.
EndsWith, IndexOf, etc. do not change the string at all and create no new
strings, so they have no such impact.
 
Dan Aldean

Thanks for the reply Barry.
I use StreamReader to read but I need to process the content. For example I
need to find trailing spaces before a '>' character and remove them.
Also within the string I should look for a character splitter ':' and remove
the spaces before and after it.
Split() and then Trim() would have helped, but I cannot afford to use
strings as the file can be big and I have to process every line I read.
 
Dan Aldean

Thanks for the reply. Basically the file is big, and a stream alone is not a
solution, as I manipulate the content a lot.
 
Dan Aldean

Thanks Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line.
I use StreamReader to read the lines: myFile.ReadLine()

For example I need to find trailing spaces before a '>' character and remove
them.
Also within the string I should look for a character splitter ':' and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big
strings are not recommended.
 
Barry Kelly

Dan Aldean said:
Thanks Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line
I use StreamReader to read the lines: myFile.ReadLine()

For example I need to find trailing spaces before a '>' character and remove
them.
Also within the string I should look for a character splitter ':' and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big
strings are not recommended.

If you have seriously long lines, I recommend that you use the
techniques of lexical analysis. Basically:

* Read your strings as System.String from StreamReader.ReadLine().
* Tokenize the strings using manual integer indexing and classify them
according to how you want to modify them.
* Write a loop which sucks in from your tokenizer and builds up a
resulting StringBuilder according to your modification rules.

This change would at least make the algorithm linear with respect to
input line length.
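
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (LineScanner, Normalize) are hypothetical, and the rules (drop spaces before ':' or '>', and spaces right after ':') are taken from Dan's description rather than from his actual file format. The line is scanned once with an integer index and the result is built in a StringBuilder, so the work stays linear in the line length.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the .NET framework.
static class LineScanner
{
    // Single pass over the line: runs of spaces are dropped when they sit
    // directly before ':' or '>' or directly after ':'; everything else is copied.
    public static string Normalize(string line)
    {
        StringBuilder sb = new StringBuilder(line.Length);
        int i = 0;
        while (i < line.Length)
        {
            if (line[i] == ' ')
            {
                // Measure the whole run of spaces before deciding what to do with it.
                int start = i;
                while (i < line.Length && line[i] == ' ')
                {
                    i++;
                }
                bool beforeDelimiter = i < line.Length && (line[i] == ':' || line[i] == '>');
                bool afterColon = sb.Length > 0 && sb[sb.Length - 1] == ':';
                if (!beforeDelimiter && !afterColon)
                {
                    // Ordinary spaces: copy the run unchanged.
                    sb.Append(line, start, i - start);
                }
            }
            else
            {
                sb.Append(line[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

With these assumed rules, LineScanner.Normalize("<name : value   >") gives "<name:value>", and spaces elsewhere in the line are left alone.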

If the lines are very long (i.e. something that's going to really fall
out of the CPU cache), you might consider working with some kind of
pooled char arrays, using array operations to copy ranges, and thus
reduce memory management overhead. That will really help if your strings
are bigger than about 85,000 bytes (i.e. roughly 42,500 chars), since in
that case they'll be allocated on the large object heap and won't get
collected until a generation 2 GC.

To get the benefit from char arrays would mean using
TextReader.ReadBlock() instead of ReadLine(), and breaking into lines in
the tokenizer yourself.

-- Barry
 
Jon Skeet [C# MVP]

Dan Aldean said:
Thanks Ignacio.
I have a class that handles this file, which is very big.
I have a method that reads and processes the content of each line
I use StreamReader to read the lines: myFile.ReadLine()

For example I need to find trailing spaces before a '>' character and remove
them.
Also within the string I should look for a character splitter ':' and remove
the spaces before and after it. I need to get the content between two
separators and save it.
Split() and then Trim() would have helped a lot, but with a file this big
strings are not recommended.

Are you sure you're not misinterpreting advice for a different
situation? It's not advisable to read a file by doing:

    string result = "";

    using (StreamReader reader = ...)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            result += line;
        }
    }

but that's because the strings involved become large, so copying them
for each iteration becomes a problem.

It's not nearly so bad to keep a StringBuilder to collect any content
(if indeed you need to) and use normal string operations on any one
particular line.

Have you tried the simplest solution (using strings) and found it too
slow? Have you profiled it?
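
As a starting point for that experiment, here is a minimal sketch of the "simple" approach: ordinary string operations on each line, with a StringBuilder used only to collect the results. The names LineCleaner, CleanLine and ProcessAll are hypothetical, and the rules are a guess at Dan's requirements (spaces around ':' and spaces directly before '>' are removed).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET framework.
static class LineCleaner
{
    // Cleans one line with plain string operations.
    public static string CleanLine(string line)
    {
        string[] parts = line.Split(':');
        for (int i = 0; i < parts.Length; i++)
        {
            // Trim only the sides that touch a ':' separator.
            if (i > 0) parts[i] = parts[i].TrimStart(' ');
            if (i < parts.Length - 1) parts[i] = parts[i].TrimEnd(' ');
        }
        string result = string.Join(":", parts);

        // Remove spaces that sit directly before a '>', one per pass.
        while (result.Contains(" >"))
        {
            result = result.Replace(" >", ">");
        }
        return result;
    }

    // Reads line by line, cleans each line, and collects the output.
    public static string ProcessAll(TextReader reader)
    {
        StringBuilder output = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            output.Append(CleanLine(line)).Append('\n');
        }
        return output.ToString();
    }
}
```

The temporary strings created per line are short-lived, so this is a reasonable baseline to profile before trying anything more elaborate.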
 
Dan Aldean

Thanks Jon.
I used
private StringBuilder line = ......
line.Append(sourceFile.ReadLine());

Then I iterated through "line" to identify the tokens, trim
whitespace, and build the identifiers.

I only used StringBuilder, no strings. Even though strings are more
flexible (IndexOf, Split, Trim), using them excessively comes at a
price. I do not know in advance how big the input file will be.

So the answer is no, I did not use strings, and I don't know how slow it would
be; the immutability was what scared me.
 
Jon Skeet [C# MVP]

Dan Aldean said:
I used
private StringBuilder line = ......
line.Append(sourceFile.ReadLine());

Then I iterated through "line" to identify the tokens, trim
whitespace, and build the identifiers.

I only used StringBuilder, no strings. Even though strings are more
flexible (IndexOf, Split, Trim), using them excessively comes at a
price. I do not know in advance how big the input file will be.

So the answer is no, I did not use strings, and I don't know how slow it would
be; the immutability was what scared me.


Well, are you able to process a line at a time? If so, read the line,
process it, and *then* append it.

That's *definitely* worth trying before you start anything more
complicated (and thus error-prone).
 
Dan Aldean

I can process one line at a time. I also need to determine whether the next line
continues the current one.
I probably need Peek() for that.

 
Jon Skeet [C# MVP]

Dan Aldean said:
I can process one line at a time. I also need to determine if the next line
continues the current one.
Probably I need Peek for that

How often are there continuations? I would suggest keeping a "current
line", and when you read a line, if it's a continuation of the current
line, add it and keep going. If it's not a continuation, process the
"current line", then set the current line to the one you've just read.
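
A rough sketch of that "current line" loop, assuming the continuation test can be expressed as a function of the line just read. The names ContinuationReader and ReadLogicalLines are hypothetical, and the isContinuation delegate stands in for whatever rule the real file format uses, which the thread does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET framework.
static class ContinuationReader
{
    // Groups physical lines into logical lines. isContinuation is a
    // caller-supplied (assumed) rule saying whether a line continues the previous one.
    public static List<string> ReadLogicalLines(TextReader reader, Func<string, bool> isContinuation)
    {
        List<string> result = new List<string>();
        StringBuilder current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (current != null && isContinuation(line))
            {
                // Continuation: extend the current logical line and keep going.
                current.Append(line);
            }
            else
            {
                // Not a continuation: the current logical line is complete.
                if (current != null) result.Add(current.ToString());
                current = new StringBuilder(line);
            }
        }
        // Flush the last logical line, if any.
        if (current != null) result.Add(current.ToString());
        return result;
    }
}
```

Because the decision is made after each line is read, no Peek() is needed: the previous logical line is simply held back until the next line shows it is finished.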
 
Dan Aldean

Continuations are quite frequent, but I don't know what type the next line
is until I find tokens, which might be anywhere in the string. I might
use a second StringBuilder object for the next line until I determine what
type it is.
 
