Fast way for extracting tokens from a string.

  • Thread starter: Jensen bredal

Jensen bredal

Hello,
I have a string formatted in the following way:

s = 1;32;100;32;09;.........;09;76;

I need to extract the numbers separated by the semicolons.
The list can contain several thousands of items and the code is
time critical.

How can I best extract them in C#?

Many thanks in advance

JB
 
Jensen,

I would just use the Split method on the string itself, passing a
semicolon. It's probably going to give you the fastest performance.
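
For illustration, a minimal sketch of the Split approach. The class and method names (SplitTokens, Parse) are made up for this example, and it assumes every token fits in an int:

```csharp
using System;
using System.Globalization;

public static class SplitTokens
{
    // Splits a string such as "1;32;100;" on ';' and parses each piece.
    public static int[] Parse(string s)
    {
        // RemoveEmptyEntries drops the empty token after the trailing ';'.
        string[] parts = s.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        int[] values = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // Leading zeros are accepted, so "09" parses as 9.
            values[i] = int.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return values;
    }
}
```

Note that Split allocates one string per token plus the array that holds them, and int.Parse throws a FormatException on any token that is not a number.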

    However, I'm not sure you should build the string at all. If the
data is rather large, you should parse it as you retrieve it. For
example, if it comes from a file or is read over a stream, I would parse
the characters as I read them from the stream, not after the whole string
has been constructed.
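
A rough sketch of what parsing while reading could look like, assuming the data arrives through a TextReader. The names StreamTokens and Read are hypothetical, and the sketch assumes non-negative numbers only:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamTokens
{
    // Yields each number as soon as its terminating ';' has been read,
    // without ever building the full string or any per-token substrings.
    public static IEnumerable<int> Read(TextReader reader)
    {
        int value = 0;
        bool hasDigits = false;
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c >= '0' && c <= '9')
            {
                // checked: throw OverflowException instead of wrapping silently.
                value = checked(value * 10 + (c - '0'));
                hasDigits = true;
            }
            else if (c == ';')
            {
                if (hasDigits) yield return value;
                value = 0;
                hasDigits = false;
            }
            else if (!char.IsWhiteSpace((char)c))
            {
                throw new FormatException("Unexpected character: " + (char)c);
            }
            // Whitespace is skipped wherever it appears (an assumption of this sketch).
        }
        // Emit a final token that has no trailing ';'.
        if (hasDigits) yield return value;
    }
}
```

Usage would be along the lines of `foreach (int n in StreamTokens.Read(new StreamReader(path)))`, where `path` is a placeholder for wherever the data actually lives.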

    If you got it from someplace else, like a database field (where it is
already in string format), and you can't do anything about it being in a
string, use the Split method.

Hope this helps.
 
Nicholas Paldino said:
I would just use the Split method on the string itself, passing a
semicolon. It's probably going to give you the fastest performance.

That would not be my expectation. Split is going to create "several
thousands of" little strings, greatly increasing memory pressure. This
is a classic case where enumerating through Regex.Matches should be
faster than creating an array of strings. Plus, a regex could ignore
white space, check that the tokens are all digits, etc.
 
Jensen said:
Could you provide some sample code?

<semicode> Your sample string

s = 1;32;100;32;09;.........;09;76;

contains runs of digits, each followed by a semicolon. The regex

(\d+);

will match one run of digits followed by a semicolon, capturing the
digits. This capture is only a matter of calculating string length and
start offset - no substring operations are done until you read a Value
property. So, do a foreach on Regex.Matches() of your data and the
@"(\d+);" regex pattern. Read the second group in each match (Groups[1];
Groups[0] is the whole match) - that's got the captured digits, without
the semicolon.

</semicode>
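
The description above might translate into C# roughly as follows. This is a sketch only: the names RegexTokens and Parse are invented for the example, and RegexOptions.Compiled is an optional choice that was not part of the original suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexTokens
{
    // One or more digits, captured, followed by a semicolon.
    private static readonly Regex Token = new Regex(@"(\d+);", RegexOptions.Compiled);

    public static List<int> Parse(string s)
    {
        var values = new List<int>();
        foreach (Match m in Token.Matches(s))
        {
            // Groups[0] is the whole match ("32;");
            // Groups[1] is the captured digits only ("32").
            values.Add(int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

Two caveats. As written, the pattern silently skips anything that does not match, so "1;x;32;" yields 1 and 32 rather than an error. Also, \d in .NET matches any Unicode decimal digit unless RegexOptions.ECMAScript is set, while int.Parse accepts only 0-9, so [0-9]+ may be the safer pattern. Whether this beats Split on your data is something only a timing run on a representative string will tell you.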
 