regex vs. split for speed....how do I use a pointer?


Chance Hopkins

Background
---------------

I'm running some tests on a for loop where I need to split a string at the
first whitespace and get all characters from the start of the line to the
first space.

I'm using two methods and getting almost identical timings (every test run
varies a bit) from either:

obj.Add(myString.Split(new char[]{' '})[0]);

or

Regex myRegex = new Regex(@"[^\s]+"); // created outside the loop, of course

obj.Add(myRegex.Match(myString).ToString());

It takes about 50-58 seconds to go through a list of 800.

Question
-----------------

I read a post where they mentioned the possibility of using unsafe code with
a pointer to move through the string to the first whitespace and suggested
this might be fastest.

Could anyone give me an idea of how to do this with a simple string, or point
me at the proper reading material?

I know C# and Java pretty well, but don't really understand the C or C++
stuff.

Thanks
 
Sorry, never mind. Neither of those approaches has anything to do with my
speed problem. I tried this:
obj.Add("");

and it still took as long. Then I commented that line out completely and it
still took the same amount of time.
 
Have you tried a simple String.IndexOf call to find the space, followed by
String.Substring to take everything before it?

Please report back if you try this.

-Chris


 
I don't know where you got 58 seconds from; it must be something else you
are doing.

Test results from an 800-line file on a 426 MHz Bulverde PPC. The file is
some text grabbed from the web.

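1. Regex - 890 ms

// Data is assumed to be the string[] of lines read from the file.
Regex myRegex = new Regex(@"[^\s]+");

for (int i = 0; i < Data.Length; i++)
{
    // This Split is not needed for the regex test; as posted, it is part of
    // the timed loop.
    string[] res = Data[i].Split(' ');
    if (res.Length > 0)
    {
        Match m = myRegex.Match(Data[i]);
        // Match never returns null; check Success instead.
        if (m.Success)
        {
            string s = m.Value;
        }
    }
}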

2. String.Split - 500 ms

// Data is assumed to be the string[] of lines read from the file.
for (int i = 0; i < Data.Length; i++)
{
    string[] res = Data[i].Split(' ');
    if (res.Length > 0)
    {
        string s = res[0];
    }
}



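3. String.Substring + String.IndexOf(' ') - 66 ms

// Data is assumed to be the string[] of lines read from the file.
for (int i = 0; i < Data.Length; i++)
{
    // IndexOf returns -1 when the line has no space; take the whole line then.
    int pos = Data[i].IndexOf(' ');
    string s = pos < 0 ? Data[i] : Data[i].Substring(0, pos);
}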
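
On the original pointer question: here is a minimal sketch of what the unsafe version could look like, assuming the project is compiled with the /unsafe switch. The names FirstTokenDemo and FirstTokenUnsafe are made up for illustration. This variant was not part of the timings above, so any speed advantage over IndexOf is unverified.

```csharp
using System;

public static class FirstTokenDemo
{
    // Hypothetical helper: returns the characters before the first
    // whitespace character, or the whole string if there is none.
    public static unsafe string FirstTokenUnsafe(string line)
    {
        if (line == null) throw new ArgumentNullException("line");

        // 'fixed' pins the string so the GC cannot move it while we
        // hold a raw pointer into its character data.
        fixed (char* start = line)
        {
            char* p = start;
            char* end = start + line.Length;

            // Walk forward until whitespace or the end of the string.
            while (p < end && !char.IsWhiteSpace(*p))
            {
                p++;
            }

            // The pointer difference is the token length in chars.
            return new string(start, 0, (int)(p - start));
        }
    }
}
```

Usage would be something like string verb = FirstTokenDemo.FirstTokenUnsafe("GET /index.html"); which yields "GET". Unlike the IndexOf(' ') loop, this stops at any whitespace (tabs included), which is closer to what the \s regex does.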
 
You are right.

It has something to do with reading the buffer out of the TcpClient I'm
using.

I get the data almost instantly (I'm testing from the cradle on a DSL
connection), but the loop seems to take a really long time.

Thanks for the help. I guess I need to keep trying to locate the problem.

Does anyone have any TcpClient tips?
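
One common cause of a slow receive loop is pulling data out of the stream a byte or a small chunk at a time. Whether that is the problem here is only a guess, since the reading code was not posted. Below is a minimal sketch of a buffered, line-at-a-time read. It accepts any Stream so that client.GetStream() can be passed in. The names LineReaderDemo and ReadFirstTokens are invented for the example, and ASCII encoding is an assumption.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Text;

public static class LineReaderDemo
{
    // Hypothetical helper: reads the stream line by line and collects
    // the text before the first space on each line.
    public static ArrayList ReadFirstTokens(Stream stream)
    {
        ArrayList tokens = new ArrayList();

        // StreamReader buffers internally, so the underlying stream is
        // read in larger blocks rather than one byte per call.
        // ASCII is assumed here; use the encoding the server really sends.
        StreamReader reader = new StreamReader(stream, Encoding.ASCII);

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // IndexOf returns -1 when there is no space on the line.
            int pos = line.IndexOf(' ');
            tokens.Add(pos < 0 ? line : line.Substring(0, pos));
        }
        return tokens;
    }
}
```

With a TcpClient this would be called as LineReaderDemo.ReadFirstTokens(client.GetStream()). Note that ReadLine returns null only once the remote side closes the connection, so on a connection that stays open the loop will block waiting for more data.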


Alex Feinman said:
I don't know where did you get 58 sec from - it must be something else you
do.

--
Alex Feinman
---
Visit http://www.opennetcf.org
 