Regular expressions performance problem

James Dean

I wanted to use regular expressions, but unfortunately they are too
slow. Should they be this slow, or am I doing something wrong? I am
reading bytes from a file, converting each one to a char, and building a
string out of the individual bytes. I check whether it is in the
correct format and take out the various parameters I need. It looked
nice and neat, so I am not happy that I may have to use another
method. Any alternative solutions?



*** Sent via Devdex http://www.devdex.com ***
Don't just participate in USENET...get rewarded for it!
 
Niki Estner

James Dean said:
> I wanted to use regular expressions, but unfortunately they are too
> slow. Should they be this slow, or am I doing something wrong?

Usually they're pretty fast. In my experience, they are faster than most
other available searching methods (like String.IndexOf).
> I am reading bytes from a file, converting each one to a char, and
> building a string out of the individual bytes.

Do you use a StringBuilder to create the string?
Why don't you use a StreamReader class?
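Here is a minimal sketch of the StreamReader suggestion. It assumes the whole file fits in memory and that each byte should map to the character with the same code, which is what Convert.ToChar(byte) does; ISO-8859-1 decoding gives that same mapping. The class and method names are hypothetical:

```csharp
using System.IO;
using System.Text;

static class HeaderReader
{
    // Reads the whole file in one call. StreamReader buffers the reads and
    // decodes bytes to chars, replacing the per-byte Convert.ToChar loop.
    public static string ReadAllText(string path)
    {
        // ISO-8859-1 maps each byte 0-255 to the char with the same code,
        // the same result Convert.ToChar(byte) gives. Change it if the file uses another encoding.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        using (StreamReader reader = new StreamReader(path, latin1))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the commands are line-based, calling ReadLine() on the same reader would let each command be checked as it arrives. The thread doesn't say whether that line structure exists, so it is an assumption.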
> I check whether it is in the correct format and take out the various
> parameters I need. It looked nice and neat, so I am not happy that I
> may have to use another method. Any alternative solutions?

As I said, regexes are usually quite fast, but of course that depends on
the pattern you match against. So you could do the following:
1. Find out if the regex is really the bottleneck.
2. If it is, post your regex here, maybe with some sample data.
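For step 1, a rough timing helper is enough. This sketch uses System.Diagnostics.Stopwatch to time any piece of code repeated a given number of times. The Profiler name is hypothetical:

```csharp
using System;
using System.Diagnostics;

static class Profiler
{
    // Runs `step` the given number of times and returns the elapsed wall-clock time in ms.
    public static long Measure(Action step, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            step();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

For example, time `() => Regex.IsMatch(sample, pattern)` and, separately, the loop that builds the string from bytes. Whichever number is clearly larger points to the part worth optimizing. Wall-clock figures like these are rough, so compare their relative sizes rather than the absolute values.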

Niki
 
Tamir Khason

Regex is the fastest method to search in all languages; the only problem
with it is the complicated syntax. I think your performance issue is only
the result of a wrong regex. If you want, post your regex and the text you
are searching in, and we will try to help you with it.
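Some regex cost has nothing to do with the pattern itself. One example is building a new Regex object on every call, which the GetCommand code posted later in this thread does with `new Regex(...)`. Here is a minimal sketch that builds the pattern once in a static field and reuses it, assuming the check runs many times. RegexOptions.Compiled makes construction slower but matching faster, so it only pays off for a pattern that is used repeatedly. The class name is hypothetical. The `{1}` from the original pattern is dropped because `[A-Za-z]{1}` and `[A-Za-z]` match the same thing:

```csharp
using System.Text.RegularExpressions;

static class CommandPatterns
{
    // Constructed once and shared by every call.
    // RegexOptions.Compiled: slower to construct, faster to match.
    static readonly Regex CommandCheck =
        new Regex(@"((&|;)\d+(&|;))+[A-Za-z]", RegexOptions.Compiled);

    // Same result as Regex.Match(input, <same pattern>).Success.
    public static bool LooksLikeCommand(string input)
    {
        return CommandCheck.IsMatch(input);
    }
}
```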
 
James Dean

byte myValue = fileMemBufferIn.GetCurrentByte;
char convertValue = Convert.ToChar(myValue);
//Select is a string type
Select += Convert.ToString(convertValue);


I use this to check whether the string is in the correct format:
if (Regex.Match(Select, @"((&|;)\d+(&|;))+[A-Za-z]{1}").Success)

Then I get each of the parameters from the command:
public bool GetCommand(string headerValue)
{
    paramCount = 0;
    Regex SeperateParams = new Regex(@"(&|;)\d+((&|;) | (&[a-zA-Z]))*");
    foreach (Match myMatches in SeperateParams.Matches(headerValue))
    {
        string values = myMatches.Value.ToString();
        values = Regex.Replace(values, @"&", "");
        values = Regex.Replace(values, @";", "");
        values = Regex.Replace(values, @"[A-Za-z]", "");
        HeaderParameters[paramCount] = int.Parse(values);
        paramCount++;
    }
    this.GetParameter1 = HeaderParameters[0];
    this.GetParameter2 = HeaderParameters[1];
    this.GetParameter3 = HeaderParameters[2];
    this.GetParameter4 = HeaderParameters[3];
    for (int t = 0; t < 4; t++)
    {
        HeaderParameters[t] = -1;
    }
    return true;

    return false;
}

The commands I read in are in the format
"&Param1;Param2;Param3.....&a" (where the final letter is some character from a-z)

 
Niki Estner

The performance problem is most probably due to your string usage, not due
to the regex.

James Dean said:
> byte myValue = fileMemBufferIn.GetCurrentByte;
> char convertValue = Convert.ToChar(myValue);
> //Select is a string type
> Select += Convert.ToString(convertValue);

Every time this line is hit, it creates a new string, copies the old one
into it, discards the old one, and continues with the new one. You actually
have an O(n^2) string-reading algorithm... Use the StreamReader class!
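fileMemBufferIn is James's own class, so this sketch assumes the bytes are already available as a byte[]. It keeps his byte-by-byte conversion but appends everything to a single StringBuilder. That way the work grows linearly with the input instead of copying the whole string on every byte. The class name is hypothetical:

```csharp
using System.Text;

static class ByteAccumulator
{
    // Appends each byte as a char to one growable buffer instead of `Select += ...`,
    // which copies the entire string on every byte.
    public static string BuildString(byte[] bytes)
    {
        StringBuilder builder = new StringBuilder(bytes.Length); // pre-size: no regrowth needed
        foreach (byte b in bytes)
        {
            builder.Append((char)b); // same char as Convert.ToChar(b)
        }
        return builder.ToString();
    }
}
```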
> I use this to check whether the string is in the correct format:
> if (Regex.Match(Select, @"((&|;)\d+(&|;))+[A-Za-z]{1}").Success)

You didn't supply any sample data. It does look OK, though.
> Then I get each of the parameters from the command:
> public bool GetCommand(string headerValue)
> {
>     paramCount = 0;
>     Regex SeperateParams = new Regex(@"(&|;)\d+((&|;) | (&[a-zA-Z]))*");
>     foreach (Match myMatches in SeperateParams.Matches(headerValue))
>     {
>         string values = myMatches.Value.ToString();
>         values = Regex.Replace(values, @"&", "");
>         values = Regex.Replace(values, @";", "");
>         values = Regex.Replace(values, @"[A-Za-z]", "");

Ugh. You know that each of those "replace" operations again has to create a
new string (see above)? Use capturing parentheses to get data out of the
regex.
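Here is a minimal sketch of extracting the numbers with a capturing group. It assumes every parameter is a non-negative integer that fits in an int and is preceded by '&' or ';'. The class name is hypothetical. Group 1 holds only the digits, so no Replace calls are needed. A List&lt;int&gt; grows as needed instead of relying on a fixed-size array:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CommandParser
{
    // '&' or ';' followed by digits; the parentheses capture only the digits (group 1).
    static readonly Regex ParamPattern = new Regex(@"[&;](\d+)");

    public static List<int> GetParameters(string headerValue)
    {
        List<int> parameters = new List<int>(); // grows as matches are added
        foreach (Match m in ParamPattern.Matches(headerValue))
        {
            // Groups[0] is the whole match (e.g. "&12"); Groups[1] is just "12".
            parameters.Add(int.Parse(m.Groups[1].Value));
        }
        return parameters;
    }
}
```

For "&12;345;6&b" this yields 12, 345 and 6. The trailing "&b" is skipped because no digits follow the '&'.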
>         HeaderParameters[paramCount] = int.Parse(values);

Does HeaderParameters adjust its size automatically? Or do you set it to
the correct size at some point?
>         paramCount++;
>     }
>     this.GetParameter1 = HeaderParameters[0];
>     this.GetParameter2 = HeaderParameters[1];
>     this.GetParameter3 = HeaderParameters[2];
>     this.GetParameter4 = HeaderParameters[3];

Whatever this is for, it looks pretty ugly. It looks as if it should be a
loop.
>     for (int t = 0; t < 4; t++)
>     {
>         HeaderParameters[t] = -1;
>     }

Using something like "HeaderParameters.Length" is probably better here, as
it would allow you to change the size of HeaderParameters some day without
breaking code like that.
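Here is a sketch of the loop-based version. It assumes the four values can be handed out as an array instead of through GetParameter1 to GetParameter4. The HeaderState class is hypothetical. Clone copies however many slots the array has, and the reset loop runs to Length rather than a hard-coded 4. Resizing HeaderParameters later would then need no other change here:

```csharp
static class HeaderState
{
    // Returns a copy of the current values, then resets every slot to -1.
    public static int[] TakeAndReset(int[] headerParameters)
    {
        int[] snapshot = (int[])headerParameters.Clone(); // copies the whole array
        for (int t = 0; t < headerParameters.Length; t++)  // no hard-coded size
        {
            headerParameters[t] = -1;
        }
        return snapshot;
    }
}
```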
>     return true;
>
>     return false;
> }
>
> The commands I read in are in the format
> "&Param1;Param2;Param3.....&a" (where the final letter is some character from a-z)

Hardly. The regex you use to check your input for correctness won't eat
this.
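To see what the check actually does, here is a sketch that runs James's pattern and an anchored alternative against a sample command. The sample "&1;2;3&a" and the anchored pattern are a guess at the intended format, not something stated in the thread. They assume parameters are plain integers separated by ';', with '&' before the first one and '&' plus one letter at the end. The class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

static class FormatComparison
{
    // James's pattern ({1} dropped; it changes nothing). Regex.Match is unanchored,
    // so it succeeds if any substring of the input fits.
    public const string Original = @"((&|;)\d+(&|;))+[A-Za-z]";

    // Assumed format: '&', integers separated by ';', then '&' and one letter.
    // ^ and $ force the whole input to fit.
    public const string Anchored = @"^&\d+(;\d+)*&[A-Za-z]$";

    // Returns the matched text, or null when there is no match.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

On "&1;2;3&a", the original pattern matches only the tail ";3&a". Each repetition needs its own delimiter on both sides, so "&1;" and ";2;" cannot be chained. It matches "junk;3&a" in exactly the same way. The check therefore passes whenever some substring fits, while the anchored pattern requires the whole string to match. (In .NET, `$` also matches before a final newline; `\z` is stricter.)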

Niki
 
Tamir Khason

What do you want to parse?
Strings such as
&anything;anything;anything&anything;anything;anything;anything;anything;anything;anything;anything ?
 
