Stop Regex.Match(string, int) If First Character Doesn't Match

jehugaleahsa

Hello:

I have a large input string. Instead of constantly chopping off the
front of the string (which is inefficient), I just want to maintain
the index. I then pass the index into the Regex.Match(string, int)
method.

I would like this method to stop searching as soon as it realizes that
the first character is not a match.

You can't use the ^ character, in this case, because the method does
not consider the given index as the first index in the string.

The problem with letting the pattern check past the first character is
the potential performance hit. If the first character is not a match
and the input string is long, it could take a long time for the regex
to work through the remainder of the input. Worse, it might succeed,
but in the wrong location.
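
To make both problems concrete, here is a small sketch. The input string, the \d+ pattern, and the StartAtDemo class are made up for illustration; they are not the real expression from this post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StartAtDemo
{
    // Hypothetical input, chosen only for illustration.
    public const string Input = "abc 123 def 456";

    // No anchor: the engine is free to scan forward from startAt.
    public static Match Unanchored(int startAt)
    {
        return new Regex(@"\d+").Match(Input, startAt);
    }

    // "^" still refers to the true start of the string, not to startAt.
    public static Match CaretAnchored(int startAt)
    {
        return new Regex(@"^\d+").Match(Input, startAt);
    }

    public static void Main()
    {
        // Index 8 is the 'd' of "def", so no number starts there,
        // yet the match succeeds further along at index 12 ("456").
        Match m = Unanchored(8);
        Console.WriteLine(m.Success + " at " + m.Index);

        // Index 4 is the '1' of "123", but "^" refuses to match there.
        Console.WriteLine(CaretAnchored(4).Success);
    }
}
```

The first call is the "succeeds in the wrong location" case; the second shows why "^" does not help when a start index is passed.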

Here is a temporary implementation that inefficiently checks the
remainder of the input string.

public bool IsMatch(string input, int startingIndex, out int length)
{
    Match match = _regularExpression.Match(input, startingIndex);
    if (match.Success && match.Index == startingIndex)
    {
        length = match.Length;
        return true;
    }
    else
    {
        length = 0;
        return false;
    }
}

The regular expression is defined elsewhere. This is really only a
performance problem when the input is very large, but I'd like to see
if there is a workaround.

Thanks for any pointers!

~Travis Parks
 
jehugaleahsa

I have a large input string. Instead of constantly chopping off the
front of the string (which is inefficient), I just want to maintain
the index. I then pass the index into the Regex.Match(string, int)
method.
I would like this method to stop searching as soon as it realizes that
the first character is not a match.
You can't use the ^ character, in this case, because the method does
not consider the given index as the first index in the string.
[...]

I'm surprised that Regex doesn't work as you'd hope.  You might consider  
filing a report on Microsoft's Connect web site, describing the current  
behavior as a bug, and see if they agree.  Can't hurt.

In the meantime, one approach is to use String.Substring() instead of
providing the index to Regex. Note that in .NET, Substring() allocates a
new string and copies the characters rather than sharing the original
string's buffer, so on a large input it costs about as much as chopping
off the front of the string. Once you have the substring, you can anchor
your expression with "^".
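
A rough sketch of the Substring() route, for comparison. The ^\d+ pattern and the SubstringMatcher class are placeholders I made up, since the real expression is defined elsewhere.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstringMatcher
{
    // Hypothetical pattern; "^" anchors it to the start of whatever
    // string is handed to Match.
    private static readonly Regex Digits = new Regex(@"^\d+");

    public static bool IsMatch(string input, int startingIndex, out int length)
    {
        // The substring begins at startingIndex, so "^" now lines up
        // with the position we care about.
        Match match = Digits.Match(input.Substring(startingIndex));
        length = match.Success ? match.Length : 0;
        return match.Success;
    }

    public static void Main()
    {
        int length;
        Console.WriteLine(IsMatch("abc 123 def", 4, out length) + " " + length);
        Console.WriteLine(IsMatch("abc 123 def", 8, out length) + " " + length);
    }
}
```

Because of the anchor, the second call fails right away instead of scanning the rest of the input.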

Looking at the docs for Regex.Match(), it looks to me as though you may  
also be able to use the "\G" assertion in lieu of the "^" character, to  
accomplish the same thing, instead of using String.Substring().  Normally  
that assertion would require contiguous matches, but it also has the  
effect of requiring the match to start at the first character considered  
when there is no "previous match".
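
Here is roughly what the "\G" version of the original IsMatch might look like. Again, the \d+ pattern and the AnchoredMatcher class are stand-ins for illustration, not the real expression.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredMatcher
{
    // Hypothetical pattern. "\G" requires the match to begin exactly
    // where the search starts, i.e. at the index passed to Match.
    private static readonly Regex Digits = new Regex(@"\G\d+");

    public static bool IsMatch(string input, int startingIndex, out int length)
    {
        Match match = Digits.Match(input, startingIndex);
        length = match.Success ? match.Length : 0;
        return match.Success;
    }

    public static void Main()
    {
        const string input = "abc 123 def 456";
        int length;

        // Index 4 is the '1' of "123": matches, length 3.
        Console.WriteLine(IsMatch(input, 4, out length) + " " + length);

        // Index 8 is the 'd' of "def": fails instead of finding "456".
        Console.WriteLine(IsMatch(input, 8, out length) + " " + length);
    }
}
```

With "\G" in place, the match.Index == startingIndex check from the original method is no longer needed, because a successful match can only start at startingIndex. If the pattern comes from elsewhere, one option (an assumption on my part, not something from the docs) is to wrap it as "\G(?:" + pattern + ")" before constructing the Regex.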

Pete

I need to learn to read: http://msdn.microsoft.com/en-us/library/3583dcyh.aspx.

It says explicitly what to do. I feel like a dope.

Thanks,
Travis
 
