I need a workaround for a Regex limitation

Liviu Uba

Hi,

I noticed a strange behaviour, strange from my point of view anyway.

Say I have the regular expression "^\w+" and the following code:

string s = "first second";

Regex r = new Regex(@"^\w+", RegexOptions.CaseInsensitive...);
Match m = r.Match(s); // here m.Success is true

m = r.Match(s, 6); // here m.Success is false

I would have expected ^ to match at the position where the search starts,
but that is not the case.
It is possible to work around the fact that ^ does not match in this kind of
call, but the workarounds I can see have a serious performance cost: either
match the expression without ^ and check that the first match starts where
the search began, or copy the string from the position I query, which is not
an option for a huge string.

Does anyone know how to make ^ work here?
 
Kevin Spencer

The '^' anchor works just fine. It matches only if the word character
sequence starts at the beginning of the string. Regex.Match(s, 6) returns
the first match that starts at or after index 6 in the string. Since '^' can
only be satisfied at index 0, there is no such match, and the Match you get
back has Success = false. You think that it should count the index as the
beginning of the string because you are only thinking about your specific
problem. The index is *not* the beginning of the string. Consider the
following, for example:

string[] strings = new string[] {"one", "two", "three", "four", "five", "six"};
string s = "one twothree four fivesix";
Match newResult; // Regex.Match returns a Match, not a string
Regex r = new Regex("\\w");

for (int i = 0; i < strings.Length; i++)
{
    // start each search at the position where strings[i] occurs in s
    newResult = r.Match(s, s.IndexOf(strings[i]));
}

In this case, you are looking for any match in the string that is found in
the array, and your results depend upon the position in the string. If the
Match returned is unsuccessful (Success is false; Match never returns null),
the item in the array is not in the string. The
string "one" would be found, as would the string "four." But the strings
"twothree" and "fivesix" would not be found. Now, if you were to use "\w+"
you would not be able to find any other Match than the first.

In other words, the beginning of the string is the logical beginning of the
string. The index in the string is not relevant or related to the beginning
of the string.
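
To make the point above concrete, here is a minimal sketch (the class and method names are made up for illustration) showing that the start index passed to Match(string, int) only controls where scanning begins: '^' stays pinned to index 0, and Match.Index is still reported relative to the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class CaretWithStartAt
{
    // True if ^\w+ matches when the search starts at startAt.
    public static bool CaretMatchesFrom(string input, int startAt)
    {
        return new Regex(@"^\w+").Match(input, startAt).Success;
    }

    // Index of the first \w+ match found at or after startAt, or -1 if none.
    public static int FirstWordIndexFrom(string input, int startAt)
    {
        Match m = new Regex(@"\w+").Match(input, startAt);
        return m.Success ? m.Index : -1;
    }

    static void Main()
    {
        string s = "first second";
        Console.WriteLine(CaretMatchesFrom(s, 0));   // ^ can be satisfied at index 0
        Console.WriteLine(CaretMatchesFrom(s, 6));   // ^ is not satisfied at index 6
        Console.WriteLine(FirstWordIndexFrom(s, 5)); // scans forward from the space
    }
}
```

Without the anchor, the engine is free to scan forward from the start index, which is why the unanchored pattern still finds "second" when the search begins on the space at index 5.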

Now, if you can state the business rule you're trying to satisfy, I think I
can help with a solution.
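
In the meantime, if the rule really is "match a word starting exactly at a given offset, without copying the string", two standard .NET options may be worth trying. This is only a sketch under that assumption; the class and method names are invented for the example. The \G anchor is satisfied at the start position passed to Match(string, int), and the Match(string, int, int) overload treats the given range as if it were the whole input, so '^' is satisfied at the start of the range.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class AnchorAtOffset
{
    // \G must match exactly where the search starts, so the engine
    // does not scan forward looking for a later match.
    static readonly Regex WordAtG = new Regex(@"\G\w+");

    // The original pattern, unchanged.
    static readonly Regex WordAtCaret = new Regex(@"^\w+");

    // Option 1: swap ^ for \G and keep using Match(input, startat).
    public static Match MatchWithG(string input, int start)
    {
        return WordAtG.Match(input, start);
    }

    // Option 2: keep ^ and search only the range [start, input.Length).
    public static Match MatchInRange(string input, int start)
    {
        return WordAtCaret.Match(input, start, input.Length - start);
    }

    static void Main()
    {
        string s = "first second";
        Match a = MatchWithG(s, 6);
        Match b = MatchInRange(s, 6);
        Console.WriteLine("{0} {1}", a.Success, a.Value);
        Console.WriteLine("{0} {1}", b.Success, b.Value);
        // Starting on the space at index 5 fails: \G does not skip ahead.
        Console.WriteLine(MatchWithG(s, 5).Success);
    }
}
```

Neither option copies the input string. Which one fits better depends on whether '$' and lookbehinds should also be confined to the range (second option) or should still see the rest of the string (first option).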

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

Hard work is a medication for which
there is no placebo.