Complex Regular Expression

 
 
ENIZIN
      22nd Mar 2005
Hello,

I'm having a bit of trouble creating my regular expression and need a guru's
help!

Here's what I have...I have a sequence of characters that need to be
validated against the database.

string: ACCCGUCAU[5Br]IAACCU

What I'm trying to do is load the available values from the database and
create my regex pattern from that. Right now I'm basically just using the "|"
operator which gets a lot of it but it still needs more. I'm also escaping
the "[" and "]" characters during generation.

pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I

My problem is I think I'm escaping things improperly or something because if
I use this whole pattern I'm able to locate all of my "A,C,G,U,I" characters.
However, if I trim off those characters from my regex and start at U\[5Br]...
I can then locate the U[5Br] in my string. This is why I think I've screwed
something up.

What I would really like for this to do is not show me what matches but what
doesn't match.

string: ACCCGUCAU[5Bxxx]IAACCU

pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I

From this I'd hope to see "U[5Bxxx]" since it's not in the database.

Any ideas?

Thanks in advance.

 
Niki Estner
      23rd Mar 2005
"ENIZIN" wrote:
> Hello,
>
> I'm having a bit of trouble creating my regular expression and need a guru's
> help!
>
> Here's what I have...I have a sequence of characters that need to be
> validated against the database.
>
> string: ACCCGUCAU[5Br]IAACCU
>
> What I'm trying to do is load the available values from the database and
> create my regex pattern from that. Right now I'm basically just using the "|"
> operator which gets a lot of it but it still needs more. I'm also escaping
> the "[" and "]" characters during generation.
>
> pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I
>
> My problem is I think I'm escaping things improperly or something because if
> I use this whole pattern I'm able to locate all of my "A,C,G,U,I" characters.


You can use Regex.Escape to escape a string, however, I don't think that's
your problem.
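
For reference, here is a minimal sketch of building the alternation from the stored values with a library escape function instead of escaping by hand. It is written in Python rather than .NET, so `re.escape` stands in for `Regex.Escape`. The `DB_VALUES` list and the `build_alternation` helper are hypothetical stand-ins for whatever the database query actually returns.

```python
import re

# Hypothetical stand-in for the values loaded from the database.
DB_VALUES = ["A", "C", "G", "U", "U[5Br]", "C[5F]", "U[5F]", "U[5I]",
             "5-M-C", "2'-N-C", "I"]


def build_alternation(values):
    """Join literal values with '|', escaping regex metacharacters in each."""
    return "|".join(re.escape(value) for value in values)


pattern = build_alternation(DB_VALUES)

# Each stored value is matched as a literal, brackets included.
print(bool(re.fullmatch(pattern, "U[5Br]")))    # True
print(bool(re.fullmatch(pattern, "U[5Bxxx]")))  # False
```

Escaping every value this way takes care of the brackets, but it does not change the order in which the alternatives are tried.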

> However, if I trim off those characters from my regex and start at U\[5Br]...
> I can then locate the U[5Br] in my string. This is why I think I've screwed
> something up.


There's a 'U' in your alternation before the 'U\[5Br\]' part. Alternatives
are tried from left to right, so this 'U' matches the "U" in the input string
first. The following "[5Br]" part can't be matched anymore, so the match ends
there. The regex engine has no reason to backtrack, so it simply returns this
match (although it's not the longest possible). You can either give it a
reason to backtrack, like this:

(A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I)*$

Because the match now has to reach the end of the input, the engine will
backtrack after the failed attempt and find the correct match (if there is
one).
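
A small sketch of the difference, in Python rather than .NET. Python's `re` is also a backtracking engine that tries alternatives left to right, so it should behave the same way here, but treat this as an illustration and not as the original poster's code.

```python
import re

PATTERN = r"A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I"
text = "ACCCGUCAU[5Br]IAACCU"

# Unanchored: the plain 'U' alternative wins, so "U[5Br]" never shows up.
first_hits = re.findall(PATTERN, text)
print("U[5Br]" in first_hits)  # False

# Repeated group plus '$': the engine must backtrack into 'U\[5Br\]'
# to reach the end of the input.
anchored = re.compile(r"(?:" + PATTERN + r")*$")
whole = anchored.match(text)
print(whole is not None)  # True

# An unknown modification makes the anchored match fail outright.
print(anchored.match("ACCCGUCAU[5Bxxx]IAACCU"))  # None
```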

Another way to get a full match is to modify the original alternation
sequence. If you put the 'U' part after the 'U\[...' parts in the
alternation, those will be tried first, resulting in a good match, too. If I
got you right, you build the pattern programmatically anyway, so it seems
possible to me to eliminate this kind of situation (multiple alternation
members starting with the same substring). You should be able to build an
"alternation tree" from your input patterns, recursively combining the ones
starting with a common substring:

A|C(\[5F\]|)|G|U(\[5(Br\]|F\]|I\])|)|5-M-C|2'-N-C|I

I think this should always work, as it does more or less the same thing I'd
do if I had to do it without regexes.
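
One possible way to build such an alternation tree, sketched in Python under the assumption that the stored values are plain literal strings. `build_trie` and `trie_to_regex` are hypothetical helper names. The generated pattern uses non-capturing groups and `?` where the hand-written pattern above uses an empty alternative, which should behave the same.

```python
import re

TOKENS = ["A", "C", "G", "U", "U[5Br]", "C[5F]", "U[5F]", "U[5I]",
          "5-M-C", "2'-N-C", "I"]


def build_trie(tokens):
    """Nested dicts keyed by character; the key '' marks the end of a token."""
    root = {}
    for token in tokens:
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Turn a trie node into a pattern for everything below that node."""
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in node.items() if ch != ""]
    if not branches:
        return ""
    body = "|".join(branches)
    if "" in node:
        # A token ends here, so the longer continuation is optional.
        return "(?:" + body + ")?"
    return "(?:" + body + ")" if len(branches) > 1 else body


tree_pattern = trie_to_regex(build_trie(TOKENS))
text = "ACCCGUCAU[5Br]IAACCU"
tokens_found = re.findall(tree_pattern, text)
print("U[5Br]" in tokens_found)  # True
```

Since the optional group is greedy, the longer form is tried before falling back to the bare 'U' or 'C'.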

> What I would really like for this to do is not show me what matches but what
> doesn't match.
>
> string: ACCCGUCAU[5Bxxx]IAACCU
>
> pattern: A|C|G|U|U\[5Br\]|C\[5F\]|U\[5F\]|U\[5I\]|5-M-C|2'-N-C|I
>
> From this I'd hope to see "U[5Bxxx]" since it's not in the database.


But "U" is in the database, so why wouldn't the output be "[5Bxxx]"? If
there actually is a way to find out these characters belong together
(although they're not in the DB), that could make your task a lot easier.

Anyway, assuming you have a pattern that recognizes all correct input
sequences, and assuming you want the lowest number of "mismatch
characters" (which would be "[5Bxxx]" in your example), this should be
possible. I didn't test this too much, but it seems to work:

((?>(A|C(\[5F\]|)|G|U(\[5(Br\]|F\]|I\])|)|5-M-C|2'-N-C|I)*)(?<mismatch>.*?))*$

But I don't know how fast it is if input strings get longer. You can get the
"mismatch characters" from the "mismatch"-group's captures list.
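
If a single pattern gets too slow or too hard to read, the same result can be approximated with a plain scanning loop. The sketch below is in Python, whose `re` module has no per-group captures list, so it swaps in a different technique: try the known values at each position (longest first) and collect whatever is left over. `find_mismatches` is a hypothetical helper, and this greedy scan is not guaranteed to give the smallest possible number of mismatch characters for every set of values.

```python
import re

TOKENS = ["A", "C", "G", "U", "U[5Br]", "C[5F]", "U[5F]", "U[5I]",
          "5-M-C", "2'-N-C", "I"]


def find_mismatches(tokens, text):
    """Return the runs of characters that no known token accounts for."""
    # Longest tokens first, so 'U[5Br]' is tried before the bare 'U'.
    ordered = sorted(tokens, key=len, reverse=True)
    token_re = re.compile("|".join(re.escape(t) for t in ordered))
    mismatches = []
    run_start = None
    pos = 0
    while pos < len(text):
        match = token_re.match(text, pos)
        if match:
            if run_start is not None:
                mismatches.append(text[run_start:pos])
                run_start = None
            pos = match.end()
        else:
            if run_start is None:
                run_start = pos
            pos += 1
    if run_start is not None:
        mismatches.append(text[run_start:])
    return mismatches


print(find_mismatches(TOKENS, "ACCCGUCAU[5Bxxx]IAACCU"))  # ['[5Bxxx]']
print(find_mismatches(TOKENS, "ACCCGUCAU[5Br]IAACCU"))    # []
```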

Hope this helps,

Niki


 