Regex help needed...

JS

I am writing a C# app that needs to parse a sentence entered by the user
for a simple boolean search.
I need to capture all of the AND words that are not inside of double
quotes. However, I am having a heck of a time figuring out a regex for it.
Can anyone assist with a regex to find all the AND's not in double quotes?

An example sentence might be:

red and blue and "crazy elephant" and "orange and red" and stuff.

I would need the 1st, 2nd, 3rd and 5th AND in the sentence, but not the 4th
one that is in "orange AND red".

I have several other parsing expressions in this program, but for some
reason, this particular regex eludes me, and I have been at it for some
time.

Any help would be appreciated.

TIA
-JS

PS: if there is a better usenet group for this question, please advise, as
I could not find one just for regex.
 
Kevin Spencer

You don't want a Regular Expression here. For example, as a human user is
inputting the string, what happens when the user inputs the following:

red and blue and "crazy elephant and "orange and red" and stuff.

Note that there are THREE sets of double-quotes in the input. So, what's
inside double-quotes, and what is not? Is the "and" after "elephant" inside
double-quotes? Is the "and" between "orange" and "red" inside double quotes?
Are both? Are neither?

Your only option here is to split the string on the double-quotes, and
then count. When you hit a double-quote, anything after it that is followed
by the next double-quote is "inside double-quotes." If there IS no next
double-quote, NOTHING after the first double-quote is inside double-quotes.

You will need to split the string in order to parse it in any case.
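
The split-and-count idea above can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, for brevity; the function name `find_unquoted_ands` and the choice to return character offsets are assumptions, not anything from the original program. Segments at odd positions after the split sit between a pair of quotes, and a trailing segment after an unmatched quote is treated as outside, following the rule described above.

```python
import re

def find_unquoted_ands(text):
    """Return the start offset of every standalone 'and' outside double quotes."""
    offsets = []
    position = 0  # offset of the current segment within the original text
    segments = text.split('"')
    for index, segment in enumerate(segments):
        # Odd-indexed segments lie between an opening and a closing quote,
        # unless the segment is the last one (its opening quote was never closed).
        inside_quotes = index % 2 == 1 and index < len(segments) - 1
        if not inside_quotes:
            for match in re.finditer(r'\band\b', segment, re.IGNORECASE):
                offsets.append(position + match.start())
        position += len(segment) + 1  # +1 for the quote consumed by split()
    return offsets

sentence = 'red and blue and "crazy elephant" and "orange and red" and stuff.'
print(find_unquoted_ands(sentence))  # [4, 13, 34, 55]
```

With the three-quote input from earlier in this post, the first two quotes pair up and everything after the third quote counts as unquoted, so four occurrences are reported.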

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.
 
JS

After over an hour of working on this one...it comes to me minutes after I
post...Murphy's Law I guess...

Anyway, in case anyone needs this, the answer is...

(?:".+?")?(\s+and\s+)(?:".+?")?
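
One caveat: the pattern above handles the example sentence, but it appears to pick up a quoted "and" when a quoted phrase is not directly preceded by an unquoted one, for instance `"a and b" x and y`. A possible alternative, sketched here in Python for brevity (the pattern syntax should be the same in .NET), is to let the regex consume whole quoted phrases first and capture "and" only in the other branch. The names `PATTERN` and `unquoted_and_spans` are illustrative, and the sketch assumes the quotes are balanced.

```python
import re

# First alternative swallows a complete quoted phrase; the second captures a
# standalone "and". Because alternatives are tried left to right at each
# position, an "and" inside a quoted phrase is never reached by the second one.
PATTERN = re.compile(r'"[^"]*"|\b(and)\b', re.IGNORECASE)

def unquoted_and_spans(text):
    """Return (start, end) spans of every 'and' outside double quotes."""
    return [m.span(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

sentence = 'red and blue and "crazy elephant" and "orange and red" and stuff.'
print(unquoted_and_spans(sentence))             # [(4, 7), (13, 16), (34, 37), (55, 58)]
print(unquoted_and_spans('"a and b" x and y'))  # [(12, 15)]
```

Matches where group 1 is empty are the quoted phrases themselves and are simply skipped.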



 
JS

Thanks for the reply. You make a very good point about the quotes.
A string such as the one you provided would not pass my initial validator. To
help prevent any type of SQL injection, I do not allow the user
to enter symbols within the quoted sets. I also check for an even
number of double quotes. Both of these are reported to the end user in a
syntax validator message. All symbols outside of the double quotes are allowed,
but are subsequently removed or replaced before this regex is applied.
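
A rough sketch of the two checks described above, in Python for brevity. The function name `validate_query`, the message texts, and the definition of "symbol" (anything other than ASCII letters, digits and whitespace) are assumptions made for illustration; the actual validator may differ.

```python
import re

def validate_query(text):
    """Return a list of syntax messages; an empty list means the input passed."""
    if text.count('"') % 2 != 0:
        # Without balanced quotes the quoted sets cannot be identified reliably.
        return ['unbalanced double quotes']
    errors = []
    # After splitting on quotes, odd-indexed pieces are the quoted sets.
    for phrase in text.split('"')[1::2]:
        # Assumption: a "symbol" is any character that is not a letter,
        # digit or whitespace.
        if re.search(r'[^A-Za-z0-9\s]', phrase):
            errors.append('symbols are not allowed inside quotes: "%s"' % phrase)
    return errors

print(validate_query('red and "orange and red" and stuff.'))  # []
print(validate_query('red and "crazy elephant and stuff'))    # ['unbalanced double quotes']
```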
 
Kevin Spencer

Hi JS,

How many search engines have you seen that throw an exception or do not
allow certain characters to be input by the user? I haven't seen any. The
reason is, users are not always very smart people, and get discouraged
easily. It's more user-friendly to simply accept the input and deal with the
inconsistencies and possible attacks internally. Just a suggestion.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.
 
