Regex question

Stephan Rose

Having some trouble with a regex that I hope someone can help me with.

The data I am processing looks as follows:

15 items per dataset. Most datasets are on only 1 line of text,
however on occasion a few text fields are multi-line making the
dataset span more than 1 line.

Each data item is surrounded by ", and the items are separated by a ;.
The last item however is not terminated with a ;.

There are no quotes within quotes.

So basically the whole thing looks like this:

"1";"2";"3";..."15"

The regex I came up with almost works the way I need it to, however on
occasion some data items are empty resulting in ""; and in that case
my regex just skips it and doesn't return a match. That of course
throws off the position of the next data items and everything goes all
bad...I would need it to return a 0 length string for those items.

Can anyone help me with what I need to modify to make this work? Here
is the current regex: [^\"]*[^\";]
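
To see the skipping concretely, here is a minimal Python sketch (Python is an assumption, since the post does not name a language, and `current_matches` is a made-up helper name). It runs the pattern above, minus the backslash escapes that only matter inside a string literal, against a short row whose second item is empty.

```python
import re

# The pattern from the post. The final [^";] demands at least one
# character that is neither a quote nor a semicolon, so "" can never match.
CURRENT = re.compile(r'[^"]*[^";]')

def current_matches(row):
    """Return every match of the posted pattern in one row of text."""
    return CURRENT.findall(row)

# The empty second item is missing from the printed list.
print(current_matches('"1";"";"3"'))
```

The list has two entries instead of three, which is the position shift described above.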

I am very tempted to just go do it the old fashioned way manually, but
if I can get this regex to work, that would be nicer.

Thanks all!
 
Chris Chilvers

Something like:
.*?"(.*?)".*?(?:;|$)

.*?" -- ignore any characters until we find an opening "
(.*?)" -- capture all characters until we find the closing quotation mark
.*?(?:;|$) -- ignore any characters until we find a ; or the end of the line

That would match one item; you could call it multiple times to get each
match, or make it:
(?:.*?"(.*?)".*?(?:;|$))*

to capture all the matches at once.

This is assuming you don't mind it accepting things like

aoeu "1"; "2" oaeuh; "3"

Which will find the values 1, 2, 3 and ignore the garbage outside the
quotes.
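
As a rough illustration of the "call it multiple times" route, here is a minimal Python sketch. Python, the helper name `extract_items`, and the use of `re.DOTALL` (so that `.` also crosses the line breaks inside multi-line fields) are all assumptions, not something stated in the thread. It expects one dataset per call.

```python
import re

# The single-item pattern from above. re.DOTALL is an added assumption:
# it lets "." match newlines so a multi-line text field is captured whole.
ITEM = re.compile(r'.*?"(.*?)".*?(?:;|$)', re.DOTALL)

def extract_items(dataset):
    """Return the quoted values of one dataset, empty items included."""
    # finditer applies the pattern repeatedly, each time resuming
    # where the previous match ended.
    return [m.group(1) for m in ITEM.finditer(dataset)]

print(extract_items('"1";"";"3"'))
print(extract_items('aoeu "1"; "2" oaeuh; "3"'))
```

Because the capture group is allowed to be empty, an item written as "" comes back as a zero-length string and the positions of the later items stay intact.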
 
Kevin Spencer

Hi Stephan,

This may work for you:

"([\w]*)"

You didn't say, so I had to make some assumptions. First, I assumed that the
values in the items would be alphanumeric ("word" characters). I also
assumed that there would not be other content in the target string that
does not conform to the pattern you laid down.

Basically, it works like this:

Find a quote, followed by zero or more word characters, followed by a quote.
Put the word characters into Group 1.

Under these conditions, it doesn't matter *what* delimits them, as anything
which doesn't match the pattern is eliminated. I tested it on the following:

"1";"2";"3";
"4";"5";"15";""
"..." " "";";";
ljkhl";"[]"; jhg"jkh"

Note that in the third line there are no delimiters, and a stray quote. It
found none of the third line. In the fourth line, however, it did match the
"jkh" at the end, because it satisfied the match condition.
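
A minimal Python sketch of the same idea follows (Python and both helper names are assumptions for illustration). `word_items` uses the pattern above. `any_items` is a hypothetical variant that swaps `\w` for `[^"]`, leaning on the original statement that there are no quotes within quotes, so that text fields containing spaces, punctuation, or line breaks are also captured.

```python
import re

WORD_ITEM = re.compile(r'"(\w*)"')    # the pattern suggested above
ANY_ITEM = re.compile(r'"([^"]*)"')   # hypothetical variant: anything but a quote

def word_items(text):
    """Group 1 of every match: quoted runs of zero or more word characters."""
    return WORD_ITEM.findall(text)

def any_items(text):
    """Group 1 of every match: quoted runs of any characters except a quote."""
    return ANY_ITEM.findall(text)

print(word_items('"4";"5";"15";""'))
print(word_items('ljkhl";"[]"; jhg"jkh"'))
print(any_items('"hello world";"";"a\nb"'))
```

The variant is stricter about input quality: it pairs quotes in the order they appear, so a stray quote in the data would shift every later match.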

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

Hard work is a medication for which
there is no placebo.
 
