Regular expression problem, help please!


Guest

Hi

I need to write one regex to read all the fields from the following lines /
file format:
line 1 - some_alphanumeric,some_alphanumeric,"something, something",numbers_hyphenatedORnot
line 2 - some_alphanumeric,some_alphanumeric,something something,numbers_hyphenatedORnot

At first I thought this one would do:
"[^"\r\n]*",|[A-Za-z0-9 ]*,|[0-9]*\-[0-9]*

but I am getting the delimiters, such as the trailing comma and the
double quotes, along with the fields!

Can this be modified to get the fields only, or at least to get rid of the
trailing comma?

TIA


--
 
line 1 - some_alphanumeric,

[A-Za-z0-9 ]*,

some_alphanumeric,
[A-Za-z0-9 ]*,

"something, something",
"[^"]*",

numbers_hyphenatedORnot
[0-9-]*

All together: [A-Za-z0-9 ]*,[A-Za-z0-9 ]*,"[^"]*",[0-9-]*

line 2 - some_alphanumeric,some_alphanumeric,something something,numbers_hyphenatedORnot

[A-Za-z0-9 ]*,[A-Za-z0-9 ]*,[A-Za-z0-9 ]*,[0-9-]*

Combine it all into one big one (the two alternatives for the third field
need their own group, because | otherwise splits the whole pattern in two):
[A-Za-z0-9 ]*,[A-Za-z0-9 ]*,(("[^"]*")|([A-Za-z0-9 ]*)),[0-9-]*

Any character in "A-Za-z0-9 "
* (zero or more times)
,
Any character in "A-Za-z0-9 "
* (zero or more times)
,
Capture
Capture
"
Any character not in """
* (zero or more times)
"
End Capture
or
Capture
Any character in "A-Za-z0-9 "
* (zero or more times)
End Capture
End Capture
,
Any character in "0-9-"
* (zero or more times)

(Interpretation via Regular Expression Workbench)
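
A minimal sketch of the combined pattern in Python (the thread does not name a language, so Python's re module is an assumption here). Since | binds more loosely than concatenation, the two alternatives for the third field are wrapped in a non-capturing group, and each field gets its own capture group so that the commas and quotes stay out of the results. The name parse_line and the sample lines are made up for illustration.

```python
import re

# One capture group per field; (?:...) keeps the alternation local to field 3.
LINE_RE = re.compile(
    r'([A-Za-z0-9 ]*),'                  # field 1
    r'([A-Za-z0-9 ]*),'                  # field 2
    r'(?:"([^"]*)"|([A-Za-z0-9 ]*)),'    # field 3: quoted or unquoted
    r'([0-9-]*)'                         # field 4: digits, optional hyphens
)

def parse_line(line):
    """Return the four fields of one line, or None if it does not match."""
    m = LINE_RE.fullmatch(line)
    if m is None:
        return None
    f1, f2, quoted, unquoted, f4 = m.groups()
    # Only one of the two field-3 groups takes part in a given match.
    f3 = quoted if quoted is not None else unquoted
    return [f1, f2, f3, f4]

print(parse_line('abc1,def2,"something, something",123-456'))
print(parse_line('abc1,def2,something something,123456'))
```

Reading the groups instead of the whole match is what removes the delimiters: the commas and quotes are still matched, they are just not captured.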


--
--
Truth,
James Curran
[erstwhile VC++ MVP]

Home: www.noveltheory.com Work: www.njtheater.com
Blog: www.honestillusion.com Day Job: www.partsearch.com

 
Hi Curran

Thanks.
With some minor changes, I get a much cleaner one (compared to the one I
originally had):
[A-Za-z0-9 -]*,|("[^"]*"),|[0-9 ()-]*
but I have one question:
1. There is still a trailing comma. I guess we can't get rid of it and just
have to trim it off, right?

TIA
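
Trimming works, but the comma can also be kept out of the result by capturing only the field text. A rough sketch in Python (an assumed language; the thread does not say which engine is used). The pattern is a simplified variant, not the one posted above, and split_fields is a hypothetical helper name.

```python
import re

# Either a quoted field (quotes stay outside group 1) or a run of
# characters that are neither quotes nor commas (group 2).
FIELD_RE = re.compile(r'"([^"]*)"|([^",]+)')

def split_fields(line):
    """Return field values without surrounding quotes or separating commas."""
    fields = []
    for m in FIELD_RE.finditer(line):
        quoted, bare = m.groups()
        fields.append(quoted if quoted is not None else bare)
    return fields

print(split_fields('abc1,def2,"something, something",123-456'))
print(split_fields('abc1,def2,something something,(555) 123-4567'))
```

This sketch skips empty fields (two commas in a row) and does not handle escaped quotes inside a quoted field, so it only fits data shaped like the two sample lines.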

 
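Instead of the last * use a +, which means "one or more" (the * means "zero
or more"), so that the last alternative no longer matches empty strings. The
comma is a literal part of the first two alternatives, so it will still be in
those matches unless you capture the field in a group or trim it afterwards.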
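
To see the difference between the two quantifiers: * accepts zero or more characters, + requires at least one. A small Python sketch (the language and the sample string are assumptions made for illustration):

```python
import re

text = 'abc,123-456'

# '*' allows zero characters, so it also reports an empty match at every
# position where no digit or hyphen is found.
star_matches = re.findall(r'[0-9-]*', text)

# '+' requires at least one character, so only the real run is returned.
plus_matches = re.findall(r'[0-9-]+', text)

print(star_matches)
print(plus_matches)
```

The first list contains empty strings next to the real value; the second contains only the hyphenated number.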

