Regex help

Steven

Hi everyone,

I am pretty much a newbie when it comes to using regular expressions.
I guess all the fancy syntax is pretty much confusing me all the time.

Can someone help me out with an example for the following.

I have the following strings in a file, more like a few hundred

"<record><na>Test Data Form</><t>W</><d></><ph"

1. From every line that I encounter, I want to extract the "Test Data
Form" from the string. Basically, the value that lies inside the ">
<". How would I do that using Regex()?

Also I have a second question. I ended up doing a search and replace
using Notepad on the file, but after I had finished replacing what I
did not need in the file, I was left with "Test Data Form,".

I ended up putting a comma in the last instances of the string that
contained the "</><t>" value. That left me with another question
about Regex().

2. How would you scan, let's say, a few hundred strings in a file and,
when you get to the "," or ", ", delete everything from that point
onwards so that you are only left with what you need?

old string: "Test Data Form, xy1234"

new string: "Test Data Form"



Thanks for assisting me with this, it confuses me a lot.



Steven
 
Arne Vajhøj

Steven said:
I am pretty much a newbie when it comes to using regular expressions.
I guess all the fancy syntax is pretty much confusing me all the time.

Can someone help me out with an example for the following.

I have the following strings in a file, more like a few hundred

"<record><na>Test Data Form</><t>W</><d></><ph"

1. From every line that I encounter, I want to extract the "Test Data
Form" from the string. Basically, the value that lies inside the "> <".
How would I do that using Regex()?

If you explain the rule precisely enough, then we can come up with
a regex.

The rules you described would return two strings "Test Data Form"
and "W".

It would be something like "(?:>)([^<]+)(?:<)".

If you explain why "W" should not match then we can probably
find a regex for that.
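As a rough sketch of how that pattern behaves in C#: Regex.Matches finds both values, and the text without the surrounding ">" and "<" sits in capture group 1. The class and method names below (TagValueDemo, ExtractAll) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagValueDemo
{
    // Pattern from the post: '>' then one or more non-'<' characters, then '<'.
    private static readonly Regex BetweenTags = new Regex("(?:>)([^<]+)(?:<)");

    // Hypothetical helper: returns every non-empty value found between '>' and '<'.
    public static List<string> ExtractAll(string line)
    {
        var values = new List<string>();
        foreach (Match m in BetweenTags.Matches(line))
        {
            // Group 1 holds the text without the surrounding '>' and '<'.
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        string line = "<record><na>Test Data Form</><t>W</><d></><ph";
        foreach (string value in ExtractAll(line))
        {
            Console.WriteLine(value);
        }
        // Prints "Test Data Form" and then "W".
    }
}
```

Empty elements such as "<d></>" are skipped because [^<]+ requires at least one character.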
Also I have a second question. I ended up doing a search and replace
using Notepad on the file, but after I had finished replacing what I did
not need in the file, I was left with "Test Data Form,".

I ended up putting a comma in the last instances of the string that
contained the "</><t>" value. That left me with another question
about Regex().

2. How would you scan, let's say, a few hundred strings in a file and,
when you get to the "," or ", ", delete everything from that point
onwards so that you are only left with what you need?

old string: "Test Data Form, xy1234"

new string: "Test Data Form"

Why not simply use String IndexOf and String Substring for this?
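A minimal sketch of the IndexOf/Substring approach, assuming each line should be cut at its first comma. CommaTrimDemo and CutAtComma are hypothetical names used only for this example.

```csharp
using System;

public static class CommaTrimDemo
{
    // Hypothetical helper: keeps everything before the first comma.
    public static string CutAtComma(string line)
    {
        int commaIndex = line.IndexOf(',');
        // IndexOf returns -1 when there is no comma; such lines are left unchanged.
        return commaIndex < 0 ? line : line.Substring(0, commaIndex);
    }

    public static void Main()
    {
        Console.WriteLine(CutAtComma("Test Data Form, xy1234")); // Test Data Form
        Console.WriteLine(CutAtComma("Test Data Form,"));        // Test Data Form
        Console.WriteLine(CutAtComma("No comma here"));          // No comma here
    }
}
```

Cutting at "," also covers the ", " case, since the comma always comes first.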

Arne
 
Steven

Hi Arne,

Let me see if I can explain what I am trying to do a bit more.

This is my string:

"<record><na>Test Data Form</><t>W</><d></><ph"

I only want to extract the "Test Data Form" from the string,
everything else I don't need.


I guess since you mentioned the IndexOf() and Substring() functions, I
will have to try and use those to split the string at whatever char I
need to split it from.
 
Arne Vajhøj

Steven said:
Let me see if I can explain what I am trying to do a bit more.

This is my string:

"<record><na>Test Data Form</><t>W</><d></><ph"

I only want to extract the "Test Data Form" from the string, everything
else I don't need.

Yes. But what is the criterion for not picking "W"?

Do you just want the first one?
I guess since you mentioned the IndexOf() and Substring() functions, I
will have to try and use those to split the string at whatever char I
need to split it from.

IndexOf and Substring were for the second problem.

Regex is fine for the first problem.

Arne
 
Arne Vajhøj

Steven said:
Yes, I just want the first; the other parts of the string are just garbage.

Then the regex is still good. You just call Regex.Match instead of
Regex.Matches.
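A hedged sketch of the Regex.Match variant, which stops at the first hit. The names FirstValueDemo and TryExtractFirst are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstValueDemo
{
    // Hypothetical helper: gets the first non-empty value between '>' and '<'.
    // Returns false (and an empty string) when the line contains no such value.
    public static bool TryExtractFirst(string line, out string value)
    {
        Match m = Regex.Match(line, "(?:>)([^<]+)(?:<)");
        value = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string line = "<record><na>Test Data Form</><t>W</><d></><ph";
        if (TryExtractFirst(line, out string first))
        {
            Console.WriteLine(first); // Test Data Form
        }
    }
}
```

Note that m.Value would include the surrounding ">" and "<"; Groups[1] is what carries the bare text.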

Arne
 
RayLopez99

Yes, I just want the first; the other parts of the string are just garbage.

Regex is really hard to do right. I suggest you go online, get a
bunch of examples, and try it until you get it right. It takes a
couple of hours, but it's worth it. Your example is not one of the
trivial 'recipe' examples so to expect somebody here to do the work
is a bit much.

For example, here is a Regex to split words out of a sentence:

    string text = "a sentence to split";
    string[] words = Regex.Split(text, @"\W+");

An excellent source is the C# book by Albahari et al., which has a
chapter on Regex.
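To make the Regex.Split call mentioned above concrete, here is a small sketch. One thing to watch: when the text starts or ends with punctuation, Regex.Split returns empty strings at the edges, so this version filters them out. SplitWordsDemo and SplitWords are illustrative names, not from the book or the thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitWordsDemo
{
    // Hypothetical helper: splits on runs of non-word characters (\W+)
    // and drops the empty entries Regex.Split leaves at the edges.
    public static string[] SplitWords(string text)
    {
        return Regex.Split(text, @"\W+")
                    .Where(word => word.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        foreach (string word in SplitWords("Hello, regex world!"))
        {
            Console.WriteLine(word);
        }
        // Prints "Hello", "regex" and "world" on separate lines.
    }
}
```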

Lots of examples on the Net as well. One thing that confuses me is
whether the ">" symbol appears as a literal ASCII character in your
string. If it does, it makes your extraction much easier (search for
the second ">" from the left).

Good luck.

RL
 
Arne Vajhøj

RayLopez99 said:
Regex is really hard to do right. I suggest you go online, get a
bunch of examples, and try it until you get it right. It takes a
couple of hours, but it's worth it. Your example is not one of the
trivial 'recipe' examples so to expect somebody here to do the work
is a bit much.

"(?:>)([^<]+)(?:<)" is not that complex ...

Arne
 
Family Tree Mike

RayLopez99 said:
Yes, I just want the first; the other parts of the string are just garbage.

Regex is really hard to do right. I suggest you go online, get a
bunch of examples, and try it until you get it right. It takes a
couple of hours, but it's worth it. Your example is not one of the
trivial 'recipe' examples so to expect somebody here to do the work
is a bit much.

For example, here is a Regex to split words out of a sentence:

    string text = "a sentence to split";
    string[] words = Regex.Split(text, @"\W+");

An excellent source is the C# book by Albahari et al., which has a
chapter on Regex.

Lots of examples on the Net as well. One thing confusing to me is
whether ">" symbol appears as ASCII in your string. If it does, it
makes your extraction much easier (search for ascii that's two times
the ">" symbol from the left).

Good luck.

RL

If you have not downloaded Expresso, do so. It makes understanding and
building patterns fairly easy.
 
