Regex help

  • Thread starter: Steven

Steven

Hi everyone,

I am pretty much a newbie when it comes to using regular expressions.
I guess all the fancy syntax confuses me all the time.

Can someone help me out with an example for the following?

I have strings like the following in a file, a few hundred of them:

"<record><na>Test Data Form</><t>W</><d></><ph"

1. From every line I encounter, I want to extract "Test Data Form"
from the string: basically, the value that lies between ">" and "<".
How would I do that using Regex?

Also, I have a second question. I ended up doing a search and replace
using Notepad on the file, but after I finished replacing what I did
not need, I was left with "Test Data Form,".

I ended up putting a comma at the last instance of "</><t>" in each
string, which left me with another question about regex.

2. How would you scan, let's say, a few hundred strings in a file and,
when you get to the "," or ", ", delete everything from that point
onwards, so that you are left with only what you need?

old string: "Test Data Form, xy1234"

new string: "Test Data Form"



Thanks for helping me with this; regex confuses me a lot.



Steven
 
Steven said:
> I am pretty much a newbie when it comes to using regular expressions.
> I guess all the fancy syntax confuses me all the time.
>
> Can someone help me out with an example for the following?
>
> I have strings like the following in a file, a few hundred of them:
>
> "<record><na>Test Data Form</><t>W</><d></><ph"
>
> 1. From every line I encounter, I want to extract "Test Data Form"
> from the string: basically, the value that lies between ">" and "<".
> How would I do that using Regex?

If you explain the rule precisely enough, then we can come up with
a regex.

The rule you described would return two strings: "Test Data Form"
and "W".

It would be something like "(?:>)([^<]+)(?:<)".

If you explain why "W" should not match, then we can probably
find a regex for that.

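As a quick check of the point above, here is a minimal C# console sketch that runs the pattern through `Regex.Matches` on the sample line. The class and method names (`AllMatchesDemo`, `ExtractAll`) are illustrative, not from the thread. Because every `>` followed by at least one non-`<` character and then a `<` counts as a match, it should report both values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllMatchesDemo
{
    // Arne's pattern: a '>' (non-capturing), then one or more characters that
    // are not '<' (captured as group 1), then a '<' (non-capturing).
    static readonly Regex BetweenTags = new Regex("(?:>)([^<]+)(?:<)");

    // Returns the text of group 1 for every non-overlapping match in the line.
    public static List<string> ExtractAll(string line)
    {
        var values = new List<string>();
        foreach (Match m in BetweenTags.Matches(line))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Sample line taken from the question.
        string line = "<record><na>Test Data Form</><t>W</><d></><ph";
        foreach (string value in ExtractAll(line))
        {
            Console.WriteLine(value); // "Test Data Form", then "W"
        }
    }
}
```

The value is read from `Groups[1]` because the `(?:...)` wrappers around `>` and `<` are non-capturing and therefore not numbered.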

> Also, I have a second question. I ended up doing a search and replace
> using Notepad on the file, but after I finished replacing what I did
> not need, I was left with "Test Data Form,".
>
> I ended up putting a comma at the last instance of "</><t>" in each
> string, which left me with another question about regex.
>
> 2. How would you scan, let's say, a few hundred strings in a file and,
> when you get to the "," or ", ", delete everything from that point
> onwards, so that you are left with only what you need?
>
> old string: "Test Data Form, xy1234"
>
> new string: "Test Data Form"

Why not simply use String.IndexOf and String.Substring for this?
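A minimal sketch of the IndexOf/Substring approach for the second problem, assuming each line should simply be cut at its first comma and that each string is a single line. `TruncateDemo`, `BeforeFirstComma` and `BeforeFirstCommaRegex` are hypothetical names; the regex version is included only for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TruncateDemo
{
    // Hypothetical helper: keeps the part of 'text' before the first comma, or
    // the whole string if there is no comma. Cutting at the comma also drops
    // the space in ", ", so both "," and ", " are handled.
    public static string BeforeFirstComma(string text)
    {
        int comma = text.IndexOf(',');
        return comma >= 0 ? text.Substring(0, comma) : text;
    }

    // Regex alternative: delete everything from the first comma to the end of
    // the (single-line) string.
    public static string BeforeFirstCommaRegex(string text)
    {
        return Regex.Replace(text, ",.*$", "");
    }

    public static void Main()
    {
        Console.WriteLine(BeforeFirstComma("Test Data Form, xy1234"));      // Test Data Form
        Console.WriteLine(BeforeFirstComma("Test Data Form,"));             // Test Data Form
        Console.WriteLine(BeforeFirstCommaRegex("Test Data Form, xy1234")); // Test Data Form
    }
}
```

For a whole file, the helper could be applied line by line, for example with `File.ReadLines(path).Select(TruncateDemo.BeforeFirstComma)` (needs `using System.IO;` and `using System.Linq;`), where `path` is a placeholder for the file name.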

Arne
 
Hi Arne,

Let me see if I can explain what I am trying to do a bit more.

This is my string:

"<record><na>Test Data Form</><t>W</><d></><ph"

I only want to extract "Test Data Form" from the string; I don't need
anything else.

I guess, since you mentioned the IndexOf() and Substring() methods, I
will have to try using those to split the string at whatever character
I need to split it at.
 
Steven said:
> Let me see if I can explain what I am trying to do a bit more.
>
> This is my string:
>
> "<record><na>Test Data Form</><t>W</><d></><ph"
>
> I only want to extract "Test Data Form" from the string; I don't need
> anything else.

Yes. But what is the criterion for not picking "W"?

Do you just want the first one?

> I guess, since you mentioned the IndexOf() and Substring() methods, I
> will have to try using those to split the string at whatever character
> I need to split it at.

IndexOf and Substring were meant for the second problem.

Regex is fine for the first problem.

Arne
 
> Yes. But what is the criterion for not picking "W"?
>
> Do you just want the first one?

Yes, I just want the first one; the other parts of the string are just garbage.
 
Steven said:
> Yes, I just want the first one; the other parts of the string are just garbage.

Then the regex is still good. Just call Regex.Match instead of
Regex.Matches.
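A sketch of that suggestion applied to many lines: `Regex.Match` returns only the first match, so "W" is never reached. The class and method names are illustrative, and the second sample line is made up to stand in for the rest of the file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Arne's pattern: group 1 holds the text between '>' and '<'.
    static readonly Regex BetweenTags = new Regex("(?:>)([^<]+)(?:<)");

    // Returns group 1 of the first match only, or null if the line has no match.
    public static string ExtractFirst(string line)
    {
        Match m = BetweenTags.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Applies ExtractFirst to every line and keeps only the lines that matched.
    public static List<string> ExtractFirstFromLines(IEnumerable<string> lines)
    {
        var results = new List<string>();
        foreach (string line in lines)
        {
            string value = ExtractFirst(line);
            if (value != null)
            {
                results.Add(value);
            }
        }
        return results;
    }

    public static void Main()
    {
        // In a real program the lines would come from the file, e.g. via
        // File.ReadLines(path); the second line here is hypothetical sample data.
        string[] lines =
        {
            "<record><na>Test Data Form</><t>W</><d></><ph",
            "<record><na>Another Form</><t>X</><d></><ph",
        };
        foreach (string value in ExtractFirstFromLines(lines))
        {
            Console.WriteLine(value); // "Test Data Form", then "Another Form"
        }
    }
}
```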

Arne
 
> Yes, I just want the first one; the other parts of the string are just garbage.

Regex is really hard to get right. I suggest you go online, get a
bunch of examples, and experiment until you get it right. It takes a
couple of hours, but it's worth it. Your example is not one of the
trivial 'recipe' examples, so expecting somebody here to do the work
for you is a bit much.

For example, here is a regex that splits the words out of a sentence:

    string text = "Test Data Form, xy1234";
    string[] words = Regex.Split(text, @"\W+");

An excellent source is the C# book by Albahari et al., which has a
chapter on regular expressions.
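A runnable version of that `Regex.Split` example; the sample sentence and the `SplitWordsDemo`/`SplitWords` names are only illustrations.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitWordsDemo
{
    // \W+ matches one or more non-word characters (anything other than
    // letters, digits and underscore), so the text is split on runs of
    // spaces and punctuation.
    public static string[] SplitWords(string text)
    {
        return Regex.Split(text, @"\W+");
    }

    public static void Main()
    {
        string[] words = SplitWords("Test Data Form, xy1234");
        Console.WriteLine(string.Join("|", words)); // Test|Data|Form|xy1234
    }
}
```

If the text starts or ends with a non-word character, `Regex.Split` puts an empty string at that end of the array; "Hello world." splits into "Hello", "world" and "".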

There are lots of examples on the Net as well. One thing that confuses
me is whether the ">" symbol appears as a literal ASCII character in
your string. If it does, the extraction is much easier: search for the
second ">" from the left and take what follows it.
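That non-regex approach can be sketched with `IndexOf` and `Substring`. This sketch assumes every line has the same layout as the sample (the value always follows the second `>` and ends at the next `<`); `SecondGtDemo` and `ExtractAfterSecondGt` are hypothetical names.

```csharp
using System;

public static class SecondGtDemo
{
    // Hypothetical helper. Assumes the wanted value sits right after the second
    // '>' in the line and ends at the next '<'. Returns null if any is missing.
    public static string ExtractAfterSecondGt(string line)
    {
        int first = line.IndexOf('>');
        if (first < 0) return null;

        int second = line.IndexOf('>', first + 1);
        if (second < 0) return null;

        int end = line.IndexOf('<', second + 1);
        if (end < 0) return null;

        // Substring(start, length): the characters strictly between '>' and '<'.
        return line.Substring(second + 1, end - second - 1);
    }

    public static void Main()
    {
        string line = "<record><na>Test Data Form</><t>W</><d></><ph";
        Console.WriteLine(ExtractAfterSecondGt(line)); // Test Data Form
    }
}
```

Unlike the regex, this breaks as soon as a line has a different prefix, which is the trade-off for not needing any pattern syntax.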

Good luck.

RL
 
RayLopez99 said:
> Regex is really hard to get right. I suggest you go online, get a
> bunch of examples, and experiment until you get it right. It takes a
> couple of hours, but it's worth it. Your example is not one of the
> trivial 'recipe' examples, so expecting somebody here to do the work
> for you is a bit much.

"(?:>)([^<]+)(?:<)" is not that complex ...
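To illustrate how small the pattern is, here is a sketch comparing it with two equivalent forms (the class and method names are illustrative): the non-capturing groups around the single characters `>` and `<` can be dropped, and a lookbehind/lookahead version returns the value as the whole match. All three take only the first match, as discussed in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternVariantsDemo
{
    // Arne's pattern: (?:>) and (?:<) are non-capturing groups, so the only
    // numbered group is ([^<]+), i.e. group 1.
    public static string WithNonCapturingGroups(string line) =>
        Regex.Match(line, "(?:>)([^<]+)(?:<)").Groups[1].Value;

    // Same match without the (?:...) wrappers; grouping a single literal
    // character adds nothing.
    public static string WithoutGroups(string line) =>
        Regex.Match(line, ">([^<]+)<").Groups[1].Value;

    // Lookbehind (?<=>) and lookahead (?=<) require '>' and '<' without making
    // them part of the match, so the value is the whole match.
    public static string WithLookaround(string line) =>
        Regex.Match(line, "(?<=>)[^<]+(?=<)").Value;

    public static void Main()
    {
        string line = "<record><na>Test Data Form</><t>W</><d></><ph";
        Console.WriteLine(WithNonCapturingGroups(line)); // Test Data Form
        Console.WriteLine(WithoutGroups(line));          // Test Data Form
        Console.WriteLine(WithLookaround(line));         // Test Data Form
    }
}
```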

Arne
 
RayLopez99 said:
>> Yes, I just want the first one; the other parts of the string are just garbage.
>
> Regex is really hard to get right. I suggest you go online, get a
> bunch of examples, and experiment until you get it right. It takes a
> couple of hours, but it's worth it. Your example is not one of the
> trivial 'recipe' examples, so expecting somebody here to do the work
> for you is a bit much.


If you have not downloaded Expresso, do so. It makes understanding and
building patterns fairly easy.
 
