Regular Expression - More learning

O-('' Q)

In my effort to fully understand how to use the regular expression engine
in C# .NET (VS2005), I have begun to tinker with it more since Jon Skeet
showed me what it can do the other day in a lengthy newsgroup discussion
(http://groups.google.com/group/micr...Anyone+know+how&rnum=9&hl=en#889b79e800a05dad).

I had not known about regular expressions in C# until that point.

Anyway, here is what I am trying to do:

Example Input: displaycopy('2,10,9,15')
Example Output: 2,10,9,15

Basically, trying to ignore the word "displaycopy" and anything that is
not a number or a comma. What I have come up with so far for a pattern
is this:

Regex re = new Regex("displaycopy\" *.\"([^(]*)");

However, it does not work as I am sure you've guessed by now.

I feel I *may* be close to using this properly, but still need a bit of
guidance with it. I am sure it is also possible that my attempt is
completely and utterly WRONG. But, this is part of learning and I am
sure the fine folks in this newsgroup (Jon included) will be able to
help me solve this puzzle.

Thanks in advance all.

-- Kirby
 
Kevin Spencer

Well, first, we have to determine what the exact format of your input string
is going to be. How you parse it depends upon the rules that the input
string follows. In other words, we don't need an example of an input string;
we need the business rules that define the format of the input string. For
example, without business rules, the input string could be:

"XC5];,&6TF , 3, TGI*//':;,8905XC@++"

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.
 
Jon Skeet [C# MVP]

> Anyway, here is what I am trying to do:
>
> Example Input: displaycopy('2,10,9,15')
> Example Output: 2,10,9,15
>
> Basically, trying to ignore the word "displaycopy" and anything that is
> not a number or a comma. What I have come up with so far for a pattern
> is this:
>
> Regex re = new Regex("displaycopy\" *.\"([^(]*)");
>
> However, it does not work as I am sure you've guessed by now.

As Kevin said, a bit more information is required - don't get me wrong
though, the sample input is useful :)

A few questions:

1) Does the string always start with "displaycopy('" and end with "')"?
2) Will there ever be anything you want to ignore between the two?

If the answers are "yes" and "no", then regular expressions aren't what
you want here - you just want:

string useful = whole.Substring (13, whole.Length-15);

If the answers are "yes" and "yes", then a regular expression *might*
still be the best thing to use - although it could be complicated. I'm
hoping for the simple solution :)

Do you need to parse the results into integers, by the way? That could
affect which solution ends up being best...
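
For reference, here is a minimal sketch of the Substring approach above, assuming the answers are "yes" and "no". The class and method names are hypothetical, and the shape check is an addition that the one-liner does not have. It sticks to features available in VS2005.

```csharp
using System;

public static class DisplayCopySubstring
{
    private const string Prefix = "displaycopy('";  // 13 characters
    private const string Suffix = "')";             // 2 characters

    // Returns the text between the prefix and the suffix, or null when
    // the input does not have exactly the expected shape.
    public static string Extract(string whole)
    {
        if (whole == null ||
            whole.Length < Prefix.Length + Suffix.Length ||
            !whole.StartsWith(Prefix, StringComparison.Ordinal) ||
            !whole.EndsWith(Suffix, StringComparison.Ordinal))
        {
            return null;
        }

        // Same arithmetic as Substring(13, whole.Length - 15):
        // skip the 13-character prefix and drop the 2-character suffix.
        return whole.Substring(Prefix.Length,
                               whole.Length - Prefix.Length - Suffix.Length);
    }
}
```

With this sketch, DisplayCopySubstring.Extract("displaycopy('2,10,9,15')") returns "2,10,9,15", and anything that does not start and end as expected returns null.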
 
O-('' Q)

Sorry for my delayed reply. :)

> As Kevin said, a bit more information is required - don't get me wrong
> though, the sample input is useful :)

Ok, I will try.

> A few questions:
>
> 1) Does the string always start with "displaycopy('" and end with "')"?

<script type="text/javascript"><!--
displaycopy('2,10,9,15');
//--></script>

That is what the line always looks like. The only thing that changes is
the list of numbers inside the quotes. It could be 12,4,99,1 if that helps.

> 2) Will there ever be anything you want to ignore between the two?

Nothing between the two needs to be ignored. I just need the numbers and
the commas to output back to the user (me).

> Do you need to parse the results into integers, by the way? That could
> affect which solution ends up being best...

No, I just need to return a string. Integers are not necessary. Also,
your "simple" solution may be fine, hehe. I will give it a go and come
back later on to check whether my further explaining helped
any.
 
Jon Skeet [C# MVP]

O-('' Q) said:
> Sorry for my delayed reply. :)

Delayed? It's been 40 minutes. In newsgroup terms, that's the
equivalent of replying before I've taken the next breath ;)

> Ok, I will try.
>
> <script type="text/javascript"><!--
> displaycopy('2,10,9,15');
> //--></script>

Okay - so there'll be other stuff around it, we just need whatever's in
the displaycopy bit, right? Will there only be one displaycopy in the
whole document (or whatever you've got in the string)?

> That is what the line always looks like. The only thing that changes is
> the number inside the quotes. It could be 12,4,99,1 if that helps.

That's really handy.

> Nothing between the two needs to be ignored. I just need the numbers and
> the commas to output back to the user (me).

Great :)

> No, I just need to return a string. Integers are not necessary. Also,
> your "simple" solution may be fine, hehe. I will give it a go and come
> back later on to check on whether or not my further explaining helped
> any.

If you've got the relevant line on its own already, that's going to be
the simplest solution - and if you're able to easily read the entire
HTML line by line, it's easy to check with something like:

if (line.StartsWith("displaycopy('") &&
line.EndsWith("')"))

Hope that helps.
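
A slightly fuller sketch of that line-by-line check, in case it is useful. The class and method names are hypothetical. It also assumes the line may be indented and may end with a semicolon (as in the posted script block), so it trims both before testing.

```csharp
using System;
using System.IO;

public static class DisplayCopyLineScanner
{
    // Scans the text one line at a time and returns the argument of the
    // first displaycopy('...') line, or null if no line matches.
    public static string FindArgument(string html)
    {
        using (StringReader reader = new StringReader(html))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Assumption: ignore surrounding whitespace and a
                // trailing semicolon such as in "displaycopy('...');".
                line = line.Trim().TrimEnd(';');

                if (line.Length >= 15 &&
                    line.StartsWith("displaycopy('", StringComparison.Ordinal) &&
                    line.EndsWith("')", StringComparison.Ordinal))
                {
                    // Skip the 13-character prefix, drop the 2-character suffix.
                    return line.Substring(13, line.Length - 15);
                }
            }
        }
        return null;
    }
}
```

Passing the whole script block from earlier in the thread to FindArgument returns just the comma-separated numbers.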
 
Jon Skeet [C# MVP]

Jon Skeet said:
> Delayed? It's been 40 minutes. In newsgroup terms, that's the
> equivalent of replying before I've taken the next breath ;)

<snip>

I should have added here, by the way, that I'm off to bed, so I'm
afraid you'll have to wait another 9 hours or so for another reply from
me. Of course, the beauty of newsgroups is that during those 9 hours
there are likely to be other people just as capable of answering your
question as I am :)
 
O-('' Q)

Have a good night, Jon. Thanks again for your input, it's been
extremely... enlightening. :)
 
O-('' Q)

> Okay - so there'll be other stuff around it, we just need whatever's in
> the displaycopy bit, right? Will there only be one displaycopy in the
> whole document (or whatever you've got in the string)?
>
> That's really handy.

There is only one instance of "displaycopy." And I am glad I am giving
better information now. :)

> If you've got the relevant line on its own already, that's going to be
> the simplest solution - and if you're able to easily read the entire
> HTML line by line, it's easy to check with something like:
>
> if (line.StartsWith("displaycopy('") &&
> line.EndsWith("')"))
>
> Hope that helps.

Everything helps, Jon. Everything. :)

This was my effort to better understand Regular Expressions in this
environment. I am just hoping that I was at least in the general
geographical location with my crazy attempt in my original post, hehe.

Thanks again, all.
 
O-('' Q)

Ok, this one was solved. Thanks a TON for your hints and helpful code
examples.

Still, would like to see how the regular expression for this works. It
would really help me understand it some more. :)

The string.Substring routine was key. With a few small adjustments, it
did exactly what was needed and was easier to figure out than regex
was!

-- Kirby
 
Jon Skeet [C# MVP]

O-('' Q) said:
> Ok, this one was solved. Thanks a TON for your hints and helpful code
> examples.
>
> Still, would like to see how the regular expression for this works. It
> would really help me understand it some more. :)
>
> The string.Substring routine was key. With a few small adjustments, it
> did exactly what was needed and was easier to figure out than regex
> was!

Working out when to use regular expressions and when to use
Substring/IndexOf is a matter of taste, and some people like to use
regular expressions far more than I do. I'll have a look at the regex
though, and try to figure out what was wrong.
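
To illustrate the Substring/IndexOf side of that choice, here is a rough sketch that finds the call inside a larger string rather than assuming it is the whole string. The class and method names are hypothetical, and it assumes there is only one displaycopy call and that the argument never contains "')".

```csharp
using System;

public static class DisplayCopyIndexOf
{
    // Finds displaycopy('...') anywhere in the text and returns what is
    // between the quotes, or null if either delimiter is missing.
    public static string Extract(string text)
    {
        const string prefix = "displaycopy('";

        int start = text.IndexOf(prefix, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }
        start += prefix.Length;  // first character after the opening quote

        // Look for the closing "')" only after the opening delimiter.
        int end = text.IndexOf("')", start, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        return text.Substring(start, end - start);
    }
}
```

The trade-off is the usual one: this is easy to step through in a debugger, but every new rule about the input format means more index arithmetic.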
 
Jon Skeet [C# MVP]

O-('' Q) wrote:

> Anyway, here is what I am trying to do:
>
> Example Input: displaycopy('2,10,9,15')
> Example Output: 2,10,9,15
>
> Regex re = new Regex("displaycopy\" *.\"([^(]*)");

Just to go through your regular expression - it's actually looking for
a string starting with displaycopy then a double quote, then any number
of spaces, then any single character, then another double quote, then
any number of non-open-brackets. The final close bracket isn't matched
literally - it just closes the capturing group.

What I believe you want is:

Regex re = new Regex(@"displaycopy\('([^']*)'\)");

See if you can work out why :)

Jon
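
As a sketch of how the pattern above might be used to pull out the numbers, assuming you want the contents of the first capturing group (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class DisplayCopyRegex
{
    // displaycopy   the literal word
    // \(            a literal open bracket (unescaped, "(" starts a group)
    // '             a literal single quote
    // ([^']*)       group 1: any run of characters that are not a single quote
    // '\)           the closing quote, then a literal close bracket
    private static readonly Regex Pattern =
        new Regex(@"displaycopy\('([^']*)'\)");

    // Returns the text captured between the quotes, or null if the
    // pattern does not occur anywhere in the input.
    public static string Extract(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because the pattern is not anchored, it finds the call anywhere in the input, so the whole script block can be passed in as one string. The verbatim string (the @ prefix) means the backslashes reach the regex engine as written rather than being treated as C# escapes.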
 
O-('' Q)

> What I believe you want is:
> Regex re = new Regex(@"displaycopy\('([^']*)'\)");
>
> See if you can work out why :)

Wow, I was way off, hehe.

I can see now where I went wrong for sure, given your example. It was
ignoring some of the objects in the string I needed and looking for
only parts of the same. If I am correct in saying so, that is what it
looks like to me now that I see how you formed the syntax.

I should really consider some classes on this or something. I can learn
a lot here, sure, but a class sounds like a fun and interesting
challenge. I should probably pick up a book while I am out today, too.
Can you recommend anything good for beginners?
 
