Rookie thoughts on Regex--useful but not complete

raylopez99

I went through a bunch of Regex examples, and indeed it's quite
powerful, including 'groups' using 'matches', word boundaries,
lookahead matches, replacing and splitting text, etc. Apparently
there's a whole book on it, though I just used the chapter in
Albahari et al., C# 3.0 in a Nutshell (a great book). You can indeed
'massage' a string in a number of ways, but at the end of the
process I found that you still have to go through the 'massaged'
string, character by character, and do "fine tuning". I'm sure you
can also write a Regex to do such "fine tuning", but I am at a dead end
on a number of problems, and nothing beats a character-by-character
analysis.

For example, suppose you want to extract all the numbered coefficients
10a, 11d, 12g and 100f from the string "xyz10abc11def12gh100f".

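using System;
using System.Text.RegularExpressions;

string sMY = "xyz10abc11def12gh100f";

// First match only: exactly two digits followed by one non-digit.
Match mMY = Regex.Match(sMY, @"\d\d\D");
Console.WriteLine("result mMY-->{0}", mMY.Value);

// All matches of the same pattern.
Regex r001 = new Regex(@"\d\d\D");
MatchCollection mc2 = r001.Matches(sMY);

foreach (Match m in mc2)
{
    Console.WriteLine("m: {0}", m.Value);
}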

You'll get output (as an array of strings): 10a, 11d, 12g, 00f, which
is not quite what you want, because the last value is "00f", not
"100f". So you still have to go through the string "manually" and
extract these "end cases" or "corner cases".

Now, I'm sure you Regex experts can find the exact Regex to give the
proper answer (it would help if there were a 'conditional' Regex
expression so you could match either two digits or three digits, but
I'm not aware of one). However, in the general case, I think I'm correct
in saying that while Regex helps, it cannot get all the corner cases.

RL
 
Chris Dunaway

You'll get output (as an array of strings): 10a, 11d, 12g, 00f, which
is not quite what you want, because the last value is "00f", not
"100f". So you still have to go through the string "manually" and
extract these "end cases" or "corner cases".

Now, I'm sure you Regex experts can find the exact Regex to give the
proper answer (it would help if there were a 'conditional' Regex
expression so you could match either two digits or three digits, but
I'm not aware of one). However, in the general case, I think I'm correct
in saying that while Regex helps, it cannot get all the corner cases.

You can use this expression:

@"\d{2,3}\D"

which returns 10a, 11d, 12g, and 100f.

The {2,3} applies to the digit (\d) and tells it to look for at least
2 digits but no more than 3. You can use the pipe character to
indicate an "or" condition in a regex. For example, if your string were
"balltallcallfallhallmall" and you used the regex "(t|h)all", it
would return the matches "tall" and "hall". (Inside square brackets the
pipe is just a literal character, so the character-class form of the
same pattern is "[th]all".)
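The quantifier and the "or" condition described above can be tried side by side. This is only a minimal sketch: the class name QuantifierDemo and the helper AllMatches are made up for illustration, and the \d+ variant assumes you want to accept any number of digits rather than just two or three. Because the pipe is a literal inside square brackets, the sketch uses the grouped form (t|h)all and the class form [th]all.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper (not part of .NET): collects the text of every match.
    public static List<string> AllMatches(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        string coefficients = "xyz10abc11def12gh100f";

        // Exactly two digits, then one non-digit: the last match loses its leading "1".
        Console.WriteLine(string.Join(", ", AllMatches(coefficients, @"\d\d\D")));
        // prints: 10a, 11d, 12g, 00f

        // Two or three digits, then one non-digit.
        Console.WriteLine(string.Join(", ", AllMatches(coefficients, @"\d{2,3}\D")));
        // prints: 10a, 11d, 12g, 100f

        // One or more digits (assumption: any digit count is acceptable).
        Console.WriteLine(string.Join(", ", AllMatches(coefficients, @"\d+\D")));
        // prints: 10a, 11d, 12g, 100f

        string words = "balltallcallfallhallmall";

        // Alternation inside a group: "t" or "h", followed by "all".
        Console.WriteLine(string.Join(", ", AllMatches(words, "(t|h)all")));
        // prints: tall, hall

        // Equivalent character class: one character out of the set {t, h}.
        Console.WriteLine(string.Join(", ", AllMatches(words, "[th]all")));
        // prints: tall, hall
    }
}
```

Since each pattern requires a trailing non-digit (\D), a number at the very end of the string with nothing after it would not match; that does not matter for the sample string, which ends in "f".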

Here is a site with many regular expressions. You can examine them
and see how they work. Perhaps it will help in your study.

http://regexlib.com/default.aspx

Chris
 
Tom Dacon

Now you Regex experts I'm sure can find the exact Regex to give the
proper answer

however, in the general case, I think I'm correct
in saying that while Regex helps, it cannot get all the corner cases.

Your first point contradicts your second.

The Regex syntax is ugly and demanding, there's no question about that. It's
difficult, if not impossible, to think of another programming syntax that
comes close for ugliness and lack of readability. My intuition tells me that
there are few programmers who consider themselves competent in it, and far
far fewer that can make the claim that they've mastered it.

Nevertheless, it's as powerful as your understanding of it. Rather than
disparage it, I'd suggest that you continue your study by searching for
examples on the web (perhaps consider buying the book that you mentioned!)
and develop for yourself the ability to solve problems such as the simple
one you posed (another poster has already given you the solution).

Tom Dacon
Dacon Software Consulting
 
Ignacio Machin ( .NET/ C# MVP )

 however, in the general case, I think I'm correct


Your first point contradicts your second.

The Regex syntax is ugly and demanding, there's no question about that. It's
difficult, if not impossible, to think of another programming syntax that
comes close for ugliness and lack of readability. My intuition tells me that
there are few programmers who consider themselves competent in it, and far
far fewer that can make the claim that they've mastered it.

Nevertheless, it's as powerful as your understanding of it. Rather than
disparage it, I'd suggest that you continue your study by searching for
examples on the web (perhaps consider buying the book that you mentioned!)
and develop for yourself the ability to solve problems such as the simple
one you posed (another poster has already given you the solution).

Tom Dacon
Dacon Software Consulting

How old are you?
Most probably Regex has been around since before you were born :)
It is not easy to understand at first, but later you can deconstruct
most expressions easily.
 
Tom Dacon

How old are you?
Most probably Regex has been around since before you were born :)
It is not easy to understand at first, but later you can deconstruct
most expressions easily.


I take it that your question is to me, not to the OP.

I'm 65, and I've been continuously in the software industry since I was 19 -
that would be 1962. I'm currently a .NET software architect, with my .NET
experience going back to about a month after the NDA was lifted at the PDC
conference in LA in 2000. I program these days in C# and VB. Previous
experience was in Unix, various microcomputers, and IBM mainframes. I
started out on PCs in 1982 on DOS 1.0.

My first introduction to regular expressions occurred in the mid to late
1980s, IIRC, when I was given the task of integrating regular expressions
into some proprietary text search software at the company I was then working
for. That was in the C language, I believe, although parts of it were done
in assembly language. It was also around then that I started writing
documentation in HTML's predecessor, SGML, but that's another story.

I stand by my opinion of regular expression syntax. Doing work with regular
expressions is an infrequent task for most programmers. Anything you do
quite infrequently poses enormous loads on your memory. Because the syntax
is visually noisy and unstructured, because it cannot be easily organized to
let our own visual pattern-matching capabilities help in extracting the
sense out of an expression, I personally think that it's a horror, both to
write and to read. This in spite of the fact that I'm as competent at it
myself as I occasionally need to be. You can do something well, but still
not like it.

Your opinion may differ, and I cannot deny you your own. This is mine.

BTW, in 1962 how old were the people who would later become your parents? No
offense intended, of course :)

Tom Dacon
Dacon Software Consulting
 
Johnson

I take it that your question is to me, not to the OP.

It appeared to me - an independent observer here - that it was intended for
the OP. The OP here has a long history of posting his observations, musings,
opinions, and rants - most of which serve no purpose other than to showcase
his lack of knowledge and comprehension of .NET development (or software
development in general) coupled with his need to share his glib thoughts
(just like how he kicked off the current thread).
 
Ignacio Machin ( .NET/ C# MVP )

I take it that your question is to me, not to the OP.

It was for the OP.
When I posted it (or at least when I read the thread), your post was
not there yet :)
 
Tom Dacon

Even using another newsreader, I've myself occasionally replied to the wrong
post. Probably just about everyone who's active on the ng's has done it one
time or another. If it were a serious transgression of netiquette, we'd
probably all be hanging from the branch of a tree by our necks, swaying
slowly in the wind :)

Tom
 
Jeff Johnson

Just one more reason for people to avoid like the plague using Google
Groups as a newsreader. It's fine if you want to do a search (assuming it
hasn't lost the post you're looking for), but as a portal for someone to
actually _contribute_, it's woefully flawed.

A-freakin'-men. I don't care if you use Agent, Thunderbird, or even (gasp!)
Outlook Express*, just use a newsreader, people!

*Or any other newsreader out there; those were just the ones I came up with
off the top of my head.
 
Ignacio Machin ( .NET/ C# MVP )

...then how did you manage to quote his post?!

When I posted the FIRST post, Tom's was not there. When I posted the
second it was, and my second post is the answer to it.
 
Ignacio Machin ( .NET/ C# MVP )

Actually, he's using Google Groups.  It's a terrible way to post, and it's  
clear to me from the various posting errors that occur from people using  
it that it's got a terrible, misleading and buggy user-interface.

I'd guess that Ignacio really did think he was replying to the original  
post, but when he finally hit the "Post" button (or "Submit" or whatever  
Google calls it), Google "helpfully" added the quote from the  
newly-arrived post from Tom and made Ignacio's post refer to that one  
instead of the original post as Ignacio expected, creating the confusion  
we've got here.

One clue in favor of this theory is that Ignacio has the good habit of  
trimming quotes, but in that post, the quote wasn't trimmed at all,  
suggesting that he never had a chance to notice the quote was being  
included at all.

Just one more reason for people to avoid like the plague using Google
Groups as a newsreader. It's fine if you want to do a search (assuming it
hasn't lost the post you're looking for), but as a portal for someone to
actually _contribute_, it's woefully flawed.

Pete

Oops

You are right :) I did respond to Tom's post originally.
TOM: My apologies, it was intended for the OP.

Peter:
You are right, Google Groups is not the best newsreader around :(.
But it's one of the few I can access from the office.
 
Ignacio Machin ( .NET/ C# MVP )

A-freakin'-men. I don't care if you use Agent, Thunderbird, or even (gasp!)
Outlook Express*, just use a newsreader people!

In my office they have blocked port 110, so I cannot use a
newsreader. I have to use a web-based reader. I'm pretty sure there
are other people with the same problem.
 
qglyirnyfgfo

In my office they have blocked port 110, so I cannot use a
newsreader. I have to use a web-based reader. I'm pretty sure there
are other people with the same problem.

Same here. It's a miracle that my computer actually works with all the
blocking that they have added at the company I work for.

I keep hoping someone will invent a newsreader that works over
HTTP and has the infrastructure in the background to make it all
work.

Oh well, in the meantime I use Google, which happens to have the one
feature I love the most... requiring you to log on to use the newsgroups...
why??? I have no frigging idea, but I am sure spammers love it.
 
Tom Dacon

Peter Duniho said:
On Mon, 17 Nov 2008 14:17:52 -0800, Ignacio Machin ( .NET/ C# MVP )

At the very least, if you're going to use Google Groups, be prepared to
have your posts occasionally mangled as yours was here. :)

If not blocked altogether. On other ng's I frequent, some of the regulars
killfile everything from Google Groups.

Tom
 
Tom Dacon

You are right :) I did respond to Tom's post originally.
TOM: My apologies, it was intended for the OP.

No problemo, amigo. This has all turned into much ado about nothing ;-)

Tom
 
Jeff Johnson

In my office they have blocked port 110, so I cannot use a
newsreader. I have to use a web-based reader. I'm pretty sure there
are other people with the same problem.

You have my sympathy. (And by the way, 110 = POP3. 119 = NNTP, as Peter
pointed out.)
 
raylopez99

It appeared to me - an independent observer here - that it was intended for
the OP. The OP here has a long history of posting his observations, musings,
opinions, and rants - most of which serve no purpose other than to showcase
his lack of knowledge and comprehension of .NET development (or software
development in general) coupled with his need to share his glib thoughts
(just like how he kicked off the current thread).

FU. Kill file me then bozo.

Rl
 
raylopez99


Thanks for these links. Actually, just playing around with Albahari's
chapter on Regex is making me a sort of expert, but there's stuff
like the string "-10,23x,10,5" that gives me problems. I can extract
the numbers {10,23,10,5}, but then I have to traverse the string and,
using 'positive lookbehind', find whether there was a "-" sign before
the number, making it minus ten, for example. So, like I say, you end
up having to go through another Regex expression and/or a char array
to find the final answer.

No big deal, since I think if you tried to combine everything into one
Regex example it would be too confusing anyway.
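For what it's worth, the sign can probably be picked up in the same pass by making the "-" an optional part of the match. The sketch below is only an illustration under some assumptions: every "-" directly in front of digits means a negative number (so "3-5" would yield 3 and -5), the values fit in an int, and the names SignedNumberDemo and ExtractSignedIntegers are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SignedNumberDemo
{
    // Hypothetical helper (not part of .NET): returns every integer in the
    // input, treating a "-" immediately before the digits as its sign.
    public static List<int> ExtractSignedIntegers(string input)
    {
        var numbers = new List<int>();

        // -?  : an optional minus sign
        // \d+ : one or more digits
        foreach (Match m in Regex.Matches(input, @"-?\d+"))
        {
            // int.Parse accepts a leading sign; it throws OverflowException
            // for values outside the int range (assumed not to occur here).
            numbers.Add(int.Parse(m.Value, CultureInfo.InvariantCulture));
        }
        return numbers;
    }

    public static void Main()
    {
        List<int> result = ExtractSignedIntegers("-10,23x,10,5");
        Console.WriteLine(string.Join(", ", result));
        // prints: -10, 23, 10, 5
    }
}
```

With this pattern no lookbehind or second traversal is needed for the sample string, though a stricter rule about where a minus sign may appear would call for a more careful pattern.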

RL
 
