Search for multiple things in a string

tshad · Sep 19, 2005

Oliver Sturm said:
But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.

In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions. I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.

I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.

As far as readability, it has nothing to do with Regular Expressions whether
it is readable or not, as Oliver mentions, but how you write it.

You can also make some pretty unreadable C# code as well. Readability is a
function of the programmer not the language (in most cases). As was also
mentioned you also need to know the language. For someone not used to
objects, abstract objects and interfaces are also hard to read.

I like seeing different options and making a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the problem
warrants it.

You don't use it - you lose it.

Tom

Jon Skeet [C# MVP] · Sep 19, 2005

tshad said:
Escaping?

You've mentioned that as being a problem a couple of times.

What do you mean by this?

Are you talking about stopping if you find the first one matching?

No - I'm talking about finding things like "jon.skeet" in a string.
Using IndexOf, that's no problem - no characters are interpreted in a
"special" way by IndexOf.

Regular expressions, however, treat "." as "any character", so to find
an actual dot, you need to escape it with a backslash - and from a C#
point of view that means either doubling the backslash or using a
verbatim string literal, i.e.
"jon\\.skeet"
or
@"jon\.skeet"
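To make the difference concrete, here is a minimal sketch. The class name, method names and sample text are made up for illustration. It shows that an unescaped dot in a pattern also matches other characters, and that Regex.Escape (in System.Text.RegularExpressions) can produce the escaped pattern for you.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotEscapingDemo
{
    // Literal search: IndexOf gives no character a special meaning.
    public static bool ContainsLiteral(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal) >= 0;
    }

    // Regex search: the pattern is interpreted, so "." means "any character".
    public static bool MatchesPattern(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern);
    }

    public static void Main()
    {
        string text = "mail from jonXskeet"; // note: no dot in the text

        Console.WriteLine(ContainsLiteral(text, "jon.skeet"));              // False
        Console.WriteLine(MatchesPattern(text, "jon.skeet"));               // True ("." matched "X")
        Console.WriteLine(MatchesPattern(text, @"jon\.skeet"));             // False
        Console.WriteLine(MatchesPattern(text, Regex.Escape("jon.skeet"))); // False
    }
}
```

Regex.Escape helps when the search text is only known at run time, but the reader still has to know that escaping is going on.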

tshad · Sep 19, 2005

Jon Skeet said:
No - I'm talking about finding things like "jon.skeet" in a string.
Using IndexOf, that's no problem - no characters are interpreted in a
"special" way by IndexOf.

Regular expressions, however, treat "." as "any character", so to find
an actual dot, you need to escape it with a backslash - and from a C#
point of view that means either doubling the backslash or using a
verbatim string literal, i.e.
"jon\\.skeet"
or
@"jon\.skeet"

Got ya.

I thought you were talking about escaping the function/call as you might in
a loop when you find what you are looking for.

Thanks,

Tom

Jon Skeet [C# MVP] · Sep 19, 2005

tshad said:
I also feel that Regular Expressions, being an object in asp.net (not
necessarily C#) makes it just as valid as C#.

Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.

As far as readability, it has nothing to do with Regular Expressions whether
it is readable or not, as Oliver mentions, but how you write it.

No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet". Which of them
contains just the information which is actually of concern, and which
contains information which is only present due to the technology used
to do the searching?

You can also make some pretty unreadable C# code as well.

Sure, but that's no reason to use regular expressions just to make
things worse.

Readability is a function of the programmer not the language (in most
cases).

Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.

As was also mentioned you also need to know the language. For someone
not used to objects, abstract objects and interfaces are also hard to
read.

Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?

I like seeing different options and making a choice. Sometimes I may use
something like Regex just so I am used to using it, as long as the problem
warrants it.

And that's the point - I don't think this problem *does* warrant it.

You don't use it - you lose it.

So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.

It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)

tshad · Sep 20, 2005

Jon Skeet said:
Regular expressions have nothing to do with ASP.NET - they're a part of
"normal" .NET.

Actually, you're right.

But that was my point.

Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
So using Regex is not really like using another language (as C# is different
from VB.Net).

But the discussion was valid in that you use the best tool for the situation.

No - I believe that searching for "jon.skeet" with IndexOf is clearer
than searching for "jon\\.skeet" or @"jon\.skeet".

That's maybe true. But it would be clear to someone used to using both C#
and Regex.

Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and as you
say, is a little confusing) - but you still do it.

Which of them
contains just the information which is actually of concern, and which
contains information which is only present due to the technology used
to do the searching?

Sure, but that's no reason to use regular expressions just to make
things worse.

I agree with you that readability is important.

It used to be that people didn't like C and C++ for exactly the same reason
you point out. The code was not as clear as COBOL or Basic and that was the
complaint back then. I happened to be a Fortran programmer at that time and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).

The problem with C back then was that much of the code was
really cryptic. But it didn't have to be; that was just how people coded
back then. Mainly, it was important to make the most efficient code
possible because of the limited computing power and efficient rarely equates
to readable. And I am not even talking about compiling and linking and all
the options and cryptic command lines.

Yes, but it's the programmer's decision how to approach things -
whether you do things the simple way or the complex way. You *could*
implement the string search by manually iterating over all the
characters in the string, perhaps even writing your own state machine
to do it. The code could be pretty readable considering what it's doing
- but it's *bound* to be more complex than using IndexOf.

I agree.

Just because you can - doesn't mean you should.

Sure - but why introduce unnecessary complexity? You're already
writing C#, so you'd better know C# - but why add regular expressions
into the mix when they're unnecessary?

But if you know both and as I (and you) mentioned regex is part of .net as
is C# - so it is already in the mix. But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up to
the programmer. In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close to
being unreadable (if you are even a stone's throw from understanding Regular
Expressions).

I personally feel that both solutions are equally usable and readable (in
this situation).

I have also seen times when I just couldn't find an easy solution in C# or
VB and it was fairly easy in Regex.

I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:

SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in VB.Net
so I use this (not saying there isn't one).

And that's the point - I don't think this problem *does* warrant it.

I agree that it isn't necessary here, but I don't think it is warranted or
unwarranted here. I think it's just as readable either way.

So do you add a database when you just need to do a hashtable lookup,
just in case you forget SQL? Do you use reflection to get at the value
of a property, just in case you forget how to use that? I hope not.

Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.

It's very important to use appropriate technology, rather than using it
for the sake of it. (It's one thing to experiment with technology for
the sake of it as a learning tool, but I wouldn't do it in production
code.)

Right. But Regex is not inappropriate technology. As you said, trying to
loop through each character when there is an easier way is a bit much.

But Regex is valid and is an appropriate method for handling strings and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying using it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.

As you said. Use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.

Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?

Tom

Jon Skeet [C# MVP] · Sep 20, 2005

tshad said:
Actually, you're right.

But that was my point.

Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
So using Regex is not really like using another language (as C# is different
from VB.Net).

It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.

But the discussion was valid in that you use the best tool for the situation.

Indeed.

That's maybe true. But it would be clear to someone used to using both C#
and Regex.

But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?

Also, you have the same problem when dealing with web pages or getting a
file from the disk. You still use the escape character there (and as you
say, is a little confusing) - but you still do it.

You have to know the C# escaping, but not the regular expression
escaping.

I agree with you that readability is important.

It used to be that people didn't like C and C++ for exactly the same reason
you point out. The code was not as clear as COBOL or Basic and that was the
complaint back then. I happened to be a Fortran programmer at that time and
was not interested in moving to C for that reason (not that Fortran was
better - readability wise).

The problem with C back then was that much of the code was
really cryptic. But it didn't have to be; that was just how people coded
back then. Mainly, it was important to make the most efficient code
possible because of the limited computing power and efficient rarely equates
to readable. And I am not even talking about compiling and linking and all
the options and cryptic command lines.

To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.

I agree.

Just because you can - doesn't mean you should.

Exactly.

But if you know both and as I (and you) mentioned regex is part of .net as
is C# - so it is already in the mix.

No, it's not. It's not already used in every single C# program, any
more than SQL is.

But you're right, don't introduce any
more complexity than necessary. But if it's 6 of one ... it's really up to
the programmer.

In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.

In the original case, that was what it was. You can't tell
me that you feel that the solution suggested for this case was even close to
being unreadable (if you are even a stone's throw from understanding Regular
Expressions).

It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.

I personally feel that both solutions are equally usable and readable (in
this situation).

I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?

I have also seen times when I just couldn't find an easy solution in C# or
VB and it was fairly easy in Regex.

Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.

I myself would usually opt for the C# or VB solutions first, but would have
no problem using Regex. As a matter of fact, I use Regex to strip commas
and $ from my textbox fields before writing it to SQL as it was the best
solution I could find. Such as:

SalaryMax.Text =
String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))

At the time, I couldn't seem to find as simple a solution as this in VB.Net
so I use this (not saying there isn't one).

And of course there is:
SalaryMax.Text =
String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...
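For anyone wanting to compare the two approaches side by side, here is a minimal sketch. The class and method names are made up for illustration, and the pattern is written as a C# verbatim string rather than the VB.NET literal above. Both methods strip "$" and "," from the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyCleanup
{
    // Regex version: "$" must be escaped in the pattern, because an
    // unescaped "$" is an end-of-input anchor. "," needs no escaping.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", "");
    }

    // Plain string version: no character has a special meaning.
    public static string StripWithReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }

    public static void Main()
    {
        string wages = "$1,234,567.89";
        Console.WriteLine(StripWithRegex(wages));   // 1234567.89
        Console.WriteLine(StripWithReplace(wages)); // 1234567.89
    }
}
```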

I agree that it isn't necessary here, but I don't think it is warranted or
unwarranted here. I think it's just as readable either way.

But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.

Of course not. But as was mentioned there are times where Regex may be a
good solution and if you can do it either way, why not.

Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.

Right. But Regex is not inappropriate technology. As you said, trying to
loop through each character when there is an easier way is a bit much.

As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.

But Regex is valid and is an appropriate method for handling strings and if
you are as comfortable with one as the other then it isn't inappropriate.
It's all in how you use it. And I was not saying experiment with it. I was
saying using it for the sake of staying familiar with it. I don't want to
need to use it and have to figure it out when I need to use it.

Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.

As you said. Use the appropriate tool. If the appropriate tool is Regex,
it is going to be d... inconvenient to need it and not know how to use it.

I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.

Now I am not saying go out and learn every tool out there. But if it is a
valid tool in your particular environment, and it is available - why would
you not avail yourself of it?

Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?

tshad · Sep 20, 2005

Jon Skeet said:
It is - the regular expression *language* is a different language to
C#, in the same way that XPath is. That's why under "regular
expressions" in MSDN, there's a "language elements" section.

I think calling it a language is a stretch, although I know it is called a
language in places (it's all in what you define as a language). It really is
a text/string processor, as is: IndexOf, Substring, Right, Replace etc used
by various languages.

You don't build pages with it. It isn't procedural. It is a tool used by
the other languages. You don't use VB.Net in C# or vice versa but both use
Regular expressions (as they both use Substring, Replace etc).

But not as instantly clear, I believe. Can you really say that you find
the regex version doesn't take you *any* longer to understand than the
non-regex version?

Depends on the C# code as well as the Regex code.

Again, are we talking about the best tool for the job or the most
readability. As was mentioned before, you set up loops and temporary
variables to do what you can do in a simple Regular Expression.

Again, I am not pushing Regular Expressions here, just that they are just as
valid as C# (or VB.Net) string handlers.

I do use them when convenient.

For example, I was creating a simple text search engine and wanted to modify
what the user put in and found it simpler to do the following than in VB or
C:

' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straight forward and easy to follow.

You have to know the C# escaping, but not the regular expression
escaping.

But you do NEED to know the C# escaping (readability not high - unless you
understand it).

To me, a lot of readability comes from decent naming and commenting,
which fortunately are available in pretty much any language. I'd
certainly agree that object orientation (and exceptions, automatic
memory management etc) makes it a lot easier to write readable code
though.

But writing objects and the objects themselves are not easily readable. But
you wouldn't advocate not writing them, would you?

No, it's not. It's not already used in every single C# program, any
more than SQL is.

Nor are all the objects you use.

But if you are using .Net, it is part of the mix.

In what way is it 6 of one or half a dozen of the other when one
solution requires knowing more than the other? I would expect *any* C#
programmer to know what String.IndexOf does. I wouldn't expect all C#
programmers to know by heart which regex language elements require
escaping - and if you don't know that off the top of your head, then
changing the code to search for a different string involves an extra
bit of brainpower.

Why? Ever heard of references or cheat sheets? And what is wrong with a
little extra brainpower - if you don't use it, you lose it

I don't know all of the possible combinations of calls to every Object, but
that doesn't preclude me from using them.

My position has always been, don't memorize. You will remember what you
use. But if you know how to get it (where to look), then you have
everything you need.

I happen to use .Net. Regex is part of .Net. I would be limiting myself if
I didn't use Regex in places where it is appropriate. If I happen to know a
good way in Regex to solve a problem, I am not going to use *extra brainpower*
to try to solve the problem in C#.

It was *less* readable though - and would have been *significantly*
less readable if the string being searched for had included dots,
brackets etc.

But it didn't. But if it did, it is no different than having to deal with
escapes in C (less readable)

If you are talking about

if ((someString.IndexOf("something1",0) >= 0) ||
(someString.IndexOf("something2",0) >= 0) ||
(someString.IndexOf("something3",0) >= 0))
{
// Do something
}

vs

if (Regex.IsMatch(someString, @"something1|something2|something3"))

If you know absolutely nothing about Regular expressions, I would agree that
this is less readable.

But I would also contend that IndexOf could be just as confusing. What is
the first 0 for? What about the 2nd? It is readable because you know C.

I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a Match (can't tell that from the word "IndexOf")
and it probably has something to do with the words "something1",
"something2" and "something3". Now if you know C then I would assume you
would pick up that "|" is "or" (not so clear to a VB programmer). And that
would be to someone not familiar with regular expressions doing a quick
perusal.

So I am at a loss as to how this regular expression is more unreadable than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.

I suspect not all programmers would though. Don't forget that the
person who writes the code is very often not the one to maintain it.
Can you guarantee that *everyone* who touches the code will find
regexes as readable as String.IndexOf?

As was said, you can make readable and unreadable C or Regex code. Are you
going to tell your programmers they "cannot" use Regex for the same reason?

Are you going to leave out some objects that programmers may not be familiar
with?

Which is why I've said repeatedly that I'm not trying to suggest that
regexes are bad, or should never be used. I'm just saying that in this
case it's using a sledgehammer to crack a nut.

And I don't in this case, as I think I've shown. Less typing, easy to read,
straight forward - in this case.

And of course there is:
SalaryMax.Text =
String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
.Replace(",", "")));

I know which version I'd rather read...

I can read either (although, I didn't know you could string multiple
"Replace"s together).

But I suspect you're more used to regular expressions than many other
programmers - and making the code less readable for other programmers
for no benefit is what makes it unwarranted here, even in the simple
case where there's nothing to escape.

First of all, I am not. I don't use it much at all, but I find it easy to
figure out and straightforward (but you can make it really complex). I use
it to validate phone numbers, credit card numbers, zip codes etc. These are
very well documented, and when there are a myriad of ways a user can input
these types of data, I prefer to use Regular expressions, which are all over
the place (easy to find), than try to come up with some complex set of loops
and temporary variables, which make it far easier to make a mistake and are
much more unreadable than the Regex equivalent.
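As a rough illustration of the kind of validation meant here, a minimal sketch follows. The class name, method name and pattern are assumptions for the example, not code from this thread, and the pattern assumes a US ZIP code of five digits optionally followed by a hyphen and four more digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipCodeCheck
{
    // Assumed format: five digits, optionally followed by "-" and four digits.
    private static readonly Regex ZipPattern =
        new Regex(@"^[0-9]{5}(-[0-9]{4})?$");

    public static bool IsValidZip(string input)
    {
        return ZipPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidZip("12345"));      // True
        Console.WriteLine(IsValidZip("12345-6789")); // True
        Console.WriteLine(IsValidZip("1234"));       // False
        Console.WriteLine(IsValidZip("12345-67"));   // False
    }
}
```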

Because it's more complicated! You can't deny that there's more to
consider due to the escaping. There's more to know, more to consider,
and it doesn't get the job done any more cleanly.

Escaping seems to be your main complaint with it.

I have the same problem with C or VB when trying to remember when to use "\"
vs "/" in paths or do I need to add "\" in front of my slash or quote.
These are inherent problems with pretty much all of them.

As is using the power of regular expressions when there is an easier
way - using IndexOf, which is *precisely* there to find one string
within another.

I am not discounting IndexOf, I am just saying that both work fine and are
just as readable (in this case). In other cases, that may not be the case
(with either C or Regex).

Do you really think it would take you that long to refamiliarise
yourself with it? I don't see why it's a good idea to make some poor
maintenance engineer who hasn't used regular expressions before try to
figure out that *actually* you were just trying to find strings within
each other just so you can keep your skill set current.

So you would prefer to code to the lowest common denominator.

I am not going to code to the level of a junior programmer. I prefer that
he learn to code to a higher level.

I am not saying that you shouldn't still write decent, readable, commented
code. But I am not going to limit myself because another programmer may not
be able to read well written code. If that were the case, I would not be
writing objects (abstract classes, interfaces, etc).

I've never had a problem with reading the documentation when I've
needed to use regular expressions, without putting it in projects in
places where I *don't* need it.

"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.

Because it makes things more complicated for no benefit. The reflection
example was a good one - that allows you to get a property value, so do
you think it's a good idea to write:

string x = (string) something.GetType()
.GetProperty("Name")
.GetValue(something, null);
or

string x = something.Name;

?

Maybe I should use the former. After all, I wouldn't want to forget how
to use reflection, would I?

Lost me on that one.

Tom

Jon Skeet [C# MVP] · Sep 20, 2005

tshad said:
I think calling it a language is a stretch, although I know it is called a
language in places (it's all in what you define as a language).

In plenty of places. It has a language with a defined syntax etc.

It really is
a text/string processor, as is: IndexOf, Substring, Right, Replace etc used
by various languages.

You don't build pages with it. It isn't procedural.

Neither of those are required for it to be a language.

It is a tool used by the other languages.

Sure - so is XPath, but that's a language too.
(See http://www.w3.org/TR/xpath)

You don't use VB.Net in C# or vice versa but both use
Regular expressions (as they both use Substring, Replace etc).

None of those state that regular expressions aren't a language.

Depends on the C# code as well as the Regex code.

The C# code in question would be:

if (someVariable.IndexOf ("firstliteral") != -1 ||
someVariable.IndexOf ("secondliteral") != -1 ||
someVariable.IndexOf ("thirdliteral") != -1)

If I did it regularly, I'd write a short method which took a params
string array.
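A minimal sketch of what such a method might look like. The names StringSearch and ContainsAny are made up for illustration, and ordinal comparison is an assumption about the intended matching.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns true if text contains at least one
    // of the candidate strings, using a plain literal (ordinal) search.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        string someVariable = "xx secondliteral yy";

        Console.WriteLine(ContainsAny(someVariable,
            "firstliteral", "secondliteral", "thirdliteral")); // True
        Console.WriteLine(ContainsAny(someVariable, "jon.skeet")); // False
    }
}
```

With a helper like this the call site reads as a single condition, and none of the candidate strings needs escaping.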

Again, are we talking about the best tool for the job or the most
readability.

Unless there's another compelling argument in favour of one tool or
another, readability is a very important part of choosing the best
tool.

As was mentioned before, you set up loops and temporary
variables to do what you can do in a simple Regular Expression.

Again, I am not pushing Regular Expressions here, just that they are just as
valid as C# (or VB.Net) string handlers.

But you're effectively pushing them in the situation described by the
OP when you say that the solution using regular expressions is as
readable as the solution without.

I do use them when convenient.

For example, I was creating a simple text search engine and wanted to modify
what the user put in and found it simpler to do the following than in VB or
C:

' The following replaces all multiple blanks with " ". It then takes
' out the anomalies, such as "and not and" and replaces them with "and"

keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
keywords = Regex.Replace(keywords, "( )", " or ")
keywords = Regex.Replace(keywords," or or "," ")
keywords = Regex.Replace(keywords,"or and or","and")
keywords = Regex.Replace(keywords,"or near or","near")
keywords = Regex.Replace(keywords,"and not or","and not")

Fairly straight forward and easy to follow.

Reasonably, although apart from the first regex, I'd suggest doing the
rest with straight calls to String.Replace. As an example of why I
think that would be more readable, what exactly does the second line do?
In some flavours of regular expressions, brackets form capturing
groups. Do they in .NET? I'd have to look it up. If it's really just
trying to replace the string "( )" with " or ", a call to
String.Replace would mean I didn't need to look anything up.
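For reference, unescaped parentheses do form a capturing group in .NET regular expressions, so the pattern "( )" matches a single space, not the three characters "( )". A minimal sketch of the difference, with a made-up class name, method names and sample input:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenthesesDemo
{
    // Treated as a regex, "( )" is a group containing one space,
    // so every single space in the input is replaced.
    public static string ReplaceAsRegex(string input)
    {
        return Regex.Replace(input, "( )", " or ");
    }

    // Treated as a literal, "( )" is the characters '(', ' ' and ')'.
    public static string ReplaceAsLiteral(string input)
    {
        return input.Replace("( )", " or ");
    }

    public static void Main()
    {
        string keywords = "a b( )c";
        Console.WriteLine(ReplaceAsRegex(keywords));   // a or b( or )c
        Console.WriteLine(ReplaceAsLiteral(keywords)); // a b or c
    }
}
```

The two calls look almost identical but do different things, which is the sort of ambiguity being described.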

But you do NEED to know the C# escaping (readability not high - unless you
understand it).

Yes, but I *already* need to know that in order to write C#. Choosing
to use String.IndexOf doesn't add to what I need to remember - choosing
regular expressions does. In addition, there aren't many things which
need escaping compared with those which need escaping in regular
expressions. In addition to *that*, whenever you need to escape in
regular expressions, you also need to escape in C# (or remember to use
verbatim string literals) - yet another piece of headache.

But writing objects and the objects themselves are not easily readable. But
you wouldn't advocate not writing them, would you?

No, but I don't see how that's relevant.

Nor are all the objects you use.

But if you are using .Net, it is part of the mix.

It's not necessarily part of the mix I have to use. I suspect *very*
few programs don't do any string manipulation - knowing the string
methods well is *far* more fundamental to .NET programming than knowing
regular expressions.

Why? Ever heard of references or cheat sheets? And what is wrong with a
little extra brainpower - if you don't use it, you lose it

If you truly think that given two solutions which are otherwise equal,
the solution which is easiest to write, read and maintain doesn't win
hands down, we'll definitely never agree.

If you want to keep your hand in with respect to regular expressions,
do it in a test project, or with a regular expressions workbench. Keep
it out of code which needs to be read and maintained, probably by other
people who don't want to waste time because you wanted to keep your
skill set up to date.

I don't know all of the possible combinations of calls to every Object, but
that doesn't preclude me from using them.

Exactly - and you wouldn't go out of your way to use methods you don't
need, just to get into the habit of using them, would you?

My position has always been, don't memorize. You will remember what you
use. But if you know how to get it (where to look), then you have
everything you need.

Absolutely - so why are you so keen on making people either memorise or
look up the characters which need escaping for regular expressions
every time they read or modify your code?

I happen to use .Net. Regex is part of .Net. I would be limiting myself if
I didn't use Regex in places where it is appropriate.

I seem to be having difficulty making myself clear on this point: I
have never stated and will never state that you shouldn't use regular
expressions where they're appropriate. But they are *not* appropriate
in this case, as they are a more complex and less readable way of
solving the problem.

Show me a problem where the regex way of solving it is simpler than
using simple string operations (and there are plenty of problems like
that) and I'll plump for the regex in a heartbeat.

If I happen to know a good way in Regex to solve a problem, I am not
going to use *extra brainpower* to try to solve the problem in C#.

In what way is using the method which is designed for *precisely* the
task in hand (finding something in a string) using extra brainpower? If
you're not familiar with String.IndexOf, you've got *much* bigger
things to worry about than whether or not your regular expression
skills are getting rusty.

But it didn't. But if it did, it is no different than having to deal with
escapes in C (less readable)

If you are talking about

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}

vs

if (Regex.IsMatch(myString, @"something1|something2|something3"))

If you know absolutely nothing about Regular expressions, I would agree that
this is less readable.

But I would also contend that IndexOf could be just as confusing. What is
the first 0 for? What about the 2nd? It is readable because you know C.

Well, for a start the 0s aren't necessary, and I wouldn't include them.

I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a Match (can't tell that from the word "IndexOf")
and it probably has something to do with the words "something1",
"something2" and "something3". Now if you know C then I would assume you
would pick up that "|" is "or" (not so clear to a VB programmer). And that
would be to someone not familiar with regular expressions doing a quick
perusal

Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?
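
As a rough sketch of that change, assuming the new search term is "something[1]": the class and method names below are hypothetical, and the three-string search is the one from earlier in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralChangeDemo
{
    // IndexOf version: the literal is edited and nothing else changes.
    public static bool IndexOfVersion(string text)
    {
        return text.IndexOf("something[1]") != -1 ||
               text.IndexOf("something2") != -1 ||
               text.IndexOf("something3") != -1;
    }

    // Naive regex edit: [1] is a character class, so this still looks for "something1".
    public static bool NaiveRegexVersion(string text)
    {
        return Regex.IsMatch(text, @"something[1]|something2|something3");
    }

    // Correct regex edit: both brackets have to be escaped.
    public static bool EscapedRegexVersion(string text)
    {
        return Regex.IsMatch(text, @"something\[1\]|something2|something3");
    }

    public static void Main()
    {
        Console.WriteLine(IndexOfVersion("something[1]"));      // True
        Console.WriteLine(NaiveRegexVersion("something[1]"));   // False
        Console.WriteLine(NaiveRegexVersion("something1"));     // True
        Console.WriteLine(EscapedRegexVersion("something[1]")); // True
    }
}
```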

So I am at a loss as to how this regular expression is more unreadable than
the C# counterpart. That is not to say that you couldn't make it more
unreadable - but you could do the same with C# if you wanted to.

You could start by making the C# more readable, as I've shown...

However, the regex is already less readable:
1) It's got "|" as a "magic character" in there.
2) It's got all the strings concatenated, so it's harder to spot each
of them separately.

And that's before you need to actually *maintain* the code.

Furthermore, suppose you didn't just want to search for literals -
suppose one of the strings you wanted to search for was contained in a
variable. How sure are you that *no-one* on your team would use:

x+"|something2|something3"

as the regular expression?
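
A small sketch of what goes wrong there, and of the fix. The class and method names are hypothetical; Regex.Escape is the standard .NET call for turning arbitrary text into a pattern that matches it literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariablePatternDemo
{
    // The variable is pasted straight into the pattern, so any regex
    // metacharacters it contains are interpreted rather than matched.
    public static bool Unescaped(string text, string x)
    {
        return Regex.IsMatch(text, x + "|something2|something3");
    }

    // Regex.Escape makes the variable match only itself.
    public static bool Escaped(string text, string x)
    {
        return Regex.IsMatch(text, Regex.Escape(x) + "|something2|something3");
    }

    public static void Main()
    {
        Console.WriteLine(Unescaped("abc", "a.c")); // True: the dot matched "b"
        Console.WriteLine(Escaped("abc", "a.c"));   // False
        Console.WriteLine(Escaped("a.c", "a.c"));   // True
    }
}
```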

As was said, you can make readable and unreadable C or Regex code. Are you
going to tell your programmers they "cannot" use Regex for the same reason?

I would tell programmers on my team not to use regular expressions
where the alternative is simpler and more readable, yes.

Are you going to leave out some objects that programmers may not be familiar
with?

Absolutely, where there are simpler and more familiar ways of solving
the same problem.

And I don't in this case, as I think I've shown. Less typing, easy to read,
straight forward - in this case.

You've shown nothing of the kind - whereas I think I've given plenty of
examples of how using regular expressions make the code less easily
maintainable, even if you consider it equally readable to start with
(which I don't).

I can read either (although, I didn't know you could string multiple
"Replace"s together).

Yes, I can read either too. The point is that in reading my version, I
didn't need to wade through various special characters, understanding
exactly what each was there for. Of course, your version wasn't even valid
C#, as it didn't escape the backslashes and you didn't specify a
verbatim literal. I assume it was originally VB.NET. I wonder which
version would be easier to convert to valid C#? Mine, perhaps?

First of all, I am not. I don't use it much at all, but I find it easy to
figure out and straightforward (but you can make it really complex). I use
it to validate phone numbers, credit card numbers, zip codes etc.

And in all of those cases, regular expressions are really useful.

Which are very well documented and when there are a myriad of ways a
user can input these types of data, I prefer to use Regular
expressions which are all over the place (easy to find) than try to
come up with some complex set of loops and temporary variables which
make it far easier to make a mistake and much more unreadable than the
Regex equivalent.

Where exactly are the complex loops and temporary variables in this
specific case? After all, you have been arguing for using regular
expressions in *this specific case*, haven't you?

Escaping seems to be your main complaint with it.

It's the main potential source of problems, yes. It's a potential
source of problems which simply doesn't exist when you use
String.IndexOf.

I have the same problem with C or VB when trying to remember when to use "\"
vs "/" in paths or do I need to add "\" in front of my slash or quote.
These are inherent problems with pretty much all of them.

You already need to know that when writing C# though - my use of
String.IndexOf doesn't add to the volume of knowledge required.

I am not discounting IndexOf, I am just saying that both work fine and are
just as readable (in this case). In other cases, that may not be the case
(with either C or Regex).

Just because they're as readable *to you* doesn't mean they're as
readable to everyone. How sure are you that the next engineer to read
this code will be familiar with regular expressions? How sure are you
that when you need to change it to look for a different string, you'll
check whether any of the characters need to be escaped? Why would you
even want to force that check on yourself?

So you would prefer to code to the lowest common denominator.

When there's no good reason not to, absolutely.

I am not going to code to the level of a junior programmer. I prefer that
he learn to code to a higher level.

Learning to solve problems as simply as possible *is* learning to code
to a higher level.

I am not saying that you shouldn't still write decent, readable, commented
code. But I am not going to limit myself because another programmer may not
be able to read well written code. If that were the case, I would not be
writing objects (abstract classes, interfaces, etc).

If it's not the simplest code for the situation, it's not well written
IMO. If it introduces risk for no reward (the risk of maintenance
failing to notice that they might need to escape something, versus no
reward) then it's not well written.

"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.

I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?

Lost me on that one.

Both are ways of finding the value of a property. The first is harder
to maintain and harder to read, just like your use of regular
expressions in this instance. Now, which of the above snippets of code
would you use, and why?

tshad · Sep 20, 2005

Jon Skeet said:
In plenty of places. It has a language with a defined syntax etc.

Yes, but so are dolphin sounds.

When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).

Neither of those are required for it to be a language.

Sure - so is XPath, but that's a language too.
(See http://www.w3.org/TR/xpath)

None of those state that regular expressions aren't a language.

The C# code in question would be:

if (someVariable.IndexOf ("firstliteral") != -1 ||
someVariable.IndexOf ("secondliteral") != -1 ||
someVariable.IndexOf ("thirdliteral") != -1)

And the Regex version:

if (Regex.IsMatch(myString, @"something1|something2|something3"))

If I did it regularly, I'd write a short method which took a params
string array.
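
Such a method might look something like the sketch below. The names StringSearch and ContainsAny are made up, and the ordinal comparison is an assumption on my part; the code in the thread calls the plain IndexOf(string) overload.

```csharp
using System;

public static class StringSearch
{
    // Returns true if text contains at least one of the candidate strings.
    // With no candidates at all, nothing can match, so the result is false.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison (an assumption): compares characters as-is,
            // with no culture-specific rules.
            if (text.IndexOf(candidate, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        string someVariable = "xx secondliteral yy";
        if (ContainsAny(someVariable, "firstliteral", "secondliteral", "thirdliteral"))
        {
            Console.WriteLine("found"); // prints "found"
        }
    }
}
```

Each search string stays a separate argument, so changing one of them to "something[1]" needs no escaping at all.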

Unless there's another compelling argument in favour of one tool or
another, readability is a very important part of choosing the best
tool.

Again, why do I need a compelling reason. If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."

But you're effectively pushing them in the situation described by the
OP when you say that the solution using regular expressions is as
readable as the solution without.

No.

No pushing. No more than your pushing not using it.

Reasonably, although apart from the first regex, I'd suggest doing the
rest with straight calls to String.Replace. As an example of why I
think that would be more readable, what exactly does the second line do?

Actually, nothing. It is grouping a " ", which isn't necessary. I think I
used to have something else there and took it out and didn't realize I
didn't need the ().

In some flavours of regular expressions, brackets form capturing
groups. Do they in .NET? I'd have to look it up. If it's really just
trying to replace the string "( )" with " or ", a call to
String.Replace would mean I didn't need to look anything up.

Obviously, you didn't need to look this one up either - as you were correct.
It is just grouping a blank.

Yes, but I *already* need to know that in order to write C#. Choosing
to use String.IndexOf doesn't add to what I need to remember - choosing
regular expressions does. In addition, there aren't many things which
need escaping compared with those which need escaping in regular
expressions. In addition to *that*, whenever you need to escape in
regular expressions, you also need to escape in C# (or remember to use
verbatim string literals) - yet another piece of headache.

No, but I don't see how that's relevant.

Just that you don't want to Regex as it is not easily readable. Neither are
Regex.

But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would you?

It's not necessarily part of the mix I have to use.

You don't have to use lots of things. That doesn't make them invalid.
Neither is the fact that you use Foreach vs For {}. They are there and are
part of the mix as is Regex. I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. They have been around for quite a long time
and are just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.

I suspect *very*
few programs don't do any string manipulation - knowing the string
methods well is *far* more fundamental to .NET programming than knowing
regular expressions.

I agree with part of that and think that regular expressions are just as
important to know. As we have been saying, it is here and many people use
it, so to not understand it is to limit yourself. You don't have to use it,
but you should at least understand the basics of how it works. What are you
going to do when someone uses a RegularExpressionValidator and you don't
understand what the expression is? The fact that it is not C# (neither is a
textbox, datagrid, etc), doesn't mean you shouldn't understand them. Whether
you use them is up to you.

As you point out, you are not the only programmer and many programmers like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?

I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are set up (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.

If you truly think that given two solutions which are otherwise equal,
the solution which is easiest to write, read and maintain doesn't win
hands down, we'll definitely never agree.

I agree there.

Which is easier to write is obviously your perception. I found my example,
as easy as yours to write and just as readable.

If you want to keep your hand in with respect to regular expressions,
do it in a test project, or with a regular expressions workbench. Keep
it out of code which needs to be read and maintained, probably by other
people who don't want to waste time because you wanted to keep your
skill set up to date.

Keep regular expressions out of my code?????

So now you are saying there is no use for it?

Exactly - and you wouldn't go out of your way to use methods you don't
need, just to get into the habit of using them, would you?

Sure.

If it is valid. As I said there are many ways to skin ..., depending on the
situation I may do it one way and the next time another way. Gives me many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.

Absolutely - so why are you so keen on making people either memorise or
look up the characters which need escaping for regular expressions
every time they read or modify your code?

I am not. I don't memorize. But I still use it.

I seem to be having difficulty making myself clear on this point: I
have never stated and will never state that you shouldn't use regular
expressions where they're appropriate. But they are *not* appropriate
in this case, as they are a more complex and less readable way of
solving the problem.

No you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER want
them to use them. You can't have it both ways.

If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
never going to get some of the other standard Regex solutions I
mentioned before.

As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.

Show me a problem where the regex way of solving it is simpler than
using simple string operations (and there are plenty of problems like
that) and I'll plump for the regex in a heartbeat.

In what way is using the method which is designed for *precisely* the
task in hand (finding something in a string) using extra brainpower?

I wasn't referring to this particular issue when I said this.

If
you're not familiar with String.IndexOf, you've got *much* bigger
things to worry about than whether or not your regular expression
skills are getting rusty.

I never said I was not familiar with IndexOf.

As a matter of fact, the original question was whether you could "do a
search for more than one string in another string".

****************************************************************
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}
***************************************************************************
IndexOf doesn't do it. This was the original question. You have to do
multiple calls as is said in the original question. Nicholas was correct in
his assessment. One Regex call would work.

Well, for a start the 0s aren't necessary, and I wouldn't include them.

You're right.

I would maintain that even if you knew nothing about Regex, you would
assume that you are doing a Match (can't tell that from the word "IndexOf")
and it probably has something to do with the words "something1",
"something2" and "something3". Now if you know C then I would assume you
would pick up that "|" is "or" (not so clear to a VB programmer). And that
would be to someone not familiar with regular expressions doing a quick
perusal

Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?

That wasn't the question.

What if you wanted to change "something1" to "something\". Same problem.
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.

You could start by making the C# more readable, as I've shown...

As you can with Regular Expressions.

However, the regex is already less readable:
1) It's got "|" as a "magic character" in there.

| = or (same as C)

2) It's got all the strings concatenated, so it's harder to spot each
of them separately.

You are kidding, right?

And that's before you need to actually *maintain* the code.

Furthermore, suppose you didn't just want to search for literals -
suppose one of the strings you wanted to search for was contained in a
variable. How sure are you that *no-one* on your team would use:

x+"|something2|something3"

as the regular expression?

You are now leaving the original question. I never said that Regular
Expressions was the better (or not better) in all cases.

I would tell programmers on my team not to use regular expressions
where the alternative is simpler and more readable, yes.

Why use them at all? It isn't readable.

And if your programmers can't maintain the simple Regexs, they definitely
won't be able to handle the more complicated ones.

Absolutely, where there are simpler and more familiar ways of solving
the same problem.

You've shown nothing of the kind - whereas I think I've given plenty of
examples of how using regular expressions make the code less easily
maintainable, even if you consider it equally readable to start with
(which I don't).

Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.

But I am not going to force my programmers to come to me to find out whether
or not Regex is the easiest way or not. That is up to the programmer. If
there is a problem with their code and I feel the programmer is way off base
in his coding we would talk about it (that would be the case with his C#, VB or
Regex code).

Yes, I can read either too. The point is that in reading my version, I
didn't need to wade through various special characters, understanding
exactly what was there for.

If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers better be able to
read it), there should not be a problem with the 2 special characters which
are the same as in C#. There is nothing obscure in this example - that I can
see.

Of course, your version wasn't even valid
C#, as it didn't escape the backslashes and you didn't specify a
verbatim literal. I assume it was originally VB.NET. I wonder which
version would be easier to convert to valid C#? Mine, perhaps?

Actually, it was VB.Net.

And in all of those cases, regular expressions are really useful.

But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it. Definitely if they would have a problem with
our example.

Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.

Where exactly are the complex loops and temporary variables in this
specific case? After all, you have been arguing for using regular
expressions in *this specific case*, haven't you?

I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with (Phone
numbers, credit card etc). There would be none in this case, obviously.
But there may be in more complicated cases.

It's the main potential source of problems, yes. It's a potential
source of problems which simply doesn't exist when you use
String.IndexOf.

You already need to know that when writing C# though - my use of
String.IndexOf doesn't add to the volume of knowledge required.

It is still an issue. Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.

Just because they're as readable *to you* doesn't mean they're as
readable to everyone. How sure are you that the next engineer to read
this code will be familiar with regular expressions? How sure are you
that when you need to change it to look for a different string, you'll
check whether any of the characters need to be escaped? Why would you
even want to force that check on yourself?

Again - then don't allow them at all.

When there's no good reason not to, absolutely.

I guess that is where we disagree.

Learning to solve problems as simply as possible *is* learning to code
to a higher level.

No argument there.

If it's not the simplest code for the situation, it's not well written
IMO. If it introduces risk for no reward (the risk of maintenance
failing to notice that they might need to escape something, versus no
reward) then it's not well written.

I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).

"Need" is a personal question. I don't think it applies here. You prefer
IndexOf and I might prefer IsMatch.

I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?

No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.

Both are ways of finding the value of a property. The first is harder
to maintain and harder to read, just like your use of regular
expressions in this instance. Now, which of the above snippets of code
would you use, and why?

Since I am not sure why you would use the first, I would do the 2nd.

But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.

Tom

Jon Skeet [C# MVP] · Sep 21, 2005

tshad said:
Yes, but so are dolphin sounds.

When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).

So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

Of course, you didn't even specify "programming language" before.

Regular expressions form a language in computing, and that language
needs to be learned before being used, just as any other language does,
whether it's C#, HTML, XPath or VB.NET.

And the Regex version:

if (Regex.IsMatch(myString, @"something1|something2|something3"))

Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that

Again, why do I need a compelling reason. If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."

Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.

Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.

No.

No pushing. No more than your pushing not using it.

But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?

Actually, nothing. It is grouping a " ", which isn't necessary. I think I
used to have something else there and took it out and didn't realize I
didn't need the ().

So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.

Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.
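
A quick sketch of the difference. The original pattern isn't quoted in this part of the thread, so \s{2,} below is only an assumption that matches the behaviour described (two tabs collapsed, a single tab left alone). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Assumed original behaviour: only runs of two or more whitespace
    // characters are replaced, so a lone tab survives.
    public static string CollapseRunsOnly(string input)
    {
        return Regex.Replace(input, @"\s{2,}", " ");
    }

    // Suggested behaviour: every run of whitespace, including a single
    // tab, becomes one space.
    public static string NormalizeAll(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    public static void Main()
    {
        Console.WriteLine(CollapseRunsOnly("red\t\tgreen") == "red green"); // True
        Console.WriteLine(CollapseRunsOnly("red\tgreen") == "red green");   // False: the tab is still there
        Console.WriteLine(NormalizeAll("red\tgreen") == "red green");       // True
    }
}
```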

Obviously, you didn't need to look this one up either - as you were correct.
It is just grouping a blank.

I would have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.

Just that you don't want to Regex as it is not easily readable. Neither are
Regex.

Eh?

But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would you?

When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.

You don't have to use lots of things. That doesn't make them invalid.
Neither is the fact that you use Foreach vs For {}. They are there and are
part of the mix as is Regex.

No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.

I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. They have been around for quite a long time
and are just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.

Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.

I agree with part of that and think that regular expressions are just as
important to know.

Why? I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once. I
suspect many people could say the same thing. I suspect very few if any
of them could say the same thing about the basic string manipulation
methods - and yet you were surprised to see that one could call Replace
on the result of another Replace method call, which I'd consider a far
more "basic" level of understanding than knowledge of regular
expressions.

As we have been saying, it is here and many people use it, so to not
understand it is to limit yourself.

It's one thing to understand the general power of regular expressions,
so you would know when they may be applicable - it's another thing to
use them when they serve no purpose beyond what can be more simply
achieved with the simple String methods.

You don't have to use it, but you should at least understand the
basics of how it works. What are you going to do when someone uses a
RegularExpressionValidator and you don't understand what the
expression is?

At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?

The fact that it is not C# (neither is a textbox, datagrid, etc),
doesn't mean you shouldn't understand them. Whether you use them is up
to you.

As you point out, you are not the only programmer and many programmers like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?

If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.

I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are set up (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.

If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.

I agree there.

Which is easier to write is obviously your perception. I found my example,
as easy as yours to write and just as readable.

And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.

Keep regular expressions out of my code?????

So now you are saying there is no use for it?

Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.

Sure.

If it is valid. As I said there are many ways to skin ..., depending on the
situation I may do it one way and the next time another way. Gives me many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.

But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.

I am not. I don't memorize. But I still use it.

Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.

No you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER want
them to use them. You can't have it both ways.

I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.

If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
never going to get some of the other standard Regex solutions I
mentioned before.

Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?

As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.

Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.

I wasn't referring to this particular issue when I said this.

It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?

I never said I was not familiar with IndexOf.

As a matter of fact, the original question was whether you could "do a
search for more than one string in another string".

And of course the answer is "yes, by calling IndexOf multiple times".

****************************************************************
Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}
***************************************************************************
IndexOf doesn't do it. This was the original question. You have to do
multiple calls as is said in the original question. Nicholas was correct in
his assessment. One Regex call would work.

Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.
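
The "single call to a method which called IndexOf on the string multiple times" could look something like the sketch below. The class name StringSearch and the method name ContainsAny are hypothetical (nothing in the thread names them), and ordinal comparison is an assumption about what the original poster wanted.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns true if 'text' contains at least one of
    // the 'candidates'. Each candidate is treated as plain text, so no
    // regex-level escaping is ever needed.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison is an assumption; pick whatever fits the data.
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

A call site would then be one statement, for example StringSearch.ContainsAny(someString, "something1", "something2", "something3"), with each search value kept as its own string literal.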

Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?

That wasn't the question.

Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?

What if you wanted to change "something1" to "something\". Same problem.

Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.

IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"

Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.
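
The four literals above can be checked with a small sketch like the following. The class and method names are made up for illustration; Regex.Escape is the standard .NET call that adds the regex-level escaping automatically.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo
{
    // One level of escaping (C# only). Both literals denote the same
    // 10 characters: "something" followed by a single backslash.
    public static bool FindWithIndexOf(string text)
    {
        return text.IndexOf("something\\", StringComparison.Ordinal) >= 0
            && text.IndexOf(@"something\", StringComparison.Ordinal) >= 0;
    }

    // Two levels of escaping (C# plus regex). The regex engine must receive
    // "something" followed by two backslashes to match one literal backslash.
    public static bool FindWithRegex(string text)
    {
        return Regex.IsMatch(text, "something\\\\")
            && Regex.IsMatch(text, @"something\\");
    }

    // Regex.Escape produces the regex-level escaping from the plain text.
    public static string EscapedPattern()
    {
        return Regex.Escape(@"something\");
    }
}
```

Both methods find the text in an input such as @"a something\ b"; the difference is only in how much the reader has to unescape mentally.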

And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.

Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.

As you can with Regular Expressions.

Well, Oliver Sturm has shown a more readable version, but you seem to
be keen on the "put them all in the same line" version.

Neither is as readable as the String.IndexOf version, however.

| = or (same as C)

Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.

You are kidding, right?

Absolutely not! It's significantly easier to spot the three separate
values when they're three separate strings than when they're all mashed
together.

You are now leaving the original question. I never said that Regular
Expressions was the better (or not better) in all cases.

While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed to search for a
variable at some time. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.

Why use them at all? It isn't readable.

They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.

Using a regular expression is like getting a car compared with walking
somewhere - it's absolutely the right thing to do when you're going on
a long journey, but in this case you're advocating getting in a car
just to travel to the next room. It's simpler to walk.

And if your programmers can't maintain the simple Regexes, they definitely
won't be able to handle the more complicated ones.

You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?

Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.

But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.

But I am not going to force my programmers to come to me to find out whether
Regex is the easiest way or not. That is up to the programmer. If
there is a problem with their code and I feel the programmer is way off base
in his coding, we would talk about it (that would be the case with his C#, VB or
Regex code).

Using regular expressions in this case *is* a problem with their code,
IMO. It's just asking for trouble later on.

If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers better be able to
read it), there should not be a problem with the 2 special characters, which
are the same as in C#. There is nothing obscure in this example - that I can
see.

Of course there is - to work out what's going on, you've got to
mentally unescape the dollar and the comma, but *not* mentally unescape
the |. All that rather than just "replace dollar with space, replace
comma with space" in a simple form with no hidden meanings to anything.

Actually, it was VB.Net.

Right. So in the C#, you'd either have to have more escapes, or make
them verbatim literals. More stuff to get right. Note how no escaping
at all is required in my version.
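
To make the comparison concrete, here is a sketch of the two styles side by side. The original replacement code is not shown in this part of the thread, so the regex pattern below is an assumed stand-in, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Assumed regex form: a single call, but the reader has to know that
    // "$" needs a backslash (it is an anchor otherwise) while "|" must be
    // left alone because it is the alternation operator.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", " ");
    }

    // Two chained plain-text replacements: nothing in either search
    // literal has a hidden meaning, so nothing needs escaping.
    public static string WithStringReplace(string input)
    {
        return input.Replace("$", " ").Replace(",", " ");
    }
}
```

Both methods return the same result for the same input; the String.Replace version simply reads as "replace dollar with space, replace comma with space".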

But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it.

Definitely if they would have a problem with our example.

Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.

How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?

I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with phone
numbers, credit cards etc. There would be none in this case, obviously.
But there may be in more complicated cases.

Yes - the complicated cases where I've already said that regular
expressions are useful!

It is still an issue.

Yes, it's still going to be harder to search for "some\thing" than
"something". However, it's *not* going to be harder to search for
"some.thing", or "(something)", or "[something]", or "some,thing", or
"some*thing" or "some+thing" etc. Furthermore, there's still going to
be less to remember when you *are* faced with searching for
"some\thing" than there would be using regular expressions.

Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.

You'd need to know about the escapes where regular expressions are
used. The fewer places they're used, the fewer times someone will need
to look them up in the documentation.

Again - then don't allow them at all.

No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.

I guess that is where we disagree.

It certainly sounds like it.

No argument there.

But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.

I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).

You don't think there's any risk that someone will forget one of the
regular expression characters which needs escaping? There is no string
you could need to search for which needs *less* escaping in regular
expressions than with String.IndexOf, but there are *lots* of strings
which need more escaping - thus there's more overall risk.

I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?
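
The "hello[there]" change can be sketched as follows, to show what goes wrong when the escaping is forgotten. The class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // Plain-text search: the brackets are ordinary characters.
    public static bool IndexOfFinds(string text)
    {
        return text.IndexOf("hello[there]", StringComparison.Ordinal) >= 0;
    }

    // Unescaped pattern: "[there]" is a character class, so this means
    // "hello" followed by one of the characters t, h, e or r.
    public static bool NaiveRegexFinds(string text)
    {
        return Regex.IsMatch(text, "hello[there]");
    }

    // Escaped pattern: Regex.Escape turns the opening bracket into a
    // literal, so this matches the same text as the IndexOf version.
    public static bool EscapedRegexFinds(string text)
    {
        return Regex.IsMatch(text, Regex.Escape("hello[there]"));
    }
}
```

For the input "say hello[there]" the IndexOf and escaped versions succeed and the naive regex fails; for "hellot" only the naive regex reports a match.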

No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.

Actually, the fact that they were presented with a complicated
expression would immediately make them wary, I suspect. Problems tend
to creep in when something *looks* simpler than it actually is - as is
the case here.

Since I am not sure why you would use the first, I would do the 2nd.

You'd use the first to keep up your knowledge of reflection, of course.
After all, if you don't use it, you lose it, right? That's your
argument for using regular expressions where they're completely
unnecessary and provide no benefit, after all.

But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.

I think we'll have to agree to disagree. You seem to be unable to grasp
the idea that there are more potential pitfalls and more knowledge
required for the regular expression version than for the IndexOf
version.

tshad · Sep 27, 2005

I'm back.

Was a little busy and didn't have time to respond.

Jon Skeet said:
So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

I don't know much about LISP, but Mathematics is also a language, but not
the same way as English and German are.

Of course, you didn't even specify "programming language" before.

True.

But I did specify, that it depends on how you define it.

Regular expressions form a language in computing, and that language
needs to be learned before being used, just as any other language does,
whether it's C#, HTML, XPath or VB.NET.

OK

Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that

I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

But actually they are both Olivers.

I don't agree there. I think the Regex is just as readable, as long as you
have a bit of Regular Expression understanding, obviously. I also think
that if you understand C and didn't understand Regex - you would get what it
is saying (IsMatch is pretty much a giveaway). Much more than if you didn't
understand C and saw the IndexOf - which doesn't really tell you what it
is doing. IsMatch is a much more understandable term than IndexOf.

Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.

I never said that.

I never said readability is not an issue, but I am not going to write "Cat
in the Hat" instead of a novel so that the programmers with the simplest of
experience can read it. But I am not going to write cryptic code either so
they can't read it.

I assume there are company standards to program by and I would follow that.

Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.

I am not writing simple code, I am writing code to handle a problem. I
prefer to write good code not simple code. Sometimes they are synonymous,
sometimes they aren't.

But in our case, I still see them as equally readable.

Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.

Don't agree there.

But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?

In your opinion (as you say).

And you obviously are not listening. I am not pushing either side. I have
been saying over and over that in this situation, they are the same (IMO).
I am not pushing Regex nor am I ruling them out. You, however, can't make up
your mind. One minute you say that something as simple as the example we
are using is too complex for a programmer, and then you proceed to say that you
would use Regex in other situations (which would have to be more
complicated). That makes no sense.

So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.

I am not saying there may not be other ways to write the code. As I said, I
often rewrite my own code later as I see a way I like better that I may not
have thought of at the time I wrote it. Many times it isn't better code,
just different.

Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.

Probably true. I am not a Regex expert. That was what I came up with at
the time.
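
The difference between the two whitespace patterns can be sketched like this. The pattern that was originally posted is not shown here, so \s{2,} is only an assumed stand-in for "two or more whitespace characters"; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // The suggested form: every run of one or more whitespace characters
    // (spaces, tabs, newlines) becomes a single space.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    // Assumed stand-in for the criticised pattern: only runs of two or
    // more whitespace characters are replaced, so a lone tab survives.
    public static string CollapseRunsOfTwoOrMore(string input)
    {
        return Regex.Replace(input, @"\s{2,}", " ");
    }
}
```

Given "a", a tab, "b", two tabs, "c", the first method yields "a b c" while the second leaves the single tab between "a" and "b" in place.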

I'd have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.

Even in C, which I have used for years, I have to look up parameters to make
sure I have the right parameters and have them in the right order.

As I said, the parens were probably a mistake; I may have made some changes
to the line and left the parens in. I agree yours is the correct one.

Eh?

Must have had a little brain fade there. Not sure what I was saying.

When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.

That isn't the point.

We are talking readability here. So don't write any objects. You can use
the ones you need to, but if you write objects and someone has to maintain
it, it could be a problem if he doesn't understand objects.

You can write the same code in straight C to do what objects do. We got
along fine before there were objects. So I think, based on your statements,
you should write the easier code that some very junior programmer might have
to read.

No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.

But I am not writing in C# only. I am writing in .Net.

Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.

Obviously, you micro manage more than I.

If you would have a problem with our examples, I don't think I would like to
work in your team.

In my area, if your code is reasonable and well written and it follows our
standards, it's fine.

Why?

Because they are perfectly valid and as you said before there are some that
are useful (therefore, you should know them as someone might use them and
you may have to maintain it).

I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once.

That's your style and position, but may not be someone else's.

I suspect many people could say the same thing. I suspect very few if any
of them could say the same thing about the basic string manipulation
methods - and yet you were surprised to see that one could call Replace
on the result of another Replace method call, which I'd consider a far
more "basic" level of understanding than knowledge of regular
expressions.

It's one thing to understand the general power of regular expressions,
so you would know when they may be applicable - it's another thing to
use them when they serve no purpose beyond what can be more simply
achieved with the simple String methods.

At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?

According to your position, you should ban them altogether for ANY use,
since you can do anything in C# you can do in Regex.

If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.

Appropriate as defined by you. Why allow them at all?

If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.

They serve a purpose. They do the same as your string routines, so there is
a purpose. Both are string handling routines.

And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.

So you should never EVER use Regex. Someone else might read your code.

This is going in circles.

As I said, I would have a problem with someone who couldn't figure out what
the example we were using was doing.

Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.

There either is a use or not. You can't say there is a use for it and then
browbeat a programmer because he happens to like to use it. Does a
programmer have to come to you each time he wants to use it to get your
permission?

I can see it if he writes some obscure cryptic Regular Expression - but come
on.

But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.

Sure.

If they are both perfectly valid, I might. Depends on my mood (you should
really have a problem with that).

Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.

No.

I can maintain my car, but I might still have to look up specs on it.

I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.

If they are not readable, you shouldn't use them at all. I personally think
they are both readable, in this case.

Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?

Again, then you feel there is no place for Regex as you can do anything with
C# that you can do with Regex. As you say, it will always be harder to
read.

Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.

I didn't say that.

It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?

In this case, no. In other cases, could be. Would have to look at it. I
never said that Regex is the best thing out there. I was just saying that
it is valid and can be readable - can also be cryptic (as can C#).

And of course the answer is "yes, by calling IndexOf multiple times".

That wasn't the question asked. That was the example that was given and the
question was can you do it in one statement.

So the answer is no, using IndexOf.

Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.

No, he was correct in his answer to the question. The question was never
"Which is better", but "Can you do it?". And you can write a method which calls
IndexOf multiple times. But then it isn't one line, is it?

Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?

That wasn't the question.

Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?

I don't really remember what the context was originally. But I know they
didn't have dots and brackets in it.

Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.

IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"

Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.

Splitting hairs, now. Both are the same, as far as I can see (here).

Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.

That's true, but then you would only know C#. And if that is your aim.
That's fine.

Well, Oliver Sturm has shown a more readable version, but you seem to
be keen on the "put them all in the same line" version.

Neither is as readable as the String.IndexOf version, however.

Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.

No room for it, huh?

Absolutely not! It's significantly easier to spot the three separate
values when they're three separate strings than when they're all mashed
together.

While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed to search for a
variable at some time. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.

Right. No one makes mistakes with IndexOf.

They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.

But your problem was that it would be hard for other programmers to read.
If they can read your more complicated version, this one should be easy.

Using a regular expression is like getting a car compared with walking
somewhere - it's absolutely the right thing to do when you're going on
a long journey, but in this case you're advocating getting in a car
just to travel to the next room. It's simpler to walk.

You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?

No.

I just find it as simple, in this case and you don't.

But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.

No. Not pushing. But I think they are equivalent in this case. As you said
earlier, I am sure others would disagree. But I don't think that the
difference is significant enough, in this case, even if I were to agree on
which is easier, to preclude it.

Using regular expressions in this case *is* a problem with their code,
IMO. It's just asking for trouble later on.

Of course there is - to work out what's going on, you've got to
mentally unescape the dollar and the comma, but *not* mentally unescape
the |. All that rather than just "replace dollar with space, replace
comma with space" in a simple form with no hidden meanings to anything.

Right. So in the C#, you'd either have to have more escapes, or make
them verbatim literals. More stuff to get right. Note how no escaping
at all is required in my version.

How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?

Who?

The person who can understand Regex if complicated, but would be trashed
trying to figure out our little example.

Bit of a stretch there.

Yes - the complicated cases where I've already said that regular
expressions are useful!

Just make sure the programmer that can't handle the easy Regex doesn't see
that one. Can't have that!!!!

It is still an issue.

Yes, it's still going to be harder to search for "some\thing" than
"something". However, it's *not* going to be harder to search for
"some.thing", or "(something)", or "[something]", or "some,thing", or
"some*thing" or "some+thing" etc. Furthermore, there's still going to
be less to remember when you *are* faced with searching for
"some\thing" than there would be using regular expressions.

Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.

You'd need to know about the escapes where regular expressions are
used. The fewer places they're used, the fewer times someone will need
to look them up in the documentation.

Again - then don't allow them at all.

No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.

Already dealt with.

It certainly sounds like it.

But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.

Writing vanilla C# is less complicated than writing objects, but we still
do them.

I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).

You don't think there's any risk that someone will forget one of the
regular expression characters which needs escaping? There is no string
you could need to search for which needs *less* escaping in regular
expressions than with String.IndexOf, but there are *lots* of strings
which need more escaping - thus there's more overall risk.

I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?

No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.

Actually, the fact that they were presented with a complicated
expression would immediately make them wary, I suspect. Problems tend
to creep in when something *looks* simpler than it actually is - as is
the case here.

Since I am not sure why you would use the first, I would do the 2nd.

You'd use the first to keep up your knowledge of reflection, of course.
After all, if you don't use it, you lose it, right? That's your
argument for using regular expressions where they're completely
unnecessary and provide no benefit, after all.

But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.

I think we'll have to agree to disagree. You seem to be unable to grasp
the idea that there are more potential pitfalls and more knowledge
required for the regular expression version than for the IndexOf
version.

Agreed.

Tom

Jon Skeet [C# MVP] · Sep 27, 2005

tshad said:
So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

I don't know much about LISP, but Mathematics is also a language, but not
the same way as English and German are.
Indeed.

Of course, you didn't even specify "programming language" before.

True.

But I did specify, that it depends on how you define it.
True.

Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that

I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

No, I mean:

Yes, where the string itself is separated onto three lines.

But actually they are both Olivers.

I don't agree there. I think the Regex is just as readable, as long as you
have a bit of Regular Expression understanding, obviously. I also think
that if you understand C and didn't understand Regex - you would get what it
is saying (IsMatch is pretty much of a giveaway). Much more than if you didn't
understand C and saw the IndexOf - which doesn't really tell you what it
is doing.

Yes it does, it's finding the index of one string within another.

IsMatch is much more understandable term than IndexOf.

The name is as understandable, but the exact semantics are *much* more
obscure. The name doesn't suggest that you can't just put a "." in
there and expect it to only match a dot for instance, does it?

I never said that.

You said that when you have a solution, you won't consider whether there's
a more readable way of writing it. To me, that demonstrates that you
don't care very much about readability.

I never said readability is not an issue, but I am not going to write "Cat
in the Hat" instead of a novel so that the programmers with the simplest of
experience can read it. But I am not going to write cryptic code either so
they can't read it.

If the "Cat in the Hat" does the job as well as the novel and is easier
to read, why on earth would you want to write the novel?

I assume there are company standards to program by and I would follow that.

There aren't usually company standards down to the level of when to use
regular expressions.

I am not writing simple code, I am writing code to handle a problem. I
prefer to write good code not simple code. Sometimes they are synonymous,
sometimes they aren't.

I disagree - simple code that works (as well as the more complicated
code) is always good. Note that this is in terms of implementation, not
design - there's sometimes a very simple but inelegant design which
ends up costing a lot more work in the long run. That's a different
matter.

But in our case, I still see them as equally readable.

You still haven't said whether you see them as equally readable *and
maintainable* to others though.

Don't agree there.

With which bit? If you're going to disagree with the first sentence
quoted, we really don't have much basis for discussion. I thought it
was pretty much universally accepted these days that code almost always
spends more time in maintenance than in original coding. That's why I'm
always happy to spend a bit more time refactoring working code to make
it easier to maintain.

In your opinion (as you say).

And you obviously are not listening. I am not pushing either side. I have
been saying over and over that in this situation, they are the same (IMO).

But that *is* pushing regular expressions from my point of view, where
they shouldn't be an option.

Consider an exaggerated equivalent situation. Suppose we were
discussing how to implement addition. Suppose I thought that just using
the expression x+y was the easiest way of doing things, and you thought
it was just as easy to write a remote web service which took two
integers. By *not* ruling out the more complex solution, you're
*effectively* pushing it - at least pushing it as an equally valid
option.

I am not pushing Regex nor am I ruling them out. You, however, can't make up
your mind. One minute you say that something as simple as the example we
are using is too complex for a programmer, and then you proceed to say that
you would use Regex in other situations (which would have to be more
complicated). That makes no sense.

<sigh> I don't know whether you're intentionally missing the point or
whether I'm genuinely not getting through.

There is always risk associated with changing code. When writing code,
you should try to reduce the risk that future changes will incur. That
means making the code as simple as possible, and easy to change.

In some cases a regular expression will be a lot simpler to read and
change than the equivalent "primitive string manipulation" code. Those
cases would usually be where the string manipulation involves several
steps, often nested loops etc. There, the complexity of regular
expressions (which is still there) is less than the complexity of the
primitive solution.

In this case, however, the primitive solution is very simple and
understandable. Changing it to search for a different string or an
extra string (or even a string passed in as a parameter) is trivial.
Changing the regular expression is not.
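
As a rough illustration of why the regex version is harder to change, here is a sketch of what "an extra string" or "a string passed in as a parameter" involves on the regex side. The names are hypothetical and not taken from the thread. Each term has to be escaped before it is joined into an alternation, otherwise a term such as "foo.bar" quietly matches "fooxbar" as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTermSearch
{
    // Builds an alternation pattern from caller-supplied terms.
    // Every term is passed through Regex.Escape so that characters such as
    // '.', '[' or '|' inside a term are matched literally.
    // Note: with no terms at all the pattern is empty and matches any input.
    public static bool ContainsAnyViaRegex(string input, params string[] terms)
    {
        string[] escaped = Array.ConvertAll(terms, Regex.Escape);
        string pattern = string.Join("|", escaped);
        return Regex.IsMatch(input, pattern);
    }

    // The tempting shortcut: joining the raw terms. This compiles and
    // appears to work until a term contains a regex metacharacter.
    public static bool ContainsAnyUnescaped(string input, params string[] terms)
    {
        return Regex.IsMatch(input, string.Join("|", terms));
    }
}
```

The IndexOf equivalent needs none of this: the new term is simply another literal string.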

I am not saying there may not be other ways to write the code. As I said, I
often rewrite my own code later as I see a way I like better that I may not
have thought of at the time I wrote it. Many times it isn't better code,
just different.

In this case though, it *would* be better - it would be simpler to
understand, and simpler to write in the first place.

For instance, I wouldn't have had to consider whether the brackets were
doing something clever or not. I had to look up .NET regular
expressions just to check the meaning in this case. Do you really
believe that a solution which *doesn't* involve that extra thought
isn't better?

Probably true. I am not a Regex expert. That was what I came up with at
the time.

And that's part of the risk - that someone doesn't put enough effort
into the regex to get the *actually* desired behaviour. Where the
alternative is a complex solution, it makes a lot of sense to put
significant effort into getting the regex right. When you could do the
same thing with a few string operations, it's just not worth it.

(For this first line, a regex is probably the best way to go - but you
need to think about it more closely.)

Even in C, which I have used for years, I have to look up parameters to make
sure I have the right parameters and have them in the right order.

Usually intellisense can help you with that though - it *doesn't* start
explaining the details of regular expressions though.

As I said, the Parens were probably a mistake. I may have made some changes
to the line and left the parens in. I agree yours is the correct one.

And if you weren't taking "use regular expressions" as your default
position, you wouldn't have made the mistake in the first place. The
first thing you should try to think of is the simplest one. You want to
manipulate a string, so ask yourself if there's anything in the string
class which does what you want.

That isn't the point.

It may not be your point, but it's part of my point.

We are talking readability here. So don't write any objects. You can use
the ones you need to, but if you write objects and someone has to maintain
it, it could be a problem if he doesn't understand objects.

I'm assuming that "the solution uses .NET" is a given - in other words,
any maintenance engineer should know C# and the basics of .NET. To me
"the basics" don't include regular expressions and memorising all the
details of them. *Some* familiarity can be hoped for, but not knowing
all the constructs - so anything which requires that people *do* know
the regex constructs in order to change things is at a disadvantage.

You can write the same code in straight C to do what objects do. We got
along fine before there were objects. So I think, based on your statements,
you should write the easier code that some very junior programmer might have
to read.

No, we didn't "get along fine" before there were objects. C code is
typically far harder to read than OO code - and where it's not, that's
often because it's effectively written in a semi-OO way, just using
naming to indicate which type of object is being used (just without
polymorphism etc).

But I am not writing in C# only. I am writing in .Net.

So you would assume that everyone who is reading and maintaining your
code knows every class in the .NET framework? I don't.

Obviously, you micro manage more than I.

Well, I code review, just as my peers code review. We almost always
find things which can be done better (which works even better when pair
programming). That doesn't indicate that we're not good developers -
just that an extra point of view is always helpful. It also stops us
from getting lazy and implementing something which is just "okay"
rather than as good as it should be.

If you would have a problem with our examples, I don't think I would like to
work in your team.

Likewise if you don't consider that finding the simplest way of
implementing a solution is worth doing, I wouldn't like to work on your
code.

In my area, if your code is reasonable and well written and it follows our
standards, it's fine.

Being more complex than it needs to be means that code *isn't*
reasonable and well-written, IMO.

Because they are perfectly valid and as you said before there are some that
are useful (therefore, you should know them as someone might use them and
you may have to maintain it).

Occasionally they're useful. I haven't used a single one in the project
I've been working on for the last six months. On the other hand, I've
used string manipulation all over the place.

I would expect that the number of straight string manipulations in most
code should be *much* higher than the number of regular expressions
used - hence it's more important to thoroughly understand the string
methods than regexes.

That's your style and position, but may not be someone else's.

Everyone else in the team certainly feels the same way.

According to your position, you should ban them altogether for ANY use,
since you can do anything in C# you can do in Regex.

No, because - as I *keep* saying - there are things you can't do as
*simply* using straight string manipulation. Where it's simpler to use
regexes, I'd use them. Those situations come up occasionally, but not
with the frequency you seem to use regular expressions.

Appropriate as defined by you. Why allow them at all?

See the various places I've explained that both in this post and many
others.

They serve a purpose. They do the same as your string routines, so there is
a purpose. Both are string handling routines.

No, using regular expressions *instead* of the string handling routines
serves no purpose, just as using a web service to perform addition
would serve no purpose.

There's no advantage in using the regular expression here, and there
*is* a disadvantage.

So you should never EVER use Regex. Someone else might read your code.

This is going in circles.

Yes, because you seem unable to understand the position I've presented
several times.

As I said, I would have a problem with someone who couldn't figure out what
the example we were using was doing.

But would you have a problem with the same person if they forgot or
didn't check whether, say, '[' needed escaping? I'd find that a fairly
understandable mistake (although I'd hope that unit tests would show
the problem up).

There either is a use or not. You can't say there is a use for it and then
brow beat a programmer because he happens to like to use it.

I certainly can when the programmer uses it where there's no good
reason. There's a time and place to use reflection, but I would
certainly brow-beat a programmer who decided to use it to get the value
of a property which could be done in a safer way (using normal property
access syntax).
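
For anyone who hasn't seen the two side by side, a small sketch of the reflection comparison. The Person type and its Name property are invented purely for illustration. Both methods return the same value, but only the first is checked by the compiler.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Person
{
    public string Name { get; set; }
}

public static class PropertyAccess
{
    // Normal property access: a typo in the property name fails at compile time.
    public static string NameDirect(Person person)
    {
        return person.Name;
    }

    // Reflection: the property name is just a string, so a typo is only
    // discovered at run time (GetProperty returns null and the next line throws).
    public static string NameViaReflection(object target)
    {
        PropertyInfo property = target.GetType().GetProperty("Name");
        return (string)property.GetValue(target, null);
    }
}
```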

Has a programmer got to come to you each time he wants to use it to get your
permission?

In our team a programmer (including myself) has to get "permission"
every time they want to check anything in. It's called code review, and
it vastly improves the quality of the code.

I can see it if he writes some obscure cryptic Regular Expression - but come
on.

Cryptic such as "( )" where a straight " " would have been more
readable? Code review should have picked that up.

Sure.

If they are both perfectly valid, I might. Depends on my mood (you should
really have a problem with that).

I certainly do. "Valid" to me involves the code being as simple as
possible.

No.

I can maintain my car, but I might still have to look up specs on it.

But wouldn't it be easier to maintain something which *didn't* require
you to look up anything?

If they are not readable, you shouldn't use them at all. I personally think
they are both readable, in this case.

Readability is not a black and white issue. Something is "more
readable" than something else - in this case, using string manipulation
is more readable (and maintainable, importantly) than using regular
expressions. In other cases, it isn't.

Again, then you feel there is no place for Regex as you can do anything with
C# that you can do with Regex. As you say, it will always be harder to
read.

Where did I say it will *always* be harder to read? Please don't put
words in my mouth, especially when I've expressly stated otherwise
elsewhere.

At times, regular expressions will be easier to understand than the
equivalent string manipulation solution. In this case, they're not.

I didn't say that.

Didn't say what?

In this case, no.

So you don't think that it would be harder to change the regex code to
look for "hello[there" than it would be to change the IndexOf code in
the same way?

In other cases, could be. Would have to look at it. I
never said that Regex is the best thing out there. I was just saying that
it is valid and can be readable - can also be cryptic (as can C#).

And I've never argued with that. I've argued against it being *as*
readable and maintainable in *this* case.

That wasn't the question asked. That was the example that was given and the
question was can you do it in one statement.

So the answer is no, using IndexOf.

Okay. But the follow-on answer is "the best way to do it is to use
IndexOf repeatedly" possibly with "and you can always write your own
method to do this if you want".
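
A minimal sketch of what such a method might look like. The name ContainsAny and the sample strings in the note below are assumptions for illustration, not code posted in this thread.

```csharp
using System;

public static class StringSearch
{
    // Returns true if any of the candidate strings occurs in the input.
    // Each candidate is treated literally, so no escaping is ever needed.
    public static bool ContainsAny(string input, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (input.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The call site is then a single statement, for example StringSearch.ContainsAny(myString, "something1", "something2", "something3"), and adding a fourth string is just one more argument.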

No, he was correct in his answer to the question. The question was never
"Which is better?", but "Can you do it?".

His answer talked about the "best bet" - although the question didn't
ask about the best way, his answer did. I disagree with that answer.

And you can do a method which called
IndexOf multiple times. But then it isn't one line, is it?

You could put it in one line if you wanted to. It wouldn't be as easy
to read, but you could do it.

I don't really remember what the context was originally. But I know they
didn't have dots and brackets in it.

And wouldn't ever?

Splitting hairs, now. Both are the same, as far as I can see (here).

You don't think that having to count 4 backslashes is even slightly
harder than only counting 2? I can spot a double-backslash without
doing any double-checking. I'd always be careful when I needed four.
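
The original example isn't shown here, so as an assumption take the case of searching for a single literal backslash. In a sketch like the following (names invented for illustration), the IndexOf call needs two backslashes in the source, while the regex in an ordinary string literal needs four: one level of escaping for C# and one for the regex engine. A verbatim string brings the regex back down to two.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashSearch
{
    // C# escaping only: "\\" in source is one backslash at run time.
    public static bool HasBackslashIndexOf(string input)
    {
        return input.IndexOf("\\", StringComparison.Ordinal) >= 0;
    }

    // C# escaping plus regex escaping: "\\\\" in source is the two-character
    // pattern \\ at run time, which the regex engine reads as one literal backslash.
    public static bool HasBackslashRegex(string input)
    {
        return Regex.IsMatch(input, "\\\\");
    }

    // Verbatim string: only the regex-level escaping remains.
    public static bool HasBackslashRegexVerbatim(string input)
    {
        return Regex.IsMatch(input, @"\\");
    }
}
```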

That's true, but then you would only know C#. And if that is your aim.
That's fine.

My aim is to only *need* to know as little as possible. The rest is
available where necessary.

No room for it, huh?

Not when there's a simpler solution, no.

Right. No one makes mistakes with IndexOf.

More rarely than with regular expressions.

But your problem was that it would be hard for other programmers to read.
If they can read your more complicated version, this one should be easy.

No.

I just find it as simple, in this case and you don't.

I would be willing to wager large amounts of money on others
(particularly junior programmers) finding it less simple though. I'm
absolutely certain that if thousands of programmers had to maintain the
IndexOf version and change it to look for "foo.bar", fewer would make a
mistake than thousands of equivalent programmers maintaining the
regular expression version.

Are you absolutely certain that the regular expression *wouldn't* prove
more bug-prone?

No. Not pushing. But think they are equivalent in this case. As you said
earlier, I am sure others would disagree. But I don't think that the
difference is significant enough, in this case, even if I were to agree on
which is easier, to preclude it.

To me, it's definitely significant. Using regular expressions here
introduces risk for no benefit.

Who?

The person who can understand Regex if complicated, but would be trashed
trying to figure out our little example.

Bit of a stretch there.

Again, you're being black and white. I'm not saying that people
*couldn't* understand the regular expression - although they're more
likely to make a simple mistake without thinking about it. I'm saying
that they'll need to put more effort into understanding it than a
straight IndexOf.

Just make sure the programmer that can't handle the easy Regex doesn't see
that one.

I would hope that anyone maintaining a complex regular expression will
double-check what's going on. It's easy to conceive of someone
maintaining a simple one failing to do so.

Already dealt with.
Where?

Writing vanilla C# is less complicated than writing objects, but we still
do them.

No, it's not less complicated. If you avoided using objects, the code
would be *much* harder to read and maintain.

tshad · Sep 28, 2005

Jon Skeet said:
tshad said:

When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).

So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?

I don't know much about LISP, but Mathematics is also a language, but not
the same way as English and German are.
Indeed.

Of course, you didn't even specify "programming language" before.

True.

But I did specify, that it depends on how you define it.
True.

Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that

I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

No, I mean:

Yes, where the string itself is separated onto three lines.

But actually they are both Olivers.

I don't agree there. I think the Regex is just as readable, as long as you
have a bit of Regular Expression understanding, obviously. I also think
that if you understand C and didn't understand Regex - you would get what it
is saying (IsMatch is pretty much of a giveaway). Much more than if you didn't
understand C and saw the IndexOf - which doesn't really tell you what it
is doing.

Yes it does, it's finding the index of one string within another.

IsMatch is much more understandable term than IndexOf.

The name is as understandable, but the exact semantics are *much* more
obscure. The name doesn't suggest that you can't just put a "." in
there and expect it to only match a dot for instance, does it?

But you aren't getting the point.

You are talking readability (not all the possible permutations). OK, if you
want to put dots and {} and () and [] and \ in the string, we have a
different story. In this case, however, you cannot tell me that a
programmer can't see that you are looking for a match (IsMatch) and there
are OBVIOUSLY (to even a half way decent programmer) 3 strings separated by
a "|", so therefore this line (and this line only) says we are trying to
match one of the 3 strings.

Wouldn't you agree? Leave out all the extraneous possibilities.

We aren't talking about a Regular Expression such as:

^((31(?!\ (Feb(ruary)?|Apr(il)?|June?|(Sept|Nov)(ember)?)))|((30|29)(?!\
Feb(ruary)?))|(29(?=\ Feb(ruary)?\
(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])\
(Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\
((1[6-9]|[2-9]\d)\d{2})$

Description: This RE validates dates in the dd MMM yyyy format. Spaces
separate the values. Month value is either the full name of the month or the
3 letter abbreviation without a period. Days for the month are validated
for all months, including Feb in leap years. Years are 4 digit years.

(this isn't mine, just one I saw on the net)

In this case, I would probably agree with you. Of course, I would assume
you could probably do the same thing in C# (although not with IndexOf only -
I would expect), but I don't know if it would be easy or readable. Probably
would be more readable.

You said that when you have a solution, you won't consider whether there's
a more readable way of writing it. To me, that demonstrates that you
don't care very much about readability.

What I said was (restated) - If I have a solution, I am not going to
think (as I try to write reasonable code, anyway, and document it) "Wait a
minute, will Jim have a problem with it, will Mark have a problem with it.
We just hired Steve, a junior programmer, will he have a problem with it.
Wait a minute I'm not sure whether Greg is versed in this simple Regex
statement. Maybe I should find another solution - even though this is valid
and 10 of my programmers can read it, someone may have a little problem with
it. Surely, I can spend a little more time to rewrite a solution that
clearly works and find a more readable one."

This is the mindset you go through?

Of course, readability is important, but let's not be excessive about it. I
am not going to write to a 3rd grade mentality when I am writing a business
letter. This does not mean it isn't readable, just not written to a grade
school level.

I don't expect a junior programmer to be able to understand everything I
code (which is why we have documentation). I agree yours is readable in
this case, but not any more readable than mine (in this case). If your
programmers would have a problem with this statement, I would maintain the
problem is not with the code but the programmers.

If the "Cat in the Hat" does the job as well as the novel and is easier
to read, why on earth would you want to write the novel?

There aren't usually company standards down to the level of when to use
regular expressions.

In your case, there should be.

Otherwise, don't quibble over a simple Regex, when you clearly say that a
much more complicated one is fine.

I have no problem with your saying that in your company, you don't allow
Regex statements.

I do have a problem with your saying, "your company doesn't preclude Regex,
and you accept Regex in complicated cases. But I'd better not catch you
using it in a simple case.".

I disagree - simple code that works (as well as the more complicated
code) is always good.

Except that I don't agree that this one is complicated.

Note that this is in terms of implementation, not
design - there's sometimes a very simple but inelegant design which
ends up costing a lot more work in the long run. That's a different
matter.

You still haven't said whether you see them as equally readable *and
maintainable* to others though.

I do.

With which bit? If you're going to disagree with the first sentence
quoted, we really don't have much basis for discussion. I thought it
was pretty much universally accepted these days that code almost always
spends more time in maintenance than in original coding. That's why I'm
always happy to spend a bit more time refactoring working code to make
it easier to maintain.

But that *is* pushing regular expressions from my point of view, where
they shouldn't be an option.

What?????

No, pushing Regex is saying Regex is better and you should use it.

Giving choices and options is not pushing either side.

You are the one pushing one side over the other, I am not.

Consider an exaggerated equivalent situation. Suppose we were
discussing how to implement addition. Suppose I thought that just using
the expression x+y was the easiest way of doing things, and you thought
it was just as easy to write a remote web service which took two
integers. By *not* ruling out the more complex solution, you're
*effectively* pushing it - at least pushing it as an equally valid
option.

No, that just means I am still not pushing the Complex side.

I am not saying pushing a position is a bad thing. But even if I said
Complex is as good as not, that still doesn't push the position.

<sigh> I don't know whether you're intentionally missing the point or
whether I'm genuinely not getting through.

No you are getting through, I just disagree in this situation.

There is always risk associated with changing code. When writing code,
you should try to reduce the risk that future changes will incur. That
means making the code as simple as possible, and easy to change.

I don't disagree.

I just disagree that that is the point here.

In some cases a regular expression will be a lot simpler to read and
change than the equivalent "primitive string manipulation" code. Those
cases would usually be where the string manipulation involves several
steps, often nested loops etc. There, the complexity of regular
expressions (which is still there) is less than the complexity of the
primitive solution.

Yes, but if a programmer can read the more complex Regex, he can most
certainly read this almost nothing line.

In this case, however, the primitive solution is very simple and
understandable. Changing it to search for a different string or an
extra string (or even a string passed in as a parameter) is trivial.
Changing the regular expression is not.

But you are changing exactly the same thing in this example, whether it is a
parameter or literal.

In this case though, it *would* be better - it would be simpler to
understand, and simpler to write in the first place.

Round and round

For instance, I wouldn't have had to consider whether the brackets were
doing something clever or not. I had to look up .NET regular
expressions just to check the meaning in this case. Do you really
believe that a solution which *doesn't* involve that extra thought
isn't better?

I see no brackets in our example (unless you are talking about the second
option).

And that's part of the risk - that someone doesn't put enough effort
into the regex to get the *actually* desired behaviour. Where the
alternative is a complex solution, it makes a lot of sense to put
significant effort into getting the regex right. When you could do the
same thing with a few string operations, it's just not worth it.

And if you feel that way, then you should do it that way. If someone
doesn't and feels that the Regex is just as easy, then it is feasible.

(For this first line, a regex is probably the best way to go - but you
need to think about it more closely.)

Usually intellisense can help you with that though - it *doesn't* start
explaining the details of regular expressions though.

If you happen to be using VS, I suppose. Now we are talking tools.

And if you weren't taking "use regular expressions" as your default
position, you wouldn't have made the mistake in the first place. The
first thing you should try to think of is the simplest one. You want to
manipulate a string, so ask yourself if there's anything in the string
class which does what you want.

No, don't agree there.

I go with "is there a good solution". And it may be that I would come up
with the C string solution, but it may be that I come up with the Regex just
as easily. I would probably look at the above Regex that was so complicated
and check to see if this could be done easier with C (or VB if I was working
in that). But not in a simple example as this. I would probably go with my
first thought, if it worked and was viable. I would also probably go with
it as it was one line instead of three, especially if it was being used as a
function.

It may not be your point, but it's part of my point.

I'm assuming that "the solution uses .NET" is a given - in other words,
any maintenance engineer should know C# and the basics of .NET. To me
"the basics" don't include regular expressions and memorising all the
details of them. *Some* familiarity can be hoped for, but not knowing
all the constructs - so anything which requires that people *do* know
the regex constructs in order to change things is at a disadvantage.

I agree here, but if there is some familiarity with them, I assume they can
see what this example says.

No, we didn't "get along fine" before there were objects. C code is
typically far harder to read than OO code - and where it's not, that's
often because it's effectively written in a semi-OO way, just using
naming to indicate which type of object is being used (just without
polymorphism etc).

Not necessarily to a junior programmer.

So you would assume that everyone who is reading and maintaining your
code knows every class in the .NET framework? I don't.

Right. So they would have to look them up. But that doesn't preclude you
from using them.

Well, I code review, just as my peers code review. We almost always
find things which can be done better (which works even better when pair
programming). That doesn't indicate that we're not good developers -
just that an extra point of view is always helpful. It also stops us
from getting lazy and implementing something which is just "okay"
rather than as good as it should be.

Nothing wrong with that. And I would assume that there are disagreements
with what is "better". I never have a problem with others reading my code
and looking for better ways to program. I just don't necessarily agree that
the other person's way is better.

I have had a running disagreement with Joe Celko (don't know if you know who
he is). He wrote some good Sql Server books and is very knowledgeable in
the subject. Tons more than I am.

But I disagree with him on a couple of issues. One is that he is dead
set against Camel Case. Says it is harder to read. I disagree. He feels
that code is worse if you use it. I disagree.

I also don't agree with the C style of putting the left bracket at the end
of a line Kernighan and Ritchie style. Makes more sense to put it on the
next line as it is part of a block of code. Have had many discussions on
that point. Matter of style. But not necessarily better.

Likewise if you don't consider that finding the simplest way of
implementing a solution is worth doing, I wouldn't like to work on your
code.
OK.

Being more complex than it needs to be means that code *isn't*
reasonable and well-written, IMO.

I agree in some but not in this case.

Because what you are saying is that you really should have 10 programmers
program each piece of each project. That way you can pick which piece is
the simplest (less complex).

If you have 2 programmers program the same problem, you are going to get 2
different solutions. One will be simpler than the other. If you have 3
programmers one of the 3 will be simpler and so on and so on ...

If that is your aim.

Occasionally they're useful. I haven't used a single one in the project
I've been working on for the last six months. On the other hand, I've
used string manipulation all over the place.

And no one says you have to use them.

I would expect that the number of straight string manipulations in most
code should be *much* higher than the number of regular expressions
used - hence it's more important to thoroughly understand the string
methods than regexes.
OK.

Everyone else in the team certainly feels the same way.

Then you are lucky you work with that team.

No, because - as I *keep* saying - there are things you can't do as
*simply* using straight string manipulation. Where it's simpler to use
regexes, I'd use them. Those situations come up occasionally, but not
with the frequency you seem to use regular expressions.

No, your position was that "this" particular example was hard to read and may
not be maintainable by some programmer. I agree with you in other
situations, just not this one.

See the various places I've explained that both in this post and many
others.

The problem here is that you are not leaving "appropriate" up to the
programmer.

No, using regular expressions *instead* of the string handling routines
serves no purpose, just as using a web service to perform addition
would serve no purpose.

There's no advantage in using the regular expression here, and there
*is* a disadvantage.

Yes, because you seem unable to understand the position I've presented
several times.

Of course, not because you seem unable to understand the position I've
presented several times.

As I said, I would have a problem with someone who couldn't figure out what
the example we were using was doing.

But would you have a problem with the same person if they forgot or
didn't check whether, say, '[' needed escaping? I'd find that a fairly
understandable mistake (although I'd hope that unit tests would show
the problem up).

And mistakes are not made with C string code?

I certainly can when the programmer uses it where there's no good
reason. There's a time and place to use reflection, but I would
certainly brow-beat a programmer who decided to use it to get the value
of a property which could be done in a safer way (using normal property
access syntax).

In our team a programmer (including myself) has to get "permission"
every time they want to check anything in. It's called code review, and
it vastly improves the quality of the code.

Cryptic such as "( )" where a straight " " would have been more
readable? Code review should have picked that up.

But now you are talking about Code review. The problem was that I probably
forgot to take out the Parens when I took out the other piece of code.
Would happen just as easily in C String handlers.

I certainly do. "Valid" to me involves the code being as simple as
possible.

But wouldn't it be easier to maintain something which *didn't* require
you to look up anything?

I've been writing C code for 15+ years and still have to look things up.
Just dense, I guess

Readability is not a black and white issue. Something is "more
readable" than something else - in this case, using string manipulation
is more readable (and maintainable, importantly) than using regular
expressions. In other cases, it isn't.

As I said, I think it is as readable as C in this case. I would agree with
you in others.

Where did I say it will *always* be harder to read? Please don't put
words in my mouth, especially when I've expressly stated otherwise
elsewhere.

At times, regular expressions will be easier to understand than the
equivalent string manipulation solution. In this case, they're not.

Didn't say what?

That I said you were wrong in saying you were strongly in favor of avoiding
the less readable/maintainable code.

In this case, no.

So you don't think that it would be harder to change the regex code to
look for "hello[there" than it would be to change the IndexOf code in
the same way?

In other cases, could be. Would have to look at it. I never said that Regex
is the best thing out there. I was just saying that it is valid and can be
readable - can also be cryptic (as can C#).

And I've never argued with that. I've argued against it being *as*
readable and maintainable in *this* case.

That wasn't the question asked. That was the example that was given and the
question was can you do it in one statement.

So the answer is no, using IndexOf.

Okay. But the follow-on answer is "the best way to do it is to use
IndexOf repeatedly" possibly with "and you can always write your own
method to do this if you want".

But that wasn't the original question. If you do that you are definitely
not doing it in one line (which was the question). The question was if
there was a way - not "what is the best way". That doesn't negate your
point of view on it. But that wasn't what was being asked. Obviously, the
multiple IndexOf lines were known as the question was "is there another way
in one line to do the same thing".

And there was.

His answer talked about the "best bet" - although the question didn't
ask about the best way, his answer did. I disagree with that answer.

That was obvious.

But he was answering the question. You weren't.

The question was "is there a way to do this in one command". And then an
example of what was being looked for.

He answered that question and said that was the best bet (IHO).

So far, you haven't answered that question.

Creating another subroutine to call - sort of does it. But wasn't what was
really being asked, according to the examples.

So his answer was valid.

You could put it in one line if you wanted to. It wouldn't be as easy
to read, but you could do it.

Ok.

It wasn't one statement, not one line.

And wouldn't ever?

Probably not, but possible. But again, that wasn't the question.

You don't think that having to count 4 backslashes is even slightly
harder than only counting 2? I can spot a double-backslash without
doing any double-checking. I'd always be careful when I needed four.

My aim is to only *need* to know as little as possible. The rest is
available where necessary.

Not when there's a simpler solution, no.

More rarely than with regular expressions.

I would be willing to wager large amounts of money on others
(particularly junior programmers) finding it less simple though. I'm
absolutely certain that if thousands of programmers had to maintain the
IndexOf version and change it to look for "foo.bar", fewer would make a
mistake than thousands of equivalent programmers maintaining the
regular expression version.

You can make anything more complicated than it has to be. The question was
still answered with a Regular Expression. Not whether you can make it more
complicated.

Are you absolutely certain that the regular expression *wouldn't* prove
more bug-prone?

To me, it's definitely significant. Using regular expressions here
introduces risk for no benefit.

Oh well.

Life is risky.

Again, you're being black and white. I'm not saying that people
*couldn't* understand the regular expression - although they're more
likely to make a simple mistake without thinking about it. I'm saying
that they'll need to put more effort into understanding it than a
straight IndexOf.

I would hope that anyone maintaining a complex regular expression will
double-check what's going on. It's easy to conceive of someone
maintaining a simple one failing to do so.

What????

So now you shouldn't use it because someone might not double-check it?

Where?

Ok. Maybe not.

I really don't see where an engineer is going to "forget" he is dealing with
a regular expression.

No, it's not less complicated. If you avoided using objects, the code
would be *much* harder to read and maintain.

I think we are running out of arguments

Tom

Jon Skeet [C# MVP] · Sep 28, 2005

tshad said:
The name is as understandable, but the exact semantics are *much* more
obscure. The name doesn't suggest that you can't just put a dot in
there and expect it to match only a dot, for instance, does it?

But you aren't getting the point.

You are talking readability (not all the possible permutations). OK, if you
want to put dots and {} and () and [] and \ in the string, we have a
different story. In this case, however, you cannot tell me that a
programmer can't see that you are looking for a match (IsMatch) and there
are OBVIOUSLY (to even a half way decent programmer) 3 strings separated by
a "|", so therefore this line (and this line only) says we are trying to
match one of the 3 strings.

Whereas I would say it isn't *as* obvious as three calls to IndexOf.
Yes, it wouldn't take more than a few seconds to work out what was
going on, but why add that time in the first place?
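
For anyone following along, here is a rough sketch of the two styles being compared. The three search terms are made up for illustration, since the original sample strings aren't reproduced in this part of the thread, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiSearchDemo
{
    // Three plain substring searches. The terms are hypothetical placeholders.
    public static bool ContainsAnyWithIndexOf(string text)
    {
        return text.IndexOf("someone", StringComparison.Ordinal) != -1
            || text.IndexOf("somewhere", StringComparison.Ordinal) != -1
            || text.IndexOf("something", StringComparison.Ordinal) != -1;
    }

    // The same check written as one regular expression with alternation.
    public static bool ContainsAnyWithRegex(string text)
    {
        return Regex.IsMatch(text, "someone|somewhere|something");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsAnyWithIndexOf("look somewhere else")); // True
        Console.WriteLine(ContainsAnyWithRegex("look somewhere else"));   // True
    }
}
```

With plain alphabetic terms like these the two versions behave the same; the disagreement in the thread is about what happens once the terms need to change.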

Wouldn't you agree? Leave out all the extraneous possibilities.

We aren't talking about a Regular Expression such as:

^((31(?!\ (Feb(ruary)?|Apr(il)?|June?|(Sept|Nov)(ember)?)))|((30|29)(?!\
Feb(ruary)?))|(29(?=\ Feb(ruary)?\
(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])\
(Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\
((1[6-9]|[2-9]\d)\d{2})$

Description: This RE validates dates in the dd MMM yyyy format. Spaces
separate the values. Month value is either the full name of the month or the
3 letter abbreviation without a period. Days for the month are validated
for all months, including Feb in leap years. Years are 4 digit years.

(this isn't mine, just one I saw on the net)

In this case, I would probably agree with you. Of course, I would assume
you could probably do the same thing in C# (although not with IndexOf only -
I would expect), but I don't know if it would be easy or readable. Probably
would be more readable.

I'd almost certainly use DateTime.ParseExact instead, giving it a list
of appropriate formats which were acceptable.
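
A minimal sketch of that approach, assuming the accepted input is a day, a month name (full or three-letter) and a four-digit year separated by single spaces. The format list and the TryParseDate helper are illustrative, not from the thread. It uses DateTime.TryParseExact, the non-throwing sibling of ParseExact, with the invariant culture, which abbreviates September as "Sep" rather than the "Sept" the regex accepts, and it does not reproduce the regex's lower bound on the year.

```csharp
using System;
using System.Globalization;

public static class DateParsingDemo
{
    // Assumed list of acceptable formats; adjust to whatever input is really allowed.
    private static readonly string[] Formats =
    {
        "d MMM yyyy", "dd MMM yyyy", "d MMMM yyyy", "dd MMMM yyyy"
    };

    // Returns true and sets result when text is a real calendar date in one
    // of the formats above; leap-year rules are handled by DateTime itself.
    public static bool TryParseDate(string text, out DateTime result)
    {
        return DateTime.TryParseExact(
            text, Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out result);
    }

    public static void Main()
    {
        DateTime parsed;
        Console.WriteLine(TryParseDate("29 Feb 2004", out parsed)); // True
        Console.WriteLine(TryParseDate("29 Feb 2005", out parsed)); // False
    }
}
```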

What I said was (restated) - If I have a solution, I am not going to
think (as I try to write reasonable code, anyway and document it) " Wait a
minute, will Jim have a problem with it, will Mark have a problem with it.
We just hired Steve, a junior programmer, will he have a problem with it.
Wait a minute I'm not sure whether Greg is versed in this simple Regex
statement. Maybe I should find another solution - even though this is valid
and 10 of my programmers can read it, someone may have a little problem with
it. Surely, I can spend a little more time to rewrite a solution that
clearly works and find a more readable one."

This is the mindset you go through?

Not with individuals, no. I *do* think: "Is this the simplest way of
getting the job done?"

Of course, readability is important, but let's not be excessive about it. I
am not going to write to a 3rd grade mentality when I am writing a business
letter. This does not mean it isn't readable, just not written to a grade
school level.

I don't expect a junior programmer to be able to understand everything I
code (which is why we have documentation). I agree yours is readable in
this case, but not any more readable than mine (in this case). If your
programmers would have a problem with this statement, I would maintain the
problem is not with the code but the programmers.

There's a difference between "unable to understand" and "not as
*easily* able to understand" - between "very low risk" and "some risk".
When it comes to maintenance, I would class the regex solution as "some
risk" because of the possibility for error if you need escaping - and
one of the things about maintenance is that you just can't easily
predict what changes will be required.

In your case, there should be.

There don't need to be - just the "do things in the simplest way" would
cover this case.

Otherwise, don't quibble over a simple Regex, when you clearly say that a
much more complicated one is fine.

The above rule covers both situations.

I have no problem with your saying that in your company, you don't allow
Regex statements.
I do have a problem with your saying, "your company doesn't preclude Regex,
and you accept Regex in complicated cases. But I'd better not catch you
using it in a simple case.".

Why? It clearly follows the "use the simplest code to get the job
done" rule.

I should add at this point that I've been talking to a few people about
this, and *all* of them have agreed so far that the regular expression
way just isn't the way to go here - that it's a risky and relatively
complex solution.

Except that I don't agree that this one is complicated.

I didn't say it was complicated - I said it was *more* complicated.

I do.

So you'd be happy to take the bet about which version would trip up
more coders? (Note that giving it in a "test" situation wouldn't be
entirely appropriate, unfortunately - people are already on their guard
for subtleties when you give them actual tests.)

What?????

No, pushing Regex is saying Regex is better and you should use it.

Giving choices and options is not pushing either side.

You are the one pushing one side over the other, I am not.

I'm certainly pushing one side - but I happen to think you count as
"pushing" when the solution you're talking about is to me obviously
more complicated.

No, that just means I am still not pushing the Complex side.

I am not saying pushing a position is a bad thing. But even if I said
Complex is as good as not, that still doesn't push the position.

I think we'll have to agree to disagree about what counts as pushing
then. Fortunately it's not terribly important to the discussion.

I don't disagree.

You did before. You wrote:

<quote>
If I have the solution and it happens to be Regex, I would use it, I
wouldn't necessarily say to myself - "Is there perhaps a more readable
way to write this? I wonder if Jim will be able to read this or not."
</quote>

That doesn't sit well with agreeing that you should make the code as
simple as possible - you're saying that sometimes you wouldn't even
bother thinking whether there might be a more readable way of writing
it.

Yes, but if a programmer can read the more complex Regex, he can most
certainly read this almost nothing line.

Again, you're putting it all in black or white - either something being
readable or not. Life doesn't work like that - it's shades of grey.
Something can be "more readable" or "less readable" - "more
maintainable" or "less maintainable".

But you are changing exactly the same thing in this example, whether it is a
parameter or literal.

And that's exactly the kind of thing which happens during maintenance.

I see no brackets in our example (unless you are talking about the second
option).

This section of the thread was talking about your call to Regex.Replace
which used "( )" to replace a single space with something.

And if you feel that way, then you should do it that way. If someone
doesn't and feels that the Regex is just as easy, then it is feasible.

We fundamentally disagree about whether the regex really is "just as
easy" though - and I find it hard to understand, given the arguments
I've used for maintenance.

Can you think of any examples where it would be *easier* to maintain
the regex version? I've given plenty of examples where it would be
easier to maintain the IndexOf version. There may be examples where it
would be easier to maintain the regex version, but I strongly suspect
that when you come up with them, you'll agree that the changes which
would be easier to cope with in the IndexOf version are more likely to
occur.

If you happen to be using VS, I suppose. Now we are talking tools.

What proportion of professional developers *don't* use VS when writing
C#? I suspect it's under 1% - vanishingly small. Anything which helps
development when using VS is therefore useful for almost everyone.

No, don't agree there.

I go with "is there a good solution". And it may be that I would come up
with the C string solution, but it may be that I come up with the Regex just
as easily. I would probably look at the above Regex that was so complicated
and check to see if this could be done easier with C (or VB if I was working
in that).

The examples you've given show you thinking of regex as a *first* port
of call rather than *after* the simpler solutions have been found
wanting though. That's just a bad idea.

But not in a simple example as this. I would probably go with my
first thought, if it worked and was viable. I would also probably go with
it as it was one line instead of three, especially if it was being used as a
function.

Readability is about *so* much more than saving space. As I said
elsewhere, you could put the IndexOf solution all on one line too, if
you want. Heck, put the whole of each class in one line if you want -
but readability will go down rather than up.

I agree here, but if there is some familiarity with them, I assume they can
see what this example says.

But they would probably have to look up which characters need escaping
when they had to change the string to include something involving
punctuation. You don't have to do that with IndexOf - so it's simpler.

Not necessarily to a junior programmer.

I think the cases where the C code is easier to read for someone who
knows as much C# as C are very, very few.

Right. So they would have to look them up. But that doesn't preclude you
from using them.

No - but I'd think twice about using a relatively obscure class like
Regex (obscure in that it's not something which I typically use all
over the place - by the looks of it, anyone who maintains your code
really *does* have to know about Regex) when a more common class
(String in this case) does the job just as well.

Nothing wrong with that. And I would assume that there are disagreements
with what is "better". I never have a problem with others reading my code
and looking for better ways to program. I just don't necessarily agree that
the other person's way is better.

I have had a running disagreement with Joe Celko (don't know if you know who
he is). He wrote some good Sql Server books and is very knowledgeable in
the subject. Tons more than I am.

But I disagree with him on a couple of issues. One is that he is dead
set against Camel Case. Says it is harder to read. I disagree. He feels
that code is worse if you use it. I disagree.

I also don't agree with the C style of putting the left bracket at the end
of a line Kernighan and Ritchie style. Makes more sense to put it on the
next line as it is part of a block of code. Have had many discussions on
that point. Matter of style. But not necessarily better.

Yes, bracing and naming is very hard to make absolute judgements on -
and I agree with you on both of these cases. There's little you can put
forward in the way of concrete examples of why one version is better.
That's not the case here though - I've given numerous examples of
situations where changing the code to do something which is on the face
of it very similar (eg changing from looking for "a_b" to "a.b")
requires significantly more work (looking up docs) with the Regex
version than with the IndexOf version.
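
To make the "a_b" to "a.b" case concrete, here is a small sketch. The class and method names are hypothetical; Regex.Escape is the standard .NET call for turning literal text into a safe pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // Plain substring search: the term is always taken literally.
    public static bool FoundWithIndexOf(string text, string term)
    {
        return text.IndexOf(term, StringComparison.Ordinal) != -1;
    }

    // The term is used directly as a pattern, so '.' means "any character".
    public static bool FoundWithUnescapedRegex(string text, string term)
    {
        return Regex.IsMatch(text, term);
    }

    // Regex.Escape turns the term back into a literal before matching.
    public static bool FoundWithEscapedRegex(string text, string term)
    {
        return Regex.IsMatch(text, Regex.Escape(term));
    }

    public static void Main()
    {
        Console.WriteLine(FoundWithIndexOf("axb", "a.b"));        // False
        Console.WriteLine(FoundWithUnescapedRegex("axb", "a.b")); // True: '.' matched 'x'
        Console.WriteLine(FoundWithEscapedRegex("axb", "a.b"));   // False
    }
}
```

A term such as "hello[there" goes one step further: passed unescaped as a pattern it is rejected with an ArgumentException, because the '[' opens a character class that is never closed.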

I agree in some but not in this case.

Because what you are saying is that you really should have 10 programmers
program each piece of each project. That way you can pick which piece is
the simplest (less complex).

There's no need, so long as the single or paired programmer always
bears it in mind.

If you have 2 programmers program the same problem, you are going to get 2
different solutions. One will be simpler than the other. If you have 3
programmers one of the 3 will be simpler and so on and so on ...

If that is your aim.

That would suggest that every line of code I pair program raises an
alternative with my pair - it just doesn't happen. Almost always, we
agree on the simplest course of action.

And no one says you have to use them.

But you should take from those stats the fact that people are *likely*
to run into regular expressions less often than straight string
manipulations - so people will be more familiar with the latter. When
two ways of doing something are equivalent other than in familiarity,
go for the more familiar way.

Then you are lucky you work with that team.

As I say, everyone else I've spoken to about this agrees that using
regular expressions in this case is overkill too.

No your position was that "this" particular example was hard to read and may
not be maintainable by some programmer. I agree with you in other
situations, just not this one.

Not *hard* to read, but *harder* to read (than the IndexOf version).
*Relatively* hard to read.

In this particular example, the IndexOf version is less risky than the
regex version in terms of future maintenance.

The problem here is that you are not leaving "appropriate" up to the
programmer.

I'm defining it as "simplest" - and you're the only person I've found
so far who *doesn't* think that using IndexOf is simpler.

Of course, not because you seem unable to understand the position I've
presented several times.

I understand that you're presenting the two implementations as equally
simple despite my numerous maintenance examples where maintaining the
regex version is harder than maintaining the IndexOf version. You
haven't come up with any counter-examples.

But would you have a problem with the same person if they forgot or
didn't check whether, say, '[' needed escaping? I'd find that a fairly
understandable mistake (although I'd hope that unit tests would show
the problem up).

And mistakes are not made with C string code?

Sometimes - but more rarely, I believe. A change in what to search for
in the IndexOf case is really easy. It's a bit harder with the regex
version.

But now you are talking about Code review. The problem was that I probably
forgot to take out the Parens when I took out the other piece of code.
Would happen just as easily in C String handlers.

No it wouldn't - because if you didn't have a regex in the first place,
you wouldn't have had the brackets in the first place. Similarly, if
you'd constantly been asking yourself "is there a simpler way of doing
this?" (and come up with the same answer that everyone else I've spoken
to has) then even if you originally had a regex, by the time you got
down to "I'm replacing a space with something else" you'd have changed
to using String.Replace.
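
For comparison, a sketch of the two calls side by side. The underscore replacement is a placeholder (the thread only says a space was replaced "with something"), and the names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // The regex form discussed above: a capturing group around a single space.
    public static string WithRegex(string text)
    {
        return Regex.Replace(text, "( )", "_");
    }

    // The plain string form: no pattern syntax to read or escape.
    public static string WithStringReplace(string text)
    {
        return text.Replace(" ", "_");
    }

    public static void Main()
    {
        Console.WriteLine(WithRegex("a b c"));         // a_b_c
        Console.WriteLine(WithStringReplace("a b c")); // a_b_c
    }
}
```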

Got to go now - will answer the rest of the post later.

Jon Skeet [C# MVP] · Sep 28, 2005

[Continuing from where I left off]

tshad said:
I've been writing C code for 15+ years and still have to look things up.
Just dense, I guess

That didn't answer my question though - surely if there are things you
*don't* need to look up (and can reasonably expect others not to need
to look up), those are likely to be easier to read and maintain, right?

As I said, I think it is as readable as C in this case. I would agree with
you in others.

Do you think it's as readable to *most* people, or just to you
personally, out of interest?

That I said you were wrong in saying you were strongly in favor of avoiding
the less readable/maintainable code.
Right.

But that wasn't the original question. If you do that you are definitely
not doing it in one line (which was the question).

Of course you can do it in one line of code. It would just be a very
*long* line.

The question was if there was a way - not "what is the best way".
That doesn't negate your point of view on it. But that wasn't what
was being asked. Obviously, the multiple IndexOf lines were known as
the question was "is there another way in one line to do the same
thing".

In that case, if you look at the response from Nicholas, it doesn't
even answer your question...

And there was.

The problem is that it was portrayed as a *better* way, when I believe
it's a significantly *worse* way.

That was obvious.

But he was answering the question. You weren't.

Look at his post - given your restricted nature of what you view as the
question, he didn't actually answer it.

The question was "is there a way to do this in one command". And then an
example of what was being looked for.

He answered that question and said that was the best bet (IHO).

Nope, he didn't answer the question of whether you could do it with
IndexOf.

So far, you haven't answered that question.

Actually, I think I've stated several times that you can't do it with a
single call to IndexOf, which *does* answer that question.
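
One possible shape for the "write your own method" option mentioned earlier in the thread, as a sketch only. ContainsAny and StringSearch are hypothetical names, not framework members; the helper just wraps repeated IndexOf calls so the call site is a single statement.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: true if text contains at least one of the terms.
    // Every term is treated literally, so nothing needs escaping.
    public static bool ContainsAny(string text, params string[] terms)
    {
        foreach (string term in terms)
        {
            if (text.IndexOf(term, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(ContainsAny("say hello[there", "foo.bar", "hello[there")); // True
        Console.WriteLine(ContainsAny("fooxbar", "foo.bar"));                        // False
    }
}
```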

Creating another subroutine to call - sort of does it. But wasn't what was
really being asked, according to the examples.

So his answer was valid.

See above.

Ok.

It wasn't one statement, not one line.

It would still be one statement, in fact. Do you actually mean you're
after something which is a single method call? If so, that's a pretty
odd criterion to use for choice of implementation, IMO.

Probably not, but possible. But again, that wasn't the question.

But it affects whether Nick's answer was actually correct - whether his
"best bet" statement was true or not.

(I'm hoping Nick's going to be at the MVP summit and I can ask him for
a bit of clarification on this point - I'll let you know if I get to
chat with him.)

You can make anything more complicated than it has to be. The question was
still answered with a Regular Expression. Not whether you can make it more
complicated.

But the answer given stated not just that you *could* do it with a
regular expression, but that a regular expression was the "best bet".
That, to me, is false.

Oh well.

Life is risky.

You're really happy to just shrug your shoulders and introduce risk for
*no* benefit? Crikey.

What????

So now you shouldn't use it because someone might not double-check it?

Absolutely. Something which looks simple but has a hidden twist is
dangerous. It deserves a comment at least. Now, something which
requires a comment to decrease the risk is likely to be worse than
something which doesn't.

Ok. Maybe not.

I really don't see where an engineer is going to "forget" he is dealing with
a regular expression.

You have a lot more faith in developers than I do then. It's very easy
to make simple mistakes when you're perhaps a bit pushed for time. It's
better to reduce the scope of the error in the first place.

Now, even if the engineer remembers, he's quite possibly going to have
to check to see what needs escaping. So even if you don't buy the risk
argument, I can't see how you'd deny that it's making life that little
bit harder for the maintenance team - and, as I keep stressing, for
*no* benefit.

I think we are running out of arguments

Possibly. I'm still struggling to see how you can view the regex as
*not* more complicated. It's inherent in the power of regular
expressions - in order to be able to express very complicated patterns,
some simple patterns have to be made a bit more complex. Because
IndexOf limits itself to straight substring searches, it doesn't have
to complicate things at all when all you want is a straight substring
search.

Jon Skeet [C# MVP] · Sep 29, 2005

Jon Skeet said:
(I'm hoping Nick's going to be at the MVP summit and I can ask him for
a bit of clarification on this point - I'll let you know if I get to
chat with him.)

<snip>

Update: I've now met Nick, and we've talked about many things. We
managed to stay on this topic for about a minute before moving onto
something else - it was one of those conversations. I wouldn't like to
trust my memory of the very brief mention of it to say whether or not
he agreed with me on the maintenance point.

tshad · Oct 1, 2005

Jon Skeet said:
<snip>

Update: I've now met Nick, and we've talked about many things. We
managed to stay on this topic for about a minute before moving onto
something else - it was one of those conversations. I wouldn't like to
trust my memory of the very brief mention of it to say whether or not
he agreed with me on the maintenance point.

He probably did.

I haven't had time to finish up our discussion, but will try to get to it
this weekend.

Tom
