tshad said:
Yes, but so are dolphin sounds.
When I talk about a Programming Language - I am talking about a Procedural
Language (C, Fortran, VB, Pascal, etc.).
So you wouldn't regard LISP as a programming language, just because
it's functional rather than procedural?
Of course, you didn't even specify "programming language" before.
Regular expressions form a language in computing, and that language
needs to be learned before being used, just as any other language does,
whether it's C#, HTML, XPath or VB.NET.
And the Regex version:
if (Regex.IsMatch(myString, @"something1|something2|something3"))
Right. Immediately the IndexOf value is more readable, by more clearly
separating the three separate strings which are being searched on.
(Oliver Sturm's version is more readable than that
Again, why do I need a compelling reason? If I have the solution and it
happens to be Regex, I would use it, I wouldn't necessarily say to myself -
"Is there perhaps a more readable way to write this? I wonder if Jim will
be able to read this or not."
Then I'm afraid that's your problem. It sounds like you're basically
admitting that you're not that interested in readability. Personally, I
like writing code which is elegant but easy to maintain. Having *a*
solution which happens to work isn't enough when there are obviously
others available which could well be simpler.
Far more time is spent maintaining code than writing it in the first
place. Taking the attitude you take above just isn't cost-effective in
the long run.
No.
No pushing. No more than your pushing not using it.
But I'll readily admit to pushing the (IMO simpler) solution, for this
particular situation. So are you actually admitting that you *are*
pushing the use of regular expressions here?
Actually, nothing. It is grouping a " ", which isn't necessary. I think I
used to have something else there and took it out and didn't realize I
didn't need the ().
So again, the code could be made more readable even by just modifying
the existing regex replacement, let alone by replacing the regular
expressions with simple String.Replace calls. Had they been
String.Replace calls, the meaning of the second line would have been
unambiguous - you'd have had to write it the simple way to start with.
Note that your first replacement will replace two tabs with a single
space, but leave one tab alone, by the way. It would be better to
replace "\s+" with the space, IMO.
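A minimal sketch of the difference, for illustration only. The original
first pattern isn't shown in this post, so the @"\s\s+" pattern below is
just an assumed stand-in for any pattern that needs two whitespace
characters before it matches; the WhitespaceDemo class and its method
names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Assumed stand-in for the original pattern: it needs at least two
    // whitespace characters, so a lone tab is left untouched.
    public static string CollapseRuns(string input)
    {
        return Regex.Replace(input, @"\s\s+", " ");
    }

    // The suggested alternative: any run of one or more whitespace
    // characters becomes a single space.
    public static string CollapseAll(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }
}
```

With the input "a<tab>b", CollapseRuns leaves the tab in place while
CollapseAll turns it into a space.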
Obviously, you didn't need to look this one up either - as you were correct.
It is just grouping a blank.
I would have had to look it up if you hadn't been answering the question
though. Why make the code harder to understand in the first place? If
you want to replace a space with " or ", just use
keywords = keywords.Replace (" ", " or ");
Much more straightforward.
Just that you don't want to Regex as it is not easily readable. Neither are
Regex.
Eh?
But the fact a junior programmer might not understand Objects as you do
would not prevent you from writing them, would you?
When using C#, one has to use objects. I will almost always try to
implement the simplest solution to a problem, unless there is a
compelling reason to use a more complex solution. That way, anyone
reading the code has to learn relatively little "extra" stuff beyond
the language itself.
You don't have to use lots of things. That doesn't make them invalid.
Neither is the fact that you use Foreach vs For {}. They are there and are
part of the mix as is Regex.
No, they really aren't. for and foreach are well-defined in the C#
language specification. If the program is in C# to start with, it is
reasonable to assume competency in C# on the part of the reader of the
code. It is *not* reasonable to assume competency in regular
expressions, and while that wouldn't prevent me from using regular
expressions where they provide value, they just *don't* here.
I might agree with you more if Regex were some
component that you picked up and added. Or if Regex were some obscure
technique that few knew about. They have been around for quite a long time
and are just another gun in your arsenal. If I thought that MS were
deprecating it, I would also think twice about using it. But it is part of
.Net that all the languages can make use of and I would never tell a
programmer, who may be really comfortable with it and uses it responsibly
(not obscure cryptic non-commented code), that he should be using IndexOf
instead.
Clearly not, as you seem to be keen on using them instead of simple
string manipulations all over the place - if I saw anyone using regular
expressions rather than String.Replace in the way you've shown in other
code posts, that code would never get through code review.
I agree with part of that and think that regular expressions are just as
important to know.
Why? I'm working on a fairly large project which hasn't needed to use
regular expressions and wouldn't have benefitted from them once. I
suspect many people could say the same thing. I suspect very few if any
of them could say the same thing about the basic string manipulation
methods - and yet you were surprised to see that one could call Replace
on the result of another Replace method call, which I'd consider a far
more "basic" level of understanding than knowledge of regular
expressions.
As we have been saying, it is here and many people use it, so to not
understand it is to limit yourself.
It's one thing to understand the general power of regular expressions,
so you would know when they may be applicable - it's another thing to
use them when they serve no purpose beyond what can be more simply
achieved with the simple String methods.
You don't have to use it, but you should at least understand the
basics of how it works. What are you going to do when someone uses a
RegularExpressionValidator and you don't understand what the
expression is?
At that point, if I didn't understand the regular expression, I'd look
it up in the documentation. Do you know every part of regular
expression syntax off by heart?
The fact that it is not C# (neither is a textbox, datagrid, etc),
doesn't mean you shouldn't understand them. Whether you use them is up
to you.
As you point out, you are not the only programmer and many programmers like
to use Regex and that doesn't make them any lesser programmers. What are
you going to do when you run into their code?
If they're on my team, I'll tell them to refactor their code to only
use them when they're appropriate, frankly.
I see code all the time (much of the time it is mine) and wonder why the
programmer didn't do it another way. There are many ways to skin a cat.
Sometimes it is just style, sometimes it is all they know. But if they
follow whatever standards are set up (and in your case maybe you forbid
Regex) then as long as the code is well written and clean - I have no
problem with it.
If code uses regular expressions when they serve no purpose, it is
*not* well written and clean though - it is less maintainable than it
might be.
I agree there.
Which is easier to write is obviously your perception. I found my example,
as easy as yours to write and just as readable.
And you believe that everyone else does? Again, bear in mind that
you're unlikely to be the only person ever to read your code.
Keep regular expressions out of my code?????
So now you are saying there is no use for it?
Not at all - I'm saying that you shouldn't put regular expressions in
your code just for the sake of keeping your hand in. Use them where
they're applicable, and only there.
Sure.
If it is valid. As I said there are many ways to skin ..., depending on the
situation I may do it one way and the next time another way. Gives me many
options. I don't do it willy nilly, as you seem to suggest, as a test
bench.
But that's *exactly* what you've suggested you should do with regular
expressions - use them even when there's no real purpose in doing so,
just so that you remember what they look like.
I am not. I don't memorize. But I still use it.
Okay, so you don't memorise it, which means you *do* have to look up
which characters require escaping. I think you've just admitted that
your code is less maintainable than mine.
No you are very clear. If you are so concerned with others being able to
read your code and problems with escape characters - why would you EVER want
them to use them. You can't have it both ways.
I would use them when the solution which uses regular expressions is
clearer than the solution which doesn't use them. It seems a pretty
simple policy to me.
If they would have a hard time with a nothing expression like "if
(Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
never going to get some of the other standard Regex solutions I
mentioned before.
Those maintaining the code could no doubt understand it after looking
at it for a little while, just like they could work out your other
regular expressions after looking at them and consulting the
documentation - but why are you trying to make their jobs harder? Why
are you not concerned that the code you're writing is costing your
company money by making it harder to maintain than it needs to be?
As you said, the two solutions are equal. Your solution is that you MUST go
with IndexOf. Mine is you can use either.
Well, they're equal in terms of their semantics. They're definitely not
equal in terms of maintainability, and as that's important to me, I
don't see what's wrong with saying that I'm very strongly in favour of
avoiding the less readable/maintainable code.
I wasn't referring to this particular issue when I said this.
It would have been nice if you'd indicated that. Do you agree then that
it doesn't actually take any more brainpower to come up with
String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
brainpower when it comes to maintaining the IndexOf solution?
I never said I was not familiar with IndexOf.
As a matter of fact, the original question was whether you could "do a
search for more than one string in another string".
And of course the answer is "yes, by calling IndexOf multiple times".
****************************************************************
Can you do a search for more than one string in another string?
Something like:
someString.IndexOf("something1","something2","something3",0)
or would you have to do something like:
if ((someString.IndexOf("something1",0) >= 0) ||
    (someString.IndexOf("something2",0) >= 0) ||
    (someString.IndexOf("something3",0) >= 0))
{
    // Do something
}
***************************************************************************
IndexOf doesn't do it. This was the original question. You have to do
multiple calls as is said in the original question. Nicholas was correct in
his assessment. One Regex call would work.
Yes, as would a single call to a method which called IndexOf on the
string multiple times. I disagree with you - Nicholas wasn't correct in
his assessment, as he claimed that the "best bet" would be to use a
regular expression. Using regular expressions is just *not* the best
bet here - it requires more effort, as I've described repeatedly.
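As a rough sketch of what "a method which called IndexOf on the string
multiple times" could look like: the StringSearch class and the
ContainsAny name are hypothetical, and the choice of an ordinal
comparison is an assumption (the calls in the original question use the
default culture-sensitive overload).

```csharp
using System;

public static class StringSearch
{
    // Returns true if any of the candidate strings occurs in the source.
    // Each candidate is treated as plain text; no character is special.
    public static bool ContainsAny(string source, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (source.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

The calling code then reads as a single test, for example
if (StringSearch.ContainsAny(someString, "something1", "something2",
"something3")), with each search value kept as its own string.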
Okay - now suppose I need to change it from searching for "something1"
to "something.1" or "something[1]". How long does it take to change in
each version? How easy is it to read afterwards?
That wasn't the question.
Are you suggesting that maintainability isn't something that should be
considered? Do you *really* want to look for "something1",
"something2" and "something3" or were they (as I suspect) just
examples, and the real values could easily have dots, brackets etc in?
What if you wanted to change "something1" to "something\". Same problem.
Well, it's half the problem with IndexOf that it is with regular
expressions. With regular expressions, you'd need to know that not only
does backslash need escaping in C#, it also needs escaping in regular
expressions.
IndexOf: "something\\" or @"something\"
Regex: "something\\\\" or @"something\\"
Once again, the IndexOf version is easier to understand - there's less
to mentally unescape to work out what's actually being asked for.
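The four literals above can be checked with a small sketch like the one
below. The BackslashDemo class and its member names are hypothetical,
and the ordinal comparison in the IndexOf call is an assumption made for
the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo
{
    // Both literals denote the same ten-character string: something\
    public const string IndexOfRegular = "something\\";
    public const string IndexOfVerbatim = @"something\";

    // Both literals denote the same eleven-character pattern: something\\
    // The regex engine then reads \\ as one literal backslash.
    public const string RegexRegular = "something\\\\";
    public const string RegexVerbatim = @"something\\";

    public static bool FoundByIndexOf(string text)
    {
        return text.IndexOf(IndexOfVerbatim, StringComparison.Ordinal) >= 0;
    }

    public static bool FoundByRegex(string text)
    {
        return Regex.IsMatch(text, RegexVerbatim);
    }
}
```

Both methods find the same text; the regex version simply has one more
layer of escaping between the source code and the characters searched for.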
And if escapes were a problem (if it were me) I would have a little sheet
that showed them at my desk within easy reach.
Whereas by needing to know less (just the C# escapes) it's really easy
to memorise everything I need to know to solve this situation.
As you can with Regular Expressions.
Well, Oliver Sturm has shown a more readable version, but you seem to
be keen on the "put them all in the same line" version.
Neither is as readable as the String.IndexOf version, however.
Yup, but it's something that isn't used in string literals other than
for regular expressions. It's an extra thing to bear in mind
unnecessarily.
Absolutely not! It's significantly easier to spot the three separate
values when they're three separate strings than when they're all mashed
together.
You are now leaving the original question. I never said that Regular
Expressions was the better (or not better) in all cases.
While I'm leaving the exact original question, it's far from out of the
question that the original code would need to be changed at some point to
search for a variable rather than fixed strings. At that point, can you guarantee
that your team would get it right? They'd need to be on their guard
when using regular expressions - they wouldn't need to be on their
guard using IndexOf.
Why use them at all? It isn't readable.
They aren't as readable *in this case*. In other, more complicated
situations, the version which only used IndexOf would be harder to read
than the regular expression version.
Using a regular expression is like getting a car compared with walking
somewhere - it's absolutely the right thing to do when you're going on
a long journey, but in this case you're advocating getting in a car
just to travel to the next room. It's simpler to walk.
And if your programmers can't maintain the simple Regexes, they definitely
won't be able to handle the more complicated ones.
You seem to fail to grasp the "make it as simple as possible" concept.
It's not a case of maintenance engineers being idiots - it's about
presenting them with fewer possible risks. Why leave them a trap to
fall into when you can write simpler code which is easier to change
later on?
Not in this specific case. I was never maintaining or pushing Regex for all
or any situations.
But you're pushing for regular expressions in *this* situation, or at
least saying it's just as good as using IndexOf. You've also shown in
your other code that you use regular expressions unnecessarily for
replacement, making a simple two-step replacement into a complicated
single-step replacement where the number of characters which *aren't*
just plain text is greater than the number of characters which are.
But I am not going to force my programmers to come to me to find out whether
or not Regex is the easiest way or not. That is up to the programmer. If
there is a problem with their code and I feel the programmer is way off base
in his coding we would talk about it (that would be the case with his C#, VB or
Regex code).
Using regular expressions in this case *is* a problem with their code,
IMO. It's just asking for trouble later on.
If you knew enough to know about Regex at all (which you said you would have
no problem with in some situations - so the programmers better be able to
read it), there should not be a problem with the 2 special characters which
are the same as in C#. There is nothing obscure in this example - that I can
see.
Of course there is - to work out what's going on, you've got to
mentally unescape the dollar and the comma, but *not* mentally unescape
the |. All that rather than just "replace dollar with space, replace
comma with space" in a simple form with no hidden meanings to anything.
Right. So in the C#, you'd either have to have more escapes, or make
them verbatim literals. More stuff to get right. Note how no escaping
at all is required in my version.
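To make the comparison concrete, here is a hedged sketch. The exact
regular expression under discussion isn't reproduced in this post, so
the @"\$|\," pattern is an assumed reconstruction from the description
(escaped dollar, escaped comma, unescaped |), and the SeparatorDemo
class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // Assumed reconstruction of the regex approach: the dollar and the
    // comma are escaped, while | is the alternation operator.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, @"\$|\,", " ");
    }

    // Two chained String.Replace calls; nothing needs escaping.
    public static string WithReplace(string input)
    {
        return input.Replace("$", " ").Replace(",", " ");
    }
}
```

Both methods give the same result for the same input; the difference is
only in how much the reader has to decode.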
But according to you, you shouldn't use them as some of the programmers may
not be able to maintain it.
Definitely if they would have a problem with our example.
Can't have it both ways. If you allow Regular Expressions, you shouldn't
have a problem if a programmer used the Regex or IndexOf in our example.
Anyone maintaining the "USEFUL" ones would have zero problems with this one.
How very black and white of you. Do you really have no concept of
someone being able to understand something, but having a harder time
understanding it one way than the other?
I was obviously talking about Regular Expressions in general here as I was
referring to the standard ones you can get anywhere dealing with (Phone
numbers, credit card etc). There would be none in this case, obviously.
But there may be in more complicated cases.
Yes - the complicated cases where I've already said that regular
expressions are useful!
Yes, it's still going to be harder to search for "some\thing" than
"something". However, it's *not* going to be harder to search for
"some.thing", or "(something)", or "[something]", or "some,thing", or
"some*thing" or "some+thing" etc. Furthermore, there's still going to
be less to remember when you *are* faced with searching for
"some\thing" than there would be using regular expressions.
Just as the Regular expressions are. And again, if
you are going to allow Regex at all, you would still need to know about the
escapes.
You'd need to know about the escapes where regular expressions are
used. The fewer places they're used, the fewer times someone will need
to look them up in the documentation.
Again - then don't allow them at all.
No, just allow them where they make sense. Note that if you only use
them where they're going to be doing something fairly involved, it's
much less likely that an engineer will forget that he's actually
dealing with a regular expression than with a simple string.
I guess that is where we disagree.
It certainly sounds like it.
But regular expressions are by their very nature more complicated than
a simple String.IndexOf call. If they weren't they wouldn't be as
powerful as they are.
I see no risk in the example we are talking about. At least, no more than
in the IndexOf solution (in this situation).
You don't think there's any risk that someone will forget one of the
regular expression characters which needs escaping? There is no string
you could need to search for which needs *less* escaping in regular
expressions than with String.IndexOf, but there are *lots* of strings
which need more escaping - thus there's more overall risk.
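A small sketch of that risk, using the "something[1]" value mentioned
earlier. The EscapingRisk class and its method names are hypothetical;
Regex.Escape is the framework method for turning text into a literal
pattern, shown here only as one way the regex version could be made safe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingRisk
{
    // Plain text search: brackets, dots and so on have no special meaning.
    public static bool IndexOfFinds(string text, string literal)
    {
        return text.IndexOf(literal, StringComparison.Ordinal) >= 0;
    }

    // The mistake at issue: the search text is used as a pattern as-is,
    // so [1] is read as a character class matching the digit 1.
    public static bool UnescapedRegexFinds(string text, string literal)
    {
        return Regex.IsMatch(text, literal);
    }

    // Escaping first makes the metacharacters literal again.
    public static bool EscapedRegexFinds(string text, string literal)
    {
        return Regex.IsMatch(text, Regex.Escape(literal));
    }
}
```

Searching "x something[1] y" for "something[1]" succeeds with IndexOf
and with the escaped pattern, but the unescaped pattern misses it and
matches "something1" instead.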
I bet if I showed my code to a random sample of a hundred C# developers
and asked them to change it to search for "hello[there]", virtually all
of them would get it right. I also bet that if I showed your code to
them and asked them for the same change, some would fail to escape it
appropriately. Do you disagree?
No. But then the same developers would have a problem with the more
complicated expressions you claim are useful.
Actually, the fact that they were presented with a complicated
expression would immediately make them wary, I suspect. Problems tend
to creep in when something *looks* simpler than it actually is - as is
the case here.
Since I am not sure why you would use the first, I would do the 2nd.
You'd use the first to keep up your knowledge of reflection, of course.
After all, if you don't use it, you lose it, right? That's your
argument for using regular expressions where they're completely
unnecessary and provide no benefit, after all.
But in our case, I would still use either - as I see the Regex version as
easy as the IndexOf.
I think we'll have to agree to disagree. You seem to be unable to grasp
the idea that there are more potential pitfalls and more knowledge
required for the regular expression version than for the IndexOf
version.