Search for multiple things in a string

tshad

Can you do a search for more than one string in another string?

Something like:

someString.IndexOf("something1","something2","something3",0)

or would you have to do something like:

if (someString.IndexOf("something1", 0) >= 0 ||
    someString.IndexOf("something2", 0) >= 0 ||
    someString.IndexOf("something3", 0) >= 0)
{
    // Do something
}

Thanks,

Tom
 
Tom,

Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.

Hope this helps.
 
Nicholas Paldino said:
Tom,

Your best bet would be to use a regular expression. You can use the
classes in the System.Text.RegularExpressions namespace to do this.

This would be preferable to the multiple if tests?

I don't know which is more efficient. Both would have to go back and test
for all the different items.

Thanks,

Tom
 
tshad said:
This would be preferable to the multiple if tests?

I don't know which is more efficient. Both would have to go back and test
for all the different items.

Personally, I'd go for the "if" tests - possibly with a helper method
using a params string array to aid readability - unless the performance
is really a problem, in which case measuring that performance and that
of the regular expressions would be an absolute necessity.

Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.
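A minimal sketch of the helper method mentioned above, assuming a hypothetical name (ContainsAny) and an ordinal, case-sensitive comparison; neither detail comes from the thread. It just loops over the same IndexOf calls:

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper (the name is an assumption, not from the thread):
    // returns true if 'source' contains at least one of 'candidates'.
    public static bool ContainsAny(string source, params string[] candidates)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (candidates == null) throw new ArgumentNullException("candidates");

        foreach (string candidate in candidates)
        {
            // Ordinal comparison is an assumption made here: a plain
            // character-by-character match with no culture rules and
            // no pattern syntax that would need escaping.
            if (source.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

With something like that in place, the original test could shrink to a single call such as StringSearch.ContainsAny(someString, "something1", "something2", "something3").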
 
Jon said:
Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.

But they really aren't in this case:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
...

or even, in this special case:

if (Regex.IsMatch(myString, @"something[123]"))
...

I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:

string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";

if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
...

So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.

But you're right about the performance question for simple cases like
this, of course.


Oliver Sturm
 
Oliver Sturm said:
But you're right about the performance question for simple cases like
this, of course.

But it is nice to know the options.

BTW, what is the "@" for?

Thanks,

Tom
 
Oliver Sturm said:
Jon said:
Regular expressions are really powerful, but can be much harder to read
than a series of very simple string operations.

But they really aren't in this case:

if (Regex.IsMatch(myString, @"something1|something2|something3"))
...

or even, in this special case:

if (Regex.IsMatch(myString, @"something[123]"))
...

Until, of course, something1 etc start having characters in them which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.

Oliver Sturm said:
I tend to think that regular expressions get hard to read when they are
used to do complicated stuff - and then the alternative is usually not a
"very simple string operation". Part of the reason, though, is that people
don't know it's possible to stretch regular expressions over multiple
lines and even use comments in them. I could rewrite the code above like
this:

string myRegex = @"
something1 | # something1 is our first option
something2 | # something2 would also be fine
something3 # last chance, something3";

if (Regex.IsMatch(myString, myRegex, RegexOptions.IgnorePatternWhitespace))
...

So it's really easy to pick apart the expression and comment the parts - I
don't think it's less readable than any other part of code. You have to
know the language of course, but that's the same for any other programming
language or construct out there.

Well, I don't have to learn (or more importantly, remember) *any* extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.
 
Jon said:
Until, of course, something1 etc start having characters in them which need
escaping - how confident would you be that you'd get that right? It's
an extra thing to think about - and I'm sure the real strings aren't
actually "something1" etc.

Aren't you exaggerating a bit here? There are regex testers out there to
help you with building regular expressions and the Regex class itself
knows how to escape special chars - it's not that big a deal.

Jon said:
Well, I don't have to learn (or more importantly, remember) any extra
bits of language other than C# (which I already need to know) to get it
right with IndexOf, even if the strings I'm looking for contain things
like dots, stars etc. That isn't true for regular expressions.

No, it isn't. But you won't get far in today's programming world if you
don't know the first thing about SQL or XML, for example, so I guess
you're not suggesting that one language is enough? I believe that Regular
Expressions are a powerful technology well worth learning - and it's
probably good advice to stay clear of them for anything but the simplest
applications if you're not willing to put in a bit of time to get to know
them.

About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.


Oliver Sturm
 
Oliver Sturm said:
Aren't you exaggerating a bit here? There are regex testers out there to
help you with building regular expressions and the Regex class itself
knows how to escape special chars - it's not that big a deal.

No, but it's still harder to remember than not having to remember
anything special at all, which is what you get with IndexOf.

In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.

Now, where's the *advantage* of using regular expressions in this case?

Oliver Sturm said:
No, it isn't. But you won't get far in today's programming world if you
don't know the first thing about SQL or XML, for example, so I guess
you're not suggesting that one language is enough?

No - but I'm suggesting that when one language works perfectly well for
the task at hand, and it's the same language that the rest of your code
is written in, it's easier to stick within that language.

Oliver Sturm said:
I believe that Regular Expressions are a powerful technology well
worth learning - and it's probably good advice to stay clear of them
for anything but the simplest applications if you're not willing to
put in a bit of time to get to know them.

Regular expressions are absolutely worth learning for where they
provide extra value. In cases like this, where they're only really
providing extra things to remember (what you need to escape, or to call
Regex's own escaping mechanism) I don't think there's any value.

Oliver Sturm said:
About IndexOf, as I meant to say already, as long as the problems you're
trying to solve are the kind that can be solved with those simple string
functions (and without resulting in huge algorithms), you'll probably have
the performance argument on your side anyway.

Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.
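For what it's worth, the Regex class's own escaping mechanism referred to above is Regex.Escape. A rough sketch, assuming hypothetical helper names (LiteralRegex, BuildAlternation) that do not appear in the thread, of how a list of plain literals could be turned into a safe alternation pattern:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralRegex
{
    // Hypothetical helper: builds a pattern like "a|b|c" from literal
    // strings, escaping each one so that regex metacharacters such as
    // '.', '*' and '+' are matched literally.
    public static string BuildAlternation(params string[] literals)
    {
        string[] escaped = new string[literals.Length];
        for (int i = 0; i < literals.Length; i++)
        {
            escaped[i] = Regex.Escape(literals[i]);
        }
        return string.Join("|", escaped);
    }

    // Returns true if 'input' contains any of the literals.
    public static bool ContainsAny(string input, params string[] literals)
    {
        // An empty pattern would match everything, so treat
        // "no literals" as "no match" instead.
        if (literals.Length == 0) return false;
        return Regex.IsMatch(input, BuildAlternation(literals));
    }
}
```

This keeps the regex route safe for arbitrary literals, at the price of exactly the extra step being debated here: someone has to remember to call it.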
 
Jon said:
In a hurry, I can very easily see someone changing a string literal
from one thing to another, not noticing that as it's a regular
expression, they need to escape part of their new string.

In a hurry, all kinds of things can happen when making changes to source
code.

Jon said:
Now, where's the advantage of using regular expressions in this case?

I wasn't saying there was one in the specific scenario the OP introduced.
I was using the example to show that regular expressions don't have to be
any more complicated than simple string operations.

Jon said:
Well, I'm much keener on the readability argument than the performance
one - I suspect that the performance difference would rarely be of
overall significance.

As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer. I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.


Oliver Sturm
 
Oliver Sturm said:
In a hurry, all kinds of things can happen when making changes to source
code.

Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" *shouldn't* be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.

Oliver Sturm said:
I wasn't saying there was one in the specific scenario the OP introduced.
I was using the example to show that regular expressions don't have to be
any more complicated than simple string operations.

But there's *always* the added complexity of "do I have to escape this
or not". There are certainly times when the string operations become
more complicated than the corresponding regular expressions (otherwise
they really would be pointless - something I've never suggested), but I
don't believe that's the case here.

Oliver Sturm said:
As I'm trying to say all the time, as soon as an implementation reaches a
complexity that makes it worth thinking about regular expressions, I'm
sure an alternative solution based on simple string functions won't be
more readable any longer.

Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)

Oliver Sturm said:
I'd even go so far as to say that as soon as
more than one call to a simple string function is needed for a given
problem, most probably I'll find the regular expression solution more
readable. This is, after all, a subjective decision to make.

Whereas three calls to IndexOf is *definitely* more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.
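To make the "somewhere.com" concern concrete, here is a small illustrative sketch (the class and method names are made up for this example). In a regex an unescaped dot is a wildcard, so the pattern accepts input that the IndexOf version rejects:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // Unescaped: '.' matches any single character (except a newline),
    // so this also accepts "somewherexcom".
    public static bool UnescapedMatch(string input)
    {
        return Regex.IsMatch(input, "somewhere.com");
    }

    // Escaped by hand: '\.' matches only a literal dot.
    public static bool EscapedMatch(string input)
    {
        return Regex.IsMatch(input, @"somewhere\.com");
    }

    // Plain substring search: nothing to escape.
    public static bool IndexOfMatch(string input)
    {
        return input.IndexOf("somewhere.com", StringComparison.Ordinal) >= 0;
    }
}
```

For the input "somewherexcom", UnescapedMatch returns true, while EscapedMatch and IndexOfMatch both return false. All three return true for "somewhere.com", which is why a test using only the expected input would not catch the missing escape.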
 
Jon said:
Indeed - but why make it even easier to introduce bugs? Changing a
search from "somewhere" to "somewhere.com" shouldn't be something
which requires significant thought, in my view - but it does as soon as
you're using regular expressions.

But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.

Jon said:
Well, Nicholas certainly thought it worth thinking about regular
expressions in this case - do you? (The earlier part of your reply
suggests not, but the bit below suggests you do.)

Jon said:
Whereas three calls to IndexOf is definitely more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.

In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions. I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.


Oliver Sturm
 
Oliver Sturm said:
But in any proper real-world use case of regular expressions, there won't
be an expression saying "somewhere" to start with. If the pattern string
doesn't show any trace of wildcards or other recognizable regular
expression features, it should be safe to assume that regular expressions
aren't being used. If a string in some source code I don't know shows
signs of being a match pattern and there's nothing else that tells me
whether it's a regular expression or not, I'll have to look and find it
out, there's no way around that. To be safe in assuming that no string
could ever be a regular expression, regardless of whether it looks like
it, you would have to forbid them completely in your team at least.

No - you just have to be careful when you're using regular expressions.
I prefer code which means I don't have to take as much care, because
being human, sooner or later I'll be careless. The fewer possibilities
I have for carelessness actually causing an error, the better.

I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you *and* every member of
your team?

Oliver Sturm said:
In this case, as far as it's described by the sample we've seen, I
wouldn't favor the usage of regular expressions.

Even though it's more than one call to a simple string function?

Oliver Sturm said:
I don't know whether the
actual code that the OP is writing might justify regexes better. Anyway, I
was merely using the case to demonstrate the fact that regular expressions
don't have a readability problem, IMHO, or at least they don't need to
have one if used properly.

They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.

I'm not saying they're hideously unreadable - just *less* readable.
That's enough for me.
 
Jon said:
I know I couldn't off the top of my head list all the characters which
need escaping for regular expressions - could you and every member of
your team?

I think I might, they are not really as many as you think. But that's not
the point; I use a testing tool when I create a larger expression and I
most probably use it again when I make changes. I have comments on my
regular expressions telling me what they do, what sample input and output
is. The first thing that's important is just that someone has to recognize
a regular expression when he encounters it, you're right about that.

Jon said:
Even though it's more than one call to a simple string function?

Probably... the number of calls is not really what counts, is it?
Sometimes, string parsing algorithms that don't make use of regular
expressions involve several nested loops, several temporary variables and
just a single call to a simple string function. Yet these beasts can be
horrible because it takes only a short while until even the author can't
reliably remember what the algorithm does.

I won't contest the fact that three lines of code, calling IndexOf three
times, are probably a better alternative to a regular expression.

Jon said:
They have a readability problem compared with simple operations - they
require more care than simple literals. To me, "more care required"
means "lower readability and maintainability", which is a problem.

Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation. Use the
simple operations as long as it makes sense, but don't hesitate to look at
other solutions because you think someone else on the team might make a
mistake changing a string literal later on.

Jon said:
I'm not saying they're hideously unreadable - just less readable.
That's enough for me.

Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.


Oliver Sturm
 
Oliver Sturm said:
I think I might, they are not really as many as you think. But that's not
the point; I use a testing tool when I create a larger expression and I
most probably use it again when I make changes. I have comments on my
regular expressions telling me what they do, what sample input and output
is. The first thing that's important is just that someone has to recognize
a regular expression when he encounters it, you're right about that.

Absolutely - especially when your tests may well not catch the problem.
For instance, if you have a search for "jon.skeet", are you going to
write a test to make sure that "jonxskeet" doesn't match? Unless you
actually know what to avoid (in which case you're likely to have
written it correctly in the first place) the test may well not pick up
on a missed character which needs escaping.

Oliver Sturm said:
Probably... the number of calls is not really what counts, is it?

I was only going by what you'd said previously:

<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>

Oliver Sturm said:
Sometimes, string parsing algorithms that don't make use of regular
expressions involve several nested loops, several temporary variables and
just a single call to a simple string function. Yet these beasts can be
horrible because it takes only a short while until even the author can't
reliably remember what the algorithm does.

Absolutely.

Oliver Sturm said:
I won't contest the fact that three lines of code, calling IndexOf three
times, are probably a better alternative to a regular expression.

Goodo :)

Oliver Sturm said:
Well, let's agree to disagree. I'm still trying to make the point that the
comparison with simple string literals is a bad one, because the two won't
ever be equal alternatives in any real world problem situation.

I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.

Oliver Sturm said:
Use the simple operations as long as it makes sense, but don't
hesitate to look at other solutions because you think someone else on
the team might make a mistake changing a string literal later on.

If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.

Oliver Sturm said:
Jon, I'm with you most of the way. But there's a limit to the demand for
readability, as I see it. I'm not likely to turn down a useful technology
in cases where it is practically without alternatives because the solution
doesn't please me aesthetically.

Me either - but where there *is* a practical alternative which is more
readable, I'll go for that. If you only have one solution, you *can't*
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)
 
Jon said:
I was only going by what you'd said previously:

<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>

I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".

Jon said:
I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.

Sorry, I meant "simple string operations". And I meant that I wouldn't
consider using a regular expression if an IndexOf could do the job just as
well - the two are not equal alternatives because I wouldn't seriously
consider one of them.

Jon said:
If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.

You're right about that. Complex technologies tend to be misused more
often than simple ones, don't they?

Jon said:
Me either - but where there is a practical alternative which is more
readable, I'll go for that. If you only have one solution, you can't
turn it down really, can you? (Unless you can forego the feature which
requires it, of course, which is unlikely.)

Well, usually someone will come forward with other solutions, however
far-fetched. One that can actually be quite a good alternative to more
complex regular expression scenarios is writing a parser - or rather,
using a compiler compiler to create one. But in my experience there's a
lot of room for nicely written regular expressions, somewhere between a
few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)


Oliver Sturm
 
Oliver Sturm said:
I know I said that and I know you were referring to it. But I meant one
call as in "one call at runtime", as opposed to "one line of code that
makes the call".

Not quite with you there - in this case, there would be three calls at
runtime, and three lines of code.

Oliver Sturm said:
Sorry, I meant "simple string operations". And I meant that I wouldn't
consider using a regular expression if an IndexOf could do the job just as
well - the two are not equal alternatives because I wouldn't seriously
consider one of them.

Right - but unfortunately (IMO) other people do.

Oliver Sturm said:
You're right about that. Complex technologies tend to be misused more
often than simple ones, don't they?

Absolutely...

Oliver Sturm said:
Well, usually someone will come forward with other solutions, however
far-fetched. One that can actually be quite a good alternative to more
complex regular expression scenarios is writing a parser - or rather,
using a compiler compiler to create one. But in my experience there's a
lot of room for nicely written regular expressions, somewhere between a
few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)

Oh certainly. I'm really *not* trying to suggest that regular
expressions should never be used - just that they shouldn't be the
first port of call as soon as you need to do anything with a string :)
 
Jon said:
Not quite with you there - in this case, there would be three calls at
runtime, and three lines of code.

And in this case I would be prepared to see things differently - I said
already that I don't believe in call counting. But the sentence you quoted
was meant more in the context of the problem I was describing, where
simple string functions are used as a part of a, possibly hugely
complicated, larger algorithm.

As soon as there are loops involved, which may or may not result in a
single line with such a call being executed multiple times, things start
getting complex very quickly in my experience. How often have you been
sitting there with the debugger running, counting characters in a string
to find that off-by-one problem somebody introduced? I'll take an enormously
unreadable regular expression over that task any day :-)



Oliver Sturm
 
Jon Skeet said:
Whereas three calls to IndexOf is *definitely* more readable than a
regular expression which, depending on the strings involved may well
need to involve escaping.

Escaping?

You've mentioned that as being a problem a couple of times.

What do you mean by this?

Are you talking about stopping if you find the first one matching?

Thanks,

Tom
 
