Search a String

S

S

Any idea on how I would be able to do a search within C# that does
ranges or words


For example

I want to search for Chicken in the string


string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";
 
A

Author

Any idea on how I would be able to do a search within C# that does
ranges or words

For example

I want to search for Chicken in the string

string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";

Check out the Regex namespace.

If you search something such as "Chickin" in a string which does not
contain this token, such as "This is Great Chicken", what would you
expect to return? I wouldn't expect it to return anything. Maybe you
wanna do something like google's key words suggestion if user entry
has some typos?
 
I

Ignacio Machin ( .NET/ C# MVP )

Any idea on how I would be able to do a search within C# that does
ranges or words

For example

I want to search for Chicken in the string

string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";

Hi,

I do not how to do it in code, I'm pretty sure that no version of the
framework provides such a feature.
In T-SQL you have this feature though take a look for SOUNDEX function
 
I

Ignacio Machin ( .NET/ C# MVP )

Check out the Regex namespace.

If you search something such as "Chickin" in a string which does not
contain this token, such as "This is Great Chicken", what would you
expect to return? I wouldn't expect it to return anything. Maybe you
wanna do something like google's key words suggestion if user entry
has some typos?



I do not know how to use a Regex in this case.can you elaborate?
 
J

Jeff Johnson

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

The answer is that what you're looking for is not at all simple. I'd
recommend you do a Google search to see if you can find sample code for a
spell checker.
 
A

Arto Viitanen

Jeff Johnson kirjoitti:
The answer is that what you're looking for is not at all simple. I'd
recommend you do a Google search to see if you can find sample code for a
spell checker.
Spelling checker does not help. For example the text contains "Cut" and
the user searches for "Cat". Both are spelled right, so the search gives
no result.

But, with addition to already mentioned Soundex, you might like to check
Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance). It tells how
similar two texts are; i.e. how many simple edit commands are needed to
make the second text from the first. For example distance between
Chicken and Chikin is one.

About Soundex: I have not studied it much, but I have a feeling Knuth
built it for english words. How well does it apply to other languages
or names?
-
Arto Viitanen
 
J

Jeff Johnson

Spelling checker does not help. For example the text contains "Cut" and
the user searches for "Cat". Both are spelled right, so the search gives
no result.

I agree, IF you say, "Hey, Mr. Spell Checker algorithm, tell me what's
misspelled in this sentence." However, if you are in control of the code,
I'd think you'd be able to say, "Hey, Mr. Spell Checker algorithm, make a
list of all the probable misspellings of 'chicken' and tell me if this
string contains any of them." See where I'm going?
But, with addition to already mentioned Soundex, you might like to check
Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance). It tells how
similar two texts are; i.e. how many simple edit commands are needed to
make the second text from the first. For example distance between
Chicken and Chikin is one.

Sounds good.
About Soundex: I have not studied it much, but I have a feeling Knuth
built it for english words. How well does it apply to other languages
or names?

I've heard people say that Soundex is an ancient crutch and shouldn't be
relied upon too much.
 
F

Fernando A. Gómez F.

Peter said:
Any idea on how I would be able to do a search within C# that does
ranges or words

There are a variety of techniques for dealing with partial matching as
you describe.

I've actually implemented a very simple approach myself in the past, in
which I do a sort of character-by-character merge between the target
text and the search word, finding the spot in the target text where the
search word fits the best with the least number of excluded characters.
This would work perfectly for the examples you gave, but whether that
would address your more general goals I can't say.
[SNIP]
Pete

That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D
 
F

Fernando A. Gómez F.

Peter said:
[...]
I've actually implemented a very simple approach myself in the past,
in which I do a sort of character-by-character merge between the
target text and the search word, finding the spot in the target text
where the search word fits the best with the least number of excluded
characters. This would work perfectly for the examples you gave, but
whether that would address your more general goals I can't say.
[SNIP]
Pete

That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D

Nope. But I'll look for the code and try to remember to post it here.

After that, someone else can post it to CodeProject if they like.

That would be nice, thank you.
Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did, but
with some actual thought and cleverness applied to it so that it has a
reasonable computational cost. :)

I see. However perhaps it could be improved. It would be interesting, at
least, to see the approach you used, so if you have the time and mood,
I'd be most grateful if you could post such code.

Thanks. Best Regards.
 
F

Fernando A. Gómez F.

Peter said:
[...]
Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute
force worked great. I haven't looked at the Levenshtein article yet,
but I suspect that the algorithm described there is similar to what I
did, but with some actual thought and cleverness applied to it so
that it has a reasonable computational cost. :)

I see. However perhaps it could be improved. It would be interesting,
at least, to see the approach you used, so if you have the time and
mood, I'd be most grateful if you could post such code.

Here you go (see below). Note that in my original problem, I was
comparing two complete strings and producing a metric that indicated how
similar the strings were, rather than searching for one string in
another. But you can easily modify this approach by using the same
basic technique, but only looking at substrings of the string being
searched.

Though, actually...now that I look at this implementation again, I think
you could probably feed it a string to be searched and a string to look
for, and it would work okay. It could even be modified to return the
character index of the starting point of the best match for the string
being looked for. You'd want to limit how far it looks ahead, maybe
simply by delimiting attempted matches, or otherwise take into account
gaps within the matched characters, because otherwise it will consider
itself to have successfully matched the string even if each character it
found was actually just located in a different word. But other than
that, it's probably pretty close to what could work.

The basic idea below is to start out doing a basic string compare, but
when reaching a character that doesn't match, scan forward in each
string to try to match the current character in the other string (for
two scans each mis-match). If a matching character can be found,
continue trying to match characters again, looking for the point at
which a comparison of the remaining characters returns the highest
"score". In this context, the score is the number of characters that
the algorithm could match before running out of characters in one of the
strings.

You'll note that the algorithm basically enumerates all of the possible
ways that the strings could match. There's essentially a binary search
tree that bifurcates each time there's a character mis-match, which
means that the cost of the algorithm has a worst-case cost that is
exponential by the length of the longest search. In practice, it will
hit the worst-case scenario only rarely, but it's at least worth
understanding, especially since you don't have to get very far into the
algorithm for even an average-case scenario to be relatively expensive
as compared to some other algorithm that might be designed more optimally.

Like I said, I was dealing with such short strings, I made no attempt at
all to be clever. It worked very well for my needs at the time, but I
make no promises about anyone else's. :)

Pete

Thanks a lot Pete, it looks interesting.

Regards,
Fernando.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top