C# String Comparison, IndexOf and Related

  • Thread starter: BILL

BILL

Hi Everyone,

I've been looking through these .NET groups and can't find the exact
answer I want, so I'm asking.

Can someone tell me what you feel is the best way to search a C# string
for an occurrence of a CASE INSENSITIVE substring, returning the position
where it was found? The strings to search are fairly large, around
50K-500K. Here's what I have so far:

* ToUpper/ToLower followed by IndexOf would be quite slow, right? Strings
are immutable, so each case conversion creates a full copy, and these
strings are large to begin with.

* Regex could be the answer, but I'm not sure pattern matching is the
right tool for this problem.

* Unsafe code (Boyer-Moore using pointers, or inline assembly if that's
possible) would seem best, but, well, it's unsafe code.

* I've found a MapTable example here in the C# newsgroup (thanks, MapTable
person), and I think it might be the best solution.
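For reference, the first two options in the list can be compared side by side. This is a minimal sketch written as a top-level C# program, assuming a current .NET SDK (the StringComparison overloads it uses need .NET 2.0 or later). The helper names IndexOfViaLower, IndexOfViaRegex and IndexOfIgnoreCase are hypothetical, and the sample text is made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Option 1: lower-case both strings, then search ordinally. ToLowerInvariant
// allocates a full copy of the text, but .NET casing maps char-for-char, so
// the copy has the same length and the index applies to the original string.
static int IndexOfViaLower(string text, string value) =>
    text.ToLowerInvariant().IndexOf(value.ToLowerInvariant(), StringComparison.Ordinal);

// Option 2: Regex, with the search term escaped so it is matched literally.
static int IndexOfViaRegex(string text, string value)
{
    Match m = Regex.Match(text, Regex.Escape(value),
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    return m.Success ? m.Index : -1;
}

// For comparison: a built-in overload that ignores case without
// creating lower-cased copies of either string.
static int IndexOfIgnoreCase(string text, string value) =>
    text.IndexOf(value, StringComparison.OrdinalIgnoreCase);

string haystack = "The quick brown FOX jumps";
Console.WriteLine(IndexOfViaLower(haystack, "fox"));   // 16
Console.WriteLine(IndexOfViaRegex(haystack, "fox"));   // 16
Console.WriteLine(IndexOfIgnoreCase(haystack, "fox")); // 16
```

The lower-casing version pays for a full copy of the text on every call, while the OrdinalIgnoreCase overload compares in place. The Regex version needs Regex.Escape so that characters such as + or . in the search term are not treated as pattern syntax.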

Any help is appreciated, thanks in advance!
BILL
 
Bill,
I did some tests. I created a 5 MB file, opened it with a StreamReader,
and read all of its text into a string. Calling ToLower and then IndexOf
returned the index of the substring immediately. I also tried the
globalization classes, which let you call IndexOf with an IgnoreCase
option; that also returned the index immediately. I don't have any timing
numbers, but when I stepped through in the debugger, the line doing the
comparison executed with no noticeable pause.
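Stepping over a line in the debugger gives no timing numbers; a Stopwatch measurement would. This is a minimal sketch as a top-level C# program, assuming a current .NET SDK. It builds about 5 million characters in memory as a stand-in for the 5 MB file (which is not available here), and the helper TimeMs and the search text are hypothetical. The times it prints depend entirely on the machine:

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text;

// Build about 5 million chars of filler with the target at the very end.
var sb = new StringBuilder();
while (sb.Length < 5_000_000)
    sb.Append("lorem ipsum dolor sit amet ");
sb.Append("NeedleText");
string text = sb.ToString();

CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;

// Runs the search once untimed (so JIT compilation is excluded from the
// measurement), then times a second run and reports its result.
static long TimeMs(Func<int> search, out int index)
{
    search();
    var sw = Stopwatch.StartNew();
    index = search();
    sw.Stop();
    return sw.ElapsedMilliseconds;
}

// ToLowerInvariant avoids culture-specific surprises such as the Turkish I.
long lowerMs = TimeMs(
    () => text.ToLowerInvariant().IndexOf("needletext", StringComparison.Ordinal),
    out int lowerIndex);
long compareMs = TimeMs(
    () => ci.IndexOf(text, "needletext", CompareOptions.IgnoreCase),
    out int compareIndex);

Console.WriteLine($"ToLowerInvariant + IndexOf: index {lowerIndex}, {lowerMs} ms");
Console.WriteLine($"CompareInfo.IndexOf:        index {compareIndex}, {compareMs} ms");
```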

Here is the globalization code, using a very simple string:

using System.Globalization;

CultureInfo culture = new CultureInfo("en-US");

// Case-insensitive, culture-aware search; returns 10, the position of "TEST"
int index = culture.CompareInfo.IndexOf("this is a TEST", "test",
    CompareOptions.IgnoreCase);

HTH
 
Thanks, Lateralus. Although I was a bit skeptical of your results, after
running similar tests I've changed my thinking on the matter. I ran some
IndexOf/ToUpper and related code on a few older boxes I have here (e.g.,
a 500 MHz AMD with 512 MB) and didn't see any real performance
degradation either.

So, here's my question to everyone: if I'm not doing heavy-duty work
with these strings, am I best off using the built-in .NET methods? The
original question may simply come from my being trained as an
anal C++ guy; if so... sorry all :)


 
Bill,
I understand where you're coming from. Whenever our applications needed
heavy string manipulation on large amounts of data, we always wrote the
DLL in C++. There is nothing scientific about my next statement, because I
never ran any "true" tests. We had a C++ DLL that manipulated strings up
to 10 MB in size. When it was rewritten in C#, we didn't notice any
degradation in speed, apart from the first call, which is slower because
the code is JIT-compiled the first time it runs. So on the systems I've
worked on, I've found there is no longer the need to turn to C++ that
there was in the past. Of course there will be times when you need to,
but for this problem I think you're OK with C#.
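The same point applies to the Boyer-Moore idea from the original question: the algorithm does not need pointers or unsafe code. This is a minimal sketch of Boyer-Moore-Horspool (a simplified Boyer-Moore that uses only the bad-character shift), written as a top-level C# program and made case-insensitive by upper-casing each char with char.ToUpperInvariant. The function name HorspoolIndexOfIgnoreCase is hypothetical, and per-char upper-casing is a simplification that ignores culture-specific casing rules:

```csharp
using System;
using System.Collections.Generic;

static int HorspoolIndexOfIgnoreCase(string text, string pattern)
{
    int n = text.Length, m = pattern.Length;
    if (m == 0) return 0;   // matches string.IndexOf("") behavior
    if (m > n) return -1;

    // Upper-case the pattern once, char by char.
    var p = new char[m];
    for (int i = 0; i < m; i++) p[i] = char.ToUpperInvariant(pattern[i]);

    // Bad-character shifts: for each char in the pattern except the last,
    // the distance from its last occurrence to the end of the pattern.
    var shift = new Dictionary<char, int>();
    for (int i = 0; i < m - 1; i++) shift[p[i]] = m - 1 - i;

    int pos = 0;
    while (pos <= n - m)
    {
        // Compare the current window right to left.
        int j = m - 1;
        while (j >= 0 && char.ToUpperInvariant(text[pos + j]) == p[j]) j--;
        if (j < 0) return pos;

        // Slide by the shift for the text char under the pattern's last
        // position; chars absent from the pattern allow a full-length skip.
        char last = char.ToUpperInvariant(text[pos + m - 1]);
        pos += shift.TryGetValue(last, out int s) ? s : m;
    }
    return -1;
}

Console.WriteLine(HorspoolIndexOfIgnoreCase("The quick brown FOX jumps", "fox")); // 16
```

A Dictionary keeps the shift table small for Unicode text; a 65536-entry int array indexed by char would avoid hashing at the cost of 256 KB per search. Whether either version beats the built-in IndexOf overloads is something to measure rather than assume.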

--
Lateralus [MCAD]


 
Lateralus, thanks! It's hard to leave my C++/MASM behind, but you're
->absolutely<- right: from now on I'll attack these problems when they
actually come up. Any
different opinions on this thread are always welcome, but I think I've
found my answer...
BILL


 

In addition to the previous comments, you may wish to consider using
CompareInfo.IndexOf(source, value, CompareOptions.IgnoreCase).

You can get a CompareInfo reference from a CultureInfo: you could use
the current culture (CultureInfo.CurrentCulture) or the invariant one
(CultureInfo.InvariantCulture).
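The two culture choices can give different answers, which is easiest to see side by side. This is a minimal sketch as a top-level C# program with made-up sample strings; it also shows the ordinal ignore-case overload of String.IndexOf that is available from .NET 2.0 onward:

```csharp
using System;
using System.Globalization;

string source = "Searching for a CASE INSENSITIVE substring";
string value = "case insensitive";

// Culture-aware search with the current culture. The result can vary by
// machine: under Turkish rules, for example, 'I' and 'i' are not a case pair.
int viaCurrent = CultureInfo.CurrentCulture.CompareInfo
    .IndexOf(source, value, CompareOptions.IgnoreCase);

// Culture-aware search with invariant rules: the same answer on every machine.
int viaInvariant = CultureInfo.InvariantCulture.CompareInfo
    .IndexOf(source, value, CompareOptions.IgnoreCase);

// Ordinal (char-by-char) ignore-case search, no linguistic rules involved.
int viaOrdinal = source.IndexOf(value, StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"current: {viaCurrent}, invariant: {viaInvariant}, ordinal: {viaOrdinal}");
```

Invariant or ordinal comparison is generally the predictable choice for machine-oriented text, while the current culture suits text a user typed and expects to be matched by local rules.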
 