string compare.


benny

Can anybody explain why
String.Compare("4","¢w3")=1;
but
String.Compare("3","¢w3")=-1;

The '¢w' is the character 0x2015 (U+2015).
 
Hi,
what Richard wrote is correct. I also got 1 and 1. Check your regional
settings if you are not using the neutral culture.
 
benny said:
Can anybody explain why
String.Compare("4","¢w3")=1;
but
String.Compare("3","¢w3")=-1;

The '¢w' is the character 0x2015 (U+2015).

Just an FYI for anyone trying to reproduce this - I do not get the
results benny sees if I cut-n-paste the above - I have to create the
string with the unusual character in question like so:

String.Compare( "4", "\u2015" + "3");
String.Compare( "3", "\u2015" + "3");

In this case, I get the results benny sees. It also does not seem to
depend on the current culture (at least none of the several I tried,
including InvariantCulture).
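
For anyone who wants to repeat the test, a minimal self-contained sketch
might look like the following. The class name DashCompareRepro, the helper
SignOf, and the list of cultures are arbitrary choices for illustration,
not anything from benny's post. Only the sign of the result is printed,
since String.Compare only promises a negative, zero, or positive value.
Expected output is deliberately not shown: culture-sensitive results can
differ between .NET versions and operating systems.

```csharp
using System;
using System.Globalization;

public static class DashCompareRepro
{
    // Returns the sign (-1, 0 or 1) of a case-sensitive, culture-sensitive comparison.
    public static int SignOf(string left, string right, CultureInfo culture)
    {
        return Math.Sign(string.Compare(left, right, false, culture));
    }

    public static void Main()
    {
        // U+2015 written as an escape so the source file encoding cannot mangle it.
        string dashThree = "\u2015" + "3";

        // Hypothetical selection of cultures; substitute the ones you care about.
        CultureInfo[] cultures =
        {
            CultureInfo.InvariantCulture,
            CultureInfo.CurrentCulture,
            new CultureInfo("en-US"),
            new CultureInfo("zh-TW"),
        };

        foreach (CultureInfo culture in cultures)
        {
            Console.WriteLine("{0,-12} \"4\" vs dash+\"3\": {1,2}   \"3\" vs dash+\"3\": {2,2}",
                culture.Name.Length == 0 ? "(invariant)" : culture.Name,
                SignOf("4", dashThree, culture),
                SignOf("3", dashThree, culture));
        }
    }
}
```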

As it happens, Char.GetUnicodeCategory( '\u2015') returns
UnicodeCategory.DashPunctuation.

So I tried:

String.Compare( "4", "-3");
String.Compare( "3", "-3");

And I came up with 1 and -1 again.

Unfortunately, I don't know why, but I do think it's an interesting
result. I suspect it has to do with the DashPunctuation character not
being enough to decide a comparison on its own, so it moves on to the
next char:

- in the first case, '4' is clearly greater than '3', so return 1
in that case

- in the second case '3' is equal to '3', but since the 2nd string
had the DashPunctuation character, that's enough to bump the 2nd string
as being greater than the first string.

But I'm just guessing here. Also, the few other punctuation characters
I tried do not behave the same way the dash does in the compare.
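
One way to probe that guess is to run the same pair of comparisons with a
handful of different characters in front of the "3", and print each
character's Unicode category next to the results. The sketch below does
that under the invariant culture. The class name PrefixProbe, the helper
SignAgainstPrefixed, and the sample characters are made up for
illustration. No expected output is listed, because the culture-sensitive
columns may vary with the runtime's collation data.

```csharp
using System;
using System.Globalization;

public static class PrefixProbe
{
    // Sign of the invariant-culture comparison of value against prefix + "3".
    public static int SignAgainstPrefixed(string value, char prefix)
    {
        return Math.Sign(string.Compare(value, prefix + "3", StringComparison.InvariantCulture));
    }

    public static void Main()
    {
        // Arbitrary sample: two dashes, an apostrophe, and some other punctuation.
        char[] prefixes = { '-', '\u2015', '\'', '.', ',', '!', '_' };

        foreach (char prefix in prefixes)
        {
            Console.WriteLine("U+{0:X4} {1,-22} 4 vs p3: {2,2}   3 vs p3: {3,2}",
                (int)prefix,
                Char.GetUnicodeCategory(prefix),
                SignAgainstPrefixed("4", prefix),
                SignAgainstPrefixed("3", prefix));
        }
    }
}
```

If the dash really is special, its rows should stand out from the rows for
the other punctuation characters.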
 
As it happens, Char.GetUnicodeCategory( '\u2015') returns
UnicodeCategory.DashPunctuation.

So I tried:

String.Compare( "4", "-3");
String.Compare( "3", "-3");

And I came up with 1 and -1 again.

I suspect this is the same reason as another recent string comparison
question. Basically the dash is being treated as a word joining
character which is compared differently so that if you're looking at a
word list, "co-ordinate" and "coordinate" come together. The test above
is effectively making a list look like:

3
-3
4

which looks funny until you think of:

coordinate
co-ordinate
cop

when it looks absolutely fine.
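
If the word-sort explanation is right, asking for a string sort or an
ordinal comparison instead should change the picture. A rough sketch for
putting the three side by side, assuming the invariant culture; the class
and method names are invented for illustration. CompareOptions.StringSort
is documented as making the hyphen and apostrophe sort before alphanumeric
characters, and String.CompareOrdinal compares the raw UTF-16 code units.

```csharp
using System;
using System.Globalization;

public static class SortModes
{
    static readonly CompareInfo Invariant = CultureInfo.InvariantCulture.CompareInfo;

    // Default culture-sensitive comparison (the "word sort" discussed above).
    public static int WordSort(string a, string b)
    {
        return Math.Sign(Invariant.Compare(a, b, CompareOptions.None));
    }

    // String sort: hyphen and apostrophe are documented to sort before alphanumerics.
    public static int StringSort(string a, string b)
    {
        return Math.Sign(Invariant.Compare(a, b, CompareOptions.StringSort));
    }

    // Ordinal: compares UTF-16 code units, with no linguistic rules at all.
    public static int Ordinal(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        string[][] pairs =
        {
            new[] { "4", "-3" },
            new[] { "3", "-3" },
            new[] { "coordinate", "co-ordinate" },
            new[] { "co-ordinate", "cop" },
        };

        foreach (string[] p in pairs)
        {
            Console.WriteLine("{0,-12} vs {1,-12} word: {2,2}  string: {3,2}  ordinal: {4,2}",
                p[0], p[1], WordSort(p[0], p[1]), StringSort(p[0], p[1]), Ordinal(p[0], p[1]));
        }
    }
}
```

The ordinal column is the only one that is fixed by definition: '-' is
U+002D, which is below both the digits and the lowercase letters, so "-3"
comes before "3" and "4", and "co-ordinate" comes before "coordinate" and
"cop". The word and string columns are worth checking on your own runtime
rather than taking on trust.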
 
Jon said:
I suspect this is the same reason as another recent string comparison
question. Basically the dash is being treated as a word joining
character which is compared differently so that if you're looking at a
word list, "co-ordinate" and "coordinate" come together. The test above
is effectively making a list look like:

3
-3
4

which looks funny until you think of:

coordinate
co-ordinate
cop

when it looks absolutely fine.

Excellent. I like it when a nice, rational explanation can be made for
these curiosities.
 