Error in string comparison (Non-English windows)

  • Thread starter: Usman Jamil

Hi

I'm getting a strange error while comparing two strings. Please check the
code below. It is a simple string comparison and works fine on all of my
machines. While debugging an issue on a client's machine, which has Turkish
Windows installed, I found that this simple piece of code doesn't work. The
message boxes are displayed in this sequence:

1. to upper works with szWINDOWS
2. to lower doesn't work with szWINDOWS
3. to upper doesn't work with szwindows
4. to lower works with szwindows

It seems as if ToUpper and ToLower aren't working at all and the .Equals()
method is being passed the original values of the variables szWINDOWS and
szwindows. Does this problem have anything to do with the Turkish Windows
installed on the client's machine, or is it a known issue?

string szWINDOWS = "WINDOWS";
string szwindows = "windows";

if (szWINDOWS.ToUpper().Equals("WINDOWS"))
    System.Windows.Forms.MessageBox.Show("to upper works with szWINDOWS");
else
    System.Windows.Forms.MessageBox.Show("to upper doesn't work with szWINDOWS");

if (szWINDOWS.ToLower().Equals("windows"))
    System.Windows.Forms.MessageBox.Show("to lower works with szWINDOWS");
else
    System.Windows.Forms.MessageBox.Show("to lower doesn't work with szWINDOWS");

if (szwindows.ToUpper().Equals("WINDOWS"))
    System.Windows.Forms.MessageBox.Show("to upper works with szwindows");
else
    System.Windows.Forms.MessageBox.Show("to upper doesn't work with szwindows");

if (szwindows.ToLower().Equals("windows"))
    System.Windows.Forms.MessageBox.Show("to lower works with szwindows");
else
    System.Windows.Forms.MessageBox.Show("to lower doesn't work with szwindows");

Regards

Usman
 
Thanks, Marc.

That has been a great help. I've been debugging my whole project for the
last 48 hours and couldn't work out why the application was misbehaving.
I'll certainly look into the alternatives.

Regards
Usman
 
Hmmm... just looking at Jon's sample again, and I'm damned if I can
get it to successfully equate. The following all also report false /
non-zero:

Console.WriteLine("mail".ToUpper() == "MAIL");
Console.WriteLine("mail".ToUpper() == "MAIL".ToUpper());
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Equals("mail", "MAIL"));
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Compare("mail", "MAIL"));
Console.WriteLine(string.Equals("mail", "MAIL",
    StringComparison.CurrentCultureIgnoreCase));
Console.WriteLine("mail".Equals("MAIL",
    StringComparison.CurrentCultureIgnoreCase));

Of course, this is Jon's test case, not yours, so your specific
culture and phrase may be more forgiving. But I don't think I know
enough about internationalization to give the complete answer... I'll
add it to my list of things to brush up on...

So: does anybody know how you *should* realistically compare such strings?

Marc
 
Hi

This problem made me wonder whether I need to check my C++ code for it as
well, or whether it is specific to .NET. In C++ I've used stricmp() in most
places for case-insensitive comparison, but in a few places I've used my
custom ToUpperCase and ToLowerCase functions. I'm pasting one of them below
in case you have any thoughts on it. I'm just curious whether I may have a
problem here too; ignore it if it's not relevant.

Thanks and Regards

Usman

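#include <string>
using std::string;

// Lowercases ASCII 'A'..'Z' only; every other char is left unchanged.
string ToLowerCase(string szSourceString)
{
    for (string::size_type nIndex = 0; nIndex < szSourceString.length(); nIndex++)
    {
        char cSingleChar = szSourceString[nIndex];
        if (cSingleChar >= 'A' && cSingleChar <= 'Z')
        {
            // 'a' - 'A' is 32 in ASCII
            szSourceString[nIndex] = static_cast<char>(cSingleChar + ('a' - 'A'));
        }
    }
    return szSourceString;
}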
 
In Turkish there are two I's - with and without the dot above. The lower
case of I is ? (Dotless i), and the uppercase of i is ?.

JR


 
I'll try again, hoping to get it through with UTF-8 and HTML:

In Turkish there are two I's - with and without the dot above. The lower
case of I is ı (Dotless i, U+0131), and the uppercase of i is İ (U+0130).

JR

 
Usman Jamil said:
string ToLowerCase(string szSourceString)
{
    for(int nIndex = 0; nIndex < szSourceString.length(); nIndex++)
    {
        char cSingleChar = szSourceString[nIndex];
        if( cSingleChar >= 'A' && cSingleChar <= 'Z')
        {
            szSourceString[nIndex] = cSingleChar + 32;
        }
    }
    return szSourceString;
}

Wrong for almost everything beyond plain ASCII.
Meaning it will be wrong for pretty much every language,
including English (think résumé).
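To make the point above concrete, here is a minimal sketch. It assumes the text is stored as UTF-8 in a std::string (the thread does not say which encoding is used), and the constant names are invented for the illustration. The loop is the same ASCII-only logic as the ToLowerCase function quoted above.

```cpp
#include <string>

// Same ASCII-only logic as the ToLowerCase function quoted above.
std::string ToLowerCase(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c + ('a' - 'A'));
        }
    }
    return s;
}

// Hypothetical demo data. Assumption: UTF-8, where É is the bytes C3 89
// and é is the bytes C3 A9.
const std::string kUpperResume = "R\xC3\x89SUM\xC3\x89";  // RÉSUMÉ
const std::string kActual      = "r\xC3\x89sum\xC3\x89";  // rÉsumÉ (what the loop returns)
const std::string kWanted      = "r\xC3\xA9sum\xC3\xA9";  // résumé (a correct lowercase)
```

Only the bytes in 'A'..'Z' change, so ToLowerCase(kUpperResume) compares equal to kActual and not to kWanted. Pure ASCII input such as "WINDOWS" is still handled as expected.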
 
Marc Gravell said:
Hmmm... just looking at Jon's sample again, and I'm damned if I can
get it to successfully equate. The following all also report false /
non-zero:

Console.WriteLine("mail".ToUpper() == "MAIL");
Console.WriteLine("mail".ToUpper() == "MAIL".ToUpper());
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Equals("mail", "MAIL"));
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Compare("mail", "MAIL"));
Console.WriteLine(string.Equals("mail", "MAIL",
    StringComparison.CurrentCultureIgnoreCase));
Console.WriteLine("mail".Equals("MAIL",
    StringComparison.CurrentCultureIgnoreCase));

Of course, this is Jon's test case, not yours, so your specific
culture and phrase may be more forgiving. But I don't think I know
enough about internationalization to give the complete answer... I'll
add it to my list of things to brush up on...

So: does anybody know how you *should* realistically compare such strings?


For all the examples above, as well as for the initial case (the "WINDOWS"
string), the CurrentCulture is the most important factor.

Lowercase <-> uppercase pairs:

For English:
  U+0069 (i) <-> U+0049 (I)
For Turkish/Azeri:
  U+0069 (i) <-> U+0130 (İ)
  U+0131 (ı) <-> U+0049 (I)

So, for Turkish/Azeri, "MAIL" really is NOT ToUpper("mail"); the result is
"MAİL". This is how it is, and this is how it *should* be.
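The two mapping tables above can be sketched in a few lines of C++. This is an illustration only, not how .NET or any real locale library implements casing. The function names upperEnglish, upperTurkish and mapAll are made up for the example, and only ASCII letters plus the two Turkish characters are handled.

```cpp
#include <string>

// Hypothetical helper: English rule for ASCII letters, i (U+0069) -> I (U+0049).
char32_t upperEnglish(char32_t c) {
    if (c >= U'a' && c <= U'z') {
        return static_cast<char32_t>(c - (U'a' - U'A'));
    }
    return c;
}

// Hypothetical helper: Turkish/Azeri rule.
//   i (U+0069)          -> dotted capital I (U+0130)
//   dotless i (U+0131)  -> I (U+0049)
// Everything else falls back to the English rule.
char32_t upperTurkish(char32_t c) {
    if (c == U'i') return U'\u0130';
    if (c == U'\u0131') return U'I';
    return upperEnglish(c);
}

// Applies a per-character mapping to every code point of the string.
std::u32string mapAll(std::u32string s, char32_t (*mapping)(char32_t)) {
    for (char32_t& c : s) {
        c = mapping(c);
    }
    return s;
}
```

With these helpers, mapAll(U"mail", upperEnglish) compares equal to U"MAIL", while mapAll(U"mail", upperTurkish) yields M, A, U+0130, L and does not. That is the same outcome the culture-sensitive comparisons in the thread report under a Turkish culture.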


Sometimes, though, you need to compare things in a locale-independent way
(e.g. for the file system, communication protocols such as mailto:, etc.).

For the file system, the right thing is to try accessing the file
(with _access / _taccess, PathFileExists, or CreateFile).

For other things, use StringComparer.InvariantCultureIgnoreCase.Equals( ... )
or String.ToUpperInvariant + String.CompareOrdinal.
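For the C++ side of the question, a rough counterpart to the locale-independent comparison described above might look like the sketch below. It is not one of the .NET APIs named in this post. The helper names asciiLower and asciiEqualsIgnoreCase are invented for the illustration, and the approach only suits ASCII identifiers such as protocol names, not human-language text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: folds only ASCII 'A'..'Z' and never consults the
// C or C++ locale, so the result is the same on every machine.
inline char asciiLower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper: case-insensitive equality for ASCII identifiers
// (for example a URI scheme such as "mailto"). Bytes outside 'A'..'Z'
// and 'a'..'z' must match exactly.
inline bool asciiEqualsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (asciiLower(a[i]) != asciiLower(b[i])) {
            return false;
        }
    }
    return true;
}
```

Because no locale is involved, asciiEqualsIgnoreCase("WINDOWS", "windows") is true regardless of the system language. The trade-off is the one pointed out earlier in the thread: anything beyond plain ASCII is compared byte for byte.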
 
