Localization and the Comparer class

Tony Johansson

Hi!

Below is a simple program that uses the Comparer class to compare two
strings named str1 and str2.
If I pass 0x040A as the first argument to the CultureInfo constructor, I get
the traditional sort order, according to the MSDN documentation that you can
find at the bottom.
The WriteLine statement in the program writes 1, meaning that str1 > str2.
Can somebody explain how this works, given that the comparison is not based
on the ASCII table?
I mean, if we used the normal ASCII table we would have said that str1 < str2,
because the second letter, l, is less than u.

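using System;
using System.Collections;
using System.Globalization;

public class Program
{
    public static void Main()
    {
        // Creates the strings to compare.
        String str1 = "llegar";
        String str2 = "lugar";

        // LCID 0x040A selects Spanish (Spain) with the traditional sort order.
        Comparer myCompTrad = new Comparer(new CultureInfo(0x040A, false));
        Console.WriteLine(" Traditional Sort : {0}",
            myCompTrad.Compare(str1, str2));
    }
}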

The Spanish (Spain) culture uses two culture identifiers, 0x0C0A using the
default international sort order, and 0x040A using the traditional sort
order. If the CultureInfo is constructed using the es-ES culture name, the
new CultureInfo uses the default international sort order. For the
traditional sort order, the object is constructed using the name
es-ES_tradnl.

//Tony
 
Peter Duniho

Tony said:
Hi!

Below is a simple program that is using the Comparer class to compare two
strings named str1 and str2.
If I use the 0x040A as the first argument to the CultureInfo I use the
traditional sort order according to the MSDN documentation that you can find
at the bottom.

At the bottom of what?

Tony said:
The WriteLine statement in the program is writing 1 as the value meaning
that str1 > str2.
Can somebody explain how this works because the comparing is not based on
the ASCII table?

What do you want to know? If you want all the gory details of the
comparison, you need to just look at the implementation (which may or
may not involve diving into the unmanaged Windows API).

The basic answer is: duh, of course a culture-specific comparison must
not be based on the ASCII character values. That's the whole point of a
culture-specific comparison, as ASCII is itself not a culturally-based
character encoding.

Instead, when you do a culture-specific comparison, it uses whatever
ordering rules exist for that specific culture. Humans being the kind
of animal they are, these rules aren't always logical. Even when they
are logical, the logic does not necessarily follow the representation of
characters and words as found in a computer.

But those rules _are_ what a human being expects when the computer is
asked to order the input, which is the whole reason for having
culture-specific support in various APIs, including .NET.

Tony said:
I mean if we use the normal ASCII table we would have said that str1 < str2
because the letter l is less than u.

The 0x040A LCID is not even listed on the reference that I looked at
(http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx). But, we can
see on the documentation for the CultureInfo class that it's used to
indicate a "traditional" Spanish-specific sorting.

And for whatever reason (I don't speak Spanish, so I couldn't tell you
why), the word "llegar" is alphabetized after "lugar". So that's what
the Compare() method tells you when you compare them.
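
To see the difference for yourself, something like the following minimal sketch compares the two strings three ways: ordinally (by raw character values), with the es-ES culture, and with the es-ES_tradnl culture named in the MSDN text. The class name SortOrderDemo is made up for illustration. The culture-sensitive results depend on the collation data that the runtime and operating system provide, so no expected values are given for them.

```csharp
using System;
using System.Globalization;

public static class SortOrderDemo
{
    public static void Main()
    {
        string str1 = "llegar";
        string str2 = "lugar";

        // Ordinal comparison looks only at character values:
        // the second letters differ, and 'l' (0x6C) < 'u' (0x75).
        Console.WriteLine("{0,-14}: {1}", "Ordinal",
            Math.Sign(string.CompareOrdinal(str1, str2)));

        // Culture-sensitive comparisons. The outcome depends on the
        // collation data installed on the machine, so it is only printed.
        foreach (string name in new[] { "es-ES", "es-ES_tradnl" })
        {
            try
            {
                CultureInfo culture = new CultureInfo(name);
                int result = culture.CompareInfo.Compare(
                    str1, str2, CompareOptions.None);
                Console.WriteLine("{0,-14}: {1}", name, Math.Sign(result));
            }
            catch (ArgumentException)
            {
                // Thrown when the culture name is not known to this runtime.
                Console.WriteLine("{0,-14}: culture not available", name);
            }
        }
    }
}
```

The ordinal line is the "ASCII table" answer the original question expected; the other two lines show what the culture's own rules say on the machine where the code runs.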

If you want to know why in the "traditional" ordering, "llegar" comes
after "lugar", but in the "international" ordering, it comes before, you
need to ask someone who knows about Spanish culture. It's not a
programming question.

Pete
 
Harlan Messinger

Peter said:
If you want to know why in the "traditional" ordering, "llegar" comes
after "lugar", but in the "international" ordering, it comes before, you
need to ask someone who knows about Spanish culture. It's not a
programming question.

The Spanish alphabet is, officially, a, b, c, ch, d, e, f, g, h, i, j,
k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z. The digraph "ll",
which has its own pronunciation distinct from that of "l", has been
treated as a single letter, in the same way as the digraph "ch". That is
why the traditional sort puts "llegar" after "lugar": "ll" is a separate
letter that follows "l".

However, a 1994 international language reform passed during the Tenth
Congress of the Association of Spanish Language Academies decreed that
henceforth, for purposes of sorting, "ch" and "ll" should be treated as
two separate letters, so the official order would now be {llegar,
lugar}, despite the fact that "llegar" is still officially considered to
consist of five letters. Weird, but official, and perhaps enacted in
order to avoid the kinds of problems involved in international,
computerized data exchange, given that everybody, Spanish speakers
included, *types* "ch" and "ll" each as a sequence of two letters
instead of as a digraph.
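
To make the two orderings concrete, here is a rough hand-rolled sketch of both rules. It is not how .NET or Windows implements collation; the class SpanishSortSketch and its methods are invented for illustration, and it assumes lowercase input with no accented vowels. With digraphs switched on, "ch" and "ll" are each ranked as one letter of the alphabet listed above; with digraphs off, every character is ranked on its own.

```csharp
using System;
using System.Collections.Generic;

public static class SpanishSortSketch
{
    // The alphabet as listed above: "ch" and "ll" are letters in their own right.
    static readonly string[] Alphabet =
    {
        "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
        "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
    };

    // Converts a lowercase word into a list of alphabet positions.
    // When digraphs is true, "ch" and "ll" are consumed as single letters.
    // Characters outside the alphabet get rank -1 (this sketch ignores them).
    static List<int> ToRanks(string word, bool digraphs)
    {
        var ranks = new List<int>();
        int i = 0;
        while (i < word.Length)
        {
            string letter = word.Substring(i, 1);
            if (digraphs && i + 1 < word.Length)
            {
                string pair = word.Substring(i, 2);
                if (pair == "ch" || pair == "ll") letter = pair;
            }
            ranks.Add(Array.IndexOf(Alphabet, letter));
            i += letter.Length;
        }
        return ranks;
    }

    // Returns -1, 0 or 1, in the same spirit as Comparer.Compare.
    public static int Compare(string a, string b, bool digraphs)
    {
        List<int> ra = ToRanks(a, digraphs);
        List<int> rb = ToRanks(b, digraphs);
        for (int i = 0; i < Math.Min(ra.Count, rb.Count); i++)
        {
            if (ra[i] != rb[i]) return ra[i] < rb[i] ? -1 : 1;
        }
        // All shared positions are equal: the shorter word sorts first.
        return Math.Sign(ra.Count.CompareTo(rb.Count));
    }

    public static void Main()
    {
        // Traditional rule: "ll" outranks "l", so llegar > lugar.
        Console.WriteLine(Compare("llegar", "lugar", true));   // prints 1
        // Post-1994 rule: second letters are 'l' and 'u', so llegar < lugar.
        Console.WriteLine(Compare("llegar", "lugar", false));  // prints -1
    }
}
```

The first result matches the 1 that the original program printed for the traditional sort, and the second matches the order a plain letter-by-letter comparison gives.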
 
