Sort collating sequence

  • Thread starter: Guest

I used the sort.exe command at the command prompt. The collating sequence
does not seem to conform to ASCII. Does anybody know how to force the sort
command to conform to the ASCII collating sequence?
 
What makes you think it's not doing ASCII based sorting? Please provide a
sample input and output, and point out what's wrong.

val
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
www.sdsmt.edu
The best little engineering school you
may not have heard of, but should have!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
The following file segments were sorted with
qsort.exe (an old DOS program) and Windows
sort.exe program. Note that sort.exe has
the underscore '_' before the letters 'C'
and 'S'. This does not conform to ASCII.

Qsort.exe Collating Sequence

C:\DELL\ALERT\
C:\DELL\ALERT\0 \
C:\DELL\ALERT\0\+CCC.GIF
C:\DELL\ALERT\0\+___.GIF
C:\DELL\ALERT\0\10675121.GIF
C:\DELL\ALERT\0\ALERT.GIF
C:\DELL\ALERT\0\OFFDELL.GIF
C:\DELL\ALERT\0\PRIVACYSEAL.GIF
C:\DELL\ALERT\0\PRIVACY_CONTENT.HTM
C:\DELL\ALERT\0\RELIABILITYSEAL.GIF

Sort.exe Collating Sequence

C:\DELL\ALERT\
C:\DELL\ALERT\0 \
C:\DELL\ALERT\0\+___.GIF
C:\DELL\ALERT\0\+CCC.GIF
C:\DELL\ALERT\0\10675121.GIF
C:\DELL\ALERT\0\ALERT.GIF
C:\DELL\ALERT\0\OFFDELL.GIF
C:\DELL\ALERT\0\PRIVACY_CONTENT.HTM
C:\DELL\ALERT\0\PRIVACYSEAL.GIF
C:\DELL\ALERT\0\RELIABILITYSEAL.GIF

Bractals
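
For anyone who wants to check the ASCII claim, here is a minimal Python sketch. It assumes plain code-point comparison (Python's default string ordering), which agrees with ASCII for these characters. The file names are taken from the listings above; the variable names are only for illustration.

```python
# Print the ASCII values of the characters that matter in the two listings.
for ch in "+0CS_":
    print(f"{ch!r} -> {ord(ch)}")

names = [
    r"C:\DELL\ALERT\0\+___.GIF",
    r"C:\DELL\ALERT\0\+CCC.GIF",
    r"C:\DELL\ALERT\0\PRIVACY_CONTENT.HTM",
    r"C:\DELL\ALERT\0\PRIVACYSEAL.GIF",
]

# Python compares strings code point by code point, so 'C' (67) and
# 'S' (83) sort ahead of '_' (95), as in the Qsort.exe listing.
ascii_sorted = sorted(names)
for name in ascii_sorted:
    print(name)
```

The result puts +CCC.GIF ahead of +___.GIF and PRIVACYSEAL.GIF ahead of PRIVACY_CONTENT.HTM, which is the Qsort.exe order rather than the sort.exe order.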
 
Sort doesn't do ASCII. It does language-specific sorting. Windows has never used ASCII; ASCII is DOS. Old versions of Windows use ANSI; current versions use Unicode, with ANSI for old programs.
 
The out-of-sequence placement of the "_" character is probably a Microsoftism -
this lets you force a directory (folder) or filename to be sorted before
similarly named ones in directory/folder views. I use this a lot to put a
folder I want up front, without having to give it a different
name.

Or it's really an undocumented change to a minimally documented aspect of
the sort.exe program. The MS-DOS Encyclopedia (1988) says, "...with versions
3.0 and later, SORT assigns lowercase letters the same ASCII value as
uppercase letters; hence, case is effectively ignored."
Perhaps someone revised the code to treat uppercase letters as their lowercase
equivalents (which would put the underscore, at 95, ahead of the letters)?
No, I don't think it's that, because the underscore is also
collated before digits, which is way out of whack with ASCII.

It's just an M$ thing - no one understands!

Val

 
Yes we do.
1/ Sort is a Win32 program.
2/ It uses two sorting modes: the user's locale sort order, or another one whose behavior I don't know. [Finding out how locale affects sorting is very difficult.]

/L[OCALE] locale    Overrides the system default locale with
                    the specified one. The "C" locale yields
                    the fastest collating sequence and is
                    currently the only alternative. The sort
                    is always case insensitive.
 
None of which answers the question of underscore being sorted out of any
commonly understood sequence.

 
It is sorting correctly. It is not ASCII.
The system's default behavior is to sort punctuation first, followed by numbers and letters.
--
----------------------------------------------------------
http://www.uscricket.com
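
The rule stated in this post (punctuation first, then numbers, then letters, with case ignored) can be approximated with a small Python sort key. This is only a sketch of the rule as described here, not the actual Windows collation algorithm; the function name `punct_first_key` is made up for illustration.

```python
def punct_first_key(text):
    """Approximate key: punctuation/space < digits < letters, case ignored."""
    key = []
    for ch in text:
        if ch.isalpha():
            rank = 2      # letters sort last
        elif ch.isdigit():
            rank = 1      # digits sort in the middle
        else:
            rank = 0      # punctuation and spaces sort first
        key.append((rank, ch.lower()))
    return key

names = [
    r"C:\DELL\ALERT\0\+CCC.GIF",
    r"C:\DELL\ALERT\0\+___.GIF",
    r"C:\DELL\ALERT\0\10675121.GIF",
    r"C:\DELL\ALERT\0\PRIVACYSEAL.GIF",
    r"C:\DELL\ALERT\0\PRIVACY_CONTENT.HTM",
]

# Sorting with this key moves '_' ahead of the digits and letters.
windows_like = sorted(names, key=punct_first_key)
for name in windows_like:
    print(name)
```

On these five names the key reproduces the order shown in the Sort.exe listing in this thread. Whether it matches sort.exe on other input is untested.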
 
Thanks everyone for the responses. I asked the question because I wanted to
sort two files and merge them together using Perl. But, Perl uses the ASCII
collating sequence when comparing strings. This caused all kinds of problems
until I found out that sort.exe does not use ASCII.
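
The merge problem can be illustrated with a short sketch. The original script was Perl; this is an analogous Python version using `heapq.merge`, and the two input lists are bare file names from the thread arranged the way sort.exe ordered them. The point is that a merge only works if both inputs were sorted with the same comparison the merge itself uses.

```python
import heapq

# Two inputs ordered the way sort.exe ordered them: '_' ahead of letters.
file_a = ["+___.GIF", "+CCC.GIF"]
file_b = ["PRIVACY_CONTENT.HTM", "PRIVACYSEAL.GIF"]

# heapq.merge assumes each input is already sorted under its own comparison
# (code-point order by default). These inputs are not, so the merged
# output is not in code-point order either.
naive = list(heapq.merge(file_a, file_b))

# One possible fix: re-sort both inputs in code-point order before merging.
fixed = list(heapq.merge(sorted(file_a), sorted(file_b)))

print(naive)
print(fixed)
```

Another option would be to keep the sort.exe ordering and hand the merge a matching key function, but that requires knowing exactly how sort.exe collates.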

 
To paraphrase a former leader of the free world, define "sorting correctly".
I teach programming, and every one of the dozens of texts on my shelf refers
to the sorting behavior of strings as following the collating sequence, which
is given by the character coding scheme, generally ASCII. In ASCII,
the underscore falls between the uppercase and lowercase letters. If you write a
C/C++ program to sort text, you will generally use the strcmp()
family of functions, or the relational operators on the newer class-based
strings - both methods devolve to comparing the ASCII values of
corresponding characters, so an uppercase letter comes before the
underscore.

The sort utility on my Linux box sorts in that same manner.

Perhaps by correctly, you mean according to ISO 14651? Well, why didn't you
say so?
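
The character-by-character comparison described above can be sketched as follows. This is a Python imitation of what strcmp() does, not the C library code itself, and the name `strcmp_like` is hypothetical. Real strcmp() only guarantees the sign of the result, not its magnitude.

```python
def strcmp_like(a, b):
    """Return <0, 0, or >0, comparing ASCII byte values like C's strcmp()."""
    for x, y in zip(a.encode("ascii"), b.encode("ascii")):
        if x != y:
            # The first differing pair of characters decides the result.
            return x - y
    # One string is a prefix of the other: the shorter one sorts first.
    return len(a) - len(b)

# 'S' (83) is compared with '_' (95) at the first difference.
result = strcmp_like("PRIVACYSEAL.GIF", "PRIVACY_CONTENT.HTM")
print(result)
```

A negative result means PRIVACYSEAL.GIF sorts first, which is the Qsort.exe order from the start of the thread.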
 