Character encoding - 1252 vs. ISO-8859-1

JS

I was wondering why one would specify character encoding of 1252 vs.
ISO-8859-1 when retrieving data via HTTP. My circumstance is that I am
retrieving XML via HTTP with French characters in it and I have
specified the encoding as follows:

Dim str As New StreamReader([data source],
    System.Text.Encoding.GetEncoding("ISO-8859-1"))

Doing this works fine and I retrieve the data without the special
French characters being dropped. When I change the above line of code
to the following:

Dim str As New StreamReader([data source],
    System.Text.Encoding.GetEncoding(1252))

The end result is the same.

Is there any advantage to one encoding over another?
 
Joerg Jooss

Thus wrote js,
I was wondering why one would specify character encoding of 1252 vs.
ISO-8859-1 when retrieving data via HTTP. My circumstance is that I
am retrieving XML via HTTP with French characters in it and I have
specified the encoding as follows:

Dim str as New StreamReader([data source],
system.text.encoding.getencoding("ISO-8859-1"))
Doing this works fine and I retrieve the data without the special
French characters being dropped. When I change the above line of code
to the following:

Dim str as New StreamReader([data source],
System.Text.Encoding.GetEncoding(1252))
The end result is the same.

Is there any advantage to one encoding over another?

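Well, both are dated. Windows-1252 is effectively a superset of ISO-8859-1:
the two agree everywhere except bytes 0x80 to 0x9F, which ISO-8859-1 reserves
for control codes and Windows-1252 mostly assigns to printable characters.
See http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx and http://www.microsoft.com/globaldev/reference/iso/28591.mspx.
ISO-8859-1 does not contain €, nor the uppercase and lowercase "oe" ligature
(Unicode \u0152 and \u0153). Windows-1252 contains both. The usual French
accented letters (é, è, à, ç and so on) sit in the shared range, which is why
you get the same result with either encoding.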
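
To make the difference concrete, here is a minimal sketch that decodes the same four bytes with both encodings and prints the resulting code points. The module name, the CodePoints helper and the sample bytes are made up for illustration; only the two GetEncoding calls come from the original post.

```vbnet
Imports System
Imports System.Text

Module EncodingComparison

    ' Hypothetical helper: formats each decoded character as U+XXXX.
    Function CodePoints(ByVal decoded As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In decoded
            sb.Append("U+" & Convert.ToInt32(c).ToString("X4") & " ")
        Next
        Return sb.ToString().TrimEnd()
    End Function

    Sub Main()
        ' Assumption: running on .NET Framework, where code page 1252 is built in.
        ' On .NET Core / .NET 5+ you would first need
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' 0xE9 is "é" in both encodings; 0x80, 0x8C and 0x9C fall in the
        ' range where the two encodings disagree.
        Dim sample As Byte() = {&HE9, &H80, &H8C, &H9C}

        Console.WriteLine("ISO-8859-1:   " & CodePoints(latin1.GetString(sample)))
        Console.WriteLine("Windows-1252: " & CodePoints(cp1252.GetString(sample)))
    End Sub

End Module
```

With ISO-8859-1 the last three bytes should come out as the control codes U+0080, U+008C and U+009C; with Windows-1252 they should come out as U+20AC (€), U+0152 and U+0153 (the "oe" ligatures). The first byte is U+00E9 in both cases. So if the sender is really producing Windows-1252 and ever includes € or an "oe" ligature, reading the stream as ISO-8859-1 would silently turn those characters into control codes.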

Modern applications should rather use one of the Unicode Transformation Formats
like UTF-8.

Cheers,
 
JS

Well, both are dated. Windows-1252 is actually an extension of
ISO-8859-1. See
http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx and
http://www.microsoft.com/globaldev/reference/iso/28591.mspx.
ISO-8859-1 does not contain €, nor the uppercase and
lowercase "oe" ligature (Unicode \u0152 and \u0153).
Windows-1252 contains both.

Modern applications should rather use one of the Unicode
Transformation Formats like UTF-8.

Okay, that is what I was thinking (in terms of the difference between
the two of them) when I was researching the issue but figured that
there must be something else I was missing. Unfortunately I cannot get
our remote partners to switch to UTF-8 (or something else more current)
so I am stuck with it but at least I feel comfortable with what I am
doing.

Thank you, Joerg; great information and assistance, as always.

J.
 
