Strange characters when receiving mail

Tony Johansson

Hi!

Once I received a mail that had a lot of strange characters, and once I looked
at a web page that also had a lot of strange characters. By strange
characters I mean characters that are not readable; they could be some
graphical characters.
I know that this somehow depends on encodings.

So my question is whether someone could describe a scenario where this
actually could appear.

//Tony
 
Peter Duniho

You can get "strange characters" for two reasons:

• someone sent you strange characters
• someone sent you familiar characters, but you are interpreting them
using the wrong encoding

That's _two_ scenarios where this actually could appear.

Make sure that your C# program properly uses whatever encoding
description/specification is found in your data, and offers the user the
opportunity to specify an encoding explicitly for situations where the
encoding is not specified in the data itself, and you should be fine.
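As a rough sketch of that advice (in Python for brevity; C# offers the same operations through System.Text.Encoding), the snippet below decodes raw bytes with the charset declared in a Content-Type value and falls back to a user-chosen encoding when none is declared. The names declared_charset, decode_body and user_choice are made up for this illustration.

```python
import codecs
from email.message import Message
from typing import Optional


def declared_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(raw: bytes, content_type: Optional[str],
                user_choice: str = "utf-8") -> str:
    """Decode raw bytes using the declared charset, else the user's choice."""
    charset = declared_charset(content_type) or user_choice
    try:
        codecs.lookup(charset)  # reject charset names Python does not know
    except LookupError:
        charset = user_choice
    # errors="replace" shows U+FFFD instead of raising on undecodable bytes
    return raw.decode(charset, errors="replace")
```

For example, decode_body(data, "text/plain; charset=ISO-8859-1") uses the declared charset, while decode_body(data, None, user_choice="iso-8859-1") relies on what the user picked.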

Hope that helps.

Pete
 
Arne Vajhøj

Both emails and web pages use MIME types.

Which means that "something" consists of:
- a content-type specifying the charset
- the actual data

The typical reason for seeing garbage is if those two
are inconsistent.

If the content-type says "text/plain; charset=ISO-8859-1" but
the content is really UTF-8 then you will see 2 letters for
each character in the U+0080-U+00FF range (for example, é
shows up as Ã©).

If the content-type says "text/plain; charset=UTF-8" but
the content is really ISO-8859-1 then the 0x80-0xFF bytes
are usually not valid UTF-8, so you will see some error marker
like a question mark for each of those characters.
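Both mismatches can be reproduced in a few lines. This is only an illustration (in Python; the sample word and variable names are chosen for the demo), not what any particular mail client or browser does internally:

```python
text = "Vajhøj"

utf8_bytes = text.encode("utf-8")         # ø becomes the two bytes C3 B8
latin1_bytes = text.encode("iso-8859-1")  # ø becomes the single byte F8

# Content is UTF-8 but the reader assumes ISO-8859-1:
# every byte decodes to some character, so ø turns into two letters.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")
print(utf8_read_as_latin1)   # VajhÃ¸j

# Content is ISO-8859-1 but the reader assumes UTF-8:
# the lone byte F8 is not valid UTF-8, so it is replaced by U+FFFD.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(latin1_read_as_utf8)   # Vajh\ufffdj, usually drawn as a question-mark glyph
```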

If we go to more exotic stuff like Chinese and Japanese,
then you can really see something weird.
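A small sketch of the Japanese case, assuming the text was sent as Shift_JIS and read as ISO-8859-1 (one possible mismatch among many; the sample string is just for the demo):

```python
japanese = "日本語"

sjis_bytes = japanese.encode("shift_jis")   # two bytes per character here

# ISO-8859-1 maps every byte to a character, so nothing fails;
# the reader just gets six unrelated symbols and control characters.
garbled = sjis_bytes.decode("iso-8859-1")
print(repr(garbled))

# No information was lost: re-encoding with the wrong charset and
# decoding with the right one recovers the original text.
recovered = garbled.encode("iso-8859-1").decode("shift_jis")
print(recovered)
```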

Arne
 
