Problems reading special characters from a file

  • Thread starter: jcsnippets.atspace.com

jcsnippets.atspace.com

Hi everybody,

I'm trying to read a file containing a list of addresses. Most of these
addresses are in Germany, so many of the names contain special characters,
such as "München" and the ever-present "Straße".

The code is as follows:
FileStream fs = new FileStream(fi.FullName, FileMode.Open);
// Encoding.Default: the system's ANSI code page on .NET Framework (UTF-8 on .NET Core/.NET 5+)
StreamReader sr = new StreamReader(fs, Encoding.Default);

As you can see, I'm using Encoding.Default here - I have tried every
encoding available through the Encoding class, but none of them returns
the text as expected.

Is there another way to read the file? What can I change so that these
special characters are read and processed correctly?

Thanks in advance,

JayCee
 
If this is a regular non-Unicode text file, it probably uses the Latin-1
encoding (ISO-8859-1) or Windows-1252, Microsoft's superset of Latin-1 that
assigns extra printable characters to the bytes 0x80-0x9F.

I think that you can create an Encoding object for this using: new
Encoding(1252)
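A small sketch of how the two encodings mentioned above relate, assuming a .NET Core 3.0+/.NET 5+ target, where Windows code pages must first be registered through CodePagesEncodingProvider (on .NET Framework that line can be dropped). Encoding has no public constructor, so the instances come from the static Encoding.GetEncoding factory. The class and method names are illustrative only. For German text the two encodings agree (ü is byte 0xFC in both); they differ only in the 0x80-0x9F range.

```csharp
using System;
using System.Text;

static class Latin1VsWindows1252
{
    // Decodes a single byte with the given encoding.
    public static string DecodeByte(Encoding enc, byte b) => enc.GetString(new[] { b });

    static void Main()
    {
        // Assumes .NET Core 3.0+/.NET 5+, where Windows code pages such as 1252
        // must be registered first; on .NET Framework, remove this line.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding win1252 = Encoding.GetEncoding(1252);

        // 0xFC decodes to 'ü' in both encodings.
        Console.WriteLine(DecodeByte(latin1, 0xFC) == DecodeByte(win1252, 0xFC)); // True

        // 0x80 differs: the euro sign in Windows-1252, a control character in ISO-8859-1.
        Console.WriteLine(((int)DecodeByte(win1252, 0x80)[0]).ToString("X4")); // 20AC
        Console.WriteLine(((int)DecodeByte(latin1, 0x80)[0]).ToString("X4"));  // 0080
    }
}
```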
 
Hi Göran,

Thanks for the explanation - I've had a look at the possibility to use
Encoding, but there is no constructor available. Using WinCV, I've also had
a look at alternatives for this class, but I cannot seem to find any.

The file is indeed a regular non-Unicode text file, so the code pages you
suggested should be all right. Now I just need a way to actually specify
this encoding...

Best regards,

JayCee
 
Try

Encoding enc = Encoding.GetEncoding("windows-1252");

or

Encoding enc = Encoding.GetEncoding(1252);

HTH
Michael
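A minimal sketch of that suggestion plugged into a StreamReader like the one in the original question, under the same .NET Core 3.0+/.NET 5+ assumption (drop the RegisterProvider line on .NET Framework). ReadWithCodePage, ReadLines and the temporary sample file are hypothetical; the sample is written as Windows-1252 bytes so the round trip can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ReadWithCodePage
{
    // Reads all lines of a file with an explicitly chosen encoding
    // instead of relying on Encoding.Default.
    public static string[] ReadLines(string path, Encoding enc)
    {
        var lines = new List<string>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var sr = new StreamReader(fs, enc))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
                lines.Add(line);
        }
        return lines.ToArray();
    }

    static void Main()
    {
        // Assumes .NET Core 3.0+/.NET 5+; on .NET Framework, remove this line.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding enc = Encoding.GetEncoding("windows-1252"); // same as GetEncoding(1252)

        // Hypothetical sample file containing Windows-1252 bytes.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, enc.GetBytes("Straße\r\nMünchen"));

        foreach (string line in ReadLines(path, enc))
            Console.WriteLine(line);

        File.Delete(path);
    }
}
```

This only helps if the file really was written as Windows-1252; if the special characters still come out wrong, the file uses a different code page.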
 
Hi JayCee,

you're right, it's not a constructor but a static method:
Encoding.GetEncoding(1252)
 
You are right, the constructors are protected, so you can't use them. I
missed that when reading the documentation.

Use the GetEncoding method as Michel Voss showed.
 
Well, as you (Michael) and Christof both suggested, I tried the static
GetEncoding method, but to no avail.

Instead of the character I expect (ü), I get a question mark. As I said
before, I tried the other encodings, and they either gave the same result
or dropped the character entirely.

At the moment I'm Googling for an answer, but I can't seem to find one. I've
found a lot of people with the same problem, but no solution that helps...

Best regards,

JayCee
 
Where does the file come from?

If you open it in Notepad, does it handle the characters correctly?
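Besides opening the file in Notepad, it can help to look at the raw bytes behind the odd characters and compare them with code page tables. A sketch with a hypothetical NonAsciiDump class; it lists the first few bytes at or above 0x80, since plain ASCII (0x00-0x7F) is encoded identically in all of the code pages discussed here.

```csharp
using System;
using System.IO;
using System.Text;

static class NonAsciiDump
{
    // Lists the offset and value of the first `max` bytes >= 0x80.
    // Bytes below 0x80 are plain ASCII, which these code pages all share.
    public static string Describe(byte[] data, int max = 20)
    {
        var sb = new StringBuilder();
        int found = 0;
        for (int i = 0; i < data.Length && found < max; i++)
        {
            if (data[i] < 0x80) continue;
            sb.AppendLine($"offset 0x{i:X4}: 0x{data[i]:X2}");
            found++;
        }
        return sb.ToString();
    }

    static void Main(string[] args)
    {
        // Pass the path of the address file as the first argument; without one,
        // a hypothetical sample ("München" with ü stored as 0x81) is used.
        byte[] data = args.Length > 0
            ? File.ReadAllBytes(args[0])
            : new byte[] { 0x4D, 0x81, 0x6E, 0x63, 0x68, 0x65, 0x6E };
        Console.Write(Describe(data)); // sample prints "offset 0x0001: 0x81"
    }
}
```

As a rule of thumb for German text: ü stored as 0xFC points to Windows-1252/ISO-8859-1, ü as 0x81 and ß as 0xE1 point to a DOS code page such as 850 or 437, and ü as the byte pair 0xC3 0xBC points to UTF-8.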

 
The file is the result of an export a client created for us.

Notepad does not handle the characters correctly either: it shows squares
instead of umlauts, and other wrong characters for the rest.

What I find strange is that I get different results from Notepad, no matter
which encoding I use to read the file.

If I open the files in DOS, however, the characters are correct.

Best regards,

JayCee
 
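Then the file most likely uses a DOS ("OEM") code page rather than a Windows
one, probably code page 850 (DOS Western European). Try this:

Encoding.GetEncoding(850)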
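A sketch of why code page 850 fits the symptoms described in this thread (wrong characters in Notepad and with Windows-1252, correct output in DOS), again assuming a .NET Core 3.0+/.NET 5+ target. The byte values are the CP850 encodings of ß (0xE1) and ü (0x81); DosCodePage and its methods are hypothetical names.

```csharp
using System;
using System.Text;

static class DosCodePage
{
    // Assumes .NET Core 3.0+/.NET 5+: code page 850 is only available after
    // registering the code-pages provider. On .NET Framework, remove this constructor.
    static DosCodePage() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Decodes bytes written by a DOS program using code page 850.
    public static string DecodeDos(byte[] data) => Encoding.GetEncoding(850).GetString(data);

    // Decodes the same bytes as Windows-1252, i.e. the wrong code page for such a file.
    public static string DecodeAnsi(byte[] data) => Encoding.GetEncoding(1252).GetString(data);

    static void Main()
    {
        // "Straße München" as stored by a DOS program: ß = 0xE1, ü = 0x81.
        byte[] data = { 0x53, 0x74, 0x72, 0x61, 0xE1, 0x65, 0x20,
                        0x4D, 0x81, 0x6E, 0x63, 0x68, 0x65, 0x6E };

        Console.WriteLine(DecodeDos(data) == "Straße München");  // True
        Console.WriteLine(DecodeAnsi(data) == "Straße München"); // False: 0xE1 is 'á' in Windows-1252
    }
}
```

To read the actual address file, the same encoding can be passed to the reader from the original question, for example new StreamReader(fi.FullName, Encoding.GetEncoding(850)).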
 
