Problems reading special characters from a file

  • Thread starter jcsnippets.atspace.com

jcsnippets.atspace.com

Hi everybody,

I'm trying to read a file containing a list of addresses. Most of these
addresses are in Germany, so many of the names contain special characters,
such as München and the ever-present "Straße".

The code is as follows:
FileStream fs = new FileStream(fi.FullName, FileMode.Open);
StreamReader sr = new StreamReader(fs, Encoding.Default);

As you can see, I'm using Encoding.Default here. I have tried every
encoding available through the Encoding class, but none of them
returns the text as expected.

Is there another way to read the file? What can I change to make sure these
special characters are still read and processed?

Thanks in advance,

JayCee
 

Göran Andersson

If this is a regular non-Unicode text file, it probably uses the Latin-1
encoding (ISO-8859-1) or its Windows variant, Windows-1252, which matches
Latin-1 for all the German letters.

I think that you can create an Encoding object for this using: new
Encoding(1252)
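The two encodings mentioned above are closely related but not identical. The following is a minimal sketch, assuming a current .NET SDK (C# top-level statements) where legacy Windows code pages such as 1252 only become available after registering `CodePagesEncodingProvider`; on the classic .NET Framework that registration line can be dropped. It shows that both encodings store the German letters with the same byte values, and that they disagree only in the 0x80 to 0x9F range.

```csharp
using System;
using System.Text;

// On .NET Core 3.0+ and .NET 5+, legacy Windows code pages such as 1252 are only
// available after registering this provider (not needed on .NET Framework).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
Encoding win1252 = Encoding.GetEncoding(1252);

// The German letters get the same byte values in both encodings
// (for example, 'ß' is 0xDF and 'ü' is 0xFC).
byte[] latin1Bytes = latin1.GetBytes("Straße München");
byte[] win1252Bytes = win1252.GetBytes("Straße München");
Console.WriteLine("ISO-8859-1:   " + BitConverter.ToString(latin1Bytes));
Console.WriteLine("Windows-1252: " + BitConverter.ToString(win1252Bytes));

// They differ in the 0x80-0x9F range: Windows-1252 puts printable characters
// there (0x80 is the euro sign), while ISO-8859-1 has C1 control codes.
string win1252At80 = win1252.GetString(new byte[] { 0x80 });
string latin1At80 = latin1.GetString(new byte[] { 0x80 });
Console.WriteLine($"0x80 as Windows-1252: U+{(int)win1252At80[0]:X4}");
Console.WriteLine($"0x80 as ISO-8859-1:   U+{(int)latin1At80[0]:X4}");
```

For a German address list either encoding gives the same result, so the choice between them only matters if the file contains characters such as the euro sign.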
 

jcsnippets.atspace.com

Hi Göran,

Thanks for the explanation. I've looked into creating an Encoding that way,
but there is no public constructor available. Using WinCV, I've also looked
for alternatives to this class, but I can't seem to find any.

The file is indeed a regular non-Unicode text file, so the code pages
you suggested should be all right. Now I only need to find a way to actually
specify this encoding...

Best regards,

JayCee
 

Michael Voss

Try

Encoding enc = Encoding.GetEncoding("windows-1252");

or

Encoding enc = Encoding.GetEncoding(1252);
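Both calls are static factory methods on System.Text.Encoding. A minimal sketch, assuming a current .NET SDK (top-level statements) where code page 1252 must first be made available by registering `CodePagesEncodingProvider`. It checks that the name and the number give the same code page, and shows a hypothetical helper, `OpenWithCodePage`, that passes the result to StreamReader where the original code used Encoding.Default.

```csharp
using System;
using System.IO;
using System.Text;

// Required on .NET Core 3.0+ / .NET 5+; harmless to call more than once.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Encoding has no public constructor for this; use the static factory method,
// either with the web name or with the code page number.
Encoding byName = Encoding.GetEncoding("windows-1252");
Encoding byNumber = Encoding.GetEncoding(1252);
Console.WriteLine($"{byName.WebName} / {byName.CodePage}");
Console.WriteLine($"{byNumber.WebName} / {byNumber.CodePage}");

// Hypothetical helper: opens a file for reading with an explicit encoding
// instead of Encoding.Default. The StreamReader owns and disposes the stream.
static StreamReader OpenWithCodePage(string path, Encoding encoding) =>
    new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), encoding);
```

In the original post this would become `OpenWithCodePage(fi.FullName, enc)`.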

HTH
Michael
 

Christof Nordiek

Hi JayCee,

you're right, it's not a constructor but a static method:
Encoding.GetEncoding(1252)
 

Göran Andersson

You are right, the constructors are protected, so you can't use them. I
missed that when reading the documentation.

Use the GetEncoding method as Michael Voss showed.
 

jcsnippets.atspace.com

Well, as you and Christof both suggested, I tried the static method
GetEncoding, but to no avail.

Instead of the character I expect (ü), I get a question mark. As I said
before, I tried the other encodings, and they either gave me the same
result or the character disappeared entirely.
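A question mark is the typical symptom of decoding bytes with the wrong encoding: a byte that the chosen encoding cannot map is replaced by a substitute character. A minimal sketch, assuming for illustration that the file stores "München" with ü as the single byte 0x81 (as DOS code page 850 does), and assuming a current .NET SDK where code page 850 requires registering `CodePagesEncodingProvider`:

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Assumed input: "München" as stored in DOS code page 850, where 'ü' is 0x81.
byte[] raw = { 0x4D, 0x81, 0x6E, 0x63, 0x68, 0x65, 0x6E };

// ASCII has no characters above 0x7F; its decoder substitutes '?'.
string asAscii = Encoding.ASCII.GetString(raw);

// 0x81 is not a valid UTF-8 sequence; the decoder substitutes U+FFFD,
// which many consoles and fonts render as '?' or a box.
string asUtf8 = Encoding.UTF8.GetString(raw);

// With the code page the bytes were actually written in, the text survives.
string asCp850 = Encoding.GetEncoding(850).GetString(raw);

foreach (var (name, text) in new[] { ("ASCII", asAscii), ("UTF-8", asUtf8), ("cp850", asCp850) })
{
    Console.WriteLine($"{name,-6} {text}");
}
```

The sketch only shows the mechanism; which substitute appears for a real file depends on which bytes it actually contains.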

At the moment I'm Googling for an answer, but I can't seem to find one. Lots
of people seem to have the same problem, but none of the solutions helps...

Best regards,

JayCee
 

Göran Andersson

Where does the file come from?

If you open it in Notepad, does it handle the characters correctly?

 

jcsnippets.atspace.com

The file is the result of an export a client created for us.

Notepad does not handle the characters correctly either: it shows squares
instead of umlauts, and other wrong characters elsewhere.

What I find strange is that I get different results from Notepad, no matter
which encoding I use to read the file.

If I open the files in DOS, however, they display correctly.
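That difference is a strong hint: Notepad typically interprets a plain text file in the Windows ANSI code page (1252 on Western European systems), while the DOS console uses the OEM code page (for example 850). A minimal sketch, assuming a current .NET SDK with `CodePagesEncodingProvider` registered, that compares the byte values of the German letters in both code pages and shows what a code page 850 byte turns into when it is read as 1252:

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding oem = Encoding.GetEncoding(850);   // DOS (OEM) code page, Western Europe
Encoding ansi = Encoding.GetEncoding(1252); // Windows (ANSI) code page

foreach (char c in "äöüÄÖÜß")
{
    byte oemByte = oem.GetBytes(c.ToString())[0];
    byte ansiByte = ansi.GetBytes(c.ToString())[0];
    // The byte a DOS program wrote, decoded as if it were Windows-1252.
    string misread = ansi.GetString(new[] { oemByte });
    Console.WriteLine($"{c}: cp850=0x{oemByte:X2} cp1252=0x{ansiByte:X2} " +
                      $"cp850 byte read as 1252 = U+{(int)misread[0]:X4}");
}
```

Because the same letter has different byte values in the two code pages, no single encoding can make both Notepad and DOS show the file correctly.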

Best regards,

JayCee
 

Göran Andersson

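Then the file is most likely in a DOS (OEM) code page, which is what is
often loosely called "Extended ASCII". For Western European text that is
usually code page 850. Try this:

Encoding.GetEncoding(850)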
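Put together with the StreamReader from the first post, reading the export might look like the following. This is a minimal sketch, assuming the file really is in code page 850 and a current .NET SDK (top-level statements, with `CodePagesEncodingProvider` registered; drop that line on .NET Framework). `ReadDosLines` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// On .NET Core 3.0+ / .NET 5+ the DOS code pages are not available until this
// provider is registered; on .NET Framework the call can be dropped.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: reads every line of a file written in DOS code page 850.
static string[] ReadDosLines(string path)
{
    Encoding cp850 = Encoding.GetEncoding(850);
    var result = new List<string>();
    using (var reader = new StreamReader(path, cp850))
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            result.Add(line);
        }
    }
    return result.ToArray();
}
```

With `fi` as in the original post, usage would be `string[] addresses = ReadDosLines(fi.FullName);`. If some exports turn out to use code page 437 instead, only the number passed to GetEncoding changes.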
 
