Encoding.Default - reliable ???

  • Thread starter: Guest
I am having a lot of difficulty with text files in .NET when they have special
characters like í, ó, ç, etc.

When I read a text file with them and then write it back out, it ignores all
of those characters completely.

I tried all the encoding types and it seems only Encoding.Default does it
right... but it sounds dangerous... can I rely on Encoding.Default behaving
like this for all other machines?
 
Thus wrote MrNobody,
I am having a lot of difficulty with text files in .NET when they have
special characters like í, ó, ç, etc.

When I read a text file with them and then write it back out, it
ignores all of those characters completely.

I tried all the encoding types and it seems only Encoding.Default does
it right...

The question is: how do you check the resulting file? It's pretty likely that
your editor wasn't up to the task of decoding the file correctly.
but it sounds dangerous... can I rely on Encoding.Default
behaving like this for all other machines?

Depends on what reach you require, but across the globe, certainly not. UTF-8
or UTF-16 are much better choices.
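For example, a minimal round trip that writes and reads with the same explicit UTF-8 encoding (the temp-file path here is purely illustrative):

```csharp
using System;
using System.IO;
using System.Text;

class Utf8RoundTrip
{
    static void Main()
    {
        // Illustrative path only -- any writable location works.
        string path = Path.Combine(Path.GetTempPath(), "utf8-demo.txt");
        string text = "í, ó, ç and friends";

        // Write and read with the SAME explicit encoding: that is the key point.
        File.WriteAllText(path, text, Encoding.UTF8);
        string back = File.ReadAllText(path, Encoding.UTF8);

        Console.WriteLine(back == text); // True: the characters survive intact
    }
}
```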

Cheers,
 
Hi Nobody,

According to the .NET Framework docs, Encoding.Default is the system's current
ANSI code page. So it can easily change, especially if the application runs on
another system.
Encoding.Unicode and Encoding.UTF8 can encode any character, so they should
work fine.
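A quick way to see the difference (this sketch assumes the Windows-1252 code page is available, as it is on .NET Framework; on .NET Core you would first have to register CodePagesEncodingProvider):

```csharp
using System;
using System.Text;

class AnyCharacter
{
    static void Main()
    {
        string s = "í ó ç Привет"; // includes Cyrillic, outside Western ANSI code pages

        // UTF-8 round-trips any character losslessly.
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == s); // True

        // A single-byte ANSI code page (1252 is a common Encoding.Default)
        // silently replaces what it cannot represent with '?'.
        Encoding ansi = Encoding.GetEncoding(1252);
        byte[] lossy = ansi.GetBytes(s);
        Console.WriteLine(ansi.GetString(lossy) == s); // False
    }
}
```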
 
OK, but I have a problem when I use either Encoding.Unicode or Encoding.UTF8.

string src = // path to source file
string tgt = // path where to write new file to

string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);

Console.WriteLine("index = " + txt.IndexOf("something"));

System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode


When I run that code, the index is always -1 for a string which is
definitely in the file, and the file it writes out has complete data loss; it
is just full of these little black boxes...
 
How did you generate the text file beforehand? Obviously it is not stored in
UTF-16 encoding. If you used Notepad and simply hit 'Save', I guess it is
stored in the standard code page of your system, so Encoding.Default would be
right here. But if the text file was created on a machine with a different
default code page, even this will not work. In Notepad you can choose the
encoding in the 'Save As' dialog; other editors have similar features.

In any case, you will first have to know which encoding the file was
stored in. Alas, there is no general way to detect it; at least, I don't know of one.

The best situation is when you can agree with the creator of the source file
about the encoding. The next best is when someone knows the encoding
that was used while creating the file.
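One partial exception: files that start with a byte-order mark can be identified. A hedged sketch (it only catches BOM-marked files; ANSI text and BOM-less UTF-8 look identical to it):

```csharp
using System.IO;
using System.Text;

class BomSniffer
{
    // Returns the encoding indicated by a BOM, or null if there is none.
    public static Encoding SniffBom(string path)
    {
        byte[] b = new byte[3];
        int n;
        using (FileStream fs = File.OpenRead(path))
            n = fs.Read(b, 0, 3);

        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;               // UTF-8 BOM
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 little-endian
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 big-endian
        return null;                            // unknown: fall back to an agreed default
    }
}
```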
 
Well, all I know is the files are created on Windows machines only, using
such programs as Notepad.

And this app is targeted at Windows machines only, but it will be used by
people across the globe who may need those special characters.

So is it safe then given these restrictions to rely on Encoding.Default?

I still really don't understand why I can get the files to read/write OK
using Encoding.Default but using any of the specific encodings fails...
 
MrNobody said:
Well, all I know is the files are created on Windows machines only, using
such programs as Notepad.

And this app is targeted at Windows machines only, but it will be used by
people across the globe who may need those special characters.

So is it safe then given these restrictions to rely on Encoding.Default?

I still really don't understand why I can get the files to read/write OK
using Encoding.Default but using any of the specific encodings fails...

You won't be able to correctly read a file unless you know its
encoding. For instance, if you try to read a UTF-8 encoded file using
Encoding.Default, then any characters outside the ASCII range are
likely to end up being corrupted.
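You can reproduce that corruption in a couple of lines, decoding UTF-8 bytes as Windows-1252 (a typical Encoding.Default on Western systems; the string is just an example):

```csharp
using System;
using System.Text;

class Mojibake
{
    static void Main()
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("café"); // 'é' is two bytes: 0xC3 0xA9

        // Windows-1252 reads each of those bytes as its own character:
        string garbled = Encoding.GetEncoding(1252).GetString(utf8Bytes);
        Console.WriteLine(garbled); // cafÃ©  -- classic mojibake
    }
}
```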

It sounds like you might be a bit fuzzy on what encodings are about.
See if http://www.pobox.com/~skeet/csharp/unicode.html helps.
 
Thus wrote MrNobody,
OK, but I have a problem when I use either Encoding.Unicode or
Encoding.UTF8.

string src = // path to source file
string tgt = // path where to write new file to
string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);

Console.WriteLine("index = " + txt.IndexOf("something"));

System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode

When I run that code, the index is always -1 for a string which is
definitely in the file and the file it prints out has complete data
loss, it is just full of these little black boxes...

If you're consuming text files that have been authored outside of your application,
you have to use the encoding that was used to create the file in order to
read it.

Notepad for example can create both UTF-8 and UTF-16 encoded files, but neither
is its default encoding. So if you've created your test files in Notepad
without considering the encoding, they will end up encoded as something that
is compatible with or equal to Encoding.Default.
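Given that, one defensive pattern is to let StreamReader honour a BOM when one exists and otherwise fall back to the system code page -- a sketch, assuming the files really do come from Notepad-style editors:

```csharp
using System.IO;
using System.Text;

class BomAwareRead
{
    public static string ReadWithFallback(string path)
    {
        // The third argument tells StreamReader to prefer a detected BOM
        // over the fallback encoding supplied here.
        using (StreamReader reader = new StreamReader(path, Encoding.Default, true))
            return reader.ReadToEnd();
    }
}
```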

Cheers,
 
