Encoding.Default - reliable ???

  • Thread starter: Guest
I am having a lot of difficulty with text files in .NET when they have special
characters like í, ó, ç, etc.

When I read a text file with them and then write it back out, it ignores all
of those characters completely.

I tried all the encoding types and it seems only Encoding.Default does it
right... but it sounds dangerous... can I rely on Encoding.Default behaving
like this for all other machines?
 
Thus wrote MrNobody,
I am having a lot of difficulty with text files in .NET when they have
special characters like í, ó, ç, etc.

When I read a text file with them and then write it back out, it
ignores all of those characters completely.

I tried all the encoding types and it seems only Encoding.Default does
it right...

The question is: how do you check the resulting file? It's pretty likely that
your editor wasn't up to the task of decoding the file correctly.
but it sounds dangerous... can I rely on Encoding.Default
behaving like this for all other machines?

Depends on what reach you require, but across the globe, certainly not. UTF-8
or UTF-16 are much better choices.
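For example, a minimal round trip that writes and reads with the same explicit UTF-8 encoding (the temp-file path here is purely illustrative):

```csharp
using System;
using System.IO;
using System.Text;

class Utf8RoundTrip
{
    static void Main()
    {
        // Illustrative path only -- any writable location works.
        string path = Path.Combine(Path.GetTempPath(), "utf8-demo.txt");
        string text = "í, ó, ç and friends";

        // Write and read with the SAME explicit encoding: that is the key point.
        File.WriteAllText(path, text, Encoding.UTF8);
        string back = File.ReadAllText(path, Encoding.UTF8);

        Console.WriteLine(back == text); // True: the characters survive intact
    }
}
```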

Cheers,
 
Hi Nobody,

According to the .NET Framework docs, Encoding.Default is the system's current
ANSI code page. So it can easily change, especially if the application runs on
another system.
Encoding.Unicode and Encoding.UTF8 can encode any character, so they should
work fine.
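A quick way to see the difference (this sketch assumes the Windows-1252 code page is available, as it is on .NET Framework; on .NET Core you would first have to register CodePagesEncodingProvider):

```csharp
using System;
using System.Text;

class AnyCharacter
{
    static void Main()
    {
        string s = "í ó ç Привет"; // includes Cyrillic, outside Western ANSI code pages

        // UTF-8 round-trips any character losslessly.
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == s); // True

        // A single-byte ANSI code page (1252 is a common Encoding.Default)
        // silently replaces what it cannot represent with '?'.
        Encoding ansi = Encoding.GetEncoding(1252);
        byte[] lossy = ansi.GetBytes(s);
        Console.WriteLine(ansi.GetString(lossy) == s); // False
    }
}
```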
 
OK, but I have a problem when I use either Encoding.Unicode or Encoding.UTF8.

string src = // path to source file
string tgt = // path where to write new file to

string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);

Console.WriteLine("index = " + txt.IndexOf("something"));

System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode


When I run that code, the index is always -1 for a string which is
definitely in the file, and the file it writes out has complete data loss; it
is just full of these little black boxes...
 
How did you generate the text file beforehand? Obviously it is not stored in
UTF-16 encoding. If you used Notepad and simply hit 'Save', I guess it is
stored in the standard code page of your system, so Encoding.Default would be
right here. But if the text file was created on a machine with a different
default code page, even this will not work. In Notepad you can choose the
encoding in the 'Save As' dialog; other editors have similar features.

In any case, you will first have to know which encoding the file was
stored in. Alas, there is no general way to detect it; at least, I don't know of one.

The best situation is when you can agree with the creator of the source file
about the encoding. The next best is when someone knows the encoding
that was used while creating the file.
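One partial exception: files that start with a byte-order mark can be identified. A hedged sketch (it only catches BOM-marked files; ANSI text and BOM-less UTF-8 look identical to it):

```csharp
using System.IO;
using System.Text;

class BomSniffer
{
    // Returns the encoding indicated by a BOM, or null if there is none.
    public static Encoding SniffBom(string path)
    {
        byte[] b = new byte[3];
        int n;
        using (FileStream fs = File.OpenRead(path))
            n = fs.Read(b, 0, 3);

        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;               // UTF-8 BOM
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 little-endian
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 big-endian
        return null;                            // unknown: fall back to an agreed default
    }
}
```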
 
Well, all I know is the files are created on Windows machines only, using
such programs as Notepad.

And this app is targeted at Windows machines only, but it will be used by
people across the globe who may need those special characters.

So is it safe then given these restrictions to rely on Encoding.Default?

I still really don't understand why I can get the files to read/write OK
using Encoding.Default but using any of the specific encodings fails...
 
MrNobody said:
Well, all I know is the files are created on Windows machines only, using
such programs as Notepad.

And this app is targeted at Windows machines only, but it will be used by
people across the globe who may need those special characters.

So is it safe then given these restrictions to rely on Encoding.Default?

I still really don't understand why I can get the files to read/write OK
using Encoding.Default but using any of the specific encodings fails...

You won't be able to correctly read a file unless you know its
encoding. For instance, if you try to read a UTF-8 encoded file using
Encoding.Default, then any characters outside the ASCII range are
likely to end up being corrupted.
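You can reproduce that corruption in a couple of lines, decoding UTF-8 bytes as Windows-1252 (a typical Encoding.Default on Western systems; the string is just an example):

```csharp
using System;
using System.Text;

class Mojibake
{
    static void Main()
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("café"); // 'é' is two bytes: 0xC3 0xA9

        // Windows-1252 reads each of those bytes as its own character:
        string garbled = Encoding.GetEncoding(1252).GetString(utf8Bytes);
        Console.WriteLine(garbled); // cafÃ©  -- classic mojibake
    }
}
```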

It sounds like you might be a bit fuzzy on what encodings are about.
See if http://www.pobox.com/~skeet/csharp/unicode.html helps.
 
Thus wrote MrNobody,
OK, but I have a problem when I use either Encoding.Unicode or
Encoding.UTF8.

string src = // path to source file
string tgt = // path where to write new file to
string txt = System.IO.File.ReadAllText(src, Encoding.Unicode);

Console.WriteLine("index = " + txt.IndexOf("something"));

System.IO.File.WriteAllText(tgt, txt, Encoding.UTF8); // or Encoding.Unicode

When I run that code, the index is always -1 for a string which is
definitely in the file and the file it prints out has complete data
loss, it is just full of these little black boxes...

If you're consuming text files that have been authored outside of your application,
you have to use the encoding that was used to create the file in order to
read it.

Notepad for example can create both UTF-8 and UTF-16 encoded files, but neither
is its default encoding. So if you've created your test files in Notepad
without considering the encoding, they will end up encoded as something that
is compatible with or equal to Encoding.Default.
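Given that, one defensive pattern is to let StreamReader honour a BOM when one exists and otherwise fall back to the system code page -- a sketch, assuming the files really do come from Notepad-style editors:

```csharp
using System.IO;
using System.Text;

class BomAwareRead
{
    public static string ReadWithFallback(string path)
    {
        // The third argument tells StreamReader to prefer a detected BOM
        // over the fallback encoding supplied here.
        using (StreamReader reader = new StreamReader(path, Encoding.Default, true))
            return reader.ReadToEnd();
    }
}
```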

Cheers,
 
