ReadAllText, special characters

  • Thread starter: Guest

I have a problem reading special characters, such as the German ß, from Unicode files.

I use the following code:

Dim enc As System.Text.Encoding = New System.Text.UnicodeEncoding(False, False)

Dim a As String = Microsoft.VisualBasic.FileIO.FileSystem.ReadAllText( _
    Application.StartupPath & "\Avkon.r03", enc)

Microsoft.VisualBasic.FileIO.FileSystem.WriteAllText( _
    Application.StartupPath & "\Avkon.r05", a, False, enc)


The weird thing is that some ß characters are read correctly while others are
simply dropped.
If I set throwOnInvalidBytes to True, I get an error, of course...
 
kenny said:
If I set throwOnInvalidBytes to True, I get an error, of course...

Which suggests that the file contains byte sequences that are not valid
UTF-16 (the encoding that UnicodeEncoding implements)...

I'm far from an expert on Unicode; is there some external resource you
can use to validate your files?
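
One way to validate the file from within .NET itself is to decode its raw bytes with a strict encoder and see where it complains. The following is only a minimal sketch: the module and function names are made up for illustration, and it assumes the file is meant to be little-endian UTF-16 without a byte order mark, as in the posted code.

```vb
Imports System
Imports System.Text

Module StrictDecodeCheck
    ' Hypothetical helper: returns -1 if the bytes decode cleanly as UTF-16LE,
    ' otherwise the index that the decoder reports for the first invalid sequence.
    Function FirstInvalidIndex(ByVal data As Byte()) As Integer
        ' Arguments: bigEndian:=False, byteOrderMark:=False, throwOnInvalidBytes:=True
        Dim strict As New UnicodeEncoding(False, False, True)
        Try
            strict.GetString(data)
            Return -1
        Catch ex As DecoderFallbackException
            ' Index is the position in the input buffer reported by the decoder.
            Return ex.Index
        End Try
    End Function
End Module
```

Calling FirstInvalidIndex(System.IO.File.ReadAllBytes(path)) on the problem file should give a position to jump to in a hex editor. A result of -1 would mean the bytes are valid UTF-16 and the problem lies elsewhere.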
 
Well, I am absolutely sure that the file is OK. And if it were not, why
would only some characters fail to be read? Or is there perhaps a way to
read files independently of the encoding?
 
kenny said:
Well, I am absolutely sure that the file is OK. And if it were not, why
would only some characters fail to be read? Or is there perhaps a way to
read files independently of the encoding?

Have a look at the file with a hex editor. Every instance of the German
Eszett (ß, U+00DF) should be represented by the same two bytes: DF 00 in
little-endian UTF-16, which is what your UnicodeEncoding(False, False)
expects, or 00 DF in big-endian. If the file is corrupted, or was not
written as UTF-16 throughout, some instances will look different.
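
If no hex editor is at hand, the same check can be roughed out in code. This sketch assumes little-endian UTF-16, so it looks for the byte pair DF 00 at even offsets. The module and function names are invented for this example.

```vb
Imports System
Imports System.Collections.Generic

Module EszettScan
    ' Hypothetical helper: byte offsets of every ß (DF 00) that sits on an
    ' even offset, i.e. on a UTF-16 code unit boundary.
    Function FindEszettOffsets(ByVal data As Byte()) As List(Of Integer)
        Dim offsets As New List(Of Integer)
        For i As Integer = 0 To data.Length - 2 Step 2
            If data(i) = &HDF AndAlso data(i + 1) = 0 Then offsets.Add(i)
        Next
        Return offsets
    End Function

    ' Hypothetical helper: the bytes around an offset as hex, e.g. "41-00-DF-00".
    Function HexWindow(ByVal data As Byte(), ByVal offset As Integer, ByVal radius As Integer) As String
        Dim first As Integer = Math.Max(0, offset - radius)
        Dim endExclusive As Integer = Math.Min(data.Length, offset + 2 + radius)
        Return BitConverter.ToString(data, first, endExclusive - first)
    End Function
End Module
```

If the number of offsets found is lower than the number of ß characters you expect, the missing ones are probably stored as different bytes, and HexWindow on the neighbouring text may show what is actually there.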

Of course, the problem might be somewhere else. Instead of using
ReadAllText, you could try reading the file a line at a time, and
seeing which line causes the problem.
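
A line-at-a-time check might look like the sketch below. It rests on an assumption worth verifying on your system: that the non-throwing UnicodeEncoding substitutes U+FFFD (the Unicode replacement character) for byte sequences it cannot decode. The module and function names are illustrative only.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module LineScan
    ' Hypothetical helper: 1-based numbers of the lines that contain U+FFFD,
    ' which is assumed here to mark bytes the decoder could not interpret.
    Function LinesWithReplacementChar(ByVal reader As TextReader) As List(Of Integer)
        Dim hits As New List(Of Integer)
        Dim lineNumber As Integer = 0
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            lineNumber += 1
            If line.IndexOf(ChrW(&HFFFD)) >= 0 Then hits.Add(lineNumber)
            line = reader.ReadLine()
        End While
        Return hits
    End Function
End Module
```

To run it against the file, pass in a New StreamReader(path, New System.Text.UnicodeEncoding(False, False)), ideally inside a Using block so the file is closed afterwards.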

If even that doesn't help, maybe read the file into a byte array using
a BinaryReader, and then try decoding it one character at a time...
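
As a rough sketch of that last idea, the function below walks a byte array two bytes at a time and records where the data cannot be valid little-endian UTF-16: a surrogate code unit without its partner, or a single byte left over at the end. The names are hypothetical, and the bytes can come from BinaryReader.ReadBytes or File.ReadAllBytes.

```vb
Imports System
Imports System.Collections.Generic

Module CodeUnitScan
    ' Hypothetical helper: byte offsets at which the data is not valid UTF-16LE.
    Function FindInvalidOffsets(ByVal data As Byte()) As List(Of Integer)
        Dim problems As New List(Of Integer)
        Dim i As Integer = 0
        While i + 1 < data.Length
            ' Little-endian: low byte first, high byte second.
            Dim unit As Integer = CInt(data(i)) Or (CInt(data(i + 1)) << 8)
            If unit >= &HD800 AndAlso unit <= &HDBFF Then
                ' High surrogate: valid only if a low surrogate follows.
                Dim hasLow As Boolean = False
                If i + 3 < data.Length Then
                    Dim nextUnit As Integer = CInt(data(i + 2)) Or (CInt(data(i + 3)) << 8)
                    hasLow = nextUnit >= &HDC00 AndAlso nextUnit <= &HDFFF
                End If
                If hasLow Then
                    i += 4 ' skip the complete surrogate pair
                    Continue While
                End If
                problems.Add(i)
            ElseIf unit >= &HDC00 AndAlso unit <= &HDFFF Then
                ' Low surrogate with no preceding high surrogate.
                problems.Add(i)
            End If
            i += 2
        End While
        ' An odd byte count leaves one byte that cannot form a code unit.
        If data.Length Mod 2 = 1 Then problems.Add(data.Length - 1)
        Return problems
    End Function
End Module
```

An empty result means every code unit is well-formed, in which case the encoding assumption itself (UTF-16 versus some other encoding) is the next thing to question.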
 