StreamReaders and encoding

MattB

I have the following function I use in my application quite a bit (I
missed the VFP one and decided to make my own):

Public Shared Function File2String(ByVal strFile As String) As String
    'Get a StreamReader class that can be used to read the file
    Dim objStreamReader As System.IO.StreamReader

    Try
        'Open the file for reading
        objStreamReader = System.IO.File.OpenText(strFile)
    Catch ex As Exception
        'Return Nothing if the file cannot be opened
        Return Nothing
    End Try

    Dim str As String = objStreamReader.ReadToEnd()
    objStreamReader.Close()
    Return str
End Function

It's been working well, but I just found out it doesn't correctly read a
text file with a different encoding. I have a text file with some French
accents in it, like "acheté". My function would return "achet", dropping
the é completely. I'm not sure how to address this and it's very
important to make it continue to work as it has with the plain English
files I usually use it with. Anyone know how to address this? Thanks!

Matt
 
MattB said:
[...]

Same answer as always: Use the correct encoding. File.OpenText() always
uses a UTF-8 StreamReader implicitly. Create your own StreamReader
instance instead and specify the desired encoding.

using (StreamReader reader =
    new StreamReader(@"C:\Foo\Bar.txt", Encoding.Default))
{
    // ...
}

Note that Encoding.Default is your OS default encoding (most likely
Windows-1252) and represents the best guess if UTF-8 doesn't apply.
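
A minimal sketch of how that could be wrapped into a reusable helper, in C#
like the snippet above. The class name FileText and the fallbackEncoding
parameter are made up for illustration; the StreamReader constructor taking
a detectEncodingFromByteOrderMarks flag is part of the framework. With the
flag set to true, a file that starts with a byte order mark is decoded
according to that mark, and the fallback encoding is only used when there
is none.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileText
{
    // Hypothetical helper: reads a whole file as text.
    // Returns null if the file cannot be opened or read,
    // similar to the original File2String.
    public static string File2String(string path, Encoding fallbackEncoding)
    {
        try
        {
            // 'true' lets a byte order mark (if present) override the fallback.
            using (StreamReader reader = new StreamReader(path, fallbackEncoding, true))
            {
                return reader.ReadToEnd();
            }
        }
        catch (IOException)
        {
            // Covers missing files and directories as well as read errors.
            return null;
        }
    }
}
```

A call might look like FileText.File2String(path, Encoding.Default). Plain
English files made only of ASCII characters decode to the same text under
UTF-8, ISO-8859-1 and Windows-1252, so they should keep working either way.
On newer .NET versions Encoding.Default is UTF-8, so there it may be safer
to name the encoding explicitly, for example
Encoding.GetEncoding("iso-8859-1").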

Cheers,
 
Joerg said:
[...]

Thanks for the reply!

Do you know if I can detect the encoding of the text file somehow, so
this app will work correctly with differently encoded text files?

Got any links or examples?

Thanks again!

Matt
 
Joerg said:
[...]

OK, so I tried creating the StreamReader as you said, and I tried every
encoding I could and nothing could read my text file with French
characters correctly. For example, the word "acheté" comes across as
"achet".
It's entirely possible (even likely) I'm taking the wrong approach.
Can anyone with US English Windows put the word "acheté" in a text file
and have the last character come through?

Maybe I'll try reading it as binary next...

Any suggestions appreciated!

Matt
 
MattB wrote:

[...]
Maybe I'll try reading it as binary next...

There's no such thing as binary text. There are only bytes, which after
decoding them to characters, may become meaningful text.
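
To illustrate, here is a small sketch that decodes the same six bytes with
two different encodings. The bytes are an assumption about what an editor
using Windows-1252 or ISO-8859-1 would write for "acheté"; the class and
method names are made up for the example.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // "achet" followed by 0xE9, which is 'é' in ISO-8859-1 and Windows-1252.
    public static readonly byte[] Sample = { 0x61, 0x63, 0x68, 0x65, 0x74, 0xE9 };

    // Hypothetical helper: decodes the sample bytes with the named encoding.
    public static string Decode(string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(Sample);
    }
}
```

DecodeDemo.Decode("iso-8859-1") gives back "acheté". DecodeDemo.Decode("utf-8")
cannot, because a lone 0xE9 byte is not a valid UTF-8 sequence, so the
decoder has to replace or discard it. A runtime that discards it would
produce exactly the "achet" described above.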

The only way to solve this problem is to understand which character
encoding is being used. Can you load the file in a hex editor and try
to find out what bytes are used to represent the 'é'?
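
If no hex editor is at hand, a few lines of code can do the same job. This
is only a sketch; HexPeek, ToHex and DumpFile are names invented for the
example, while File.ReadAllBytes is a framework method in .NET 2.0 and
later.

```csharp
using System;
using System.IO;
using System.Text;

public static class HexPeek
{
    // Formats bytes as space-separated, two-digit uppercase hex values.
    public static string ToHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    // Reads the whole file as raw bytes (no decoding) and returns them as hex.
    public static string DumpFile(string path)
    {
        return ToHex(File.ReadAllBytes(path));
    }
}
```

Console.WriteLine(HexPeek.DumpFile(path)) on a small test file containing
only "acheté" should make the last byte or bytes easy to spot. For
comparison, 'é' is E9 in Windows-1252 and ISO-8859-1, C3 A9 in UTF-8,
E9 00 in UTF-16 little endian, and 82 in the DOS code pages 437 and 850.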

Cheers,
 