ASCII vs Unicode

Jeff

Hi -

I'm setting up a StreamReader in a VB.NET app to read a text file and
display its contents in a multiline textbox.

If I set it up with System.Text.Encoding.Unicode, it reads a Unicode file
just fine. If I set it up as ASCII, it reads a non-Unicode text file. But
I don't know the file format in advance.

How can my app determine whether to use Unicode encoding before I read the
file?

- Jeff
 
Peter Huang

Hi Jeff,

Based on my test, we do not need to specify the encoding when we read a
file into a string; the .NET Framework handles it for us. For example:
Imports System.IO

Module Module1
    Private Const FILE_NAME As String = "c:\unicode.txt"
    Private Const FILE_NAME1 As String = "c:\ascii.txt"

    Public Sub Main()
        If Not File.Exists(FILE_NAME) Then
            Console.WriteLine("{0} does not exist.", FILE_NAME)
            Return
        End If

        ' File.OpenText reads UTF-8 by default and switches encoding
        ' if the file starts with a byte order mark.
        Dim sr As StreamReader = File.OpenText(FILE_NAME)
        Dim input As String
        input = sr.ReadToEnd()
        Console.WriteLine(input)
        sr.Close()

        sr = File.OpenText(FILE_NAME1)
        input = sr.ReadToEnd()
        Console.WriteLine(input)
        sr.Close()
    End Sub
End Module
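A note on why this works: File.OpenText behaves like New StreamReader(path), which starts out as UTF-8 and looks for a byte order mark before decoding. StreamReader.CurrentEncoding reports what the reader settled on, but only after the first read. The following is a minimal sketch to check that on your own files; the helper name ReadWithDetectedEncoding is hypothetical, not part of the framework.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module EncodingProbe
    ' Hypothetical helper: reads the whole file the same way File.OpenText does
    ' (UTF-8 by default, byte order mark detection switched on) and reports
    ' which encoding the reader actually used.
    Public Function ReadWithDetectedEncoding(ByVal fileName As String, _
                                             ByRef used As Encoding) As String
        Dim sr As StreamReader = File.OpenText(fileName)
        Try
            Dim text As String = sr.ReadToEnd()
            ' CurrentEncoding is only reliable after the first read.
            used = sr.CurrentEncoding
            Return text
        Finally
            sr.Close()
        End Try
    End Function
End Module
```

For a file that starts with FF FE the reported code page should be 1200 (UTF-16 little-endian); for a plain ASCII file it stays 65001 (UTF-8).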

Best regards,

Peter Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Jeff

Thanks for responding, Peter -

I wish my results were the same as yours. I've attached a couple of files,
ASCII.txt and Unicode.txt. My code is below, FYI.

I've got a form with a textbox on it, and a couple of radio buttons (encode
or not). To make it simple, I also included a couple of buttons, one for
each file. I display the results of the function below in the textbox.

With encoding, the Unicode file displays fine, and the ASCII file is a
string of unreadable characters. With no encoding, the ASCII file displays
fine, and the Unicode file only displays the first character (=).

I've tried replacing the If-Else-End If block of the code with "rdr =
File.OpenText(rFile)", but my results are the same as the no-encoding
results just described.

Note that the Unicode file that I'm trying to read is a log file created by
Microsoft's SQL Server Desktop Engine Setup.exe.

I'd appreciate additional help to solve this problem.

- Jeff


My code:

' Requires Imports System.IO at the top of the form's code file.
Public Function ReadFile(ByVal rFile As String) As String
    Dim fi As FileInfo
    Dim rdr As StreamReader

    Try

        ReadFile = ""

        fi = New FileInfo(rFile)
        If Not fi.Exists Then
            MessageBox.Show("File Not Found." & ControlChars.CrLf & rFile)
            Exit Function
        End If

        fi = Nothing
        ' Encoding.Unicode is UTF-16 little-endian. With no encoding argument,
        ' StreamReader assumes UTF-8 unless the file starts with a byte order mark.
        If frmMain.optUnicode.Checked Then
            rdr = New StreamReader(rFile, System.Text.Encoding.Unicode)
        Else
            rdr = New StreamReader(rFile)
        End If

        ReadFile = rdr.ReadToEnd()

        rdr.Close()
        rdr = Nothing

    Catch ex As Exception
        MessageBox.Show(ex.ToString)

    End Try

End Function
 
Peter Huang

Hi Jeff,

FF FE is the Unicode byte order mark (BOM). At the start of a file, it tells
the reader that the text is UTF-16 with little-endian byte order.
If you use code similar to the following to write a string into a file with
Unicode encoding, you will find the FF FE leading bytes:
' A new file written this way starts with the bytes FF FE.
Dim wr As New StreamWriter("C:\testuni.txt", True, _
    System.Text.Encoding.Unicode)
wr.Write(input)
wr.Close()
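To see the leading FF FE without a hex editor, a small sketch like the one below writes a string with Encoding.Unicode to a new file and dumps the first bytes as hex. The helper names FirstBytesHex and WriteUnicode are made up for illustration. One caveat on the snippet above: because it passes True for append, a file that already has content will not get the mark written again.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module BomDump
    ' Hypothetical helper: returns the first `count` bytes of a file as hex,
    ' separated by spaces.
    Public Function FirstBytesHex(ByVal fileName As String, ByVal count As Integer) As String
        Dim data As Byte() = File.ReadAllBytes(fileName)
        Dim n As Integer = Math.Min(count, data.Length)
        Dim parts(n - 1) As String
        For i As Integer = 0 To n - 1
            parts(i) = data(i).ToString("X2")
        Next
        Return String.Join(" ", parts)
    End Function

    ' Hypothetical helper: writes text to a fresh file (append = False) as
    ' UTF-16 little-endian, the same encoding the StreamWriter above uses.
    Public Sub WriteUnicode(ByVal fileName As String, ByVal text As String)
        Dim wr As New StreamWriter(fileName, False, Encoding.Unicode)
        wr.Write(text)
        wr.Close()
    End Sub
End Module
```

Writing "=" this way and calling FirstBytesHex(path, 4) gives FF FE 3D 00: the two-byte mark followed by the single UTF-16 character.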

If you use notepad.exe to save text as a Unicode file in the Save As
dialog, you will find the same FF FE at the start of the file.

Now, the difference between my Unicode file and yours is that the
unicode.txt you provided does not have the FF FE leading bytes, which causes
the problem. Without them, StreamReader cannot detect that the file is
Unicode, so it decodes the bytes with its default encoding instead.
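The detection StreamReader performs can be approximated by hand: read the first few bytes and compare them against the known marks. This is a minimal sketch, not StreamReader's actual implementation; the name EncodingFromBom is hypothetical, and UTF-32 marks are deliberately not checked.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module BomSniffer
    ' Hypothetical helper: returns the encoding announced by the file's
    ' byte order mark, or Nothing if the file has no recognised mark.
    ' UTF-32 marks are not handled in this sketch.
    Public Function EncodingFromBom(ByVal fileName As String) As Encoding
        Dim head(2) As Byte
        Dim count As Integer
        Dim fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
        Try
            count = fs.Read(head, 0, 3)
        Finally
            fs.Close()
        End Try

        If count >= 3 AndAlso head(0) = &HEF AndAlso head(1) = &HBB AndAlso head(2) = &HBF Then
            Return Encoding.UTF8              ' EF BB BF
        End If
        If count >= 2 AndAlso head(0) = &HFF AndAlso head(1) = &HFE Then
            Return Encoding.Unicode           ' FF FE: UTF-16 little-endian
        End If
        If count >= 2 AndAlso head(0) = &HFE AndAlso head(1) = &HFF Then
            Return Encoding.BigEndianUnicode  ' FE FF: UTF-16 big-endian
        End If
        Return Nothing
    End Function
End Module
```

A Nothing result is exactly the ambiguous case in this thread: the bytes alone do not say which encoding was used.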

Actually, if you use StreamReader to read your unicode.txt into a string
and then print the string with Console.WriteLine, you will find that the
text is displayed correctly except for a space between every two
characters. That "space" is the 00 byte of each two-byte character, decoded
as a separate NUL character.
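The NUL characters can be made visible with a short sketch: encode text as UTF-16 little-endian (GetBytes adds no byte order mark), then decode those bytes as UTF-8, which is what New StreamReader(path) falls back to when it finds no mark. A TextBox most likely stops displaying at the first NUL, which would match Jeff's symptom of seeing only "=". The names MisreadAsUtf8 and ShowNuls are hypothetical.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Public Module NulDemo
    ' Hypothetical helper: simulates reading a BOM-less UTF-16LE file with
    ' StreamReader's default UTF-8 decoding.
    Public Function MisreadAsUtf8(ByVal text As String) As String
        Dim bytes As Byte() = Encoding.Unicode.GetBytes(text)   ' no BOM here
        Return Encoding.UTF8.GetString(bytes)
    End Function

    ' Replaces each NUL character with "." so it shows up when printed.
    Public Function ShowNuls(ByVal s As String) As String
        Return s.Replace(ControlChars.NullChar, "."c)
    End Function
End Module
```

For example, ShowNuls(MisreadAsUtf8("=A")) returns "=.A.": every original character is followed by a NUL.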

So for now, I think you could try adding FF FE at the very beginning of the
Unicode file when you generate it.
You can use a hex editor to inspect unicode.txt; for example, UltraEdit is a
good hex editor.


Best regards,

Peter Huang
Microsoft Online Partner Support

 
Jeff

Thanks, Peter -

But, as I mentioned in my post, I am not creating the Unicode file.
(Microsoft creates it as a log file written by their setup.exe for Microsoft
SQL Server Desktop Engine.) And the Unicode file that MS creates, while it
doesn't have the FF FE at its start, is quite readable with
StreamReader(rFile, System.Text.Encoding.Unicode).

I'm not looking for a way to create a Unicode file. I'm looking for the
best way to display the contents of a text file in a multiline textbox when
I don't know in advance whether it's ASCII or Unicode.

Please help.

- Jeff
 
Peter Huang

Hi Jeff,

Based on my test, the MSDE setup.exe tool generates the log file with
the FF FE mark.
I ran the following command line:
setup /l*v C:\msde.log

After that, I get the msde.log file, and if I open it in a hex editor I
find the FF FE mark.

If the file does not have the mark, we cannot reliably identify its
encoding. For example, the string
=
is stored in a Unicode (UTF-16 little-endian) file as the bytes
FF FE 3D 00

From the FF FE, StreamReader knows that the file is Unicode, and it
decodes 3D 00 as one Unicode character.

But if the file contains only
3D 00
then we can decode it in two ways, as ASCII or as Unicode.
Decoded as Unicode, 3D 00 is one character, "=".
Decoded as ASCII, 3D 00 is two characters: "=" (3D) and the character
with ASCII code 00 (NUL).
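Since a BOM-less file cannot be identified with certainty, one common workaround is a heuristic guess, which is an assumption rather than anything StreamReader does on its own: ASCII text practically never contains 00 bytes, while mostly-English UTF-16LE text has a 00 at nearly every odd offset. The sketch below guesses from that and still lets StreamReader override the guess if a byte order mark is present. LooksLikeUtf16LE and ReadTextFile are hypothetical names, and the guess will be wrong for UTF-16 text that is mostly non-Latin (for example Chinese).

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module TextFileReader
    ' Heuristic (an assumption, not a guarantee): treat the bytes as
    ' UTF-16 little-endian when more than half of the odd-offset bytes are 00.
    Public Function LooksLikeUtf16LE(ByVal bytes As Byte()) As Boolean
        Dim pairs As Integer = bytes.Length \ 2
        If pairs = 0 Then Return False
        Dim oddZeros As Integer = 0
        For i As Integer = 1 To pairs * 2 - 1 Step 2
            If bytes(i) = 0 Then oddZeros += 1
        Next
        Return oddZeros * 2 > pairs
    End Function

    ' Hypothetical replacement for ReadFile: guesses an encoding, then lets
    ' StreamReader switch to whatever a byte order mark announces, if any.
    Public Function ReadTextFile(ByVal fileName As String) As String
        Dim bytes As Byte() = File.ReadAllBytes(fileName)
        Dim guess As Encoding
        If LooksLikeUtf16LE(bytes) Then
            guess = Encoding.Unicode
        Else
            guess = Encoding.UTF8   ' File.OpenText's default; reads ASCII fine
        End If
        Using sr As New StreamReader(New MemoryStream(bytes), guess, True)
            Return sr.ReadToEnd()
        End Using
    End Function
End Module
```

In a form like Jeff's, this would replace the radio-button branch, e.g. txtContents.Text = ReadTextFile(rFile), where txtContents is a hypothetical textbox name.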

Maybe there is a problem with the MSDE setup program. For that issue, I
think the SQL Server newsgroups would be a better place to ask:
microsoft.public.sqlserver.msde
or
microsoft.public.sqlserver.setup

Best regards,

Peter Huang
Microsoft Online Partner Support

 
