Reading UTF data from C#

Sivaraj G via .NET 247

We created a Unicode file using a Java application. It uses methods like writeUTF() and writeInt() of the java.io.DataOutputStream class to write the content of the file. We are able to read the data using the java.io.DataInputStream.readUTF() method, and it works well in the Java environment.

When we tried to read the same file in a .NET environment, we received junk rather than the original content.

We tried a sample program in C# and also used the System.Text.UTF8Encoding(true) option. Any help is highly appreciated.
 

writeUTF first writes a pair of bytes to give the number of bytes to
follow. Those aren't UTF-8 characters, but .NET would be expecting them
to be.
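
To make the layout concrete, here is a minimal Java sketch that dumps the bytes produced by writeInt followed by writeUTF, and then decodes the text field by hand the way a reader on another platform would have to. The class and method names (WriteUtfLayout, record, decodeUtfField) are made up for illustration. The manual decoding assumes the text contains no U+0000 and no characters outside the Basic Multilingual Plane, because writeUTF uses Java's "modified UTF-8", which differs from standard UTF-8 in those cases.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriteUtfLayout {

    // Returns the exact bytes that writeInt(number) followed by
    // writeUTF(text) produce.
    static byte[] record(int number, String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(number); // 4 bytes, most significant byte first
            out.writeUTF(text);   // 2-byte length prefix, then the encoded text
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decodes one writeUTF field by hand, starting at offset: read the
    // big-endian 16-bit byte count, then decode that many bytes as UTF-8.
    // Assumption: no U+0000 and no characters outside the BMP in the text.
    static String decodeUtfField(byte[] data, int offset) {
        int length = ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
        return new String(data, offset + 2, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = record(1, "h\u00E9llo");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        // prints: 00 00 00 01 00 06 68 C3 A9 6C 6C 6F
        System.out.println(hex.toString().trim());
        // The text field starts after the 4-byte int.
        System.out.println(decodeUtfField(bytes, 4));
    }
}
```

The 00 06 pair is the byte count of the encoded text, not part of the text itself. The int is also stored most significant byte first, whereas .NET's BinaryReader.ReadInt32 expects the least significant byte first, so a .NET reader would need to reverse those four bytes as well.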

Effectively, writeUTF and readUTF are only designed to work with
DataInputStream/DataOutputStream. You can probably read that pair of
bytes before reading the rest, but it's not ideal. If you're just
creating a text file in Java, I suggest you use an OutputStreamWriter
wrapped around a FileOutputStream, and specify UTF-8 as the encoding. If
you're writing a file with mixed binary and text data, you need to make
sure you know *exactly* what you're writing, and then read it very
carefully from the other platform.
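
For the plain text file case, the OutputStreamWriter suggestion might look roughly like the Java sketch below. The class name PlainUtf8Writer, the helper writeText and the file name sample.txt are placeholders, not anything from the original application. The helper takes any OutputStream so that the same code can be pointed at a file or at an in-memory buffer.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class PlainUtf8Writer {

    // Writes text to the given stream as plain UTF-8 and closes it.
    // Nothing but the encoded characters is written: no length prefix
    // and no byte order mark.
    static void writeText(OutputStream target, String text) {
        try (Writer writer = new OutputStreamWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "sample.txt" is a placeholder file name.
        writeText(new FileOutputStream("sample.txt"), "h\u00E9llo\r\n");
    }
}
```

A file written this way contains only UTF-8 text, so on the .NET side a StreamReader constructed with System.Text.Encoding.UTF8 should be able to read it directly.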
 
Hello, Sivaraj!

Here is a block of code that might help (or give some ideas),
if I understood you correctly:
string
s=string.Empty,
path=@"c:\file.txt", //... or whatever file you'll be using
NewLine=string.Empty; //this variable doesn't have to be initialized, but it is a good habit

System.IO.StreamReader MyStreamReader=
new System.IO.StreamReader(path, System.Text.Encoding.UTF8);

while((NewLine=MyStreamReader.ReadLine())!=null)
s+=NewLine+"\r\n";

MyStreamReader.Close();

//Now do something with 's'

Regards

You wrote on Fri, 06 Aug 2004 06:51:09 -0700:

SGN> When we tried to read the same file in a .NET environment, we
SGN> received junk rather than the original content.

SGN> We tried a sample program in C# and also used the
SGN> System.Text.UTF8Encoding(true) option. Any help is highly
SGN> appreciated.

SGN> -----------------------
SGN> Posted by a user from .NET 247 (http://www.dotnet247.com/)


With best regards, Nurchi BECHED.
 
Nurchi BECHED said:
Here is a block of code that might help (or give some ideas),
if I understood you correctly:
string
s=string.Empty,
path=@"c:\file.txt", //... or whatever file you'll be using
NewLine=string.Empty; //this variable doesn't have to be initialized, but it is a good habit

I disagree on that point - if an assignment isn't required because
there'll be another assignment before the first "read" of the variable,
I'd rather the extraneous assignment isn't present in the first place.
It implies that the assigned value has some purpose, when it doesn't.

System.IO.StreamReader MyStreamReader=
new System.IO.StreamReader(path, System.Text.Encoding.UTF8);

while((NewLine=MyStreamReader.ReadLine())!=null)
s+=NewLine+"\r\n";

That's a horrible way of building up a string: each += copies
everything built so far. Use StringBuilder instead.

In this case though, just MyStreamReader.ReadToEnd() would be a better
solution still.

MyStreamReader.Close();

You should use a using statement instead - that way if an exception is
thrown, the StreamReader still gets disposed.
 