Problem reading Unicode from a tab-delimited file

Hoop

Hi,
I have a spreadsheet that contains multiple languages. Per project
requirements I have saved this spreadsheet in a tab-delimited format
using Excel 2007. I have been able to get pretty much all of the
characters correct when reading the file. I will also have to parse
Japanese from this, which I have not been able to do. I cannot read
the 3-byte characters from the file correctly for some reason. I have
an example here showing 2 characters for simplicity.

ë ’

I get the correct value for ë, but not for ’. The value of ’ should be
2019 (hex), but instead I am getting something that just displays as a
box in the debugger, some value around 65k.
Note that in the code snippet below, if I pass the hard-coded
unicodeString to the
foreach (Byte b in encodedBytes)
loop, it works perfectly and outputs the correct values for the
characters. I have a tab-delimited file that contains the same
characters. When I open that file and read from it, the ë is correct,
but the ’ is not. The same goes for the Japanese characters I am trying
to read. It almost seems like the issue is in the file open() or the
ReadLine().
Any help and code examples would be appreciated.
Thanks
Jeff




String unicodeString = " ’ ë ’ ";
// Create a UTF-8 encoding.
UTF8Encoding utf8 = new UTF8Encoding();
// determine whether fileName is a file
if (File.Exists(fileName))
{
    // Open the file to read from.
    using (StreamReader sr = File.OpenText(fileName))
    {
        while (sr.ReadLine() != null)
        {
            // Encode the string. Note: this encodes the hard-coded
            // unicodeString, not the line just read from the file.
            Byte[] encodedBytes = utf8.GetBytes(unicodeString);
            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes)
            {
                if (b != 9) // skip tab characters
                {
                    Console.Write("[{0}]", b);
                }
            }
        }
    }
} // end if
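
As a cross-check on what the loop above should print, here is a minimal sketch that shows the code point and the UTF-8 bytes of the two sample characters. The names SampleBytes and Describe are made up for illustration and do not come from the thread.

```csharp
using System;
using System.Linq;
using System.Text;

public static class SampleBytes
{
    // Hypothetical helper: formats a character's code point and its UTF-8 bytes in hex.
    public static string Describe(char c)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(new[] { c });
        string hex = string.Join(" ", utf8.Select(b => b.ToString("X2")));
        return string.Format("U+{0:X4} -> {1}", (int)c, hex);
    }

    public static void Main()
    {
        Console.WriteLine(Describe('\u00EB')); // ë: two bytes in UTF-8
        Console.WriteLine(Describe('\u2019')); // ’: three bytes in UTF-8
    }
}
```

If the bytes printed for a line read from the file differ from these, the file was most likely not decoded with the encoding it was saved in.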
 
Martin Honnen

Hoop said:
I have a spreadsheet that contains multiple languages. Per project
requirements I have saved this spreadsheet in a tab-delimited format
using Excel 2007. I have been able to get pretty much all of the
characters correct when reading the file. I will also have to parse
Japanese from this, which I have not been able to do.

Which encoding do you use to save the file with Excel?
If that is UTF-16, then File.OpenText does not work, as that uses
UTF-8. So you will need to use, e.g.,
using (StreamReader sr = new StreamReader("file.csv", Encoding.Unicode))
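
A slightly fuller sketch of that suggestion, assuming the file really is UTF-16 and tab-delimited. The names TabFileDemo and ReadRows are hypothetical, and the sample file is written by the sketch itself so that it is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TabFileDemo
{
    // Hypothetical helper: reads every line of a UTF-16 file and splits it on tabs.
    public static List<string[]> ReadRows(string path)
    {
        var rows = new List<string[]>();
        using (var reader = new StreamReader(path, Encoding.Unicode))
        {
            string line;
            // ReadLine is called exactly once per iteration, so no line is skipped.
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(line.Split('\t'));
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Write a small UTF-16 sample file (assumption: stands in for the Excel export).
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "\u00EB\t\u2019\r\n", Encoding.Unicode);

        foreach (string[] row in ReadRows(path))
        {
            foreach (string cell in row)
            {
                // The sample cells are non-empty, so cell[0] is safe here.
                Console.Write("[U+{0:X4}]", (int)cell[0]);
            }
            Console.WriteLine();
        }
        File.Delete(path);
    }
}
```

Printing code points instead of the characters themselves avoids confusing a decoding problem with a console font problem.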
 
Hoop

Hi Martin,
I am not sure what encoding the file is saved in. In Excel I chose
Text (Tab delimited).
I will try your suggestion, though.
Thanks
Jeff
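
When the encoding of a saved file is unknown, one way to narrow it down is to look at the first few bytes for a byte order mark (BOM). The following is only a sketch: BomCheck and DescribeBom are hypothetical names, UTF-32 is ignored, and a file without a BOM cannot be identified this way.

```csharp
using System;
using System.IO;

public static class BomCheck
{
    // Hypothetical helper: names the encoding implied by a leading byte order mark.
    // UTF-32 BOMs are not handled in this sketch.
    public static string DescribeBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8 (with BOM)";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 little-endian";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 big-endian";
        return "no BOM (could be an ANSI code page or UTF-8 without BOM)";
    }

    public static void Main(string[] args)
    {
        // args[0] is the path of the file to inspect.
        byte[] head = new byte[3];
        int read;
        using (FileStream fs = File.OpenRead(args[0]))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read); // keep only the bytes actually read
        Console.WriteLine(DescribeBom(head));
    }
}
```

A result of "UTF-16 little-endian" would match Encoding.Unicode in .NET; a "no BOM" result means the encoding has to be determined some other way.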
 
Hoop

Hi Martin,
I added Encoding.Unicode and it did not make any difference; it is
still not reading in ’ ë ’ correctly.
As a test, I made a file with Notepad and saved it with
Encoding: Unicode. I then opened that file with the code below and it
worked.
So I am thinking either Excel 2007 does not save a tab-delimited file
as Unicode, or I am saving it wrong.
Jeff

public Language(string fileName)
{
    // Create a UTF-8 encoding, used only to dump the bytes of each line.
    UTF8Encoding utf8 = new UTF8Encoding();
    // determine whether fileName is a file
    if (File.Exists(fileName))
    {
        // Open the file as UTF-16 (Encoding.Unicode). The reader is created
        // directly in the using statement; reassigning it from
        // File.OpenText(fileName) would discard the requested encoding.
        using (StreamReader stream = new StreamReader(fileName, Encoding.Unicode))
        {
            String unicodeString;
            // Call ReadLine once per iteration so that no line is skipped.
            while ((unicodeString = stream.ReadLine()) != null)
            {
                // Encode the string.
                Byte[] encodedBytes = utf8.GetBytes(unicodeString);
                Console.WriteLine();
                Console.WriteLine("Encoded bytes:");
                foreach (Byte b in encodedBytes)
                {
                    if (b != 9) // skip tab characters
                    {
                        Console.Write("[{0}]", b);
                    }
                }
            }
        }
    } // end if file exists
} // end constructor
 
Hoop

Hi Martin,
Got it figured out. In Excel you can save the file as Unicode text, and
that format is also tab-delimited. So adding Encoding.Unicode and
changing the way the file is saved seems to have solved it. Tomorrow I
will see how the Japanese looks.
Thanks
Jeff
 
