Bug or Feature in BinaryReader.PeekChar()?

Guest

Hi,

my scenario is as follows:
(1) I write data to file using a BinaryWriter (UTF8 encoding).
(2) I read data from that file using a BinaryReader (UTF8 encoding).

When I use BinaryReader.PeekChar() to check whether more data can be read,
certain characters are _sometimes_ lost in subsequent calls to ReadString()!

It took me some time to detect that PeekChar() changes the internal state of
the BinaryReader's UTF8Decoder. It seems that whenever PeekChar() gets a byte
that it interprets as the first byte of a multi-byte UTF8 sequence, the
UTF8Decoder is left in a state that makes it skip the next multi-byte UTF8
sequence.

The code below will reproduce the loss of data. Just paste it into a Console
application and run it in debug mode.

Any comments will be appreciated,

Bernd

********** C# code: **************

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace UTF8DecoderTest
{
    /// <summary>
    /// Summary description for UTF8DecoderTest.
    /// </summary>
    class UTF8DecoderTest
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            // 'special' character occupying more than 1 byte in UTF8 encoding
            string testString = string.Format("<{0}>", (char)322);
            string tempFile = Path.GetTempFileName();

            // Write test data to temporary file:
            BinaryWriter writer = new BinaryWriter(
                new FileStream(tempFile, FileMode.Create), Encoding.UTF8);
            for (int i = 235; i < 255; i++)
            {
                writer.Write(i);
                writer.Write(testString);
            }
            writer.Close();

            // Read back data using a FOR loop, expect no problems
            BinaryReader reader = new BinaryReader(
                new FileStream(tempFile, FileMode.Open), Encoding.UTF8);
            for (int i = 235; i < 255; i++)
            {
                readAndCheckData(reader, testString);
            }
            reader.Close();

            // Read back data using PeekChar(), expect problems
            BinaryReader readerUsingPeekChar = new BinaryReader(
                new FileStream(tempFile, FileMode.Open), Encoding.UTF8);
            while (readerUsingPeekChar.PeekChar() >= 0)
            {
                // PeekChar changes the internal state of readerUsingPeekChar's
                // UTF8Decoder!
                // Expect incorrect data being read when the integer peeked
                // into is 240, 241,...
                readAndCheckData(readerUsingPeekChar, testString);
            }
            readerUsingPeekChar.Close();

            File.Delete(tempFile);
            Console.WriteLine("Hit <return> to exit");
            Console.ReadLine();
        }

        static private void readAndCheckData(BinaryReader reader,
            string expectedString)
        {
            int intAsReadFromFile = reader.ReadInt32();
            string stringAsReadFromFile = reader.ReadString();
            Trace.WriteLine(string.Format("{0}, {1} - {2}",
                intAsReadFromFile, stringAsReadFromFile,
                stringAsReadFromFile == expectedString ? "ok" : "ooops"));
            Debug.Assert(stringAsReadFromFile == expectedString,
                string.Format("Wrote {0}, but read {1} when integer read was {2}.",
                    expectedString, stringAsReadFromFile, intAsReadFromFile));
        }
    }
}
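
A possible way to sidestep the problem, sketched here under the assumption that the underlying stream is seekable (a FileStream is): test for remaining data by comparing BaseStream.Position with BaseStream.Length, so the decoder is never involved in the end-of-data check. The names PeekFreeReadDemo and HasMoreData are made up for this illustration and are not part of the framework.

```csharp
using System;
using System.IO;
using System.Text;

public static class PeekFreeReadDemo
{
    // Hypothetical helper: true while unread bytes remain.
    // Assumes a seekable stream; it never touches the reader's decoder.
    public static bool HasMoreData(BinaryReader reader)
    {
        Stream stream = reader.BaseStream;
        return stream.Position < stream.Length;
    }

    public static void Main()
    {
        // Same test data as in the repro above.
        string testString = string.Format("<{0}>", (char)322);
        string tempFile = Path.GetTempFileName();

        using (BinaryWriter writer = new BinaryWriter(
            new FileStream(tempFile, FileMode.Create), Encoding.UTF8))
        {
            for (int i = 235; i < 255; i++)
            {
                writer.Write(i);
                writer.Write(testString);
            }
        }

        int mismatches = 0;
        using (BinaryReader reader = new BinaryReader(
            new FileStream(tempFile, FileMode.Open), Encoding.UTF8))
        {
            // Loop on the stream position instead of PeekChar().
            while (HasMoreData(reader))
            {
                int number = reader.ReadInt32();
                string text = reader.ReadString();
                if (text != testString)
                {
                    mismatches++;
                    Console.WriteLine("Mismatch at {0}: read {1}", number, text);
                }
            }
        }

        File.Delete(tempFile);
        Console.WriteLine("Mismatches: {0}", mismatches);
    }
}
```

This only works when the stream can report its Length (files, MemoryStream). For a non-seekable stream such as a network stream, some other end marker would be needed, for example writing the record count up front.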