Problem Using TextReader and BinaryReader on the Same File


Guest

I am rewriting a C++ application in C#. The file it reads contains a
combination of text and binary data.

I used CFile before to read the text. If I hit a certain string that
denotes that the following data is binary, I used the current position in
the file and another stream to read the binary data.

All text data ends with a carriage return/line feed, while the binary
data is actually an image file stored byte by byte. Preceding the binary
data is a text value, ended by a CR/LF, giving the actual number of bytes
of binary data.

The problem is that the position given by TextReader.BaseStream seems to
be twice the actual position in bytes. Therefore positioning a
BinaryReader based on the position in a TextReader is incorrect.

This worked with CFile.

Any ideas?

Thank You,
Jeff
 

Joshua Flanagan

It sounds like you are using a 16-bit encoding with the TextReader. I
wonder if specifying an 8-bit encoding (System.Text.Encoding.ASCII)
would solve your problem. It's worth a try.
Otherwise, if you can consistently see that the TextReader position is
always twice the binary position... well, divide by 2 (right shift 1)!
 

Guest

Joshua,

I am considering that, except I did not want to do anything that might
fail when converting to a 64-bit OS.

I did try the ASCII encoding, etc., and the same problem occurred. I think
I am just going to use the BinaryReader and create my own ReadLine
functionality.

I also just read the entire file into memory, and it did not bog down the
machine too much. I am not sure about the deployment machine yet. The
FileStream/BinaryReader classes may provide some behind-the-scenes
buffering to help with this.

This does give me the capability to scan the contents quickly and create
some file-offset arrays for quickly locating the data I need.
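For what it's worth, that offset-array approach could be sketched roughly as follows. This is a minimal sketch assuming single-byte ASCII text; the names `LineScanner`, `FindLineOffsets`, and `ReadLineAt` are hypothetical, and the buffer would come from something like `File.ReadAllBytes`:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class LineScanner
{
    // Scan an in-memory buffer for CR/LF-terminated lines, recording the
    // byte offset where each line starts. Assumes single-byte (ASCII) text,
    // so byte offsets and character offsets agree.
    public static List<int> FindLineOffsets(byte[] data)
    {
        var offsets = new List<int>();
        int start = 0;
        for (int i = 0; i + 1 < data.Length; i++)
        {
            if (data[i] == 0x0D && data[i + 1] == 0x0A) // CR LF
            {
                offsets.Add(start);
                start = i + 2;
            }
        }
        if (start < data.Length)
            offsets.Add(start); // trailing data with no terminator
        return offsets;
    }

    // Decode one line starting at a recorded offset, stopping at CR/LF
    // (or at end of buffer if no terminator is found).
    public static string ReadLineAt(byte[] data, int offset)
    {
        int end = offset;
        while (end + 1 < data.Length && !(data[end] == 0x0D && data[end + 1] == 0x0A))
            end++;
        if (end + 1 >= data.Length)
            end = data.Length; // no CR/LF: take everything to the end
        return Encoding.ASCII.GetString(data, offset, end - offset);
    }
}
```

Because the offsets are plain byte positions into the same buffer, they can be handed straight to whatever code parses the binary image sections, with no reader-position mismatch.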

Thanks for the input,
Jeff
 

Jon Skeet [C# MVP]

Guest said:
I am rewriting a C++ application in C#. The file it reads contains a
combination of text and binary data.

I used CFile before to read the text. If I hit a certain string that
denotes that the following data is binary, I used the current position in
the file and another stream to read the binary data.

All text data ends with a carriage return/line feed, while the binary
data is actually an image file stored byte by byte. Preceding the binary
data is a text value, ended by a CR/LF, giving the actual number of bytes
of binary data.

The problem is that the position given by TextReader.BaseStream seems to
be twice the actual position in bytes. Therefore positioning a
BinaryReader based on the position in a TextReader is incorrect.

This is an unfortunate problem with TextReader. You *may* find that
creating a StreamReader with a buffer size of 1 sorts the problem, but
it'll be very inefficient.

If your file is in ASCII, you could spot the CRLF while reading it as
binary data, and then convert the binary data to text when you find the
CRLF, all the while knowing where you are.

If you're in control of the file format, however, I'd suggest that you
prefix any text section with the number of bytes in that text section.
You can then read that amount of data and convert it to text
separately, without overrunning.
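A minimal sketch of the first idea, spotting the CR/LF yourself while reading the stream as bytes so the reported position never lies (assuming ASCII text; `MixedReader` and `ReadAsciiLine` are hypothetical names):

```csharp
using System;
using System.IO;
using System.Text;

class MixedReader
{
    // Read one CR/LF-terminated ASCII line directly from the stream.
    // Because nothing is buffered beyond the CR/LF, stream.Position is
    // left exactly at the first byte after the line, i.e. at the start
    // of any binary data that follows.
    public static string ReadAsciiLine(Stream stream)
    {
        var bytes = new MemoryStream();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == 0x0D)
            {
                int next = stream.ReadByte(); // consume the expected LF
                if (next != 0x0A && next != -1)
                    stream.Position--; // lone CR: push the byte back
                break;
            }
            bytes.WriteByte((byte)b);
        }
        return Encoding.ASCII.GetString(bytes.ToArray());
    }
}
```

With something like this, the byte-count line can be read, parsed with `int.Parse`, and the image then pulled with `new BinaryReader(stream).ReadBytes(count)` from a position that is actually correct.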
 

Jon Skeet [C# MVP]

Joshua Flanagan said:
It sounds like you are using a 16-bit encoding with the TextReader. I
wonder if specifying an 8-bit encoding (System.Text.Encoding.ASCII)
would solve your problem. It's worth a try.
Otherwise, if you can consistently see that the TextReader position is
always twice the binary position... well, divide by 2 (right shift 1)!

No, the problem is that StreamReader reads more than it has returned so
far into an internal buffer. The stream's position is then "inaccurate"
in terms of how much appears to have been read.
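That buffering effect can be seen with a small test along these lines (a hypothetical demonstration; the exact position reported depends on the framework's internal buffer size):

```csharp
using System;
using System.IO;
using System.Text;

class BufferDemo
{
    public static void Main()
    {
        // 100 short ASCII lines in an in-memory stream.
        var sb = new StringBuilder();
        for (int i = 0; i < 100; i++)
            sb.Append("line ").Append(i).Append("\r\n");
        var stream = new MemoryStream(Encoding.ASCII.GetBytes(sb.ToString()));

        var reader = new StreamReader(stream, Encoding.ASCII);
        string first = reader.ReadLine(); // "line 0": 8 bytes including CR/LF

        // The reader filled its internal buffer on the first read, so the
        // underlying stream has advanced far past the 8 bytes consumed.
        Console.WriteLine("Bytes in first line: {0}", first.Length + 2);
        Console.WriteLine("BaseStream.Position: {0}", reader.BaseStream.Position);
    }
}
```

So BaseStream.Position tells you where the buffer-fill got to, not where the reader's logical position is, which is why seeking a BinaryReader from it goes wrong.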
 

Guest

Jon,

I appreciate your response, and it makes sense now, but it does not help
much in migrating the code.

However, is there any reason I cannot just access the file via
BinaryReader, load the entire file into a byte[] buffer, and parse it
instead? It seems to load the large file quickly and efficiently. The
files are 35 to 60 megabytes. It is so quick that it seems it is buffered
or cached behind the scenes anyway.

I am doing it this way now, so I just have to deal with parsing the byte
buffer rather than potentially dealing with network/buffering issues when
hitting the stream directly.

Thank You,
Jeff
 

Jon Skeet [C# MVP]

Guest said:
I appreciate your response, and it makes sense now, but it does not help
much in migrating the code.

However, is there any reason I cannot just access the file via
BinaryReader, load the entire file into a byte[] buffer, and parse it
instead? It seems to load the large file quickly and efficiently. The
files are 35 to 60 megabytes. It is so quick that it seems it is buffered
or cached behind the scenes anyway.

If you've got enough memory, and if the text is in a simple encoding
(such as ASCII) where you don't need to worry about detecting multi-byte
characters, that may well be the easiest way of doing things.

Guest said:
I am doing it this way now, so I just have to deal with parsing the byte
buffer rather than potentially dealing with network/buffering issues when
hitting the stream directly.

Right.
 
