Reading and writing a large binary file fails

Guest

Hi all,

I am reading a 240MB+ binary file, performing some changes, and writing it
back out. For now I have removed the code that performs the changes, so in its
simplest form it just reads a large binary file and writes it back out.

After about 4MB I receive an exception:

The output char buffer is too small to contain the decoded characters,
encoding 'Unicode' fallback 'System.Text.DecoderReplacementFallback'.
Parameter name: chars

I really can't figure this one out. The code is below (I put the flushes in
hoping that was the problem):

try
{
    // Open the source file
    sourcestream = SourceFile.Open(FileMode.Open, FileAccess.Read, FileShare.None);

    // Open the output file
    targetstream = TargetFile.Open(FileMode.CreateNew, FileAccess.Write, FileShare.None);

    breader = new BinaryReader(sourcestream);
    bwriter = new BinaryWriter(targetstream);

    for (long i = 0; i < SourceFile.Length; i++)
    {
        bwriter.Write(breader.Read());
        if (i % 2000 == 0)
        {
            bwriter.Flush();
            targetstream.Flush();
        }
    }

    breader.Close();
    sourcestream.Close();

    bwriter.Close();
    targetstream.Close();

    // Delete the original source file
    SourceFile.Delete();
}
// catch/finally not shown

Any help would be greatly appreciated
Regards, Pete.
 
Vadym Stetsyak

Hello, TrinityPete!

T> I am reading a 240MB+ binary file performing some changes and writing it
T> back out. For now I have removed the code that performs changes so in
T> its simplistic form reading a large binary and then writing it back out.

T> After about 4MB I receive an exception:

T> The output char buffer is too small to contain the decoded characters,
T> encoding 'Unicode' fallback 'System.Text.DecoderReplacementFallback'.
T> Parameter name: chars

On which operation did you get the exception:
breader.Read() or bwriter.Write()? Or on some other call?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
 
Barry Kelly

TrinityPete said:
> I am reading a 240MB+ binary file performing some changes and writing it
> back out.

So it's a binary file - and doesn't contain meaningful text.

> bwriter.Write(breader.Read());

The documentation for BinaryReader.Read() states:

---8<---
Reads characters from the underlying stream and advances the current
position of the stream in accordance with the Encoding used and the
specific character being read from the stream.
--->8---

This reads *characters* according to the encoding associated with the
BinaryReader (which defaults to UTF8Encoding).

Having read a character (which may be more than one byte due to UTF8
being a multibyte encoding), you write to the BinaryWriter.Write(Int32)
overload, which writes out exactly 4 bytes corresponding to an int.
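
A minimal sketch of the effect described above, assuming an in-memory stream stands in for the file. The class and method names are made up for illustration and are not from the original code. It copies the same four bytes twice, once through Read() and once through ReadByte(), so the output sizes can be compared.

```csharp
using System;
using System.IO;

static class ReadVersusReadByte
{
    // Copy via Read(): each call decodes one character using the reader's
    // encoding (UTF-8 by default) and returns it as an int, so the writer
    // picks the Write(Int32) overload and emits 4 bytes per character.
    public static byte[] CopyWithRead(byte[] input)
    {
        using (var source = new MemoryStream(input))
        using (var target = new MemoryStream())
        using (var reader = new BinaryReader(source))
        using (var writer = new BinaryWriter(target))
        {
            int c;
            while ((c = reader.Read()) != -1)
            {
                writer.Write(c);
            }
            writer.Flush();
            return target.ToArray();
        }
    }

    // Copy via ReadByte(): one raw byte in, one raw byte out, no decoding.
    public static byte[] CopyWithReadByte(byte[] input)
    {
        using (var source = new MemoryStream(input))
        using (var target = new MemoryStream())
        using (var reader = new BinaryReader(source))
        using (var writer = new BinaryWriter(target))
        {
            for (long i = 0; i < source.Length; i++)
            {
                writer.Write(reader.ReadByte());
            }
            writer.Flush();
            return target.ToArray();
        }
    }

    static void Main()
    {
        // 'A', then the two-byte UTF-8 sequence for U+00E9, then 'B'.
        byte[] input = { 0x41, 0xC3, 0xA9, 0x42 };
        Console.WriteLine("Read():     " + CopyWithRead(input).Length + " bytes");
        Console.WriteLine("ReadByte(): " + CopyWithReadByte(input).Length + " bytes");
    }
}
```

With valid UTF-8 input like this, the Read() copy comes out larger than the source and no longer matches it byte for byte. With arbitrary binary data the decoder can also hit byte sequences it cannot handle, which is where an exception like the one in the question would come from.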

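As near as I can make out, you should be using something more like:

---8<---
bwriter.Write(breader.ReadInt32());
--->8---

(With ReadInt32() the loop has to advance 4 bytes per iteration and the file
length has to be a multiple of 4; for a byte-by-byte loop like yours,
ReadByte() is the direct replacement.)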

Don't forget that you can open an existing file and seek within it, and
make changes in place - you won't be able to insert easily, though.
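
A rough sketch of the in-place approach, assuming the change is a fixed-size overwrite at a known offset. The helper name and its parameters are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;

static class InPlacePatch
{
    // Overwrites replacement.Length bytes starting at offset.
    // The file length never changes, so nothing is inserted or removed.
    public static void OverwriteAt(string path, long offset, byte[] replacement)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            if (offset < 0 || offset + replacement.Length > stream.Length)
            {
                throw new ArgumentOutOfRangeException("offset", "Patch must fit inside the existing file.");
            }
            stream.Seek(offset, SeekOrigin.Begin);
            stream.Write(replacement, 0, replacement.Length);
        }
    }
}
```

This avoids rewriting the whole 240MB file when only a few regions change, but it also means there is no untouched copy to fall back on if something goes wrong halfway through.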

BTW: To reduce fragmentation, you may want to extend your target file by
calling SetLength on the output stream before doing your stream-based
editing. If you don't, Windows will end up being too optimistic and try
to squeeze the increasingly long file in all the fragmented bits of free
space on your drive. I've seen files with >1000 fragments easily created
because of this, taking a significant first-time-read hit next time
they're accessed.
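
A sketch of that SetLength idea, assuming a plain copy where the final size is known up front. The chunked loop and the 64KB buffer size are my own choices for the example, not something from the thread, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;

static class PreallocatedCopy
{
    public static void Copy(string sourcePath, string targetPath)
    {
        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.None))
        using (var target = new FileStream(targetPath, FileMode.CreateNew, FileAccess.Write, FileShare.None))
        {
            // Reserve the full size first so the file system can look for
            // one suitably large run of free space.
            target.SetLength(source.Length);

            // Move the data in chunks instead of one byte per call.
            var buffer = new byte[64 * 1024];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
            }
        }
    }
}
```

If the edits change the final size, the length passed to SetLength would need to be the expected output size rather than the source length.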

-- Barry
 
Guest

Looks like it was on the read.

Just been messing some more with the code and if I change
bwriter.Write(breader.Read());
to
bwriter.Write(breader.ReadByte());

it works fine. It still doesn't help me understand what's happening with
the original statement.

Full stack trace:

at System.Text.Encoding.ThrowCharsOverflow()
at System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
at System.Text.UTF8Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder)
at System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, Boolean flush)
at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex, Boolean flush)
at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex)
at System.IO.BinaryReader.InternalReadOneChar()
at System.IO.BinaryReader.Read()
at TCS.Utilities.TCSDirMonClasses.tcsDirMonitor.MoveFile(FileInfo SourceFile, FileInfo TargetFile) in D:\DOTNETDEV VS2005\TCS.Utilities.TCSDirMon\TCS.Utilities.TCSDirMon\TCS.Utilities.TCSDirMonClasses\TCSDirMonClasses.cs:line 797

Pete.
 
Guest

Thanks Barry, that seems to make sense now. If I change the Read() to
ReadByte() it works OK.

Thank you.
Pete.
 
