read binary file into byte[]

John Grandy

What is the most efficient technique to read a binary file (such as a .dll
or .exe) into a byte[]?

Are there advantages to using a BinaryReader?

Thanks.
 
Peter Duniho

John said:
What is the most efficient technique to read a binary file ( such as a .dll
or .exe ) into a byte[] ?

Are there advantages to using a BinaryReader ?

BinaryReader is mainly useful for when you are dealing with specific
data types _other_ than an array of bytes.
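To illustrate that point, here is a minimal sketch of the kind of job BinaryReader is suited for: reading typed values rather than a raw byte[]. The class name, method names and record layout (an int, a double, then a string) are made up for the example.

```csharp
using System.IO;

static class BinaryReaderSketch
{
    // Hypothetical record layout: an int, a double, then a length-prefixed string.
    public static byte[] WriteSample(int id, double weight, string name)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(id);      // 4 bytes
            writer.Write(weight);  // 8 bytes
            writer.Write(name);    // length prefix followed by the encoded characters
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the values back in the same order they were written.
    public static void ReadSample(byte[] data, out int id, out double weight, out string name)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            id = reader.ReadInt32();
            weight = reader.ReadDouble();
            name = reader.ReadString();
        }
    }
}
```

If all you want is the bytes themselves, none of this typed decoding is needed, which is why BinaryReader adds little for that case.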

As for "the most efficient technique" for reading a binary file, it all
depends. But, depending on your needs, your first try should be simply
to allocate a byte[] of the necessary size, and then make a single call
to FileStream.Read() to read all of the data at once.

For absolute correctness, you should actually code that as a loop,
checking the return value of Read() and calling it again to read any
remaining bytes if the return value is less than the total length
remaining to read.

But for a FileStream, the complete file _should_ always be read in a
single call to Read().
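A minimal sketch of the loop described above, assuming the file fits in a single array and does not change size while it is being read. The class and method names are invented for the example.

```csharp
using System.IO;

static class FileReadSketch
{
    public static byte[] ReadWholeFile(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Assumption: the file is small enough for one array and its length is stable.
            var buffer = new byte[stream.Length];
            int offset = 0;

            // Usually this runs once, but Read() is allowed to return fewer bytes than requested.
            while (offset < buffer.Length)
            {
                int read = stream.Read(buffer, offset, buffer.Length - offset);
                if (read == 0)
                    throw new EndOfStreamException("File ended before the expected length was read.");
                offset += read;
            }
            return buffer;
        }
    }
}
```

For an ordinary local file the loop body will most likely execute exactly once, so the extra code costs essentially nothing.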

Doing it this way delegates all of the buffering and data copying tasks
to .NET and Windows. Most of the time, those components will be able to
do the i/o as efficiently as possible. If you have some specific need
or consideration that requires a different strategy, you should be
specific about that so you can receive advice specific to that need or
consideration.

Finally, note that with respect to file i/o, unless you are accessing
the same file repeatedly, it's unlikely that anything you do in your
code will very much affect the overall throughput, because the real
bottleneck is simply in getting the data from the storage device.

Pete
 
John Grandy

Hi Peter, and thanks for the response.

So, the techniques in the following article :

http://www.yoda.arachsys.com/csharp/readbinary.html

are generally not necessary ? What special cases would require looping ?


 
Peter Duniho

John said:
Hi Peter, and thanks for the response.

So, the techniques in the following article :

http://www.yoda.arachsys.com/csharp/readbinary.html

are generally not necessary ? What special cases would require looping ?

As I wrote in my previous reply, you _should_ loop. Technically, the
Read() method is only required to block as long as it takes to get
_some_ data. This leaves open the possibility that it might return
before having read _all_ of the data.

In reality, the Read() method generally only returns before
end-of-stream and/or filling the buffer if for some reason the stream
has a way of indicating data isn't yet available, and that becomes true
after reading some data already. And in reality, that's just not
something that typically happens with FileStream.

But, the API doesn't provide any _guarantee_ of that. That's an
implementation detail in the API that you shouldn't count on. Thus my
comment about "for absolute correctness".

In your case, the first non-"Bad code" block Jon shows should suffice.

Pete
 
Alberto Poblacion

John Grandy said:
What is the most efficient technique to read a binary file ( such as a
.dll or .exe ) into a byte[] ?

If we interpret "most efficient" as "the one that requires me to write
the least amount of code", then it's probably File.ReadAllBytes:

using System.IO;
....
byte[] theBytes = File.ReadAllBytes(pathname);
 
Göran Andersson

John said:
Hi Peter, and thanks for the response.

So, the techniques in the following article :

http://www.yoda.arachsys.com/csharp/readbinary.html

are generally not necessary ? What special cases would require looping ?

For two reasons:

1. There is no guarantee that it works without looping. You could dig
deep in the actual implementation to find out if it is ever needed when
reading from a file or not, but you shouldn't rely on undocumented
implementation details like that.

2. Code spreads. Even if it works for reading a stream that is a local
file, it won't work for a stream that is slower, like a web response.
When you copy your code that works fine in one project into another
project, it will fail.

Besides, the File.ReadAllBytes method already has the code, correctly
implemented.
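To make the second point concrete, here is a rough sketch with a made-up TrickleStream wrapper that returns at most a few bytes per Read() call, imitating a slow source such as a web response. All names are hypothetical; the point is only to compare a single Read() against a loop.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: hands back at most `chunk` bytes per Read() call.
sealed class TrickleStream : Stream
{
    private readonly Stream inner;
    private readonly int chunk;

    public TrickleStream(Stream inner, int chunk) { this.inner = inner; this.chunk = chunk; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, Math.Min(count, chunk));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

static class StreamReadSketch
{
    // Single call: returns however many bytes the stream chose to deliver.
    public static int ReadOnce(Stream s, byte[] buffer)
    {
        return s.Read(buffer, 0, buffer.Length);
    }

    // Loop: keeps calling Read() until the buffer is full or the stream ends.
    public static int ReadFully(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = s.Read(buffer, total, buffer.Length - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }
}
```

Wrapping a 10-byte MemoryStream with a chunk size of 3, ReadOnce fills only 3 bytes of the buffer while ReadFully fills all 10. Code that works against a local file by accident behaves like ReadOnce when it is moved to a stream like this.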
 
Arto Viitanen

John said:
What is the most efficient technique to read a binary file ( such as a .dll
or .exe ) into a byte[] ?

Are there advantages to using a BinaryReader ?

Thanks.

If I remember correctly, W. Richard Stevens et al compared different
ways to copy a file to another in "Advanced Programming in the UNIX
Environment". Memory mapped files turned out to be the fastest way. OK,
that was both reading and writing, so memory mapped files might not be
the fastest for reading alone. But the bigger catch is that memory mapped
files are not in .NET until version 4.0.
 
Peter Duniho

Arto said:
John said:
What is the most efficient technique to read a binary file ( such as a
.dll or .exe ) into a byte[] ?

Are there advantages to using a BinaryReader ?

Thanks.

If I don't remember wrong, W. Richard Stevens et al compared different
ways to copy a file to another in "Advanced Programming in the UNIX
Environment". Memory mapped files turn out to be fastest way. Ok, it was
both reading and writing, so memory mapped files might no be fastest in
reading. But bigger catch is, memory mapped files are not in .NET until
version 4.0.

I don't understand this response. Did "W. Richard Stevens et al" do
their comparison using .NET code? In a "...UNIX Environment" book?

If not, then their analysis isn't really that applicable to this question.

A determined person can use memory mapped files in .NET versions before
4.0 (4.0 itself is still only in beta), by using p/invoke to access the
unmanaged API for memory mapped files.
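For reference, a rough sketch of reading a file into a byte[] through a memory mapping. Note that this uses the managed System.IO.MemoryMappedFiles types that arrive with .NET 4.0, not the p/invoke route mentioned above; the class and method names are invented for the example.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedReadSketch
{
    public static byte[] ReadViaMapping(string path)
    {
        long length = new FileInfo(path).Length;
        if (length == 0)
            return new byte[0]; // an empty file cannot be mapped

        // Assumption: the file fits in a single array.
        var buffer = new byte[length];

        // Capacity 0 means "use the size of the file on disk".
        using (var map = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = map.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            // Copies the mapped bytes into the managed array.
            view.ReadArray(0, buffer, 0, buffer.Length);
        }
        return buffer;
    }
}
```

The data still has to be copied from the mapped view into the array, so there is no obvious reason to expect this to beat a plain read when a byte[] is the goal.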

I doubt it would be all that worthwhile for reading data into an array,
though. I have seen in my own programs that for files that are read
just once, the bottleneck is the disk (subsequent reads of the same data
can go much faster, though: 10-20 times faster, or even more in some
cases I've seen, due to caching).

Pete
 
