Opening Large Binary file efficiently

Gina_Marano

Hey all,

I need to validate a large binary file. I just need to read the last
100 or so bytes of the file.

Here is what I am currently doing but it seems slow:

private bool ValidFile(string afilename)
{
    byte ch;
    bool bGraybar = true;
    FileStream fs = null;
    BinaryReader reader = null;

    try
    {
        fs = File.Open(afilename, FileMode.Open);
        reader = new BinaryReader(fs);

        fs.Seek(-101, SeekOrigin.End);

        for (int i = 0; i <= 100; i++)
        {
            ch = reader.ReadByte();
            if (ch != 0)
            {
                bGraybar = false;
                break;
            }
        }

        if (bGraybar)
            return false;
        else
            return true;
    }
    finally
    {
        if (reader != null)
            reader.Close();
        if (fs != null)
        {
            fs.Close();
            fs.Dispose();
        }
    }
}

~Gina~
 
Gina_Marano

Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?

~Gina~
 
Peter Duniho

Gina_Marano said:
Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?

Well, the other reason it's slow is that you read one byte at a time. If
you know you want to read 100 bytes, then read 100 bytes with a single call
to ReadBytes and then process the data from the byte array directly.

Of course, waiting on the network will slow you down too, but no reason to
make things worse than necessary.
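
Something along these lines, for instance (an untested sketch; I'm keeping your ValidFile name and the logic of "all zeros means invalid", but the structure is mine, and I've used "using" blocks in place of your explicit finally):

```csharp
using System;
using System.IO;

static bool ValidFile(string afilename)
{
    // Open read-only; "using" disposes the stream and reader automatically.
    using (FileStream fs = File.Open(afilename, FileMode.Open, FileAccess.Read))
    using (BinaryReader reader = new BinaryReader(fs))
    {
        fs.Seek(-100, SeekOrigin.End);       // position at the last 100 bytes
        byte[] tail = reader.ReadBytes(100); // one read instead of 100

        foreach (byte b in tail)
            if (b != 0)
                return true;                 // non-zero byte found: valid

        return false;                        // all zeros: not valid
    }
}
```

Over a network this matters even more, since each ReadByte can turn into a separate round trip if the data isn't buffered locally.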

As far as the rest of the code goes, I don't think you need to call Dispose
on the filestream, but otherwise I don't see anything obvious that I'd
change. It's not clear to me why you read the last 101 bytes instead of
100, but you did say "100 or so bytes", so I guess there's probably nothing
wrong with that. :)

Pete
 
Guest

For safety's sake you should probably check that the file length is >= the
number of bytes you want to read.
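
For instance (a sketch; HasFullTrailer and its parameter names are mine, just to illustrate the guard):

```csharp
using System;
using System.IO;

static bool HasFullTrailer(string path, int trailerLength)
{
    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
    {
        // Cheap guard: Length is already known to the open stream,
        // so this costs no extra read over the network.
        if (fs.Length < trailerLength)
            return false;

        fs.Seek(-trailerLength, SeekOrigin.End);
        return true;   // safe to read trailerLength bytes from here
    }
}
```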
 
Peter Duniho

KH said:
For safety's sake you should probably check that the file length is >= the
number of bytes you want to read.

Perhaps. Though, presumably the files the person is talking about are
assured of being large enough if valid, and an exception will be thrown (and
handled) by the code if they are not valid (in that way, or perhaps other
ways, such as being locked for reads). Such a check may be superfluous.
 
Kev

Relying on exceptions to be thrown is sloppy coding in my mind - and not
what they are intended for.

If you can do a simple check to prevent an exception being raised then do
it. Exceptions trigger the CPU interrupt line and this causes the CPU to
stop what it is doing, store current info to the stack, handle the
exception, then reload data back off the stack and continue what it was
doing (ok, that was a really rough description - don't quote me exactly).
Why halt the CPU when you can do a simple check that does not have this side
effect?

I am not saying do not use try catch - you should use it often, just use it
to handle error situations you can't necessarily check for before the
operation in question.

"Presumably", and "assured of" are not terms I associate with good reliable
software design.

Cheers
 
Peter Duniho

Kev said:
Relying on exceptions to be thrown is sloppy coding in my mind - and not
what they are intended for.

IMHO, you are making too much of this. But since you brought it up, let's
look at your comments...
If you can do a simple check to prevent an exception being raised then do
it.

And do what? The OP's code already has a try/finally. There are a wide
variety of things that could go wrong in just the few lines of code he has,
especially since he's reading the file over a network. How is it better to
add an extra check, just to avoid having the Seek throw an exception, when
all he's likely to do is fall out of the code to the finally anyway?

Why not check for all the other things that could cause an exception?
Exceptions trigger the CPU interrupt line and this causes the CPU to stop
what it is doing, store current info to the stack, handle the exception,
then reload data back off the stack and continue what it was doing (ok,
that was a really rough description - don't quote me exactly). Why halt
the CPU when you can do a simple check that does not have this side
effect?

Because exception handling is for exceptional situations. Not that I really
agree with your characterization of "halting the CPU" anyway, but why would
you slow down the common case, just to save some time in the exceptional
case?

In fact, that's one of the nice things about exception handling. You can
write all of the code as if everything will work fine, not wasting code or
time on expensive checks like retrieving the file length and comparing it to
the minimum required length. After all, the code underlying Seek is going
to have to make that check anyway.
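
For instance, you can just let Seek do the length check and map the resulting IOException to "not valid" (a sketch; TailIsNonZero is my name, not the OP's, and the 101-byte count is from her post):

```csharp
using System;
using System.IO;

static bool TailIsNonZero(string path)
{
    try
    {
        using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
        using (BinaryReader reader = new BinaryReader(fs))
        {
            // Seek itself throws IOException if the file is shorter than
            // 101 bytes -- the very check a manual Length test would duplicate.
            fs.Seek(-101, SeekOrigin.End);

            foreach (byte b in reader.ReadBytes(101))
                if (b != 0)
                    return true;

            return false;
        }
    }
    catch (IOException)
    {
        return false;   // too short, locked, unreadable: not valid either way
    }
}
```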

So, you're suggesting that we write the code in a way that forces the exact
same check to happen twice each time through the code, just so that in the
rare case when an exception happens, the exception can be handled more
quickly?
I am not saying do not use try catch - you should use it often, just use
it to handle error situations you can't necessarily check for before the
operation in question.

Well, I simply disagree. IMHO, it's a waste of time checking things that
the code you're calling is going to have to check anyway, especially if your
handling of a failure of the check is identical to how you'd handle an
exception.
"Presumably", and "assured of" are not terms I associate with good
reliable software design.

I used those terms because the code is not mine, and I don't have the full
information regarding the situation in which the code will be used (or even
of other code related to the problem). It makes no sense for you to assume
I'm using those terms as a programming concept, when in fact my use of those
terms has to do with my relationship (or rather, lack thereof) with the OP
and his code.

(Not that I think "assured of" is in any way a negative thing to consider
with respect to code anyway... it seems to me that being "assured of" something
is a *good* thing. As in, "I am assured that the compiler will generate the
correct output given my source code.")

Pete
 
Peter Duniho

Peter Duniho said:
[...] Why halt the CPU when you can do a simple check that does not have
this side effect?

Because exception handling is for exceptional situations. Not that I
really agree with your characterization of "halting the CPU" anyway, but
why would you slow down the common case, just to save some time in the
exceptional case?

And, by the way, I'll point out that while branch prediction in CPUs is a
fairly mature technology, branches can still be mispredicted. Exception
handling allows you to take branches out of the code entirely, ensuring that
your execution pipeline won't get flushed in the common case (well, not any
more often than is strictly necessary, anyway).
 
Willy Denoyette [MVP]

Gina_Marano said:
Hey guys,

Actually, it is slow because of my network.

Can I get a code review anyhow?

~Gina~

What kind of network are you talking about, and what do you call *slow*? The size of the file
doesn't matter at all; reading the last 100 bytes of a giant file over the network should be
about as fast as reading a tiny 100-byte file. Over 10Mb Ethernet it should take less than,
say, 100 msec.

Willy.
 
Gina_Marano

Hey Willy,

I will have to check this out again.

I am running over a VPN. I would have thought it would have been zippy
as well, but it isn't 100ms. The files are typically 10-15 MB.

Since the production environment is all local there is no problem. But
I too thought it should be much faster.

~Gina~
 
Gina_Marano

Now, now boys. You're making me blush here. No need to fight over
little 'ole me. :)

~Gina~

Peter said:
Peter Duniho said:
[...] Why halt the CPU when you can do a simple check that does not have
this side effect?

Because exception handling is for exceptional situations. Not that I
really agree with your characterization of "halting the CPU" anyway, but
why would you slow down the common case, just to save some time in the
exceptional case?

And, by the way, I'll point out that while branch prediction in CPUs is a
fairly mature technology, branches can still be mispredicted. Exception
handling allows you to take branches out of the code entirely, ensuring that
your execution pipeline won't get flushed in the common case (well, not any
more often than is strictly necessary, anyway).
 
Willy Denoyette [MVP]

Gina_Marano said:
Hey Willy,

I will have to check this out again.

I am running over a VPN. I would have thought it would have been zippy
as well, but it isn't 100ms. The files are typically 10-15 MB.

Since the production environment is all local there is no problem. But
I too thought it should be much faster.

Well, while 100 msec is something you could expect over a local switched LAN, things may be
slower over a VPN; anyway, what matters is the network latency, not the file size.

Willy.
 
