get the actual size of a file

  • Thread starter Stephan Steiner

Stephan Steiner

Hi

Generally,
FileInfo fi = new FileInfo(path);
long size = fi.Length;

gets you the length of a file in bytes. However, when copying files, even
while the copy operation is still in progress, the filesize, as indicated in
Windows Explorer or derived with the above two lines of code, will be the
size of the file once the copy operation has completed. Is there a way to
get the actual number of bytes written to the hard disk while a copy
operation is under way?

The reason I'm asking is that I have to copy rather large files and I'm
currently using File.Copy(input, output) to do this. For a progress
indication, I have a thread that gets the size of the output via the
abovementioned code. Once the file has been copied, I append a second
(binary) file, but prior to starting to append, I set the length of the
output file to the total length the output is going to have. So, my progress
indicator has 2 values only and my thread getting the filesize could just as
well not exist.
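
The effect described above can be reproduced in isolation. The following is a minimal sketch, not code from the original application: the helper name ReportedLengthAfterSetLength and the use of a temp file are assumptions made for illustration. It shows that once the length has been set, FileInfo.Length reports the final size even though no payload bytes have been written yet.

```csharp
using System;
using System.IO;

static class PreallocationDemo
{
    // Hypothetical helper: pre-sizes the file, then returns the length that
    // FileInfo reports before any payload bytes have been written.
    public static long ReportedLengthAfterSetLength(string path, long finalLength)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(finalLength); // reserve the final size up front
        }
        return new FileInfo(path).Length;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            long reported = ReportedLengthAfterSetLength(path, 1024 * 1024);
            Console.WriteLine("Reported length: " + reported + " bytes");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

A progress indicator therefore has to count the bytes it writes itself rather than poll the file length.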

The only way around this I can imagine is to dump File.Copy, create a new file
manually, and copy the binary data from input to output in chunks of a
certain size. Besides the additional complexity, is there any inherent
performance disadvantage to such a mechanism versus the built-in file copy
mechanism? I'm just guessing here, but I assume the size of the I/O buffers
could have a noticeable effect on performance.

Regards
Stephan
 

Stephany Young

You are absolutely correct. There will be a noticeable effect on
performance.
 

Jon Skeet [C# MVP]

Stephany Young said:
You are absolutely correct. There will be a noticeable effect on
performance.

Well, there will be a noticeable effect on performance depending on the
buffer size. There needn't be a noticeable effect on performance
between File.Copy and copying chunk-by-chunk if the buffer size is
chosen appropriately.
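
A chunk-by-chunk copy of the kind discussed here could look roughly like this. It is a minimal sketch, not a drop-in replacement for File.Copy: the class name, the method name, and the progress callback are assumptions made for illustration, and the 64 KB buffer in Main is just a starting point to experiment with.

```csharp
using System;
using System.IO;

static class ChunkedCopy
{
    // Copies input to output in fixed-size chunks and reports the running
    // byte count after every chunk. Returns the total number of bytes copied.
    public static long Copy(string input, string output, int bufferSize,
                            Action<long, long> onProgress)
    {
        byte[] buffer = new byte[bufferSize];
        long copied = 0;

        using (FileStream source = new FileStream(input, FileMode.Open,
                   FileAccess.Read, FileShare.Read, bufferSize))
        using (FileStream target = new FileStream(output, FileMode.Create,
                   FileAccess.Write, FileShare.None, bufferSize))
        {
            long total = source.Length;
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
                copied += read;
                if (onProgress != null)
                {
                    onProgress(copied, total); // bytes actually written so far
                }
            }
        }
        return copied;
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: ChunkedCopy <input> <output>");
            return;
        }
        Copy(args[0], args[1], 64 * 1024,
             (done, total) => Console.WriteLine("{0} of {1} bytes", done, total));
    }
}
```

Because the loop counts the bytes it has written, the progress value no longer depends on what FileInfo.Length reports for the output file.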
 

Stephan Steiner

So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100 MB
to 2 GB, whereas the file to be appended will more likely be in the 10 - 100
MB range.
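
For the append step, a minimal sketch follows. The class and method names are hypothetical, and it assumes the output file has NOT been pre-sized with SetLength: FileMode.Append always starts writing at the current end of the file, so a pre-sized file would grow beyond its intended length.

```csharp
using System;
using System.IO;

static class AppendFile
{
    // Appends the contents of 'extra' to the end of 'output' in chunks.
    // Returns the number of bytes appended.
    public static long Append(string extra, string output, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long appended = 0;

        using (FileStream source = new FileStream(extra, FileMode.Open,
                   FileAccess.Read, FileShare.Read, bufferSize))
        using (FileStream target = new FileStream(output, FileMode.Append,
                   FileAccess.Write, FileShare.None, bufferSize))
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
                appended += read;
            }
        }
        return appended;
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: AppendFile <fileToAppend> <output>");
            return;
        }
        long n = Append(args[0], args[1], 8 * 1024);
        Console.WriteLine("Appended " + n + " bytes");
    }
}
```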
 

Mike

I think, depending on the OS (and on whether File.Copy calls the right APIs),
File.Copy can be a huge winner, especially if neither file is on the machine
the copy is executed on (for instance, \\machA executes
File.Copy("\\machB\foo\bar", "\\machC\foo\bar")). This isn't a common
scenario, but I was under the impression that in certain configurations one
could avoid the bits going through \\machA at all.

m
 

Jon Skeet [C# MVP]

Mike said:
I think, depending on the OS (and on whether File.Copy calls the right APIs),
File.Copy can be a huge winner, especially if neither file is on the machine
the copy is executed on (for instance, \\machA executes
File.Copy("\\machB\foo\bar", "\\machC\foo\bar")). This isn't a common
scenario, but I was under the impression that in certain configurations one
could avoid the bits going through \\machA at all.

I'm not sure, to be honest. I think I'd want to see it working before
saying for certain either way :)
 

Jon Skeet [C# MVP]

Stephan Steiner said:
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100 MB
to 2 GB, whereas the file to be appended will more likely be in the 10 - 100
MB range.

I suspect with buffers larger than about 64K you end up with
diminishing returns - and if the buffers are large enough to get on the
large object heap, the memory won't be compacted. (It'll be collected
after a long time, but not compacted, as far as I know.) Of course, if
your app just runs and then exits after doing this copy, that isn't an
issue.

I suggest you try experimenting with buffer sizes to find out what suits
your app best.
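
One way to run such an experiment is sketched below. The class name, the list of sizes, and the output format are assumptions made for illustration. The threshold constant reflects the commonly documented figure of roughly 85,000 bytes for the large object heap; treat it as approximate, since the array header counts towards the object size.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferSizeSweep
{
    // Assumption: objects of roughly 85,000 bytes and up are allocated on the
    // large object heap, so a 64 KB buffer stays below it and 256 KB does not.
    public const int ApproxLargeObjectThreshold = 85000;

    // Copies input to output with the given buffer size; returns elapsed time.
    public static TimeSpan TimeCopy(string input, string output, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        Stopwatch watch = Stopwatch.StartNew();

        using (FileStream source = new FileStream(input, FileMode.Open,
                   FileAccess.Read, FileShare.Read, bufferSize))
        using (FileStream target = new FileStream(output, FileMode.Create,
                   FileAccess.Write, FileShare.None, bufferSize))
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
            }
        }

        watch.Stop();
        return watch.Elapsed;
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: BufferSizeSweep <input> <output>");
            return;
        }

        int[] sizes = { 4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024 };
        foreach (int size in sizes)
        {
            TimeSpan elapsed = TimeCopy(args[0], args[1], size);
            string heap = size >= ApproxLargeObjectThreshold
                ? "large object heap" : "small object heap";
            Console.WriteLine("{0,8} bytes: {1:F2} s ({2})",
                              size, elapsed.TotalSeconds, heap);
        }
    }
}
```

Runs after the first one are likely to be served partly from the file system cache, so the timings are only comparable if each size is tested under the same conditions (for example, on a freshly booted machine or with a file much larger than RAM).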
 

Willy Denoyette [MVP]

Stephan Steiner said:
So what would be a suitable buffer size? I need to make a copy of one file
and append another file to it. The first file can be anywhere from 100
MB to 2 GB, whereas the file to be appended will more likely be in the
10 - 100 MB range.

All you can do is measure. Keep in mind, however, that all file IO done
through the Framework IO classes is buffered IO: irrespective of the buffer
size you specify at the API level, the file system buffers reads and writes
from/to disk in the FS cache. The number of bytes buffered depends on the FS
type (NTFS, FAT32, ...) and the usage pattern (sequential, random, mixed).
So whether you read one byte or 256 KB at a time, the FS always transfers a
block of data from the disk device to the FS cache. In theory, then, the
transfer rate is determined by the speed of the physical IO path. However,
moving the data blocks into the FS cache, and from the FS cache on to the IO
buffer in your application, costs CPU time. The smaller the buffers at the
API level, the larger that overhead; buffers below a certain size saturate
the CPU, at which point the IO rate becomes CPU bound.

So basically we have four determining factors for IO transfer rate:
1. Physical IO system (FS type, disk rotational speed, disk cache size, RAID
level ...)
2. CPU speed and number of...
3. Sequential or Random IO.
4. Buffer size.

I wrote a program to measure the impact of buffer size on the sequential IO
rate and measured the CPU consumption and IO count (logical) and transfer
speed.
Following are the results obtained reading a single large file (10GB) from a
single 10.000RPM, SATA drive (of course your mileage may vary).

blocksize    cpu       speed         IO
16 bytes     99,99%    10,44 MB/s    684444/s
128 bytes    78,31%    57,89 MB/s    474199/s
256 bytes    43,42%    56,37 MB/s    230906/s
2 KB         16,98%    57,84 MB/s    29613/s
4 KB         14,63%    56,37 MB/s    14432/s
8 KB         12,76%    56,4 MB/s     7219/s
16 KB        13,23%    57,84 MB/s    3702/s
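
The IO column follows directly from the other two: the number of read calls per second is the transfer speed divided by the block size. The sketch below is not the measurement program, which was not posted; it only recomputes that column, and the class and method names are hypothetical. It assumes 1 MB = 1,048,576 bytes, under which the 16 KB row gives 57.84 x 1,048,576 / 16,384 = 3701.76, matching the reported 3702/s.

```csharp
using System;

static class IoRateCheck
{
    // Expected read calls per second for a throughput given in MB/s
    // (assuming 1 MB = 1,048,576 bytes) and a block size given in bytes.
    public static double ExpectedCallsPerSecond(double megabytesPerSecond,
                                                int blockSizeBytes)
    {
        return megabytesPerSecond * 1024.0 * 1024.0 / blockSizeBytes;
    }

    static void Main()
    {
        // { block size in bytes, speed in MB/s, reported IO/s } per table row
        double[,] rows =
        {
            { 16,    10.44, 684444 },
            { 128,   57.89, 474199 },
            { 256,   56.37, 230906 },
            { 2048,  57.84, 29613 },
            { 4096,  56.37, 14432 },
            { 8192,  56.4,  7219 },
            { 16384, 57.84, 3702 }
        };

        for (int i = 0; i < rows.GetLength(0); i++)
        {
            double expected = ExpectedCallsPerSecond(rows[i, 1], (int)rows[i, 0]);
            Console.WriteLine("{0,6} bytes: reported {1,7}/s, computed {2,9:F0}/s",
                              rows[i, 0], rows[i, 2], expected);
        }
    }
}
```

Every row agrees to within a fraction of a percent, which shows that halving the block size simply doubles the number of calls needed for the same throughput.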

What does this tell us?
- The transfer speed reaches its maximum at buffer sizes of 128 bytes and
above; the 16-byte run drops to ~10 MB/s due to CPU saturation.
- Going beyond 128 bytes doesn't increase the transfer rate, but it does
reduce the CPU consumption.
- CPU consumption stabilizes with buffers of 4 KB and above (what remains is
the cache manager's overhead). If you want to reduce CPU consumption
further, you will have to perform unbuffered IO using PInvoke.

Conclusion: anything between 2 KB and 8 KB gives you optimal results for both
IO transfer and CPU consumption. Bigger buffers are only a waste of memory;
buffers that are too small are a waste of CPU resources.


Willy.
 

stork

TJB replied to:
Conclusion: anything between 2 KB and 8 KB gives you optimal results for both
IO transfer and CPU consumption. Bigger buffers are only a waste of memory;
buffers that are too small are a waste of CPU resources.

That conclusion is a bit too sweeping given the benchmark. For a 10k
RPM drive, those numbers look low.

It sounds like your reads are synchronous, and we also don't know which
underlying options got sent to CreateFile in that case. I think you can
get different results between sequential and random access caching.
Also, it might be interesting to see what happens if you do a bunch of
asynchronous random reads. Doesn't SATA support a form of command
queuing? The drive might be able to do the SCSI-like thing of
reordering a batch of reads into whatever order it thinks is fastest.
 

Willy Denoyette [MVP]

stork said:
TJB replied to:


That conclusion is a bit too sweeping given the benchmark. For a 10k
RPM drive, those numbers look low.

It sounds like your reads are synchronous, and we also don't know which
underlying options got sent to CreateFile in that case. I think you can
get different results between sequential and random access caching.
Also, it might be interesting to see what happens if you do a bunch of
asynchronous random reads. Doesn't SATA support a form of command
queuing? The drive might be able to do the SCSI-like thing of
reordering a batch of reads into whatever order it thinks is fastest.

Note that the transfer rates are buffer from/to disk, NOT buffer from/to
host. What makes you think that the numbers are underperforming?
Note that this wasn't meant as a benchmark; my only purpose was to show the
impact of the buffer size on CPU consumption and IO throughput for simple
reads.
Anyway, to answer some of your questions:
The reads are synchronous and sequential, from a non-fragmented single disk,
using the buffered FileStream IO .NET API.
fs = new FileStream(fileName,
                    FileMode.OpenOrCreate,
                    FileAccess.ReadWrite,
                    FileShare.None,
                    blockSize);

No additional options can be specified when running v1.1 of the framework.
Running on v2.0 with the FileOptions.SequentialScan option results in a 5%
increase of the transfer rate.
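
On v2.0 and later, the option is passed through the FileStream constructor overload that takes a FileOptions argument. The following is a minimal sketch of a sequential read with that hint, not the program used for the measurements above; the class and method names are hypothetical, and the file is opened read-only rather than with the OpenOrCreate/ReadWrite settings shown earlier.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class SequentialRead
{
    // Reads the whole file front to back in blocks of blockSize bytes,
    // hinting to the OS that access is sequential. Returns the bytes read.
    public static long ReadAll(string path, int blockSize)
    {
        byte[] buffer = new byte[blockSize];
        long total = 0;

        using (FileStream fs = new FileStream(path, FileMode.Open,
                   FileAccess.Read, FileShare.Read, blockSize,
                   FileOptions.SequentialScan))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        return total;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: SequentialRead <file>");
            return;
        }

        Stopwatch watch = Stopwatch.StartNew();
        long bytes = ReadAll(args[0], 8 * 1024);
        watch.Stop();

        double megabytes = bytes / (1024.0 * 1024.0);
        Console.WriteLine("{0:F1} MB in {1:F2} s", megabytes,
                          watch.Elapsed.TotalSeconds);
    }
}
```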

Running the same test asynchronously didn't result in a higher throughput
(as expected).

Doing sequential synchronous writes gave approx. the same figures for the IO
throughput, with a smaller CPU overhead compared to the reads.

Willy.
 

Stephan Steiner

Sorry I've been so quiet these recent days.

Thanks a bunch for all your valuable suggestions. I have been using rather
large buffers that I have reduced considerably now.

Regards
Stephan
 

Joerg Krause

Stephan Steiner said:
[...]

Thanks a bunch for all your valuable suggestions. I have been using rather
large buffers that I have reduced considerably now.

Regards
Stephan

While working on a similar problem I did a little benchmarking:

File size approx. 110 MB, 2.4GHz, 1GB RAM, FW 1.1:

Method         MM:SS
File.Copy      1:41
FileStream     1:17
FS with 10K    1:24
FS with 1K     1:46

I guess that using less memory wins once swapping starts, and that
smaller chunks reduce this effect.
Nevertheless, copying with the OS alone took about 1:00, which is
indeed a bit faster.
This was only a short test, not a scientifically rigorous method.

-Joerg
www.joerg.krause.net
 
