Fastest way to read a file


Mark Broadbent

Does anybody know what is (factual please, not just a guess) the quickest
method to read data from a file? I am not interested in the format of the
data (i.e. blocks, bytes, strings etc.), just that the IO to read the data is
very quick. I am currently using a StreamReader and have found the ReadLine
method to perform slightly better than the Read method (although it is nice
to have Read's granularity of one byte). Is there any faster reader that
I can use? (Only interested in reads, not writes.)

Thanks.

Mark.
 

Eugene Vtial

Mark said:
Does anybody know what is (factual please, not just a guess) the quickest
method to read data from a file? I am not interested in the format of the
data (i.e. blocks, bytes, strings etc.), just that the IO to read the data is
very quick. I am currently using a StreamReader and have found the ReadLine
method to perform slightly better than the Read method (although it is nice
to have Read's granularity of one byte). Is there any faster reader that
I can use? (Only interested in reads, not writes.)

Thanks.

Mark.

Not sure if this is the fastest, but you could give it a try.


public static string FileToStr(string cFileName)
{
    //Create a StreamReader and open the file
    StreamReader oReader = System.IO.File.OpenText(cFileName);

    //Read all the contents of the file in a string
    string lcString = oReader.ReadToEnd();

    //Close the StreamReader and return the string
    oReader.Close();
    return lcString;
}
 

Morten Wennevik

Hi Mark,

You're not telling us what format your data is in. If it is raw data, then StreamReader will treat it as text and it will be faster to read directly from the FileStream. I tried with a ~30 MB binary file and StreamReader took roughly three times longer to read all the bytes.

However, since you are using StreamReader to begin with you probably have a text file. Still, it might be worth trying to read it as a byte array with FileStream.Read and convert the bytes to string using the Encoding class.
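A minimal sketch of that suggestion, for the record: read the raw bytes through a FileStream, then decode once at the end with the Encoding class. The temp file and the UTF-8 assumption are scaffolding for the example, not part of the original advice.

```csharp
using System;
using System.IO;
using System.Text;

class ReadBytesSketch
{
    static void Main()
    {
        // Scaffolding: write a small known file so the sketch is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "hello, world");

        byte[] buffer;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            buffer = new byte[fs.Length];
            int read = 0;
            while (read < buffer.Length)
            {
                // FileStream.Read may return fewer bytes than requested, so loop.
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
        }

        // Decode once at the end; assumes the file is UTF-8 text.
        string text = Encoding.UTF8.GetString(buffer);
        Console.WriteLine(text == "hello, world" ? "ok" : "mismatch");
        File.Delete(path);
    }
}
```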
 

William Stacey [MVP]

Just 2 cents. I would use a "using" statement to wrap the StreamReader
object. That way it will be closed even if an exception occurs while opening
the file or during the read.
Something like (from memory):

public static string ReadFile(string path)
{
    using (StreamReader sr = new StreamReader(path))
    {
        return sr.ReadToEnd();
    }
}
 

Mark Broadbent

As I said, factual please, not guesses. I can try variants myself (and have).
ReadToEnd has no performance gain over ReadLine.
 

Mark Broadbent

Hi Morten. As I said, I'm not really interested in whatever format a file is
in; I am doing a comparison between files, therefore I only need to compare
bits. Thanks for the info on StreamReader, I had a thought that it might not
be the best IO object. Using FileStream, the same kind of operation on a
like for like basis (200 MB) completes in 1 minute (comparable time to
StreamReader ReadLine) whereas StreamReader would have taken 1m 30secs.
I guess I could do async on both files, which should speed things up even
further.

Thanks again.

Mark.
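Since the goal above is comparing two files bit for bit, a chunked FileStream comparison might look like the sketch below. The 64 KB buffer size is an assumption worth tuning, and the temp files are scaffolding so the example runs on its own.

```csharp
using System;
using System.IO;

class CompareSketch
{
    // Compares two files chunk by chunk; BufferSize is a guess worth tuning.
    static bool SameContent(string pathA, string pathB)
    {
        const int BufferSize = 64 * 1024;
        using (FileStream a = new FileStream(pathA, FileMode.Open, FileAccess.Read))
        using (FileStream b = new FileStream(pathB, FileMode.Open, FileAccess.Read))
        {
            if (a.Length != b.Length) return false;   // cheap early exit
            byte[] bufA = new byte[BufferSize];
            byte[] bufB = new byte[BufferSize];
            int nA;
            while ((nA = a.Read(bufA, 0, BufferSize)) > 0)
            {
                int nB = 0;
                while (nB < nA)                       // fill the second buffer fully
                {
                    int n = b.Read(bufB, nB, nA - nB);
                    if (n == 0) return false;
                    nB += n;
                }
                for (int i = 0; i < nA; i++)
                    if (bufA[i] != bufB[i]) return false;
            }
            return true;
        }
    }

    static void Main()
    {
        string p1 = Path.GetTempFileName(), p2 = Path.GetTempFileName();
        File.WriteAllText(p1, "same bytes");
        File.WriteAllText(p2, "same bytes");
        Console.WriteLine(SameContent(p1, p2) ? "match" : "differ");
        File.Delete(p1); File.Delete(p2);
    }
}
```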
 

Morten Wennevik

Well, reading the 30 MB file took about 0.1 second using a FileStream, so a 200 MB file shouldn't take all that much longer. Most likely it is the comparison that takes up the time, or you read 'too few' bytes in each read.

Try to limit the number of reads by reading larger chunks.

Mark Broadbent said:
Hi Morten. As I said, I'm not really interested in whatever format a file is
in; I am doing a comparison between files, therefore I only need to compare
bits. Thanks for the info on StreamReader, I had a thought that it might not
be the best IO object. Using FileStream, the same kind of operation on a
like for like basis (200 MB) completes in 1 minute (comparable time to
StreamReader ReadLine) whereas StreamReader would have taken 1m 30secs.
I guess I could do async on both files, which should speed things up even
further.

Thanks again.

Mark.
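The 'too few bytes per read' point can be demonstrated with a Stopwatch sketch like the one below, timing one-byte reads against 64 KB reads on the same file. The 1 MB stand-in file is scaffolding for the example; note that (as discussed later in this thread) the second run will hit the file-system cache, so treat the numbers as illustrative only.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class ChunkSizeSketch
{
    // Times a full sequential read of `path` using the given read size.
    static long TimeRead(string path, int chunk)
    {
        byte[] buffer = new byte[chunk];
        Stopwatch sw = Stopwatch.StartNew();
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            while (fs.Read(buffer, 0, chunk) > 0) { }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[1024 * 1024]);   // 1 MB of zeroes as a stand-in

        Console.WriteLine("1-byte reads: " + TimeRead(path, 1) + " ms");
        Console.WriteLine("64 KB reads:  " + TimeRead(path, 64 * 1024) + " ms");
        File.Delete(path);
    }
}
```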
 

Willy Denoyette [MVP]

Hmm... a 30 MB file in 0.1 second, you must have the whole file data in the
cache, so what you are measuring is the memory to memory transfer rate.
Please, if you run file IO benchmarks, flush the file cache before each run.


Willy.

Morten Wennevik said:
Well, reading the 30 MB file took about 0.1 second using a FileStream, so a
200 MB file shouldn't take all that much longer. Most likely it is the
comparison that takes up the time, or you read 'too few' bytes in each
read.

Try to limit reading
 

Willy Denoyette [MVP]

Mark Broadbent said:
Hi Morten. As I said, I'm not really interested in whatever format a file is
in; I am doing a comparison between files, therefore I only need to compare
bits. Thanks for the info on StreamReader, I had a thought that it might
not be the best IO object. Using FileStream, the same kind of operation on
a like for like basis (200 MB) completes in 1 minute (comparable time to
StreamReader ReadLine) whereas StreamReader would have taken 1m 30secs.
I guess I could do async on both files, which should speed things up even
further.

Thanks again.

Mark.


All FCL file IO wrappers are simple wrappers over the one and only ReadFile
Win32 IO API, so their performance will be more or less the same when
using comparable buffer sizes at the core.
The differences in functionality/complexity of the wrapper classes play
only a very small role in the overall IO transfer rate between disk and
process memory.
The FileStream class is the one designed for simple buffered file IO, so
it's the fastest, but again the differences from the others are hard to
measure when used with comparable buffer sizes.

Willy.
 

Morten Wennevik

Oh yeah, indeed :p Uncached FileStream seemed to take around 0.6 seconds on a fairly slow disk (assuming it is now uncached). How would I go about ensuring the cache is flushed?


Willy Denoyette [MVP] said:
Hmm... a 30 MB file in 0.1 second, you must have the whole file data in the
cache, so what you are measuring is the memory to memory transfer rate.
Please, if you run file IO benchmarks, flush the file cache before each run.


Willy.
 

Willy Denoyette [MVP]

Morten Wennevik said:
Oh yeah, indeed :p Uncached FileStream seemed to take around 0.6 seconds
on a fairly slow disk (assuming it is now uncached). How would I go about
ensuring the cache is flushed?

Note that I'm talking about the File System cache, not the FileStream cache;
once you have read a file or a portion of a file, that file (or portion) will
be in the FS cache. So your experiment is still using the FS cache.
You can eliminate the FS caching behavior:
1. By opening the file unbuffered, that is, the file data is passed directly
from the driver to the application buffer.
2. By flushing the FS cache for this file.
Neither is exposed by the FCL, so you'll have to PInvoke: (1)
CreateFile() with FILE_FLAG_NO_BUFFERING, (2) the FlushFileBuffers() API.

Willy.
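From memory, the P/Invoke route described above would be declared roughly as below. This is a Windows-only sketch; the `OpenUnbuffered` wrapper name is illustrative, and note the Win32 documented caveat that with FILE_FLAG_NO_BUFFERING read sizes and file offsets must be multiples of the volume sector size.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class UnbufferedReadSketch
{
    const uint GENERIC_READ = 0x80000000;
    const uint OPEN_EXISTING = 3;
    const uint FILE_FLAG_NO_BUFFERING = 0x20000000;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
    static extern SafeFileHandle CreateFile(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode,
        IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FlushFileBuffers(SafeFileHandle hFile);

    // Opens a file for reading while bypassing the file-system cache.
    // Caller must read in sector-size-aligned chunks.
    static FileStream OpenUnbuffered(string path)
    {
        SafeFileHandle h = CreateFile(path, GENERIC_READ, 0, IntPtr.Zero,
                                      OPEN_EXISTING, FILE_FLAG_NO_BUFFERING,
                                      IntPtr.Zero);
        if (h.IsInvalid)
            throw new IOException("CreateFile failed",
                                  Marshal.GetLastWin32Error());
        return new FileStream(h, FileAccess.Read);
    }
}
```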
 

RichS

I've found that memory-mapped files are the quickest form of file IO.
I've only ever used C++ for MM files, so am not sure if this
functionality is available natively in .NET, or if you would have to
use interop.

RichS
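For what it's worth, .NET 4.0 later added memory-mapped files natively via the System.IO.MemoryMappedFiles namespace, so no interop is needed there. A minimal sketch (the temp file is scaffolding for the example):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmapSketch
{
    static void Main()
    {
        // Scaffolding: a small non-empty file to map.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "mapped!");

        // .NET 4.0+: map the file and read straight from the view.
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor())
        {
            byte first = view.ReadByte(0);   // first byte of the mapped file
            Console.WriteLine(first == (byte)'m' ? "ok" : "mismatch");
        }
        File.Delete(path);
    }
}
```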
 

Morten Wennevik

Yes, the FileStream/StreamReader cache is flushed on Close() so I knew you didn't mean that. I'll try to ensure the cache is flushed from now on, thanks :)
 

Mark Broadbent

Yeah, I agree, thanks. The performance could be better, but I guess it is
acceptable and I will see how far I can go to optimise. Reading in two
different threads should hopefully improve on what I have.

With respect to your discussion of flushing the cache to disk, another
method that could be used is the freeware Sysinternals app "sync.exe",
which presumably should perform the same function.

Thanks guys.

Mark.
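Reading the two files on separate threads, as suggested above, might be sketched as follows. Note that on a single physical disk the extra seeks can cancel out any gain, so this mainly helps when the files sit on different spindles; the temp files are scaffolding for the example.

```csharp
using System;
using System.IO;
using System.Threading;

class ParallelReadSketch
{
    // Reads an entire file into a byte array, looping because
    // FileStream.Read may return fewer bytes than requested.
    static byte[] ReadAll(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            byte[] data = new byte[fs.Length];
            int read = 0;
            while (read < data.Length)
            {
                int n = fs.Read(data, read, data.Length - read);
                if (n == 0) break;
                read += n;
            }
            return data;
        }
    }

    static void Main()
    {
        string p1 = Path.GetTempFileName(), p2 = Path.GetTempFileName();
        File.WriteAllText(p1, "first");    // 5 bytes
        File.WriteAllText(p2, "second");   // 6 bytes

        byte[] d1 = null, d2 = null;
        // One worker thread for the second file; Main reads the first.
        Thread t = new Thread(delegate() { d2 = ReadAll(p2); });
        t.Start();
        d1 = ReadAll(p1);
        t.Join();

        Console.WriteLine((d1.Length == 5 && d2.Length == 6) ? "ok" : "mismatch");
        File.Delete(p1); File.Delete(p2);
    }
}
```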
 

Willy Denoyette [MVP]

RichS said:
I've found that MemoryMapped files are the quickest form of File IO.
I've only ever used C++ for MM files, so am not sure if this
functionality is available natively with .Net, or if you would have to
use interop.

RichS

No, they are not faster. The accesses are faster once mapped into your
process space, but mapping file portions involves IO reads, and these are
not faster than any other managed or unmanaged read IO. Also, this is not
desirable at all when there is only a single process accessing the file.
Memory-mapped files are great for sharing file data, that's all.

Willy.
 
Joined
Oct 13, 2013
Messages
6
Reaction score
0
Mark Broadbent said:
Does anybody know what is (factual please, not just a guess) the quickest
method to read data from a file? I am not interested in the format of the
data (i.e. blocks, bytes, strings etc.), just that the IO to read the data is
very quick. I am currently using a StreamReader and have found the ReadLine
method to perform slightly better than the Read method (although it is nice
to have Read's granularity of one byte). Is there any faster reader that
I can use? (Only interested in reads, not writes.)

Thanks.

Mark.

From .NET 4.0+, for those who are interested in micro-optimization techniques, the absolute fastest way in most cases is the following:
Code:
using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        //we're just testing read speeds
    }
}
Put up against several other techniques, it won out most of the time, including against the BufferedReader.

Here's the article which benchmarks multiple techniques to determine the fastest way.

blogs ^ davelozinski ^ com/curiousconsultant/csharp-net-fastest-way-to-read-text-files

(replace the "^" with ".")

Definitely worth a look for those interested in the various speed performances on multiple techniques.

 
Well, there have been many benchmarks. This blog article uses code to demonstrate that the fastest way to read a text file is with the age old method:

Code:
using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        //we’re just testing read speeds
    }
}

Reference:
blogs.davelozinski ^ com/curiousconsultant/csharp-net-fastest-way-to-read-text-files

HOWEVER, if you need to do lots of processing, this article benchmarks the fastest way to both read and process a text file, basically by implementing parallel processing. Here’s the basic code snippet:

Code:
string[] AllLines = new string[MAX]; //only allocate the memory here (MAX = line count, known in advance)
using (StreamReader sr = File.OpenText(fileName))
{
    int x = 0;
    while (!sr.EndOfStream)
    {
        AllLines[x] = sr.ReadLine();
        x += 1;
    }
} //CLOSE THE FILE because we are now DONE with it.
Parallel.For(0, AllLines.Length, x =>
{
    DoStuff(AllLines[x]);
});

Reference:
blogs.davelozinski ^ com/curiousconsultant/the-fastest-way-to-read-and-process-text-files


 
