StreamReader.Read moving position more than count argument specifies

Brett Gerhardi

Hi all, I am trying to go to the end of a file and read a given number of
lines from the end. For some reason, StreamReader.Read does not behave the
way I expect. I am simply trying to read the file in blocks from the end,
using Read and Seek, recording the position of each newline, and this is
where the code misbehaves.

Here is the code to reproduce:
private const int FILE_SEEK_BUFFER_SIZE = 64;

public string LoadAndReadLinesFromFile(string aFilename, int aLinesToShow)
{
    FileStream mFile;
    if (!File.Exists(aFilename))
    {
        mFile = null;
        throw new FileNotFoundException("File not found", aFilename);
    }
    else
    {
        mFile = File.Open(aFilename, FileMode.Open);
    }

    StreamReader mFileReader = new StreamReader(mFile);

    // seek from the back of the file in the buffered amounts and count the
    // newline characters, and remember where they all are
    ArrayList newLines = new ArrayList(aLinesToShow);
    mFile.Seek(0, SeekOrigin.End);

    // record the end of the file pos at the end of the array
    newLines.Insert(0, mFile.Position);

    // set initial position for reading
    mFile.Seek(-FILE_SEEK_BUFFER_SIZE, SeekOrigin.Current);

    char[] buffer = new char[FILE_SEEK_BUFFER_SIZE];
    //byte[] buffer = new byte[FILE_SEEK_BUFFER_SIZE]; // this works fine
    int searchStartPosition = FILE_SEEK_BUFFER_SIZE;
    int test;

    // loop until we have the right amount of newlines in our array
    while (newLines.Count < aLinesToShow + 1) // +1 to include the end newline
    {
        test = mFileReader.Read(buffer, 0, FILE_SEEK_BUFFER_SIZE);
        //test = mFile.Read(buffer, 0, FILE_SEEK_BUFFER_SIZE); // this works fine

        // seek back twice so we're in the correct position for the next loop
        mFile.Seek(-FILE_SEEK_BUFFER_SIZE*2, SeekOrigin.Current);

        // search buffer loop until we have found all \n's or \r\n's
    }

(This will end up throwing an exception, as I've omitted the code which adds
the found newlines to the array, but it should demonstrate the problem OK as
it happens early in the loop.)
I am using MS SMTP logs, which pad out the end of the files with nulls
(\0\0\0\0 etc.), and I wonder whether this could be the cause of the problem.

OK. The behaviour I see with the null-padded file is that mFile.Position
goes down from 65536 in 64-byte increments fine until 65280. Then, on the
mFileReader.Read call, the position jumps to 65536 again and starts going
backwards again. The test variable holding the return value of the Read call
shows 64, although the position has clearly moved further than that. As the
code suggests, if I use the normal FileStream.Read, it behaves normally.

Is this known, expected behaviour? It doesn't seem to be documented anywhere
I've seen.

Cheers, regards
-=- Brett
 
Jon Skeet

Is this known, expected behaviour? It doesn't seem to be documented anywhere
I've seen.

It sounds like expected behaviour to me - StreamReader has an internal
buffer which it uses to avoid making too many small IO calls. From the
documentation for StreamReader.BaseStream:

<quote>
StreamReader might buffer input such that the position of the
underlying stream will not match the StreamReader position.
</quote>
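
The effect is easy to see in isolation. The following is only a sketch, assuming an in-memory stream of ASCII data standing in for the log file; BufferingDemo and PositionAfterRead are hypothetical names, not part of the original code. It asks the reader for 64 chars and reports how far the underlying stream actually moved.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not from the original post.
public static class BufferingDemo
{
    // Reads up to `count` chars through a StreamReader and returns how many
    // bytes the underlying stream's Position advanced as a result.
    public static long PositionAfterRead(Stream stream, int count)
    {
        long before = stream.Position;
        StreamReader reader = new StreamReader(stream, Encoding.ASCII);
        char[] buffer = new char[count];

        int charsReturned = reader.Read(buffer, 0, count);
        long moved = stream.Position - before;

        // The reader fills its own internal buffer first, so `moved` is
        // normally larger than `charsReturned`.
        Console.WriteLine("chars returned: {0}, stream moved: {1} bytes",
                          charsReturned, moved);
        return moved;
    }

    public static void Main()
    {
        // Assumption: 64 KB of ASCII data stands in for the real log file.
        byte[] data = Encoding.ASCII.GetBytes(new string('x', 65536));
        PositionAfterRead(new MemoryStream(data), 64);
    }
}
```

If the underlying stream does have to be repositioned while a StreamReader is attached, StreamReader.DiscardBufferedData() clears the reader's buffer so that the next Read starts from the stream's new position.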
 
Brett Gerhardi

Thanks Jon, I missed that - I understand now that it probably isn't a good
idea to use the BaseStream position directly when using StreamReader.Read,
because of the buffering it does.
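
Since the byte-based FileStream.Read variant behaves as expected, one option is to scan the raw bytes for newlines and only decode the text at the end. This is a minimal sketch, assuming single-byte (ASCII) text and '\n' line endings; TailReader, ReadLastLines and BlockSize are hypothetical names, and trailing null padding is not treated specially.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not from the original post.
public static class TailReader
{
    private const int BlockSize = 64; // mirrors FILE_SEEK_BUFFER_SIZE

    // Reads up to `count` bytes from the current position; returns bytes read.
    private static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Returns the last `lineCount` lines, or the whole file if it has fewer.
    // Assumes a single-byte encoding, so byte and char offsets coincide.
    public static string ReadLastLines(string path, int lineCount)
    {
        if (lineCount <= 0) return "";
        using (FileStream file = File.OpenRead(path))
        {
            long end = file.Length;
            long start = 0;      // offset where the wanted text begins
            long blockEnd = end; // exclusive end of the next block to scan
            int newlines = 0;
            bool found = false;
            byte[] block = new byte[BlockSize];

            while (blockEnd > 0 && !found)
            {
                int size = (int)Math.Min((long)BlockSize, blockEnd);
                long blockStart = blockEnd - size;
                file.Seek(blockStart, SeekOrigin.Begin);
                int got = ReadFully(file, block, size);

                // Scan this block backwards, counting line breaks.
                for (int i = got - 1; i >= 0 && !found; i--)
                {
                    if (block[i] != (byte)'\n') continue;
                    long pos = blockStart + i;
                    if (pos == end - 1) continue; // newline ending the final line
                    newlines++;
                    if (newlines == lineCount)
                    {
                        start = pos + 1;
                        found = true;
                    }
                }
                blockEnd = blockStart;
            }

            file.Seek(start, SeekOrigin.Begin);
            byte[] tail = new byte[(int)(end - start)];
            int read = ReadFully(file, tail, tail.Length);
            return Encoding.ASCII.GetString(tail, 0, read);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "one\ntwo\nthree\n");
        Console.Write(ReadLastLines(path, 2)); // writes the last two lines
        File.Delete(path);
    }
}
```

Seeking to absolute offsets from SeekOrigin.Begin avoids having to seek back twice relative to the current position after each read.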

Thanks again
-=- Brett
 
