NvrBst
What's the best way to count the lines? I'm using the following code
at the moment:
public long GetNumberOfLines(string fileName) {
    int buffSize = 65536;
    int streamSize = 65536;
    long numOfLines = 0;
    byte[] bArr = new byte[buffSize];
    using (FileStream br = new FileStream(fileName, FileMode.Open,
            FileAccess.Read, FileShare.None, streamSize,
            FileOptions.RandomAccess)) {
        for (int i = br.Read(bArr, 0, buffSize); i > 0;
             i = br.Read(bArr, 0, buffSize)) {
            if (i == buffSize) {
                // Full buffer: foreach over the whole array.
                foreach (byte A in bArr)
                    if (A == (byte)'\n') numOfLines++;
            } else {
                // Partial buffer: only look at the bytes actually read.
                for (int ii = 0; ii < i; ii++)
                    if (bArr[ii] == (byte)'\n') numOfLines++;
            }
        }
    }
    return numOfLines;
}
I'm able to count lines in files under 300MB (100MB in about 2 or 3
seconds, 300MB in about 20 seconds). My problem is that I need to
count lines in really big files (1GB to 2GB). Are there any
suggestions to make that possible? Or is the above basically as good
as it gets?
--Notes--
1. I've tried a lot of combinations of buffSize/streamSize, and
64KB/64KB seems to work fastest for files above 100MB. Making them
larger doesn't seem to increase speed much, and streamSize can be as
low as ~8KB without noticeable change; still, it doesn't hurt to keep
them the same.
2. I've tried other classes/methods like StreamReader, a LINQ Count
over the byte[], ReadByte(), etc.; they are all usually a lot slower
than the above.
3. foreach seems to run a lot faster than the "for(...)" statement,
which is why I use both. Is there a cleaner way to do this?
4. FileOptions.RandomAccess seems to work better than
FileOptions.SequentialScan for me, especially if I call
GetNumberOfLines() twice in a row. E.g. for a 300MB file, ~20 seconds
for the first call and ~5 seconds for the second. Is this normal, or
should SequentialScan be better here? Is disposing what's messing
that up?
The first scan is what I'm mainly trying to speed up, since I can
play around with the caching options once I'm able to count the lines
once in a decent amount of time.
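On note 3, one way to avoid duplicating the loop is to always run the
plain for loop over only the bytes actually read. This is an untested
sketch of the same counting logic as above (same buffer size and
RandomAccess hint); I haven't measured it against the foreach split:

```csharp
using System.IO;

public static class LineCounter {
    // Single-loop variant: always iterate only over the bytes returned
    // by Read(), so there are no separate full-/partial-buffer branches.
    public static long GetNumberOfLines(string fileName) {
        const int buffSize = 65536;
        long numOfLines = 0;
        byte[] bArr = new byte[buffSize];
        using (FileStream fs = new FileStream(fileName, FileMode.Open,
                FileAccess.Read, FileShare.None, buffSize,
                FileOptions.RandomAccess)) {
            int read;
            while ((read = fs.Read(bArr, 0, buffSize)) > 0)
                for (int i = 0; i < read; i++)
                    if (bArr[i] == (byte)'\n')
                        numOfLines++;
        }
        return numOfLines;
    }
}
```

Whether foreach really wins on the full-buffer case (e.g. via the
JIT's bounds-check elimination) is exactly the open question, so treat
this as a readability sketch, not a performance claim.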
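On note 4, a minimal Stopwatch harness makes the cold-scan/warm-scan
difference easy to measure for either FileOptions value (the path here
is an assumed placeholder, and GetNumberOfLines is the method from
above):

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness: time two back-to-back scans of the same file
// to see how much the OS file cache helps the second pass.
string path = @"C:\temp\bigfile.txt";   // assumed test file

var sw = Stopwatch.StartNew();
long first = GetNumberOfLines(path);    // cold scan (disk-bound)
sw.Stop();
Console.WriteLine($"first:  {sw.ElapsedMilliseconds} ms, {first} lines");

sw.Restart();
long second = GetNumberOfLines(path);   // warm scan (cache-bound)
sw.Stop();
Console.WriteLine($"second: {sw.ElapsedMilliseconds} ms, {second} lines");
```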
Thanks