Reading unknown number of strings from a file using C#

dgleeson3

Hello All

I have a txt file of strings of different lengths.
I don't know how many strings are in the file.

I have no problem reading the file and sending it to the console (as
below).

To store the strings read in a buffer, I had decided to use an array
of strings.

However, I must know the array size in C#, unlike in C++.

How do I read the strings (of unknown number) into a buffer?
I will be able to specify a reasonable maximum number of strings in
the file.



using (StreamReader sr = new StreamReader(path))
{
    string line;
    // Buffer index
    // int i = 0;

    // Read and display lines from the file until the end of
    // the file is reached.
    while ((line = sr.ReadLine()) != null)
    {
        // Data_Buffer[0] = line;
        // i++;
        Console.WriteLine(line);
    }
}
} // if (closes an enclosing if that is not shown here)


Many thanks for any help.


Regards

Denis
_____________________
http://www.CentronSolutions.com
 
Marc Gravell

Well, you could use a List<string> instead of an array, and just
.Add(line) each time.
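
A minimal sketch of the List<string> idea. The class and method names (LineLoader, ReadAllLines) are made up for illustration, and a StringReader stands in for the StreamReader on the real file so the example runs without one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineLoader
{
    // Reads every line from the reader into a list that grows as needed,
    // so the number of lines does not have to be known up front.
    public static List<string> ReadAllLines(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }

    static void Main()
    {
        // Assumption: in real code this would be new StreamReader(path).
        using (TextReader reader = new StringReader("alpha\nbeta\ngamma"))
        {
            List<string> lines = ReadAllLines(reader);
            Console.WriteLine(lines.Count);

            // If an array is really required, convert once at the end.
            string[] asArray = lines.ToArray();
            Console.WriteLine(asArray.Length);
        }
    }
}
```

The list can be indexed like an array (lines[0]), so most code that expected a string[] needs little change.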

However, if possible you might try to stream them (i.e. read and
process individually / in small batches) instead of all-at-once to
save having them all loaded... but this is not always possible.
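
For the streaming variant, one possible sketch is an iterator that hands out one line at a time, so only the current line is held. LineStreamer and ReadLines are hypothetical names, and finding the longest line is just a stand-in for whatever per-line processing is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineStreamer
{
    // Yields lines lazily; nothing is stored beyond the current line.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    static void Main()
    {
        int longest = 0;

        // Assumption: in real code this would be new StreamReader(path).
        using (TextReader reader = new StringReader("one\nthree\nfive"))
        {
            foreach (string line in ReadLines(reader))
            {
                // Process each line as it arrives.
                if (line.Length > longest)
                {
                    longest = line.Length;
                }
            }
        }

        Console.WriteLine(longest);
    }
}
```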

Marc
 
Bill Butler

> Hello All
>
> I have a txt file of strings of different lengths.
> I don't know how many strings are in the file.
>
> I have no problem reading the file and sending it to the console (as
> below).
>
> To store the strings read in a buffer, I had decided to use an array
> of strings.
> However, I must know the array size in C#, unlike in C++.
<Snip>

Could you elaborate on this statement? At face value it is untrue:
arrays are a fixed size in C++ too.

Bill
 
dgleeson3

Hi Guys

Yes Bill, sorry. I put that incorrectly.

In C++ I would write

string Buffer[MAX_FILE_SIZE]; // MAX_FILE_SIZE is the maximum number of
                              // strings that could be in the file
int i = 0;

while ((line = Read_A_line_From_File()) != null)
{
    Buffer[i] = line;
    i++;
}

Maybe I can do something similar in C#?

Thanks

Denis
 
Marc Gravell

You *could* do
string[] buffer = new string[MAX_FILE_SIZE];

But generally a List<string> will be fine and will be much more
memory-efficient for the typical case, since it only grows as far as
the data requires instead of always reserving the maximum.
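
For comparison, a rough sketch of the fixed-array route. MAX_FILE_SIZE is the name from the thread, but its value here is an assumption, as are the names FixedBufferLoader and ReadIntoArray. Unlike the C++ version, the overflow case is checked explicitly and the array is trimmed with Array.Resize once the count is known.

```csharp
using System;
using System.IO;

static class FixedBufferLoader
{
    // Assumed value; pick whatever maximum is reasonable for the file.
    const int MAX_FILE_SIZE = 200;

    public static string[] ReadIntoArray(TextReader reader)
    {
        string[] buffer = new string[MAX_FILE_SIZE];
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (count == buffer.Length)
            {
                // The file holds more lines than the buffer was sized for.
                throw new InvalidOperationException("More lines than MAX_FILE_SIZE.");
            }
            buffer[count] = line;
            count++;
        }

        // Shrink the array to the number of lines actually read.
        Array.Resize(ref buffer, count);
        return buffer;
    }

    static void Main()
    {
        // Assumption: in real code this would be new StreamReader(path).
        using (TextReader reader = new StringReader("first\nsecond"))
        {
            string[] lines = ReadIntoArray(reader);
            Console.WriteLine(lines.Length);
        }
    }
}
```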

Marc
 
dgleeson3

Thanks Marc

I'll check out both techniques for my own understanding.

I see that

#define MAX_DATA_BUFFER_LENGTH 200

isn't acceptable C#.

What's the alternative?

Denis
 
Marc Gravell

const int MAX_DATA_BUFFER_LENGTH = 200;

This must be inside a class, not just out somewhere in the ether.
Typically that is the class that uses it. To use it from other classes
you'll have to mark it as "internal" or "public", and (from those
classes) refer to it as SomeClass.MAX_DATA_BUFFER_LENGTH
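
A small sketch of how that might look. DataReader, CreateBuffer and ConstDemo are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical class that owns the constant.
static class DataReader
{
    // 'internal' makes it visible to other classes in the same assembly;
    // with no modifier a class member is private.
    internal const int MAX_DATA_BUFFER_LENGTH = 200;

    internal static string[] CreateBuffer()
    {
        // Inside the owning class the bare name is enough.
        return new string[MAX_DATA_BUFFER_LENGTH];
    }
}

static class ConstDemo
{
    static void Main()
    {
        string[] buffer = DataReader.CreateBuffer();

        // From another class the constant is qualified with the class name.
        Console.WriteLine(buffer.Length == DataReader.MAX_DATA_BUFFER_LENGTH);
    }
}
```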

Marc
 
Analizer1

I have a black-box type class that sizes the array only once, for
speed.

1. Estimate the maximum row count.
   I first read, say, 30 lines of the original file and find the widest
   row in that small set. These lines are read only to get a basis for
   the likely widest row.
   a. Along with this test for the widest line, I look for which
      character marks the end of a line. CRLF, CR only, and embedded LF
      with CRLF are the most common. CTRL-Z is usually the last byte in
      the file, so I just disregard it.

2. iTmp = (divide / largest row / 2), then create the array
   (iTmp / filesize). What you want to achieve is an array larger than
   the number of rows in the file. Then, at the end of the array load,
   resize the array down to the actual rows used.

3. Create your byte array with a size of widest row * 2, create it
   outside of the read loop, and reuse it. While reading, I check the
   bytes read against the current WidestRow, and if the count is larger
   I create a new byte array sized to the new widest row * 2.

Using ArrayList.Add() on a large file, you might as well take a
vacation, so this is not an option.

I tested the above with a million-plus rows, and it's blazing fast.

In my case I only load the row offsets into the array. One million row
offsets take about 6 seconds; megabyte-size files take the blink of an
eye. I then use this array of row offsets to move up or down and
return rows as needed. This is also fast, since I'm only adding to,
subtracting from, or updating a row pointer (long iPosition).
I have a getRow() method that returns the row based on iPosition.

Hope this helps
Dave
 
Marc Gravell

> using a Arraylist.add() if you have a large file..you mite as well
> take a vacation...so this is not a option

Sorry, but that simply isn't so (the metrics below use List<string>,
but since string isn't boxed they should be quite comparable to
ArrayList). List<T> uses doubling, so it resizes itself very quickly.
By my metrics, the List<T> approach is slower, yes, but only by a
small factor (~5). Given the amount of IO involved, this is nothing -
i.e. storing 1M strings in a List<T> takes 59ms on my lowly laptop.
Maybe you take short vacations... I think all the faff you are doing
(including rewinding the stream) will bring them quite close.
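
To make the doubling point concrete, here is a rough sketch that counts how often the list's Capacity changes while items are added. CapacityDemo and CountResizes are made-up names, and the exact growth policy is an implementation detail of the runtime, so treat the count as indicative only. When a reasonable maximum is known, the capacity can also be passed to the List<T> constructor to avoid the resizes altogether.

```csharp
using System;
using System.Collections.Generic;

static class CapacityDemo
{
    // Counts how many times the backing storage is reallocated
    // while 'size' items are appended one by one.
    public static int CountResizes(int size)
    {
        List<string> list = new List<string>();
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < size; i++)
        {
            list.Add(null);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }

    static void Main()
    {
        // With geometric growth this is a small number even for 1M items.
        Console.WriteLine(CountResizes(1000000));

        // Pre-sizing: the list starts with room for the expected maximum.
        List<string> presized = new List<string>(1000000);
        Console.WriteLine(presized.Capacity);
    }
}
```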

In the test, for *both* cases I am simply storing nulls. Since we are
talking reference types, this is perfectly fine and doesn't affect the
results. The memory *in the list and array* is the same either way.
Again, most of the weight in either approach would be due to memory /
IO requirements for the *actual* data, which does not depend on the
storage mechanism.

output:
{size} {ticks for list} vs {ticks for fixed array}: {multiplier} ({ms for list})
5 37 vs 6: 6.16666666666667 (0)
10000 3118 vs 602: 5.17940199335548 (0)
100000 23339 vs 4039: 5.77841049764793 (6)
1000000 214532 vs 50344: 4.261322103925 (59)
10000000 2606905 vs 472739: 5.51446992949598 (728)

Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
static class Program {
    static void Main() {
        Test(5); // to get JIT etc
        Test(10000);
        Test(100000);
        Test(1000000);
        Test(10000000);
    }
    static void Test(int size) {
        // Time appending 'size' items to a List<string>
        Stopwatch watch = new Stopwatch();
        watch.Start();
        List<string> list = new List<string>();
        for (int i = 0; i < size; i++) {
            list.Add(null);
        }
        watch.Stop();
        long tickList = watch.ElapsedTicks;
        long msList = watch.ElapsedMilliseconds;

        // Time filling a pre-sized array of the same length
        watch = new Stopwatch();
        watch.Start();
        string[] array = new string[size];
        for (int i = 0; i < size; i++) {
            array[i] = null;
        }
        watch.Stop();
        long tickArray = watch.ElapsedTicks;

        Console.WriteLine("{0}\t{1} vs {2}:\t{3} ({4})", size,
            tickList, tickArray,
            (tickList * 1.0) / tickArray, msList);
    }
}
 
Jon Skeet [C# MVP]

> i have a Black box type class..where it actually sizes the one time.....for
> speed

<snip>

All of that sounds like a really bad idea compared with the simplicity
of just calling StreamReader.ReadLine() repeatedly and adding the
results into an ArrayList or a List<T>.

I doubt that you can find many examples where the performance
difference is significant, but the *complexity* difference is
significantly in favour of reusing the existing .NET classes.

Jon
 