Reading in data from very large flat files


booksnore

I have to read data from a flat file with millions of records, and I
want to find the most efficient way of doing it. I was just going to use
a StreamReader and then break up each input line using Substring, as
there are no delimiters, but I do have a spec for the file format. Is
using Substring the only way to do this, or is there a more efficient
way?


string line;
while ((line = sr.ReadLine()) != null)
{
    string param1 = line.Substring(0, 5);
    string param2 = line.Substring(5, 2);
    // ...remaining fields per the file spec
}

regards,

Joe
 
booksnore said:
I have to read data from a flat file with millions of records. [...] Is
using Substring the only way to do this, or is there a more efficient
way?

That's a pretty efficient way of reading it. Are you then storing the
data in memory, or just processing each line in turn? If you're storing
them and there are lots of little fields, you might consider storing
just the whole line, and breaking it into bits when it's used. Each
string has a certain overhead, and if you have lots of strings with
just a few characters, that overhead could become significant.
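
As a rough illustration of what I mean (untested, and the field offsets
are just the ones from your example):

class Record
{
    private readonly string line;

    public Record(string line)
    {
        this.line = line;
    }

    // Fields are sliced out of the stored line only when asked for,
    // so no small per-field strings are created up front.
    public string Param1 { get { return line.Substring(0, 5); } }
    public string Param2 { get { return line.Substring(5, 2); } }
}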
 
Depends on what you're doing with the data once you read it. If you only
need to read the data sequentially, then that's a good method. If you
need to frequently jump from one record to another at random, you should
be able to make use of each line being a fixed width to seek forwards
and backwards in the file and read only the desired information as
required. But as Jon said, if you are reading the file sequentially,
that's probably the best way (there are other ways, but the gain
wouldn't be worth the coding, trust me).
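
For instance, something along these lines (untested; the record width
and file name are placeholders, and it assumes a single-byte encoding
and CR/LF line terminators):

using System;
using System.IO;
using System.Text;

class RandomAccessReader
{
    static void Main()
    {
        const int recordWidth = 80;               // placeholder: line width from the file spec
        const int recordLength = recordWidth + 2; // plus the CR/LF terminator

        using (FileStream fs = new FileStream("data.txt", FileMode.Open, FileAccess.Read))
        {
            long recordIndex = 500000;            // jump straight to an arbitrary record

            // Every record starts at a fixed offset, so seek directly to it.
            fs.Seek(recordIndex * recordLength, SeekOrigin.Begin);

            byte[] buffer = new byte[recordWidth];
            int read = fs.Read(buffer, 0, recordWidth);
            string line = Encoding.ASCII.GetString(buffer, 0, read);
            Console.WriteLine(line);
        }
    }
}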
 
Thanks for the replies. There will be some validation checks made on the
values of the resulting variable assignments, but those will be on a
line-by-line basis (so, for example, I won't have to jump from one
record and check something against the last 10 records). The next step
is that, following the validation checks, the data is loaded into SQL
Server; I was going to batch insert by creating an XML document and
feeding a stored procedure using OPENXML. I am also going to
performance-test that method against a DTS package load, although I am
not sure to what degree I can perform effective validation checks using
DTS.
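
Roughly what I have in mind on the client side for the OPENXML route
(untested; the connection string, stored procedure name, and XML shape
are all made up for illustration):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Text;

class BatchLoader
{
    static void Main()
    {
        // Build one XML document per batch of validated lines.
        StringBuilder xml = new StringBuilder();
        xml.Append("<records>");
        xml.AppendFormat("<r p1=\"{0}\" p2=\"{1}\"/>", "ABCDE", "01"); // one element per validated line
        xml.Append("</records>");

        using (SqlConnection conn = new SqlConnection("server=.;database=Staging;Integrated Security=SSPI"))
        using (SqlCommand cmd = new SqlCommand("usp_LoadRecords", conn)) // hypothetical proc that shreds @xml with OPENXML
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@xml", SqlDbType.NText).Value = xml.ToString();
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}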

Joe
 
