How to read a text file really fast?

G.Esmeijer

Friends,
I would like to read a text file (fixed-length formatted) really fast and
store the data in an Access database (2003).
Using the StreamReader and reading line by line, then splitting the line
into strings, just takes too long.

When I import the file manually with Access it goes fast. But how can I
do this from a C# program?

Who has done this before, and who can give me some answers?

Gerrit Esmeijer
 
Jon Skeet [C# MVP]

G.Esmeijer said:
I would like to read a text file (fixed-length formatted) really fast and
store the data in an Access database (2003).
Using the StreamReader and reading line by line, then splitting the line
into strings, just takes too long.

When I import the file manually with Access it goes fast. But how can I
do this from a C# program?

Who has done this before, and who can give me some answers?

It would help if you showed the code you're currently using. I suspect
it's not the reading which is taking the time, but the process you're
using for handling each line.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
 
J. Jones

G.Esmeijer said:
Friends,
I would like to read a text file (fixed-length formatted) really fast and
store the data in an Access database (2003).
Using the StreamReader and reading line by line, then splitting the line
into strings, just takes too long.

When I import the file manually with Access it goes fast. But how can I
do this from a C# program?

Who has done this before, and who can give me some answers?

Gerrit Esmeijer

I've done it before, and StreamReader line-by-line was as fast as it gets in
C#. I was able to process 100 MB text files in under 10 seconds on my average
machine.

What does "takes too long" mean to you?

I'd suspect that most of your time is spent in processing the text to the
database, and not the action of reading the text. Have you profiled to
definitively determine that the StreamReader is the slow part?
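
For reference, the line-by-line approach under discussion can be sketched like this (the field positions, widths, and file name below are invented for illustration, not taken from the original post):

```csharp
using System;
using System.IO;

class FixedWidthReader
{
    static void Main()
    {
        // Hypothetical fixed-length layout:
        // Name (chars 0-7), Code (chars 8-11), Value (chars 12-19).
        using (StreamReader reader = new StreamReader("records.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string name  = line.Substring(0, 8).TrimEnd();
                string code  = line.Substring(8, 4);
                double value = double.Parse(line.Substring(12, 8));

                // Insert into the database (ideally batched) here.
                Console.WriteLine("{0} {1} {2}", name, code, value);
            }
        }
    }
}
```

On typical hardware this reads and splits large files very quickly, which is why profiling usually points at the database insert rather than the StreamReader.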
 
Jason Black [MSFT]

I would like to read a text file (fixed-length formatted) really fast and
store the data in an Access database (2003).
Using the StreamReader and reading line by line, then splitting the line
into strings, just takes too long.

When I import the file manually with Access it goes fast. But how can I
do this from a C# program?

The fastest way I can think of is ugly, but perhaps worth investigating. If
your lines of text are all exactly the same length, and the fields within
the line are in exactly the same places, you could read the entire file into
an array of char, and use a struct with explicit layout to get the fields.
For instance, if your lines have fields A, B, C, and D starting at positions
0, 8, 12, and 27, with a total line length of 60 characters, then you could
do something like this:

[StructLayout(LayoutKind.Explicit)]
unsafe struct RecordParser {
    // FieldOffset is measured in bytes, and a .NET char is two bytes (UTF-16),
    // so character positions 0, 8, 12 and 27 become byte offsets 0, 16, 24 and 54.
    [FieldOffset(0)]  public fixed char A[8];
    [FieldOffset(16)] public fixed char B[4];
    [FieldOffset(24)] public fixed char C[15];
    [FieldOffset(54)] public fixed char D[33];
}

...

public unsafe void ParseFile() {
    char[] theFile = ...; // do whatever you need to do to read the file into a char[] buffer
    int lineLength = 60;
    fixed (char* buffer = theFile) {  // pin the array so raw pointers into it stay valid
        for (int recordPosition = 0; recordPosition < theFile.Length; recordPosition += lineLength) {
            RecordParser* record = (RecordParser*)(buffer + recordPosition);
            // do something useful with record->A, record->B, etc.
        }
    }
}

Depending on the actual types you're trying to extract from your lines of
text, you may have to do some additional work at the "do something useful"
stage to parse the individual fields into the proper types (e.g. converting
"120.17" to an actual float or double value, trimming trailing whitespace,
etc.).

Note that because we're using unmanaged pointers to coerce raw memory
(slices of the char array) into acting like structs, we're violating the
Common Language Runtime's type system, and therefore we must mark the
ParseFile method as "unsafe" to satisfy the compiler that we know what we're
doing. The RecordParser struct must be marked "unsafe" as well, since
fixed-size buffers are only allowed in an unsafe context. Also, you must add
the "/unsafe" command-line switch to your compilation command.

For a much more thorough treatment of this type of memory access from C#,
see chapter 10 of Don Box's excellent book "Essential .NET Volume 1".
 
Tim Jarvis

G.Esmeijer said:
Friends,
I would like to read a text file (fixed-length formatted) really fast
and store the data in an Access database (2003).
Using the StreamReader and reading line by line, then splitting the
line into strings, just takes too long.

When I import the file manually with Access it goes fast. But how can
I do this from a C# program?

Who has done this before, and who can give me some answers?

Gerrit Esmeijer

I had this issue as well, though from a Win32 app rather than .NET. For
me the problem was the actual insertion into the database (it was Access,
but using an MSDE backend): the symptoms were that it would start off
fast enough, but quickly slow down.

So no matter how fast I read the file, it didn't help the bulk insert.
In the end what I did was create a Data Transformation Services (DTS)
package and used that for the insert; it then inserted as fast as an
import directly from Access. Note that you can create and run DTS
packages programmatically. (link below)

http://tinyurl.com/6s3cx

Now, this helped me when using Access with an MSDE backend (I note you
are using Access 2003, hence the assumption that you may possibly be
using MSDE). If it is a simple .mdb database then I don't think DTS
packages are suitable.

Regards Tim.
 
Peter N Roth

IIRC, there is a ReadEntireFile() or ReadAll() ... something
like that. I'm not sure where it is in the framework...

It reads the entire file into memory, then you use
the other methods to parse it.

As others have said, that only helps if reading the file really is the
problem. Usually, bottlenecks are not where you "know" they are until
you measure.
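
If the framework in use is .NET 2.0 or later, System.IO.File.ReadAllLines is one such read-everything-at-once method; a minimal sketch (the file name is made up for illustration):

```csharp
using System;
using System.IO;

class ReadAllDemo
{
    static void Main()
    {
        // File.ReadAllLines opens the file, reads every line into a
        // string array, and closes the file again, all in one call.
        string[] lines = File.ReadAllLines("records.txt");

        foreach (string line in lines)
        {
            // Parse each fixed-length line here, e.g. with Substring.
        }

        Console.WriteLine("Read {0} lines", lines.Length);
    }
}
```

File.ReadAllText does the same but returns the whole file as a single string, which you then split yourself.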
--
Grace + Peace,
Peter N Roth
Engineering Objects International
http://engineeringobjects.com
Home of Matrix.NET
 
G.Esmeijer

Thanks for all the responses. The general response pointed me in the right
direction: it was not the reading but the processing of each line that took
so much time.
Since I know the number of lines to read, I add that many rows to the grid
and then put the results in there. It really is very fast.
Gerrit Esmeijer
 
