What's the absolute fastest way...


dvestal

Two scenarios:

1) I have a lot of text, and need to write it to a file on a network
drive. What's the absolute fastest way to do that? Currently, I
write 1 line at a time using StreamWriter.Write, and I switch to a new
file after every 50K lines written. By the time this completes, I've
written some 235 files, and 12 minutes have elapsed.

I've tried such things as using a StringBuilder to create the entire
contents of a single file, then writing it all at once, and splitting
files into 500K lines each instead. Each of these makes the process
longer. What's the fastest way to dump text into a file?

If it helps understand the context in which the problem occurs, the
text is data for a bulk SQL insert.


2) What is the absolute fastest way to add rows to a DataTable? I
have a compressed file containing a serialized version of data rows.
To parse the file, I open it using the .NET stream-based compression,
read it 1 line at a time, and insert each line into a DataTable. The
vast majority of time is spent calling DataTable.Rows.Add(); it dwarfs
every other step. Any recommendations on how to speed this up?
 

Jeroen Mostert

1) I have a lot of text, and need to write it to a file on a network
drive. What's the absolute fastest way to do that?

It depends on your OSes and your network latency. That said, the fastest way
of dealing with network drives is generally either not to use them at all, or
to transfer everything at once using optimized tools (such as Robocopy).
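For example, something like this would push a batch of finished files across
in one go (the paths and file mask are placeholders):

    robocopy C:\staging \\server\share data_*.txt
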
I've tried such things as using a StringBuilder to create the entire
contents of a single file, then writing it all at once, and splitting
files into 500K lines each instead. Each of these makes the process
longer.

You show no code, but I can imagine a few ways in which this would be slower
than writing individual lines. Have you tried copying entire files, instead
of writing to them?
If it helps understand the context in which the problem occurs, the
text is data for a bulk SQL insert.
If that's the case, don't bother with network files. TDS (SQL Server's
protocol) is generally faster than SMB (the Windows file sharing protocol).
Use the SqlBulkCopy class. If you need additional processing that involves
the server data, or you need to minimize the time taken for the actual bulk
copy, bulk copy to a temporary table and do the rest on the server itself.
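
A rough sketch of what that might look like (the connection string and the
destination table name are placeholders, and I'm assuming the rows already
live in a DataTable):

    using System.Data;
    using System.Data.SqlClient;

    class BulkLoader
    {
        public static void BulkInsert(DataTable rows, string connectionString)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
                {
                    bulkCopy.DestinationTableName = "dbo.TargetTable"; // placeholder
                    bulkCopy.BatchSize = 10000;   // send the rows in batches
                    bulkCopy.BulkCopyTimeout = 0; // no timeout for large loads
                    bulkCopy.WriteToServer(rows); // streams everything over TDS
                }
            }
        }
    }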

If you want to use files anyway, you can try reversing the setup: expose a
network share on the computer generating the files and have BCP copy from
that location, rather than pushing the file to the server and then bulk
loading. This means the bulk load operation will take longer, obviously,
which may or may not be acceptable.
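
The bulk load itself could then be kicked off with something like this
(server, database, table, share and file names are all placeholders):

    bcp MyDatabase.dbo.TargetTable in \\clientmachine\share\data.txt -S ServerName -T -c
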
2) What is the absolute fastest way to add rows to a DataTable? I
have a compressed file containing a serialized version of data rows.
To parse the file, I open it using the .NET stream-based compression,
read it 1 line at a time, and insert each line into a DataTable. The
vast majority of time is spent calling DataTable.Rows.Add(); it dwarfs
every other step. Any recommendations on how to speed this up?

Try setting .MinimumCapacity to a reasonable value to minimize
reallocations. If that doesn't help, try DataTable.Load(). This will require
writing a custom IDataReader implementation, which is tedious but not too
hard. Disclaimer: I have no actual performance numbers, these just seem like
good ideas.
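
A quick sketch of the first idea (the columns and the initial size are guesses
based on the numbers in your first scenario):

    using System.Data;

    DataTable table = new DataTable("Rows");
    table.Columns.Add("Id", typeof(int));          // hypothetical columns
    table.Columns.Add("Payload", typeof(string));

    // Pre-size the internal row storage so Rows.Add() doesn't keep reallocating.
    table.MinimumCapacity = 12000000;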
 

Göran Andersson

Two scenarios:

1) I have a lot of text, and need to write it to a file on a network
drive. What's the absolute fastest way to do that? Currently, I
write 1 line at a time using StreamWriter.Write, and I switch to a new
file after every 50K lines written. By the time this completes, I've
written some 235 files, and 12 minutes have elapsed.

I've tried such things as using a StringBuilder to create the entire
contents of a single file, then writing it all at once, and splitting
files into 500K lines each instead. Each of these makes the process
longer. What's the fastest way to dump text into a file?

If it helps understand the context in which the problem occurs, the
text is data for a bulk SQL insert.

The bottleneck here is clearly the network bandwidth, so it doesn't
matter much how you write the text to the files.

If each line is about 200 characters, you get roughly 3 MB/s. That is about
24 Mbit/s, which sounds reasonable.
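
(As a rough check: 235 files × 50,000 lines × 200 characters is about 2.35 GB;
spread over the 12 minutes (720 seconds) reported above, that is roughly
3.3 MB/s, or about 26 Mbit/s, assuming one byte per character on the wire.
The 200 characters per line is an assumption; the file count, line count and
elapsed time come from the original post.)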

If you are using a wireless network, try a cable instead. Other than
that there's not very much you can do. You can't push more data through
the network than the bandwidth allows.
2) What is the absolute fastest way to add rows to a DataTable? I
have a compressed file containing a serialized version of data rows.
To parse the file, I open it using the .NET stream-based compression,
read it 1 line at a time, and insert each line into a DataTable. The
vast majority of time is spent calling DataTable.Rows.Add(); it dwarfs
every other step. Any recommendations on how to speed this up?

Do you really need the DataTable? Since it sounds like most of the work is
spent verifying that the rows fit in the table, perhaps you should just read
the data from each row and put it into a custom object instead.
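
Something along these lines, for example (the Record shape, the tab delimiter,
and the use of GZipStream are all assumptions for illustration):

    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    class Record                 // hypothetical shape of one row
    {
        public int Id;
        public string Payload;
    }

    class RecordLoader
    {
        public static List<Record> Load(string path)
        {
            List<Record> records = new List<Record>();
            using (FileStream file = File.OpenRead(path))
            using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
            using (StreamReader reader = new StreamReader(gzip))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split('\t');   // assuming tab-delimited lines
                    Record r = new Record();
                    r.Id = int.Parse(fields[0]);
                    r.Payload = fields[1];
                    records.Add(r);
                }
            }
            return records;
        }
    }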
 

Harlan Messinger

Two scenarios:

1) I have a lot of text, and need to write it to a file on a network
drive. What's the absolute fastest way to do that? Currently, I
write 1 line at a time using StreamWriter.Write, and I switch to a new
file after every 50K lines written. By the time this completes, I've
written some 235 files, and 12 minutes have elapsed.

I've tried such things as using a StringBuilder to create the entire
contents of a single file, then writing it all at once, and splitting
files into 500K lines each instead. Each of these makes the process
longer. What's the fastest way to dump text into a file?

I would guess that, in devising the stream libraries as we know them today,
the implementers have already built in every optimization at their disposal,
so the I/O is likely handled better than you could manage yourself from the
higher-level language in which these libraries are supplied.
 

Jesse Houwing

1)

Have you tried writing the file locally and then copying it to its target
location? That's probably the fastest by far, as writing to a local disk is
very quick, and you're letting the OS's optimized file copy handle moving it
to its final destination.
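
Roughly like this (the paths are placeholders, and "lines" stands in for
whatever produces your text):

    using System.Collections.Generic;
    using System.IO;

    class StageAndCopy
    {
        public static void WriteAndCopy(IEnumerable<string> lines)
        {
            string localPath = Path.Combine(Path.GetTempPath(), "bulkdata.txt");
            string remotePath = @"\\server\share\bulkdata.txt";

            using (StreamWriter writer = new StreamWriter(localPath))
            {
                foreach (string line in lines)
                    writer.WriteLine(line);
            }

            File.Copy(localPath, remotePath, true);  // let the OS copy the finished file in one go
            File.Delete(localPath);                  // clean up the local staging copy
        }
    }
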
Try setting .MinimumCapacity to a reasonable value to minimize
reallocations. If that doesn't help, try DataTable.Load(). This will
require writing a custom IDataReader implementation, which is tedious
but not too hard. Disclaimer: I have no actual performance numbers,
these just seem like good ideas.

Also, make sure you've disabled any primary keys and unique constraints on
the DataTable; recomputing those takes up a lot of time. Just enforce them
after loading the data, or let the target database enforce them.
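
For instance (here "table" is the DataTable being loaded, and the key column
name is a placeholder):

    // Drop the key and constraints before loading...
    table.PrimaryKey = new DataColumn[0];
    table.Constraints.Clear();

    // ...add all the rows here...

    // ...and reinstate them afterwards.
    table.PrimaryKey = new DataColumn[] { table.Columns["Id"] };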

Also, if you've serialized single rows, how did you do it? By default a whole
DataSet or a single DataTable can be serialized, but individual DataRows
cannot.

If you've written them out to a CSV file (or something similar), look at the
OLE DB Text Driver to load them back in (though that won't work with
stream-based compression).
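
That could look roughly like this (the folder, file name and HDR/FMT settings
are placeholders for whatever your files actually contain):

    using System.Data;
    using System.Data.OleDb;

    class CsvLoader
    {
        public static DataTable Load()
        {
            string connectionString =
                @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\;" +
                "Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

            DataTable table = new DataTable();
            using (OleDbConnection connection = new OleDbConnection(connectionString))
            using (OleDbDataAdapter adapter =
                       new OleDbDataAdapter("SELECT * FROM [rows.csv]", connection))
            {
                adapter.Fill(table);   // reads the whole file into the DataTable
            }
            return table;
        }
    }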

One final question: why use compression at all if processing speed is so
important? Generally speaking, compressing and decompressing takes longer
than reading a few more bytes from disk.
 
