Incredibly slow writing by CSV driver

Norman Diamond

Since my application uses the Excel driver to read some files, I used the
same Excel driver to write some CSV files. I also tried using the Text
driver to write CSV files. The results are correct but they take enormous
amounts of CPU time.

Of course I know how to write a CSV file using plain old Shift-JIS character
strings and quotation marks and commas, and probably I'll do that in order
to speed up this operation, but the question still remains.

Why is this so slow? The sample data table had 213 rows (header plus 212
data rows) and 25 columns, all strings (some of them empty strings). The
total file size on disk is 34KB. The computations and database operations
in memory take a few milliseconds, not even noticeable when running under a
debugger. But the call to
dataAdapter.Update(dataTable);
takes 55 SECONDS OF CPU TIME on a Pentium 4 running at 3 GHz. It occupies
100% of one CPU core for 55 seconds.

Actual time to write the file might be a few hundred milliseconds, since
34KB occupies several NTFS structures. In any case, this operation isn't
disk bound, and it's not CPU bound in my code; it's CPU bound in the Update
method.

What is going on here?

#undef UseOdbc // Use OleDb
string fileName = @"C:\test.csv"; // (not really)
string[] Headers = new string[25] { "1", "2", "3", /* ... */ "25" };
int columnCount = 25; // (not really)
FileInfo fileInfo = new System.IO.FileInfo(fileName);
string dirName = fileInfo.DirectoryName;
string tableName = fileInfo.Name;
#if UseOdbc
OdbcConnection connection = new OdbcConnection(
    "Provider=MSDASQL;" +
    "DRIVER={Microsoft Text Driver (*.txt; *.csv)};DBQ=" + dirName +
    ";Extended Properties='Text;Extensions=asc,csv,tab,txt;" +
    "HDR=Yes;FMT=Delimited'");
connection.Open();
OdbcDataAdapter dataAdapter = new OdbcDataAdapter(
    "SELECT * FROM [" + tableName + "]", connection);
OdbcCommand insertCmd = new OdbcCommand();
OdbcType varcharType = OdbcType.VarChar;
#else // OleDb
OleDbConnection connection = new OleDbConnection(
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dirName +
    ";Extended Properties='Text;Extensions=asc,csv,tab,txt;" +
    "HDR=Yes;FMT=Delimited'");
connection.Open();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(
    "SELECT * FROM [" + tableName + "]", connection);
OleDbCommand insertCmd = new OleDbCommand();
OleDbType varcharType = OleDbType.VarChar;
#endif // OleDb
DataTable dataTable = new DataTable(tableName);
DataColumnCollection dataColumns = dataTable.Columns;
int columnNum;
string dataColumnName;
DataColumn dataColumn;
StringBuilder insertCmdHead = new StringBuilder(
    "INSERT INTO [" + tableName + "] (");
StringBuilder insertCmdTail = new StringBuilder("VALUES (");
StringBuilder fileHeader = new StringBuilder();
for (columnNum = 0; columnNum < columnCount - 1; columnNum++)
{
    dataColumnName = Headers[columnNum];
    dataColumnName = dataColumnName.Replace('\n', ' ');
    dataColumn = dataColumns.Add(dataColumnName);
    dataColumn.DataType = typeof(string);
    dataColumn.DefaultValue = "";
    insertCmdHead.Append("[" + dataColumnName + "], ");
    insertCmdTail.Append("?, ");
    insertCmd.Parameters.Add(dataColumnName, varcharType, 255,
        dataColumnName);
    fileHeader.Append("\"" + dataColumnName + "\",");
}
dataColumnName = Headers[columnCount - 1];
dataColumn = dataColumns.Add(dataColumnName);
dataColumn.DataType = typeof(string);
dataColumn.DefaultValue = "";
insertCmdHead.Append("[" + dataColumnName + "]) ");
insertCmdTail.Append("?)");
insertCmd.Parameters.Add(dataColumnName, varcharType, 255,
    dataColumnName);
insertCmd.CommandText = insertCmdHead.ToString() +
    insertCmdTail.ToString();
insertCmd.Connection = connection;
dataAdapter.InsertCommand = insertCmd;
fileHeader.Append("\"" + dataColumnName + "\"");
for (int rowNum = 0; rowNum < 212; rowNum++) // (not really)
{
    DataRow dataRow = dataTable.NewRow();
    dataRow[2] = "hi i'm 2"; // (not really)
    dataRow[18] = "18"; // (not really)
    dataRow[19] = "19"; // (not really)
    // (around half the cells default to empty strings)
    dataTable.Rows.Add(dataRow);
}
StreamWriter fileWriter = new StreamWriter(fileInfo.Create(),
    Encoding.Default);
fileWriter.WriteLine(fileHeader.ToString());
fileWriter.Close();
fileWriter.Dispose();
// UP TO THIS POINT TAKES A FEW MILLISECONDS, OK
//
// 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES
dataAdapter.Update(dataTable); // 55 SECONDS TO WRITE 34 KILOBYTES
// 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES
//
// OK AFTER THIS
dataAdapter.Dispose();
dataTable.Dispose();
connection.Close();
 
Steven Cheng[MSFT]

Hi Norman,

From your description, you're encountering a high-CPU issue when loading a
csv file via the OLEDB provider, correct?

As for the high-CPU behavior when loading such a data file, I think the
following are potential causes:

** the number of records in the csv file
** the data content contained in each record.

As for the number of records, you can try reducing the number of records
and columns and testing again. As for the content, the provider may
sometimes run into poor performance when particular data (characters) in
the file makes it spend a long time parsing. Therefore, you can also test
by isolating the data in the csv files (check whether some records contain
particular characters).
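If you run those reduction experiments, it helps to measure both wall-clock and CPU time around the Update call. A small harness along these lines (a sketch, not part of the original posts; Stopwatch and Process.TotalProcessorTime are standard .NET APIs) would distinguish a disk-bound from a CPU-bound call:

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Runs an action once and reports wall-clock time plus the CPU time
    // the whole process consumed while it ran.
    public static TimeSpan Time(string label, Action action)
    {
        TimeSpan cpuBefore = Process.GetCurrentProcess().TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        TimeSpan cpuUsed =
            Process.GetCurrentProcess().TotalProcessorTime - cpuBefore;
        Console.WriteLine("{0}: wall {1}, cpu {2}", label, sw.Elapsed, cpuUsed);
        return sw.Elapsed;
    }
}
```

For example, `Timing.Time("Update", () => dataAdapter.Update(dataTable));` — if the reported CPU time tracks the wall-clock time, the call is CPU bound as Norman describes.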

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



==================================================

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.



Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================


This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
From: "Norman Diamond" <[email protected]>
Subject: Incredibly slow writing by CSV driver
Date: Fri, 8 Feb 2008 15:05:09 +0900
Norman Diamond

The high CPU issue is when WRITING a CSV file via either the OleDb or Odbc
driver.

READING is no problem, for these particular CSV files.

I already told you the number of records in the CSV file: 213, consisting
of a header row and 212 data rows. Each row contains 25 columns, about half
of which are text and half are null, and the nulls convert to "" because all
of the column definitions are set to be text. The total file size is around
34,000 bytes. Don't you think that 55 seconds of CPU time to format 55,000
bytes is pretty slow? Your posting includes my message where I showed all
this.

I wrote C# code to write the strings to the file myself. It takes a few
milliseconds now. I tested reading it back using the OleDb text driver, and
it reads correctly in a few milliseconds.
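The replacement writer itself isn't shown in the thread. A minimal sketch of that approach (hypothetical code, not Norman's actual implementation: code page 932 is Shift-JIS, and embedded quotes are doubled per the usual CSV convention) could look like:

```csharp
using System;
using System.IO;
using System.Text;

static class CsvSketch
{
    // Wraps a field in double quotes, doubling any embedded double quote;
    // null becomes an empty quoted field.
    public static string Quote(string field)
    {
        return "\"" + (field ?? "").Replace("\"", "\"\"") + "\"";
    }

    // Writes a header row plus data rows as Shift-JIS CSV.
    public static void WriteCsv(string path, string[] headers, string[][] rows)
    {
        using (StreamWriter writer =
            new StreamWriter(path, false, Encoding.GetEncoding(932)))
        {
            writer.WriteLine(string.Join(",", Array.ConvertAll(headers, Quote)));
            foreach (string[] row in rows)
                writer.WriteLine(string.Join(",", Array.ConvertAll(row, Quote)));
        }
    }
}
```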


Norman Diamond

Correcting a typo (34,000 bytes not 55,000)

Patrice

I would try some of the tools from
http://technet.microsoft.com/fr-fr/sysinternals/default.aspx, such as
"filemon", to track a bit of what could be happening behind the scenes. (I
suspect the driver could be doing a bit more than you expect, such as
rewriting the whole file on each insert?)

Generally my personal preference is to handle these kinds of files myself.
--
Patrice
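A back-of-the-envelope check of that suspicion (assumed figure: roughly 160 bytes per row, from 34 KB over 213 rows): if the driver rewrote the whole file after each insert, the cumulative bytes written would be the triangular sum of the growing file size.

```csharp
using System;

static class RewriteCost
{
    // Total bytes written if the entire file is rewritten after each of
    // the first `rows` inserts, at a fixed average row size.
    public static long CumulativeBytes(int rows, int bytesPerRow)
    {
        long total = 0;
        for (int i = 1; i <= rows; i++)
            total += (long)i * bytesPerRow; // file holds i rows at this point
        return total;
    }
}
```

CumulativeBytes(212, 160) comes to roughly 3.6 MB. Filemon would confirm whether that pattern occurs, though even this worst case is small enough that rewriting alone would leave most of the 55 seconds of CPU unexplained.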

Norman Diamond said:
The high CPU issue is when WRITING a CSV file via either the OleDb or Odbc
driver.

READING is no problem, for these particular CSV files.

I already told you the number of records in the CSV file: 213, consisting
of a header row and 212 data rows. Each row contains 25 columns, about
half of which are text and half are null, and the nulls convert to ""
because all of the column definitions are set to be text. The total file
size is around 34,000 bytes. Don't you think that 55 seconds of CPU time
to format 55,000 bytes is pretty slow? Your posting includes my message
where I showed all this.

I wrote C# code to write the strings to the file myself. It takes a few
milliseconds now. I tested reading it back using the OleDb text driver,
and it reads correctly in a few milliseconds.


Steven Cheng said:
Hi Norman,

From your description, you're encountering high CPU issue when loading a
csv file via the OLEDB provider, correct?

As for the high cpu behavior when loading such data file, I think the
following things maybe the potential cause:

** the number of records in the csv file
** the data content that is contained in each record.

As for number of record, I think you can try reduce the number of record
and columns and test again. As for the content, sometimes the provider
may
run into poor performance when some particular data(characters) in the
file
cause the provider spend much time parsing it. Therefore, you can also
try
test by isolating the data in the csv files(check whether there are some
records contains particular characters ).

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



==================================================

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications.



Note: The MSDN Managed Newsgroup support offering is for non-urgent
issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each
follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================


This posting is provided "AS IS" with no warranties, and confers no
rights.--------------------
From: "Norman Diamond" <[email protected]>
Subject: Incredibly slow writing by CSV driver
Date: Fri, 8 Feb 2008 15:05:09 +0900
Since my application uses the Excel driver to read some files, I used the
same Excel driver to write some CSV files. I also tried using the Text
driver to write CSV files. The results are correct but they take
enormous
amounts of CPU time.

Of course I know how to write a CSV file using plain old Shift-JIS character
strings and quotation marks and commas, and probably I'll do that in
order
to speed up this operation, but the question still remains.

Why is this so slow? The sample data table had 213 rows (header plus 212
data rows) and 25 columns, all strings (some of them empty strings). The
total file size on disk is 34KB. The computations and database
operations
in memory take a few milliseconds, not even noticeable when running under
a
debugger. But the call to
dataAdapter.Update(dataTable);
takes 55 SECONDS OF CPU TIME on a Pentium 4 running at 3 GHz. It
occupies
100% of one CPU core for 55 seconds.

Actual time to write the file might be a few hundred milliseconds since 34KB
occupies several NTFS structures. Anyway, this thing isn't disk bound,
and
it's not CPU bound in my code, it's CPU bound in the Update method.

What is going on here?

#undef UseOdbc // Use OleDb
string fileName = @"C:\test.csv"; // (not really)
string Headers[] = new string[25] { "1", "2", "3", /* ... */ "25" };
int columnCount = 25; // (not really)
FileInfo fileInfo = new System.IO.FileInfo(fileName);
string dirName = fileInfo.DirectoryName;
string tableName = fileInfo.Name;
#if UseOdbc
OdbcConnection connection = new OdbcConnection(
"Provider=MSDASQL;" +
"DRIVER={Microsoft Text Driver (*.txt; *.csv)};DBQ=" + dirName +
";Extended Properties='Text;Extensions=asc,csv,tab,txt;" +
"HDR=Yes;FMT=Delimited'");
connection.Open();
OdbcDataAdapter dataAdapter = new OdbcDataAdapter(
"SELECT * FROM [" + tableName + "]", connection);
OdbcCommand insertCmd = new OdbcCommand();
OdbcType varcharType = OdbcType.VarChar;
#else // OleDb
OleDbConnection connection = new OleDbConnection(
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dirName +
";Extended Properties='Text;Extensions=asc,csv,tab,txt;" +
"HDR=Yes;FMT=Delimited'");
connection.Open();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(
"SELECT * FROM [" + tableName + "]", connection);
OleDbCommand insertCmd = new OleDbCommand();
OleDbType varcharType = OleDbType.VarChar;
#endif // OleDb
DataTable dataTable = new DataTable(tableName);
DataColumnCollection dataColumns = dataTable.Columns;
int columnNum;
string dataColumnName;
DataColumn dataColumn;
StringBuilder insertCmdHead = new StringBuilder(
"INSERT INTO [" + tableName + "] (");
StringBuilder insertCmdTail = new StringBuilder("VALUES (");
StringBuilder fileHeader = new StringBuilder();
for (columnNum = 0; columnNum < columnCount - 1; columnNum++)
{
dataColumnName = Headers[columnNum];
dataColumnName = dataColumnName.Replace('\n', ' ');
dataColumn = dataColumns.Add(dataColumnName);
dataColumn.DataType = typeof(string);
dataColumn.DefaultValue = "";
insertCmdHead.Append("[" + dataColumnName + "], ");
insertCmdTail.Append("?, ");
insertCmd.Parameters.Add(dataColumnName, varcharType, 255,
dataColumnName);
fileHeader.Append("\"" + dataColumnName + "\",");
}
dataColumnName = Headers[columnCount - 1];
dataColumn = dataColumns.Add(dataColumnName);
dataColumn.DataType = typeof(string);
dataColumn.DefaultValue = "";
insertCmdHead.Append("[" + dataColumnName + "]) ");
insertCmdTail.Append("?)");
insertCmd.Parameters.Add(dataColumnName, varcharType, 255,
dataColumnName);
insertCmd.CommandText = insertCmdHead.ToString() +
insertCmdTail.ToString();
insertCmd.Connection = connection;
dataAdapter.InsertCommand = insertCmd;
fileHeader.Append("\"" + dataColumnName + "\"");
for (int rowNum = 0; rowNum < 212; rowNum++) // (not really)
{
DataRow dataRow = dataTable.NewRow();
dataRow[2] = "hi i'm 2"; // (not really)
dataRow[18] = "18"; // (not really)
dataRow[19] = "19"; // (not really)
// (around half the cells default to empty strings)
dataTable.Rows.Add(dataRow);
}
StreamWriter fileWriter = new StreamWriter(fileInfo.Create(),
Encoding.Default);
fileWriter.WriteLine(fileHeader.ToString());
fileWriter.Close();
fileWriter.Dispose();
// UP TO THIS POINT TAKES A FEW MILLISECONDS, OK
//
// 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES
dataAdapter.Update(dataTable); // 55 SECONDS TO WRITE 34 KILOBYTES
// 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES
//
// OK AFTER THIS
dataAdapter.Dispose();
dataTable.Dispose();
connection.Close();
 

Norman Diamond

Of course filemon is a very useful tool, but how would it solve the problem
of 55 seconds of CPU time? The 34000 bytes of file contents get written
correctly to the file. The CPU is not waiting for the disk, the disk is
waiting for the CPU.

I worked around it by writing this file myself too. It takes a few
milliseconds now.
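The manual workaround can be sketched roughly like this: wrap each field in quotes, double any embedded quotes, and write the rows with a StreamWriter. This is a minimal illustrative sketch, not Norman's actual code; the names are made up, and the encoding is a parameter (he would pass Shift-JIS, which is what Encoding.Default resolves to on a Japanese system).

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvWriteSketch
{
    // Wrap every field in quotes and double any embedded quotes, per the
    // usual CSV convention (matching how the header row was written above).
    static string Quote(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Emit the header row and all data rows directly, bypassing the driver.
    // Norman would pass Encoding.GetEncoding(932) (Shift-JIS) here; the
    // encoding is parameterized so the sketch stays portable.
    public static void WriteCsv(string fileName, string[] headers,
                                string[][] rows, Encoding encoding)
    {
        using (StreamWriter writer = new StreamWriter(fileName, false, encoding))
        {
            writer.WriteLine(string.Join(",", Array.ConvertAll(headers, Quote)));
            foreach (string[] row in rows)
                writer.WriteLine(string.Join(",", Array.ConvertAll(row, Quote)));
        }
    }
}
```

For 213 rows of 25 short string columns this is a handful of string concatenations and one buffered file write, which is consistent with the few-milliseconds timing reported.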


Patrice said:
I would likely try to see if some tools from
http://technet.microsoft.com/fr-fr/sysinternals/default.aspx such as
"filemon" would allow to track a bit what could happen behind the scene (I
suspect the driver could do a bit more than what you expect such as
rewriting the whole file on each insert ?)

Generally my personal preference is to handle these kind of files by
myself
--
Patrice
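Patrice's rewrite-the-whole-file-per-insert hypothesis is testable by re-timing Update at several row counts: per-insert rewriting would make total cost grow roughly quadratically with row count rather than linearly. A driver-neutral harness sketch (the Jet-specific Update call is abstracted as a delegate; all names here are illustrative):

```csharp
using System;
using System.Diagnostics;

public static class UpdateTimingSketch
{
    // Times 'update' once per row count. If elapsed time grows roughly
    // quadratically in the row count, that supports the hypothesis that the
    // driver rewrites the whole file on each insert.
    public static long[] TimeAtRowCounts(Action<int> update, int[] rowCounts)
    {
        long[] elapsed = new long[rowCounts.Length];
        for (int i = 0; i < rowCounts.Length; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            update(rowCounts[i]);   // e.g. build a DataTable with this many
            sw.Stop();              // rows and call dataAdapter.Update on it
            elapsed[i] = sw.ElapsedMilliseconds;
        }
        return elapsed;
    }
}
```

Note that even whole-file rewriting alone would only move a few megabytes total for this data set, so a quadratic curve would point at per-insert parsing or setup overhead in the driver rather than raw I/O.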

Norman Diamond said:
The high CPU issue is when WRITING a CSV file via either the OleDb or
Odbc driver.

READING is no problem, for these particular CSV files.

I already told you the number of records in the CSV file: 213,
consisting of a header row and 212 data rows. Each row contains 25
columns, about half of which are text and half are null, and the nulls
convert to "" because all of the column definitions are set to be text.
The total file size is around 34,000 bytes. Don't you think that 55
seconds of CPU time to format 34,000 bytes is pretty slow? Your posting
includes my message where I showed all this.

I wrote C# code to write the strings to the file myself. It takes a few
milliseconds now. I tested reading it back using the OleDb text driver,
and it reads correctly in a few milliseconds.


Steven Cheng said:
Hi Norman,

From your description, you're encountering a high CPU issue when loading a
csv file via the OLEDB provider, correct?

As for the high CPU behavior when loading such a data file, I think the
following are the potential causes:

** the number of records in the csv file
** the data content contained in each record.

For the number of records, you can try reducing the number of records and
columns and test again. As for the content, the provider may sometimes run
into poor performance when particular data (characters) in the file cause
it to spend much time parsing. Therefore, you can also test by isolating
the data in the csv file (check whether some records contain particular
characters).

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



==================================================

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.



Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================


This posting is provided "AS IS" with no warranties, and confers no rights.
 

Steven Cheng[MSFT]

Hi Norman,

Such high CPU issues tend to be case specific. I've checked some former
issues, and there is no known defect in the provider; most such issues are
due to the data being read or something related to the running environment.
Troubleshooting such an issue to its root cause may require much more work,
such as dump analysis, which is complex and time consuming.

If this problem is urgent and must be fixed in a short time, I suggest you
contact product support services for further assistance:

http://msdn.microsoft.com/subscriptions/support/default.aspx.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 

Norman Diamond

Mr. Cheng, again, data are being WRITTEN not read. Reading is OK and only
takes a few milliseconds. WRITING takes an enormously long time.

I can't imagine what needs dumping. My first posting in this thread gave
sample code.

As mentioned, I worked around it the same way as Patrice, by writing my own
code in C# (or other language as might be used in any project) instead of
using the text driver, to WRITE CSV files.


 

Steven Cheng[MSFT]

Hi Norman,

Yes, I saw that you got the workaround. The further troubleshooting I
mentioned means the work that required to identify what is the exact things
cause the poor performance when writing the data into the csv file through
OLE db provider. Normally, we are not abvle to give a rapid answer for such
issue.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
From: "Norman Diamond" <[email protected]>
Subject: Re: Incredibly slow writing by CSV driver
Date: Tue, 12 Feb 2008 17:07:31 +0900
Mr. Cheng, again, data are being WRITTEN not read. Reading is OK and only
takes a few milliseconds. WRITING takes an enormously long time.

I can't imagine what needs dumping. My first posting in this thread gave
sample code.

As mentioned, I worked around it the same way as Patrice, by writing my own
code in C# (or other language as might be used in any project) instead of
using the text driver, to WRITE CSV files.


Steven Cheng said:
Hi Norman,

For such high cpu issue, it is likely to be issue specific. I've checked
some former issue and there is no definite problem of the provider and
most
of such issues are due to the data that is read or something related to
the
running environment. To troubleshoot on such issue and get the root cause,
it may require much more work like dump analysis which is complex and time
costing.

If this problem is urgent and must be fixed in short time, I suggest you
contact product support service for further assistance:

http://msdn.microsoft.com/subscriptions/support/default.aspx.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


This posting is provided "AS IS" with no warranties, and confers no
rights.
--------------------
From: "Norman Diamond" <[email protected]>
References: <[email protected]>
<[email protected]>
Subject: Re: Incredibly slow writing by CSV driver
Date: Tue, 12 Feb 2008 09:42:48 +0900
Of course filemon is a very useful tool, but how would it solve the problem
of 55 seconds of CPU time? The 34000 bytes of file contents get written
correctly to the file. The CPU is not waiting for the disk, the disk is
waiting for the CPU.

I worked around it by writing this file myself too. It takes a few
milliseconds now.


"Patrice" <http://www.chez.com/scribe/> wrote in message
I would likely try to see if some tools from
http://technet.microsoft.com/fr-fr/sysinternals/default.aspx such as
"filemon" would allow to track a bit what could happen behind the scene (I
suspect the driver could do a bit more than what you expect such as
rewriting the whole file on each insert ?)

Generally my personal preference is to handle these kind of files by
myself
--
Patrice

"Norman Diamond" <[email protected]> a ecrit dans le message de
%[email protected]...
The high CPU issue is when WRITING a CSV file via either the OleDb or
Odbc driver.

READING is no problem, for these particular CSV files.

I already told you the number of records in the CSV file: 213,
consisting of a header row and 212 data rows. Each row contains 25
columns, about half of which are text and half are null, and the nulls
convert to "" because all of the column definitions are set to be text.
The total file size is around 34,000 bytes. Don't you think that 55
seconds of CPU time to format 55,000 bytes is pretty slow? Your posting
includes my message where I showed all this.

I wrote C# code to write the strings to the file myself. It takes a few
milliseconds now. I tested reading it back using the OleDb text driver,
and it reads correctly in a few milliseconds.


Hi Norman,

From your description, you're encountering a high-CPU issue when loading
a csv file via the OLEDB provider, correct?

As for the high-CPU behavior when loading such a data file, I think the
following things may be the potential causes:

** the number of records in the csv file
** the data content that is contained in each record.

As for the number of records, you can try reducing the number of records
and columns and testing again. As for the content, the provider may
sometimes run into poor performance when particular data (characters) in
the file causes it to spend much time parsing. Therefore, you can also
test by isolating the data in the csv files (check whether some records
contain particular characters).

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



==================================================

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================


This posting is provided "AS IS" with no warranties, and confers no
rights.
--------------------
From: "Norman Diamond" <[email protected]>
Subject: Incredibly slow writing by CSV driver
Date: Fri, 8 Feb 2008 15:05:09 +0900


Since my application uses the Excel driver to read some files, I used the
same Excel driver to write some CSV files. I also tried using the Text
driver to write CSV files. The results are correct but they take enormous
amounts of CPU time.

Of course I know how to write a CSV file using plain old Shift-JIS
character strings and quotation marks and commas, and probably I'll do
that in order to speed up this operation, but the question still remains.

Why is this so slow? The sample data table had 213 rows (header plus 212
data rows) and 25 columns, all strings (some of them empty strings). The
total file size on disk is 34KB. The computations and database operations
in memory take a few milliseconds, not even noticeable when running under
a debugger. But the call to
dataAdapter.Update(dataTable);
takes 55 SECONDS OF CPU TIME on a Pentium 4 running at 3 GHz. It occupies
100% of one CPU core for 55 seconds.

Actual time to write the file might be a few hundred milliseconds since
34KB occupies several NTFS structures. Anyway, this thing isn't disk
bound, and it's not CPU bound in my code, it's CPU bound in the Update
method.

What is going on here?

#undef UseOdbc // Use OleDb
string fileName = @"C:\test.csv"; // (not really)
string[] Headers = new string[25] { "1", "2", "3", /* ... */ "25" };
int columnCount = 25; // (not really)
FileInfo fileInfo = new System.IO.FileInfo(fileName);
string dirName = fileInfo.DirectoryName;
string tableName = fileInfo.Name;
#if UseOdbc
OdbcConnection connection = new OdbcConnection(
    "Provider=MSDASQL;" +
    "DRIVER={Microsoft Text Driver (*.txt; *.csv)};DBQ=" + dirName +
    ";Extended Properties='Text;Extensions=asc,csv,tab,txt;" +
    "HDR=Yes;FMT=Delimited'");
connection.Open();
OdbcDataAdapter dataAdapter = new OdbcDataAdapter(
    "SELECT * FROM [" + tableName + "]", connection);
OdbcCommand insertCmd = new OdbcCommand();
OdbcType varcharType = OdbcType.VarChar;
#else // OleDb
OleDbConnection connection = new OleDbConnection(
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dirName +
    ";Extended Properties='Text;Extensions=asc,csv,tab,txt;" +
    "HDR=Yes;FMT=Delimited'");
connection.Open();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(
    "SELECT * FROM [" + tableName + "]", connection);
OleDbCommand insertCmd = new OleDbCommand();
OleDbType varcharType = OleDbType.VarChar;
#endif // OleDb
DataTable dataTable = new DataTable(tableName);
DataColumnCollection dataColumns = dataTable.Columns;
int columnNum;
string dataColumnName;
DataColumn dataColumn;
StringBuilder insertCmdHead = new StringBuilder(
    "INSERT INTO [" + tableName + "] (");
StringBuilder insertCmdTail = new StringBuilder("VALUES (");
StringBuilder fileHeader = new StringBuilder();
for (columnNum = 0; columnNum < columnCount - 1; columnNum++)
{
    dataColumnName = Headers[columnNum];
    dataColumnName = dataColumnName.Replace('\n', ' ');
    dataColumn = dataColumns.Add(dataColumnName);
    dataColumn.DataType = typeof(string);
    dataColumn.DefaultValue = "";
    insertCmdHead.Append("[" + dataColumnName + "], ");
    insertCmdTail.Append("?, ");
    insertCmd.Parameters.Add(dataColumnName, varcharType, 255, dataColumnName);
    fileHeader.Append("\"" + dataColumnName + "\",");
}
dataColumnName = Headers[columnCount - 1];
dataColumn = dataColumns.Add(dataColumnName);
dataColumn.DataType = typeof(string);
dataColumn.DefaultValue = "";
insertCmdHead.Append("[" + dataColumnName + "]) ");
insertCmdTail.Append("?)");
insertCmd.Parameters.Add(dataColumnName, varcharType, 255, dataColumnName);
insertCmd.CommandText = insertCmdHead.ToString() + insertCmdTail.ToString();
insertCmd.Connection = connection;
dataAdapter.InsertCommand = insertCmd;
fileHeader.Append("\"" + dataColumnName + "\"");
for (int rowNum = 0; rowNum < 212; rowNum++) // (not really)
{
    DataRow dataRow = dataTable.NewRow();
    dataRow[2] = "hi i'm 2"; // (not really)
    dataRow[18] = "18"; // (not really)
    dataRow[19] = "19"; // (not really)
    // (around half the cells default to empty strings)
    dataTable.Rows.Add(dataRow);
}
StreamWriter fileWriter = new StreamWriter(fileInfo.Create(), Encoding.Default);
fileWriter.WriteLine(fileHeader.ToString());
fileWriter.Close();
fileWriter.Dispose();
// UP TO THIS POINT TAKES A FEW MILLISECONDS, OK
//
// 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES
dataAdapter.Update(dataTable);
// 55 SECONDS OF CPU TIME (Pentium 4 3 GHz) TO WRITE 34 KILOBYTES
//
// OK AFTER THIS
dataAdapter.Dispose();
dataTable.Dispose();
connection.Close();
 
