Any way to make OleDb read an Excel file?

N

Norman Diamond

With Visual Studio 2005 SP1 and DotNet Framework 2 SP1, Excel 2003 can read
a file that was saved by Excel 2003, but the OleDb driver can't.

Example: Excel row 2 maps to row 1 in OleDb, and Excel columns 1 to 24 (A
to X) map to columns 0 to 23 in OleDb. Counting these in OleDb, row 1
columns 0 to 4 are OK, columns 5 and 6 get corrupted to DbNull, columns 7 to
12 are OK, columns 13 and 14 get corrupted to DbNull, etc.

If I do Interop to Excel 2003 instead of using OleDb, it works. Counting
these in Excel interop, row 2 columns 1 to 24 are entirely OK.

Back to the OleDb problem. One worksheet has 40 columns instead of 24. The
worksheet with 40 columns works. Several of the additional columns read as
doubles instead of text but ToString() works on them. But all of the
24-column worksheets get corrupted by OleDb, and the corrupted column
numbers vary.

Does anyone know how to get OleDb to work?

OleDbConnection connection = new OleDbConnection(
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
fileName + ";Extended Properties='Excel 8.0;HDR=NO'");
connection.Open();
DataTable schemaTable = connection.GetSchema("Tables");
DataRow[] schemaRows = schemaTable.Select();
foreach (DataRow schemaRow in schemaRows)
{
string tableName = schemaRow["TABLE_NAME"].ToString();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(
"SELECT * FROM [" + tableName + "]", connection);
int headerRowNum = 1;
DataTable dataTable = new DataTable(tableName);
dataAdapter.FillSchema(dataTable, SchemaType.Source);
DataColumnCollection dataColumns = dataTable.Columns;
int columnCount = dataColumns.Count;
foreach (DataColumn dataColumn in dataColumns)
{
dataColumn.DataType = typeof(string);
}
dataAdapter.Fill(dataTable);
int rowCount = dataTable.Rows.Count;
if (!((rowCount > headerRowNum) &&
((columnCount == 24) || (columnCount == 40))))
{
dataTable.Dispose();
dataAdapter.Dispose();
continue;
}
DataRow headerRow = dataTable.Rows[headerRowNum];
// ... examine tableName in the debugger; it is correct including $ ...
// ... examine headerRow in the debugger; some elements are null ...
string x = (string)headerRow[5]; // crashes

Using Odbc, besides the obvious changes to the DotNet class names and the
connection string, one more change is this:
int headerRowNum = 0;
instead of
int headerRowNum = 1;
The reason of course is that HDR=NO in Odbc means HDR=YES.
So with those changes, I read the Excel file in Odbc instead of OleDb. The
corruption in Odbc is identical to the corruption in OleDb.
Does anyone know how to get either OleDb or Odbc to read an Excel file?

I think I have more news though. Although Excel 2003 saved the .xls file,
now I guess that Excel 97 or 2000 or XP first created the .xls file. Here's
the reason: In all of the cells which I'm trying to interpret, either the
cell value was alphabetic, or the cell value had an apostrophe followed by
digits, so the cell values were already text. For some other cells, Excel
2003 displays them with a little triangle in one corner, and a tooltip
explains that the cell is text due to one of those reasons. However, for
the cells which I'm trying to interpret, Excel 2003 doesn't display that
triangle ... at first. However, if I put the mouse cursor in one of those
cells and double-click, but do not type anything, and then move the mouse
cursor to another cell and click, then suddenly Excel 2003 displays that
little triangle in one corner of the cell that I double-clicked, and a
tooltip explains that the cell is text due to one of those reasons.
Furthermore, if I hit Ctrl+S to save the file, then suddenly OleDb starts
correctly reading the cell that I double-clicked.

So, was there a bug in Excel 97 or 2000 or XP, which corrupted the format
when saving some cells which were text due to having apostrophe plus digits,
but which saved some other cells correctly? And Excel 2003 contains a hack
to read corrupted .xls files correctly? But OleDb is missing that hack so
it gets all confused when reading files which were saved by Excel 97 or 2000
or XP (or saved by Excel 2003 without having double-clicked a corrupted
cell)?

Is there some way to get the same hack copied into OleDb so it will start
working?
 
P

Patrice

AFAIK this is a datatype problem. As there is no such thing as a field type
in Excel, the driver uses the first few rows to "guess" the type. If the
column is empty or contains contradictory data after those first few lines,
the driver just set the data to null.

You could do a first try to see if having a value in each cell would work to
see if this is that or some other problem...

--
Patrice

Norman Diamond said:
With Visual Studio 2005 SP1 and DotNet Framework 2 SP1, Excel 2003 can
read
a file that was saved by Excel 2003, but the OleDb driver can't.

Example: Excel row 2 maps to row 1 in OleDb, and Excel columns 1 to 24 (A
to X) map to columns 0 to 23 in OleDb. Counting these in OleDb, row 1
columns 0 to 4 are OK, columns 5 and 6 get corrupted to DbNull, columns 7
to
12 are OK, columns 13 and 14 get corrupted to DbNull, etc.

If I do Interop to Excel 2003 instead of using OleDb, it works. Counting
these in Excel interop, row 2 columns 1 to 24 are entirely OK.

Back to the OleDb problem. One worksheet has 40 columns instead of 24.
The
worksheet with 40 columns works. Several of the additional columns read
as
doubles instead of text but ToString() works on them. But all of the
24-column worksheets get corrupted by OleDb, and the corrupted column
numbers vary.

Does anyone know how to get OleDb to work?

OleDbConnection connection = new OleDbConnection(
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
fileName + ";Extended Properties='Excel 8.0;HDR=NO'");
connection.Open();
DataTable schemaTable = connection.GetSchema("Tables");
DataRow[] schemaRows = schemaTable.Select();
foreach (DataRow schemaRow in schemaRows)
{
string tableName = schemaRow["TABLE_NAME"].ToString();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(
"SELECT * FROM [" + tableName + "]", connection);
int headerRowNum = 1;
DataTable dataTable = new DataTable(tableName);
dataAdapter.FillSchema(dataTable, SchemaType.Source);
DataColumnCollection dataColumns = dataTable.Columns;
int columnCount = dataColumns.Count;
foreach (DataColumn dataColumn in dataColumns)
{
dataColumn.DataType = typeof(string);
}
dataAdapter.Fill(dataTable);
int rowCount = dataTable.Rows.Count;
if (!((rowCount > headerRowNum) &&
((columnCount == 24) || (columnCount == 40))))
{
dataTable.Dispose();
dataAdapter.Dispose();
continue;
}
DataRow headerRow = dataTable.Rows[headerRowNum];
// ... examine tableName in the debugger; it is correct including $ ...
// ... examine headerRow in the debugger; some elements are null ...
string x = (string)headerRow[5]; // crashes

Using Odbc, besides the obvious changes to the DotNet class names and the
connection string, one more change is this:
int headerRowNum = 0;
instead of
int headerRowNum = 1;
The reason of course is that HDR=NO in Odbc means HDR=YES.
So with those changes, I read the Excel file in Odbc instead of OleDb.
The
corruption in Odbc is identical to the corruption in OleDb.
Does anyone know how to get either OleDb or Odbc to read an Excel file?

I think I have more news though. Although Excel 2003 saved the .xls file,
now I guess that Excel 97 or 2000 or XP first created the .xls file.
Here's
the reason: In all of the cells which I'm trying to interpret, either the
cell value was alphabetic, or the cell value had an apostrophe followed by
digits, so the cell values were already text. For some other cells, Excel
2003 displays them with a little triangle in one corner, and a tooltip
explains that the cell is text due to one of those reasons. However, for
the cells which I'm trying to interpret, Excel 2003 doesn't display that
triangle ... at first. However, if I put the mouse cursor in one of those
cells and double-click, but do not type anything, and then move the mouse
cursor to another cell and click, then suddenly Excel 2003 displays that
little triangle in one corner of the cell that I double-clicked, and a
tooltip explains that the cell is text due to one of those reasons.
Furthermore, if I hit Ctrl+S to save the file, then suddenly OleDb starts
correctly reading the cell that I double-clicked.

So, was there a bug in Excel 97 or 2000 or XP, which corrupted the format
when saving some cells which were text due to having apostrophe plus
digits,
but which saved some other cells correctly? And Excel 2003 contains a
hack
to read corrupted .xls files correctly? But OleDb is missing that hack so
it gets all confused when reading files which were saved by Excel 97 or
2000
or XP (or saved by Excel 2003 without having double-clicked a corrupted
cell)?

Is there some way to get the same hack copied into OleDb so it will start
working?
 
N

Norman Diamond

the driver uses the first few rows to "guess" the type.

Even after executing this code:
1. Is there any way to tell the driver to use the types that I specified?
or
2. Is there any way to tell the driver that a field containing
'15
isn't DBNull? Especially when the driver already figured out that a field
containing
'13
isn't DBNull....
You could do a first try to see if having a value in each cell would work
to see if this is that or some other problem...

OK, I can write code to detect that the driver read some fields as DBNull.
Then what should my code do next in order to persuade the driver to read
strings as strings?

How does Excel 2003 do it? Somehow Excel 2003 figures out that
'15
is a string, and it will even tell us so if we double-click on that cell.
(And if we tell Excel 2003 to save the file after double-clicking on that
cell, then the driver starts working correctly on that cell.) So what code
can I add to my application to do the same thing?


Patrice said:
AFAIK this is a datatype problem. As there is no such thing as a field
type in Excel, the driver uses the first few rows to "guess" the type. If
the column is empty or contains contradictory data after those first few
lines, the driver just set the data to null.

You could do a first try to see if having a value in each cell would work
to see if this is that or some other problem...

--
Patrice

Norman Diamond said:
With Visual Studio 2005 SP1 and DotNet Framework 2 SP1, Excel 2003 can
read
a file that was saved by Excel 2003, but the OleDb driver can't.

Example: Excel row 2 maps to row 1 in OleDb, and Excel columns 1 to 24
(A
to X) map to columns 0 to 23 in OleDb. Counting these in OleDb, row 1
columns 0 to 4 are OK, columns 5 and 6 get corrupted to DbNull, columns 7
to
12 are OK, columns 13 and 14 get corrupted to DbNull, etc.

If I do Interop to Excel 2003 instead of using OleDb, it works. Counting
these in Excel interop, row 2 columns 1 to 24 are entirely OK.

Back to the OleDb problem. One worksheet has 40 columns instead of 24.
The
worksheet with 40 columns works. Several of the additional columns read
as
doubles instead of text but ToString() works on them. But all of the
24-column worksheets get corrupted by OleDb, and the corrupted column
numbers vary.

Does anyone know how to get OleDb to work?

OleDbConnection connection = new OleDbConnection(
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
fileName + ";Extended Properties='Excel 8.0;HDR=NO'");
connection.Open();
DataTable schemaTable = connection.GetSchema("Tables");
DataRow[] schemaRows = schemaTable.Select();
foreach (DataRow schemaRow in schemaRows)
{
string tableName = schemaRow["TABLE_NAME"].ToString();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(
"SELECT * FROM [" + tableName + "]", connection);
int headerRowNum = 1;
DataTable dataTable = new DataTable(tableName);
dataAdapter.FillSchema(dataTable, SchemaType.Source);
DataColumnCollection dataColumns = dataTable.Columns;
int columnCount = dataColumns.Count;
foreach (DataColumn dataColumn in dataColumns)
{
dataColumn.DataType = typeof(string);
}
dataAdapter.Fill(dataTable);
int rowCount = dataTable.Rows.Count;
if (!((rowCount > headerRowNum) &&
((columnCount == 24) || (columnCount == 40))))
{
dataTable.Dispose();
dataAdapter.Dispose();
continue;
}
DataRow headerRow = dataTable.Rows[headerRowNum];
// ... examine tableName in the debugger; it is correct including $
...
// ... examine headerRow in the debugger; some elements are null ...
string x = (string)headerRow[5]; // crashes

Using Odbc, besides the obvious changes to the DotNet class names and the
connection string, one more change is this:
int headerRowNum = 0;
instead of
int headerRowNum = 1;
The reason of course is that HDR=NO in Odbc means HDR=YES.
So with those changes, I read the Excel file in Odbc instead of OleDb.
The
corruption in Odbc is identical to the corruption in OleDb.
Does anyone know how to get either OleDb or Odbc to read an Excel file?

I think I have more news though. Although Excel 2003 saved the .xls
file,
now I guess that Excel 97 or 2000 or XP first created the .xls file.
Here's
the reason: In all of the cells which I'm trying to interpret, either
the
cell value was alphabetic, or the cell value had an apostrophe followed
by
digits, so the cell values were already text. For some other cells,
Excel
2003 displays them with a little triangle in one corner, and a tooltip
explains that the cell is text due to one of those reasons. However, for
the cells which I'm trying to interpret, Excel 2003 doesn't display that
triangle ... at first. However, if I put the mouse cursor in one of
those
cells and double-click, but do not type anything, and then move the mouse
cursor to another cell and click, then suddenly Excel 2003 displays that
little triangle in one corner of the cell that I double-clicked, and a
tooltip explains that the cell is text due to one of those reasons.
Furthermore, if I hit Ctrl+S to save the file, then suddenly OleDb starts
correctly reading the cell that I double-clicked.

So, was there a bug in Excel 97 or 2000 or XP, which corrupted the format
when saving some cells which were text due to having apostrophe plus
digits,
but which saved some other cells correctly? And Excel 2003 contains a
hack
to read corrupted .xls files correctly? But OleDb is missing that hack
so
it gets all confused when reading files which were saved by Excel 97 or
2000
or XP (or saved by Excel 2003 without having double-clicked a corrupted
cell)?

Is there some way to get the same hack copied into OleDb so it will start
working?
 
J

Jialiang Ge [MSFT]

Hello,

From your post, my understanding on this issue is: you wonder why some
Excel worksheet cells returns DBNull through OLEDB or ODBC driver while
they are actually not Null, and how to work-around it. If I'm off base,
please feel free to let me know.

In our best guess without your xls file that we can test on directly, the
problem is caused by a limitation of the Excel ISAM driver in that once it
determines the datatype of an Excel column, it will return a Null for any
value that is not of the datatype the ISAM driver has defaulted to for that
excel column. The Excel driver reads a certain number of rows (by default,
8 rows) in the specified source to guess at the data type of each column.
When a column appears to contain mixed data types, especially numeric data
mixed with text data, the driver decides in favor of the majority data
type, and returns null values for cells that contain data of the other
type. (In a tie, the numeric type wins.) Most cell formatting options in
the Excel worksheet do not seem to affect this data type determination.
Here is a KB article that illustrates this limitation in detail:
http://support.microsoft.com/kb/194124.

In order to prove the hypothesis and narrow down our focus, we can do such
a test:
(1) Check the first 8 cells of the columns that, you said, can return the
values correctly, and see if the 8 cells are of the same data type (text or
numeric, etc)
(2) Check the first 8 cells of the columns that, you said, returns DBNull
values (crashes), and see if the 8 cells are of the same data type (text or
numeric, etc). In addition, if we manually change the values of the 8 cells
to text (not numeric) in Excel, and run your program again, can they be
retrieved successfully now?

If
test (1) turns out to be the same datatype, -and-
test (2) proves that the 8 cells are not of the same datatype. -and-
test (2) proves that when the 8 cells are set to the same datatype
manually in Excel, the data can be retrieved successfully.
Then
we can say the problem is indeed caused by
http://support.microsoft.com/kb/194124.
Else, I hope you can send the xls to my mailbox
([email protected], remove 'nospam.') , and I will test on it to
find out the root cause.

According to the KB article http://support.microsoft.com/kb/194124, there
are 2 workarounds:
(1) Insure that the data in Excel is entered as text. Just reformatting
the Excel column to Text will not accomplish this. You must re-enter the
existing values after reformatting the Excel column. In Excel, you can use
F5 to re-enter existing values in the selected cell.
If you xls source file is not allowed to be edited, I don't think this
workaround will fit your current situation.
(2) Add the IMEX=1 flag in the connection string, or change the default
TypeGuessRows=8 to a larger number, so the Excel will have a more correct
guess of the datatype.

Now, for the questions when you discuss the issue when Patrice:
Is there any way to tell the driver to use the types that I specified?
And Is there any way to tell the driver that a field containing?
I have to let you know that we cannot force the datatype programmatically
due to the ISAM driver limitation (see the above KB article).
How does Excel 2003 do it? Somehow Excel 2003 figures out that '15 is a
string, and it will even tell us so if we double-click on that cell.
In Excel 2003, the datatype of each cell is decided per cell, not per
column. If a cell contains '15 as a string, Excel feels that it is not
likely to be a numeric cell, so it will displays them with a little
triangle in one corner. However, in ISAM driver, the datatype is decided in
a completely different way. (see my previous explanation).

Please let me know if you have any other concerns, or need anything else.

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
For MSDN subscribers whose posts are left unanswered, please check this
document: http://blogs.msdn.com/msdnts/pages/postingAlias.aspx

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications. If you are using Outlook Express/Windows Mail, please make sure
you clear the check box "Tools/Options/Read: Get 300 headers at a time" to
see your reply promptly.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

Mr./Ms. Ge, thank you for understanding the problem correctly. However, I
think that you have not diagnosed the bug in the Excel ISAM driver.

First, I think (though I am not sure) that a file created by Excel 2003 will
be readable correctly by the Excel ISAM driver. I think that the
troublesome files were created by Excel 97 or 2000 or XP (though I am not
sure). I have reproduced two odd cases:
(a) If Excel 2003 opens the file and saves it, the bug in the ISAM driver
remains.
(b) If Excel 2003 opens the file and I double-click each of the troublesome
cells, but do not do any other action (no typing or copying or pasting or
anything), and then save the file, then the ISAM driver starts working.

Second, the troublesome cells are in the second row of the worksheet. The
second row contains no null values at all. The second row contains no plain
numerics. Some are plain strings like ABC or DEF. Others are strings like
'15 or '14 or '13 or '12 etc. Whatever version of Excel created the file,
it saved these strings as strings. When Excel 2003 opens the file, it
displays these strings as strings. The ISAM driver handles some of these
strings OK (for example '13 reads as a string in some worksheets but not in
others). The ISAM driver mishandles some of these strings (for example '15
reads as DBNull in nearly every worksheet).

Now, in Excel 2003, if I double-click one of these troublesome cells, then
Excel 2003 displays a small blue triangle in one corner of the cell. The
explanation is that the string is a string because it starts with an
apostrophe. Excel 2003 is correct. The apostrophe was already there when
the file was created. Excel 2003 knows how to read the file even if an
older broken version miswrote the file. Only the ISAM driver has random
problems reading it.

If the ISAM driver reads 8 rows then it should have no problem at all. The
second row consists 100% of strings. Rows other than the second row have
some strings and some null values. The worksheets do not contain any
numerics at all. The ISAM driver should find the strings in the second row
and it should read strings correctly. Again, if I open the file in Excel
2003, double-click every cell in the second row, and save the file, then
suddenly the ISAM driver really does find the strings in the second row and
it reads strings correctly.

Therefore I think that you did not diagnose the ISAM driver correctly.

Of course I can't send you the real Excel files, but if I have time then
I'll try to install Excel 97 into a Virtual PC and see if I can construct a
repro.

You mentioned two workarounds in
http://support.microsoft.com/kb/194124
The first workaround sounds more complicated than mine, since I only have to
double-click. But I don't want to tell our customer to install Excel 2003,
reopen all their Excel files, double-click every cell in the second row of
every worksheet, and save. I want to fix the ISAM driver. The second
workaround is halfway workable because I would be glad to add IMEX=1 to the
connection string, but I don't think I have the right to make a global
change to our customer's registry.

Thank you for the explanation that Excel 2003 examines each cell but the
ISAM driver doesn't. Something is still very strange though. How can the
ISAM driver think that an apostrophe is a digit? There are no cells like a
plain 15 at all, they are already strings like '15. How can the ISAM driver
think that a column is numeric when the column doesn't contain any numeric
values?
 
N

Norman Diamond

Mr./Ms. Ge, thank you for referring me to
http://support.microsoft.com/kb/194124

It looks like the registry's value for ImportMixedTypes defaults to Text, so
adding IMEX=1 to the connection string has a good chance of success. The
customer might still have problems on some machines but there is a good
chance of success.

I still don't think my program should change the customer's registry if the
value is different though.
 
J

Jialiang Ge [MSFT]

Hello,

The ISAM driver product team published the KB article
http://support.microsoft.com/kb/194124 to explain the limitation. They did
not iterate each cell in the column to determine the datatype because of
some performance issues. I am sorry for the inconveniences if such a design
does not fit your request.

As you see, the workaround is to set IMEX=1 in the connection string.

Resetting the TypeGuessRows value in the regedit is useful when, for
instance, there is only one xls datasource on the server, and a program
reads data from the datasource, so we can reset the TypeGuessRows value on
the server's regedit, and help ISAM driver make a better guess. Apparently,
this workaround does not fit your situation, either.

The safest way that I'd recommend is to use Office object model, rather
than ISAM driver, especially when there is a very large chance that ISAM
driver cannot detect the corrent datatype according to the first 8 cells of
a column.

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

Mr./Ms. Ge,

Thank you for suggesting the Office Object Model instead of the Excel ISAM
driver. However, I wonder if I understand this correctly. When I tried to
use the Office Object Model, it depended on Excel being installed on the
target machine.

Our customer has Excel installed on some machines in order to create these
spreadsheets. However, my program has to run on machines where Excel is not
installed, and my program has to understand the contents of the
spreadsheets. My program controls hardware devices where the spreadsheets
say it can.

Is it possible to use Office Object Model without Excel being installed?
 
J

Jialiang Ge [MSFT]

Hello,

The prerequisite of using Office Object Model is to install the Office
product on the target machine. Since your customers do not have Excel
installed, the suggestion of using Office Object Model does not fit the
issue.

I notice that "the Excel workbooks contain the hardware device information,
and the client programs read the info. to control their hardware". Can I
understand that the workbooks only consist of plain text? Or even if there
are some charts in the workbook, the charts will be of no use for the
client programs? If that, here are two new suggestions for your reference:

Suggestion1. Use csv file with schema.ini
CSV (comma-separated values) file is a plain text file type that stores
tabular data. For instance, we open notepad.exe, type the following content:
1, 2, 3, 4
5, 6, 7, 8
save the file as filename.csv, then open it in Excel. The csv shows a 2
rows - 4 columns table, similar to what we used to do in Excel xls files.

CSV is a file type that is supported by Excel. We can save the original xls
files as csv with Excel Object Model (SaveAs) on the server computers that
have Excel installed, then distribute the generated csv files to the
clients. In the meantime, we also distribute a file "Schema.ini" file with
the csv. A schema.ini file contains the specifics on how data is formatted
in a particular text file (csv is actually a plain text file, while xls is
not, so schema.ini cannot apply to xls files to specify column types), and
is used by the Text ISAM driver to read and manipulate data. For more
information about Schema.ini, please see:
http://support.microsoft.com/kb/149090/en-us
http://msdn2.microsoft.com/en-us/library/ms709353.aspx
http://weblogs.asp.net/fmarguerie/archive/2003/10/01/manipulating-csv-files.
aspx

After we distribute both the csv and the schema.ini in the same folder, we
can retrieve the data with OLEDB or ODBC driver as what we did for xls
files. However, this time, the data will be retrieved with the datatype
that we specified in the schema.ini.

Suggestion2.
We can also read the xls content on a computer with Office Object model,
and write the contents with StreamWriter into a txt file where data is
delimitated by, e.g, tab '\t'. Then distribute the file to the client
machines. Our program will read the txt file with StreamReader, manipulate
each line of the txt, and split the values with '\t'.

Please have a try to see if the 2 suggestions fit your situation. If you
have any other concerns, feel free to let me know.

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

That is correct, the workbooks only consist of plain text, no charts and no
numeric fields.

We were going to suggest CSV to our customer but there are two problems:
(1) One Excel file contains more than one worksheet (describing more than
one hardware device).
(2) In some places several cells are merged. I could detect merged cells by
testing for DBNull before testing for "".

Anyway, I think that IMEX=1 is a big help. In cases where IMEX=1 doesn't
work, we will have to advise our customer to open the Excel file in Excel
2003, double-click every cell in the second row of every worksheet, save the
Excel file, and then try using the result on the target systems.

Actually I use CSV files for writing results, and I have a separate
complaint about that (it is sloooooooooow) but I will write a separate
posting about that.
 
N

Norman Diamond

I am wrong. IMEX=1 is a disaster.

Without IMEX=1, the Excel driver reads some cells in the second row as
DBNull instead of text. My program can test for this. My program can
report to the user that the Excel driver mishandles this Excel file so the
user has to play with the file in Excel 2003 and try again.

With IMEX=1, the Excel driver reads all cells in the second row as text. My
program thinks that the Excel driver is working. Then in the 58th row, the
Excel driver reads some cells as DBNull instead of text. My program thinks
that those cells are null. My program never sees the text. The text starts
with parentheses and letters, so no one could ever think they're numeric.
But the Excel driver gives DBNull to my program, my program thinks it is
working, the user thinks it is working, they get wrong results, and they
don't know that the results are wrong.

I had to take out IMEX=1. I have to warn the user that the Excel driver
cannot handle this Excel file.
 
J

Jialiang Ge [MSFT]

Hello,

I can hear that you are frustrated by the limits and issues of the ISAM
driver. I will try my best to help you work-around the situation.

So far, we have discussed the following five suggestions:
(1) Popup a message to ask the end users to convert the xls data to text
manually.
(2) Use IMEX=1 flag in the connection string
(3) Use csv file type + schema.ini, instead of xls
(4) Use Office object model, instead of ISAM driver.
(5) Enlarge the default TypeGuessRows in reg so as to have a better guess
of data type.

Suggestion (3) was disapproved because a csv file cannot contain multiple
worksheets. Suggestion (4) was inappropriate because the uses may not have
Excel installed. Suggestion (5) needs to change users' reg setting, so it
was not taken into consideration, either. Suggestion (2) encounter a
problem that a cell returns DBNull unexpectedly, and Suggestion (1) is the
last thing we would use.

Before I can proceed, I have some questions regarding the unexpected DBNull
returned by the Suggestion (2) - IMEX=1:

It was mentioned that "the text starts with parentheses and letters, so no
one could ever think they're numeric.". I am not sure if I understand it
correctly. Do you mean the cell's content is something like "{{afd" if we
open the xls directly in Excel, however, it returns DBNull when we read it
with OLEDB? What's the real content of the cell in Excel? In order to
reproduce the unexpected behavior on my side, I have tested such cell
content as "{{afd", "{}", "afds", "134", but they return texts as expected.
*In my best guess, is it possible that the cell you referred to is inside a
merged cell?* It is very much appreciated if you could send a sample xls to
my mailbox ([email protected]) so that I can have a clearer picture of
the problem.

If the cell is really inside a merged cell, a possible workaround is to
split all the merged cell on a computer with Office installed before we
distribute the xls to the end users. To split the merged cell
programmatically, please refer to the post:
http://www.themssforum.com/ExcelProgramming/splitting-cells-938113/
For you conveniences, I paste the main Unmerge function here:
Sub UnMerge_Cells()
Dim currentCell As Range
Dim mergeArea As Range
Dim mergeValue As Variant
Dim mergeCell As Range

For Each currentCell In Me.UsedRange
If currentCell.MergeCells Then
Set mergeArea = currentCell.mergeArea
mergeValue = mergeArea.Cells(1, 1).Value2
' split the area
mergeArea.UnMerge
For Each mergeCell In mergeArea
mergeCell.Value2 = mergeValue
Next mergeCell
End If
Next currentCell
End Sub

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

So far, we have discussed the following five suggestions:
(1) Popup a message to ask the end users to convert the xls data to text
manually.

That would not work because the Excel file contains several worksheets. In
my experience a text file can only contain a single worksheet.

But I do pop up a message to ask the end user to use Excel 2003 (on a
machine where Excel 2003 is installed) and adjust all cells in the second
row of each worksheet. This seems to be the most usable solution in this
case.
(2) Use IMEX=1 flag in the connection string

This is a disaster, as I described in the posting which you replied to this
time. IMEX=1 persuaded the Excel driver to read all cells in the second row
of each worksheet as text, so my program thought it was working. But then
on the 58th row of one worksheet, the Excel driver returned some cells as
DBNull instead of returning the actual text. My program thought that those
cells were really null. Usually an end user would think that they're being
shown correct results, they wouldn't know that they're being shown wrong
results, and the effect is a disaster.

Without IMEX=1, my program can detect the failure of the Excel driver on the
second row, so my program can pop up a message box. So it is better to stop
using IMEX=1.
(3) Use csv file type + schema.ini, instead of xls

See (1).
(4) Use Office object model, instead of ISAM driver.

Your previous posting said that use of the Office object model would require
Excel to be installed on the target machine.
(5) Enlarge the default TypeGuessRows in reg so as to have a better guess
of data type.

I think of experimenting on a development machine to see if setting
TypeGuessRows to 200 will solve this. But I still can't tell the customer
to do this on all of their workers' machines.
 
J

Jialiang Ge [MSFT]

Hello,

I am sorry because, possibly, I did not make the meaning clear enough in my
last reply.

In my last reply, I inquired the real content of the cell from which the
program returns DBNull. If we open the xls directly in Excel, what will we
see on the 58th row? Is it possibly in a merged cell?

This information is very important before I can proceed to propose futher
suggestions.

It was mentioned that "the text starts with parentheses and letters, so no
one could ever think they're numeric.". I am not sure if I understand it
correctly. Do you mean the cell's content is something like "{{afd" if we
open the xls directly in Excel, however, it returns DBNull when we read it
with OLEDB? What's the real content of the cell in Excel? In order to
reproduce the unexpected behavior on my side, I have tested such cell
content as "{{afd", "{}", "afds", "134", but they return texts as expected.
*In my best guess, is it possible that the cell you referred to is inside a
merged cell?* It is very much appreciated if you could send a sample xls to
my mailbox ([email protected]) so that I can have a clearer picture of
the problem.

If the cell is really inside a merged cell, a possible workaround is to
split all the merged cell on a computer with Office installed before we
distribute the xls to the end users. To split the merged cell
programmatically, please refer to the post:
http://www.themssforum.com/ExcelProgramming/splitting-cells-938113/
For you conveniences, I paste the main Unmerge function here:
Sub UnMerge_Cells()
Dim currentCell As Range
Dim mergeArea As Range
Dim mergeValue As Variant
Dim mergeCell As Range

For Each currentCell In Me.UsedRange
If currentCell.MergeCells Then
Set mergeArea = currentCell.mergeArea
mergeValue = mergeArea.Cells(1, 1).Value2
' split the area
mergeArea.UnMerge
For Each mergeCell In mergeArea
mergeCell.Value2 = mergeValue
Next mergeCell
End If
Next currentCell
End Sub

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

In my last reply, I inquired the real content of the cell from which the
program returns DBNull. If we open the xls directly in Excel, what will we
see on the 58th row?

That particular cell contains a string. The string starts with a
parenthesis and a letter:
(R
Is it possibly in a merged cell?

It is not. But my program thinks the cell is in a merged cell, because the
Excel driver returned DBNull.

We cannot split merged cells. Merged cells have meanings and my program
recognizes them. Maybe this helps explain why it is a disaster when my
program thinks that some cell is part of a merged cell when the cell really
has its own contents.

As mentioned in previous postings, I stopped using IMEX=1. I just inspect
the second row of each worksheet to see if all cells read as strings, and if
some of them read as DBNull then I pop up a warning. Then the user has to
use a machine with Excel installed and reformat the spreadsheet. I don't
think we'll find a better solution than this.

I still wish that the Excel driver would honour these settings though, as
shown in my first posting:
dataAdapter.FillSchema(dataTable, SchemaType.Source);
DataColumnCollection dataColumns = dataTable.Columns;
int columnCount = dataColumns.Count;
foreach (DataColumn dataColumn in dataColumns)
{
dataColumn.DataType = typeof(string);
}
dataAdapter.Fill(dataTable);
 
J

Jialiang Ge [MSFT]

Hello
That particular cell contains a string. The string starts
with a parenthesis and a letter:
(R
Thank you for the information. In order to reproduce the issue, I have
built a virtual PC with Office 97. I created a xls with Excel 97 and typed
some numeric (545 / 45 / -1 / 0) and texts(abc / '545 / (R / (R123 / {})
into a column. However, all the cells in the xls still return text as
expected from my OLEDB program. Therefore, in my best guess, the issue does
not result from the version of Excel. I suspect that the xls you tested may
has been corrupted, because, as you mentioned, some cells do not show the
small triangle until users click and leave the cell. In order to prove this
guess, would you have a look at the KB article:

Summary of methods to recover data from corrupted files
http://support.microsoft.com/kb/142117/EN-US/

It provides a tool Recover.exe that can recover the corrupted Excel 97
files. Please run the tool on your xls file, and use the OLEDB program to
read it again with *IMEX=1* in connection string, to see whether the 58th
row still returns DBNull unexpectedly.
I don't think we'll find a better solution than this.
The suggestion of "popping up a warning" is not friendly to end users, so I
think this is our last option to consider. Before that, I will do my best
to help you find a better resolution.
I still wish that the Excel driver would honour these settings though, as
shown in my first posting:
dataAdapter.FillSchema(dataTable, SchemaType.Source);
DataColumnCollection dataColumns = dataTable.Columns;
int columnCount = dataColumns.Count;
foreach (DataColumn dataColumn in dataColumns)
{
dataColumn.DataType = typeof(string);
}
dataAdapter.Fill(dataTable);

Thank you for the suggestions, I will let the Excel product team know about
it.

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

Mr./Ms. Ge,

Thank you for trying to reproduce this issue. I also tried to reproduce it
with Excel 2000, but Excel 2003 displayed the file correctly with little
triangles in cells like '15 and '14, and the Excel driver read them
correctly as text.

My development machine doesn't have Excel 97, only Excel 2000 and 2003. I
have Excel 95 in a virtual PC but I'm sure Excel 95 wasn't used in creating
this file (with merged cells). I executed the recover.exe file on my
development machine but it refused to install because Excel 97 is not
installed. If I have time I'll look for an Office 97 CD and install it in a
virtual PC, only in order to test recover.exe, but that will not be soon.
The suggestion of "popping up a warning" is not friendly to end users,

We agree, but disasters are worse than unfriendliness. I do not want to
show charts that look correct with no error messages, when the charts really
do have errors due to being read incorrectly. I do not want to control
hardware devices based on charts that appeared to be read correctly with no
error indications, when the charts really were read incorrectly.
 
J

Jialiang Ge [MSFT]

Hello,
I executed the recover.exe file on my development machine but it refused
to install because Excel 97 is not installed. If I have time I'll look for
an Office 97 CD and install it in a virtual PC, only in order to test
recover.exe, but that will not be soon

Without Excel 97, we can use the CLEANER.XLA addin embedded in recover.exe.
CLEANER.XLA is an addin contained in Office Resource Kit core tool set. You
can download the tool set from
http://download.microsoft.com/download/6/b/3/6b34f4c7-44e6-4d85-91d9-1acf947
9da7d/orktools.exe, install it, and go to Windows Start->All Programs->
Microsoft Office Tools-> Microsoft Office 2000 Resource Kit Tools-> Excel
Tools, and run the addin CLEANER.XLA on the xls created by Excel 97 (See
"To install the Excel 97 File Recovery Macro" section in the article:
https://www.microsoft.com.nsatc.net/technet/archive/office/office97/reskit/o
ffice97/appa.mspx?mfr=true).
We agree, but disasters are worse than unfriendliness. I do not want to
show charts that look correct with no error messages, when the charts
really do have errors due to being read incorrectly. I do not want to
control hardware devices based on charts that appeared to be read correctly
with no error indications, when the charts really were read incorrectly.

It sounds to me that you have decided to use the suggestion (1): pop up a
warning message to end users. If you encounter any follow-up problem
regarding the issue, please do let me know. I will spare no effort to help
you.

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
N

Norman Diamond

Mostly I followed your instructions. The only exception is that page
https://www.microsoft.com.nsatc.net...e/office97/reskit/office97/appa.mspx?mfr=true
says:
"In the Excel\Recover folder, copy Cleaner.xla to the Program
Files\Microsoft Office\Office\Library folder on your computer."
but I copied to folder
C:\Program Files\Microsoft Office\OFFICE11\Library

In Excel 2003, the add-in started to work. Here is a screenshot where it
wouldn't let me see the full pathnames:
http://www.geocities.jp/hitotsubishi/cleaner_dialog.png
Of course I knew the full source pathname, and it looked like the
destination was going to be in the same folder, so I clicked Create.

Then the add-in gave an error message:
http://www.geocities.jp/hitotsubishi/cleaner_error.png
Here is a rough translation:
Execution time error '1004':
Due to programming, access to the Visual Basic project is missing trust

While typing this message, I figured it out. The default setting to copy
VBA Components must be cleared. Then the add-in ran, but the results were
ugly, with all cell formatting lost. Next I understood that the option to
copy Cell Formats must be set. I enabled all options except VBA Components
and the add-in ran again.

The original .xls file is 1,487 KB in size. The Reconstructed .xls file is
910 KB in size. I am not feeling very confident about this.

Indeed the results are worse. As we've been discussing all this time, the
Windows Excel driver reads SOME cells like '13 as strings but it reads SOME
cells like '15 as DBNull. Now reading the Reconstructed file, the Excel
driver reads even more of those cells as DBNull. In the original file, in
just one of the worksheets, the Excel driver read cell value '15 correctly
as a string. In the Reconstructed file, the Excel driver reads that one as
DBNull now, so it no longer reads '15 correctly from any of the worksheets.
 
J

Jialiang Ge [MSFT]

Hello

Thank you for your efforts in trying out the CLEARNER.XLA add-in.
The original .xls file is 1,487 KB in size. The Reconstructed .xls
file is 910 KB in size. I am not feeling very confident about
this.
I think the reduce of file size is possibly because we do not copy the Row
Height, Column Width, Page Setup info to the result xls. These choices are
optional to be copied in the CLEANER.XLA.

It was mentioned that, in the original xls, some little triangles in the
worksheet cells do not show until users click and leave the cells manually.
Would you help to check if the these little triangles show in the recovered
xls generated by CLEANER.XLA?
Indeed the results are worse.
I am sorry for the worse results. I hope you have set 'IMEX=1' in the OLEDB
connection string when you do the test on the recovered xls.

It is really uneasy for me to figure out the problem without a sample xls
to test on directly. If the data in the xls is confidential, is it possible
for you to produce a reproducible xls from the old one, changing all its
confidential texts to non-sense value like "aaa"?

In addition, would you let me know you email by sending a mail to my
mailbox (e-mail address removed) so that we can discuss the issue more
efficiently?

Thanks in advance,

Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top