Dealing with unparseable records

jqq

Access 2003, Win XP

I am working on one-click automation programs for Access users. The
process involves importing data, updating/manipulating the data, then
exporting spreadsheets and reports.

The data to import is in comma-delimited files with double-quote text
qualifiers. Almost every file has a few records with problem data,
usually the use of double-quotes inside a text field (e.g., "Kennedy,
John "Jack" F."). There is no possibility of getting cleaner data or
of changing the file format.
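
For illustration, a made-up record like this one is typical of what
breaks:

1001,"Kennedy, John "Jack" F.","Boston, MA",1960

The quote before the nickname closes the text qualifier early, so the
rest of the field can no longer be matched up with the delimiters and
the whole record is rejected.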

The data is imported into an existing table using DoCmd.TransferText
with acImportDelim and import specifications. The bad rows are not
imported; instead Access creates an ImportErrors table containing the
error type, problem field, and row number of each rejected record.
(The problem records usually show up as "unparsable record" with
nothing in the Field column.)

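Roughly, the import call looks like this (the spec, table and file
names are placeholders, not the real objects):

' Rows Access can't parse are skipped and logged to the ImportErrors table
DoCmd.TransferText acImportDelim, "specDailyFile", "tblImport", _
    "C:\Imports\DailyFile.csv", True    ' True = first row holds field names
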
I have a routine that's been working for years to provide the database
user with a list of the problem records that were not included in
their output:

1. The data file is re-imported into a new table using
DoCmd.TransferText with import specifications that grab the whole
record as one text string.
2. Both the new data table and the Access-created ImportErrors table
are opened as recordsets.
3. I loop through the ImportErrors recordset to get the row numbers of
the problem records, then move to that row of the DataTable recordset
and get the record data.
4. All of the error information (including the entire-record text
string) is concatenated into an email message to the user, giving them
the list of problem records that weren't imported.
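
The matching loop (step 3 above) looks roughly like this (ADO flavour;
the table, field and variable names are placeholders):

Dim rstErr As ADODB.Recordset, rstData As ADODB.Recordset
Dim strMsg As String

Set rstErr = New ADODB.Recordset
rstErr.Open "SELECT * FROM tblImportErrors", CurrentProject.Connection, _
    adOpenStatic, adLockReadOnly
Set rstData = New ADODB.Recordset
rstData.Open "SELECT * FROM tblWholeRecords", CurrentProject.Connection, _
    adOpenStatic, adLockReadOnly

Do While Not rstErr.EOF
    rstData.MoveFirst
    rstData.Move rstErr.Fields("Row") - 1   ' assumes table order = file order
    strMsg = strMsg & "Row " & rstErr.Fields("Row") & ": " & _
        rstData.Fields("Data") & vbCrLf
    rstErr.MoveNext
Loop
' strMsg then goes into the email to the user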

This has worked well until my latest couple of projects. The problem
is that these files have much larger records; the entire-record text
string won't fit into a text field, so I had to change the field type
to adLongVarWChar (Memo). Evidently that causes Access to switch to a
chunked import method, and the records are not imported in the same
order as in the original import. That breaks the process: the record
in the new data table whose row number matches a row number in the
ImportErrors table is no longer the record that was originally
skipped; it's whatever record happened to land at that position during
the chunked import.

I also cannot sort the recordset before running the loop, because that
field type does not allow sorting (via rstDataTable.Sort =
"SortField ASC").

Next I tried extracting the first portion of the DataTable text string
into a new column [rstDataTbl.Fields("SortField") =
Mid(rstDataTbl.Fields("Data"), 11, 30)], sorting on that column, and
then performing the loop from the ImportErrors table.
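
Roughly, with placeholder names (ADO's Recordset.Sort also needs a
client-side cursor, CursorLocation = adUseClient):

rstDataTbl.MoveFirst
Do While Not rstDataTbl.EOF
    rstDataTbl.Fields("SortField") = Mid(rstDataTbl.Fields("Data"), 11, 30)
    rstDataTbl.Update
    rstDataTbl.MoveNext
Loop
rstDataTbl.Sort = "SortField ASC"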

The first problem is that I have to have a primary key field, or the
update to the new field isn't allowed. I don't know how to
programmatically create a table with an AutoNumber primary key field
(the same thing as checking the "Let Access add Primary Key" option in
the Import Wizard) and haven't been able to find any instructions for
doing so. Is this possible, and if so, how do I do it?
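
Something like this is what I'm picturing (table and field names made
up), if it's even valid:

CurrentDb.Execute _
    "CREATE TABLE tblWholeRecords (" & _
    "ID AUTOINCREMENT CONSTRAINT pkWholeRecords PRIMARY KEY, " & _
    "Data MEMO)", dbFailOnError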

I can stop the process and manually add an auto-number field to the
data table. That lets it work, but inconsistently. I haven't yet
figured out the exact problem, but it must be something in the
recordset sorting that doesn't match the sort in the original
datafile.

However, this whole rigmarole takes a LONG TIME to run.

Does anyone have suggestions for an alternate method to produce the
same result (i.e., a listing of the records that were not imported
because of bad data), or any suggestions for improving the routine I'm
using?

Thanks very much.

jamileh
 

John Nurick

I'd work at the text file level. I'd write a script to scan the text
file and identify the bad records before attempting to import the file.
One could either (a) flag the bad records and return the whole file to
the user, or (b) filter them out and return just the bad records to the
user, then import the valid records from the filtered file.




jqq

Yeah, I was hoping to avoid mucking with the text files. Does VBA have
that level of scripting, or were you thinking WSH or something else? I
use a text editor for that kind of work, but the users won't have a
similar program, and I might have to get special perms authorized for
anything else - we try to keep their machines pretty locked down.

Thanks!
 

John Nurick

You could do it in pure VBA, though it would be a tremendous help to
have a regular expression engine available. With a bit of luck your IT
guys won't have locked down the VBScript regular expression object; try
setting a reference to the Microsoft VBScript Regular Expressions 5.5
library and then:
Dim rgxR As VBScript_RegExp_55.RegExp
Set rgxR = CreateObject("VBScript.Regexp")
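
For instance, with strLine holding one record read from the file, a
crude test for a stray embedded quote might be (a heuristic, not a
full CSV parse):

rgxR.Pattern = "[^,]""[^,]"    ' a quote with non-comma characters on both sides
If rgxR.Test(strLine) Then
    ' probably an embedded quote, e.g. "Kennedy, John "Jack" F."
End If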

For preference I'd use Perl, because its text file and
string-manipulation functions are much more powerful than what the
VB/VBA/VBScript family offers.
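
And if the RegExp object is locked down too, a plain-VBA character
walk will do. A rough sketch (it assumes simple quoting: no doubled ""
escapes and no line breaks inside quoted fields):

Sub ListBadRecords(strFile As String)
    ' Walk each line; a quote inside a quoted field that isn't followed
    ' by a comma or the end of the line means the record can't be parsed.
    Dim intFile As Integer, strLine As String, lngRow As Long
    Dim i As Long, blnInQuotes As Boolean, blnBad As Boolean

    intFile = FreeFile
    Open strFile For Input As #intFile
    Do While Not EOF(intFile)
        Line Input #intFile, strLine
        lngRow = lngRow + 1
        blnInQuotes = False
        blnBad = False
        For i = 1 To Len(strLine)
            If Mid$(strLine, i, 1) = """" Then
                If Not blnInQuotes Then
                    blnInQuotes = True                ' opening qualifier
                ElseIf i = Len(strLine) Or Mid$(strLine, i + 1, 1) = "," Then
                    blnInQuotes = False               ' closing qualifier
                Else
                    blnBad = True                     ' stray quote inside a field
                    Exit For
                End If
            End If
        Next i
        If blnBad Or blnInQuotes Then
            Debug.Print "Row " & lngRow & ": " & strLine
        End If
    Loop
    Close #intFile
End Sub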
 

jqq

John - thanks for the suggestions. I'm going to look into that
further for a better method.

In the meantime, I changed my import to fixed width and brought it in
as multiple columns, then stuck the data back together at the end.
Some days I'm smarter than others. :/
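
The sticking-back-together can be a single update query concatenating
the chunk columns - table and field names here are placeholders:

CurrentDb.Execute _
    "UPDATE tblWholeRecords SET Data = [Chunk1] & [Chunk2] & [Chunk3]", _
    dbFailOnError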

Thanks much.
 
