GPO
My instructions are: use ADO and Access 2000 - and avoid DoCmd.
I have several thousand text files (aka extracts) of several thousand rows
each that I want to insert into a table. Most of the rows in the extracts
are unwanted duplicates, in the sense that what I define as the primary key
in the table can appear several times over the course of several extracts.
I will probably have to sift through about 6 million rows of text to arrive
at a final table of about 1.4 million "unique" rows.
The extracts are named in such a way that if I load them in descending order
of file name, the most recent records (the ones I want) would load first. I
want to set things up such that subsequent records simply don't load if they
are duplicating the keys of what is already loaded. I thought I had seen
this functionality somewhere in Access (TransferText?), where a paste
errors table is created and the non-duplicated records load successfully.
Instead I get the "The changes you requested to the table were not
successful because they would create duplicate values..." message (and,
well, thanks, but I know that).
For example:
Extract name: "A1.txt"
KeyPart1 KeyPart2 Field1 Field2 ...
..
..
..
1234 456 ABC DEF ...
..
..
..
Extract name: "A2.txt"
KeyPart1 KeyPart2 Field1 Field2 ...
..
..
..
1234 456 ABE DEG ...
..
..
..
In the above example the row in A2.txt would be loaded first, and is the
desired row.
Given the volume of the data, I obviously want to be able to load it with
the fewest passes through the data possible. Is there a way to load only
the non-duplicated records in each extract (hopefully in a single pass,
without nested subqueries)?
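For what it's worth, the sort of single-pass statement I have in mind is an
unmatched-records insert: left-join each extract to the destination table
on the key, and keep only the rows whose key is not there yet. Aircode -
"DestTable", the field names and the folder path are placeholders:

```sql
INSERT INTO DestTable ( KeyPart1, KeyPart2, Field1, Field2 )
SELECT t.KeyPart1, t.KeyPart2, t.Field1, t.Field2
FROM [Text;DATABASE=C:\Extracts;].[A2#txt] AS t
LEFT JOIN DestTable AS d
    ON t.KeyPart1 = d.KeyPart1 AND t.KeyPart2 = d.KeyPart2
WHERE d.KeyPart1 IS NULL;
```

Note this can't use the IN clause form from my function below, since the IN
clause applies to the whole FROM and can't mix the text source with the
local table; the per-table [Text;DATABASE=...] connect-string syntax above
gets around that. The primary key on the table would stay in place as a
backstop against duplicate keys within a single extract.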
Here is a hacked-up aircode version of the import function, called while
looping through all the extracts (apologies for any wrapping):
Function ImportExtract( _
        strExtractName As String, _
        strLocalStore As String, _
        strDestTable As String) As Boolean

    Dim cmd As ADODB.Command
    Dim strTextFileNameModified As String

    On Error GoTo ErrorHandler

    '1. Jet can't use a dot in a table name, so substitute a "#" for the
    '   dot in the file name (e.g. "A1.txt" becomes "A1#txt").
    strTextFileNameModified = Replace(strExtractName, ".", "#", 1, 1, vbTextCompare)

    '2. Create the command object.
    Set cmd = New ADODB.Command
    With cmd
        .ActiveConnection = CurrentProject.Connection
        .CommandType = adCmdText
        '3. Use SQL as command text.
        '   SQL references the text file directly (needs schema.ini).
        .CommandText = "INSERT INTO " & strDestTable & _
            " ( KeyPart1, KeyPart2 [etc] ) " & _
            "SELECT KeyPart1, KeyPart2 [etc] " & _
            "FROM " & strTextFileNameModified & _
            " IN """" [Text; DATABASE=" & strLocalStore & ";]"
        '4. Run the SQL.
        .Execute
    End With

    ImportExtract = True

ExitHandler:
    Set cmd = Nothing
    Exit Function

ErrorHandler:
    ImportExtract = False
    Resume ExitHandler
End Function
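And the calling loop is roughly this (aircode again; Dir returns file names
in no guaranteed order, so the names are collected and sorted descending
before importing, so that the newest extract loads first):

```vb
Sub ImportAllExtracts(strLocalStore As String, strDestTable As String)
    Dim colNames As New Collection
    Dim astrNames() As String
    Dim strName As String
    Dim i As Long, j As Long

    'Collect the extract file names.
    strName = Dir(strLocalStore & "\*.txt")
    Do While Len(strName) > 0
        colNames.Add strName
        strName = Dir
    Loop
    If colNames.Count = 0 Then Exit Sub

    'Copy to an array and sort descending by name.
    ReDim astrNames(1 To colNames.Count)
    For i = 1 To colNames.Count
        astrNames(i) = colNames(i)
    Next i
    For i = 1 To UBound(astrNames) - 1
        For j = i + 1 To UBound(astrNames)
            If astrNames(j) > astrNames(i) Then
                strName = astrNames(i)
                astrNames(i) = astrNames(j)
                astrNames(j) = strName
            End If
        Next j
    Next i

    'Import newest first; log any failures.
    For i = 1 To UBound(astrNames)
        If Not ImportExtract(astrNames(i), strLocalStore, strDestTable) Then
            Debug.Print "Failed: " & astrNames(i)
        End If
    Next i
End Sub
```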
Many thanks, and sorry for the length of the post.