G
Guest
Hi all,
Need some tips to help me approach this problem. I would consider myself
advanced in Access and VBA, so just looking for conceptual pointers.
In essence, I have a number of text files, that needs to be parsed and
merged. Each text file will contain around 70,000 records. For now, lets
assume that I have only 2 text files that will need merging.
I have managed to succesfully parse the files and sort each line in the text
file into a database record that will contain 3 fields; a time stamp field
(converted to a long number as time is measured in thousandth's of seconds),
a Datasource field (i.e. who actually generated the text file), and a string
field (containing a text string of up to 255 characters).
Basically, what I would like to do is merge the data from the two files, in
such a way that I am only left with one unique set of records; i.e. any
duplicates in the second datasource that are identical to the first
datasource will be deleted, and any records that are in the second datasource
but not in the first will be added. I will therefore end up with a "superset"
of data.
The primary difficulty in the above is the time stamp field in each
datasource is dependant on the user's own PC clock. Therefore, there may be
time differences between records that would otherwise be identical. These
time differences are never more than 30 seconds. Moreover, the time
difference is not exactly the same for each matched record. A SELECT DISTINCT
query would have done the job for me, other than the fact that the time stamp
field prevents this from happening. So I though of finding out the difference
in times between the two recordsources, and as it turns out the difference
may vary every so slightly through the datasources, so some records may be
out by 0.15 of a second and others by 0.14 and others by 0.13, for example.
So, what I have done up until now is to create a loop through the recordset,
but as expected, this takes a longgggg time (note that records are not yet
deleted/added; they are just marked as being matched in the following code).
I would love to hear if there are better ways of approaching this problem.
Many thanks for any help that you can provide.
Dim CurrTime As Long
Dim varArray
CurrTime = Time()
'Recordset to retrieve the combat log
Set cnn = CurrentProject.Connection
Set rs1 = New ADODB.Recordset
strSql1 = "Select ParseID, lngTime, strCharName, strMsg, lngParseID from
tblParsedData ORDER BY lngTime ASC"
rs1.Open strSql1, cnn, adOpenKeyset, adLockOptimistic
varArray = rs1.GetRows
lngRecordCount1 = rs1.RecordCount
lngRecordCount1 = rs1.RecordCount - 1
strDataSource1 = "Leandra"
strDataSource2 = "Spuge"
For i = 0 To lngRecordCount1
If Not IsNull(varArray(4, i)) Then 'Looks to see if we have matched
it yet
'Do nothing as its aleady been matched, so move onto next i
Else
If varArray(2, i) = strDataSource1 Then
strMsg = varArray(3, i)
lngParseID = varArray(0, i)
lngTime1 = varArray(1, i)
For j = lngCurrRecord To lngRecordCount1
If (varArray(1, j) - lngTime1) > 30000 Then '>30
seconds, stop looking for a match
Exit For
Else
If IsNull(varArray(4, j)) Then 'Looks to see if we
have matched it yet
If varArray(2, j) = strDataSource2 Then
If varArray(3, j) = strMsg Then
varArray(4, j) = lngParseID
varArray(4, i) = varArray(0, j)
lngCurrRecord = j
Exit For
End If
End If
End If
End If
Next
End If
End If
Next i
Debug.Print Time() - CurrTime
Need some tips to help me approach this problem. I would consider myself
advanced in Access and VBA, so just looking for conceptual pointers.
In essence, I have a number of text files, that needs to be parsed and
merged. Each text file will contain around 70,000 records. For now, lets
assume that I have only 2 text files that will need merging.
I have managed to succesfully parse the files and sort each line in the text
file into a database record that will contain 3 fields; a time stamp field
(converted to a long number as time is measured in thousandth's of seconds),
a Datasource field (i.e. who actually generated the text file), and a string
field (containing a text string of up to 255 characters).
Basically, what I would like to do is merge the data from the two files, in
such a way that I am only left with one unique set of records; i.e. any
duplicates in the second datasource that are identical to the first
datasource will be deleted, and any records that are in the second datasource
but not in the first will be added. I will therefore end up with a "superset"
of data.
The primary difficulty in the above is the time stamp field in each
datasource is dependant on the user's own PC clock. Therefore, there may be
time differences between records that would otherwise be identical. These
time differences are never more than 30 seconds. Moreover, the time
difference is not exactly the same for each matched record. A SELECT DISTINCT
query would have done the job for me, other than the fact that the time stamp
field prevents this from happening. So I though of finding out the difference
in times between the two recordsources, and as it turns out the difference
may vary every so slightly through the datasources, so some records may be
out by 0.15 of a second and others by 0.14 and others by 0.13, for example.
So, what I have done up until now is to create a loop through the recordset,
but as expected, this takes a longgggg time (note that records are not yet
deleted/added; they are just marked as being matched in the following code).
I would love to hear if there are better ways of approaching this problem.
Many thanks for any help that you can provide.
Dim CurrTime As Long
Dim varArray
CurrTime = Time()
'Recordset to retrieve the combat log
Set cnn = CurrentProject.Connection
Set rs1 = New ADODB.Recordset
strSql1 = "Select ParseID, lngTime, strCharName, strMsg, lngParseID from
tblParsedData ORDER BY lngTime ASC"
rs1.Open strSql1, cnn, adOpenKeyset, adLockOptimistic
varArray = rs1.GetRows
lngRecordCount1 = rs1.RecordCount
lngRecordCount1 = rs1.RecordCount - 1
strDataSource1 = "Leandra"
strDataSource2 = "Spuge"
For i = 0 To lngRecordCount1
If Not IsNull(varArray(4, i)) Then 'Looks to see if we have matched
it yet
'Do nothing as its aleady been matched, so move onto next i
Else
If varArray(2, i) = strDataSource1 Then
strMsg = varArray(3, i)
lngParseID = varArray(0, i)
lngTime1 = varArray(1, i)
For j = lngCurrRecord To lngRecordCount1
If (varArray(1, j) - lngTime1) > 30000 Then '>30
seconds, stop looking for a match
Exit For
Else
If IsNull(varArray(4, j)) Then 'Looks to see if we
have matched it yet
If varArray(2, j) = strDataSource2 Then
If varArray(3, j) = strMsg Then
varArray(4, j) = lngParseID
varArray(4, i) = varArray(0, j)
lngCurrRecord = j
Exit For
End If
End If
End If
End If
Next
End If
End If
Next i
Debug.Print Time() - CurrTime