How to find the gap?

  • Thread starter Günter Brandstätter
  • Start date
G

Günter Brandstätter

Hi all,

I have a severe problem with a calculation on a table.
My table contains numbers from 1 to 9.999.999 in a field. Somewhere in this
table there may be a gap from let's say 2.000.000 to 3.000.000
The gap is not always on the same position.
I tried to find a procedure to display the two values (start and end of the
gap), but I did not find a solution.
I know it is easy to scan all records and find a number missing, but with
that amount of records I need a short way, because the time it takes for
scanning all records is too long.
Isn't there an mathematical solution for this problem? I tried to begin in
the middle of the recordset and find the maximum number from that point, if
it is equal greatest value in the field, i decrease the starting point and
find the maximum, and so on..... until I find the end of the gap. Then I go
from the end of the gap downwards and find the maximum value. This will be
the beginning of the gap.
This takes waaaaaaaaaaay too long time. I would need a shorter solution.

Any answer appreciated
Günter
 
D

Dirk Goldgar

Günter Brandstätter said:
Hi all,

I have a severe problem with a calculation on a table.
My table contains numbers from 1 to 9.999.999 in a field. Somewhere
in this table there may be a gap from let's say 2.000.000 to 3.000.000
The gap is not always on the same position.
I tried to find a procedure to display the two values (start and end
of the gap), but I did not find a solution.
I know it is easy to scan all records and find a number missing, but
with that amount of records I need a short way, because the time it
takes for scanning all records is too long.
Isn't there an mathematical solution for this problem? I tried to
begin in the middle of the recordset and find the maximum number from
that point, if it is equal greatest value in the field, i decrease
the starting point and find the maximum, and so on..... until I find
the end of the gap. Then I go from the end of the gap downwards and
find the maximum value. This will be the beginning of the gap.
This takes waaaaaaaaaaay too long time. I would need a shorter
solution.

Any answer appreciated
Günter

If there's only going to be one gap, then you can use a query with SQL
along these lines, though it won't run very efficiently:

SELECT Min(SeqNo) + 1 As StartGap, Max(SeqNo) - 1 As EndGap
FROM
(
SELECT tblSequence.SeqNo FROM tblSequence
WHERE
(
(tblSequence.SeqNo<>
(SELECT Max(SeqNo) FROM tblSequence))
AND
(Not Exists
(Select SeqNo From tblSequence L
WHERE L.SeqNo = tblSequence.SeqNo + 1))
)
OR
(
(tblSequence.SeqNo<>
(SELECT Min(SeqNo) FROM tblSequence))
AND
(Not Exists
(Select SeqNo From tblSequence H
WHERE H.SeqNo = tblSequence.SeqNo - 1))
)
) As Gap;

The Jet database engine is notorious for handling the "Not Exists"
construction is inefficiently. I set up a test table with the number of
records and the gap that you described, and the query took several
minutes to run. I believe I may be able to come up with a more
efficient solution -- and certainly it could be handled very quickly
indeed if you had a matching table with no gaps, to LEFT JOIN to -- but
I don't have time to pursue that at the moment.
 
G

Günter Brandstätter

Hi Dirk,
thank you for this solution, I will give it a try tomorrow morning.

Günter
 
D

Dirk Goldgar

Dirk Goldgar said:
If there's only going to be one gap, then you can use a query with SQL
along these lines, though it won't run very efficiently:

SELECT Min(SeqNo) + 1 As StartGap, Max(SeqNo) - 1 As EndGap
FROM
(
SELECT tblSequence.SeqNo FROM tblSequence
WHERE
(
(tblSequence.SeqNo<>
(SELECT Max(SeqNo) FROM tblSequence))
AND
(Not Exists
(Select SeqNo From tblSequence L
WHERE L.SeqNo = tblSequence.SeqNo + 1))
)
OR
(
(tblSequence.SeqNo<>
(SELECT Min(SeqNo) FROM tblSequence))
AND
(Not Exists
(Select SeqNo From tblSequence H
WHERE H.SeqNo = tblSequence.SeqNo - 1))
)
) As Gap;

The Jet database engine is notorious for handling the "Not Exists"
construction is inefficiently. I set up a test table with the number
of records and the gap that you described, and the query took several
minutes to run. I believe I may be able to come up with a more
efficient solution -- and certainly it could be handled very quickly
indeed if you had a matching table with no gaps, to LEFT JOIN to --
but I don't have time to pursue that at the moment.

Although usually SQL solutions are faster than recordset solutions, this
code version takes 1/3 the time of that query, and doesn't require that
there be only one gap:

'----- start of code -----
Sub FindGapRS()

Dim rs As DAO.Recordset
Dim l As Long

Debug.Print "Starting search at " & Timer()
Set rs = CurrentDb.OpenRecordset( _
"SELECT SeqNo FROM tblSequence ORDER BY SeqNo")

With rs
Do Until .EOF
If !SeqNo <> l + 1 Then
Debug.Print "Gap begins at " & (l + 1) & _
" and ends at " & (!SeqNo - 1)
End If
l = !SeqNo
.MoveNext
Loop
.Close
End With

Set rs = Nothing
Debug.Print "Ending search at " & Timer()

End Sub

'----- end of code -----
 
T

Tonín

This query could be an first approach. It provides the start values of all
gaps.

SELECT YourTable.YourField
FROM YourTable LEFT JOIN YourTable AS YourTable_1 ON YourTable.YourField =
YourTable_1.YourField+1
WHERE (((YourTable.YourField)<>1) AND ((YourTable_1.YourField) Is Null))
ORDER BY YourTable.YourField;

You need to find the end of the gaps, then you can use this other one
(pretty similar):

SELECT YourTable.YourField
FROM YourTable LEFT JOIN YourTable AS YourTable_1 ON YourTable.YourField =
YourTable_1.YourField-1
WHERE (((YourTable.YourField)<>9999999) AND ((YourTable_1.YourField) Is
Null))
ORDER BY YourTable.YourField;

If you need both, then perform a UNION query. It all depends on your actual
needs.

Hope being helpful,

Tonín
 
G

Günter Brandstätter

Thank you Dirk and Tonin,

I tried both, the SQL and the DAO approach and I found that DAO is the most
effective in this case. To search about 7,5 millons od entries it took me 83
seconds in DAO and about 170 secs in SQL.
So I decided to use Dirks code whitch I modified to my needs (see below).
These times have been measured with the number-field indexed. If the field
is not indexed, the procedure takes approximately double time.

Thanks again
Günter

'begin of code
Sub FindGapRS()

Dim rs As DAO.Recordset
Dim l As Long
Dim strSQL As String
Dim lBegin As Single
Dim lng_MIN_gap As Long
Dim gapCount As Long
Dim myArray() As Long

lng_MIN_gap = 100000 ' Minimal length of gap
strSQL = "SELECT lng_Number FROM tbl_Number " & _
"ORDER BY lng_Number"

lBegin = Timer

Set rs = CurrentDb.OpenRecordset(strSQL)
With rs
Do Until .EOF
If !lng_Number > l + lng_MIN_gap Then
gapCount = gapCount + 1
ReDim Preserve myArray(2, gapCount)
myArray(1, gapCount) = l
myArray(2, gapCount) = !lng_Number
Debug.Print "Gap starts at " & (l) & _
" and ends at " & (!lng_Number)
End If
If Not IsNull(!lng_Number) Then l = !lng_Number
.MoveNext
Loop
.Close
End With

For l = 1 To gapCount
Debug.Print myArray(1, l) & ", " & myArray(2, l)
Next

Set rs = Nothing
Debug.Print "Time elapsed: " & Timer() - lBegin

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top