finding missing numbers in a sequence

G

Guest

I am a non-develoer who has inherited a DBIV database with over 400K records in one table of over 30 fields that I have imported into Access2000/2003. There are numerous errors, duplicates and missing data. I have a natual primary key I can use which is a long integer, (when I manage to separate the alpha part of some rows to a new column), that should be a sequence of #'s (1-400 000+). (This number will continue to increase in our database, but not necessarily in a sequential order). I need to find the gaps in the sequence. Any suggestions on how to write a query that will find the missing numbers?
 
T

Tom Ellison

The general approach is to use a subquery that finds the preceding
number (SELECT MAX(T1.id) WHERE T1.id < T.id AS PrevID) and filter out
those rows where this is NOT T.id - 1.

If you are not familiar with using aliases and subqueries, this
probably doesn't mean anything. I'll attempt a query based on my best
guess of your setup.

SELECT T.id
FROM YourTable T

This is meant to represent just a list of all id values.

SELECT T.id,
(SELECT MAX(T1.id) FROM YourTable T1
WHERE T1.id < T.ID) AS PrevID
FROM YourTable T

Now you have each id, and the previous one (except for the first,
which has no previous id value)

SELECT id, PrevID FROM (
SELECT T.id,
(SELECT MAX(T1.id) FROM YourTable T1
WHERE T1.id < T.ID) AS PrevID
FROM YourTable T
) X
WHERE PrevID <> id - 1

Can you adapt that to your table and column names? Does it answer the
question?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
J

John Vinson

I am a non-develoer who has inherited a DBIV database with over 400K records in one table of over 30 fields that I have imported into Access2000/2003. There are numerous errors, duplicates and missing data. I have a natual primary key I can use which is a long integer, (when I manage to separate the alpha part of some rows to a new column), that should be a sequence of #'s (1-400 000+). (This number will continue to increase in our database, but not necessarily in a sequential order). I need to find the gaps in the sequence. Any suggestions on how to write a query that will find the missing numbers?

I'd suggest creating a table named Num with 1000 records, each with a
single long integer field N, values 0-1000 (you can fill this in
seconds in Excel and copy and paste it).

Now create a query named AllNum:

SELECT Num.N + (Num_1.N*1000 + 1 AS M FROM Num, Num AS Num_1;

This "cartesian join" query will have 1,000,000 values from one to a
million.

Now use a "Frustrated Outer Join" query to find all values which exist
in AllNum but NOT in your table:

SELECT AllNum.M
FROM AllNum LEFT JOIN yourtable
ON AllNum.M = yourtable.ID
WHERE yourtable.ID IS NULL
AND M < (SELECT Max(ID) FROM YourTable);
 
T

Tom Ellison

Dear Brad:

The T and T1 are not variables. They are "aliases." They are used to
represent separate instances of tables within a query. They can be
very convenient ways to shorthand the full table names in a query even
when there is only one instance of that table. But when a query must
track two instances of the same table (or query) then they are
indispensable.

As a result, you can generally leave them completely alone, changing
the names of the tables, but retaining them as is.

To be able to help you with your situation, I suggest you post the SQL
of a query that shows me what you would have without trying to provide
the missing number ranges. This query should contain the column that
shows the existing numbers. Be sure to tell me which column this
number column is. I will modify this query to produce the results you
desire (well, I think I will.) That should provide a good basis for
me to see what you are doing.

So, as I understand you, you will want the missing number series of
begin and end numbers separately for each County. Is this correct?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
G

Guest

Tom
Here is the SQL of the main table query. You are correct I am trying to find the missing number series of begin and end[Book Number] for a given [County]. I will also be trying to apply a date range to the query, so the user will be prompted for a from and to date to search within. Please let me know if you can help me

SELECT [Main Table].[Received Date], [Main Table].County, [Main Table].[Book Type], [Main Table].[Book Number], [Main Table].Suffix, [Main Table].[Original Format], [Main Table].[Beginning Page Number], [Main Table].[Beginning Recording Date], [Main Table].[Beginning Instrument Number], [Main Table].[Ending Page Number], [Main Table].[Ending Recording Date], [Main Table].[Ending Instrument Number
FROM [Main Table]

Thank you again
Brad Grov

----- Tom Ellison wrote: ----

Dear Brad

The T and T1 are not variables. They are "aliases." They are used t
represent separate instances of tables within a query. They can b
very convenient ways to shorthand the full table names in a query eve
when there is only one instance of that table. But when a query mus
track two instances of the same table (or query) then they ar
indispensable

As a result, you can generally leave them completely alone, changin
the names of the tables, but retaining them as is

To be able to help you with your situation, I suggest you post the SQ
of a query that shows me what you would have without trying to provid
the missing number ranges. This query should contain the column tha
shows the existing numbers. Be sure to tell me which column thi
number column is. I will modify this query to produce the results yo
desire (well, I think I will.) That should provide a good basis fo
me to see what you are doing

So, as I understand you, you will want the missing number series o
begin and end numbers separately for each County. Is this correct

Tom Elliso
Microsoft Access MV
Ellison Enterprises - Your One Stop IT Expert


On Wed, 7 Apr 2004 06:01:10 -0700, "Brad Grove" <[email protected]
wrote
 
T

Tom Ellison

Dear Brad:

As I would understand this, you will want to see 3 columns: County,
Begin Series, and EndSeries. Showing things like the [Received Date]
of the book that preceeded the missing [Book Number] series would seem
unrelated to the problem, right?

So, here's what I come up with.

SELECT County, [Book Number] + 1 AS BeginSeries,
(SELECT MIN([Book Number]) - 1 FROM [Main Table] T1
WHERE T1.County = T.County
AND T1.[Book Number] > T.[Book Number]) AS EndSeries
FROM [Main Table] T
WHERE NOT EXISTS(SELECT * FROM [Main Table] T1
WHERE T1.County = T.County
AND T1.[Book Number] = T.[BookNumber] + 1)
AND EXISTS(SELECT * FROM [Main Table] T1
WHERE T1.County = T.County
AND T1.[Book Number] > T.[BookNumber])

There are 3 things of interest.

The first thing to understand is in the lines 6 - 8 (assuming you see
11 lines total). This says to filter out those books where the next
book number is in the table for the same county. For those books
where there is no next book number in that county, a series of missing
numbers begins with the next book number, as performed in the
BeginSeries column in line 1.

However, there is no missing series beginning at the highest book
number for each county. This is eliminated in lines 9 - 11.

This gives you the beginning number for each series of missing
numbers. Now you must find where that series ends.

Lines 2 - 4 do this. This is a subquery (as are the things in the 2
EXISTS clauses lines 6 - 8 and 9 - 11) which finds the lowest [Book
Number] in the county that is greater than the book number we found
where the following number is missing (lines 6 - 8) but which is not
the last book number for the county (lines 9 - 11). The end of the
series is the number of this existing book less 1.

Hope this makes sense.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


Tom,
Here is the SQL of the main table query. You are correct I am trying to find the missing number series of begin and end[Book Number] for a given [County]. I will also be trying to apply a date range to the query, so the user will be prompted for a from and to date to search within. Please let me know if you can help me.

SELECT [Main Table].[Received Date], [Main Table].County, [Main Table].[Book Type], [Main Table].[Book Number], [Main Table].Suffix, [Main Table].[Original Format], [Main Table].[Beginning Page Number], [Main Table].[Beginning Recording Date], [Main Table].[Beginning Instrument Number], [Main Table].[Ending Page Number], [Main Table].[Ending Recording Date], [Main Table].[Ending Instrument Number]
FROM [Main Table];


Thank you again,
Brad Grove

----- Tom Ellison wrote: -----

Dear Brad:

The T and T1 are not variables. They are "aliases." They are used to
represent separate instances of tables within a query. They can be
very convenient ways to shorthand the full table names in a query even
when there is only one instance of that table. But when a query must
track two instances of the same table (or query) then they are
indispensable.

As a result, you can generally leave them completely alone, changing
the names of the tables, but retaining them as is.

To be able to help you with your situation, I suggest you post the SQL
of a query that shows me what you would have without trying to provide
the missing number ranges. This query should contain the column that
shows the existing numbers. Be sure to tell me which column this
number column is. I will modify this query to produce the results you
desire (well, I think I will.) That should provide a good basis for
me to see what you are doing.

So, as I understand you, you will want the missing number series of
begin and end numbers separately for each County. Is this correct?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


Mr. Ellison
I think this approach may work for me as well. I have a main table that has a field "Counties" which is linked to another table called "Counties". The "Counties" field in the Main Table is using a list box to pull the data from the "Counties" Table.
For each County listed in the Main Table (which there can be many counties) there is a Unique "Book Number" assigned, but the "Book Number" is not unique between different Counties. So in other words, the "Book Number" field will have identical numbers but not for the same County. (Hope that makes sense). From this info I am also looking for missing "Book Numbers" for a given "County". I am having a hard time appling this subquery to my application. I'm not sure of your use of "T" and "T1". Please let me know what the variables are. Any help you can give me would be greatly appreciated
Brad Grove
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top