Delete dublicates from a table

J

jj

Hi

I have a very large table approximately 4 million records - every night I
import records to the table - unfortunately the import creates redundant
records which must be deleted. Is there a way to do this from the existing
table with a subquery ? It is not an option to create a new table, and use
distinctrow since the database size will reach the maximum size of 2
gigabyte by this operation.

Thanks
/JJ
 
J

John W. Vinson

Hi

I have a very large table approximately 4 million records - every night I
import records to the table - unfortunately the import creates redundant
records which must be deleted. Is there a way to do this from the existing
table with a subquery ? It is not an option to create a new table, and use
distinctrow since the database size will reach the maximum size of 2
gigabyte by this operation.

Thanks
/JJ

Two ways... though with this size of table I'm not sure either will be fully
practical!

One would be to creat a unique Index on the field or combination of fields
which define uniqueness. This might blow your 2Gbyte limit.

The second would be to base your append query on a "unmatched" query between
the source and target:

INSERT INTO target (field, field, field, ...)
SELECT source.field, source.field, source.field...
FROM source LEFT JOIN target
ON source.linkfield = target.linkfield
WHERE target.linkfield IS NULL;

This query will exclude any existing records from the append. Bloat might
still be a problem as may performance.

At this scale (though I'm all for JET in most cases) I think you should be
seriously investigating SQL/Server or another Client/Server solution.

John W. Vinson [MVP]
 
M

MH

As John has already pointed out, it may be time to go to SQL Server. There
is a free version called "SQL Server 2005 Express" which you can download
from the Microsoft site along with the "Management Studio" which gives you
the oportunity to use Transact SQL, you can create stored procedures for
example.

MH
 
E

engles

As John has already pointed out, it may be time to go to SQL Server. There
is a free version called "SQL Server 2005 Express" which you can download
from the Microsoft site along with the "Management Studio" which gives you
the oportunity to use Transact SQL, you can create stored procedures for
example.

MH






I t is not an option to create a new table, and use


- Show quoted text -

You already have some very good replies. Thought I'd just comment on
the max size. For a one time operation, you do have an option - put
the table into a new attached database. I am fairly certain the 2GB
applies by file, so you can put a single table in a separate database,
link to it, use whatever processing you need to eliminate duplicates,
and bring it back. Don't forget the repair/compacting to keep
yourself out of trouble.

-- Larry Engles
 
J

jj

Hi guys

Thanks for your answers - I appreciates them - The problem is that I have
never tried working with SQL servers and it must work within a very short
time - so this is not an option at the moment.

I was thinking of someting like this.

Append all the records which are dublicated to a temporary table like:

INSERT INTO Tbl_Temp ( Test1, Test2, Test2 )
SELECTTest1, Test2, test3
FROM Tbl1
GROUP BY Test1, Test2, test3
HAVING Count(Tbl1.Test1)>1;

and then delete the dublicate records from tbl1 with a query like this:

DELETE DistinctRow Tbl1.*
from Tbl1
WHERE TEST1,TEST2,TEST3 IN
(SELECT Test1, Test2, test3
FROM TBL1
GROUP BY Test1, Test2, test3
having Count(Test1)>1)

But unfortunately this doesn't seem to work :-(

Any Idea why?

Thanks
JJ
 
D

Douglas J. Steele

WHERE doesn't work like that.

Try:

DELETE DistinctRow Tbl1.*
from Tbl1
WHERE TEST1 & "," & TEST2 & "," & TEST3 IN
(SELECT Test1 & "," & Test2 & "," & test3
FROM TBL1
GROUP BY Test1, Test2, test3
having Count(Test1)>1)
 
J

John W. Vinson

INSERT INTO Tbl_Temp ( Test1, Test2, Test2 )
SELECTTest1, Test2, test3
FROM Tbl1
GROUP BY Test1, Test2, test3
HAVING Count(Tbl1.Test1)>1;

It's not necessary to insert and then delete! Just put a unique index on the
combination of the three fields and let the dups produce a warning message. If
you're doing this in a new empty mdb file you won't have the bloat issue.

John W. Vinson [MVP]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top