PC Review


Reply
Thread Tools Rate Thread

Delete dublicates from a table

 
 
jj
Guest
Posts: n/a
 
      24th Mar 2007
Hi

I have a very large table approximately 4 million records - every night I
import records to the table - unfortunately the import creates redundant
records which must be deleted. Is there a way to do this from the existing
table with a subquery ? It is not an option to create a new table, and use
distinctrow since the database size will reach the maximum size of 2
gigabyte by this operation.

Thanks
/JJ


 
Reply With Quote
 
 
 
 
John W. Vinson
Guest
Posts: n/a
 
      24th Mar 2007
On Sat, 24 Mar 2007 21:57:41 +0100, "jj" <(E-Mail Removed)> wrote:

>Hi
>
>I have a very large table approximately 4 million records - every night I
>import records to the table - unfortunately the import creates redundant
>records which must be deleted. Is there a way to do this from the existing
>table with a subquery ? It is not an option to create a new table, and use
>distinctrow since the database size will reach the maximum size of 2
>gigabyte by this operation.
>
>Thanks
>/JJ
>


Two ways... though with this size of table I'm not sure either will be fully
practical!

One would be to creat a unique Index on the field or combination of fields
which define uniqueness. This might blow your 2Gbyte limit.

The second would be to base your append query on a "unmatched" query between
the source and target:

INSERT INTO target (field, field, field, ...)
SELECT source.field, source.field, source.field...
FROM source LEFT JOIN target
ON source.linkfield = target.linkfield
WHERE target.linkfield IS NULL;

This query will exclude any existing records from the append. Bloat might
still be a problem as may performance.

At this scale (though I'm all for JET in most cases) I think you should be
seriously investigating SQL/Server or another Client/Server solution.

John W. Vinson [MVP]
 
Reply With Quote
 
MH
Guest
Posts: n/a
 
      24th Mar 2007
As John has already pointed out, it may be time to go to SQL Server. There
is a free version called "SQL Server 2005 Express" which you can download
from the Microsoft site along with the "Management Studio" which gives you
the oportunity to use Transact SQL, you can create stored procedures for
example.

MH

"jj" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi
>
> I have a very large table approximately 4 million records - every night I
> import records to the table - unfortunately the import creates redundant
> records which must be deleted. Is there a way to do this from the existing
> table with a subquery ? It is not an option to create a new table, and use
> distinctrow since the database size will reach the maximum size of 2
> gigabyte by this operation.
>
> Thanks
> /JJ
>



 
Reply With Quote
 
engles@ridesoft.com
Guest
Posts: n/a
 
      25th Mar 2007
On Mar 24, 3:45 pm, "MH" <n...@nohow.com> wrote:
> As John has already pointed out, it may be time to go to SQL Server. There
> is a free version called "SQL Server 2005 Express" which you can download
> from the Microsoft site along with the "Management Studio" which gives you
> the oportunity to use Transact SQL, you can create stored procedures for
> example.
>
> MH
>
> "jj" <j...@get2net.dk> wrote in message
>
> news:(E-Mail Removed)...
>
>
>
> > Hi

>
> I t is not an option to create a new table, and use
> > distinctrow since the database size will reach the maximum size of 2
> > gigabyte by this operation.

>
> > Thanks
> > /JJ- Hide quoted text -

>
> - Show quoted text -


You already have some very good replies. Thought I'd just comment on
the max size. For a one time operation, you do have an option - put
the table into a new attached database. I am fairly certain the 2GB
applies by file, so you can put a single table in a separate database,
link to it, use whatever processing you need to eliminate duplicates,
and bring it back. Don't forget the repair/compacting to keep
yourself out of trouble.

-- Larry Engles

 
Reply With Quote
 
jj
Guest
Posts: n/a
 
      25th Mar 2007
Hi guys

Thanks for your answers - I appreciates them - The problem is that I have
never tried working with SQL servers and it must work within a very short
time - so this is not an option at the moment.

I was thinking of someting like this.

Append all the records which are dublicated to a temporary table like:

INSERT INTO Tbl_Temp ( Test1, Test2, Test2 )
SELECTTest1, Test2, test3
FROM Tbl1
GROUP BY Test1, Test2, test3
HAVING Count(Tbl1.Test1)>1;

and then delete the dublicate records from tbl1 with a query like this:

DELETE DistinctRow Tbl1.*
from Tbl1
WHERE TEST1,TEST2,TEST3 IN
(SELECT Test1, Test2, test3
FROM TBL1
GROUP BY Test1, Test2, test3
having Count(Test1)>1)

But unfortunately this doesn't seem to work :-(

Any Idea why?

Thanks
JJ


<(E-Mail Removed)> skrev i en meddelelse
news:(E-Mail Removed)...
> On Mar 24, 3:45 pm, "MH" <n...@nohow.com> wrote:
>> As John has already pointed out, it may be time to go to SQL Server.
>> There
>> is a free version called "SQL Server 2005 Express" which you can download
>> from the Microsoft site along with the "Management Studio" which gives
>> you
>> the oportunity to use Transact SQL, you can create stored procedures for
>> example.
>>
>> MH
>>
>> "jj" <j...@get2net.dk> wrote in message
>>
>> news:(E-Mail Removed)...
>>
>>
>>
>> > Hi

>>
>> I t is not an option to create a new table, and use
>> > distinctrow since the database size will reach the maximum size of 2
>> > gigabyte by this operation.

>>
>> > Thanks
>> > /JJ- Hide quoted text -

>>
>> - Show quoted text -

>
> You already have some very good replies. Thought I'd just comment on
> the max size. For a one time operation, you do have an option - put
> the table into a new attached database. I am fairly certain the 2GB
> applies by file, so you can put a single table in a separate database,
> link to it, use whatever processing you need to eliminate duplicates,
> and bring it back. Don't forget the repair/compacting to keep
> yourself out of trouble.
>
> -- Larry Engles
>



 
Reply With Quote
 
Douglas J. Steele
Guest
Posts: n/a
 
      25th Mar 2007
WHERE doesn't work like that.

Try:

DELETE DistinctRow Tbl1.*
from Tbl1
WHERE TEST1 & "," & TEST2 & "," & TEST3 IN
(SELECT Test1 & "," & Test2 & "," & test3
FROM TBL1
GROUP BY Test1, Test2, test3
having Count(Test1)>1)


--
Doug Steele, Microsoft Access MVP
http://I.Am/DougSteele
(no private e-mails, please)


"jj" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi guys
>
> Thanks for your answers - I appreciates them - The problem is that I have
> never tried working with SQL servers and it must work within a very short
> time - so this is not an option at the moment.
>
> I was thinking of someting like this.
>
> Append all the records which are dublicated to a temporary table like:
>
> INSERT INTO Tbl_Temp ( Test1, Test2, Test2 )
> SELECTTest1, Test2, test3
> FROM Tbl1
> GROUP BY Test1, Test2, test3
> HAVING Count(Tbl1.Test1)>1;
>
> and then delete the dublicate records from tbl1 with a query like this:
>
> DELETE DistinctRow Tbl1.*
> from Tbl1
> WHERE TEST1,TEST2,TEST3 IN
> (SELECT Test1, Test2, test3
> FROM TBL1
> GROUP BY Test1, Test2, test3
> having Count(Test1)>1)
>
> But unfortunately this doesn't seem to work :-(
>
> Any Idea why?
>
> Thanks
> JJ
>
>
> <(E-Mail Removed)> skrev i en meddelelse
> news:(E-Mail Removed)...
>> On Mar 24, 3:45 pm, "MH" <n...@nohow.com> wrote:
>>> As John has already pointed out, it may be time to go to SQL Server.
>>> There
>>> is a free version called "SQL Server 2005 Express" which you can
>>> download
>>> from the Microsoft site along with the "Management Studio" which gives
>>> you
>>> the oportunity to use Transact SQL, you can create stored procedures for
>>> example.
>>>
>>> MH
>>>
>>> "jj" <j...@get2net.dk> wrote in message
>>>
>>> news:(E-Mail Removed)...
>>>
>>>
>>>
>>> > Hi
>>>
>>> I t is not an option to create a new table, and use
>>> > distinctrow since the database size will reach the maximum size of 2
>>> > gigabyte by this operation.
>>>
>>> > Thanks
>>> > /JJ- Hide quoted text -
>>>
>>> - Show quoted text -

>>
>> You already have some very good replies. Thought I'd just comment on
>> the max size. For a one time operation, you do have an option - put
>> the table into a new attached database. I am fairly certain the 2GB
>> applies by file, so you can put a single table in a separate database,
>> link to it, use whatever processing you need to eliminate duplicates,
>> and bring it back. Don't forget the repair/compacting to keep
>> yourself out of trouble.
>>
>> -- Larry Engles
>>

>
>



 
Reply With Quote
 
John W. Vinson
Guest
Posts: n/a
 
      25th Mar 2007
On Sun, 25 Mar 2007 08:58:50 +0200, "jj" <(E-Mail Removed)> wrote:

>INSERT INTO Tbl_Temp ( Test1, Test2, Test2 )
>SELECTTest1, Test2, test3
>FROM Tbl1
>GROUP BY Test1, Test2, test3
>HAVING Count(Tbl1.Test1)>1;


It's not necessary to insert and then delete! Just put a unique index on the
combination of the three fields and let the dups produce a warning message. If
you're doing this in a new empty mdb file you won't have the bloat issue.

John W. Vinson [MVP]
 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
dublicates sferling Microsoft Outlook Discussion 1 31st Mar 2010 09:47 PM
Count dublicates Bogi Microsoft Excel Misc 2 28th Sep 2009 02:31 PM
How to find dublicates in Excel and delete it? bratka Microsoft Excel Misc 6 4th Jul 2008 12:51 PM
Search for dublicates =?Utf-8?B?TWFub3M=?= Microsoft Excel Worksheet Functions 1 2nd Jun 2006 07:41 AM
No dublicates except for zero value Andy Microsoft Access Database Table Design 2 16th Jan 2004 06:49 PM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 10:24 PM.