Simplify SQL

  • Thread starter Thread starter Alan
  • Start date Start date
A

Alan

Hi experets,

I have a SQL running very slowly on Vb program accessing MS Access
database, if possible, could you tell me how to simplify it? The SQL
is to select the the smaller files in the groups of same-name file.

Update SrchSession AS s set Mark=true where filename in (select
filename from SrchSession where filename & size not in (select
Filename & max(size) from srchsession group by filename ) group by
filename having count(filename)>=2)

Tables:
Folder:
FolderID Long 4
FolderName Text 255
IsSrchTarget Y/N 1
SrchSession:
FileID Long 4
Mark Y/N 1
FolderID Long 4
Filename Text 120
Size Long 4
DupGroupID Long 4
ModDate Date/Time 8


Thanks in advance.

Alan
 
Dear Alan:

It looks like this would create a table scan even if critical indexing
were available. When you search for "filename & size" you are
searching for a calculated value in a subquery that has a calculated
value. That would be a major reason it is slow.

So now I am trying to figure out why you are searching for that.

Now the subquery:

(select Filename & max(size) from srchsession group by filename)

returns the maximum of size found for each filename, along with that
filename.

This makes me think it would work better if you had a correlated
subquery here instead. I think that would be:

FROM SrchSession T
WHERE size < (SELECT MAX(size) FROM srchsession T1 WHERE T1.Filename =
T.Filename)

If there were an index on Filename / size then this could use it to
speed things along. I think the same index could also be used to
speed up the HAVING COUNT(Filename) >= 2 as well. But without this in
front of me, I'm running about half on instinct in suggesting this.

I would suggest developing this as a SELECT query first while you're
optimizing it. After you've found ways to speed it, change it back to
an UPDATE. The SELECT query should show which rows are to be updated
and the current value of Mark.

Also, the update query might run slightly faster if you filter out the
already marked rows, only setting the ones that are not marked. If,
however, you have just cleared all the marks, this would not be such a
useful idea. But if some are already marked, and you have a new set
to mark, perhaps including many that are already marked, it may be
good to skip marking the already marked rows. It's worth a quick
experiment, perhaps.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Tom,

Thanks for your prompt reply. The SQL posted was formated by MSAccess,
so you can't know my original intention. The following is the orginal
syntax:

Update SrchSession AS s set Mark=true where filename in
(select filename from SrchSession group by filename having
count(filename)>=2) and FileName & Size not in
(Select FileName & Max(Size) from SrchSession group by FileName having
count(filename)>=2)

Among a set of files, there are groups of same-name file: retrieved
by:
(select filename from SrchSession group by filename having
count(filename)>=2)
and I want to exclue the larger-size files:
FileName & Size not in (Select FileName & Max(Size) from SrchSession
group by FileName having count(filename)>=2)

I tried to break down the problem today, it was found that simple
update SQL worked fine but not this:
Update SrchSession AS s set Mark=true where filename in
(select filename from SrchSession group by filename having
count(filename)>=2)
For about 10000 records, this statement works in MSAccess but not VB6
ADO connection to MSAccess 97 mdb using connection.execute method.

It's so strange!

Alan
 
Back
Top