Running Count/Sum in a form

  • Thread starter: Brig
We are trying to imitate Excel by showing a number beside each record/row.

For example, we have a continuous form with 10 records. In form view, we
would like to create a textbox at the side of each record that shows a
running sum or running count, counting from 1 to 10.

I know we can do this in a report. However, is there a way to do it in a
form?

We are using Access 2002.

Thank you very much in advance.
 
Hi,


You need an ordering field, or group of fields, without duplicates, that
uniquely determines the order.


SELECT LAST(a.f1), LAST(a.f2), LAST(a.f3), SUM(b.sumWhat)
FROM myTable As a INNER JOIN myTable As b
ON a.OrderingField >= b.OrderingField
GROUP BY a.OrderingField
ORDER BY a.OrderingField


That query is not updateable. Another query, which is updateable but a
little slower, could be:

SELECT a.*, (SELECT SUM(b.sumWhat)
FROM myTable As b
WHERE a.OrderingField >= b.OrderingField)
FROM myTable As a
ORDER BY a.OrderingField
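
As an illustration only (a hypothetical Orders table, with a unique
OrderDate as the ordering field and Amount as the field to sum), you can
save the updateable version as the form's RecordSource and bind the
textbox to the computed column:

SELECT a.*, (SELECT SUM(b.Amount)
FROM Orders As b
WHERE a.OrderDate >= b.OrderDate) AS RunningSum
FROM Orders As a
ORDER BY a.OrderDate

Then set the ControlSource of the textbox on the continuous form to
RunningSum.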



Hoping it may help,
Vanderghast, Access MVP
 
Thank you so much.

 
Michel Walsh said:
You need an ordering field, or group of fields, without duplicates, that
uniquely determines the order.


SELECT LAST(a.f1), LAST(a.f2), LAST(a.f3), SUM(b.sumWhat)
FROM myTable As a INNER JOIN myTable As b
ON a.OrderingField >= b.OrderingField
GROUP BY a.OrderingField
ORDER BY a.OrderingField


That query is not updateable. Another query, which is updateable but a
little slower, could be:

SELECT a.*, (SELECT SUM(b.sumWhat)
FROM myTable As b
WHERE a.OrderingField >= b.OrderingField)
FROM myTable As a
ORDER BY a.OrderingField

Out of interest, have you tested this? The subquery approach is
familiar to me, the other 'group by' approach is not, so I did my own
testing.

I used a modest ten-thousand-row table with two columns. MyKeyCol is a
sequential integer, i.e. unique rows exist for the values 1 to 10000.
MyDataCol is a random integer between 1 and 10000, so there are
duplicate values between rows. I decided to order on MyDataCol and,
because we are talking about a row ID, I changed the set function
involved from SUM to COUNT (duplicates mean a not-so-neat row ID but
make for a better test <g>).
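
For anyone wanting to reproduce the setup, a sketch of how such a table
could be built (illustrative only, not the code I actually used):

Public Sub Build10KRowTable()
    ' Sketch: create and fill the test table described above.
    Dim db As DAO.Database
    Dim i As Long
    Set db = CurrentDb
    db.Execute "CREATE TABLE [10K_row_table] (MyKeyCol LONG, MyDataCol LONG)"
    Randomize
    For i = 1 To 10000
        db.Execute "INSERT INTO [10K_row_table] (MyKeyCol, MyDataCol) " & _
                   "VALUES (" & i & ", " & (Int(Rnd() * 10000) + 1) & ")"
    Next i
End Sub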

Subquery version:

SELECT T1.MyKeyCol, T1.MyDataCol, (
SELECT COUNT(*) FROM [10K_row_table]
WHERE MyDataCol <= T1.MyDataCol
) AS row_ID
FROM [10K_row_table] T1
ORDER BY T1.MyDataCol;

Group by version:

SELECT LAST(a.MyKeyCol), LAST(a.MyDataCol), COUNT(b.MyDataCol)
FROM [10K_row_table] As a
INNER JOIN [10K_row_table] As b
ON a.MyDataCol >= b.MyDataCol
GROUP BY a.MyDataCol
ORDER BY a.MyDataCol;

Because I'm only familiar with the subquery approach, I felt unable to
tweak your group by version. [BTW shouldn't your subquery version as
posted have the join expression less than or equal to?]

The execution times (best of three) were as follows:

Subquery: 1min 44sec
Group by: 5min 36sec

Additionally, the group by version only returned 9680 rows (a symptom
of duplicates) and was not in row ID order. When I ordered on the
unique/sequential column, I got similar times (the group by version
ran 20secs faster) with correct results for both.

Will you please confirm this is a fair test and agree that the subquery
is the superior approach, since it is updateable, faster, and can handle
duplicates?

Many thanks,
Jamie.

--
 
Hi,


The GROUP BY approach cannot handle duplicated values, since it returns
just one representative row per group (hence the use of LAST). The
correlated sub-query returns every row. With an unambiguous ordering, no
duplicates should occur in the ordering field; otherwise there is no
unambiguous way to assign a unique position to tied rows.
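
If ties are possible, a unique secondary field can break them so that
every row still gets a distinct position. Using Jamie's test table for
illustration (MyKeyCol being the unique column):

SELECT a.*, (SELECT COUNT(*)
FROM [10K_row_table] As b
WHERE b.MyDataCol < a.MyDataCol
OR (b.MyDataCol = a.MyDataCol AND b.MyKeyCol <= a.MyKeyCol)) AS row_ID
FROM [10K_row_table] As a
ORDER BY a.MyDataCol, a.MyKeyCol;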


There are cases where a correlated sub-query is faster than a join,
but typically (whatever counts as "typical" is illustrated hereafter)
the join is faster (Henderson, "The Guru's Guide to Transact-SQL", page
143, last paragraph). The correlated sub-query tends to be faster when
it involves a special construct, like EXISTS, which can prune the search
(while the join may have to exhaust all the matching values through the
index). We can expect the join to be faster than the correlated
sub-query because the join can immediately use indexes and walk down,
simultaneously and side by side, the two indexes implied in the ON
clause. The correlated sub-query may miss that opportunity to use the
two indexes efficiently and thus turn out to be slower. Of course, that
assumes the fields are indexed and that the tables are large (otherwise,
the indexes won't be loaded).


You can picture the side-by-side walk of the two indexes A LITTLE BIT
as if you were manually spotting the differences between two ordered
lists. You compare the first entry of each list, then the second entry
of each list, and so on, moving your left-hand "index" down the first
list and your right-hand index down the second, until you spot a
difference. Once you spot a disruption and resolve it, you don't start
over at the top of either list; you continue from where you were. That
is the side-by-side walk of the JOIN. The correlated sub-query WILL (is
likely to) start over. The time difference is not as enormous as it may
first appear, since falling back to the right position is relatively
fast: if the main index is at the letter M, the index to be
re-synchronized won't move over each record of the second list, one
record at a time, from the start. But some resynchronization is still
required nonetheless, and it generally makes the correlated sub-query
run slower.
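
As a rough illustration only (plain VBA over two sorted arrays; this is
not what JET actually does internally):

Public Sub SideBySideWalk(a() As Long, b() As Long)
    ' Two-pointer walk over two sorted lists: neither index ever
    ' restarts from the top, which is what makes the join walk cheap.
    Dim i As Long, j As Long
    i = LBound(a): j = LBound(b)
    Do While i <= UBound(a) And j <= UBound(b)
        If a(i) = b(j) Then
            Debug.Print "match: " & a(i)
            i = i + 1
            j = j + 1
        ElseIf a(i) < b(j) Then
            i = i + 1   ' advance only the left index
        Else
            j = j + 1   ' advance only the right index
        End If
    Loop
End Sub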


A BIG advantage of the join approach (the one that uses a GROUP BY) is
that it is entirely representable GRAPHICALLY... well, when the join is
an equi-join (in MS SQL Server, non-equi joins are also representable
graphically, but JET's query designer only supports equi-joins
graphically).



Hoping it may help,
Vanderghast, Access MVP



 
Michel Walsh said:
We can expect the join to be faster than the correlated sub-query
because the join can immediately use indexes and walk down,
simultaneously and side by side, the two indexes implied in the ON
clause. The correlated sub-query may miss that opportunity to use the
two indexes efficiently and thus turn out to be slower. Of course, that
assumes the fields are indexed and that the tables are large (otherwise,
the indexes won't be loaded).

My usual reply to an assertion such as yours is to ask, 'Did you
test?'

So, did you?

I appreciate the detail of your reply (snipped) and thank you for it.
May I ask, is it based on how JOINs and subqueries *should* work in
theory, or on how Jet *does* work in its actual implementation? My
practical experience with Jet seems to differ from your description. I
say 'seems' because I have no real evidence from testing to offer,
merely anecdotal evidence that Jet doesn't seem to use indexes very
efficiently in JOINs, ORDER BY and GROUP BY. I suspect procedural code
under the covers is the culprit.

Back to practical testing. You asserted that indexing may give an
advantage to the JOIN in the group by approach which may not be
afforded to the subquery. In my original test I used no indexing;
again, I thought the absence of indexing would make for a better test
because it means more work for the SQL engine.

I just now created an index on MyDataCol, compacted the database to
ensure the index was rebuilt and re-ran the test. You may be surprised
by the results:

Subquery: 0min 17sec
Group by: 5min 48sec

Again, group by results did not come out in rank order, which
presumably is a requirement for a row ID.

I think you may have missed the fact that the JOIN is completed before
the GROUP BY clause is applied, and a self JOIN such as yours returns a
relatively large row set: with 10,000 rows and a >= join on a roughly
uniform column, the intermediate result is on the order of
10,000 * 10,000 / 2 = 50 million rows. The most efficient way (I think)
of demonstrating this is to use COUNT(*), because the SQL engine is
optimized for that construct, so:

SELECT COUNT(*)
FROM [10K_row_table] As a
INNER JOIN [10K_row_table] As b
ON a.MyDataCol >= b.MyDataCol

This returns a single value in 2min 01secs (remember the subquery
approach completed and returned a 10K ordered rowset in just 17secs).

Will you now do your own testing of your assertions and post your
findings? Will you at least acknowledge the validity (or otherwise) of
my testing? If you don't, anyone reading this will be unsure whether
I've given your group by approach a fair test.
A BIG advantage of the join approach (the one that uses a GROUP BY) is
that it is entirely representable GRAPHICALLY... well, when the join is
an equi-join

From my perspective, there is zero advantage to being able to
represent a query graphically (whatever that means). And when the
trade-off is taking 200% to 2000% more time to run, I'm not sure
having a 'picture' of a query would be preferable to anyone.

BTW for ranking purposes, my subquery would be better written as:

SELECT COUNT(*) + 1 FROM [10K_row_table]
WHERE MyDataCol < T1.MyDataCol

because it would award the lower number (higher rank) to duplicates of
the same rank.
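
In full, the ranking version would read (same test table; the rank_ID
alias is just illustrative):

SELECT T1.MyKeyCol, T1.MyDataCol, (
SELECT COUNT(*) + 1 FROM [10K_row_table]
WHERE MyDataCol < T1.MyDataCol
) AS rank_ID
FROM [10K_row_table] T1
ORDER BY T1.MyDataCol;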

Jamie.

--
 
Hi,


I didn't test the join against the sub-query, but I did once test the
sub-query against DCount... and DCount WAS faster than the sub-query:



SubQuery 0.8281328230973491906673503162
DCount 0.0203543405491706580960360834

So DCount is some 40 times faster than the sub-query... but no, I
didn't test the join, and I am really surprised by your results.


Here is the code (run TestDCount), so anyone can run their own test
(just supply a large enough table, Iotas, with an indexed numeric
field, Iota):


------------------------
Private Type LARGE_INTEGER
    lowpart As Long
    highpart As Long
End Type

Private Declare Function QueryPerformanceCounter Lib "kernel32" _
    (lpPerformanceCount As LARGE_INTEGER) As Long
Private Declare Function QueryPerformanceFrequency Lib "kernel32" _
    (lpFrequency As LARGE_INTEGER) As Long


Public Sub TestDCount()
    Dim rst As Object            ' ADODB.Recordset returned by Execute
    Dim freq As LARGE_INTEGER
    Dim starting As LARGE_INTEGER
    Dim ending As LARGE_INTEGER
    Dim dfreq As Variant

    QueryPerformanceFrequency freq
    dfreq = LargeToDec(freq)

    ' Time the correlated sub-query. Note that the criteria differ
    ' slightly between the two tests (> here, >= below).
    QueryPerformanceCounter starting
    Set rst = CurrentProject.Connection.Execute( _
        "SELECT a.*, (SELECT COUNT(*) FROM Iotas As b " & _
        "WHERE b.Iota>a.Iota) FROM Iotas as a;")
    QueryPerformanceCounter ending

    Debug.Print "SubQuery ", _
        (LargeToDec(ending) - LargeToDec(starting)) / dfreq

    ' Time the DCount version.
    QueryPerformanceCounter starting
    Set rst = CurrentProject.Connection.Execute( _
        "SELECT *, DCOUNT('*', 'Iotas', 'Iota>=' & Iota) FROM Iotas;")
    QueryPerformanceCounter ending

    Debug.Print "DCount ", _
        (LargeToDec(ending) - LargeToDec(starting)) / dfreq

End Sub


Public Function LargeToDec(Arg As LARGE_INTEGER) As Variant
    ' Convert a 64-bit LARGE_INTEGER into a Decimal Variant.
    Dim temp As Variant
    temp = 4 * CDec(1073741824)   ' 2 ^ 32

    ' lowpart is a signed Long; when its sign bit is set it represents
    ' an unsigned value 2^32 higher, so add temp back in that case.
    If Arg.lowpart >= 0 Then
        LargeToDec = Arg.lowpart + Arg.highpart * temp
    Else
        LargeToDec = temp + Arg.lowpart + Arg.highpart * temp
    End If

End Function
------------------------
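
If you don't already have such a table, a quick sketch to build one
(the row count and index name are arbitrary):

Public Sub BuildIotas()
    ' Sketch: create the Iotas test table with an indexed Iota field.
    Dim db As DAO.Database
    Dim i As Long
    Set db = CurrentDb
    db.Execute "CREATE TABLE Iotas (Iota LONG)"
    db.Execute "CREATE INDEX idxIota ON Iotas (Iota)"
    For i = 1 To 10000
        db.Execute "INSERT INTO Iotas (Iota) VALUES (" & i & ")"
    Next i
End Sub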



Vanderghast, Access MVP



 