problem calculating an average in a one to many relationship

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I need to calculate an appropriate mean and variance when there are "missing"
data. Let me explain with a dataset from some fisheries sampling we
conducted:
I have two tables and a one to many relationship.

Table 1 is the parent record and is the sampling event table. It records the
sample ID, date, site & replicate #. Table 2 is the child and contains the
information of what I caught in the sample by species and number

A simple example view of the Data might look like the below.

Table 1
sID date site rep
1 1/1/05 1 1
2 1/1/05 1 2
3 1/2/05 1 3
4 1/2/05 1 4

Table 2
sID spp number
1 a 1
1 b 2
1 c 1
2 a 5
2 c 3
2 d 1
3 b 2
3 c 1
3 d 1
4 a 2
4 c 1
4 e 2

Thus, average catch rates for the 5 species caught by site would be:
a = (1+5+0+2)/4 = 2.0
b = (2+0+2+0)/4 = 1.0
c = (1+3+1+1)/4 = 1.5
d = (0+1+1+0)/4 = 0.5
e = (0+0+0+2)/4 = 0.5

Question is can I either generate a query that inserts the needed zero's or
write a function to derive the mean and sd for each spp by site. Preferrably,
this could be called "on the fly" in a Pivot chart so I can look at mean
catch rates of multiple species simultaneously over time.

Any guidance or pointers in the right direction would be greatly appreciated.
 
There is no need to insert zeros unless you are relying on them for count to
divide by. You can use a TOTALs query to find MAX sID to be used as divisor.
 
Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing?

Thanks also for the tip on Total Access Statistics. I will sheck out the
trial version.

Don
**********************************************
 
I think you are missing a table with a complete list of
species. Joining Table1 only picks up sites when you didn't
catch anything. If that never happens, you don't need
Table1 at all.

I think you need something more like:

SELECT Table1.site,
Species.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM Species
LEFT JOIN (Table1
LEFT JOIN Table2
ON Table1.sID = Table2.sID)
ON Species.spp = Table2.spp
GROUP BY Table1.site, Species.spp
--
Marsh
MVP [MS Access]

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing? []
 
Marshall - Thanks for the response, but still not working.

Access would not let me perform a multiple left join so i created a query to
do the inner left join (tables 1 & 2) and a second query that joins the
Species table to that query. so:

Query join12
SELECT Table1.*, Table2.*
FROM Table1 LEFT JOIN Table2 ON Table1.sID=Table2.sID;

SELECT join12.site, Species.spp, Avg(Nz(join12.number,0)) AS [avg]
FROM Species LEFT JOIN join12
ON Species.sppID = join12.spp <--NOTE: spp is a lookup to sppID -->
GROUP BY join12.site, Species.spp

when I do this, I still get the inappropriate answers. For example, avg for
spp a equals 2.67 (8/3) rather than what it should be, which is 2 (8/4).

Any other ideas??

Marshall Barton said:
I think you are missing a table with a complete list of
species. Joining Table1 only picks up sites when you didn't
catch anything. If that never happens, you don't need
Table1 at all.

I think you need something more like:

SELECT Table1.site,
Species.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM Species
LEFT JOIN (Table1
LEFT JOIN Table2
ON Table1.sID = Table2.sID)
ON Species.spp = Table2.spp
GROUP BY Table1.site, Species.spp
--
Marsh
MVP [MS Access]

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing? []
I need to calculate an appropriate mean and variance when there are
"missing"
data. Let me explain with a dataset from some fisheries sampling we
conducted:
I have two tables and a one to many relationship.

Table 1 is the parent record and is the sampling event table. It records
the
sample ID, date, site & replicate #. Table 2 is the child and contains the
information of what I caught in the sample by species and number

A simple example view of the Data might look like the below.

Table 1
sID date site rep
1 1/1/05 1 1
2 1/1/05 1 2
3 1/2/05 1 3
4 1/2/05 1 4

Table 2
sID spp number
1 a 1
1 b 2
1 c 1
2 a 5
2 c 3
2 d 1
3 b 2
3 c 1
3 d 1
4 a 2
4 c 1
4 e 2

Thus, average catch rates for the 5 species caught by site would be:
a = (1+5+0+2)/4 = 2.0
b = (2+0+2+0)/4 = 1.0
c = (1+3+1+1)/4 = 1.5
d = (0+1+1+0)/4 = 0.5
e = (0+0+0+2)/4 = 0.5
 
Access does not have a problem with using two left joins,
you must have done something odd.

Ahhh, now I see where I missed it. I wasn't joining the
species table to each site so the left join was happy to
find a specie at any site.

I think you will need to create a virtual table that
contains all sites and all species. Assuming you also have
a Sites table with every site, you can use a query to create
the virtual table.

query SitesSpecies:
SELECT Sites.site, Species.spp
FROM Sites, Species

I still don't see where table1 has anything to do with this
so I will leave that out.

SELECT SitesSpecies.site,
SitesSpecies.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM SitesSpecies
LEFT JOIN Table2
ON SitesSpecies.spp = Table2.spp
GROUP BY SitesSpecies.site, SitesSpecies.spp

Hopefully, I haven't overlooked another critical point in
ths situation.
--
Marsh
MVP [MS Access]

Access would not let me perform a multiple left join so i created a query to
do the inner left join (tables 1 & 2) and a second query that joins the
Species table to that query. so:

Query join12
SELECT Table1.*, Table2.*
FROM Table1 LEFT JOIN Table2 ON Table1.sID=Table2.sID;

SELECT join12.site, Species.spp, Avg(Nz(join12.number,0)) AS [avg]
FROM Species LEFT JOIN join12
ON Species.sppID = join12.spp <--NOTE: spp is a lookup to sppID -->
GROUP BY join12.site, Species.spp

when I do this, I still get the inappropriate answers. For example, avg for
spp a equals 2.67 (8/3) rather than what it should be, which is 2 (8/4).

Any other ideas??

Marshall Barton said:
I think you are missing a table with a complete list of
species. Joining Table1 only picks up sites when you didn't
catch anything. If that never happens, you don't need
Table1 at all.

I think you need something more like:

SELECT Table1.site,
Species.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM Species
LEFT JOIN (Table1
LEFT JOIN Table2
ON Table1.sID = Table2.sID)
ON Species.spp = Table2.spp
GROUP BY Table1.site, Species.spp

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing? []
I need to calculate an appropriate mean and variance when there are
"missing"
data. Let me explain with a dataset from some fisheries sampling we
conducted:
I have two tables and a one to many relationship.

Table 1 is the parent record and is the sampling event table. It records
the
sample ID, date, site & replicate #. Table 2 is the child and contains the
information of what I caught in the sample by species and number

A simple example view of the Data might look like the below.

Table 1
sID date site rep
1 1/1/05 1 1
2 1/1/05 1 2
3 1/2/05 1 3
4 1/2/05 1 4

Table 2
sID spp number
1 a 1
1 b 2
1 c 1
2 a 5
2 c 3
2 d 1
3 b 2
3 c 1
3 d 1
4 a 2
4 c 1
4 e 2

Thus, average catch rates for the 5 species caught by site would be:
a = (1+5+0+2)/4 = 2.0
b = (2+0+2+0)/4 = 1.0
c = (1+3+1+1)/4 = 1.5
d = (0+1+1+0)/4 = 0.5
e = (0+0+0+2)/4 = 0.5
 
Back
Top