problem calculating an average in a one to many relationship

G

Guest

I need to calculate an appropriate mean and variance when there are "missing"
data. Let me explain with a dataset from some fisheries sampling we
conducted:
I have two tables and a one to many relationship.

Table 1 is the parent record and is the sampling event table. It records the
sample ID, date, site & replicate #. Table 2 is the child and contains the
information of what I caught in the sample by species and number

A simple example view of the Data might look like the below.

Table 1
sID date site rep
1 1/1/05 1 1
2 1/1/05 1 2
3 1/2/05 1 3
4 1/2/05 1 4

Table 2
sID spp number
1 a 1
1 b 2
1 c 1
2 a 5
2 c 3
2 d 1
3 b 2
3 c 1
3 d 1
4 a 2
4 c 1
4 e 2

Thus, average catch rates for the 5 species caught by site would be:
a = (1+5+0+2)/4 = 2.0
b = (2+0+2+0)/4 = 1.0
c = (1+3+1+1)/4 = 1.5
d = (0+1+1+0)/4 = 0.5
e = (0+0+0+2)/4 = 0.5

Question is can I either generate a query that inserts the needed zero's or
write a function to derive the mean and sd for each spp by site. Preferrably,
this could be called "on the fly" in a Pivot chart so I can look at mean
catch rates of multiple species simultaneously over time.

Any guidance or pointers in the right direction would be greatly appreciated.
 
G

Guest

There is no need to insert zeros unless you are relying on them for count to
divide by. You can use a TOTALs query to find MAX sID to be used as divisor.
 
G

Guest

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing?

Thanks also for the tip on Total Access Statistics. I will sheck out the
trial version.

Don
**********************************************
 
M

Marshall Barton

I think you are missing a table with a complete list of
species. Joining Table1 only picks up sites when you didn't
catch anything. If that never happens, you don't need
Table1 at all.

I think you need something more like:

SELECT Table1.site,
Species.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM Species
LEFT JOIN (Table1
LEFT JOIN Table2
ON Table1.sID = Table2.sID)
ON Species.spp = Table2.spp
GROUP BY Table1.site, Species.spp
--
Marsh
MVP [MS Access]

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing? []
 
G

Guest

Marshall - Thanks for the response, but still not working.

Access would not let me perform a multiple left join so i created a query to
do the inner left join (tables 1 & 2) and a second query that joins the
Species table to that query. so:

Query join12
SELECT Table1.*, Table2.*
FROM Table1 LEFT JOIN Table2 ON Table1.sID=Table2.sID;

SELECT join12.site, Species.spp, Avg(Nz(join12.number,0)) AS [avg]
FROM Species LEFT JOIN join12
ON Species.sppID = join12.spp <--NOTE: spp is a lookup to sppID -->
GROUP BY join12.site, Species.spp

when I do this, I still get the inappropriate answers. For example, avg for
spp a equals 2.67 (8/3) rather than what it should be, which is 2 (8/4).

Any other ideas??

Marshall Barton said:
I think you are missing a table with a complete list of
species. Joining Table1 only picks up sites when you didn't
catch anything. If that never happens, you don't need
Table1 at all.

I think you need something more like:

SELECT Table1.site,
Species.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM Species
LEFT JOIN (Table1
LEFT JOIN Table2
ON Table1.sID = Table2.sID)
ON Species.spp = Table2.spp
GROUP BY Table1.site, Species.spp
--
Marsh
MVP [MS Access]

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing? []
I need to calculate an appropriate mean and variance when there are
"missing"
data. Let me explain with a dataset from some fisheries sampling we
conducted:
I have two tables and a one to many relationship.

Table 1 is the parent record and is the sampling event table. It records
the
sample ID, date, site & replicate #. Table 2 is the child and contains the
information of what I caught in the sample by species and number

A simple example view of the Data might look like the below.

Table 1
sID date site rep
1 1/1/05 1 1
2 1/1/05 1 2
3 1/2/05 1 3
4 1/2/05 1 4

Table 2
sID spp number
1 a 1
1 b 2
1 c 1
2 a 5
2 c 3
2 d 1
3 b 2
3 c 1
3 d 1
4 a 2
4 c 1
4 e 2

Thus, average catch rates for the 5 species caught by site would be:
a = (1+5+0+2)/4 = 2.0
b = (2+0+2+0)/4 = 1.0
c = (1+3+1+1)/4 = 1.5
d = (0+1+1+0)/4 = 0.5
e = (0+0+0+2)/4 = 0.5
 
M

Marshall Barton

Access does not have a problem with using two left joins,
you must have done something odd.

Ahhh, now I see where I missed it. I wasn't joining the
species table to each site so the left join was happy to
find a specie at any site.

I think you will need to create a virtual table that
contains all sites and all species. Assuming you also have
a Sites table with every site, you can use a query to create
the virtual table.

query SitesSpecies:
SELECT Sites.site, Species.spp
FROM Sites, Species

I still don't see where table1 has anything to do with this
so I will leave that out.

SELECT SitesSpecies.site,
SitesSpecies.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM SitesSpecies
LEFT JOIN Table2
ON SitesSpecies.spp = Table2.spp
GROUP BY SitesSpecies.site, SitesSpecies.spp

Hopefully, I haven't overlooked another critical point in
ths situation.
--
Marsh
MVP [MS Access]

Access would not let me perform a multiple left join so i created a query to
do the inner left join (tables 1 & 2) and a second query that joins the
Species table to that query. so:

Query join12
SELECT Table1.*, Table2.*
FROM Table1 LEFT JOIN Table2 ON Table1.sID=Table2.sID;

SELECT join12.site, Species.spp, Avg(Nz(join12.number,0)) AS [avg]
FROM Species LEFT JOIN join12
ON Species.sppID = join12.spp <--NOTE: spp is a lookup to sppID -->
GROUP BY join12.site, Species.spp

when I do this, I still get the inappropriate answers. For example, avg for
spp a equals 2.67 (8/3) rather than what it should be, which is 2 (8/4).

Any other ideas??

Marshall Barton said:
I think you are missing a table with a complete list of
species. Joining Table1 only picks up sites when you didn't
catch anything. If that never happens, you don't need
Table1 at all.

I think you need something more like:

SELECT Table1.site,
Species.spp,
Avg(Nz(Table2.[number],0)) AS [avg]
FROM Species
LEFT JOIN (Table1
LEFT JOIN Table2
ON Table1.sID = Table2.sID)
ON Species.spp = Table2.spp
GROUP BY Table1.site, Species.spp

Thanks for the reply, but I've been down that road and I've been banging my
head against the wall. Maybe I'm confusing a left join and an outer join.

Using the data example and running the following query:
SELECT Table1.site, Table2.spp, Avg(Nz([number],0)) AS [avg]
FROM Table1 LEFT JOIN Table2 ON Table1.sID = Table2.sID
GROUP BY Table1.site, Table2.spp;

I get the following results:
site spp avg
1 a 2.66
1 b 2
1 c 1.5
1 d 1
1 e 2

What am I missing? []
I need to calculate an appropriate mean and variance when there are
"missing"
data. Let me explain with a dataset from some fisheries sampling we
conducted:
I have two tables and a one to many relationship.

Table 1 is the parent record and is the sampling event table. It records
the
sample ID, date, site & replicate #. Table 2 is the child and contains the
information of what I caught in the sample by species and number

A simple example view of the Data might look like the below.

Table 1
sID date site rep
1 1/1/05 1 1
2 1/1/05 1 2
3 1/2/05 1 3
4 1/2/05 1 4

Table 2
sID spp number
1 a 1
1 b 2
1 c 1
2 a 5
2 c 3
2 d 1
3 b 2
3 c 1
3 d 1
4 a 2
4 c 1
4 e 2

Thus, average catch rates for the 5 species caught by site would be:
a = (1+5+0+2)/4 = 2.0
b = (2+0+2+0)/4 = 1.0
c = (1+3+1+1)/4 = 1.5
d = (0+1+1+0)/4 = 0.5
e = (0+0+0+2)/4 = 0.5
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top