comparing lists

T Magritte · Sep 9, 2008

Hi,

I'm new to Access. I set up a database consisting of a master ID table
with details about each ID, and several other tables which are also
indexed by the same ID as in the master ID table (the IDs are unique).
Each of these other tables contains a probability column for each ID,
where the probability for each ID can be different for each table.

What I want to do is filter for IDs that have a p value below a
certain cutoff (say 0.05) then compare the subset of remaining IDs
between 2 or more tables. So, for instance, I'd like a list and count
of all IDs with p <= 0.05 in EITHER table 1 and 2, a list and count of
all IDs with p <= 0.05 in BOTH table 1 and 2, in table 1 but NOT in
table 2, etc...

Is there a relatively easy way to do this?

Thanks!

Lord Kelvan · Sep 9, 2008

your question makes no sence sorry

what are the ID's are these your primary keys and if so are you
linking all tables to this "master table"

what is p and what is .05

regards
Kelvan

T Magritte · Sep 9, 2008

your question makes no sence sorry

what are the ID's are these your primary keys and if so are you
linking all tables to this "master table"

The ID is a unique integer identifier. The master table will contain
all the IDs but each other table may contain only a subset. The ID is
the primary key for all tables. I'm not sure how I'm supposed to
associate all the tables. Do I just link each table through the ID
field to the master table or do they have to be associated with each
other also somehow?

what is p and what is .05

p is the result of a statistical test on each element identified by
the ID. So p is the probability that the differences seen for that
element are due to chance. Each table represent either a different
experiment or a different statistical analysis of an experiment. p
will be a floating point number with a value between 0 and 1. The
closer to 0, the more statistically significant the result. So first I
need to apply a cutoff for statistical significance (in this example,
0.05 or less than 5% probability of a false difference).

So for each table I'll want to generate a list of all IDs which have
an associated p <= 0.05. For these new lists, I also want to be able
to compare the IDs on each new list to those from other tables so I
can see which IDs occur in multiple tables, which occur in some tables
but not others, etc.

Is that any clearer?

Thanks!

T Magritte · Sep 9, 2008

your question makes no sence sorry

what are the ID's are these your primary keys and if so are you
linking all tables to this "master table"

The ID is a unique integer identifier. The master table will contain
all the IDs but each other table may contain only a subset. The ID is
the primary key for all tables. I'm not sure how I'm supposed to
associate all the tables. Do I just link each table through the ID
field to the master table or do they have to be associated with each
other also somehow?

what is p and what is .05

p is the result of a statistical test on each element identified by
the ID. So p is the probability that the differences seen for that
element are due to chance. Each table represent either a different
experiment or a different statistical analysis of an experiment. p
will be a floating point number with a value between 0 and 1. The
closer to 0, the more statistically significant the result. So first I
need to apply a cutoff for statistical significance (in this example,
0.05 or less than 5% probability of a false difference).

So for each table I'll want to generate a list of all IDs which have
an associated p <= 0.05. For these new lists, I also want to be able
to compare the IDs on each new list to those from other tables so I
can see which IDs occur in multiple tables, which occur in some tables
but not others, etc.

Is that any clearer?

Thanks!

Michel Walsh · Sep 9, 2008

p<=0.05 in either table

(in SQL view):

SELECT id FROM table1 WHERE p<=0.05
UNION
SELECT id FROM table2 WHRE p<=0.05

p<=0.05 in both tables

(in SQL view)

SELECT DISTINCT x.id
FROM (SELECT id FROM table1 WHERE p<=0.05) AS x
INNER JOIN
(SELECT id FROM table2 WHERE p<= 0.05) AS y
ON x.id=y.id

p<=0.05 in table1 but p> 0.5 in table2:

(in SQL view)

SELECT DISTINCT x.id
FROM (SELECT id FROM table1 WHERE p<=0.05) AS x
INNER JOIN
(SELECT id FROM table2 WHERE p> 0.05) AS y
ON x.id=y.id

Vanderghast, Access MVP

Lord Kelvan · Sep 9, 2008

i think the way you have linked your tables up are going to cause
problems but if michels answer works the thats good but i would revist
your structure you should only link tables together if they have an
assocation with one another and the only reason to have a "master"
table with all other tables primary keys as forigen keys woudl be
because all tables can be assocated with all other tables.

for michel's solutiont he first oen wont work you need to do

SELECT id FROM table1 WHERE p<=0.05
UNION ALL
SELECT id FROM table2 WHRE p<=0.05
UNION ALL
SELECT ...

and keep placing select statements for each table but you need union
all if your data is structured how i think it is.

hope this helps

Regards
Kelvan

T Magritte · Sep 10, 2008

Hi,

I haven't tried his solution yet. I was hoping this might be possible
using the graphical query generator but it seems like that's not the
case?

I was trying to avoid going into too much detail, but if it helps:
each ID represents a gene and each table (other than the master ID
table) represents a comparison between different experiments, where
the p value is derived from an ANOVA comparing 2 or more groups
between experiments. The significance of the p value relates to the
likelihood that the expression level for that gene differs between the
2 experiments being compared. The ANOVA is performed for each gene and
therefore there is a unique p value for each ID in each table. The
master ID table contains additional information about each gene,
including gene name(s), function, database accession numbers and other
annotations.

It seems to me that it makes sense to use the ID as master key for
each table as it is unique, and associate this for each table to the
master ID table. Since in all cases, I'm interested in comparing
changes in expression for each gene under different experimental
conditions.

Now, a more general question: does it even make sense to use Access? I
was hoping its GUI would decrease the learning curve. However, if I'm
going to have to use SQL commands to do what I want it to do, is there
any advantage in using Access over MySQL? (I haven't used either
before. I'm much more familiar with Excel, but it's limitations are
becoming quite apparent when working with really large datasets...)

Thanks!

Lord Kelvan · Sep 10, 2008

yeah 1000000000 reasons to use access over mysql ... prollay
1000000000 reasons to use any data base over mysql

basically the basic form of mysql dosnt have data intergary to be
honest it is a data store not a data base. you can do most of what
you want in the gui interface but there will be a few complex things
you have to use sql for and if you ask questions of people on here it
is eaiser for us to give you an sql answer that to try to explain how
to do it in gui.

excel is not designed to store data and shouldnt be used for it it is
a finational or satistical tool used for calculating data.

you may have been able to avoid the use of sql in this case if your
table structure was different.

i may bee reading this wrong but your master table sounds like the
gene table and all of your other tables are your experement tables
you should only have two tables

table1
tblgene
geneid
genename
expermentid
etc

table2
tblexperment
expermentid
p
expermentdate
etc

i may be wrong in that assumption but it is just what it sounds like
from your explination.

Regards
Kelvan

Michel Walsh · Sep 10, 2008

You can do it graphically, it is just longer to explain, that is why I
prefer the SQL text.

There is a catch with something like:

SELECT DISTINCT x.id
FROM (SELECT id FROM table1 WHERE p<=0.05) AS x
INNER JOIN
(SELECT id FROM table2 WHERE p<= 0.05) AS y
ON x.id=y.id

Graphically, you may have to write 3 queries. The first one:

SELECT id FROM table1 WHERE p<=0.05

should be easy to do graphically too: Make a new query, bring the table1,
drag field id in the grid, drag the probability field in the grid and, under
it, add the criteria <=0.05. Save the query, say, with the name
table1_LessThan005.

Make the same, using table2, to create the query table2_LessThan005.

Make a third query, bring the two queries you just made: table1_LessThan005
and query table2_LessThan005. In the upper half, drag the id field of one
table over the other. That makes the JOIN. Next, bring the field id from
one of the query into the grid. In the query property sheet, set the Unique
property to yes. That adds the word DISTINCT in the SQL text which now,
should look like:

SELECT DISTINCT table1_LessThan005.id
FROM table1_LessThan005 INNER JOIN table2_LessThan005
ON table1_LessThan005.id = table2_LessThan005.id

The original code I posted simply defines, in place, the queries, instead of
saving two of them, as we did, here.

You may have to define other queries for p>0.05, as example.

Vanderghast, Access MVP

Hi,

I haven't tried his solution yet. I was hoping this might be possible
using the graphical query generator but it seems like that's not the
case?

I was trying to avoid going into too much detail, but if it helps:
each ID represents a gene and each table (other than the master ID
table) represents a comparison between different experiments, where
the p value is derived from an ANOVA comparing 2 or more groups
between experiments. The significance of the p value relates to the
likelihood that the expression level for that gene differs between the
2 experiments being compared. The ANOVA is performed for each gene and
therefore there is a unique p value for each ID in each table. The
master ID table contains additional information about each gene,
including gene name(s), function, database accession numbers and other
annotations.

It seems to me that it makes sense to use the ID as master key for
each table as it is unique, and associate this for each table to the
master ID table. Since in all cases, I'm interested in comparing
changes in expression for each gene under different experimental
conditions.

Now, a more general question: does it even make sense to use Access? I
was hoping its GUI would decrease the learning curve. However, if I'm
going to have to use SQL commands to do what I want it to do, is there
any advantage in using Access over MySQL? (I haven't used either
before. I'm much more familiar with Excel, but it's limitations are
becoming quite apparent when working with really large datasets...)

Thanks!

Create a distinct Join	3	Jan 7, 2010
Joining 3 Queries That Count	8	May 28, 2010
Need Unique Entries From Across Tables	3	Jul 7, 2009
Linked Tables Append Query(ies)	4	Oct 8, 2009
Multiple Query Report- Really Need HELP!	2	Jan 13, 2010
Count if query matching list	1	Apr 14, 2009
Need to Count by two different criteria in same Totals query	4	May 29, 2009
Query to create multipe total fields	7	Jan 20, 2010

comparing lists

T Magritte

Lord Kelvan

T Magritte

T Magritte

Michel Walsh

Lord Kelvan

T Magritte

Lord Kelvan

Michel Walsh

Ask a Question

Similar Threads