Nasty Normalization for Newbie

G

Guest

I have a nasty normalization problem. I have inherited an excel dataset that would be better managed in access. The data looks like this:

Plot Year Spp1 Spp2 .... Spp100
1 1997 0 2 50
1 1998 10 0 30
2 1997 0 5 15
2 1998 5 5 25

I have 100 Plant species names as field names (or columns) - not my design!! and for each a number representing abundance for a particular plot and year (approximately 1500 records)

I need the data to look like something this - 4 fields (plot, year, species, abundance)

Plot Year Species Abundance
1 1997 Spp1 0
1 1997 Spp2 2
...
1 1997 Spp100 50
2 1997 Spp1 0
2 1997 Spp2 5
...
2 1997 Spp100 15


1 1998 Spp1 10
1 1998 Spp2 0
...
1 1998 Spp100 30
2 1997 Spp1 0
2 1997 Spp2 5
...
2 1997 Spp100 15



I believe I need to normalize this data but an unsure about how to do it. Is there a simple solution as I am an Access Newbie. I have tried to create a new table with a cross tab but am not getting what I want.
 
D

Douglas J. Steele

You can use a series of UNION queries.

SELECT Plot, Year, "Spp1" AS Species, Spp1 AS Abundance FROM ImportedTable
UNION
SELECT Plot, Year, "Spp2" AS Species, Spp2 AS Abundance FROM ImportedTable
UNION
SELECT Plot, Year, "Spp3" AS Species, Spp3 AS Abundance FROM ImportedTable
etc

Unfortunately, I don't believe you'll be able to have all fifty statements
in one query.

Create queries like that, then create Append queries based on those queries
to save the data in a properly normalized table.

--
Doug Steele, Microsoft Access MVP

(no e-mails, please!)



Steve Hills said:
I have a nasty normalization problem. I have inherited an excel dataset
that would be better managed in access. The data looks like this:
Plot Year Spp1 Spp2 .... Spp100
1 1997 0 2 50
1 1998 10 0 30
2 1997 0 5 15
2 1998 5 5 25

I have 100 Plant species names as field names (or columns) - not my
design!! and for each a number representing abundance for a particular plot
and year (approximately 1500 records)
I need the data to look like something this - 4 fields (plot, year, species, abundance)

Plot Year Species Abundance
1 1997 Spp1 0
1 1997 Spp2 2
...
1 1997 Spp100 50
2 1997 Spp1 0
2 1997 Spp2 5
...
2 1997 Spp100 15


1 1998 Spp1 10
1 1998 Spp2 0
...
1 1998 Spp100 30
2 1997 Spp1 0
2 1997 Spp2 5
...
2 1997 Spp100 15



I believe I need to normalize this data but an unsure about how to do it.
Is there a simple solution as I am an Access Newbie. I have tried to create
a new table with a cross tab but am not getting what I want.
 
S

Sandra Daigle

Hi Steve,

You are right on track with your new table design. Normalizing the data
isn't going to be too bad - basically you just need to run a series of
queries which take the data from the imported table (from the spreadsheet) 1
column at a time. Create One row of data for each existing row and use the
column name as the value of the Species Field. No problem!

Here's the SQL for one such query:

INSERT INTO
tblNormalized ( Plot, [Year], Abundance, Spp )
SELECT tblUnNormalized.Plot, tblUnNormalized.YEar,
tblUnNormalized.spp3,
"spp3" AS Expr1 FROM tblUnNormalized;

It's a little easier if you automate it - here's some code that you can use
as a starting point - basically it just goes through the fields and uses
those with names that start with "SPP" as data to build the Insert query for
that column of data:

Sub NormalizeData()
Dim db As dao.Database
Dim tdf As dao.TableDef
Dim fld As dao.Field
Dim strSQL As String
Set db = CurrentDb()
Set tdf = db.TableDefs("tblUnNormalized")
For Each fld In tdf.Fields
If Left(fld.Name, 3) = "SPP" Then
strSQL = "INSERT INTO tblNormalized " _
& "( Plot, [Year], Abundance, Spp ) " _
& "SELECT tblUnNormalized.Plot, " _
& "tblUnNormalized.YEar, " _
& "tblUnNormalized." & fld.Name & "," _
& """" & fld.Name & """ AS Expr1 FROM tblUnNormalized;"
db.Execute strSQL
End If
Next fld
Set fld = Nothing
Set tdf = Nothing
Set db = Nothing
End Sub
 
S

Sandra Daigle

No problem, in the 'if' clause that tests whether to include the column,
just swap the logic to *exclude* the other non-data columns.

Instead of:

If Left(fld.Name, 3) = "SPP" Then

Use something like:

if fld.name <>"Plot" and _
fld.name <>"Year" then

If there are other columns that should be excluded then just add them into
the above, separated by And's

--
Sandra Daigle [Microsoft Access MVP]
Please post all replies to the newsgroup.


Steve said:
My laziness has caught up with me ! While in fact I used Spp 1...
Spp100 as column or field names they are actually a portion of the
Latin name for the species and there is no continuity between them.

An example of theses 7 character names are like : astecil, fravir,
petapal, sanimar, prunpen, rosawoo, smilste ... etc.

Since each field name is different can I use some kind of variable
to call the actual latin name (field name) and use that ?

Sandra Daigle said:
Hi Steve,

You are right on track with your new table design. Normalizing the
data isn't going to be too bad - basically you just need to run a
series of queries which take the data from the imported table (from
the spreadsheet) 1 column at a time. Create One row of data for each
existing row and use the column name as the value of the Species
Field. No problem!

Here's the SQL for one such query:

INSERT INTO
tblNormalized ( Plot, [Year], Abundance, Spp )
SELECT tblUnNormalized.Plot, tblUnNormalized.YEar,
tblUnNormalized.spp3,
"spp3" AS Expr1 FROM tblUnNormalized;

It's a little easier if you automate it - here's some code that you
can use as a starting point - basically it just goes through the
fields and uses those with names that start with "SPP" as data to
build the Insert query for that column of data:

Sub NormalizeData()
Dim db As dao.Database
Dim tdf As dao.TableDef
Dim fld As dao.Field
Dim strSQL As String
Set db = CurrentDb()
Set tdf = db.TableDefs("tblUnNormalized")
For Each fld In tdf.Fields
If Left(fld.Name, 3) = "SPP" Then
strSQL = "INSERT INTO tblNormalized " _
& "( Plot, [Year], Abundance, Spp ) " _
& "SELECT tblUnNormalized.Plot, " _
& "tblUnNormalized.YEar, " _
& "tblUnNormalized." & fld.Name & "," _
& """" & fld.Name & """ AS Expr1 FROM tblUnNormalized;"
db.Execute strSQL
End If
Next fld
Set fld = Nothing
Set tdf = Nothing
Set db = Nothing
End Sub



--
Sandra Daigle
[Microsoft Access MVP]
For the benefit of others please post all replies to this newsgroup.

Steve said:
I have a nasty normalization problem. I have inherited an excel
dataset that would be better managed in access. The data looks
like this:

Plot Year Spp1 Spp2 .... Spp100
1 1997 0 2 50
1 1998 10 0 30
2 1997 0 5 15
2 1998 5 5 25

I have 100 Plant species names as field names (or columns) - not my
design!! and for each a number representing abundance for a
particular plot and year (approximately 1500 records)

I need the data to look like something this - 4 fields (plot, year,
species, abundance)

Plot Year Species Abundance
1 1997 Spp1 0
1 1997 Spp2 2
...
1 1997 Spp100 50
2 1997 Spp1 0
2 1997 Spp2 5
...
2 1997 Spp100 15


1 1998 Spp1 10
1 1998 Spp2 0
...
1 1998 Spp100 30
2 1997 Spp1 0
2 1997 Spp2 5
...
2 1997 Spp100 15



I believe I need to normalize this data but an unsure about how to
do it. Is there a simple solution as I am an Access Newbie. I have
tried to create a new table with a cross tab but am not getting
what I want.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top