census database design

G

Guest

Hello
I need to store information regarding race wise population of
every county of my state year wise, which is nothing but census data. Here
the data is categorized basing on different age groups. After thinking about
a lot of options for storing this kind of data, came up with a table(s)
design as follows.

Pop_RaceAndAgeGrpWiseFrm2001C1 - this table stores data for county one. here
the ctyDistrictID is 1.

the fields are: Pop_RaceAndAgeGrpWiseFrm2001C1.EntryID,
Pop_RaceAndAgeGrpWiseFrm2001C1.ctyDistrictID,
Pop_RaceAndAgeGrpWiseFrm2001C1.RaceName,
Pop_RaceAndAgeGrpWiseFrm2001C1.statsYear,
Pop_RaceAndAgeGrpWiseFrm2001C1.Under5Years,
Pop_RaceAndAgeGrpWiseFrm2001C1.5to9Years,
Pop_RaceAndAgeGrpWiseFrm2001C1.9to14Years...
Pop_RaceAndAgeGrpWiseFrm2001C1.85AndAbove

there are 33 counties in our state and have 33 different tables whose
structure looks same as the above one with
'Pop_RaceAndAgeGrpWiseTblC1.ctyDistrictID' being different for each county.
There is a 'ctyDistrictdata' table which stores the information relating to
the counties like its name, ID and region etc... After storing the census
data into all those different county tables year wise, everything looks fine.
We have the relationships as, the 'ctyDistrictID' field from all the 33
tables are linked to 'ctyDistrictdata' table onto the same ID(field).

But when a query is written to pull the 'Under5Years' data for a particular
race and for two specified years from 5 different counties(tables) by linking
them, the result has so many duplicates in it. But the expected result should
only have 2 tuples: first for year one and second for year two. The query
looks as follows:

SELECT DISTINCT Pop_RaceAndAgegrpwiseFrm2001C1.statsYear,
Pop_RaceAndAgegrpwiseFrm2001C1.RaceName,
Pop_RaceAndAgegrpwiseFrm2001C1.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C2.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C3.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C4.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C5.[under 5 years]
FROM Pop_RaceAndAgegrpwiseFrm2001C1, Pop_RaceAndAgegrpwiseFrm2001C2,
Pop_RaceAndAgegrpwiseFrm2001C3, Pop_RaceAndAgegrpwiseFrm2001C4,
Pop_RaceAndAgegrpwiseFrm2001C5
WHERE (Pop_RaceAndAgegrpwiseFrm2001C1.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C2.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C3.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C4.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C5.RaceName="White");

I know there should be something wrong in the query which is written above
or in the way table join is performed. It might be that the design of the
tables itself for storing the census data is in a bad shape. Even after
looking at things, nothing is working out good. Which way the above kind of
census data can be stored in a database with a better design. Please assist
me in this.

Thanks for any help or suggestions provided.
 
T

tina

you're making a common newbie mistake in table design: storing data in
table names (counties) and field names (age groups). you should have one
table for the counties (not one table for each county), and one record for
each age group. i would probably use the following table design, as

tblDistricts (your ctyDistrictdata table)
DistrictID (primary key)
CountyName
Region
(it's not clear if a district may have multiple counties in it, or vice
versa, or if a region may have multiple counties, or vice versa. so this
table design may or may not be correct.)

tblRaces
RaceID (pk)
RaceName

tblAgeGroups
GroupID (pk)
GroupDescription (field should be Text data type)
(one record for each defined age group: "Under 5 years", "5 to 9", etc)

tblCensus
CensusID (pk)
DistrictID (foreign key from tblDistricts)
RaceID (fk from tblRaces)
CensusYear
GroupID (fk from tblAgeGroups)
CensusCount

so if, in 2004, one county district had a count of people for each of five
races in five age groups, then tblCensus would have 25 records for that
data - one record for the number of persons in each age group of each race.

with the table setup outlined above, you can "slice and dice" the data just
about any way you want, for statistical analysis. for example, to get the
query results you described in your post, you would set criteria for the
counties you want to see, the race, the specific years, and the age group -
all on tblCensus, only. you can easily use a Totals query to present that
data as two records, one for each year, with a sum of the person count for
each year.

the ironclad rule of database design is: FIRST, define the
tables/relationships to correctly model the process, according to proven
normalization guidelines; build the tables and set the relationships
according to that model. then, and ONLY then, begin to build your queries,
form, reports on top of your table structure. to learn the basics of
relational data modeling, one good text is Database Design for Mere Mortals
by Michael Hernandez. also, the following webpage has many useful links to
help you:
http://www.ltcomputerdesigns.com/JCReferences.html

hth


talktobatchu said:
Hello
I need to store information regarding race wise population of
every county of my state year wise, which is nothing but census data. Here
the data is categorized basing on different age groups. After thinking about
a lot of options for storing this kind of data, came up with a table(s)
design as follows.

Pop_RaceAndAgeGrpWiseFrm2001C1 - this table stores data for county one. here
the ctyDistrictID is 1.

the fields are: Pop_RaceAndAgeGrpWiseFrm2001C1.EntryID,
Pop_RaceAndAgeGrpWiseFrm2001C1.ctyDistrictID,
Pop_RaceAndAgeGrpWiseFrm2001C1.RaceName,
Pop_RaceAndAgeGrpWiseFrm2001C1.statsYear,
Pop_RaceAndAgeGrpWiseFrm2001C1.Under5Years,
Pop_RaceAndAgeGrpWiseFrm2001C1.5to9Years,
Pop_RaceAndAgeGrpWiseFrm2001C1.9to14Years...
Pop_RaceAndAgeGrpWiseFrm2001C1.85AndAbove

there are 33 counties in our state and have 33 different tables whose
structure looks same as the above one with
'Pop_RaceAndAgeGrpWiseTblC1.ctyDistrictID' being different for each county.
There is a 'ctyDistrictdata' table which stores the information relating to
the counties like its name, ID and region etc... After storing the census
data into all those different county tables year wise, everything looks fine.
We have the relationships as, the 'ctyDistrictID' field from all the 33
tables are linked to 'ctyDistrictdata' table onto the same ID(field).

But when a query is written to pull the 'Under5Years' data for a particular
race and for two specified years from 5 different counties(tables) by linking
them, the result has so many duplicates in it. But the expected result should
only have 2 tuples: first for year one and second for year two. The query
looks as follows:

SELECT DISTINCT Pop_RaceAndAgegrpwiseFrm2001C1.statsYear,
Pop_RaceAndAgegrpwiseFrm2001C1.RaceName,
Pop_RaceAndAgegrpwiseFrm2001C1.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C2.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C3.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C4.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C5.[under 5 years]
FROM Pop_RaceAndAgegrpwiseFrm2001C1, Pop_RaceAndAgegrpwiseFrm2001C2,
Pop_RaceAndAgegrpwiseFrm2001C3, Pop_RaceAndAgegrpwiseFrm2001C4,
Pop_RaceAndAgegrpwiseFrm2001C5
WHERE (Pop_RaceAndAgegrpwiseFrm2001C1.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C2.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C3.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C4.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C5.RaceName="White");

I know there should be something wrong in the query which is written above
or in the way table join is performed. It might be that the design of the
tables itself for storing the census data is in a bad shape. Even after
looking at things, nothing is working out good. Which way the above kind of
census data can be stored in a database with a better design. Please assist
me in this.

Thanks for any help or suggestions provided.
 
G

Guest

First of all thanks for the suggestion. The database design, mean the table
structure each having its own specific data sounds perfect to me. I'm sure
this works fine for my census data.

Will implement it right now to look into the real scenario(s).

tina said:
you're making a common newbie mistake in table design: storing data in
table names (counties) and field names (age groups). you should have one
table for the counties (not one table for each county), and one record for
each age group. i would probably use the following table design, as

tblDistricts (your ctyDistrictdata table)
DistrictID (primary key)
CountyName
Region
(it's not clear if a district may have multiple counties in it, or vice
versa, or if a region may have multiple counties, or vice versa. so this
table design may or may not be correct.)

tblRaces
RaceID (pk)
RaceName

tblAgeGroups
GroupID (pk)
GroupDescription (field should be Text data type)
(one record for each defined age group: "Under 5 years", "5 to 9", etc)

tblCensus
CensusID (pk)
DistrictID (foreign key from tblDistricts)
RaceID (fk from tblRaces)
CensusYear
GroupID (fk from tblAgeGroups)
CensusCount

so if, in 2004, one county district had a count of people for each of five
races in five age groups, then tblCensus would have 25 records for that
data - one record for the number of persons in each age group of each race.

with the table setup outlined above, you can "slice and dice" the data just
about any way you want, for statistical analysis. for example, to get the
query results you described in your post, you would set criteria for the
counties you want to see, the race, the specific years, and the age group -
all on tblCensus, only. you can easily use a Totals query to present that
data as two records, one for each year, with a sum of the person count for
each year.

the ironclad rule of database design is: FIRST, define the
tables/relationships to correctly model the process, according to proven
normalization guidelines; build the tables and set the relationships
according to that model. then, and ONLY then, begin to build your queries,
form, reports on top of your table structure. to learn the basics of
relational data modeling, one good text is Database Design for Mere Mortals
by Michael Hernandez. also, the following webpage has many useful links to
help you:
http://www.ltcomputerdesigns.com/JCReferences.html

hth


talktobatchu said:
Hello
I need to store information regarding race wise population of
every county of my state year wise, which is nothing but census data. Here
the data is categorized basing on different age groups. After thinking about
a lot of options for storing this kind of data, came up with a table(s)
design as follows.

Pop_RaceAndAgeGrpWiseFrm2001C1 - this table stores data for county one. here
the ctyDistrictID is 1.

the fields are: Pop_RaceAndAgeGrpWiseFrm2001C1.EntryID,
Pop_RaceAndAgeGrpWiseFrm2001C1.ctyDistrictID,
Pop_RaceAndAgeGrpWiseFrm2001C1.RaceName,
Pop_RaceAndAgeGrpWiseFrm2001C1.statsYear,
Pop_RaceAndAgeGrpWiseFrm2001C1.Under5Years,
Pop_RaceAndAgeGrpWiseFrm2001C1.5to9Years,
Pop_RaceAndAgeGrpWiseFrm2001C1.9to14Years...
Pop_RaceAndAgeGrpWiseFrm2001C1.85AndAbove

there are 33 counties in our state and have 33 different tables whose
structure looks same as the above one with
'Pop_RaceAndAgeGrpWiseTblC1.ctyDistrictID' being different for each county.
There is a 'ctyDistrictdata' table which stores the information relating to
the counties like its name, ID and region etc... After storing the census
data into all those different county tables year wise, everything looks fine.
We have the relationships as, the 'ctyDistrictID' field from all the 33
tables are linked to 'ctyDistrictdata' table onto the same ID(field).

But when a query is written to pull the 'Under5Years' data for a particular
race and for two specified years from 5 different counties(tables) by linking
them, the result has so many duplicates in it. But the expected result should
only have 2 tuples: first for year one and second for year two. The query
looks as follows:

SELECT DISTINCT Pop_RaceAndAgegrpwiseFrm2001C1.statsYear,
Pop_RaceAndAgegrpwiseFrm2001C1.RaceName,
Pop_RaceAndAgegrpwiseFrm2001C1.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C2.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C3.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C4.[under 5 years],
Pop_RaceAndAgegrpwiseFrm2001C5.[under 5 years]
FROM Pop_RaceAndAgegrpwiseFrm2001C1, Pop_RaceAndAgegrpwiseFrm2001C2,
Pop_RaceAndAgegrpwiseFrm2001C3, Pop_RaceAndAgegrpwiseFrm2001C4,
Pop_RaceAndAgegrpwiseFrm2001C5
WHERE (Pop_RaceAndAgegrpwiseFrm2001C1.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C2.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C3.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C4.RaceName="White" And
Pop_RaceAndAgegrpwiseFrm2001C5.RaceName="White");

I know there should be something wrong in the query which is written above
or in the way table join is performed. It might be that the design of the
tables itself for storing the census data is in a bad shape. Even after
looking at things, nothing is working out good. Which way the above kind of
census data can be stored in a database with a better design. Please assist
me in this.

Thanks for any help or suggestions provided.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top