Basic Database Design Question

J

Jabberwocky

I'm setting up a new database with Access 97. It will include several
daily extract files (Excel format) from several old mainframe
machines.

I have several ideas on how to do this, but none of them address the
repetition of data in fields that could be referenced in another
table. If I make reference tables, it decreases the data size
significantly, but the next day's Excel file doesn't import well.

Some background:
there are 7,000 unique client numbers, names, diagnoses, and other
identifying information (2 MB per extract).
Each day I will receive an Excel file that is nearly identical: it is
the changes I want to notice -- names added, diagnoses changed. In
another table, I will also receive a similar extract file listing
medication, doses, etc. for these patients (and others that will be
filtered out) -- it's a utilization management task.
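The change-detection step I have in mind looks roughly like this. This is
only a sketch in Python (the real work would happen in Access), and the
column names ("client_number", "name", "diagnosis") are made up for
illustration -- the actual extract layout differs:

```python
import csv

# Hypothetical column names; the real mainframe extract layout will differ.
KEY = "client_number"

def load_extract(path):
    """Read one day's extract (saved as CSV) into a dict keyed by client number."""
    with open(path, newline="") as f:
        return {row[KEY]: row for row in csv.DictReader(f)}

def diff_extracts(yesterday, today):
    """Return client numbers added, removed, and changed between two extracts."""
    added = [k for k in today if k not in yesterday]
    removed = [k for k in yesterday if k not in today]
    changed = [k for k in today
               if k in yesterday and today[k] != yesterday[k]]
    return added, removed, changed
```

Comparing today's dump against yesterday's this way would surface exactly
the additions and diagnosis changes I care about, without storing every
day's full file.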

I have set up a macro to append a single table and include the date of
the dump as a field so that we can look at trends over time. -- This
solution will reach the 1 GB capacity in about 500 days. This is O.K.
if I start fresh each year, but doesn't seem very elegant.
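In outline, the append-with-date approach is the following (shown here
with Python's sqlite3 as a stand-in for the Access table, and made-up
column names):

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")  # stand-in for the Access .mdb
conn.execute("""CREATE TABLE extract_history (
    dump_date TEXT,
    client_number TEXT,
    diagnosis TEXT)""")

def append_extract(rows, dump_date):
    # Stamp every row with the extract date so trends can be queried later.
    conn.executemany(
        "INSERT INTO extract_history VALUES (?, ?, ?)",
        [(dump_date, r["client_number"], r["diagnosis"]) for r in rows])

append_extract([{"client_number": "1001", "diagnosis": "A"}], str(date(2007, 1, 2)))
append_extract([{"client_number": "1001", "diagnosis": "B"}], str(date(2007, 1, 3)))

# Trend query: how did this client's diagnosis change over time?
history = conn.execute(
    "SELECT dump_date, diagnosis FROM extract_history "
    "WHERE client_number = '1001' ORDER BY dump_date").fetchall()
```

It works, but since every daily dump is stored in full, the table grows
by the whole extract size each day -- which is exactly the capacity
problem.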

I am considering learning SQL to address the capacity problem. What
I'm most confused about is how to add these daily extract files and
keep the database small.

Please forgive my poor use of terminology. I'm new to Access and
have no formal computer training. I would be grateful for any
suggestions: books to read, alternate software, etc.
 
R

Roger Carlson

A normalized data model is not ALWAYS the best solution.

Before I go further, my assumption is that you get all of your data from
your mainframe extracts and you *never* add or modify data in the Access
database itself. You're only using Access to query and report the data.

Can you confirm that this is the case?

--
--Roger Carlson
MS Access MVP
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/wa.exe?SUBED1=ACCESS-L
 
J

Jabberwocky

This is true.... I won't even bother to create forms.
 
R

Roger Carlson

Okay then. Here goes.

What you are describing is more of a data warehouse or data mart (depending
on how you define them). A normalized structure is not necessarily the best
model for this.

The thing you have to remember is that any database project has two
competing requirements: Storage and Performance. Normalizing your tables
will reduce your storage, but will also give you worse performance. Leaving
your tables in a non-normalized structure (i.e., with repeated data) will take
more storage, but you will gain performance. The thing to do is balance the
two. Before you can do that, you have to know what they are and why they
are important.

Performance is easy. It's how fast your queries will run, and its importance
is also obvious. But normalization is different. Normalization is the
methodology of splitting your data into multiple tables so that you 1)
minimize redundancy and 2) are able to put all the data back together again.

Normalization has two purposes: 1) to reduce storage space by removing
redundant data, and 2) to maintain data integrity. But since you're getting
your data from your mainframe source systems, data integrity is not an
issue. Data integrity is being maintained by the source systems. You won't
be changing any data. Therefore, the only reason (in your case) to
normalize your data is for storage space.
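To make the trade-off concrete, here is a rough sketch of what normalizing
the repeated client data would look like (Python/sqlite3 as a stand-in for
Access; the table and column names are hypothetical). The client details
are stored once in a reference table, and each daily row carries only the
client number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reference table: one row per client, stored once.
conn.execute("CREATE TABLE clients (client_number TEXT PRIMARY KEY, name TEXT)")
# Fact table: daily rows reference the client by number only.
conn.execute("CREATE TABLE medications (dump_date TEXT, client_number TEXT, drug TEXT)")

conn.execute("INSERT INTO clients VALUES ('1001', 'Smith, J')")
conn.executemany("INSERT INTO medications VALUES (?, ?, ?)",
    [("2007-01-02", "1001", "aspirin"),
     ("2007-01-03", "1001", "aspirin")])

# Putting the data back together costs a join on every query --
# that is the performance side of the trade-off.
rows = conn.execute("""
    SELECT m.dump_date, c.name, m.drug
    FROM medications m JOIN clients c ON m.client_number = c.client_number
    ORDER BY m.dump_date""").fetchall()
```

The name "Smith, J" is stored once no matter how many daily rows exist,
but every report now pays for the join.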

So what you have to balance is storage space vs. performance. By
normalizing, you save space but will take a performance hit -- especially
as the database grows larger. By not normalizing, you will need more
storage space, but your performance will be better.

What is best for you depends on a lot of factors that I don't know.
Certainly, one factor will be breaking your mainframe data into normalized
tables. There will be time and effort (every day) to do that. Next, you
may or may not save very much storage. A lot depends on the data. Lastly,
I don't know how processor intensive your queries will be.

All that said, I have a situation similar to yours. I work at a hospital
and do reporting from an Access application on imported mainframe data. I
have several normalized tables because that's the way the data from the
mainframe comes to me. But then I have a very large flat file full of
pre-calculated values that I use to query against. I do the data
transformation and load only once a month.

If your data is already in the format you need to report it, you might be
better off NOT normalizing this data. Of course, that means you'll have to
deal with the storage problems. One solution is to use SQL Server (or even
SQL Server Express) as your database back-end and do all the
querying/reporting from an Access front end with the SQL tables linked into
it.

Hope that helps some.

 
J

Jabberwocky

Thank you for your thoughtful response. I hadn't considered SQL
Server Express. I'm watching the tutorials now.....
 
