Table Analyzer and Normalization

Guest

Hi,
I'm trying to build a datamart using Access as our database. Ideally I'd do
it in SQL Server, but I belong to a business group, and going the SQL Server
route would make this a low-priority IT project. While we convince the
powers-that-be about our specific reporting and analytical needs, I'm tasked
with making something work with Access: now.

Currently I can get a flat-file dump from our data warehouse using a
reporting tool. It has about 20 columns and about 1.5 million rows for 6
months of data, and it comes to about 1.5 GB once loaded into Access. We
apply some specific business rules to categorize the data and do some
clean-up within Access.

I've been trying, so far unsuccessfully, to create a database out of this in
the hope of reducing the redundancy. The goal is to fit 12 months of data
into the database after it has been normalized into 8 tables in a dimensional
model (one fact table, 7 dimension tables). To begin with, we'd be updating
our database only once a month.

Table Analyzer seems like a great tool, but it doesn't seem up to the task:
I'm getting all kinds of errors ("Temporary Disk Full" or "System Resources
Exceeded") even when I work with just 50,000 rows of the original data. I was
able to make it work once on another database, so I know it works, and Excel
users can use the select query it generates to build pivot tables seamlessly
without worrying about the joins, etc.

I'd like to know if I'm being too ambitious or if I'm on the wrong path
here. Should I just stick with one front-end db and multiple back-end dbs
to meet our needs? I know that would definitely make the job of incremental
monthly updates easier.

All suggestions welcome and appreciated,
John H.
 
John Nurick

Hi John,

It sounds as if you know more than Table Analyzer does. Here's what I'd probably
do:

1) Work out a normalised (or part-normalised) data structure suitable
for the reporting needs. Calculate how many bytes of actual data this
would involve over 12 months. If this is much over 1GB (to allow for
indexes and other overhead) then a single Access back end won't do.

2) Link (save space by not importing) to the big text file and run a
series of append queries to populate the normalised tables. (There's a
rough sketch of the SQL just after this list.) If any of the normalised
tables are really big, it seems to help if you remove as many indexes as
possible before appending, and re-create them afterwards, having first
compacted the database.

3) Create a series of queries that give views of the data that the users
can base their own queries on.
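
To make 2) and 3) a bit more concrete, here's the rough shape of the queries
I have in mind. The table and field names (tblSource for the linked text
file, tblProduct, tblFact and so on) are invented for illustration, so
substitute your own.

An append query to load a dimension table with the distinct values from the
linked file:

INSERT INTO tblProduct ( ProductNumber, ProductName )
SELECT DISTINCT s.ProductNumber, s.ProductName
FROM tblSource AS s;

An append query to load the fact table, picking up the AutoNumber key of the
dimension row via a join:

INSERT INTO tblFact ( ProductID, SaleDate, Amount )
SELECT p.ProductID, s.SaleDate, s.Amount
FROM tblSource AS s INNER JOIN tblProduct AS p
ON s.ProductNumber = p.ProductNumber;

And a saved select query that re-flattens the data for the users to base
their pivot tables on:

SELECT p.ProductNumber, p.ProductName, f.SaleDate, f.Amount
FROM tblFact AS f INNER JOIN tblProduct AS p
ON f.ProductID = p.ProductID;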

NB1: To get round the size limitation, given that this data will be
read-only and there's therefore no worry about maintaining relational
integrity against user editing, you could put some of the tables into a
separate mdb file and use linked tables to connect the two. Or perhaps
you could put one over on your DBA and install MySQL somewhere.

NB2: If you're faced with humongous text files, they can be brought
down to size (e.g. strip out unwanted fields or records) with text-file
tools from the Unix world (see http://unxutils.sourceforge.net/ and
http://gnosis.cx/publish/programming/text_utils.html) or a scripting
language such as Perl.
 
Guest

Hi John,
Thanks a lot for the reply. It appears that I'm on the right track for the
present. See specific follow-up comments below:

John Nurick said:
Hi John,

It sounds as if you know more than Table Analyzer does. Here's what I'd probably
do:

1) Work out a normalised (or part-normalised) data structure suitable
for the reporting needs. Calculate how many bytes of actual data this
would involve over 12 months. If this is much over 1GB (to allow for
indexes and other overhead) then a single Access back end won't do.

I was finally successful in creating an 8-table db working with 25k rows in
Table Analyzer. For my current level of granularity, one back end might do;
otherwise I will store some of the tables in another db and use linked tables
to create one view for querying purposes, as you suggest.
2) Link (save space by not importing) to the big text file and run a
series of append queries to populate the normalised tables. If any of
the normalised tables are really big, it seems to help if you remove as
many indexes as possible before appending, and re-create them afterwards
having first compacted the database.

I am linking to my big Access table. My normalized dimension tables have only
one index each, which links to a lookup field in the fact table (thanks to
Table Analyzer, this is done automatically). Now the challenge seems to be
populating the remaining 1 million-odd rows from my source table into this
schema via update queries, while still maintaining referential integrity.
Basically I have to recreate what Table Analyzer did: associate the correct
lookup values in the fact table with the new IDs in the dimension tables as I
append new data to each of the tables, so that the entire row can be
recreated.
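
For the dimension loads themselves, I'm planning one append query per
dimension table along these lines (the table and field names here are just
stand-ins for my real ones):

INSERT INTO dimension ( product_number, product_desc )
SELECT DISTINCT s.product_number, s.product_desc
FROM source AS s
WHERE s.product_number NOT IN (SELECT d.product_number FROM dimension AS d);

It's wiring the new dimension IDs back into the fact table's lookup fields
that I haven't worked out yet.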
3) Create a series of queries that give views of the data that the users
can base their own queries on.
NB1: To get round the size limitation, given that this data will be
read-only and there's therefore no worry about maintaining relational
integrity against user editing, you could put some of the tables into a
separate mdb file and use linked tables to connect the two. Or perhaps
you could put one over on your DBA and install MySQL somewhere.

Isn't that blasphemous... mentioning MySQL in a Microsoft forum? :) I have
in fact installed MySQL and an evaluation copy of SQL Server too. My
preference is for SQL Server, just because the integration between Excel and
SQL Server is tighter, especially when it comes to building pivot tables or
cubes from Excel.
NB2: If you're faced with humongous text files, they can be brought
down to size (e.g. strip out unwanted fields or records) with text-file
tools from the Unix world (see http://unxutils.sourceforge.net/ and
http://gnosis.cx/publish/programming/text_utils.html) or a scripting
language such as Perl.

John Nurick [Microsoft Access MVP]

No such luck regarding reducing the number of fields. If anything, if this
effort is successful, we'll be adding more granular data (weeks, city,
zip code, etc.) to the database.

Thanks again for all the suggestions.
 
John Nurick

Hi John,
Thanks a lot for the reply. It appears that I'm on the right track for the
present. See specific follow-up comments below:
Isn't that blasphemous... mentioning MySQL in a Microsoft forum? :) I have
in fact installed MySQL and an evaluation copy of SQL Server too. My
preference is for SQL Server, just because the integration between Excel and
SQL Server is tighter, especially when it comes to building pivot tables or
cubes from Excel.

I haven't been struck by a thunderbolt yet <g>. Part of being an MVP is
being independent of Microsoft and free to suggest what we feel are the
best solutions, no matter where they come from. I'm far from an expert in
either, but ISTM that MySQL's only advantage in the present context is
price. MSDE seems to be ruled out by your need to store more than 2 GB,
but check out MSDE's successor, which is called something like SQL
Server 2005 Express, now in beta. IIRC the size limit on this is more
than 2 GB.

More blasphemy on my part <g>. You wrote:
No such luck regarding reducing the number of fields. If anything, if this
effort is successful, we'll be adding more granular data (weeks, city,
zip code, etc.) to the database.

With a bit of practice one can actually use either the textutils or a
Perl script to normalise an ultra-wide text file into multiple narrower
ones that can be imported into related tables in Access.

Good luck!
 
Guest

Hi John,
Good to know that you take your independence to heart. I really value your
unbiased advice on this issue. Hopefully all the good that you're doing
improves your chances against any natural disasters <g>.

SQL Server Express seems to be a good way to go, since it has a 4 GB limit on
file size and is free. As for Perl and the text-based utilities, I know you
can do very sophisticated things with them, but I guess I'm not a programmer
at heart. I'm good at vi, but never made the full transition to Perl.

No wonder I find myself struggling at this point in the datamart-building
process:
I've got all my dimension tables loaded with unique values and the fact
table also loaded from the source. The size of the db is down from 1.3 GB
(6 months of data) to 240 MB after normalization, so I'm confident that I
won't hit the 2 GB limit for 12 months. The final step, though, is to
populate the lookup fields in the measures (fact) table with the IDs from the
dimension tables, and this is where I'm trying to account for the fact that
my source is one big dump with only one ID per row.

Table Analyzer used composite keys, made up of 2-3 fields from the dimension
table concatenated together, as lookup values in the fact table. Except for
one dimension table, which doesn't have a unique identifier, I believe I can
get by with a simpler approach. Each of the dimension tables has an
automatically generated ID that I hope to use as a foreign key/lookup in the
fact table. My pseudo-code for each lookup field in the fact table is as
follows:

update fact
set fact.prod_lookup = (select d.prod_id
                        from dimension d, source s
                        where s.id = fact.id
                          and s.product_number = d.product_number)
where fact.id in (select s.id from source s);

Access has some funky SQL syntax that I have to learn or circumvent to code
the above logic. Do you think I'm on the right track?
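
For what it's worth, from the little I've read so far, the join-based UPDATE
that Access seems to want might look something like this (untested, and using
the same placeholder names as above):

UPDATE (fact INNER JOIN source ON fact.id = source.id)
INNER JOIN dimension ON source.product_number = dimension.product_number
SET fact.prod_lookup = dimension.prod_id;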

Thanks again
 
