Am I Dealing With Cross-Tables Or Not?

B

Brian Bradley

I think I'm having trouble grasping the concept of cross-tables.

My data originates from huge delimited-text (CSV) files consisting of almost
1,500 fields per record. I have been successful in designing tables that
address only one subject, designating primary keys in each table, supplying
those keys to other tables as foreign keys, and designating relationships.
But there are dozens of segments of the CSV files that are structured
similarly to the example I give here, and I have left those segments for
last. I have read everything I can find about cross-tables, and I am just
not comprehending it. (Maybe these segments aren't even "cross-tabbed" and
I'm just missing something else?)

After importing one SEGMENT of the delimited-text file into Excel, it looks
like this:

Event_Location_1 First_Event_Date First_Event_Name
Second_Event_Date Second_Event_Name
Event_Location_2 First_Event_Date First_Event_Name
Event_Location_3 First_Event_Date First_Event_Name
Second_Event_Date Second_Event_Name

As you can see, Event_Locations do not always sponsor the same number of
Events. (Minimum number of Events = 0, maximum number of Events = 10, and
most Event Locations have sponsored 10 Events.)

Basic (background) Questions:

1. Once that type of data is in spreadsheet form, is it considered to be a
cross-table?
2. What do we call that sort of data structure while it's still in CSV-file
form? (Cross-tabbed delimited data?)
3. Is it considered a poor practice to publish cross-tabbed delimited data,
or is it sometimes the only way to publish it?

Back to my major needs:

With data in Access tables structured as illustrated above (as in the
spreadsheet), I cannot understand how to query the table to answer questions
such as "How many events happened at Event Location 3?" or "At what
locations have we done an HIV screen?" or "When did we last do Child
Fingerprinting at Location 2?"

I've read everything I can find, but I can't "un-cross" the data in my mind
sufficiently to know how to place that data in Access table(s). Should I
(must I?) transform each row of data (Event Location) into a separate Access
table that looks like this:

Design View of (possible) Table named tbl_Event_Location_1:

Event_Date (date type)
Event_Name (text type) (table will have PK or composite key,
eventually)

Datasheet View of same:

01/01/1991 HIV Screen
02/02/1992 Blood Drive
03/03/1993 Child Fingerprinting

I am successfully using an Extraction/Transformation/Loading (ETL) utility
called Visual Importer to import the non-cross-tabbed (?) segments of the
CSV files into my existing Access tables. Visual Importer appears to offer a
method of importing cross-tabbed data into database tables and to "un-cross"
the data as it does so, but the help instructions of Visual Importer are
scant and poorly written. And because I think I simply have some sort of
genetic predisposition for being unable to comprehend cross-tables, I am
stumped here, at the end of many, many hours of work trying to create this
database (pro bono) for a charitable organization. (I am a transcriptionist,
not a database designer, but I volunteered to do this because I was sure
that I could, having previously created a few other well-formed (IMHO) and
well-working -- but simpler -- databases for myself and for others. But I
have never had to deal with cross-tables. If there were no cross-tabbed data
in the CSV files, the charity organization would already have their
database, and I am confident that it would already be serving them well. But
this last part of the puzzle has me against a wall. So I come to you.

Please correct any misconceptions I have expressed above and please point me
to more learning. I don't even know what additional questions to ask. Thanks
so much.

Brian Bradley
San Diego
 
P

Pieter Wijnen

this is definetively crosstab data

I would make an EventType Table
containing the EventTypes (EventName)
EventType
-----------
EventTypeID
EventType

an EventLocation Table
containing the EventLocation (EventLocation)

EventLocation
--------------
EventLocationID
EventLocation

then a TempEvent Table

TempEvent
------------
TempEventID (Number , same for all events on one line)
TempID (Aka X)
TempEventType (Aka EventName_X )
TempEventLocation (Aka EventLocation_X)
TempEventDate

From This joined to the EventName & EventLocation Tables
you Can Then Make a Proper Event Table

Event
---------
EventID
ParentEvent (Self join to EventID, Created sorted on the X ordinal, Null for
x=1)
EventTypeID (FK -> EventType)
EventLocationID (FK -> EventLocation)
EventDate

HTH

Pieter
 
A

Albert D. Kallal

Brian Bradley said:
As you can see, Event_Locations do not always sponsor the same number of
Events. (Minimum number of Events = 0, maximum number of Events = 10, and
most Event Locations have sponsored 10 Events.)

Right, and what happens if you have 11 events, or need 11 events?

Surely, one could not re-design all tables, redo all reports and re-design
all data entry forms to add just extra stupid event. You could HUGE amounts
of time re-design virtually every report, every query and every form that
was designed for 10 events. As you can see, this is not workable as a
appcation design. Do note I not criticizing you here, as this is the way the
data is being GIVEN to you.. It very possible that the original data *is*
normalized.
Basic (background) Questions:

1. Once that type of data is in spreadsheet form, is it considered to be a
cross-table?

Not really. It just a big fat difficult use to table in a format that is
crappy. I don't think there is a particular term here. The "main" term one
would use is that the data is NOT normalized.

The "normalizing" is a term we use to explain how data is CORRECTLY modeled.
When I spoke of a design that allows 5 events, or 15 events, and you do NOT
have to modify every report, screen/form etc to accomplish an extra event
beyond 10 events, that what normalizing data accomplishes for us.
2. What do we call that sort of data structure while it's still in
CSV-file form? (Cross-tabbed delimited data?)

Sure, it just tab delimited data. As mentioned, it also not normalized data
either. And, it should be pointed out that a CSV file can't really represent
a one to many relationships (that means you have an event location, and then
a related table called sponsored events).

3. Is it considered a poor practice to publish cross-tabbed delimited
data, or is it sometimes the only way to publish it?

yes, often that the only way to publish it. It certainly poor practice when
you have to develop sql queries, and build reports that will automatic sum
and group and count the number of sponsor events for each location (sql can
do this in ONE command *IF* your data is normalized).


By the way, the new "buzz" called XML *can* represent normalized data.
However, XML is not useful here unless you have either tools to import, and
deal with that data.
Back to my major needs:

With data in Access tables structured as illustrated above (as in the
spreadsheet), I cannot understand how to query the table to answer
questions such as "How many events happened at Event Location 3?" or "At
what locations have we done an HIV screen?" or "When did we last do Child
Fingerprinting at Location 2?"


Yes, that exactly the problem. Further, a table in ms-access is limited to
255 fields (that is columns in spreadsheet lingo). I mean, if that is the
case, then how have you been importing this data now?

Remember, with database normalizing, you rarely every need a table with more
then 50 fields, let alone 1500. Databases ARE NOT spreadsheets, and they DO
NOT work that way.

A complex database that handles job costing built around labor one day, and
the next day built around material cost,a nd the next day built around 3
dimensional parts lists for the space shuttle will NEVER need a table
approaching 100 fields, let alone 1500. So, you have to likely break out
those "events" into another table.

Should I (must I?) transform each row of data (Event Location) into a
separate Access table that looks like this:

*excellent* idea...(yes, you should!!).
Design View of (possible) Table named tbl_Event_Location_1:

Event_Date (date type)
Event_Name (text type) (table will have PK or composite key,
eventually)

Datasheet View of same:

01/01/1991 HIV Screen
02/02/1992 Blood Drive
03/03/1993 Child Fingerprinting

yes, and you could also add a "event" number, so, you don't need "10" tables

eg:
Design View of (possible) Table named tbl_Event_Location_1:

EventLocation <----
Event_Date (date type)
Event_Name (text type) (table will have PK or composite key,
eventually)

By adding JUST one column to that 2nd child table, we just eliminated 9
tables. and, we also have a design that will allow 11, or 50 events at NO
extra cost to the design.
Visual Importer appears to offer a method of importing cross-tabbed data
into database tables and to "un-cross" the data as it does so,

The above would be excellent. I don't know that tool at all. If you can't
use visual Importer, then you going to have to write your own import routine
that reads the data, splits out the "repeating" event data into that other
table. Needless to say, this is not a HUGE HARD task, but it will assume you
are fluent in VBA, and is certainly some work.

And because I think I simply have some sort of genetic predisposition for
being unable to comprehend cross-tables

Sure, but you also shown a *amazing* grasp of this problem and have even
proposed a 2nd table to hold this repeating data.

Also, your not

So, seems like we have two tables (for starters....perhaps more).

tblEvvent
this table could have the event date, possibility the location, and any
other information that belongs to the ONE event

tblSponsors
tblEvent_ID
EventDate
Sponsor
EventNumber
etc.

Note that if the event date is the *same*, for the particular event, then
that field should be moved up to the tblEvent table.

Remember, normalizing data says we remove *repeating* data. if the 10
sponsors are for the ONE event and there is ONE event date, then the
EventDate belongs in the tblEvent....

Keep in mind that often building quires on normalized data takes MORE skill,
but gives you MORE flexibility. So, you will be able to build reports that
summarize those questions you have, and you be able to do this with GREATER
ease then trying to report on all those *zillions* of fields (but, you need
better skills to write that sql, and those reports).

So, normalizing data is a doubled edged sword. It gives more flexibility,
but you have increased skill complexity.

A dog house is less complex then a hotel, but we prefer the additional
complexity of staying in hotels as opposed to a dog house. Things like
showers etc VASTLY increase the complexity of the hotel over the dog house,
but that added complexity is worth the benefits. Normalizing is much the
same, it is BETTER, but often takes additional skill to work with. It
certainly more easy to put a garden hose into the dog house to provide
water, and if you don't have a plumber on staff, you might just settle for
that garden hose....

I would perhaps spend more time exploring the extraction utility you have.

And, often, you don't have to go "all" the way in normalizing data.

For example, in place of those two tables, you could just have one table,
and repeat some of the data

(ie: event number, and event date).

Your doing this for just reporting I assume? If you doing for building an
data entry, and application, then you want to normalize the data into
several tables. For just reporting...you likely can still get by using one
table, and adding the event date and event number as columns. (either way
you choose, don't use the "field" name, or column name to represent
something that can be a column).

You can see with this design, we eliminate LARGE number of fields, and
simply have a column that is the event number. so, one table is well fine if
you just reporting here, and it will save you the effort of the difficult
task of splitting out data to multiple tables with relationships.

The reason why we can consider one talbe is because this data is only being
used for reporting purposes....
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top