How do I denormalize data using queries?

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I have a table that contains company_ids, contact names, and contact titles.
I need to denormalize/flatten this table so there are not multiple
company_ids. I need the contacts and titles to fall under the column names of
Person01, Title01, Person02, Title02, etc., there could be upwards of 100
contact names and titles per company_id. I was told that I could denormalize
by using queries but have had no luck in getting the contacts and titles to
fall under the appropriate column headings. Could someone please give me some
pointers?
 
I have tried that repeatedly and I couldn't get it to work. I even tried
doing a crosstab on my own. I can't get the column headings Person01,
Title01, Person02, Title02, etc. to work. Any ideas?
--
Thanks,
lfarina


Douglas J. Steele said:
Try the Crosstab Query wizard.
 
I frequently use Crosstabs, but usually to go the other direction. (See
below.) For what you're doing, I'd use a collection of Append Queries,
each of which grabs information from one set of fields and stuffs it
into a new [Contacts] Table (which you will have set up before running
this).

Suppose your Table looks like this:

[Company] Table Datasheet View:

CompanyID CompanyName Person01 Title01 Person02 Title02
---------- --------------- -------- ------- -------- -------
-163165739 Acme Explosives Jim Boss Fred Gofer
2096126066 Wacky Widgets Mary CEO

We grab the [Person01] and [Title01] fields with the following Query.
(Of course, you might have additional fields, such as phone number or
email, and if so you'll need to include them here, too.)

[Q_AppendToContacts] SQL View:

INSERT INTO Contacts ( [Number],
Company_ID, Person, Title )
SELECT 1 AS [Number], Company.CompanyID,
Company.Person01, Company.Title01
FROM Company
WHERE (((Company.Person01) Is Not Null
And (Company.Person01)<>""));

After running it, the previously empty [Contacts] Table looks like this:

[Contacts] Table Datasheet View:

ContactsID Company_ID Number Person Title
---------- ---------- ------ ------ -----
974552048 2096126066 1 Mary CEO
1211496459 -163165739 1 Jim Boss

Now we update the Query to grab the next 2 fields:

[Q_AppendToContacts] SQL View:

INSERT INTO Contacts ( [Number], Company_ID,
Person, Title )
SELECT 2 AS [Number], Company.CompanyID,
Company.Person02, Company.Title02
FROM Company
WHERE (((Company.Person02) Is Not Null
And (Company.Person02)<>""));

and running it adds another record.

[Contacts] Table Datasheet View:

ContactsID Company_ID Number Person Title
---------- ---------- ------ ------ -----
-1253217999 -163165739 2 Fred Gofer
974552048 2096126066 1 Mary CEO
1211496459 -163165739 1 Jim Boss

After backing up your database, you can delete the copied fields from
the [Company] Table.

===

This is all you asked, and you don't need to read further, but if you
want to reconstruct your original spreadsheet-style Datasheet, you can
do that with some additional Queries on this new [Contacts] Table.

To do this, first we list the "Person" fields from [Contacts]...

[Q_Persons] SQL:

SELECT Company.CompanyName,
Format([Contacts]![Number],"00") & "Person" AS FieldName,
Contacts.Person
FROM Contacts INNER JOIN Company
ON Contacts.Company_ID = Company.CompanyID
ORDER BY Company.CompanyName, FieldName;

[Q_Persons] Query Datasheet View:

CompanyName FieldName Person
--------------- ---------- ------
Acme Explosives 01Person Jim
Acme Explosives 02Person Fred
Wacky Widgets 01Person Mary

and we define a similar Query, [Q_Titles], for the titles. (You don't
really need to do that, as only the SQL is important, but I used Query
Design View to create the SQL statements that I then copied.)

Note that I put the numbers at the beginning of each field name, so
they'd sort properly in the Crosstab.

Now we combine the SQL from [Q_Persons] and [Q_Titles] into a Union Query.

[Q_XtabFields] SQL:

SELECT Company.CompanyName,
Format([Contacts]![Number],"00") & "Person" AS FieldName,
Contacts.Person
FROM Contacts INNER JOIN Company
ON Contacts.Company_ID = Company.CompanyID
UNION ALL
SELECT Company.CompanyName,
Format([Contacts]![Number],"00") & "Title" AS FieldName,
Contacts.Title
FROM Contacts INNER JOIN Company
ON Contacts.Company_ID = Company.CompanyID
ORDER BY Company.CompanyName, FieldName;

This Query produces the following list:

[Q_XtabFields] Query Datasheet View:

CompanyName FieldName Person
--------------- ---------- --------
Acme Explosives 01Person Jim
Acme Explosives 01Title Boss
Acme Explosives 02Person Fred
Acme Explosives 02Title Gofer
Wacky Widgets 01Person Mary
Wacky Widgets 01Title CEO

Now we can define a Crosstab Query based on this...

[Q_XtabFields_Crosstab] SQL:

TRANSFORM First(Q_XtabFields.Person) AS FirstOfPerson
SELECT Q_XtabFields.CompanyName
FROM Q_XtabFields
GROUP BY Q_XtabFields.CompanyName
PIVOT Q_XtabFields.FieldName;

If you wish, you can specify exactly which columns you want this Query
to display, so that you don't wind up with too many. Or you could
filter the results of the [Q_XtabFields] Query to report only certain
column numbers.

Anyway, run this Query ...

[Q_XtabFields_Crosstab] Query Datasheet View:

CompanyName 01Person 01Title 02Person 02Title
--------------- -------- ------- -------- -------
Acme Explosives Jim Boss Fred Gofer
Wacky Widgets Mary CEO

.... and -- voilá! -- you get the same kind of layout you had originally,
but now your underlying Tables are organized in a way that makes them
much easier to analyze and use.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
Thanks for the reply Vincent. I'm a little lost. The table I have to begin
with is called Person, the 3 field names are Company_ID, Fulnm (for Full
Name), and Title. There could be 100 names per company_id. Would I need to
make 100 or more append queries? The results you have in the final query
Q_XtabFields_Crosstab is extactly what I need.
--
Thanks,
lfarina


Vincent Johns said:
I frequently use Crosstabs, but usually to go the other direction. (See
below.) For what you're doing, I'd use a collection of Append Queries,
each of which grabs information from one set of fields and stuffs it
into a new [Contacts] Table (which you will have set up before running
this).

Suppose your Table looks like this:

[Company] Table Datasheet View:

CompanyID CompanyName Person01 Title01 Person02 Title02
---------- --------------- -------- ------- -------- -------
-163165739 Acme Explosives Jim Boss Fred Gofer
2096126066 Wacky Widgets Mary CEO

We grab the [Person01] and [Title01] fields with the following Query.
(Of course, you might have additional fields, such as phone number or
email, and if so you'll need to include them here, too.)

[Q_AppendToContacts] SQL View:

INSERT INTO Contacts ( [Number],
Company_ID, Person, Title )
SELECT 1 AS [Number], Company.CompanyID,
Company.Person01, Company.Title01
FROM Company
WHERE (((Company.Person01) Is Not Null
And (Company.Person01)<>""));

After running it, the previously empty [Contacts] Table looks like this:

[Contacts] Table Datasheet View:

ContactsID Company_ID Number Person Title
---------- ---------- ------ ------ -----
974552048 2096126066 1 Mary CEO
1211496459 -163165739 1 Jim Boss

Now we update the Query to grab the next 2 fields:

[Q_AppendToContacts] SQL View:

INSERT INTO Contacts ( [Number], Company_ID,
Person, Title )
SELECT 2 AS [Number], Company.CompanyID,
Company.Person02, Company.Title02
FROM Company
WHERE (((Company.Person02) Is Not Null
And (Company.Person02)<>""));

and running it adds another record.

[Contacts] Table Datasheet View:

ContactsID Company_ID Number Person Title
---------- ---------- ------ ------ -----
-1253217999 -163165739 2 Fred Gofer
974552048 2096126066 1 Mary CEO
1211496459 -163165739 1 Jim Boss

After backing up your database, you can delete the copied fields from
the [Company] Table.

===

This is all you asked, and you don't need to read further, but if you
want to reconstruct your original spreadsheet-style Datasheet, you can
do that with some additional Queries on this new [Contacts] Table.

To do this, first we list the "Person" fields from [Contacts]...

[Q_Persons] SQL:

SELECT Company.CompanyName,
Format([Contacts]![Number],"00") & "Person" AS FieldName,
Contacts.Person
FROM Contacts INNER JOIN Company
ON Contacts.Company_ID = Company.CompanyID
ORDER BY Company.CompanyName, FieldName;

[Q_Persons] Query Datasheet View:

CompanyName FieldName Person
--------------- ---------- ------
Acme Explosives 01Person Jim
Acme Explosives 02Person Fred
Wacky Widgets 01Person Mary

and we define a similar Query, [Q_Titles], for the titles. (You don't
really need to do that, as only the SQL is important, but I used Query
Design View to create the SQL statements that I then copied.)

Note that I put the numbers at the beginning of each field name, so
they'd sort properly in the Crosstab.

Now we combine the SQL from [Q_Persons] and [Q_Titles] into a Union Query.

[Q_XtabFields] SQL:

SELECT Company.CompanyName,
Format([Contacts]![Number],"00") & "Person" AS FieldName,
Contacts.Person
FROM Contacts INNER JOIN Company
ON Contacts.Company_ID = Company.CompanyID
UNION ALL
SELECT Company.CompanyName,
Format([Contacts]![Number],"00") & "Title" AS FieldName,
Contacts.Title
FROM Contacts INNER JOIN Company
ON Contacts.Company_ID = Company.CompanyID
ORDER BY Company.CompanyName, FieldName;

This Query produces the following list:

[Q_XtabFields] Query Datasheet View:

CompanyName FieldName Person
--------------- ---------- --------
Acme Explosives 01Person Jim
Acme Explosives 01Title Boss
Acme Explosives 02Person Fred
Acme Explosives 02Title Gofer
Wacky Widgets 01Person Mary
Wacky Widgets 01Title CEO

Now we can define a Crosstab Query based on this...

[Q_XtabFields_Crosstab] SQL:

TRANSFORM First(Q_XtabFields.Person) AS FirstOfPerson
SELECT Q_XtabFields.CompanyName
FROM Q_XtabFields
GROUP BY Q_XtabFields.CompanyName
PIVOT Q_XtabFields.FieldName;

If you wish, you can specify exactly which columns you want this Query
to display, so that you don't wind up with too many. Or you could
filter the results of the [Q_XtabFields] Query to report only certain
column numbers.

Anyway, run this Query ...

[Q_XtabFields_Crosstab] Query Datasheet View:

CompanyName 01Person 01Title 02Person 02Title
--------------- -------- ------- -------- -------
Acme Explosives Jim Boss Fred Gofer
Wacky Widgets Mary CEO

.... and -- voilá! -- you get the same kind of layout you had originally,
but now your underlying Tables are organized in a way that makes them
much easier to analyze and use.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

I have tried that repeatedly and I couldn't get it to work. I even tried
doing a crosstab on my own. I can't get the column headings Person01,
Title01, Person02, Title02, etc. to work. Any ideas?
Try the Crosstab Query wizard.
 
lfarina said:
Thanks for the reply Vincent. I'm a little lost. The table I have to begin
with is called Person, the 3 field names are Company_ID, Fulnm (for Full
Name), and Title. There could be 100 names per company_id. Would I need to
make 100 or more append queries? The results you have in the final query
Q_XtabFields_Crosstab is extactly what I need.

'Fraid so, at least you will if you do it the way I suggested.

If you're familiar with VBA, you could write a function to do that
automagically, but you could probably take care of updating and running
100 Queries (using my system) in an hour or two, and it might take
longer than that to get the VBA stuff working. It depends on how
comfortable you are with VBA. (And if you use the manual approach and
make a mistake on the 73rd run, you might have to start over. Don't let
that happen: save your file frequently, using a new file name each time,
so you won't have to do that. Delete the extra backup files when you're
done and are happy with your results.)

What might be easier is to copy the Table to an Excel file and copy
blocks of names into one long, skinny Excel list. You can have up to
65K records in one of those. Having laid them out like that, you can
define that as a named range in Excel and import that Excel list into
Access as a new Access Table. Same results as what I suggested, but
maybe with less hassle.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
I would love to use VBA for this. I actually have some code that I was
working with, trying to get it to fit my needs. I'm new to it, but I
understand some. I'm just having trouble with a couple of pieces. I was told
by another forum that I should just use queries, so I got back on the query
train again. I would rather do the code thing.
Do you work with code? Could you look at my code and see what I am doing
wrong? IIf possible, I could attach a zip file that has my example mdb so you
can see what I am actually working with. I'm not really worried about how
long this takes, this is a pretty big project I am working on. I really
appreciate your help on this.
 
lfarina said:
I would love to use VBA for this. I actually have some code that I was
working with, trying to get it to fit my needs. I'm new to it, but I
understand some. I'm just having trouble with a couple of pieces. I was told
by another forum that I should just use queries, so I got back on the query
train again. I would rather do the code thing.
Do you work with code? Could you look at my code and see what I am doing
wrong? IIf possible, I could attach a zip file that has my example mdb so you
can see what I am actually working with. I'm not really worried about how
long this takes, this is a pretty big project I am working on. I really
appreciate your help on this.


Go ahead and send it ... it should take just a few lines, not very
involved. I think the easiest way is to open the recordset, romp
through the fields in each record, and output the results.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
I don't know how to post the mdb so I'm posting the code only. Right now a
new table gets created but I'm just getting duplicates of the personnel that
should be listed under Person01. Both the Person01 and Title01 are
duplicating. Here's the code:

Option Compare Database
Option Explicit

Sub DenormalizeTable()
'this is the main subroutine which calls the others
CreateDenormalizedTable (MaxNumberOfFields)
Denormalize
End Sub
Function MaxNumberOfFields()
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim strSQL As String
Dim NumberOfFields As Integer

Set db = CurrentDb
strSQL = "SELECT TOP 1 Count(PERSON.COMPANY_ID) AS FieldCount " _
& "FROM PERSON " _
& "GROUP BY PERSON.Company_ID " _
& "ORDER BY Count(PERSON.COMPANY_ID) ASC;"
Set rs = db.OpenRecordset(strSQL)
MaxNumberOfFields = rs!FieldCount
End Function
Sub CreateDenormalizedTable(FieldCount As Integer)
On Error GoTo Err_CreateDenormalizedTable

Dim db As DAO.Database
Dim tblNew As DAO.TableDef
Dim fld As Field
Dim IndexNumber As Integer
Set db = CurrentDb

' Create the table and a field
Set tblNew = db.CreateTableDef("PERSONNEL")
Set fld = tblNew.CreateField("COMPANY_ID" & IndexNumber, dbDouble)
Set fld = tblNew.CreateField("PERSON01" & IndexNumber, dbText)
Set fld = tblNew.CreateField("TITLE01" & IndexNumber, dbText)
Set fld = tblNew.CreateField("PERSON02" & IndexNumber, dbText)
Set fld = tblNew.CreateField("TITLE02" & IndexNumber, dbText)
tblNew.Fields.Append fld

For IndexNumber = 1 To FieldCount

' Set field properties here if you want ie.
' fld.Required = True
'
' Append field to Fields collection
tblNew.Fields.Append fld
Next IndexNumber
' Append table to TableDef collection
db.TableDefs.Append tblNew

Exit_CreateDenormalizedTable:
Exit Sub

Err_CreateDenormalizedTable:
If Err.Number = 3265 Then
Resume Next
Else
MsgBox Err.Description
Resume Exit_CreateDenormalizedTable
End If
End Sub
Sub Denormalize()

Dim db As DAO.Database
Dim rs1 As DAO.Recordset
Dim rs2 As DAO.Recordset
Dim FieldCount As Integer
Dim currentCompany_ID As Double, previousCompany_ID As Double

Set db = CurrentDb
Set rs1 = db.OpenRecordset("PERSON") 'table with old format
Set rs2 = db.OpenRecordset("PERSONNEL") 'table with new format

DoCmd.SetWarnings False
DoCmd.RunSQL ("Delete * from PERSONNEL")
DoCmd.SetWarnings True

FieldCount = 1
rs1.MoveFirst

Do While Not rs1.EOF
currentCompany_ID = rs1!Company_ID
If currentCompany_ID <> previousCompany_ID Then
FieldCount = 1
rs2.AddNew
rs2!Company_ID = rs1!Company_ID
rs2!PERSON01 = rs1!FULNM
rs2!TITLE01 = rs1!TITLE
rs2!PERSON02 = rs1!FULNM
rs2!TITLE02 = rs1!TITLE
rs2.Update
Else
FieldCount = FieldCount + 1
rs2.MoveLast
rs2.Edit
rs2!Company_ID = rs1!Company_ID
rs2!PERSON01 = rs1!FULNM
rs2!TITLE01 = rs1!TITLE
rs2!PERSON02 = rs1!FULNM
rs2!TITLE02 = rs1!TITLE
rs2.Update
End If
previousCompany_ID = currentCompany_ID
rs1.MoveNext
Loop

End Sub
 
lfarina said:
I don't know how to post the mdb so I'm posting the code only. Right now a
new table gets created but I'm just getting duplicates of the personnel that
should be listed under Person01. Both the Person01 and Title01 are
duplicating. Here's the code:

Sorry, what I meant was for you to email me a copy, not to post it. You
shouldn't try to post non-text files on this newsgroup (though it's
acceptable on some others), and most people here don't want to receive
any personal email. But it's OK with me if you email me your file. But
I suggest you use Tools --> Database utilities --> Compact & Repair
Database to get rid of extra space, and then (in Windows Explorer, or by
using WinZip) compress the file to a ZIP file before attaching it to
your message.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
Ifarina - I know this is several months old, but in case you are looking I
may have a suggestion: Is your table containing the data organize in a
similar manner: Field 1 = company, Field 2 = Name, and Field 3 = Title ? Or
is it set up in another manner?

Matt
 
Hi Matt - Thanks for responding. My table has Field 1= Company_ID (a ten
digit number), Field 2= FULNM (the full name of the person), Field 3= TITLE.
Hope this gives you something to work with.
 
Ifarina -

Let's say your table name is CONTACTS. Do the following:

1.) Under QUERIES, select 'Create query in Design View'
2.) Close out of the 'Show Table' dialogue.
3.) From the main menu, select View > SQL View
4.) The SQL view defaults with 'SELECT;' .... delete this

** Copy the below script and paste it into the SQL view window:

TRANSFORM First(CONTACTS.FULNM) AS FirstOfFULNM
SELECT CONTACTS.Company_ID
FROM CONTACTS
GROUP BY CONTACTS.Company_ID
PIVOT CONTACTS.TITLE;

5.) replace the table name 'CONTACTS' with what your actual table name is.
6.) Click on the Save icon and create a name when prompted.
7.) Click on the Run query icon and view the results.

The company will only appear once on each line and each unique title will
appear as a column header. The crosstabs will contain the respective person.
I realized this after I began responding, but there is a pitfall: if the two
(or more) people occupy the same title, only the first name in the list will
be included in your table.
As a sample, if your table looks like this:

Company_ID FULNM TITLE
12345 Joe CEO
12345 Pete CFO
12345 Sam Sales
12345 Alan Backup
54321 Susan CEO
54321 Michelle Sales
54321 Jennifer Sales
54321 Kathy Backup

Then the result will look like this:

Company_ID CEO CFO Sales
Backup
12345 Joe Pete Sam
Alan
54321 Susan Michelle
Kathy

Notice that Jennifer is not included in the table because a person with that
title ('Sales') already appeared. The only way I can think to get around this
is to only allow a title to be used once. (I have Broker1, Broker2, etc for
mine). If this sounds like a method you would like to pursue but you have a
lot of titles you would have to change in order to accomplish this, I might
be able to offer you a way to help expedite the changes. Just let me know.

Matt
 
Back
Top