Imported Data Normalization

T

Trisha

We are working from a DOS-based flat-file database from
which data is extracted, tables are then created in SQL
Server, then linked to Access for report writing.

One table called Staff Services has one field that
contains four fields' worth of data. Its label
is "GrossBilling" and it is populated with fields for
TimeBilling, TravelBilling, SuppliesBilling and
TotalBilling for each patient's ID from the original
database. None of these are labeled, we just finally
figured it out by comparing fields in both databases.
When our analyst extracted what we thought would be only
GrossBilling, the other three fields came along uninvited.

Since we can't find a way to not extract all four fields,
we're now faced with "de-transposing" the data into a new
table where every first field appends into TimeBilling,
the second field into TravelBilling, the third into
SuppliesBilling and dropping every fourth field entirely.
Luckily, they are always in the same order for every
patient.

If anyone has ever had to address this issue and can point
me toward any documentation on this problem using either
SQL or VB/VBA, I'd appreciate it. There are over 8,000
records in this table (which would drop to around two
thousand if we could normalize it and eliminate the totals
rows), so doing anything manually is out of the question.

Much gratitude in advance,
Trisha
 
T

TC

Well, you would import the data as best you can, then use update queries
and/or VBA code to change it into a better form.

For example, say the import created a column 'C' in table 'T'. Say every
record in that table, had precisely two values in that column, seperated by
one or more spaces:
"CODE1 CODE5"
"CODE1 N/A"
"N/A XRAY"
etc.

Say you wanted to split-out the two values into new columns Status1 and
Status2. You would add those two new fields to the table, then populate them
with the following SQL update statement (untested):

update T
set Status1 = left$ (C, instr(C, " ") - 1),
Status2 = ltrim$ (mid$ (C, instr (C, " ")))

HTH,
TC
 
D

Duane Hookom

Please take the time to type of paste a sample of existing records into a
message as well as the desired results.
 
T

trisha

SvcID PtID Date GrossBilling
1 001 5/1/2003 170
2 001 5/1/2003 20
3 001 5/1/2003 100
4 001 5/1/2003 290

The above is a sample of the current table structure.
Under GrossBilling, SvcID 1 is really TimeBilling, SvcID 2
is TravelBilling, SvcID3 is SuppliesBilling, SvcID4 is
TotalBilling.

The desired result is to eliminate all the TotalBilling
records, and separate the first three values into their
own columns of TimeBilling, TravelBilling and
SuppliesBilling. The result would resemble this
("Billing" is assumed for columns "Time", "Travel"
and "Supplies" to attempt to avoid word wrapping):

SvcID PtID Date Time Travel Supplies
1 001 5/1/2003 170 20 100

I hope this helps clarify the situation.

Thanks,
Trisha
 
D

Duane Hookom

I thought you wanted to normalize the data. The current table structure is
much more normalized than your desired result. Time, Travel, and Supplies
should be left as values in a field rather than field names in a table.

If, however you need to make the table more of a flat file, you can do this
with a crosstab query.
Set the Row Headings as PtID and Date. Then create a column
ColHead:Choose([SvcID],"TimeBilling", "TravelBilling",
"SuppliesBilling","TotalBilling")
Set Sum of GrossBilling as the Value.
 
T

Trisha

Duane,

Maybe I'm missing a major point, and that's not an
impossibility, but everything I've ever been taught about
normalization is that every record must be atomic and be
collected only once. Calculations (totals, counts, etc.)
are always performed in a query or in a control on a
report, never entered in a table.

With the table structure we're working from, every fourth
record is a calculation of the previous three. There is
now way to identify what those three records represent
other than they are some kind of bill. There is no method
to determine which is Time, Travel or Supplies without
outside knowledge of the third-party source database.

To wit, if this were going to become a database used for
data entry, each patient's data would have to be entered
four times to enter Time, Travel and Supplies, then a
fourth record to enter the total. With no independent
field to identify Time, Travel and Supplies, they would
just be the first three records in each patient's billing
recordsets. This isn't what I learned as correct
normalization.

In my mind this table is at very least missing a field for
(sic)BillingType which could then be used as a lookup
field/combo box (in a form) containing the values of Time,
Travel, and Supplies with their associated values listed
in GrossBilling.

Adding a field in a new table to contain BillingType would
not be a problem, the problem is writing the SQL/VB (and
I'm leaning toward thinking it will have to be a VB loop)
to automate populating the appropriate values into the
appropriate fields (e.g., for each PatientID every 1st
field in GrossBilling populates BillingType as Time, every
second field's BillingType becomes Travel, every third,
Supplies, every fourth is dropped; the dollar values can
stay the same unless there is zero charge).

We also have no need to collect Time and Travel charges
for Supplies when there are no associated Time and Travel
charges, and vice versa. (Hence the 8,000+ records, where
there are really only about 2,000 record that remain after
we filter out all the extraneous data, the rest are either
zero charges or totals fields.)

It strikes me that what is needed to correct this is
actually something of the reverse of a crosstab query.

I truly appreciate the effort you've put forth to try to
help. I hope I'm not just being remarkably dense!

Trisha
-----Original Message-----
I thought you wanted to normalize the data. The current table structure is
much more normalized than your desired result. Time, Travel, and Supplies
should be left as values in a field rather than field names in a table.

If, however you need to make the table more of a flat file, you can do this
with a crosstab query.
Set the Row Headings as PtID and Date. Then create a column
ColHead:Choose([SvcID],"TimeBilling", "TravelBilling",
"SuppliesBilling","TotalBilling")
Set Sum of GrossBilling as the Value.

--
Duane Hookom
MS Access MVP


trisha said:
SvcID PtID Date GrossBilling
1 001 5/1/2003 170
2 001 5/1/2003 20
3 001 5/1/2003 100
4 001 5/1/2003 290

The above is a sample of the current table structure.
Under GrossBilling, SvcID 1 is really TimeBilling, SvcID 2
is TravelBilling, SvcID3 is SuppliesBilling, SvcID4 is
TotalBilling.

The desired result is to eliminate all the TotalBilling
records, and separate the first three values into their
own columns of TimeBilling, TravelBilling and
SuppliesBilling. The result would resemble this
("Billing" is assumed for columns "Time", "Travel"
and "Supplies" to attempt to avoid word wrapping):

SvcID PtID Date Time Travel Supplies
1 001 5/1/2003 170 20 100

I hope this helps clarify the situation.

Thanks,
Trisha


.
 
D

Duane Hookom

I would agree that the total does not need to be stored. According to your
earlier message the SvcID identified whether the amount was for Time,
Travel, or Supplies. Your data still has repeating value in the PtID and
Date fields. A more normalized structure might be
tblPtBilling
===============
PtBillID
PtID
BillDate

tblPtBillingDetail
=============
PtBillDetailID
PtBillID (foreign key to tblPtBilling.PtBillID)
SvcID (link to values of Time, Travel, Supplies or other)
BillAmt

Each Amt value should be in its own record.

--
Duane Hookom
Microsoft Access MVP


Trisha said:
Duane,

Maybe I'm missing a major point, and that's not an
impossibility, but everything I've ever been taught about
normalization is that every record must be atomic and be
collected only once. Calculations (totals, counts, etc.)
are always performed in a query or in a control on a
report, never entered in a table.

With the table structure we're working from, every fourth
record is a calculation of the previous three. There is
now way to identify what those three records represent
other than they are some kind of bill. There is no method
to determine which is Time, Travel or Supplies without
outside knowledge of the third-party source database.

To wit, if this were going to become a database used for
data entry, each patient's data would have to be entered
four times to enter Time, Travel and Supplies, then a
fourth record to enter the total. With no independent
field to identify Time, Travel and Supplies, they would
just be the first three records in each patient's billing
recordsets. This isn't what I learned as correct
normalization.

In my mind this table is at very least missing a field for
(sic)BillingType which could then be used as a lookup
field/combo box (in a form) containing the values of Time,
Travel, and Supplies with their associated values listed
in GrossBilling.

Adding a field in a new table to contain BillingType would
not be a problem, the problem is writing the SQL/VB (and
I'm leaning toward thinking it will have to be a VB loop)
to automate populating the appropriate values into the
appropriate fields (e.g., for each PatientID every 1st
field in GrossBilling populates BillingType as Time, every
second field's BillingType becomes Travel, every third,
Supplies, every fourth is dropped; the dollar values can
stay the same unless there is zero charge).

We also have no need to collect Time and Travel charges
for Supplies when there are no associated Time and Travel
charges, and vice versa. (Hence the 8,000+ records, where
there are really only about 2,000 record that remain after
we filter out all the extraneous data, the rest are either
zero charges or totals fields.)

It strikes me that what is needed to correct this is
actually something of the reverse of a crosstab query.

I truly appreciate the effort you've put forth to try to
help. I hope I'm not just being remarkably dense!

Trisha
-----Original Message-----
I thought you wanted to normalize the data. The current table structure is
much more normalized than your desired result. Time, Travel, and Supplies
should be left as values in a field rather than field names in a table.

If, however you need to make the table more of a flat file, you can do this
with a crosstab query.
Set the Row Headings as PtID and Date. Then create a column
ColHead:Choose([SvcID],"TimeBilling", "TravelBilling",
"SuppliesBilling","TotalBilling")
Set Sum of GrossBilling as the Value.

--
Duane Hookom
MS Access MVP


trisha said:
SvcID PtID Date GrossBilling
1 001 5/1/2003 170
2 001 5/1/2003 20
3 001 5/1/2003 100
4 001 5/1/2003 290

The above is a sample of the current table structure.
Under GrossBilling, SvcID 1 is really TimeBilling, SvcID 2
is TravelBilling, SvcID3 is SuppliesBilling, SvcID4 is
TotalBilling.

The desired result is to eliminate all the TotalBilling
records, and separate the first three values into their
own columns of TimeBilling, TravelBilling and
SuppliesBilling. The result would resemble this
("Billing" is assumed for columns "Time", "Travel"
and "Supplies" to attempt to avoid word wrapping):

SvcID PtID Date Time Travel Supplies
1 001 5/1/2003 170 20 100

I hope this helps clarify the situation.

Thanks,
Trisha


-----Original Message-----
Please take the time to type of paste a sample of
existing records into a
message as well as the desired results.

--
Duane Hookom
MS Access MVP


We are working from a DOS-based flat-file database from
which data is extracted, tables are then created in SQL
Server, then linked to Access for report writing.

One table called Staff Services has one field that
contains four fields' worth of data. Its label
is "GrossBilling" and it is populated with fields for
TimeBilling, TravelBilling, SuppliesBilling and
TotalBilling for each patient's ID from the original
database. None of these are labeled, we just finally
figured it out by comparing fields in both databases.
When our analyst extracted what we thought would be only
GrossBilling, the other three fields came along
uninvited.

Since we can't find a way to not extract all four
fields,
we're now faced with "de-transposing" the data into a
new
table where every first field appends into TimeBilling,
the second field into TravelBilling, the third into
SuppliesBilling and dropping every fourth field
entirely.
Luckily, they are always in the same order for every
patient.

If anyone has ever had to address this issue and can
point
me toward any documentation on this problem using either
SQL or VB/VBA, I'd appreciate it. There are over 8,000
records in this table (which would drop to around two
thousand if we could normalize it and eliminate the
totals
rows), so doing anything manually is out of the
question.

Much gratitude in advance,
Trisha


.


.
 
T

Trisha

Duane,

You're absolutely right, my earlier message was unclear.
In an attempt to describe what the field contained, I made
it sound like the field was identified. In the future,
I'll try to make my descriptions more accurate.

I appreciate your patience and your solution.

Many thanks,
Trisha
-----Original Message-----
I would agree that the total does not need to be stored. According to your
earlier message the SvcID identified whether the amount was for Time,
Travel, or Supplies. Your data still has repeating value in the PtID and
Date fields. A more normalized structure might be
tblPtBilling
===============
PtBillID
PtID
BillDate

tblPtBillingDetail
=============
PtBillDetailID
PtBillID (foreign key to tblPtBilling.PtBillID)
SvcID (link to values of Time, Travel, Supplies or other)
BillAmt

Each Amt value should be in its own record.

--
Duane Hookom
Microsoft Access MVP


Trisha said:
Duane,

Maybe I'm missing a major point, and that's not an
impossibility, but everything I've ever been taught about
normalization is that every record must be atomic and be
collected only once. Calculations (totals, counts, etc.)
are always performed in a query or in a control on a
report, never entered in a table.

With the table structure we're working from, every fourth
record is a calculation of the previous three. There is
now way to identify what those three records represent
other than they are some kind of bill. There is no method
to determine which is Time, Travel or Supplies without
outside knowledge of the third-party source database.

To wit, if this were going to become a database used for
data entry, each patient's data would have to be entered
four times to enter Time, Travel and Supplies, then a
fourth record to enter the total. With no independent
field to identify Time, Travel and Supplies, they would
just be the first three records in each patient's billing
recordsets. This isn't what I learned as correct
normalization.

In my mind this table is at very least missing a field for
(sic)BillingType which could then be used as a lookup
field/combo box (in a form) containing the values of Time,
Travel, and Supplies with their associated values listed
in GrossBilling.

Adding a field in a new table to contain BillingType would
not be a problem, the problem is writing the SQL/VB (and
I'm leaning toward thinking it will have to be a VB loop)
to automate populating the appropriate values into the
appropriate fields (e.g., for each PatientID every 1st
field in GrossBilling populates BillingType as Time, every
second field's BillingType becomes Travel, every third,
Supplies, every fourth is dropped; the dollar values can
stay the same unless there is zero charge).

We also have no need to collect Time and Travel charges
for Supplies when there are no associated Time and Travel
charges, and vice versa. (Hence the 8,000+ records, where
there are really only about 2,000 record that remain after
we filter out all the extraneous data, the rest are either
zero charges or totals fields.)

It strikes me that what is needed to correct this is
actually something of the reverse of a crosstab query.

I truly appreciate the effort you've put forth to try to
help. I hope I'm not just being remarkably dense!

Trisha
-----Original Message-----
I thought you wanted to normalize the data. The current table structure is
much more normalized than your desired result. Time, Travel, and Supplies
should be left as values in a field rather than field names in a table.

If, however you need to make the table more of a flat file, you can do this
with a crosstab query.
Set the Row Headings as PtID and Date. Then create a column
ColHead:Choose ([SvcID],"TimeBilling", "TravelBilling",
"SuppliesBilling","TotalBilling")
Set Sum of GrossBilling as the Value.

--
Duane Hookom
MS Access MVP


SvcID PtID Date GrossBilling
1 001 5/1/2003 170
2 001 5/1/2003 20
3 001 5/1/2003 100
4 001 5/1/2003 290

The above is a sample of the current table structure.
Under GrossBilling, SvcID 1 is really TimeBilling, SvcID 2
is TravelBilling, SvcID3 is SuppliesBilling, SvcID4 is
TotalBilling.

The desired result is to eliminate all the TotalBilling
records, and separate the first three values into their
own columns of TimeBilling, TravelBilling and
SuppliesBilling. The result would resemble this
("Billing" is assumed for columns "Time", "Travel"
and "Supplies" to attempt to avoid word wrapping):

SvcID PtID Date Time Travel Supplies
1 001 5/1/2003 170 20 100

I hope this helps clarify the situation.

Thanks,
Trisha


-----Original Message-----
Please take the time to type of paste a sample of
existing records into a
message as well as the desired results.

--
Duane Hookom
MS Access MVP


We are working from a DOS-based flat-file database from
which data is extracted, tables are then created
in
SQL
Server, then linked to Access for report writing.

One table called Staff Services has one field that
contains four fields' worth of data. Its label
is "GrossBilling" and it is populated with fields for
TimeBilling, TravelBilling, SuppliesBilling and
TotalBilling for each patient's ID from the original
database. None of these are labeled, we just finally
figured it out by comparing fields in both databases.
When our analyst extracted what we thought would
be
only
GrossBilling, the other three fields came along
uninvited.

Since we can't find a way to not extract all four
fields,
we're now faced with "de-transposing" the data into a
new
table where every first field appends into TimeBilling,
the second field into TravelBilling, the third into
SuppliesBilling and dropping every fourth field
entirely.
Luckily, they are always in the same order for every
patient.

If anyone has ever had to address this issue and can
point
me toward any documentation on this problem using either
SQL or VB/VBA, I'd appreciate it. There are over 8,000
records in this table (which would drop to around two
thousand if we could normalize it and eliminate the
totals
rows), so doing anything manually is out of the
question.

Much gratitude in advance,
Trisha


.



.


.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top