Data Transformation

Z

zaisaki

Hi All,
I have a data set with variables id, a, b, c, d and looks like:


1 2 3 5 1
1 5 6 5 3
1 2 4 2 4
2 1 3 4 5
2 5 6 2 3
2 4 5 2 1


I would like to summarize data for each id and create a new variable
representing each code occuring in variables a, b and c.


Essentially the new data set should have variables id, v1 - v6.
v1 represents the number of times code 1 occurs in variables a, b and

in the original data set, and so on for v2 - v6.


The data set would like;
1 1 3 2 2 3 1
2 2 2 2 2 1 1


Your help is much appreciated.


Thanks very much,
Zai Sak
 
T

Tom Ogilvy

Sub AB()
Dim rng As Range
Dim cell As Range
Dim arr() As Long
Dim ub As Long
Set rng = Range("A1").CurrentRegion.Resize(, 4).Cells
ub = Application.Max(rng.Columns(1))
ReDim arr(1 To ub, 0 To 6)
For Each cell In rng
If cell.Column = 1 Then
Id = cell.Value
arr(Id, 0) = Id
Else
arr(Id, cell.Value) = arr(Id, cell.Value) + 1
End If
Next
Range("H1").Resize(ub, 7).Value = arr

End Sub
 
H

Harlan Grove

zaisaki said:
I have a data set with variables id, a, b, c, d and looks like:

1 2 3 5 1
1 5 6 5 3
1 2 4 2 4
2 1 3 4 5
2 5 6 2 3
2 4 5 2 1

I would like to summarize data for each id and create a new variable
representing each code occuring in variables a, b and c.


Essentially the new data set should have variables id, v1 - v6.
v1 represents the number of times code 1 occurs in variables a, b and c

in the original data set, and so on for v2 - v6.

The data set would like;
1 1 3 2 2 3 1
2 2 2 2 2 1 1
....

Given your source data, neither your 1st or 2nd result rows are consistent
with counting only fields a-c. The 1st row is consistent with counting
fields a-d, the 2nd isn't consistent with any set of fields since the 2nd
through 7th columns total 10, which isn't a multiple of 3 (the number of
rows in your source data with ID=2). If you meant to count fields a-d, your
second row should be

2 2 2 2 2 3 1

You could do this entirely with formulas, or a pivot table if you add field
names. As for formulas, if your source data was in a range named TBL, and
the top-left result cell were G5, try

G5:
=INDEX(TBL,1,1)

G6 [array formula]:
=INDEX(TBL,MATCH(0,COUNTIF(G$5:G5,INDEX(TBL,0,1)),0),1)

H5:M5 [array formula]:
=MMULT(COLUMN(INDIRECT("RC1:RC"&COUNTIF(INDEX(TBL,0,1),$G5),0))^0,
COUNTIF(OFFSET(TBL,SMALL(IF(INDEX(TBL,0,1)=$G5,ROW(TBL)),
ROW(INDIRECT("1:"&COUNTIF(INDEX(TBL,0,1),$G5))))-ROW(INDEX(TBL,1,1)),1,1,
COLUMNS(TBL)-1),{1,2,3,4,5,6}))

Select H5:M5 and fill down into H6:M6. Then select G6:M6 and fill down as
needed.

An alternative VBA approach that ignores values in fields a-d that aren't
integers between 1 and 6 would be


Sub foo()
'requires reference to Microsoft Scripting Runtime

Dim dk As New Dictionary, dv As New Dictionary
Dim sv As Variant, rv As Variant
Dim i As Long, j As Long, k As Long, x As Variant

k = 0
'modify next line as needed
For Each x In Array(1, 2, 3, 4, 5, 6)
k = k + 1
dv.Add Key:=x, Item:=k
Next x

k = 16
ReDim rv(0 To dv.Count, 1 To k)

sv = Range("A1").CurrentRegion.Value

For i = 1 To UBound(sv, 1)
If Not dk.Exists(sv(i, 1)) Then
dk.Add Key:=sv(i, 1), Item:=dk.Count + 1
rv(0, dk.Count) = sv(i, 1)
End If

If dk.Count >= k Then
k = 2 * k
ReDim Preserve rv(0 To dv.Count, 1 To k)
End If

For j = 2 To UBound(sv, 2)
If dv.Exists(sv(i, j)) Then _
rv(dv(sv(i, j)), dk(sv(i, 1))) = rv(dv(sv(i, j)), dk(sv(i, 1))) + 1
Next j
Next i

ReDim Preserve rv(0 To dv.Count, 1 To dk.Count)

ActiveCell.Resize(dk.Count, dv.Count + 1).Value = _
Application.WorksheetFunction.Transpose(rv)

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top