Compare and consolidating duplicated rows

G

Guest

I have a worksheet that has over 4000 rows and 50 columns of information. An
individual may be listed multiple times. I am looking to compare the
information and if duplicated consolidate the duplicated data to one row.

Here is an example of the worksheet:

colA colB colC colD colE colF colG
JSMITH O M DIST V O 1
JSMITH 1 M O O DIST 1

End result I would like to see:
colA colB colC colD colE colF colG
JSMITH 1 M DIST V DIST 1



If, JSMITH, is listed twice, have formula look at JSMITH's information for
each column and pull to one row, if "O" pull the other value if different.
 
D

Dave Peterson

Try this against a copy of your worksheet (it destroys the original data).

Option Explicit
Sub testme()

Dim wks As Worksheet
Dim iRow As Long
Dim iCol As Long

Dim FirstRow As Long
Dim LastRow As Long
Dim maxColsToCheck As Long

maxColsToCheck = 50

Set wks = Worksheets("sheet1")

With wks
FirstRow = 2 'headers in row 1???
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row

For iRow = LastRow To FirstRow + 1 Step -1
If .Cells(iRow, "A").Value <> .Cells(iRow - 1, "A").Value Then
'do nothing
Else
For iCol = 2 To maxColsToCheck
If UCase(.Cells(iRow - 1, iCol).Value) = "O" Then
.Cells(iRow - 1, iCol).Value = .Cells(iRow, iCol).Value
End If
Next iCol
'delete that duplicate
.Rows(iRow).Delete
End If
Next iRow
End With

End Sub


If you're new to macros, you may want to read David McRitchie's intro at:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
 
G

Guest

Thank you, this was very helpful. Save hours of work. Do I need to make sure
if cell value equals "O" (ALPHA). Would it matter if it was numeric?
 
D

Dave Peterson

Yes. It would matter.

If it's a one time thing, I'd just Edit|replace 0 with O (zero with oh). Make
sure you check "match entire cell contents".

If you have to do it lots of times...

change this:

If UCase(.Cells(iRow - 1, iCol).Value) = "O" Then
.Cells(iRow - 1, iCol).Value = .Cells(iRow, iCol).Value
End If

to:

If UCase(.Cells(iRow - 1, iCol).Value) = "O" _
Or (IsEmpty(.Cells(iRow - 1, iCol).Value) = False _
And .Cells(iRow - 1, iCol).Value = 0) Then
.Cells(iRow - 1, iCol).Value = .Cells(iRow, iCol).Value
End If


===
Blank cells have a value of 0, so the code has to check for that.
 
G

Guest

Thank you for your help.

I also have another worksheet that has a similar issue, it would need to
look at the name to see if it is duplicated and then look at employee type,
employee type may have two to three types: PC, A, O, PL.
What I would need the macro to do is combined to one row, only if it has PL
and PC as employee type, is there additonal VBA code that can be added to the
beginning of the macro you had sent me ? Thank you again for your asssitance,
you have saved me alot of time.
 
D

Dave Peterson

If one row has PL and one row has PC, then do the merge? In all other cases,
don't merge?

And it doesn't matter which one (top or bottom of a pair of rows) has the PL
while the "opposite" one has the PC????

And when the employeecolumn merges, does PL win or PC or what' on top???

I'm guessing that one of the PL/PC wins...

If yes to all that...

Option Explicit
Sub testme()

Dim wks As Worksheet
Dim iRow As Long
Dim iCol As Long

Dim FirstRow As Long
Dim LastRow As Long
Dim maxColsToCheck As Long
Dim EmplTypeCol As Long

maxColsToCheck = 50

Set wks = Worksheets("sheet1")

With wks
'where's the employee type column?
EmplTypeCol = .Range("b1").Column

FirstRow = 2 'headers in row 1???
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row

For iRow = LastRow To FirstRow + 1 Step -1
If .Cells(iRow, "A").Value <> .Cells(iRow - 1, "A").Value Then
'do nothing
Else
If (UCase(.Cells(iRow, EmplTypeCol).Value) = "PC" _
And UCase(.Cells(iRow - 1, EmplTypeCol).Value) = "PL") _
Or (UCase(.Cells(iRow, EmplTypeCol).Value) = "PL" _
And UCase(.Cells(iRow - 1, EmplTypeCol).Value) = "PC") _
Then
For iCol = 2 To maxColsToCheck
If iCol = EmplTypeCol Then
.Cells(iRow - 1, EmplTypeCol).Value _
= "PC" '"PL" ?????
Else
If UCase(.Cells(iRow - 1, iCol).Value) = "O" Then
.Cells(iRow - 1, iCol).Value _
= .Cells(iRow, iCol).Value
End If
End If
Next iCol
'delete that duplicate
.Rows(iRow).Delete
End If
End If
Next iRow
End With

End Sub

Look for some of those ??? where you might, er, probably will have to make
changes.
 
G

Guest

Hi Dave, This in regards to the original inquiry, if there are three rows or
four of the same individual, is it possible to have the module combine to one
row, there are a few cases this does happen.. Also what does the module do if
rows are duplicated and the values equal the same value, may not be "O"

In regards to your questions on my other inquiry:
If one row has PL and one row has PC, then do the merge? Yes

In all other cases,> don't merge? - Correct
And it doesn't matter which one (top or bottom of a pair of rows) has the PL
while the "opposite" one has the PC????

And when the employeecolumn merges, does PL win or PC or what' on top???
Does not matter
I'm guessing that one of the PL/PC wins... Either Or


Also wonder if their is more than one PC or PL type, for one individual will
this still work?
 
D

Dave Peterson

Both of the routines handle each pair of rows separately.

So once you merge a couple rows into one, the next comparison will just do the
newly merged row and the next row.

If rows 3,4,5,6 all had the same key.

old 6 and old 5 are merged into a new 5
new 5 and old 4 are merged into a new 4
new 4 and old 3 are merged into a new 3

But depending on which routine you use, the merges won't occur. (The PC/PL was
more restricting on when merges would occur.)

And the existing data in top row of the comparison will win--unless the "merge"
can occur.

======
If there are more than 2 duplicate keys, you may want to make sure that your
data is sorted the way you want (I have no idea what that is).

Hi Dave, This in regards to the original inquiry, if there are three rows or
four of the same individual, is it possible to have the module combine to one
row, there are a few cases this does happen.. Also what does the module do if
rows are duplicated and the values equal the same value, may not be "O"

In regards to your questions on my other inquiry:
If one row has PL and one row has PC, then do the merge? Yes

In all other cases,> don't merge? - Correct
And it doesn't matter which one (top or bottom of a pair of rows) has the PL
while the "opposite" one has the PC????

And when the employeecolumn merges, does PL win or PC or what' on top???
Does not matter
I'm guessing that one of the PL/PC wins... Either Or

Also wonder if their is more than one PC or PL type, for one individual will
this still work?
 
G

Guest

To understand your response below, Once I use the macro it merges the
duplicate rows, unfortunately if there is an employee with three rows of
information merges one of the rows to the other and still leaves me with two.
I tried rerunning the macro for it to look again for dupklicates and it does
not consolidate the two. Also seems if
I have the following:

Employee AK CA CT CO
JT M M 1 0
JT M O O M

It does not combine the two
to show:
Employee AK CA CT CO
JT M M 1 M

Any ideas?
 
D

Dave Peterson

Can you show the three rows of raw data?

Remember that only those cells with "0" get overwritten, too.

(And I didn't see if you were using the PL/PC version, too.)
To understand your response below, Once I use the macro it merges the
duplicate rows, unfortunately if there is an employee with three rows of
information merges one of the rows to the other and still leaves me with two.
I tried rerunning the macro for it to look again for dupklicates and it does
not consolidate the two. Also seems if
I have the following:

Employee AK CA CT CO
JT M M 1 0
JT M O O M

It does not combine the two
to show:
Employee AK CA CT CO
JT M M 1 M

Any ideas?
 
G

Guest

Thanks Dave, so if we run the Macro and the remaining duplicates would only
be the duplicates that had matching values excluding "O".

Here is the three rows of data, maybe could use another macro to run after
we run the testme macro
Name Lic Type AK CA CO CT DC
DE
JT IA M M O 1
V DIST
JT IA M M O 1
V DIST
JT RA O 0 M 0
0 0



We have not test the PL/PC yet, but plan on trying in next week.. Do you
have an email address we can correspond?
 
D

Dave Peterson

I started with this:

Name Lic Type AK CA CO CT DC DE
JT IA M M O 1 V DIST
JT IA M M O 1 V DIST
JT RA O 0 M 0 0 0

And ended with this (after running the first suggested macro):

Name Lic Type AK CA CO CT DC DE
JT IA M M M 1 V DIST


And keeping the discussion in the newsgroup is better for both of us. If
someone sees a mistake/better way, you'll benefit. If someone does what you
really want, I won't have to do it <vbg>.


Thanks Dave, so if we run the Macro and the remaining duplicates would only
be the duplicates that had matching values excluding "O".

Here is the three rows of data, maybe could use another macro to run after
we run the testme macro
Name Lic Type AK CA CO CT DC
DE
JT IA M M O 1
V DIST
JT IA M M O 1
V DIST
JT RA O 0 M 0
0 0

We have not test the PL/PC yet, but plan on trying in next week.. Do you
have an email address we can correspond?
 
G

Guest

So I should run the macro twice , to get the final result below or do I need
to copy into another workbook and then run for second time ?
 
D

Dave Peterson

I ran the macro once and the 3 lines collapsed to 1.

But until you're happy with the results, save your work first--or use windows
explorer to copy your workbook to a test workbook first.
So I should run the macro twice , to get the final result below or do I need
to copy into another workbook and then run for second time ?
 
G

Guest

Dave, It worked this time, I needed to place the name in column A. In the
original worksheet name was in column D. I am guessing that is why my
"triplicates" were not consolidating.

As I get deeper into this, I have a few more questions.

I see you set the macro up to determine if cell value is = "O" look at other
value if duplicated and to pull value in if different. What if the name was
duplicated and for example value is "M" and the other is"V" is there a way to
flag these types of entries. Maybe a macro that could have conditional
formatting, such as shading the cell and not consolidating the rows. When
these types of records consolidate, the macro made its own determination what
took precedence, can these not be consolidated? I need to catch these types
of scenarios

Would you be able to breakdown what each line in the module stands for?
 
D

Dave Peterson

You could even have it so that if there were two different values in rows that
had duplicate keys that the merged cell would have both values:

M and V (on different rows) would combine to be M,V in one cell.

Personally, I don't like working with colors--it may look nice, but there's not
much you can do with them just using regular worksheet formulas. I'd rather use
a cell (or multiple cells) to do the flagging. Then you can filter on them,
sort by them, and just use them any old way you want.

This was the original version...

Option Explicit
Sub testme()

'declare some variables
'wks is the worksheet that holds the data. If the worksheet name
'changes, then just fix it one spot and you're done.
Dim wks As Worksheet

'a variable that will be used to loop through the rows in that
'worksheet (starting from the bottom and going up.)
Dim iRow As Long

'a variable to loop through the columns
Dim iCol As Long

'just a variable that represents the first row with real data
'Change it once if you have different number of header rows
Dim FirstRow As Long

'the lastrow number on the worksheet
Dim LastRow As Long

'again, a variable to represent the number of columns to check
'change it in one spot if/when it needs to be changed.
Dim maxColsToCheck As Long

'here's that spot!
maxColsToCheck = 50

'fix the worksheet here and forget about it.
Set wks = Worksheets("sheet1")

'by using the with/end with structure, you don't need to type the fully
'qualified object.
With wks
FirstRow = 2 'headers in row 1???

'for instance, this could have been:
'LastRow = wks.Cells(wks.Rows.Count, "A").End(xlUp).Row
'instead of just:
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
'last row is determined by looking at A65536 and hitting
'End|then the up arrow key. Excel stops on that last used cell

'start at the bottom and work to the top
'that's why we have a negative number in the step portion
For iRow = LastRow To FirstRow + 1 Step -1

'say we're on row 25, then this line says
'to look at A25 and compare it with A24 (irow-1, in column
A)
If .Cells(iRow, "A").Value <> .Cells(iRow - 1, "A").Value Then
'if they're different (that's what <> means), then
'don't do anything (the next line is a comment to the reader
'do nothing
Else
'but if A25 = A24, then
'start at column 2 and go all the way to column 50
For iCol = 2 To maxColsToCheck
'and check to see if the previous row = "0"
If UCase(.Cells(iRow - 1, iCol).Value) = "O" Then
'if it is a "o", then
'plop the next row onto the previous row
'E24 is replaced with E25 (say)
.Cells(iRow - 1, iCol).Value = .Cells(iRow, iCol).Value
End If
Next iCol
'after you're done with comparing columns 2-50, delete the row.
'delete that duplicate
.Rows(iRow).Delete
End If
Next iRow
End With

End Sub

The modifications just check to see if either value was PL or PC (if I recall
correctly), then do the work.



Dave, It worked this time, I needed to place the name in column A. In the
original worksheet name was in column D. I am guessing that is why my
"triplicates" were not consolidating.

As I get deeper into this, I have a few more questions.

I see you set the macro up to determine if cell value is = "O" look at other
value if duplicated and to pull value in if different. What if the name was
duplicated and for example value is "M" and the other is"V" is there a way to
flag these types of entries. Maybe a macro that could have conditional
formatting, such as shading the cell and not consolidating the rows. When
these types of records consolidate, the macro made its own determination what
took precedence, can these not be consolidated? I need to catch these types
of scenarios

Would you be able to breakdown what each line in the module stands for?
<<snipped>>
 
G

Guest

We would need some type of indicater that will flag that there is two
different values in one cell that had a duplicate key. Coloring the cell or
something. The merging of the two in one cell can be easily missed by others.
Is there a way we can do this? Unfortunately we do not have a set precedence
for which value is more important so we will need to be flagged for additonal
cleanup in other databases that the information came from
 
D

Dave Peterson

Yeah, it could be done, but I'm kind of confused over what the current rules
are.

Can you restate them?
We would need some type of indicater that will flag that there is two
different values in one cell that had a duplicate key. Coloring the cell or
something. The merging of the two in one cell can be easily missed by others.
Is there a way we can do this? Unfortunately we do not have a set precedence
for which value is more important so we will need to be flagged for additonal
cleanup in other databases that the information came from
<<snipped>>
 
G

Guest

If a row has a duplicate name and values have two different values to merge
the two and indicate a color in that cell, if "O" than take in other value
 
D

Dave Peterson

Try this against a copy--just in case. It worked ok in minor testing.

Option Explicit
Sub testme()

Dim wks As Worksheet
Dim iRow As Long
Dim iCol As Long

Dim FirstRow As Long
Dim LastRow As Long
Dim maxColsToCheck As Long

Dim myColorIndex As Long

myColorIndex = 3 'red

maxColsToCheck = 50

Set wks = Worksheets("sheet1")

With wks
FirstRow = 2 'headers in row 1???
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row

For iRow = LastRow To FirstRow + 1 Step -1
If .Cells(iRow, "A").Value <> .Cells(iRow - 1, "A").Value Then
'do nothing
Else
For iCol = 2 To maxColsToCheck
With .Cells(iRow - 1, iCol)
If UCase(.Value) = "O" Then
.Value = .Parent.Cells(iRow, iCol).Value
.Interior.ColorIndex _
= .Parent.Cells(iRow, iCol).Interior.ColorIndex
Else
If InStr(1, .Parent.Cells(iRow, iCol).Value, _
.Value, vbTextCompare) > 0 Then
'do nothing, string already there
Else
.Value = .Value & "," _
& .Parent.Cells(iRow, iCol).Value
.Interior.ColorIndex = myColorIndex

End If
End If
End With
Next iCol
'delete that duplicate
.Rows(iRow).Delete
End If
Next iRow
End With

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top