PC Review


Reply
Thread Tools Rate Thread

Comparing Rows

 
 
Monomeeth
Guest
Posts: n/a
 
      6th Oct 2009
Hello

I need two macros to do as follows:

ONE
I need a macro which will check every row in a worksheet and if it finds two
or more duplicate rows (i.e. every cell in the row is duplicated), it will
then proceed to delete all duplicate rows except the first duplicate row.

TWO
I need a macro which will check every row in a worksheet and if it finds two
or more rows with the exact same value in a particular column, it will then
proceed to delete all duplicate rows except the first duplicate row. I would
prefer the user to have the ability to specify the column they want
interrogated.

Any suggestions would be greatly appreciated!

Thanks,

Joe.

--
If you can measure it, you can improve it!
 
Reply With Quote
 
 
 
 
Joel
Guest
Posts: n/a
 
      6th Oct 2009
I have Few questions

1) Is it acceptable to sort the worksheet in the macro?

2) For macro ONE what is the last column?

3) Will the last column always be the same when you run the macro?

4) Does every Row have data filed to the last column.

What I propose for macro ONE is to sort every column of the worksheet
starting with the last column and moving towards the first column. Then
comparing adjacent rows for matches.

"Monomeeth" wrote:

> Hello
>
> I need two macros to do as follows:
>
> ONE
> I need a macro which will check every row in a worksheet and if it finds two
> or more duplicate rows (i.e. every cell in the row is duplicated), it will
> then proceed to delete all duplicate rows except the first duplicate row.
>
> TWO
> I need a macro which will check every row in a worksheet and if it finds two
> or more rows with the exact same value in a particular column, it will then
> proceed to delete all duplicate rows except the first duplicate row. I would
> prefer the user to have the ability to specify the column they want
> interrogated.
>
> Any suggestions would be greatly appreciated!
>
> Thanks,
>
> Joe.
>
> --
> If you can measure it, you can improve it!

 
Reply With Quote
 
Monomeeth
Guest
Posts: n/a
 
      7th Oct 2009
Hi Joel

Thanks very much for your help. My answers are as follows:

1) Is it acceptable to sort the worksheet in the macro?

Yes.

2) For macro ONE what is the last column?

This will be different as I want to be able to run the macro on various
spreadsheets. Although I suppose you could set the last column as IV (using
Excel 2003) and therefore it wouldn't matter as some cells may be blank.

3) Will the last column always be the same when you run the macro?

No. See answer to 2 above.

4) Does every Row have data filed to the last column.

Not necessarily. Some cells could be blank.


At the risk of providing too much info, here is some context for you:

BACKGROUND
Basically I am trying to remove duplicate rows from worksheets which import
data from various online forms submitted by users. Unfortunately, sometimes
users submit the exact same data multiple times (for whatever reason) and
this causes problems when we need to do an analysis of the collated data. So
I want to be able to remove duplicate rows where every cell in the row is
exactly the same as every corresponding cell within another row, and some of
these cells could be blank.

As for Macro TWO, I need this for another reason altogether. Last column is
not constant and yes there may be blank cells within the row.

JUST A THOUGHT
I suppose you could have the one macro do both jobs by designing it so that
the user can specify a range to interrogate - the user could select a single
column or a group of columns - but I don't know whether that would complicate
things as I don't want the user to specify WHAT the macro is actually looking
for, only WHERE to look for duplicates.

EXAMPLE

I may have a worksheet with a data range of A1:AG25000. Some cells within
this range may be blank for whatever reason. Now, I may want to remove any
duplicate rows where data in columns A to AG are an exact match for the
corresponding cells.

That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
the same as B237 and B767, and so on for every column in those rows,
regardless of whether the match happens to contain data or be blank - hence
why you could go to column IV to do your interrogation. In this scenario I
want to only be left with unique rows after the macro is finished, so only
row 123 would remain while rows 237 and 767 are deleted.

Now, I may have another spreadsheet where I want to achieve the same thing,
but only want to check a single column for duplicates. For instance, I may
have 25000 rows in a spreadsheet containing employee data, but there may only
be 18000 employees in total, so I should only have 18000 rows not 25000. So,
if Column E contained a unique identifier such as employee numbers, then I
would want the macro to only interrogate that column and remove rows based
only on the duplicated data in row E regardless of whether the other cells in
the row were duplicated. In this scenario I only want to keep the first row
conatining that unique identifier.

Hope this makes sense and that I haven't confused you!



Once again, I appreciate your help and time with this.

Joe.
--
If you can measure it, you can improve it!


"Joel" wrote:

> I have Few questions
>
> 1) Is it acceptable to sort the worksheet in the macro?
>
> 2) For macro ONE what is the last column?
>
> 3) Will the last column always be the same when you run the macro?
>
> 4) Does every Row have data filed to the last column.
>
> What I propose for macro ONE is to sort every column of the worksheet
> starting with the last column and moving towards the first column. Then
> comparing adjacent rows for matches.
>
> "Monomeeth" wrote:
>
> > Hello
> >
> > I need two macros to do as follows:
> >
> > ONE
> > I need a macro which will check every row in a worksheet and if it finds two
> > or more duplicate rows (i.e. every cell in the row is duplicated), it will
> > then proceed to delete all duplicate rows except the first duplicate row.
> >
> > TWO
> > I need a macro which will check every row in a worksheet and if it finds two
> > or more rows with the exact same value in a particular column, it will then
> > proceed to delete all duplicate rows except the first duplicate row. I would
> > prefer the user to have the ability to specify the column they want
> > interrogated.
> >
> > Any suggestions would be greatly appreciated!
> >
> > Thanks,
> >
> > Joe.
> >
> > --
> > If you can measure it, you can improve it!

 
Reply With Quote
 
Joel
Guest
Posts: n/a
 
      7th Oct 2009
The code assumes there is a header row in both cases. I asked the questions
becasue I want to keep the code as simple as possible and to make it run
quicker. Becasue the columns lengths varied I had to check every row for the
last column.

To make the code run faster instead of deleting rows one at a time. I
placed an X in column IV for rows that needed to be deleted. Then filtered
the X's using Autofilter and removed the rows with the X's. See comments in
the code.

Sub RemoveDuplicatesAllColumns()

'clear columnn IV incase data ther is data from last run
Columns("IV").Delete

'find last row
'use column A to determine last row
LastRow = Range("A" & Rows.Count).End(xlUp).Row

'find last column
'check every row to determine last column
EndColumn = 0
For RowCount = 1 To LastRow
LastColumn = Cells(RowCount).End(xlUp).Column
If LastColumn > EndColumn Then
EndColumn = LastColumn
End If
Next RowCount


'sort cells three columns at a time
For ColCount = 1 To EndColumn Step 3
Rows("1:" & LastRow).Sort _
header:=xlYes, _
key1:=Cells(1, ColCount), _
order1:=xlAscending, _
key2:=Cells(1, ColCount + 1), _
order2:=xlAscending, _
key3:=Cells(1, ColCount + 1), _
order3:=xlAscending

Next ColCount

'compare rows putting a X in column IV where duplicates exist
RowCount = 2
Match = False 'use to indicate that last compared rows
'either matched or didn't match
Do While RowCount < LastRow
If Match = False Then
CompareRow = RowCount + 1
Else
CompareRow = CompareRow + 1
End If
'check if column lengths are equal if not skip
LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
Match = True
If LastCol = LastCompareCol Then
For ColCount = 1 To LastCol
If Cells(RowCount, ColCount) <> _
Cells(CompareRow, ColCount) Then

Match = False
Exit For
End If
Next ColCount
If Match = True Then
Range("IV" & CompareRow) = "X"
End If
Else
Match = False
RowCount = RowCount + 1
End If
Loop


'remove rows with X's in column IV
Set c = Columns("IV").Find(what:="X", _
LookIn:=xlValues, lookat:=xlWhole)
'only filer if at least one X is found
If Not c Is Nothing Then
Columns("IV").AutoFilter Field:=1, Criteria1:="X"
Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
Columns("IV").Delete
End If

End Sub

Sub RemoveDuplicatesOneColumn()

'clear columnn IV incase data ther is data from last run
Columns("IV").Delete

'select column
Set Selected = Application.InputBox(Prompt:="Select one column", _
Title:="Select one column", Type:=8)

SelectCol = Selected.Column

'find last row
'use column A to determine last row
LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row


'sort by column selected
Rows("1:" & LastRow).Sort _
header:=xlYes, _
key1:=Cells(1, SelectCol), _
order1:=xlAscending

'compare rows putting a X in column IV where duplicates exist
RowCount = 2
Match = False 'use to indicate that last compared rows
'either matched or didn't match
Do While RowCount < LastRow
If Match = False Then
CompareRow = RowCount + 1
Else
CompareRow = CompareRow + 1
End If
If Cells(RowCount, SelectCol) = _
Cells(CompareRow, SelectCol) Then

Range("IV" & CompareRow) = "X"
Match = True
Else
Match = False
RowCount = RowCount + 1
End If
Loop

'remove rows with X's in column IV
Set c = Columns("IV").Find(what:="X", _
LookIn:=xlValues, lookat:=xlWhole)
'only filer if at least one X is found
If Not c Is Nothing Then
Columns("IV").AutoFilter Field:=1, Criteria1:="X"
Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
Columns("IV").Delete
End If

End Sub




"Monomeeth" wrote:

> Hi Joel
>
> Thanks very much for your help. My answers are as follows:
>
> 1) Is it acceptable to sort the worksheet in the macro?
>
> Yes.
>
> 2) For macro ONE what is the last column?
>
> This will be different as I want to be able to run the macro on various
> spreadsheets. Although I suppose you could set the last column as IV (using
> Excel 2003) and therefore it wouldn't matter as some cells may be blank.
>
> 3) Will the last column always be the same when you run the macro?
>
> No. See answer to 2 above.
>
> 4) Does every Row have data filed to the last column.
>
> Not necessarily. Some cells could be blank.
>
>
> At the risk of providing too much info, here is some context for you:
>
> BACKGROUND
> Basically I am trying to remove duplicate rows from worksheets which import
> data from various online forms submitted by users. Unfortunately, sometimes
> users submit the exact same data multiple times (for whatever reason) and
> this causes problems when we need to do an analysis of the collated data. So
> I want to be able to remove duplicate rows where every cell in the row is
> exactly the same as every corresponding cell within another row, and some of
> these cells could be blank.
>
> As for Macro TWO, I need this for another reason altogether. Last column is
> not constant and yes there may be blank cells within the row.
>
> JUST A THOUGHT
> I suppose you could have the one macro do both jobs by designing it so that
> the user can specify a range to interrogate - the user could select a single
> column or a group of columns - but I don't know whether that would complicate
> things as I don't want the user to specify WHAT the macro is actually looking
> for, only WHERE to look for duplicates.
>
> EXAMPLE
>
> I may have a worksheet with a data range of A1:AG25000. Some cells within
> this range may be blank for whatever reason. Now, I may want to remove any
> duplicate rows where data in columns A to AG are an exact match for the
> corresponding cells.
>
> That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
> the same as B237 and B767, and so on for every column in those rows,
> regardless of whether the match happens to contain data or be blank - hence
> why you could go to column IV to do your interrogation. In this scenario I
> want to only be left with unique rows after the macro is finished, so only
> row 123 would remain while rows 237 and 767 are deleted.
>
> Now, I may have another spreadsheet where I want to achieve the same thing,
> but only want to check a single column for duplicates. For instance, I may
> have 25000 rows in a spreadsheet containing employee data, but there may only
> be 18000 employees in total, so I should only have 18000 rows not 25000. So,
> if Column E contained a unique identifier such as employee numbers, then I
> would want the macro to only interrogate that column and remove rows based
> only on the duplicated data in row E regardless of whether the other cells in
> the row were duplicated. In this scenario I only want to keep the first row
> conatining that unique identifier.
>
> Hope this makes sense and that I haven't confused you!
>
>
>
> Once again, I appreciate your help and time with this.
>
> Joe.
> --
> If you can measure it, you can improve it!
>
>
> "Joel" wrote:
>
> > I have Few questions
> >
> > 1) Is it acceptable to sort the worksheet in the macro?
> >
> > 2) For macro ONE what is the last column?
> >
> > 3) Will the last column always be the same when you run the macro?
> >
> > 4) Does every Row have data filed to the last column.
> >
> > What I propose for macro ONE is to sort every column of the worksheet
> > starting with the last column and moving towards the first column. Then
> > comparing adjacent rows for matches.
> >
> > "Monomeeth" wrote:
> >
> > > Hello
> > >
> > > I need two macros to do as follows:
> > >
> > > ONE
> > > I need a macro which will check every row in a worksheet and if it finds two
> > > or more duplicate rows (i.e. every cell in the row is duplicated), it will
> > > then proceed to delete all duplicate rows except the first duplicate row.
> > >
> > > TWO
> > > I need a macro which will check every row in a worksheet and if it finds two
> > > or more rows with the exact same value in a particular column, it will then
> > > proceed to delete all duplicate rows except the first duplicate row. I would
> > > prefer the user to have the ability to specify the column they want
> > > interrogated.
> > >
> > > Any suggestions would be greatly appreciated!
> > >
> > > Thanks,
> > >
> > > Joe.
> > >
> > > --
> > > If you can measure it, you can improve it!

 
Reply With Quote
 
Monomeeth
Guest
Posts: n/a
 
      8th Oct 2009
Hi Joel

Thank you very much for yoiur continued help. I really appreciate it. I also
like the way you have chosen to approach this.

I decided to test both your macros by doing a straight copy and paste as is.
I chose a worksheet which has data in the range A1:AL29420. The results were
as follows:

REMOVE DUPLICATES ALL COLUMNS
Macro seemed to run fine as it was obviously doing something. However, after
a few minutes I got a strange error message from Visual Basic. All it said
was "400". Nothing else, the only options I had was an OK button and a Help
button, but when I clicked on Help nothing seemed to happen.

REMOVE DUPLICATES ONE COLUMN
Macro started fine - I was able to select the column and get it running.
However, I waited for an hour and the Macro seemed to still be running. There
were no error messages, but Excel said it was not responding, while Visual
Basic Editor said it was still running.

Sorry, I can't give any further clues.

Now that I'm thinking about it, I will try these macros on much smaller
worksheets. I will post back shortly with the results.

Thanks Joel.


--
If you can measure it, you can improve it!





"Joel" wrote:

> The code assumes there is a header row in both cases. I asked the questions
> becasue I want to keep the code as simple as possible and to make it run
> quicker. Becasue the columns lengths varied I had to check every row for the
> last column.
>
> To make the code run faster instead of deleting rows one at a time. I
> placed an X in column IV for rows that needed to be deleted. Then filtered
> the X's using Autofilter and removed the rows with the X's. See comments in
> the code.
>
> Sub RemoveDuplicatesAllColumns()
>
> 'clear columnn IV incase data ther is data from last run
> Columns("IV").Delete
>
> 'find last row
> 'use column A to determine last row
> LastRow = Range("A" & Rows.Count).End(xlUp).Row
>
> 'find last column
> 'check every row to determine last column
> EndColumn = 0
> For RowCount = 1 To LastRow
> LastColumn = Cells(RowCount).End(xlUp).Column
> If LastColumn > EndColumn Then
> EndColumn = LastColumn
> End If
> Next RowCount
>
>
> 'sort cells three columns at a time
> For ColCount = 1 To EndColumn Step 3
> Rows("1:" & LastRow).Sort _
> header:=xlYes, _
> key1:=Cells(1, ColCount), _
> order1:=xlAscending, _
> key2:=Cells(1, ColCount + 1), _
> order2:=xlAscending, _
> key3:=Cells(1, ColCount + 1), _
> order3:=xlAscending
>
> Next ColCount
>
> 'compare rows putting a X in column IV where duplicates exist
> RowCount = 2
> Match = False 'use to indicate that last compared rows
> 'either matched or didn't match
> Do While RowCount < LastRow
> If Match = False Then
> CompareRow = RowCount + 1
> Else
> CompareRow = CompareRow + 1
> End If
> 'check if column lengths are equal if not skip
> LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
> LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
> Match = True
> If LastCol = LastCompareCol Then
> For ColCount = 1 To LastCol
> If Cells(RowCount, ColCount) <> _
> Cells(CompareRow, ColCount) Then
>
> Match = False
> Exit For
> End If
> Next ColCount
> If Match = True Then
> Range("IV" & CompareRow) = "X"
> End If
> Else
> Match = False
> RowCount = RowCount + 1
> End If
> Loop
>
>
> 'remove rows with X's in column IV
> Set c = Columns("IV").Find(what:="X", _
> LookIn:=xlValues, lookat:=xlWhole)
> 'only filer if at least one X is found
> If Not c Is Nothing Then
> Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> Columns("IV").Delete
> End If
>
> End Sub
>
> Sub RemoveDuplicatesOneColumn()
>
> 'clear columnn IV incase data ther is data from last run
> Columns("IV").Delete
>
> 'select column
> Set Selected = Application.InputBox(Prompt:="Select one column", _
> Title:="Select one column", Type:=8)
>
> SelectCol = Selected.Column
>
> 'find last row
> 'use column A to determine last row
> LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row
>
>
> 'sort by column selected
> Rows("1:" & LastRow).Sort _
> header:=xlYes, _
> key1:=Cells(1, SelectCol), _
> order1:=xlAscending
>
> 'compare rows putting a X in column IV where duplicates exist
> RowCount = 2
> Match = False 'use to indicate that last compared rows
> 'either matched or didn't match
> Do While RowCount < LastRow
> If Match = False Then
> CompareRow = RowCount + 1
> Else
> CompareRow = CompareRow + 1
> End If
> If Cells(RowCount, SelectCol) = _
> Cells(CompareRow, SelectCol) Then
>
> Range("IV" & CompareRow) = "X"
> Match = True
> Else
> Match = False
> RowCount = RowCount + 1
> End If
> Loop
>
> 'remove rows with X's in column IV
> Set c = Columns("IV").Find(what:="X", _
> LookIn:=xlValues, lookat:=xlWhole)
> 'only filer if at least one X is found
> If Not c Is Nothing Then
> Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> Columns("IV").Delete
> End If
>
> End Sub
>
>
>
>
> "Monomeeth" wrote:
>
> > Hi Joel
> >
> > Thanks very much for your help. My answers are as follows:
> >
> > 1) Is it acceptable to sort the worksheet in the macro?
> >
> > Yes.
> >
> > 2) For macro ONE what is the last column?
> >
> > This will be different as I want to be able to run the macro on various
> > spreadsheets. Although I suppose you could set the last column as IV (using
> > Excel 2003) and therefore it wouldn't matter as some cells may be blank.
> >
> > 3) Will the last column always be the same when you run the macro?
> >
> > No. See answer to 2 above.
> >
> > 4) Does every Row have data filed to the last column.
> >
> > Not necessarily. Some cells could be blank.
> >
> >
> > At the risk of providing too much info, here is some context for you:
> >
> > BACKGROUND
> > Basically I am trying to remove duplicate rows from worksheets which import
> > data from various online forms submitted by users. Unfortunately, sometimes
> > users submit the exact same data multiple times (for whatever reason) and
> > this causes problems when we need to do an analysis of the collated data. So
> > I want to be able to remove duplicate rows where every cell in the row is
> > exactly the same as every corresponding cell within another row, and some of
> > these cells could be blank.
> >
> > As for Macro TWO, I need this for another reason altogether. Last column is
> > not constant and yes there may be blank cells within the row.
> >
> > JUST A THOUGHT
> > I suppose you could have the one macro do both jobs by designing it so that
> > the user can specify a range to interrogate - the user could select a single
> > column or a group of columns - but I don't know whether that would complicate
> > things as I don't want the user to specify WHAT the macro is actually looking
> > for, only WHERE to look for duplicates.
> >
> > EXAMPLE
> >
> > I may have a worksheet with a data range of A1:AG25000. Some cells within
> > this range may be blank for whatever reason. Now, I may want to remove any
> > duplicate rows where data in columns A to AG are an exact match for the
> > corresponding cells.
> >
> > That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
> > the same as B237 and B767, and so on for every column in those rows,
> > regardless of whether the match happens to contain data or be blank - hence
> > why you could go to column IV to do your interrogation. In this scenario I
> > want to only be left with unique rows after the macro is finished, so only
> > row 123 would remain while rows 237 and 767 are deleted.
> >
> > Now, I may have another spreadsheet where I want to achieve the same thing,
> > but only want to check a single column for duplicates. For instance, I may
> > have 25000 rows in a spreadsheet containing employee data, but there may only
> > be 18000 employees in total, so I should only have 18000 rows not 25000. So,
> > if Column E contained a unique identifier such as employee numbers, then I
> > would want the macro to only interrogate that column and remove rows based
> > only on the duplicated data in row E regardless of whether the other cells in
> > the row were duplicated. In this scenario I only want to keep the first row
> > conatining that unique identifier.
> >
> > Hope this makes sense and that I haven't confused you!
> >
> >
> >
> > Once again, I appreciate your help and time with this.
> >
> > Joe.
> > --
> > If you can measure it, you can improve it!
> >
> >
> > "Joel" wrote:
> >
> > > I have Few questions
> > >
> > > 1) Is it acceptable to sort the worksheet in the macro?
> > >
> > > 2) For macro ONE what is the last column?
> > >
> > > 3) Will the last column always be the same when you run the macro?
> > >
> > > 4) Does every Row have data filed to the last column.
> > >
> > > What I propose for macro ONE is to sort every column of the worksheet
> > > starting with the last column and moving towards the first column. Then
> > > comparing adjacent rows for matches.
> > >
> > > "Monomeeth" wrote:
> > >
> > > > Hello
> > > >
> > > > I need two macros to do as follows:
> > > >
> > > > ONE
> > > > I need a macro which will check every row in a worksheet and if it finds two
> > > > or more duplicate rows (i.e. every cell in the row is duplicated), it will
> > > > then proceed to delete all duplicate rows except the first duplicate row.
> > > >
> > > > TWO
> > > > I need a macro which will check every row in a worksheet and if it finds two
> > > > or more rows with the exact same value in a particular column, it will then
> > > > proceed to delete all duplicate rows except the first duplicate row. I would
> > > > prefer the user to have the ability to specify the column they want
> > > > interrogated.
> > > >
> > > > Any suggestions would be greatly appreciated!
> > > >
> > > > Thanks,
> > > >
> > > > Joe.
> > > >
> > > > --
> > > > If you can measure it, you can improve it!

 
Reply With Quote
 
Monomeeth
Guest
Posts: n/a
 
      8th Oct 2009
Hi Joel

Okay, retested the macro on a test worksheet I created. This worksheet had
data in the range A1:AC543. Oh, and yes, the first row was a header row.

Results are as follows:

REMOVE DUPLICATES ALL COLUMNS
Okay, same problem as before. It didn't take long to get the Visual Basic
error message "400". When I click on Help all I get is a blank white window.
I did notice, however, that the worksheet had been sorted - so the error
occurs sometime after the sort is done. However, the duplicate rows had not
beed deleted.

I did try to find out what the error 400 was about, but all I found was some
info relating to showing forms which are already visible - but in this case
we there is no form.

Anyway, hope this helps to narrow the proble down.

REMOVE DUPLICATES ONE COLUMN
Success! This worked. I deliberately designed the test worksheet to have
duplicate rows - both in terms of every column and in terms of a single
column. The macro went through fairly quickly and everything was A-OK.

Thanks for this Joel - I will just have to bear in mind it may not work on
larger worksheets - or take care to run it when the computer isn't doing
anything else in case it's a memory issue.

Now all I have to do is sort out the first Macro.

Thanks again for your help!



--
If you can measure it, you can improve it!


"Monomeeth" wrote:

> Hi Joel
>
> Thank you very much for yoiur continued help. I really appreciate it. I also
> like the way you have chosen to approach this.
>
> I decided to test both your macros by doing a straight copy and paste as is.
> I chose a worksheet which has data in the range A1:AL29420. The results were
> as follows:
>
> REMOVE DUPLICATES ALL COLUMNS
> Macro seemed to run fine as it was obviously doing something. However, after
> a few minutes I got a strange error message from Visual Basic. All it said
> was "400". Nothing else, the only options I had was an OK button and a Help
> button, but when I clicked on Help nothing seemed to happen.
>
> REMOVE DUPLICATES ONE COLUMN
> Macro started fine - I was able to select the column and get it running.
> However, I waited for an hour and the Macro seemed to still be running. There
> were no error messages, but Excel said it was not responding, while Visual
> Basic Editor said it was still running.
>
> Sorry, I can't give any further clues.
>
> Now that I'm thinking about it, I will try these macros on much smaller
> worksheets. I will post back shortly with the results.
>
> Thanks Joel.
>
>
> --
> If you can measure it, you can improve it!
>
>
>
>
>
> "Joel" wrote:
>
> > The code assumes there is a header row in both cases. I asked the questions
> > becasue I want to keep the code as simple as possible and to make it run
> > quicker. Becasue the columns lengths varied I had to check every row for the
> > last column.
> >
> > To make the code run faster instead of deleting rows one at a time. I
> > placed an X in column IV for rows that needed to be deleted. Then filtered
> > the X's using Autofilter and removed the rows with the X's. See comments in
> > the code.
> >
> > Sub RemoveDuplicatesAllColumns()
> >
> > 'clear columnn IV incase data ther is data from last run
> > Columns("IV").Delete
> >
> > 'find last row
> > 'use column A to determine last row
> > LastRow = Range("A" & Rows.Count).End(xlUp).Row
> >
> > 'find last column
> > 'check every row to determine last column
> > EndColumn = 0
> > For RowCount = 1 To LastRow
> > LastColumn = Cells(RowCount).End(xlUp).Column
> > If LastColumn > EndColumn Then
> > EndColumn = LastColumn
> > End If
> > Next RowCount
> >
> >
> > 'sort cells three columns at a time
> > For ColCount = 1 To EndColumn Step 3
> > Rows("1:" & LastRow).Sort _
> > header:=xlYes, _
> > key1:=Cells(1, ColCount), _
> > order1:=xlAscending, _
> > key2:=Cells(1, ColCount + 1), _
> > order2:=xlAscending, _
> > key3:=Cells(1, ColCount + 1), _
> > order3:=xlAscending
> >
> > Next ColCount
> >
> > 'compare rows putting a X in column IV where duplicates exist
> > RowCount = 2
> > Match = False 'use to indicate that last compared rows
> > 'either matched or didn't match
> > Do While RowCount < LastRow
> > If Match = False Then
> > CompareRow = RowCount + 1
> > Else
> > CompareRow = CompareRow + 1
> > End If
> > 'check if column lengths are equal if not skip
> > LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
> > LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
> > Match = True
> > If LastCol = LastCompareCol Then
> > For ColCount = 1 To LastCol
> > If Cells(RowCount, ColCount) <> _
> > Cells(CompareRow, ColCount) Then
> >
> > Match = False
> > Exit For
> > End If
> > Next ColCount
> > If Match = True Then
> > Range("IV" & CompareRow) = "X"
> > End If
> > Else
> > Match = False
> > RowCount = RowCount + 1
> > End If
> > Loop
> >
> >
> > 'remove rows with X's in column IV
> > Set c = Columns("IV").Find(what:="X", _
> > LookIn:=xlValues, lookat:=xlWhole)
> > 'only filer if at least one X is found
> > If Not c Is Nothing Then
> > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > Columns("IV").Delete
> > End If
> >
> > End Sub
> >
> > Sub RemoveDuplicatesOneColumn()
> >
> > 'clear columnn IV incase data ther is data from last run
> > Columns("IV").Delete
> >
> > 'select column
> > Set Selected = Application.InputBox(Prompt:="Select one column", _
> > Title:="Select one column", Type:=8)
> >
> > SelectCol = Selected.Column
> >
> > 'find last row
> > 'use column A to determine last row
> > LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row
> >
> >
> > 'sort by column selected
> > Rows("1:" & LastRow).Sort _
> > header:=xlYes, _
> > key1:=Cells(1, SelectCol), _
> > order1:=xlAscending
> >
> > 'compare rows putting a X in column IV where duplicates exist
> > RowCount = 2
> > Match = False 'use to indicate that last compared rows
> > 'either matched or didn't match
> > Do While RowCount < LastRow
> > If Match = False Then
> > CompareRow = RowCount + 1
> > Else
> > CompareRow = CompareRow + 1
> > End If
> > If Cells(RowCount, SelectCol) = _
> > Cells(CompareRow, SelectCol) Then
> >
> > Range("IV" & CompareRow) = "X"
> > Match = True
> > Else
> > Match = False
> > RowCount = RowCount + 1
> > End If
> > Loop
> >
> > 'remove rows with X's in column IV
> > Set c = Columns("IV").Find(what:="X", _
> > LookIn:=xlValues, lookat:=xlWhole)
> > 'only filer if at least one X is found
> > If Not c Is Nothing Then
> > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > Columns("IV").Delete
> > End If
> >
> > End Sub
> >
> >
> >
> >
> > "Monomeeth" wrote:
> >
> > > Hi Joel
> > >
> > > Thanks very much for your help. My answers are as follows:
> > >
> > > 1) Is it acceptable to sort the worksheet in the macro?
> > >
> > > Yes.
> > >
> > > 2) For macro ONE what is the last column?
> > >
> > > This will be different as I want to be able to run the macro on various
> > > spreadsheets. Although I suppose you could set the last column as IV (using
> > > Excel 2003) and therefore it wouldn't matter as some cells may be blank.
> > >
> > > 3) Will the last column always be the same when you run the macro?
> > >
> > > No. See answer to 2 above.
> > >
> > > 4) Does every Row have data filed to the last column.
> > >
> > > Not necessarily. Some cells could be blank.
> > >
> > >
> > > At the risk of providing too much info, here is some context for you:
> > >
> > > BACKGROUND
> > > Basically I am trying to remove duplicate rows from worksheets which import
> > > data from various online forms submitted by users. Unfortunately, sometimes
> > > users submit the exact same data multiple times (for whatever reason) and
> > > this causes problems when we need to do an analysis of the collated data. So
> > > I want to be able to remove duplicate rows where every cell in the row is
> > > exactly the same as every corresponding cell within another row, and some of
> > > these cells could be blank.
> > >
> > > As for Macro TWO, I need this for another reason altogether. Last column is
> > > not constant and yes there may be blank cells within the row.
> > >
> > > JUST A THOUGHT
> > > I suppose you could have the one macro do both jobs by designing it so that
> > > the user can specify a range to interrogate - the user could select a single
> > > column or a group of columns - but I don't know whether that would complicate
> > > things as I don't want the user to specify WHAT the macro is actually looking
> > > for, only WHERE to look for duplicates.
> > >
> > > EXAMPLE
> > >
> > > I may have a worksheet with a data range of A1:AG25000. Some cells within
> > > this range may be blank for whatever reason. Now, I may want to remove any
> > > duplicate rows where data in columns A to AG are an exact match for the
> > > corresponding cells.
> > >
> > > That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
> > > the same as B237 and B767, and so on for every column in those rows,
> > > regardless of whether the match happens to contain data or be blank - hence
> > > why you could go to column IV to do your interrogation. In this scenario I
> > > want to only be left with unique rows after the macro is finished, so only
> > > row 123 would remain while rows 237 and 767 are deleted.
> > >
> > > Now, I may have another spreadsheet where I want to achieve the same thing,
> > > but only want to check a single column for duplicates. For instance, I may
> > > have 25000 rows in a spreadsheet containing employee data, but there may only
> > > be 18000 employees in total, so I should only have 18000 rows not 25000. So,
> > > if Column E contained a unique identifier such as employee numbers, then I
> > > would want the macro to only interrogate that column and remove rows based
> > > only on the duplicated data in row E regardless of whether the other cells in
> > > the row were duplicated. In this scenario I only want to keep the first row
> > > conatining that unique identifier.
> > >
> > > Hope this makes sense and that I haven't confused you!
> > >
> > >
> > >
> > > Once again, I appreciate your help and time with this.
> > >
> > > Joe.
> > > --
> > > If you can measure it, you can improve it!
> > >
> > >
> > > "Joel" wrote:
> > >
> > > > I have Few questions
> > > >
> > > > 1) Is it acceptable to sort the worksheet in the macro?
> > > >
> > > > 2) For macro ONE what is the last column?
> > > >
> > > > 3) Will the last column always be the same when you run the macro?
> > > >
> > > > 4) Does every Row have data filed to the last column.
> > > >
> > > > What I propose for macro ONE is to sort every column of the worksheet
> > > > starting with the last column and moving towards the first column. Then
> > > > comparing adjacent rows for matches.
> > > >
> > > > "Monomeeth" wrote:
> > > >
> > > > > Hello
> > > > >
> > > > > I need two macros to do as follows:
> > > > >
> > > > > ONE
> > > > > I need a macro which will check every row in a worksheet and if it finds two
> > > > > or more duplicate rows (i.e. every cell in the row is duplicated), it will
> > > > > then proceed to delete all duplicate rows except the first duplicate row.
> > > > >
> > > > > TWO
> > > > > I need a macro which will check every row in a worksheet and if it finds two
> > > > > or more rows with the exact same value in a particular column, it will then
> > > > > proceed to delete all duplicate rows except the first duplicate row. I would
> > > > > prefer the user to have the ability to specify the column they want
> > > > > interrogated.
> > > > >
> > > > > Any suggestions would be greatly appreciated!

 
Reply With Quote
 
Joel
Guest
Posts: n/a
 
      8th Oct 2009
I asked my original questions because I knew these programs could take a long
time to run. The sorting and putting the X'x in column IV probably increased
the speed of this program by an order of maginitude. Integer Variables
default to 16,536 when not declared. when you have a worksheet larger than
16,536 rows you must declare any variables that are used to count the rows as
long. try putting these statements at the top of the two macros

Dim LastRow as long
Dim RowCount as Long
Dim CompareRow as Long


"Monomeeth" wrote:

> Hi Joel
>
> Okay, retested the macro on a test worksheet I created. This worksheet had
> data in the range A1:AC543. Oh, and yes, the first row was a header row.
>
> Results are as follows:
>
> REMOVE DUPLICATES ALL COLUMNS
> Okay, same problem as before. It didn't take long to get the Visual Basic
> error message "400". When I click on Help all I get is a blank white window.
> I did notice, however, that the worksheet had been sorted - so the error
> occurs sometime after the sort is done. However, the duplicate rows had not
> beed deleted.
>
> I did try to find out what the error 400 was about, but all I found was some
> info relating to showing forms which are already visible - but in this case
> we there is no form.
>
> Anyway, hope this helps to narrow the proble down.
>
> REMOVE DUPLICATES ONE COLUMN
> Success! This worked. I deliberately designed the test worksheet to have
> duplicate rows - both in terms of every column and in terms of a single
> column. The macro went through fairly quickly and everything was A-OK.
>
> Thanks for this Joel - I will just have to bear in mind it may not work on
> larger worksheets - or take care to run it when the computer isn't doing
> anything else in case it's a memory issue.
>
> Now all I have to do is sort out the first Macro.
>
> Thanks again for your help!
>
>
>
> --
> If you can measure it, you can improve it!
>
>
> "Monomeeth" wrote:
>
> > Hi Joel
> >
> > Thank you very much for yoiur continued help. I really appreciate it. I also
> > like the way you have chosen to approach this.
> >
> > I decided to test both your macros by doing a straight copy and paste as is.
> > I chose a worksheet which has data in the range A1:AL29420. The results were
> > as follows:
> >
> > REMOVE DUPLICATES ALL COLUMNS
> > Macro seemed to run fine as it was obviously doing something. However, after
> > a few minutes I got a strange error message from Visual Basic. All it said
> > was "400". Nothing else, the only options I had was an OK button and a Help
> > button, but when I clicked on Help nothing seemed to happen.
> >
> > REMOVE DUPLICATES ONE COLUMN
> > Macro started fine - I was able to select the column and get it running.
> > However, I waited for an hour and the Macro seemed to still be running. There
> > were no error messages, but Excel said it was not responding, while Visual
> > Basic Editor said it was still running.
> >
> > Sorry, I can't give any further clues.
> >
> > Now that I'm thinking about it, I will try these macros on much smaller
> > worksheets. I will post back shortly with the results.
> >
> > Thanks Joel.
> >
> >
> > --
> > If you can measure it, you can improve it!
> >
> >
> >
> >
> >
> > "Joel" wrote:
> >
> > > The code assumes there is a header row in both cases. I asked the questions
> > > becasue I want to keep the code as simple as possible and to make it run
> > > quicker. Becasue the columns lengths varied I had to check every row for the
> > > last column.
> > >
> > > To make the code run faster instead of deleting rows one at a time. I
> > > placed an X in column IV for rows that needed to be deleted. Then filtered
> > > the X's using Autofilter and removed the rows with the X's. See comments in
> > > the code.
> > >
> > > Sub RemoveDuplicatesAllColumns()
> > >
> > > 'clear columnn IV incase data ther is data from last run
> > > Columns("IV").Delete
> > >
> > > 'find last row
> > > 'use column A to determine last row
> > > LastRow = Range("A" & Rows.Count).End(xlUp).Row
> > >
> > > 'find last column
> > > 'check every row to determine last column
> > > EndColumn = 0
> > > For RowCount = 1 To LastRow
> > > LastColumn = Cells(RowCount).End(xlUp).Column
> > > If LastColumn > EndColumn Then
> > > EndColumn = LastColumn
> > > End If
> > > Next RowCount
> > >
> > >
> > > 'sort cells three columns at a time
> > > For ColCount = 1 To EndColumn Step 3
> > > Rows("1:" & LastRow).Sort _
> > > header:=xlYes, _
> > > key1:=Cells(1, ColCount), _
> > > order1:=xlAscending, _
> > > key2:=Cells(1, ColCount + 1), _
> > > order2:=xlAscending, _
> > > key3:=Cells(1, ColCount + 1), _
> > > order3:=xlAscending
> > >
> > > Next ColCount
> > >
> > > 'compare rows putting a X in column IV where duplicates exist
> > > RowCount = 2
> > > Match = False 'use to indicate that last compared rows
> > > 'either matched or didn't match
> > > Do While RowCount < LastRow
> > > If Match = False Then
> > > CompareRow = RowCount + 1
> > > Else
> > > CompareRow = CompareRow + 1
> > > End If
> > > 'check if column lengths are equal if not skip
> > > LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
> > > LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
> > > Match = True
> > > If LastCol = LastCompareCol Then
> > > For ColCount = 1 To LastCol
> > > If Cells(RowCount, ColCount) <> _
> > > Cells(CompareRow, ColCount) Then
> > >
> > > Match = False
> > > Exit For
> > > End If
> > > Next ColCount
> > > If Match = True Then
> > > Range("IV" & CompareRow) = "X"
> > > End If
> > > Else
> > > Match = False
> > > RowCount = RowCount + 1
> > > End If
> > > Loop
> > >
> > >
> > > 'remove rows with X's in column IV
> > > Set c = Columns("IV").Find(what:="X", _
> > > LookIn:=xlValues, lookat:=xlWhole)
> > > 'only filer if at least one X is found
> > > If Not c Is Nothing Then
> > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > Columns("IV").Delete
> > > End If
> > >
> > > End Sub
> > >
> > > Sub RemoveDuplicatesOneColumn()
> > >
> > > 'clear columnn IV incase data ther is data from last run
> > > Columns("IV").Delete
> > >
> > > 'select column
> > > Set Selected = Application.InputBox(Prompt:="Select one column", _
> > > Title:="Select one column", Type:=8)
> > >
> > > SelectCol = Selected.Column
> > >
> > > 'find last row
> > > 'use column A to determine last row
> > > LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row
> > >
> > >
> > > 'sort by column selected
> > > Rows("1:" & LastRow).Sort _
> > > header:=xlYes, _
> > > key1:=Cells(1, SelectCol), _
> > > order1:=xlAscending
> > >
> > > 'compare rows putting a X in column IV where duplicates exist
> > > RowCount = 2
> > > Match = False 'use to indicate that last compared rows
> > > 'either matched or didn't match
> > > Do While RowCount < LastRow
> > > If Match = False Then
> > > CompareRow = RowCount + 1
> > > Else
> > > CompareRow = CompareRow + 1
> > > End If
> > > If Cells(RowCount, SelectCol) = _
> > > Cells(CompareRow, SelectCol) Then
> > >
> > > Range("IV" & CompareRow) = "X"
> > > Match = True
> > > Else
> > > Match = False
> > > RowCount = RowCount + 1
> > > End If
> > > Loop
> > >
> > > 'remove rows with X's in column IV
> > > Set c = Columns("IV").Find(what:="X", _
> > > LookIn:=xlValues, lookat:=xlWhole)
> > > 'only filer if at least one X is found
> > > If Not c Is Nothing Then
> > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > Columns("IV").Delete
> > > End If
> > >
> > > End Sub
> > >
> > >
> > >
> > >
> > > "Monomeeth" wrote:
> > >
> > > > Hi Joel
> > > >
> > > > Thanks very much for your help. My answers are as follows:
> > > >
> > > > 1) Is it acceptable to sort the worksheet in the macro?
> > > >
> > > > Yes.
> > > >
> > > > 2) For macro ONE what is the last column?
> > > >
> > > > This will be different as I want to be able to run the macro on various
> > > > spreadsheets. Although I suppose you could set the last column as IV (using
> > > > Excel 2003) and therefore it wouldn't matter as some cells may be blank.
> > > >
> > > > 3) Will the last column always be the same when you run the macro?
> > > >
> > > > No. See answer to 2 above.
> > > >
> > > > 4) Does every Row have data filed to the last column.
> > > >
> > > > Not necessarily. Some cells could be blank.
> > > >
> > > >
> > > > At the risk of providing too much info, here is some context for you:
> > > >
> > > > BACKGROUND
> > > > Basically I am trying to remove duplicate rows from worksheets which import
> > > > data from various online forms submitted by users. Unfortunately, sometimes
> > > > users submit the exact same data multiple times (for whatever reason) and
> > > > this causes problems when we need to do an analysis of the collated data. So
> > > > I want to be able to remove duplicate rows where every cell in the row is
> > > > exactly the same as every corresponding cell within another row, and some of
> > > > these cells could be blank.
> > > >
> > > > As for Macro TWO, I need this for another reason altogether. Last column is
> > > > not constant and yes there may be blank cells within the row.
> > > >
> > > > JUST A THOUGHT
> > > > I suppose you could have the one macro do both jobs by designing it so that
> > > > the user can specify a range to interrogate - the user could select a single
> > > > column or a group of columns - but I don't know whether that would complicate
> > > > things as I don't want the user to specify WHAT the macro is actually looking
> > > > for, only WHERE to look for duplicates.
> > > >
> > > > EXAMPLE
> > > >
> > > > I may have a worksheet with a data range of A1:AG25000. Some cells within
> > > > this range may be blank for whatever reason. Now, I may want to remove any
> > > > duplicate rows where data in columns A to AG are an exact match for the
> > > > corresponding cells.
> > > >
> > > > That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
> > > > the same as B237 and B767, and so on for every column in those rows,
> > > > regardless of whether the match happens to contain data or be blank - hence
> > > > why you could go to column IV to do your interrogation. In this scenario I
> > > > want to only be left with unique rows after the macro is finished, so only
> > > > row 123 would remain while rows 237 and 767 are deleted.
> > > >
> > > > Now, I may have another spreadsheet where I want to achieve the same thing,
> > > > but only want to check a single column for duplicates. For instance, I may
> > > > have 25000 rows in a spreadsheet containing employee data, but there may only
> > > > be 18000 employees in total, so I should only have 18000 rows not 25000. So,
> > > > if Column E contained a unique identifier such as employee numbers, then I
> > > > would want the macro to only interrogate that column and remove rows based
> > > > only on the duplicated data in row E regardless of whether the other cells in
> > > > the row were duplicated. In this scenario I only want to keep the first row
> > > > conatining that unique identifier.
> > > >
> > > > Hope this makes sense and that I haven't confused you!
> > > >
> > > >
> > > >
> > > > Once again, I appreciate your help and time with this.

 
Reply With Quote
 
Joel
Guest
Posts: n/a
 
      8th Oct 2009
One thing that might help speed up the code s to reduce the size of the
worksheet. Once a cell is used excel sometimes thinks there is data in the
cell. So when you do a .end(xlup) or .end(xltoleft) you get the wrong
results. to solve this probelm is you manually delete the rows and columns
after the used data this helps

to delte the unused rows
1) select A1
2) Press Shift-Cntl-Down Arrow to get to last row
3) Select the row number after the last row of data
4) Press again Shift-Cntl-down Arrow all the rows after your data will be
selected
5) Right click any row number that is selected and choose delete

Repeat the same for columns instead of down arrow use right arrow.

"Monomeeth" wrote:

> Hi Joel
>
> Okay, retested the macro on a test worksheet I created. This worksheet had
> data in the range A1:AC543. Oh, and yes, the first row was a header row.
>
> Results are as follows:
>
> REMOVE DUPLICATES ALL COLUMNS
> Okay, same problem as before. It didn't take long to get the Visual Basic
> error message "400". When I click on Help all I get is a blank white window.
> I did notice, however, that the worksheet had been sorted - so the error
> occurs sometime after the sort is done. However, the duplicate rows had not
> beed deleted.
>
> I did try to find out what the error 400 was about, but all I found was some
> info relating to showing forms which are already visible - but in this case
> we there is no form.
>
> Anyway, hope this helps to narrow the proble down.
>
> REMOVE DUPLICATES ONE COLUMN
> Success! This worked. I deliberately designed the test worksheet to have
> duplicate rows - both in terms of every column and in terms of a single
> column. The macro went through fairly quickly and everything was A-OK.
>
> Thanks for this Joel - I will just have to bear in mind it may not work on
> larger worksheets - or take care to run it when the computer isn't doing
> anything else in case it's a memory issue.
>
> Now all I have to do is sort out the first Macro.
>
> Thanks again for your help!
>
>
>
> --
> If you can measure it, you can improve it!
>
>
> "Monomeeth" wrote:
>
> > Hi Joel
> >
> > Thank you very much for yoiur continued help. I really appreciate it. I also
> > like the way you have chosen to approach this.
> >
> > I decided to test both your macros by doing a straight copy and paste as is.
> > I chose a worksheet which has data in the range A1:AL29420. The results were
> > as follows:
> >
> > REMOVE DUPLICATES ALL COLUMNS
> > Macro seemed to run fine as it was obviously doing something. However, after
> > a few minutes I got a strange error message from Visual Basic. All it said
> > was "400". Nothing else, the only options I had was an OK button and a Help
> > button, but when I clicked on Help nothing seemed to happen.
> >
> > REMOVE DUPLICATES ONE COLUMN
> > Macro started fine - I was able to select the column and get it running.
> > However, I waited for an hour and the Macro seemed to still be running. There
> > were no error messages, but Excel said it was not responding, while Visual
> > Basic Editor said it was still running.
> >
> > Sorry, I can't give any further clues.
> >
> > Now that I'm thinking about it, I will try these macros on much smaller
> > worksheets. I will post back shortly with the results.
> >
> > Thanks Joel.
> >
> >
> > --
> > If you can measure it, you can improve it!
> >
> >
> >
> >
> >
> > "Joel" wrote:
> >
> > > The code assumes there is a header row in both cases. I asked the questions
> > > becasue I want to keep the code as simple as possible and to make it run
> > > quicker. Becasue the columns lengths varied I had to check every row for the
> > > last column.
> > >
> > > To make the code run faster instead of deleting rows one at a time. I
> > > placed an X in column IV for rows that needed to be deleted. Then filtered
> > > the X's using Autofilter and removed the rows with the X's. See comments in
> > > the code.
> > >
> > > Sub RemoveDuplicatesAllColumns()
> > >
> > > 'clear columnn IV incase data ther is data from last run
> > > Columns("IV").Delete
> > >
> > > 'find last row
> > > 'use column A to determine last row
> > > LastRow = Range("A" & Rows.Count).End(xlUp).Row
> > >
> > > 'find last column
> > > 'check every row to determine last column
> > > EndColumn = 0
> > > For RowCount = 1 To LastRow
> > > LastColumn = Cells(RowCount).End(xlUp).Column
> > > If LastColumn > EndColumn Then
> > > EndColumn = LastColumn
> > > End If
> > > Next RowCount
> > >
> > >
> > > 'sort cells three columns at a time
> > > For ColCount = 1 To EndColumn Step 3
> > > Rows("1:" & LastRow).Sort _
> > > header:=xlYes, _
> > > key1:=Cells(1, ColCount), _
> > > order1:=xlAscending, _
> > > key2:=Cells(1, ColCount + 1), _
> > > order2:=xlAscending, _
> > > key3:=Cells(1, ColCount + 1), _
> > > order3:=xlAscending
> > >
> > > Next ColCount
> > >
> > > 'compare rows putting a X in column IV where duplicates exist
> > > RowCount = 2
> > > Match = False 'use to indicate that last compared rows
> > > 'either matched or didn't match
> > > Do While RowCount < LastRow
> > > If Match = False Then
> > > CompareRow = RowCount + 1
> > > Else
> > > CompareRow = CompareRow + 1
> > > End If
> > > 'check if column lengths are equal if not skip
> > > LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
> > > LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
> > > Match = True
> > > If LastCol = LastCompareCol Then
> > > For ColCount = 1 To LastCol
> > > If Cells(RowCount, ColCount) <> _
> > > Cells(CompareRow, ColCount) Then
> > >
> > > Match = False
> > > Exit For
> > > End If
> > > Next ColCount
> > > If Match = True Then
> > > Range("IV" & CompareRow) = "X"
> > > End If
> > > Else
> > > Match = False
> > > RowCount = RowCount + 1
> > > End If
> > > Loop
> > >
> > >
> > > 'remove rows with X's in column IV
> > > Set c = Columns("IV").Find(what:="X", _
> > > LookIn:=xlValues, lookat:=xlWhole)
> > > 'only filer if at least one X is found
> > > If Not c Is Nothing Then
> > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > Columns("IV").Delete
> > > End If
> > >
> > > End Sub
> > >
> > > Sub RemoveDuplicatesOneColumn()
> > >
> > > 'clear columnn IV incase data ther is data from last run
> > > Columns("IV").Delete
> > >
> > > 'select column
> > > Set Selected = Application.InputBox(Prompt:="Select one column", _
> > > Title:="Select one column", Type:=8)
> > >
> > > SelectCol = Selected.Column
> > >
> > > 'find last row
> > > 'use column A to determine last row
> > > LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row
> > >
> > >
> > > 'sort by column selected
> > > Rows("1:" & LastRow).Sort _
> > > header:=xlYes, _
> > > key1:=Cells(1, SelectCol), _
> > > order1:=xlAscending
> > >
> > > 'compare rows putting a X in column IV where duplicates exist
> > > RowCount = 2
> > > Match = False 'use to indicate that last compared rows
> > > 'either matched or didn't match
> > > Do While RowCount < LastRow
> > > If Match = False Then
> > > CompareRow = RowCount + 1
> > > Else
> > > CompareRow = CompareRow + 1
> > > End If
> > > If Cells(RowCount, SelectCol) = _
> > > Cells(CompareRow, SelectCol) Then
> > >
> > > Range("IV" & CompareRow) = "X"
> > > Match = True
> > > Else
> > > Match = False
> > > RowCount = RowCount + 1
> > > End If
> > > Loop
> > >
> > > 'remove rows with X's in column IV
> > > Set c = Columns("IV").Find(what:="X", _
> > > LookIn:=xlValues, lookat:=xlWhole)
> > > 'only filer if at least one X is found
> > > If Not c Is Nothing Then
> > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > Columns("IV").Delete
> > > End If
> > >
> > > End Sub
> > >
> > >
> > >
> > >
> > > "Monomeeth" wrote:
> > >
> > > > Hi Joel
> > > >
> > > > Thanks very much for your help. My answers are as follows:
> > > >
> > > > 1) Is it acceptable to sort the worksheet in the macro?
> > > >
> > > > Yes.
> > > >
> > > > 2) For macro ONE what is the last column?
> > > >
> > > > This will be different as I want to be able to run the macro on various
> > > > spreadsheets. Although I suppose you could set the last column as IV (using
> > > > Excel 2003) and therefore it wouldn't matter as some cells may be blank.
> > > >
> > > > 3) Will the last column always be the same when you run the macro?
> > > >
> > > > No. See answer to 2 above.
> > > >
> > > > 4) Does every Row have data filed to the last column.
> > > >
> > > > Not necessarily. Some cells could be blank.
> > > >
> > > >
> > > > At the risk of providing too much info, here is some context for you:
> > > >
> > > > BACKGROUND
> > > > Basically I am trying to remove duplicate rows from worksheets which import
> > > > data from various online forms submitted by users. Unfortunately, sometimes
> > > > users submit the exact same data multiple times (for whatever reason) and
> > > > this causes problems when we need to do an analysis of the collated data. So
> > > > I want to be able to remove duplicate rows where every cell in the row is
> > > > exactly the same as every corresponding cell within another row, and some of
> > > > these cells could be blank.
> > > >
> > > > As for Macro TWO, I need this for another reason altogether. Last column is
> > > > not constant and yes there may be blank cells within the row.
> > > >
> > > > JUST A THOUGHT
> > > > I suppose you could have the one macro do both jobs by designing it so that
> > > > the user can specify a range to interrogate - the user could select a single
> > > > column or a group of columns - but I don't know whether that would complicate
> > > > things as I don't want the user to specify WHAT the macro is actually looking
> > > > for, only WHERE to look for duplicates.
> > > >
> > > > EXAMPLE
> > > >
> > > > I may have a worksheet with a data range of A1:AG25000. Some cells within
> > > > this range may be blank for whatever reason. Now, I may want to remove any
> > > > duplicate rows where data in columns A to AG are an exact match for the
> > > > corresponding cells.
> > > >
> > > > That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
> > > > the same as B237 and B767, and so on for every column in those rows,
> > > > regardless of whether the match happens to contain data or be blank - hence
> > > > why you could go to column IV to do your interrogation. In this scenario I
> > > > want to only be left with unique rows after the macro is finished, so only
> > > > row 123 would remain while rows 237 and 767 are deleted.
> > > >
> > > > Now, I may have another spreadsheet where I want to achieve the same thing,
> > > > but only want to check a single column for duplicates. For instance, I may
> > > > have 25000 rows in a spreadsheet containing employee data, but there may only
> > > > be 18000 employees in total, so I should only have 18000 rows not 25000. So,
> > > > if Column E contained a unique identifier such as employee numbers, then I
> > > > would want the macro to only interrogate that column and remove rows based
> > > > only on the duplicated data in row E regardless of whether the other cells in
> > > > the row were duplicated. In this scenario I only want to keep the first row
> > > > conatining that unique identifier.
> > > >
> > > > Hope this makes sense and that I haven't confused you!
> > > >
> > > >
> > > >
> > > > Once again, I appreciate your help and time with this.

 
Reply With Quote
 
Monomeeth
Guest
Posts: n/a
 
      13th Oct 2009
Hi Joel

I'm afraid I've run into problems. I've been experimenting and trying to
sort it out myself, not wanting to trouble you anymore than necessary, but
I've hit a brick wall.

I tried to run the REMOVE DUPLICATES ONE COLUMN macro on a "real" worksheet
which had data in the range A1:AL4942, but was met with the Visual Basic
"400" error. I decided to reduce the worksheet down, since this macro did
work on a much smaller worksheet last week. I progressively reduced the
worksheet down from 4942 rows to 2000 rows to 1000 rows to 500 rows and
finally to 250 rows.

I ran the Macro each time with the same result (i.e. the "400" error). I
also tried deleting all unused rows and columns, but to no avail. Adding the
extra DIM statements at the top of the code made no difference either.

DEBUGGING

What I can tell you is that in every instance the first column was filtered,
with a custom filter looking for "X", but there is no "X" in the first
column. The Xs only appear in column IV as they should, so I checked the code
looking for something obvious, but couldn't see anything. I'm now wondering
whether the same problem is affecting both macros?

I also redid my test from last week when I got the macro to work on my dummy
workbook, but cannot replicate the success - so much for my testing. I swear
it worked TWICE last week, but now not at all - so I can't explain that as I
was certainly methodical in my approach.

Sorry to hassle you, but perhaps you might see this as a challenge?

I'm using Excel 2003, just in case that's an issue.

Regards,

Joe.
--
If you can measure it, you can improve it!


"Joel" wrote:

> I asked my original questions because I knew these programs could take a long
> time to run. The sorting and putting the X'x in column IV probably increased
> the speed of this program by an order of maginitude. Integer Variables
> default to 16,536 when not declared. when you have a worksheet larger than
> 16,536 rows you must declare any variables that are used to count the rows as
> long. try putting these statements at the top of the two macros
>
> Dim LastRow as long
> Dim RowCount as Long
> Dim CompareRow as Long
>
>
> "Monomeeth" wrote:
>
> > Hi Joel
> >
> > Okay, retested the macro on a test worksheet I created. This worksheet had
> > data in the range A1:AC543. Oh, and yes, the first row was a header row.
> >
> > Results are as follows:
> >
> > REMOVE DUPLICATES ALL COLUMNS
> > Okay, same problem as before. It didn't take long to get the Visual Basic
> > error message "400". When I click on Help all I get is a blank white window.
> > I did notice, however, that the worksheet had been sorted - so the error
> > occurs sometime after the sort is done. However, the duplicate rows had not
> > beed deleted.
> >
> > I did try to find out what the error 400 was about, but all I found was some
> > info relating to showing forms which are already visible - but in this case
> > we there is no form.
> >
> > Anyway, hope this helps to narrow the proble down.
> >
> > REMOVE DUPLICATES ONE COLUMN
> > Success! This worked. I deliberately designed the test worksheet to have
> > duplicate rows - both in terms of every column and in terms of a single
> > column. The macro went through fairly quickly and everything was A-OK.
> >
> > Thanks for this Joel - I will just have to bear in mind it may not work on
> > larger worksheets - or take care to run it when the computer isn't doing
> > anything else in case it's a memory issue.
> >
> > Now all I have to do is sort out the first Macro.
> >
> > Thanks again for your help!
> >
> >
> >
> > --
> > If you can measure it, you can improve it!
> >
> >
> > "Monomeeth" wrote:
> >
> > > Hi Joel
> > >
> > > Thank you very much for yoiur continued help. I really appreciate it. I also
> > > like the way you have chosen to approach this.
> > >
> > > I decided to test both your macros by doing a straight copy and paste as is.
> > > I chose a worksheet which has data in the range A1:AL29420. The results were
> > > as follows:
> > >
> > > REMOVE DUPLICATES ALL COLUMNS
> > > Macro seemed to run fine as it was obviously doing something. However, after
> > > a few minutes I got a strange error message from Visual Basic. All it said
> > > was "400". Nothing else, the only options I had was an OK button and a Help
> > > button, but when I clicked on Help nothing seemed to happen.
> > >
> > > REMOVE DUPLICATES ONE COLUMN
> > > Macro started fine - I was able to select the column and get it running.
> > > However, I waited for an hour and the Macro seemed to still be running. There
> > > were no error messages, but Excel said it was not responding, while Visual
> > > Basic Editor said it was still running.
> > >
> > > Sorry, I can't give any further clues.
> > >
> > > Now that I'm thinking about it, I will try these macros on much smaller
> > > worksheets. I will post back shortly with the results.
> > >
> > > Thanks Joel.
> > >
> > >
> > > --
> > > If you can measure it, you can improve it!
> > >
> > >
> > >
> > >
> > >
> > > "Joel" wrote:
> > >
> > > > The code assumes there is a header row in both cases. I asked the questions
> > > > becasue I want to keep the code as simple as possible and to make it run
> > > > quicker. Becasue the columns lengths varied I had to check every row for the
> > > > last column.
> > > >
> > > > To make the code run faster instead of deleting rows one at a time. I
> > > > placed an X in column IV for rows that needed to be deleted. Then filtered
> > > > the X's using Autofilter and removed the rows with the X's. See comments in
> > > > the code.
> > > >
> > > > Sub RemoveDuplicatesAllColumns()
> > > >
> > > > 'clear columnn IV incase data ther is data from last run
> > > > Columns("IV").Delete
> > > >
> > > > 'find last row
> > > > 'use column A to determine last row
> > > > LastRow = Range("A" & Rows.Count).End(xlUp).Row
> > > >
> > > > 'find last column
> > > > 'check every row to determine last column
> > > > EndColumn = 0
> > > > For RowCount = 1 To LastRow
> > > > LastColumn = Cells(RowCount).End(xlUp).Column
> > > > If LastColumn > EndColumn Then
> > > > EndColumn = LastColumn
> > > > End If
> > > > Next RowCount
> > > >
> > > >
> > > > 'sort cells three columns at a time
> > > > For ColCount = 1 To EndColumn Step 3
> > > > Rows("1:" & LastRow).Sort _
> > > > header:=xlYes, _
> > > > key1:=Cells(1, ColCount), _
> > > > order1:=xlAscending, _
> > > > key2:=Cells(1, ColCount + 1), _
> > > > order2:=xlAscending, _
> > > > key3:=Cells(1, ColCount + 1), _
> > > > order3:=xlAscending
> > > >
> > > > Next ColCount
> > > >
> > > > 'compare rows putting a X in column IV where duplicates exist
> > > > RowCount = 2
> > > > Match = False 'use to indicate that last compared rows
> > > > 'either matched or didn't match
> > > > Do While RowCount < LastRow
> > > > If Match = False Then
> > > > CompareRow = RowCount + 1
> > > > Else
> > > > CompareRow = CompareRow + 1
> > > > End If
> > > > 'check if column lengths are equal if not skip
> > > > LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
> > > > LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
> > > > Match = True
> > > > If LastCol = LastCompareCol Then
> > > > For ColCount = 1 To LastCol
> > > > If Cells(RowCount, ColCount) <> _
> > > > Cells(CompareRow, ColCount) Then
> > > >
> > > > Match = False
> > > > Exit For
> > > > End If
> > > > Next ColCount
> > > > If Match = True Then
> > > > Range("IV" & CompareRow) = "X"
> > > > End If
> > > > Else
> > > > Match = False
> > > > RowCount = RowCount + 1
> > > > End If
> > > > Loop
> > > >
> > > >
> > > > 'remove rows with X's in column IV
> > > > Set c = Columns("IV").Find(what:="X", _
> > > > LookIn:=xlValues, lookat:=xlWhole)
> > > > 'only filer if at least one X is found
> > > > If Not c Is Nothing Then
> > > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > > Columns("IV").Delete
> > > > End If
> > > >
> > > > End Sub
> > > >
> > > > Sub RemoveDuplicatesOneColumn()
> > > >
> > > > 'clear columnn IV incase data ther is data from last run
> > > > Columns("IV").Delete
> > > >
> > > > 'select column
> > > > Set Selected = Application.InputBox(Prompt:="Select one column", _
> > > > Title:="Select one column", Type:=8)
> > > >
> > > > SelectCol = Selected.Column
> > > >
> > > > 'find last row
> > > > 'use column A to determine last row
> > > > LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row
> > > >
> > > >
> > > > 'sort by column selected
> > > > Rows("1:" & LastRow).Sort _
> > > > header:=xlYes, _
> > > > key1:=Cells(1, SelectCol), _
> > > > order1:=xlAscending
> > > >
> > > > 'compare rows putting a X in column IV where duplicates exist
> > > > RowCount = 2
> > > > Match = False 'use to indicate that last compared rows
> > > > 'either matched or didn't match
> > > > Do While RowCount < LastRow
> > > > If Match = False Then
> > > > CompareRow = RowCount + 1
> > > > Else
> > > > CompareRow = CompareRow + 1
> > > > End If
> > > > If Cells(RowCount, SelectCol) = _
> > > > Cells(CompareRow, SelectCol) Then
> > > >
> > > > Range("IV" & CompareRow) = "X"
> > > > Match = True
> > > > Else
> > > > Match = False
> > > > RowCount = RowCount + 1
> > > > End If
> > > > Loop
> > > >
> > > > 'remove rows with X's in column IV
> > > > Set c = Columns("IV").Find(what:="X", _
> > > > LookIn:=xlValues, lookat:=xlWhole)
> > > > 'only filer if at least one X is found
> > > > If Not c Is Nothing Then
> > > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > > Columns("IV").Delete
> > > > End If
> > > >
> > > > End Sub
> > > >
> > > >
> > > >
> > > >
> > > > "Monomeeth" wrote:
> > > >
> > > > > Hi Joel
> > > > >
> > > > > Thanks very much for your help. My answers are as follows:
> > > > >
> > > > > 1) Is it acceptable to sort the worksheet in the macro?
> > > > >
> > > > > Yes.
> > > > >
> > > > > 2) For macro ONE what is the last column?
> > > > >
> > > > > This will be different as I want to be able to run the macro on various
> > > > > spreadsheets. Although I suppose you could set the last column as IV (using
> > > > > Excel 2003) and therefore it wouldn't matter as some cells may be blank.
> > > > >
> > > > > 3) Will the last column always be the same when you run the macro?
> > > > >
> > > > > No. See answer to 2 above.
> > > > >
> > > > > 4) Does every Row have data filed to the last column.
> > > > >
> > > > > Not necessarily. Some cells could be blank.
> > > > >
> > > > >
> > > > > At the risk of providing too much info, here is some context for you:
> > > > >
> > > > > BACKGROUND
> > > > > Basically I am trying to remove duplicate rows from worksheets which import
> > > > > data from various online forms submitted by users. Unfortunately, sometimes
> > > > > users submit the exact same data multiple times (for whatever reason) and
> > > > > this causes problems when we need to do an analysis of the collated data. So
> > > > > I want to be able to remove duplicate rows where every cell in the row is
> > > > > exactly the same as every corresponding cell within another row, and some of
> > > > > these cells could be blank.
> > > > >
> > > > > As for Macro TWO, I need this for another reason altogether. Last column is
> > > > > not constant and yes there may be blank cells within the row.
> > > > >
> > > > > JUST A THOUGHT
> > > > > I suppose you could have the one macro do both jobs by designing it so that
> > > > > the user can specify a range to interrogate - the user could select a single
> > > > > column or a group of columns - but I don't know whether that would complicate
> > > > > things as I don't want the user to specify WHAT the macro is actually looking
> > > > > for, only WHERE to look for duplicates.
> > > > >
> > > > > EXAMPLE
> > > > >
> > > > > I may have a worksheet with a data range of A1:AG25000. Some cells within
> > > > > this range may be blank for whatever reason. Now, I may want to remove any
> > > > > duplicate rows where data in columns A to AG are an exact match for the
> > > > > corresponding cells.
> > > > >
> > > > > That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
> > > > > the same as B237 and B767, and so on for every column in those rows,
> > > > > regardless of whether the match happens to contain data or be blank - hence
> > > > > why you could go to column IV to do your interrogation. In this scenario I
> > > > > want to only be left with unique rows after the macro is finished, so only
> > > > > row 123 would remain while rows 237 and 767 are deleted.
> > > > >
> > > > > Now, I may have another spreadsheet where I want to achieve the same thing,

 
Reply With Quote
 
Joel
Guest
Posts: n/a
 
      13th Oct 2009
Which line is giving the Error 400?

Try removing the autofilter before running the macro.


If is still fails check the value of LastRow and let me know what LastRow is
being set to.



"Monomeeth" wrote:

> Hi Joel
>
> I'm afraid I've run into problems. I've been experimenting and trying to
> sort it out myself, not wanting to trouble you anymore than necessary, but
> I've hit a brick wall.
>
> I tried to run the REMOVE DUPLICATES ONE COLUMN macro on a "real" worksheet
> which had data in the range A1:AL4942, but was met with the Visual Basic
> "400" error. I decided to reduce the worksheet down, since this macro did
> work on a much smaller worksheet last week. I progressively reduced the
> worksheet down from 4942 rows to 2000 rows to 1000 rows to 500 rows and
> finally to 250 rows.
>
> I ran the Macro each time with the same result (i.e. the "400" error). I
> also tried deleting all unused rows and columns, but to no avail. Adding the
> extra DIM statements at the top of the code made no difference either.
>
> DEBUGGING
>
> What I can tell you is that in every instance the first column was filtered,
> with a custom filter looking for "X", but there is no "X" in the first
> column. The Xs only appear in column IV as they should, so I checked the code
> looking for something obvious, but couldn't see anything. I'm now wondering
> whether the same problem is affecting both macros?
>
> I also redid my test from last week when I got the macro to work on my dummy
> workbook, but cannot replicate the success - so much for my testing. I swear
> it worked TWICE last week, but now not at all - so I can't explain that as I
> was certainly methodical in my approach.
>
> Sorry to hassle you, but perhaps you might see this as a challenge?
>
> I'm using Excel 2003, just in case that's an issue.
>
> Regards,
>
> Joe.
> --
> If you can measure it, you can improve it!
>
>
> "Joel" wrote:
>
> > I asked my original questions because I knew these programs could take a long
> > time to run. The sorting and putting the X'x in column IV probably increased
> > the speed of this program by an order of maginitude. Integer Variables
> > default to 16,536 when not declared. when you have a worksheet larger than
> > 16,536 rows you must declare any variables that are used to count the rows as
> > long. try putting these statements at the top of the two macros
> >
> > Dim LastRow as long
> > Dim RowCount as Long
> > Dim CompareRow as Long
> >
> >
> > "Monomeeth" wrote:
> >
> > > Hi Joel
> > >
> > > Okay, retested the macro on a test worksheet I created. This worksheet had
> > > data in the range A1:AC543. Oh, and yes, the first row was a header row.
> > >
> > > Results are as follows:
> > >
> > > REMOVE DUPLICATES ALL COLUMNS
> > > Okay, same problem as before. It didn't take long to get the Visual Basic
> > > error message "400". When I click on Help all I get is a blank white window.
> > > I did notice, however, that the worksheet had been sorted - so the error
> > > occurs sometime after the sort is done. However, the duplicate rows had not
> > > beed deleted.
> > >
> > > I did try to find out what the error 400 was about, but all I found was some
> > > info relating to showing forms which are already visible - but in this case
> > > we there is no form.
> > >
> > > Anyway, hope this helps to narrow the proble down.
> > >
> > > REMOVE DUPLICATES ONE COLUMN
> > > Success! This worked. I deliberately designed the test worksheet to have
> > > duplicate rows - both in terms of every column and in terms of a single
> > > column. The macro went through fairly quickly and everything was A-OK.
> > >
> > > Thanks for this Joel - I will just have to bear in mind it may not work on
> > > larger worksheets - or take care to run it when the computer isn't doing
> > > anything else in case it's a memory issue.
> > >
> > > Now all I have to do is sort out the first Macro.
> > >
> > > Thanks again for your help!
> > >
> > >
> > >
> > > --
> > > If you can measure it, you can improve it!
> > >
> > >
> > > "Monomeeth" wrote:
> > >
> > > > Hi Joel
> > > >
> > > > Thank you very much for yoiur continued help. I really appreciate it. I also
> > > > like the way you have chosen to approach this.
> > > >
> > > > I decided to test both your macros by doing a straight copy and paste as is.
> > > > I chose a worksheet which has data in the range A1:AL29420. The results were
> > > > as follows:
> > > >
> > > > REMOVE DUPLICATES ALL COLUMNS
> > > > Macro seemed to run fine as it was obviously doing something. However, after
> > > > a few minutes I got a strange error message from Visual Basic. All it said
> > > > was "400". Nothing else, the only options I had was an OK button and a Help
> > > > button, but when I clicked on Help nothing seemed to happen.
> > > >
> > > > REMOVE DUPLICATES ONE COLUMN
> > > > Macro started fine - I was able to select the column and get it running.
> > > > However, I waited for an hour and the Macro seemed to still be running. There
> > > > were no error messages, but Excel said it was not responding, while Visual
> > > > Basic Editor said it was still running.
> > > >
> > > > Sorry, I can't give any further clues.
> > > >
> > > > Now that I'm thinking about it, I will try these macros on much smaller
> > > > worksheets. I will post back shortly with the results.
> > > >
> > > > Thanks Joel.
> > > >
> > > >
> > > > --
> > > > If you can measure it, you can improve it!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > "Joel" wrote:
> > > >
> > > > > The code assumes there is a header row in both cases. I asked the questions
> > > > > becasue I want to keep the code as simple as possible and to make it run
> > > > > quicker. Becasue the columns lengths varied I had to check every row for the
> > > > > last column.
> > > > >
> > > > > To make the code run faster instead of deleting rows one at a time. I
> > > > > placed an X in column IV for rows that needed to be deleted. Then filtered
> > > > > the X's using Autofilter and removed the rows with the X's. See comments in
> > > > > the code.
> > > > >
> > > > > Sub RemoveDuplicatesAllColumns()
> > > > >
> > > > > 'clear columnn IV incase data ther is data from last run
> > > > > Columns("IV").Delete
> > > > >
> > > > > 'find last row
> > > > > 'use column A to determine last row
> > > > > LastRow = Range("A" & Rows.Count).End(xlUp).Row
> > > > >
> > > > > 'find last column
> > > > > 'check every row to determine last column
> > > > > EndColumn = 0
> > > > > For RowCount = 1 To LastRow
> > > > > LastColumn = Cells(RowCount).End(xlUp).Column
> > > > > If LastColumn > EndColumn Then
> > > > > EndColumn = LastColumn
> > > > > End If
> > > > > Next RowCount
> > > > >
> > > > >
> > > > > 'sort cells three columns at a time
> > > > > For ColCount = 1 To EndColumn Step 3
> > > > > Rows("1:" & LastRow).Sort _
> > > > > header:=xlYes, _
> > > > > key1:=Cells(1, ColCount), _
> > > > > order1:=xlAscending, _
> > > > > key2:=Cells(1, ColCount + 1), _
> > > > > order2:=xlAscending, _
> > > > > key3:=Cells(1, ColCount + 1), _
> > > > > order3:=xlAscending
> > > > >
> > > > > Next ColCount
> > > > >
> > > > > 'compare rows putting a X in column IV where duplicates exist
> > > > > RowCount = 2
> > > > > Match = False 'use to indicate that last compared rows
> > > > > 'either matched or didn't match
> > > > > Do While RowCount < LastRow
> > > > > If Match = False Then
> > > > > CompareRow = RowCount + 1
> > > > > Else
> > > > > CompareRow = CompareRow + 1
> > > > > End If
> > > > > 'check if column lengths are equal if not skip
> > > > > LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
> > > > > LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
> > > > > Match = True
> > > > > If LastCol = LastCompareCol Then
> > > > > For ColCount = 1 To LastCol
> > > > > If Cells(RowCount, ColCount) <> _
> > > > > Cells(CompareRow, ColCount) Then
> > > > >
> > > > > Match = False
> > > > > Exit For
> > > > > End If
> > > > > Next ColCount
> > > > > If Match = True Then
> > > > > Range("IV" & CompareRow) = "X"
> > > > > End If
> > > > > Else
> > > > > Match = False
> > > > > RowCount = RowCount + 1
> > > > > End If
> > > > > Loop
> > > > >
> > > > >
> > > > > 'remove rows with X's in column IV
> > > > > Set c = Columns("IV").Find(what:="X", _
> > > > > LookIn:=xlValues, lookat:=xlWhole)
> > > > > 'only filer if at least one X is found
> > > > > If Not c Is Nothing Then
> > > > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > > > Columns("IV").Delete
> > > > > End If
> > > > >
> > > > > End Sub
> > > > >
> > > > > Sub RemoveDuplicatesOneColumn()
> > > > >
> > > > > 'clear columnn IV incase data ther is data from last run
> > > > > Columns("IV").Delete
> > > > >
> > > > > 'select column
> > > > > Set Selected = Application.InputBox(Prompt:="Select one column", _
> > > > > Title:="Select one column", Type:=8)
> > > > >
> > > > > SelectCol = Selected.Column
> > > > >
> > > > > 'find last row
> > > > > 'use column A to determine last row
> > > > > LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row
> > > > >
> > > > >
> > > > > 'sort by column selected
> > > > > Rows("1:" & LastRow).Sort _
> > > > > header:=xlYes, _
> > > > > key1:=Cells(1, SelectCol), _
> > > > > order1:=xlAscending
> > > > >
> > > > > 'compare rows putting a X in column IV where duplicates exist
> > > > > RowCount = 2
> > > > > Match = False 'use to indicate that last compared rows
> > > > > 'either matched or didn't match
> > > > > Do While RowCount < LastRow
> > > > > If Match = False Then
> > > > > CompareRow = RowCount + 1
> > > > > Else
> > > > > CompareRow = CompareRow + 1
> > > > > End If
> > > > > If Cells(RowCount, SelectCol) = _
> > > > > Cells(CompareRow, SelectCol) Then
> > > > >
> > > > > Range("IV" & CompareRow) = "X"
> > > > > Match = True
> > > > > Else
> > > > > Match = False
> > > > > RowCount = RowCount + 1
> > > > > End If
> > > > > Loop
> > > > >
> > > > > 'remove rows with X's in column IV
> > > > > Set c = Columns("IV").Find(what:="X", _
> > > > > LookIn:=xlValues, lookat:=xlWhole)
> > > > > 'only filer if at least one X is found
> > > > > If Not c Is Nothing Then
> > > > > Columns("IV").AutoFilter Field:=1, Criteria1:="X"
> > > > > Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
> > > > > Columns("IV").Delete
> > > > > End If
> > > > >
> > > > > End Sub
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > "Monomeeth" wrote:
> > > > >
> > > > > > Hi Joel
> > > > > >
> > > > > > Thanks very much for your help. My answers are as follows:
> > > > > >
> > > > > > 1) Is it acceptable to sort the worksheet in the macro?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > 2) For macro ONE what is the last column?
> > > > > >
> > > > > > This will be different as I want to be able to run the macro on various
> > > > > > spreadsheets. Although I suppose you could set the last column as IV (using
> > > > > > Excel 2003) and therefore it wouldn't matter as some cells may be blank.
> > > > > >
> > > > > > 3) Will the last column always be the same when you run the macro?
> > > > > >

 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
comparing rows Papa Jonah Microsoft Excel Programming 3 8th Aug 2008 01:26 AM
comparing rows RobcPettit@yahoo.co.uk Microsoft Excel Programming 2 12th Jun 2006 06:04 PM
Comparing rows praveen_khm Microsoft Excel Programming 3 24th Mar 2006 04:19 PM
comparing rows =?Utf-8?B?cm93ZXJuZWVkc2hlbHA=?= Microsoft Access 0 16th Sep 2004 09:53 AM
Comparing two rows mika. Microsoft Excel Programming 1 17th Dec 2003 06:27 PM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 10:14 AM.