Delete duplicate data in SAME CELL

T

Tom

I am completing data cleanup in Excel (imported data from
Access database).

At this time, I have cells (records) that potentially
contain the same information multiple times due to
concatining data. Sounds fuzzy, right? Here's an
example:

SAMPLE DATA BEFORE CLEANUP
Column A Column B
Row 1 1000 Reno
Row 2 1000 Las Vegas
Row 3 1001 Miami
Row 4 1002 Richmond
Row 5 1002 Norfolk
Row 6 1002 Richmond
Row 7 1002 Richmond


SAMPLE DATA AFTER CLEANUP
Column A Column B
Row 1 1000 Reno; Las Vegas
Row 3 1001 Miami
Row 4 1002 Richmond; Norfolk; Richmond; Richmond

As the sample illustrates, I now have 3 times "Richmond"
in Row (Record) 4. I need to delete the second and third
occurence of "Richmond". Currently, after concatening
the data, the strings are separated by a semicolon.

Moreover, I have other fields that may contain a much
longer string in the cell. For instance, a cell might
contain the following:

Hardware maintenance; System monitoring; Network
monitoring; Administration and maintenance; WS change-
management; Inventory SW/HW; System monitoring

In this case, I need to identify and automatically delete
the second occurence of "System monitoring".


Any help is appreciated!!!

Thanks,
Tom
 
T

Tom

Dave:

Thanks a lot for the macros... the second one works just
superb (I couldn't get the first one to work).

I realized though that I forgot to specify one thing... I
was wondering if you could provide me further information.

If may have a cell that contains ONLY the same value.
For instance, it might contain:

Richmond; Richmond; Richmond

After I ran the query, I need to be left with at least 1
value of "Richmond".

Otherwise, again, it works perfectly... a cell
containing "Richmond; Richmond; Norfolk; Richmond" is
changed to "Richmond; Norfolk"

THANKS DAVE!!!!

Hopefully, you'll have a chance to let me know if there's
a way to adjust the macro. ('Apologize for not
describing this potential sceanrio in the initial
problem).

Tom
 
T

Tom

Dave:

I did further analysis on the problem...

In the example of "Richmond; Richmond; Richmond", it
actually DOES LEAVE one occurence of "Richmond" in the
cell.

The problem is when the last character ends with a
semicolon as well. For instance:

Richmond; Richmond; Richmond;

I believe, the occurence of the semicolon after the
3rd "Richmond" makes it to delete them all.

So, my follow-up question to you is the following...
Having thousands of records, how can I ensure that if the
last character is NOT a "." to make it a "."

Or, how can I run a macro which will delete the last
occurence of a semicolon in a cell with BLANK.

Thanks again!

Tom
 
T

Tom

Dave:

I apologize for leaving so many replies before you even
had the chance to respond...

When the cell value ends with a semicolon, in some
instances it completely empties the cell while in other
case it leaves " ; Richmond".

I believe the best way for me to analyze this problem is
to "highlight" the cell where something was deleted after
the macro was executed. I then get a better idea where
the cause might be or where I, potentially, have to re-
enter the data.

So, my final question would be: What is the VBA code
that will highlight a cell (let's say yellow background
color) after the macro was executed?

THANKS SO MUCH!!!!!

Tom
 
D

Dave Peterson

I did respond, but it looks like there was a problem with the MS news servers.

I was going to just give you the changes, but here's the whole thing:

Option Explicit
Sub testme02()

Dim myCell As Range
Dim myRng As Range
Dim myString As String

Set myRng = Selection

For Each myCell In myRng.Cells
With myCell
.Value = removeDuplicates(.Value)
With .Interior
.ColorIndex = 40 'change to your favorite color
.Pattern = xlSolid
End With
End With
Next myCell

End Sub
Function removeDuplicates(myStr As String) As String

Dim mySplit As Variant
Dim myCollection As Collection
Dim iCtr As Long
Dim jCtr As Long
Dim Swap1 As String
Dim Swap2 As String

Set myCollection = New Collection

mySplit = Split97(myStr, ";")
On Error Resume Next
For iCtr = LBound(mySplit) To UBound(mySplit)
myCollection.Add CStr(Trim(mySplit(iCtr))), _
Trim(CStr(mySplit(iCtr)))
Next iCtr
On Error GoTo 0

'routine to sort the entries
For iCtr = 1 To myCollection.Count - 1
For jCtr = iCtr + 1 To myCollection.Count
If myCollection(iCtr) > myCollection(jCtr) Then
Swap1 = myCollection(iCtr)
Swap2 = myCollection(jCtr)
myCollection.Add Swap1, before:=jCtr
myCollection.Add Swap2, before:=iCtr
myCollection.Remove iCtr + 1
myCollection.Remove jCtr + 1
End If
Next jCtr
Next iCtr

myStr = ""
For iCtr = 1 To myCollection.Count
myStr = myStr & "; " & myCollection(iCtr)
Next iCtr

If Len(myStr) > 0 Then
myStr = Mid(myStr, 3)
End If

removeDuplicates = myStr & "."

End Function

Public Function ReadUntil(ByRef sIn As String, _
sDelim As String, Optional bCompare As Long _
= vbBinaryCompare) As String
Dim nPos As String
nPos = InStr(1, sIn, sDelim, bCompare)
If nPos > 0 Then
ReadUntil = Left(sIn, nPos - 1)
sIn = Mid(sIn, nPos + Len(sDelim))
End If
End Function

Public Function Split97(ByVal sIn As String, Optional sDelim As _
String, Optional nLimit As Long = -1, Optional bCompare As _
Long = vbBinaryCompare) As Variant
Dim sRead As String, sOut() As String, nC As Integer
If sDelim = "" Then
Split97 = sIn
End If
sRead = ReadUntil(sIn, sDelim, bCompare)
Do
ReDim Preserve sOut(nC)
sOut(nC) = sRead
nC = nC + 1
If nLimit <> -1 And nC >= nLimit Then Exit Do
sRead = ReadUntil(sIn, sDelim)
Loop While sRead <> ""
ReDim Preserve sOut(nC)
sOut(nC) = sIn
Split97 = sOut
End Function

=========
Just some notes:

I think that if you have extra delimiters in the string, you get empty strings
added to the collection. You can avoid it by looking first:

For iCtr = LBound(mySplit) To UBound(mySplit)
If Trim(mySplit(iCtr)) <> "" Then
myCollection.Add CStr(Trim(mySplit(iCtr))), _
Trim(CStr(mySplit(iCtr)))
End If
Next iCtr


then later, right at the bottom to add the period:

removeDuplicates = myStr & "."

===================
For the emptying the cell problem:

And what version of excel are you using?

If the value in the cell is really long then that version of split97 won't work.

If you're using xl2k or higher, change split97 to split (and delete the function
itself).

If you're using xl97, delete the existing split97 function and replace it with
these two (I already replaced it.)
 
T

Tom

Dave:

I actually ended up replying to the other thread from
this morning.

I'd REALLY appreciate if you could have another look at
this and provide me more help.


Tom
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top