Here's a UDF that should get you started. There may be more efficient methods,
but I was using some "new to me" techniques here.
It returns a two-dimensional array consisting of the unique words; and the
count of each of those unique words.
By the way, a "word" is defined as a collection of word characters (bounded by
a non-word character or the beginning or end of the line). A word character is
defined as being in the class of the alphabet (A-Za-z), a digit (0-9) or the
underscore (_). If this definition of "word" gives unwanted results, it can be
changed.
To enter this, <alt-F11> opens the VBEditor. Ensure your project is
highlighted in the project explorer window, then Insert/Module and paste the
code below into the window that opens.
THEN: Select Tools/References and set a reference to Microsoft VBScript
Regular Expressions 5.5
There are several ways to display the results.
Assuming your "data" is in A1:A3, enter a formula into some cell:
First word
D1: =INDEX(uniquecount($A$1:$A$3),1,ROWS($1:1))
Count of first word
E1: =INDEX(uniquecount($A$1:$A$3),2,ROWS($1:1))
Then select D1:E1 and fill down as far as required. If you go to far, you'll
see #REF errors.
This might be better for you sorted, but I don't have time to do that right
now.
Once you have the results, you can copy/paste-special Values to some other area
of your worksheet, and then sort on the values.
If this is going to be used frequently, a sort routine can be incorporated.
===============================================
Option Explicit
Function UniqueCount(rg As Range) As Variant
'Requires reference to Microsoft VBScript Regular Expressions 5.5
'Returns a two dimensional array of unique words and count
Dim cWordList As Collection
Dim Str As String
Dim sRes() As Variant
Dim i As Long, j As Long
Dim c As Range
Dim re As RegExp
Dim mc As MatchCollection, m As Match
'Put all words into a single string
For Each c In rg
Str = Str & c.Value & " "
Next c
'get list of unique words
Set re = New RegExp
re.Global = True
re.Pattern = "\b\w+\b"
Set cWordList = New Collection
On Error Resume Next
'Add method with index=word will give error on duplicates
Set mc = re.Execute(Str)
For Each m In mc
cWordList.Add m.Value, m.Value
Next m
On Error GoTo 0
ReDim sRes(0 To 1, 1 To cWordList.Count)
For i = 1 To cWordList.Count
sRes(0, i) = cWordList(i)
Next i
'get word count for each word
re.Global = True
re.IgnoreCase = True
For i = 1 To UBound(sRes, 2)
re.Pattern = "\b" & sRes(0, i) & "\b"
Set mc = re.Execute(Str)
sRes(1, i) = mc.Count
Next i
Set re = Nothing
UniqueCount = sRes
End Function
===============================
--ron
With a little more fooling around, I modified the above to include words with
apostrophe's; and also did a double sort so the most common words would at the
top; and the subsort would be alphabetical.
If you don't want the results sorted, just comment out one or both of the two
sorting lines below.
=========================================
Option Explicit
Option Compare Text
Function UniqueCount(rg As Range) As Variant
'Requires reference to Microsoft VBScript Regular Expressions 5.5
'Returns a two dimensional array of unique words and count
Dim cWordList As Collection
Dim Str As String
Dim sRes() As Variant
Dim i As Long, j As Long
Dim c As Range
Dim re As RegExp
Dim mc As MatchCollection, m As Match
'Put all words into a single string
For Each c In rg
Str = Str & c.Value & " "
Next c
'get list of unique words
Set re = New RegExp
re.Global = True
re.Pattern = "\b[\w']+\b"
Set cWordList = New Collection
On Error Resume Next
'Add method with index=word will give error on duplicates
Set mc = re.Execute(Str)
For Each m In mc
cWordList.Add m.Value, m.Value
Next m
On Error GoTo 0
ReDim sRes(0 To 1, 1 To cWordList.Count)
For i = 1 To cWordList.Count
sRes(0, i) = cWordList(i)
Next i
'get word count for each word
re.Global = True
re.IgnoreCase = True
For i = 1 To UBound(sRes, 2)
re.Pattern = "\b" & sRes(0, i) & "\b"
Set mc = re.Execute(Str)
sRes(1, i) = mc.Count
Next i
Set re = Nothing
'you can comment out one or both of the sort lines
' depending on your requirements
'Sort words alphabetically A-Z
BubbleSort sRes, 0, True
'then sort by Count highest to lowest
BubbleSort sRes, 1, False
UniqueCount = sRes
End Function
'--------------------------------------------------------------
Private Sub BubbleSort(TempArray As Variant, d As Long, _
bSortDirection As Boolean)
'bSortDirection = True means sort ascending
'bSortDirection = False means sort descending
Dim Temp1 As Variant, Temp2
Dim i As Long
Dim NoExchanges As Boolean
Dim Exchange As Boolean
' Loop until no more "exchanges" are made.
Do
NoExchanges = True
' Loop through each element in the array.
For i = 1 To UBound(TempArray, 2) - 1
' If the element is greater/less than the element
' following it, exchange the two elements.
Exchange = IIf(TempArray(d, i) < TempArray(d, i + 1), True, False)
If bSortDirection = True Then Exchange = Not Exchange
If Exchange Then
NoExchanges = False
Temp1 = TempArray(0, i)
Temp2 = TempArray(1, i)
TempArray(0, i) = TempArray(0, i + 1)
TempArray(1, i) = TempArray(1, i + 1)
TempArray(0, i + 1) = Temp1
TempArray(1, i + 1) = Temp2
End If
Next i
Loop While Not (NoExchanges)
End Sub
============================================
--ron