Top Ten Word count

  • Thread starter Thread starter Kat3n
  • Start date Start date
K

Kat3n

I create policies using Word 2003. I would like to be able to count each word
in the entire policy and use the top ten words (ignoring words such as I,
the, in, etc.) as keywords. Is there anyway to do this?
 
Here's a macro that will count the frequency of the words in a document to
allow you to determine the top 10 most frequently used words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are excluded:
" & Excludes & ". Enter words that you wish to exclude, surrounding each
word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And
Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Number of different words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum))
System.Cursor = wdCursorNormal
' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words
", vbOKOnly, "Finished")
Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
 
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency in a
document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something wrong.

Any assistance would be appreciated...thanks in advance,
george

Doug Robbins - Word MVP said:
Here's a macro that will count the frequency of the words in a document to
allow you to determine the top 10 most frequently used words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are excluded:
" & Excludes & ". Enter words that you wish to exclude, surrounding each
word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And
Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Number of different words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum))
System.Cursor = wdCursorNormal
' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words
", vbOKOnly, "Finished")
Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

Kat3n said:
I create policies using Word 2003. I would like to be able to count each
word
in the entire policy and use the top ten words (ignoring words such as I,
the, in, etc.) as keywords. Is there anyway to do this?
 
Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA
sees the broken lines as syntax errors.

Wherever you see a line that starts at the left margin, and that line or the
next line is red, you need to delete the carriage return that causes the
line to break. For example, these two lines should be only one line:

If SingleWord < "A" Or SingleWord > "z" Then SingleWord
= "" 'Out of range?

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency
in a document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something
wrong.

Any assistance would be appreciated...thanks in advance,
george

Doug Robbins - Word MVP said:
Here's a macro that will count the frequency of the words in a
document to allow you to determine the top 10 most frequently used
words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words
allowed Dim Words(maxwords) As String 'Array to hold
unique words Dim Freq(maxwords) As Integer 'Frequency
counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the
document Dim Excludes As String 'Words to be
excluded Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are
excluded: " & Excludes & ". Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order",
"FREQ") If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord
= "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = "" 'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been
exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " &
WordNum Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq
And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab &
Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add
BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore
"Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add

ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Number of different words in
Document"


ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor =
wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & "
different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

Kat3n said:
I create policies using Word 2003. I would like to be able to count
each word
in the entire policy and use the top ten words (ignoring words such
as I, the, in, etc.) as keywords. Is there anyway to do this?
 
Jay,

I appreciate that piece of info. I am too much of a noob to figure that out
on my own.

I will fix the code and try again.

Thanks again,
george

Jay Freedman said:
Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA
sees the broken lines as syntax errors.

Wherever you see a line that starts at the left margin, and that line or the
next line is red, you need to delete the carriage return that causes the
line to break. For example, these two lines should be only one line:

If SingleWord < "A" Or SingleWord > "z" Then SingleWord
= "" 'Out of range?

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency
in a document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something
wrong.

Any assistance would be appreciated...thanks in advance,
george

Doug Robbins - Word MVP said:
Here's a macro that will count the frequency of the words in a
document to allow you to determine the top 10 most frequently used
words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words
allowed Dim Words(maxwords) As String 'Array to hold
unique words Dim Freq(maxwords) As Integer 'Frequency
counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the
document Dim Excludes As String 'Words to be
excluded Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are
excluded: " & Excludes & ". Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order",
"FREQ") If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord
= "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = "" 'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been
exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " &
WordNum Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq
And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab &
Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add
BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore
"Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add

ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Number of different words in
Document"


ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor =
wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & "
different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

I create policies using Word 2003. I would like to be able to count
each word
in the entire policy and use the top ten words (ignoring words such
as I, the, in, etc.) as keywords. Is there anyway to do this?
 
Works perfectly! Exactly what I needed.

Thanks!

george 16-17 said:
Jay,

I appreciate that piece of info. I am too much of a noob to figure that out
on my own.

I will fix the code and try again.

Thanks again,
george

Jay Freedman said:
Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA
sees the broken lines as syntax errors.

Wherever you see a line that starts at the left margin, and that line or the
next line is red, you need to delete the carriage return that causes the
line to break. For example, these two lines should be only one line:

If SingleWord < "A" Or SingleWord > "z" Then SingleWord
= "" 'Out of range?

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency
in a document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something
wrong.

Any assistance would be appreciated...thanks in advance,
george

:

Here's a macro that will count the frequency of the words in a
document to allow you to determine the top 10 most frequently used
words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words
allowed Dim Words(maxwords) As String 'Array to hold
unique words Dim Freq(maxwords) As Integer 'Frequency
counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the
document Dim Excludes As String 'Words to be
excluded Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are
excluded: " & Excludes & ". Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order",
"FREQ") If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord
= "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = "" 'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been
exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " &
WordNum Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq
And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab &
Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add
BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore
"Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add

ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
1).Range.InsertBefore "Number of different words in
Document"


ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor =
wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & "
different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

I create policies using Word 2003. I would like to be able to count
each word
in the entire policy and use the top ten words (ignoring words such
as I, the, in, etc.) as keywords. Is there anyway to do this?
 
I've went at the problem a little differently, I also tried to remove common
contractions, possesives, etc.

'
' WordFrequencyCount
' Creates a list of all the words (and their frequency) in the active document
' and presents the words & frequencies in a new document.
' It attempts to remove common contractions, possesives, numbers and
punctuation.
' Is also has a small exclude list (which can be adjusted).
'
Sub WordFrequencyCount()
Dim WordList() As String
Dim WordCount() As Long
Dim nWords As Long
Dim Index As Long
Dim actDoc As Document
Dim newDoc As Document
Dim oTable As Table
Dim oRow As Row
Dim aWord As Object
Dim sWord As String
Dim sExcludeList As String
Dim bFrequencyThreshold As Byte
'
' The variable bFrequencyThreshold is the minimum frequency needed
' to have the word & frequency count reported. The value of 1 (or 0)
' reports every word counted. The value of 5 omits all words
' with a frequency less than 5. (A higher number allows the macro
' to run faster, since fewer items are added to the table.)
'
bFrequencyThreshold = 5
'
'
ReDim WordList(1)
ReDim WordCount(1)
WordList(1) = ""
WordCount(1) = 0
nWords = 0
Set actDoc = ActiveDocument
For Each aWord In actDoc.Words
sWord = Trim(aWord.Text)
'
' Any of the next six lines of code can be omitted.
' Omit a line of code by placing a comma before it.
' The line will then turn green (or the same color as this line).
' Chr(160) = non-breaking spaces
'
sWord = Replace(sWord, Chr(160), "")
sWord = RemoveContractions(sWord)
If IsExcluded(sWord) Then sWord = ""
If IsAllDigits(sWord) Then sWord = ""
If IsOnlyPunctuation(sWord) Then sWord = ""
If Len(sWord) = 1 Then sWord = ""
'
' Any of the above six lines of code can be omitted.
' Omit a line of code by placing a comma before it.
'
If Len(sWord) > 0 Then
Index = 1
While (Index > 0 And Index <= nWords)
If StrComp(WordList(Index), sWord, vbTextCompare) = 0 Then
WordCount(Index) = WordCount(Index) + 1
Index = 0
Else
Index = Index + 1
End If
Wend
If Index > 0 Then
If nWords = 0 Then
nWords = 1
Else
nWords = nWords + 1
Application.StatusBar = "Counting Tokens in Document: "
& nWords
ReDim Preserve WordList(nWords)
ReDim Preserve WordCount(nWords)
End If
WordList(nWords) = sWord
WordCount(nWords) = 1
End If
End If
Next aWord
Set newDoc = Documents.Add
Set oRange = newDoc.Range
Set oTable = newDoc.Tables.Add(oRange, NumRows:=1, NumColumns:=2)
With oTable.Range.Rows(1)
.Cells(1).Range.Text = "Words"
.Cells(2).Range.Text = "Count"
End With
For Index = 1 To nWords
Application.StatusBar = "Creating Table: " & nWords & ": " & Index
If WordCount(Index) >= bFrequencyThreshold Then
Set oRow = oTable.Rows.Add
With oRow
.Cells(1).Range.Text = WordList(Index)
.Cells(2).Range.Text = WordCount(Index)
End With
End If
Next Index
If oTable.Rows.Count > 2 Then
oTable.Sort ExcludeHeader:=True, _
FieldNumber:=2, _
SortFieldType:=wdSortFieldNumeric, _
SortOrder:=wdSortOrderDescending
End If
End Sub
'
' IsExcluded
'
' Note that comparisons are not case sensitive,
' so that "ACT" would be excluded, if "act" is excluded.
'
' Each word in the sExcludeList
' should be separated on each side by a space (" ").
' Any word can be added, and they don't need to be in any order.
'
Private Function IsExcluded(ByVal sWord As String) As Boolean
Const sExcludeList As String = " a about act after again all also an and
any are as ask at away back be been before between big but by call came can
cause close come could did do does down each end even every far few find
first follow for form from get give go good great had hard has have he help
her here high him his hot how I if in is it its just keep kind know large
last late left let like little live long look low made make man many may me
mean might more most move much must my name near need never new no not now of
off on one only or other our out over own part people place press put round
said same saw say see set she should show side small so some stand still such
take tell than that the their them then there these they thing think this
those through to too try turn two under up us use very want was way we well
went were what when where which while who why will with word work would you
your "

If InStr(1, sExcludeList, " " & sWord & " ", vbTextCompare) > 0 Then
IsExcluded = True
End If
End Function

'
' IsOnlyPunctuation
' Returns true only if every character in a word string is punctuation
'
Private Function IsOnlyPunctuation(ByVal sWord As String) As Boolean
Dim sPunctuation As String
Dim sChar As String
Dim nIndex As Long
sPunctuation = " .,?';:![]{}()-_" & Chr(9) & Chr(10) & Chr(11) & Chr(12)
& Chr(13) & Chr(14) & Chr(34) & Chr(145) & Chr(146) & Chr(147) & Chr(148) &
Chr(150) & Chr(151) & Chr(160)
nIndex = 1
While (nIndex <= Len(sWord))
sChar = Mid(sWord, nIndex, 1)
If InStr(1, sPunctuation, sChar, vbBinaryCompare) = 0 Then
IsOnlyPunctuation = False
Exit Function
End If
nIndex = nIndex + 1
Wend
IsOnlyPunctuation = True
End Function
'
' IsAllDigits returns true if every character is a digit (0-9)
'
Private Function IsAllDigits(ByVal sWord As String) As Boolean
Dim sChar As String
Dim nIndex As Long
nIndex = 1
While (nIndex <= Len(sWord))
sChar = Mid(sWord, nIndex, 1)
If sChar < "0" Or sChar > "9" Then
IsAllDigits = False
Exit Function
End If
nIndex = nIndex + 1
Wend
IsAllDigits = True
End Function
'
' RemoveContractions
' Attempts to remove some obvious contractions and possesives
' Chr(146) = the closing single smart quotation mark/apostrophe
'
Private Function RemoveContractions(ByVal sWord As String) As String
If InStr(1, sWord, "'") Then
If sWord = "won't" Then
sWord = ""
ElseIf sWord = "can't" Then
sWord = "can"
ElseIf Right(sWord, 1) = "'" Then
sWord = Left(sWord, Len(sWord) - 1)
ElseIf Right(sWord, 2) = "'s" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 2) = "'d" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 3) = "'ll" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "'ve" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "'re" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "n't" Then
sWord = Left(sWord, Len(sWord) - 3)
End If
ElseIf InStr(1, sWord, Chr(146)) Then
If sWord = "won" & Chr(146) & "t" Then
sWord = ""
ElseIf sWord = "can" & Chr(146) & "t" Then
sWord = "can"
ElseIf Right(sWord, 1) = Chr(146) Then
sWord = Left(sWord, Len(sWord) - 1)
ElseIf Right(sWord, 2) = Chr(146) & "s" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 2) = Chr(146) & "d" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 3) = Chr(146) & "ll" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = Chr(146) & "ve" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = Chr(146) & "re" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "n" & Chr(146) & "t" Then
sWord = Left(sWord, Len(sWord) - 3)
End If
End If
RemoveContractions = sWord
End Function

Steven Craig Miller
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top