PC Review



How can I create a frequency wordlist of a text?

 
 
P-O Olsson
 
      14th Jan 2007
How can I create a frequency wordlist of a text? I know I can make a
concordance index, but that function lists a word only once per page, no
matter how many times it appears there. I only want to know which words in a
text are the most frequent ones.
 
 
 
 
 
Jezebel
 
      14th Jan 2007
1. Make a copy of the document.
2. Remove all punctuation and non-text matter.
3. Replace all spaces with paragraph marks (Find what: a single space,
Replace with: ^p), so every word is on a line by itself.
4. Copy and paste into Excel.
5. Sort the column, then count how many times each word occurs (for example
with COUNTIF or a PivotTable).
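For anyone who would rather not do the clean-up by hand, the manual steps above can be roughly sketched as a short macro. This is only a sketch under some assumptions: the names CountWords and ListWordFrequencies are made up for illustration, it relies on Scripting.Dictionary (so Windows only), it reads only the main text of the document, and it treats anything other than a letter, a digit or a straight apostrophe as a word separator.

```vba
Option Explicit

' Hypothetical helper: counts words in a string, ignoring case.
' A character counts as part of a word if it is a letter (its upper and
' lower case forms differ), a digit, or a straight apostrophe.
Function CountWords(ByVal sourceText As String) As Object
    Dim counts As Object
    Dim i As Long
    Dim ch As String
    Dim token As String

    Set counts = CreateObject("Scripting.Dictionary")
    sourceText = LCase$(sourceText) & " "    ' trailing space flushes the last word

    For i = 1 To Len(sourceText)
        ch = Mid$(sourceText, i, 1)
        If StrComp(LCase$(ch), UCase$(ch), vbBinaryCompare) <> 0 _
                Or ch Like "[0-9']" Then
            token = token & ch
        ElseIf Len(token) > 0 Then
            ' Reading a missing key creates it as Empty, and Empty + 1 = 1
            counts(token) = counts(token) + 1
            token = ""
        End If
    Next i

    Set CountWords = counts
End Function

' Hypothetical driver: writes "word <tab> count" lines to a new document.
Sub ListWordFrequencies()
    Dim counts As Object
    Dim key As Variant
    Dim report As String

    Set counts = CountWords(ActiveDocument.Range.Text)
    For Each key In counts.Keys
        report = report & key & vbTab & counts(key) & vbCr
    Next key
    Documents.Add.Range.Text = report
End Sub
```

The output is not sorted. Pasting the two columns into Excel and sorting on the count column covers steps 4 and 5.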
 
Doug Robbins - Word MVP
      14th Jan 2007
Run the following macro:

Sub WordFrequency()

    Dim SingleWord As String        'Raw word pulled from doc
    Const maxwords = 9000           'Maximum unique words allowed
    Dim Words(maxwords) As String   'Array to hold unique words
    Dim Freq(maxwords) As Long      'Frequency counter for unique words
                                    '(Long, so a very common word cannot overflow)
    Dim WordNum As Integer          'Number of unique words
    Dim ByFreq As Boolean           'Flag for sorting order
    Dim ttlwds As Long              'Words still to process (status bar countdown)
    Dim Totalwords As Long          'Total words in the document
    Dim Excludes As String          'Words to be excluded
    Dim Found As Boolean            'Temporary flag
    Dim j As Long, k As Long, l As Long   'Loop counters
    Dim Temp As Long                'Temporary count used when swapping
    Dim tword As String             'Temporary word used when swapping
    Dim Ans As String               'Sort order typed by the user
    Dim aword As Range              'Current word in the document
    Dim tmpName As String           'Template attached to the source document

    ' Set up excluded words
    ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]"
    Excludes = ""
    Excludes = InputBox$("Enter words that you wish to exclude, " & _
        "surrounding each word with [ ].", "Excluded Words", "")
    ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & _
    '     ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")

    ' Find out how to sort
    ByFreq = True
    Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
    If Ans = "" Then Exit Sub       'Cancelled
    If UCase(Ans) = "WORD" Then
        ByFreq = False
    End If

    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    WordNum = 0
    Totalwords = ActiveDocument.Words.Count
    ttlwds = Totalwords

    ' Control the repeat
    For Each aword In ActiveDocument.Words
        SingleWord = Trim(aword)
        If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""    'Out of range?
        If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list?
        If Len(SingleWord) > 0 Then
            Found = False
            For j = 1 To WordNum
                If Words(j) = SingleWord Then
                    Freq(j) = Freq(j) + 1
                    Found = True
                    Exit For
                End If
            Next j
            If Not Found Then
                WordNum = WordNum + 1
                Words(WordNum) = SingleWord
                Freq(WordNum) = 1
            End If
            If WordNum > maxwords - 1 Then
                MsgBox "The maximum array size has been exceeded. " & _
                    "Increase maxwords.", vbOKOnly
                Exit For
            End If
        End If
        ttlwds = ttlwds - 1
        StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
    Next aword

    ' Now sort it, by word or by frequency (selection sort)
    For j = 1 To WordNum - 1
        k = j
        For l = j + 1 To WordNum
            If (Not ByFreq And Words(l) < Words(k)) Or _
               (ByFreq And Freq(l) > Freq(k)) Then k = l
        Next l
        If k <> j Then
            tword = Words(j)
            Words(j) = Words(k)
            Words(k) = tword
            Temp = Freq(j)
            Freq(j) = Freq(k)
            Freq(k) = Temp
        End If
        StatusBar = "Sorting: " & WordNum - j
    Next j

    ' Now write out the results to a new document
    tmpName = ActiveDocument.AttachedTemplate.FullName
    Documents.Add Template:=tmpName, NewTemplate:=False
    Selection.ParagraphFormat.TabStops.ClearAll
    With Selection
        For j = 1 To WordNum
            .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
        Next j
    End With

    ' Convert the list to a table, then add a header row and two summary rows
    ActiveDocument.Range.Select
    Selection.ConvertToTable
    Selection.Collapse wdCollapseStart
    With ActiveDocument.Tables(1)
        .Rows.Add BeforeRow:=Selection.Rows(1)
        .Cell(1, 1).Range.InsertBefore "Word"
        .Cell(1, 2).Range.InsertBefore "Occurrences"
        .Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
        .Rows.Add
        .Cell(.Rows.Count, 1).Range.InsertBefore "Total words in Document"
        .Cell(.Rows.Count, 2).Range.InsertBefore Trim(Str(Totalwords))
        .Rows.Add
        .Cell(.Rows.Count, 1).Range.InsertBefore "Number of different words in Document"
        .Cell(.Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
    End With
    System.Cursor = wdCursorNormal
    ' MsgBox "There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished"
    Selection.HomeKey wdStory

End Sub


--
Hope this helps.
