counting sentences

David Newmarch

I'm looking for a way to count the number of sentences in a long document
(110,000 words). Readability Statistics might have been a way to go, but it
seems not to work on a selection of that scale - or am I just not being
patient enough?

Could anyone suggest code that might do the trick? Would welcome any
thoughts on this.
 
Pesach Shelnitz

Hi David,

The following macro will do this for you.

Sub SentenceCount()

MsgBox "Number of sentences in the document: " _
& ActiveDocument.Sentences.Count
End Sub
 
Doug Robbins - Word MVP

Run a macro containing the following code:

MsgBox ActiveDocument.Sentences.Count


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP, originally posted via msnews.microsoft.com
 
Peter T. Daniels

Can it somehow distinguish periods that end abbreviations and numbers
from periods at the end of sentences? If not, you'll get a bit of an
overcount.
 
Pesach Shelnitz

Hi Peter,

The code does not count periods (or question marks or exclamation marks)
that are followed immediately by a number or letter, but it does count any of
these punctuation marks followed by nearly anything else as a sentence. Would
a macro that counts a period, question mark, or exclamation mark followed by
a space and a capital letter give a more accurate count for your purposes?
Can you suggest a better search criterion?
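
A minimal sketch of the criterion proposed above (an end mark, then a space, then a capital letter), scanning the document text directly instead of relying on the Sentences collection. The names CountMarkSpaceCapital and ShowMarkSpaceCapitalCount are made up for illustration. It assumes the module uses the default binary string comparison (no Option Compare Text), so that [A-Z] matches capitals only, and it recognizes only unaccented capitals.

```vba
' Hypothetical helper: counts places where ".", "?" or "!" is followed
' by exactly one space and a capital letter A-Z.
Function CountMarkSpaceCapital(ByVal sText As String) As Long
    Dim i As Long
    Dim n As Long
    For i = 1 To Len(sText) - 2
        ' Inside [...] the "?" is literal, and "!" is literal when not first.
        If Mid$(sText, i, 3) Like "[.?!] [A-Z]" Then n = n + 1
    Next i
    CountMarkSpaceCapital = n
End Function

Sub ShowMarkSpaceCapitalCount()
    MsgBox "End mark + space + capital: " & _
        CountMarkSpaceCapital(ActiveDocument.Range.Text)
End Sub
```

Note that this misses any sentence that ends a paragraph (the end mark is followed by a paragraph mark, not a space) and still counts abbreviations such as "Dr. Duck", so it is only a rough estimate.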
 
Peter T. Daniels

The problem is not what follows the period, but what precedes it.
There's no way that doesn't involve the semantics of the language to
tell which periods end sentences and which end abbreviations.

I didn't mention question marks and exclamation points because
_usually_ when they don't end a sentence, they're enclosed in
parentheses so are easy enough to skip over in the count.

If it were possible for a computer (any computer, not necessarily one
as small as a desktop) to accurately parse and interpret a human
language, _then_ you might be able to devise a foolproof sentence-
counter.

But if David just needs a general idea of the number of sentences,
then something that simply counts the three possible end-marks will
do. (Space + capital doesn't take care of the case of numbered lists,
where the listed items often begin with a capital letter.)
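
For the rough approach described above, simply counting the three end marks, a sketch along these lines might do. CountEndMarks and ShowEndMarkCount are hypothetical names, not built-in Word functions, and the function works on plain text only.

```vba
' Hypothetical helper: counts every ".", "?" and "!" in the text.
Function CountEndMarks(ByVal sText As String) As Long
    Dim i As Long
    Dim n As Long
    For i = 1 To Len(sText)
        Select Case Mid$(sText, i, 1)
            Case ".", "?", "!"
                n = n + 1
        End Select
    Next i
    CountEndMarks = n
End Function

Sub ShowEndMarkCount()
    MsgBox "End marks found: " & CountEndMarks(ActiveDocument.Range.Text)
End Sub
```

Every period is counted, so abbreviations, decimal numbers, and each dot of a typed-out ellipsis inflate the result. It is an approximation and nothing more.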
 
David Newmarch

Thanks very much to all. I'll give these suggestions a try in a day or so.

The 110,000 word doc is keeping me too busy for the moment editing it, but I
thought the author might later on find it instructive to see just what
proportion of the sentences (high, but I'd like a number!) begin with exactly
the same (tedious) bunch of phrases.

An ordinary Find easily counts the number of sentences that begin with any
one phrase. So now I'll set about getting the overall count, and Peter,
you're right: all I'm looking for is a reasonable approximation.
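
As a sketch of how the proportion might be worked out in one pass, the macro below walks Word's Sentences collection and compares the start of each sentence with a phrase. The macro name and the phrase "In addition" are placeholders invented for the example, and the total is whatever Word considers a sentence, so the abbreviation overcount discussed above still applies.

```vba
Sub CountSentencesStartingWith()
    ' Placeholder phrase for illustration; replace with the real one.
    Const PHRASE As String = "In addition"
    Dim oSent As Range
    Dim nTotal As Long
    Dim nMatch As Long

    For Each oSent In ActiveDocument.Sentences
        nTotal = nTotal + 1
        ' Case-insensitive comparison of the sentence's opening characters.
        If StrComp(Left$(oSent.Text, Len(PHRASE)), PHRASE, vbTextCompare) = 0 Then
            nMatch = nMatch + 1
        End If
    Next oSent

    If nTotal = 0 Then
        MsgBox "No sentences found."
    Else
        MsgBox nMatch & " of " & nTotal & " sentences (" & _
            Format(nMatch / nTotal, "0.0%") & ") begin with """ & PHRASE & """."
    End If
End Sub
```

On a document of this size the loop may take a while to finish.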

Hope I wasn't being too frivolous in asking for your assistance. It seemed
like an interesting puzzle to solve.
 
Greg Maxey

> Can it somehow distinguish periods that end abbreviations and numbers
> from periods at the end of sentences? If not, you'll get a bit of an
> overcount.

I think only if you identify the collection of abbreviations used in
the document. This is not exhaustively tested, but the following code
returns an accurate count for the sample text shown after it:

Sub ScratchMacro()
    Dim i As Long
    Dim oSent As Range
    For Each oSent In ActiveDocument.Range.Sentences
        oSent.Select
        'A fragment that opens its paragraph with a listed abbreviation
        '(for example "Mr. ") is not a real sentence, so skip it.
        If oSent.Words(1).Text = Selection.Paragraphs(1).Range.Words(1).Text _
                And oSent.Words(1).Start = Selection.Paragraphs(1).Range.Words(1).Start Then
            Select Case oSent.Words(1).Text
                Case "Dr", "Mr", "Mrs", "etc", "Jr", "Sr"
                    GoTo Skip
            End Select
        End If
        'Shrink the range to its last token: drop the trailing space or
        'paragraph mark, then extend backward to the preceding space.
        oSent.MoveEnd wdCharacter, -1
        oSent.Collapse wdCollapseEnd
        oSent.MoveStartUntil Cset:=" ", Count:=wdBackward
        Select Case oSent.Text
            Case "Dr.", "Mr.", "Mrs.", "etc.", "Jr.", "Sr."
                'Word split the sentence at an abbreviation: do not count it
            Case Else
                i = i + 1
        End Select
Skip:
    Next oSent
    Selection.Collapse wdCollapseEnd
    MsgBox "Word thinks that there are " & ActiveDocument.Sentences.Count _
        & " sentences in this bit of text." _
        & vbCr & vbCr & "There are really only " & i & "."
End Sub


Mr. Bojangles is having an affair with Mrs. Jones.
My name is Dr. Duck. I am a quack. My dad's name is Donald. People
call him Mr. Duck.
Mrs. Duck met Mr. Duck on Golden Pond.
 
Peter T. Daniels

I find that when an author has an annoying mannerism like that, if I
change _most_ of the occurrences but leave a few in (widely spaced),
the author doesn't squawk. Pointing out the gaffes might be
counterproductive.
 
David Newmarch

I'd go with you on that, Peter, but there's also a place for a word to the
wise with someone on the threshold of a very promising career – with a
lifetime of more writing still to come. This is an author thoroughly at home
with statistics, who will probably appreciate the arithmetic!

What I'd like to know is how Word itself identifies a sentence, if
Readability Statistics can count them, and if "sentence" is recognized as an
object in VBA.
 
Peter T. Daniels

The first is probably proprietary information ... second, as a
linguist I have very little to no faith in computer grammar checkers
(or readability judgments) ... and third, I don't know anything about
VBA but it seems like that's the question that's most likely to
receive an actual answer.
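
On the VBA part of the question: as far as the Word object model goes, there is no separate "Sentence" object. Sentences is a collection (available on a document, a range, or the selection) whose items are ordinary Range objects, which is why the macros earlier in the thread declare their loop variable As Range. A small sketch for checking this on the active document follows; the macro name InspectFirstSentence is made up for the example.

```vba
Sub InspectFirstSentence()
    Dim oSent As Range
    If ActiveDocument.Sentences.Count = 0 Then Exit Sub

    ' Each item in the Sentences collection is a Range.
    Set oSent = ActiveDocument.Sentences(1)

    MsgBox "Type of item: " & TypeName(oSent) & vbCr & _
        "Start: " & oSent.Start & "  End: " & oSent.End & vbCr & _
        "Text: " & oSent.Text
End Sub
```

Because each item is a Range, anything that works on a range (Text, Words, Start, End, MoveEnd and so on) also works on a "sentence".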
 
