Finding duplicate phrases and paragraphs.

Frank Martin

I am copying a particular rare story from many different
newsgroups and pasting the fragments into a Word 2003
document.

Is there some way to automatically find duplicated sections
of the story so as to help weld it into one seamless whole?

In the spell checker one can easily do this for duplicated
words, but I need the same thing for duplicated strings, and
even sentences.

Please help, Frank
 
Klaus Linke

Hi Frank,

For repeated paragraphs, you could try a wildcard search for

(^13[!^13]@^13)*\1

If a repeated paragraph is found, you'll see it at the start and end of the
selection... though there needs to be at least one paragraph in between.
For repeated paragraphs right next to each other, you could use

^13([!^13]@^13)\1
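
If the wildcard searches are awkward to use, the same check can be sketched outside Word. The following is only an illustration under stated assumptions: the document has been saved as plain text with one paragraph per line, and Python is available. The function name `find_duplicate_paragraphs` is made up for this example and is not part of Word or of any library.

```python
from collections import defaultdict


def find_duplicate_paragraphs(text):
    """Map each repeated paragraph to the positions where it occurs.

    Assumes one paragraph per line (as in a plain-text export from Word).
    Positions count non-empty paragraphs only, starting at 1.
    """
    positions = defaultdict(list)
    non_empty = [line for line in text.splitlines() if line.strip()]
    for number, paragraph in enumerate(non_empty, start=1):
        # Collapse runs of whitespace so spacing differences do not hide a match.
        key = " ".join(paragraph.split())
        positions[key].append(number)
    # Keep only paragraphs that were seen more than once.
    return {p: where for p, where in positions.items() if len(where) > 1}


if __name__ == "__main__":
    sample = "Once upon a time.\nThe end.\n\nOnce  upon a time.\nSomething else."
    for paragraph, where in find_duplicate_paragraphs(sample).items():
        print(where, paragraph)
```

Unlike the first wildcard pattern, this reports every repeat at once and does not care whether the copies are adjacent or far apart.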


For repeated sentences or other duplicated strings of some length, you'd
need a more complicated macro.
You could read the whole document into a string. You probably can find
algorithms for finding repeated phrases in the string using Google:
http://en.wikipedia.org/wiki/Longest_common_substring_problem
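
As a rough sketch of that idea, and not a Word macro: assuming two fragments have been saved as plain text, Python's standard `difflib` module can report the longest passage they share, which is where the fragments would be welded together. The function name `longest_shared_passage` is hypothetical, chosen only for this example.

```python
from difflib import SequenceMatcher


def longest_shared_passage(a, b):
    """Return the longest run of characters that appears in both a and b."""
    # autojunk=False stops difflib from ignoring very frequent characters
    # in long texts, which would otherwise shorten the reported match.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    first = "It was a dark and stormy night when the ship left port."
    second = "when the ship left port. Nobody saw it go."
    print(longest_shared_passage(first, second))
```

This compares only two fragments at a time and finds only the single longest overlap, so a story assembled from many fragments would need it run on each pair.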

Regards,
Klaus
 
Frank Martin


Thank you. I could not get this to work, but I have found a
site with worked examples in Word:
http://www.tutorials-win.com/archive/WordDoc/
Is there any way to search this archive for a specific
example?
Frank
 
