Amazing Bug in Search/Replace -- still!

Uriel

In an exploring mood, I dug up the following old post I sent in 2003. The
same buggy behavior, exactly as described, persists in MS Word 2003:

----- Original Message -----
From: "Uriel Wittenberg" <[email protected]>
Newsgroups:
microsoft.public.word.application.errors,microsoft.public.word.formatting.longdocs
Sent: Sunday, June 01, 2003 7:49 PM
Subject: Amazing Bug in Search/Replace


Create a 16,500-line .TXT file as follows:

--------------------------------
0 test test test test test test test test test test test test $q
1 test test test test test test test test test test test test $q
2 test test test test test test test test test test test test $q
....
16499 test test test test test test test test test test test test $q
--------------------------------
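For anyone who wants to reproduce this without building the file by hand or in Excel, here is a minimal Python sketch that produces the test file's bytes. It assumes each line is the line number, a space, twelve single-spaced copies of "test", a space, "$q", and a Windows CRLF ending; the post shows only the visible text, so the exact byte layout is an assumption.

```python
# Sketch: build the 16,500-line test file described above.
# Assumptions: single spaces between words, CRLF line endings, plain ASCII.
LINE_COUNT = 16_500
TEXT = " ".join(["test"] * 12)  # "test test ... test" (12 words)

def make_test_bytes(count: int = LINE_COUNT) -> bytes:
    """Return the file contents: '<n> test ... test $q' + CRLF for n = 0..count-1."""
    lines = (f"{n} {TEXT} $q\r\n" for n in range(count))
    return "".join(lines).encode("ascii")

data = make_test_bytes()
print(len(data), "bytes,", data.count(b"\r\n"), "lines")
```

To write it to disk, something like `Path("bug0.txt").write_bytes(make_test_bytes())` (with `from pathlib import Path`) works; the filename is only an example.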

Open it in Word 2000 (9.0.6926 SR-3).

Replace all

"$q^p" (no formatting)

with

"^p" (with formatting: choose some paragraph style)

The quotation marks (") above are not included in the search & replace
strings.

Here are the amazing results I get: all but 36 of the 16,500 lines are
changed as expected; those 36 are left unchanged.

The ones that don't get changed are not contiguous:

1002 test test test test test test test test test test test test $q
1258 test test test test test test test test test test test test $q
1514 test test test test test test test test test test test test $q
1770 test test test test test test test test test test test test $q
....
9962 test test test test test test test test test test test test $q

In fact, they appear to be 256 lines apart, don't they? But 1002 is the
first, and 9962 the last.
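The 256-line spacing and the count of 36 are consistent with each other. Stepping from 1002 to 9962 in increments of 256 gives exactly 36 line numbers, as this quick Python check shows:

```python
# Line numbers reported as unchanged: 1002, 1258, 1514, 1770, ..., 9962
skipped = list(range(1002, 9962 + 1, 256))

print(len(skipped))   # 36
print(skipped[:4])    # [1002, 1258, 1514, 1770]
print(skipped[-1])    # 9962
```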

Is there some way to get reliable, comprehensive search & replace in
Word?
 
Graham Mayor

It works fine here - eventually - no entries are missed.
It also works (faster) with a wildcard search
([0-9]{1,}*)$q^13
replace with
\1^p
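For readers more at home with regular expressions, the wildcard Find/Replace above corresponds roughly to the Python `re` substitution below. This is only an analogue run over the raw text of the file: Word's wildcard engine is not Python's `re`, the match is restricted to a single line here, and the paragraph mark is treated as the CRLF pair found in the .TXT file.

```python
import re

# Rough analogue of Word's wildcard Find "([0-9]{1,}*)$q^13", Replace "\1^p":
# group 1 = the leading digits plus the rest of the line up to "$q";
# "$q" and the line ending are replaced by group 1 plus a fresh CRLF.
PATTERN = re.compile(r"([0-9]+[^\r\n]*?)\$q\r\n")

def strip_q(text: str) -> str:
    return PATTERN.sub(r"\1\r\n", text)

sample = "1001 test test $q\r\n1002 test test $q\r\n"
print(repr(strip_q(sample)))  # '1001 test test \r\n1002 test test \r\n'
```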
 
Graham Mayor

Incidentally, it works much faster if you do it in Excel.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP

My web site www.gmayor.com

<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Uriel

Graham, this is a report of a bug, not a search for a better way to modify
this test file.

I don't know why it would have worked for you if you're using WinWord 2003.
What style did you use?

Tony Jollans

I haven't tried this one - I don't have the inclination to create the text
file :)

This would seem to be another symptom of the same problem you have with your
'China' file - does it resolve itself if you save it as a .doc file?

I think the problem is more likely to be with what Word does when it imports
text files rather than with the Find and Replace function. I appreciate that
may not be much consolation, but you seem to have a known and reproducible
error and a workaround (assuming it _does_ work as a doc file), so, for the
moment at least, you will probably just have to live with the workaround.

As a side note on your observation about the lines being 256 apart but
starting at 1002: the lines in that range all have a particular length that
is different from the length of other lines. I guess that's relevant but
don't know how.
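The point about line length can be made concrete with a little arithmetic. Assuming each line is its number, a space, twelve single-spaced copies of "test", a space, "$q", and a CRLF ending (the byte layout is an assumption; the posts show only the visible text), a line's length in bytes depends only on how many digits its number has:

```python
# Byte length of each test line, assuming "<n> " + 12 x "test" + " $q" + CRLF.
TEXT = " ".join(["test"] * 12)

def line_length(n: int) -> int:
    return len(f"{n} {TEXT} $q\r\n")

for first, last in [(0, 9), (10, 99), (100, 999), (1000, 9999), (10000, 16499)]:
    print(f"lines {first}-{last}: {line_length(first)} bytes")
```

Under that assumption, lines 1000 through 9999, which contain every skipped line, are all 69 bytes long. Lines 1000, 1001 and 1002 share that length, so length alone does not distinguish line 1002 from its neighbours.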
 
Uriel

> I haven't tried this one - I don't have the inclination to create the text

Sorry, shoulda provided it. Just grab it at ftp://ftp.urielw.com/bug0.zip (a
44KB file).
> This would seem to be another symptom of the same problem you have with
> your 'China' file

I think you're right. I might have explored this more. This is a problem
with Edit:Find. It's quite amazing. I've got all these similar lines ......

996 test test test test test test test test test test test test $q
997 test test test test test test test test test test test test $q
998 test test test test test test test test test test test test $q
999 test test test test test test test test test test test test $q
1000 test test test test test test test test test test test test $q
1001 test test test test test test test test test test test test $q
1002 test test test test test test test test test test test test $q
1003 test test test test test test test test test test test test $q
1004 test test test test test test test test test test test test $q
1005 test test test test test test test test test test test test $q

..... and by searching for "$q^p" I can highlight one instance after another
by closing the dialog and pressing Ctrl+PageDown repeatedly. But when I
get to the one on line 1001 and press Ctrl+PageDown again, it highlights
the one on line 1003, skipping the one on line 1002!

> does it resolve itself if you save it as a .doc file?
Yes.

> you seem to have a known and reproducible error and a workaround

You mean, save as a .DOC file before doing any editing. Yes, that's a
workaround. But you have to know about the bug to know you need the
workaround. And given that Word only trips on line 1002, after finding
every earlier instance successfully, it's easy not to know about the bug.

> As a side note about your observation about the lines being 256 apart but
> starting at 1002. The lines in that range all have a particular length that
> is different from the length of other lines

Don't know what you mean there. The 1000 and 1001 lines are the same length
as the 1002 line.
 
Graham Mayor

I have to go out now, but I will try it with your test file later. FWIW
using the file I created it works just fine and I could not reproduce your
'bug'.

Graham Mayor

I have re-tested with the file from your ftp site (which is identical to the
file I created in Excel) and have the following observations:

If you run the search:
$q^p
replace with
^p

on an unsaved plain text file opened in Word, then the entries you quote are
indeed missed.

Save as a Word document (which is what I had done yesterday after creating
your sample) and they are not missed.

This may be to do with the fact that ^p is not merely the code for a
paragraph mark (that would be ^13), but is the area in which the formatting
of the paragraph is stored. The regular nature of the missed entries
suggests that something else may be stored here.

If you use ^13 rather than ^p in the *search string only* then all the lines
are changed as anticipated.

Whether this is a bug or a 'feature' is not something I care to debate.
Either way, this is not a forum for reporting bugs to Microsoft, but a place
for reporting observations and inviting comment from fellow users - which I
did by offering what you asked for, i.e.:

"Is there some way to get reliable, comprehensive search & replace in Word"

So now I have offered you three methods by which you can reliably and
comprehensively search and replace your test sample in Word. :)


Tony Jollans

In both this case and the other one, the strings which are not found include
paragraph marks which fall across 256-byte boundaries in the txt file - in
other words ...

At address (in hex) mFF: character 13 (CR)
At address (in hex) n00: character 10 (LF), where n = m+1

It suggests that the conversion from txt to internal Word format doesn't
catch and convert this quite properly.

But the conversion from internal Word format to doc format (on saving as
doc) finishes the job off properly - in fact I think a conversion to normal
style without saving might be enough to do the trick.

Whatever the precise reason, searching for the constituent bytes of the ^p
(either ^13^10, or just ^13) appears to work correctly all the time, so, bug
or not, there is a way to get round the problem.
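The 256-byte-boundary hypothesis can be checked against the original test file without Word. The sketch below rebuilds the file bytes, assuming single spaces and CRLF line endings (an assumption; the posts show only the visible text), and lists every line whose CR falls at an offset ending in hex FF, so that its LF starts the next 256-byte block.

```python
# Check the 256-byte-boundary hypothesis against the 16,500-line test file.
# Assumes the format "<n> test x12 $q" with single spaces and CRLF endings.
TEXT = " ".join(["test"] * 12)

def boundary_lines(count: int = 16_500) -> list[int]:
    """Line numbers whose CR sits at offset 0x..FF (LF at the next 0x..00)."""
    hits = []
    offset = 0  # byte offset of the start of the current line
    for n in range(count):
        line = f"{n} {TEXT} $q\r\n".encode("ascii")
        cr_offset = offset + len(line) - 2  # CR is the second-to-last byte
        if cr_offset % 256 == 0xFF:
            hits.append(n)
        offset += len(line)
    return hits

hits = boundary_lines()
print(len(hits), hits[:4], hits[-1])  # 36 [1002, 1258, 1514, 1770] 9962
```

If the assumed layout is right, the boundary-straddling lines are exactly 1002 to 9962 in steps of 256: the same count, spacing, first and last line as the original report. The period is 256 lines because every line in the 1000-9999 block is 69 bytes and 69 shares no factor with 256.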

A brief note on ^13 - although it, itself, is not a complete Word
'character', when it is at the end of a Find string, the complete Word
character is included in the selected result of the Find (an analogous
situation exists when searching for UTF-16 surrogate pairs designating
Unicode code points in planes 1-16).
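The surrogate-pair analogy can be illustrated in Python. A code point outside the Basic Multilingual Plane (U+1F600 is used here purely as an example) is stored in UTF-16 as two 16-bit code units. In the same way, a paragraph end in a DOS text file is the two bytes CR and LF, while Word treats either pair as a single character.

```python
import struct

# U+1F600 lies in plane 1, so UTF-16 needs a surrogate pair for it.
ch = "\U0001F600"
units = struct.unpack("<2H", ch.encode("utf-16-le"))
print([hex(u) for u in units])  # ['0xd83d', '0xde00']

# The same pair computed by hand from the code point.
v = ord(ch) - 0x10000
high, low = 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)
print(hex(high), hex(low))      # 0xd83d 0xde00
```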

--
Enjoy,
Tony

Uriel

Smart thinking, Tony!

I didn't rack my brain to figure out where the 256-byte boundaries occur in
my earlier test file, but anyone can confirm your hypothesis in their living
room by creating a text file with 255 characters on line 1 and just a bit of
stuff on line 2. Of course my mail program will chop the lines, but this
gives the idea:

--- file contents: ---
123456789 123456789 123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 12345
xxx
----------------------
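This small test case can also be generated programmatically, which avoids the line wrapping introduced by the mail program. The sketch assumes the wrapped lines above are really one line made of 25 copies of "123456789 " followed by "12345" (255 characters), then CRLF and "xxx" with no trailing line ending; those details are assumptions about the intended file.

```python
# Minimal repro: a 255-character first line, so its CR lands at offset 0xFF
# and its LF at 0x100, followed by a short second line.
line1 = "123456789 " * 25 + "12345"
data = (line1 + "\r\n" + "xxx").encode("ascii")

assert len(line1) == 255
print(hex(data.index(b"\r")), hex(data.index(b"\n")))  # 0xff 0x100
```

Writing `data` to a file (for example with `open("repro.txt", "wb").write(data)`; the name is only illustrative) gives a file to open in Word as plain text.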

Indeed, save that as Plain text, close the file, reopen as Plain text, then
do Edit:Find to search for "5^p", and you won't find it. (Searching just for
"^p" does for some reason work properly, however.)

Also, as you note, the problem doesn't occur if using "^13^10" instead of
"^p"; searching for "5^13^10" succeeds.

Gee. Edit:Find hardly has any mystery left anymore.

Uriel

> Whether this is a bug or a 'feature' is not something I care to debate.

I guess I can live with that.

Thanks for your comments.