Save As Encoded Text, Unicode characters save differently

Ken Benson

I'm trying to save Word files containing Unicode characters to plain text so
I can clean them up and code them as InDesign Tagged Text. Save As Encoded
Text actually works nicely, except that certain characters sometimes save
successfully, and sometimes get converted to a single open parenthesis. I've
posted a tiny test file showing this at http://www.pegtype.com/test.doc.

The mystery is that the same character in two places in the same file
converts differently.

Thanks for any help.
Ken Benson
 
Bob Buckland ?:-)

Hi Ken,

Interestingly, the Reveal Formatting Task Pane in Word
did not show any formatting differences between your
paragraph #1 and #'s 2-5. However, File=>Web Page Preview
followed by View=>Source did show differences, and those
are apparently enough to confuse the Plain text converter.

If you turn on the [x] Show Paste Options button in
Tools=>Options=>Edit, then select all of paragraph 1,
cut it (Ctrl+X), repaste it (Ctrl+V), and choose
'keep text only' from the icon, it corrects the problem.

Copying the similar two characters from items 2-5 and
pasting them over the problem ones in item 1 also corrected
the problem.

--
Let us know if this helped you,

Bob Buckland ?:)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*

Office 2003 Editions explained
http://www.microsoft.com/uk/office/editions.mspx
 
Ken Benson

Hi Bob

Thanks for looking into this.

Show Paste Options is probably an option for a newer version of Word (I've
got Word 2000), but I seem to be able to accomplish the same thing by using
Paste Special|Unformatted Unicode Text.

This is a workable solution. Do you have any idea how the original author
(I'm several steps removed from him) could have accomplished this?

Thanks again,
Ken Benson
 
Ken Benson

Hi Bob

I came up with an even better solution than "cut/paste as text". I installed
OpenOffice, opened my problem file there, resaved it, and then opened it
again in Word. All the problem characters were converted nicely. This method
both fixes the problem and keeps the formatting.

Thanks for your help,
Ken Benson
 
Klaus Linke

> Do you have any idea how the original author (I'm several steps
> removed from him) could have accomplished this?


Hi Ken,

He probably used "Insert > Symbol", and the font is a symbol ("decorative") font like Symbol or Wingdings.

Since you don't want the symbols to change if you change the font (say by applying another one, or applying another style), Word inserts those symbols as a kind of symbol field. But it will never show you the field code, only the result.

The effect is as desired: The field can have any font applied, but the character will still be inserted from the font specified in the field.
The drawback is that Word usually can't tell you the code or font. AscW(Selection.Text) will return the code 40 ... "(" = opening parenthesis, and the font dropdown will show you the font of the surrounding text.
If you select the symbol and open "Insert > Symbol" again, Word can usually tell you the font and the code.
This doesn't always work, though. Word expects the codes to be in the range from U+F000 to U+F0FF. But since Symbol fields don't change if you add or subtract multiples of 256 from that code, you often get files with messed up symbols.
The WordPerfect import filters seem to be a special culprit in this regard, but I suspect some other filters aren't working correctly either.
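
To make the "multiples of 256" point concrete, here is a minimal sketch in Python (not a Word macro). The function names are hypothetical and only illustrate the arithmetic; the U+F000 to U+F0FF range and the 256 offset come from the description above, and the sample code values are made up.

```python
# Illustrative sketch only: helper names and sample values are assumptions,
# not anything Word itself exposes.

SYMBOL_BASE = 0xF000  # start of the range Word expects for symbol codes


def font_position(code: int) -> int:
    """Return the 0-255 position of the glyph inside the symbol font."""
    return code % 256


def normalize_symbol_code(code: int) -> int:
    """Shift a code that is off by a multiple of 256 back into U+F000..U+F0FF."""
    return SYMBOL_BASE + font_position(code)


if __name__ == "__main__":
    # Three codes that differ only by multiples of 256 end up at the same place.
    for raw in (0xF0B7, 0x00B7, 0xF1B7):
        print(f"{raw:#06x} -> {normalize_symbol_code(raw):#06x}")

    # Code 40, the value AscW reports for a protected symbol, is just "(".
    print(repr(chr(40)))
```

Codes that differ only by a multiple of 256 land on the same glyph position, which is why a symbol can look right on screen while its stored code sits outside the range Word expects.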

If you only need unformatted text, "Paste Special as text" is a good solution.
If you need to keep the formatting, I have posted a macro to turn "proper symbol fields" into regular characters... google for "SymbolsUnprotect".
For messed up symbol "fields", I haven't found a good solution, since you'd have to do a lot of processing for each character which takes too long even on moderately sized files.

Greetings,
Klaus
 
Klaus Linke

Hi Ken,

Nice!!!

I just tried it on a file with "messed up" symbols.
That file had given me trouble last week, because the symbols were messed up in InDesign, and I couldn't fix it in Word.
I had resorted to editing the RTF code "by hand" with Find/Replace.

OO Writer 1.9.97 fixed the symbols fine. I'd keep it installed if only for that reason.

Thanks for the tip!
Klaus
 
Ken Benson

> OO Writer 1.9.97 fixed the symbols fine. I'd keep it installed if only
> for that reason.


Yes, I think I'll just be running all my bizarre author files through
OpenOffice from now on.

Ken Benson
 
