We've found a solution.
It turns out Word can save as UTF-8 (save as plain text, then choose
"Unicode (UTF-8)" in the dialog). And Excel can open the UTF-8 file
created by Word and the Japanese characters are still correct.
So, we'll just automate Word from Excel to do the conversion to UTF-8:
Const WORD_TEXT_FORMAT As Long = 2 'FileFormat:=wdFormatText
Const WORD_UTF8_ENCODING As Long = 65001 'Encoding:=65001
ObjWordDoc.SaveAs _
Filename:="save as UTF-8.txt", _
FileFormat:=WORD_TEXT_FORMAT, _
Encoding:=WORD_UTF8_ENCODING
Greg
On Jun 2, 12:30 pm, Greg Lovern <gr...@gregl.net> wrote:
> If I save an Excel file with Japanese characters as "Text (Tab
> delimited)", the Japanese characters are converted to question marks,
> as expected with a plain text file. The same thing happens to some
> other characters too, such as some (not all) accented characters in
> some European languages.
>
> If I save an Excel file as "Unicode Text" (xlUnicodeText), I get a tab-
> delimited UTF-16 file. It works fine, preserving all the characters
> correctly. However:
>
> Our company has decided to do code reviews on these files, which are
> frequently modified (the files are not actually programming code of
> course, but since they are text the code review works fine). Our
> company uses a code reviewing tool that is compatible with UTF-8, but
> not compatible with UTF16.
>
> The UTF-8 standard is compatible with the Japanese characters. We have
> other UTF-8 files (not created by Excel) that have the Japanese
> characters.
>
> What we're doing now is saving two copies: one as plain text (not
> unicode) and one as unicode (UTF-16). Then we use the plain text copy
> for the code review, and the UTF-16 copy for production. But there are
> two problems:
>
> 1) We can't effectively do the code review on changes to the Japanese
> characters, since those characters are all converted to question marks
> in the plain text version.
>
> 2) Since we're reviewing one file and using another in production,
> there is the risk of accidentally using the wrong UTF-16 file in
> production after reviewing a plain text file.
>
> This is a large company with thousands of programmers, and there is
> zero chance of getting them to use a different code reviewer
> compatible with UTF-16.
>
> Any suggestions? Is there any way in VBA to convert between UTF-8
> files and UTF-16? files?
>
> Thanks,
>
> Greg
|