Convert UTF-16 Excel file to UTF-8 and back

G

Greg Lovern

If I save an Excel file with Japanese characters as "Text (Tab
delimited)", the Japanese characters are converted to question marks,
as expected with a plain text file. The same thing happens to some
other characters too, such as some (not all) accented characters in
some European languages.

If I save an Excel file as "Unicode Text" (xlUnicodeText), I get a tab-
delimited UTF-16 file. It works fine, preserving all the characters
correctly. However:

Our company has decided to do code reviews on these files, which are
frequently modified (the files are not actually programming code of
course, but since they are text the code review works fine). Our
company uses a code reviewing tool that is compatible with UTF-8, but
not compatible with UTF16.

The UTF-8 standard is compatible with the Japanese characters. We have
other UTF-8 files (not created by Excel) that have the Japanese
characters.

What we're doing now is saving two copies: one as plain text (not
unicode) and one as unicode (UTF-16). Then we use the plain text copy
for the code review, and the UTF-16 copy for production. But there are
two problems:

1) We can't effectively do the code review on changes to the Japanese
characters, since those characters are all converted to question marks
in the plain text version.

2) Since we're reviewing one file and using another in production,
there is the risk of accidentally using the wrong UTF-16 file in
production after reviewing a plain text file.

This is a large company with thousands of programmers, and there is
zero chance of getting them to use a different code reviewer
compatible with UTF-16.


Any suggestions? Is there any way in VBA to convert between UTF-8
files and UTF-16? files?


Thanks,

Greg
 
G

Greg Lovern

We've found a solution.

It turns out Word can save as UTF-8 (save as plain text, then choose
"Unicode (UTF-8)" in the dialog). And Excel can open the UTF-8 file
created by Word and the Japanese characters are still correct.

So, we'll just automate Word from Excel to do the conversion to UTF-8:

Const WORD_TEXT_FORMAT As Long = 2 'FileFormat:=wdFormatText
Const WORD_UTF8_ENCODING As Long = 65001 'Encoding:=65001

ObjWordDoc.SaveAs _
Filename:="save as UTF-8.txt", _
FileFormat:=WORD_TEXT_FORMAT, _
Encoding:=WORD_UTF8_ENCODING


Greg
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top