PC Review



Convert UTF-16 Excel file to UTF-8 and back

 
 
Greg Lovern
 
      2nd Jun 2009
If I save an Excel file with Japanese characters as "Text (Tab
delimited)", the Japanese characters are converted to question marks,
as expected with a plain text file. The same thing happens to some
other characters too, such as some (not all) accented characters in
some European languages.

If I save an Excel file as "Unicode Text" (xlUnicodeText), I get a tab-
delimited UTF-16 file. It works fine, preserving all the characters
correctly. However:

Our company has decided to do code reviews on these files, which are
frequently modified (the files are not actually programming code of
course, but since they are text the code review works fine). Our
company uses a code reviewing tool that is compatible with UTF-8, but
not compatible with UTF-16.

The UTF-8 standard is compatible with the Japanese characters. We have
other UTF-8 files (not created by Excel) that have the Japanese
characters.

What we're doing now is saving two copies: one as plain text (not
Unicode) and one as Unicode (UTF-16). Then we use the plain text copy
for the code review, and the UTF-16 copy for production. But there are
two problems:

1) We can't effectively do the code review on changes to the Japanese
characters, since those characters are all converted to question marks
in the plain text version.

2) Since we're reviewing one file and using another in production,
there is the risk of accidentally using the wrong UTF-16 file in
production after reviewing a plain text file.

This is a large company with thousands of programmers, and there is
zero chance of getting them to use a different code reviewer
compatible with UTF-16.


Any suggestions? Is there any way in VBA to convert files between
UTF-8 and UTF-16?


Thanks,

Greg
 
 
 
 
 
Greg Lovern
 
      2nd Jun 2009
We've found a solution.

It turns out Word can save as UTF-8 (save as plain text, then choose
"Unicode (UTF-8)" in the dialog). Excel can then open the UTF-8 file
created by Word, and the Japanese characters are still correct.

So, we'll just automate Word from Excel to do the conversion to UTF-8:

Const WORD_TEXT_FORMAT As Long = 2        ' wdFormatText
Const WORD_UTF8_ENCODING As Long = 65001  ' msoEncodingUTF8 (code page 65001)

' ObjWordDoc is a Word Document object already opened through automation.
ObjWordDoc.SaveAs _
    Filename:="save as UTF-8.txt", _
    FileFormat:=WORD_TEXT_FORMAT, _
    Encoding:=WORD_UTF8_ENCODING


Greg
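
For anyone who would rather not start Word just to re-encode a file, one possible alternative is the ADODB.Stream object, which can read text in one charset and write it in another. The following is a minimal sketch and not something from the posts above: the helper name ConvertTextEncoding, the example Sub, and the file paths are all made up for illustration. It assumes ADO is installed (it normally is on Windows) and uses late binding, so no library reference is needed. "unicode" is the ADO charset name for UTF-16.

```vba
Option Explicit

' ADODB.Stream constants, declared locally because late binding is used.
Private Const AD_TYPE_TEXT As Long = 2              ' adTypeText
Private Const AD_SAVE_CREATE_OVERWRITE As Long = 2  ' adSaveCreateOverWrite

' Hypothetical helper: re-encodes a text file from one charset to another.
Public Sub ConvertTextEncoding(ByVal sourcePath As String, _
                               ByVal targetPath As String, _
                               ByVal sourceCharset As String, _
                               ByVal targetCharset As String)
    Dim inStream As Object
    Dim outStream As Object
    Dim contents As String

    ' Read the whole source file, decoding it with the source charset.
    Set inStream = CreateObject("ADODB.Stream")
    inStream.Type = AD_TYPE_TEXT
    inStream.Charset = sourceCharset
    inStream.Open
    inStream.LoadFromFile sourcePath
    contents = inStream.ReadText
    inStream.Close

    ' Write the same text back out, encoding it with the target charset.
    Set outStream = CreateObject("ADODB.Stream")
    outStream.Type = AD_TYPE_TEXT
    outStream.Charset = targetCharset
    outStream.Open
    outStream.WriteText contents
    outStream.SaveToFile targetPath, AD_SAVE_CREATE_OVERWRITE
    outStream.Close
End Sub

Public Sub ExampleRoundTrip()
    ' Placeholder paths: UTF-16 to UTF-8 for review, then back again.
    ConvertTextEncoding "C:\temp\data-utf16.txt", "C:\temp\data-utf8.txt", "unicode", "utf-8"
    ConvertTextEncoding "C:\temp\data-utf8.txt", "C:\temp\data-utf16.txt", "utf-8", "unicode"
End Sub
```

Two caveats: the stream normally writes a byte order mark at the start of the UTF-8 output, so check that the review tool accepts one, and the whole file is held in memory, which should be fine for files of spreadsheet size.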





 