Extracting Unicode characters from RTF

Hi All,
I have come across a difficult problem with extracting Unicode characters from RTF strings.
A detailed description of the problem is below. If anyone could help, it would be much appreciated. I've tried to make the problem as clear as possible, but if any clarification is needed please let me know.

Task
-Convert RTF2-formatted text containing foreign (Unicode) characters to plain text.

Background

-We are using Stephen Lebans' RTF2 control to display and edit text.
-RTF2 fields cannot be displayed appropriately on reports, so unformatted text must be stored in the database.
-The RTF2 parser cannot handle Unicode (our overseas clients, specifically in Romania, use Unicode characters), so the rtf2.PlainText method often returns strings containing ???
-I have built a simple parser to convert hex values in rtf2.RTFText to characters.
-Given a character table, I can add functionality to generate the appropriate characters depending on the RTF character set defined in .RTFText.

Question
-Where can I find a character table for the Character Sets specified in .RTFText (specifically fcharset238)?
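
For illustration, one way such a table could be generated rather than looked up. This is a minimal Python sketch, assuming that \fcharset238 (Eastern European) corresponds to Windows code page 1250, which Python exposes as the "cp1250" codec. The names build_char_table and table_238 are hypothetical.

```python
def build_char_table(codec: str = "cp1250") -> dict:
    """Map each byte 0x80-0xFF to the character it represents in `codec`."""
    table = {}
    for byte in range(0x80, 0x100):
        try:
            table[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            # Some byte values are unassigned in the code page; skip them.
            continue
    return table


# Assumption: \fcharset238 corresponds to code page 1250.
table_238 = build_char_table("cp1250")

# Report the code point that hex value BA maps to under this table.
print(f"BA -> U+{ord(table_238[0xBA]):04X}")
```

The same function could be called with another codec name (for example "cp1252" for \fcharset0, again an assumption) to build one table per character set.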

Technical/Testing info:
Fonts
These are the 2 relevant fonts:
F1: {\f1\fnil\fcharset0 MS Sans Serif;}
F2: {\f2\fswiss\fcharset238{\*\fname Arial;}Arial CE;}

*Testing in MS Word showed that the actual font (Sans Serif, Arial, etc.) made no difference to the character presented, so fcharset is most likely the issue.
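
A rough sketch of how the font number to fcharset mapping might be pulled out of the font table, written in Python for illustration. It assumes the two entries above sit inside a standard {\fonttbl ...} group and that \fcharsetN appears in each entry before any nested group. FONT_ENTRY and font_charsets are hypothetical names.

```python
import re

# One font-table entry: \fN, then other control words, then \fcharsetN.
# Assumption: no braces appear between \fN and \fcharsetN.
FONT_ENTRY = re.compile(r"\\f(\d+)[^{}]*?\\fcharset(\d+)")


def font_charsets(rtf: str) -> dict:
    """Return {font number: fcharset value} for every entry found in `rtf`."""
    return {int(font): int(charset) for font, charset in FONT_ENTRY.findall(rtf)}


# The two entries quoted above, wrapped in an assumed \fonttbl group.
fonttbl = (
    r"{\fonttbl{\f1\fnil\fcharset0 MS Sans Serif;}"
    r"{\f2\fswiss\fcharset238{\*\fname Arial;}Arial CE;}}"
)
charsets = font_charsets(fonttbl)
print(charsets)
```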

Keys
-Pressing ";" usually generates "ş" (hereafter referred to as "s").
-However, in the VB6 code window it generates "º" (this probably isn't important).
-Copy/pasting from/into VB6 code window alternates between the characters.

RTF
-In RTF format, abnormal characters are partly referenced by “\'XX”, with XX being their hex value. E.g. the RTF string “xxx\'BAxxx” corresponds to “xxxşxxx”.
-In RTF format, abnormal characters are partly referenced by the specified font.

-So, the actual character displayed is dependent on the hex value, as well as the font (character set) specified in RTF.
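
The dependency on both hex value and font could be handled roughly as follows. This is a Python sketch only, not the parser I actually built. It assumes fcharset0 means code page 1252 and fcharset238 means code page 1250, and it ignores every RTF control word other than \fN and \'XX. CHARSET_CODECS, TOKEN and decode_hex_escapes are hypothetical names.

```python
import re

# Assumed mapping from \fcharsetN values to Python codec names.
CHARSET_CODECS = {0: "cp1252", 238: "cp1250"}

# Either a font switch (\fN plus optional delimiter space) or a hex escape (\'XX).
TOKEN = re.compile(r"\\f(\d+) ?|\\'([0-9A-Fa-f]{2})")


def decode_hex_escapes(body: str, font_charsets: dict, default_font: int) -> str:
    """Replace \\'XX escapes using the charset of whichever font is current."""
    codec = CHARSET_CODECS.get(font_charsets.get(default_font, 0), "cp1252")
    out = []
    pos = 0
    for match in TOKEN.finditer(body):
        out.append(body[pos:match.start()])  # plain text before this token
        if match.group(1) is not None:
            # Font switch: pick the codec for the new font's charset.
            charset = font_charsets.get(int(match.group(1)), 0)
            codec = CHARSET_CODECS.get(charset, "cp1252")
        else:
            # Hex escape: decode the single byte in the current codec.
            byte = bytes([int(match.group(2), 16)])
            out.append(byte.decode(codec, errors="replace"))
        pos = match.end()
    out.append(body[pos:])
    return "".join(out)


fonts = {1: 0, 2: 238}  # font number -> fcharset, as in the Fonts section
text_f2 = decode_hex_escapes(r"\f2 xxx\'BAxxx", fonts, default_font=1)
text_f1 = decode_hex_escapes(r"\f1 xxx\'BAxxx", fonts, default_font=1)
```

Character sets missing from CHARSET_CODECS fall back to cp1252 here, which is a simplification. A fuller version would presumably need one entry per fcharset value in use.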

Characters
Below is a table indicating my observations for a character. Hex Value and Font are the inputs.

Hex Value || Font || Character Displayed || Unicode for Character Displayed
BA || F1 || º || 00BA
BA || F2 || ş || 015F
 
