PC Review



Convert text encoded with character references (&#123;) to Unicode or UTF-8

 
 
Daniel Köster
 
      28th May 2004
Does anyone have tips on how to convert text encoded with
character references (&#123;) to Unicode or UTF-8 using VB.NET? Is
there a function or something that can help with the conversion?

A simple replace of "this" with "that" is not an option, since there are
some Asian texts that I need to convert as well (Chinese, Thai and
Japanese; the replace list would be too large to handle).

What I want to do is compare a file coded with character
references (e.g. &#123;) with a file coded with normal Unicode characters
(e.g. ö, ä, å).

Best regards
Daniel


 
 
 
 
 
Jon Skeet [C# MVP]
 
      28th May 2004
Daniel Köster wrote:
> Does anyone have tips on how to convert text encoded with
> character references (&#123;) to Unicode or UTF-8 using VB.NET? Is
> there a function or something that can help with the conversion?
>
> A simple replace of "this" with "that" is not an option, since there are
> some Asian texts that I need to convert as well (Chinese, Thai and
> Japanese; the replace list would be too large to handle).
>
> What I want to do is compare a file coded with character
> references (e.g. &#123;) with a file coded with normal Unicode characters
> (e.g. ö, ä, å).


Just do "normal" parsing to find the &#xxx; to start with, then use
Substring (or whatever) to get the xxx bit, parse it as an integer
(Int32.Parse or Convert.ToInt32) and cast the result to a character.
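
The steps above (find the reference, extract the digits, parse them as an integer, convert to a character) can be sketched roughly as follows. This is an illustration in Python, not VB.NET, and the name `decode_numeric_refs` is made up for the example. It also accepts the hexadecimal form (&#xF6;), which is an addition beyond what was described. In .NET, a Char holds a single 16-bit UTF-16 code unit, so a plain cast only covers code points up to U+FFFF.

```python
import re

# Matches decimal (&#246;) and hexadecimal (&#xF6;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace each numeric character reference with the character it names.

    Hypothetical helper for illustration; named references such as &auml;
    are deliberately left untouched.
    """
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        # Parse the digits as an integer, then turn that code point into a character.
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the original reference text.
            return match.group(0)

    return _NUMERIC_REF.sub(_replace, text)
```

With this, `decode_numeric_refs("K&#246;ster")` returns `"Köster"`, and comparing the two files comes down to decoding the first one and testing the result for equality with the text read from the second.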

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
 
 
 
Cor Ligthert
 
      28th May 2004
:-)


 
 
Mihai N.
 
      29th May 2004
> Just do "normal" parsing to find the &#xxx; to start with, then use
> Substring (or whatever) to get the xxx bit, parse it as an integer
> (Int32.Parse or Convert.ToInt32) and cast the result to a character.


HttpUtility.HtmlDecode
HttpUtility.HtmlEncode
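
In .NET these two methods live in the System.Web namespace. As a rough analogy only, and in Python rather than VB.NET, the standard library function `html.unescape` does the same kind of decoding in a single call. The sketch below uses it for the comparison the original question asks about. The helper names `decode_refs` and `same_text` are invented for this example, and the two libraries may differ in edge cases.

```python
import html


def decode_refs(text):
    """Decode numeric (&#246;, &#xF6;) and named (&auml;) character references."""
    return html.unescape(text)


def same_text(encoded, plain):
    """Return True if the reference-encoded text equals the plain Unicode text.

    Hypothetical helper: 'encoded' would be the contents of the file that uses
    character references, 'plain' the contents of the file read as Unicode.
    """
    return decode_refs(encoded) == plain
```

For example, `same_text("K&#246;ster", "Köster")` is True, while `same_text("K&#246;ster", "Koster")` is False.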


--
Mihai
-------------------------
Replace _year_ with _ to get the real email
 
 
Daniel Köster
 
      1st Jun 2004
Thank you very much!!!

Best regards
Daniel
"Mihai N." wrote in message
news:Xns94F810B4B753DMihaiN@204.127.204.17...
> > Just do "normal" parsing to find the &#xxx; to start with, then use
> > Substring (or whatever) to get the xxx bit, parse it as an integer
> > (Int32.Parse or Convert.ToInt32) and cast the result to a character.

>
> HttpUtility.HtmlDecode
> HttpUtility.HtmlEncode
>
>
> --
> Mihai
> -------------------------
> Replace _year_ with _ to get the real email



 