Convert string to "best possible" ascii representation

 
 
Achim Domma
30th Aug 2006
Hi,

I have to convert a string to its "best possible" ASCII representation.
It's clear to me that this is not possible or sensible for all Unicode
characters. But for most European characters it should be possible.

For example:

"Müller" should become "Muller" and "é" should become "e".

Does some functionality like this already exist?

Achim
 
Peter Bromberg [C# MVP]
30th Aug 2006
"Best possible"? Who, pray tell, is the arbiter of that? You are the one who
chooses the encoding, and there are many to choose from. If you use strict
ASCII encoding, characters outside the ASCII range will come out as question
marks (?).
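To make the question-mark behaviour concrete, here is a minimal sketch that round-trips a string through the built-in Encoding.ASCII. The class and method names (StrictAsciiDemo, ToStrictAscii) are made up for illustration and are not part of the Framework.

```csharp
using System;
using System.Text;

public static class StrictAsciiDemo
{
    // Encodes to 7-bit ASCII and decodes again. With the default settings of
    // Encoding.ASCII, every character outside the ASCII range is replaced
    // by '?' during encoding.
    public static string ToStrictAscii(string input)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(input);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Calling StrictAsciiDemo.ToStrictAscii("M\u00FCller") gives "M?ller", which is the lossy result the original poster wants to avoid.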
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com



 
Morten Wennevik
30th Aug 2006
Hi Achim,

There is nothing out of the box that will do this for you.
You are probably best served using a lookup table to convert the
characters, but there is a method that will approximate most of the
characters. This is not guaranteed to work!

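// Requires: using System.Text;
// This appears to rely on the encoder's "best fit" fallback: characters the
// target code page cannot represent are replaced with the closest match it knows.
string s = "éëæñúüøå";
byte[] data = Encoding.GetEncoding("ISO-8859-6").GetBytes(s);
s = Encoding.GetEncoding("ISO-8859-1").GetString(data);

// s == "eeanuuoa"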
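A different approach that may be worth trying, assuming .NET 2.0 or later, is Unicode normalization: decompose each character into a base letter plus combining marks, then drop the marks. The sketch below uses string.Normalize and CharUnicodeInfo.GetUnicodeCategory; the names DiacriticStripper and RemoveDiacritics are invented for this example.

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    // Decomposes the input (form D) so that "ü" becomes "u" followed by a
    // combining diaeresis, then drops every non-spacing mark.
    public static string RemoveDiacritics(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left so the result is in the usual form C.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this, "Müller" becomes "Muller" and "é" becomes "e". Letters that have no decomposition, such as "ø" and "æ", pass through unchanged, so the result is not guaranteed to be pure ASCII and a lookup table is still needed for those.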

--
Happy Coding!
Morten Wennevik [C# MVP]
 
 
Larry Lard
30th Aug 2006
Would you say this is something that's commonly done? Because that is
what gets into the Framework.

By the way, what are you going to do with the Scandinavian å and ø?
Replacing them with a and o would be wrong at best.
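One way to handle letters like å and ø is to give an explicit override table priority over any generic accent stripping. The sketch below is only an illustration: the class AsciiFolder, the method Fold, and the specific replacements ("oe", "aa", "ae", "ss") are assumptions chosen for the example, and the right transliteration depends on the language and the application.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolder
{
    // Hypothetical overrides; adjust them to the conventions you need.
    private static readonly Dictionary<char, string> Overrides =
        new Dictionary<char, string>
        {
            { '\u00F8', "oe" }, { '\u00D8', "Oe" },  // ø, Ø
            { '\u00E5', "aa" }, { '\u00C5', "Aa" },  // å, Å
            { '\u00E6', "ae" }, { '\u00C6', "Ae" },  // æ, Æ
            { '\u00DF', "ss" }                       // ß
        };

    public static string Fold(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (c < 128)
            {
                sb.Append(c);                        // already ASCII
            }
            else if (char.IsSurrogate(c))
            {
                sb.Append('?');                      // no attempt beyond the BMP
            }
            else if (Overrides.TryGetValue(c, out replacement))
            {
                sb.Append(replacement);              // explicit table wins
            }
            else if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                // a stray combining mark: drop it
            }
            else
            {
                // Fall back to decomposition and keep only the ASCII parts.
                bool appended = false;
                foreach (char d in c.ToString().Normalize(NormalizationForm.FormD))
                {
                    if (d < 128) { sb.Append(d); appended = true; }
                }
                if (!appended) sb.Append('?');       // nothing usable was found
            }
        }
        return sb.ToString();
    }
}
```

Here Fold("Müller") gives "Muller", Fold("ø") gives "oe", and a character with no ASCII counterpart at all comes out as "?". Keeping the overrides in a table makes the disputed cases an explicit decision instead of a side effect of the encoder.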

--
Larry Lard
(E-Mail Removed)
The address is real, but unread - please reply to the group
For VB and C# questions - tell us which version
 
 
Cor Ligthert [MVP]
30th Aug 2006
Achim,

Maybe the two links on this page can help you, in addition to the other
information you have already received.

http://www.vb-tips.com/dbPages.aspx?...f-76c81839e6c9

I hope this helps,

Cor
 
joachim@yamagata-europe.com
30th Aug 2006
> there is a method that will approximate most of the
> characters. This is not guaranteed to work!


I once needed a converter from any code page to any other (as a
matter of fact, from all Windows code pages to all Macintosh code pages).
At this link you can get all the mappings you'll need from 8-bit code
pages to Unicode:

http://www.unicode.org/Public/MAPPINGS/VENDORS/

I wrote a parser that built a substitution matrix from two files, so that
it only switched the characters that had different byte values for the
same Unicode value. In your case, I'd suggest you build your matrix from
one single file (don't hard-code it, to keep your solution flexible).

To make the substitutions I implemented an Aho-Corasick engine with
callbacks (you'll definitely want to use this if you want your
replacement to be efficient when processing large files, let's say 1 GB):

http://en.wikipedia.org/wiki/Aho-Corasick_algorithm

With this method you are in complete control of what you want to
change. It is also flexible, because you only need to change the file
which holds your substitutions.
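As a rough sketch of the two-file idea described above (not the code offered in this post), the following parses mapping text in the layout used by the unicode.org vendor files, assumed here to be a hex byte value, a hex Unicode value, and an optional "#" comment per line, and then builds a byte-to-byte substitution table. The names CodePageMatrix, ParseMapping and BuildMatrix are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodePageMatrix
{
    // Parses lines such as "0xFC<TAB>0x00FC<TAB># comment" into byte -> code point.
    public static Dictionary<byte, int> ParseMapping(string text)
    {
        Dictionary<byte, int> map = new Dictionary<byte, int>();
        foreach (string rawLine in text.Split('\n'))
        {
            string line = rawLine;
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);   // strip the comment
            string[] fields = line.Split(new char[] { ' ', '\t', '\r' },
                                         StringSplitOptions.RemoveEmptyEntries);
            int code, codePoint;
            // Skip blank lines, undefined entries and anything that does not parse.
            if (fields.Length < 2) continue;
            if (!TryParseHex(fields[0], out code) || code < 0 || code > 255) continue;
            if (!TryParseHex(fields[1], out codePoint)) continue;
            map[(byte)code] = codePoint;
        }
        return map;
    }

    // Keeps only the bytes whose value differs between the two code pages
    // for the same Unicode code point.
    public static Dictionary<byte, byte> BuildMatrix(
        Dictionary<byte, int> source, Dictionary<byte, int> target)
    {
        Dictionary<int, byte> reverse = new Dictionary<int, byte>();
        foreach (KeyValuePair<byte, int> pair in target) reverse[pair.Value] = pair.Key;

        Dictionary<byte, byte> matrix = new Dictionary<byte, byte>();
        foreach (KeyValuePair<byte, int> pair in source)
        {
            byte targetCode;
            if (reverse.TryGetValue(pair.Value, out targetCode) && targetCode != pair.Key)
            {
                matrix[pair.Key] = targetCode;
            }
        }
        return matrix;
    }

    private static bool TryParseHex(string field, out int value)
    {
        if (field.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        {
            field = field.Substring(2);
        }
        return int.TryParse(field, NumberStyles.AllowHexSpecifier,
                            CultureInfo.InvariantCulture, out value);
    }
}
```

Feed ParseMapping the contents of one mapping file per code page and pass both results to BuildMatrix. For the original question, the same table-from-a-file idea would map accented characters to ASCII replacements instead of mapping bytes to bytes.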

Drop me a line and I'll send you some code,

Best Regards,
Joachim

 