Converting Unicode

sams

Hi @all,

I'm looking for a solution to the following problem:
I want to replace all Unicode characters in a string with a valid
substitution.

For example:

string s = "Catalán";
string s2 = ModifyMyString(s); //s2 = "Catal\xC3\xA1n"

Since replacing Unicode characters in a string that way should be a
very common task, I wondered whether there is a function in the
.NET Framework that does this job. Calling s.Replace("á","\xC3\xA1")
for every character would not be very efficient, because there are
"many" Unicode characters. :)

Thanks for your help
Sams
 
Dmytro Lapshyn [MVP]

Hi,

You can probably use regular expressions to replace multiple occurrences
of the same character with the substitution sequence.

--
Sincerely,
Dmytro Lapshyn [Visual Developer - Visual C# MVP]
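
One way the regular-expression suggestion could look, as a minimal
sketch. It assumes that the wanted substitution is the UTF-8 bytes of
each non-ASCII character written out as literal \xNN text, which is what
the example in the question suggests but does not confirm. The names
Utf8Escaper and EscapeNonAscii are made up for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Utf8Escaper
{
    // Matches runs of UTF-16 code units outside 7-bit ASCII. Matching whole
    // runs keeps surrogate pairs together, so they encode correctly.
    private static readonly Regex NonAscii = new Regex(@"[^\x00-\x7F]+");

    // Hypothetical helper: replaces every non-ASCII run with the hex form
    // of its UTF-8 bytes, written as literal backslash-x sequences.
    public static string EscapeNonAscii(string input)
    {
        return NonAscii.Replace(input, m =>
        {
            var sb = new StringBuilder();
            foreach (byte b in Encoding.UTF8.GetBytes(m.Value))
            {
                sb.Append(@"\x").Append(b.ToString("X2"));
            }
            return sb.ToString();
        });
    }
}
```

With this sketch, EscapeNonAscii("Catal\u00E1n") returns the text
Catal\xC3\xA1n, with the backslashes as real characters. Backslashes that
are already in the input are left alone, so the result cannot be reversed
unambiguously; whether that matters depends on what consumes the string.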

 
Bruce Wood

What is the nature of your substitution? Are you trying to convert
Unicode to UTF-8? If so, there are methods for doing this within the
Framework.

If the encoding is totally your own then you would need to create a new
subclass of Encoding (if you want to build the Cadillac version).
 
sams

Thanks so far for your suggestions.

To answer Bruce's question about the nature of the substitution: I'm
reading content from a SQL Server 2000 database and want to insert it
into a PostgreSQL database. I think the database driver I use only
accepts ASCII input (the characters [0..9][A..z] plus the replacement
strings I already mentioned). The database itself, of course, supports
Unicode. Since my knowledge of Unicode/UTF-8 is not sufficient, I'm
going to look for the functions Bruce mentioned. I will keep you up to
date. If someone has another idea, I would be very happy to hear it.

Sams
 
Dmytro Lapshyn [MVP]

Sams,

The real question is how you are going to pass UTF-8 characters to the
driver. Remember that System.String is *always* Unicode, so unless you
have a way to pass a byte array, you might have a hard time passing a
UTF-8 string. Can you please elaborate on the driver interface you are
using?
 
Joerg Jooss

sams said:
Hi @all,

I'm looking for a solution to the following problem:
I want to replace all Unicode characters in a string with a valid
substitution.

For example:

string s = "Catalán";
string s2 = ModifyMyString(s); //s2 = "Catal\xC3\xA1n"

Since replacing Unicode characters in a string that way should be a
very common task, I wondered whether there is a function in the
.NET Framework that does this job. Calling s.Replace("á","\xC3\xA1")
for every character would not be very efficient, because there are
"many" Unicode characters. :)

Characters, and thus strings, in .NET are always Unicode. There's no
difference between replacing characters with characters and replacing
Unicode characters with characters. And "\xC3\xA1" is not a character,
but a string that says
\xC3\xA1

You seem to be confusing these things with character encoding?

Cheers,
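
The difference between a character and its encoded bytes can be checked
directly. This is a small sketch; the class CharVersusBytes and its two
helpers are hypothetical names, while Encoding.UTF8.GetBytes and
BitConverter.ToString are standard .NET calls.

```csharp
using System;
using System.Text;

public static class CharVersusBytes
{
    // Number of UTF-16 code units the string holds in memory.
    public static int CodeUnitCount(string s)
    {
        return s.Length;
    }

    // The string's UTF-8 encoding, shown as hex bytes such as "C3-A1".
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }
}
```

For "\u00E1" (the letter á), CodeUnitCount gives 1 and Utf8Hex gives
"C3-A1": one character, two bytes once encoded. The two-character string
"\u00C3\u00A1" is something else entirely: it has a CodeUnitCount of 2
and encodes to "C3-83-C2-A1".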
 
Mihai N.

string s = "Catalán";
string s2 = ModifyMyString(s); //s2 = "Catal\xC3\xA1n"

C3 A1 are the bytes that represent á in UTF-8.
A .NET string is Unicode (in its UTF-16 representation), so what you
probably want is to convert the string to a UTF-8 byte array.
If that is the case, take a look at System.Text.UTF8Encoding.

But depending on what mechanism you are using to interact with the database,
you may not need to do your own conversion.
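
A minimal sketch of that conversion with System.Text.UTF8Encoding. The
wrapper class Utf8RoundTrip and its method names are invented for
illustration; how (or whether) the resulting byte array can be handed to
the database driver is not known from this thread.

```csharp
using System;
using System.Text;

public static class Utf8RoundTrip
{
    // First argument: do not emit a byte order mark as preamble.
    // Second argument: throw on invalid input instead of silently
    // substituting a replacement character.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(false, true);

    // Encodes a .NET (UTF-16) string as UTF-8 bytes.
    public static byte[] ToUtf8(string text)
    {
        return StrictUtf8.GetBytes(text);
    }

    // Decodes UTF-8 bytes back into a .NET string.
    public static string FromUtf8(byte[] bytes)
    {
        return StrictUtf8.GetString(bytes);
    }
}
```

For "Catal\u00E1n" (seven characters), ToUtf8 returns eight bytes: the
á becomes C3 A1 and every other letter stays a single ASCII byte.
FromUtf8 turns those bytes back into the original string.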
 
