Detecting any type of dieresis in a string...

almurph

Hi,

I'm wondering whether you can help me, please. I am looking for a way to
detect any type of dieresis in a string. You know the ones: a little
symbol above or below the letter that changes its sound when
spoken. Examples include the French grave accent, the German umlaut,
etc.

I don't know how to do this. I am looking for a nice way to be able to
detect any of these letters and replace them by the equivalent normal
letter...

Comments/suggestions/code-samples much appreciated.

Thank you,
Al.
 
Jeff Johnson

I'm wondering whether you can help me, please. I am looking for a way to
detect any type of dieresis in a string.

There is only one type of dieresis and that's two dots, exactly like the
German umlaut. What you're asking for are accent marks, and I believe the
word you're looking for is "diacritic."

You know the ones: a little
symbol above or below the letter that changes its sound when
spoken. Examples include the French grave accent, the German umlaut,
etc.

I don't know how to do this. I am looking for a nice way to be able to
detect any of these letters and replace them by the equivalent normal
letter...

Unicode has something called Normalization which does exactly what you're
looking for. Do a search for it.
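
As a rough illustration of the normalization approach, here is a minimal Python sketch (the thread does not say which language is in use, so Python is an assumption). It decomposes the string with canonical decomposition (NFD), which splits an accented letter into its base letter plus separate combining marks, and then either checks for those marks or drops them. The function names `has_diacritics` and `strip_diacritics` are made up for this example. Note that normalization alone only decomposes; removing the marks is a separate step.

```python
import unicodedata


def has_diacritics(text: str) -> bool:
    """Return True if any character decomposes into a base plus a combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    # "Mn" is the Unicode category for non-spacing (combining) marks.
    return any(unicodedata.category(ch) == "Mn" for ch in decomposed)


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left so the result is in a consistent form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ("tr\u00e8s", "M\u00fcller", "gar\u00e7on", "plain"):
        print(word, has_diacritics(word), strip_diacritics(word))
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through unchanged with this approach, so they would need separate handling if they matter to you.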
 
Harlan Messinger

Jeff said:
There is only one type of dieresis and that's two dots, exactly like the
German umlaut. What you're asking for are accent marks, and I believe the
word you're looking for is "diacritic."


Unicode has something called Normalization which does exactly what you're
looking for. Do a search for it.

Whatever information you find, make sure you understand what these
diacritics represent in different languages in the context of your
reason for wanting to identify them. In some languages, a letter with a
diacritic is considered to be that letter with a diacritic, as in
French, where the e's in "très", "parlé", and "prêt" are all considered
the letter "e" with one or another accent and "ç" is likewise not
considered a separate letter from "c", with the cedilla serving only to
indicate that the letter is to be pronounced as "s" in a particular use.
In Hungarian, on the other hand, ö and ü are considered separate letters
from o and u, respectively. (It isn't clear to me whether a and á, e and
é, etc., are officially considered pairs of separate letters in
Hungarian. I'm not sure I can rely on what I'm finding on-line.)
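
One way to act on this, sketched in Python under stated assumptions: strip combining marks in general, but let the caller pass a set of characters that must be left alone because the target language treats them as letters in their own right. The function name `strip_diacritics_except` and the `HUNGARIAN_KEEP` set are hypothetical, and the set contains only the ö and ü mentioned above; whether á, é, and the rest belong in it is exactly the open question.

```python
import unicodedata


def strip_diacritics_except(text: str, keep: frozenset) -> str:
    """Remove combining marks, except on characters listed in keep."""
    out = []
    # Work on the composed (NFC) form so each accented letter is one character.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if unicodedata.category(c) != "Mn"))
    return "".join(out)


# Assumption for illustration only: ö, ü and their capitals are kept as-is.
HUNGARIAN_KEEP = frozenset("\u00f6\u00fc\u00d6\u00dc")

if __name__ == "__main__":
    print(strip_diacritics_except("k\u00f6nyv \u00e9tterem", HUNGARIAN_KEEP))
```

Passing an empty set makes this behave like plain accent stripping, so the language-specific decision stays in the data rather than in the code.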
 
