German special character and string.Replace() method

Christian Schwarz

Hi,

I've noticed a problem with the string.Replace() and the
StringBuilder.Replace() methods.

I'm trying to replace the German special character "ß" (pronounced like a
sharp "ss") in order to mask it as the HTML entity "&szlig;" for display in
OpenNETCF's HTMLViewer control. The problem is that the Replace() method
finds "ß" in every string containing "ss". When calling
Replace("ß", "&szlig;"), the first "s" of an "ss" sequence is replaced by
"&szlig;", resulting in "&szlig;s"! That behaviour is bad, because "ß" isn't
the same as "ss".

Is there a way to influence this behaviour? Is there another way than to
scan the string manually?

Btw, other string and StringBuilder methods (like IndexOf()) also treat "ss"
as "ß" ...

Greetings, Christian
 
Jon Skeet [C# MVP]

Hmm. You may find that changing the culture of the current thread to
CultureInfo.InvariantCulture temporarily fixes things.

Alternatively, according to the documentation, StringBuilder.Replace
does an ordinal search instead of a culture-sensitive one, so that may
be an easier option for you.
 
Christian Schwarz

Jon Skeet said:
Hmm. You may find that changing the culture of the current thread to
CultureInfo.InvariantCulture temporarily fixes things.

Maybe this is a solution. But I would really dislike to do this ...

Jon Skeet said:
Alternatively, according to the documentation, StringBuilder.Replace
does an ordinal search instead of a culture-sensitive one, so that may
be an easier option for you.

Although the documentation states that StringBuilder.Replace() is
culture-insensitive, it behaves exactly like string.Replace().

string s = new StringBuilder("Wirst du das wohl lassen?").Replace("ß",
"&szlig;").ToString();

"s" contains "Wirst du das wohl la&szlig;sen?" instead of "Wirst du das wohl
lassen?" ...

Greetings, Christian
 
Jon Skeet [C# MVP]

Christian Schwarz said:
Maybe this is a solution. But I would really dislike to do this ...

Understandably.

Christian Schwarz said:
Although the documentation states that StringBuilder.Replace() is
culture-insensitive, it behaves exactly like string.Replace().

string s = new StringBuilder("Wirst du das wohl lassen?").Replace("ß",
"&szlig;").ToString();

"s" contains "Wirst du das wohl la&szlig;sen?" instead of "Wirst du das wohl
lassen?" ...

Dear me. That sounds like a nasty bug.

What culture are you in, exactly? I'm trying to reproduce this (on the
desktop first) and failing at the moment.
 
Christian Schwarz

The current culture is "de-DE".

Btw, I did all tests on the mobile device ...

Greetings, Christian
 
Jon Skeet [C# MVP]

Christian Schwarz said:
The current culture is "de-DE".

Btw, I did all tests on the mobile device ...

Hmm. Very odd. I can't test on a device just at the moment - it may be
Monday before I get time to, I'm afraid. I'll see what I can do though.

I'm surprised that I can't see it on the desktop though. What Unicode
character are you using? I was trying with \u00DF.
 
Christian Schwarz

Jon,

you are right, the "ß" character is 0x00df.

Kind regards, Christian
 
Jon Skeet [C# MVP]

Christian Schwarz said:
The current culture is "de-DE".

Btw, I did all tests on the mobile device ...

As it happens, I've now had time to test it - and I can easily
reproduce your problem, even without changing culture. (Not sure why I
couldn't on the desktop.)

I suspect that StringBuilder.Replace has been implemented using
String.Replace, which it shouldn't have been due to this difference.

Two options, neither terribly pleasant:

1) Use CompareInfo.IndexOf, specifying CompareOptions.Ordinal, to find
the character. Then do the replacement yourself, using Substring etc.

2) Use
x = x.Replace('\u00df', '\uf000');
x = x.Replace("\uf000", "&szlig;");

String.Replace(char, char) is culture-insensitive, so as long as you can
find a "spare" character (I picked 0xf000 at random) to use temporarily, you
should be okay.
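
For what it's worth, option 1 might look roughly like the sketch below. The
class and method names (OrdinalReplace, ReplaceOrdinal) are made up for
illustration and are not framework APIs, and the sketch assumes that
CompareInfo.IndexOf with CompareOptions.Ordinal really does an exact
character match on the device.

```csharp
using System;
using System.Globalization;
using System.Text;

static class OrdinalReplace
{
    // Hypothetical helper: replaces every exact (ordinal) occurrence of
    // oldValue in text with newValue, ignoring culture-specific equivalences
    // such as "ss" matching '\u00df'.
    public static string ReplaceOrdinal(string text, string oldValue, string newValue)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(oldValue))
        {
            return text;
        }

        CompareInfo compare = CultureInfo.InvariantCulture.CompareInfo;
        StringBuilder result = new StringBuilder(text.Length);
        int start = 0;
        int index;

        // Find each ordinal match, copy the text before it, then append
        // the replacement and continue after the matched span.
        while ((index = compare.IndexOf(text, oldValue, start, CompareOptions.Ordinal)) >= 0)
        {
            result.Append(text, start, index - start);
            result.Append(newValue);
            start = index + oldValue.Length;
        }

        // Copy whatever follows the last match.
        result.Append(text, start, text.Length - start);
        return result.ToString();
    }
}
```

Called as OrdinalReplace.ReplaceOrdinal(x, "\u00df", "&szlig;"), this should
leave a plain "ss" alone and only touch the real 0x00df character. Building
the result in a StringBuilder avoids creating a new string for every
replacement.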
 
Christian Schwarz

Jon,

many thanks for your help. I think I'll stick with method 1 ...

Greetings, Christian
 
Andy Becker

I once heard from someone in our Einbeck office that the two (ß and ss) were
officially made the same. If this is the reasoning behind replacing ss, it
still should replace both characters, IMHO.

Best Regards,

Andy
 
Christian Schwarz

Andy,
that's not entirely true. Even after the controversially discussed spelling
reform (starting in 1996) there are still words that have to be written with
"ß" instead of "ss" ...

PS: Some weeks ago, several big newspaper publishers (Springer, SPIEGEL
Verlag) announced that they will return to the old spelling rules.

Greetings, Christian
 
Andy Becker

Christian Schwarz said:
Andy,
that's not entirely true. Even after the controversially discussed spelling
reform (starting in 1996) there are still words that have to be written with
"ß" instead of "ss" ...

I got some clarification on that, yes. I had spoken way too soon.

My friend thinks it is a bad bug also!

Best Regards,

Andy
 
Fernando Fanton [MSFT]

Hi, String.Replace() is culture-sensitive, but please bear in mind that
there are some differences between the desktop and NETCF implementations due
to constraints in the NLS data of both operating systems.
Please make sure that the default CultureInfo.CurrentCulture (not
CurrentUICulture) is set to the locale that you want to use (in this case
de-DE).


For more info go to:
Mobility: http://msdn.microsoft.com/mobility
NETCF:
http://msdn.microsoft.com/mobility/prodtechinfo/devtools/netcf/default.aspx

This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
| From: "Christian Schwarz"
| Newsgroups: microsoft.public.dotnet.framework.compactframework
| Subject: Re: German special character and string.Replace() method
| Date: Fri, 20 Aug 2004 16:18:54 +0200
 
Christian Schwarz

The culture is set to "de-DE".

Greetings, Christian
 
Boris Nienke

Andy Becker said:
I once heard from someone in our Einbeck office that the two (ß and ss) were
officially made the same. If this is the reasoning behind replacing ss, it
still should replace both characters, IMHO.

Right! If they had been declared "the same", then both "s" characters should
be replaced, just as if they were ONE character.

BUT: it's NOT the same! There are rules for when to use "ss" or "ß".

BTW: ...which Einbeck company is it? Brauerei? Merkur? KWS?

Boris
 
Fernando Fanton [MSFT]

Christian, I'd like to shed some light on the behavior of the Replace
methods on both the .NET Framework and NETCF.

1) StringBuilder.Replace
   .NET Framework: culture INSENSITIVE
   NETCF:          culture SENSITIVE
2) String.Replace
   .NET Framework: culture INSENSITIVE
   NETCF:          culture SENSITIVE

Yes, String.Replace is also culture INSENSITIVE on the .NET Framework even
though the docs state otherwise; that's a bug in the product documentation.
NETCF is culture sensitive in both cases. We are revisiting this behavior
for our next version to follow the .NET Framework implementation more
closely.
Regarding the issue with the German character, there is no easy
workaround for it besides using CompareInfo.IndexOf( ..
CompareOptions.Ordinal) to find the specific character and then replacing it
by hand.

I appreciate your feedback on this matter. Please let me know if you need
anything else.

Fernando Fanton
SDET .Net Compact FrameWork


--------------------
| From: "Christian Schwarz"
| Date: Tue, 26 Oct 2004 10:14:00 +0200
 
Christian Schwarz

Fernando,

we're working around this problem using this method:

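private const string MaskedSzett = "&szlig;";
....
int startIndex = 0, indexSzet;
// IndexOf may report a culture-sensitive hit on "ss", so verify each match.
while ((indexSzet = text.IndexOf('ß', startIndex)) >= 0)
{
    if (text[indexSzet] == 0x00df) // 0x00df == Unicode value of 'ß'
    {
        text = text.Remove(indexSzet, 1).Insert(indexSzet, MaskedSzett);
        // Continue searching right after the inserted entity.
        startIndex = indexSzet + MaskedSzett.Length;
    }
    else
    {
        // False hit (not a real 'ß'): skip past this position.
        startIndex = indexSzet + 1;
    }
}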

Not very elegant, but it does the job.

Fernando Fanton said:
Regarding the issue about the german character, there is no easy
workaround for it besides using CompareInfo.IndexOf( ..
CompareOptions.Ordinal) to find the specific character and then replace it
by hand.

The best would be to neither treat "ss" as "ß" nor "ß" as "ss". As I
mentioned in another post in that thread, these characters are not the same.
That's why I consider the current behaviour a bug.

Christian
 
