string replace problem...

Lee Jackson

Given the following code snippet, which is intended to replace every
run of two or more spaces with a single space:

// Keep halving runs of spaces until no double space remains
while (sbNormalisedString.ToString().IndexOf("  ") > -1)
    sbNormalisedString = sbNormalisedString.Replace("  ", " ");

Can anyone tell me why it sometimes goes into an infinite loop?

I've tried it on both string and StringBuilder instances, with the same
issue in both cases. I've also seen the problem when trying to replace
instances of "\n " with "\n".

I'm late, it's tired, and I've no idea why... any help appreciated.

Lee
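As an aside, the same clean-up can be sketched without any search-and-replace loop by using a regular expression. This is a swapped-in technique, not the approach used in this thread, and it assumes only ordinary U+0020 spaces need collapsing; `SpaceCollapser`, `CollapseSpaces` and `TrimSpacesAfterNewlines` are hypothetical names for illustration. Because a regex without IgnoreCase matches characters literally rather than culturally, a space/U+200C/space sequence is simply left alone instead of causing a loop.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; a regex-based sketch, not the thread's original code.
static class SpaceCollapser
{
    // Collapse every run of two or more U+0020 spaces into a single space,
    // in one pass, so there is no IndexOf/Replace loop that could fail to end.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, " {2,}", " ");
    }

    // Remove all spaces that immediately follow a newline; equivalent to
    // repeatedly replacing "\n " with "\n" until no match remains.
    public static string TrimSpacesAfterNewlines(string input)
    {
        return Regex.Replace(input, "\n +", "\n");
    }

    static void Main()
    {
        string sample = "one  two     three\n   four";
        string cleaned = TrimSpacesAfterNewlines(CollapseSpaces(sample));

        // Show the newline as a visible \n so the result fits on one line.
        Console.WriteLine(cleaned.Replace("\n", "\\n")); // prints: one two three\nfour
    }
}
```

Doing both replacements in a single pass each also means the cost no longer depends on how long the longest run of spaces is.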

Jon Skeet [C# MVP]

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

Lee Jackson

Solved the problem while I was trying to put together a complete
program for Jon (cheers Jon).

The string I was attempting to process was actually a section of
HTML from a (Russian) web page I'd downloaded. When reading the stream
I'd used Encoding.UTF8 rather than Encoding.Default (it WAS late), so
it was an encoding issue.

That said, the fact that I got a match on IndexOf but the Replace
call failed seems to fall into the "unexpected behaviour" class of
things. Is it a bug, or (as is more likely) am I failing to understand
something here?

Regards

Jon Skeet [C# MVP]

Hey, it's an easy way of replying - I have the "short but complete
program" reply on a shortcut :) Glad to hear it worked though - it's
usually a good starting point.

It *could* be related to an interesting problem someone posted on
another group. Here's a short but complete program which demonstrates
the same problem it looks like you were having:

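using System;

class Test
{
    static void Main()
    {
        // "x", space, U+200C (zero width non-joiner), space, "x"
        string x = "x \u200C x";

        // The culture-sensitive IndexOf can treat U+200C as ignorable and
        // report a double space, but Replace compares ordinally and finds
        // nothing to replace, so x never changes and the loop never ends.
        while (x.IndexOf("  ") != -1)
        {
            string y = x.Replace("  ", " ");
            Console.WriteLine("{0} -> {1}", x, y);
            x = y;
        }
    }
}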

Here there is a double space with a "zero width non-joiner" character
between the two spaces. IndexOf matches, but Replace doesn't...
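A minimal sketch of the mismatch in the program above, assuming a .NET version that has the StringComparison overloads of String.IndexOf (added in .NET 2.0). `ZwnjDemo`, `OrdinalIndex` and `CultureIndex` are hypothetical names. The culture-sensitive result depends on the current culture and runtime (invariant globalization mode, for example, makes it behave ordinally), so only the ordinal line is predictable: an ordinal search, like Replace, sees no double space at all.

```csharp
using System;

// Hypothetical demo class comparing the two kinds of search on Jon's string.
static class ZwnjDemo
{
    // Space, U+200C ZERO WIDTH NON-JOINER, space: no two adjacent U+0020 chars.
    public const string Sample = "x \u200C x";

    // Ordinal search compares UTF-16 code units one by one, so U+200C is never skipped.
    public static int OrdinalIndex(string s, string value)
    {
        return s.IndexOf(value, StringComparison.Ordinal);
    }

    // Culture-sensitive search (what IndexOf(string) does by default);
    // the result depends on the current culture and runtime.
    public static int CultureIndex(string s, string value)
    {
        return s.IndexOf(value, StringComparison.CurrentCulture);
    }

    static void Main()
    {
        Console.WriteLine("Culture-sensitive IndexOf: {0}", CultureIndex(Sample, "  "));
        Console.WriteLine("Ordinal IndexOf:           {0}", OrdinalIndex(Sample, "  "));
        Console.WriteLine("Replace changed string:    {0}", Sample.Replace("  ", " ") != Sample);
    }
}
```

If the first line prints a non-negative index while the other two print -1 and False, that is exactly the combination that makes the loop above spin forever.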

Lee Jackson

Yup, that looks like *exactly* the same problem (cheers again Jon).
Obviously one to keep an eye out for.

Any idea whether it has been reported as a bug? (And which group was it in?
I googled for it but only found this thread.)

Regards

Jon Skeet [C# MVP]

I don't believe it's actually a bug - IndexOf is culture-sensitive,
and arguably should match here, whereas Replace does a plain ordinal
comparison. If you want an "ordinal" IndexOf, you need to use
CompareInfo.IndexOf and specify CompareOptions.Ordinal. It should
probably be simpler to do that though :)
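Following that suggestion, the original StringBuilder loop can be sketched with CompareInfo.IndexOf and CompareOptions.Ordinal, so the loop condition and StringBuilder.Replace agree on what counts as a match. `Normaliser` and `CollapseDoubleSpaces` are hypothetical names, and taking the CompareInfo from the invariant culture is an assumption; with CompareOptions.Ordinal the choice of culture should not affect the result.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper; a sketch of the ordinal fix, not code from the thread.
static class Normaliser
{
    // Same loop as the original question, but the "is there still a double
    // space?" test is ordinal, matching StringBuilder.Replace, so a space/U+200C/space
    // sequence no longer counts as a match and the loop always terminates.
    public static string CollapseDoubleSpaces(string input)
    {
        CompareInfo compare = CultureInfo.InvariantCulture.CompareInfo;
        StringBuilder sb = new StringBuilder(input);

        while (compare.IndexOf(sb.ToString(), "  ", CompareOptions.Ordinal) > -1)
        {
            sb = sb.Replace("  ", " ");
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine("[{0}]", CollapseDoubleSpaces("a    b"));  // prints: [a b]
        Console.WriteLine(CollapseDoubleSpaces("x \u200C x").Length); // prints: 5
    }
}
```

On .NET 2.0 and later, `s.IndexOf("  ", StringComparison.Ordinal)` is a shorter way to get the same ordinal test.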

Lee Jackson

Interesting advice Jon, appreciated.