String.Replace Anomaly

Levidikus · Sep 18, 2007

Normally, I never have any problems with String.Replace(). However, I
need to replace multiple instances of the character "ª" (\xAA) with a
# symbol. The input file is a simple one-line file. I read the file
into a string called strLine. Here is what I have tried:

strLine = strLine.Replace("ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"\xAA", "#"); // Doesn't replace ...

if (strLine.Contains(@"\xAA")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains("ª")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains(@"ª")) MessageBox.Show("found one"); // No message box ...

Any ideas on either what I'm doing wrong, or a better way to replace
this persistent character that just won't go away?

Göran Andersson · Sep 18, 2007

I think that you have tried every possible combination except the one
that works... Try this:

strLine = strLine.Replace("\xAA", "#");

Roman Wagner · Sep 18, 2007

Looks like there is a problem with your strLine.

The following works as expected:

string test = "\xAA ª";
MessageBox.Show(
    String.Concat(test, Environment.NewLine,
        test.Replace("ª", "#"),
        Environment.NewLine,
        test.Replace("\xAA", "#")));

UL-Tomten · Sep 18, 2007

found that I need to replace multiple instances of the character
"ª" (\xAA) with a # symbol. The input file is a simple one line

The following works for me:

string feminineIndicatorChar = Char.ConvertFromUtf32(0xaa);
string b = feminineIndicatorChar.Replace(feminineIndicatorChar, "#"); // b == "#"

Are you sure your strLine really contains the feminine ordinal
indicator? Can you check in a debugger?

Jon Skeet [C# MVP] · Sep 18, 2007

strLine = strLine.Replace(@"\xAA", "#"); // Doesn't replace ...

Your mistake is using a verbatim string literal. That is looking for a
substring of backslash, x, A, A. You want it to look for the string
represented by Unicode U+00AA, i.e. '\xAA'. In other words, you *want*
the character escaping which verbatim string literals remove. Just get
rid of the @ and it will be fine.

I would warn against using \x though, because the number of hex digits
it consumes varies (anywhere from one to four). For instance:

"\xAAOkay" - does what you want ('O' is not a hex digit)
"\xAABad" - doesn't do what you want (it'll be U+AABA and then 'd')

Use \u00aa instead - it always takes exactly four hex digits, so
there's no ambiguity.

Jon
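
To make the difference between these literals concrete, here is a minimal sketch (not from the thread). It assumes a compiler that accepts C# 9 top-level statements, and the variable names are purely illustrative.

```csharp
using System;

// Regular literal: \xAA is an escape for the single character U+00AA.
string regular = "\xAA";

// Verbatim literal: no escape processing, so this is backslash, x, A, A.
string verbatim = @"\xAA";

// \x greedily takes up to four hex digits: A, A, B, a -> U+AABA, then 'd'.
string greedy = "\xAABad";

// \u always takes exactly four hex digits: U+00AA, then "Bad".
string fixedWidth = "\u00AABad";

Console.WriteLine(regular.Length);                     // 1
Console.WriteLine(verbatim.Length);                    // 4
Console.WriteLine(((int)greedy[0]).ToString("X4"));    // AABA
Console.WriteLine(fixedWidth.Replace("\u00AA", "#"));  // #Bad
```

Only the regular and \u forms produce the character that Replace() needs to find; the verbatim form searches for a four-character sequence that is almost certainly not in the file.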

UL-Tomten · Sep 18, 2007

Looks like there is a problem with your strLine.

Actually, no; there is a problem with the OP's understanding of the @
character when used on strings.

Doug Semler · Sep 18, 2007

*HOW* are you reading your text file? The encoding you read with has
to match the encoding of the file. In this case, you'll probably need
to read the file with UTF7 encoding unless there is an encoding marker
(byte order mark) at the beginning of the file.

I tried a file containing only 0xAA characters (read with
File.ReadAllText()), and the replace did nothing unless I read the
file as UTF7...

Jon Skeet [C# MVP] · Sep 18, 2007

UTF-7 is *very* rarely used - basically it's used in mail and that's
virtually it, as far as I'm aware. How did you save your file?

That isn't the problem in this case, however.

Jon

Doug Semler · Sep 18, 2007

Then why couldn't I get the string replace to work if I didn't specify
the encoding as UTF7?

Levidikus · Sep 18, 2007

Thank you very much for all the valuable information!

I am reading the file in using a standard StreamReader, without any
special flags.

Doug Semler · Sep 18, 2007

Sorry... clicked send accidentally:

Start Notepad. Type 1234567890\r\n(10 bytes of 0xAA)
Save file (ANSI).
Verified that there was no encoding indicator on the file (file is 22
bytes).
Windows Vista (if it matters).

The following lines were used, with the indicated encoding:

string foo = File.ReadAllText(@"C:\users\doug\documents\test.txt", encoding);
foo = foo.Replace("\u00AA", "#");
Console.WriteLine(encoding);
Console.WriteLine(foo);

Output:

ASCII
1234567890
??????????
UTF7
1234567890
##########
UTF8
1234567890
??????????
UTF32
?????
Unicode
???????????

Doug Semler · Sep 18, 2007

P.S. Not specifying an encoding gives me the ASCII result.
Specifying Encoding.Default (which resolves to SBCSCodePageEncoding)
gives me the correct behavior.

Jon Skeet [C# MVP] · Sep 18, 2007

And that's because Encoding.Default is the same encoding that "ANSI"
means in Notepad (the system's ANSI code page). UTF-7 just *happened*
to work - and I suspect it shouldn't really have done.

When you don't specify an encoding, almost everything in .NET assumes
UTF-8.

Jon

Doug Semler · Sep 18, 2007

Right. But my entire point is that the OP needs to specify the
correct encoding when opening the file. If he doesn't do that, NONE
of the (correct) solutions pointed out earlier will work. In this
case Encoding.Default (if you say UTF7 is wrong) needs to be passed to
the StreamReader constructor.
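
As a rough illustration of that point (not code from the thread), the sketch below reads the same bytes twice: once with a StreamReader that is given no encoding, and once with an explicit single-byte encoding. The temp file name and variable names are hypothetical. ISO-8859-1 stands in for Encoding.Default here as an assumption, because it maps byte 0xAA to U+00AA on any platform, whereas Encoding.Default depends on the environment. It assumes C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical test file: "123" followed by two raw 0xAA bytes, no BOM.
string path = Path.Combine(Path.GetTempPath(), "replace-demo.txt");
File.WriteAllBytes(path, new byte[] { 0x31, 0x32, 0x33, 0xAA, 0xAA });

// No encoding given: StreamReader assumes UTF-8, where a lone 0xAA byte
// is invalid and is not decoded to U+00AA.
string asUtf8;
using (var reader = new StreamReader(path))
{
    asUtf8 = reader.ReadToEnd();
}

// Explicit single-byte encoding: ISO-8859-1 maps byte 0xAA to U+00AA.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
string asLatin1;
using (var reader = new StreamReader(path, latin1))
{
    asLatin1 = reader.ReadToEnd();
}

Console.WriteLine(asUtf8.Contains("\u00AA"));        // False
Console.WriteLine(asLatin1.Replace("\u00AA", "#"));  // 123##

File.Delete(path);
```

With the wrong encoding the character never makes it into the string, so no variation of the Replace() call can find it.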

UL-Tomten · Sep 18, 2007

P.S. Not specifying an encoding gives me the ASCII result.
Specifying Encoding.Default (which resolves to SBCSCodePageEncoding)
gives me the correct behavior.

The default single-byte character set code page encoding (==
SBCSCodePageEncoding) is there to provide an encoding-less encoding,
as far as I can tell. It is to encodings what InvariantCulture is to
cultures: you can use it if you don't care about the encoding and
nobody but you will read what you've written using it. (In other
words; if and only if you wrote the file using Encoding.Default on the
same OS installation, it's safe to use Encoding.Default to read it
back.)

Jon Skeet [C# MVP] · Sep 19, 2007

The bit in brackets is right - but it's *not* the same as saying it's
an "encoding-less encoding".

An encoding is basically a mapping between byte sequences and character
sequences. 8859-1 is as close to an "encoding-less encoding" as you'll
get, as it maps bytes 0-255 to Unicode 0-255; Encoding.Default doesn't
necessarily do that (and indeed doesn't in most environments).

For instance, on my box byte 128 converts to U+20AC (the Euro symbol).

Use of Encoding.Default should be regarded as "legacy" really - few
things should just use the default encoding for the OS.
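
The claim about ISO-8859-1 can be checked with a short sketch like the one below (not from the thread; variable names are illustrative, and it assumes C# 9 top-level statements). It decodes every possible byte value and confirms that each one becomes the Unicode character with the same number.

```csharp
using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// One byte for every value from 0 to 255.
byte[] allBytes = new byte[256];
for (int i = 0; i < allBytes.Length; i++)
{
    allBytes[i] = (byte)i;
}

string decoded = latin1.GetString(allBytes);

// Check that byte n decoded to the character U+00nn for every n.
bool identity = decoded.Length == 256;
for (int i = 0; i < decoded.Length && identity; i++)
{
    identity = decoded[i] == (char)i;
}

Console.WriteLine(identity);  // True
```

Running the same loop with a different single-byte encoding is one way to see where it departs from this mapping, such as the byte 128 example above.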

UL-Tomten · Sep 19, 2007

The bit in brackets is right - but it's *not* the same as saying it's
an "encoding-less encoding".

Well, since an encoding by definition specifies encoding rules, I
thought that much was obvious... =]

Maybe there should have been an Encoding.InvariantEncoding instead of
an Encoding.Default, to communicate that the resulting bits are
unknown at compile-time, and perhaps avoid the temptation of using it
for text others might read back.

UL-Tomten · Sep 19, 2007

8859-1 is as close to an "encoding-less encoding" as you'll
get, as it maps bytes 0-255 to Unicode 0-255;

I've always thought of that more as a curse than a blessing.

Jon Skeet [C# MVP] · Sep 19, 2007

The bit in brackets is right - but it's *not* the same as saying it's
an "encoding-less encoding".

Well, since an encoding by definition specifies encoding rules, I
thought that much was obvious... =]

But there's such a thing as a "trivial" encoding, which pretty much
sums up ISO-8859-1.

Maybe there should have been an Encoding.InvariantEncoding instead of
an Encoding.Default, to communicate that the resulting bits are
unknown at compile-time, and perhaps avoid the temptation of using it
for text others might read back.

InvariantEncoding sounds like it would do the same on all boxes,
regardless of environment though. Encoding.Default *isn't* invariant -
it varies by environment. I'd have preferred
Encoding.OperatingSystemDefault or something similar. Certainly it
gets confusing that Encoding.Default isn't the encoding which is used
by default by most .NET classes

Jon

Levidikus · Oct 13, 2007

[snip]

Thank you again for all of the outstanding responses.

The file that I am working with originated on a Solaris 8 Unix
system. How would I go about identifying the correct "encoding"?
Also, with File.ReadAllText(), would I even need a StreamReader
for that?

Once again, thanks for all the feedback!

James
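
On the second question: File.ReadAllText() opens, reads, and closes the file itself, so no StreamReader is needed. On the first, one way to narrow the encoding down is to look at the raw bytes before decoding anything. The sketch below is only an illustration under assumptions: the temp file is a hypothetical stand-in for the real one, and ISO-8859-1 is a guess that would have to be confirmed against the actual Solaris file. It assumes C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sample standing in for the real file: "A", one 0xAA byte, "B".
string path = Path.Combine(Path.GetTempPath(), "encoding-probe.txt");
File.WriteAllBytes(path, new byte[] { 0x41, 0xAA, 0x42 });

// Look at the raw bytes first; no decoding is involved here.
byte[] raw = File.ReadAllBytes(path);
string hex = BitConverter.ToString(raw);
Console.WriteLine(hex);  // 41-AA-42

// A bare AA suggests a single-byte encoding such as ISO-8859-1;
// UTF-8 would store U+00AA as the two bytes C2 AA.
string text = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
Console.WriteLine(text.Replace("\u00AA", "#"));  // A#B

File.Delete(path);
```

If the hex dump of the real file shows C2-AA where the character should be, reading it as UTF-8 is the more likely fit.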
