Help: Reading/writing German Umlauts in/from Textfile?

  • Thread starter: Boris Nienke

Boris Nienke

Hi,

I'm shocked!
Why?

I have a very simple text file (created with the Windows XP editor).
I just entered "Test äöüß" (without the quotes) and saved it as "1.txt".
You see the German umlauts in the text?

Now, in the C# code I do this:

string sLine;
StreamReader sr = new StreamReader("1.txt");
sLine = sr.ReadToEnd();
sr.Close();

StreamWriter sw2 = new StreamWriter("2.txt");
sw2.Write(sLine);
sw2.Close();


Result? The umlauts are NOT in "2.txt"!

I've debugged it, and it's the StreamReader that is not reading the
German umlauts! So after

sLine = sr.ReadToEnd();

"sLine" just contains "Test ".

This happens with umlauts in the middle of the text too, so it has
nothing to do with leading or trailing umlauts.

Any idea what the problem is, and how to get the reader to read the
umlauts (and the writer to write them, if that is a problem too)?

Urgent :-/

Boris
 
Add-on:

When I do this:

StreamReader sr = new StreamReader(sFileRead,System.Text.Encoding.UTF7);
sLine = sr.ReadToEnd();


then sLine contains the umlauts.

But I cannot find a way to create a new text file in the same format
with umlauts!

When I do:

StreamWriter sw2 = new StreamWriter("debug.txt", false,
System.Text.Encoding.ASCII);
sw2.Write(sLine);
sw2.Close();

the file has the correct size, but contains "????" instead of the umlauts.

When I use UTF7 for writing, there are some crazy letters and numbers in
the text instead of the umlauts, and the text file is larger.

When I use UTF8 for writing, I can see the umlauts when the file is
opened in the Windows editor, but the file is larger (the umlauts are
2-byte characters then).

etc...

DAMN :-/ ... how do I read such a single-byte ASCII file with umlauts
and store it in the exact same format?

I never had any of these problems with Basic, Pascal/Delphi, COBOL, ...

Please help

Boris
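
Note: the file sizes and the "????" described above follow from how each encoding maps characters to bytes. Here is a minimal sketch, assuming .NET 5 or later, where code page 1252 is only available after registering CodePagesEncodingProvider (on the .NET Framework this registration is not needed). The class and helper names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: number of bytes the encoding needs for the text.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Length;
    }

    // Hypothetical helper: what the text looks like after being encoded
    // to bytes and decoded again with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // Assumption: .NET 5+; makes Encoding.GetEncoding(1252) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // "Test äöüß", written with escapes so that the encoding of the
        // source file itself cannot interfere.
        string text = "Test \u00E4\u00F6\u00FC\u00DF";

        Console.WriteLine(ByteCount(text, cp1252));          // 9: one byte per character
        Console.WriteLine(ByteCount(text, Encoding.UTF8));   // 13: each non-ASCII character takes two bytes
        Console.WriteLine(RoundTrip(text, Encoding.ASCII));  // Test ????
        Console.WriteLine(RoundTrip(text, cp1252) == text);  // True
    }
}
```

ASCII only defines 128 characters, so the umlauts have no representation there and are replaced by '?' on writing. Windows-1252 is the single-byte format that keeps them intact.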
 
You need to use the correct encoding. Most Western European languages
are covered by Windows-1252:

// Requires: using System; using System.IO; using System.Text;
//           using System.Windows.Forms;

private void Form1_Load(object sender, System.EventArgs e)
{
    string sText = "grün groß";
    textBox1.Text = sText;
}

private void btnSave_Click(object sender, System.EventArgs e)
{
    // Write the text as single-byte Windows-1252 (false = overwrite).
    StreamWriter wrt = new StreamWriter(@"\text.txt", false,
        Encoding.GetEncoding(1252));
    wrt.Write(textBox1.Text);
    wrt.Close();
}

private void btnLoad_Click(object sender, System.EventArgs e)
{
    try
    {
        // Read with the same encoding that was used for writing.
        StreamReader rdr = new StreamReader(@"\text.txt",
            Encoding.GetEncoding(1252));
        textBox1.Text = rdr.ReadToEnd();
        rdr.Close();
    }
    catch (Exception ex)
    {
        MessageBox.Show("Read failed: " + ex.ToString());
    }
}
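
Applied to the original question (copying "1.txt" to "2.txt" unchanged), the same idea might look like the sketch below. It assumes .NET 5 or later, hence the CodePagesEncodingProvider registration, and it creates its own sample file in the temp directory so that it can run on its own. `TextCopy.Copy` is a hypothetical helper, not part of the framework.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class TextCopy
{
    // Hypothetical helper: reads a file and writes it back out using
    // one and the same encoding for both directions.
    public static void Copy(string sourcePath, string targetPath, Encoding encoding)
    {
        string text;
        using (var reader = new StreamReader(sourcePath, encoding))
        {
            text = reader.ReadToEnd();
        }
        // false = overwrite the target instead of appending to it
        using (var writer = new StreamWriter(targetPath, false, encoding))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        // Assumption: .NET 5+; makes Encoding.GetEncoding(1252) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        string source = Path.Combine(Path.GetTempPath(), "1.txt");
        string target = Path.Combine(Path.GetTempPath(), "2.txt");

        // "Test äöüß" as Windows-1252 bytes, i.e. what an ANSI save on a
        // Western European system is assumed to produce.
        byte[] original = { 0x54, 0x65, 0x73, 0x74, 0x20, 0xE4, 0xF6, 0xFC, 0xDF };
        File.WriteAllBytes(source, original);

        Copy(source, target, cp1252);

        // Prints True if 2.txt is byte-for-byte identical to 1.txt.
        Console.WriteLine(File.ReadAllBytes(target).SequenceEqual(original));
    }
}
```

The `using` blocks close the reader and writer even if an exception is thrown, which the explicit Close() calls would not do.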
 
> You need to use the correct encoding. Most Western European languages
> are covered by Windows-1252:

Yes, I know.

> StreamWriter wrt = new StreamWriter(@"\text.txt", false,
> Encoding.GetEncoding(1252));

AHA!

BTW: after a lot of trial and error I solved it with
"Encoding.Default"... But I think ".Default" is the same (on a German or
Western European system) as ".GetEncoding(1252)" - right?

I will try it! Because it's much better to explicitly set the code page
the data is expected to be in than to rely on the default... I think...


Thanks a lot! I was puzzled by all that encoding documentation and
tried using GetBytes, GetChars etc... but this looks much simpler.

Boris
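
Note: whether Encoding.Default matches code page 1252 depends on the system and on the runtime, so it may be worth checking rather than assuming. On the .NET Framework it reflects the Windows ANSI code page (1252 on German and other Western European systems). On .NET Core and .NET 5+ it is always UTF-8. This is a minimal sketch with a hypothetical helper, `Describe`, that prints what the current machine reports.

```csharp
using System;
using System.Text;

public static class DefaultEncodingCheck
{
    // Hypothetical helper: a short human-readable label for an encoding.
    public static string Describe(Encoding encoding)
    {
        return encoding.WebName + " (code page " + encoding.CodePage + ")";
    }

    public static void Main()
    {
        // The first line differs between machines and runtimes,
        // which is why hard-coding the expected code page is safer.
        Console.WriteLine("Encoding.Default: " + Describe(Encoding.Default));
        Console.WriteLine("Encoding.UTF8:    " + Describe(Encoding.UTF8));
    }
}
```

If the data is known to be Windows-1252, asking for that code page by number keeps the program independent of whatever the machine's default happens to be.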
 
Boris Nienke said:
> I'm shocked!
> Why?

Because you haven't read the documentation or don't understand
encodings :)

> I have a very simple text file (created with the Windows XP editor).
> I just entered "Test äöüß" (without the quotes) and saved it as "1.txt".
> You see the German umlauts in the text?

Sure - what encoding did you save it as?

> Now, in the C# code I do this:
>
> string sLine;
> StreamReader sr = new StreamReader("1.txt");
> sLine = sr.ReadToEnd();
> sr.Close();
>
> StreamWriter sw2 = new StreamWriter("2.txt");
> sw2.Write(sLine);
> sw2.Close();
>
> Result? The umlauts are NOT in "2.txt"!

I'm not entirely surprised. I suspect you saved it as the default code
page (1252, probably) but StreamReader uses UTF-8 by default.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
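
The mismatch can be reproduced without any files by decoding the same bytes twice. This is a minimal sketch, assuming .NET 5 or later (hence the CodePagesEncodingProvider registration); the class and helper names are hypothetical. Current runtimes substitute U+FFFD for bytes that are not valid UTF-8, while older ones may drop them silently, which would match the "Test " reported in the question.

```csharp
using System;
using System.Text;

public static class DecodeMismatch
{
    // Hypothetical helper: interprets the bytes using the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: .NET 5+; makes Encoding.GetEncoding(1252) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // "Test äöüß" as it is stored in a Windows-1252 file.
        byte[] bytes = { 0x54, 0x65, 0x73, 0x74, 0x20, 0xE4, 0xF6, 0xFC, 0xDF };

        string asUtf8 = Decode(bytes, Encoding.UTF8);
        string as1252 = Decode(bytes, Encoding.GetEncoding(1252));

        // The umlaut bytes are not valid UTF-8, so no 'ä' survives.
        Console.WriteLine(asUtf8.Contains("\u00E4"));                  // False
        // With the matching code page the text comes back intact.
        Console.WriteLine(as1252 == "Test \u00E4\u00F6\u00FC\u00DF");  // True
    }
}
```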
 
> Because you haven't read the documentation or don't understand
> encodings :)

No (I read it), yes (I didn't understand all of it) :-)

> Sure - what encoding did you save it as?

I don't know. What if I get text files from other people?

> I'm not entirely surprised. I suspect you saved it as the default code
> page (1252, probably) but StreamReader uses UTF-8 by default.

And that's where my problem was!
I can set the encoding when opening a StreamReader/StreamWriter. And there
are ASCII, UTF7, UTF8, Unicode, ... encodings. None of them worked the way
I wanted.

And there is a "Default" encoding, and because the documentation (and you
too) said that UTF8 is used by default, I thought it was the same.

But it's not :)

I should have read up on the "Default" encoding ...

Thanks a lot

Boris
 
Boris Nienke said:
> I don't know. What if I get text files from other people?

If you don't know the encoding, you're basically stuffed. You can make
a guess, but there's no guarantee that it'll be right: any UTF-8 file
can also be read as CP1252 without an error, for instance (the
non-ASCII characters just come out wrong).

> And that's where my problem was!
> I can set the encoding when opening a StreamReader/StreamWriter. And there
> are ASCII, UTF7, UTF8, Unicode, ... encodings. None of them worked the way
> I wanted.
>
> And there is a "Default" encoding, and because the documentation (and you
> too) said that UTF8 is used by default, I thought it was the same.
>
> But it's not :)

No - that's a very unfortunate bit of naming, I'm afraid. It should be
called something like FileSystemDefault.
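
One common way to make the guess mentioned above is to try a strict UTF-8 decode first and fall back to a single-byte code page only if that fails. The sketch below assumes .NET 5 or later (hence the CodePagesEncodingProvider registration), and `EncodingGuess.Decode` is a hypothetical helper. It is a heuristic only: the reverse check is impossible because every byte sequence decodes as CP1252, and a leading byte order mark is not stripped here.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: returns the bytes decoded as UTF-8 if they
    // form valid UTF-8, otherwise decoded with the fallback encoding.
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        // Second argument (throwOnInvalidBytes) = true: invalid input
        // raises an exception instead of being replaced by U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }

    public static void Main()
    {
        // Assumption: .NET 5+; makes Encoding.GetEncoding(1252) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes("gr\u00FCn"); // "grün" as UTF-8
        byte[] ansiBytes = { 0x67, 0x72, 0xFC, 0x6E };           // "grün" as CP1252

        Console.WriteLine(Decode(utf8Bytes, cp1252) == "gr\u00FCn"); // True
        Console.WriteLine(Decode(ansiBytes, cp1252) == "gr\u00FCn"); // True
    }
}
```

Text in a single-byte code page rarely happens to be valid UTF-8 once it contains non-ASCII characters, which is why the order "UTF-8 first" tends to work, but as the post says, there is no guarantee.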
 