Question about ReadLine UTF8 line truncation

EmeraldShield

(.NET 2.0 C# application, using Encoding.UTF8 with a StreamReader)
I have a very strange problem with ReadLine() on a UTF8 file that I cannot
explain. It might also happen with other encodings, but I have not tried
them.

Our application wrote this sequence to a UTF8 file. Now I am loading it
back, and the text is not coming back the same as it went out.

DATA:
from: processfrom checkemail failed: 501 syntax error in parameters: invalid
char in email: "sometext\content-transfer-encoding:"@server.com
command: mail from:"sometext\content-transfer-encoding:"@server.com


Each of those lines is split at the \c of
content-transfer-encoding. They are not high-order characters, and I don't
remember \c ever being a control character for anything.
So instead of getting back the two lines that I wrote out, I get four lines.

Any ideas as to why this is, and how would I correct it? I am guessing that
I will need to escape the sequence prior to writing it out, but I don't know
why and that bugs me.

Thanks for any help.
 
Jon Skeet [C# MVP]

EmeraldShield said:
(Dot Net 2 C# application - using Encoding.UTF8 with a StreamReader)
I have a very strange problem that I cannot explain with a UTF8 Readline()
although this could exist in other types of encoding, I have not tried them.

<snip>

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
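
For illustration only, a short but complete program for this case might look something like the sketch below. It is not the original poster's code: the class name `ReadLineRepro`, the helper names, and the sample lines (taken from the data quoted in the question) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ReadLineRepro
{
    // Writes the given lines with a UTF8 StreamWriter, reads them back
    // with StreamReader.ReadLine, and returns what came back.
    public static List<string> RoundTrip(string path, string[] lines)
    {
        using (StreamWriter sw = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (string line in lines)
            {
                sw.WriteLine(line);
            }
        }

        List<string> result = new List<string>();
        using (StreamReader sr = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                result.Add(line);
            }
        }
        return result;
    }

    // Hypothetical sample data based on the two lines quoted in the question.
    public static string[] SampleLines()
    {
        return new string[]
        {
            "from: processfrom checkemail failed: 501 syntax error in parameters: " +
                "invalid char in email: \"sometext\\content-transfer-encoding:\"@server.com",
            "command: mail from:\"sometext\\content-transfer-encoding:\"@server.com"
        };
    }
}
```

If a program like this gets back exactly the lines it wrote, the cause is more likely in the data itself or in how it reaches the file than in ReadLine().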
 
Steven Cheng[MSFT]

Hello Jon,

Based on your description, when you use a StreamReader with UTF8 encoding to
read some text data that was written out previously, you get the wrong
string (different from the original output), correct?

Would you show us a simple test code snippet so that we can get a
clearer view of your code logic? Also, regarding the following test
fragment you mentioned:

==============
from: processfrom checkemail failed: 501 syntax error in parameters: invalid
char in email: "sometext\content-transfer-encoding:"@server.com
command: mail from:"sometext\content-transfer-encoding:"@server.com
==============

Did you embed it directly in C# code, like

string txt = " ..... the text here.....";

or is it loaded from some other source (such as a TextBox or another
file)? Also, when you output the data to the text file (through
StreamWriter + UTF8 encoding), have you checked the output file to see
whether the output is as expected?

In my experience, this kind of problem is likely to occur when you embed
the string directly in code, since some characters need escaping in a C#
string. For example, you need to escape \ as \\. So if you embed the string
directly in C# code, you need to escape the whole string as below:

=============
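// Requires: using System.IO; using System.Text;
// Backslashes and double quotes are escaped inside a regular string literal.
// The literal is split with + because it cannot span source lines.
string txt = "from: processfrom checkemail failed: 501 syntax error in " +
    "parameters: invalid char in email: " +
    "\"sometext\\content-transfer-encoding:\"@server.com command: mail " +
    "from:\"sometext\\content-transfer-encoding:\"@server.com";

StreamWriter sw = new StreamWriter("direct_output.txt", false, Encoding.UTF8);
sw.Write(txt);
sw.Close();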

===============
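
As a side note on the escaping rule above, a verbatim string literal (prefixed with @) is another way to write the same text: backslashes are taken literally, and a double quote is written as two double quotes. The sketch below compares the two forms on the address part of the fragment. The class and constant names are assumptions made up for illustration.

```csharp
public static class LiteralForms
{
    // Regular literal: backslash and double quote are escaped with a backslash.
    public const string Regular =
        "\"sometext\\content-transfer-encoding:\"@server.com";

    // Verbatim literal: backslashes are literal; a double quote is written as "".
    public const string Verbatim =
        @"""sometext\content-transfer-encoding:""@server.com";
}
```

Both constants hold the same characters, so either form can be used when the text is embedded in source code.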

In addition, I suggest you put the string in a TextBox and write it out
from that TextBox through the StreamWriter to see whether the output is as
expected. Here is my test code, which works well for the text fragment you
provided.

=============================
private void btnSave_Click(object sender, EventArgs e)
{
    // Write the TextBox content to a UTF8 file, overwriting any existing file.
    StreamWriter sw = new StreamWriter("output.txt", false, Encoding.UTF8);
    sw.Write(textBox1.Text);
    sw.Close();
}

private void btnLoad_Click(object sender, EventArgs e)
{
    // Read the whole file back with the same encoding and display it.
    StreamReader sr = new StreamReader("output.txt", Encoding.UTF8);
    string txt = sr.ReadToEnd();
    sr.Close();

    MessageBox.Show(txt);
}
=======================

Please feel free to post here if anything is unclear or if you run into
any further problems.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



==================================================

To get notifications of my posts through email, please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.



Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================



This posting is provided "AS IS" with no warranties, and confers no rights.
 
EmeraldShield

Steven Cheng[MSFT] said:
Based on your description, when you use a StreamReader with UTF8 encoding to
read some text data that was written out previously, you get the wrong
string (different from the original output), correct?

Would you show us a simple test code snippet so that we can get a
clearer view of your code logic? Also, regarding the following test
fragment you mentioned:

I will do that tonight or tomorrow.

Steven Cheng[MSFT] said:
Did you embed it directly in C# code, like

string txt = " ..... the text here.....";

or is it loaded from some other source (such as a TextBox or another
file)? Also, when you output the data to the text file (through
StreamWriter + UTF8 encoding), have you checked the output file to see
whether the output is as expected?

No, this is data sent to my application from a remote system over a socket.
I have looked at the output file and it is correct. I have loaded it in
WordPad, Notepad, and VS2005, and it looks correct in all of them.

Steven Cheng[MSFT] said:
In addition, I suggest you put the string in a TextBox and write it out
from that TextBox through the StreamWriter to see whether the output is as
expected. Here is my test code, which works well for the text fragment you
provided.

I will try that and see what it looks like. I just wanted to make sure that
\c was not some sort of escape character that I didn't know about. The data
is sent to my app that way and written straight to disk without any
escaping.

Jason
 
Steven Cheng[MSFT]

Thanks for your reply Jason,

Sure, we'll wait for your further update.

Also, I'm sure that "\c" is not treated as an escape sequence by the UTF8
encoding or by any other encoding's reader. Generally, escaping only needs
attention when we put a string directly in code or in a script.
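
Since StreamReader.ReadLine() ends a line at a line feed, at a carriage return, or at a carriage return followed by a line feed, one thing that may be worth checking is whether the file contains a stray one of these near the split point. A minimal sketch of such a check follows. The class name `LineBreakFinder` and its method are hypothetical helpers, not part of the framework.

```csharp
using System.Collections.Generic;

public static class LineBreakFinder
{
    // Returns one entry per line terminator found in the text:
    // its character index and whether it is CR, LF, or CRLF.
    public static List<string> Find(string text)
    {
        List<string> found = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
            {
                found.Add(i + ": CRLF");
                i++; // skip the LF half of the pair
            }
            else if (c == '\r')
            {
                found.Add(i + ": CR");
            }
            else if (c == '\n')
            {
                found.Add(i + ": LF");
            }
        }
        return found;
    }
}
```

Passing it the result of File.ReadAllText("output.txt", Encoding.UTF8) would list every position where ReadLine() is expected to end a line. An entry in the middle of what should be a single line would point to the data rather than to the encoding.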

Anyway, please feel free to post here if you run into any further problems
with this.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
