How to UTF-8 encode a string?

jens Jensen

Hello,
I'm doing UTF-8 encoding the following way.


string message;

UTF8Encoding utf8 = new UTF8Encoding();

Byte[] encodedBytes = utf8.GetBytes(message);

message = encodedBytes.ToString();





Can someone correct me?





many thanks

JJ
 
Jon Skeet [C# MVP]

jens said:
hello,
i'm doing utf-8 encoding the following way.
string message;

UTF8Encoding utf8 = new UTF8Encoding();

Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.

Byte[] encodedBytes = utf8.GetBytes(message);
message = encodedBytes.ToString();

can someone correct me?

Well, what are you expecting message to be? All strings in .NET are
UTF-16 encoded. There's no way round that (and it's not really a
problem).

Could you give us more context? Normally encoding is required when you
want to convert from a text representation to a binary representation of
that text - e.g. to write some text to a stream. What are you trying to
do here?

Jon
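
To illustrate the distinction between text and its binary representation, here is a minimal sketch. The class name and the sample string are assumptions made for illustration, not part of the thread. It shows that calling ToString() on a byte array returns the type name rather than the text, and that Encoding.UTF8.GetString is what turns the bytes back into a string.

```csharp
using System;
using System.Text;

static class Utf8RoundTrip
{
    // Encodes a .NET (UTF-16) string as UTF-8 bytes.
    public static byte[] Encode(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decodes UTF-8 bytes back into a .NET string.
    public static string Decode(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        string message = "caf\u00E9"; // illustrative sample: "café"
        byte[] encoded = Encode(message);

        Console.WriteLine(encoded.Length);             // 5 (the accented letter takes two bytes)
        Console.WriteLine(encoded.ToString());         // System.Byte[]
        Console.WriteLine(Decode(encoded) == message); // True
    }
}
```

The byte array is the UTF-8 form; there is no such thing as a "UTF-8 string" object in .NET, only bytes that can be written somewhere.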
 
jens Jensen

Could you give us more context? Normally encoding is required when you
want to convert from a text representation to a binary representation of
that text - e.g. to write some text to a stream. What are you trying to
do here?

Jon

Hi Jon,
I need to post data to a web service via HTTP POST.

The webservice requires utf-8 encoded data. My "POST" is currently failing
due to this.

Many thanks
JJ
 
Jon Skeet [C# MVP]

jens said:
Hi Jon,
I need to post data to a web service via HTTP POST.

The webservice requires utf-8 encoded data. My "POST" is currently failing
due to this.

Right. So, you need to get the bytes as you did, and then write those
to the request stream.

Jon
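
A minimal sketch of "get the bytes, then write them to the stream" follows. It assumes a MemoryStream as a stand-in for the real request stream so that it runs without a network; the helper name WriteUtf8 and the sample payload are hypothetical. It also shows why the content length has to be taken from the byte array rather than from the string.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8StreamWriteSketch
{
    // Writes the text to the stream as UTF-8 and returns the number of bytes written.
    public static int WriteUtf8(Stream target, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        target.Write(bytes, 0, bytes.Length);
        return bytes.Length;
    }

    static void Main()
    {
        // Hypothetical payload containing three non-ASCII letters.
        string message = "<msg>\u00E6\u00F8\u00E5</msg>";

        using (MemoryStream stream = new MemoryStream())
        {
            int written = WriteUtf8(stream, message);
            Console.WriteLine(message.Length); // 14 characters
            Console.WriteLine(written);        // 17 bytes
        }
    }
}
```

With a real request, the same byte count is what would go into the content length, since each non-ASCII letter here occupies two bytes.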
 
jens Jensen

Right. So, you need to get the bytes as you did, and then write those
to the request stream.

Jon

How could i actually extracted the data in a text file? The remote and is a
java platform and the want to be sur i'm posted utf-8 encoded data.

I need to send them a file showing i'm correctly utf-8 encoding the data.
 
Vadym Stetsyak

Hello, jens!

jJ> I need to post data to a web service via HTTP POST.

jJ> The webservice requires utf-8 encoded data. My "POST" is currently
jJ> failing due to this.

Do you specify the appropriate Content-Type when doing your POST?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
 
Vadym Stetsyak

Hello, jens!

jJ> How could i actually extracted the data in a text file? The remote and
jJ> is a java platform and the want to be sur i'm posted utf-8 encoded
jJ> data.

How do you perform the POST? Can you post the code?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
 
jens Jensen

jens Jensen said:
How could i actually extracted the data in a text file? The remote and is
a java platform and the want to be sur i'm posted utf-8 encoded data.

I need to send them a file showing i'm correctly utf-8 encoding the data.


How could I actually extract the data to a text file? The remote end is a
Java platform and they want to be sure I'm posting UTF-8 encoded data.

I need to send them a file showing I'm correctly UTF-8 encoding the data.
 
jens Jensen

Do you specify the appropriate Content-Type when doing your POST?

Below is the actual code:



// I found my client cert.
req.ClientCertificates.Add(Certificate);
Log.Write("Our certificate: " + Certificate.ToString());

req.ContentType = "text/xml";
// req.KeepAlive = false;
req.Method = "POST";

UTF8Encoding encoding = new UTF8Encoding();
byte[] postBytes = encoding.GetBytes(message);
req.ContentLength = postBytes.Length;

System.IO.Stream reqStream = req.GetRequestStream();
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close();

Log.Write("sending content: " + message);

System.Net.WebResponse resp = (HttpWebResponse)req.GetResponse();
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());





Many Thanks

JJ
 
Mike Schilling

I am presuming that the problem is that the bytes are correct but the remote
service doesn't recognize them as being UTF-8. Try adding

req.ContentEncoding = encoding;

 
jens Jensen

Mike Schilling said:
I am presuming that the problem is that the bytes are correct but the
remote service doesn't recognize them as being UTF-8. Try adding

req.ContentEncoding = encoding;

My request object "req" does not seem to have a "ContentEncoding"
property.
Am I missing something?
 
jens Jensen

try
req.ContentType = "text/xml; charset=utf-8";


Hi Vadym,

I applied this; no change in the response from the server.

How can I actually dump the UTF-8 encoded content to a file so I can see that
it is actually formatted as UTF-8?

Many thanks

JJ
 
Michael Voss

Jon Skeet wrote:
[...snip...]
Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.
[...snip...]

Would that code really create a new instance? I'd expect it to be a
Singleton or something like that, since it does not store any state
information.
 
Vadym Stetsyak

Hello, jens!

jJ> How can I actually dump the UTF-8 encoded content to a file so I can
jJ> see that it is actually formatted as UTF-8?

UTF8Encoding encoding = new UTF8Encoding();
byte[] postBytes = encoding.GetBytes(message);
req.ContentLength = postBytes.Length;
System.IO.Stream reqStream = req.GetRequestStream();
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close();

System.IO.File.WriteAllBytes(pathHere, postBytes);


--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
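
Once the bytes are in a file, one way to check them is to print them as hex. The following is a minimal sketch under stated assumptions: the file name post-body.bin, the temp-directory location, and the one-character sample are all hypothetical. A correctly UTF-8 encoded "é" (U+00E9) should appear as the two bytes C3 A9.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8DumpSketch
{
    // Formats bytes as space-separated hex pairs, for example "C3 A9".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        string message = "\u00E9"; // illustrative sample: e with acute accent
        byte[] postBytes = Encoding.UTF8.GetBytes(message);

        // Hypothetical location for the dump file.
        string path = Path.Combine(Path.GetTempPath(), "post-body.bin");
        File.WriteAllBytes(path, postBytes);

        // Read the file back and show exactly what was written.
        Console.WriteLine(ToHex(File.ReadAllBytes(path))); // C3 A9
    }
}
```

The dump file contains exactly the bytes that would be posted, so it can be handed to the remote side or opened in any hex viewer.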
 
Vadym Stetsyak

Hello, jens!

jJ> I applied this; no change in the response from the server.

What is the server response? What error code? Was any content returned?

Another question: if the remote end is a web service, why can't you add a web reference to it and then communicate via the stub?
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
 
Jon Skeet [C# MVP]

Michael said:
Jon Skeet wrote:
[...snip...]
Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.
[...snip...]

Would that code really create a new instance? I'd expect it to be a
Singleton or something like that, since it does not store any state
information.

Using Encoding.UTF8Encoding would reuse a single instance (I believe)
which is why I was suggesting using that instead of new UTF8Encoding().

Jon
 
Nick Hounsome

Jon Skeet said:
Michael said:
Jon Skeet wrote:
[...snip...]
Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.
[...snip...]

Would that code really create a new instance? I'd expect it to be a
Singleton or something like that, since it does not store any state
information.

Using Encoding.UTF8Encoding would reuse a single instance (I believe)
which is why I was suggesting using that instead of new UTF8Encoding().

Firstly there is no way to make "new X()" return an existing object because
unlike C++ you cannot override new so you can't do secret singletons.

Secondly the encoding does store 'state' information in the form of the
settings for BOM and exception throwing so in 1.1 there are 4 possible UTF8
encoding objects.

Worse, in 2.0 there is a settable EncoderFallback property which makes all
instances potentially unshareable. The work around for the static properties
is that they are all made readonly (IsReadOnly) and throw
InvalidOperationException if you try to set them. [It doesn't actually say
that in the documentation but it's the only sensible thing to do]

Also the static property is actually called Encoding.UTF8
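
The points about sharing, read-only instances and the BOM setting can be checked with a short sketch. The class and helper names are hypothetical, and the behaviour shown is what I would expect from the .NET base class library as documented rather than something confirmed in this thread.

```csharp
using System;
using System.Text;

static class SharedEncodingSketch
{
    // Returns true if assigning EncoderFallback on the given encoding is rejected.
    public static bool RejectsFallbackChange(Encoding encoding)
    {
        try
        {
            encoding.EncoderFallback = EncoderFallback.ExceptionFallback;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    static void Main()
    {
        // The static property hands back the same cached instance each time.
        Console.WriteLine(object.ReferenceEquals(Encoding.UTF8, Encoding.UTF8)); // True

        // The shared instance is read-only, so its fallback cannot be changed.
        Console.WriteLine(Encoding.UTF8.IsReadOnly);             // True
        Console.WriteLine(RejectsFallbackChange(Encoding.UTF8)); // True

        // A clone is writable and can be customised without affecting other users.
        Encoding clone = (Encoding)Encoding.UTF8.Clone();
        Console.WriteLine(clone.IsReadOnly);             // False
        Console.WriteLine(RejectsFallbackChange(clone)); // False

        // BOM setting: Encoding.UTF8 reports a 3-byte preamble,
        // while new UTF8Encoding() reports none.
        Console.WriteLine(Encoding.UTF8.GetPreamble().Length);      // 3
        Console.WriteLine(new UTF8Encoding().GetPreamble().Length); // 0
    }
}
```

Note that GetBytes never includes the preamble in either case, so the choice between the two does not change the bytes produced for a POST body.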
 
jens Jensen

Another question: if the remote end is a web service, why can't you add a web
reference to it and then communicate via the stub?


Hi Vadym,

This is a B2B scenario and I may not publish the complete System.Net trace,
as it contains very confidential info.

But here is the last bit. I doubt you can get any useful hint from it.

Many thanks anyway



System.Net Information: 0 : [8036] ConnectStream#50223079::ConnectStream(Buffered 24 bytes.)
System.Net Information: 0 : [8036] Associating HttpWebRequest#4878312 with ConnectStream#50223079
System.Net Information: 0 : [8036] Associating HttpWebRequest#4878312 with HttpWebResponse#45011781
System.Net Verbose: 0 : [8036] ConnectStream#50223079::Read()
System.Net.Sockets Verbose: 0 : [8036] Socket#13575069::Dispose()
System.Net Verbose: 0 : [8036] Data from ConnectStream#50223079::Read
System.Net Verbose: 0 : [8036] 00000000 : 3C 68 34 3E 41 63 63 65-73 73 20 44 65 6E 69 65 : <h4>Access Denie
System.Net Verbose: 0 : [8036] 00000010 : 64 3C 2F 68 34 3E 0D 0A-                        : d</h4>..
System.Net Verbose: 0 : [8036] Exiting ConnectStream#50223079::Read() -> 24#24
System.Net Verbose: 0 : [8036] ConnectStream#50223079::Read()
System.Net Verbose: 0 : [8036] Exiting ConnectStream#50223079::Read() -> 0#0
System.Net Error: 0 : [8036] Exception in the HttpWebRequest#4878312::EndGetResponse - The remote server returned an error: (401) Unauthorized.
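
The hex bytes in the trace can be decoded to see the response body the server sent. This is a minimal sketch; the class and helper names are hypothetical, and the hex string is copied from the two dump lines above. The body decodes to a short "Access Denied" fragment which, together with the 401 status, may suggest that the request is being rejected at authorization rather than because of how the payload is encoded.

```csharp
using System;
using System.Text;

static class TraceBodySketch
{
    // Parses space- or dash-separated hex pairs into a byte array.
    public static byte[] ParseHex(string hex)
    {
        string[] parts = hex.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        byte[] bytes = new byte[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            bytes[i] = Convert.ToByte(parts[i], 16);
        }
        return bytes;
    }

    static void Main()
    {
        // Hex bytes copied from the two dump lines in the trace.
        string dump = "3C 68 34 3E 41 63 63 65-73 73 20 44 65 6E 69 65 " +
                      "64 3C 2F 68 34 3E 0D 0A";

        byte[] body = ParseHex(dump);
        Console.WriteLine(body.Length);                          // 24
        Console.WriteLine(Encoding.UTF8.GetString(body).Trim()); // <h4>Access Denied</h4>
    }
}
```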
 
Jon Skeet [C# MVP]

Nick said:
Firstly there is no way to make "new X()" return an existing object because
unlike C++ you cannot override new so you can't do secret singletons.

Unless of course you're the CLR :)

(As discussed a short while ago, the System.String constructor returns
String.Empty in certain circumstances.)

Secondly the encoding does store 'state' information in the form of the
settings for BOM and exception throwing so in 1.1 there are 4 possible UTF8
encoding objects.

True - although it's not mutable state. It's not the kind of state
which prevents something being reused, let's say. (I want a word for
this kind of state - I've been using it a lot recently.)

Worse, in 2.0 there is a settable EncoderFallback property which makes all
instances potentially unshareable.

Aargh. I take back the above :)

The work around for the static properties
is that they are all made readonly (IsReadOnly) and throw
InvalidOperationException if you try to set them. [It doesn't actually say
that in the documentation but it's the only sensible thing to do]

Not sure what you mean - the Encoding.XXX properties? Or the properties
on the objects returned by the Encoding.XXX properties?

Also the static property is actually called Encoding.UTF8

Oops, yes.

Jon
 
