StringBuilder to byte array

Well, you'll need to pick an encoding... the simplest way is then
(using UTF8 here):
byte[] buffer = System.Text.Encoding.UTF8.GetBytes(sb.ToString());

How complex you need to make it depends on the scenario.

Marc
 
Well, you should just create a string from the StringBuilder (by calling
ToString). Then you can either serialize that string using the
BinaryFormatter, or call the GetBytes method on the Unicode (UTF-16) Encoding
instance exposed by the static Unicode property on the Encoding class.

The latter option is more than likely easier. The former was mentioned
just to show that there are multiple ways to do it.
 
Marc,

I don't know that I would pick UTF8, but rather, I'd use the Unicode
encoding. It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string, and the Unicode Encoding instance is the most
resilient way of doing that (assuming readability isn't a factor, and at a
cost of twice the storage space).
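
For anyone following along, here is a minimal sketch of the two GetBytes options mentioned so far, run against the same StringBuilder. The class and method names (EncodingSizes, ToBytes) are made up for illustration; only StringBuilder, Encoding.UTF8 and Encoding.Unicode come from the posts above. For plain ASCII text the Unicode (UTF-16) result is twice the size of the UTF-8 one, which is the storage cost being discussed.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Hypothetical helper: converts the builder's text to bytes
    // using whichever encoding the caller has agreed on.
    public static byte[] ToBytes(StringBuilder sb, Encoding encoding)
    {
        return encoding.GetBytes(sb.ToString());
    }

    static void Main()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("Hello, ").Append("world");   // 12 ASCII characters

        byte[] utf8 = ToBytes(sb, Encoding.UTF8);
        byte[] utf16 = ToBytes(sb, Encoding.Unicode);

        Console.WriteLine(utf8.Length);    // 12 (one byte per ASCII character)
        Console.WriteLine(utf16.Length);   // 24 (two bytes per character)
    }
}
```

Whoever reads the bytes back has to use the same encoding, so the choice needs to be agreed on both sides.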
 
It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string
Yes, but that statement itself is ambiguous. It would be rare to
want to inspect the actual machine memory of a .NET string when
the char-buffer is readily available, so I'll assume that this is for
serialization purposes.
and the Unicode Encoding instance is the most resilient way
of doing that
Well, the most resilient way is to agree in advance which encoding
is being used ;-p
Maybe it is just the data I work with, but I still see more UTF8 (heck,
mainly ASCII) than I do Unicode, so UTF8 makes a good compromise
between working with legacy files and supporting full Unicode. The
space saving is nice but (generally) a side benefit.

Marc
 
Nicholas Paldino said:
I don't know that I would pick UTF8, but rather, I'd use the Unicode
encoding. It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string, and the Unicode Encoding instance is the most
resilient way of doing that (assuming readability isn't a factor, and at a
cost of twice the storage space).

I don't think we really have enough information to say what the OP
really wants, to be honest. If they can pick an encoding, then UTF-8 is
usually a very good choice; if not, that's a different matter.
 
Marc,

To elaborate, when I say "the bytes that make up the string", what I
mean is a serialized, ^lossless^ transformation of the string into bytes.
This can't be done with UTF-8. Granted, for the data that you are using, it
is what you most commonly see, but if you simply want to make sure that all
data that can be stored in a .NET string can be accurately represented in a
byte array, you use the Unicode encoding.
 
Marc Gravell said:
Maybe it is just the data I work with, but I still see more UTF8
(heck, mainly ASCII) than I do Unicode, so UTF8 makes a good compromise
between working with legacy files and supporting full Unicode. The
space saving is nice but (generally) a side benefit.

And also UTF-8 does *support* full unicode - there's no real compromise
here unless you're talking about situations where UTF-8 is bigger, or
you need to quickly access a specific character index (which is where
any variable-width encoding falls down).
 
Nicholas Paldino said:
To elaborate, when I say "the bytes that make up the string", what I
mean is a serialized, ^lossless^ transformation of the string into bytes.
This can't be done with UTF-8.

In what way? Any valid string can be represented in UTF-8.

The only situation in which you may run into problems is if you've got
a surrogate pair issue (e.g. a high surrogate with no corresponding low
surrogate, or vice versa), but I'm not sure that other encodings would
(or should) handle that situation losslessly either. It's basically a
corrupt string at that point.

Could you give an example string where encoding to UTF-8 and then
decoding risks losing data?
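
A quick way to check this for yourself is a round-trip test. The sketch below is only an illustration: the RoundTrip class and Survives method are hypothetical names, and it assumes the default Encoding.UTF8 instance, which (as far as I know) substitutes U+FFFD for an unpaired surrogate rather than throwing. A valid string, including a character outside the BMP, comes back unchanged; the string with a lone high surrogate does not.

```csharp
using System;
using System.Text;

static class RoundTrip
{
    // Hypothetical helper: encodes and decodes with the same encoding
    // and reports whether the string came back unchanged.
    public static bool Survives(string s, Encoding encoding)
    {
        string decoded = encoding.GetString(encoding.GetBytes(s));
        return string.Equals(s, decoded, StringComparison.Ordinal);
    }

    static void Main()
    {
        // Valid text: an accented letter, CJK, and a non-BMP character
        // (U+1D11E, stored as a surrogate pair in a .NET string).
        string valid = "caf\u00E9 \u65E5\u672C\u8A9E \U0001D11E";
        Console.WriteLine(Survives(valid, Encoding.UTF8));    // True

        // A high surrogate with no low surrogate: not well-formed UTF-16.
        string broken = "a\uD800b";
        Console.WriteLine(Survives(broken, Encoding.UTF8));   // False
    }
}
```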
 
there's no real compromise here unless...
I was indeed (although I didn't make it clear) thinking of the good
ol' days of being able to seek a stream by the character offset (give
or take a fixed multiple).

Marc
 
Marc Gravell said:
I was indeed (although I didn't make it clear) thinking of the good
ol' days of being able to seek a stream by the character offset (give
or take a fixed multiple).
Then you've got to use UTF32 to make it work in all cases ;-)

Christof
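
To put numbers on the fixed-width point, here is a small sketch (WidthDemo and ByteCount are illustrative names, not anything from the thread) that counts the bytes for a two-character string where the second character lies outside the BMP. Only UTF-32 uses the same number of bytes for both characters, which is why it is the one encoding where byte offset divided by a constant gives the character index.

```csharp
using System;
using System.Text;

static class WidthDemo
{
    // Hypothetical helper: number of bytes the string needs in the given encoding.
    public static int ByteCount(string s, Encoding encoding)
    {
        return encoding.GetByteCount(s);
    }

    static void Main()
    {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16.
        string s = "a\U0001F600";

        Console.WriteLine(s.Length);                         // 3 (UTF-16 code units)
        Console.WriteLine(ByteCount(s, Encoding.UTF8));      // 5 = 1 + 4
        Console.WriteLine(ByteCount(s, Encoding.Unicode));   // 6 = 2 + 4
        Console.WriteLine(ByteCount(s, Encoding.UTF32));     // 8 = 4 + 4
    }
}
```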
 
I think I'll just choose to accept that those days are gone, and stick
with UTF-8 (when there is a choice).

Marc
 
Nicholas Paldino said:
Well, you should just create a string from the StringBuilder (by calling
ToString). Then you can either serialize that string using the
BinaryFormatter, or call the GetBytes method on the Unicode (UTF-16) Encoding
instance exposed by the static Unicode property on the Encoding class.

The latter option is more than likely easier. The former was mentioned
just to show that there are multiple ways to do it.


--
- Nicholas Paldino [.NET/C# MVP]

Peter said:
What is the easiest way to convert StringBuilder to byte array?


Thanks


Thanks everyone for the input, it was very educational.

FYI:
What I am trying to do is retrieve a PDF file through a socket. I read from
the socket and store the data in a StringBuilder, and once that's done I want
to save the StringBuilder data (which is a PDF file) to a hard drive.
 
FYI:
What I am trying to do is retrieve a PDF file through a socket. I read from
the socket and store the data in a StringBuilder, and once that's done I want
to save the StringBuilder data (which is a PDF file) to a hard drive.

I'm not clear on why you're using StringBuilder, or strings at all.

If you're just trying to copy a PDF file, you should be able to just
transfer the bytes of the file and save them to disk verbatim. Running
the PDF through some text decoding and reencoding can only create more
hassles, IMHO.

Pete
 
Thanks everyone for the input, it was very educational.

FYI:
What I am trying to do is retrieve a PDF file through a socket. I read from
the socket and store the data in a StringBuilder, and once that's done I want
to save the StringBuilder data (which is a PDF file) to a hard drive.

As far as I'm aware, PDFs are binary data - they shouldn't be treated
as text.

Don't store the data in a StringBuilder, either stream it straight to
disk or store it in a MemoryStream.
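
As a rough sketch of what that could look like (BinaryCopy and Copy are made-up names, and a MemoryStream stands in for the socket's stream so the example runs on its own), the loop below moves raw bytes from one stream to another with no text decoding anywhere. It assumes the sender closes the connection when the file is complete, so that Read eventually returns 0.

```csharp
using System;
using System.IO;

static class BinaryCopy
{
    // Hypothetical helper: copies every byte from source to destination
    // verbatim and returns the number of bytes copied.
    public static long Copy(Stream source, Stream destination)
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        // Read returns 0 once the source has no more data.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    static void Main()
    {
        // Stand-in for the data arriving on the socket: "%PDF" followed by
        // bytes that are not valid text in most encodings.
        byte[] payload = { 0x25, 0x50, 0x44, 0x46, 0x00, 0xFF, 0xFE, 0x80 };

        using (Stream source = new MemoryStream(payload))
        using (MemoryStream destination = new MemoryStream())
        {
            long copied = Copy(source, destination);
            Console.WriteLine(copied);   // 8
        }
    }
}
```

To save to disk instead, the destination could be a FileStream (for example one returned by File.Create) and the source the NetworkStream wrapped around the socket; the loop itself stays the same.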
 
