StringBuilder to byte array

Marc Gravell

Well, you'll need to pick an encoding... the simplest way is then
(using UTF8 here):
byte[] buffer = System.Text.Encoding.UTF8.GetBytes(sb.ToString());

How complex you need to make it depends on the scenario.

Marc
 
Nicholas Paldino [.NET/C# MVP]

Well, you should just create a string from the StringBuilder (calling
ToString) and then you can serialize that using the BinaryFormatter, or call
the GetBytes method on the Unicode Encoding instance exposed by the static
Unicode property on the Encoding class to return the bytes.

The latter option is more than likely easier. The former was mentioned
just to show that there are multiple ways to do it.
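For what it's worth, a minimal sketch of the second option might look like the following. The class and method names (UnicodeBytesSketch, ToUnicodeBytes) are made up for illustration; only Encoding.Unicode, GetBytes and GetString come from the framework.

```csharp
using System;
using System.Text;

static class UnicodeBytesSketch
{
    // Hypothetical helper: encodes the builder's contents as UTF-16
    // (little-endian) bytes via the static Encoding.Unicode instance.
    public static byte[] ToUnicodeBytes(StringBuilder sb)
    {
        return Encoding.Unicode.GetBytes(sb.ToString());
    }

    static void Main()
    {
        StringBuilder sb = new StringBuilder("hello");
        byte[] bytes = ToUnicodeBytes(sb);

        // Two bytes per UTF-16 code unit; GetBytes adds no byte order mark.
        Console.WriteLine(bytes.Length == sb.Length * 2);

        // Decoding with the same encoding restores the original text.
        Console.WriteLine(Encoding.Unicode.GetString(bytes) == sb.ToString());
    }
}
```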
 
Nicholas Paldino [.NET/C# MVP]

Marc,

I don't know that I would pick UTF8, but rather, I'd use the Unicode
encoding. It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string, and the Unicode Encoding instance is the most
resilient way of doing that (assuming readability isn't a factor, and at a
cost of twice the storage space).
 
Marc Gravell

Nicholas Paldino said:
It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string

Yes, but that statement itself is ambiguous. It would be rare to
want to inspect the actual machine memory of a .NET string when
the char-buffer is readily available, so I'll assume that this is for
serialization purposes.

Nicholas Paldino said:
and the Unicode Encoding instance is the most resilient way
of doing that

Well, the most resilient way is to agree in advance which encoding
is being used ;-p
Maybe it is just the data I work with, but I still see more UTF8
(heck, mainly ascii) than I do unicode, so UTF8 makes a good compromise
between working with legacy files and supporting full unicode. The
space saving is nice but (generally) a side benefit.

Marc
 
Jon Skeet [C# MVP]

Nicholas Paldino said:
I don't know that I would pick UTF8, but rather, I'd use the Unicode
encoding. It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string, and the Unicode Encoding instance is the most
resilient way of doing that (assuming readability isn't a factor, and at a
cost of twice the storage space).

I don't think we really have enough information to say what the OP
really wants, to be honest. If they can pick an encoding, then UTF-8 is
usually a very good choice; if not, that's a different matter.
 
Nicholas Paldino [.NET/C# MVP]

Marc,

To elaborate, when I say "the bytes that make up the string", what I
mean is a serialized, ^lossless^ transformation of the string into bytes.
This can't be done with UTF-8. Granted, for the data that you are using, it
is what you most commonly see, but if you simply want to make sure that all
data that can be stored in a .NET string can be accurately represented in a
byte array, you use the Unicode encoding.
 
Jon Skeet [C# MVP]

Marc Gravell said:
Maybe it is just the data I work with, but I still see more UTF8
(heck, mainly ascii) than I do unicode, so UTF8 makes a good compromise
between working with legacy files and supporting full unicode. The
space saving is nice but (generally) a side benefit.

And also UTF-8 does *support* full unicode - there's no real compromise
here unless you're talking about situations where UTF-8 is bigger, or
you need to quickly access a specific character index (which is where
any variable-width encoding falls down).
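To put rough numbers on the size and variable-width points, here is a small sketch that counts the bytes each encoding needs for a few single-character strings. The ByteCountSketch class and the sample characters are just illustrative choices; GetByteCount is the real framework call.

```csharp
using System;
using System.Text;

static class ByteCountSketch
{
    // Returns the encoded size of s in UTF-8, UTF-16 and UTF-32, in that order.
    public static int[] Counts(string s)
    {
        return new int[]
        {
            Encoding.UTF8.GetByteCount(s),
            Encoding.Unicode.GetByteCount(s),
            Encoding.UTF32.GetByteCount(s)
        };
    }

    static void Main()
    {
        // 'A', e-acute, the euro sign, and one character outside the BMP.
        string[] samples = { "A", "\u00E9", "\u20AC", "\U0001F600" };
        foreach (string s in samples)
        {
            int[] c = Counts(s);
            Console.WriteLine("U+{0:X}: UTF-8 {1}, UTF-16 {2}, UTF-32 {3}",
                char.ConvertToUtf32(s, 0), c[0], c[1], c[2]);
        }
    }
}
```

UTF-8 comes out at 1, 2, 3 and 4 bytes for these four samples, UTF-16 at 2, 2, 2 and 4, and UTF-32 at 4 bytes every time, so UTF-8 is only bigger than UTF-16 for characters like the euro sign, and only UTF-32 keeps a fixed width per code point.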
 
Jon Skeet [C# MVP]

Nicholas Paldino said:
To elaborate, when I say "the bytes that make up the string", what I
mean is a serialized, ^lossless^ transformation of the string into bytes.
This can't be done with UTF-8.

In what way? Any valid string can be represented in UTF-8.

The only situation in which you may run into problems is if you've got
a surrogate pair issue (e.g. a high surrogate with no corresponding low
surrogate, or vice versa), but I'm not sure that other encodings would
(or should) handle that situation losslessly either. It's basically a
corrupt string at that point.

Could you give an example string where encoding to UTF-8 and then
decoding risks losing data?
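For anyone who wants to try this themselves, a quick check along these lines should do. The RoundTrips helper is a made-up name, and the behaviour shown for the lone surrogate assumes the default Encoding.UTF8 instance, which substitutes U+FFFD rather than throwing.

```csharp
using System;
using System.Text;

static class Utf8RoundTripSketch
{
    // True when encoding s to UTF-8 and decoding again yields the same
    // sequence of UTF-16 code units.
    public static bool RoundTrips(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        string decoded = Encoding.UTF8.GetString(bytes);
        return string.Equals(s, decoded, StringComparison.Ordinal);
    }

    static void Main()
    {
        // Valid text, including a character outside the BMP (a surrogate pair).
        Console.WriteLine(RoundTrips("caf\u00E9 \u20AC \U0001F600"));

        // A high surrogate with no low surrogate: not a valid string.
        // The default UTF-8 encoder replaces it with U+FFFD, so the
        // round trip does not give back the original code unit.
        Console.WriteLine(RoundTrips("\uD83D"));
    }
}
```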
 
Marc Gravell

Jon Skeet said:
there's no real compromise here unless...

I was indeed (although I didn't make it clear) thinking of the good
ol' days of being able to seek a stream by the character offset (give
or take a fixed multiple).

Marc
 
Christof Nordiek

Marc Gravell said:
I was indeed (although I didn't make it clear) thinking of the good
ol' days of being able to seek a stream by the character offset (give
or take a fixed multiple).

Then you've got to use UTF32 to make it work in all cases ;-)

Christof
 
Marc Gravell

I think I'll just choose to accept that those days are gone, and stick
with UTF-8 (when there is a choice).

Marc
 
Peter

Nicholas Paldino said:
Well, you should just create a string from the StringBuilder (calling
ToString) and then you can serialize that using the BinaryFormatter, or
call the GetBytes method on the Unicode Encoding instance exposed by the
static Unicode property on the Encoding class to return the bytes.

The latter option is more than likely easier. The former was mentioned
just to show that there are multiple ways to do it.


Peter said:
What is the easiest way to convert StringBuilder to byte array?


Thanks


Thanks everyone for the input, it was very educational.

FYI:
what I am trying to do is retrieve a PDF file through a socket, I read a
socket and store the data into a StringBuilder and after it's done I want to
save the StringBuilder data which is a PDF file on to a hard drive.
 
Peter Duniho

Peter said:
FYI:
what I am trying to do is retrieve a PDF file through a socket, I read a
socket and store the data into a StringBuilder and after it's done I want to
save the StringBuilder data which is a PDF file on to a hard drive.

I'm not clear on why you're using StringBuilder, or strings at all.

If you're just trying to copy a PDF file, you should be able to just
transfer the bytes of the file and save them to disk verbatim. Running
the PDF through some text decoding and reencoding can only create more
hassles, IMHO.
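As a rough illustration of the kind of hassle I mean, here is a sketch that pushes a few arbitrary bytes through a UTF-8 decode and re-encode. The byte values are made up and are not taken from a real PDF, and ThroughText is just a hypothetical helper name.

```csharp
using System;
using System.Text;

static class TextRoundTripSketch
{
    // Decodes raw bytes as UTF-8 text and encodes the text back to bytes,
    // which is effectively what storing binary data in a string does.
    public static byte[] ThroughText(byte[] raw)
    {
        string asText = Encoding.UTF8.GetString(raw);
        return Encoding.UTF8.GetBytes(asText);
    }

    static void Main()
    {
        // Three bytes that are not valid UTF-8 on their own.
        byte[] raw = { 0xFF, 0xFE, 0x80 };
        byte[] result = ThroughText(raw);

        // Invalid bytes are replaced during decoding, so the output
        // no longer matches the input.
        Console.WriteLine(result.Length == raw.Length);
    }
}
```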

Pete
 
Jon Skeet [C# MVP]

Peter said:
Thanks everyone for the input, it was very educational.

FYI:
what I am trying to do is retrieve a PDF file through a socket, I read a
socket and store the data into a StringBuilder and after it's done I want to
save the StringBuilder data which is a PDF file on to a hard drive.

As far as I'm aware, PDFs are binary data - they shouldn't be treated
as text.

Don't store the data in a StringBuilder, either stream it straight to
disk or store it in a MemoryStream.
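Something along these lines would do it; treat it as a sketch rather than finished code. CopyBytes is a hypothetical helper, and the MemoryStream source in Main is only a stand-in for whatever stream you get from your socket.

```csharp
using System;
using System.IO;

static class BinaryCopySketch
{
    // Copies every byte from input to output with no text decoding,
    // and returns the number of bytes copied.
    public static long CopyBytes(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        // Read returns 0 once the end of the stream is reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    static void Main()
    {
        // Stand-in for the socket data: includes bytes that are not valid text.
        byte[] fake = { 0x25, 0x50, 0x44, 0x46, 0x00, 0xFF, 0xFE, 0x80 };
        using (Stream source = new MemoryStream(fake))
        using (MemoryStream target = new MemoryStream())
        {
            long copied = CopyBytes(source, target);
            Console.WriteLine(copied == fake.Length);
        }
    }
}
```

To write straight to disk, pass a FileStream (for example the result of File.Create with a path of your choosing) as the output instead of the MemoryStream.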
 
