StringBuilder to byte array

Peter · Nov 20, 2007

What is the easiest way to convert a StringBuilder to a byte array?

Thanks

Marc Gravell · Nov 20, 2007

Well, you'll need to pick an encoding... the simplest way is then
(using UTF8 here):
byte[] buffer = System.Text.Encoding.UTF8.GetBytes(sb.ToString());

How complex you need to make it depends on the scenario.
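
A minimal sketch of the one-liner above wrapped in helper methods, assuming UTF-8 is the encoding both sides have agreed on. The class and method names (StringBuilderBytes, ToUtf8Bytes, FromUtf8Bytes) are illustrative and not part of any library.

```csharp
using System.Text;

public static class StringBuilderBytes
{
    // Encodes the current contents of a StringBuilder as UTF-8 bytes.
    public static byte[] ToUtf8Bytes(StringBuilder sb)
    {
        return Encoding.UTF8.GetBytes(sb.ToString());
    }

    // Decodes bytes produced by ToUtf8Bytes back into a string.
    public static string FromUtf8Bytes(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Whoever reads the bytes has to decode them with the same encoding, which is why the choice of encoding matters more than the conversion call itself.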

Marc

Nicholas Paldino [.NET/C# MVP] · Nov 20, 2007

Well, you should just create a string from the StringBuilder (calling
ToString) and then you can serialize that using the BinaryFormatter, or call
the GetBytes method on the Unicode Encoding instance exposed by the static
Unicode property on the Encoding class to return the bytes.

The latter option is more than likely easier. The former was mentioned
just to show that there are multiple ways to do it.

Nicholas Paldino [.NET/C# MVP] · Nov 20, 2007

Marc,

I don't know that I would pick UTF8, but rather, I'd use the Unicode
encoding. It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string, and the Unicode Encoding instance is the most
resilient way of doing that (assuming readability isn't a factor, and at a
cost of twice the storage space).
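
To put numbers on the storage cost, here is a small sketch comparing encoded sizes. Encoding.Unicode is UTF-16 (little-endian) in .NET; the helper names are illustrative only. The "twice the storage" figure holds for ASCII-range text, while for some scripts UTF-8 comes out larger than UTF-16.

```csharp
using System.Text;

public static class EncodingSizes
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Size(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Number of bytes the string occupies when encoded as UTF-16
    // (what .NET calls Encoding.Unicode).
    public static int Utf16Size(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }
}
```

For "hello" this gives 5 bytes in UTF-8 and 10 in UTF-16; for the three-character string "\u65E5\u672C\u8A9E" it gives 9 bytes in UTF-8 and 6 in UTF-16.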

Marc Gravell · Nov 20, 2007

Nicholas Paldino said:
It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string

Yes, but that statement itself is ambiguous. It would be rare to
want to inspect the actual machine memory of a .NET string when
the char-buffer is readily available, so I'll assume that this is for
serialization purposes.

and the Unicode Encoding instance is the most resilient way
of doing that

Well, the most resilient way is to agree in advance which encoding
is being used ;-p
Maybe it is just the data I work with, but I still see more UTF8
(heck, mainly ascii) than I do unicode, so UTF8 makes a good compromise
between working with legacy files and supporting full unicode. The
space saving is nice but (generally) a side benefit.

Marc

Jon Skeet [C# MVP] · Nov 20, 2007

Nicholas Paldino said:
I don't know that I would pick UTF8, but rather, I'd use the Unicode
encoding. It seems (and I could be wrong here) that the OP just wants the
bytes that make up the string, and the Unicode Encoding instance is the most
resilient way of doing that (assuming readability isn't a factor, and at a
cost of twice the storage space).

I don't think we really have enough information to say what the OP
really wants, to be honest. If they can pick an encoding, then UTF-8 is
usually a very good choice; if not, that's a different matter.

Nicholas Paldino [.NET/C# MVP] · Nov 20, 2007

Marc,

To elaborate, when I say "the bytes that make up the string", what I
mean is a serialized, ^lossless^ transformation of the string into bytes.
This can't be done with UTF-8. Granted, for the data that you are using, it
is what you most commonly see, but if you simply want to make sure that all
data that can be stored in a .NET string can be accurately represented in a
byte array, you use the Unicode encoding.

Jon Skeet [C# MVP] · Nov 20, 2007

Marc Gravell said:
Maybe it is just the data I work with, but I still see more UTF8
(heck, mainly ascii) than I do unicode, so UTF8 makes a good compromise
between working with legacy files and supporting full unicode. The
space saving is nice but (generally) a side benefit.

And also UTF-8 does *support* full unicode - there's no real compromise
here unless you're talking about situations where UTF-8 is bigger, or
you need to quickly access a specific character index (which is where
any variable-width encoding falls down).

Jon Skeet [C# MVP] · Nov 20, 2007

Nicholas Paldino said:
To elaborate, when I say "the bytes that make up the string", what I
mean is a serialized, ^lossless^ transformation of the string into bytes.
This can't be done with UTF-8.

In what way? Any valid string can be represented in UTF-8.

The only situation in which you may run into problems is if you've got
a surrogate pair issue (e.g. a high surrogate with no corresponding low
surrogate, or vice versa), but I'm not sure that other encodings would
(or should) handle that situation losslessly either. It's basically a
corrupt string at that point.

Could you give an example string where encoding to UTF-8 and then
decoding risks losing data?
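
A quick way to probe this is a round-trip check. The sketch below uses the default Encoding.UTF8 instance, which substitutes a replacement character for invalid input rather than throwing; the class and method names are illustrative only.

```csharp
using System.Text;

public static class Utf8RoundTrip
{
    // Returns true if encoding to UTF-8 and decoding again
    // yields exactly the same string.
    public static bool SurvivesUtf8(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(bytes) == s;
    }
}
```

A well-formed string such as "abc \u00E9 \uD83D\uDE00" (which contains a valid surrogate pair) survives, while a string holding a lone high surrogate such as "\uD800" does not, because the unpaired surrogate is replaced during encoding.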

Marc Gravell · Nov 20, 2007

Jon Skeet said:
there's no real compromise here unless...

I was indeed (although I didn't make it clear) thinking of the good
ol' days of being able to seek a stream by the character offset (give
or take a fixed multiple).

Marc

Christof Nordiek · Nov 21, 2007

Marc Gravell said:
I was indeed (although I didn't make it clear) thinking of the good
ol' days of being able to seek a stream by the character offset (give
or take a fixed multiple).

Then you've got to use UTF32 to make it work in all cases ;-)
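
A rough sketch of what that buys you: in UTF-32 every code point takes exactly four bytes, so the byte offset is simply the index times four. The names below are illustrative, and the sketch assumes the byte array has no byte order mark at the start (Encoding.UTF32.GetBytes does not emit one).

```csharp
using System.Text;

public static class FixedWidth
{
    // Byte offset of the code point at the given index in UTF-32 data.
    public static int Utf32ByteOffset(int codePointIndex)
    {
        return codePointIndex * 4;
    }

    // Decodes the single code point at the given index directly,
    // without scanning the bytes that come before it.
    public static string CodePointAt(byte[] utf32Bytes, int codePointIndex)
    {
        return Encoding.UTF32.GetString(utf32Bytes, Utf32ByteOffset(codePointIndex), 4);
    }
}
```

Note that this indexes code points, not UTF-16 chars: "a\uD83D\uDE00b" has a string Length of 4 but encodes to 12 bytes, i.e. three code points.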

Christof

Marc Gravell · Nov 21, 2007

I think I'll just choose to accept that those days are gone, and stick
with UTF-8 (when there is a choice).

Marc

Peter · Nov 21, 2007

Nicholas Paldino said:
Well, you should just create a string from the StringBuilder (calling
ToString) and then you can serialize that using the BinaryFormatter, or
call the GetBytes method on the Unicode Encoding instance exposed by the
static Unicode property on the Encoding class to return the bytes.

The latter option is more than likely easier. The former was mentioned
just to show that there are multiple ways to do it.

Thanks everyone for the input, it was very educational.

FYI:
What I am trying to do is retrieve a PDF file through a socket. I read the
socket and store the data in a StringBuilder, and once that's done I want to
save the StringBuilder data, which is a PDF file, to the hard drive.

Peter Duniho · Nov 21, 2007

Peter said:
FYI:
What I am trying to do is retrieve a PDF file through a socket. I read the
socket and store the data in a StringBuilder, and once that's done I want to
save the StringBuilder data, which is a PDF file, to the hard drive.

I'm not clear on why you're using StringBuilder, or strings at all.

If you're just trying to copy a PDF file, you should be able to just
transfer the bytes of the file and save them to disk verbatim. Running
the PDF through some text decoding and reencoding can only create more
hassles, IMHO.
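
A minimal sketch of that byte-for-byte approach, assuming the sender closes the connection when the file is complete (otherwise you would need some length or framing information). BinaryDownload and SaveToFile are illustrative names; Stream.CopyTo requires .NET 4 or later.

```csharp
using System.IO;

public static class BinaryDownload
{
    // Copies every byte from source into a new file at path.
    // No text decoding or encoding is involved at any point.
    public static void SaveToFile(Stream source, string path)
    {
        using (FileStream file = File.Create(path))
        {
            source.CopyTo(file);
        }
    }
}
```

With a connected socket, the source could be a System.Net.Sockets.NetworkStream wrapped around it, for example SaveToFile(new NetworkStream(socket), "download.pdf"), where the file name is just a placeholder.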

Pete

Jon Skeet [C# MVP] · Nov 21, 2007

Peter said:
Thanks everyone for the input, it was very educational.

FYI:
What I am trying to do is retrieve a PDF file through a socket. I read the
socket and store the data in a StringBuilder, and once that's done I want to
save the StringBuilder data, which is a PDF file, to the hard drive.

As far as I'm aware, PDFs are binary data - they shouldn't be treated
as text.

Don't store the data in a StringBuilder; either stream it straight to
disk or store it in a MemoryStream.
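
For the MemoryStream option, a sketch along these lines would work, again assuming the data ends when the stream reports end-of-stream. The names and the 8192-byte chunk size are arbitrary choices for illustration.

```csharp
using System.IO;

public static class BinaryBuffer
{
    // Reads the source stream to its end and returns the raw bytes.
    public static byte[] ReadAllBytes(Stream source)
    {
        byte[] chunk = new byte[8192];
        using (MemoryStream buffer = new MemoryStream())
        {
            int read;
            // Read returns 0 once the end of the stream is reached.
            while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}
```

The resulting array can then be written out in one go with File.WriteAllBytes.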

Peter · Nov 21, 2007

Jon Skeet said:
As far as I'm aware, PDFs are binary data - they shouldn't be treated
as text.

Don't store the data in a StringBuilder; either stream it straight to
disk or store it in a MemoryStream.

Thanks for the advice. The only reason I am using the StringBuilder is that I
found an example on MSDN:

http://msdn2.microsoft.com/en-us/library/bew39x2a.aspx

Jon Skeet [C# MVP] · Nov 21, 2007

Peter said:
Thanks for the advice. The only reason I am using the StringBuilder is that I
found an example on MSDN:

http://msdn2.microsoft.com/en-us/library/bew39x2a.aspx

That's far from an ideal example, unfortunately - in particular, it
only ever works with ASCII text. In some cases that's what you want,
but not always.
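
To see why that matters for a PDF, here is a small sketch of what happens when raw bytes are pushed through an ASCII string and back, assuming the received bytes are decoded with Encoding.ASCII as the remark above suggests. The class and method names are illustrative only.

```csharp
using System.Text;

public static class AsciiRoundTrip
{
    // Decodes bytes as ASCII text, then encodes the text back to bytes.
    // This mimics storing received binary data in a string.
    public static byte[] ThroughAsciiString(byte[] data)
    {
        string text = Encoding.ASCII.GetString(data);
        return Encoding.ASCII.GetBytes(text);
    }
}
```

Bytes below 0x80, such as the "%PDF" header, come back unchanged, but any byte of 0x80 or above is replaced on the way through, so a binary file saved this way no longer matches the original.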
