buffer to string: incorrect string length

coltrane · Jan 8, 2009

I am trying to convert a buffer to a string using
Encoding.UTF8.GetString(...)
where the buffer contains trailing '\0's.

When I get the length of the string or print the string, the '\0's are
included and show up as spaces. Why doesn't a string use '\0' to
terminate a string?

example:

byte[] buffer = new byte[10];
buffer[0] = (byte)'1';
buffer[1] = (byte)'2';
buffer[2] = (byte)'3';
buffer[3] = 0;

string text = Encoding.UTF8.GetString(buffer);

Console.Write('|');
Console.Write(text);
Console.Write('|');

The output is: "|123 |"

Actually, I am getting the buffer by reading from a socket. The text in
the buffer is null-terminated, and the rest of the buffer is filled with
junk. When the buffer is converted to a string and printed, both the text
and the junk are printed.
So I guess the question might be: how do I terminate a string?

thanks for the help

John

Alberto Poblacion · Jan 8, 2009


Strings in C# are not null-terminated like in C. Instead, the length is
internally stored in a separate counter inside the String class.
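
A minimal sketch of what that means in practice: to a .NET string, '\0' is an ordinary character, so it counts toward Length and has to be cut off explicitly. The class and helper names (NulDemo, CutAtNul) are made up for illustration.

```csharp
using System;

static class NulDemo
{
    // Hypothetical helper: returns the part of s before the first '\0',
    // or s unchanged if it contains no '\0' at all.
    public static string CutAtNul(string s)
    {
        int end = s.IndexOf('\0');
        return end >= 0 ? s.Substring(0, end) : s;
    }

    static void Main()
    {
        // Three digits followed by two NUL characters.
        string s = "123\0\0";

        Console.WriteLine(s.Length);            // 5: the NULs are counted
        Console.WriteLine(CutAtNul(s).Length);  // 3: only "123" remains
    }
}
```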

When using Encoding.GetString(buffer), if you don't want to convert the
complete contents of the buffer, you can use an overload of GetString that
accepts the start index and the number of bytes to decode:

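// Find the index of the first 0 byte (buffer.Length if there is none).
int i;
for (i = 0; i < buffer.Length; i++)
    if (buffer[i] == 0) break;
int nchars = i; // number of bytes before the terminator

string text = Encoding.ASCII.GetString(buffer, 0, nchars);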

Note that I used Encoding.ASCII instead of Encoding.UTF8. You need to
adjust this to use the correct encoding for the characters you are receiving.
For instance, in UTF8 a byte does not always correspond to one Unicode
character in your String; some Unicode characters are encoded as more than
one byte. So you should only use Encoding.UTF8 if you know that your
buffer does contain UTF8-encoded text.
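
If writing the loop by hand feels heavy, the same idea can be sketched with Array.IndexOf. This is only an illustration: the helper name NullTerminated.Decode and its count parameter are assumptions, not part of any library. The count is meant to be the number of valid bytes in the buffer (for a socket, typically the value returned by the receive call), so junk beyond it is never scanned. Searching for a 0 byte like this is safe for ASCII and UTF8, but not for encodings such as UTF-16, where 0 bytes occur inside ordinary characters.

```csharp
using System;
using System.Text;

static class NullTerminated
{
    // Hypothetical helper: decodes the bytes that come before the first
    // 0 byte, looking only at the first 'count' bytes of the buffer.
    public static string Decode(byte[] buffer, int count, Encoding encoding)
    {
        int end = Array.IndexOf(buffer, (byte)0, 0, count);
        if (end < 0) end = count;   // no terminator found: use all valid bytes
        return encoding.GetString(buffer, 0, end);
    }
}

class Program
{
    static void Main()
    {
        byte[] buffer = new byte[10];
        buffer[0] = (byte)'1';
        buffer[1] = (byte)'2';
        buffer[2] = (byte)'3';
        buffer[3] = 0;
        buffer[4] = (byte)'x';      // junk after the terminator

        string text = NullTerminated.Decode(buffer, buffer.Length, Encoding.ASCII);
        Console.WriteLine("|" + text + "|");   // |123|
    }
}
```

The search is a single pass over the received bytes, and only the bytes before the terminator are decoded.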

coltrane · Jan 8, 2009


wow, this seems a little expensive

thanks for the reply.

john
