buffer to string: incorrect string length

coltrane

I am trying to convert a buffer to a string using
Encoding.UTF8.GetString(...)
where the buffer contains trailing '\0's.

When I get the length of the string or print the string, the '\0's are
included and show up as spaces. Why doesn't a string use '\0' as a
terminator?

example:

byte[] buffer = new byte[10];
buffer[0] = (byte)'1';
buffer[1] = (byte)'2';
buffer[2] = (byte)'3';
buffer[3] = 0;

string text = Encoding.UTF8.GetString(buffer);

Console.Write('|');
Console.Write(text);
Console.Write('|');

The output is: "|123       |"

Actually, I am reading the buffer from a socket. The buffer is
null-terminated, and the rest of it is filled with junk. When the
buffer is converted to a string and printed, both the text and the
junk are printed.
So I guess the question might be: how do I terminate a string?

thanks for the help

John
 
Alberto Poblacion

Strings in C# are not null-terminated like in C. Instead, the length is
internally stored in a separate counter inside the String class.
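
To make that concrete, here is a minimal sketch showing that a .NET string treats '\0' as an ordinary character that counts toward Length. The class and method names are made up for illustration only.

```csharp
using System;

static class NulInStringDemo
{
    // Returns the Length of a string that contains two embedded '\0' characters.
    public static int LengthWithNuls()
    {
        string s = "123\0\0";   // three digits followed by two NUL characters
        return s.Length;        // the NULs are counted like any other char
    }

    // Returns the position of the first '\0' in s, or -1 if there is none.
    public static int FirstNul(string s)
    {
        return s.IndexOf('\0');
    }
}
```

LengthWithNuls() returns 5, not 3, and FirstNul("123\0\0") returns 3, the position where a C-style string would have ended.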

When using Encoding.GetString(buffer), if you don't want to convert the
complete contents of the buffer, you can use an overload of GetString that
accepts a start index and a byte count:

// Find the first zero byte; its index equals the number of bytes before it.
int i;
for (i = 0; i < buffer.Length; i++)
    if (buffer[i] == 0) break;
int nchars = i;

string text = Encoding.ASCII.GetString(buffer, 0, nchars);

Note that I used Encoding.ASCII instead of Encoding.UTF8. You need to
adjust this to use the correct encoding for the characters you are receiving.
For instance, not every byte in UTF-8 represents a single Unicode character
in your String; some Unicode characters are encoded as more than one byte
in UTF-8. So you should only use Encoding.UTF8 if you know that your
buffer does contain UTF-8-encoded text.
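
The loop above can also be written with Array.IndexOf. The following is only a sketch under stated assumptions: the helper name DecodeUntilNul and its bytesRead parameter are hypothetical, bytesRead is assumed to be the count returned by the socket read, and the terminator is assumed to be a single zero byte (true for ASCII and UTF-8, not for UTF-16).

```csharp
using System;
using System.Text;

static class BufferText
{
    // Decodes buffer[0..bytesRead) up to, but not including, the first zero byte.
    // Bytes at or beyond bytesRead are never examined.
    public static string DecodeUntilNul(byte[] buffer, int bytesRead, Encoding encoding)
    {
        // Index of the first zero byte within the bytes actually read, or -1.
        int nul = Array.IndexOf(buffer, (byte)0, 0, bytesRead);

        // No terminator found: fall back to everything that was read.
        int count = nul >= 0 ? nul : bytesRead;

        // count is a number of bytes, not characters.
        return encoding.GetString(buffer, 0, count);
    }
}
```

With the 10-byte buffer from the question, BufferText.DecodeUntilNul(buffer, buffer.Length, Encoding.ASCII) gives "123". With Encoding.UTF8, the three bytes 0xC3 0xA9 0x00 decode to a one-character string, which is why the count passed to GetString is measured in bytes.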
 
coltrane

wow, this seems a little expensive

thanks for the reply.

john
 
