Encoding Question

C# Learner

Imagine the following scenario.

You receive a byte array from a socket. This byte array contains both
text and binary data; it contains text fields delimited by specified
byte sequences.

For example:

"one \xC0\x80 two \xC0\x80 three"

The way I'm currently dealing with this is to convert the byte array to
a string with the following function, and then split the string on the
delimiter substring.

public static string GetString(byte[] data)
{
    StringBuilder sb = new StringBuilder();

    // Widen each byte to a char, one byte per character.
    for (int i = 0; i < data.Length; ++i) {
        sb.Append((char)data[i]);
    }

    return sb.ToString();
}

I know this is a hack, but is there a better way?
 
Jon Skeet [C# MVP]

C# Learner said:
I know this is a hack, but is there a better way?


Yes. You definitely, definitely shouldn't be doing that. Instead, you
should be reading blocks into memory, and then scanning for the
delimiters. Then build a string using
Encoding.whatever.GetString (byte[], int, int).
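
The scan-then-decode approach described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names ByteFieldSplitter, IndexOf and Split are made up for the example, and the caller is assumed to know which encoding the text fields use.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ByteFieldSplitter
{
    // Returns the index of the first occurrence of pattern in data at or
    // after start, or -1 if there is none.
    private static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; ++i)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j])
            {
                ++j;
            }
            if (j == pattern.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Splits data on every occurrence of delimiter and decodes each field
    // directly from the byte array with GetString(byte[], int, int).
    public static List<string> Split(byte[] data, byte[] delimiter, Encoding encoding)
    {
        if (delimiter.Length == 0)
        {
            throw new ArgumentException("Delimiter must not be empty.", "delimiter");
        }

        List<string> fields = new List<string>();
        int start = 0;
        int hit;
        while ((hit = IndexOf(data, delimiter, start)) >= 0)
        {
            fields.Add(encoding.GetString(data, start, hit - start));
            start = hit + delimiter.Length;
        }

        // Whatever follows the last delimiter is the final field.
        fields.Add(encoding.GetString(data, start, data.Length - start));
        return fields;
    }
}
```

Called with the delimiter bytes 0xC0 0x80 and Encoding.ASCII on the example from the question, this should yield the three fields "one ", " two " and " three". The delimiter bytes never pass through a decoder, so they cannot be mangled by the encoding.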

However, if you have control over the protocol, it would be better to
prefix each string with the number of bytes in it - that way you don't
need to do any scanning.
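
For comparison, a length-prefixed layout might be read like this. The wire format here is purely an assumption made for the sketch (a 4-byte big-endian byte count in front of each string); the thread does not specify one, and LengthPrefixedReader is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reader, for illustration only.
// Assumed format: [4-byte big-endian length][that many bytes of text], repeated.
public static class LengthPrefixedReader
{
    public static List<string> ReadAll(byte[] data, Encoding encoding)
    {
        List<string> fields = new List<string>();
        int pos = 0;

        while (pos < data.Length)
        {
            if (data.Length - pos < 4)
            {
                throw new FormatException("Truncated length prefix.");
            }

            // Assemble the big-endian length from its four bytes.
            int length = (data[pos] << 24) | (data[pos + 1] << 16)
                       | (data[pos + 2] << 8) | data[pos + 3];
            pos += 4;

            if (length < 0 || length > data.Length - pos)
            {
                throw new FormatException("Field length exceeds the available data.");
            }

            // No scanning needed: the prefix says exactly how many bytes to decode.
            fields.Add(encoding.GetString(data, pos, length));
            pos += length;
        }

        return fields;
    }
}
```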
 
C# Learner

Jon said:
Yes. You definitely, definitely shouldn't be doing that. Instead, you
should be reading blocks into memory, and then scanning for the
delimiters. Then build a string using
Encoding.whatever.GetString (byte[], int, int).

Ah, so the only "problem" is finding the delimiters in the byte array then.

I completely forgot that Encoding.Whatever.GetString() can take index
and count parameters. Thanks for pointing that out!

Jon said:
However, if you have control over the protocol, it would be better to
prefix each string with the number of bytes in it - that way you don't
need to do any scanning.

Not in this case.

I guess I'll just write a library method that splits the byte array into
strings then.

Cheers
 
Jon Skeet [C# MVP]

C# Learner said:
Ah, so the only "problem" is finding the delimiters in the byte array then.

Yup.

C# Learner said:
I completely forgot that Encoding.Whatever.GetString() can take index
and count parameters. Thanks for pointing that out!

No problem.

C# Learner said:
Not in this case.

I guess I'll just write a library method that splits the byte array into
strings then.

Righto. Don't forget that things get tricky if you've got to read the
stream in chunks, but you need to combine multiple chunks to decode
them, etc. It's all doable, just tricky...
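
One part of the chunking problem, a multi-byte character that is cut in half by a chunk boundary, can be handled with a stateful System.Text.Decoder, which holds on to incomplete byte sequences between calls. The sketch below covers only that part; a delimiter that straddles two chunks would still need its own buffering. ChunkedDecoding and DecodeChunks are hypothetical names used for the example.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ChunkedDecoding
{
    // Decodes a sequence of chunks as one continuous stream of text.
    public static string DecodeChunks(byte[][] chunks, Encoding encoding)
    {
        // The decoder remembers trailing bytes of an incomplete character
        // and completes it when the next chunk arrives.
        Decoder decoder = encoding.GetDecoder();
        StringBuilder sb = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, written);
        }

        return sb.ToString();
    }
}
```

Calling Encoding.GetString on each chunk separately would not do this, because each call starts decoding from scratch and has no way to join the two halves of a split character.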
 
C# Learner

Jon Skeet [C# MVP] wrote:

Righto. Don't forget that things get tricky if you've got to read the
stream in chunks, but you need to combine multiple chunks to decode
them, etc. It's all doable, just tricky...

Gladly, I don't need to do that in this case. :)

Regards
 
Ray

C# Learner said:
I know this is a hack, but is there a better way?


I have a similar problem reading in a file that can have various delimiters,
even combinations of them. I do the following using the Split function. The
try/catch block conveniently does nothing if it encounters an error
such as two delimiters together. Perhaps you can adapt this to your problem.

// filestr and columns are defined elsewhere.
string delimStr = " ,:;\t";
char[] delimiter = delimStr.ToCharArray();
string[] split = null;
string newStr = "";
string line;
StreamReader sr = new StreamReader(filestr);
while ((line = sr.ReadLine()) != null)
{
    // columns caps the number of substrings returned per line.
    split = line.Split(delimiter, columns);
    foreach (string s in split)
    {
        try
        {
            newStr += s;
        }
        catch { } // do nothing
    }
}
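
A note on adjacent delimiters: String.Split does not throw when two delimiters sit next to each other, it returns an empty string for the gap between them. If the empty entries are unwanted, the overload that takes StringSplitOptions can drop them. A minimal sketch using the same delimiter set as above (LineSplitter and SplitLine are hypothetical names):

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class LineSplitter
{
    // Same delimiter characters as in the snippet above.
    private static readonly char[] Delimiters = { ' ', ',', ':', ';', '\t' };

    // Splits a line on any of the delimiters and discards the empty
    // strings produced by runs of consecutive delimiters.
    public static string[] SplitLine(string line)
    {
        return line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

This only helps once the data is already a correctly decoded string; it does not address binary delimiters in a byte array.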
 
