String parsing in VB.net

Stargate4004

Hi,

I am converting a VB6 application to VB.net and have run into a
problem. The application connects to a TCP port on a Linux box and
receives a data buffer. The data buffer contains various fields, some
of which are two and four byte integers.

In VB6 I used the Mid function to pull data out of the buffer so I
could decrypt it (do all that tedious network-order byte-swapping, etc).
Quite simply, if I knew that the four bytes from position 42 in the
buffer contain a long integer I could:

myLong = DecryptLong(Mid(myDataBuffer, 42, 4))

However, in VB.net it seems that Mid will not always give me 4 bytes -
it terminates when it hits a chr(0). Thus I pass less than four bytes
into my DecryptLong function, and it gives the wrong result.

Is there any way to make Mid return everything I asked for, or any
function to pad the return string with chr(0)'s before I pass it to
DecryptLong? (I tried writing my own pad function but it suffered from
the same problem - VB.net just doesn't like nulls!)

Or is there a more effective way of extracting a substring than Mid,
one which perhaps gives you the substring you ask for?

Thanks.
 
Larry Lard

My recommendation would be that myDataBuffer should be a Byte array,
and that DecryptLong should have this signature:

Function DecryptLong(buffer() as byte, start as integer) as integer

ie, you just pass the buffer and the index of the first byte of the
number (note that VB6 'Long' = 32-bit integer = VB.NET 'Integer')
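A minimal sketch of what a function with that signature might look like. The body is an assumption, not the original poster's code: it assembles the value least significant byte first, which matches the byte values and result reported later in this thread (33, 131, 200, 40 giving 684229409). A sender using network order would need the bytes taken in the opposite order.

```vbnet
Imports System

Module DecryptLongSketch
    ' Hypothetical implementation of the suggested signature.
    ' Assumption: the four bytes are stored least significant byte first.
    Function DecryptLong(buffer() As Byte, start As Integer) As Integer
        If buffer Is Nothing Then Throw New ArgumentNullException("buffer")
        If start < 0 OrElse start > buffer.Length - 4 Then
            Throw New ArgumentOutOfRangeException("start")
        End If
        ' Widen each byte to Integer, shift it into place, and combine.
        Return CInt(buffer(start)) Or
               (CInt(buffer(start + 1)) << 8) Or
               (CInt(buffer(start + 2)) << 16) Or
               (CInt(buffer(start + 3)) << 24)
    End Function

    Sub Main()
        Dim data() As Byte = {33, 131, 200, 40}
        Console.WriteLine(DecryptLong(data, 0)) ' prints 684229409
    End Sub
End Module
```

Because the function takes the whole buffer plus an index, no intermediate substring is created and embedded zero bytes are never an issue.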

HOWEVER

It turns out that the Framework will do this work for you :) The
BitConverter class contains a bunch of procedures for doing all sorts
of conversions to and from raw bytes. In particular,


BitConverter.ToInt32 Method

Returns a 32-bit signed integer converted from four bytes at a
specified position in a byte array.

Public Shared Function ToInt32( _
ByVal value() As Byte, _
ByVal startIndex As Integer _
) As Integer

Parameters
value
An array of bytes.
startIndex
The starting position within value.

Return Value
A 32-bit signed integer formed by four bytes beginning at startIndex.


There's a lot of stuff in the Framework; unfortunately the only real
way to know if there is something that does what you want is to know
everything that's there :/
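As a rough illustration of the BitConverter approach, the sketch below reads a 32-bit value out of a byte array. The helper name ReadInt32 and its isNetworkOrder flag are made up for this example. BitConverter reads bytes in the byte order of the machine it runs on, so for data sent in network (big-endian) order the result is passed through IPAddress.NetworkToHostOrder.

```vbnet
Imports System
Imports System.Net

Module BitConverterSketch
    ' Hypothetical helper: reads four bytes starting at index start.
    ' isNetworkOrder = False means the bytes are already in this
    ' machine's byte order; True means they are big-endian.
    Function ReadInt32(buffer() As Byte, start As Integer, isNetworkOrder As Boolean) As Integer
        Dim value As Integer = BitConverter.ToInt32(buffer, start)
        If isNetworkOrder Then
            ' Swaps the bytes on little-endian machines, no-op otherwise.
            value = IPAddress.NetworkToHostOrder(value)
        End If
        Return value
    End Function

    Sub Main()
        ' Two padding bytes, then the four bytes quoted in this thread.
        Dim data() As Byte = {0, 0, 33, 131, 200, 40}
        ' On a little-endian machine (e.g. x86) this prints 684229409.
        Console.WriteLine(ReadInt32(data, 2, False))
    End Sub
End Module
```

Note that startIndex is a zero-based offset into the array, unlike the one-based position that Mid uses.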
 
Guest

To trap the error, change this:
myLong = DecryptLong(Mid(myDataBuffer, 42, 4))
to something like this:
Dim s As String
...
s = Mid(myDataBuffer, 42, 4)
Debug.Assert(Len(s) = 4)
myLong = DecryptLong(s)
When the assert is true, I believe that len(myDataBuffer) will be < 45.
The mid function will always return 4 bytes even with embedded null
characters provided that there are at least 4 characters available, ie you
are not at the end of the string, as the following example shows:
Dim s, t As String
Dim l As Integer
s = "abc" & Chr(0) & "def"
t = Mid$(s, 3, 4)
l = Len(t) ' ==> 4
s = "abc" & Chr(0) & "d"
t = Mid$(s, 3, 4)
l = Len(t) ' ==> 3
Good luck.
 
Stargate4004

OK, it looks like I could be missing a trick here. I did this to get my
buffer into a byte array:

Dim asciiEncoder As New System.Text.ASCIIEncoding
Dim bDataBuffer As Byte() = asciiEncoder.GetBytes(myDataBuffer)

Then I did this to convert bytes 42 - 45 to a long integer:

myLong = BitConverter.ToInt32(bDataBuffer, 41)

(using 41 rather than 42 as it's an offset into the byte array rather
than a position of the character in the myDataBuffer string).

But instead of getting 684229409 (which I know was sent to me), I got
675808033. This is because the four characters in myDataBuffer which
make up the integer field are 33, 131, 200 and 40, but when the string
is converted to SIGNED byte array they become 33, 3, 72 and 40.

Either asciiEncoder.GetBytes() or BitConverter.ToInt32 is treating the
bytes as signed. How can I stop this?
 
Jay B. Harlow [MVP - Outlook]

Stargate4004
Define myDataBuffer itself as Byte(). Do not read the data from TCP as text
(String); rather, read the TCP port directly into a byte array.

ASCII is defined as a 7-bit encoding. When you call ASCIIEncoding.GetBytes, it
translates all characters over 127 into a value between 0 and 127.

Hence, if you read the TCP stream into a byte array to begin with, then extract
the data and convert only the actual text data into strings, your values will
come out as you expect.

Hope this helps
Jay
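One way the advice above might look in code is sketched here. ReadExactly is a hypothetical helper, not a Framework method; it loops because a single Stream.Read call may return fewer bytes than requested. The example feeds it a MemoryStream so it runs on its own; in a real application the stream would presumably come from something like TcpClient.GetStream().

```vbnet
Imports System
Imports System.IO

Module ByteBufferSketch
    ' Hypothetical helper: reads exactly count bytes from a stream,
    ' or throws if the stream ends first.
    Function ReadExactly(source As Stream, count As Integer) As Byte()
        Dim buffer(count - 1) As Byte
        Dim total As Integer = 0
        While total < count
            ' Read may deliver only part of what was asked for.
            Dim n As Integer = source.Read(buffer, total, count - total)
            If n = 0 Then
                Throw New EndOfStreamException("Stream ended before " & count & " bytes arrived.")
            End If
            total += n
        End While
        Return buffer
    End Function

    Sub Main()
        ' Stand-in for the network stream, holding the bytes from this thread.
        Using source As New MemoryStream(New Byte() {33, 131, 200, 40})
            Dim data() As Byte = ReadExactly(source, 4)
            ' The bytes are untouched: 33, 131, 200, 40.
            For Each b As Byte In data
                Console.WriteLine(b)
            Next
        End Using
    End Sub
End Module
```

Since no text encoding is involved at any point, values above 127 and zero bytes arrive unchanged.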

 
Larry Lard

Stargate4004 said:
| OK, it looks like I could be missing a trick here. I did this to get my
| buffer into a byte array:
|
| Dim asciiEncoder As New System.Text.ASCIIEncoding
| Dim bDataBuffer As Byte() = asciiEncoder.GetBytes(myDataBuffer)

The root of the problem is that you are treating 'pure binary' data as
characters. If it's not too much work :) I recommend you try and
refactor your code so that myDataBuffer is an array of Byte, rather
than a string. Earlier you said:

| The application connects to a TCP port on a Linux box and
| receives a data buffer.

Is this your code? I appreciate it might not be, but if it is you
should try and make it so that you receive a Byte() not a String or a
Char().

If that isn't possible:

| But instead of getting 684229409 (which I know was sent to me), I got
| 675808033. This is because the four characters in myDataBuffer which
| make up the integer field are 33, 131, 200 and 40, but when the string
| is converted to SIGNED byte array they become 33, 3, 72 and 40.

Actually, Byte is unsigned.

| Either asciiEncoder.GetBytes() or BitConverter.ToInt32 is treating the
| bytes as signed. How can I stop this?

It's ASCIIEncoding that is doing this. Although we commonly refer to a
familiar mapping between the numbers 0-255 and a certain list of
characters as 'ASCII', the actual fact is that ASCII is a *7* bit
encoding - it is only defined for 0-127. Anything beyond that depends
on a whole host of things. Because this encoding only looks at the
bottom 7 bits, it maps 131 (binary 1000 0011) to 3 (binary 000 0011) -
ie values over 127 get 128 subtracted from them.
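The arithmetic can be checked with a small sketch. MaskTo7Bits is a hypothetical stand-in that clears the top bit of each byte, emulating the behaviour described in this thread; it does not call ASCIIEncoding itself. As far as I know, more recent .NET versions substitute "?" for non-ASCII characters instead of masking, so the exact wrong values may differ there, but the data is damaged either way.

```vbnet
Imports System

Module AsciiMaskSketch
    ' Hypothetical helper: keeps only the bottom 7 bits of each byte,
    ' mimicking the masking described above.
    Function MaskTo7Bits(data() As Byte) As Byte()
        Dim result(data.Length - 1) As Byte
        For i As Integer = 0 To data.Length - 1
            result(i) = CByte(data(i) And &H7F)
        Next
        Return result
    End Function

    Sub Main()
        Dim sent() As Byte = {33, 131, 200, 40}
        Dim mangled() As Byte = MaskTo7Bits(sent)   ' 33, 3, 72, 40

        ' On a little-endian machine these print 684229409 and 675808033,
        ' the expected and observed values from the question.
        Console.WriteLine(BitConverter.ToInt32(sent, 0))
        Console.WriteLine(BitConverter.ToInt32(mangled, 0))
    End Sub
End Module
```

The two results differ by exactly 128 * 256 + 128 * 65536, the contribution of the two cleared top bits.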

The fix?

You might be able to get away with just changing to using
System.Text.Encoding.Default, which uses your system's default *ANSI
code page* encoding (ANSI code pages DO cover the full range 0-255).
But this relies on whoever converts the bytes to a string in the first
place also using that same code page. Which they probably will. But
maybe not. So really this is why you want to get bytes back from the
TCP communication, not a string :)
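To see why the choice of encoding matters, the sketch below checks whether an encoding survives a round trip of all 256 byte values. It deliberately swaps in ISO-8859-1 (Latin-1) rather than Encoding.Default, since Latin-1 maps each byte 0-255 to the character with the same code and so does not depend on the machine's settings; that substitution is my assumption, not something suggested in the thread. RoundTrips is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text

Module EncodingRoundTripSketch
    ' Hypothetical helper: True if every byte value 0-255 survives
    ' bytes -> string -> bytes unchanged under the given encoding.
    Function RoundTrips(enc As Encoding) As Boolean
        Dim original(255) As Byte
        For i As Integer = 0 To 255
            original(i) = CByte(i)
        Next
        Dim text As String = enc.GetString(original)
        Dim back() As Byte = enc.GetBytes(text)
        If back.Length <> original.Length Then Return False
        For i As Integer = 0 To 255
            If back(i) <> original(i) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(RoundTrips(Encoding.GetEncoding("iso-8859-1"))) ' True
        Console.WriteLine(RoundTrips(Encoding.ASCII))                     ' False
    End Sub
End Module
```

Even with an encoding that round-trips, this only helps if the code that built the string used the same one, which is the caveat raised above.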
 
Stargate4004

Many thanks for all your help everybody.

I changed the code to use a Byte array all the way through, and
hey-presto it all works fine now.

Thanks again.

Peter.
 
Guest

When the assert is true, I believe that len(myDataBuffer) will be < 45.

Oops - I mean when the assert fails.
 
