searching byte arrays and RTFreaders

Andy

Hi

Does anyone know of a .NET technique that can quickly find a particular
byte value within a byte array and return its position, much as InStr
does for strings in VB.NET?

I'm writing my own custom RTFreader by having my class inherit the
.NET System.IO.BinaryReader. My goal is to have a stream reader
available in .NET that returns the text portion of an RTF file stream -
less any formatting. Being a binary reader, my custom reader loads
n-bytes into a byte array. It then loops through this array and only
returns qualifying bytes to the calling routine.

Some RTF files contain embedded graphics and objects, and these can be
made up of hundreds of kilobytes of data. When my custom reader
encounters one of these blocks, it spends a lot of time looping through
the object's bytes just to bypass them. This can cause a significant
delay for my reader to return from a call.

To speed things up, I thought I could calculate the offset to the end
of a block. Each object and graphic in an RTF file is described by a
\pict or \object group. These also have tags that describe the
object's dimensions.

I thought I could take the dimensions, multiply them together and
divide that by 255 (because the object data is byte-64 encoded) and
multiply by two (because each encoded byte is represented by two hex
digits - A5, EF, FF etc) to give me the length of the RTF file's
encoded block.

This would have worked, except that the width and height dimensions in
the RTF file are in twips and not pixels.
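
For reference, a twip is 1/1440 of an inch, so converting to pixels needs
a resolution. The sketch below is only an illustration: TwipsToPixels is a
hypothetical helper and the 96 DPI value is an assumption, not something
read from the RTF file.

```vbnet
Imports System

Module TwipsSketch
    ' Hypothetical helper: converts twips to pixels at an assumed resolution.
    ' A twip is 1/1440 of an inch; the dpi argument is an assumption.
    Function TwipsToPixels(ByVal twips As Integer, ByVal dpi As Double) As Integer
        Return CInt(Math.Round(twips * dpi / 1440.0))
    End Function

    Sub Main()
        Console.WriteLine(TwipsToPixels(1440, 96.0)) ' prints 96
        Console.WriteLine(TwipsToPixels(3000, 96.0)) ' prints 200
    End Sub
End Module
```

Even with pixel dimensions, the encoded length also depends on the picture
format and bit depth, so this conversion alone does not give the length of
the block.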

So, I'm back to looping through the byte array to find the delimiter
that marks the end of a block.

Is there a facility in .NET that can perform this search at machine
language speed and return the location of the found byte? I heard that
.NET's Regex class might be able to do this, but doesn't that also only
work with strings?

Andy
 
Andy

I found a method that is a lot faster than manually looping through an
array and checking each element for a value (it searched 500 KB in less
than a second).

The Array class in .NET has a method called IndexOf that is basically
a version of InStr for arrays. It searches an array for a value of the
array's element type. If the value you are looking for can't be found,
IndexOf returns -1. Otherwise, it returns the index of the first element
that holds the value.

For example:

Dim valueToFind As Byte = 5
Dim yourArray() As Byte = {1, 2, 3, 4, 5, 6, 7, 8, 9}
Dim startingLocation As Integer = 1

' Returns 4, the zero-based index of the element holding 5.
Dim location As Integer = Array.IndexOf(yourArray, valueToFind, startingLocation)
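
Applied to the original problem, the same call can locate the closing
delimiter of a block. This is a minimal sketch, not the reader's actual
code: FindGroupEnd and the sample buffer are hypothetical, and it assumes
the block ends at the first "}" after the data, which does not hold for
groups that contain nested groups.

```vbnet
Imports System
Imports System.Text

Module SkipBlockSketch
    ' Hypothetical helper: returns the index of the first closing brace at or
    ' after startIndex, or -1 if there is none from that point on.
    Function FindGroupEnd(ByVal buffer As Byte(), ByVal startIndex As Integer) As Integer
        ' The search value is a Byte, matching the array's element type.
        Dim closeBrace As Byte = CByte(AscW("}"c))
        Return Array.IndexOf(buffer, closeBrace, startIndex)
    End Function

    Sub Main()
        ' Made-up sample: a tiny \pict group followed by plain text.
        Dim buffer As Byte() = Encoding.ASCII.GetBytes("{\pict 0a1b2c3d}text")
        Dim dataStart As Integer = 7 ' index of the first hex digit
        Dim groupEnd As Integer = FindGroupEnd(buffer, dataStart)
        Console.WriteLine(groupEnd) ' prints 15
    End Sub
End Module
```

If the delimiter is not in the current buffer, the helper returns -1 and the
caller would have to load the next n bytes and search again.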
 
Andy

B.T.W.

Array.IndexOf does a search looking for an object in the array that
matches the object that you've told it to find.

Because objects are used in the comparison, no type conversions take
place. Instead, Array.IndexOf calls the Equals method on the objects
to determine whether a match has occurred.

This means that the object to find has to have the same type as the
elements in the array.

For example, if you have a byte array in which some elements contain the
value 32, setting the object to find to a numeric literal such as 32
(which is an Integer) or to a System.Int16 variable that holds the value
32 won't work. Even though the numeric values are the same, the types
(Byte versus Integer or Int16) do NOT match, so the Equals test fails.
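
A short sketch of the point above, using made-up sample data. Declaring the
search value as Byte, or converting it with CByte, gives it the same type as
the array's elements; the last line shows the Equals behaviour directly.

```vbnet
Imports System

Module TypeMatchSketch
    Sub Main()
        Dim data As Byte() = {10, 32, 65}

        ' Search value declared as Byte: same type as the elements.
        Dim asByte As Byte = 32
        Console.WriteLine(Array.IndexOf(data, asByte)) ' prints 1

        ' A Short holding 32 is converted to Byte before searching.
        Dim asShort As Short = 32
        Console.WriteLine(Array.IndexOf(data, CByte(asShort))) ' prints 1

        ' Equals on boxed values of different types reports no match,
        ' even though both hold 32.
        Dim boxedByte As Object = asByte
        Dim boxedShort As Object = asShort
        Console.WriteLine(boxedByte.Equals(boxedShort)) ' prints False
    End Sub
End Module
```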
 
