Hemang Shah said:
Yes you are right, that is what I'm trying to achieve.. A sequence of
*Characters* which I thought comprised a string.
I think what Jon was trying to say is that *bytes* and *characters* are
two different things: in .NET, a char is a 16-bit Unicode (UTF-16) code
unit, i.e. it has a size of 2 bytes. You can convert these to a variety of
binary representations (including plain ASCII), each with a different layout.
Now, in your binary file, do you want to look for occurrences of a string
in *Unicode* representation or in ASCII (or some other) representation?
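To make the difference concrete, here is a small sketch that prints the bytes the same string turns into under two encodings. The class name EncodingDemo and the Hex helper are made up for this illustration; Encoding.ASCII and Encoding.Unicode (which is UTF-16 little-endian in .NET) are the standard framework encodings.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: formats bytes as space-separated hex.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        string text = "7777";

        // ASCII: one byte per character
        byte[] ascii = Encoding.ASCII.GetBytes(text);

        // "Unicode" in .NET terms is UTF-16 little-endian:
        // two bytes per character for these digits
        byte[] utf16 = Encoding.Unicode.GetBytes(text);

        Console.WriteLine("ASCII : " + Hex(ascii));
        Console.WriteLine("UTF-16: " + Hex(utf16));
    }
}
```

So the byte pattern you have to search for depends on how the file was written: a text file saved as ASCII contains the first pattern, one saved as "Unicode" contains the second.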
...
I would really appreciate if you could write me a sample, that would be
going over & beyond!
Here's a little sample I've come up with:
It reads binary blocks of data from a file, then tests every possible
position. After that, it copies the trailing n bytes of the buffer to the
beginning and starts reading after byte n, so it can find matches on
"chunk boundaries". (I think it works)
Note that this is not the fastest search algorithm (google for
"Boyer-Moore" for more info), but I'd guess that in your case the hard
disk is the bottleneck anyway.
using System;
using System.IO;

class BinarySearch
{
    static void Main()
    {
        string stringToLookFor = "7777";
        string filePath = @"C:\SomePath\pi.txt";

        // convert the string to a binary (ASCII) representation
        byte[] bufferToLookFor =
            System.Text.Encoding.ASCII.GetBytes(stringToLookFor);
        int matchCounter = 0; // count matches for nicer output

        // open the file in binary mode
        using (Stream stream = new FileStream(filePath, FileMode.Open,
            FileAccess.Read))
        {
            byte[] readBuffer = new byte[16384]; // our input buffer
            int bytesRead;    // number of bytes read
            int offset = 0;   // bytes carried over at the start of the buffer
            long filePos = 0; // position inside the file before read operation

            while ((bytesRead = stream.Read(readBuffer, offset,
                readBuffer.Length - offset)) > 0)
            {
                int valid = offset + bytesRead; // bytes of real data in the buffer

                // test every position where a complete match still fits
                for (int i = 0; i <= valid - bufferToLookFor.Length; i++)
                {
                    bool match = true;
                    for (int j = 0; j < bufferToLookFor.Length; j++)
                    {
                        if (bufferToLookFor[j] != readBuffer[i + j])
                        {
                            match = false;
                            break;
                        }
                    }
                    if (match)
                    {
                        Console.WriteLine("{0,5}. \"{1}\" found at {2:x}",
                            ++matchCounter, stringToLookFor, filePos + i - offset);
                        //return;
                    }
                }

                // store file position before next read
                filePos = stream.Position;

                // keep the last (pattern length - 1) bytes so that matches on
                // "chunk boundaries" are found, and found only once
                int keep = Math.Min(bufferToLookFor.Length - 1, valid);
                Array.Copy(readBuffer, valid - keep, readBuffer, 0, keep);
                offset = keep;
            }
        }

        if (matchCounter == 0)
            Console.WriteLine("No match found");
    }
}
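In case the scan itself ever does become the bottleneck, here is a minimal sketch of the Horspool simplification of Boyer-Moore. It is not part of the sample above: the class name HorspoolSketch and the IndexOf method are made up for this illustration, and it searches a single in-memory buffer rather than a stream, so the chunk handling would still have to be wrapped around it.

```csharp
using System;
using System.Text;

static class HorspoolSketch
{
    // Returns the index of the first occurrence of pattern in data, or -1.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        int m = pattern.Length;
        if (m == 0) return 0;
        if (m > data.Length) return -1;

        // For each byte value: how far the window may move when that byte
        // is the one aligned with the last position of the pattern.
        int[] shift = new int[256];
        for (int b = 0; b < 256; b++) shift[b] = m;
        for (int k = 0; k < m - 1; k++) shift[pattern[k]] = m - 1 - k;

        int pos = 0;
        while (pos <= data.Length - m)
        {
            // compare the window against the pattern from right to left
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            // skip ahead based on the last byte of the current window
            pos += shift[data[pos + m - 1]];
        }
        return -1;
    }

    public static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("3.14159265358979");
        byte[] pattern = Encoding.ASCII.GetBytes("5358");
        Console.WriteLine(IndexOf(data, pattern));
    }
}
```

The gain over testing every position comes from the shift table: when the last byte of the window does not occur in the pattern at all, the search jumps ahead by the full pattern length instead of by one byte.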
Niki