Replacing text data in a binary file

Adam J. Schaff

I am writing a quick program to edit a binary file that contains file paths
(amongst other things). If I look at the files in notepad, they look like:

<gibberish>file//g:\pathtofile1<gibberish>file//g:\pathtofile2<gibberish>
etc.

I want to remove the "g:\" from the file paths. I wrote a console app that
successfully reads the file and writes a duplicate of it, but fails for some
reason to do the "replacing" of the "g:\". The code follows with a note
showing the line that is not working. When I look at the "s" variable in
break mode, I see that VB does not show the entire file contents, even
though when I write "s" to the second file stream, the entire original file
is duplicated. I suppose this is because the file content isn't intended to
be interpreted as a string (it's binary after all). It is probably hitting
some unfriendly bytes that it can't interpret for string operations like
Replace. If that's the case, maybe I need to interact with it a character at
a time, although I'm not sure how I would do a replace that way. Any ideas
or code would be greatly appreciated. As you can tell, I don't do much
binary file I/O.

Sub Main()
    'read the binary file into a string var
    Dim fs As New FileStream("C:\Source\Sample and Demo\Foobar\EditFpl\Only Fools.fpl", FileMode.Open, FileAccess.Read)
    Dim sr As New StreamReader(fs, System.Text.Encoding.UTF8)
    Dim s As String = sr.ReadToEnd
    sr.Close()
    fs.Close()

    'remove the hard-coded drive letter specification
    s.Replace("file://G:\", "file://") 'THIS LINE DOES NOT WORK

    'write a new binary file with my changes
    Dim fs2 As New FileStream("C:\Source\Sample and Demo\Foobar\EditFpl\Only Fools2.fpl", FileMode.CreateNew, FileAccess.Write)
    Dim sw As New StreamWriter(fs2, System.Text.Encoding.UTF8)
    sw.Write(s)

    sw.Close()
    fs2.Close()
End Sub
 
Cor Ligthert

Hi Adam,

A simple one, because you are not the first one to do it that way.

s = s.Replace("file://G:\", "file://") 'assign the result back to s

or

dim b as String = s.Replace.......

Before you ask why

:)

I hope this helps?

Cor
 
Herfried K. Wagner [MVP]

* "Adam J. Schaff" said:
'remove the hard-coded drive letter specification
s.Replace("file://G:\", "file://") 'THIS LINE DOES NOT WORK

'Replace' is a "function", so it returns the result:

\\\
s = s.Replace(...)
///
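
To make the point concrete, here is a minimal sketch (the sample path is made up for illustration). Strings in .NET are immutable, so Replace never changes the string it is called on; it hands back a new one. Note also that String.Replace is case-sensitive, so "G:\" will not match "g:\".

```vbnet
Imports System

Module ReplaceDemo
    Sub Main()
        ' Hypothetical sample path, not taken from the real file.
        Dim s As String = "file://G:\music\track.mp3"

        ' The result is thrown away, so s is unchanged.
        s.Replace("file://G:\", "file://")
        Console.WriteLine(s)   ' file://G:\music\track.mp3

        ' The result is assigned back, so s now holds the new string.
        s = s.Replace("file://G:\", "file://")
        Console.WriteLine(s)   ' file://music\track.mp3
    End Sub
End Module
```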
 
Jay B. Harlow [MVP - Outlook]

Adam,
If you are editing a binary file I would recommend opening the file with a
BinaryReader or even just FileStream directly.
Dim sr As New StreamReader(fs, System.Text.Encoding.UTF8)

If you use a StreamReader, you are actually converting the file into Text
(8-bit bytes to 16-bit chars) then converting the file back into Binary.

My two concerns with using a StreamReader are:
1. depending on what the binary file actually contains (an exe for example)
removing or adding characters may actually corrupt the file, as the file
contains length & offset fields...
2. Losing certain bytes, ones that are not translated nicely from arbitrary
bytes back & forth to Unicode. For example using Encoding.ASCII will trash
your file as ASCII is 7 bit, you will lose all the high order bits. With
UTF8 you may be OK if the non-text bytes happen to be valid UTF-8, but
invalid sequences get replaced on decoding, so it just does not feel right...
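
A quick way to check the second concern is to round-trip a few arbitrary bytes through an encoding and compare. The sketch below uses made-up sample bytes, and RoundTrip and SameBytes are illustrative helper names. With the default Encoding.UTF8 and Encoding.ASCII objects, bytes that cannot be decoded are substituted rather than preserved. ISO-8859-1 maps every byte value to one char and back, so it is one option if you want to stay with string operations, provided the same encoding is used for both reading and writing.

```vbnet
Imports System
Imports System.Text

Module RoundTripDemo
    ' Decode the bytes to a String, then encode the String back to bytes.
    Function RoundTrip(data As Byte(), enc As Encoding) As Byte()
        Return enc.GetBytes(enc.GetString(data))
    End Function

    ' Element-by-element comparison of two byte arrays.
    Function SameBytes(a As Byte(), b As Byte()) As Boolean
        If a.Length <> b.Length Then Return False
        For i As Integer = 0 To a.Length - 1
            If a(i) <> b(i) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        ' Hypothetical sample: "G:\" surrounded by non-text bytes.
        Dim data As Byte() = {&HFF, &H0, &H80, &H47, &H3A, &H5C, &HFE}
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")

        Console.WriteLine(SameBytes(data, RoundTrip(data, Encoding.ASCII)))  ' False
        Console.WriteLine(SameBytes(data, RoundTrip(data, Encoding.UTF8)))   ' False
        Console.WriteLine(SameBytes(data, RoundTrip(data, latin1)))          ' True
    End Sub
End Module
```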

Here is a simple program that will copy a file one byte at a time. The trick
is going to be to modify it so it looks for "g:\" one byte at a time and
skips those bytes, only if all three are found... Especially if you want to
ensure they are prefaced with "file//".

Dim input As New FileStream("Only Fools.fpl", FileMode.Open)
Dim output As New FileStream("Only Fools2.fpl", FileMode.Create)
Dim value As Integer
' ReadByte returns -1 once the end of the file is reached
value = input.ReadByte()
Do Until value = -1
    output.WriteByte(CByte(value))
    value = input.ReadByte()
Loop
input.Close()
output.Close()
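
One way that modification might look, as a rough sketch only. It differs from the loop above in that it loads the whole file into memory rather than streaming it, which keeps the matching logic simple. MatchesAt and ReplaceBytes are illustrative names, the code assumes the paths are stored as single-byte ASCII text, and File.ReadAllBytes / File.WriteAllBytes need .NET 2.0 or later. The comparison is exact, so "G:\" will not match "g:\".

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ByteReplaceDemo
    ' True if pattern occurs in data starting at index pos.
    Function MatchesAt(data As Byte(), pos As Integer, pattern As Byte()) As Boolean
        If pos + pattern.Length > data.Length Then Return False
        For i As Integer = 0 To pattern.Length - 1
            If data(pos + i) <> pattern(i) Then Return False
        Next
        Return True
    End Function

    ' Returns a copy of data with every occurrence of pattern
    ' replaced by replacement. All other bytes are copied untouched.
    Function ReplaceBytes(data As Byte(), pattern As Byte(), replacement As Byte()) As Byte()
        If pattern.Length = 0 Then Return data   ' nothing to search for
        Dim output As New MemoryStream()
        Dim pos As Integer = 0
        While pos < data.Length
            If MatchesAt(data, pos, pattern) Then
                output.Write(replacement, 0, replacement.Length)
                pos += pattern.Length
            Else
                output.WriteByte(data(pos))
                pos += 1
            End If
        End While
        Return output.ToArray()
    End Function

    Sub Main()
        ' Assumption: the paths are plain single-byte ASCII in the file.
        Dim oldBytes As Byte() = Encoding.ASCII.GetBytes("file://G:\")
        Dim newBytes As Byte() = Encoding.ASCII.GetBytes("file://")

        Dim data As Byte() = File.ReadAllBytes("Only Fools.fpl")
        File.WriteAllBytes("Only Fools2.fpl", ReplaceBytes(data, oldBytes, newBytes))
    End Sub
End Module
```

Because the search never decodes the file as text, the bytes outside the matched paths are written back exactly as they were read.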

Hope this helps
Jay
 
Adam J. Schaff

Doh! I was so absorbed with the dangers of dealing with binary data as text
that I forgot to check for obvious blunders. I've used Replace before and
should have known better! Thanks for not adding "what a flaming idiot" to
your replies. ;-)
 
Adam J. Schaff

Jay,

Thanks for the code. I looked at using the binary reader, but couldn't
figure out how to tell when it reaches EOF. Had no idea about the -1 trick.
That said, I'm still not sure what the code would look like to test one byte
at a time for a sequence of 6 characters, particularly if those characters
are stored as 2 bytes each. Hmm, I suppose there is a ReadChar method, but
if I use that instead, I couldn't use the -1 trick. Also, how would I write
it back to the output file if I use ReadChar instead of ReadByte? Is there a
WriteChar, and will it cause problems if some of the data I'm reading with
ReadChar isn't really a char? Ack! I'm just too new to all this. I don't
even understand the use of CByte in your code (isn't it a byte already?).
Well, at least I have food for thought. Thanks again for the advice. I can
see I have a lot to learn.

Even if I do stay with the stream reader, I'll take your worries to heart
and will add code to backup the file before I play with it. That's just good
common sense.
 
Adam J. Schaff

p.s. I just noticed that value is an integer. That explains the CByte in your
code, but now I'm confused why ReadByte returns an integer?? Is it something
to do with testing for EOF? (-1 isn't a typical byte value, it occurs to me
;)
 
Jay B. Harlow [MVP - Outlook]

Adam,
&HFF (255, the same bit pattern a signed 8-bit value would read as -1) is a
valid Byte value. Returning an Integer lets ReadByte use -1 for EOF and
still return &HFF (255) for the byte.
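
A tiny sketch of that behaviour, using an in-memory stream in place of the real file:

```vbnet
Imports System
Imports System.IO

Module ReadByteDemo
    Sub Main()
        ' A stream holding the single byte &HFF.
        Dim ms As New MemoryStream(New Byte() {&HFF})

        Console.WriteLine(ms.ReadByte())   ' 255: the byte itself
        Console.WriteLine(ms.ReadByte())   ' -1: end of stream
    End Sub
End Module
```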

Hope this helps
Jay
 
Jay B. Harlow [MVP - Outlook]

Adam,
For a binary file, it really depends on the encoding, which I would try to
avoid! If you were using UTF8 and it worked, then there is 1 byte per char
(for plain ASCII characters, anyway). If you need UTF16, then you have
(16-bit) Unicode itself & there are 2 bytes per char (in the file); there is
also a UTF32 ;-). In other words, do not confuse how a string is represented
in memory with how text is encoded in a file. Binary files could actually
have multiple encodings of text. Consider a file that stores the code page &
the length of a string, followed by the code-page-encoded value of the
string... Speaking of which, if your binary file stores the length of the
string, removing the "g:\" may cause problems ;-)
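
To put numbers on the difference between the in-memory string and its encoded form, this small sketch asks each encoding how many bytes it would need for the same text (Encoding.UTF32 needs .NET 2.0 or later; the variable names are just for illustration):

```vbnet
Imports System
Imports System.Text

Module EncodingSizeDemo
    Sub Main()
        Dim plain As String = "G:\"                 ' three ASCII characters
        Dim umlaut As String = ChrW(&HE4).ToString() ' a single umlauted a

        Console.WriteLine(Encoding.UTF8.GetByteCount(plain))     ' 3
        Console.WriteLine(Encoding.Unicode.GetByteCount(plain))  ' 6 (UTF16)
        Console.WriteLine(Encoding.UTF32.GetByteCount(plain))    ' 12
        Console.WriteLine(Encoding.UTF8.GetByteCount(umlaut))    ' 2
    End Sub
End Module
```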

If you start using ReadChar & WriteChar you are back to translating to Text.

Assuming your file is not EBCDIC & you are not using any extended characters
(an umlauted a, "ä", for example), a single byte in your file will contain a
single character. With EBCDIC you still have 1 byte per character, however
the bit representations are different; extended characters (ANSI code pages)
are also a single character per byte, but bytes 128 to 255 represent
different characters depending on the code page...

You can then use Asc or AscW to convert a char into an integer/byte.
If value = AscW("G"c) Then
    ' found a G, look for a :
End If
For information on Unicode and other character sets see:

http://www.yoda.arachsys.com/csharp/unicode.html

A couple other articles that may help:
http://www.yoda.arachsys.com/csharp/debuggingunicode.html

Hope this helps
Jay
 
Adam J. Schaff

Thanks for the sample code and links! I realize that removing characters
might be a problem. If that happens, I might try substituting the new drive
letter (e.g. "H:\") for the old and just running this on the file every time
the drive letter changes. Not my first choice, but I bet that would work
because it won't change the file length. Thanks again for all your help.
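
A rough sketch of that same-length substitution, working on a byte array so nothing is ever decoded as text. SwapDriveLetter is an illustrative name, the sample bytes in Main are made up, and the code assumes the paths are stored as single-byte ASCII with a "file://X:\" prefix.

```vbnet
Imports System
Imports System.Text

Module DriveLetterDemo
    ' Overwrites the drive letter in every "file://<oldLetter>:\" prefix,
    ' in place. Returns how many occurrences were changed. The array
    ' length, and so the file length, never changes.
    Function SwapDriveLetter(data As Byte(), oldLetter As Char, newLetter As Char) As Integer
        Dim pattern As Byte() = Encoding.ASCII.GetBytes("file://" & oldLetter & ":\")
        Dim letterOffset As Integer = 7   ' index of the drive letter within the pattern
        Dim count As Integer = 0
        For pos As Integer = 0 To data.Length - pattern.Length
            Dim found As Boolean = True
            For i As Integer = 0 To pattern.Length - 1
                If data(pos + i) <> pattern(i) Then
                    found = False
                    Exit For
                End If
            Next
            If found Then
                data(pos + letterOffset) = CByte(AscW(newLetter))
                count += 1
            End If
        Next
        Return count
    End Function

    Sub Main()
        ' Hypothetical sample data standing in for the real file contents.
        Dim data As Byte() = Encoding.ASCII.GetBytes("xxfile://G:\a.mp3yyfile://G:\b.mp3")
        Console.WriteLine(SwapDriveLetter(data, "G"c, "H"c))   ' 2
        Console.WriteLine(Encoding.ASCII.GetString(data))      ' xxfile://H:\a.mp3yyfile://H:\b.mp3
    End Sub
End Module
```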
 
Herfried K. Wagner [MVP]

* "Cor Ligthert" said:
From the mountains in Austria are coming more and more echoes.

.... and I am living in Austria, so I don't hear them.

:)))
 
Jay B. Harlow [MVP - Outlook]

Adam,
One other concern I had was if the file has any check sums in it, to
"prevent" people from tampering with it...

Changing G:\ to H:\ will change the check sum...

Just a thought
Jay
 
