Binary Differencing

Josh Carlisle

I posted a question a few days ago concerning file differencing and got
some thought-provoking answers. My original question dealt with identifying
file differences for storage of multiple versions of documents, some binary,
some textual. I know that there are many excellent version control systems
out there (CVS, Subversion, etc.), but due to restrictions on the project I'm
working on we have to have a custom implementation.

To save what could be a substantial amount of both bandwidth and file
storage, we have decided to store and transmit only version differences in
our system (the backend is SQL Server). For the most flexibility we want to
go with a binary differencing model. While searching around I found RFC 3284,
"The VCDIFF Generic Differencing and Compression Data Format", which really
seemed to fit my needs at a very high level. A little more work showed that
other version control systems (Vault, for one) have actually implemented
this RFC.

Now I've started to digest the RFC in an effort to start some prototype
implementations, but as is common, the RFC is a fairly high-level document.
Is anyone aware of any material on implementing binary differencing such as
VCDIFF, preferably with some .NET (or any) code snippets? I plan to implement
this in the .NET Framework (C#), hence my post to this group. Any advice or
direction would be greatly welcomed. Thanks!

Josh Carlisle
 
Jon Skeet [C# MVP]

I have a C# *decoder* for VCDiff which is freely available -
http://www.pobox.com/~skeet/csharp/miscutil

Unfortunately I don't have an encoder in C#. It's one of those things
I'd like to do some time, but don't have the time at the moment.

I found RFC 3284 to be one of the best-written RFCs I've seen - the
implementation of a decoder only took about 4 hours.
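
To give a feel for why decoding is the easy half, here is a minimal sketch of the core loop a VCDIFF decoder runs once the delta has been parsed. RFC 3284 defines three delta instructions (ADD, RUN and COPY), and a COPY address may point into the source or into target bytes already produced. The class names and the `apply_delta` function are hypothetical, chosen for illustration; this is not the MiscUtil API, and it skips the RFC's byte-level format (header, windows, address caches, code table) entirely. It is written in Python for brevity, but the loop ports directly to C#.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass
class Add:
    data: bytes  # literal bytes appended to the target


@dataclass
class Run:
    byte: int    # single byte value to repeat
    size: int    # number of repetitions


@dataclass
class Copy:
    addr: int    # offset into the source followed by the target-so-far
    size: int    # number of bytes to copy


# Hypothetical in-memory form of already-parsed delta instructions.
Instruction = Union[Add, Run, Copy]


def apply_delta(source: bytes, instructions: Iterable[Instruction]) -> bytes:
    """Rebuild a target from a source and already-parsed instructions."""
    target = bytearray()
    for ins in instructions:
        if isinstance(ins, Add):
            target += ins.data
        elif isinstance(ins, Run):
            target += bytes([ins.byte]) * ins.size
        elif isinstance(ins, Copy):
            # Addresses below len(source) refer to the source; higher ones
            # refer to target bytes already produced. Copying one byte at a
            # time lets a COPY overlap the bytes it is currently writing.
            for i in range(ins.addr, ins.addr + ins.size):
                if i < len(source):
                    target.append(source[i])
                else:
                    target.append(target[i - len(source)])
        else:
            raise TypeError(f"unknown instruction: {ins!r}")
    return bytes(target)
```

As a small example in the style of the RFC's own: with the source `b"abcdefghijklmnop"`, the instructions `Copy(0, 4)`, `Add(b"wxyz")`, `Copy(4, 4)`, `Copy(24, 12)`, `Run(ord("z"), 4)` rebuild `b"abcdwxyzefghefghefghefghzzzz"`. The third COPY starts inside the target and runs past its current end, which is how a short repeating pattern gets expanded. The hard part of an encoder is choosing good COPY ranges, which this sketch does not attempt.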
 
Josh Carlisle

Thanks Jon, I'll take a look at your decoder. I'm sure it will prove to be
helpful, at the very least for getting me on the right track. I don't have
much experience taking an RFC to code, but it does seem to be well written.
Thanks again.

Josh
 
