Smithers
I would appreciate some recommendations for programmatically determining if
files differ.
I'm writing a utility that backs up files that customers upload to Web
sites. Rather than mindlessly copying any/all files from each Web site to
the backup server (and wasting space), I'm looking to copy only files that
have been modified since the last backup took place. The files include
anything from PDF to GIF/JPG to XML, text, etc. Max size is currently under
5MB, but that could be increased later depending on customer demand.
I understand that I can look to the LastModified date or other file
properties, but I would prefer something more reliable. By "more reliable" I
mean this: I have noticed that the time can differ by a couple of seconds
after copying a file from one server to another. If the logic compared those
date/times, we would get "false positives" - files that appear newer
(different) based on date/time, but whose content is in fact unchanged. That
scenario would arise, for example, if the logic compared the last backup (on
the backup server) against the current file on a Web server.
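To make the false-positive scenario concrete, here is roughly what the date/time comparison would look like - a sketch only; the function names and the 2-second tolerance are my own illustration, not settled code:

```python
import os

# Sketch only - function names and the tolerance value are hypothetical.
def looks_modified_naive(src_path: str, backup_path: str) -> bool:
    """Flag the file as modified on ANY mtime difference - even clock skew."""
    return os.path.getmtime(src_path) != os.path.getmtime(backup_path)

def looks_modified(src_path: str, backup_path: str,
                   tolerance_secs: float = 2.0) -> bool:
    """Ignore small skews introduced by copying between servers."""
    return abs(os.path.getmtime(src_path)
               - os.path.getmtime(backup_path)) > tolerance_secs
```

Of course, a tolerance window just trades false positives for false negatives - a real edit made within that window would be missed - which is part of why I'd rather not rely on timestamps at all.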
So I'm thinking that there may be a more reliable way to determine if the
file content is actually different. While it would be a no-brainer to open
each file and compare the contents, that could be a rather costly
operation - given the large number of files to compare and their potentially
large sizes.
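For what it's worth, the "no-brainer" content check I had in mind is something like the following - though rather than reading both copies byte-for-byte, a streaming digest (SHA-256 here, purely as an example) reads each file only once, and the digest from the previous backup could presumably be stored so the old copy never needs re-reading. Names are illustrative, not from my actual utility:

```python
import hashlib
import os

def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Stream the file through SHA-256 so memory use stays flat
    even for large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def content_changed(src_path: str, backup_path: str) -> bool:
    """Cheap size comparison first; hash only when sizes match."""
    if os.path.getsize(src_path) != os.path.getsize(backup_path):
        return True
    return file_digest(src_path) != file_digest(backup_path)
```

Even so, hashing still reads every byte of every file on each run, which is the cost I'm hoping to avoid - hence this question.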
So I'm looking for a reliable way to determine which files have, in fact,
changed - and to make that determination quickly.
Suggestions? Ideas?
Thanks!
-S