Solution to synchronize 300,000 files within hours?

yuesir

75% Done
I need help to synchronize 3 machines (XP Pro, 2000 Pro and 98SE)
effectively, each running a different FTP server (Titan, Raiden and
Serv-U). Each machine holds about 15 gigabytes in total: 300,000 TIFF
files of 15k to 900k each, in 6000 folders spanning 15 levels. The
servers are connected by broadband; the upload transfer rate is 640K
on 2 servers and 4000K on 1 server.

100 to 200 new files are added on each server daily, and not many
files are modified (maybe 10 to 30 in total).


We are using the WS_FTP synchronize utility to perform a Server A to B
sync (3 hours), then an A to C sync (3 hours), and then A to B again
(3 hours) to make the files identical. That is a full day's job, and
most of the time is spent scanning and comparing files.
We can't schedule the job because the synchronization halts from time
to time; we then need to reboot the FTP server or the client and start
the task again. I suspect this is caused by the large number of small
files: when we tried synchronizing 4 DVD movies between the 3 servers,
the whole process finished in half the time and never halted.
Recently the synchronization has been getting slower, and the
non-technical staff (those who created the 6000 folders) are
complaining.


I have tried Availl replicate from Availl.com, but it doesn't seem to
operate well with such a large number of files.
I tried Syncback from 2brightsparks; replicating locally is great, but
across FTP it doesn't get the job done.
I tried remote-controlling each server and backing up the newly added
files, but there are so many files with identical names in different
folders that I am unable to put them back in the correct folders.
I have Googled and found suggestions like wget, TftpSync, FTPsync and
robocopy. Do these work? We don't have a domain controller, a
server-class operating system, BSD or Linux. Redesigning the directory
structure, or converting the small TIFFs to big PDFs or multi-tiff
files, is not possible.


In summary, I am looking for Windows file synchronizing or mirroring
software that doesn't need to scan through all the files before
synchronizing. The objective is for a file added in office C to be
available to office A within minutes, and for the three offices to
have the same set of data at any time. Is there such a solution?


PS. Wondering why I provide so much detail? Because I am waiting for
the last synchronize to finish. 88% Done
 
badgolferman

(e-mail address removed), 3/1/2006, 9:50:52 AM,
In summary, I want to look for a file synchronizing or mirroring
Windows software that doesn't need to scan through all the files
before synchronize. The objective is to have a file added in office C
be available to office A in minutes, and three offices need to have
the same set of data at any time. Is there such a solution?

I have had a similar problem, but not with as many files. The main
problem, I think, is FTPing the individual files. FTP works much better
if you can send one large file to the server. To do that, your
synchronizing software would have to zip everything up into one file
and send it to the other machine. I don't think this is what you want,
though.

Recently I discovered NetDrive, which can make an FTP site appear as a
virtual drive on your machine. This opens up possibilities for other
synchronizing software; however, I'm not sure it is the cure-all
you're looking for. It may be worth investigating. I use NetDrive and
Karen's Replicator for backup, set up to do its thing on weekends
when no one is around.
http://www.acs.uwosh.edu/novell/netdrive.htm
 
Al Klein

Recently I discovered NetDrive, which can make an FTP site appear as a
virtual drive on your machine. This opens up possibilities for other
synchronizing software; however, I'm not sure it is the cure-all
you're looking for. It may be worth investigating. I use NetDrive and
Karen's Replicator for backup, set up to do its thing on weekends
when no one is around.
http://www.acs.uwosh.edu/novell/netdrive.htm

I was thinking of Karen's Replicator (I use it every day), but that's
not exactly what the OP wants.

I think a custom program to grab an upload as soon as it's finished
and send it to the other two servers would do the job exactly as
ordered.
 
semi technical guy

Thanks for the reply. I have tried webdrive before, but it was not that
effective. I will follow your suggestion and take a look at NetDrive.
We have also tried hardware VPN routers to connect the networks and
share their hard disks in one network neighbourhood. Accessing
individual files is OK, but when we try to trick the backup and
synchronization programs into treating the files as local, the
software becomes unstable. Most of them hang in the middle of scanning
and comparing. Some just report that all 300,000 files are different
and need to be synced.
 
semi technical guy

I have taken a brief look at Karen's site. Is the Replicator designed
for remote replication? Or is it capable of syncing without scanning
and comparing?

I think a custom program to grab an upload as soon as it's finished
and send it to the other two servers would do the job exactly as
ordered.

That is what I am thinking of. The problem is that new files are not
put in one location; they can appear anywhere in the forest of 6000
folders. Is there a program that can monitor all these folders for
changes and upload any changed or added files to the other 2 machines
in real time? I tried the hot folder function of WS_FTP, but it is not
possible to create 6000 hot folders for the job.
 
semi technical guy

Thank you j...@webace, I have taken a look at that site. The
explanation of the use of Windows Briefcase and offline files is very
informative. I will try the offline files method, because creating a
lot of briefcase folders may not be possible. But I think the offline
files solution may not be suitable for my situation: we don't have
co-workers who need to take the files offsite and modify them. I will
give it a try anyway. Thank you.
 
Doc

(e-mail address removed) wrote in @u72g2000cwu.googlegroups.com:
In summary, I want to look for a file synchronizing or mirroring
Windows software that doesn't need to scan through all the files
before synchronize. The objective is to have a file added in office C
be available to office A in minutes, and three offices need to have the
same set of data at any time. Is there such a solution?

I don't know the answer, but surely clearing the archive attribute on
the files already mirrored and then only mirroring files with the +a
attribute set would help speed it up.
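
A rough Python sketch of the archive-attribute idea, assuming the files
live on a Windows file system where os.stat exposes
st_file_attributes. The function name files_needing_mirror is
hypothetical; on other platforms the attribute is missing and the
sketch simply finds nothing.

```python
import os
import stat


def files_needing_mirror(root):
    """Yield paths under root whose archive attribute is set.

    Assumption: Windows, where os.stat() results carry
    st_file_attributes. Elsewhere the attribute is absent and
    nothing is yielded.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            attrs = getattr(os.stat(full), "st_file_attributes", 0)
            if attrs & stat.FILE_ATTRIBUTE_ARCHIVE:
                yield full
```

After a file has been mirrored, its archive attribute would have to be
cleared (for example with the Windows attrib -a command) so that it is
skipped next time. This still walks the local folder tree, but it
avoids asking the remote server about every file.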
 
semi technical guy

This is similar to a backup operation. The question is how to find a
program that is able to work this way.
 
Terry

This is similar to a backup operation. The question is how to find a
program that is able to work this way.

I'm still a little confused about exactly what you want.

You say these programs "scan and compare". All synchronization
programs that I know of will scan the local and remote directories, to
get a list of files, and their attributes. But most synchronization
programs do not compare the file contents; they compare either the
time/date stamp, or the attributes (such as "A" for archive), or both.
Comparing file contents would be way too slow even on local drives.
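
A minimal Python sketch of this kind of metadata-only comparison,
assuming each side's listing has already been collected into a
dictionary mapping a relative path to a (size, modification time)
pair. The name plan_sync and the two-second tolerance are illustrative
assumptions, not taken from any of the programs mentioned in this
thread.

```python
def plan_sync(local, remote, tolerance=2.0):
    """Compare two listings of {relative_path: (size, mtime)}.

    Returns (to_upload, to_download) as lists of paths in sorted order.
    Only metadata is compared; file contents are never read.
    """
    to_upload, to_download = [], []
    for path in sorted(set(local) | set(remote)):
        if path not in remote:
            to_upload.append(path)
        elif path not in local:
            to_download.append(path)
        else:
            l_size, l_mtime = local[path]
            r_size, r_mtime = remote[path]
            if l_size == r_size and abs(l_mtime - r_mtime) <= tolerance:
                continue  # same size, time stamps close enough: skip
            # Otherwise the newer time stamp wins (local wins a tie).
            if l_mtime >= r_mtime:
                to_upload.append(path)
            else:
                to_download.append(path)
    return to_upload, to_download
```

The work here is proportional to the number of files, not to their
size, which is why the cost of fetching the two listings matters more
than the comparison itself.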

I suggest you figure out what is taking all the time in your case. Is
it the "compare" phase, where it scans the directories and compares
time/date stamps, or is it the "copy" phase, where it copies files
between systems? Most synchronization programs can do just the compare
step without doing the copy, so you can work out which is the
time-consuming step.

If the time is mostly the copy, then look into what is making the copy
so slow. Is it the problem of FTP with many small files as a previous
poster suggested? Or is it just the number of bytes at the upload rate
of 640K that is killing you? Maybe your problem is connection
bandwidth, not software.

If the time is mostly the scan and compare, as I think you implied in
your original post, then look into what the synchronization program is
doing for compare. If it is comparing files byte-by-byte, you need a
different program. If it is comparing time/date stamps, then do some
testing of how you are getting the directory listings (filenames and
attributes and time/date stamp). Is the FTP protocol slowing you down
here?

<OT>
I am not familiar with a shareware synchronization program that will
sync to a FTP server, but I haven't made a detailed search. I use
Backer at http://www.cordes-dev.com/english/product.html, which is an
excellent program and will synchronize with an FTP server, but it is
shareware at US$ 39 or EUR 39.
</OT>

You made a comment about some sync programs reporting all files
different and need to be copied. This should not happen, even under
NetDrive, and I suspect that it is happening *the first time* because
the files have different time stamps on the different servers. Once
the sync program gets the time-stamps corrected, it should be much more
efficient.

Also, it seems to me that you need to worry about what happens if two
people change the same file on different servers, and then the
synchronization program runs. Simple sync programs will just replicate
the file with the latest time-stamp, so one person's changes will be
lost. Better sync programs (such as Backer) will detect this condition
and report it so you can manually re-edit if necessary.
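
One possible way to detect that condition, sketched in Python. This is
not a description of how Backer works; it assumes the sync program
keeps a record (here called base) of each file's (size, mtime) as of
the last successful sync, and the function name find_conflicts is
hypothetical.

```python
def find_conflicts(base, local, remote):
    """Return paths changed on BOTH sides since the last sync.

    Each argument maps relative_path -> (size, mtime). A file is a
    conflict when local and remote both differ from the recorded
    base state and also differ from each other.
    """
    conflicts = []
    for path in sorted(set(local) & set(remote)):
        old = base.get(path)
        changed_locally = local[path] != old
        changed_remotely = remote[path] != old
        if changed_locally and changed_remotely and local[path] != remote[path]:
            conflicts.append(path)
    return conflicts
```

Files that changed on only one side are safe to copy across; only the
paths returned here would need a person to look at them.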

Terry
 
semi technical guy

Terry, thank you for your detailed analysis. The slowest part of the
whole process is the scanning part. We need to exchange about 20 meg
of data daily between servers, but those syncing programs need to scan
15 gig of data before syncing this 20 meg.

I am looking for a program that monitors the local copy of the
directories for changed and new files, and that scans, compares and
syncs with the other servers as soon as a file is changed or added,
leaving all the unmodified files alone. But such a program is so
difficult to find.
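
To illustrate the kind of program described here, a minimal Python
sketch that finds new and changed files by comparing the local tree
against a snapshot saved on the previous run, without contacting the
remote servers at all. The function names and the JSON state file are
assumptions made for this sketch; the state file should live outside
the monitored tree so that it is not picked up as a change itself.

```python
import json
import os


def take_snapshot(root):
    """Walk root and return {relative_path: [size, mtime]} for every file."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            snapshot[rel] = [st.st_size, st.st_mtime]
    return snapshot


def changed_since(previous, current):
    """Return sorted paths that are new or whose size/mtime changed."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)


def scan_for_changes(root, state_file):
    """Compare root with the snapshot in state_file, then update it."""
    previous = {}
    if os.path.exists(state_file):
        with open(state_file, "r", encoding="utf-8") as fh:
            previous = json.load(fh)
    current = take_snapshot(root)
    with open(state_file, "w", encoding="utf-8") as fh:
        json.dump(current, fh)
    return changed_since(previous, current)
```

Run on a schedule, scan_for_changes returns only the handful of files
added or modified since the last run, and only those would then need
to be sent to the other servers. It still walks the local folders, but
that is a local disk operation rather than a conversation with a
remote FTP server.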
 
Terry

Terry, thank you for your detailed analysis. The slowest part of the
whole process is the scanning part. We need to exchange about 20 meg
of data daily between servers, but those syncing programs need to scan
15 gig of data before syncing this 20 meg.

I am looking for a program that monitors the local copy of the
directories for changed and new files, and that scans, compares and
syncs with the other servers as soon as a file is changed or added,
leaving all the unmodified files alone. But such a program is so
difficult to find.

Again, what do you mean by "scan", and what do you mean by "compare"?

If you just mean get the filenames, time/date stamp, and attributes,
it doesn't matter how large the files are, and it shouldn't take all
that long (if you have a high speed connection). If you are comparing
byte-by-byte, you need a better synchronization program.

In your original post you talked about using WS_FTP, but WS_FTP isn't
freeware anymore, is it? And the older freeware versions don't support
synchronization that I know of.

Please tell us what you have tried (exactly), and what you mean by the
terms scan, and compare (exactly).

Terry
 
badgolferman

semi said:
I am looking for a program that monitor local copy of the directories
for changed files and new files. Scan, compare and sync with other
servers as soon as a file is changed or added. leave all those
unmodified file alone. But such a program is so difficult to find.

It seems you want some sort of program that is resident across several
computers and across several thousand miles. If there is such a
program around it will certainly have to be on a VPN and would surely
suck massive amounts of resources.

The only thing that I can think of that may be suitable for you is
Microsoft's Active Directory and then I don't know very much about it
anyway. Another option would be to force the employees to save the
files on a remote server. Good luck.
 
semi technical guy

Again, what do you mean by "scan", and what do you mean by "compare"?

What I mean by scan and compare is as follows.

Every time I do the sync, WS_FTP shows something like
connect to remote ftp server
comparing remote directory
directory present
comparing files
remote fileA = local fileA
remote fileB = local fileB
remote file missing << local fileC
remote file D >> local file missing
..........the compare continues for all 300,000 files and takes more
than 2 1/2 hours. After that the actual transfer of files starts and
takes less than 10 minutes to finish.

If you just mean get the filenames, time/date stamp, and attributes,
it doesn't matter how large the files are, and it shouldn't take all
that long (if you have a high speed connection). If you are comparing
byte-by-byte, you need a better synchronization program.

I don't know how the program compares two files; there is no setting
of any kind that I can change. I made a rough estimate of the average
time taken to compare 2 files, presuming the program compares every
file (2.5 hours * 60 * 60 / 300000 = .03). It takes the machine 0.03
seconds to compare a pair of files, so it is unlikely to be comparing
byte-by-byte. And yes, I do need a better synchronization program.

In your original post you talked about using WS_FTP, but WS_FTP isn't
freeware anymore, is it? And the older freeware versions don't support
synchronization that I know of.

You are right, but I am looking for a new solution. Freeware is a
possible solution, I hope.

Please tell us what you have tried (exactly)

I have tried software solutions like webdrive, WS_FTP, Availl
replicator and Syncback, as well as manually comparing, uploading and
downloading changed files using an FTP client. They proved to take too
much time or to be unstable when handling such a large number of
files.
I have tried a hardware solution, a VPN router and syncing the files
(locally); it is slower than the FTP method.

what you mean by the terms scan, and compare (exactly).

Let's define it as the process that takes place to find out which
files are different, before the actual transferring of files begins.

Thank you Terry, I hope this explains the matter better.
 
Vic Dura

What I mean by scan and compare is as follows.

Every time I do the sync, WS_FTP shows something like
connect to remote ftp server
comparing remote directory
directory present
comparing files
remote fileA = local fileA
remote fileB = local fileB
remote file missing << local fileC
remote file D >> local file missing

Are you using WS_FTP Pro or WS_FTP LE? What's the version number?

I'm using WS_FTP LE v5.08 and I don't see any commands that would do
the above "compare" action.

.........the compare continues for all 300,000 files and takes more
than 2 1/2 hours. After that the actual transfer of files starts and
takes less than 10 minutes to finish.


I don't know how the program compares two files; there is no setting
of any kind that I can change. I made a rough estimate of the average
time taken to compare 2 files, presuming the program compares every
file (2.5 hours * 60 * 60 / 300000 = .03). It takes the machine 0.03
seconds to compare a pair of files, so it is unlikely to be comparing
byte-by-byte. And yes, I do need a better synchronization program.

That looks to me like the time it would take for the local machine to
send a request for file size/date/time to the remote machine and then
get a reply back.
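
The arithmetic behind that estimate, as a small Python sketch. It
assumes, as suggested above, that each file costs roughly one
request/reply round trip; the function name compare_hours and the
alternative round-trip times are illustrative only.

```python
def compare_hours(file_count, round_trip_seconds):
    """Total compare time in hours if every file costs one round trip."""
    return file_count * round_trip_seconds / 3600


file_count = 300_000
compare_seconds = 2.5 * 60 * 60          # 2.5 hours reported for the compare

per_file = compare_seconds / file_count  # seconds spent per file
print(f"{per_file * 1000:.0f} ms per file")

# How the total would scale with the round-trip time per file.
for rtt in (0.01, 0.03, 0.10):
    print(f"{rtt * 1000:.0f} ms per file: {compare_hours(file_count, rtt):.1f} hours")
```

Under this assumption the compare time depends on the number of files
and the latency of the link, not on the size of the files or the
bandwidth.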
 
Terry

I don't know how the program compares two files; there is no setting
of any kind that I can change. I made a rough estimate of the average
time taken to compare 2 files, presuming the program compares every
file (2.5 hours * 60 * 60 / 300000 = .03). It takes the machine 0.03
seconds to compare a pair of files, so it is unlikely to be comparing
byte-by-byte. And yes, I do need a better synchronization program.

I agree, over an FTP link it can't be doing a byte-by-byte compare; it
must be doing file time/date comparisons.

I have tried software solutions like webdrive, WS_FTP, Availl
replicator and Syncback, as well as manually comparing, uploading and
downloading changed files using an FTP client. They proved to take too
much time or to be unstable when handling such a large number of
files.
I have tried a hardware solution, a VPN router and syncing the files
(locally); it is slower than the FTP method.

OK, you've tried several programs that are capable of syncing over
FTP. You could try more, but I don't think they are likely to be any
better.

I see a few possible solutions:

Option A: as badgolferman suggests, "Another option would be to force
the employees to save the files on a remote server." Put all the
files on *one* system, and have all the employees save to that system.
You would need to set up a VPN again, so that the other two (remote)
sites could access the files via the broadband network. This way there
is never any question of being out of sync. Two of the sites would be
accessing these files remotely, so they could see some delays, but I
doubt this would be noticeable unless you have large files or lots of
employees. However, if your network goes down, the remote employees
have no file access.

Option B would be to get someone to write a custom program for you, as
suggested by Al Klein. The program would monitor the 6000 local
directories for changes, and when a change is detected, send that file
to the remote sites (via FTP). You would need to run the program at
each site.

Option B is not a huge job, but any custom programming job is
expensive. And custom programs are also expensive every time you need
to change something. Also this option is "fragile", in the sense that
if for some reason a particular file does not get replicated to a
remote site, you won't know about it, and now you're permanently out
of sync.
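
The upload half of Option B might look roughly like the following
Python sketch, which uses the standard ftplib module. It assumes that
the mirrored tree starts at the root of the FTP account and that the
changed files are given as relative paths with forward slashes; the
function names ensure_remote_dir and push_files are hypothetical.
Detecting which files changed is a separate step not shown here.

```python
import ftplib
import os
import posixpath


def ensure_remote_dir(ftp, remote_dir):
    """Change into remote_dir (relative to the FTP root), creating
    any missing folders along the way."""
    ftp.cwd("/")
    for part in remote_dir.split("/"):
        if not part:
            continue
        try:
            ftp.cwd(part)
        except ftplib.error_perm:
            # Assumption: a permanent error here means the folder is missing.
            ftp.mkd(part)
            ftp.cwd(part)


def push_files(ftp, local_root, rel_paths):
    """Upload each relative path to the same relative location on the
    server. Returns the list of paths that could not be sent."""
    failed = []
    for rel in rel_paths:
        local_path = os.path.join(local_root, *rel.split("/"))
        try:
            ensure_remote_dir(ftp, posixpath.dirname(rel))
            with open(local_path, "rb") as fh:
                ftp.storbinary("STOR " + posixpath.basename(rel), fh)
        except ftplib.all_errors:
            failed.append(rel)  # keep it for a later retry
    return failed
```

Here ftp would be a logged-in ftplib.FTP connection to one of the
other sites. Because each file keeps its relative path, files with the
same name in different folders cannot be mixed up, and keeping the
returned list of failures for a retry is one way to reduce the risk of
a file silently never reaching a remote site.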

Option C would be to restructure your work flow so that every employee
does not need to access 300,000 files.

I would recommend option C first, but you will probably say that is
impossible, so then I would go with option A. Option A also has
additional benefits, such as allowing centralized email management,
allowing printing at a remote site, etc.

But I think we're getting pretty far off topic for this group. All of
this depends on what you are doing and why, how robust the solution
must be, how much money you can afford to spend on it, etc.

Terry
 
semi technical guy

Thank you Terry, I will seriously consider Options A and C. I can stop
searching for commercial software that will do the job now; I've
wasted too much time on it.
 
