Smithers said:
Hummm, hadn't thought about CPU vs file I/O across the wire. I think the IO
is going through a different NIC than that going out to the public Internet
(it better be!).
One hopes so. However, your network i/o isn't the only issue (and in
fact, it's not the one I was thinking of...I did assume that you
wouldn't use the same network adapter for the backup as for the web
server itself).
Presumably the web server will need to access the disk that is being
backed up (at least it sounded like that from your description), and if
the backup is reading from the disk when the web server is trying to
serve up some data to a client, there's a conflict that changing your
i/o priority could help with.
Not sure if that would make much difference. Also, this is
going onto Windows Server 2003, where it will need to run for at least a
couple of years (guaranteed we're NOT upgrading to latest Windows server OS
until it's been out for a while).
In that case, the thread i/o priority is obviously a non-starter.
So, in your opinion, it sounds like Sleep() for 2 seconds or so would be a
perfectly reasonable thing to do in my scenario. That, coupled with low
thread priority. Anything wrong with that approach?
Nothing, really. Absent a way to control thread i/o priority, I'm not
sure there's any other practical way, and that method is basically fine.
Your biggest issue doing it that way will be to ensure that not only is
the interval long enough (seems like 2 seconds ought to be), but also
that the time you spend executing is short. With a 2 second interval,
if you spend 500 ms processing, then you're still using as much as 25%
of the system resources you would use with no delay at all. If you want
the process to be truly "background", I might aim for 50 ms or less.
Heck, for that matter, you might just break your work into single i/o
calls. When you wake up from the sleep timer, submit a new i/o request,
process the previous one, then sleep again. That way, you might even be
able to keep your processing within a single timeslice, depending on
what exactly you're doing. A thread consuming a single timeslice once
every two seconds ought to be practically unnoticeable, and doing it
that way has the added advantage that practically all of your i/o should
happen while you're sleeping. No blocking on the i/o necessary, since
your thread's asleep anyway.
What constitutes an "i/o request" depends on how you've structured the
program of course. You said you're doing some compression, so depending
on how you've broken that work up, it might be something like reading a
block of data, compressing it, then writing it over the network. In
that case, you might find it most useful to order the operations so that
when your thread wakes up, there's a block of data ready for it to
process, so the last thing it would do before calling sleep would be to
read the next block to process.
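
The wake / process / sleep ordering described above could be sketched roughly like this. It is only an illustration in Python, not the actual backup code: `throttled_backup`, the 64 KB block size, and the use of zlib for the compression step are all assumptions, a one-thread pool stands in for whatever asynchronous read mechanism the real program uses, and `time.sleep` plays the role of Sleep().

```python
import io
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024   # hypothetical block size
SLEEP_SECONDS = 2.0      # the interval discussed above


def throttled_backup(src, dst, block_size=BLOCK_SIZE,
                     sleep_seconds=SLEEP_SECONDS, sleep=time.sleep):
    """Compress src into dst, handling one block per wake-up.

    Returns the number of times the loop slept.
    """
    compressor = zlib.compressobj()
    sleeps = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(src.read, block_size)   # first read in flight
        while True:
            block = pending.result()   # normally finished during the sleep
            if not block:
                break
            # Submit the next read before doing any work, so the i/o
            # overlaps with the sleep instead of blocking this thread.
            pending = pool.submit(src.read, block_size)
            dst.write(compressor.compress(block))
            sleep(sleep_seconds)
            sleeps += 1
        dst.write(compressor.flush())
    return sleeps


# Tiny in-memory demo; the sleep is set to 0 so it finishes immediately.
source = io.BytesIO(b"example data " * 20_000)
target = io.BytesIO()
wakeups = throttled_backup(source, target, sleep_seconds=0)
```

In this sketch the write to `dst` is synchronous; if the destination is a slow network share, that write would be the next thing to move off the waking thread.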
FWIW, this backup process can take its sweet time (within reason) - it's
not like we need for it to complete within x minutes, so if sleeping adds a
total of 15 minutes to the process that's fine (and at 2-seconds per sleep,
there's no way we'd ever approach 15 minutes), depending of course on where
I place the Sleep() calls.
At 2 seconds per sleep, it would add 15 minutes if you called Sleep()
450 times. How many times you actually call Sleep() depends of course
on how you determine the intervals in your processing to call it. But
I'd guess that if you keep the processing time per interval very short,
you could easily wind up calling Sleep() 450 times or even more.
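
To put rough numbers on that, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and the 50 ms slice is just the target mentioned earlier; with slices that short, a mere 22.5 seconds of actual work already means 450 calls to Sleep().

```python
def added_delay_seconds(sleep_calls, sleep_ms=2000):
    """Total time spent asleep, in seconds."""
    return sleep_calls * sleep_ms / 1000


def sleep_calls_needed(total_work_ms, slice_ms):
    """Number of work slices (and so Sleep() calls), rounded up."""
    return -(-total_work_ms // slice_ms)   # integer ceiling division


print(added_delay_seconds(450) / 60)    # minutes asleep over 450 calls
print(sleep_calls_needed(22_500, 50))   # slices for 22.5 s of work
```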
The quick-and-dirty way to figure this out, of course, is to see how
long the process would take without any delay, calculate a non-idle
percentage based on the sleep interval and your target processing time
per interval, and then divide the no-delay time by that percentage.
So, if the process takes 5 minutes normally, you are sleeping for 2
seconds, and you limit each interval to 50 ms of processing time, that's
a 2.5% non-idle time, which gives you a total processing time of 200
minutes. It's not an exact calculation, because it ignores the
difference between having to wait for i/o and not, among other things.
But it should be a good ballpark number.
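
If you want to play with the numbers, the same quick-and-dirty estimate can be written as a few lines of Python. The function name is hypothetical, and it uses the same approximation as above (busy time divided by the sleep interval), so treat the result as a ballpark only.

```python
def estimated_total_minutes(no_delay_minutes, sleep_ms, busy_ms):
    """Ballpark run time once the work is throttled.

    Approximates the non-idle fraction as busy time divided by the
    sleep interval, ignoring i/o waits and the busy time itself.
    """
    non_idle = busy_ms / sleep_ms
    return no_delay_minutes / non_idle


# Figures from the example: 5 minutes of work, 2 s sleeps, 50 ms slices.
print(estimated_total_minutes(5, 2000, 50))
```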
If it's literally true that you don't need the process to complete in
any specific amount of time, that sort of inflation might be fine.
Otherwise, you're going to have to compromise between getting the
process done fast enough and minimizing its effect on web server
performance.
All that said, you don't say what sort of computer this is running on,
but if you've got (for example) an 8-core box with a huge striped RAID
array with 10,000 RPM disks, all of this might just be a moot point.
With enough performance to spare, it's pointless to waste developer time
on trying to minimize the effects of something like this that isn't
going to run but once a day and for a very short period of time.
Pete