Keeping application responsive with BackgroundWorker

Joseph Geretz

A subtle point: if your BackgroundWorker reports its progress back to the
main application window, the time spent in the BackgroundWorker event
(_DoWork) on the separate thread needs to be large relative to the time
spent in the report-back (_ProgressChanged) event, which runs on the same
thread as the main window. If it isn't, then you might as well run your
work on the main window thread and avoid all the thread switching.

A practical example: I'm working on a file processor; it's reading a file
with over a million records. I placed the file processing code into a
BackgroundWorker thread in order to keep the application UI responsive. As
I'm developing this application, there's really nothing in the processing
loop aside from the read of the record, since I decided to get the
infrastructure in place first and then proceed to the details of processing
each record. On my first attempt, reporting status back with every record
locked up the application as tightly as if the processing loop had been
in the main window thread. Reporting back every 100th record wasn't any
better. It wasn't until I scaled back the feedback to every 1000th record
that I saw a real improvement. (Of course, once I insert processing code
into my loop, this will increase the time spent in the worker thread and so
I'll probably end up with feedback on every 100th record by the time I'm
done.)
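
To make the record-count throttling above concrete, here is a minimal sketch in Python rather than the actual .NET code. The names `process_records`, `report_every` and `report` are hypothetical; `report` stands in for whatever ends up raising the _ProgressChanged event.

```python
def process_records(records, report_every, report):
    """Process records, calling report(count) once every report_every records."""
    count = 0
    for record in records:
        # ... per-record processing would go here ...
        count += 1
        if count % report_every == 0:
            report(count)
    # Send one last report if the total is not a multiple of report_every,
    # so the UI ends on the true record count.
    if count % report_every != 0:
        report(count)
    return count


reports = []
total = process_records(range(1_000_000), 1000, reports.append)
print(total, len(reports))  # 1000000 1000
```

The right value for `report_every` depends on how long each record takes, which is why it has to be retuned once real processing is added to the loop.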

It's a subtle point and it took me a few minutes to figure out the
principle. I guess this is pretty simple, but I'm writing this up in case
there's anyone out there who can benefit from this advice.

Hope this helps,

- Joseph Geretz -
 
Willy Denoyette [MVP]

You'd do better to use a timer to update the UI at a fixed interval, say a
few times per second at most. The way you are doing it depends too much on
the processing algorithm, the performance characteristics of the system (OS,
CPU, I/O) and the actual load.

Willy.
 
Peter Duniho

Joseph Geretz said:
[...] On my first attempt, reporting status back with every record
locked up the application as tightly as if the processing loop had been
in the main window thread.

In addition to Willy's suggestion, I'll point out that if you're using
Invoke() instead of BeginInvoke() to update the UI, you are exacerbating
the situation by forcing the UI update to occur synchronously with your
processing. Not only does this synchronize things, it guarantees two
thread context switches per update, automatically forcing a "worst case"
scenario.
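
The difference between a blocking and a non-blocking post can be sketched outside .NET. The following Python toy is only an analogy, not the Windows Forms implementation: `UiLoop`, `invoke` and `begin_invoke` are hypothetical names that mimic the behaviour of Invoke() and BeginInvoke() with a queue and a single "UI" thread.

```python
import queue
import threading


class UiLoop:
    """Toy message loop: one thread runs every callback posted to it."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:          # shutdown sentinel
                break
            callback, done = item
            callback()
            if done is not None:
                done.set()            # wake the caller blocked in invoke()

    def begin_invoke(self, callback):
        """Post the callback and return immediately (like BeginInvoke)."""
        self._queue.put((callback, None))

    def invoke(self, callback):
        """Post the callback and block until it has run (like Invoke)."""
        done = threading.Event()
        self._queue.put((callback, done))
        done.wait()

    def shutdown(self):
        """Let queued callbacks finish, then stop the loop thread."""
        self._queue.put(None)
        self._thread.join()


ui = UiLoop()
ui.invoke(lambda: None)            # worker waits for the UI thread here
for _ in range(1000):
    ui.begin_invoke(lambda: None)  # worker keeps going; updates queue up
ui.shutdown()
```

With `invoke`, the worker cannot start its next record until the UI thread has finished the update. With `begin_invoke` it can, at the cost of a queue that grows if updates are posted faster than they are consumed.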

It's still not a great idea to flood the main thread's message queue with
UI updates, but if you were using BeginInvoke() my guess is that the UI
updates would scale much better without such a performance issue,
especially if your processing is file i/o. This is true even on a
single-processor computer (file i/o is much slower than video i/o, so on
average one file i/o access should take much longer than one video i/o
access), and with the more recent PCs that have true multi-core/multi-CPU
configurations, system responsiveness probably wouldn't suffer much, if at
all.

Of course, as you note, once you are actually _doing_ something with the
records you're reading, the balance will shift back even more to the UI
updates not taking a lot of the time.

But really, the greatest theoretical frequency with which a user might
actually perceive updates is something like 100 times per second, so
updates more than once every 10 ms are pointless, and in reality updates
more than once every 100 to 500 ms are almost certainly overkill.

Which brings us back to Willy's suggestion. :)

Pete
 
Joseph Geretz

Thanks Willy and Pete, for your suggestions.

Peter Duniho said:
But really, the greatest theoretical frequency with which a user might
actually perceive updates is something like 100 times per second, so
updates more than once every 10 ms are pointless, and in reality updates
more than once every 100 to 500 ms are almost certainly overkill.

Which brings us back to Willy's suggestion. :)

In adopting this approach, should an actual Timer be created on a separate
thread, or are you recommending that I simply employ a time check in
each pass through my loop in order to limit my UI updates to once every
500 ms or so? If the former, how will the Timer running on a separate thread
have access to the state of processing taking place in the
BackgroundWorker_DoWork event in order to report this back to the UI? I
guess you are recommending the latter approach, a simple time check within
the loop processing in the _DoWork event?

Thanks for your advice.

- Joseph Geretz -

 
Peter Duniho

Joseph Geretz said:
In adopting this approach, should an actual Timer be created on a
separate thread, or are you recommending that I simply employ a time
check in each pass through my loop in order to limit my UI updates to
once every 500 ms or so?

The very first thing I would try is switching to BeginInvoke() rather than
Invoke(), if you have not already done so. Even in your degenerate case
that you're doing now, I suspect that will yield a much more usable UI,
and once you add code to actually do something with the results of the
i/o, the overhead of updating the UI should be much less.

KISS. This is the simplest approach, and should not be abandoned until
you have determined it to be an actual performance problem (however
theoretically inefficient it may be).

Note the one thing you definitely do _not_ want to do in your status
update is to call Refresh() or Update(). This definitely will kill
performance, especially in a high-bandwidth scenario such as this one.
You only want to use the status notification to update some data that will
eventually be displayed to the user. Then invalidate the part of the UI
where that data is shown. Eventually, Windows will get around to causing
an actual screen refresh to occur, and the data will be shown with the
latest value.

Note that you may not need to invalidate the region explicitly. For
example, if you are using a Label or TextBox control and setting the Text
property, you can update the data and the control will implicitly handle
invalidating itself.

Lots of updates may go by undisplayed using this mechanism, but that's
just fine. It allows Windows to take over the task of scheduling the
relatively slower operation of video i/o, so that hopefully you don't wind
up with too much performance overhead.

Now, if after all that you find that performance is still suffering, you
need to move on to being smarter about how often you update status. IMHO,
the next thing to try is indeed just checking elapsed time periodically
within your processing (once per record, once per ten records, whatever)
and only emitting an update when a specific length of time has passed.

Potential downsides to this: you're polling the time, and polling anything
is usually a bad idea; also, it quantizes the update scheduling according
to processing of each record. While both of these are potentially
negative, I'd guess they are in practice not a problem. A quick test
reveals that on my Core 2 Duo 2.33 GHz computer, I can call DateTime.Now
more than 2 _million_ times in a second. Your data processing would have
to be pretty damn fast for that to be an issue. On the other hand,
quantization is only an issue if data processing is really slow. Again,
if that were really the case then just using BeginInvoke() to update the
UI wouldn't have been an issue in the first place.

So, I doubt that polling the time would be cause for an issue. Heck, you
could probably even get away with doing math with each polling rather than
precalculating a future time at which the next update would occur (though,
I'd still precalculate just because it's easy and IMHO more readable
anyway).
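
For what it's worth, the elapsed-time check with a precalculated deadline might look roughly like this. It is a Python sketch, not the .NET code: `process_records`, `report`, `min_interval` and `clock` are hypothetical names, and `time.monotonic` stands in for DateTime.Now.

```python
import time


def process_records(records, report, min_interval=0.5, clock=time.monotonic):
    """Process records, reporting at most once per min_interval seconds."""
    count = 0
    last_reported = None
    # Precalculate the time of the next update so the loop only needs
    # one comparison per record.
    next_update = clock() + min_interval
    for record in records:
        # ... per-record processing would go here ...
        count += 1
        now = clock()
        if now >= next_update:
            report(count)
            last_reported = count
            next_update = now + min_interval
    # Final report so the UI shows the true total, unless it was just sent.
    if count != last_reported:
        report(count)
    return count


# Deterministic demo: a fake clock that advances 1 ms on every call.
ticks = iter(range(10**9))
reports = []
process_records(range(2000), reports.append, min_interval=0.5,
                clock=lambda: next(ticks) / 1000.0)
print(reports)
```

Passing the clock in as a parameter is only there to make the demo repeatable. Note the quantization mentioned above: an update can only go out at a record boundary, so a slow record delays it.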

Now, if after all that you find that performance is still an issue, then
yes...perhaps going to a Timer-based approach would be useful. I would
not bother with a new thread for that purpose. Just put a Forms.Timer
event handler in your status update form, use a volatile variable to
contain the status (e.g. a counter that contains the number of records
processed so far), and read that in your Timer event handler to translate
that into the UI status.
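
A rough sketch of that arrangement, again in Python as an analogy rather than Windows Forms code: the worker only bumps a shared counter, and the "UI" side reads it at a fixed interval. `Progress`, `worker` and `run_with_timer` are hypothetical names. Python has no `volatile`, so a plain attribute with a single writer stands in for the volatile field, and a timed wait loop stands in for the Forms.Timer tick.

```python
import threading


class Progress:
    """Shared status: the worker writes it, the 'UI' side only reads it."""

    def __init__(self):
        self.records_done = 0              # written only by the worker thread
        self.finished = threading.Event()


def worker(progress, total):
    for _ in range(total):
        # ... per-record processing would go here ...
        progress.records_done += 1
    progress.finished.set()


def run_with_timer(total, interval=0.05):
    """Poll the shared counter at a fixed interval, as a UI timer would."""
    progress = Progress()
    shown = []
    thread = threading.Thread(target=worker, args=(progress, total))
    thread.start()
    # Event.wait(timeout) returns False on timeout and True once set,
    # so each pass through this loop plays the role of one timer tick.
    while not progress.finished.wait(interval):
        shown.append(progress.records_done)    # "update the label"
    thread.join()
    shown.append(progress.records_done)        # final state
    return shown


print(run_with_timer(200_000, interval=0.01))
```

The number of UI updates now depends only on `interval` and the total run time, not on how fast records are processed.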

However, this does complicate the code design somewhat, and I suspect you
will find that you don't have performance issues with simpler designs that
justify moving to the more complicated design.

Joseph Geretz said:
If the former, how will the Timer running on a separate thread
have access to the state of processing taking place in the
BackgroundWorker_DoWork event in order to report this back to the UI? I
guess you are recommending the latter approach, a simple timecheck within
the loop processing in the _DoWork event?

In the former scenario, a volatile variable should be sufficient (as
mentioned). You're not trying to access the data that's actually been
processed, as near as I can tell. You just need to know how much of it
has been. You could even do the percentage calculation in the worker
thread, and make the status variable just a number from 0 to 100.

If you want to actually display results from the processing as they occur,
then things get more complicated. But it doesn't sound as though that's
the goal here.

Pete
 
