Errors in services. How should you handle them?

Claire · Apr 1, 2005

I'm writing a realtime monitoring application that runs as a Windows service.
The service communicates with some instrumentation via a third-party DLL. TCP
is the transport mechanism over the network.
What I want to know is how errors should be handled in Windows services. I
foresee problems with this sort of application, especially on the TCP
side.
The service should be able to recover from errors, as it's going to be sited
in places where there are power outages. I expect the network to go down,
etc.
Should users be alerted if a connection can't be established? When?

Stephany Young · Apr 1, 2005

Unfortunately Claire, only you can decide if and when you should notify
someone.

Once you have decided the ifs and whens, the next question becomes the hows
and whats, i.e. how do I notify the notifiee and what do I notify them of?
The answers to these are dictated by the mechanisms that are available to
your service and what information is available. Basically, it will come down
to personal preference.

Claudio Grazioli · Apr 1, 2005

I have written a lot of service applications in the past (in C++). What I
can tell you from experience:
- use logfiles (or log to a database), where you log the
important things happening in your service
- alert someone if a severe error happens that needs
user/administrator interaction in any way
- make it configurable (who is alerted, when, etc.)

In a .NET application, I strongly suggest using log4net
(http://logging.apache.org/log4net) to do that. With log4net you also get
very good options for configuring what kinds of errors are logged to which
device, and when someone (and who) should be alerted.

Instead of log4net you could also use the Logging Application Block of the
Enterprise Library
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag2/html/entlib.asp).
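
The idea of routing log entries by severity (everything to a file, only
severe errors to a person) can be sketched as follows. This is an
illustration in Python's standard logging module, not log4net or the
Enterprise Library, whose configuration looks different. The logger name,
the AlertHandler class and the notify callable are hypothetical names
chosen for the example.

```python
import logging


class AlertHandler(logging.Handler):
    """Forward records to a notify callable (hypothetical: e-mail, pager, etc.)."""

    def __init__(self, notify, level=logging.CRITICAL):
        super().__init__(level)
        self._notify = notify

    def emit(self, record):
        self._notify(self.format(record))


def build_logger(log_path, notify, alert_level=logging.CRITICAL):
    """Log INFO and above to a file; pass alert_level and above to notify."""
    logger = logging.getLogger("monitor_service")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)

    # Only records at alert_level or above reach the alert handler.
    alert_handler = AlertHandler(notify, alert_level)
    alert_handler.setFormatter(fmt)
    logger.addHandler(alert_handler)
    return logger
```

Making alert_level and the notify target come from a configuration file
would cover the "make it configurable" point without touching the code.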

hth

Alex Passos · Apr 1, 2005

I wrote a similar service for an enterprise application that dealt with the
same kinds of problems you are facing here: power outages, lost
connections, etc. Here are a few things to keep in mind while you design
your service:

1) Make your service fail-safe in the sense that if the network is not
available it can just idle and periodically check for network availability.
Your service should not "hang" or "crash" if the network goes down. This
will involve wrapping all of your network calls with error handlers so that
you won't be using any sockets that are disconnected or in an invalid state.

2) Logs are really useful. If possible, log all activity to a local
database. For example: service start time, service shutdown time, and, if
the network goes up and down, the time the network became unavailable and
when it was detected up again. Log messages are extremely useful when you
have to trace a problem that may not lie in your code/design but instead
in the network that is connected to the server.

3) Assign a priority to log entries, such as Notice, Caution, and Critical,
depending on the kinds of problems you foresee encountering during operation
of the server. In the product I worked on, the logs were available through a
web interface and color-coded accordingly, so it was very easy to go straight
to a critical entry and see what happened leading up to it.

4) If your service is going to be multithreaded you add another dimension of
complexity. This is not a bad thing but you must make sure that your
service main thread is able to recycle threads that crash or terminate
unexpectedly. You should be able to gracefully deallocate the memory from a
crashed thread and respawn it to continue what it had been doing previously.
This is not something that is trivial to do.

5) Lastly, don't leave any connections open in an environment like this.
Crashing because of a power outage may corrupt the files or databases you
are writing to when it happens. So when making entries into a database or
file, if it is not a performance hit, go with the open->write->close
sequence and try to make each transaction as atomic as it can be.
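
Points 1, 2 and 5 above can be sketched together. This is a minimal Python
illustration, not the code from the product described: read_instrument is a
hypothetical stand-in for the call into the vendor DLL and is assumed to
raise OSError when the connection is unavailable. The max_cycles and sleep
parameters exist only so the loop can be exercised in a test.

```python
import time
from datetime import datetime, timezone


def append_log(path, level, message):
    """Open, write one line, and close, so no file handle stays open between entries."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp} {level} {message}\n")


def poll_forever(read_instrument, log_path, interval=1.0,
                 max_cycles=None, sleep=time.sleep):
    """Poll the instrument; on a network error, log the transition and keep idling.

    Only state changes (up -> down, down -> up) are logged, so an outage
    produces two entries rather than one per failed attempt.
    """
    network_up = True
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        try:
            read_instrument()  # hypothetical wrapper around the vendor call
            if not network_up:
                append_log(log_path, "NOTICE", "network available again")
                network_up = True
        except OSError as exc:
            if network_up:
                append_log(log_path, "CRITICAL", f"network unavailable: {exc}")
                network_up = False
        sleep(interval)  # idle before checking again
```

Catching only OSError is deliberate: unexpected exception types still
surface instead of being silently swallowed by the loop.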

The project I worked on ran a multithreaded service in a fairly hostile
environment. It received data from one end, processed the data, and sent
data to another end point. It needed to be able to detect whether the data
received was valid (not garbled) and whether the receiving system was
available (not crashed); if not, it had to queue the data until the system
became available (each transaction carried a $$$ amount, so no transactions
could be lost). In addition, locations were often hostile at the very least
and prone to lightning strikes and other hazards. Go figure: systems running
in closets at air force bases.
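
The validate-then-queue behaviour described above might look roughly like
this in Python. It is a sketch under assumptions: the post does not say how
validity was checked, so CRC32 is used purely as an example, and
ForwardQueue and send are hypothetical names. The queue here lives in
memory only; a service that must survive power loss would have to persist
it to disk or a database.

```python
import zlib
from collections import deque


class ForwardQueue:
    """Hold payloads until the downstream system accepts them, preserving order."""

    def __init__(self, send):
        # send is a hypothetical callable that raises OSError when the
        # receiving system is unavailable.
        self._send = send
        self._pending = deque()

    def submit(self, payload: bytes, checksum: int) -> bool:
        """Queue a payload only if its CRC32 matches; garbled data is rejected."""
        if zlib.crc32(payload) != checksum:
            return False
        self._pending.append(payload)
        return True

    def flush(self) -> int:
        """Send queued payloads in order; stop at the first failure, keep the rest."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break
            self._pending.popleft()  # remove only after a successful send
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

A payload is removed from the queue only after send returns, so a failure
mid-flush leaves it at the head for the next attempt.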

This is really long, but I hope it will give you a few things to consider as
you design your system.

Alex

Claire · Apr 4, 2005

Thanks to everyone.
Yes Alex, that sounds just like my system, except I don't have direct access
to the ports as I have to communicate via the instrument vendor's DLL ;o(
I just hope the developer's code is well behaved.

Claire
