How do I find a hanging thread?

  • Thread starter: David Bartosik - MS MVP

David Bartosik - MS MVP

I have a Windows service that creates 5 worker threads. I'm finding that
over time the service keeps running but the threads are evidently dead (no
work is done). Since I don't think they can actually die inside the while
loop, I'm assuming they get 'hung'.
I'm trying to determine how to find a hanging thread so that it can be
killed and replaced.
I'm looking at ThreadState, but wouldn't a hanging thread still be
considered running, so that testing for Stopped wouldn't work?
I'm considering a timing approach: if the work takes too long (longer than a
predetermined value), I kill the thread and replace it. But what would be
the correct way to do this? I'm looking at Timer, but it looks like that is
meant for firing an event at a regular interval rather than timing a process.
Any ideas?

--
David Bartosik - MS MVP
for Publisher help:
www.davidbartosik.com
enter to win Pub 2003:
www.davidbartosik.com/giveaway.aspx
 
Hi David,

My thoughts on this :).

I would probably have 5 bool members, one for each thread. Every time I start
to process something in a thread I would set the appropriate bool member to
false, indicating that the work is not finished. When the thread finishes
the job, set the bool member to true.

Then I would create a new thread that checks the bool values at certain
intervals to see which of them are still false. A value that stays false
for too long tells you that the thread has "hung", so you can abort/stop it
and create a new thread.
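The flag idea above can be sketched roughly as follows. This is a minimal illustration in Python rather than the .NET code the thread is about; the names WorkFlags and run_jobs are made up for the example, and the sketch only reports which workers are unfinished (Python has no equivalent of aborting a thread, so the abort/replace step is left out).

```python
import threading


class WorkFlags:
    """One 'finished' flag per worker, playing the role of the bool members."""

    def __init__(self, count):
        self._lock = threading.Lock()
        self._finished = [True] * count   # idle workers count as finished

    def begin(self, index):
        # Called by a worker just before it starts processing an item.
        with self._lock:
            self._finished[index] = False

    def end(self, index):
        # Called by a worker when the item is done (or has failed).
        with self._lock:
            self._finished[index] = True

    def unfinished(self):
        # Called by the monitoring thread: indexes whose flag is still False.
        with self._lock:
            return [i for i, done in enumerate(self._finished) if not done]


def run_jobs(index, flags, jobs):
    """Hypothetical worker loop: wrap every job in begin/end."""
    for job in jobs:
        flags.begin(index)
        try:
            job()
        finally:
            flags.end(index)
```

A flag on its own only says "busy", so the monitoring thread still needs its own notion of how long a flag may stay false before the worker is treated as hung.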

Hope this helps

Fitim Skenderi
 
Hi David,

Does Fitim's reply make sense to you? Do you need further help?

Please feel free to give feedback, thanks.

Best regards,
Jeffrey Tan
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
I ended up using ticks from DateTime: I set a time stamp before and after
the work process and have a management routine cycle through the times
looking for durations longer than 60 seconds. If it finds one, I abort the
thread and replace it. So far it seems like a decent solution.
It does appear that the hanging happens in my SQL process, which I don't
claim to understand. I have a try/catch block around the data piece and a
timeout set on both the connection and the query. I would think that if
something went wrong it would fall into the catch and continue, but
evidently not. The timestamp routine at least keeps me running.
On the downside, aborting a thread and replacing it appears to be
expensive: I've watched Task Manager spike to 88% or even 100% CPU while
that is being performed.
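For readers who want to see the shape of such a timestamp routine, here is a rough sketch. It is written in Python, not the .NET code described above, and it makes a few assumptions: the class name WorkTimestamps is invented, a monotonic clock is swapped in for DateTime ticks (so clock adjustments cannot fake a long duration), and the routine only reports overdue workers instead of aborting them.

```python
import threading
import time


class WorkTimestamps:
    """Start time of the item each worker is currently processing."""

    def __init__(self, count, clock=time.monotonic):
        self._lock = threading.Lock()
        self._clock = clock                 # injectable so it can be faked
        self._started = [None] * count      # None means the worker is idle

    def begin(self, index):
        # Worker calls this right before the work process starts.
        with self._lock:
            self._started[index] = self._clock()

    def end(self, index):
        # Worker calls this right after the work process returns.
        with self._lock:
            self._started[index] = None

    def overdue(self, limit_seconds=60.0):
        """Indexes of workers whose current item has run longer than the limit."""
        now = self._clock()
        with self._lock:
            return [i for i, started in enumerate(self._started)
                    if started is not None and now - started > limit_seconds]
```

A management loop would call overdue() every few seconds and decide what to do with the indexes it gets back; how a stuck worker is actually stopped and replaced depends on the platform.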

--
David Bartosik - MS MVP
 
David,

David Bartosik - MS MVP said:
I ended up using ticks from DateTime: I set a time stamp before and after
the work process and have a management routine cycle through the times
looking for durations longer than 60 seconds. If it finds one, I abort the
thread and replace it. So far it seems like a decent solution.
It does appear that the hanging happens in my SQL process, which I don't
claim to understand. I have a try/catch block around the data piece and a
timeout set on both the connection and the query. I would think that if
something went wrong it would fall into the catch and continue, but
evidently not. The timestamp routine at least keeps me running.
On the downside, aborting a thread and replacing it appears to be
expensive: I've watched Task Manager spike to 88% or even 100% CPU while
that is being performed.

Could this be because all finally blocks are executed before the thread is
terminated? If your connections are released in a finally block, that could
slow things down.

Perhaps you could log your SQL commands and see what the last command was
before the thread hangs.
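One way to do that kind of logging is to write a line before each command and another once it returns, so a command with a BEGIN line but no matching END or FAILED line is the one that never came back. The sketch below is only an illustration: it uses Python with the standard sqlite3 and logging modules rather than the SQL Server setup from this thread, and run_logged and the "sqltrace" logger name are invented for the example.

```python
import logging
import sqlite3
import time

log = logging.getLogger("sqltrace")


def run_logged(conn, sql, params=()):
    """Log a statement before running it and log the outcome afterwards."""
    log.info("BEGIN %s %r", sql, params)
    started = time.monotonic()
    try:
        rows = conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        # The statement failed: record how long it took, then re-raise.
        log.exception("FAILED after %.3fs: %s", time.monotonic() - started, sql)
        raise
    log.info("END after %.3fs: %s", time.monotonic() - started, sql)
    return rows
```

Including the thread name in the log format makes it easier to match each BEGIN line with its END line when several workers log at once.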

HTH,
 
David Bartosik - MS said:
It does appear that the hanging happens on my sql process

What do you mean by a thread hanging? I can see an unused thread pool
thread being idle or I can see a thread programmed to wait (e.g., for
some piece of information) but there must be an underlying reason a
thread is waiting. Have you looked to see what these threads are doing?
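As an illustration of "looking at what the threads are doing", the sketch below collects the current call stack of every live thread in a process. It is Python, not .NET, and dump_thread_stacks is an invented helper name; it relies on sys._current_frames(), which is a CPython-specific function. In a .NET service the usual equivalent would be attaching a debugger and inspecting the thread stacks there.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current call stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = []
    for ident, frame in sys._current_frames().items():
        report.append(f"Thread {names.get(ident, '?')} ({ident}):")
        # format_stack lists the frames from the outermost call to the
        # innermost, so the last entry is where the thread currently sits.
        report.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(report)
```

If the same worker shows the same innermost frame in several dumps taken a few seconds apart, that frame is a good candidate for where it is waiting.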

Mike Blake-Knox
 
Hi David,

Sorry to have kept you waiting so long. I have been very busy these days.

Like Mike, I would ask: what do you mean by "thread hang"? From your
feedback, I think it means the thread does no work and just idles
indefinitely.

If I have not misunderstood you, I think there must be a deadlock in your
application. That is a very likely cause of hung threads in a
multi-threaded application.

You should check whether a deadlock exists in your application.
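One possible way to make a suspected deadlock visible, sketched here under assumptions, is to acquire locks with a timeout so that a thread which cannot get a lock fails loudly instead of waiting forever. The example is Python, and acquire_all and LockTimeout are invented names; nothing in this thread shows which locks, if any, the service uses. In .NET the comparable call would be Monitor.TryEnter with a timeout.

```python
import threading


class LockTimeout(Exception):
    """Raised when a lock cannot be obtained within the allowed time."""


def acquire_all(locks, timeout_seconds):
    """Acquire every lock or none; raise LockTimeout instead of waiting forever."""
    held = []
    try:
        for lock in locks:
            if not lock.acquire(timeout=timeout_seconds):
                raise LockTimeout(
                    f"gave up after {timeout_seconds}s on lock #{len(held)}")
            held.append(lock)
    except LockTimeout:
        # Roll back so this thread does not keep other threads waiting.
        for lock in reversed(held):
            lock.release()
        raise
    return held
```

A LockTimeout that shows up in the log, together with the name of the thread that raised it, points at the locks worth inspecting. The caller releases the returned locks in reverse order when it is done.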

Best regards,
Jeffrey Tan
 
Hi David,

Do you need more help? Do you still have any concerns about this issue?

Please feel free to give feedback; I will work with you. Thanks.

Best regards,
Jeffrey Tan
 
I have an almost identical situation occurring. I have a multithreaded application, actually multiple instances of the same thread. Each thread has a database connection (CDatabase). I keep the connection open until I encounter an error, then I close and delete it and create a new one. So the thread first makes sure it has a database connection and creates one if it doesn't. It then queries a database table and may or may not update records in the table.

I have logging in the application: a message is written to the log just before I do the Recordset.Open(sql statement), and another after the open completes (pass or fail). I have seen cases where 5 of the opens throw a DBException (timeout) and 10 of them complete successfully, or some combination of timeouts and some that keep running. The other weird thing is that the timeouts don't seem to be consistent across instances. To qualify that: if 5 opens time out at the same time, they all time out after the same interval (let's say 20 seconds). Then I can see other instances where maybe 3 opens time out and it takes over a minute.

The application is running in many places and has been running for over a year; we have only just started seeing this happen.

I am wondering whether a service pack for Windows 2000/MDAC or maybe SQL Server 2000 may have induced this behavior, or maybe there is something I am just plain doing wrong, but I am almost certain we are handling all of the exception cases correctly.

Thanks,
Dave
 
