How do you kill a completely locked up thread?

TheSilverHammer

Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
reasons I do not know, a Connect call will totally lock up the thread and
never return. I am sure it has something to do with weirdness going on with
the server I am talking to. Anyhow, this locked up state happens once in a
while (maybe once per day) and I can't figure out how to deal with the locked
up thread.

If I issue a Thread.Abort() the exception never gets thrown in the thread
because it is locked up. This seems to be the only C# method I know of to
kill a thread. Is there some other way to kill off a thread?

One way to reproduce this yourself is to create a thread that connects to
a server where the connection takes some time, like 10 to 20 seconds. While
the thread is inside the connect call (it will happen with even a simple TCP/IP
socket connect), issue a Thread.Abort() from another thread (the one that made
the Thread object) and you will see that the ThreadAbortException will NOT be
thrown until the Connect call returns.

Another way to reproduce it: after the connect call has finished and you
start to talk to the server, if you are in a receive-data call and the server
stops sending data but never closes the connection, the call will block forever.
You will once again not be able to get Thread.Abort() to kill the locked-up
thread.

Is there anyone, especially a MSVP who can answer this?
 
Ben Voigt [C++ MVP]

TheSilverHammer said:
Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
reasons I do not know, a Connect call will totally lock up the thread and
never return. I am sure it has something to do with weirdness going on with
the server I am talking to. Anyhow, this locked up state happens once in a
while (maybe once per day) and I can't figure out how to deal with the locked
up thread.

If I issue a Thread.Abort() the exception never gets thrown in the thread
because it is locked up. This seems to be the only C# method I know of to
kill a thread. Is there some other way to kill off a thread?

If you need to terminate a thread while it's running native code, especially
inside a kernel call, you have no way of knowing what state it is modifying
and keeping it coherent. You have to assume the whole process is corrupted.

The only safe way to forcibly end a failed thread like you have is to end
the process containing it.

Do you have access to the socket handle for the connection? If you shut down
the socket (non-gracefully, by setting SO_DONTLINGER) from a different thread,
that will probably cause the stuck operation to complete immediately.
 
TheSilverHammer

Wow, these are some fast replies. Normally I can go several days without
one.

Anyway, I am using the SharpSSH class which I did not write, however, I do
have the source code and I suppose I could dig through it to find the socket
calls.

However I am not sure where all the deadlocks are happening in it so it
would be very hard to catch all the problems. In some cases I think the
connect succeeds and a lockup may occur in a receive data callback which runs
in the main thread of that instance of my object (opposed to the SSH object).


I'll look at the egg-head café solution, but I am not sure how applicable it
can be to all the instances of a lockup. For example, if an event handler
in your class has been called by another object (e.g. a SharpSSH async callback
for data received), you can't wrap that in a method call that can time out,
can you?
 
TheSilverHammer

Here is a code snippet from an async callback that I am sure is one of the
causes of my thread being locked up when the SharpSSH shell object dies.

ReadDataCallback = new AsyncCallback(OnReadData);
shell.IO.BeginRead(RecvBuff, 0, RecvBuff.Length,
    ReadDataCallback, null);

while (true == shell.ShellOpened)
{
    // See if we have data to send
    lock (SendBuffer)
    {
        if (0 != SendBuffer.Length)
        {
            shell.Write(SendBuffer);
            SendBuffer = string.Empty;
        }
    }

    Thread.Sleep(50);
}

I am not sure how to set the ReadDataCallback up so that it can recover from
a hard lockup from the shell object. The method on the egghead cafe page
doesn't seem to fit this very well.
 
TheSilverHammer

Ben Voigt said:
If you need to terminate a thread while it's running native code, especially
inside a kernel call, you have no way of knowing what state it is modifying
and keeping it coherent. You have to assume the whole process is corrupted.

The only safe way to forcibly end a failed thread like you have is to end
the process containing it.

Is there an unsafe way to kill it? I know it can be done; tools such as
Process Explorer let me select a single thread of my app and kill it.
 
Jeroen Mostert

TheSilverHammer said:
Because C# has no native SSH class, I am using SharpSSH. Sometimes, for
reasons I do not know, a Connect call will totally lock up the thread and
never return. I am sure it has something to do with weirdness going on with
the server I am talking to. Anyhow, this locked up state happens once in a
while (maybe once per day) and I can't figure out how to deal with the locked
up thread.

If I issue a Thread.Abort() the exception never gets thrown in the thread
because it is locked up. This seems to be the only C# method I know of to
kill a thread. Is there some other way to kill off a thread?

Yes, the unmanaged TerminateThread(). However, this doesn't work, in that it
will kill off the thread, but leave approximately zero chance for your
application to continue running successfully. You are guaranteed to corrupt
internal state with this, especially since the CLR gets no chance to cleanly
release resources associated with that thread. Seriously, don't do this.
Your application will probably just deadlock later on the locks the
terminated thread was holding, if it doesn't just crash on corrupted state.

Also, there's no obvious way to find the thread that's blocking. For one
thing, you can kill off the thread corresponding to the Thread object, but
this is not guaranteed to be the thread doing the actual blocking I/O, it
might just be waiting on another thread. As a result, you've just leaked a
thread that's still busy blocking, and worse, the actual I/O is still in
progress, so the socket is unusable. You don't want to repeat this exercise,
as it's a good way to run out of resources fast.

A way you can simulate this yourself, is create any thread that connects to
a server where the connection takes some time, like 10 to 20 seconds. When
the thread is doing this connect (it will happen with even a simple TCP/IP
socket connect) issue a Thread.Abort() from another thread (the one that made
the Thread Object) and you will see that the ThreadAbortException will NOT be
thrown until the Connect call returns.

Correct. The thread is blocking on I/O, in unmanaged code. You can't end it,
and this is more or less by design. But you shouldn't be too dismayed,
because Thread.Abort() is a bad idea for the same reasons TerminateThread()
is. If a thread needs to end, it should be designed to have exit points
where the application state is known, and it can check a flag or issue a
wait on a user object at those points. Raising an exception in the middle of
anywhere is a good way of corrupting global state.

Another way you can do this is after the connect call is finished and you
start to talk to a server, if you are on a receive data call and the server
stops sending data but never closes the connection, it will block forever.
You will once again not be able to get Thread.Abort() to kill the locked up
thread.

Same thing.

Is there anyone, especially a MSVP who can answer this?

I'm not an MSVP but I've seen this so many times in our codebase that it's
not funny anymore. The one way to cancel pending I/O on a socket and unwedge
threads blocking on that is to close the socket from another thread and
handle the resulting exceptions. Nothing else will do, at least nothing that
can be called reliable. Of course, this means tearing down the connection,
but that's still a whole lot better than tearing down your process.
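
As a rough illustration of the close-from-another-thread approach described above, here is a minimal C# sketch. It assumes a plain System.Net.Sockets.Socket rather than SharpSSH (whose internals are not shown in this thread). The class name CloseToUnblock and the loopback listener standing in for a silent server are made up for the example. Which exception the blocked call throws can vary by platform and runtime version, so the sketch catches both likely types.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical demo class; not part of SharpSSH or any library.
static class CloseToUnblock
{
    // Returns true if the blocked reader thread exited after the socket was closed.
    public static bool Run()
    {
        // Local stand-in for a server that accepts a connection and then goes silent.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(IPAddress.Loopback, port);
        Socket serverSide = listener.AcceptSocket(); // never sends anything

        bool unblocked = false;
        var reader = new Thread(() =>
        {
            try
            {
                client.Receive(new byte[256]); // blocks: no data ever arrives
            }
            catch (SocketException) { unblocked = true; }
            catch (ObjectDisposedException) { unblocked = true; }
        });
        reader.IsBackground = true;
        reader.Start();

        Thread.Sleep(500); // give the reader time to enter Receive
        client.Close();    // tear down the connection from this thread

        bool exited = reader.Join(5000);
        serverSide.Close();
        listener.Stop();
        return exited && unblocked;
    }
}
```

The price is the one already mentioned: the connection is gone afterwards, so the caller has to reconnect and redo whatever was in flight.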

The other alternative, which is less straightforward but suits some designs
better, is to make sure that threads never issue I/O which can take
"forever". Almost every I/O call has a timeout parameter, and for those that
don't there's always asynchronous I/O and
ThreadPool.RegisterWaitForSingleObject(). When the call returns with a
timeout, either poll, decide to wait some more or give up and close the
socket, which you can then do from the same thread that owns the socket,
simplifying error handling.
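
A minimal sketch of the timeout-based alternative, again assuming a plain Socket and not SharpSSH. TimedIo and ConnectWithTimeout are hypothetical names; the calls used (BeginConnect, EndConnect, ReceiveTimeout, SendTimeout) are the standard Socket members. The connect is bounded by waiting on the async handle, and later blocking reads and writes are bounded by the socket's own timeout properties.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper; names are illustrative only.
static class TimedIo
{
    // Connects with an upper bound on the wait, then bounds later Send/Receive calls too.
    public static Socket ConnectWithTimeout(IPAddress address, int port, int timeoutMs)
    {
        var socket = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        IAsyncResult pending = socket.BeginConnect(address, port, null, null);

        if (!pending.AsyncWaitHandle.WaitOne(timeoutMs))
        {
            socket.Close(); // give up: closing abandons the pending connect
            throw new TimeoutException("Connect did not complete within " + timeoutMs + " ms.");
        }

        try
        {
            socket.EndConnect(pending); // rethrows any connect failure
        }
        catch
        {
            socket.Close();
            throw;
        }

        socket.ReceiveTimeout = timeoutMs; // a blocking Receive now fails with SocketError.TimedOut
        socket.SendTimeout = timeoutMs;
        return socket;
    }
}
```

Because the thread that owns the socket is the one that notices the timeout, it can decide on the spot whether to retry, keep waiting, or close.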

I understand it's not your code, but trust me: you'll want to rewrite it
anyway, unless you can afford restarting your application every so often.
 
TheSilverHammer

Jeroen Mostert said:
I'm not an MSVP but I've seen this so many times in our codebase that it's
not funny anymore. The one way to cancel pending I/O on a socket and unwedge
threads blocking on that is to close the socket from another thread and
handle the resulting exceptions. Nothing else will do, at least nothing that
can be called reliable. Of course, this means tearing down the connection,
but that's still a whole lot better than tearing down your process.

The other alternative, which is less straightforward but suits some designs
better, is to make sure that threads never issue I/O which can take
"forever". Almost every I/O call has a timeout parameter, and for those that
don't there's always asynchronous I/O and
ThreadPool.RegisterWaitForSingleObject(). When the call returns with a
timeout, either poll, decide to wait some more or give up and close the
socket, which you can then do from the same thread that owns the socket,
simplifying error handling.

I understand it's not your code, but trust me: you'll want to rewrite it
anyway, unless you can afford restarting your application every so often.

Grrr.. Damn post thing asked me to log in again and ate my post... Anyway...

The SharpSSH code base has a bunch of classes and would take a major effort
to re-write. It is clear that it is unfinished from looking at it. I am
not sure my company wants to fund me re-writing this code set.

However, the solution Peter Bromberg gave on his web site looks good except
for what appears to me to be a big hole or leak. I do not understand how C#
handles this, so maybe it is a non issue. The following is the code segment
from his web site, I hope he doesn't mind me posting it:

public ArrayList DoWorkNeedsTimeout(ArrayList alin, int secondsToWait)
{
    ArrayList alOut = new ArrayList();

    // Create an instance of our delegate, pointing to the helper method:
    DoWorkNeedsTimeoutDelegate deleg =
        new DoWorkNeedsTimeoutDelegate(DoWorkWithTimeout);

    // Call BeginInvoke on delegate.
    // Note on last two parameters of Delegate BeginInvoke Method:
    // 1) callback: not used here, we can pass null
    // 2) state: not used, pass an instance of object in the required
    //    parameter location
    // Invoke the delegate passing the parameters and get the
    // IAsyncResult object in "ar":
    IAsyncResult ar = deleg.BeginInvoke(alin, secondsToWait, null,
        new object());

    // if the WaitOne method times out before we get a result, it
    // will be false:
    if (!ar.AsyncWaitHandle.WaitOne(5000, false))
    {
        // handle timeout logging / notification here - Syslog,
        // Database, Email - whatever you need
        alOut.Add("TIMED OUT!");
    }
    else // we didn't time out:
    {
        // get the result of the method call here
        alOut = deleg.EndInvoke(ar);
    }

    return alOut;
}

What he is doing is making a delegate to call BeginInvoke with and then
using the IAsyncResult to wait for a time period. If the time period
expires, then his thread continues on. If it doesn't expire, he calls
EndInvoke(). This looks good except for the issue of dealing with a truly
locked-up thread.

BeginInvoke() uses a thread from the thread pool, right? So what happens if
that thread never returns so you can call EndInvoke? Is it gone from the
thread pool forever? If you repeat this loop 1000s of times and even if 1%
of the time you get a locked up thread, won't you run out of threads?

The only way this can work indefinitely, which it may, is if the garbage
collector will reclaim the thread once the delegate and other related objects
are out of scope. Is this how it works?
 
Jeroen Mostert

TheSilverHammer said:
Grrr.. Damn post thing asked me to log in again and ate my post... Anyway...

The SharpSSH code base has a bunch of classes and would take a major effort
to re-write. It is clear that it is unfinished from looking at it. I am
not sure my company wants to fund me re-writing this code set.

SSH is widely implemented, though, and you will probably want a proven
implementation, given the security concerns. Delegating to a good unmanaged
library (if the interface isn't too horrible to P/Invoke to) may be a better
option. You can also consider using an ActiveX control: there's good support
for this in .NET, and standalone components were all the rage in the VB days
for a reason. Alternatively, use a standalone SSH application and pull its
strings from the managed application, which is an ugly but venerable hack.
Last but certainly not least -- write it in another language where you do
have a mature library at your beck and call.
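
For the "standalone application" option, a sketch along these lines may help. It assumes only the standard System.Diagnostics.Process API; ExternalTool and RunWithTimeout are hypothetical names, and the executable and arguments are whatever the caller passes in (nothing here is specific to any particular SSH client). If the child does not finish in time it is killed, which costs that one connection rather than the whole managed application.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper; the tool being launched is entirely up to the caller.
static class ExternalTool
{
    // Returns the exit code, or null if the process was killed after timing out.
    public static int? RunWithTimeout(string fileName, string arguments,
                                      int timeoutMs, StringBuilder output)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = new Process { StartInfo = startInfo })
        {
            // Read output asynchronously so a full pipe cannot stall the child.
            process.OutputDataReceived += (sender, e) =>
            {
                if (e.Data != null)
                {
                    lock (output) { output.AppendLine(e.Data); }
                }
            };

            process.Start();
            process.BeginOutputReadLine();

            if (!process.WaitForExit(timeoutMs))
            {
                try { process.Kill(); }
                catch (InvalidOperationException) { /* it exited in the meantime */ }
                process.WaitForExit(5000);
                return null;
            }

            process.WaitForExit(); // lets the async output handlers drain
            return process.ExitCode;
        }
    }
}
```

Killing the child is still a forced teardown of its connection, but the OS reclaims its resources and the parent's own state is untouched.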

.NET still suffers from the "everything old is new again" syndrome where
everyone is reinventing the wheel in the new languages, which under some
circumstances can be a big waste of time and money. Just because you're now
using C# doesn't mean all your libraries have to be. I see my colleagues
falling into the same trap; one of them tried to "leverage" a Java library
by automatically converting it to C# and then ignoring the warnings. The
results were, as you can imagine, not pretty, and guess who got to fix the
crashes? Meanwhile, the Java applications continued to run just fine with
their "old" library and "legacy" code.

However, the solution Peter Bromberg gave on his web site looks good except
for what appears to me to be a big hole or leak. I do not understand how C#
handles this, so maybe it is a non issue. The following is the code segment
from his web site, I hope he doesn't mind me posting it:

public ArrayList DoWorkNeedsTimeout(ArrayList alin, int secondsToWait)
{
    ArrayList alOut = new ArrayList();

    // Create an instance of our delegate, pointing to the helper method:
    DoWorkNeedsTimeoutDelegate deleg =
        new DoWorkNeedsTimeoutDelegate(DoWorkWithTimeout);

    // Call BeginInvoke on delegate.
    // Note on last two parameters of Delegate BeginInvoke Method:
    // 1) callback: not used here, we can pass null
    // 2) state: not used, pass an instance of object in the required
    //    parameter location
    // Invoke the delegate passing the parameters and get the
    // IAsyncResult object in "ar":
    IAsyncResult ar = deleg.BeginInvoke(alin, secondsToWait, null,
        new object());

    // if the WaitOne method times out before we get a result, it
    // will be false:
    if (!ar.AsyncWaitHandle.WaitOne(5000, false))
    {
        // handle timeout logging / notification here - Syslog,
        // Database, Email - whatever you need
        alOut.Add("TIMED OUT!");
    }
    else // we didn't time out:
    {
        // get the result of the method call here
        alOut = deleg.EndInvoke(ar);
    }

This is wrong. Every call to .BeginInvoke() must have a corresponding call
to .EndInvoke(), to free up any resources that .BeginInvoke() set up. This
is irrespective of whether you've happened to hit a timeout waiting on the
async handle. People violate this rule all over the place, though, because
it seems to work, but even when it actually does work (because
.BeginInvoke() happens not to claim any additional resources) it's a bad
habit to get into. Don't believe me, believe the MSDN:
http://msdn2.microsoft.com/en-us/library/2e08f6yc(VS.80).aspx

Of course, our hands are forced here because .EndInvoke() would block until
the underlying method actually completed, but this just demonstrates why
this can't actually work. You're leaving a delegate call up in the air, but
forgetting about it isn't going to make it go away. (In this case, we can
easily fix things by passing a callback to the .BeginInvoke() that will call
.EndInvoke(), but it's all irrelevant anyway if the delegate never completes.)

    return alOut;
}

What he is doing is making a delegate to call BeginInvoke with and then
using the IAsyncResult to wait for a time period. If the time period
expires, then his thread continues on. If it doesn't expire, he calls
EndInvoke(). This looks good except for the issue of dealing with a truly
locked-up thread.

Yes, exactly. Wrapping everything in another asynchronous invocation does
*nothing* for the blocking problem. What you create here is just a wrapper
that can indeed be abandoned at will, but this doesn't cancel the underlying
blocking method, it just tosses aside the delegate invocation.

BeginInvoke() uses a thread from the thread pool, right?

Yes.

So what happens if that thread never returns so you can call EndInvoke?

You can always call .EndInvoke(). It will just block until the delegate
completes.

Is it gone from the thread pool forever?

Well, it's still a part of the thread pool, it just never becomes available
for other tasks again. So the number of available TP threads will steadily
decrease.

If you repeat this loop 1000s of times and even if 1% of the time you get
a locked up thread, won't you run out of threads?

That's exactly what will happen, and it's easy to test. Use the above code
with a delegate that just does "for (;;) Thread.Sleep(10);" and observe.
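
A variation on that experiment, as a hedged sketch: instead of delegate BeginInvoke (which, as far as I know, is only supported on the .NET Framework), it queues work items directly with ThreadPool.QueueUserWorkItem and compares ThreadPool.GetAvailableThreads before and while they are stuck. PoolStarvationDemo is a made-up name, and the work items block on an event that the demo eventually sets so that it can finish; a genuinely hung call would never be released.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; not taken from the article discussed above.
static class PoolStarvationDemo
{
    // Returns how many worker threads became unavailable while 'count' items were stuck.
    public static int Run(int count)
    {
        int workersBefore, workersDuring, ioUnused;
        ThreadPool.GetAvailableThreads(out workersBefore, out ioUnused);

        using (var release = new ManualResetEvent(false))
        using (var started = new CountdownEvent(count))
        using (var finished = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    started.Signal();
                    release.WaitOne(); // stand-in for a call that never returns
                    finished.Signal();
                });
            }

            started.Wait(); // every item now occupies a pool thread
            ThreadPool.GetAvailableThreads(out workersDuring, out ioUnused);

            release.Set();  // a truly hung call would never get this far
            finished.Wait();
        }

        return workersBefore - workersDuring;
    }
}
```

In a real lockup nothing ever plays the role of release.Set(), so each stuck call permanently lowers the number of available pool threads.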

This approach is only useful if you don't care that you can't abort an
action that goes on longer than your timeout, but you just need to log when
it does. It doesn't give you any magical ability to abort the action. The
action still needs to complete on its own eventually if you don't want to
run out of resources.

The only way this can work indefinitely, which it may, is if the garbage
collector will reclaim the thread once the delegate and other related objects
are out of scope. Is this how it works?

No. If it worked that way, you could never have background threads unless
they were referenced by other threads. In a sense, a Thread object is always
"referenced" by the underlying thread. They're not collected until the
underlying thread exits, and if the underlying thread never exits, well,
that's too bad.
 
TheSilverHammer

So the basic lesson here is that a locked up thread is unrecoverable. The
only thing you can do about it is abandon the thread and move on. If you
have an application which is supposed to run persistently for days or weeks
at a time, it will have to be restarted to reclaim the resources.

In my case, unless I do major repairs on the SharpSSH class, I will have the
occasional unrecoverable threads.

This kind of stinks. I wonder if there was a way that MS could write a
thread that could be terminated safely. If you can do that with a process,
why can't you do it with a thread? Is there a way to create a process as a
thread that can be killed?
 
Peter Duniho

[...]
In my case, unless I do major repairs on the SharpSSH class, I will have the
occasional unrecoverable threads.

Yup. One of the risks of using third-party code is that if the code sucks
(whether because it's poorly designed or just a work in progress), there's
not much you can do about it. At least in this case, it sounds like you
_could_ try to fix the library (I don't know anything about the library,
so I'm just taking that from your comments).

This kind of stinks. I wonder if there was a way that MS could write a
thread that could be terminated safely. If you can do that with a process,
why can't you do it with a thread?

You can't really do it with a process either.

This isn't something that Microsoft can really solve. The lack of safety
has to do with what the code executing in the thread or process is doing,
and in particular the inability for someone outside the code to know for
sure what that is. It is possible to write code that, if interrupted
unexpectedly, leaves things in an indeterminate state.

If you are the one writing the code executing on a thread, there are some
situations in which you could know that aborting the code is safe. But if
you're the one writing the code, there's no need to do so. You can just
design the code correctly, so that it's abortable in a well-defined way
instead.
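
One common reading of "abortable in a well-defined way" is a worker that checks a stop signal at points where its state is known to be consistent. The sketch below is an assumption about what that could look like, not anyone's actual code from this thread; StoppableWorker and its members are hypothetical names. The timed WaitOne call serves as both the polling delay and the exit check, much like the Thread.Sleep(50) loop posted earlier.

```csharp
using System;
using System.Threading;

// Hypothetical worker; illustrates a cooperative stop, not a forced abort.
sealed class StoppableWorker
{
    private readonly ManualResetEvent stopRequested = new ManualResetEvent(false);
    private readonly Thread thread;
    private int iterations;

    public StoppableWorker()
    {
        thread = new Thread(Run);
        thread.IsBackground = true;
    }

    public int Iterations
    {
        get { return Volatile.Read(ref iterations); }
    }

    public void Start()
    {
        thread.Start();
    }

    // Asks the worker to stop and waits up to timeoutMs for it to exit.
    public bool Stop(int timeoutMs)
    {
        stopRequested.Set();
        return thread.Join(timeoutMs);
    }

    private void Run()
    {
        // WaitOne doubles as the sleep and as the exit check.
        while (!stopRequested.WaitOne(50))
        {
            // Stand-in for one bounded unit of work (each call must have its own timeout).
            Interlocked.Increment(ref iterations);
        }
        // Known-good exit point: release resources here.
    }
}
```

This only works if every unit of work inside the loop is itself bounded; a single call that can block forever defeats the stop signal.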

If you're not the one writing the code, then you don't know whether the
nature of the code is such that it's safe to abort at some arbitrary point
of execution. Thus, it's not safe to do. But there's not really any
practical way for Microsoft to change that. It's not about how the OS
manages the thread, it's about the fact that code executing in a thread
could be doing _anything_.

Pete
 
Jeroen Mostert

TheSilverHammer said:
So the basic lesson here is that a locked up thread is unrecoverable. The
only thing you can do about it is abandon the thread and move on.

Well, I'd phrase it differently: threads must never lock up *because*
there's no acceptable way to deal with them. If you've got a thread that
could block forever, you've got a bug, simple as that. You have to get an
answer if you ask "so what guarantees that this wait here will be satisfied
eventually?" and if the answer is "the kindness of strangers", you lose.

If you have an application which is supposed to run persistently for days
or weeks at a time, it will have to be restarted to reclaim the resources.

And that's assuming the application will clean up everything when it stops.
The OS will guarantee that most resources are released, but that's not the
same thing as exiting cleanly (an open file will be closed, for example, but
what's *in* the file when it is?)

This kind of stinks. I wonder if there was a way that MS could write a
thread that could be terminated safely. If you can do that with a process,
why can't you do it with a thread? Is there a way to create a process as a
thread that can be killed?

You can't terminate a process safely either! The keyword here is "safely".
The best thing that happens when you kill off a process is that the OS will
reclaim the resources it associated with that process -- forcibly. For
memory, this doesn't matter; for a socket, this means a connection reset;
for a file, it's probably data loss. This is nothing to get enthusiastic
about, even if it's a good step up from crashing the computer.

Terminating a thread means your application state is hosed. There's nothing
the OS can do to make this "safe", since it knows diddly about your
application's internal state. It can't even track OS resources for every
thread to release them, because there's no notion of ownership beyond the
process. Threads share the process state, including any resources, so just
releasing anything a terminated thread allocated would be wrong.
 
TheSilverHammer

If they can do it with processes, why can't they do it with threads?

I am sure they can't guarantee that everything will be fine if my code
doesn't anticipate a resource disappearing, but if I do, I should be able to
do it safely.

For example:

I have a MyThread and then I have the thread procedure which opens a bunch
of files, sockets, and all that. If MyThread is killed, the OS can recover
all that stuff. If MyApplication is the one calling the ThreadKill, then
windows should say, "OK, well you made it, so if you want to kill it, you
must know what you are doing."

If in my thread I do something like:

MyList = new List<string>();

And then when I kill the thread, Windows says the List was created in the
thread and therefore will be nuked; it is my problem. I could write my app in
such a way that I know where stuff was allocated so that I could expect
MyList to go away. The CLR could go as far as making any references to
MyList null or just throwing an exception if I try to use it (besides
assigning it a new value).

All a thread has to be is a bag of 'stuff' and if it goes bad, toss it all
out, and as long as there are 'rules' which I can expect to follow, I could
deal with it. They only need one simple rule: If it was opened, allocated,
created in a thread, when the thread is killed (not exits) then it would be
Closed, freed, destroyed, etc...


Having said all that, I understand the sentiment about writing good code and
how none of this is necessary. Unfortunately, that is an 'if the world were
perfect...' point of view in an imperfect world.

In this particular case, I need SSH, which for some reason Microsoft doesn't
seem to see fit as being a core protocol for C# (or .NET in general). I
suggested this on the community sites, and got a 'resolved' and 'won't fix'
with no reasons supplied. The only valid reason I can think of is because
SSH support is in the works, however after much googling I can't find any
hint about official MS SSH support. With their big security push, and SSH
being a cornerstone in network security management, this makes absolutely no
sense. Maybe they are waiting until the security crowd starts beating them
with a stick and hail it as yet another reason to use Linux. How long would
it take a few of MS's well-trained developers to put out a great SSH suite for
.NET? Ignoring the bureaucracy, it should only take a few actual weeks of
development time.

This leaves me with a choice of writing my own implementation or using some
other library. My employer is not going to want me to spend several weeks
to write my own or fix this SharpSSH library. Personally, I wouldn't mind,
but really, I have a lot to do.

Considering we are living in an imperfect world, we should try to be
accommodating. Yes, the right thing is to NOT screw something up, but it
WILL happen. The proper thing isn't to stand around and talk about how it
should have been done right, and if it was all your problems would go away.

Microsoft's job on this kind of issue is to make life as a programmer as
easy as possible. I will grant you that compared to OS X and Linux stuff,
Microsoft is a rock-star, but in a more absolute sense there is a lot they
could do much, much better.

For example, the current issue: locked-up threads. Granted, a good program
will never have this problem, but a realistic outlook would be that
we have to deal with 'bad' things. A better approach would be for MS to
figure out a way to create a thread and provide some kind of emergency
recovery system. You could make it a special kind of thread used to run
unsafe stuff and the architecture will save you from what is in the thread if
worst comes to worst. It would be like a container for uranium. You have
to use it, and you hope nothing goes wrong, but if it does, it is contained.

Another way (not to drag this rant any longer) to look at this is to look
back in the days where there was no memory protection for applications. One
rogue application could bring the entire system down. To take today's
outlook on threads and apply it to that, it would be the same thing as simply
saying, "Clearly the solution to rogue applications is to not run rogue
applications." Ignoring the fact that AwsomeApp.exe is the ONLY app that
does what you need.

No, I do not expect anyone here to be able to do anything about this. I do
not know, and would doubt, that any MS big-wigs (ones with enough power to
actually do something) read this kind of stuff and would care enough to do
anything about it.

Having said all that, the squeaky wheel gets the kick, so griping about
issues like this might instill even more griping until "The powers that be"
at MS can't stand it anymore and decide to do something.

Anyway, to all who have helped me, thanks. I would like to suggest to Peter
Bromberg that he put a warning on the solution he proposed, or in fact
remove it. The solution leaves bound-up threads and resources, and if an
application repeats that more than 50 times, it will cease working until it
is restarted. It is OK for a program that isn't going to iterate over that
more than a few times, but it is a death trap for anything that does.
 
Peter Duniho

If they can do it with processes, why can't they do it with threads?

Can do what with processes? We've already explained that you can't safely
terminate a process any more than you can safely abort a thread.

I am sure they can't guarantee that everything will be fine if my code
doesn't anticipate a resources disappearing, but if I do, I should be able to
do it safely.

It's not an issue of resources "disappearing". It's an issue of them
being left in an inconsistent state.

There is no way for the _operating system_ to ensure that things are never left
in an inconsistent state. Implementors of various data structures can do
things to make sure they are always in a consistent state (e.g. see
"journaled" or "transaction-based"), but that's up to the implementor.
The OS has no way to do this (though it might provide APIs to help an
implementor do it).

For example:

I have a MyThread and then I have the thread procedure which opens a bunch
of files, sockets, and all that. If MyThread is killed, the OS can recover
all that stuff.

No, it can't. All data within a process is owned by the _process_, unless
it's been specifically marked as thread data (*). The OS has no way to
know whether killing a thread allows that data to be cleaned up or not.

(*) (I'm not sure whether .NET supports this, but it is supported in the
unmanaged Windows API...I'm seeing a Thread.AllocateDataSlot() method, and
I suspect this addresses the same issue in managed code. In any case,
note that it only addresses specific thread-local storage, not the OS
objects that might be referenced by that storage, as those are still
per-process and cannot be released when the thread terminates).
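
To make the footnote concrete, here is a small sketch of Thread.AllocateDataSlot() in use. ThreadSlotDemo is a hypothetical name; the Thread.AllocateDataSlot, Thread.SetData and Thread.GetData calls are the standard ones. It only shows that each thread sees its own value in the slot; it does nothing for files, sockets or other objects that the stored value might refer to.

```csharp
using System;
using System.Threading;

// Hypothetical demo class for per-thread data slots.
static class ThreadSlotDemo
{
    // Reports what a second thread saw in the slot, and what the first thread still sees.
    public static string Run()
    {
        LocalDataStoreSlot slot = Thread.AllocateDataSlot();
        Thread.SetData(slot, "main");

        object seenByWorker = "unset";
        var worker = new Thread(() =>
        {
            seenByWorker = Thread.GetData(slot); // this thread has its own, still empty, copy
            Thread.SetData(slot, "worker");      // does not affect the first thread's value
        });
        worker.Start();
        worker.Join();

        string workerView = seenByWorker == null ? "null" : seenByWorker.ToString();
        return workerView + "/" + Thread.GetData(slot);
    }
}
```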

But even if it did have a way to know what data could be cleaned up,
_that's not the problem_. Cleaning things up is the least of the
worries. It's the fact that software _does_ stuff, and if it's
interrupted in the middle of _doing_ that stuff, whatever data the
software is operating on could be in an inconsistent state.

Most of your rant seems to be about this question of cleaning up, but
that's not the main problem. That's not what makes killing threads or
processes unsafe, and coming up with a paradigm in which you can ensure
things are cleaned up would _not_ make killing threads or processes a safe
operation.

As far as your specific problem goes, there's no point in complaining that
SSH isn't supported in .NET (assuming it's not...I know .NET does have a
lot of crypto stuff in it, and it's possible that you could easily write
an SSH implementation just by combining that with the usual network i/o
stuff). .NET can't possibly implement _everything_, even as with each
iteration it does support more and more.

If a specific library isn't doing what you need or want, you can either
find a different library or write it yourself. Programmers all over the
world make these kinds of decisions every day, and it's just not a big
deal. Note that you are not limited to using a managed code library.
With p/invoke you should be able to use pretty much whatever library you
find useful.

I will point out that your assertion that Microsoft could publish an SSH
library "in a few weeks time" is absurd. No reputable software publishing
company does _anything_ "in a few weeks time". It would take _way_ more
than a few weeks just to properly _test_ such a library, never mind
implement it correctly. Granted I have very little specific knowledge of
SSH, but I would guess that it would take at least three staff members
(programmer, tester, and a program manager to manage the specification for
the feature) something like 6-12 months, for a potential cost of up to
three man-years.

Even if it _were_ just a few weeks worth of work, it boggles my mind that
you would on the one hand say that Microsoft should do this work, and on
the other hand write "My employer is not going to want me to spend several
weeks to write my own". Don't you think Microsoft already has their own
things they are trying to get done? Surely if this is an important enough
feature for your need to justify them implementing it, it's important
enough to justify _you_ doing whatever work is needed on your own to get
it into your product.

Maybe it will get into .NET eventually, maybe it won't. But making
fanciful claims about how easy it would be to implement doesn't help your
case any. If it's really that easy, write it yourself.

And please keep in mind that designing and implementing an operating
system is a lot harder than you seem to think it is. I think it's safe to
say that if dealing with hung threads were really as easy as you claim it
is, Windows and every other OS would already do it. But there's not a
single OS I can think of off the top of my head that can allow a thread or
process to be safely terminated without the risk of causing data integrity
problems.

Pete
 
J

Jeroen Mostert

TheSilverHammer said:
If they can do it with processes, why can't they do it with threads?

It's more a case of "THREADS DON'T WORK THAT WAY!" rather than "can't be done".

A thread's supposed to be lightweight; a simple means of achieving
multiprocessing. If you follow the reliability angle through and add
resource tracking and whatnot you end up with a thread that's basically just
as fat as a process. A thread's not supposed to be isolated from anything;
that's not its purpose.

What you're looking for actually has less to do with threads and more with
isolating components (which may or may not be using separate threads) from
each other's failures. But here "failure" has to be defined so generally as
to make any form of isolation lower than process level well nigh useless.

If in my thread I do something like:

MyList = new List<string>();

And then when I kill the thread, Windows says the List was created in the
thread and therefore will be nuked, it is my problem. I could write my app in
such a way that I know where stuff was allocated so that I could expect
MyList to go away. The CLR could go as far as making any references to
MyList null or just throwing an exception if I try to use it (besides
assigning it a new value).

But what's the point?

If you are in a position to terminate the thread properly, you're also in a
position to know what resources should be thrown away. So why don't you do
that, instead of demanding that the CLR save your bacon at a considerable
(and in 99% of the cases, unnecessary) overhead?

Now, if you're using someone else's component, you don't know what resources
they're squirreling away, so you could say that's an argument in favor of
CLR tracking. But hang on a moment -- how do you know what threads the
misbehaving component is using, and how do you select the one that's
blocking in a way you don't want it to for termination? If you can dig deep
enough to figure that out, can't you also figure out what resources it's
abusing and dispose of them?

Indefinitely blocking threads are such a huge pain in the ass because
recognizing when a thread is never going to do something meaningful again is
in theory equivalent to the halting problem and in practice not actually
that much easier. It's like asking the OS for an infinite loop detector. It
could try, but it'd run into unsolvable cases pretty soon.

Having said all that, I understand the sentiment about writing good code and
how none of this is necessary. Unfortunately, that is an 'if the world were
perfect...' point of view in an imperfect world.

If the world were perfect, the operating system and the runtime would join
hands to ensure that nothing you ever did could cause state corruption, and
every error condition was recoverable. But since that's a theoretical
impossibility, they have to settle somewhere before that. Threads were never
meant to be an aid in this. They're actually more like aggravating factors.

The process is the one edge where they can reasonably isolate the rest of
the system from most of the impact of failure. And even that fails when
processes are cooperating to get something done. Try killing off "csrss.exe"
sometime. If you succeed, it's rebooting time, baby. Your other processes
will be just as doomed.

In this particular case, I need SSH, which for some reason Microsoft doesn't
seem to see fit as being a core protocol for C# (or .NET in general).

Hey, they have to give third-party developers *some* chance at a living,
don't they? :)

I suggested this on the community sites, and got a 'resolved' and 'won't
fix' with no reasons supplied. The only valid reason I can think of is
because SSH support is in the works, however after much googling I can't
find any hint about official MS SSH support. With their big security
push, and SSH being a cornerstone in network security management, this
makes absolutely no sense.

Windows has no native (read: Microsoft-supplied) SSH services. That's the
most obvious reason I can think of. .NET heavily focuses on making all of
Windows available through the managed API, but it doesn't go out of its way
to support stuff that isn't ubiquitous on Windows already. And SSH isn't
ubiquitous on Windows -- RDP over VPN is much more common. I say this
without offering judgement on how things are or should be.

Maybe they are waiting until the security crowd starts beating them with
a stick and hails it as yet another reason to use Linux. How long would it
take a few of MS's well-trained developers to put out a great SSH suite for
.NET? Ignoring the bureaucracy, it should only take a few actual weeks
of development time.

It's not a case of "MS has so many resources, they could do this". Because
every developer and his janitor has a feature they clamor for this way ("why
isn't this just in the base classes so I don't have to think about it
anymore?") It's a big win for the developers, but it has to be a win for
Microsoft too. If there's not enough business incentive for Microsoft to
develop, distribute and support it then they won't do it. Simple as that.

It's weird how in the Unix world everyone cheers when a third-party
developer brings out Yet Another implementation of a well-known protocol,
but how in the Windows world the developers are looking over at Microsoft
expectantly to build everything they need and give it to them. It's true
that Microsoft plays a big role in encouraging this attitude, but still.

This leaves me with a choice of writing my own implementation or using some
other library. My employer is not going to want me to spend several weeks
to write my own or fix this SharpSSH library. Personally, I wouldn't mind,
but really, I have a lot to do.

I just googled ".NET SSH". You don't want to know how many hits I got (and
some of them relevant, even!) What made SharpSSH the monopolist? What about
my suggestion of using an ActiveX control? Is it just a case of not wanting
or being able to spend any money? You get what you pay for...

If you're waiting for MS to turn into a charity and do the things your
company doesn't have the time or money for, then don't forget to pick up a
lottery ticket every day, because you're sure to win in the meantime. Say hi
to your competitors for me.

Considering we are living in an imperfect world, we should try to be
accommodating. Yes, the right thing is to NOT screw something up, but it
WILL happen. The proper thing isn't to stand around and talk about how it
should have been done right, and how, if it had been, all your problems
would go away.

You're absolutely right. The proper thing is not to stand around and talk
about it but to *do* things right. There has to be a point, somewhere, where
you have to stop talking about general stopgaps and have to get down to
where the actual problem is, because stopgaps only go so far. The OS can't
fix problems with hung threads for you. It already allows you to kill them
off Completely Dead through TerminateThread() if you really think you know
what you're doing. (You probably don't, which is why it's so dangerous.)
That is not fixing the problems, though. And releasing all resources we
somehow deem "belonging" to that thread still isn't fixing the problems.

Tacking on a tracking system for releasing resources is just not a
cost-effective tradeoff. For most applications, the problem will *not* be in
releasing the resources, it's in the fact that whatever they're doing is
going completely wrong. Some applications might just be able to continue
without any problem if the particular action the thread was working on fails
spectacularly, but most will not. They're more likely to grind to a halt. If
you're killing off a thread, you'll probably be killing off your process soon.

Microsoft's job on this kind of issue is to make life as a programmer as
easy as possible. I will grant you that compared to OS X and Linux stuff,
Microsoft is a rock-star, but in a more absolute sense there is a lot they
could do much, much better.

I really have to disagree, at least on this particular issue. You're asking
for the impossible. They can give you the Big Red Emergency Button, and it's
already present in the form of .Abort, and if that doesn't work
TerminateThread(). But you want that button to magically keep your
application in serviceable condition as it's killing off an integral part of
it, and that can't be done.

For example, the current issue, locked-up threads. Granted, a good program
will never have this problem, but a realistic response outlook would be that
we have to deal with 'bad' things. A better approach would be for MS to
figure out a way to create a thread and provide some kind of emergency
recovery system.

TerminateThread() *will* get rid of the thread. But the only one who can
"recover" is you. And if the component that failed you is a black box to
you, you're just as sunk as the OS would be.

You could make it a special kind of thread used to run unsafe stuff and
the architecture will save you from what is in the thread if worst comes
to worst. It would be like a container for uranium. You have to use
it, and you hope nothing goes wrong, but if it does, it is contained.

Uranium is easy. That's just radiation. Threads can do *anything*. And most
of the time they're *cooperating* with other threads to get things done.
Good luck automagically containing things.

Another way (not to drag this rant any longer) to look at this is to look
back in the days where there was no memory protection for applications.
One rogue application could bring the entire system down. To take today's
outlook on threads and apply it to that, it would be the same thing as
simply saying, "Clearly the solution to rogue applications is to not run
rogue applications." Ignoring the fact that AwsomeApp.exe is the ONLY app
that does what you need.

See above for the whole "the buck stops somewhere" point. If you want this
protection (and it's indeed a good thing the OS has this), then by all
means, isolate the failing component in a process. The OS can guarantee that
it will at least keep your main process safe from wrongdoings as far as
internal state goes (the failing app might still have corrupted your drive
or something annoying like that, but you stand a good chance).

But that's the thing: that's what *processes* are for. Processes only
started working that way when the OS said they did: before that, processes
could exchange memory directly, as ugly and error-prone as that was. Then
the OS said: "No, stop that -- processes are isolated, and if you want to
cooperate, do it explicitly". But threads are not for isolation and they
never were, they're for integration! They're "lightweight processes", where
"lightweight" means "fast because I do the least amount of work possible to
manage them, they're all yours".

Your argument simply doesn't hold water for threads: it's impossible for
thread X to be "the only thread that does what you need". The thread is just
a way to achieve parallel execution! It's not some sort of isolation box for
computations that aren't under your control. What you want is to isolate
*components*, not threads. Unfortunately, most components can't meaningfully
be isolated, since they have to be able to do anything.
 
B

Ben Voigt [C++ MVP]

Peter Duniho said:
Can do what with processes? We've already explained that you can't safely
terminate a process any more than you can safely abort a thread.

Sure you can. Ok, maybe not an arbitrary process, but it's fairly easy
(depending on what resources are required by your requirements) to design a
process that can be terminated at any point in time. It's even easier to
manage exiting your own process, even with hung threads. Theoretically you
can also create a thread that can be safely terminated, but... not with
.NET. .NET holds internal state and accesses it willy-nilly from any
threads in a way that's threadsafe but not abort safe. However, .NET
doesn't implement any external state on its own, only what you ask it to, so
you can manage your external resources in such a way that it's ok for the
process to be interrupted (for example, instead of writing data files that
could be left inconsistent, store your data in an ACID database using
transactions).

It's not an issue of resources "disappearing". It's an issue of them
being left in an inconsistent state.

There is no way for the _operating system_ to ensure that things are never
left in an inconsistent state. Implementors of various data structures can do
things to make sure they are always in a consistent state (e.g. see
"journaled" or "transaction-based"), but that's up to the implementor.
The OS has no way to do this (though it might provide APIs to help an
implementor do it).

Yup, and the problem is that the .NET implementation uses hidden
process-local resources without doing any of this, so no matter what code
you tag on top, calling TerminateThread is gonna crash the process.

No, it can't. All data within a process is owned by the _process_, unless
it's been specifically marked as thread data (*). The OS has no way to
know whether killing a thread allows that data to be cleaned up or not.

(*) (I'm not sure whether .NET supports this, but it is supported in the
unmanaged Windows API...I'm seeing a Thread.AllocateDataSlot() method, and
I suspect this addresses the same issue in managed code. In any case,
note that it only addresses specific thread-local storage, not the OS
objects that might be referenced by that storage, as those are still
per-process and cannot be released when the thread terminates).

But even if it did have a way to know what data could be cleaned up,
_that's not the problem_. Cleaning things up is the least of the
worries. It's the fact that software _does_ stuff, and if it's
interrupted in the middle of _doing_ that stuff, whatever data the
software is operating on could be in an inconsistent state.

Most of your rant seems to be about this question of cleaning up, but
that's not the main problem. That's not what makes killing threads or
processes unsafe, and coming up with a paradigm in which you can ensure
things are cleaned up would _not_ make killing threads or processes a safe
operation.

One reasonable approach, as long as this SharpSSH library doesn't use any
external resources except sockets, would be to put that component in a
separate process, communicate back and forth with Remoting, and provide at
least one call that causes said process to free any shared resources and
then call ExitProcess (in .NET, Environment.Exit rather than Application.Exit,
which only stops message loops) to free the hung thread(s).
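
The "hung component in its own process" idea above can be sketched roughly as follows. This is only an illustration under loud assumptions: the worker is this same executable relaunched with a made-up "--worker" argument, Thread.Sleep stands in for a SharpSSH Connect call that never returns, and the Remoting channel is left out entirely. IsolatedWorkerDemo and RunWithTimeout are hypothetical names, and the sketch assumes the program is built as its own executable.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class IsolatedWorkerDemo
{
    // Starts a program and waits up to timeoutMs for it to exit.
    // Returns true if it exited by itself, false if it had to be killed.
    public static bool RunWithTimeout(string fileName, string arguments, int timeoutMs)
    {
        var info = new ProcessStartInfo(fileName, arguments) { UseShellExecute = false };
        using (Process worker = Process.Start(info))
        {
            if (worker.WaitForExit(timeoutMs))
                return true;

            try
            {
                // Killing the process lets the OS reclaim its threads, handles and sockets.
                worker.Kill();
            }
            catch (InvalidOperationException)
            {
                // The worker exited between the timeout and the Kill call.
            }
            worker.WaitForExit(); // wait until the process is really gone
            return false;
        }
    }

    public static int Main(string[] args)
    {
        if (args.Length == 2 && args[0] == "--worker")
        {
            // Worker role (hypothetical): "hang" simulates a call that never returns.
            if (args[1] == "hang")
                Thread.Sleep(Timeout.Infinite);
            return 0;
        }

        // Parent role: relaunch this same executable as the isolated worker.
        string self = Process.GetCurrentProcess().MainModule.FileName;
        Console.WriteLine("normal worker exited by itself: "
            + RunWithTimeout(self, "--worker ok", 10000));
        Console.WriteLine("hung worker exited by itself:   "
            + RunWithTimeout(self, "--worker hang", 2000));
        return 0;
    }
}
```

The main process never shares memory with the worker, so killing the worker cannot leave the main process's in-memory state half-updated. Anything the worker was doing to external resources (files, a remote session) can still be left incomplete.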
 
W

Willy Denoyette [MVP]

Ben Voigt said:
Sure you can. Ok, maybe not an arbitrary process, but it's fairly easy
(depending on what resources are required by your requirements) to design
a process that can be terminated at any point in time. It's even easier
to manage exiting your own process, even with hung threads. Theoretically
you can also create a thread that can be safely terminated, but... not
with .NET.

Terminating a thread using TerminateThread is safe only as long as you know
exactly what the thread is doing at the moment the OS kills it, and that
is exactly what's impossible to know when calling into arbitrary code.
Whenever your thread runs arbitrary code (third party or not) you can't
safely terminate the thread, because you have no idea what the thread
is doing. This has nothing to do with .NET; this is about Windows.
Run a simple native code program and terminate a thread (using the
TerminateThread Win32 API) while it's allocating memory from the heap: the
heap lock is never released, so all subsequent heap allocations or releases
from other threads will block forever.
Or terminate a thread while it's executing in a critical section: this CS
will never get released (well, actually only when the process terminates), and
another thread that tries to enter the CS will deadlock....

Willy.
 
T

TheSilverHammer

Maybe you are all right that it isn't possible to make a safe thread that can
be killed, the way a process can be, to recover its resources.  If you are
right, I have no idea why, beyond it being 'logistically' impossible rather
than actually impossible.

BTW you can't use TerminateThread() to kill a C# thread (be it from the thread
pool or Thread class) because there is no way to match the Thread ID with the
OS Thread ID. The documentation also says that a thread created with the
Thread Class might be used for multiple things behind the scenes.

So I have been putting as much Duct Tape on SharpSSH as I can and hoping to
catch the lockups, which is very hard since I can't reproduce them easily.
As far as googling SSH for .NET, I am sure you did find quite a few
solutions. Expensive, commercial solutions. Maybe large companies do not
have an issue paying for such things, but the smaller ones I work at are very
cheap. Do you know how long it took me to get them to upgrade just TWO
machines from VS 6.0 to VS 2005? It was like a 2 year long campaign of
pestering.  Eventually, with Vista on the horizon, I had to invent an
unresolvable problem that forced the issue. So yeah, there are other C# SSH
solutions. Really the point wasn't so much about that, but locked threads.

The simple answer with regard to recovering a locked thread is: You can't.
Not, "You can't safely". No, you simply can't. End of Story. Game Over.
Thank you for playing.

Clearly the big issue is / was figuring out why a thread was locking. Even
that was very difficult because the lockup would only occur sometime at night
when no one was around, and in the morning when my App was seized up, even
Dev Studio could not 'break' the App so I could see what was going on with
the threads. If I did try and 'break' it, Dev Studio would lock up until I
used Task Manager to kill my app, and then Dev studio would say it could not
interrupt the Application.

Whoever suggested I use another thread to close the Shell object, I would
like to thank you.  That works although it causes a lot of exceptions and
crashes.  At least I have a working point and the thread is no longer seized
up.
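
The trick of closing the connection from a second thread can be sketched with a plain socket. This is not SharpSSH's API: a raw System.Net.Sockets.Socket stands in for the Shell object, a local TcpListener plays the server that accepts the connection and then goes silent, and UnblockByClosingDemo and CloseReleasesBlockedReceive are made-up names.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class UnblockByClosingDemo
{
    // Returns true if a Receive call that was still blocked after waitMs
    // returned once the socket was closed from the calling thread.
    public static bool CloseReleasesBlockedReceive(int waitMs)
    {
        // Local stand-in for a server that accepts and then never sends anything.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(IPAddress.Loopback, port);
        Socket silentPeer = listener.AcceptSocket();

        var reader = new Thread(() =>
        {
            try
            {
                client.Receive(new byte[1]); // blocks: the peer never sends
            }
            catch (SocketException) { }         // expected once the socket is closed
            catch (ObjectDisposedException) { } // also possible, depending on timing
        });
        reader.Start();

        // Watchdog: give the reader a chance to finish, then close the socket under it.
        bool finishedAlone = reader.Join(waitMs);
        client.Close();
        bool finishedAfterClose = reader.Join(5000);

        silentPeer.Close();
        listener.Stop();
        return !finishedAlone && finishedAfterClose;
    }

    public static void Main()
    {
        Console.WriteLine("close released the blocked reader: "
            + CloseReleasesBlockedReceive(500));
    }
}
```

The blocked thread ends because the call it is stuck in fails, not because anything was aborted, so its own catch and finally blocks still run. That also explains the "lot of exceptions" mentioned above: the code that was blocked has to be prepared for its connection disappearing.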
 
B

Ben Voigt [C++ MVP]

Willy Denoyette said:
Terminating a thread using TerminateThread is safe as long as you know
exactly what the thread is doing at the moment the OS kills the thread,
this is exactly what's impossible to know when calling into arbitrary
code.
Whenever your thread runs arbitrary code (third party or not) you can't
safely terminate the thread, because you don't have an idea what the
thread is doing, this has nothing to do with .NET, this is about Windows.
Run a simple native code program and terminate a thread (using
TerminateThread Win32 API) while it's allocating memory from the heap, all
successive heap allocations or heap releases from other threads will now
block forever.

Only if it's using a shared heap...

Or terminate a thread while it's executing in a critical section, this CS
will never get released (well, actually when the process terminates),
another thread that tries to enter the CS will deadlock....

But you can use a kernel mutex instead, then it'll be marked as abandoned
and you can recover.

My point was that .NET in particular does a bunch of stuff that is not abort
safe. This is far from saying that .NET is the only library that isn't
abort safe, but there is nothing inherently unsafe about Win32 itself.
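
The abandoned-mutex behaviour mentioned above is also visible from managed code through System.Threading.Mutex, which wraps the kernel object. A minimal sketch, assuming a worker thread that simply ends while still holding the mutex as a stand-in for a thread that was terminated; AbandonedMutexDemo and OwnerDiedHoldingMutex are made-up names.

```csharp
using System;
using System.Threading;

public static class AbandonedMutexDemo
{
    // Returns true if the waiter was told that the previous owner
    // ended without releasing the mutex.
    public static bool OwnerDiedHoldingMutex()
    {
        using (var mutex = new Mutex())
        {
            // The worker acquires the mutex and exits without ReleaseMutex(),
            // standing in for a thread that was killed mid-operation.
            var worker = new Thread(() => mutex.WaitOne());
            worker.Start();
            worker.Join();

            try
            {
                mutex.WaitOne();
                return false; // previous owner released it normally
            }
            catch (AbandonedMutexException)
            {
                // This thread now owns the mutex, but whatever the mutex
                // protects may be half-updated and needs to be checked.
                return true;
            }
            finally
            {
                mutex.ReleaseMutex(); // owned on both paths above
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("mutex was abandoned: " + OwnerDiedHoldingMutex());
    }
}
```

The exception only reports that the owner is gone. Deciding whether the protected data is still usable remains the job of the code that catches it, which is the "recover" step.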
 
B

Ben Voigt [C++ MVP]

TheSilverHammer said:
Maybe you are all right about making a safe thread that can be killed the
way
processes can be to recover resources isn't possible. If you are right,
I
have no idea why beyond it is 'logistically' impossible, not actually
impossible.

BTW you can't use TerminateThread() to kill a C# thread (be it from the thread
pool or Thread class) because there is no way to match the Thread ID with
the
OS Thread ID. The documentation also says that a thread created with the
Thread Class might be used for multiple things behind the scenes.

So I have been putting as much Duct Tape on SharpSSH as I can and hoping
to
catch the lockups, which is very hard since I can't reproduce them easily.
As far as googling SSH for .NET, I am sure you did find quite a few
solutions. Expensive, commercial solutions. Maybe large companies do
not
have an issue paying for such things, but the smaller ones I work at are
very
cheap. Do you know how long it took me to get them to upgrade just TWO
machines from VS 6.0 to VS 2005? It was like a 2 year long campaign of
pestering.  Eventually, with Vista on the horizon, I had to invent an
unresolvable problem that forced the issue. So yeah, there are other C#
SSH
solutions. Really the point wasn't so much about that, but locked
threads.

The simple answer with regard to recovering a locked thread is: You
can't.
Not, "You can't safely". No, you simply can't. End of Story. Game
Over.
Thank you for playing.

Ah, well, you asked a slightly different question.

How do you kill a locked thread? You can't safely.
How do you recover a locked thread in .NET? You can't, period.

Clearly the big issue is / was figuring out why a thread was locking.
Even
that was very difficult because the lockup would only occur sometime at
night
when no one was around, and in the morning when my App was seized up, even
Dev Studio could not 'break' the App so I could see what was going on with
the threads. If I did try and 'break' it, Dev Studio would lock up until
I
used Task Manager to kill my app, and then Dev studio would say it could
not
interrupt the Application.

Whoever suggested I use another thread to close the Shell object, I would
like to thank you.  That works although it causes a lot of exceptions and
crashes. At least I have a working point and the thread is no longer
seized
up.

You're welcome. Win32 APIs are designed not to force you into a totally
unrecoverable state.

I suspect if you had used "native-only debugging" you might have had fewer
problems attaching with the debugger.
 
