Help evaluating a method for surviving system crashes


darkside

Hi,

I just thought of an approach for keeping the system alive when it would otherwise crash
because of small software (SW) problems. Could you please help me evaluate it?

1. Hook the IDT, replacing exception vectors such as memory access violation
and divide-by-zero with our own handling logic.

2. In our handling logic, if the IRQL is above DISPATCH_LEVEL, transfer
control to the original OS handler, which will eventually show a blue screen.
If not, use KeDelayExecutionThread() to hold the faulting thread for a
while, so that the kernel reschedules to other threads. In this case the
system stays alive instead of going to a blue screen.
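To make the decision in step 2 concrete, here is a minimal user-mode sketch of just that logic, under stated assumptions. It is not a driver and does not touch the IDT. The names `sim_irql`, `fault_action`, `decide_fault_action`, and `relative_interval_100ns` are hypothetical, and the `SIM_*` constants only mirror the WDK values of PASSIVE_LEVEL (0), APC_LEVEL (1) and DISPATCH_LEVEL (2). One deliberate difference from the wording above: the sketch treats DISPATCH_LEVEL itself as non-waitable, because KeDelayExecutionThread may only be called at IRQL <= APC_LEVEL. It also shows how a delay in milliseconds maps to that routine's interval argument, which is expressed in 100-nanosecond units, with a negative value meaning "relative to now".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the WDK IRQL values; user-mode sketch only. */
enum sim_irql {
    SIM_PASSIVE_LEVEL  = 0,
    SIM_APC_LEVEL      = 1,
    SIM_DISPATCH_LEVEL = 2
};

/* What the hypothetical replacement exception handler would choose to do. */
enum fault_action {
    FAULT_CHAIN_TO_ORIGINAL, /* hand off to the original OS handler (bug check) */
    FAULT_DELAY_THREAD       /* park the faulting thread and let others run */
};

/* Decision step of the proposal. Waiting is only legal at IRQL <= APC_LEVEL,
 * so DISPATCH_LEVEL and anything above it must go to the original handler. */
static enum fault_action decide_fault_action(int irql)
{
    if (irql >= SIM_DISPATCH_LEVEL)
        return FAULT_CHAIN_TO_ORIGINAL;
    return FAULT_DELAY_THREAD;
}

/* KeDelayExecutionThread takes its interval in 100-ns units, and a negative
 * value means a relative delay. Convert a millisecond count to that form. */
static int64_t relative_interval_100ns(int64_t ms)
{
    assert(ms >= 0);
    return -(ms * 10000);
}
```

For example, a 500 ms hold corresponds to an interval of -5000000. Note that this models only the branch. Parking the thread does not repair whatever state the faulting code has already damaged.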

I've been dedicated to fixing kernel bugs for several years, and it is a real
pity to see the system die so many times just because of a tiny driver problem.
I would like to develop a kernel component that can help with this. It does NOT
target all system crash cases. I'm aware that many crashes are so severe that
holding the thread for a while is no use. It targets software problems such as
divide-by-zero (DBZ), memory access violations, etc. Each of them has a
separate entry in the IDT that can be selectively replaced.

My questions are:
1. Is there any formal way for us to get the IDT address and selectively
replace some of the IDT entries?
2. In practice, how long can KeDelayExecutionThread() hold the faulting
thread?
3. Will the overall mechanism work when driver code raises a kernel crash?

Thank you!

TR
 

Don Burn

As someone who has worked in the fault-tolerant part of the computer
industry, I can tell you the problem is a lot harder than you imagine. I know
of a number of companies working on potential solutions (I am a founder of one
of them), but you are not going to see their technology discussed here; you
won't get that without an NDA.

What I can say is that you either have to wrap a layer of protection around
the whole driver (such as moving it into its own address space) or provide a
way to capture enough system state to roll back to a point before the crash
and do something to avert it. Neither of these is a small task like tweaking
an IDT entry, and neither can easily be explained in a newsgroup.
 

darkside

I don't understand what you mean by "such as moving it into its own address
space" (moving what into whose address space?). Can you explain why?

Regarding the system state, I think the processor should preserve most of it,
if not all. This is the standard kernel exception handling mechanism; the
processor and OS know what they need to keep track of...
 

Don Burn

See my comments on microsoft.public.development.device.drivers. Posting the
same question independently to multiple groups is not a nice idea.
 
