Active Directory CRASHED!! But why??


Dvord Direwood

We had an AD crash which we couldn't recover from last month. We
couldn't recover because the consultant who installed the server did not
provide us with the password we would need to get in and repair it from
an alternate boot option.

As you can imagine, it's quite frustrating knowing that this happened,
yet we have no logs, or anything else with which to find out what really
happened.

In retrospect we could have spent more time trying to recover the logs,
but this was a production SBS2000 server, and we couldn't spend the
time, so we reinstalled and restored from tape.

Now that the dust has settled, I'm getting asked "why" a lot and I don't
have many answers.

Here are some of the events which I think might be related either
directly or indirectly, and I'm hoping that some of the more
knowledgeable out there might have some better ideas as to what
happened...

5 weeks before the crash, we installed 2 GB more memory in our server,
bringing it up to 4 GB, to help with the diminished performance of this
SBS2000 server running all SBS services (ISA, IIS, F&P, Exch2k, SAV, etc.)

4 weeks before the crash, Exchange started having problems with
fragmented virtual memory. I began restarting the IS service when
warned about it. I was getting warned about once a week.

About 2 weeks before the crash, the Netsky virus got past our defenses
and played havoc with a few of the shared directories. That was the
only virus found, and it was promptly scoured off.

The day of the crash, about 5 hours before, I created a group policy to
start running Windows Automatic Updates for all connecting clients.

Just 10 minutes before the crash, the Exchange 2000 services stopped,
reporting a problem with Active Directory and saying the server would
need to be rebooted (I should have left it running; it never came back
after that).

2 days after everything was restored, during a scheduled reboot, the
array controller (we're running RAID 5) said there was a problem with
one of the drives. I powered the server off, then on, and no errors
were reported.

2 weeks after that, one of the drives died.

We still have virtual memory fragmentation issues which bring up error
messages in the app. log about once every 30 hours.

I finally got some support from management to spread the load off this
one server, but the question remains: why did AD crash? Was it because
of impending hardware failure of the disk? Was it the aftereffects of
some virus? Could some other virus have done this? Did the Exchange
virtual memory fragmentation lead to it? Or do I just have bad luck and
need to find another line of work?? LOL

Thanks for any help you can send me!
 

Tim Kowal

I had a similar kind of situation and I can really sympathize, right down to
considering finding a new line of work!

My setup had two AD servers, and my problem probably came down to something
like a bad replication, or maybe unhealthy DNS, even though I can't think of a
single thing I could have done wrong. Bottom line is I had been getting DNS
errors for a long time, but I've become immune to them. DNS generates so many
damn errors even in healthy operation, so how are you supposed to feel
genuinely concerned about them?

In the end, I decided that I didn't really do anything wrong, except that in
some cases, when things start to go south, you just really need to know the
filthy innards of the AD system, either how to repair them, or how to
restore them from some kind of backup. But if you didn't do anything wrong,
that is, your system just became temperamental, how can you be reasonably
assured that these measures will fix it?

No, it seems that when AD goes south, you have to reinstall. I've got one
network I'm supporting where AD is doing weird things, and I'm just praying
it will hold out until I can get in there with a new server, replicate,
and take the old one down. But will the errors come over in the replication?
Who knows. AD is probably an ingenious contraption for large enterprises, but
it is just too complicated for small networks with only one or two
servers.

Tim
 

Dvord Direwood

Tim,

Thanks for the reply. I'm glad I'm not losing my mind here. When I
worked with NDS, it was SOOOO much smoother. I could even restore
branches from the directory tree from tape. AD has left a bad taste in
my mouth.

It's worth noting too, that NDS has really suffered in the past few
years too. With NW4.11, NDS was fantastic. We had a global tree
replicated at well over 100 sites across the world. It was completely
flawless.

Now, I have one DC, an SBS2000, and AD gives me fits with 40 users.
There's really something wrong with that.

Ah well. Thanks again!!!
 

Cary Shultz [A.D. MVP]

Dvord,

Not sure why you are experiencing such problems. I take care of some eight
or nine SBS2000 environments where there are no problems whatsoever plus an
additional five or six WIN2000 environments where there are no problems
whatsoever...

Well, no problems with Active Directory.

Your problems seem to be more hardware related....and when a virus hits
there are often problems after-the-fact.

Not having the password to get into DSRM should not necessarily have
prevented you from getting in....

A well planned and implemented Active Directory does not act
temperamentally - typically! ;-)

I might say that you inherited a less-than-optimal AD environment.

Cary
 

Dvord Direwood

Cary,

Thanks! I'm inclined to agree with my inheritance. This particular
SBS2000 environment really gave me my doubts at first. ISA, IIS,
Exchange, Antivirus, filesharing and print services are all on this one
box. There are some 45 users with everything from P5/100s running 98 to
2.6 GHz XP machines. The filesharing load is high, with lots of
graphics-intensive files, TIFFs and the like. Complicating things is the
existence of this antiquated NT4 box on a P2/333 with 128 MB, which does a
little filesharing and provides print services for the one Mac in this
environment as well as for some of the 98 machines.

I do have to say that I really don't have a slew of problems with AD, I
think it runs fine for the most part. In fact, except for our crash, I
never had a lick of problems, and I think that's what has given me the
greatest concern. No warning. Just POW, we're done.

Regarding not having the password to get into DSRM: Well if you know a
way to do it, you might wanna let MS know because I spent just 15 mins
on the phone with them and they said it couldn't be done. Period. So,
the good news is I got my $245 back, but the bad news was rebuilding the
server. I'd hate to think there was a solution, but I would love to
hear what options you have in mind.

I am happy to say that I got the green light to build two more
server-class machines, and I'm going to split up the load so we just
have file/print on one box, exchange, web and ftp on another, and ISA on
the last one. I'm hoping that will provide us the accessibility we need
as well as provide for the general health of the environment.

Thanks!!
 
