Nightmare on Active Directory Street

Phillip Remaker

I just blew a whole day on this problem, incurred a domain downtime of
12 hours and hope someone can learn from my pain. I didn't see anyone
who had this problem exactly.

First off, I know I am a dumbass. No need to tell me. When I started
I had one DC (still have only one, can't replicate... another issue)
and I didn't have a system state backup.

So I repeatedly got this message in my event log:


Event Type: Error
Event Source: NtFrs
Event Category: None
Event ID: 13552
Date: 12/5/2003
Time: 9:38:32 AM
User: N/A
Computer: ACS_PRINT
Description:
The File Replication Service is unable to add this computer to the
following replica set:
"DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"

This could be caused by a number of problems such as:
-- an invalid root path,
-- a missing directory,
-- a missing disk volume,
-- a file system on the volume that does not support NTFS 5.0

The information below may help to resolve the problem:
Computer DNS name is "acs_print.mhcs.a-cs.org"
Replica set member name is "ACS_PRINT"
Replica set root path is "c:\winnt\sysvol\domain"
Replica staging directory path is "c:\winnt\sysvol\staging\domain"
Replica working directory path is "c:\winnt\ntfrs\jet"
Windows error status code is
FRS error status code is FrsErrorJournalInitFailed

Other event log messages may also help determine the problem. Correct
the problem and the service will attempt to restart replication
automatically at a later time.

---------

OK, clear as mud. But I zoomed in on FrsErrorJournalInitFailed.

http://support.microsoft.com/?id=819268 was my answer, and I set the
128M registry key, the colossally long "Ntfs Journal size in MB" (who
writes this crap?) but it did not solve the problem. I only had 400M
on the volume, so that seemed like a reasonable solution. No matter,
let's press on.

Somehow, I thought (INCORRECTLY) that
http://www.jsiinc.com/SUBH/tip3600/rh3605.htm would apply, which
suggested

---------

1. Open a CMD prompt on the domain controller and stop the NetLogon
and Ntfrs services:

net stop NetLogon
net stop Ntfrs

2. Type:

del %systemroot%\ntfrs\jet\Ntfrs.jdb
del %systemroot%\ntfrs\jet\Sys\Edb.chk
del %systemroot%\ntfrs\jet\log\edb.log
del %systemroot%\ntfrs\jet\log\res1.log
del %systemroot%\ntfrs\jet\log\res2.log

3. Type:

net start NetLogon
net start Ntfrs

---------

And then a magic rebuild would occur. So I did that.

Danger Will Robinson!

Lessons:

- DON'T delete a JET database. Back it up.
- DON'T delete what you don't understand.
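The "back it up" lesson can be sketched in a few lines of Python. This is a minimal sketch, not part of the tip quoted above: the function name move_aside and the timestamped folder layout are made up for illustration, and it assumes the services holding the files open (NetLogon and Ntfrs in the steps above) are already stopped. It moves the files into a backup folder instead of deleting them, so they can be put back if the magic rebuild does not happen.

```python
import shutil
import time
from pathlib import Path


def move_aside(paths, backup_root):
    """Move each existing file into a timestamped backup folder.

    Returns the list of backup locations. Files that do not exist are
    skipped. Note: files are stored by base name only, so two inputs
    with the same file name would collide (hypothetical helper, kept
    simple on purpose).
    """
    backup_dir = Path(backup_root) / time.strftime("backup-%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for p in map(Path, paths):
        if p.is_file():
            target = backup_dir / p.name
            shutil.move(str(p), str(target))  # move, never delete
            moved.append(target)
    return moved
```

Called with the five files from step 2 and some backup folder on another volume (the folder is your choice, nothing in the tip names one), this leaves the jet directory just as empty as the del commands would, but with a way back.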

Thus began my nightmare.

Log:

Event Type: Warning
Event Source: NtFrs
Event Category: None
Event ID: 13566
Date: 12/7/2003
Time: 10:16:45 PM
User: N/A
Computer: ACS_PRINT
Description:
File Replication Service is scanning the data in the system volume.
Computer ACS_PRINT cannot become a domain controller until this
process is complete. The system volume will then be shared as SYSVOL.

To check for the SYSVOL share, at the command prompt, type:
net share

When File Replication Service completes the scanning process, the
SYSVOL share will appear.

The initialization of the system volume can take some time. The time
is dependent on the amount of data in the system volume.


-------

Can't become a DOMAIN CONTROLLER? Ohhhh shit. Well, as soon as the
scanning process completes, I'm golden. No big risk, right?

-------

And then the ominous

Event Type: Information
Event Source: BROWSER
Event Category: None
Event ID: 8035
Date: 12/7/2003
Time: 10:14:10 PM
User: N/A
Computer: ACS_PRINT
Description:
The browser has forced an election on network
\Device\NetBT_Tcpip_{3CF51EFF-6EC0-4054-AB0B-A555FB9FFBF7} because the
Domain Controller (or Server) has changed its role


------

Recap: How many domain controllers do I have? One. How many left?
Do the math. Backup? None. Can anyone log in anymore? No.
Exchange: Gone. Files? Gone. Home directories? Gone. Well,
there... but in SIDs that nobody can retrieve. Visions of a week-long
domain rebuild loomed before me. Loss of data. Loss of time.
Legions of infuriated users.

So I'll save you my 8+ hours of frantic calls and thrashing about, and
the general hell-that-was-my-life, including this darling of an error
message:

Event Type: Error
Event Source: NtFrs
Event Category: None
Event ID: 13559
Date: 12/8/2003
Time: 2:35:32 PM
User: N/A
Computer: ACS_PRINT
Description:
The File Replication Service has detected that the replica root path
has changed from "c:\winnt\sysvol\domain" to "c:\winnt\sysvol\domain".
If this is an intentional move then a file with the name
NTFRS_CMD_FILE_MOVE_ROOT needs to be created under the new root path.
This was detected for the following replica set:
"DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"

Changing the replica root path is a two step process which is
triggered by the creation of the NTFRS_CMD_FILE_MOVE_ROOT file.

[1] At the first poll which will occur in 5 minutes this computer
will be deleted from the replica set.
[2] At the poll following the deletion this computer will be re-added
to the replica set with the new root path. This re-addition will
trigger a full tree sync for the replica set. At the end of the sync
all the files will be at the new location. The files may or may not be
deleted from the old location depending on whether they are needed or
not.


----------


Glurph. What have I done? Kill me now.


So I'm just about to commit ritual self-disembowelment when I say, OK,
start at the beginning.

And I look back at the first thing I did.

Ntfs Journal size in MB

Look carefully.

I had typed

Nfts Journal size in MB

Ooo. Let's see if that works if I TYPE IT RIGHT. (Yes, I *KNOW* I
should have cut and pasted. I was going through a VNC window at the
time, so I couldn't. I could have SWORN I double-checked my typing.)
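A typo like that can be caught mechanically. Here is a minimal Python sketch, assuming you can get the list of value names under the key into a list somehow (for example by reading them on the server with the standard winreg module, which is not shown here). The function name near_misses and the 0.8 cutoff are arbitrary choices for illustration, not anything from the KB article.

```python
import difflib

# The value name as the KB article spells it.
EXPECTED = "Ntfs Journal size in MB"


def near_misses(expected, actual_names, cutoff=0.8):
    """Return names that look like `expected` but are not an exact match.

    Registry value names are case-insensitive, so the comparison is
    done on case-folded text.
    """
    want = expected.casefold()
    candidates = [n for n in actual_names if n.casefold() != want]
    folded = {n.casefold(): n for n in candidates}
    close = difflib.get_close_matches(want, list(folded), n=5, cutoff=cutoff)
    return [folded[c] for c in close]


# The second name is just a made-up unrelated value for contrast.
print(near_misses(EXPECTED, ["Nfts Journal size in MB", "Working Directory"]))
```

Anything this returns is a value that is almost, but not quite, the name the service is looking for, which is exactly the situation above.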

--------------

Next log message:

Event Type: Warning
Event Source: NtFrs
Event Category: None
Event ID: 13560
Date: 12/8/2003
Time: 10:27:46 AM
User: N/A
Computer: ACS_PRINT
Description:
The File Replication Service is deleting this computer from the
replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" as an attempt to
recover from the error state,
Error status = FrsErrorMismatchedReplicaRootObjectId
At the next poll, which will occur in 5 minutes, this computer will
be re-added to the replica set. The re-addition will trigger a full
tree sync for the replica set.


--------

OK, well a new error at least. Promising? But still no domain logon.

Finally, I get

Event Type: Information
Event Source: NtFrs
Event Category: None
Event ID: 13553
Date: 12/8/2003
Time: 10:33:02 AM
User: N/A
Computer: ACS_PRINT
Description:
The File Replication Service successfully added this computer to the
following replica set:
"DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"

Information related to this event is shown below:
Computer DNS name is "acs_print.mhcs.a-cs.org"
Replica set member name is "ACS_PRINT"
Replica set root path is "c:\winnt\sysvol\domain"
Replica staging directory path is "c:\winnt\sysvol\staging\domain"
Replica working directory path is "c:\winnt\ntfrs\jet"

---------

And the domain starts working again. I can breathe.

What a nightmare. Yes, I know I generally brought it on myself. And
that I had the answer at square one and screwed up. Yes, yes. I
know. But if someone can learn from this I won't feel unbelievably
stupid.

Last note: I hand-copied a bunch of SYSVOL files during this ordeal
and now cannot delete them. I suppose I need to read up on FRS to
figure out what is going on. I suspect I made a copy, but all copies
are magically replicated and none of the replicated copies can be
deleted if any are in use. I suspect I need to go into safe mode, or
shut off ntfrs and netlogon, to delete the SYSVOL file copies that I
made? As you might expect, I am pretty gun-shy about deleting anything
at this point!!! No more dorking with this system until the holidays
are over, except maybe to add a larger disk and a backup tape.

Thanks for hearing out my rant!

Phil Remaker
Pseudo-Sysadmin for W2K
MUSI - Microsoft Uncertified Systems Idiot
 
