reviving ad after first dc crashed

Guest

last month the first w2k dc (dc1) in our network crashed and wasn't
discovered for a few weeks. we have two dc's so the second one (dc2) picked
up the slack. i finally got around yesterday to installing w2k onto a spare
machine and promoting it to a dc (dc3) and moving all fsmo roles to it. so
now i again have two dc's (dc2 & dc3), with dc1 unplugged and in storage.

should i now do the metadata cleanup? i've been noticing a lot of errors in
the event viewer of dc2 and dc3, mostly referring to frs.

on new dc3
ntds replication, event id 1586
the checkpoint with the pdc was unsuccessful. the checkpointing process will
be retried again in four hours. a full synchronization of the security
database to downlevel domain controllers may take place if this machine is
promoted to be the pdc before the next successful checkpoint. the error
returned was: the naming context is in the process of being removed or is not
replicated from the specified server.

event id 13508
the file replication service is having trouble enabling replication from dc1
to dc3 for c:\winnt\sysvol\domain using dns name dc1.domain.com. frs will
keep retrying.

why is it still trying to replicate to dc1 when it's no longer on the domain
and holds no fsmo roles? i've been checking out the metadata
cleanup and some mention that you should not do that until replication has
been completed or else ad will be screwed up.

how can i be sure that replication is completed when it's trying to
replicate to a non existent dc?

when i run dcdiag i get
test:replications
a recent replication attempt failed: from dc1 to dc3
the replication generated an error (1722)
the rpc server is unavailable.
....
[dc1] dsbind() failed with error 1722

test:services
trkwks service is stopped on dc3
.....dc3 failed test services

test:frssysvol
there are errors after the sysvol has been shared
the sysvol can prevent the ad from starting
.....dc3 passed test frssysvol

the time service also doesn't run. when i try w32tm /s at the command prompt
it returns, rpc to local server returned 0x0. i get a lot of rpc errors, but
when i check, rpc is running on both dc's.

i guess i just need some confirmation about metadata cleanup and whatever
other steps i should be taking when you remove a dc from a domain, and whether
replication is reestablished automatically between dc2 and dc3.

sorry for the long post but wanted to be sure you had all the facts. any
help would be much appreciated. thanks for all responses.
 
Mike Shepperd

You need to do the following:

1) Run "netdom query fsmo" on your two remaining dc's to verify that they
both see the fsmo roles in the same place (on a live DC).

2) Metadata Cleanup.
http://support.microsoft.com/default.aspx?scid=kb;en-us;216498

3) Scrub all of your DNS of any entries referring to the dead DC.

4) Make sure that you've made at least one of the other DC's a Global
Catalog (in the properties of its NTDS Settings object in AD Sites and
Services)

5) Make sure that the two remaining DC's are pointed to the same place
(typically one of the DC's) for DNS as well as all the client machines so
that they're all able to resolve the internal addresses.

6) Change the Time Service so that it runs on the new PDC Emulator:
http://www.microsoft.com/technet/pr...ons/ce8890cf-ef46-4931-8e4a-2fc5b4ddb047.mspx

That should cover the bases, let us know if that cleans it all up.
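
For step 2, the cleanup described in that KB article is an interactive ntdsutil session run on a healthy DC. A rough sketch of what gets typed, assuming dc3 is the live DC you connect to and dc1 is the dead one. The index numbers (0 here) are placeholders only; use whatever numbers the list commands print for your domain, your site and the dc1 entry.

```bat
rem started from a command prompt; every line after "ntdsutil" is typed at its prompts
rem the 0 indexes are placeholders, pick the entries the list commands show for dc1
ntdsutil
metadata cleanup
connections
connect to server dc3
quit
select operation target
list domains
select domain 0
list sites
select site 0
list servers in site
select server 0
quit
remove selected server
quit
quit
```

Double-check that the selected server really is dc1 before confirming the removal, since the same command removes whichever server is selected.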

--
Mike Shepperd
MCSE NT4, 2000, 2003
NewFuture Consulting
Seattle, Washington


 
Guest

I think it all worked. I assume I won't know until I see another error in the
Event Viewer. From the logs, it looks like replication happens late in the
evening, 10-11pm. Unless there is a better way to check if replication has
already happened.

Ran the netdom on the other DC and it shows the correct (new) server holding
FSMO roles.
Did the Metadata Cleanup and scrubbed DNS, no errors so I assume it worked
and I got all the entries.
Both DC's are set up as GC so that should be fine.
Changed the DNS settings on both DC's to point at each other. Updated DHCP so
clients will pick up the new DNS next time around.

The Time Service change didn't work. The command listed in the KB doesn't
work in W2K.
w32tm /stripchart /computer:target /samples:1 /dataonly

Should I just config it and not bother checking?
w32tm /config /manualpeerlist:peers /syncfromflags:manual /reliable:yes /update

Thanks Mike for the help.

John


 
Mike Shepperd

Time service is probably my least favorite thing... I think the command you
listed will work for the manual configuration and is probably your best bet.

As for the replication... Support Tools should include repadmin (though it
may already be installed on a DC). You can use repadmin /showreps to see what the
current status of AD replication is, as well as a lot of other options if
you use the /? switch to see what's available. You can also initiate
replication with repadmin, but the syntax is awkward, so in a small
environment it ends up being quicker/easier to just go into AD Sites and
Services and right click on each connection object under the NTDS Settings
object of each DC and choose "Replicate Now".
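
To check now rather than wait for the next Event Viewer entry, something along these lines should do. It assumes the Support Tools are installed and uses the dc2/dc3 names from this thread.

```bat
rem show inbound replication partners and the last attempt/success for each naming context
repadmin /showreps dc3
repadmin /showreps dc2

rem rerun only the replication test against each DC
dcdiag /s:dc3 /test:replications
dcdiag /s:dc2 /test:replications
```

Once the metadata cleanup has replicated, dc1 should no longer be listed as a partner, though it may take a little while for the connection objects to be regenerated.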

Once the time service is set up, it sounds like the issue will be fully
resolved.
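
One caveat on the time service: the w32tm /config and /stripchart switches belong to the newer w32tm that shipped with XP/2003, which would explain why the KB command fails on W2K. A sketch of the Windows 2000 way on the PDC emulator (dc3 here), where time.example.org is a made-up placeholder for whatever time source you actually use:

```bat
rem time.example.org is a placeholder, not a recommendation; substitute your own source
net time /setsntp:time.example.org
net stop w32time
net start w32time

rem confirm which SNTP server is now configured
net time /querysntp

rem ask the local time service to resync
w32tm /s
```

As far as I know 0x0 is the success code, so "rpc to local server returned 0x0" from w32tm /s most likely means the resync request was accepted rather than that something failed.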

--
Mike Shepperd
MCSE NT4, 2000, 2003
NewFuture Consulting
Seattle, Washington


 
