Replication and AD problems: rebuild the domain? Advice sought

Guest

Hi,

One UK secondary school: 9 servers, 3 of which are domain controllers. One DC
runs Exchange 2003; another runs SQL 2000 and SMS 2003. All servers run
Windows Server 2003 Enterprise Edition at the 2003 domain and 2003 forest
functional levels. All DCs are GCs. Single domain, two buildings linked by
fibre, with DCs in each building. The domain was created in November 2003.

We have 650 clients, all running Windows XP Pro with a mix of SP1 and SP2.
Our network sits behind ISA 2000 and we run Symantec AV 8.1.

At the start of the new school year in September we had about two weeks of
sporadic network outages, each lasting a couple of hours, and these outages
caused replication problems. After about a week of stable network operation,
the replication errors became more frequent. Our network monitoring at that
time consisted of pinging servers; all the pings were fine, but the
replication errors continued.

I ran Netdiag and DCdiag, which failed on FSMO: our PDC was down. I seized
the role, and a couple of days later DCdiag passed FSMO. Replication issues
continued, so I installed a trial copy of SolarWinds, and after a couple of
days it showed one of our core switches with 60% packet loss. The switch was
replaced and the replication errors diminished, only to be replaced by GP
1038 and 1050 errors (unable to find gpt.ini).

Under guidance from MS PSS we uninstalled DNS on all but one DC, deleted the
zone and restarted the remaining DC. Our zone was re-created and we
re-installed DNS on another server. We reduced our DCs from 5 to 2 and left
AD to replicate and synchronise over this last weekend.

On Monday morning we arrived at work to discover that a fibre module had
failed sometime on Saturday. We replaced the module, the network was
operational again, and GP seemed to be applying to our clients once more.
Checking the event logs showed only the occasional USERENV error.

Next we discovered that our Exchange 2003 server had failed: the Information
Store wouldn't mount and the event logs showed topology errors. Checking in
System Manager, the Exchange server didn't know about any GC server (we did
have one, and DNS had the appropriate server records!), so I made the
Exchange server a GC; the IS then started and the stores mounted. Now we have
a MAPI external RPC client error in the logs. MS PSS says not to worry, but
whilst we can receive mail, we are unable to send it. Exchange reports that
it is unable to bind to DNS.

However, DNS is working correctly from the Exchange server.

I am fed up. We have half term next week and I am considering our options,
which appear to be:

1. Continue to firefight errors and hope that it all comes under control.
Cons: no guarantee that this will solve our problems.

2. Take down the domain, rebuild the servers and domain, and rejoin the
clients (all 650!). Cons: the time taken. Advantages: we will have a working
domain again that we know is fine. No more firefighting for a while.

or

Could we bring up a new domain, move the clients and users to the new
domain, migrate the group policies and be back online, or would our GC be
suspect?

How about a new forest with a new domain (and a new GC)? Could we migrate (or
move) our clients and users to the new domain in the new forest, or would we
have to rejoin them? I can script user creation, so the biggest problem would
be the computer accounts.
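On scripting user creation: one common way to bulk-create accounts on Windows Server 2003 is csvde in import mode (`csvde -i -f users.csv`). The Python sketch below is only an illustration, not the poster's script. It builds such a CSV from (logon name, given name, surname) tuples. The OU, the `school.local` UPN suffix and the function names are hypothetical placeholders. As far as I know, csvde cannot set passwords, so accounts created this way start disabled and need passwords set and enabling as a separate step.

```python
import csv
import io

# Characters that must be backslash-escaped inside an RDN value (RFC 4514).
_DN_SPECIAL = '\\,+"<>;'

HEADER = ["DN", "objectClass", "sAMAccountName", "userPrincipalName",
          "givenName", "sn", "displayName"]


def escape_rdn_value(value):
    """Escape DN special characters, e.g. 'Smith, Jane' -> 'Smith\\, Jane'."""
    value = value.strip()  # leading/trailing spaces would also need escaping
    escaped = "".join("\\" + ch if ch in _DN_SPECIAL else ch for ch in value)
    if escaped.startswith("#"):  # a leading '#' must be escaped too
        escaped = "\\" + escaped
    return escaped


def build_csvde_file(users, ou_dn, upn_suffix):
    """Return CSV text intended for 'csvde -i -f <file>'.

    users:      iterable of (sAMAccountName, given name, surname) tuples
    ou_dn:      target container (hypothetical, e.g. "OU=Staff,DC=school,DC=local")
    upn_suffix: UPN suffix (hypothetical, e.g. "school.local")
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for sam, given, surname in users:
        display = f"{given} {surname}"
        dn = f"CN={escape_rdn_value(display)},{ou_dn}"
        # The DN contains commas, so the csv module quotes it automatically.
        writer.writerow([dn, "user", sam, f"{sam}@{upn_suffix}",
                         given, surname, display])
    return out.getvalue()


if __name__ == "__main__":
    staff = [("jsmith", "Jane", "Smith"), ("pjones", "Pat", "Jones")]
    print(build_csvde_file(staff, "OU=Staff,DC=school,DC=local", "school.local"), end="")
```

Redirect the output to a file, check it by eye, and try the import against a test OU before running it for all staff and students.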

Sorry about the long post. Hope someone has comments & advice on this.

I am at my wits' end.

Andy.

Oli Restorick [MVP]

Sounds like you've had some problems. Without having all the details, it
sounds as if your DNS and general infrastructure were poorly designed, and
building on top of poor foundations gave you problems later.

In determining whether a complete rebuild is practical, having a grasp of
how much data and e-mail you'd have to migrate would really help you. The
complexity of the Exchange setup has to be a major factor in determining
what to do.

If you do decide to rebuild during half term, getting some outside help may
be worthwhile to ensure that you don't repeat the same mistakes.

The migration to a new forest/domain scenario sounds possible. Make sure
you create a new forest, though; otherwise you'll end up with two DCs
sitting there doing nothing but holding the forest root domain. That's a
waste of hardware and adds unnecessary complexity to the environment, and
nobody needs any extra complexity. You can use a trust relationship to help
here, but remember that DNS settings are probably the most important thing
to get right. If you get DNS wrong, things just will not work reliably.

As for your 1038 and 1050 errors (are you sure you don't mean 1030 and
1058?) and gpt.ini, is there a possibility that somebody has been fiddling
with file permissions on SYSVOL?

Does this article help at all?

http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q250842

Regards

Oli

Oli Restorick [MVP]

While I think of it, have you considered adding MOM 2005 Workgroup Edition
to your environment? I'm a big fan of this product. It can give you early
warnings of things that are not running as smoothly as they should and has a
lot of built-in knowledge about what all the various event IDs really mean.

It's relatively cheap and will monitor up to 10 servers. For the Workgroup
Edition, you just pay for the product. There are no extra licences, unlike
with the fully-fledged version.

Regards

Oli

Guest

Oli, feedback inline:


Oli Restorick said:
Sounds like you've had some problems. Without having all the details, it
sounds as if your DNS and general infrastructure were poorly designed, and
building on top of poor foundations gave you problems later.

But we have a simple setup: one domain in one forest, with fibre links
between both sites. I honestly don't think poor foundations are to blame; it
worked for 9 months without a blip until we had network problems at the
start of September. It is possible that network problems existed during the
summer holidays, which would have been plenty of time for AD to get out of
sync.

In determining whether a complete rebuild is practical, having a grasp of
how much data and e-mail you'd have to migrate would really help you. The
complexity of the Exchange setup has to be a major factor in determining
what to do.

Exchange should be straightforward: we only use POP3 clients at present (120
staff users and about 400 students), and we have a single Exchange 2003
server that uses SMTP and POP3, but that's it.


If you do decide to rebuild during half term, getting some outside help may
be worthwhile to ensure that you don't repeat the same mistakes.

True, but time is against us.

The migration to a new forest/domain scenario sounds possible. Make sure
you create a new forest, though; otherwise you'll end up with two DCs
sitting there doing nothing but holding the forest root domain. That's a
waste of hardware and adds unnecessary complexity to the environment, and
nobody needs any extra complexity. You can use a trust relationship to help
here, but remember that DNS settings are probably the most important thing
to get right. If you get DNS wrong, things just will not work reliably.

As for your 1038 and 1050 errors (are you sure you don't mean 1030 and
1058?) and gpt.ini, is there a possibility that somebody has been fiddling
with file permissions on SYSVOL?

Yes, you are correct: 1030 and 1058 errors. DNS was restructured recently
with the help of MS PSS, but from what I can see it made things worse. I
enabled USERENV logging and the problems were related to an inability to
find a DC. However, all the DNS SRV records were in place and our users could
log in, access their files and print.
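Since both the USERENV log and Exchange point at clients failing to locate a DC or GC, it may help to spot-check the DC locator SRV records explicitly with `nslookup -type=SRV <name>`, run from an affected client and from the Exchange server. The Python sketch below just prints those queries for a given domain, forest and optional site. `school.local` and `Default-First-Site-Name` are placeholders, and the list covers the commonly checked records rather than every record a DC registers.

```python
def dc_locator_srv_names(domain, forest, site=None):
    """Return DC locator SRV record names worth spot-checking.

    domain: DNS name of the AD domain (placeholder, e.g. "school.local")
    forest: DNS name of the forest root; equal to `domain` in a
            single-domain forest like the one described above
    site:   optional AD site name (placeholder, e.g. "Default-First-Site-Name")
    """
    names = [
        f"_ldap._tcp.{domain}",              # LDAP servers in the domain
        f"_ldap._tcp.dc._msdcs.{domain}",    # domain controllers
        f"_ldap._tcp.pdc._msdcs.{domain}",   # PDC emulator
        f"_kerberos._tcp.{domain}",          # KDCs
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kpasswd._tcp.{domain}",           # Kerberos password change
        f"_gc._tcp.{forest}",                # global catalogs
        f"_ldap._tcp.gc._msdcs.{forest}",
    ]
    if site:
        # Site-specific records that clients try first.
        names += [
            f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
            f"_ldap._tcp.{site}._sites.gc._msdcs.{forest}",
        ]
    return names


if __name__ == "__main__":
    for name in dc_locator_srv_names("school.local", "school.local",
                                     "Default-First-Site-Name"):
        print(f"nslookup -type=SRV {name}")
```

Each query should list every DC (or, for the `_gc` and `gc._msdcs` records, every GC) you expect to see. A DC or GC missing from an answer is a more specific lead than "DNS is working".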


Yes, I have seen that article, very useful.

I lack experience of other networks: what do others do when their AD seems
buggered? Just how resilient is AD to replication issues?

Could we use ADMT to migrate over to a new forest? I am wondering what other
gotchas there may be!

Re MOM 2005: it's been on my list for about a month now, but after seeing
your post recommending it I went to see management to ask for a copy.
Apparently the educational price is £72.10 for a licence. Now I am waiting
to find out whether this is per server or for up to ten servers. I have also
enquired about the cost of the full product.

Not being familiar with MOM, is there anything missing from the Workgroup
Edition that you would class as essential? We use ISA, Exchange, SMS 2003,
AD, IIS and SQL 2000.


Thanks for taking the time to post some advice Oli, much appreciated.

rickiez

Here goes.
You could attempt to create a new domain as a child, but like you, I would
always wonder unless I built a completely separate new domain. You could
always create the new domain and use the Active Directory Migration Tool
(ADMT) to migrate the computer accounts over. Just a thought.