Cluster Resource 'Cluster IP Address' failed, Error 1069



Every Sunday night, during our backups, the cluster detects a failure
of the Public network (error ID 1069). This causes the shared disks
to fail over to the passive node, which in turn causes the backups to
fail. It usually occurs about an hour into the backups. MS KBA
242600 describes in detail "Network Failure Detection and Recovery in
a Two-Node Cluster". I can track exactly what happens in the
cluster.log. What I don't know is why the NIC on the active node loses
its connection to the network. In the cluster.log, I can see a record
of the active node no longer being able to ping the default gateway.
The passive node still can, however, which is why the resources fail
over to it. After about an hour, the resources fail BACK over AGAIN,
because the same thing happens to the second node. At that point the
backups quit trying, and everything is fine until the next Sunday.
Note that a Full backup runs for 16 hours on the Friday prior with no
problems at all. It's the Sunday Incremental backup that's killing us.
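
One way to narrow this down would be to log gateway reachability from
each node independently of the cluster service, then line the outage
times up against the cluster.log and the backup job log. The following
is only a rough sketch under stated assumptions: the function names
ping_once and outage_windows are made up for illustration, it assumes a
Python interpreter is available on the machine doing the monitoring,
and it relies on the system ping command rather than anything
cluster-specific.

```python
import platform
import subprocess
from datetime import datetime, timedelta


def ping_once(host, timeout_s=1):
    """Send one ICMP echo via the system ping; True if it exits with 0.

    Assumption: exit code 0 means a reply was received. Some Windows
    versions also return 0 for "Destination host unreachable" replies,
    so treat the result as a hint, not proof.
    """
    if platform.system() == "Windows":
        # -n = echo count, -w = timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux-style flags: -c = echo count, -W = timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    rc = subprocess.call(cmd, stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    return rc == 0


def outage_windows(samples):
    """Collapse (timestamp, reachable) samples into (start, end) windows.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the end of the data is
    closed at the last sample seen.
    """
    windows = []
    start = None
    last = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))
    return windows
```

Collecting (datetime.now(), ping_once(gateway)) every few seconds on
both nodes during the Sunday backup window, and then running the
samples through outage_windows, would show whether each node really
loses the gateway at the moment the cluster.log says it does, and how
long each loss lasts.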

We are using W2K Advanced Server SP4, two nodes, with a SAN connected
via Fibre HBA. The public connection uses a 3Com Gigabit
3C985B-SX. We have Veritas Volume Manager and the Veritas NetBackup
client running.

The System Event log typically has the following errors or warnings:

