RPC Error 1722 and 1726 - Looking for possible causes

Hector Santos

We have a high-end client/server RPC system and, on rare occasions, we get
customer reports of RPC errors.

It has been our support experience that when customers report RPC-related
errors, it usually points to a hardware and/or networking issue.
Most RPC experts I've talked to directly agree with this.

Although it is a rare report, when it does come in, it is always a frustrating
one because it is not something we are able to solve right away, and it
becomes difficult to convince the customer that the application reporting RPC
errors is basically revealing much deeper issues in the machine, NIC,
cabling or network connection.
As you can imagine, most people, including myself, do not like to hear they need
to fix or look at hardware to solve a "software issue."

Nonetheless, in the end, nearly 100% of the time, after the customer
finally has no choice but to take our advice and investigate the system, the
problem turns out to be one of the following:

- Windows needing service packs,
- Software based system performance issues caused by heavy I/O,
- a bad NIC,
- a bad cluster on the hard drive,
- network interruptions or router problems,
- LAN topology, e.g., daisy chain vs. hub,
- bad motherboards,
- or a revamp of the hardware/machine, etc.

The above is pretty much the recommended order we provide to customers to
analyze RPC related errors before any consideration of a system
upgrade/revamp especially for those who do have older setups.


In any case, it's happening again with a very large customer of ours who
previously experienced the rare RPC 1722/1726 errors a few times a month
or so. After we gave them the above list, they finally opted to upgrade
the system from NT to a high-end Dell server running Windows 2000 Advanced
Server.

Now, it is happening 3-5 times a day and it has now become mission critical
to get this resolved ASAP.

An MSDN KB lookup for "RPC 1722 1726 Windows 2000" resulted in this KB:

"The Cluster Service Detects RPC Errors 1726 and 1722"
http://support.microsoft.com/default.aspx?kbid=326330

Unless I am reading the KB wrong, it seems to indicate the problem is fixed
in Windows 2000 Service Pack 4.

The customer is already using SP4.

However, it also mentions a hotfix that is available by contacting Microsoft.

How do you read the above KB?

What else can you think of that could cause the 1722 and 1726 RPC errors?

Anyway, I have exhausted all possible reasons, so I am trying to see what
I can do within our client/server software to help address the issue once and
for all, like possibly using (or writing) an independent RPC testing tool. Is
there such a thing available now?

In addition, is there any RPC technique that I could use to work around
this?

For example:

Our RPC client (a modem/internet hosting server) issues a 2 second heartbeat
function call to the RPC server. It obtains statistical information with
this server RPC call to display on the host monitor window.

It is during this heartbeat where the function call returns an RPC error and
the host displays the RPC error saying "Critical RPC Communication:
Continue or Stop?"

A major reduction of customer RPC error support calls was achieved by
changing the heartbeat to count three (3) consecutive RPC errors before
issuing the popup message.

We did this because, early on, we discovered that the majority of the RPC
errors were the result of some network interruption, a "blip" in the network
communications.

For example, for customers who had a legacy daisy-chained LAN, if someone
temporarily pulled the cable off one PC on the LAN, our HOST
running anywhere on the LAN would instantly detect an RPC error.

Of course, for these customers, we recommended upgrading the LAN by using a
better network (a HUB/router) to solve this problem.

But we also saw other instances where a "network blip" occurred for
other reasons, such as heavy disk I/O, network congestion, etc.

In short, if the "communications" were slowed down, the RPC error was
detected.

So by adding the error count, it drastically helped reduce the support calls
related to RPC errors. When we do get the calls, it is normally something
related to what I described above.
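
The consecutive-error count described above might look roughly like the following. This is only a sketch of the idea, not the actual product code: the names HeartbeatMonitor, hb_init and hb_record are hypothetical, and the status constants simply mirror the Win32 values for RPC_S_SERVER_UNAVAILABLE (1722) and RPC_S_CALL_FAILED (1726).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Status values mirroring winerror.h; the STATUS_* names are hypothetical. */
#define STATUS_OK                 0L     /* RPC_S_OK */
#define STATUS_SERVER_UNAVAILABLE 1722L  /* RPC_S_SERVER_UNAVAILABLE */
#define STATUS_CALL_FAILED        1726L  /* RPC_S_CALL_FAILED */

/* Hypothetical bookkeeping for the heartbeat; not part of any RPC API. */
typedef struct {
    int  threshold;    /* consecutive failures required before alerting */
    int  consecutive;  /* length of the current run of failed heartbeats */
    long last_error;   /* most recent non-zero status seen */
} HeartbeatMonitor;

void hb_init(HeartbeatMonitor *m, int threshold)
{
    assert(m != NULL && threshold > 0);
    m->threshold   = threshold;
    m->consecutive = 0;
    m->last_error  = STATUS_OK;
}

/* Record the status of one heartbeat call.
 * Returns true once `threshold` failures have occurred in a row,
 * i.e. when the operator popup should be shown. */
bool hb_record(HeartbeatMonitor *m, long status)
{
    if (status == STATUS_OK) {
        m->consecutive = 0;   /* a single success clears the "blip" */
        return false;
    }
    m->last_error = status;
    m->consecutive++;
    return m->consecutive >= m->threshold;
}
```

With a 2 second heartbeat and a threshold of 3, a single dropped call is ignored and the popup only appears after roughly 6 seconds of continuous failure. Keeping last_error around lets the popup (or a log entry) say whether the run ended on 1722 or 1726.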

Anyway, what other RPC (or non-RPC) technique can I add to help
determine whether there is a "real" RPC issue, or to get around this error?
Maybe see whether a re-bind works?
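
To make the re-bind idea concrete, here is one way it could be structured. This is a sketch under assumptions, not a known fix for 1722/1726: call_with_rebind, call_fn and rebind_fn are hypothetical names, and the two callbacks stand in for the real remote call and for whatever code frees and recreates the binding handle. SimLink is a simulated link included only so the sketch can be exercised without a server.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_OK          0L     /* RPC_S_OK */
#define STATUS_CALL_FAILED 1726L  /* RPC_S_CALL_FAILED */

/* Hypothetical callbacks supplied by the application. */
typedef long (*call_fn)(void *ctx);    /* performs the remote call */
typedef long (*rebind_fn)(void *ctx);  /* drops and recreates the binding */

/* Try the call once; on failure, re-bind and retry up to max_rebinds times.
 * Returns the status of the last call, or the re-bind error if re-binding
 * itself fails. */
long call_with_rebind(call_fn call, rebind_fn rebind, void *ctx, int max_rebinds)
{
    assert(call != NULL && rebind != NULL && max_rebinds >= 0);
    long status = call(ctx);
    for (int i = 0; status != STATUS_OK && i < max_rebinds; i++) {
        long rb = rebind(ctx);
        if (rb != STATUS_OK)
            return rb;          /* cannot even re-bind: report that instead */
        status = call(ctx);
    }
    return status;
}

/* Simulated link for demonstration only: calls fail until the link has
 * been re-bound rebinds_needed times. */
typedef struct {
    int rebinds_needed;
    int rebinds_done;
    int calls;
} SimLink;

long sim_call(void *ctx)
{
    SimLink *s = ctx;
    s->calls++;
    return s->rebinds_done >= s->rebinds_needed ? STATUS_OK : STATUS_CALL_FAILED;
}

long sim_rebind(void *ctx)
{
    SimLink *s = ctx;
    s->rebinds_done++;
    return STATUS_OK;
}
```

Retrying like this is only reasonable for calls that are safe to repeat, such as the read-only statistics heartbeat, because after a 1726 the server may already have executed the call. If a fresh binding succeeds where the old one kept failing, that at least suggests a stale connection rather than a server that is down.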

I appreciate any input on this matter, including if you just agree that in
your experience these RPC errors are normally based on some
hardware/performance/network communications issue with the customer setup.

Thanks
 
Hector Santos

I wrote a quick utility that uses RpcMgmtInqStats().

Can I use the Packet In/Out values to measure a "leak" or loss of packets
related to network communications issues?

-- Hector
 
Greg Kapoustin [MSFT]

Please use microsoft.public.win32.programmer.networks for RPC-related
questions. You will be most likely to receive feedback in that group.

RpcMgmtInqStats will not be useful in diagnosing network communication
issues. The RPC_C_STATS_PKTS_IN/RPC_C_STATS_PKTS_OUT values are just the
numbers of network fragments received/sent by the server process, and do not
give any indication of failures that may have occurred at the transport
level.

The best way to diagnose RPC network failures is to examine a network
sniff (packet capture), or to retrieve the RPC extended error information.

Greg
 
