DCOM TCP/IP Problem


MaxB

In our product we have a Client and a Server that use DCOM to communicate
with each other. The communication is connection-oriented TCP/IP.
The Client software runs on Windows 2000 Professional Service Pack 4.
The Server software runs on Windows 2000 Server Service Pack 4.

We have the product installed at a customer where, about once a day, the
Client at Site1 loses its connection to the Server and the following error
is reported in the Event Log: (0x800706BE) The remote procedure call failed.
DCOM calls from Site1 also take much longer than those from the other sites
where the client software is installed.

The latency between all the sites and the Server is 3–5 ms.
The MTU is set to the same value at all sites.

We have also tested with a Client running Windows XP Professional, and DCOM
calls from Site1 still take a long time.

See the details below:
The error occurs on a Client at Site1. Occasionally the Client reports
‘archive unavailable’ and freezes, either permanently or for an
indeterminate time.

All Clients and the Server are connected to identical local network
structures and by identical links to the data centre where the Server is.
The traffic runs through 2 Cisco FWSM contexts. At this point a sniffer PC
is monitoring the inside of the Server's FWSM context. At the Site1 Client,
another sniffer PC is monitoring the local VLAN where the Client is attached.

The sniffer PCs reveal:
- DCOM/RPC communication between Client and Server
- Traffic over ports 135 and 5002-5005
- Traffic over port 5003, initiated by the Client, polls over the same
session every 2 s
- Traffic over port 135, initiated by the Client, polls with new sessions
every 0-120 s, possibly in parallel
- Traffic over port 5002, initiated by the Server, runs at intervals over
parallel sessions
- Traffic over port 5004, initiated by the Client, runs at intervals
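
One way to compare the two sniffer captures is to count packets per DCOM/RPC port. The sketch below assumes the capture has been exported as a list of (source port, destination port) pairs; that export format and the `count_by_port` helper are hypothetical, and the port roles are only those observed in this installation, not general DCOM defaults.

```python
from collections import Counter

# Port roles as observed by the sniffer PCs in this installation
# (hypothetical labels; not general DCOM/RPC defaults).
OBSERVED_PORTS = {
    135: "RPC endpoint mapper, new Client sessions every 0-120 s",
    5002: "Server-initiated, at intervals over parallel sessions",
    5003: "Client polling on the same session every 2 s",
    5004: "Client-initiated, at intervals",
    5005: "DCOM/RPC, role not described",
}


def count_by_port(records):
    """Count packets per observed port.

    records: iterable of (src_port, dst_port) tuples (assumed export format).
    The destination port is checked first, then the source port, so both
    requests and responses on a given port are counted under that port.
    Packets on neither port are ignored.
    """
    counts = Counter()
    for src, dst in records:
        for port in (dst, src):
            if port in OBSERVED_PORTS:
                counts[port] += 1
                break
    return counts


if __name__ == "__main__":
    sample = [(49152, 135), (135, 49152), (49153, 5003), (5003, 49153)]
    print(count_by_port(sample))
```

Running the same count on the Site1 capture and on a capture from another site would show which port carries the extra packets.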

When the error occurs and the Client reports ‘archive unavailable’, traffic
over all ports and established sessions is running normally, that is:
- No TCP timeouts
- No TCP retransmissions
- No break in the DCOM/RPC request/response pattern

One software function, ‘get patient-list’, behaves differently on the Client
at Site1 (and at other sites on the same FWSM context) than on Clients at
sites behind other FWSM contexts, where no software error is reported. This
software function is NOT the root cause of the problem:
- At Site1 the ‘get’ function causes some 1000 packets (a mix of MSS-sized
and small ones) to flow to the Client, taking approx. 5 s
- At other sites the ‘get’ function causes some 50 packets (a mix of
MSS-sized and small ones) to flow to the Client, taking approx. 0.5 s
To sidestep the FWSM completely, the Server VPN was taken out at Site1 and
the problem Client was given a Server-side IP address, without changing
this behaviour.
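
A quick calculation from the approximate figures above shows how the two transfers compare. This uses only the post's numbers (about 1000 packets in 5 s versus about 50 packets in 0.5 s). `transfer_stats` is a hypothetical helper for illustration.

```python
def transfer_stats(packets, seconds):
    """Return (packets per second, milliseconds per packet) for one transfer."""
    rate = packets / seconds
    ms_per_packet = 1000 * seconds / packets
    return rate, ms_per_packet


# Approximate figures from the captures described in the post.
site1 = transfer_stats(1000, 5.0)
other = transfer_stats(50, 0.5)

print("Site1: %.0f packets/s, %.1f ms per packet" % site1)
print("Other sites: %.0f packets/s, %.1f ms per packet" % other)
print("Packet ratio: %.0fx, duration ratio: %.0fx" % (1000 / 50, 5.0 / 0.5))
```

Site1 receives about 20 times as many packets but the transfer takes only about 10 times as long. The per-packet throughput at Site1 is therefore not lower, which suggests the delay comes from the larger number of packets rather than from a slower link. The Site1 figure of roughly 5 ms per packet falls within the quoted 3–5 ms site-to-server latency. That comparison alone does not establish a cause.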

Does anyone know whether something in DCOM/RPC or TCP/IP on Windows 2000
could explain why there are 1000 packets in the response to Site1 but only
50 packets in the response to the other sites? Or could something else be
the cause?