Strange problem with mixed Windows network


Nick Fotis

Good morning,

I recently became a systems administrator at our company. Some days ago (10
days, to be exact) we noticed a very annoying and strange problem. Let
me offer some background first:

Servers: mostly Windows NT4 SP5 or so, which have been running for some years
now, with SQL servers, Exchange, IIS, etc. Some of these IBM boxes are also
clustered.

The company premises span 3 floors, and there are various versions of
Windows PCs co-existing (from Windows 98 to XP Professional). There are three
switches (Cisco Catalyst 2900 on the lower two floors, Cisco Catalyst 3500
on the 3rd floor), while two NT4 boxes serve as PDC and BDC and another one
serves as WINS server.

Ten days ago, some PCs on the 3rd floor (which is served by the Catalyst
3500) started being extremely slow in everything they did on the network
(browsing, etc.), becoming nearly useless. Running 'ping -t' showed packet
loss of 10-15% (that's unheard of in a switched 100 Mbps network with 10 or
so PCs on a floor). The problem became worse later, with 1-2 more PCs slowing
to a crawl in network operations.
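For anyone who wants to log the loss figure over time rather than eyeball it, here is a minimal Python sketch that pulls the loss percentage out of the summary line that Windows ping prints when it finishes. The sample line is made up for illustration and does not come from the affected PCs.

```python
import re

def loss_percent(ping_output: str) -> float:
    """Return packet loss in percent from a Windows ping summary line."""
    m = re.search(r"Sent = (\d+), Received = (\d+)", ping_output)
    if m is None:
        raise ValueError("no ping summary line found")
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent

# Hypothetical sample line, not captured from the network in this thread.
sample = "    Packets: Sent = 200, Received = 174, Lost = 26 (13% loss),"
print(loss_percent(sample))
```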

None of the other PCs had any problems, though (besides a master browser
election that was being forced by the PCs that had slowed down).
We rebooted every troublesome PC (and a server, to make sure nothing
was wedged), but the trouble persisted. Mind you, we are talking about a
network that had been operating for months without any hint of problems, so
we were baffled by this. A full scan with Norton Antivirus didn't uncover any
active virus/worm, although some infected files were quarantined.

An idea suggested by a colleague was to slow the Ethernet
connections of the troublesome PCs (most but not all of these have Intel
PRO/100 cards) to 10 Mbps - imagine our surprise when suddenly these
started working again!

Any ideas of the cause of the problem??

I'm not very inclined to believe a 'problem cable' is the source of all
this - one and a half rooms (with some exceptions in each room) suddenly
starting to have problems sounds a bit strange to me (it's all done with Cat5
cabling).
I guess I'll have to ask for a cable test in order to be sure?

Hardware incompatibility? Most of the affected PCs have Intel Ethernet
cards, which are most common (and there are 1-2 Realtek cards in the mix).

Software incompatibility?? As far as I know, nothing has changed in the
software front (but some user may have put something that wreaks havoc in
the network - but

Thanks in advance for any ideas you can give on this.

Scratching head,
N.F.
 

Bob Hatcher

Are the switches connected using fixed settings (100 Mbps Full Duplex), not
autosense? They should be. Have you checked spanning tree? Have you captured a
"switch# sho tech" on the switch in question and analyzed the text dump of
the switch? Does the 3500 switch drive the two 2900s, or are they daisy-chained?
Bob
 

Jetro

Do you have the latest Intel driver installed? The driver's Statistics tab
should show both Input and Output errors. Obviously, if the NIC cannot hold
the link speed and mode (duplex), it retransmits almost every segment/packet,
and thus you get network congestion.
 

Nick Fotis

Good morning,

Jetro said:
Do you have the latest Intel driver installed? The driver's Statistics tab
should show both Input and Output errors. Obviously, if the NIC cannot hold
the link speed and mode (duplex), it retransmits almost every segment/packet,
and thus you get network congestion.

I used both the integrated Intel NIC (from an Asus P4P800-VM motherboard)
and a PCI interface NIC.
The drivers are version 7.0.26.0, if that helps. I tried re-installing the
driver, no joy.

I'm sure it's not the Cisco port, since I plugged my PC into the boss'
connection (which still works at 100/FDX), and we noticed the same
behaviour. Nearly half the PCs that are plugged in the Catalyst 3524 exhibit
this behaviour, and the thing that really baffles me is these were working
until 12 days ago.

I moved the PC to another floor and plugged it into one of the Catalyst
2924XLs; it worked splendidly at 100 Mbps on the first try.

Still wondering,
N.Fotis
 

Nick Fotis

Good Morning,

Bob Hatcher said:
Are the switches connected using fixed settings (100 Mbps Full Duplex), not
autosense? They should be. Have you checked spanning tree? Have you captured a
"switch# sho tech" on the switch in question and analyzed the text dump of
the switch? Does the 3500 switch drive the two 2900s, or are they daisy-chained?
Bob

We tried both autosense and fixed 100 Mbps; it still insists on
cooperating only at 10 Mbps.
Also, it seems we get lots of collisions, according to 'show interface' on
the Cisco 3524:

FastEthernet0/14 is up, line protocol is up
Hardware is Fast Ethernet, address is 0008.2149.0d4e (bia 0008.2149.0d4e)
MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec, reliability 255/255, txload
1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not set
Auto-duplex (Half), Auto Speed (10), 100BaseTX/FX
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:01, output hang never
Last clearing of "show interface" counters never
Queueing strategy: fifo
Output queue 0/40, 0 drops; input queue 0/75, 0 drops
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 9000 bits/sec, 1 packets/sec
89509 packets input, 9453386 bytes
Received 665 broadcasts, 0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast
0 input packets with dribble condition detected
208802 packets output, 176043373 bytes, 0 underruns
1547 output errors, 36949 collisions, 1 interface resets
0 babbles, 1547 late collision, 67 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out

In the driver's 'Advanced Options' I have set 10 Mbps/Full duplex.
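One detail in the output above may be worth a second look: the switch port reports Auto-duplex (Half), while the driver is set to full duplex. Late collisions counted on a half-duplex port are commonly associated with a duplex mismatch (or with an out-of-spec cable run), so the following Python sketch simply pulls those counters out of a 'show interface' dump and flags the combination. The function and field names are made up for illustration, and the check is only a hint about where to look; it does not by itself explain the original trouble at 100 Mbps.

```python
import re

def counter(text: str, label: str) -> int:
    """Return the integer that precedes `label` in the dump, or 0 if absent."""
    m = re.search(r"(\d+) " + re.escape(label), text)
    return int(m.group(1)) if m else 0

def summarize(text: str, nic_duplex: str) -> dict:
    """Summarize duplex state and late collisions for one interface block.

    nic_duplex is what the PC's driver is set to ("full" or "half");
    it has to be supplied by hand, the switch cannot see it.
    """
    m = re.search(r"Auto-duplex \((\w+)\)", text)
    switch_duplex = m.group(1).lower() if m else "unknown"
    late = counter(text, "late collision")
    out = counter(text, "packets output")
    return {
        "switch_duplex": switch_duplex,
        "late_collisions": late,
        "late_per_1000_out": round(1000 * late / out, 2) if out else None,
        "duplex_mismatch_suspected": late > 0 and switch_duplex != nic_duplex,
    }

# Lines copied from the FastEthernet0/14 output in this post.
dump = """\
Auto-duplex (Half), Auto Speed (10), 100BaseTX/FX
208802 packets output, 176043373 bytes, 0 underruns
1547 output errors, 36949 collisions, 1 interface resets
0 babbles, 1547 late collision, 67 deferred
"""
report = summarize(dump, nic_duplex="full")
print(report)
```

If the flag comes up, one thing to try is setting both ends the same way (both on auto, or both fixed to the same speed and duplex), clearing the counters and watching whether the late collisions keep growing.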

I have got 1000+ lines of output from 'show technical-support', and
obviously I cannot post all this in the newsgroup. Any hints of where to
start looking for something suspicious?? If you wish, I can send it to you
directly.

The Cisco 2924s are the root of the network, stacked and interconnected. The
3524 hangs off a 100TX port with a Cat5 cable. Regarding the 'spanning tree'
(I'm no expert in this), the 3524 shows:
# sh spanning-tree

Spanning tree 1 is executing the IEEE compatible Spanning Tree protocol
Bridge Identifier has priority 32768, address 0008.2149.0d40
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set, changes 93
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 1, topology change 0, notification 0
....
....
Interface Fa0/2 (port 14) in Spanning tree 1 is FORWARDING
Port path cost 100, Port priority 128
Designated root has priority 32768, address 0008.2149.0d40
Designated bridge has priority 32768, address 0008.2149.0d40
Designated port is 14, path cost 0
Timers: message age 0, forward delay 0, hold 0
BPDU: sent 37746, received 0

One 2924 shows in the spanning tree:


Spanning tree 1 is executing the IEEE compatible Spanning Tree protocol
Bridge Identifier has priority 32768, address 00d0.9767.e480
Configured hello time 2, max age 20, forward delay 15
Current root has priority 32768, address 0008.2149.0d40 [ this is the
Vlan1 of the 3500 ]
Root port is 16, cost of root path is 19
Topology change flag not set, detected flag not set, changes 90
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0

etc.etc.
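For readers who are not spanning-tree experts either: the two dumps above agree that the 3524 is the spanning-tree root, even though the 2924s sit at the centre of the cabling. With IEEE spanning tree the bridge with the lowest identifier wins, comparing priority first and MAC address second. The Python sketch below redoes that comparison with the two identifiers shown above; the priority of 8192 in the second half is a made-up value, only there to show how a lower priority would change the outcome.

```python
def bridge_id(priority: int, mac: str) -> tuple[int, int]:
    """Return a comparable (priority, MAC-as-integer) bridge identifier."""
    return priority, int(mac.replace(".", ""), 16)

# Identifiers as printed in the two 'sh spanning-tree' dumps.
bridges = {
    "3524": bridge_id(32768, "0008.2149.0d40"),
    "2924": bridge_id(32768, "00d0.9767.e480"),
}

# Lowest identifier wins: equal priorities, so the lower MAC decides.
root = min(bridges, key=bridges.get)
print(root)  # 3524

# Hypothetical: give the 2924 a lower priority and it would win instead.
bridges["2924"] = bridge_id(8192, "00d0.9767.e480")
root_after = min(bridges, key=bridges.get)
print(root_after)  # 2924
```

Whether the root placement has anything to do with the slowdown is not established in this thread; the sketch only shows why the 3524 ended up as root with default priorities.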

Still scratching head.

Anyway, I wish you all a Merry Christmas!

Regards,
N.Fotis
 

Nick Fotis

Good morning,

We have found the source (but not the deepest cause, yet) of the problem.

One PC got an irrelevant IP address (e.g. our internal network was
192.168.0.xx, while this one got -via DHCP!- an address like 169.192.45.29).
After removing this stray IP address, the rest of the network came to its
senses and everything works now at 100 Mbps...

Strange things happen sometimes....
Now, we're trying to guess how it got this irrelevant address.
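As a starting point for that guesswork, here is a small Python sketch using the standard ipaddress module to sort an address into three buckets: on the LAN, link-local (the 169.254.x.x range a Windows PC assigns itself when no DHCP server answers), or something else entirely. The /24 mask is an assumption, since the post only says 192.168.0.xx, and the stray address is only quoted as "an address like" 169.192.45.29, so treat the result as illustrative.

```python
import ipaddress

# Assumed mask; the post only gives the network as 192.168.0.xx.
LAN = ipaddress.ip_network("192.168.0.0/24")

def classify(addr: str) -> str:
    """Sort an IPv4 address into LAN, link-local, or foreign."""
    ip = ipaddress.ip_address(addr)
    if ip in LAN:
        return "on the LAN subnet"
    if ip.is_link_local:
        return "link-local (self-assigned, no DHCP answer)"
    return "foreign (look for a second DHCP server or a static entry)"

for candidate in ("192.168.0.17", "169.254.10.20", "169.192.45.29"):
    print(candidate, "->", classify(candidate))
```

The address quoted in the post falls in the third bucket: it is not in the self-assigned 169.254.x.x range, so if it really arrived via DHCP, some other device on the segment was presumably handing out leases.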

Regards, and I wish you a Happy New Year!
N.Fotis
 

Nick Fotis

(OK, following up myself is a sign of brain malfunction, so what?)

Nick Fotis said:
We have found the source (but not the deepest cause, yet) of the problem.

Yeah, we wish...
Strange things happen sometimes....

Now, why am I not too surprised? :-(

Well, the problem cropped up again in the night, and I do not know why!
I returned the troubled PCs to 10 Mbps for now (again), because experimenting
with 100 Mbps/autosense, full duplex/half duplex didn't seem to work (lots of
packets lost, everything is slow, etc.)

Too much fun... and I wonder if this is a hardware problem rather than a
software one...

N.Fotis
 
