LAN connect failure after Windows XP re-installed

Guest

The test configuration consists of two nearly identical machines (IBM Thinkpad A30 and A31 with the Intel PRO/100 VE Network interface) running Windows XP Pro and connected by an Ethernet crossover cable. XP on both machines is up to current maintenance levels.

The LAN interface first came up so easily and worked so well (for workgroup file/printer sharing and remote desktop between the two machines) that almost all networking parameters remained at their defaults and I scarcely gave them any thought. IP addresses for the adapters were assigned by DHCP/APIPA (169.254.x.y).

Then, after some months of this, I had a hard disk failure on one machine and was forced to re-install XP on a replacement HDD. This I did using IBM's product recovery CDs, bringing XP up to level again with Microsoft Windows Update. Nothing changed beyond that except for disconnecting cables briefly to swap the HDD and then immediately re-connecting them. The result is a fully functional system except that the LAN TCP/IP interface will not function (no pings across the link).

As a fall-back, I established a PPP (serial crossover cable) link between the two machines. This works fine with either machine "dialing" to the other's "incoming." Dial-up internet access also works without problem on both machines. I assume the physical/MAC layer is working properly since disabling the LAN adapter (from "Network Connections") on either end shows the link going down on the other end. Re-enabling it shows the link coming back up on the other end. There are no diagnostics or any other obvious signs of LAN trouble.

Even with a PPP link established between the two machines, however, I still cannot ping their respective remote LAN addresses.

I have spent considerable time both in searching Microsoft and IBM support and also in trying various things suggested by these sources including manually supplying 192.168.0.x IP addresses for the adapters on both machines and also the netsh int ip reset suggested in several Microsoft network posts. Beyond that I tried things like adding explicit routes to the adapters on the opposing machines and also adding ARP entries for them and their IP addresses. All to no avail - the LAN link itself seems to be working fine, the IPCONFIG, etc. all look as they should, I can ping 127.0.0.1, local IP address and local host by name but I cannot ping across the LAN link.

A symptom which looks interesting is that pings from A, the unchanged and presumably still good system, cause link traffic to be seen at B, the re-installed and presumably bad system, but pings from B cause nothing to be seen at A.

I suspect a problem in addressing of messages by B. One thing I do not understand is that ARP -a on either machine produces "No Arp Entries Found." How can IP traffic be routed on the LAN without MAC address resolution?

I will greatly appreciate any suggestions, comments or, even better, magical solutions to this problem.
 
David Dickinson

When you ping from "A" (the unchanged machine) to "B" (the re-installed
machine) do you get the echoes on "A" (you didn't say)?

What is the status of the NICs in Network Connections?

Does anything show up in the Event Log?

Have you tried a new cable? What caused the hard drive to fail? Could it
have taken the NIC with it? (I've seen many situations where a NIC
/appears/ to be functioning correctly but is, in fact, damaged.)

See:

How to troubleshoot TCP/IP connectivity with Windows XP
http://support.microsoft.com/default.aspx?scid=314067

FYI: The 169.254.x.x address block is used by machines on networks (or not
on networks) where a DHCP server is not available. See

How to Use Automatic TCP/IP Addressing Without a DHCP Server
http://support.microsoft.com/default.aspx?scid=220874

Valid IP Addressing for a Private Network
http://support.microsoft.com/default.aspx?scid=142863

However, if you are using Internet Connections Sharing (ICS), the machine
(the "gateway") that provides ICS is a DHCP server and assigns its NIC the
address 192.168.0.1\255.255.255.0. The other machine will be assigned an
address from 192.168.0.2 - 192.168.0.254 whenever it connects to the
gateway.

See

Description of Internet Connection Sharing in Windows XP
http://support.microsoft.com/default.aspx?scid=310563
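
The address ranges mentioned above can be told apart with a few lines of Python. This is only a minimal sketch using the standard `ipaddress` module; the function name `classify` and the label strings are made up for illustration and are not part of any Windows tool.

```python
import ipaddress

# 169.254.0.0/16 is the block Windows falls back to when no DHCP server answers.
APIPA_BLOCK = ipaddress.ip_network("169.254.0.0/16")
# 192.168.0.0/24 is the range described above for Internet Connection Sharing.
ICS_BLOCK = ipaddress.ip_network("192.168.0.0/24")


def classify(addr: str) -> str:
    """Return a rough label for an IPv4 address (labels are illustrative only)."""
    ip = ipaddress.ip_address(addr)
    if ip in APIPA_BLOCK:
        return "APIPA"
    if ip in ICS_BLOCK:
        return "ICS range"
    if ip.is_private:
        return "other private"
    return "not private"


if __name__ == "__main__":
    for sample in ("169.254.10.20", "192.168.0.2", "10.0.0.1"):
        print(sample, "->", classify(sample))
```

An adapter that reports an "APIPA" address has simply not heard from a DHCP server, which is expected on a two-machine crossover link without ICS.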

A serial connection between the machines will not allow a connection (ping)
to the NIC.

--
David Dickinson
eveningstar at mvps dot org
(Please reply only to the newsgroup)


Guest

Thank you for the response, Dave...

Your questions and my answers:
When you ping from "A" (the unchanged machine) to "B" (the re-installed
machine) do you get the echoes on "A" (you didn't say)?

The following is from Windows Task Manager monitoring of the Local Area Connection link:

Ping from A to B shows 4 groups of 32 bytes plus 40 (168 total) sent and 112 received at A. These are in 4 non-unicasts (sent and received). At B we see 168 sent and 184 received (???) in 4 non-unicasts sent and 4 unicasts received.

Ping from B to A shows 168 bytes sent and 112 received at B in 4 non-unicasts each way. At A we see no traffic whatsoever.

Non-unicasts sent and received periodically increment on each side but the above figures were isolated to the ping activity itself.

What is the status of the NICs in Network Connections?

The NICs on both sides show Connected at 100 Mbps with their MAC addresses, APIPA addresses with Subnet mask of 255.255.0.0 and no default gateway, DNS server or WINS server. On each side we see Packets Sent (almost twice as many on B) but none received (these packet counts do not relate in any obvious way to the WTM stats).

Again, "B" is for "Bad" (I think), the re-installed machine.

Does anything show up in the Event Log?

On the A side we see only "Unable to contact DHCP server" warnings beyond the usual startup info.

On the B side, in addition to usual startup info we see an error message which I have often seen in this process. It is a sign that the two machines are communicating on some level but I do not understand what it is trying to tell me:

Error 8003 from MRxSmb: The master browser has received a server announcement from the computer "A" that believes that it is the master browser for the domain on transport NetBT_Tcpip_{D626062D-509C-4DF6-ACC1. The master browser is stopping or an election is being forced.

Have you tried a new cable?

No, but overall appearance of things makes a cable problem seem unlikely. Unfortunately, getting another one will be somewhat of an exercise.

What caused the hard drive to fail? Could it
have taken the NIC with it? (I've seen many situations where a NIC
/appears/ to be functioning correctly but is, in fact, damaged.)

The HDD apparently died of internal mechanical failure/old age. I had been getting early warnings of that (bad sounds and occasional boot failures) for a few days before it finally went down for good. All else, including this LAN link, continued to work perfectly until then.

Thank you for the additional references. I have gone through the basic troubleshooting procedures more than once but will review all of these. I also intend to review the "normal" startup entries in the Event Log more carefully. They should be more or less identical on these two machines.

I will appreciate your comment on the traffic stats and also on that error entry.
 
Guest

Two post scripts to my earlier reply on this:

1. The "grouping" of bytes in the PING attempts I mentioned was based on my inattention to the data. While the overall byte count for the 4 attempts from each originating system was 168 it appears to have been in 4 bursts of 42 bytes each.

2. On the possibility of a bad cable, we are seeing at least partial one way communication between the NICs. If that were due to a bad cable it would seem that reversing the cable might reverse the asymmetry. I tried this and it didn't. There are, by the way, other signs that B is hearing A but not attempting (successfully) to respond.
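
A note on the 42-byte bursts: that figure matches the size of an ARP request on Ethernet (14-byte header plus 28-byte ARP body), not an echo request carrying the default 32 bytes of ping data. If so, the counters would be showing address resolution attempts that never completed, which would also fit the "No Arp Entries Found" result. The arithmetic is sketched below in Python. The header sizes are standard; the assumption that Task Manager counts frames without padding or checksum bytes is a guess, and the function names are illustrative.

```python
# Header sizes in bytes; the Ethernet figure excludes the 4-byte frame check sequence.
ETHERNET_HEADER = 14      # destination MAC + source MAC + EtherType
ARP_IPV4_BODY = 28        # ARP request/reply for IPv4 over Ethernet
IPV4_HEADER = 20          # IPv4 header without options
ICMP_ECHO_HEADER = 8      # type, code, checksum, identifier, sequence number
DEFAULT_PING_DATA = 32    # default data size of the Windows ping command


def arp_frame_size() -> int:
    """Size of one unpadded ARP frame on Ethernet."""
    return ETHERNET_HEADER + ARP_IPV4_BODY


def echo_frame_size(data_bytes: int = DEFAULT_PING_DATA) -> int:
    """Size of one ICMP echo request frame on Ethernet."""
    return ETHERNET_HEADER + IPV4_HEADER + ICMP_ECHO_HEADER + data_bytes


def guess_frame_kind(total_bytes: int, frame_count: int) -> str:
    """Guess what kind of frame a counter delta represents (assumption-laden)."""
    if frame_count <= 0 or total_bytes % frame_count:
        return "unknown"
    per_frame = total_bytes // frame_count
    if per_frame == arp_frame_size():
        return "arp"
    if per_frame == echo_frame_size():
        return "icmp-echo"
    return "unknown"


if __name__ == "__main__":
    # 168 bytes in 4 frames, as reported in the post above.
    print(guess_frame_kind(168, 4))
```

By the same arithmetic, four echo requests that actually left the adapter would add up to 296 bytes rather than 168.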
 
David Dickinson

Sorry for the delay in responding. I've been swamped.

You said that on the "A" side the event log contains an "unable to contact
DHCP server". The way that you have described your network, it does not
have (or need) a DHCP server. Both machines should have static IP
addresses. Assign 192.168.0.1 to one machine and 192.168.0.2 (mask
255.255.255.0) to the other and let's see what happens.
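
Before testing with static addresses, it is worth confirming that both addresses fall in the same subnet under the chosen mask, since the two machines have no gateway between them. A minimal Python sketch follows, using the standard `ipaddress` module; the helper name `same_subnet` is hypothetical.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, mask: str) -> bool:
    """Return True if both addresses belong to the same network under the mask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{mask}").network
    return net_a == net_b


if __name__ == "__main__":
    # The addresses and mask suggested above.
    print(same_subnet("192.168.0.1", "192.168.0.2", "255.255.255.0"))
```

If this returns False for the pair actually configured, the machines would need a router to reach each other and direct pings over the crossover cable would not be expected to work.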

I agree that a bad cable is unlikely, but with so many different species of
gremlins in the wild, it doesn't hurt to check. Your test would seem to
verify that it is not the cause.

Regarding traffic, it seems normal.
 
Guest

Thank you for following up on this, David...

On the DHCP/APIPA question, I had left both machines with that option because that's the way it was when the machines were communicating successfully (and, as I said before, so effortlessly that I had scarcely thought about it and had left almost everything in the network/link setup at default).

I have, however, tried using static addresses 192.168.x.a/b several times. Behavior remains the same.

One question on your comment that the traffic seems normal: how is it that in pinging from A to B we should see 4 unicasts received at B while in pinging from B to A we see nothing received at A, neither unicasts nor non-unicasts?

In general, while neither side shows any other symptom of problem, it seems that traffic from the NIC on A is reaching B (as sensed and reported by the software monitoring of the Windows Task Manager) but no traffic from B (zero!) is reaching A. This seems to leave two possibilities. Either traffic is not being put on the link by B or it is but A is not clocking it in.
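
The comparison described here can be written down as a small check on counter deltas. This is only a sketch: the byte counts are the ones read by hand from Task Manager earlier in the thread, and `CounterDelta`, `frames_arrived` and `link_report` are hypothetical names. It flags a direction as dead when the sender's counter moved but the receiver's did not. It cannot say which of the two possibilities above is the cause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterDelta:
    """Change in an adapter's byte counters over one ping test."""
    sent: int
    received: int


def frames_arrived(sender: CounterDelta, receiver: CounterDelta) -> bool:
    """True if the sender put bytes on the link and the receiver counted some."""
    return sender.sent > 0 and receiver.received > 0


def link_report(ping_from_a, ping_from_b) -> dict:
    """Each argument is a pair (delta seen on A, delta seen on B) for one test."""
    a_during_a, b_during_a = ping_from_a
    a_during_b, b_during_b = ping_from_b
    return {
        "A->B": frames_arrived(a_during_a, b_during_a),
        "B->A": frames_arrived(b_during_b, a_during_b),
    }


if __name__ == "__main__":
    # Figures as reported in the thread: (A's counters, B's counters) per test.
    report = link_report(
        ping_from_a=(CounterDelta(168, 112), CounterDelta(168, 184)),
        ping_from_b=(CounterDelta(0, 0), CounterDelta(168, 112)),
    )
    print(report)
```

A False entry for one direction only points to a one-way fault on the link or in one of the adapters rather than to a configuration problem, which would normally affect both directions.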

There is an option in the XP / configure / advanced list for the PRO/100 VE adapter, by the way, which is "Log Link State Event." By default this is set to "No" but I turned it on on both sides in the process of troubleshooting this. With one exception, the resulting "E100B" entries appearing in the system log are only normal "link up" and "link down" entries arising from XP startup/shutdown and also link enable/disable on either side.

The exception is one period, several days ago and during which I do not remember what I was doing, when a burst of almost 2000 "Adapter Intel(R) PRO/100 VE Network Connection: Hardware Failure Detected" entries appeared in the log for machine A. This was with no other sign of problem and only during a single period of an hour or two over the more than two weeks this has been going on. Although I do not trust this as a true sign of hardware failure, the machine in question is under warranty. I will see if this is good enough to get a free replacement part out of IBM. If not, I may just go ahead and buy one.

I will follow up on the results of that but, in the meantime, would appreciate your comment on the unicasts received and not received I mentioned above.

Regards, Ez
 
Guest

David, you're not going to believe this (I scarcely can). This is one for the books, I think. Briefly -

The A machine was a few months old, still under warranty. The B machine had a hard disk failure and its XP software was rebuilt from scratch on a new disk. LAN networking between the two was working fine right up through early warnings of the impending disk failure. The A machine was not touched, neither hardware nor software, other than in temporary removal of the LAN cable on the B side to replace its disk. After the re-build, all seemed fine except that LAN networking would not work.

What failed? Well, what changed? The software (and disk) on the B machine. I ran that up one side and down the other, upgrading drivers, downgrading them, etc., checking all diagnostic info XP supplies. Then I double-checked everything on the A side, even though nothing had changed there. All signs pointed to operational hardware and correctly configured software.

Careful observation of traffic flow with Windows Task Manager, however, appeared to show Ethernet traffic flowing from A to B but not from B to A. Still thinking "B" was bad, I more or less abandoned this effort to do other things. Then, two weeks into this and more or less by chance, I noticed the System Event Log of "A" showing a large number of "Hardware Failure Detected" entries for the NIC over one and only one period of two hours or so.

Still thinking the real problem must be on the "B" side, I called IBM since "A" was under warranty. IBM immediately shipped a replacement NIC. I replaced it. The LAN came up immediately with full functionality!

Moral? If all seems superficially OK (normal link up/link down with no diagnostics) but Ethernet traffic flow is not as expected (as monitored on the Networking tab of Windows Task Manager), suspect a bad NIC!
 
