Henry Markov
This problem seems unique to CompactPCI (cPCI) systems and DHCP servers that
support port-based IP addressing, although some of the issues are probably
generic to any RBS system. The long post that follows serves two purposes: it
documents some issues that users of this kind of system should be aware of, and
it asks how XPe can be convinced to start the DHCP client protocol with a
discover message rather than a request for whatever IP address the target
happened to get at FBA.
Our system will consist of multiple cPCI racks, each having 18 server blades and
two gigabit Ethernet switch blades. The switch blades support DHCP and
port-based addressing, whereby the IP address can be correlated to the rack slot
number; e.g. one might choose 192.168.M.N to be assigned to the server in slot N
of chassis M. This allows easy identification of a physical location for errors
identified by IP address.
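As a minimal sketch of the 192.168.M.N scheme described above, the mapping between a physical location and an address could look like this. The function names and the octet range checks are assumptions made for illustration; they are not part of the switch blade's configuration.

```python
import ipaddress
from typing import Tuple


def slot_to_ip(chassis: int, slot: int) -> ipaddress.IPv4Address:
    """Return 192.168.M.N for the server in slot N of chassis M."""
    # Assumed limits: both values must fit in one octet, and the host
    # octet avoids 0 and 255.
    if not 0 <= chassis <= 255:
        raise ValueError(f"chassis {chassis} does not fit in one octet")
    if not 1 <= slot <= 254:
        raise ValueError(f"slot {slot} is not a usable host octet")
    return ipaddress.IPv4Address(f"192.168.{chassis}.{slot}")


def ip_to_slot(addr: str) -> Tuple[int, int]:
    """Recover (chassis, slot) from an address reported in an error."""
    octets = ipaddress.IPv4Address(addr).packed
    if octets[0] != 192 or octets[1] != 168:
        raise ValueError(f"{addr} is not in the 192.168.M.N scheme")
    return octets[2], octets[3]
```

With this, an error logged against 192.168.3.7 maps back to slot 7 of chassis 3 via ip_to_slot("192.168.3.7").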
For an RBS-based architecture we have found the following incompatibilities with
the FBA process:
1. If we execute FBA while DHCP is active, the target is assigned its port-based
IP address, which is apparently "remembered" in the registry. After deployment,
when many servers remote boot at approximately the same time, each starts the
DHCP protocol with a request message (rather than a discover) and all ask for
the identical IP address -- the one that was assigned at FBA. This produces a
race that can leave some servers with no IP address, as follows. Server 1
sends a request for the remembered address, which happens to be the address it
should get under port-based addressing. The DHCP server gives it this address.
Server 2 loses the boot-up race to server 1 and comes up just after it. It also
asks for the remembered address, but the DHCP server refuses and offers a
different address. Server 2 asks one more time for the remembered address, but
the DHCP server cannot oblige. The DHCP server then tells server 2 that it has
no more addresses, because server 2 is refusing the only address that it is
entitled to under port-based addressing. Server 2 stops asking for an address.
2. To defeat the above problem we executed FBA with the DHCP server inactive.
We find that the clients do not remember an address doled out by the DHCP server
but they do remember a 169.254.xxx.xxx address. I believe this is the automatic
private (link-local) range that Windows machines fall back to when they cannot
get an IP address. We observe that all machines start the DHCP protocol by
requesting this address. The DHCP server refuses a few times before the clients
give up on their request and send a discover. The needless exchange of
messages requesting a 169.254.xxx.xxx address results in boot times much longer
than they could be.
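The two cases above can be put side by side in a toy model. This is only a sketch of the exchanges as described in this post, not an implementation of a DHCP client or server. The function name, its parameters, and the message counts (two messages per refused request, four for a normal discover/offer/request/acknowledge exchange) are assumptions made for illustration.

```python
from typing import Optional, Tuple


def boot_client(remembered: Optional[str],
                entitled: str,
                stale_requests: int = 2,
                falls_back_to_discover: bool = False
                ) -> Tuple[Optional[str], int]:
    """Return (leased address or None, messages exchanged) for one client.

    remembered: address the image kept from FBA, or None if it kept none.
    entitled:   the only address port-based addressing allows on this port.
    """
    messages = 0
    if remembered is not None:
        for _ in range(stale_requests):
            messages += 2  # client request + server acknowledgement or refusal
            if remembered == entitled:
                return entitled, messages  # the lucky server in the right slot
        if not falls_back_to_discover:
            return None, messages  # case 1: the client stops asking
    # Case 2 after the refusals, or a client that remembered nothing:
    # discover, offer, request, acknowledge.
    messages += 4
    return entitled, messages
```

In this model a client whose remembered address differs from its entitled one ends with no lease in case 1, and in case 2 pays two extra messages per refused request before the four-message exchange that a discover-first client would have needed on its own. Real boot delay also depends on retry timeouts, which the model ignores.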
How can I convince my XPe image that it should start the DHCP protocol with a
discover message rather than a request for an address that it can never get?
Henry