Observer in a distributed environment

Water Cooler v2 · Sep 9, 2009

Machine A needs to send a message to machine B. Machine A has a static
IP but machine B does not.

One option I could think of to solve this problem is that machine B
opens a TCP connection to machine A and then machine A sends the data/
message to machine B. However, this solution has the following
limitations:

a) It is not scalable if there are many machines like machine B to
which the data has to be sent. It might be a heavy drain on the
resources of machine A.

b) It is machine A that needs to send the data when it wants. Machine
B does not know *when* there will be data for it. In the current
design, machine B will have to keep polling machine A repeatedly with
a TCP connection asking if it has any data for it or not. This can get
expensive if there are many machine B's.

Is there a less expensive way to solve this problem? The Observer
design pattern comes to mind. Machine B could subscribe to a
notification from machine A to inform it when data becomes available.
However, how does one implement the pattern in a distributed
environment when machine B does not have a static IP?

Observer aside, is there a way other than using raw sockets for
machine A to send that data to machine B, that would be less expensive?

Peter Duniho · Sep 9, 2009

Machine A needs to send a message to machine B. Machine A has a static
IP but machine B does not.

One option I could think of to solve this problem is that machine B
opens a TCP connection to machine A and then machine A sends the data/
message to machine B. However, this solution has the following
limitations:

a) It is not scalable if there are many machines like machine B to
which the data has to be sent. It might be a heavy drain on the
resources of machine A.

How so? If for every machine B, machine A will need to communicate with
that machine B, then machine A already needs to have the capacity to track
that many machine Bs. If TCP connections aren't a scalable approach, then
probably you have a more fundamental scaling problem, unrelated to the
network i/o specifically.

b) It is machine A that needs to send the data when it wants. Machine
B does not know *when* there will be data for it. In the current
design, machine B will have to keep polling machine A repeatedly with
a TCP connection asking if it has any data for it or not. This can get
expensive if there are many machine B's.

So don't drop the TCP connection. Have machine B connect, and stay
connected; machine A can then use that existing connection any time it
needs to notify machine B. Polling is bad. So don't poll.

My guess is that the biggest issue you may run into would be dealing with
NAT routers, some of which may "time out" the NAT table entry for a given
connection. This should be a much smaller problem with TCP than with UDP,
however. If you have trouble with TCP connections being dropped after
long idle periods (several hours), that can be addressed by implementing
"keep alive" logic (either using the built-in TCP keep-alive, or simply
having your machine A send a short "keep-alive" message every so often).
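To make the built-in option concrete, here is a minimal sketch in Python (the same socket options exist in most socket APIs, including .NET's). The helper name `enable_keepalive` and the timing values are illustrative assumptions, not recommendations, and the tuning options are platform-specific, so the sketch only sets the ones the platform exposes.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keep-alive for an already-created TCP socket.

    idle_s, interval_s and probes are illustrative values only.
    """
    # Portable part: ask the TCP stack to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Tuning knobs are not available on every platform, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

The idle time would need to be shorter than whatever timeout the NAT router in question applies, which you generally have to find out by experiment.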

Is there a less expensive way to solve this problem? The Observer
design pattern comes to mind. Machine B could subscribe to a
notification from machine A to inform it when data becomes available.
However, how does one implement the pattern in a distributed
environment when machine B does not have a static IP?

By making the "subscription" be simply a TCP connection.
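As a rough illustration of "the subscription is the connection", here is a small Python sketch. The class name `Publisher` and its methods are hypothetical, invented for this example; a real server would also need message framing (TCP is a byte stream) and something better than one lock around a list once the number of subscribers gets large.

```python
import socket
import threading


class Publisher:
    """Machine A: every accepted TCP connection is one subscription."""

    def __init__(self, host="127.0.0.1", port=0):
        # port=0 lets the OS pick a free port; a real machine A would
        # listen on a well-known port at its static IP.
        self._server = socket.create_server((host, port))
        self.address = self._server.getsockname()
        self._subscribers = []
        self._lock = threading.Lock()
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            try:
                conn, _ = self._server.accept()
            except OSError:
                return  # listening socket was closed
            with self._lock:
                self._subscribers.append(conn)

    def subscriber_count(self):
        with self._lock:
            return len(self._subscribers)

    def notify(self, payload):
        """Push payload to every subscriber; drop connections that fail."""
        with self._lock:
            alive = []
            for conn in self._subscribers:
                try:
                    conn.sendall(payload)
                    alive.append(conn)
                except OSError:
                    conn.close()  # a dead connection is an unsubscribe
            self._subscribers = alive

    def close(self):
        self._server.close()
        with self._lock:
            for conn in self._subscribers:
                conn.close()
            self._subscribers = []
```

Machine B's side is then just `sock = socket.create_connection(address)` followed by a blocking `sock.recv(...)`: it never polls, and it never needs a static IP because it is the one that initiates the connection.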

Observer aside, is there a way other than using raw sockets for
machine A to send that data to machine B, that would be less expensive?

Be careful with that phrase "raw sockets". It has a very specific meaning
in the network programming arena, and isn't applicable if you're simply
talking about the System.Net.Sockets namespace.

The first question that needs to be asked here is, when you write "many
machine B's", just how "many" are we talking here? Unless we're talking
about a half-million, a million, etc., it's really unlikely that the
number of "machine B's" is going to be an issue.

Pete

Tom Spink · Sep 9, 2009

Hi,

A possible solution is to use broadcast or multicast sockets. If your
machines are on the same network, then all you need to do is broadcast a
message to the network, and any interested machines will pick it up.

In this case, you don't need any TCP connections - Machine A just
broadcasts a message (using, say, UDP) to the network. If Machine B is
correctly configured (i.e. bound to the right port), it'll receive the
message, and can therefore act upon it.

Peter Duniho · Sep 10, 2009

A possible solution is to use broadcast or multicast sockets. If your
machines are on the same network, then all you need to do is broadcast a
message to the network, and any interested machines will pick it up.

In this case, you don't need any TCP connections - Machine A just
broadcasts a message (using, say, UDP) to the network. If Machine B is
correctly configured (i.e. bound to the right port), it'll receive the
message, and can therefore act upon it.

Not quite a valid assumption. Broadcasts are UDP, and thus unreliable.
You can't say without qualification that "any interested machines will
pick it up" or that "it'll receive the message" (you also can't say that
it'll receive the message only once, or that it will be received in the
same order relative to other messages that were sent, but that may or may
not be important to the OP).

At the very least, machine A would need to keep a list of all machine B's
expected to receive the message, and each machine B would have to
acknowledge the message. At that point, you might as well maintain a TCP
connection. It's not like an open-but-idle TCP connection actually costs
anything on the network itself, and the overhead at each endpoint will be
similar to reimplementing the basic functionality some other way.
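To give a feel for that bookkeeping, here is a sketch in Python of just the state machine A would have to carry per broadcast. The class `AckTracker` and its method names are made up for illustration; it does no network I/O, and a real version would also need timers, retransmission and duplicate suppression on the receiving side.

```python
class AckTracker:
    """State machine A would need on top of unreliable datagrams."""

    def __init__(self, subscribers):
        self._subscribers = set(subscribers)
        self._pending = {}  # message id -> subscribers that have not acked
        self._next_id = 0

    def sent(self):
        """Record a new broadcast and return its message id."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = set(self._subscribers)
        return msg_id

    def ack(self, msg_id, subscriber):
        """Record an acknowledgement; duplicates and unknown ids are ignored."""
        waiting = self._pending.get(msg_id)
        if waiting is None:
            return
        waiting.discard(subscriber)
        if not waiting:
            del self._pending[msg_id]  # everyone has confirmed receipt

    def needs_resend(self, msg_id):
        """Subscribers that still have to be sent this message again."""
        return sorted(self._pending.get(msg_id, ()))
```

Note that it already requires machine A to know every machine B by name, which is exactly the per-peer state a TCP connection would have given you for free.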

Pete

Tom Spink · Sep 10, 2009

Maintaining state is, of course, a different kettle of fish - and
wasn't within the scope of my response. In hindsight, I should have
said that the messages would be unreliable (I made the foolish
assumption UDP would be decoded correctly ;-)), and that there's no
guarantee for delivery or ordering or blah blah blah - but I didn't
care.

And, you're absolutely right about the additional overhead not being
worth the effort - if you have to maintain state for a UDP connection,
why not use TCP? And in this case, reliability over speed definitely
seems to be the requirement.
