Encapsulation philosophy question

Peter Duniho · Jun 23, 2007

I have run into a design question that I don't have strong feelings about
one way or the other, so I'm curious what the OOP gurus think about it.

A recent post led me to suggest using a state graph, which got me
wanting to fiddle with a state graph implementation in .NET. So far, so
good. I even have some very simple code that solves his problem (or
perhaps only part of it...he wasn't really specific as to what his
requirements were).

Part of the stated problem, though, allowed for the possibility of wanting
to maintain multiple concurrent states for the graph.

As I originally implemented it, only the state graph is visible to the
client code. It uses the usual linked node mechanism internally, but the
client never sees this. The client just gives it a new state transition
value (char type in this case), and gets back the data associated with the
new state. To maintain multiple concurrent states for the graph would
require choosing from one of at least four possibilities:

1) Multiple instances of the state graph itself. Within this
possibility, there are actually two variations:
-- Complete clone of the state graph
-- Shallow clone of the state graph
Because I don't really like either of those options, I didn't
consider this option at all for very long: a complete clone seems
wasteful, and inefficient if the client has a need to change the state
graph later, while the shallow clone creates a not-very-nice situation in
which changing one instance would change all the other instances (even
though that would likely be the behavior the client would normally desire,
I don't feel that forcing that on the client is a good design).

2) Moving the state transition methods into the state class itself,
and having the client use that to operate within the graph. My main
objection is that every state instance would have to maintain some
information about the parent graph, or alternatively a "default state"
value for returning to the root state on transitions not explicitly
specified.

I have to admit, now that I've written that, my objection to #2 seems a
little petty, but I am having trouble getting past the "a state graph is
already a memory-hungry solution, why make things worse by adding a copy
of the same reference to every single node of the state graph when I could
just access that value from outside the state node directly?" issue.

3) Exposing at least to some limited degree the actual node reference
from within the state graph.

4) Creating a new, public class that's part of the state graph that's
specifically used for traversing the graph, and in which is contained (but
hidden from the client) the node reference.
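
For reference, here is a minimal sketch of the original single-state design described above, written in Python rather than C# for brevity. Every name in it (StateGraph, _Node, add_path, transition) is made up for illustration, and the fall-back-to-root behavior for unknown transitions is an assumption based on the "default state" remark under #2. The point is only that the graph owns the one "current" reference, which is why a second concurrent traversal needs one of the four options.

```python
class StateGraph:
    """Hides its nodes and tracks exactly one current state (hypothetical API)."""

    class _Node:
        def __init__(self, data):
            self.data = data
            self.edges = {}  # transition char -> _Node

    def __init__(self, root_data=None):
        self._root = StateGraph._Node(root_data)
        self._current = self._root  # the single traversal state lives here

    def add_path(self, chars, data):
        """Create (or reuse) nodes along chars and store data at the end."""
        node = self._root
        for ch in chars:
            node = node.edges.setdefault(ch, StateGraph._Node(None))
        node.data = data

    def transition(self, ch):
        """Follow ch from the current state; unknown input returns to the root."""
        self._current = self._current.edges.get(ch, self._root)
        return self._current.data


graph = StateGraph()
graph.add_path("ab", "saw ab")
first = graph.transition("a")   # intermediate node, no data stored
second = graph.transition("b")  # end of the "ab" path
third = graph.transition("z")   # unspecified transition, back to the root
```

Because _current is a field of the graph, two callers walking the same graph would trample each other's position.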

I'm finding myself ambivalent choosing between #3 and #4. I like #3
because it's reasonably efficient. I don't have to expose the actual node
type to the client; I can just have a public abstract base class, or even
just use "object" if I want. But there's something that just seems a
little "off" to me with respect to handing over a reference to the client
and saying "here, hold this...you'll need it every time you want to access
or change the state of the graph, and I'll give you a new one any time the
state changes".
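
A rough sketch of what #3 could look like, again in Python with made-up names (start, transition, and the token itself are assumptions, not the actual implementation). The graph keeps no traversal state; it hands the node back as an opaque token and expects the client to pass it in on the next call.

```python
class StateGraph:
    """Option #3 sketch: the client holds an opaque token (hypothetical API)."""

    class _Node:
        def __init__(self, data):
            self.data = data
            self.edges = {}  # transition char -> _Node

    def __init__(self, root_data=None):
        self._root = StateGraph._Node(root_data)

    def add_path(self, chars, data):
        node = self._root
        for ch in chars:
            node = node.edges.setdefault(ch, StateGraph._Node(None))
        node.data = data

    def start(self):
        """Return a token for the root state; the client treats it as opaque."""
        return self._root

    def transition(self, token, ch):
        """Return (new_token, data); the client must keep the new token."""
        node = token.edges.get(ch, self._root)
        return node, node.data


graph = StateGraph()
graph.add_path("ab", "saw ab")
graph.add_path("ac", "saw ac")

# Two independent traversals over one shared graph.
t1 = graph.start()
t2 = graph.start()
t1, _ = graph.transition(t1, "a")
t1, d1 = graph.transition(t1, "b")
t2, _ = graph.transition(t2, "a")
t2, d2 = graph.transition(t2, "c")
```

The cost is visible in the calling code: every call both consumes and replaces the token, and nothing stops the client from passing a stale one.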

I like #4 because it's closer to what I feel the client is doing. They
get a state graph, and then to move about in the state graph, they get a
traversal object that they can use to do that. Sort of like an
enumerator. But it seems a bit wasteful to me, because for each
concurrently maintained state of the graph, you get two references instead
of just one (basically, you have a reference to a reference). It's not as
bad as my objection to option #2, since there will always be many fewer
concurrently maintained states of the graph than there are nodes in the
graph. But it still seems wasteful.
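
For comparison, a sketch of #4 under the same assumptions (Python instead of C#, and hypothetical names such as StateCursor and cursor). The node reference lives inside a small traversal object, much like an enumerator, so the client never sees or swaps it.

```python
class StateGraph:
    """Option #4 sketch: traversal state lives in a separate cursor object."""

    class _Node:
        def __init__(self, data):
            self.data = data
            self.edges = {}  # transition char -> _Node

    def __init__(self, root_data=None):
        self._root = StateGraph._Node(root_data)

    def add_path(self, chars, data):
        node = self._root
        for ch in chars:
            node = node.edges.setdefault(ch, StateGraph._Node(None))
        node.data = data

    def cursor(self):
        """Hand out a new, independent traversal starting at the root."""
        return StateCursor(self, self._root)


class StateCursor:
    """Public traversal class; the node reference stays hidden inside it."""

    def __init__(self, graph, node):
        self._graph = graph
        self._node = node  # the "reference to a reference" from the post

    @property
    def data(self):
        return self._node.data

    def transition(self, ch):
        """Move along ch, or back to the root if ch is not specified."""
        self._node = self._node.edges.get(ch, self._graph._root)
        return self._node.data


graph = StateGraph()
graph.add_path("ab", "saw ab")
graph.add_path("ac", "saw ac")

c1 = graph.cursor()
c2 = graph.cursor()
c1.transition("a")
r1 = c1.transition("b")
c2.transition("a")
r2 = c2.transition("c")
```

The extra indirection buys a simpler calling convention: the client holds one object per traversal and never has to reassign it.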

Now, taking it as given that I am obviously overthinking this question,
here's what I am actually looking for an answer to:

From a _design_ perspective, would you prefer #3 or #4 as the mechanism?
Note that I'm not actually asking about the memory performance issues
here, since I readily acknowledge that for all my fixation on that, it's
likely not to be all that relevant. What I really want to know is from
the point of view of encapsulating and keeping the client on a strictly
"need-to-know" basis, is it worth adding a new class a la #4, or is it
reasonable to ask the client to hang on to my traversal state for me?

What say you, oh those who live and breathe OOP?

Pete

Tom Spink · Jun 23, 2007

Hi Pete,

Quite interesting. I've got to say I quite like number 3, because it's
basically a context or a token you're giving to the client, which is a
design pattern you see around (think threading/security contexts). Giving
the client a context to operate in also allows some extensibility, if you
need to expand your state graph implementation.

You've mentioned a couple of drawbacks to number 1, but if you want to get
REALLY REALLY complicated you could make it very efficient, by implementing
a copy-on-write mechanism. So, you maintain a shallow copy of the state
graph, and if/when it gets changed, only the affected parts are copied.
Quite a non-trivial thing to implement, but very efficient if you do manage
to do it - there are a number of file systems that do this (for example the
upcoming Btrfs file system from Oracle, ext3cow, ZFS and even NTFS to some
extent). But I digress.
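
One way to read the copy-on-write idea is path copying: a clone starts out sharing every node, and a write copies only the nodes on the path it touches. The sketch below is in Python with hypothetical names (Node, clone, set_path, lookup) and assumes a tree-shaped graph with no cycles; a general state graph with back edges would need more bookkeeping. For simplicity it always copies the written path, even when nothing else shares it.

```python
class Node:
    def __init__(self, data=None, edges=None):
        self.data = data
        self.edges = edges if edges is not None else {}  # char -> Node


class StateGraph:
    """Copy-on-write sketch via path copying (assumes no cycles)."""

    def __init__(self, root=None):
        self._root = root if root is not None else Node()

    def clone(self):
        """Shallow clone: the new graph shares every node with this one."""
        return StateGraph(self._root)

    def set_path(self, chars, data):
        """Store data at the end of chars, copying only nodes on that path."""
        self._root = StateGraph._copy_path(self._root, chars, data)

    @staticmethod
    def _copy_path(node, chars, data):
        # Copy this node (still sharing its children) instead of mutating it.
        if node is None:
            copy = Node()
        else:
            copy = Node(node.data, dict(node.edges))
        if not chars:
            copy.data = data
        else:
            child = copy.edges.get(chars[0])
            copy.edges[chars[0]] = StateGraph._copy_path(child, chars[1:], data)
        return copy

    def lookup(self, chars):
        """Return the data stored at the end of chars, or None if absent."""
        node = self._root
        for ch in chars:
            node = node.edges.get(ch)
            if node is None:
                return None
        return node.data


original = StateGraph()
original.set_path("ab", "saw ab")
original.set_path("cd", "saw cd")

copy = original.clone()
copy.set_path("ab", "changed")  # copies root, "a" and "b" nodes only
```

After the write, the two graphs disagree about "ab" but still share the untouched "cd" branch.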
