Thread safety help on Hashtable

  • Thread starter mattd0878-usenet

mattd0878-usenet

I'm sort of unclear on what is thread safe and what isn't when using
Hashtable. The way I understand the documentation is that multiple
threads can simultaneously be reading an instance of Hashtable while
at the same time a single thread is writing to it. Is this correct?
The second part of my question is what exactly is considered a "write"
if I've added a reference object as the value? Is it just calls to
Hashtable methods like .Add and .Remove as well as setting the key to
a new value? Can I safely modify or reassign the reference object in
the Hashtable value? So for instance is this code thread safe?

class Server
{
    public IPEndPoint serverEndPoint;
}

Hashtable ht = new Hashtable();
Server server = new Server();

server.serverEndPoint = new IPEndPoint(123456, 8000);

ht.Add(1, server);

server.serverEndPoint.Port = 8001;


What about?


class Server
{
    public IPEndPoint serverEndPoint;
}

Hashtable ht = new Hashtable();
Server server = new Server();

server.serverEndPoint = new IPEndPoint(123456, 8000);

ht.Add(1, server);

server = new Server();
server.serverEndPoint = new IPEndPoint(123456, 8001);


Thanks for any help
 

Pavel Minaev

> I'm sort of unclear on what is thread safe and what isn't when using
> Hashtable. The way I understand the documentation is that multiple
> threads can simultaneously be reading an instance of Hashtable while
> at the same time a single thread is writing to it. Is this correct?

Yes.
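
To make the "multiple readers, one writer" rule concrete, here is a minimal C# sketch (HashtableReadersDemo and its parameters are hypothetical names). Reader threads index the Hashtable with no lock, while every write goes through one lock object. With a single writer thread the lock is not strictly required, but it keeps the code correct if more writer threads are added later, since only one of them can be writing at a time.

```csharp
using System.Collections;
using System.Threading;

// Hypothetical demo: unlocked readers plus lock-serialized writes, the
// "multiple readers, single writer" usage that Hashtable documents as safe.
public static class HashtableReadersDemo
{
    public static int Run(int itemCount, int readerCount)
    {
        var table = new Hashtable();
        var writeLock = new object();   // every writer must take this lock

        var writer = new Thread(() =>
        {
            for (int i = 0; i < itemCount; i++)
            {
                lock (writeLock)
                {
                    table[i] = "value " + i;
                }
            }
        });

        var readers = new Thread[readerCount];
        for (int r = 0; r < readerCount; r++)
        {
            readers[r] = new Thread(() =>
            {
                int seen = 0;
                for (int i = 0; i < itemCount; i++)
                {
                    // No lock: the indexer returns null for keys not added yet.
                    if (table[i] != null)
                        seen++;
                }
            });
        }

        writer.Start();
        foreach (Thread reader in readers) reader.Start();
        writer.Join();
        foreach (Thread reader in readers) reader.Join();

        return table.Count;
    }
}
```
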

> The second part of my question is what exactly is considered a "write"
> if I've added a reference object as the value? Is it just calls to
> Hashtable methods like .Add and .Remove as well as setting the key to
> a new value? Can I safely modify or reassign the reference object in
> the Hashtable value? So for instance is this code thread safe?

What's not fine is anything that modifies the hashtable itself. Note
that the hashtable contains references to objects, not the objects
themselves - so modifying fields of those objects does not modify the
hashtable (though it does mess things up if the modified fields are
used in GetHashCode() of an object that is used as a key - but this is
separate, and independent of any threading issues).

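
The GetHashCode() caveat only applies to keys: Hashtable hashes keys, never values. A minimal, single-threaded sketch with a hypothetical MutableKey class shows what goes wrong when a key's hash-relevant field changes after the key has been added: the entry stays filed under the old hash code, so a lookup with the mutated key searches in the wrong place.

```csharp
using System.Collections;

// Hypothetical key type whose hash code depends on a mutable field.
public sealed class MutableKey
{
    public int Id;   // mutable AND used by GetHashCode: the problem case

    public MutableKey(int id) { Id = id; }

    public override int GetHashCode() { return Id; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.Id == Id;
    }
}

public static class MutableKeyDemo
{
    // Returns whether the entry can still be found after its key is mutated.
    public static bool FoundAfterMutation()
    {
        var table = new Hashtable();
        var key = new MutableKey(1);
        table.Add(key, "server");

        key.Id = 2;                     // entry is still stored under hash code 1
        return table.ContainsKey(key);  // lookup now uses hash code 2
    }
}
```
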
> class Server
> {
>     public IPEndPoint serverEndPoint;
> }
>
> Hashtable ht = new Hashtable();
> Server server = new Server();
>
> server.serverEndPoint = new IPEndPoint(123456, 8000);
>
> ht.Add(1, server);
>
> server.serverEndPoint.Port = 8001;

Maybe ok, maybe not - if the object were used as a key, it would depend
on the implementation of IPEndPoint.GetHashCode(). Regardless, no
threading issues here.

> What about?

> class Server
> {
>     public IPEndPoint serverEndPoint;
> }
>
> Hashtable ht = new Hashtable();
> Server server = new Server();
>
> server.serverEndPoint = new IPEndPoint(123456, 8000);
>
> ht.Add(1, server);
>
> server = new Server();
> server.serverEndPoint = new IPEndPoint(123456, 8001);

This is definitely fine. You add the value of the variable "server" -
which was a reference to some object - to the hashtable. The variable
itself is not linked to the hashtable in any way, so when you later
change it (to reference a new object), this has no effect on the hashtable.
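
A small single-threaded sketch of both cases from the original question (ReferenceDemo is a hypothetical name; Server mirrors the posted class). Mutating the shared object is visible through the hashtable entry, while reassigning the local variable afterwards is not.

```csharp
using System.Collections;
using System.Net;

public class Server
{
    public IPEndPoint serverEndPoint;
}

public static class ReferenceDemo
{
    // Returns the port seen through the hashtable after each change.
    public static int[] Run()
    {
        var ht = new Hashtable();
        var server = new Server();
        server.serverEndPoint = new IPEndPoint(123456, 8000);
        ht.Add(1, server);

        // Case 1: mutate the object the hashtable entry refers to.
        server.serverEndPoint.Port = 8001;
        int afterMutation = ((Server)ht[1]).serverEndPoint.Port;

        // Case 2: point the local variable at a brand-new object.
        server = new Server();
        server.serverEndPoint = new IPEndPoint(123456, 8002);
        int afterReassign = ((Server)ht[1]).serverEndPoint.Port;

        return new[] { afterMutation, afterReassign };
    }
}
```

Both values come back as 8001: the entry still refers to the first Server object, which was mutated but never replaced.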

By the way, are you only using Hashtable, and not
Dictionary<TKey,TValue>, because you want to get locking for free?
 

MattD

Hi Pavel,
Yes, I initially chose a Hashtable because of the automatic locking on
the reads, which, as I understand it, a Dictionary<TKey,TValue> doesn't
have, although I'm not opposed to doing all of the locking manually if
there are benefits to doing so. The first example I gave is actually
closer to what I want to do, but with custom objects. Ultimately, what
I'm trying to do is put custom objects into a data structure that is
accessible by multiple threads and populate properties in each object
over time as certain events occur. So if I were to create a class to
use with a Hashtable and implement a version of GetHashCode() for the
class, does that sound like it would work? Let me know if you have any
other suggestions. Thanks again.
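
A sketch of one way this could look, assuming each entry is identified by a number such as an audit number (MessageKey, MessageState and CustomKeyDemo are hypothetical names). Only the key class overrides GetHashCode() and Equals(), and both depend on a readonly field, so they cannot change while the key is in the table. The value object's fields can be filled in later without affecting the table at all. With a plain int key, as in the first example, no custom GetHashCode() is needed.

```csharp
using System.Collections;

// Hypothetical key: hash code and equality depend only on a readonly field.
public sealed class MessageKey
{
    private readonly int auditNumber;

    public MessageKey(int auditNumber) { this.auditNumber = auditNumber; }

    public override int GetHashCode() { return auditNumber; }

    public override bool Equals(object obj)
    {
        var other = obj as MessageKey;
        return other != null && other.auditNumber == auditNumber;
    }
}

// Hypothetical value: its fields can change freely; values are never hashed.
public sealed class MessageState
{
    public string Response;   // filled in later, e.g. when a reply arrives
}

public static class CustomKeyDemo
{
    public static string Run()
    {
        var table = new Hashtable();
        table.Add(new MessageKey(42), new MessageState());

        // A different but equal key instance finds the same entry.
        var state = (MessageState)table[new MessageKey(42)];
        state.Response = "OK";

        return ((MessageState)table[new MessageKey(42)]).Response;
    }
}
```
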
 

Peter Morris

Personally, I wouldn't use Hashtable. Not only is it not type safe, but
the locking on it is too fine grained. For example, the locking might be
suitable for adding/removing/changing etc., but what about something like
this:


if (!table.ContainsKey(32))
    //01: It didn't exist, but another thread may have added it by now
    table[32] = new Data();
//02: We've just replaced it; who's to say 2 or more threads haven't also added it?
return table[32];
//03: Now we are assuming it still exists, but another thread may have removed it


What you need is something like

public class Meh
{
    readonly object SyncRoot = new object();
    readonly Dictionary<int, Data> Table = new Dictionary<int, Data>();

    public Data DoSomething(int id)
    {
        Data d;
        lock (SyncRoot)
        {
            // Lookup and insert happen under the same lock, so no other
            // thread can add or remove the entry in between.
            if (Table.TryGetValue(id, out d))
                return d;
            d = new Data();
            Table[id] = d;
        }
        return d;
    }
}

Here YOU have defined the granularity of the locking. Well, I have, but you
know what I mean :)
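
A runnable variant of the Meh idea (Data, GetOrAdd and MehDemo are placeholder names; GetOrAdd plays the role of DoSomething), with a small check that many concurrent callers asking for the same id all receive the same instance. Because the lookup and the insert happen inside one lock, the races described in comments 01-03 cannot occur.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class Data { }

public class Meh
{
    private readonly object syncRoot = new object();
    private readonly Dictionary<int, Data> table = new Dictionary<int, Data>();

    // Check-then-add runs entirely inside one lock.
    public Data GetOrAdd(int id)
    {
        lock (syncRoot)
        {
            Data d;
            if (!table.TryGetValue(id, out d))
            {
                d = new Data();
                table[id] = d;
            }
            return d;
        }
    }
}

public static class MehDemo
{
    // Many parallel callers ask for the same id; returns how many distinct instances they got.
    public static int DistinctInstances(int callers)
    {
        var meh = new Meh();
        var results = new Data[callers];
        Parallel.For(0, callers, i => { results[i] = meh.GetOrAdd(32); });

        // Data does not override Equals, so this counts distinct references.
        return new HashSet<Data>(results).Count;
    }
}
```
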
 

Peter Morris

> though it does mess things up if the modified fields are
> used in GetHashCode() for the object

I'd like it if one could optionally implement an IChangeableHashCode
interface on an object, which would have a single HashCodeChanged event
providing the old hash code and the new one. Hashed containers could check
for this interface and subscribe to the event.

The reason is that when you use something like NHibernate your identity
really should be the DB identity, but new objects don't yet have a DB
identity. At the point you save it would be useful if the hash code could
change.

Just thinking in writing :)
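
Purely as an illustration of the idea, here is a sketch under these assumptions: IChangeableHashCode, HashCodeChangedEventArgs, Entity and RehashingSet are hypothetical types (the .NET class library has no such interface), buckets are keyed by the raw hash code, and the set is neither thread safe nor checked for duplicates. An unsaved Entity uses reference identity; once DbId is assigned, it raises the event and the set moves it to the bucket for its new hash code.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical interface from the post.
public interface IChangeableHashCode
{
    event EventHandler<HashCodeChangedEventArgs> HashCodeChanged;
}

public sealed class HashCodeChangedEventArgs : EventArgs
{
    public HashCodeChangedEventArgs(int oldHashCode, int newHashCode)
    {
        OldHashCode = oldHashCode;
        NewHashCode = newHashCode;
    }
    public int OldHashCode { get; private set; }
    public int NewHashCode { get; private set; }
}

// Hypothetical entity: reference identity until saved, DB identity afterwards.
public sealed class Entity : IChangeableHashCode
{
    private int? dbId;
    public event EventHandler<HashCodeChangedEventArgs> HashCodeChanged;

    public int? DbId
    {
        get { return dbId; }
        set
        {
            int oldHash = GetHashCode();
            dbId = value;
            int newHash = GetHashCode();
            EventHandler<HashCodeChangedEventArgs> handler = HashCodeChanged;
            if (handler != null && oldHash != newHash)
                handler(this, new HashCodeChangedEventArgs(oldHash, newHash));
        }
    }

    public override int GetHashCode()
    {
        return dbId.HasValue ? dbId.Value : RuntimeHelpers.GetHashCode(this);
    }

    public override bool Equals(object obj)
    {
        var other = obj as Entity;
        if (other == null) return false;
        if (dbId.HasValue && other.dbId.HasValue) return dbId.Value == other.dbId.Value;
        return ReferenceEquals(this, other);
    }
}

// Hypothetical set: buckets items by hash code and re-buckets them when notified.
public sealed class RehashingSet<T> where T : class
{
    private readonly Dictionary<int, List<T>> buckets = new Dictionary<int, List<T>>();

    public void Add(T item)
    {
        GetBucket(item.GetHashCode()).Add(item);
        var changeable = item as IChangeableHashCode;
        if (changeable != null)
            changeable.HashCodeChanged += OnHashCodeChanged;
    }

    public bool Contains(T item)
    {
        List<T> bucket;
        return buckets.TryGetValue(item.GetHashCode(), out bucket) && bucket.Contains(item);
    }

    private void OnHashCodeChanged(object sender, HashCodeChangedEventArgs e)
    {
        var item = (T)sender;
        List<T> oldBucket;
        if (!buckets.TryGetValue(e.OldHashCode, out oldBucket)) return;
        int index = oldBucket.FindIndex(x => ReferenceEquals(x, item));
        if (index < 0) return;
        oldBucket.RemoveAt(index);
        GetBucket(e.NewHashCode).Add(item);
    }

    private List<T> GetBucket(int hashCode)
    {
        List<T> bucket;
        if (!buckets.TryGetValue(hashCode, out bucket))
        {
            bucket = new List<T>();
            buckets.Add(hashCode, bucket);
        }
        return bucket;
    }
}
```
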
 

MattD

Thanks for the response. I apologize for being too vague. I was trying
to use a simple example that I could then extrapolate to my project.
I'll try to be more specific here. I'm creating a web service using WCF,
hosted in a Windows service. All of the clients consuming the service
will make a call and pass in some data to a method called
ProcessMessage. It is the job of ProcessMessage to create an instance of
the RequestAndResponseMessages class, add the instance to a data
structure when the method begins, and then kick off further processing
by two other methods called ProcessRequestMessage and
ProcessResponseMessage, each of which runs on its own thread (one sends
data over a socket and the other receives data on the socket. A
requirement is that everything must be sent over a single socket, which
is why multiple sockets aren't being used). All ProcessRequestMessage
does is read instances of RequestAndResponseMessages and send data over
the network. ProcessResponseMessage reads the data from the network and
populates properties in RequestAndResponseMessages that were previously
null, as well as a boolean property that indicates that the response
message is ready to be sent back to the client. As you mentioned, the
question also relates to whether I need to make RequestAndResponseMessages
thread safe as well, even though there isn't a chance that a property
could be overwritten; that is a separate issue from the thread safety of
a Hashtable. Below are the basics of the program. Let me know if I need
to provide more info.

class NetworkManager
{
    // multiple client threads calling this method simultaneously
    public Message ProcessMessage(Message message)
    {
        messageGuid = Guid.NewGuid();
        messageID = this.GetMessageID();
        message.AuditNumber = messageID;
        transmissionDateAndTime = message.TransmissionDateAndTime;

        requestAndResponseMessages = new RequestAndResponseMessages();
        requestAndResponseMessages.MessageGuid = messageGuid;
        requestAndResponseMessages.RequestMessage = message;
        requestAndResponseMessages.EncodedRequestMessage = message.EncodeRequest();

        lock (this.requestAndResponseMessagesHashTableLock)
        {
            this.requestAndResponseMessagesHashTable.Add(messageID, requestAndResponseMessages);
        }

        this.sendQueue.Enqueue(requestAndResponseMessages.RequestMessage.AuditNumber);

        while (true)
        {
            if (requestAndResponseMessages.Completed == true)
            {
                break;
            }
            else
            {
                Thread.Sleep(10);
            }
        }

        lock (this.requestAndResponseMessagesHashTableLock)
        {
            this.requestAndResponseMessagesHashTable.Remove(messageID);
        }
    }

    // running on own thread
    private void ProcessRequestMessage()
    {
        object hashtableKey;
        RequestAndResponseMessages requestAndResponseMessages = null;

        while (true)
        {
            // using the synchronized version, which is taken care of in the constructor (not shown here)
            hashtableKey = this.sendQueue.Dequeue();

            requestAndResponseMessages = (RequestAndResponseMessages)
                this.requestAndResponseMessagesHashTable[hashtableKey];

            this.SendToSocket(requestAndResponseMessages.EncodedRequestMessage);

            // used for logging; there is no modification in this method
            ThreadPool.QueueUserWorkItem(new WaitCallback(CreateNetworkLogEntryUsingThreadPool),
                new object[] { requestAndResponseMessages, MessageCategory.Request });
        }
    }

    // running on own thread
    private void ProcessResponseMessage()
    {
        byte[] receivedData = null;
        Message responseMessage;
        RequestAndResponseMessages requestAndResponseMessages = null;

        while (true)
        {
            receivedData = this.ReceiveFromSocket();

            responseMessage = new Message();
            responseMessage.DecodeResponse(receivedData);

            // response includes the id to match it with the request, and thus the key in the hashtable
            requestAndResponseMessages = (RequestAndResponseMessages)
                this.requestAndResponseMessagesHashTable[responseMessage.AuditNumber];

            requestAndResponseMessages.ResponseMessage = responseMessage;
            requestAndResponseMessages.EncodedResponseMessage = receivedData;
            requestAndResponseMessages.Completed = true;

            // used for logging; there is no modification in this method
            ThreadPool.QueueUserWorkItem(new WaitCallback(CreateNetworkLogEntryUsingThreadPool),
                new object[] { requestAndResponseMessages, MessageCategory.Response });
        }
    }

    private object requestAndResponseMessagesHashTableLock;
    private Hashtable requestAndResponseMessagesHashTable;
    private Queue sendQueue; // using the synchronized version, which is taken care of in the constructor (not shown here)
}


class RequestAndResponseMessages
{
    public Message RequestMessage;
    public Message ResponseMessage;
    public Guid MessageGuid;
    public byte[] EncodedRequestMessage;
    public byte[] EncodedResponseMessage;
    public bool Completed;
}
 

MattD

> The "client" being the remote computer, I assume? When they call
> ProcessMessage, does each client get its own thread for the call? The
> comment in your code suggests it does, but I want to verify, since I'm not
> that familiar with the web service architecture.

This is my first venture into WCF, but yes, this is how I understand it
to work. Yes, clients are remote computers. I'm hosting a web service
that is accessible over HTTP in a process that runs as a Windows
service (setting that up is all part of the WCF framework). I'm
assuming WCF has some sort of thread pool that services all of the
incoming requests from clients. Ultimately, all of these threads call
into ProcessMessage.

> I don't understand your parenthetical comment. Are you using UDP? Or is
> that one socket connected to some remote endpoint that has nothing to do
> with each client? Or do you actually have "a single socket" for each
> client, and dedicated threads executing ProcessRequestMessage() and
> ProcessResponseMessage() for each client?


I'm using TCP. The one socket that is used connects to a back-end
server that only my application has direct access to. Unfortunately, it
is a requirement that only one socket can be open, so all client
requests are serialized through this single socket. It would be a lot
easier to just create a new socket for each client, but that is not
permitted. ProcessRequestMessage and ProcessResponseMessage each run as
a single instance on their own thread and are responsible for sending
and receiving every client message. So, no, there is not an instance of
each running for each client.

> And where does WCF come into all this? I would have thought if you're
> using WCF, you'd use it for everything.

My application acts as the middle layer between clients and the back
end server. WCF is used just to make it easy to expose the application
I'm writing to clients over the network via web services.

> This boolean property is the source of at least two issues I see in your
> code, neither really related to your original question:
>
>   -- The boolean needs to be "volatile". Otherwise, one thread could
> set it and the other thread could wind up never actually seeing the value
> set.

Yeah, I've just recently read up on volatile, so I'll make sure I use
it if I choose to implement it this way.

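
A minimal sketch of the Completed flag declared volatile (RequestState and VolatileDemo are hypothetical names). The polling loop is kept only to isolate the volatile change; the next point in the thread explains why blocking is better. The receiver writes Response before the volatile write to the flag, and the waiting thread reads Response only after its volatile read sees true, so under the CLI's acquire/release rules for volatile accesses it sees the response as well.

```csharp
using System.Threading;

// Hypothetical request state with the flag declared volatile.
public sealed class RequestState
{
    private volatile bool completed;
    public string Response;   // written before the flag is set

    public bool Completed
    {
        get { return completed; }   // volatile read: not cached across loop iterations
        set { completed = value; }  // volatile write: earlier writes become visible first
    }
}

public static class VolatileDemo
{
    public static string Run()
    {
        var state = new RequestState();
        var receiver = new Thread(() =>
        {
            state.Response = "reply";
            state.Completed = true;
        });
        receiver.Start();

        // Polling kept only to isolate the volatile change; blocking is better.
        while (!state.Completed)
            Thread.Sleep(1);

        receiver.Join();
        return state.Response;
    }
}
```
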
>   -- You are polling, which is almost always the wrong approach. The
> big hint: you're calling Thread.Sleep(), but you don't have any specific
> thing that determines how long you're sleeping for (i.e. you just
> arbitrarily choose 10 ms). Any time you find yourself writing code that
> calls Thread.Sleep() with an arbitrary value (as opposed to "I know FOR
> SURE that in N milliseconds, something interesting will happen"), it's a
> pretty good bet you're going about things the wrong way.
>
> In this particular case, I think replacing the boolean value with an
> AutoResetEvent or ManualResetEvent, or just an object with which you use
> the Monitor class (see Monitor.Wait() and Monitor.Pulse()), would be much
> better. Then the thread simply blocks indefinitely, until you
> specifically wake it up elsewhere.

Yes, I definitely understand that sleeping for an arbitrary amount of
time is not a good way to do this. I wasn't sure what mechanism to use
so I'll look into using AutoResetEvent.
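
A minimal sketch of how the polled Completed flag could become a per-request ManualResetEvent, assuming requests are matched to responses by an int audit number (PendingRequest, PendingTable, SendAndWait and Complete are hypothetical names; the single socket is abstracted as the `send` callback). It also follows the later advice to use Dictionary<TKey, TValue> under an explicit lock rather than Hashtable. A ManualResetEvent stays signaled once set, so it does not matter whether the response arrives before or after the caller starts waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical per-request state: the event replaces the polled Completed flag.
public sealed class PendingRequest : IDisposable
{
    public readonly ManualResetEvent Done = new ManualResetEvent(false);
    public byte[] EncodedResponse;   // written by the receiver before Done.Set()

    public void Dispose() { Done.Close(); }
}

// Hypothetical table of in-flight requests keyed by audit number.
public sealed class PendingTable
{
    private readonly object sync = new object();
    private readonly Dictionary<int, PendingRequest> pending = new Dictionary<int, PendingRequest>();

    // Caller side (like ProcessMessage): register, hand off, then block until completed.
    public byte[] SendAndWait(int auditNumber, Action<int> send)
    {
        using (var request = new PendingRequest())
        {
            lock (sync) { pending.Add(auditNumber, request); }
            try
            {
                send(auditNumber);        // e.g. enqueue the id for the sender thread
                request.Done.WaitOne();   // blocks until Complete() is called; no polling
                return request.EncodedResponse;
            }
            finally
            {
                lock (sync) { pending.Remove(auditNumber); }
            }
        }
    }

    // Receiver side (like ProcessResponseMessage): returns false for unknown ids.
    public bool Complete(int auditNumber, byte[] encodedResponse)
    {
        lock (sync)
        {
            PendingRequest request;
            if (!pending.TryGetValue(auditNumber, out request))
                return false;
            request.EncodedResponse = encodedResponse;
            request.Done.Set();           // wakes the waiting caller
            return true;
        }
    }
}
```

Setting the event while holding the lock means the caller cannot remove and dispose its entry halfway through Complete().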

> See above. At the very least, you need to make that boolean "volatile".
> Even better, replace it with a different, superior synchronization
> mechanism.
>
> You will also probably want to make other fields "volatile", since they
> are set in one thread and read in another. The alternative would be to
> "lock()" around accesses, but it looks like in this case that's overkill,
> since you know the threads aren't actually accessing the data at the same
> time. You just need to keep the data up-to-date, not protect it from
> multiple threads writing at the same time.

OK, that makes sense.

> The biggest problem with the code you provided is that it's not a
> concise-but-complete code sample. Of particular concern is that you use
> variables for which you have not shown the declaration (e.g.
> "messageGuid", "messageID", "transmissionDateAndTime", etc.) and so it's
> not possible to comment on whether they are used in a thread-safe way or
> not (even with the declarations, without more details about how the
> NetworkManager class and its ProcessMessage() method are used, one still
> couldn't comment successfully on that, but both are important pieces of
> information).
>
> Other issues I see (most of which, again, don't really relate to your
> original question but are things I think are important enough to mention):
>
>   -- You use a different variable as the key when adding things to the
> hash table than the one you use to enqueue the ID, and thus later map back
> to the hash table. In this case, it happens the two variables have the
> same value, but the code is fragile. It would be much better to make sure
> you just use the same variable, rather than relying on the two variables
> continuing to have the same value.
>
>   -- You have two thread methods that have infinite loops without any
> way to break out of them. This suggests you are either aborting the
> threads explicitly, or making them background threads and just letting the
> framework abort them when the process exits. Either way, it's a very bad
> idea. You should provide a way to signal to the threads to exit, so that
> you can ensure that they always do so cleanly and in a consistent state.
>
>   -- You call Queue.Dequeue() without any check to make sure there's
> actually anything in the queue, nor with any sort of synchronization that
> would ensure that the thread remains blocked until there's something in
> the queue. How your code can even work in this state, I'm not sure.
> Perhaps it's related to some of the code you left out. Or maybe you just
> have a really serious bug.
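
The last two points can be addressed together. Here is a minimal sketch with hypothetical BlockingQueue<T> and WorkerDemo types: TryDequeue() blocks with Monitor.Wait() until an item arrives, and a Stop() method lets the owner tell the worker loop to finish. The thread then drains what is left, returns normally and can be joined instead of aborted.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical blocking queue built on Monitor.Wait/Pulse.
public sealed class BlockingQueue<T>
{
    private readonly object sync = new object();
    private readonly Queue<T> items = new Queue<T>();
    private bool stopping;   // only read or written while holding sync

    public void Enqueue(T item)
    {
        lock (sync)
        {
            items.Enqueue(item);
            Monitor.Pulse(sync);       // wake one waiting consumer
        }
    }

    public void Stop()
    {
        lock (sync)
        {
            stopping = true;
            Monitor.PulseAll(sync);    // wake every consumer so it sees the flag
        }
    }

    // Blocks until an item is available; returns false once stopped and drained.
    public bool TryDequeue(out T item)
    {
        lock (sync)
        {
            while (items.Count == 0 && !stopping)
                Monitor.Wait(sync);    // releases the lock while waiting
            if (items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = items.Dequeue();
            return true;
        }
    }
}

public static class WorkerDemo
{
    // Sender-style loop that exits cleanly when the queue is stopped.
    public static List<int> Run(int count)
    {
        var queue = new BlockingQueue<int>();
        var sent = new List<int>();   // touched only by the worker until Join()
        var worker = new Thread(() =>
        {
            int id;
            while (queue.TryDequeue(out id))
                sent.Add(id);         // stand-in for SendToSocket(...)
        });
        worker.Start();

        for (int i = 0; i < count; i++) queue.Enqueue(i);
        queue.Stop();                 // signal exit instead of aborting the thread
        worker.Join();                // the worker has finished cleanly here
        return sent;
    }
}
```

On .NET 4 and later, System.Collections.Concurrent.BlockingCollection<T> with CompleteAdding() and GetConsumingEnumerable() provides the same block-until-item-or-shutdown behavior without hand-written Monitor code.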


Hah, yeah, sorry: I included just a fraction of the code to try to keep
it concise and focused on my original question, and I actually do have
everything you mentioned above. Still, I appreciate all of the comments,
because I'm going to make some improvements in other areas.

> Now, all that said, as near as I can tell from what code you did post,
> your use of the Hashtable class is fine. You've synchronized the code
> that writes to the Hashtable and there's no need to synchronize the code
> that reads from it, so accesses to the Hashtable should remain consistent.
>
> Given the need to synchronize the writes, I remain unconvinced that
> there's really any point in using Hashtable. Seems to me you might as
> well use Dictionary<TKey, TValue> and synchronize the reads and writes
> both. But then, I would have explicitly synchronized access to an
> instance of Queue<T> instead of using the Queue wrapper you get from the
> Queue.Synchronized() method (at least, I assume that's part of the code
> you left out). So, obviously there's a bit of a difference of opinion
> with respect to style there. :)
>
> Pete

So after all of that, I am actually leaning this way now. Explicitly
implementing all of the locking and getting the type safety of
Dictionary<TKey, TValue> does seem a lot more robust. I saw a previous
post of yours where you mentioned using Dictionary<TKey, TValue> with
ReaderWriterLockSlim, so I'm looking into that.
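
For reference, a minimal sketch of that combination (SyncDictionary is a hypothetical name). Lookups take the read lock, so they can run concurrently, while Add and Remove take the write lock. Compound operations such as check-then-add would still need to run entirely under the write lock, for the reasons Peter Morris gave.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: Dictionary guarded by ReaderWriterLockSlim.
public sealed class SyncDictionary<TKey, TValue>
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<TKey, TValue> map = new Dictionary<TKey, TValue>();

    public bool TryGetValue(TKey key, out TValue value)
    {
        rwLock.EnterReadLock();          // shared: many readers at once
        try { return map.TryGetValue(key, out value); }
        finally { rwLock.ExitReadLock(); }
    }

    public void Add(TKey key, TValue value)
    {
        rwLock.EnterWriteLock();         // exclusive: blocks readers and writers
        try { map.Add(key, value); }
        finally { rwLock.ExitWriteLock(); }
    }

    public bool Remove(TKey key)
    {
        rwLock.EnterWriteLock();
        try { return map.Remove(key); }
        finally { rwLock.ExitWriteLock(); }
    }

    public int Count
    {
        get
        {
            rwLock.EnterReadLock();
            try { return map.Count; }
            finally { rwLock.ExitReadLock(); }
        }
    }
}
```

ReaderWriterLockSlim implements IDisposable, so a long-lived owner class would normally dispose it when it is done with it.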

Thanks.
 
