HashSet<T> threading problems?

D

DC

Hi,

I was happy to find the new HashSet<T> class in Framework 3.5, I have
to often used dictionaries with empty values to work with sets.

Recently I ran into a strange performance problem in an application
that adds elements to a HashSet field in many threads. I took far too
long until I isolated that the access to the HashSet, which I forgot
to synchronize, was the reason.

Multiple threads executed HashSet.Add() on a HashSet<int> field and
this did not seem to matter - until some condition (maybe an internal
race) made the CPUs spin up to 100% and never come back. The problem
disappeared when I finally isolated to HashSet.Add() call and first
removed it for testing and then synchronized the HashSet.Add() calls.

While I understand that I made a mistake by not synching the
multithreaded access to a class member, I still think the Frameworks
should have thrown an exception (like it does when one tries to
enumerate and manipulate a collection without syncing). I was not able
to provoke the problem in a test app, though. So my question is: did
someone experience similar problems? Or is this framework behaviour to
be expected?

Regards
DC
 
G

Göran Andersson

DC said:
Hi,

I was happy to find the new HashSet<T> class in Framework 3.5, I have
to often used dictionaries with empty values to work with sets.

Recently I ran into a strange performance problem in an application
that adds elements to a HashSet field in many threads. I took far too
long until I isolated that the access to the HashSet, which I forgot
to synchronize, was the reason.

Multiple threads executed HashSet.Add() on a HashSet<int> field and
this did not seem to matter - until some condition (maybe an internal
race) made the CPUs spin up to 100% and never come back. The problem
disappeared when I finally isolated to HashSet.Add() call and first
removed it for testing and then synchronized the HashSet.Add() calls.

While I understand that I made a mistake by not synching the
multithreaded access to a class member, I still think the Frameworks
should have thrown an exception (like it does when one tries to
enumerate and manipulate a collection without syncing). I was not able
to provoke the problem in a test app, though. So my question is: did
someone experience similar problems? Or is this framework behaviour to
be expected?

Regards
DC

It's definitely to be expected. The code is not thread safe, so just
about anything can happen if you modify the collection from several threads.

As the code is no thread safe, there isn't really any point in trying to
make it partly sensetive to threading issues. It would reduce the
performance but could still not provide any real protection.
 
P

Peter Morris

As the code is no thread safe, there isn't really any point in trying to
make it partly sensetive to threading issues. It would reduce the
performance but could still not provide any real protection.

Exactly. The problem is that by making it thread safe you would impact on
the performance of apps that don't need it to be. In addition to this

Blah[x] = y;

Should that lock? What about this

if (!Blah.ContainsKey(x))
Blah.Add(x, y);

In this case the use of the class defines two method calls on a Dictionary<>
as being a single atomic operation which would need a single lock.
 
D

DC

As the code is no thread safe, there isn't really any point in trying to
make it partly sensetive to threading issues. It would reduce the
performance but could still not provide any real protection.

Exactly.  The problem is that by making it thread safe you would impacton
the performance of apps that don't need it to be.  In addition to this

    Blah[x] = y;

Should that lock?  What about this

    if (!Blah.ContainsKey(x))
        Blah.Add(x, y);

In this case the use of the class defines two method calls on a Dictionary<>
as being a single atomic operation which would need a single lock.

Thank you, Göran and Peter. I get the point. I must synchronize more
carefully.

Regards
DC
 
J

Josh Einstein

As others have mentioned, this impacts performance when it isn't needed.
It's exactly why they stopped the practice of using virtual methods on
collections as well as having a Synchronized() method that creates a
collection that is *supposedly* thread safe (when most usage of a collection
cannot be synchronized by the collection internals itself...)

Josh Einstein
 
D

DC

As others have mentioned, this impacts performance when it isn't needed.
It's exactly why they stopped the practice of using virtual methods on
collections as well as having a Synchronized() method that creates a
collection that is *supposedly* thread safe (when most usage of a collection
cannot be synchronized by the collection internals itself...)

Josh Einstein











- Zitierten Text anzeigen -

OK, let me defend another position just for the fun of it. What could
possibly happen to a HashSet.Add() method, if you use it without
syncing? Assume 100 threads are adding elements, some already exist in
the set, some not. What should the developer expect to happen (worst
case)? Should any of these newly added elements be left out or
duplicated?

Actually the only threading problem that I see with this method is the
return value: if many threads add the same element at almost the same
time, then the class may return false or true to any of these threads
(since the return value indicates if the element was already in the
set).

However, a developer should not have to expect that the method will
totally malperform if not synchronized.

So there is probably a bug in the HashSet<T>.Add() method.

I guess you will disprove my argument in no time.

Regards
DC
 
G

Göran Andersson

DC said:
OK, let me defend another position just for the fun of it. What could
possibly happen to a HashSet.Add() method, if you use it without
syncing? Assume 100 threads are adding elements, some already exist in
the set, some not. What should the developer expect to happen (worst
case)? Should any of these newly added elements be left out or
duplicated?

Actually the only threading problem that I see with this method is the
return value: if many threads add the same element at almost the same
time, then the class may return false or true to any of these threads
(since the return value indicates if the element was already in the
set).

However, a developer should not have to expect that the method will
totally malperform if not synchronized.

So there is probably a bug in the HashSet<T>.Add() method.

I guess you will disprove my argument in no time.

Regards
DC

Just about anything could happen if several threads change the hash set
at the same time. Elements might get added to the wrong bucket so that
they are never found again, duplicates might be added, elements might
disappear, the method might go into an eternal loop...

If the HashSet is implemented similar to a Dictionary, it contains two
arrays; one that contains element, and one that controls which element
belongs in which bucket. This is efficient, but not robust against
corrupted data. If the arrays get out of sync, it might make the hash
set completely unusable.
 
P

Peter Morris

It doesn't really matter what might happen, all that matters is that if it
isn't thread safe then the affects could be "unpredictable".
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top