Classes and thread safety

klem s

1) Assuming instance methods ( M1 and M2 ) defined by the same class both
have locks around the code manipulating data, and also assuming the two
methods don’t operate on the same data, then wouldn’t it be best if M1 and
M2 used different identifiers as their lock tokens, since that way
threads T1 and T2 calling M1 and M2 respectively won’t block each other?

2) I assume a private reference used as a token should be declared as
an instance member when only methods within the same instance will share
the same data, but if methods within different instances will share data,
then the reference should be declared static? Thus wouldn’t a static token
reference make multiple instances of the same class thread safe with
respect to each other?

3) We usually use internal members for lock tokens, so I would assume
that we shouldn’t try to synchronize different classes accessing the same
data, but instead we should dedicate a single class D ( its methods
would implement locks ) for accessing particular data, and any class
trying to operate on that piece of data should access it only
through D?


4) I realize that we should use private members as lock tokens, since
if they were public, then external code may also use them as lock
tokens, which may lead to deadlocks.

But is there a situation where it would make sense for two methods
( defined in different classes ) to use the same reference as a lock
token? Perhaps when both methods operate on the same data, so that the
two methods having the same reference for a lock would help us synchronize
access to that data?!


5)
a) As far as I can tell, synchronization between different methods of
the same class C should be used in the following circumstances:
• when methods are manipulating the same members of that class
• when methods are operating on the same external data ( such as
files etc )

b) As far as methods operating on the same data members are concerned,
should we:
• always implement locks inside those methods
• or only if we suspect C will be used in a multithreaded application
• or let the client code using C provide proper
synchronization?

My guess is that we should let the client code using C provide
thread safety, but in what situations should we instead make class C
thread safe?

Thank you
 
Arne Vajhøj

> 1) Assuming instance methods ( M1 and M2 ) defined by the same class both
> have locks around the code manipulating data, and also assuming the two
> methods don’t operate on the same data, then wouldn’t it be best if M1 and
> M2 used different identifiers as their lock tokens, since that way
> threads T1 and T2 calling M1 and M2 respectively won’t block each other?

Yes.
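
A minimal sketch of question 1, assuming a hypothetical class `Counters` whose two fields are unrelated (none of these names come from the thread). Each method takes only the lock that guards the data it touches, so a thread in one method does not wait for a thread in the other:

```csharp
// Hypothetical class: `hits` and `errors` are independent pieces of data,
// so each one is guarded by its own private lock object.
class Counters
{
    private readonly object hitsLock = new object();
    private readonly object errorsLock = new object();
    private int hits;
    private int errors;

    // Plays the role of M1: touches only `hits`, so it takes only hitsLock.
    public void RecordHit()
    {
        lock (hitsLock) { hits++; }
    }

    // Plays the role of M2: touches only `errors`, so it never waits
    // for a thread that is inside RecordHit.
    public void RecordError()
    {
        lock (errorsLock) { errors++; }
    }

    // Reads take the same lock as the writes to the same field.
    public int Hits { get { lock (hitsLock) { return hits; } } }
    public int Errors { get { lock (errorsLock) { return errors; } } }
}
```

If the two methods ever start touching the same field, they would need to share one lock again.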

> 2) I assume a private reference used as a token should be declared as
> an instance member when only methods within the same instance will share
> the same data, but if methods within different instances will share data,
> then the reference should be declared static? Thus wouldn’t a static token
> reference make multiple instances of the same class thread safe with
> respect to each other?

No. Not static.

By far the easiest/best is to use the actual object you modify for
locking, because that is by definition shared between the code.

> 3) We usually use internal members for lock tokens, so I would assume
> that we shouldn’t try to synchronize different classes accessing the same
> data, but instead we should dedicate a single class D ( its methods
> would implement locks ) for accessing particular data, and any class
> trying to operate on that piece of data should access it only
> through D?

Sounds like a good idea. In maybe 90% of cases.

> 4) I realize that we should use private members as lock tokens, since
> if they were public, then external code may also use them as lock
> tokens, which may lead to deadlocks.

That is just an urban legend.

> But is there a situation where it would make sense for two methods
> ( defined in different classes ) to use the same reference as a lock
> token? Perhaps when both methods operate on the same data, so that the
> two methods having the same reference for a lock would help us synchronize
> access to that data?!

Yes.

> 5)
> a) As far as I can tell, synchronization between different methods of
> the same class C should be used in the following circumstances:
> • when methods are manipulating the same members of that class
> • when methods are operating on the same external data ( such as
> files etc )

Yes.

> b) As far as methods operating on the same data members are concerned,
> should we:
> • always implement locks inside those methods
> • or only if we suspect C will be used in a multithreaded application
> • or let the client code using C provide proper
> synchronization?
>
> My guess is that we should let the client code using C provide
> thread safety, but in what situations should we instead make class C
> thread safe?

The current trend in software is to leave it to the caller,
because the caller knows the context.

Makes sense to me.

Arne
 
klem s

> Noting, however, that if you can arrange it so that thread T1 is the
> only thread accessing data used by method M1 and likewise for thread T2
> and method M2, then no locks at all are necessary. That's the most
> efficient way to do things.

But that’s only an option if you’re also the one writing the client
code using this class.

> …that thread T1 is the only thread accessing data used by method M1

You mean that during the entire lifetime of some app it should only be
T1 that has access to data used by method M1 ( regardless of the
number of threads this app will spawn during its lifetime )? But isn’t
that a bit impractical, since in most cases this data may be of
interest to several threads?

> It is also useful to note that writing immutable types can help thread
> safety a lot, because the data within an instance of the type simply
> can't change.

Uhm, for a type to be immutable, the instances of that type can’t
have their state changed once they are created, and this can only be
done by making all its data fields ( both public and private ) readonly
or constant. But this isn’t very practical if you want that type’s
methods to operate on those fields?!

> Yes, static.
>
> As the OP has noted, for data accessed only by code within a given
> instance, a private object reference is sufficient to make that access
> thread-safe.
>
> On the other hand, for data that can be accessed by multiple instances
> of a given class, _one_ way to ensure that access is thread safe is to
> use a single object reference stored in a static field.
>
> A static field addresses the issue whether the data is itself stored in
> a static field, or the data is data found in an instance member.

Are you saying that a static field should only be used if the data shared by
instances of type T is also a member of T ( either a static or an instance
member )? But what if the shared data is not a member of T?

> However, note that using a static field to synchronize access to
> instance members is "less granular" than is necessary or even
> convenient.

What does the term “less granular” mean in this context?

> I would generally only use a static field for the locking
> reference when the data itself is static. For instance-member data, an
> instance-member locking reference is better (see question #1).

Uhm, I’m a bit confused why you’re mentioning instance member data,
since my question was referring to a situation where the same data is shared
by instances of the same type. Thus, were you:

* referring to a situation where instance member data is
private and thus accessible only to its instance? In that case I don’t
see any reason why we would need to use a static reference as a token,
since private instance data is not shared between instances?!

* or referring to a situation where instances of type T want to
operate on a data member of a particular instance - also of type T ( in
which case that data member needs to be public, which goes against
good OOP design )? In that case it only makes sense to make the locking
reference static ( thus in my opinion making it an instance member
doesn’t make sense ) ?!
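
The static-versus-instance distinction quoted above can be sketched as follows, assuming a hypothetical type `Worker` (the type and its members are illustrative, not from the thread). Static data gets one lock shared by all instances; per-instance data gets one lock per instance:

```csharp
// Hypothetical type: `created` is static data shared by every instance,
// while `items` belongs to a single instance.
class Worker
{
    // One lock object for the whole type, guarding the static field.
    private static readonly object createdLock = new object();
    private static int created;

    // One lock object per instance, guarding that instance's field.
    private readonly object itemsLock = new object();
    private int items;

    public Worker()
    {
        lock (createdLock) { created++; }
    }

    public void AddItem()
    {
        // Calls on two different Worker instances do not block each other.
        lock (itemsLock) { items++; }
    }

    public int Items { get { lock (itemsLock) { return items; } } }
    public static int Created { get { lock (createdLock) { return created; } } }
}
```

Guarding `items` with `createdLock` instead would still be safe, but every `AddItem` call on every instance would then compete for the same lock. That is one plausible reading of "less granular".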


> Agreed. Note that this philosophy doesn't really have anything to do
> with thread safety per se. It's just good, basic encapsulation practice.

Well, I thought that reducing the number of classes able to directly
access some piece of data also reduces the number of ways
that something may go wrong ( which implicitly makes code more thread
safe, so it does have to do with thread safety )?

Instead of having two methods in different classes share the same
token, would the only two other options to synchronize access to data
shared by the two methods be:

• to not use locks, but instead make sure that during the lifetime of
the app only thread T1 can access that data ( as noted in your reply to my
first question )

• or to use a single class for accessing that data ( as described in my
third question )

> In fact, one really should not think of methods when it comes to
> synchronization. It's _data_ that needs synchronization. One uses the
> same locking reference to protect a given piece of data, _wherever_ it's
> used. Otherwise, the data isn't being used in a thread-safe way.

But as you’ve noted, making a locking reference public introduces the
risk of deadlocks? So when do the benefits of having a public locking
reference outweigh the risks?

> In fact, one really should not think of methods when it comes to
> synchronization. It's _data_ that needs synchronization.

Aren’t there also situations where we lock a piece of code that isn’t
operating on any shared data? If so, then we can’t claim that only
data needs synchronization?!




thank you
 
klem s

> klem said:
> [...]
> > > It is also useful to note that writing immutable types can help thread
> > > safety a lot, because the data within an instance of the type simply
> > > can't change.
> > Uhm, for a type to be immutable, the instances of that type can’t
> > have their state changed once they are created, and this can only be
> > done by making all its data fields ( both public and private ) readonly
> > or constant. But this isn’t very practical if you want that type’s
> > methods to operate on those fields?!
>
> Sure. That System.String class sure is useless, being immutable and all.

I agree that thread safety is an issue when methods operate on shared
data fields. But, assuming:

• several threads try to operate on instance I of type T
• and assuming T’s data fields are all readonly

Now isn’t it often the case that when a thread is finished operating on
a particular data field, it needs to assign the resulting value back to
that data field? Since the data field is readonly, the only option is to
create a new instance of type T, set its data field values accordingly
and set reference variable I to point to this newly created object.
That’s a lot of work.

On the other hand, with strings all of this is done automatically!

> One instance of an object can access the data in another instance of the
> same class of object. Just because the data is shared, that doesn't
> preclude it from being stored in or referred to by an instance member.

I concur, but what’s confusing me are the two conflicting statements
you’ve made:

In the following excerpt you’ve basically stated that if data is to be
shared by multiple instances of a class T, then the synchronization token
should be static, regardless of whether the shared data is a static or
an instance member:

“If the data is shared across multiple instances of a class, then the
synchronization itself is going to need to be shared across multiple
instances (i.e. static).”

But here you’re essentially implying that if data to be shared by
multiple instances of a class T is an instance member, then the
synchronization token should also be an instance member:

“I would generally only use a static field for the locking reference
when the data itself is static. For instance-member data, an
instance-member locking reference is better (see question #1).”

Anyways, here’s how I understand it … assuming instances of type T
( A1_T, A3_T … ) want to operate on data field F ( which is a member
of A2_T ) via instance method M, then if they access F via their own
method M ( thus A1_T via A1.M, A3_T via A3.M ), all instances
must share the same lock token, but if they access F only via A2_T.M,
then each instance could use its own token. Thus, in the former case
the token should be static, but in the latter case the token should be an
instance member?! Right?


> The latter,

Again, just to be sure: you’re implying that if instances of type T
( A1_T, A3_T … ) want to operate on data field F ( which is a member
of A2_T ) via instance method M, then if they access F via their
own method M ( thus A1_T via A1.M, A3_T via A3.M ), all instances
must share the same lock token, but if they access F via A2_T.M, then
each instance could use its own token. Thus, in the former case the
token should be static, but in the latter case the token should be an
instance member?

BTW - Assuming data shared by all instances of type T is stored in
some static class S, and assuming this class doesn’t provide for
synchronization, then in my opinion instances of type T should use
a static field for locking?!


> but you are incorrect about your claim about the
> accessibility requirements. For a type T, any instance of type T can
> access _private_ members of any other instance of type T.

Uhm, I didn’t know that. Since the very definition of encapsulation is
to hide the internal implementation from the outside world, I’ve
automatically assumed that other instances of the same type also
belong to “the outside world”. Thus, if the goal is to hide the
implementation details from the outside world, why not also hide them
from other instances of the same type?
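
A small sketch of the accessibility rule quoted above, using a hypothetical `Point` class. In C#, `private` restricts access to code inside the declaring type, not to a particular object, so one instance can read another instance's private fields:

```csharp
// Hypothetical class used only to illustrate the accessibility rule.
class Point
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    // `other.x` and `other.y` compile even though both fields are private:
    // the access happens inside Point, which is all that `private` requires.
    public bool SameAs(Point other)
    {
        return other != null && x == other.x && y == other.y;
    }
}
```

Code outside `Point` still cannot see `x` or `y`, so the implementation stays hidden from every other type.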

> Two methods in _different_ classes should generally not be sharing data,
> regardless of threading issues.

In other words, you’re saying that the two methods in different
classes should gain access to the same data indirectly through a dedicated
class D ( see my first post/third question )?

But I’d argue that this too is called “sharing data”, regardless of how
indirectly the two methods access that data.

> Why would you lock a piece of code that doesn't operate on any shared
> data?

10 engines that need to be turned on programmatically … 10 methods,
each designed to turn on a specific engine … it takes 1 minute for a
method to completely start up an engine … and it is imperative that
while one engine is in the process of starting up, no other
engine tries to enter the start-up stage ( thus no other engine method
should be executed ). Thus, even though these 10 methods don’t share
any data, it’s still important that their code for starting up an
engine shares the same lock – HA :)
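
The engine scenario can be sketched with a single static lock shared by every start-up call. `EngineStarter`, `Start` and the short `Thread.Sleep` standing in for the one-minute start-up are all assumptions made for illustration; the two counters exist only so that the serialization can be observed:

```csharp
using System.Threading;

static class EngineStarter
{
    // Shared by every Start call, whichever engine it is for.
    private static readonly object startupLock = new object();

    // Instrumentation only (not part of the scenario in the post).
    private static int startsInProgress;
    public static int MaxConcurrentStarts; // stays at 1 if start-ups never overlap

    public static void Start(int engineId)
    {
        lock (startupLock)
        {
            startsInProgress++;
            if (startsInProgress > MaxConcurrentStarts)
            {
                MaxConcurrentStarts = startsInProgress;
            }

            Thread.Sleep(10); // stands in for the real start-up work on engineId

            startsInProgress--;
        }
    }
}
```

Because every call takes the same lock, a second engine cannot enter its start-up section until the first has left it, regardless of which thread makes the call.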





Additional question: The lock token used by a method is specified at design
time. Thus, different threads calling instance A.M ( A is of type T )
will always use the same lock token. This works well if the data
M is operating on is shared by all threads.

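class T
{
    // The token is fixed when the object is created, so every caller
    // of M on this instance locks on the same object.
    private readonly Token token = new Token();

    public void M()
    {
        lock (token)
        {
            // code operating on the shared data goes here
        }
    }
}

class Token { }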

But if instead M is operating on data passed to it via a parameter,
then different threads calling A.M may be operating on different
data, and thus need no synchronization. But due to the way locks are
implemented, the threads will needlessly get synchronized, even though
they may not operate on the same data.

If instead locks were implemented similarly to methods, where the
client calling A.M could simply pass in a lock token via a parameter,
then flexibility would be much greater. Any ideas why such an
implementation may not be a good idea ( besides the fact that such an
implementation would in most cases require lock tokens to be
public ) ?
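
For what it's worth, the C# `lock` statement accepts any expression of a reference type, so the token does not have to be a fixed field of `T`. A minimal sketch, assuming a hypothetical `Account` type that carries its own lock object (`Account`, `SyncRoot` and `Balance` are illustrative names):

```csharp
// Hypothetical data type that carries the lock object guarding it.
class Account
{
    public readonly object SyncRoot = new object();
    public int Balance;
}

class T
{
    // M locks on the object owned by the data it was handed, so calls
    // working on different Account instances do not block one another,
    // while calls working on the same Account are serialized.
    public void M(Account account, int amount)
    {
        lock (account.SyncRoot)
        {
            account.Balance += amount;
        }
    }
}
```

The trade-off is the one raised in question 4: `SyncRoot` is public here, so code outside `T` can also take that lock.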
 
klem s

> Actually, that's not what you would (or even could) do. By definition,
> after you've created an instance of an immutable type, you cannot set
> its data field values. It has to be initialized during construction.

That’s what I meant: set the data field values via the constructor.

> Immutable types are most useful when they are used for passing
> information back and forth between threads. Obviously, there will still
> need to be some shared state somewhere that mutates as the thread does
> work; you can't eliminate mutable data types completely.

* So immutable types shouldn't be used for the data we expect to be
modified by threads?

* but then again, using immutable types ( as described above ) even
for data modified by threads can reduce the number of locks in
code?!

> Why introduce a third class? The point is that if one class needs data
> from another, it should go through a public accessor for that data.

* I guess you're assuming that data shared by two methods ( one
defined in class C1 and the other in class C2 ) is a member of one of the
two classes ( say C2 )? In that case it does make sense for C1 to
access that data through C2’s public accessor.

* But my question was referring more to a situation where the data is not
a member of either class, but instead is contained in a class that is not
thread-safe ( perhaps supplied by a third party ). Wouldn't the best
option then be to wrap this data inside a thread-safe class safe_C and only
allow access to it through safe_C ( as you’ve already agreed in one of
your previous posts )? Thus, all the synchronization necessary would
be provided by safe_C.

* In any case, I imagine that it’s quite a common scenario for several
classes to share data in such a way?

* Or were you perhaps thrown off by "data shared by the two methods",
since maybe the term “sharing data” should only be applied if the class
containing this data doesn’t provide thread-safe access to it, and
the two methods accessing that data must themselves provide for
thread safety?

Thus, if data is wrapped inside safe_C ( which provides for thread
safety ), do we still say that the two methods share that data?
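
A rough sketch of the safe_C idea, written here as `SafeC`. `List<int>` merely stands in for the third-party class that is not thread-safe, and the member names are assumptions. Every access to the inner object goes through the same private lock:

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: the only way to reach `inner` is through these
// members, and each of them takes the same private lock.
class SafeC
{
    private readonly object sync = new object();
    private readonly List<int> inner = new List<int>();

    public void Add(int value)
    {
        lock (sync) { inner.Add(value); }
    }

    public int Count
    {
        get { lock (sync) { return inner.Count; } }
    }

    // Returns a copy, so callers never enumerate the shared list while
    // another thread is modifying it.
    public int[] Snapshot()
    {
        lock (sync) { return inner.ToArray(); }
    }
}
```

Classes C1 and C2 would then both hold a reference to the same `SafeC` instance and never touch the wrapped object directly.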
 
klem s

> klem said:
> [...]
> > > Immutable types are most useful when they are used for passing
> > > information back and forth between threads. Obviously, there will still
> > > need to be some shared state somewhere that mutates as the thread does
> > > work; you can't eliminate mutable data types completely.
> > * So immutable types shouldn't be used for the data we expect to be
> > modified by threads?
>
> That's either a false assumption or trivially true, depending on what
> you actually mean.
>
> By definition, immutable types can't be modified. It's not that you
> shouldn't use them for data you expect to modify, it's that you _can't_.

I realize that!

> On the other hand, immutable types are fine for _representing_ data that
> might change. You simply have to replace one instance with another when
> it changes. The data in the immutable type itself doesn't change, but
> obviously you would have a variable or something that is storing the
> immutable type instance, and that _would_ be mutable and in need of
> synchronization.

That’s what I was asking all along. To rephrase what I was saying in
one of my previous posts:

" Assuming:
• several threads try to operate on instance I of type T
• and assuming T’s data fields are all readonly

When a thread is finished operating on I’s data fields, it needs to
assign the resulting values back to those data fields. Since the data
fields are readonly, the only option is to create a new instance of
type T, set its data field values accordingly ( via the constructor )
and set reference variable I to point to this newly created object. "

But you kept saying nah, not possible, no, ne, nyet … shall I
continue :)?

> > * but then again, using immutable types ( as described above ) even
> > for data modified by threads can reduce the number of locks in
> > code?!
>
> Yes. You only would need synchronization when an instance of an
> immutable type flows from one thread to another. Any other time, the
> instance of the immutable type is what's being used, and is perfectly
> safe to use without synchronization, because you're guaranteed it can't
> change.
> [...]
> > * But my question was referring more to a situation where the data is not
> > a member of either class, but instead is contained in a class that is not
> > thread-safe ( perhaps supplied by a third party ). Wouldn't the best
> > option then be to wrap this data inside a thread-safe class safe_C and only
> > allow access to it through safe_C ( as you’ve already agreed in one of
> > your previous posts )? Thus, all the synchronization necessary would
> > be provided by safe_C.
>
> Yes, that would be fine. The only thing that matters is that the data
> is protected by some synchronization object and that all threads that
> access it go through the same synchronization object.

* Not sure what you mean by “flows”?!

* Actually, this is a bit confusing … again assuming we have an
instance I of immutable type T … then shouldn’t we make sure that
while one thread is working with I, no other thread has access to it?

For example, assume two threads call method M, which:

a1 - reads I’s data D_Original ( I’ve marked this part of the code as a1
for easier reference )
a2 - based on D_Original, computes new data D_New
a3 - creates a new instance of type T ( with its fields set to
D_New ) and assigns its reference to I

BTW - a1 and a3 use a lock, while a2 doesn’t

Then:
• thread1 executes a1 and then releases the lock
• while thread1 is still executing a2 ( which doesn’t hold a lock
since it doesn’t make any references to I ), thread2 executes
a1, a2 and a3
• thread1 proceeds with a3 ( which creates a new instance and assigns
its reference to I ). But by now thread2 has already assigned a new
instance to variable I, which means that thread1 computed D_New based on
D_Original data that is no longer valid, since the new instance
( assigned to I by thread2 ) holds different values
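
The lost update in steps a1 to a3 can be avoided without holding a lock during a2 by publishing the new instance only if the reference has not changed since a1, and retrying otherwise. The sketch below uses `Interlocked.CompareExchange` rather than the `lock` statement discussed in the thread; `Total`, `Shared` and `Add` are hypothetical names:

```csharp
using System.Threading;

// Hypothetical immutable type: its field is set once, in the constructor.
sealed class Total
{
    public readonly int Value;
    public Total(int value) { Value = value; }
}

static class Shared
{
    // The mutable part: a reference (the "I" of the post) to the current instance.
    private static Total current = new Total(0);

    public static Total Current
    {
        get { return Volatile.Read(ref current); }
    }

    public static void Add(int amount)
    {
        while (true)
        {
            Total seen = Volatile.Read(ref current);      // a1: read D_Original
            Total next = new Total(seen.Value + amount);  // a2: compute D_New

            // a3: store `next` only if `current` still refers to `seen`.
            Total prior = Interlocked.CompareExchange(ref current, next, seen);
            if (object.ReferenceEquals(prior, seen))
            {
                return;
            }
            // Another thread replaced the instance in the meantime;
            // loop and recompute from the newer one.
        }
    }
}
```

The simpler alternative is to hold one lock across all three steps, at the cost of blocking other threads for the whole duration of a2.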

> [...]
> > Thus, if data is wrapped inside safe_C ( which provides for thread
> > safety ), do we still say that the two methods share that data?
>
> You can say whatever you want. It is better if you are using
> unambiguous ways to say it though.

But what would you say? You did state that the two methods in
different classes shouldn’t be sharing data, so if you still think the
two methods are sharing data, even though this data is wrapped within
safe_C, then…
 
klem s

> I simply mean that one thread causes another thread to access the same
> data. This "flow" can happen in any number of ways, which is why I
> simply use the word "flow" rather than trying to enumerate every way the
> data access might move from one thread to another.

* Uhm, how can one thread cause another thread to access the same
data? As far as I know, threads are pretty independent from one
another? Unless you mean thread1 somehow signals thread2 to access
the data thread1 was operating on?

* If that is what you meant, then doesn’t this also fall under the
category of race conditions, and thus synchronization is only needed if
we care about the order of execution?

In other words, I’m baffled why you think that “flow” would normally
require synchronization while my example ( method M with a1, a2 and a3
code segments ) would require it only if we care about a “race
condition”?

> No. Once the instance of I has been created, any and all threads can
> access it without any trouble at all.

* Except if we care about race conditions, as described in my example
( method M with a1, a2 and a3 code segments )? Then threads accessing
it in a non-synchronized manner does spell trouble?!

* So, if we don’t care about race conditions, then there’s also no need
for synchronization of M’s code segments a1 and a3 ( except perhaps
when “flowing” an immutable type instance between threads )?

> I don't understand your description of step "a3". Per your initial
> description of the scenario, I is an instance of an object, not a
> reference to that instance.

I meant that I is a reference to an instance.

> That's an example of the classic concurrency issue known as a "race
> condition".

Didn’t all examples discussed in this thread need synchronization
solely due to “race conditions”? Thus to my understanding, several
threads trying to access the same file also falls under the category
of “race condition”?!
 
klem s

> Honestly, you really should just go get a good book on concurrency. A
> lot of your questions at this point have been somewhat circular, as you
> wrestle with comprehending the various concepts involved. I recommend
> Joe Duffy's book, and his blog as well. He's an excellent writer and is
> an expert in the field:
> http://www.bluebytesoftware.com/blo...ware.com/books/winconc/winconc_book_resources....
>
> You will probably learn the material faster and more thoroughly that way
> than asking a series of questions and posing a variety of vague and
> occasionally nonsensical implementation scenarios to this or any other
> newsgroup or other forum.
>
> Pete

Ouch. Anyways, I can take a hint ;)

Many thanks for helping me out
 
