Threading and DllImport

Seth Gecko

Hi

I am developing a complex VB.Net Windows application for an
engineering firm (don't ask me why they prefer VB.Net...). All the
engineering calculations are done in FORTRAN, which is compiled to a
non-COM DLL (meaning you can't create an Interop assembly for it). The
DLL is called from within the .Net application using DllImport, just
like a Windows API call. This works fine.

The problem arises when we create threads (using BeginInvoke) and each
thread calls the same DllImported function. This results in various
unpredictable errors from within the FORTRAN code, most likely because
the FORTRAN function (and its memory) is shared between threads. The
FORTRAN function itself is thread safe (checked using Intel
Thread-Checker), but my guess is that when using DllImport the function
is loaded into the application process (as opposed to the thread) and
thereby becomes shared.

If I put a SyncLock around the DllImported function call everything
works again, but this means only one thread at a time is allowed to do
the actual calculation, which defeats the purpose of introducing
multiple threads.

Is there any way to solve this problem? My initial suggestion was to
avoid using DllImport and instead try to create some sort of instance
of the FORTRAN function; one unique instance for each thread so that
the FORTRAN code does not share memory between threads. I haven't been
able to achieve this and I am not sure that it is even possible or
that it would actually solve the problem. Any suggestions would be
greatly appreciated...

Thanks
....Casper Lafrenz
 
Peter Duniho

Seth said:
Hi

I am developing a complex VB.Net Windows application for an
engineering firm (don't ask me why they prefer VB.Net...). All the
engineering calculations are done in FORTRAN which is compiled to a
non COM type DLL (meaning you can't create an Interop for it). This is
called from within the .Net application using DllImport, just like
using a Windows API call. This works fine.
The problem arises when we create threads (using BeginInvoke) and each
thread is calling the same DllImported function. This results in
various unpredictable errors from within the FORTRAN code, most likely
because the FORTRAN function (and its memory) is shared between
threads. The FORTRAN function itself is thread safe (checked using
Intel Thread-Checker), but my guess is that when using DllImport the
function is loaded into the application process (as opposed to the
thread) and thereby becomes shared.

I don't know anything about "Intel Thread-Checker". However, if the
function were truly thread-safe, I don't think you would be having a
problem.

Having the same _code_ shared between threads is not a problem. Code
doesn't change (or at least, it should not change...if it does, you have
a problem of an entirely different nature). So having multiple threads
executing the same code is fine.

What is a problem is having multiple threads accessing the same data at
the same time, especially if that data changes. Presumably the "Intel
Thread-Checker" is trying to make a determination along these lines, but
if you are having problems with the code because of multiple threads
executing the same code at the same time, then it seems likely that
somewhere there is data being shared amongst those threads even though
"Intel Thread-Checker" doesn't see it.

What the exact problem might be, I can't say. You didn't post any code,
neither the code in the DLL nor the code using the DLL. If the code uses
any global variables or static variables (it's been a while, so I forget
what exactly those are called in FORTRAN, but hopefully you get the
idea), then access to those variables needs to be synchronized.

Better would be to make sure all variables are local to the called
function, or at least local to the thread calling the function
(different languages have different mechanisms for declaring
thread-specific data...maybe your FORTRAN compiler allows this). Then
you never have to worry about contention between the threads for the
same data.
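To make the shared-data point concrete, here is a minimal sketch in C++ (not the original FORTRAN, which was never posted). The names `scratch`, `sum_to` and `run_in_parallel` are made up for illustration. The variable `scratch` plays the role of data that lives outside the function; marking it `thread_local` gives each thread its own copy, assuming a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for storage that lives outside the routine.
// `thread_local` gives every thread its own independent copy.
thread_local double scratch = 0.0;

// Sums 1..n through the scratch variable. If `scratch` were a plain
// global, concurrent callers could overwrite each other's partial sums.
double sum_to(int n) {
    assert(n >= 0);
    scratch = 0.0;
    for (int i = 1; i <= n; ++i) {
        scratch += i;
    }
    return scratch;
}

// Runs sum_to(n) on `thread_count` threads at once and collects the results.
// Each thread writes only to its own slot of `results`.
std::vector<double> run_in_parallel(int thread_count, int n) {
    std::vector<double> results(static_cast<std::size_t>(thread_count), 0.0);
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&results, t, n] {
            results[static_cast<std::size_t>(t)] = sum_to(n);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Calling `run_in_parallel(4, 1000)` should give the same sum on every thread. Removing the `thread_local` keyword turns `scratch` into shared data, which is the kind of situation described above.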

If you do post code with the intention of receiving more specific
advice, please make sure that it is a concise-but-complete example of
code that reliably reproduces the problem. Emphasis on
"concise"...don't just post all your code. Whittle it down to the bare
minimum required to reproduce the problem.

Pete
 
Seth Gecko

You are absolutely correct. The FORTRAN code was not thread safe. We
were able to reproduce the problem in a simple example. The use of
FORTRAN common blocks is definitely not thread safe, even though the
values written to the common blocks are the same in our test scenario.
The solution is not simple though, as there is apparently a performance
loss if the engineers remove those blocks (and performance is the whole
reason for using FORTRAN in the first place). Right now we are playing
with the idea of loading the FORTRAN DLL several times (using
LoadLibrary), i.e. once for each thread. This should make sure that no
memory/data is shared between threads. Any comments on this idea?

Thanks for your help.
....Casper
 
Bob Milton

Seth,
That won't work. LoadLibrary simply increments the reference count if
the DLL has already been loaded into the process. All you gain is the
need to call FreeLibrary one more time; you do not get a separate copy
of the DLL's data.
Bob
 
Peter Duniho

Seth said:
You are absolutely correct. The FORTRAN code was not thread safe. We
were able to reproduce the problem in a simple example. The use of
FORTRAN common blocks is definitely not thread safe, even though the
values written to the common blocks are the same in our test scenario.
The solution is not simple though, as there is apparently a performance
loss if the engineers remove those blocks (and performance is the whole
reason for using FORTRAN in the first place).

Again, it's hard to comment without a specific code example. However...

I am assuming that the common blocks are data used in the calculations,
persistent from one call to the function to another call, but different
among threads that call the function. I say this, because this seems
like the most obvious way that the function would not be thread safe.

If so, then you need to get those blocks of data to be per-thread, so
that each thread isn't stomping on the other thread's data.

There are a variety of ways you might do this. The simplest, if
supported by the FORTRAN compiler, would be to make the common block
a thread-local data block. Presumably this would just be some sort of
attribute you can apply to the common data block and it would just work.
Unfortunately, I have no specifics as to how the compiler you are
using would do this.

Alternatively, you can require the caller of the function to maintain
the data block, passing a reference to the data in when the caller calls
the function. The function would initialize as necessary and use the
data, but each thread would maintain its own structure containing the
data (perhaps as a local variable, if the caller is just calling this
FORTRAN code over and over in a loop, for example).
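As a rough illustration of the caller-owned approach, sketched in C++ since no FORTRAN code was posted: the `Workspace` struct, `accumulate` and `run_workers` are hypothetical names, and the struct stands in for whatever the common block really holds.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical replacement for a common block: state the routine keeps
// between calls, now owned by the caller instead of the library.
struct Workspace {
    double running_total = 0.0;
    int calls = 0;
};

// The routine receives its state by reference and touches no shared data.
double accumulate(Workspace& ws, double value) {
    ws.running_total += value;
    ++ws.calls;
    return ws.running_total;
}

// Each worker keeps a Workspace as a local variable and calls the routine
// in a loop. No locking is needed because nothing is shared.
std::vector<double> run_workers(int thread_count, int calls_per_thread) {
    std::vector<double> totals(static_cast<std::size_t>(thread_count), 0.0);
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&totals, t, calls_per_thread] {
            Workspace ws;  // private to this thread
            for (int i = 0; i < calls_per_thread; ++i) {
                accumulate(ws, static_cast<double>(t + 1));
            }
            assert(ws.calls == calls_per_thread);
            totals[static_cast<std::size_t>(t)] = ws.running_total;
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return totals;
}
```

The cost of this design is an extra argument on every call, but the routine itself no longer needs any synchronization.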

Now, all that said...I don't really understand the comment "and
performance is the whole reason for using FORTRAN in the first place".
There's no reason that FORTRAN should be any faster than writing the
same code in C++. For that matter, it's possible that even in C# it
would be as fast or nearly as fast (depending on what compiler
optimizations the C++ compiler is able to do versus what C# allows).

The usual reason I see people state for sticking with FORTRAN is simply
compatibility. That is, they've got a ton of FORTRAN code and don't
want to port it all. There is not usually a performance difference.

If it were me, I would write a managed C++ DLL that includes a class
responsible for these calculations. The class would itself maintain the
necessary data block as instance members, and each thread would create a
single new instance of the class for the purpose of doing the
calculations. The C++ compiler should do a fine job of matching the
FORTRAN performance, and there are other optimization techniques you can
use if you really need to squeeze that last bit of performance out of
the calculations.
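A sketch of what such a class might look like, written in plain standard C++ rather than managed C++ so it stays compiler-neutral. `Calculator`, `add_sample` and `means_per_thread` are invented for illustration, and the arithmetic is a placeholder for the real engineering calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical calculation class: what used to live in a shared data block
// becomes instance members, so every instance has independent state.
class Calculator {
public:
    explicit Calculator(double scale) : scale_(scale) {}

    // Adds one scaled sample and returns the mean of all samples so far.
    double add_sample(double x) {
        sum_ += scale_ * x;
        ++count_;
        return sum_ / count_;
    }

private:
    double scale_;
    double sum_ = 0.0;
    int count_ = 0;
};

// Each thread creates a single Calculator and uses only that instance.
std::vector<double> means_per_thread(const std::vector<double>& scales,
                                     int samples) {
    assert(samples > 0);
    std::vector<double> means(scales.size(), 0.0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < scales.size(); ++t) {
        workers.emplace_back([&means, &scales, t, samples] {
            Calculator calc(scales[t]);  // one instance per thread
            double mean = 0.0;
            for (int i = 1; i <= samples; ++i) {
                mean = calc.add_sample(static_cast<double>(i));
            }
            means[t] = mean;
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return means;
}
```

Because the state lives in the object, the number of threads can change without touching the calculation code.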

If that didn't work, then you could go back and explore the above
options. But doing a managed C++ DLL would allow your calculation code
to be exposed in a nice, simple, .NET-compatible way, and really should
not incur a performance penalty above and beyond whatever performance
penalty you already have now marshaling between your .NET code and the
FORTRAN DLL (in fact, there's a possibility the overhead would be less
calling a managed C++ DLL...I'm not sure about that, but it does seem to
me that the managed-to-managed boundary should be cheaper to cross than
the managed-to-unmanaged boundary :) ).

Seth said:
Right now we
are playing with the idea of loading the FORTRAN DLL several times
(using LoadLibrary), i.e. once for each thread. This should make sure
that no memory/data is shared between threads. Any comments on this
idea?

As Bob points out, calling LoadLibrary doesn't create new instances of
the DLL for each thread.

Pete
 
