High-performance arrays in C#

JP

Hi Guys,

I am venturing into the realm of scientific computing using C#. And on
that subject I have a question for you!

I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

The content of the cells is limited to the values 0, 1 and 2.

I know that using a rectangular array is pretty fast, but using a
single-dimensional array is still faster. So I want to use this
approach to describe the matrix:

byte[] myArray = new byte[ROW_DIM * COLUMN_DIM];

and reading the value like this:

byte val = myArray[row * COLUMN_DIM + column];
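
Concretely, I am thinking of wrapping this in a small type with an
indexer (just a sketch; the type and member names are my own):

struct Matrix
{
    readonly byte[] cells;
    readonly int columns;

    public Matrix(int rows, int columns)
    {
        this.columns = columns;
        cells = new byte[rows * columns];
    }

    // row-major addressing, as in the line above
    public byte this[int row, int column]
    {
        get { return cells[row * columns + column]; }
        set { cells[row * columns + column] = value; }
    }
}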

I am not sure about what type of collection I should use though.
Suggestions and explanations are very welcome!

I am also a little in doubt as to how to conserve memory. What
datatype should I use to represent the values in the cells? They can
only ever be 0, 1 or 2. So should I represent them using a byte or
perhaps a nullable boolean?

The last thing I am considering is to reverse the problem somehow.
Could I represent the matrix from the view of the values instead of
the cells? By that I mean maybe three arrays or collections (one for
each possible value) that somehow hold an index of the cells
containing this value. Would this conserve memory or make it faster to
address the matrix? I can't really see a smart way of doing this, but
if anyone knows of a smart way, then I am all ears!

Thanks,
JP
 
Rudy Velthuis

Peter said:
[...]
I know that using a rectangular array is pretty fast, but using a
single-dimensional array is still faster. So I want to use this
approach to describe the matrix:

byte[] myArray = new byte[ROW_DIM * COLUMN_DIM];

and reading the value like this:

byte val = myArray[row * COLUMN_DIM + column];

What makes you think that using a single-dimension array will be
faster?

Perhaps something like:

http://www.codeproject.com/KB/dotnet/arrays.aspx

Ok, that may not be entirely up-to-date anymore.


--
Rudy Velthuis http://rvelthuis.de

"Fashion is something barbarous, for it produces innovation
without reason and imitation without benefit."
-- George Santayana
 
JP

Peter said:
The thing that's really going to slow you down is the sheer amount of
data. Because of that, you may get a significant speed-up by using
bit-level access, using an array of unsigned ints, but storing 16 data
items per array element (2 bits each). Even though there's more overhead
processing the data that way, the fact that more of your data can be in
the CPU cache at a time (four times as much as with a byte array) will
likely make a much bigger improvement than the processing will cost you.

Very interesting idea! I am not sure how I would do something like
that. But let me see if I understand you correctly. I am pretty sure I
understand the principle behind this. I am just not sure I understand
the way to approach this best in C#.

If my first 16 cells contain the values 0, 1, 2, 0, 0, 1, 2, 0, 0, 1,
2, 0, 0, 1, 2, 0

Then that would become the binary number:
00 01 10 00 00 01 10 00 00 01 10 00 00 01 10 00

Which is the integer
404232216

To access my cells again, I would have to do what you call bit-level
access. If I want cell 3, then I would have to do a bitwise AND on the
integer (404232216) with this binary number:

00 00 11 00 00 00 00 00 00 00 00 00 00 00 00 00 (the integer
201326592)

Which would return the integer 134217728 or 00 00 10 00 00 00 00 00 00
00 00 00 00 00 00 00.

I can then shift that integer right (16-3)*2 places, and I am left
with the number 2, which is what I was looking for.

How would I go about doing that in a single operation in C#? Or would
you keep an array of the 16 integers you would need for the AND
operation, and then do it in the steps I have outlined here? Am I too
rusty in my binary operations, or have I missed the smartest way to do
this?
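
For example, I am wondering whether it could be as simple as this (my
own guess: shifting first and masking afterwards, so no lookup table is
needed):

uint word = 404232216u;          // the packed example above, 0x18181818
int cellNo = 3;                  // one-based, counted from the most significant pair
int shift = (16 - cellNo) * 2;   // how far the pair sits above bit 0
uint val = (word >> shift) & 3u; // val == 2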

Thanks for the very interesting input Pete!!

/jp
 
Pavel Minaev

Just a few more random comments...

JP said:
I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

The content of the cells is limited to the values 0, 1 and 2.

I know that using a rectangular array is pretty fast, but using a
single-dimensional array is still faster. So I want to use this
approach to describe the matrix:

byte[] myArray = new byte[ROW_DIM * COLUMN_DIM];

and reading the value like this:

byte val = myArray[row * COLUMN_DIM + column];

I am not sure about what type of collection I should use though.
Suggestions and explanations are very welcome!

Well, obviously, you can't get faster than arrays. Except for one
thing: you can use raw pointers to avoid array bounds checks, if and
where that becomes an issue:

unsafe
{
    fixed (byte* p = myArray)
    {
        byte val = p[row * COLUMN_DIM + column];
        ...
    }
}


This means you'll need to pin the array. This is generally frowned
upon, but for arrays as large as yours, they aren't going to be
movable anyway, so you don't strain the GC by doing that. Note that
pinning isn't free by itself, so using it for a single element access
is probably not worth it; it's only useful when you're going to do
several in a row.
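
For example, a sketch of amortizing one pin over a whole row scan
(SumRow is just an illustrative name, not an existing API):

unsafe static long SumRow(byte[] myArray, int row, int columnDim)
{
    long sum = 0;
    fixed (byte* p = myArray)                        // pin once...
    {
        byte* rowStart = p + (long)row * columnDim;  // long math: the offset can exceed int range
        for (int column = 0; column < columnDim; ++column)
            sum += rowStart[column];                 // ...then do many unchecked accesses
    }
    return sum;
}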

On the other hand, keep in mind that JIT will often optimize away
bound checking for the common cases where it's trivially shown to be
unnecessary, such as in:

for (int i = 0; i < myArray.Length; ++i) // i is provably within bounds
{
    byte val = myArray[i];
    ...
}

so always profile, and consider pointers as a last-resort
optimization.

JP said:
I am also a little in doubt as to how to conserve memory. What
datatype should I use to represent the values in the cells? They can
only ever be 0, 1 or 2. So should I represent them using a byte or
perhaps a nullable boolean?

Nullable<T> is simply a struct defined along these lines:

struct Nullable<T>
{
    T value;
    bool hasValue;
    ...
}

And there's nothing magical about it that somehow makes it more
compact than any other struct thus defined. So, for bool, it'll be
sizeof(bool)*2, which is 2 bytes on .NET.

The correct answer is of course byte - you can't get any smaller than
that. Well, unless you do what Peter suggests you do, and pack values
into bits - in which case it might actually be that Int32 will be more
efficient as a storage unit of those packed 2-bit values.
 
JP

OK... so in my testing, I have tried to create both the byte array and
the "compressed" integer array. As expected, I get a factor-of-4
reduction in memory usage, which means a lot!

But I am still uneasy as to how best to read values from such a
"compressed" integer array. Here is my first try, and I am not so sure
it is the best approach:

I have this array of integers to perform the bitwise AND operation
with:

static int[] TwoBitNumberPlaces = new int[]
        {
            Convert.ToInt32("00000000000000000000000000000011", 2),
            Convert.ToInt32("11000000000000000000000000000000", 2),
            Convert.ToInt32("00110000000000000000000000000000", 2),
            Convert.ToInt32("00001100000000000000000000000000", 2),
            Convert.ToInt32("00000011000000000000000000000000", 2),
            Convert.ToInt32("00000000110000000000000000000000", 2),
            Convert.ToInt32("00000000001100000000000000000000", 2),
            Convert.ToInt32("00000000000011000000000000000000", 2),
            Convert.ToInt32("00000000000000110000000000000000", 2),
            Convert.ToInt32("00000000000000001100000000000000", 2),
            Convert.ToInt32("00000000000000000011000000000000", 2),
            Convert.ToInt32("00000000000000000000110000000000", 2),
            Convert.ToInt32("00000000000000000000001100000000", 2),
            Convert.ToInt32("00000000000000000000000011000000", 2),
            Convert.ToInt32("00000000000000000000000000110000", 2),
            Convert.ToInt32("00000000000000000000000000001100", 2)
        };

The actual code to perform the reading of the value is here:

int compressedIndex = cellNo / 16;
int twoBitNumberPlace = cellNo % 16;

int compressedVal = compressedArray[compressedIndex - 1];
int tmp = compVal & ands[col];

int val = tmp >> (16 - twoBitNumberPlace) * 2;

I feel like I am missing the last part of the puzzle... I will read
the numbers in sequence, so I could read all 16 numbers at once
without having to look up the compressed value 16 times.

/jp
 
J

JP

And the correct line is of course:

int tmp = compressedVal & TwoBitNumberPlaces[col];
 
Pavel Minaev

JP said:
OK... so in my testing, I have tried to create both the byte array and
the "compressed" integer array. As expected, I get a factor-of-4
reduction in memory usage, which means a lot!

But I am still uneasy as to how best to read values from such a
"compressed" integer array. Here is my first try, and I am not so sure
it is the best approach:

I have this array of integers to perform the bitwise AND operation
with:

static int[] TwoBitNumberPlaces = new int[]
        {
            Convert.ToInt32("00000000000000000000000000000011", 2),
            Convert.ToInt32("11000000000000000000000000000000", 2),
            Convert.ToInt32("00110000000000000000000000000000", 2),
            Convert.ToInt32("00001100000000000000000000000000", 2),
            Convert.ToInt32("00000011000000000000000000000000", 2),
            Convert.ToInt32("00000000110000000000000000000000", 2),
            Convert.ToInt32("00000000001100000000000000000000", 2),
            Convert.ToInt32("00000000000011000000000000000000", 2),
            Convert.ToInt32("00000000000000110000000000000000", 2),
            Convert.ToInt32("00000000000000001100000000000000", 2),
            Convert.ToInt32("00000000000000000011000000000000", 2),
            Convert.ToInt32("00000000000000000000110000000000", 2),
            Convert.ToInt32("00000000000000000000001100000000", 2),
            Convert.ToInt32("00000000000000000000000011000000", 2),
            Convert.ToInt32("00000000000000000000000000110000", 2),
            Convert.ToInt32("00000000000000000000000000001100", 2)
        };

The actual code to perform the reading of the value is here:

int compressedIndex = cellNo / 16;
int twoBitNumberPlace = cellNo % 16;

int compressedVal = compressedArray[compressedIndex - 1];
int tmp = compVal & ands[col];

int val = tmp >> (16 - twoBitNumberPlace) * 2;

If you use that in a loop, it will most likely be a performance
killer. The reason is simple: it does an extra array element access
(for your bitmask array), which amounts to an extra dereference, and
that can get expensive. And the compiler can't really optimize it
well, because there's no way to tell it that the array is really
immutable and treat it as such.

JP said:
I feel like I am missing the last part of the puzzle... I will read
the numbers in sequence, so I could read all 16 numbers at once
without having to look up the compressed value 16 times.

Yes, that's precisely it. You need to code your algorithms that
process more than one element at a time so that they loop over the
integers in the array, not over the values. Within that loop, you have
one more loop where you essentially loop over the bits in the bitmask;
i.e.:

for (int i = 0; i < compressedArray.Length; ++i)
{
    uint word = (uint)compressedArray[i];
    int j = 0;
    for (uint mask = 3 /* 0b11 */; mask != 0; mask <<= 2, j += 2)
    {
        uint val = (word & mask) >> j;
        ...
    }
}

of course, when you work on a subset of the array, you need to
calculate the starting and ending word indices first, but that should
be trivial.
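
For the record, that index math might look like this (a sketch,
assuming zero-based cell numbers; firstCell and lastCell are
illustrative names):

int firstWord = firstCell / 16;         // 16 two-bit cells per 32-bit word
int lastWord = lastCell / 16;
int firstOffset = (firstCell % 16) * 2; // bit offset of the first cell within its word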

Also, it may be even better performance-wise if you manually unroll
that inner loop by duplicating code. This would mean that you'll have
to repeat the value extraction several times, but it may help the
optimizer.
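
For illustration, the unrolled body might look like this (Process is
just a placeholder for whatever you do with each value):

uint word = (uint)compressedArray[i];
Process(word & 3);          // cell 0
Process((word >> 2) & 3);   // cell 1
Process((word >> 4) & 3);   // cell 2
// ... one line per pair ...
Process((word >> 30) & 3);  // cell 15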
 
joachimvandenbogaert

JP said:
Hi Guys,

I am venturing into the realm of scientific computing using C#. And on
that subject I have a question for you!

Here is a collection of general-purpose data structures (C5 Generic
Collection Library):

http://www.itu.dk/research/c5/

I have found it to be very useful,

Regards,
Joachim
 
Andrew Faust

JP said:
Hi Guys,

I am venturing into the realm of scientific computing using C#. And on
that subject I have a question for you!

I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

After reading this and the other responders, I have to recommend using a
non-managed language such as C++. It sounds like you need a high level of
control over your memory usage with minimal overhead.

This doesn't mean you need to do the full app in C++. Depending on your
needs, you could write the app in C# and just call into a C++ DLL to crunch
the numbers.
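
For example, the boundary could be a single P/Invoke call (the DLL name
and entry point here are hypothetical):

using System.Runtime.InteropServices;

class NativeCruncher
{
    // assumes a hypothetical native export:
    //   extern "C" void CrunchMatrix(unsigned char* cells, int rows, int columns);
    [DllImport("cruncher.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void CrunchMatrix(byte[] cells, int rows, int columns);
}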

Andrew Faust
 
Jeff Johnson

JP said:
I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

Unless they changed the rules when I wasn't looking,
5 000 * 500 000 = 2 500 000 000, not 1 000 000 000....
 
Ben Voigt [C++ MVP]

Pavel said:
Yes, that's precisely it. You need to code your algorithms that
process more than one element at a time so that they loop over the
integers in the array, not over the values. Within that loop, you have
one more loop where you essentially loop over the bits in the bitmask;
i.e.:

for (int i = 0; i < compressedArray.Length; ++i)
{
    uint word = (uint)compressedArray[i];
    int j = 0;
    for (uint mask = 3 /* 0b11 */; mask != 0; mask <<= 2, j += 2)
    {
        uint val = (word & mask) >> j;
        ...
    }
}


Forget adjusting the mask, adjust word:

for (int i = 0; i < compressedArray.Length; ++i)
{
    uint word = (uint)compressedArray[i];

    // either this loop, which skips trailing zeros
    while (word != 0) {
        uint val = word & 3;
        ...
        word >>= 2;
    }

    // or if you want to process trailing zeros
    int subi = 16;
    do {
        uint val = word & 3;
        ...
        word >>= 2;
    } while (--subi > 0);
}
 
Ben Voigt [C++ MVP]

Peter said:
The thing that's really going to slow you down is the sheer amount of
data. Because of that, you may get a significant speed-up by using
bit-level access, using an array of unsigned ints, but storing 16 data
items per array element (2 bits each).

This seems to have gotten lost in further discussion, but using _unsigned_
integer variables as Peter suggests is important.
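
A quick sketch of why: with a signed int, the arithmetic right shift
drags the sign bit into the extracted value once the topmost pair is
occupied (and a loop like "while (word != 0)" above would never
terminate, because a negative word never shifts down to zero):

int w = unchecked((int)0x80000000);               // top 2-bit cell holds the value 2
int bad = (w & unchecked((int)0xC0000000)) >> 30; // arithmetic shift: -2, not 2

uint u = 0x80000000u;
uint good = (u & 0xC0000000u) >> 30;              // logical shift: 2, as expected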
 
JP

Andrew said:
After reading this and the other responders, I have to recommend using a
non-managed language such as C++. It sounds like you need a high level of
control over your memory usage with minimal overhead.

This is something I hear in many places. But then I read articles with
benchmarks showing that performance is pretty much the same, provided
you know how to optimize your code, avoid the wrong collections, and
understand the cost of your algorithms.

Added to that is the need to maintain the code. I am not a C or even
C++ developer. I have written code in C++, and could probably write the
core engine in C++. But I would be much better off doing it in C#.

/JP
 
Andrew Faust

JP said:
This is something I hear in many places. But then I read articles with
benchmarks showing that performance is pretty much the same, provided
you know how to optimize your code, avoid the wrong collections, and
understand the cost of your algorithms.

Added to that is the need to maintain the code. I am not a C or even
C++ developer. I have written code in C++, and could probably write the
core engine in C++. But I would be much better off doing it in C#.

Performance is indeed excellent with C#. My recommendation is mostly due to
the complexity of making those optimizations and the potentially huge amount
of data you'll need to deal with. If you need a high level of control over
your memory and you know C++, it's actually much easier and simpler than
trying to work around C#'s managed memory and specifically the garbage
collector.

JP said:
Added to that is the need to maintain the code.

C/C++ code isn't really any harder to maintain than C#. It all just comes
down to how well written and documented the code is.

JP said:
But I would be much better off doing it in C#.

I agree that your first choice should be a language you know and like. It's
much better to write the program well in a language you know than poorly in
a language you don't. I just wanted to make sure the option was considered.

Ultimately, my recommendation is to write it in C#. However, keep a good,
clean separation between the data-crunching code and the rest of the app.
This way you keep the option to replace that piece later if you find you
need to drop to a lower-level language.
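
For instance, something as thin as this (names illustrative) keeps the
crunching code swappable:

// the rest of the app talks only to this interface...
interface IMatrixCruncher
{
    void Crunch(byte[] cells, int rows, int columns);
}

// ...so the managed implementation can later be swapped for a thin
// wrapper around a native DLL without touching the callers.
class ManagedCruncher : IMatrixCruncher
{
    public void Crunch(byte[] cells, int rows, int columns)
    {
        // number crunching goes here
    }
}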

Andrew Faust
 
Pavel Minaev

JP said:
This is something I hear in many places. But then I read articles with
benchmarks showing that performance is pretty much the same, provided
you know how to optimize your code, avoid the wrong collections, and
understand the cost of your algorithms.

This is true, but it applies to typical (i.e. high-level) code you
write. When you're there, high-level C++ is really not that different
performance-wise from high-level C#.

In your case, though, you seem to be down to the micro-optimization
level already. At this point, picking the right tool is important, and
minute differences in performance can matter a great deal if you have a
long-running number-crunching or bit-twiddling loop. For example, I've
mentioned manual unrolling previously - that's something I doubt the
C# JIT compiler will do on its own, but I'm certain that VC++ will do
the unrolling at maximum optimization settings.

Of course, when it comes to C++, there are existing high-performance
libraries for processing large arrays of data anyway. So that's
another point.

JP said:
Added to that is the need to maintain the code. I am not a C or even
C++ developer. I have written code in C++, and could probably write the
core engine in C++. But I would be much better off doing it in C#.

Actually, that may well be another point for C/C++ here. There's no
guarantee that micro-optimized C# code will be any more readable; in
fact, it may end up being less readable, because C# wasn't designed
for such things.
 
