High-performance arrays in C#

JP

Hi Guys,

I am venturing into the realm of scientific computing using C#. And on
that subject I have a question for you!

I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

The content of the cells is limited to the values 0, 1 and 2.

I know that using a rectangular array is pretty fast, but using a
single-dimensional array is still faster. So I want to use this
approach to describe the matrix:

byte[] myArray = new byte[ROW_DIM * COLUMN_DIM];

and reading the value like this:

byte val = myArray[row * COLUMN_DIM + column];
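
Concretely, I am thinking of wrapping this in a small type with an
indexer (just a sketch; the type and member names are my own):

struct Matrix
{
    readonly byte[] cells;
    readonly int columns;

    public Matrix(int rows, int columns)
    {
        this.columns = columns;
        cells = new byte[rows * columns];
    }

    // row-major addressing, as in the line above
    public byte this[int row, int column]
    {
        get { return cells[row * columns + column]; }
        set { cells[row * columns + column] = value; }
    }
}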

I am not sure about what type of collection I should use though.
Suggestions and explanations are very welcome!

I am also a little in doubt as to how to conserve memory. What
datatype should I use to represent the values in the cells? They can
only ever be 0, 1 or 2. So should I represent them using a byte or
perhaps a nullable boolean?

The last thing I am considering is to reverse the problem somehow.
Could I represent the matrix from the view of the values instead of
the cells? By that I mean maybe three arrays or collections (one for
each possible value) that somehow hold an index of the cells
containing this value. Would this conserve memory or make it faster to
address the matrix? I can't really see a smart way of doing this, but
if anyone knows of a smart way, then I am all ears!

Thanks,
JP
 
Rudy Velthuis

Peter said:
[...]
I know that using a rectangular array is pretty fast, but using a
single-dimensional array is still faster. So I want to use this
approach to describe the matrix:

byte[] myArray = new byte[ROW_DIM * COLUMN_DIM];

and reading the value like this:

byte val = myArray[row * COLUMN_DIM + column];

What makes you think that using a single-dimension array will be
faster?

Perhaps something like:

http://www.codeproject.com/KB/dotnet/arrays.aspx

Ok, that may not be entirely up-to-date anymore.


--
Rudy Velthuis http://rvelthuis.de

"Fashion is something barbarous, for it produces innovation
without reason and imitation without benefit."
-- George Santayana
 
JP

Peter said:
The thing that's really going to slow you down is the sheer amount of
data. Because of that, you may get a significant speed-up by using
bit-level access, using an array of unsigned ints, but storing 16 data
items per array element (2 bits each). Even though there's more overhead
processing the data that way, the fact that more of your data can be in
the CPU cache at a time (four times as much as with a byte array) will
likely make a much bigger improvement than the processing will cost you.

Very interesting idea! I am not sure how I would do something like
that. But let me see if I understand you correctly. I am pretty sure I
understand the principle behind this. I am just not sure I understand
the way to approach this best in C#.

If my first 16 cells contain the values 0, 1, 2, 0, 0, 1, 2, 0, 0, 1,
2, 0, 0, 1, 2, 0

Then that would become the binary number:
00 01 10 00 00 01 10 00 00 01 10 00 00 01 10 00

Which is the integer
404232216

To access my cells again, I would have to do what you call bit-level
access. If I want cell 3, then I would have to do a bitwise AND on the
integer (404232216) with this binary number:

00 00 11 00 00 00 00 00 00 00 00 00 00 00 00 00 (the integer
201326592)

Which would return the integer 134217728 or 00 00 10 00 00 00 00 00 00
00 00 00 00 00 00 00.

I can then shift that integer right (16-3)*2 places, and I am left
with the number 2, which is what I was looking for.

How would I go about doing that in a single operation in C#? Or would
you keep an array of the 16 integers you would need for the AND
operation, and then do it in the steps I have outlined here? Am I too
rusty in my binary operations, or have I missed the smartest way to do
this?
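
For example, I am wondering whether it could be as simple as this (my
own guess: shifting first and masking afterwards, so no lookup table is
needed):

uint word = 404232216u;          // the packed example above, 0x18181818
int cellNo = 3;                  // one-based, counted from the most significant pair
int shift = (16 - cellNo) * 2;   // how far the pair sits above bit 0
uint val = (word >> shift) & 3u; // val == 2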

Thanks for the very interesting input Pete!!

/jp
 
Pavel Minaev

Just a few more random comments...

JP said:
I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

The content of the cells is limited to the values 0, 1 and 2.

I know that using a rectangular array is pretty fast, but using a
single-dimensional array is still faster. So I want to use this
approach to describe the matrix:

byte[] myArray = new byte[ROW_DIM * COLUMN_DIM];

and reading the value like this:

byte val = myArray[row * COLUMN_DIM + column];

I am not sure about what type of collection I should use though.
Suggestions and explanations are very welcome!

Well, obviously, you can't get faster than arrays. Except for one
thing: you can use raw pointers to avoid array bounds checks, if and
where that becomes an issue:

unsafe
{
    fixed (byte* p = myArray)
    {
        byte val = p[row * COLUMN_DIM + column];
        ...
    }
}


This means you'll need to pin the array. This is generally frowned
upon, but for arrays as large as yours, they aren't going to be
movable anyway, so you don't strain the GC by doing that. Note that
pinning isn't free by itself, so using it for a single element access
is probably not worth it; it's only useful when you're going to do
several in a row.
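
For example, a sketch of amortizing one pin over a whole row scan
(SumRow is just an illustrative name, not an existing API):

unsafe static long SumRow(byte[] myArray, int row, int columnDim)
{
    long sum = 0;
    fixed (byte* p = myArray)                        // pin once...
    {
        byte* rowStart = p + (long)row * columnDim;  // long math: the offset can exceed int range
        for (int column = 0; column < columnDim; ++column)
            sum += rowStart[column];                 // ...then do many unchecked accesses
    }
    return sum;
}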

On the other hand, keep in mind that JIT will often optimize away
bound checking for the common cases where it's trivially shown to be
unnecessary, such as in:

for (int i = 0; i < myArray.Length; ++i) // i is provably within bounds
{
    byte val = myArray[i];
    ...
}

so always profile, and consider pointers as a last-resort
optimization.

JP said:
I am also a little in doubt as to how to conserve memory. What
datatype should I use to represent the values in the cells? They can
only ever be 0, 1 or 2. So should I represent them using a byte or
perhaps a nullable boolean?

Nullable<T> is simply a struct defined along these lines:

struct Nullable<T>
{
    T value;
    bool hasValue;
    ...
}

And there's nothing magical about it that somehow makes it more
compact than any other struct thus defined. So, for bool, it'll be
sizeof(bool)*2, which is 2 bytes on .NET.

The correct answer is of course byte - you can't get any smaller than
that. Well, unless you do what Peter suggests you do, and pack values
into bits - in which case it might actually be that Int32 will be more
efficient as a storage unit of those packed 2-bit values.
 
JP

OK... so in my testing, I have tried to create both the byte array and
the "compressed" integer array. As expected, I get a factor-of-4
reduction in memory usage, which means a lot!

But I am still uneasy as to how best to read values from such a
"compressed" integer array. Here is my first try, and I am not so sure
it is the best approach:

I have this array of integers to perform the bitwise AND operation
with:

static int[] TwoBitNumberPlaces = new int[]
        {
            Convert.ToInt32("00000000000000000000000000000011", 2),
            Convert.ToInt32("11000000000000000000000000000000", 2),
            Convert.ToInt32("00110000000000000000000000000000", 2),
            Convert.ToInt32("00001100000000000000000000000000", 2),
            Convert.ToInt32("00000011000000000000000000000000", 2),
            Convert.ToInt32("00000000110000000000000000000000", 2),
            Convert.ToInt32("00000000001100000000000000000000", 2),
            Convert.ToInt32("00000000000011000000000000000000", 2),
            Convert.ToInt32("00000000000000110000000000000000", 2),
            Convert.ToInt32("00000000000000001100000000000000", 2),
            Convert.ToInt32("00000000000000000011000000000000", 2),
            Convert.ToInt32("00000000000000000000110000000000", 2),
            Convert.ToInt32("00000000000000000000001100000000", 2),
            Convert.ToInt32("00000000000000000000000011000000", 2),
            Convert.ToInt32("00000000000000000000000000110000", 2),
            Convert.ToInt32("00000000000000000000000000001100", 2)
        };

The actual code to perform the reading of the value is here:

int compressedIndex = cellNo / 16;
int twoBitNumberPlace = cellNo % 16;

int compressedVal = compressedArray[compressedIndex - 1];
int tmp = compVal & ands[col];

int val = tmp >> (16 - twoBitNumberPlace) * 2;

I feel like I am missing the last part of the puzzle... I will read
the numbers in sequence, so I could read all 16 numbers at once
without having to look up the compressed value 16 times.

/jp
 
J

JP

And the correct line is of course:

int tmp = compressedVal & TwoBitNumberPlaces[col];
 
Pavel Minaev

JP said:
OK... so in my testing, I have tried to create both the byte array and
the "compressed" integer array. As expected, I get a factor-of-4
reduction in memory usage, which means a lot!

But I am still uneasy as to how best to read values from such a
"compressed" integer array. Here is my first try, and I am not so sure
it is the best approach:

I have this array of integers to perform the bitwise AND operation
with:

static int[] TwoBitNumberPlaces = new int[]
        {
            Convert.ToInt32("00000000000000000000000000000011", 2),
            Convert.ToInt32("11000000000000000000000000000000", 2),
            Convert.ToInt32("00110000000000000000000000000000", 2),
            Convert.ToInt32("00001100000000000000000000000000", 2),
            Convert.ToInt32("00000011000000000000000000000000", 2),
            Convert.ToInt32("00000000110000000000000000000000", 2),
            Convert.ToInt32("00000000001100000000000000000000", 2),
            Convert.ToInt32("00000000000011000000000000000000", 2),
            Convert.ToInt32("00000000000000110000000000000000", 2),
            Convert.ToInt32("00000000000000001100000000000000", 2),
            Convert.ToInt32("00000000000000000011000000000000", 2),
            Convert.ToInt32("00000000000000000000110000000000", 2),
            Convert.ToInt32("00000000000000000000001100000000", 2),
            Convert.ToInt32("00000000000000000000000011000000", 2),
            Convert.ToInt32("00000000000000000000000000110000", 2),
            Convert.ToInt32("00000000000000000000000000001100", 2)
        };

The actual code to perform the reading of the value is here:

int compressedIndex = cellNo / 16;
int twoBitNumberPlace = cellNo % 16;

int compressedVal = compressedArray[compressedIndex - 1];
int tmp = compVal & ands[col];

int val = tmp >> (16 - twoBitNumberPlace) * 2;

If you use that in a loop, it will most likely be a performance
killer. The reason is simple: it does an extra array element access
(for your bitmask array), which amounts to an extra dereference, and
that can get expensive. And the compiler can't really optimize it
well, because there's no way to tell it that the array is really
immutable and treat it as such.

JP said:
I feel like I am missing the last part of the puzzle... I will read
the numbers in sequence, so I could read all 16 numbers at once
without having to look up the compressed value 16 times.

Yes, that's precisely it. You need to code your algorithms that
process more than one element at a time so that they loop over the
integers in the array, not over the values. Within that loop, you have
one more loop where you essentially loop over the bits in the bitmask;
i.e.:

for (int i = 0; i < compressedArray.Length; ++i)
{
    uint word = (uint)compressedArray[i];
    int j = 0;
    for (uint mask = 3 /* 0b11 */; mask != 0; mask <<= 2, j += 2)
    {
        uint val = (word & mask) >> j;
        ...
    }
}

of course, when you work on a subset of the array, you need to
calculate the starting and ending word indices first, but that should
be trivial.
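
For the record, that index math might look like this (a sketch,
assuming zero-based cell numbers; firstCell and lastCell are
illustrative names):

int firstWord = firstCell / 16;         // 16 two-bit cells per 32-bit word
int lastWord = lastCell / 16;
int firstOffset = (firstCell % 16) * 2; // bit offset of the first cell within its word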

Also, it may be even better performance-wise if you manually unroll
that inner loop by duplicating code. This would mean that you'll have
to repeat the value extraction several times, but it may help the
optimizer.
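
For illustration, the unrolled body might look like this (Process is
just a placeholder for whatever you do with each value):

uint word = (uint)compressedArray[i];
Process(word & 3);          // cell 0
Process((word >> 2) & 3);   // cell 1
Process((word >> 4) & 3);   // cell 2
// ... one line per pair ...
Process((word >> 30) & 3);  // cell 15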
 
joachimvandenbogaert

JP said:
Hi Guys,

I am venturing into the realm of scientific computing using C#. And on
that subject I have a question for you!

Here is a collection of general-purpose data structures (C5 Generic
Collection Library):

http://www.itu.dk/research/c5/

I have found it to be very useful,

Regards,
Joachim
 
Andrew Faust

JP said:
Hi Guys,

I am venturing into the realm of scientific computing using C#. And on
that subject I have a question for you!

I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

After reading this and the other responders, I have to recommend using a
non-managed language such as C++. It sounds like you need a high level of
control over your memory usage with minimal overhead.

This doesn't mean you need to do the full app in C++. Depending on your
needs, you could write the app in C# and just call into a C++ DLL to crunch
the numbers.
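
For example, the boundary could be a single P/Invoke call (the DLL name
and entry point here are hypothetical):

using System.Runtime.InteropServices;

class NativeCruncher
{
    // assumes a hypothetical native export:
    //   extern "C" void CrunchMatrix(unsigned char* cells, int rows, int columns);
    [DllImport("cruncher.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void CrunchMatrix(byte[] cells, int rows, int columns);
}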

Andrew Faust
 
Jeff Johnson

JP said:
I have a two dimensional matrix that I need to address very
frequently. The size of the matrix ranges from 2,000 * 2,000 at the
smallest to 5,000 * 500,000 at the largest. Thus the number of cells
ranges from 4,000,000 to 1,000,000,000.

Unless they changed the rules when I wasn't looking,
5 000 * 500 000 = 2 500 000 000, not 1 000 000 000....
 
Ben Voigt [C++ MVP]

Pavel said:
Yes, that's precisely it. You need to code your algorithms that
process more than one element at a time so that they loop over the
integers in the array, not over the values. Within that loop, you have
one more loop where you essentially loop over the bits in the bitmask;
i.e.:

for (int i = 0; i < compressedArray.Length; ++i)
{
    uint word = (uint)compressedArray[i];
    int j = 0;
    for (uint mask = 3 /* 0b11 */; mask != 0; mask <<= 2, j += 2)
    {
        uint val = (word & mask) >> j;
        ...
    }
}


Forget adjusting the mask, adjust word:

for (int i = 0; i < compressedArray.Length; ++i)
{
    uint word = (uint)compressedArray[i];

    // either this loop, which skips trailing zeros
    while (word != 0) {
        uint val = word & 3;
        ...
        word >>= 2;
    }

    // or if you want to process trailing zeros
    int subi = 16;
    do {
        uint val = word & 3;
        ...
        word >>= 2;
    } while (--subi > 0);
}
 
Ben Voigt [C++ MVP]

Peter said:
The thing that's really going to slow you down is the sheer amount of
data. Because of that, you may get a significant speed-up by using
bit-level access, using an array of unsigned ints, but storing 16 data
items per array element (2 bits each).

This seems to have gotten lost in further discussion, but using _unsigned_
integer variables as Peter suggests is important.
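
A quick sketch of why: with a signed int, the arithmetic right shift
drags the sign bit into the extracted value once the topmost pair is
occupied (and a loop like "while (word != 0)" above would never
terminate, because a negative word never shifts down to zero):

int w = unchecked((int)0x80000000);               // top 2-bit cell holds the value 2
int bad = (w & unchecked((int)0xC0000000)) >> 30; // arithmetic shift: -2, not 2

uint u = 0x80000000u;
uint good = (u & 0xC0000000u) >> 30;              // logical shift: 2, as expected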
 
JP

Andrew said:
After reading this and the other responders, I have to recommend using a
non-managed language such as C++. It sounds like you need a high level of
control over your memory usage with minimal overhead.

This is something I hear in many places. But then I read articles with
benchmarks showing that performance is pretty much the same, provided
you know how to optimize your code, avoid the wrong collections, and
understand the cost of your algorithms.

Added to that is the need to maintain the code. I am not a C or even
C++ developer. I have written code in C++, and could probably write the
core engine in C++. But I would be much better off doing it in C#.

/JP
 
Andrew Faust

JP said:
This is something I hear in many places. But then I read articles with
benchmarks showing that performance is pretty much the same, provided
you know how to optimize your code, avoid the wrong collections, and
understand the cost of your algorithms.

Added to that is the need to maintain the code. I am not a C or even
C++ developer. I have written code in C++, and could probably write the
core engine in C++. But I would be much better off doing it in C#.

Performance is indeed excellent with C#. My recommendation is mostly due to
the complexity of making those optimizations and the potentially huge amount
of data you'll need to deal with. If you need a high level of control over
your memory and you know C++, it's actually much easier and simpler than
trying to work around C#'s managed memory and specifically the garbage
collector.

JP said:
Added to that is the need to maintain the code.

C/C++ code isn't really any harder to maintain than C#. It all just comes
down to how well written and documented the code is.

JP said:
But I would be much better off doing it in C#.

I agree that your first choice should be a language you know and like. It's
much better to write the program well in a language you know than poorly in
a language you don't. I just wanted to make sure the option was considered.

Ultimately, my recommendation is to write it in C#. However, keep a good,
clean separation between the data-crunching code and the rest of the app.
This way you keep the option to replace that piece later if you find you
need to drop to a lower-level language.
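
For instance, something as thin as this (names illustrative) keeps the
crunching code swappable:

// the rest of the app talks only to this interface...
interface IMatrixCruncher
{
    void Crunch(byte[] cells, int rows, int columns);
}

// ...so the managed implementation can later be swapped for a thin
// wrapper around a native DLL without touching the callers.
class ManagedCruncher : IMatrixCruncher
{
    public void Crunch(byte[] cells, int rows, int columns)
    {
        // number crunching goes here
    }
}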

Andrew Faust
 
Pavel Minaev

JP said:
This is something I hear in many places. But then I read articles with
benchmarks showing that performance is pretty much the same, provided
you know how to optimize your code, avoid the wrong collections, and
understand the cost of your algorithms.

This is true, but it applies to typical (i.e. high-level) code you
write. When you're there, high-level C++ is really not that different
performance-wise from high-level C#.

In your case, though, you seem to be down to the micro-optimization
level already. At this point, picking the right tool is important, and
minute differences in performance can matter a great deal if you have a
long-running number-crunching or bit-twiddling loop. For example, I've
mentioned manual unrolling previously - that's something I doubt the
C# JIT compiler will do on its own, but I'm certain that VC++ will do
the unrolling at maximum optimization settings.

Of course, when it comes to C++, there are existing high-performance
libraries for processing large arrays of data anyway. So that's
another point.

JP said:
Added to that is the need to maintain the code. I am not a C or even
C++ developer. I have written code in C++, and could probably write the
core engine in C++. But I would be much better off doing it in C#.

Actually, that may well be another point for C/C++ here. There's no
guarantee that micro-optimized C# code will be any more readable; in
fact, it may end up being less readable, because C# wasn't designed
for such things.
 
