flatten multi-dimensional array to on-dimensional array

per9000 · Nov 30, 2006

Hi all,

I have a two-dimensional array of data, f.x int's. We can imagine that
the array is "really large". Now I want the data in it and store this
in a one-dimensional array.

The obvious way to do this is a nested for-loop - but we all know
O(n^2) is bad. So I am looking for something like ArrayList.ToArray(),
or Matlabs A

).
C#
// D is an ArrayList
D.ToArray();

Matlab
% A is a matrix
A

)

I tried a cast that could have been perfect for at least one of the
cases.
((int[])A).CopyTo(D, 0);

But I learn that I "Cannot convert type 'int[*,*]' to 'int[]'"

Another problem here is that I sometimes want the data rowbased -
sometimes columnbased.

Right now I do something like the below code, that is horribly
2*O(n^2), I guess I could perhaps live with 1*O(n^2), but any shortcuts
are welcome.

Thanks,

Per

//
// sample code below
//
static void Main(string[] args)
{

int dim1 = 2;
int dim2 = 4;
int k = 0;

// create arrays
int[,] A = new int[dim1 , dim2];
int[] B = new int[dim1 * dim2];
int[] C = new int[dim1 * dim2];

// fill A with silly values
for (int i = 0; i < A.GetLength(0); i++)
for (int j = 0; j < A.GetLength(1); j++)
A[i, j] = k++;

//
// copy from A to B - rowbased
// O(N^2)
//
int idx = 0;
for (int i = 0; i < A.GetLength(0); i++)
for (int j = 0; j < A.GetLength(1); j++)
B[idx++] = A[i, j];

//
// copy from A to C - column-based
// O(N^2)
//
idx = 0;
for (int j = 0; j < A.GetLength(1); j++)
for (int i = 0; i < A.GetLength(0); i++)
C[idx++] = A[i, j];

Console.WriteLine("A:");
for (int i = 0; i < A.GetLength(0); i++)
{
for (int j = 0; j < A.GetLength(1); j++)
Console.Write(" {0}", A[i, j]);
Console.WriteLine();
}

Console.WriteLine("B:");
foreach (double b in B)
Console.Write(" {0}", b);
Console.WriteLine();

Console.WriteLine("C:");
foreach (double c in C)
Console.Write(" {0}", c);
Console.WriteLine();
}

Samuel R. Neff · Nov 30, 2006

Perhaps you can create a custom class that wraps a one dimensional
array of data and use methods to receive data as if it's two
dimensional. Then you never actually convert the entire stored data
from two-dimensional to one-dimensional you just map accesses to the
data.

However, why do you need to do this at all? If the data in inherintly
two-dimensional then leave it that way. Why convert it to
one-dimensional? The answer to that question could help come up with
more ideas on how to accomplish your end goal.

HTH,

Sam

Christof Nordiek · Nov 30, 2006

Hi Per,

it isn't O(n^2) but O(n) if n is the overall size of the array. No method
would perform better than that, since all the content has to be copied.
There could be some advantages in a build in ToOneDimension() but i don't
think so and it would be very small. But the complexity of the problem will
still be O(n).

Bruce Wood · Nov 30, 2006

I agree, although given the requirements I would suggest the opposite:
store the data in a two-dimensional array and then provide ways to
fetch it as though it were a row-based one-dimensional array or a
column-based one-dimensional array. Since you are really using a
two-dimensional array and not a jagged array, the math will be simple.

In .NET 2.0 you can even write two iterators: one that iterates over
the array in row order, the other that iterates over the array in
column order. Those plus simple access routines will allow you to treat
the array in all three ways:

public class MyMatrix<T>
{
private T[,] _matrix;

public MyMatrix(int size) { this._matrix = new T[size, size]; }

public T this[int row, int column] { return this._matrix[row,
column]; }

public T GetByRowOrder(int rowOrder)
{
int rowSize = this._matrix.GetLength(0);
if (rowSize == 0)
{
throw new IndexOutOfRangeException(...);
}
return this._matrix[rowOrder / rowSize, rowOrder % rowSize];
}

public T GetByColumnOrder(int columnOrder)
{
int columnSize = this._matrix.GetLength(1);
if (columnSize == 0)
{
throw new IndexOutOfRangeException(...);
}
return this._matrix[columnOrder % columnSize, columnOrder /
columnSize];
}

...etc...
}

per9000 · Dec 1, 2006

Hi,

yes I could make a class that wraps my matrix, and that is pretty much
the situation I am in. The need to flatten the array comes from a
babel-problem, I need the values in external resources written in
languages that are not in the .NET family.

I did a small example in C/C# that illustrates a horribly ugly
solution, but that works and that answers one of my questions.

I pretty much create an array (in two dims) in C# and send it to C. C#
looks at the array as an int[,]. But somewhere underneath this it is
just a pointer. In the C level I receive a void* (by 'pure' luck it is
a match).

The from the below code is:
C# >> Hello.
C# >> calling printArr.
0 1 2 3 4 5 6 7 8 9
C# >> Good night.

That means to me that C# orders its elements "row based" (assuming the
second index corresponds to rows).

// C# code
// ------------------
using System;
using System.Runtime.InteropServices;
using System.Collections.Generic;
using System.Text;

namespace PER9000
{
class ArrTest
{
[DllImport("ArrSub.dll")]
public static extern void printArr(int[,] A, int I, int J);

static void Main(string[] args)
{
Console.WriteLine("C# >> Hello.");
int I = 2;
int J = 5;

int[,] A = new int[I, J];
int c = 0;

for (int i = 0; i < I; i++)
for (int j = 0; j < J; j++)
A[i, j] = c++;

Console.WriteLine("C# >> calling printArr.");
printArr(A, I, J);
Console.WriteLine("C# >> Good night.");
}
}
}
// ------------------
// C code
void __declspec(dllexport) __stdcall printArr(void* A, int I, int J);
void __declspec(dllexport) __stdcall printArr(void* A, int I, int J)
{
int c = 0;
int * B = (int *) A;
for (c = 0; c < I * J; c++)
printf(" %d", B[c]);
printf("\n");
}
// ------------------

This somehow answers part of my question. It seems that I have to use a
nested for-loop (I agree the complexity is O(number of elements)) to
hand-pick the elements one by one (I am afraid this will be a
bottleneck when it comes to speed).

So my conclusion is that there are no 'better' way to flatten an array
in two dims ordered by column.

/Per

Bruce Wood · Dec 1, 2006

per9000 said:
Hi,

yes I could make a class that wraps my matrix, and that is pretty much
the situation I am in. The need to flatten the array comes from a
babel-problem, I need the values in external resources written in
languages that are not in the .NET family.

I did a small example in C/C# that illustrates a horribly ugly
solution, but that works and that answers one of my questions.

I pretty much create an array (in two dims) in C# and send it to C. C#
looks at the array as an int[,]. But somewhere underneath this it is
just a pointer. In the C level I receive a void* (by 'pure' luck it is
a match).

The from the below code is:
C# >> Hello.
C# >> calling printArr.
0 1 2 3 4 5 6 7 8 9
C# >> Good night.

That means to me that C# orders its elements "row based" (assuming the
second index corresponds to rows).

// C# code
// ------------------
using System;
using System.Runtime.InteropServices;
using System.Collections.Generic;
using System.Text;

namespace PER9000
{
class ArrTest
{
[DllImport("ArrSub.dll")]
public static extern void printArr(int[,] A, int I, int J);

static void Main(string[] args)
{
Console.WriteLine("C# >> Hello.");
int I = 2;
int J = 5;

int[,] A = new int[I, J];
int c = 0;

for (int i = 0; i < I; i++)
for (int j = 0; j < J; j++)
A[i, j] = c++;

Console.WriteLine("C# >> calling printArr.");
printArr(A, I, J);
Console.WriteLine("C# >> Good night.");
}
}
}
// ------------------
// C code
void __declspec(dllexport) __stdcall printArr(void* A, int I, int J);
void __declspec(dllexport) __stdcall printArr(void* A, int I, int J)
{
int c = 0;
int * B = (int *) A;
for (c = 0; c < I * J; c++)
printf(" %d", B[c]);
printf("\n");
}
// ------------------

This somehow answers part of my question. It seems that I have to use a
nested for-loop (I agree the complexity is O(number of elements)) to
hand-pick the elements one by one (I am afraid this will be a
bottleneck when it comes to speed).

So my conclusion is that there are no 'better' way to flatten an array
in two dims ordered by column.

Two points.

First, the only way faster than O(n) is something equivalent to C's
memcpy method that copies chunks of memory at a time. I think that this
is what you're looking for, but so far as I know C# doesn't allow you
to do things like this for security reasons. On modern CPUs the loop
operation should be fast enough anyway, as the whole row will likely
fit into on-board cache and the loop for copying a row (at least) will
not have to fetch from memory. I don't know about the writing of the
array to its new "flattened" structure... that may fit into cache as
well. Anyway, using pure C#, there's no "memcpy" any more, so there's
nothing faster than O(n).

Second, I think that you're thinking about the problem from a
too-low-level point of view. In situations like this, you have two
possible solutions:

1) Maintain the values in their most logical form (i.e. a
two-dimensional array) and whenever you need to read them in another
form (e.g. flattened into a column-then-row order) then do the
necessary calculations to fetch them in that order. This is the
solution I proposed. This solution makes sense if most of the time you
are manipulating / using the values as a 2-D array and only
occasionally do you need them in their row-major or column-major order.

From a performance point of view, this solution works well if you spend

most of your time in the C# code, but only occasionally have to call
other languages and thus translate the array into column-major order.
(As you've seen, C# stores multidimensional arrays in row-major order
and so is compatible with C, for example, although I don't know if that
is guaranteed to be so into the future.)

If, however, you make repeated calls to other languages and you have to
transform the array each time, and you suspect that this will be a
performance problem, the solution is not to find a way to transform the
array faster, but to find a way to not transform the array at all.

2) One solution along those lines is to maintain the values in multiple
orders and use whichever order is most useful at any given moment. This
makes sense if the values are read much more than they are modified,
and the reads are as often in one order as another. The code for a
solution like that might look something like this:

public class MyMatrix<T>
{
private T[] _rowMajor;
private T[] _columnMajor;
private int _rowCount;
private int _columnCount;

public MyMatrix(int rows, int columns)
{
if (rows <= 0) throw new ArgumentException(...);
if (columns <= 0) throw new ArgumentException(...);
this._rowCount = rows;
this._columnCount = columns;
this._rowMajor = new T[rows * columns];
this._columnMajor = new T[rows * columns];
}

public T this[int row, int column]
{
get { return this._rowMajor[row * this._columnCount + column];
}
set
{
this._rowMajor[row * this._columnCount + column] = value;
this._columnMajor[column * this._rowCount + row] = value;
}
}

...etc...
}

As you can see, the cost here is writing to the structure, which is now
more expensive. The return on that investment is that you have row and
column-major order arrays constantly maintained for quick read-back. If
the C# program is constantly reading the values as a two-dimensional
array then you could add that as a third, constantly-maintained form
for the data. In other words, store the data in triplicate:
two-dimensional, row-major, and column-major orders.

This becomes more complicated, however, if the programs you call will
be modifying the arrays. In that case when you return from calling the
other language you have to synch any changes from the array you passed
back to the other copy or copies. So, if you pass a row-major array to
a program that makes changes to it, you then have to copy that back
into the column-major version (and potentially to the two-dimensional
version, if you're maintaining that) so that they remain in synch.
Since you can't know what the program changed, that would cost you at
least one entire array copy on every call, which would be the same cost
as the first solution.

So, if the other-language programs you're calling do nothing but read
the arrays, and you call them frequently, solution #2 is better. If
they update the arrays, or if they are called infrequently and your C#
program does frequent updates to the array then solution #1 is at least
no worse, and probably better. Finally, if the programs you call in
other languages update the arrays they're passed then you're going to
have performance problems no matter what you do.

per9000 · Dec 3, 2006

Hi,

First, the only way faster than O(n) is something equivalent to C's
memcpy method that copies chunks of memory at a time. I think that this
is what you're looking for, but so far as I know C# doesn't allow you
to do things like this for security reasons.

Ok, that is what I needed to hear. Thanks.

2) One solution along those lines is to maintain the values in multiple
orders and use whichever order is most useful at any given moment. This
makes sense if the values are read much more than they are modified...

Excellent idea! In my case the values are created once, then pretty
much only read. This will of course require some more memory, but that
is compensated with speed so I think this might be nice for me -
especially since I might end up with repeated scans of the matrix.

Another bonus with this is that I could use shallow copying if I only
read the values (this is not true in general but might work sometimes).
I send a pointer to the appropriate one-dimensional array instead of
copying the values. O(1) I guess, but on the other hand - I do not
really copy the values.

/Per

Bruce Wood · Dec 4, 2006

per9000 said:
I send a pointer to the appropriate one-dimensional array instead of copying the values.

Exactly. No copying = no performance problem.

BTW, it's common to trade memory for performance when solving computing
problems. It's rare that you can save memory and increase speed at the
same time (unless the original design stank). Usually there are several
good designs, ranging from compact and slow to memory-hungry and fast,
and the problem is to choose the right one based on how the data are
used.

per9000 · Dec 4, 2006

Another related thing:
Do you know how efficient/fast...

// non-shallow copy of myMatrix to yourMatrix
myMatrix.CopyTo(yourMatrix,0);

....is compared to...

for (int i = 0; i < myMatrix.Length; i++)
yourMatrix = myMatrix

?

/Per

Performance issues with multi-dimensional arrays	4	Nov 12, 2008
DllImport two dimensional array	1	Mar 28, 2008
multidimensional array	1	Oct 15, 2008
Converting a IEnumberable sequence into an array - an example here	2	Oct 26, 2008
Multi-dimensional arrays	2	Jun 4, 2009
why do I get compile error	2	Jan 26, 2010
b-tree	11	Jun 17, 2011
Parsing Data Problem ...	5	Nov 3, 2010

flatten multi-dimensional array to on-dimensional array

per9000

Samuel R. Neff

Christof Nordiek

Bruce Wood

per9000

Bruce Wood

per9000

Bruce Wood

per9000

Ask a Question

Similar Threads