algorithm for "common array of longs"

P

Peter

Hi

I have a number of arrays of longs, from which I need to find a single
array which only contains the values which appear in all the original
arrays.

For example, I could have the three arrays:

1, 3, 2, 8, 5
3, 6, 1
6, 1, 4, 3

from which I need to generate the "common" array:
1, 3
(as these are the only values which appear in all the original arrays).


What is a good algorithm for this? The below is what I have written,
which does seem to do what I want. But more than likely there is a much
better, more efficient way to do this...

static void Main(string[] args)
{
long[] x = new long[5] { 1, 3, 2, 8, 5 };
long[] y = new long[3] { 3, 6, 1 };
long[] z = new long[4] { 6, 1, 4, 3 };

List<long[]> originals = new List<long[]> { x, y, x };

long[] commonArray = Common(originals);
}

static long[] Common(List<long[]> originals)
{
long[] temp = originals[0];

for (int i = 1; i < originals.Count; i++)
{
temp = Common(temp, originals);
}

return temp;
}

static long[] Common(long[] one, long[] two)
{
List<long> common = new List<long>();
foreach (long val in one)
{
if (two.Contains(val))
{
common.Add(val);
}
}
return common.ToArray();
}


Thanks,
Peter
 
M

Michel Walsh

If the lists are large and sorted, a double walk over the 'index' in
parallel can be implemented. If the list are short and not sorted, the list
can be walked through, kind of what you did, rather than first sorted and
next using a index walk in parallel, because the time spent to sort them may
be more than what you will win by using the double walk on the 'index'. That
is why SQL has execution plan based on statistics: given different cases,
the 'imperative code' used will be different. There is also a question of
memory: can you use a lot or are you limited to the strict minimum? Anyhow,
here how a walk on index in parallel works (note that it is what SQL can
perform to execute an INNER JOIN between two lists with no dup).

0. R is an empty list,
FingerA points to before the start of the listA
FingerB points to before the start of the listB
1. Finger A points the next item in list A, itemA
2. If fingerA points to nothing, it is done.
3. Find next item in listB equals to itemA
3a. If none is found, goto 1
4. Append itemA to R
5. Move FingerB to the next item in listB, itemB
6. If fingerB points to nothing, it is done.
7. Find next item in listA equals to itemB
7a. If none if found, goto 5
8. Append itemB to R
9 Goto 1.


note that you can change lightly the algorithm if listB has duplicated
values (I assumed both list have not dup), but I don't think this is the
kind of JOIN you want.

Example: listA: 1, 2, 3, 5, 8
list B: 1, 5, 6

Step 1: fa -> 1
Step 3: fb-> 1, R =={ 1 }
Step 5: fb-> 5
Step 7: fa-> 5, R =={ 1, 5 }
Step 1: fa->8
Step 3a.
Step 1: fa-> eof
so, R={ 1, 5 }



Now, for the code... I would use SQL, but feel free to do it in C# though
:)



Vanderghast, Access MVP



Peter said:
Hi

I have a number of arrays of longs, from which I need to find a single
array which only contains the values which appear in all the original
arrays.

For example, I could have the three arrays:

1, 3, 2, 8, 5
3, 6, 1
6, 1, 4, 3

from which I need to generate the "common" array:
1, 3
(as these are the only values which appear in all the original arrays).


What is a good algorithm for this? The below is what I have written,
which does seem to do what I want. But more than likely there is a much
better, more efficient way to do this...

static void Main(string[] args)
{
long[] x = new long[5] { 1, 3, 2, 8, 5 };
long[] y = new long[3] { 3, 6, 1 };
long[] z = new long[4] { 6, 1, 4, 3 };

List<long[]> originals = new List<long[]> { x, y, x };

long[] commonArray = Common(originals);
}

static long[] Common(List<long[]> originals)
{
long[] temp = originals[0];

for (int i = 1; i < originals.Count; i++)
{
temp = Common(temp, originals);
}

return temp;
}

static long[] Common(long[] one, long[] two)
{
List<long> common = new List<long>();
foreach (long val in one)
{
if (two.Contains(val))
{
common.Add(val);
}
}
return common.ToArray();
}


Thanks,
Peter
 
R

Rudy Velthuis

Peter said:
Hi

I have a number of arrays of longs, from which I need to find a single
array which only contains the values which appear in all the original
arrays.

For example, I could have the three arrays:

1, 3, 2, 8, 5
3, 6, 1
6, 1, 4, 3

from which I need to generate the "common" array:

You need the intersection of different sets. Intersect the first with
the second and the result with the third. There are plenty of
algorithms to do that. I have a few (relying on the sets of numbers
being sorted), but they are unfortunately not in C#.

A quick translation (may be faulty) - note that the lists must be
sorted for this to work - it is generally much faster than calling
Contains() in a loop:

public static int SetIntersection<T>(IList<T> source1,
IList<T> source2, IList<T> destination, IComparer<T> comparer)
{
int i1 = 0;
int i2 = 0;
int d = 0;
while (i1 < source1.Count && i2 < source2.Count)
{
int comp = comparer.Compare(source1[i1], source2[i2]);
if (comp < 0)
i1++;
else if (comp > 0)
i2++;
else
{
destination[d] = source1[i1];
i1++;
i2++;
d++;
}
}
return d;
}

Of course destination must be big enough to contain the largest of the
source lists entirely.

if list1, list2 and list3 are the 3 lists, then do

int len = SetIntersection<int>(list1, list2, intermediate,
Comparer<int>.Default);
intermediate.DeleteRange(len, intermediate.Count - len);
len = SetIntersection<int>(intermediate, list3, result,
Comparer<int>.Default);
result.DeleteRange(len, result.Count - len);

And result will contain the elements present in all. You could modify
the routine to pass d back as well, so you know the number of items
written to it.
 
G

Göran Andersson

Peter said:
Hi

I have a number of arrays of longs, from which I need to find a single
array which only contains the values which appear in all the original
arrays.

For example, I could have the three arrays:

1, 3, 2, 8, 5
3, 6, 1
6, 1, 4, 3

from which I need to generate the "common" array:
1, 3
(as these are the only values which appear in all the original arrays).


What is a good algorithm for this? The below is what I have written,
which does seem to do what I want. But more than likely there is a much
better, more efficient way to do this...

static void Main(string[] args)
{
long[] x = new long[5] { 1, 3, 2, 8, 5 };
long[] y = new long[3] { 3, 6, 1 };
long[] z = new long[4] { 6, 1, 4, 3 };

List<long[]> originals = new List<long[]> { x, y, x };

long[] commonArray = Common(originals);
}

static long[] Common(List<long[]> originals)
{
long[] temp = originals[0];

for (int i = 1; i < originals.Count; i++)
{
temp = Common(temp, originals);
}

return temp;
}

static long[] Common(long[] one, long[] two)
{
List<long> common = new List<long>();
foreach (long val in one)
{
if (two.Contains(val))
{
common.Add(val);
}
}
return common.ToArray();
}


Thanks,
Peter


The Contains method for a List gets slower the larger the list is, a
HashSet scales much better:

static long[] Common(long[] one, long[] two) {
HashSet<long> set = new HashSet<long>(two);
List<long> common = new List<long>();
foreach (long val in one) {
if (set.Contains(val)) common.Add(val);
}
return common.ToArray();
}
 
M

Michel Walsh

Seems much more compact that the pseudo code I tried to used...and working
fine. I would use destination.Add( source1[i1] ) instead of
destination[d]=source1[i1], though.



I wonder what LINQ.Intesect uses?

Int[] first = { 1, 3, 2, 8, 5};
Int[] second = { 6, 1, 4, 3};
IEnumerable<int> common = first.Intersect(second);


Vanderghast, Access MVP



Rudy Velthuis said:
Peter said:
Hi

I have a number of arrays of longs, from which I need to find a single
array which only contains the values which appear in all the original
arrays.

For example, I could have the three arrays:

1, 3, 2, 8, 5
3, 6, 1
6, 1, 4, 3

from which I need to generate the "common" array:

You need the intersection of different sets. Intersect the first with
the second and the result with the third. There are plenty of
algorithms to do that. I have a few (relying on the sets of numbers
being sorted), but they are unfortunately not in C#.

A quick translation (may be faulty) - note that the lists must be
sorted for this to work - it is generally much faster than calling
Contains() in a loop:

public static int SetIntersection<T>(IList<T> source1,
IList<T> source2, IList<T> destination, IComparer<T> comparer)
{
int i1 = 0;
int i2 = 0;
int d = 0;
while (i1 < source1.Count && i2 < source2.Count)
{
int comp = comparer.Compare(source1[i1], source2[i2]);
if (comp < 0)
i1++;
else if (comp > 0)
i2++;
else
{
destination[d] = source1[i1];
i1++;
i2++;
d++;
}
}
return d;
}

Of course destination must be big enough to contain the largest of the
source lists entirely.

if list1, list2 and list3 are the 3 lists, then do

int len = SetIntersection<int>(list1, list2, intermediate,
Comparer<int>.Default);
intermediate.DeleteRange(len, intermediate.Count - len);
len = SetIntersection<int>(intermediate, list3, result,
Comparer<int>.Default);
result.DeleteRange(len, result.Count - len);

And result will contain the elements present in all. You could modify
the routine to pass d back as well, so you know the number of items
written to it.


--
Rudy Velthuis http://rvelthuis.de

"We are not retreating - we are advancing in another Direction."
-- General Douglas MacArthur (1880-1964)
 
R

Rudy Velthuis

Michel said:
Seems much more compact that the pseudo code I tried to used...and
working fine. I would use destination.Add( source1[i1] ) instead of
destination[d]=source1[i1], though.

I designed mine to work with arrays too.
I wonder what LINQ.Intesect uses?

Int[] first = { 1, 3, 2, 8, 5};
Int[] second = { 6, 1, 4, 3};
IEnumerable<int> common = first.Intersect(second);

I translated that from a language without LINQ. <g>

I guess LINQ would be a lot easier.
 
H

Hans Kesting

Göran Andersson brought next idea :
Peter said:
Hi

I have a number of arrays of longs, from which I need to find a single
array which only contains the values which appear in all the original
arrays.

For example, I could have the three arrays:

1, 3, 2, 8, 5
3, 6, 1
6, 1, 4, 3

from which I need to generate the "common" array:
1, 3
(as these are the only values which appear in all the original arrays).


What is a good algorithm for this? The below is what I have written,
which does seem to do what I want. But more than likely there is a much
better, more efficient way to do this...

static void Main(string[] args)
{
long[] x = new long[5] { 1, 3, 2, 8, 5 };
long[] y = new long[3] { 3, 6, 1 };
long[] z = new long[4] { 6, 1, 4, 3 };

List<long[]> originals = new List<long[]> { x, y, x };

long[] commonArray = Common(originals);
}

static long[] Common(List<long[]> originals)
{
long[] temp = originals[0];

for (int i = 1; i < originals.Count; i++)
{
temp = Common(temp, originals);
}

return temp;
}

static long[] Common(long[] one, long[] two)
{
List<long> common = new List<long>();
foreach (long val in one)
{
if (two.Contains(val))
{
common.Add(val);
}
}
return common.ToArray();
}


Thanks,
Peter


The Contains method for a List gets slower the larger the list is, a HashSet
scales much better:

static long[] Common(long[] one, long[] two) {
HashSet<long> set = new HashSet<long>(two);
List<long> common = new List<long>();
foreach (long val in one) {
if (set.Contains(val)) common.Add(val);
}
return common.ToArray();
}


If you are using HashSets, why not:

long[] x = new long[5] { 1, 3, 2, 8, 5 };
long[] y = new long[3] { 3, 6, 1 };
long[] z = new long[4] { 6, 1, 4, 3 };

HashSet<long> hs1 = new HashSet<long>(x);
hs1.IntersectWith(y);
hs1.IntersectWith(z);

foreach(var l in hs1)
Console.WriteLine(l);


(returns 1 and 3)

Hans Kesting
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top