Compaing two arrays when element order does not matter

S

ssg31415926

I need to compare two string arrays defined as string[] such that the
two arrays are equal if the contents of the two are the same, where
order doesn't matter and every element must be unique.

E.g. these two arrays would test as equal:

servers[0] = "Admin"
servers[1] = "Finance"
servers[2] = "Payroll"
servers[3] = "Sales"

monitors[0] = "Sales"
monitors[1] = "Payroll"
monitors[2] = "Admin"
monitors[3] = "Finance"

Is there an easy way to do this? The only way I can think of is:
1. check that the number of elements is the same because if not then
they're not equal since every element must be unique.
2. create another array (with one element for each element in the first
array) and run through the second array once for each element in the
first, checking if it exists and marking it in the additional array if
it does.
Frankly, this seems clumsy. Is there a better way?
 
O

Otis Mukinfus

I need to compare two string arrays defined as string[] such that the
two arrays are equal if the contents of the two are the same, where
order doesn't matter and every element must be unique.

E.g. these two arrays would test as equal:

servers[0] = "Admin"
servers[1] = "Finance"
servers[2] = "Payroll"
servers[3] = "Sales"

monitors[0] = "Sales"
monitors[1] = "Payroll"
monitors[2] = "Admin"
monitors[3] = "Finance"

Is there an easy way to do this? The only way I can think of is:
1. check that the number of elements is the same because if not then
they're not equal since every element must be unique.
2. create another array (with one element for each element in the first
array) and run through the second array once for each element in the
first, checking if it exists and marking it in the additional array if
it does.
Frankly, this seems clumsy. Is there a better way?
Do the two arrays both have to be arrays? If one can be a Hashtable,
then you can load it and use it to test the array.

Or if they have to be arrays you can use a binary search algorithm to
search one of them but one of them will have to be sorted to do that.

Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
 
N

Nick Hounsome

ssg31415926 said:
I need to compare two string arrays defined as string[] such that the
two arrays are equal if the contents of the two are the same, where
order doesn't matter and every element must be unique.

E.g. these two arrays would test as equal:

servers[0] = "Admin"
servers[1] = "Finance"
servers[2] = "Payroll"
servers[3] = "Sales"

monitors[0] = "Sales"
monitors[1] = "Payroll"
monitors[2] = "Admin"
monitors[3] = "Finance"

Is there an easy way to do this? The only way I can think of is:
1. check that the number of elements is the same because if not then
they're not equal since every element must be unique.
2. create another array (with one element for each element in the first
array) and run through the second array once for each element in the
first, checking if it exists and marking it in the additional array if
it does.
Frankly, this seems clumsy. Is there a better way?

Do the length check then sort the first array then use binary search to look
for elements of the second array in the first.
 
G

Guest

Just copy the array elements to a hashtable or sorted list to sort them.
Once sorted, you can iterate through the sorted list and compare.

Again, as I often suggest, consider using a CollectionBase based custom
collection instead of an array and you can sort your collection without
having to copy your array to anything else.

HTH
 
N

Nick Hounsome

Dale said:
Just copy the array elements to a hashtable or sorted list to sort them.
Once sorted, you can iterate through the sorted list and compare.

Again, as I often suggest, consider using a CollectionBase based custom
collection instead of an array and you can sort your collection without
having to copy your array to anything else.

Why do you think that you need to copy an array to sort it?
 
B

Bruce Wood

Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.

Hashing one array into a Hashtable then using the Hashtable to test for
presence is probably the fastest method for large arrays, since its
cost increases roughly as a factor of the total number of items, rather
than as n times m or anything like that.

For smaller arrays there is always the simpleminded approach:

if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers == monitors[j])
{
found = true;
}
}
if (!found)
{
return false;
}
}
return true;

If your arrays are going to be small (a few dozen elements at most,
each) then this isn't a bad solution: it is very simple and anyone will
be able to read and understand it.

The problem is that its cost increases by the _product_ of the array
sizes. So if there are "n" elements in the servers array and "m"
elements in the monitors array then the cost (generally speaking) will
be n times m divided by 2.

Copying one array into a Hashtable and then looking up the elements of
the other array in the Hasthable increases as some factor of n + m,
which is far better as the arrays grow:

Hashtable serverHash = new Hashtable();
foreach (string s in servers)
{
serverHash = s;
}
foreach (string m in monitors)
{
if (serverHash.Contains(m))
{
serverHash.Remove(m);
}
else
{
return false;
}
}
return serverHash.Count == 0;
 
O

Otis Mukinfus

On 2 Feb 2006 12:09:15 -0800, "Bruce Wood" <[email protected]>
wrote:

All good points...
In Line:
Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.

Hashing one array into a Hashtable then using the Hashtable to test for
presence is probably the fastest method for large arrays, since its
cost increases roughly as a factor of the total number of items, rather
than as n times m or anything like that.

For smaller arrays there is always the simpleminded approach:

if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers == monitors[j])
{
found = true;


// no reason to keep going if you found it
break;
}
}
if (!found)
{
return false;
}
}
return true;

If your arrays are going to be small (a few dozen elements at most,
each) then this isn't a bad solution: it is very simple and anyone will
be able to read and understand it.

The problem is that its cost increases by the _product_ of the array
sizes. So if there are "n" elements in the servers array and "m"
elements in the monitors array then the cost (generally speaking) will
be n times m divided by 2.

Copying one array into a Hashtable and then looking up the elements of
the other array in the Hasthable increases as some factor of n + m,
which is far better as the arrays grow:

Hashtable serverHash = new Hashtable();
foreach (string s in servers)
{
serverHash = s;
}
foreach (string m in monitors)
{
if (serverHash.Contains(m))
{
serverHash.Remove(m);
}
else
{
return false;
}
}
return serverHash.Count == 0;


Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
 
B

Bruce Wood

Yes, that was particularly silly of me. I meant to include "&& !found"
in the loop condition, but forgot.
 
N

Nick Hounsome

Bruce Wood said:
Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.

He didn't say that they you weren't allowed to sort them (or at least one of
them) which is what I suggested.
 
G

Guest

I don't assume you have to copy an array to sort it. I just assume that the
OP would have done that already if that's what he wanted. I assumed that he
wanted to keep the existing arrays in their original order. How did you
interpret his question in regards to that?

Also, if he copies the arrays to a hashtable he can solve the problem of
uniqueness. Another way to solve the uniqueness problem is to use a custom
collection so that he can implement his own add method which will guarantee
uniqueness.
 
G

Guest

Even better, rather than

found = true;
break;

is to just:

return true;

--
Dale Preston
MCAD C#
MCSE, MCDBA


Otis Mukinfus said:
On 2 Feb 2006 12:09:15 -0800, "Bruce Wood" <[email protected]>
wrote:

All good points...
In Line:
Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.

Hashing one array into a Hashtable then using the Hashtable to test for
presence is probably the fastest method for large arrays, since its
cost increases roughly as a factor of the total number of items, rather
than as n times m or anything like that.

For smaller arrays there is always the simpleminded approach:

if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers == monitors[j])
{
found = true;


// no reason to keep going if you found it
break;
}
}
if (!found)
{
return false;
}
}
return true;

If your arrays are going to be small (a few dozen elements at most,
each) then this isn't a bad solution: it is very simple and anyone will
be able to read and understand it.

The problem is that its cost increases by the _product_ of the array
sizes. So if there are "n" elements in the servers array and "m"
elements in the monitors array then the cost (generally speaking) will
be n times m divided by 2.

Copying one array into a Hashtable and then looking up the elements of
the other array in the Hasthable increases as some factor of n + m,
which is far better as the arrays grow:

Hashtable serverHash = new Hashtable();
foreach (string s in servers)
{
serverHash = s;
}
foreach (string m in monitors)
{
if (serverHash.Contains(m))
{
serverHash.Remove(m);
}
else
{
return false;
}
}
return serverHash.Count == 0;


Otis Mukinfus
http://www.otismukinfus.com
http://www.tomchilders.com
 
B

Bruce Wood

return true;

No, because then you would return true upon finding the first element
of the servers array in the monitors array, without checking the other
servers.
 
B

Bruce Wood

True. Nonetheless, your hash suggestion was much more efficient. Even a
quicksort becomes expensive as the number of elements grows.

When it comes to sorting versus hashing, I always think of it this way:
Sorting costs more because it's putting the elements _in order_, not
just creating a way to find them quickly. So, I always ask myself
whether I need the elements to be _in order_ for any reason. If the
answer is no, then a sort is usually my last resort, as I don't want to
incur the extra cost to establish an order that I don't need.

In this case the OP was testing for presence. So a sort would incur a
bunch of extra work to order the elements when that ordering wouldn't
really contribute to the solution. Hashing (which you also suggested)
is a much better idea.
 
D

David Larson

Simple solution that involves no sorting. Just add all the hash codes
from both arrays and compare the two sums. This O(n) solution has an
extremely slight possibility of a false positive.

// first check length of two arrays, if not equal then done
if( servers.Length != monitors.Length )
return false;

// add all hashcode values of each element in first array
int serversSumOfHashCodes = 0;
for( int index=0; index<servers.Length; index++ )
{
serversSumOfHashCodes += servers[index].GetHashCode();
}

// add all hashcode values of each element in second array
int monitorsSumOfHashCodes = 0;
for( int index=0; index<monitors.Length; index++ )
{
monitorsSumOfHashCodes += monitors[index].GetHashCode();
}

// compare first sumOfHashCodes with second sumOfHashCodes
// if sums are == then the two arrays contain the same of elements
if( serversSumOfHashCodes == monitorsSumOfHashCodes )
return true;

return false;
 
J

Jon Skeet [C# MVP]

David Larson said:
Simple solution that involves no sorting. Just add all the hash codes
from both arrays and compare the two sums. This O(n) solution has an
extremely slight possibility of a false positive.

But in the case where there *is* a false positive, it'll be a real pain
to track down. Hash codes really *shouldn't* be used as definite
equality checks.

Of course, it's more reasonable to do this as a quick way of returning
a negative, and then testing more exhaustively in the positive case.

(I'd use foreach loops instead of the two for loops, by the way - or
preferrably make each one a call to a single common routine that
contained the loop. Of course, this was demonstrating the idea more
than giving the actual code you'd use.)
 
K

Ken Wilson

I don't assume you have to copy an array to sort it. I just assume that the
OP would have done that already if that's what he wanted. I assumed that he
wanted to keep the existing arrays in their original order. How did you
interpret his question in regards to that?

Also, if he copies the arrays to a hashtable he can solve the problem of
uniqueness. Another way to solve the uniqueness problem is to use a custom
collection so that he can implement his own add method which will guarantee
uniqueness.

Is there really anything wrong, given the arrays are the same length,
with just doing a brute force compare. You are going to need to make
a one-to-one match to pass anyway so I figure taking one element at a
time and walking the other array and bailing out of the loop with a
fail the moment you don't find a match. Is there a more elegant way?
Can the more elegant way also be guaranteed to outperform a brute
force attempt in all but the most remote circumstance?

However, I agree with one of the earlier posters however. If I was
going to need this kind of capacity I would be more inclined to use an
Array or some other appropriate class from the Collections namespace
simply for convenience.

Ken Wilson
Seeking viable employment in Victoria, BC
 
J

Jon Skeet [C# MVP]

Ken said:
Is there really anything wrong, given the arrays are the same length,
with just doing a brute force compare. You are going to need to make
a one-to-one match to pass anyway so I figure taking one element at a
time and walking the other array and bailing out of the loop with a
fail the moment you don't find a match. Is there a more elegant way?
Can the more elegant way also be guaranteed to outperform a brute
force attempt in all but the most remote circumstance?

A sort and then compare would generally be O(n log n). Inserting values
into a hashtable and then walking through checking for existence is
likely to be O(n) (assuming reasonably distributed hash codes). Brute
force will be O(n^2). It's only likely to make a difference when you
get to large arrays, but then the difference could be very significant.

(To check the hashtable stuff, I had a look at the docs for Hashtable
which claim that retrieving a value is an O(1) operation. I find that
hard to believe - if every key in the table has the same hashcode, it
has to walk through all of them, making it an O(n) operation. That's
not mentioned anywhere...)

Jon
 
K

Ken Wilson

Lots of suggestions so far, including some that won't work.

Binary searches won't work, since the OP specifically stated that the
arrays would be in random order.

Hashing one array into a Hashtable then using the Hashtable to test for
presence is probably the fastest method for large arrays, since its
cost increases roughly as a factor of the total number of items, rather
than as n times m or anything like that.

For smaller arrays there is always the simpleminded approach:

I think there is a bug in this code. If the very first item passes
won't the found variable be irreversibly changed to true?
if (servers.Length != monitors.Length)
{
return false;
}
for (int i = 0; i < servers.Length; i++)
{
bool found = false;
for (int j = 0; i < monitors.Length j++)
{
if (servers == monitors[j])
{
found = true;
}
}
if (!found)
{
return false;
}
}
return true;


Following your initial code I have derived this. Excuse me for
wrapping it in a method but I just love re-usable code, [;-)

bool IsEquivalent(string[] servers, string[] monitors)
{
bool equivalent = false; // assume the worst

if (servers.Length == monitors.Length) // same length?
{
bool found = true; // assume the best

for (int i = 0; i < servers.Length; i++)
{
for (int j = 0; j < monitors.Length; j++)
{
if (servers == monitors[j]
{
continue;
}
else
{
found = false;
break;
}
}

if (!found)
{
break;
}
}

equivalent = found;
}

return equivalent;
}

If I am wrong in this analysis please feel free to correct me. It
wouldn't be my first time and I suspect it wouldn't be my last anyway.
[;-)

Ken Wilson
Seeking viable employment in Victoria, BC
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top