Simple but confusing algorithm question

Bill Butler

Greg Young said:
As you mention that this is an interview question ...

My guess is that they were looking to see how you approached the problem (and what questions you
asked like "is the data sorted", "What's the general usage of the method i.e. 3 items vs 300")

All of those items have a lot to do with how you would want to implement the routine.

I would *hope* that this was the interviewer's intent.
Asking "what's the most efficient implementation?" is a loaded question.

Is the data sorted?
Is it OK to sort the data, or do we need to copy the Array and sort that?
Is the array so large that copying it causes memory issues?
Is the data set small enough that O(N), O(N log N), etc. don't come into play?

What does the actual data normally look like? (let me explain)
If the data has very few gaps in it, a brute-force approach might be fastest.
Example: Data = all integers from 1-100 Sum = 101
One pass through the data will suffice to find a valid result.

OK, this is an extreme example, but if the data has few gaps, you will
tend to get multiple valid solutions to the problem. In the example
above there are 100 pairs that add up to 101 (assuming (x,y) is
different from (y,x)).

Thus a brute force search can succeed, possibly faster than the sort.
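
To make that concrete, here is a quick illustrative sketch (not from the
original post) showing the brute-force scan bailing out almost
immediately on dense data:

int[] dense = new int[100];
for (int v = 0; v < 100; v++)
    dense[v] = v + 1;                  // 1, 2, ..., 100

// First outer iteration: need = 101 - 1 = 100, found at the end of
// one inner pass, so the scan exits after ~99 comparisons.
bool found = false;
for (int i = 0; i < dense.Length && !found; i++)
    for (int j = i + 1; j < dense.Length && !found; j++)
        if (dense[i] + dense[j] == 101)
            found = true;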

Anyway....food for thought
Bill
 
Greg Young [MVP]

Why sorted data? - Well, sorted data would allow it to be done in O(n) to
start with ...

I already retracted the i+1 above.

The question about array 1,2,3,4 sum = 5 was asked because there are 2 valid answers (1+4 and 2+3).


SP said:
Greg Young said:
If the data were sorted you could do much better. If they are not
sorted, you would have n^2 as your worst case, as you have to check
every index of the array for each value.

How would sorted data change things?
Even so, you could still do slightly better by only issuing a single
add/sub instead of an operation on every iteration ... see example.

bool MyPairSum(int [] array, int sum)
{
    for (int i = 0; i < array.Length; i++)
    {
        // one subtraction per outer pass instead of an add per inner step
        int need = sum - array[i];
        for (int j = 0; j < array.Length; j++)
        {
            if (array[j] == need)
                return true;
        }
    }
    return false;
}

Also your code has failure modes because int j = i + 1 ... what about the
following data?

int j = i + 1 does not cause a failure. It avoids comparing a number
against itself, which you do in your example. In your example you would
return True incorrectly with 0,2,3 sum = 4.

array 0,1,2 sum = 3 - by using i+1 you say it doesn't exist.

The original returns True correctly.
Another quick question ... what should the behavior be for the following?

array 1,2,3,4 sum = 5

Same here
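
A quick harness makes the two data sets above concrete (an illustrative
sketch; the names MyPairSumFromZero and MyPairSumSkipSelf are invented
here for the j = 0 variant and the original i + 1 variant):

static bool MyPairSumFromZero(int[] array, int sum)
{
    for (int i = 0; i < array.Length; i++)
    {
        int need = sum - array[i];
        for (int j = 0; j < array.Length; j++)
            if (array[j] == need)
                return true;    // may pair an element with itself
    }
    return false;
}

static bool MyPairSumSkipSelf(int[] array, int sum)
{
    for (int i = 0; i < array.Length; i++)
    {
        int need = sum - array[i];
        for (int j = i + 1; j < array.Length; j++)
            if (array[j] == need)
                return true;
    }
    return false;
}

// MyPairSumFromZero(new int[] { 0, 2, 3 }, 4) -> true  (wrong: pairs 2 with itself)
// MyPairSumSkipSelf(new int[] { 0, 2, 3 }, 4) -> false (correct)
// MyPairSumSkipSelf(new int[] { 0, 1, 2 }, 3) -> true  (correct: 1 + 2)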
Cheers,

Greg

Hey all,


I was asked this question in an interview recently:

Suppose you have the method signature

bool MyPairSum(int [] array, int sum)

the array has all unique values (no repeats); your task is to find two
indices in the array whose values sum to the input parameter sum. If
there exist two entries in the array that together equal the input
parameter sum, then return true; otherwise return false. What's the
most efficient implementation?

My question is: can we have a better worst-case runtime than O(n²)?
I'm thinking that in the worst case, you'd have to compare each index
against all the other indices and test each sum until you find one that
matches the input sum. In the worst case, there will be no pair of
entries that sums to the input parameter. Am I overlooking a very
basic trick? I figured with all the sharp people in here, somebody can
discern whether I'm off base or not :)

Here's the sample implementation I was thinking of:


bool MyPairSum(int [] array, int sum)
{
    for (int i = 0; i < array.Length; i++)
    {
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[i] + array[j] == sum)
                return true;
        }
    }
    return false;
}


Tks, Karan S.

 
james.curran

One change to this (actually the problem was in every one)

bool MyPairSum(int [] array, int sum)
{
    for (int i = 0; i < array.Length - 1; i++)
    {
        int need = sum - array[i];
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[j] == need)
                return true;
        }
    }
    return false;
}

Without the "array.Length -1" in the "for (i=" line, j would equal
Length+1, and "array[j]" would blow up (except we would actually enter
the second for()), but we would go through the outer loop one more time
than necessary --- and our goal *is* to be the most efficient.
 
Bruce Wood

Here is a minor adaptation that may save some iterations depending upon
the sum you're asking for and what data is in the array. Starting with
James Curran's code:

bool MyPairSum(int [] array, int sum)
{
    if (array.Length == 0) { return false; }

    int minValue = array[0];
    int maxValue = array[0];
    int need;

    // Do the first pass through the array, tracking min and max as we go
    need = sum - array[0];
    for (int j = 1; j < array.Length; j++)
    {
        if (array[j] == need) { return true; }
        if (array[j] < minValue) minValue = array[j];
        if (array[j] > maxValue) maxValue = array[j];
    }

    // Now do the rest, skipping values that can't possibly have a partner
    for (int i = 1; i < array.Length - 1; i++)
    {
        need = sum - array[j];
        if (minValue <= need && need <= maxValue)
        {
            for (int j = i + 1; j < array.Length; j++)
            {
                if (array[j] == need)
                    return true;
            }
        }
    }
    return false;
}

This eliminates cases in which the value at array[i] is such that no
value in the array could possibly combine with it to sum to "sum". If
the array contents were sufficiently disparate this could save time.

Of course, if the array contents and the sum are such that almost any
pair of values could potentially sum to "sum" then the additional test
adds two comparisons and an && operation to the cost.
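
For instance (an invented example, using the adaptation above with the
need = sum - array[i] fix noted later in the thread), the range check
pays off when the values are wildly spread out:

// For the three small values, need is far above maxValue, so their
// inner loops are skipped entirely after the first pass.
int[] disparate = { 1, 2, 3, 1000000 };
bool result = MyPairSum(disparate, 1999999);  // false, after very few comparisons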
 
MSDN

In addition, possibly you can add some "History-Efficiency-Metrics" from
previous executions and predict, or guesstimate, which algorithm to use;
at times it could be the wrong algorithm, efficiency-wise. This is done
in microprocessors today when doing look-ahead calculations: the
processor speculatively executes op-codes ahead of time, predicting that
the processing logic might need the result.
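
A loose sketch of that idea (invented here, under broad assumptions; a
real heuristic would at least bucket the history by array size):

using System;
using System.Diagnostics;

class AdaptivePairSum
{
    private readonly Func<int[], int, bool> first, second;
    private double totalFirst, totalSecond;
    private int runsFirst, runsSecond;

    public AdaptivePairSum(Func<int[], int, bool> a, Func<int[], int, bool> b)
    {
        first = a; second = b;
    }

    public bool Run(int[] array, int sum)
    {
        // Try each implementation at least once, then route future calls
        // to whichever has the lower observed average time.
        bool useFirst = runsFirst == 0
            || (runsSecond > 0 && totalFirst / runsFirst <= totalSecond / runsSecond);

        Stopwatch sw = Stopwatch.StartNew();
        bool result = useFirst ? first(array, sum) : second(array, sum);
        sw.Stop();

        if (useFirst) { totalFirst += sw.Elapsed.TotalMilliseconds; runsFirst++; }
        else { totalSecond += sw.Elapsed.TotalMilliseconds; runsSecond++; }
        return result;
    }
}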

SA
 
james.curran

After all, O(n)+O(n.log n) = O(n.log n).

Um... Not really. Or, should I say, "Only in the most theoretic case".


Remember the "O(X)" translates to "time = X*C + K".

Hence you are arguing that "n*Clinearsearch + Klinearsearch +
(n.log.n)*Csort +Ksort) < (n*n) *Csearch + Ksearch)"

The problem is the Csort is *so much* greater than Csearch, n would
have to be turly massive before it becomes the domain factor in the
formula.

e.g., let's assume all the Ks are 0 and Clinearsearch == Csearch; then
we get
n*Csearch + (n.log.n)*Csort < (n*n)*Csearch

However, the simple search really isn't n*n, it's (n*(n-1))/2, which
regrouped equals
n*Csearch + (n*Csort)*(log.n) < (n*(n-1)*Csearch)/2

Multiply by 2:
2*n*Csearch + 2*(n*Csort)*(log.n) < n*(n-1)*Csearch

Subtract 2*n*Csearch:
2*(n*Csort)*(log.n) < n*(n-3)*Csearch

Divide by n:
2*(log.n)*Csort < (n-3)*Csearch

Putting that all together (taking log.n as the natural log):
if Csort == Csearch, sort-first is faster when n >= 7
if Csort == Csearch*10, sort-first is faster when n >= 94
if Csort == Csearch*100, sort-first is faster when n >= 1461
if Csort == Csearch*1000, sort-first is faster when n >= 19789

Now, considering that Csearch here is simply one comparison, while Csort
involves splitting, joining, comparing, and copying array elements, a
1:1000 ratio is not out of the question.
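
Here is a small sketch (not from the thread) that solves the final
inequality numerically for the smallest n, reproducing the thresholds
quoted above; Math.Log is the natural log, which is what those figures
assume:

using System;

class CrossoverCheck
{
    // Finds the smallest n where 2 * log(n) * Csort < (n - 3) * Csearch,
    // given costRatio = Csort / Csearch.
    static int FirstNWhereSortWins(double costRatio)
    {
        for (int n = 2; ; n++)
            if (2 * Math.Log(n) * costRatio < n - 3)
                return n;
    }

    static void Main()
    {
        foreach (double ratio in new double[] { 1, 10, 100, 1000 })
            Console.WriteLine("ratio {0}: sort-first wins at n >= {1}",
                              ratio, FirstNWhereSortWins(ratio));
        // ratio 1: 7, ratio 10: 94, ratio 100: 1461, ratio 1000: 19789
    }
}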
 
Bruce Wood

As Michael H pointed out above, that should be:

need = sum - array[i];

not

need = sum - array[j];
 
Marcus Andrén

As was pointed out by some of the other posters, the best algorithm
depends on the number of items in the array. Unless a size is
specifically mentioned in the question, it is usually best that the
algorithm work with any input size, and therefore the O() behavior is
the easiest thing to look at.

An O(n log n) solution has been posted in this thread. It consists of
cloning the array, sorting it, and finally traversing it from both
ends. Here is some sample code for it:

public static bool MyPairSumSorted(int[] array2, int sum)
{
    int[] array = (int[])array2.Clone();
    Array.Sort(array);
    int i1 = 0;
    int i2 = array.Length - 1;

    while (i1 < i2)   // < rather than != also guards against empty arrays
    {
        int compare = array[i1] + array[i2] - sum;
        if (compare < 0)
        {
            i1++;
        }
        else if (compare > 0)
        {
            i2--;
        }
        else
            return true;
    }

    return false;
}

If the input array is small, however, it will be more efficient to use
your method and avoid sorting; it performs significantly better when
the input is small.

public static bool MyPairSum(int[] array, int sum)
{
    for (int i = 0; i < array.Length; i++)
    {
        int need = sum - array[i];
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[j] == need)
                return true;
        }
    }
    return false;
}

The point where both methods perform equally well seems to be when the
array contains a little more than 100 items.
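
A minimal harness for locating that crossover with the two methods
above (a sketch; exact numbers are machine- and framework-dependent).
All values are even and the target is odd, so no pair can ever match
and both methods must do their full amount of work:

using System;
using System.Diagnostics;

static void MeasureCrossover()
{
    foreach (int n in new int[] { 10, 50, 100, 200, 500 })
    {
        int[] data = new int[n];
        for (int v = 0; v < n; v++)
            data[v] = v * 2;

        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < 10000; r++)
            MyPairSum(data, 1000001);       // brute force, worst case
        long brute = sw.ElapsedMilliseconds;

        sw.Reset(); sw.Start();
        for (int r = 0; r < 10000; r++)
            MyPairSumSorted(data, 1000001); // clone + sort + two-pointer scan
        long sorted = sw.ElapsedMilliseconds;

        Console.WriteLine("n={0}: brute force {1} ms, sort first {2} ms",
                          n, brute, sorted);
    }
}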

Finally, there is a meta-method that is often used when an algorithm
performs well with large data and less well with small data: you use
the best of both worlds. It looks like this:

const int cutoffPoint = 110;

public static bool MyPairSumBoth(int[] array, int sum)
{
    if (array.Length > cutoffPoint)
        return MyPairSumSorted(array, sum);
    else
        return MyPairSum(array, sum);
}

This will use the method that is best for the specific array. The
disadvantage of this approach is that it uses more code, since both
solutions have to be provided, but it can work efficiently with any
data size.
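
For completeness, a few example calls using data sets that came up
earlier in the thread:

Console.WriteLine(MyPairSumBoth(new int[] { 1, 2, 3, 4 }, 5)); // True  (1+4 and 2+3)
Console.WriteLine(MyPairSumBoth(new int[] { 0, 1, 2 }, 3));    // True  (1+2)
Console.WriteLine(MyPairSumBoth(new int[] { 0, 2, 3 }, 4));    // False (can't use 2 twice)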
 
