Binary search class: small problem retrieving the last element in the ordered array

B

Bob Rock

Hello,

I have an array of strings and need to find the matching one with the
fastest possible code. I decided to order the array and then write a binary
search algo.
What I came up with is the following. I noticed that if I set:

int upper = strings.GetUpperBound(0);

I never match the last element in the array (i.e. "iii")

while if I set:

int upper = strings.GetUpperBound(0) + 1;

I'm able to match also the last element in the array.
The problem is that I think upper should indeed be equal to
strings.GetUpperBound(0).
Is there something I'm missing??? Is the algo wrong or missing something???


using System;

namespace TestApplication
{
class BinarySearchClass
{
static void Main(string[] args)
{
string[] strings = new string[]{"aaa", "bbb", "ccc", "ddd", "eee", "fff",
"ggg", "hhh", "iii"};

BinarySearchClass search = new BinarySearchClass();

int res = search.FindString(strings, "iii");

Console.Read();
}

public int FindString(string[] strings, string str)
{
int lower = strings.GetLowerBound(0);
int upper = strings.GetUpperBound(0);

return this.BinarySearch(strings, str, lower, upper);
}

public int BinarySearch(string[] strings, string str, int lowerbound, int
upperbound)
{
int pos = ((upperbound - lowerbound) / 2) + lowerbound;

int res = string.Compare(strings[pos], str);

if(res > 0)
{
pos = this.BinarySearch(strings, str, lowerbound, pos);
}
else if(res < 0)
{
pos = this.BinarySearch(strings, str, pos, upperbound);
}
return pos;
}
}
}



Regards,
Bob Rock
 
F

Fred Mellender

You do need set upper = strings.GetUpperBound(0) + 1. This is because when
you
divide by 2, you truncate and thus can never attain the end of the array,
unless you add
1.

I don't see why your algorithm does not loop forever if there is no match in
the array. You
should check to see that the new pos is not equal to either the upper or
lower bound. If it is, the binary search is done, even if the element is
not found. Continuing to
call BinarySearch will just calculate the same pos forever.

You should *seriously* consider using the .Net library's BinarySearch
instead
of writing your own, IMHO.
 
J

john sun

Bob said:
Hello,

I have an array of strings and need to find the matching one with the
fastest possible code. I decided to order the array and then write a binary
search algo.
What I came up with is the following. I noticed that if I set:

int upper = strings.GetUpperBound(0);

I never match the last element in the array (i.e. "iii")

while if I set:

int upper = strings.GetUpperBound(0) + 1;

I'm able to match also the last element in the array.
The problem is that I think upper should indeed be equal to
strings.GetUpperBound(0).
Is there something I'm missing??? Is the algo wrong or missing something???


using System;

namespace TestApplication
{
class BinarySearchClass
{
static void Main(string[] args)
{
string[] strings = new string[]{"aaa", "bbb", "ccc", "ddd", "eee", "fff",
"ggg", "hhh", "iii"};

BinarySearchClass search = new BinarySearchClass();

int res = search.FindString(strings, "iii");

Console.Read();
}

public int FindString(string[] strings, string str)
{
int lower = strings.GetLowerBound(0);
int upper = strings.GetUpperBound(0);

return this.BinarySearch(strings, str, lower, upper);
}

public int BinarySearch(string[] strings, string str, int lowerbound, int
upperbound)
{
int pos = ((upperbound - lowerbound) / 2) + lowerbound;

int res = string.Compare(strings[pos], str);

if(res > 0)
{
pos = this.BinarySearch(strings, str, lowerbound, pos);
}
else if(res < 0)
{
pos = this.BinarySearch(strings, str, pos, upperbound);
}
return pos;
}
}
}



Regards,
Bob Rock
Bob, how about this..

int pos= Array.BinarySearch<string>(strings, "iii");

J.W.
 
B

Bob Rock

Yes, the code as I provided it does loop forever if there is no match.
I coded my own algo because I don't want BinarySearch to order by array
every single time, as it indeed does. The array is already ordered.
As for the stop condition when the searched string is not in the array,
upper == lower does not guarantee the search from stopping.
For example, if the searched string is lexicographically bigger than any
other string in the array, lower never gets to be equal to upper and the
algo goes on in a loop until a until a stack overflow exception is rised.

I wonder how I should write the stop condition to handle any situation.


Bob Rock
 
B

Bob Rock

Yes, the problem is that BinarySearch orders the array everytime while the
array in *already* ordered. I was trying to be more efficient by writing my
own code.

Bob Rock
 
C

Carl Daniel [VC++ MVP]

Bob said:
Yes, the problem is that BinarySearch orders the array everytime
while the array in *already* ordered. I was trying to be more
efficient by writing my own code.

Not so. Array.BinarySearch requires that the array is already sorted - it
does not itself sort the array. You're re-inventing a wheel that's already
built for you in the framework.

-cd
 
J

john sun

Bob said:
Yes, the code as I provided it does loop forever if there is no match.
I coded my own algo because I don't want BinarySearch to order by array
every single time, as it indeed does. The array is already ordered.
As for the stop condition when the searched string is not in the array,
upper == lower does not guarantee the search from stopping.
For example, if the searched string is lexicographically bigger than any
other string in the array, lower never gets to be equal to upper and the
algo goes on in a loop until a until a stack overflow exception is rised.

I wonder how I should write the stop condition to handle any situation.
when (upperbound - lowerbound)==1, you should just check two elements to
see if there is a match..

Your condition :

int pos = ((upperbound - lowerbound) / 2) + lowerbound;

if the upperbound =2, and lowerbound=1, then you get pos=1, and when you
go to the next loop, you are getting the same condition.

Have you did some experiment that you can save time by doing your own
binary search.
 
B

Ben Voigt

public int BinarySearch(string[] strings, string str, int lowerbound, int
upperbound)
{
int pos = ((upperbound - lowerbound) / 2) + lowerbound;

int res = string.Compare(strings[pos], str);

if(res > 0)
{
pos = this.BinarySearch(strings, str, lowerbound, pos);
}
else if(res < 0)
{
pos = this.BinarySearch(strings, str, pos, upperbound);

Use pos + 1 here, since pos is already checked.
 
C

Carl Daniel [VC++ MVP]

Ben Voigt said:
public int BinarySearch(string[] strings, string str, int lowerbound,
int upperbound)
{
int pos = ((upperbound - lowerbound) / 2) + lowerbound;

int res = string.Compare(strings[pos], str);

if(res > 0)
{
pos = this.BinarySearch(strings, str, lowerbound, pos);
}
else if(res < 0)
{
pos = this.BinarySearch(strings, str, pos, upperbound);

Use pos + 1 here, since pos is already checked.
}
return pos;
}
}
}

.... and if you really want to gain performance, re-write this as a
non-recursive version. The C# compiler won't un-do the tail recursion for
you. The JIT might, but I wouldn't count on it - better to just write a
non-recursive version to start with. Even better, just use
Array.BinarySearch (assuming you're using the 2.0 framework, or later).

-cd
 
B

Ben Voigt

Carl Daniel said:
Ben Voigt said:
public int BinarySearch(string[] strings, string str, int lowerbound,
int upperbound)
{
int pos = ((upperbound - lowerbound) / 2) + lowerbound;

int res = string.Compare(strings[pos], str);

if(res > 0)
{
pos = this.BinarySearch(strings, str, lowerbound, pos);
}
else if(res < 0)
{
pos = this.BinarySearch(strings, str, pos, upperbound);

Use pos + 1 here, since pos is already checked.
}
return pos;
}
}
}

... and if you really want to gain performance, re-write this as a
non-recursive version. The C# compiler won't un-do the tail recursion for
you. The JIT might, but I wouldn't count on it - better to just write a
non-recursive version to start with. Even better, just use
Array.BinarySearch (assuming you're using the 2.0 framework, or later).

That wasn't actually meant as a performance tip, but for correctness.
 
C

Carl Daniel [VC++ MVP]

Ben Voigt said:
That wasn't actually meant as a performance tip, but for correctness.

I know - just pointing it out for the OPs benefit. I know you know better
:)

-cd
 
B

Bob Rock

Hello Carl,

I read somewhere that BinarySearch was sorting the array everytime it is
called ... tried finding the source but no luck.
Anyhow in the end I got the code working .... although I do not know if it
is faster or not:

public int BinarySearch(string[] strings, string str)
{
int upper = strings.GetUpperBound(0);
int lower = strings.GetLowerBound(0);

while(upper >= lower)
{
int pos = (upper + lower) / 2;

int res = string.Compare(strings[pos], str);
if(res > 0)
{
upper = pos - 1;
}
else if(res < 0)
{
lower = pos + 1;
}
else
{
return pos;
}
}

throw new ApplicationException("String not found.");
}



Bob Rock
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top