There is quite obviously no way to speed up a search, unless more information is available about the searched data. It is well known that a search can be made much more effective, if the data are ordered. Imagine, for example, a telephone directory in which the names were not alphabetically listed. It would be utterly useless. We shall therefore present an algorithm which makes use of the knowledge that a is ordered, i.e., of the condition
Ak: 1≤k < N : ak-1≤ak
The key idea is to inspect an element picked at random, say am, and to compare it with the search argument x. If it is equal to x, the search terminates; if it is less than x, we infer that all elements with indices less or equal to m can be eliminated from further searches; and if it is greater than x, all with index greater or equal to m can be eliminated. This results in the following algorithm called binary search; it uses two index variables L and R marking the left and at the right end of the section of a in which an element may still be found.
L := 0; R := N-1; found := FALSE ;
WHILE (L≤R) & ~found DO
m := any value between L and R;
IF a[m] = x THEN found := TRUE
ELSIF a[m] < x THEN L := m+1
ELSE R := m-1
END
END
The loop invariant, i.e. the condition satisfied before each step, is
(L≤R) & (Ak : 0≤k < L : ak < x) & (Ak : R < k < N : ak > x)
from which the result is derived as
found OR ((L > R) & (Ak : 0≤k < L : ak < x ) & (Ak : R < k < N : ak > x))
which implies
(am = x) OR (Ak : 0≤k < N : ak≠x)
The choice of m is apparently arbitrary in the sense that correctness does not depend on it. But it does influence the algorithm’s effectiveness. Clearly our goal must be to eliminate in each step as many elements as possible from further searches, no matter what the outcome of the comparison is. The optimal solution is to choose the middle element, because this eliminates half of the array in any case. As a result, the maximum number of steps is log2N, rounded up to the nearest integer. Hence, this algorithm offers a drastic improvement over linear search, where the expected number of comparisons is N/2.
The efficiency can be somewhat improved by interchanging the two if-clauses. Equality should be tested second, because it occurs only once and causes termination. But more relevant is the question, whether – as in the case of linear search — a solution could be found that allows a simpler condition for termination. We indeed find such a faster algorithm, if we abandon the naive wish to terminate the search as soon as a match is established. This seems unwise at first glance, but on closer inspection we realize that the gain in efficiency at every step is greater than the loss incurred in comparing a few extra elements. Remember that the number of steps is at most log N. The faster solution is based on the following invariant:
(Ak : 0≤k < L : ak < x) & (Ak : R≤k < N : ak≥x)
and the search is continued until the two sections span the entire array.
L := 0; R := N;
WHILE L < R DO
m := (L+R) DIV 2;
IF a[m] < x THEN L := m+1 ELSE R := m END
END
The terminating condition is L≥R. Is it guaranteed to be reached? In order to establish this guarantee, we must show that under all circumstances the difference R-L is diminished in each step. L < R holds at the 36 beginning of each step. The arithmetic mean m then satisfies L≤m < R. Hence, the difference is indeed diminished by either assigning m+1 to L (increasing L) or m to R (decreasing R), and the repetition terminates with L = R. However, the invariant and L = R do not yet establish a match. Certainly, if R = N, no match exists. Otherwise we must take into consideration that the element a[R] had never been compared. Hence, an additional test for equality a[R] = x is necessary. In contrast to the first solution, this algorithm — like linear search — finds the matching element with the least index.