3/15/13
The Arrays class is part of the Java 2 Collections Framework and contains numerous static methods for representing and manipulating arrays, allowing them to be manipulated independently of the details of their representation. Among other things, it contains methods for sorting, searching, copying, and the toString(), which allows you to output an array as a formatted list. Today’s article presents the binarySearch() method; it employs the Binary (or half-interval) search algorithm to find the index of an element within a sorted array.
How the Binary Search Algorithm Works
When working with sorted arrays, a binary search can be a lot faster than iterating through the list from start to finish. It works by iteratively or recursively eliminating half the array until either the element is found or the whole list has been exhausted. It’s a lot like 20 questions except that it’s really the same three questions over and over! The java.util.Arrays class has an overloaded version of binarySearch() for all types of Objects and variables, so, whatever type you’re working with, there is a version for you. Here’s an example to help clarify:
Say that we have the following list of ten char data items:
private char[] myArray = {'a', 'b', 'c', 'e', 'f', 'h', 'i', 'l', 'n', 'p', 's', 'v', 'z'};
Because of the missing letters, you can’t easily surmise the position of the letter ‘n’. So you call the Arrays.binarySearch(char[] a, char key) method:
int index = Arrays.binarySearch(myArray, 'n');
The method begins by locating the middle element index, thereby splitting the method elements in half, at which point there are one of three possibilities:
- The element position is lower than the halfway index.
- The element position is greater than the halfway index.
- The element position is exactly at the halfway index.
In the latter case, the element index is simply returned from the function. Usually, it takes a few more iterations of this process whereby the split portion that might contain the desired element is put to the above test.
Using the Returned Insertion Point Value
When the array does not contain the desired value, the element of the key that would be inserted next into the array is returned. That’s called the insertion point. It’s either the index of the first element greater than the key or the length of the array if all elements in the array are less than the specified key. Since integers from 0 to the array length minus one represent valid results for a successful search, the first available insertion point is -1. Therefore, the actual insertion point is represented as (-(insertion point) -1). For instance, if the insertion point of a search is at element 2, the actual insertion point returned will be -3. So, with regards to myArray above, the insertion point for the character ‘m’ would be -9 even though the real insertion point would be 8.
Don’t try to come up with a conversion algorithm just yet. The compliment Operator (~) will convert the returned insertion point to the real one for you. It flips bits around so that ~2 equals -3 and vice versa:
int result = java.util.Arrays.binarySearch(myArray, 'm'); System.out.println(~result); //displays 8
Having the correct insertion point is all well and good, but we still have to do something with it. If you’ve ever tried to insert an element in a standard array, it’s not easy. The ArrayList class is much better suited to such operations. You can’t use generics with primitive types, so char values have to be converted to their Character wrapper equivalent. No conversion of each element is required because autoboxing takes care of that for us. We can then use the ArrayList’s add(int index, Object value) method to add the ‘m’ char at the insertion point:
ArrayList<Character> list = new ArrayList<Character>();
for (char c : myArray) { list.add(c); }
list.add(~result, 'm'); System.out.println(list); //displays [a, b, c, e, f, h, i, l, m, n, p, s, v, z] System.out.println(list.get(0)instanceof Character); //true
Should you wish to go back to using a primitive array, you’ll have to follow a reverse process. First, a new myArray must be created with an additional element for the ‘m’ value. Then a for loop is used to set each array element. Again, autoboxing takes care of the conversion from Character to char primitive:
myArray = new char[list.size()]; for (int c = 0; c < list.size() - 1; c++) { myArray[c] = list.get(c); } System.out.println(list); //displays [a, b, c, e, f, h, i, l, m, n, p, s, v, z]
Unique vs. Non-unique Elements
If the array contains multiple elements with the specified value, there is no guarantee which one will be found. That’s because there’s no way to be sure that multiple instances of the same value will remain intact when the array is split in half. If you have any doubts that all of the array elements are unique, you can create a HashSet out of a list Arrays.asList() method
List<String> lst = Arrays.asList("a", "b", "c", "a", "c", "b"); Set<String> set = new HashSet<String>(lst); System.out.println("Original list: " + lst); //[a, b, c, a, c, b] System.out.println("Unique values: " + set); //[a, b, c] //update lst with unique values lst = new ArrayList<String>(set); System.out.println(lst);
Conclusion
The Arrays.binarySearch() method is a highly efficient way to lookup and index in an ordered list. Best of all, you don’t need to rely on any third-party libraries. Just import, the java.util.Arrays class, and you’re good to go. In an upcoming article, we’ll look at how to use the Arrays class to sort your lists.