Binary Search in String

In this tutorial, we will learn how to use binary search to find a word in a dictionary (a sorted list of words), with worked examples and a C++ implementation. By Radib Kar. Last updated: August 14, 2023

Prerequisite

Before learning how to perform binary search on strings, please go through this tutorial:

Binary search

Binary search is one of the most popular search algorithms. It finds a key in a sorted range in logarithmic time.

Binary search in string

The binary search tutorial shows how to find a key in a sorted range, using integer elements. In this article, we apply the same technique to a sorted list of strings.

The idea is exactly the same. The only difference is the data type, which is now string, used to represent words. The sorted list of words is known as a dictionary.

Example

Let the sorted list of words be:
["bad", "blog", "coder", "coding", "includehelp", "india"]

If the key to be searched is "dog",
then the answer is "Not Found".

If the key to be searched is "coding",
then the word is found at index 3 (0-based).

Binary search works by picking a pivot, comparing the pivot element with the key, and, depending on the result, continuing the search in only the left half or only the right half until it terminates.

How does string comparison work?

Say we have two strings a and b. How do we decide which one is smaller? Strings are compared in lexicographical order: we compare them character by character from the start, and at the first position where they differ, the string with the smaller character code (ASCII value) is the smaller string. For example, between "d" and "abc", the smaller string is "abc", because 'a' is less than 'd'. If one string is a prefix of the other, the prefix is the smaller one. For example, between "abc" and "abcd", the smaller string is "abc".
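These comparison rules can be checked directly with the relational operators of std::string, which compare lexicographically. The following is a minimal sketch, assuming the words are stored as std::string; smaller_word is a hypothetical helper name that does not appear in the program later in this article.

```cpp
#include <string>

// Returns the lexicographically smaller of two words.
// operator< on std::string compares character by character and
// treats a proper prefix as smaller than the longer string.
std::string smaller_word(const std::string& a, const std::string& b)
{
    return (b < a) ? b : a;
}
```

For instance, smaller_word("d", "abc") yields "abc" and smaller_word("abc", "abcd") yields "abc", matching the two examples above.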

Algorithm to search a word in a sorted list of words using a binary search

Below is the detailed algorithm to search a word in a sorted list of words using a binary search.

If the input list is not sorted, we must sort it first. Otherwise, binary search may give a wrong answer.
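One way to enforce this precondition is sketched below. It assumes the word list is a std::vector<std::string>; ensure_sorted is a hypothetical helper name, and checking with std::is_sorted before sorting is only one possible design.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts the word list in place, but only if it is not already sorted.
// Both std::is_sorted and std::sort use operator< on std::string here,
// which is the same lexicographical ordering the search relies on.
void ensure_sorted(std::vector<std::string>& words)
{
    if (!std::is_sorted(words.begin(), words.end()))
        std::sort(words.begin(), words.end());
}
```

The check costs one linear pass, which is cheap compared with sorting a list that is already in order.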

Let's work on the above example to describe the binary search:

Wordlist: ["bad", "blog", "coder", "coding", "includehelp", "india"]
key to be searched= "coding"

The basic idea of binary search is to divide the range into two halves (hence "binary") around a pivot. How do we choose the pivot?

We pick the middle element as the pivot.

To find the middle element, compute mid=(left+right)/2, where left is the start index of the current range and right is its end index.

Next, we check whether the search key equals the pivot element. If it does, we are done: the key has been found. If not, there are two cases:

  • key > pivot element (by lexicographical string comparison):
    In this case, we only need to search the right half of the range, that is, the elements greater than the pivot. Because the word list is sorted, the key cannot appear in the left half. So we discard the left half and shrink the range to [pivot index+1, right].
  • key < pivot element (by lexicographical string comparison):
    In this case, we only need to search the left half of the range, that is, the elements less than the pivot. Because the word list is sorted, the key cannot appear in the right half. So we discard the right half and shrink the range to [left, pivot index-1].

Algorithm steps

So every time,

  1. Find the pivot index=(left+right)/2
  2. Check whether the pivot element is the key. If it is, terminate, as the key is found. Otherwise, shrink the range by updating left or right as discussed above
  3. Continue until the range becomes empty (left>right), in which case the key is not in the list

Example explanation with algorithm steps

Below is the dry run with the algorithm:

Iteration 1:
Initially, 
the range is ["bad", "blog", "coder", "coding", "includehelp", "india"], 
key="coding"

So left=0, right= 5
Pivot index is (0+5)/2=2, so pivot is "coder"
Now pivot < "coding"(key)
So we need to search the right half only, hence left=pivot index+1=3
Thus the searching range now: ["coding", "includehelp", "india"]
----------------------------------------

Iteration 2:
Now, 
the range is ["coding", "includehelp", "india"], 
key="coding"

So left=3, right=5
Pivot index is (3+5)/2=4, so pivot is "includehelp"
Now pivot > "coding" (key)
So we need to search the left half only, hence right=pivot index-1=3
Thus the searching range now: ["coding"]
-----------------------------------------

Iteration 3:
Now, 
the range is ["coding"], 
key="coding"

So left=3, right= 3
Pivot index is (3+3)/2=3, so pivot is "coding"
Now pivot =="coding" (key) 
Terminate the search and return pivot index
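The dry run above can be reproduced in code by recording the range and pivot index at each iteration. This is only an illustrative sketch: the Step struct and trace_binary_search are hypothetical names introduced here, and the function assumes the word list is already sorted.

```cpp
#include <string>
#include <vector>

// One iteration of the search: the current range and the chosen pivot index.
struct Step {
    int left;
    int right;
    int mid;
};

// Runs binary search on a sorted word list and returns the sequence of
// (left, right, mid) values visited, in order.
std::vector<Step> trace_binary_search(const std::vector<std::string>& words,
                                      const std::string& key)
{
    std::vector<Step> steps;
    int left = 0;
    int right = static_cast<int>(words.size()) - 1;

    while (left <= right) {
        int mid = left + (right - left) / 2;
        steps.push_back({left, right, mid});

        if (words[mid] == key)
            break;              // key found at the pivot
        if (words[mid] < key)
            left = mid + 1;     // continue in the right half
        else
            right = mid - 1;    // continue in the left half
    }
    return steps;
}
```

For the word list and key used in the dry run, the recorded steps are (0, 5, 2), (3, 5, 4) and (3, 3, 3), one per iteration.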

Time complexity

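The time complexity is the same as for ordinary binary search: O(log2 n) string comparisons, because the search range is halved on every iteration. Each comparison itself can take time proportional to the length of the words being compared.

  • The recurrence is T(n)=T(n/2)+1, where the 1 accounts for finding the pivot and comparing it with the key
  • Using the master theorem, you can find T(n) to be O(log2 n). You can also think of the range sizes n, n/2, n/4, n/8, ..., 1 as a sequence of about log2 n terms, each of which costs one comparison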
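The logarithmic bound can also be checked by counting how many times a range of n elements can be halved. The sketch below uses a hypothetical helper, worst_case_probes, and assumes the worst case where the key is never at the pivot and the search always continues in the larger remaining half, which holds floor(n/2) elements.

```cpp
// Worst-case number of pivot comparisons for a range of n elements.
// After each probe the pivot is discarded, and the larger remaining
// half contains n / 2 elements (integer division).
int worst_case_probes(int n)
{
    int probes = 0;
    while (n > 0) {
        ++probes;
        n /= 2;
    }
    return probes;
}
```

For the six-word list in this article the result is 3, which matches the three iterations of the dry run. For a list of 1,000,000 words it is only 20.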

Better way to find the pivot index

So far we computed the pivot index as (left+right)/2. Note, however, that the sum (left+right) can overflow the integer type when both indices are very large. A safer way to compute the pivot index is:

Pivot index= left+ (right-left)/2
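As a small illustration, the function below computes the pivot index this way. safe_mid is a hypothetical name, and the sketch assumes 0 <= left <= right, so that right-left is never negative and cannot overflow.

```cpp
#include <climits>

// Overflow-safe pivot index, assuming 0 <= left <= right.
// right - left always fits in an int under that assumption, whereas
// left + right could exceed INT_MAX for very large indices.
int safe_mid(int left, int right)
{
    return left + (right - left) / 2;
}
```

For small indices the result is the same as (left+right)/2, for example safe_mid(3, 5) is 4. The difference only shows up near the limit of the type: safe_mid(INT_MAX - 2, INT_MAX) is INT_MAX - 1, while computing the sum first would overflow.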

C++ program to implement binary search in the string

#include <algorithm>
#include <cstdio>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

//iterative binary search
int binary_search_iterative(const vector<string>& arr, const string& key)
{
    //right is the last valid index, so [left, right] is an inclusive range
    int left = 0, right = (int)arr.size() - 1;

    while (left <= right) {

        int mid = left + (right - left) / 2;
        if (arr[mid] == key)
            return mid;
        else if (arr[mid] < key)
            left = mid + 1;
        else
            right = mid - 1;
    }
    return -1;
}

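//recursive binary search
//arr is passed by const reference so that each recursive call does not copy the list
int binary_search_recursive(const vector<string>& arr, const string& key, int left, int right)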
{
    if (left > right)
        return -1;

    int mid = left + (right - left) / 2;
    if (arr[mid] == key)
        return mid;
    else if (arr[mid] < key)
        return binary_search_recursive(arr, key, mid + 1, right);
    return binary_search_recursive(arr, key, left, mid - 1);
}

//to print
void print(vector<string>& a)
{
    for (auto it : a)
        cout << it << " ";
    cout << endl;
}

int main()
{
    cout << "Enter number of words you want to enter for the word list\n";
    int n;
    cin >> n;

    vector<string> arr(n);
    cout << "Enter the words\n";
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }

    cout << "Enter searching key\n";
    //key there
    string key;
    cin >> key;

    cout << "Sorting the input list to ensure binary search works\n";
    sort(arr.begin(), arr.end());

    cout << "Printing the sorted word list\n";

    print(arr);

    clock_t tStart1 = clock();
    int index = binary_search_iterative(arr, key);
    if (index == -1)
        cout << key << " not found\n";
    else
        cout << key << " found at index(0 based): " << index << endl;

    clock_t tend1 = clock();
    printf("Time taken in iterative binary search: %.6fs\n", (double)(tend1 - tStart1) / CLOCKS_PER_SEC);

    clock_t tStart2 = clock();
    index = binary_search_recursive(arr, key, 0, n - 1);
    if (index == -1)
        cout << key << " not found\n";
    else
        cout << key << " found at index(0 based): " << index << endl;
    clock_t tend2 = clock();
    printf("Time taken in recursive binary search: %.6fs\n", (double)(tend2 - tStart2) / CLOCKS_PER_SEC);

    return 0;
}

Output:

Enter number of words you want to enter for the word list
6
Enter the words
includehelp
india
bad
coding
coder
blog
Enter searching key
coding
Sorting the input list to ensure binary search works
Printing the sorted word list
bad blog coder coding includehelp india 
coding found at index(0 based): 3
Time taken in iterative binary search: 0.000023s
coding found at index(0 based): 3
Time taken in recursive binary search: 0.000007s
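As an alternative to a hand-written loop, the C++ standard library provides std::lower_bound, which performs a binary search on a sorted range. The sketch below is not the method used in the program above but a swapped-in library routine. find_word is a hypothetical wrapper name, and it assumes the same conventions as the program: a sorted std::vector<std::string>, a 0-based index on success, and -1 when the word is absent.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Looks up key in a sorted word list using std::lower_bound.
// Returns the 0-based index of the word, or -1 if it is not present.
int find_word(const std::vector<std::string>& words, const std::string& key)
{
    // lower_bound returns the first position whose element is not less than key.
    auto it = std::lower_bound(words.begin(), words.end(), key);

    if (it != words.end() && *it == key)
        return static_cast<int>(it - words.begin());
    return -1;
}
```

With the sorted list from the example, find_word(words, "coding") gives 3 and find_word(words, "dog") gives -1. If the list contains duplicates, this version returns the index of the first occurrence.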

Copyright © 2024 www.includehelp.com. All rights reserved.