Binary Search in String

Binary Search in String: In this article, we discuss how binary search can be used to find a word in a dictionary (a sorted list of words).
Submitted by Radib Kar, on July 23, 2020

Prerequisite:

Binary search is one of the most popular algorithms; it searches for a key in a sorted range in logarithmic time.

Binary search in string

As noted in the prerequisite, binary search can find a key in a sorted range. That discussion was based on integer elements. In this article, we discuss it for a sorted list of strings.

The idea is exactly the same. The only difference is the data type, which is now string, used to represent the words. The sorted list of words is known as a dictionary.

For example,

Let the sorted list of words be:
["bad", "blog", "coder", "coding", "includehelp", "india"]

If the key to be searched is "dog",
then the answer would be "Not Found".

If the key to be searched is "coding",
then the index at which the word is found is 3 (0-based).

We know that binary search works by picking a pivot, comparing the pivot element with the key and, based on the result, continuing the search in either the left half or the right half only, until it terminates.

So the question is: how does string comparison work?

Say we have two strings, a and b. How do we decide which one is greater and which one is smaller? To compare strings we use lexicographical ordering. The two strings are compared character by character from the start; at the first position where they differ, the string whose character has the smaller ASCII value is the smaller one. For example, between "d" and "abc", the smaller string is "abc", because 'a' comes before 'd'. Another situation arises when one string is a prefix of the other; then the prefix is the smaller one. For example, between "abc" and "abcd", the smaller one is "abc".
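The comparison rules above can be checked directly in C++, where the relational operators on std::string compare lexicographically. Below is a minimal sketch; the helper name smaller_of is hypothetical and used only for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the lexicographically smaller of two strings.
// std::string's operator< compares character by character and treats a
// proper prefix as smaller than the longer string.
std::string smaller_of(const std::string& a, const std::string& b)
{
    return (b < a) ? b : a;
}
```

With this helper, smaller_of("d", "abc") gives "abc" and smaller_of("abc", "abcd") gives "abc", matching the two examples above.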

Below is the detailed algorithm to search a word in a sorted list of words using a binary search.

If the input list is not sorted, we need to sort it ourselves first; otherwise, binary search will fail.
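As a rough sketch of that precondition, the word list can be checked and sorted with the standard library before searching. The helper name ensure_sorted is an assumption made for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the word list only if it is not already sorted.
// Both std::is_sorted and std::sort use operator< on std::string, which is
// the same lexicographical order the search relies on.
void ensure_sorted(std::vector<std::string>& words)
{
    if (!std::is_sorted(words.begin(), words.end()))
        std::sort(words.begin(), words.end());
}
```

Checking first costs one linear pass, which avoids an unnecessary sort when the list is already in order.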

Let's work on the above example to describe the binary search:

Wordlist: ["bad", "blog", "coder", "coding", "includehelp", "india"]
key to be searched= "coding"

The basic step behind binary search is to first divide the range into two halves (that's why it is called binary) based on a pivot. How will we choose the pivot?

We will pick the mid element as our pivot.

To find the mid element, simply compute mid=(left+right)/2, where left is the start index of the current range and right is the end index of the current range.

Now we need to check whether the search key is the same as the pivot element or not. If it's the same, then we are done: we found the key. If it's not the same, then there can be two cases,

  • key> pivot element: //check how comparison works for string
    In this case, we need to check only the right half of the range. Right half means the elements which are greater than the pivot. This is possible only because the list is sorted. Since the word list is sorted it's guaranteed that search key will not appear in the left half as it's greater than the pivot. So we can discard the left half and shrink our range to [pivot index+1, right] for further search.
  • key< pivot element: //check how comparison works for string
    In this case, we need to check only the left half of the range. Left half means the word elements which are less than the pivot. This is possible only because the word list is sorted. Since the word list is sorted it's guaranteed that search key will not appear in the right half as it's less than the pivot. So we can discard the right half and shrink our range to [left, pivot index-1] for further search.
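The three outcomes of comparing the key with the pivot can be sketched with std::string::compare, which returns a negative value, zero, or a positive value. The enum Next and the function decide are hypothetical names introduced only for this illustration.

```cpp
#include <string>

// Hypothetical names, used only for illustration.
enum class Next { Found, SearchLeft, SearchRight };

// Decides what to do after comparing the key with the pivot word.
Next decide(const std::string& key, const std::string& pivot)
{
    int cmp = key.compare(pivot); // < 0, 0, or > 0 (lexicographic order)
    if (cmp == 0)
        return Next::Found;
    return (cmp > 0) ? Next::SearchRight : Next::SearchLeft;
}
```

For the example word list, decide("coding", "coder") selects the right half and decide("coding", "includehelp") selects the left half.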

So every time,

  1. We will find the pivot index=(left+right)/2
  2. We will check whether the pivot element is the key or not. If it is the key, then terminate, as the key is found. Otherwise, shrink the range and update left or right as per the cases discussed above
  3. Continue until the range becomes empty (left>right)

Below is the dry run with the algorithm:

Iteration 1:
Initially,
the range is ["bad", "blog", "coder", "coding", "includehelp", "india"],
key="coding"

So left=0, right=5
Pivot index is (0+5)/2=2, so pivot is "coder"
Now pivot < "coding" (key)
So we need to search the right half only, hence left=pivot index+1=3
Thus the searching range now: ["coding", "includehelp", "india"]
----------------------------------------

Iteration 2:
Now,
the range is ["coding", "includehelp", "india"],
key="coding"

So left=3, right=5
Pivot index is (3+5)/2=4, so pivot is "includehelp"
Now pivot > "coding" (key)
So we need to search the left half only, hence right=pivot index-1=3
Thus the searching range now: ["coding"]
-----------------------------------------

Iteration 3:
Now,
the range is ["coding"],
key="coding"

So left=3, right=3
Pivot index is (3+3)/2=3, so pivot is "coding"
Now pivot == "coding" (key)
Terminate the search and return pivot index
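The dry run can be reproduced with a small sketch that records every pivot index the search visits. The function name pivot_trace is hypothetical; it is not part of the implementation given later in the article.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: runs the binary search and records each pivot index
// it visits, so the trace can be compared with the dry run.
std::vector<int> pivot_trace(const std::vector<std::string>& words,
                             const std::string& key)
{
    std::vector<int> visited;
    int left = 0, right = static_cast<int>(words.size()) - 1;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        visited.push_back(mid);
        if (words[mid] == key)
            break;              // key found at the current pivot
        if (words[mid] < key)
            left = mid + 1;     // discard the left half
        else
            right = mid - 1;    // discard the right half
    }
    return visited;
}
```

For the word list above and the key "coding", the recorded pivots are 2, 4, 3, which matches the three iterations of the dry run.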

Time complexity:

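Time complexity is the same as binary search, which is logarithmic, O(log2 n), if each string comparison is counted as one step (a single comparison can itself take time proportional to the length of the words). This is because the search range is halved every time.

  • So, T(n)=T(n/2)+1 (time for finding the pivot and comparing it with the key)
  • Using the master theorem you can find T(n) to be O(log2 n). Also, you can think of the range sizes as the sequence n, n/2, n/4, n/8, ..., 1, which has about log2(n) terms, one per iteration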
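The recurrence can also be unrolled by hand. This is a sketch that assumes n is a power of two and that each comparison counts as one unit of work:

```latex
\begin{aligned}
T(n) &= T(n/2) + 1 \\
     &= T(n/4) + 2 \\
     &= T(n/8) + 3 \\
     &\;\;\vdots \\
     &= T(n/2^{k}) + k
\end{aligned}
```

The range shrinks to a single element when n/2^k = 1, that is, when k = log2 n, so T(n) = T(1) + log2 n, which is O(log2 n).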

Better way to find the pivot index:

We were finding the pivot index as (left+right)/2. But notice that (left+right) can lead to integer overflow when both indices are large. Hence a better method is to find the pivot index as below:

Pivot index= left+ (right-left)/2
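A minimal sketch of this formula as a function is shown below. The name safe_mid is hypothetical, and the sketch assumes 0 <= left <= right so that right - left itself cannot overflow.

```cpp
#include <climits>

// Hypothetical helper: midpoint of [left, right] computed without ever
// forming left + right. Assumes 0 <= left <= right.
int safe_mid(int left, int right)
{
    return left + (right - left) / 2;
}
```

For example, with left = INT_MAX - 2 and right = INT_MAX, the sum left + right does not fit in an int (signed overflow is undefined behavior in C++), while safe_mid returns INT_MAX - 1.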

C++ implementation:

#include <algorithm>
#include <cstdio>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

//iterative binary search
int binary_search_iterative(const vector<string>& arr, const string& key)
{
    //right is the last valid index, so the range [left, right] is inclusive
    int left = 0, right = (int)arr.size() - 1;

    while (left <= right) {

        int mid = left + (right - left) / 2;
        if (arr[mid] == key)
            return mid;
        else if (arr[mid] < key)
            left = mid + 1;
        else
            right = mid - 1;
    }
    return -1;
}

//recursive binary search
//the list is passed by const reference so it is not copied on every call
int binary_search_recursive(const vector<string>& arr, const string& key, int left, int right)
{
    if (left > right)
        return -1;

    int mid = left + (right - left) / 2;
    if (arr[mid] == key)
        return mid;
    else if (arr[mid] < key)
        return binary_search_recursive(arr, key, mid + 1, right);
    return binary_search_recursive(arr, key, left, mid - 1);
}

//to print
void print(const vector<string>& a)
{
    for (const auto& it : a)
        cout << it << " ";
    cout << endl;
}

int main()
{
    cout << "Enter number of words you want to enter for the word list\n";
    int n;
    cin >> n;

    vector<string> arr(n);
    cout << "Enter the words\n";
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }

    cout << "Enter searching key\n";
    //key there
    string key;
    cin >> key;

    cout << "Sorting the input list to ensure binary search works\n";
    sort(arr.begin(), arr.end());

    cout << "Printing the sorted word list\n";

    print(arr);

    clock_t tStart1 = clock();
    int index = binary_search_iterative(arr, key);
    if (index == -1)
        cout << key << " not found\n";
    else
        cout << key << " found at index(0 based): " << index << endl;

    clock_t tend1 = clock();
    printf("Time taken in iterative binary search: %.6fs\n", (double)(tend1 - tStart1) / CLOCKS_PER_SEC);

    clock_t tStart2 = clock();
    index = binary_search_recursive(arr, key, 0, n - 1);
    if (index == -1)
        cout << key << " not found\n";
    else
        cout << key << " found at index(0 based): " << index << endl;
    clock_t tend2 = clock();
    printf("Time taken in recursive binary search: %.6fs\n", (double)(tend2 - tStart2) / CLOCKS_PER_SEC);

    return 0;
}

Output:

Enter number of words you want to enter for the word list
6
Enter the words
includehelp
india
bad
coding
coder
blog
Enter searching key
coding
Sorting the input list to ensure binary search works
Printing the sorted word list
bad blog coder coding includehelp india 
coding found at index(0 based): 3
Time taken in iterative binary search: 0.000023s
coding found at index(0 based): 3
Time taken in recursive binary search: 0.000007s




