Hashing Data Structure

Data Structure | Hashing: In this tutorial, we are going to learn about the Hashing data structure, hashing along with hash table ADT, hashing functions, advantages of hashing, and the applications of hashing, creating a hash, etc.
Submitted by Radib Kar, on July 01, 2020

What is Hashing?

Hashing is a technique that is used for storing and extracting information in a faster way. It helps to perform searching in an optimal way. Hashing is used in databases, encryptions, symbol tables, etc.

Why Hashing is needed?

Hashing is needed to execute the search, insert, and deletions in constant time on an average. In our other data structures like an array, linked list the above operations take linear time, O(n). The best case can be a self-balanced tree-like AVL tree, where the time complexity is of order O(logn). But Hashing allows us to perform the operations in constant time, O(1) on an average.

Component of hashing

Hash table
Hash functions
Collisions
Collision resolution techniques

Hash Table ADT

As an ADT, the common operations of a hash table is,

Search a key in the hash table
Insert a key in the hash table
Delete a key from the hash table

To understand what hash table is and how it works, let's take a programming example. Say the problem is to find unique characters in a string (lowercase only),

#include <bits/stdc++.h>
using namespace std;

int main()
{
    string str;

    cout << "input your string:\n";
    cin >> str;

    //create the hash table
    int arr[26] = { 0 };
    for (char c : str) {
        arr[c - 'a']++;
    }

    //finding unique character in O(1)
    cout << "unique characters are: ";
    for (int i = 0; i < 26; i++) {
        if (arr[i] == 1)
            cout << char(i + 'a') << " ";
    }

    return 0;
}

Output:

input your string:
includehelp
unique characters are: c d h i n p u

Now how we can solve this using hash table & hashing?

Step 1 (Create the hash table):

We will use an array to create the hash table. To create the hash table,

Let's declare an array of size 26, arr[26] and initialize with 0 initially (size =26 since there is 26 lowercase numbers only)

The for each character of the string we will map like below,

for each character c in string str:
arr[c-'a']++

This will create a hash table for us to represent the string

Step 2 (Search the unique character):

Now if we find any arr[i] having value 1 then char(i+'a') will be our unique character
This is a standard example of hashing where we found the unique character in O(1) (O(26)=O(1) ) time after constructing the hash table.

So hashing is an excellent solution for key-based searching if we can construct the hash table.

Okay, so the question is why we are calling it hashing though we used an array only? We could have termed is as just another use of an array.

To address the above question let's think what we could have done if the problem was related to numbers instead of lowercase character siring? What would be our array size then? Of course, we can't still take 26 then as the numbers will be ranging +INF to –INF (Okay for your satisfaction LLMAX to LLMIN), but that will not be a feasible thing to do of course as it won't be O(1) anymore. So what we need to do is to hash all the values to some limited range say [0 to k], so that it still stays O(1). This is known as hashing where we are mapping the keys (original values) to some locations based on a predefined algorithm (hashing function).

For example,

Say we have keys,
13, 1112, 111119, 112345111118, 11111111111115
And we have the locations (set of limited range) as [0,10]
So after hashing we may found
13 -> 3
1112 -> 2
111119 -> 9
112345111118 -> 8
11111111111115 -> 5

Can you guess the hashing function here?

This is a kind of direct addressing where we can find the key just by looking at the locations. Like if we look at location 3, we will find 13. But this never happens if we have a large number of keys and that's why we require some hashing functions.

Hash functions

Hash functions convert a key into the index of the hash table (location). A hash function should generate unique locations, but that's difficult to achieve since the number of indexes is much less than the number of keys. We often lead to the collision while using a hash function which is not perfect.

There are several collision removing techniques like below:

Direct chaining
1. Separate chaining
Open addressing
1. Linear probing
2. Quadratic probing
3. Double hashing

To more detail on hashing functions: Hash functions and its characteristics

Coding problems using hashing:

DS - Basics

DS - Array

DS - Linked List

DS - Stack

DS - Queue

DS Hashing

DS Tree

DS - Graph

DS Programs Using C/C++

DS - Miscellaneous Topics