Hashing | Open addressing for collision handling

Open addressing for collision handling: In this article are we are going to learn about the open addressing for collision handling which can be further divided into linear probing, quadratic probing, and double hashing.
Submitted by Radib Kar, on July 01, 2020

Prerequisite: Hashing data structure

Open addressing

In open addressing, all the keys will be stored in the hash table itself, not by using any additional memory or extending the index(linked list). This is also known as closed hashing and this is done mainly based on probing. Probing can be done based on either linear probing or quadratic probing

In open addressing, we keep rehashing until we resolve.

Linear Probing

In linear probing, the rehashing process in linear. Say the location found at any step is n and n is occupied then the next attempt will be to hash at position (n+1). We wrap around from the last table location to first location if necessary.

rehashing(key) = (n+1) % table size

Example:

say the keys are: {123, 124, 333, 4679,983}
hash table size is 10

So,

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.


Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	Empty
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty

Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty

Step 3:

Inserting 333 in the hash map. So, location is 3, but it's occupied

Next location is 4, that's occupied too.

Next location is 5 and we can map here since unoccupied. (linear probing)

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	333
6	Empty
7	Empty
8	Empty
9	Empty

Step 4: Inserting 4679 in the hash map. So, location is 9. We can map here since unoccupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	333
6	Empty
7	Empty
8	Empty
9	4679

Step 5:

Inserting 983 in the hash map. So, location is 3.

But it's occupied and using linear probing we map it to 6.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	333
6	983
7	Empty
8	Empty
9	4679

Advantage and disadvantages of linear probing:

  • The advantage of linear probing is that it's easy to implement.
  • The disadvantage is that it leads to clustering. Due to linear probing, the collided keys form a cluster in groups and increases the probing length which reduces the overall efficiency.

C++ implementation for linear probing

//linear probing
#include <bits/stdc++.h>
using namespace std;

void add_using_linear_probing(int hash[], int a)
{
    //hash function h(x)=x%10
    int k = a % 10;

    //linear probing
    while (true) {
        if (hash[k] == -1) {
            hash[k] = a;
            break;
        }
        k = (k + 1) % 10; //linear increment of probe
    }
}

int main()
{
    //set of input numbers
    vector<int> arr{ 123, 124, 333, 4679, 983 };

    //initialize the hash table
    //each entry of the hash table is a single entry
    int hash[10]; //size of hashtable is 10
    memset(hash, -1, sizeof(hash)); //initialize with empty initially

    for (int a : arr) {
        //hashing
        add_using_linear_probing(hash, a);
    }

    cout << "---------using linear probing---------\n";

    cout << "Hash table is:\n";

    for (int i = 0; i < 10; i++) {
        if (hash[i] == -1)
            cout << i << "->"
                 << "Empty" << endl;
        else
            cout << i << "->" << hash[i] << endl;
    }

    return 0;
}

Output:

---------using linear probing---------
Hash table is:
0->Empty
1->Empty
2->Empty
3->123
4->124
5->333
6->983
7->Empty
8->Empty
9->4679

Quadratic probing

The problem of clustering can be avoided by using quadratic probing. Here the rehashing is done like below,

rehashing(key) = (n+k2) % table size
where, k is 1,2,3, ...

We wrap around from the last table location to the first location if necessary.

Like say hashing location initially is 3 and 3 is occupied then we will go for 3+12=4, if it’s still occupied we will go for 4+22=8. So on

For the above set of keys that we used for linear probing we will use quadratic probing.

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	Empty
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty

Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty

Step 3:

Inserting 333 in the hash map. So, location is 3, but it’s occupied.

Next location is 3+12=4, that’s occupied too.

Next location is 4+22=8 and we can map here since unoccupied. (quadratic probing)

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	333
9	Empty

Step 4:

Inserting 4679 in the hash map. So, location is 9.

We can map here since unoccupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	333
9	4679

Step 5:

Inserting 983 in the hash map. So, location is 3.

So, location is 3, but it's occupied

Next location is 3+12=4, that's occupied too.

Next location is 4+22=8 and that's occupied too.

Next location is 8+32==17%10=7

we can map here since unoccupied. (quadratic probing)

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	983
8	333
9	4679

Advantage and disadvantages of quadratic probing:

The main advantage is here we can overcome the problem of clustering which appeared in the case of linear probing. By quadratic chances of clustering is much less.

C++ implementation of quadratic probing

//quadratic probing
#include <bits/stdc++.h>
using namespace std;

void add_using_quadratic_probing(int hash[], int a)
{
    //hash function h(x)=x%10
    int k = a % 10;

    //quadratic probing
    int incr = 1;
    while (true) {
        if (hash[k] == -1) {
            hash[k] = a;
            break;
        }
        //quadratic increment of probe
        k = (k + int(pow(incr, 2))) % 10; 
        incr++;
    }
}

int main()
{
    //set of input numbers
    vector<int> arr{ 123, 124, 333, 4679, 983 };

    //initialize the hash table
    //each entry of the hash table is a single entry
    int hash[10]; //size of hashtable is 10
    memset(hash, -1, sizeof(hash)); //initialize with empty initially

    for (int a : arr) {
        //hashing
        add_using_quadratic_probing(hash, a);
    }

    cout << "---------using quadratic probing---------\n";

    cout << "Hash table is:\n";

    for (int i = 0; i < 10; i++) {
        if (hash[i] == -1)
            cout << i << "->"
                 << "Empty" << endl;
        else
            cout << i << "->" << hash[i] << endl;
    }

    return 0;
}

Output:

---------using quadratic probing---------
Hash table is:
0->Empty
1->Empty
2->Empty
3->123
4->124
5->Empty
6->Empty
7->983
8->333
9->4679

Double hashing

Double hashing is the best open addressing technique to overcome clustering chances. Here we increment the probing length based on another hash function.

Say the primary hash function is h1 and secondary hash function is h2 to increment probing length

Then f(key)=h1(key)+k*h2(key) where h2≠h1

Like, first we find h1(key). If it's occupied, we will go for h1(key)+h2(key) where h2(key) is the probing length. If it's still occupied then will go for h1(key) +2*h2(key), so on

We will wrap up from last of table table to beginning of table if necessary

If we do double hashing with our previous example, And take h1(key)=key%10 and h2(key)=number digits in key

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	Empty
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty

Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty

Step 3:

Inserting 333 in the hash map. So, location is 3, but it's occupied.

Next location is 3+digits(3) = 6, that's occupied too.

Next location is 4+22=8 and we can map here since unoccupied. (double hashing)

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	333
7	Empty
8	Empty
9	Empty

Step 4:

Inserting 4679 in the hash map. So, location is 9.

We can map here since unoccupied.

Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	333
7	Empty
8	Empty
9	4679

Step 5:

Inserting 983 in the hash map. So, location is 3, but it's occupied.

Next location is 3+digit(983)=6, that's occupied too.

Next location is 3+2*digit(983)=9 and that's occupied too.

Next location is 3+3*digit(983)==12%10=2

We can map here since unoccupied. (double hashing)

Index	Keys
0	Empty
1	Empty
2	983
3	123
4	124
5	Empty
6	333
7	Empty
8	Empty
9	4679

Advantages and disadvantages of double hashing:

The clustering chance is very less as we are using a separate hashing function, h2(key) to increment the probing length. A new hashing function is an overhead of course.

Double hashing uses few probes than quadratic or linear probing but takes more time than those two.

C++ implementation of Double hashing

//double hashing
#include <bits/stdc++.h>
using namespace std;

int digit(int a)
{
    return to_string(a).length();
}

void add_using_double_hashing(int hash[], int a)
{
    //hash function h1(x)=x%10
    //hash function h2(x)=digit(x) 
    //for incrementing probing
    int k = a % 10;

    //double hashing
    int count = 1;
    while (true) {
        if (hash[k] == -1) {
            hash[k] = a;
            break;
        }
        //double hashing for incrementing prob length
        k = (k + count * digit(a)) % 10; 
        count++;
    }
}

int main()
{
    //set of input numbers
    vector<int> arr{ 123, 124, 333, 4679, 983 };

    //initialize the hash table
    //each entry of the hash table is a single entry
    int hash[10]; //size of hashtable is 10
    //initialize with empty initially
    memset(hash, -1, sizeof(hash)); 

    for (int a : arr) {
        //hashing
        add_using_double_hashing(hash, a);
    }

    cout << "---------using double hashing---------\n";

    cout << "Hash table is:\n";

    for (int i = 0; i < 10; i++) {
        if (hash[i] == -1)
            cout << i << "->"
                 << "Empty" << endl;
        else
            cout << i << "->" << hash[i] << endl;
    }

    return 0;
}

Output:

---------using double hashing---------
Hash table is:
0->Empty
1->Empty
2->983
3->123
4->124
5->Empty
6->333
7->Empty
8->Empty
9->4679



Comments and Discussions!

Load comments ↻





Copyright © 2024 www.includehelp.com. All rights reserved.