# Hashing | Open addressing for collision handling

Open addressing for collision handling: In this article are we are going to learn about the open addressing for collision handling which can be further divided into linear probing, quadratic probing, and double hashing.
Submitted by Radib Kar, on July 01, 2020

Prerequisite: Hashing data structure

In open addressing, all the keys will be stored in the hash table itself, not by using any additional memory or extending the index(linked list). This is also known as closed hashing and this is done mainly based on probing. Probing can be done based on either linear probing or quadratic probing

In open addressing, we keep rehashing until we resolve.

## Linear Probing

In linear probing, the rehashing process in linear. Say the location found at any step is n and n is occupied then the next attempt will be to hash at position (n+1). We wrap around from the last table location to first location if necessary.

```rehashing(key) = (n+1) % table size
```

Example:

```say the keys are: {123, 124, 333, 4679,983}
hash table size is 10
```

So,

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	Empty
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 3:

Inserting 333 in the hash map. So, location is 3, but it's occupied

Next location is 4, that's occupied too.

Next location is 5 and we can map here since unoccupied. (linear probing)

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	333
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 4: Inserting 4679 in the hash map. So, location is 9. We can map here since unoccupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	333
6	Empty
7	Empty
8	Empty
9	4679
```

Step 5:

Inserting 983 in the hash map. So, location is 3.

But it's occupied and using linear probing we map it to 6.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	333
6	983
7	Empty
8	Empty
9	4679
```

• The advantage of linear probing is that it's easy to implement.
• The disadvantage is that it leads to clustering. Due to linear probing, the collided keys form a cluster in groups and increases the probing length which reduces the overall efficiency.

C++ implementation for linear probing

```//linear probing
#include <bits/stdc++.h>
using namespace std;

{
//hash function h(x)=x%10
int k = a % 10;

//linear probing
while (true) {
if (hash[k] == -1) {
hash[k] = a;
break;
}
k = (k + 1) % 10; //linear increment of probe
}
}

int main()
{
//set of input numbers
vector<int> arr{ 123, 124, 333, 4679, 983 };

//initialize the hash table
//each entry of the hash table is a single entry
int hash; //size of hashtable is 10
memset(hash, -1, sizeof(hash)); //initialize with empty initially

for (int a : arr) {
//hashing
}

cout << "---------using linear probing---------\n";

cout << "Hash table is:\n";

for (int i = 0; i < 10; i++) {
if (hash[i] == -1)
cout << i << "->"
<< "Empty" << endl;
else
cout << i << "->" << hash[i] << endl;
}

return 0;
}
```

Output:

```---------using linear probing---------
Hash table is:
0->Empty
1->Empty
2->Empty
3->123
4->124
5->333
6->983
7->Empty
8->Empty
9->4679
```

The problem of clustering can be avoided by using quadratic probing. Here the rehashing is done like below,

```rehashing(key) = (n+k2) % table size
where, k is 1,2,3, ...
```

We wrap around from the last table location to the first location if necessary.

Like say hashing location initially is 3 and 3 is occupied then we will go for 3+12=4, if it’s still occupied we will go for 4+22=8. So on

For the above set of keys that we used for linear probing we will use quadratic probing.

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	Empty
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 3:

Inserting 333 in the hash map. So, location is 3, but it’s occupied.

Next location is 3+12=4, that’s occupied too.

Next location is 4+22=8 and we can map here since unoccupied. (quadratic probing)

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	333
9	Empty
```

Step 4:

Inserting 4679 in the hash map. So, location is 9.

We can map here since unoccupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	333
9	4679
```

Step 5:

Inserting 983 in the hash map. So, location is 3.

So, location is 3, but it's occupied

Next location is 3+12=4, that's occupied too.

Next location is 4+22=8 and that's occupied too.

Next location is 8+32==17%10=7

we can map here since unoccupied. (quadratic probing)

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	983
8	333
9	4679
```

The main advantage is here we can overcome the problem of clustering which appeared in the case of linear probing. By quadratic chances of clustering is much less.

```//quadratic probing
#include <bits/stdc++.h>
using namespace std;

{
//hash function h(x)=x%10
int k = a % 10;

int incr = 1;
while (true) {
if (hash[k] == -1) {
hash[k] = a;
break;
}
k = (k + int(pow(incr, 2))) % 10;
incr++;
}
}

int main()
{
//set of input numbers
vector<int> arr{ 123, 124, 333, 4679, 983 };

//initialize the hash table
//each entry of the hash table is a single entry
int hash; //size of hashtable is 10
memset(hash, -1, sizeof(hash)); //initialize with empty initially

for (int a : arr) {
//hashing
}

cout << "Hash table is:\n";

for (int i = 0; i < 10; i++) {
if (hash[i] == -1)
cout << i << "->"
<< "Empty" << endl;
else
cout << i << "->" << hash[i] << endl;
}

return 0;
}
```

Output:

```---------using quadratic probing---------
Hash table is:
0->Empty
1->Empty
2->Empty
3->123
4->124
5->Empty
6->Empty
7->983
8->333
9->4679
```

## Double hashing

Double hashing is the best open addressing technique to overcome clustering chances. Here we increment the probing length based on another hash function.

Say the primary hash function is h1 and secondary hash function is h2 to increment probing length

Then f(key)=h1(key)+k*h2(key) where h2≠h1

Like, first we find h1(key). If it's occupied, we will go for h1(key)+h2(key) where h2(key) is the probing length. If it's still occupied then will go for h1(key) +2*h2(key), so on

We will wrap up from last of table table to beginning of table if necessary

If we do double hashing with our previous example, And take h1(key)=key%10 and h2(key)=number digits in key

Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	Empty
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	Empty
7	Empty
8	Empty
9	Empty
```

Step 3:

Inserting 333 in the hash map. So, location is 3, but it's occupied.

Next location is 3+digits(3) = 6, that's occupied too.

Next location is 4+22=8 and we can map here since unoccupied. (double hashing)

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	333
7	Empty
8	Empty
9	Empty
```

Step 4:

Inserting 4679 in the hash map. So, location is 9.

We can map here since unoccupied.

```Index	Keys
0	Empty
1	Empty
2	Empty
3	123
4	124
5	Empty
6	333
7	Empty
8	Empty
9	4679
```

Step 5:

Inserting 983 in the hash map. So, location is 3, but it's occupied.

Next location is 3+digit(983)=6, that's occupied too.

Next location is 3+2*digit(983)=9 and that's occupied too.

Next location is 3+3*digit(983)==12%10=2

We can map here since unoccupied. (double hashing)

```Index	Keys
0	Empty
1	Empty
2	983
3	123
4	124
5	Empty
6	333
7	Empty
8	Empty
9	4679
```

The clustering chance is very less as we are using a separate hashing function, h2(key) to increment the probing length. A new hashing function is an overhead of course.

Double hashing uses few probes than quadratic or linear probing but takes more time than those two.

C++ implementation of Double hashing

```//double hashing
#include <bits/stdc++.h>
using namespace std;

int digit(int a)
{
}

{
//hash function h1(x)=x%10
//hash function h2(x)=digit(x)
//for incrementing probing
int k = a % 10;

//double hashing
int count = 1;
while (true) {
if (hash[k] == -1) {
hash[k] = a;
break;
}
//double hashing for incrementing prob length
k = (k + count * digit(a)) % 10;
count++;
}
}

int main()
{
//set of input numbers
vector<int> arr{ 123, 124, 333, 4679, 983 };

//initialize the hash table
//each entry of the hash table is a single entry
int hash; //size of hashtable is 10
//initialize with empty initially
memset(hash, -1, sizeof(hash));

for (int a : arr) {
//hashing
}

cout << "---------using double hashing---------\n";

cout << "Hash table is:\n";

for (int i = 0; i < 10; i++) {
if (hash[i] == -1)
cout << i << "->"
<< "Empty" << endl;
else
cout << i << "->" << hash[i] << endl;
}

return 0;
}
```

Output:

```---------using double hashing---------
Hash table is:
0->Empty
1->Empty
2->983
3->123
4->124
5->Empty
6->333
7->Empty
8->Empty
9->4679
```