Hash functions and its characteristics

In this tutorial, we are going to learn about the hash functions which are used to map the key to the indexes of the hash table and characteristics of a good hash function.
Submitted by Radib Kar, on July 01, 2020

Prerequisite: Hashing data structure

Hash function

The hash function is the component of hashing that maps the keys to some location in the hash table. As part of the hashing technique, we need a hash function to map the available keys to the set of indexes in the hash table. Say we have a set of keys ranging from [-1000, 1000] and have a hash table of size 10, index ranging from [0, 10].

That's why to map the keys we need hash functions. A hash function is termed as the perfect hash function is what is able to map the keys to unique locations. But unfortunately, there is no systematic way to generate hash function as the size of the key list is very large considered to hash table size.

A popular hash function is a folding method where we fold the integers and sum the partitions and then take mod 10 finally.

For example,

The hash function for the folding method is = 
    sum of the fold of size two %10
Say we have integer 60784567
So sum will be 60 + 78 + 45 + 67 = 250
So final location is 250 % 10 = 0

Compute folding method using hash function

The below program computes the above folding method which is an example of the hash function.

#include <bits/stdc++.h>
using namespace std;

//folding method

int main()
{
    string s;
  
    cout << "enter number\n";
    cin >> s;

    //folding and summing
    int sum = 0;
    for (int i = 0; i < s.length(); i += 2) {
        if (i + 1 < s.length())
            sum += stoi(s.substr(i, 2));
        else
            //when only one digit is left for folding
            sum += stoi(s.substr(i, 1));
    }

    cout << s << "->" << sum % 10;

    return 0;
}

Output:

enter number
60784567
60784567->0

Now if some other number also finds location 0, then it's called collision.

Characteristics of good hash function

  1. Minimum collision
  2. High gain factor (distributes keys evenly). Like say we have 10 locations in the hash table, then almost 9 locations should have keys. It should not be the case that 4-5 locations have keys only where the keys collided.
  3. Have a high load factor. Load factor means the number of keys stored/hash table size
  4. Easy to compute

Exercises on hash function

Use the below hash function to compute the hashing and comment on the goodness of the hash function.

1) F(key) = number of digits of key

#include <bits/stdc++.h>
using namespace std;

//hash function 1
int main()
{
    string s;
    cout << "enter number\n";
    cin >> s;

    //f(s)=no of digit in s=length of s
    cout << s << "->" << s.length();

    return 0;
}

Output:

enter number
123452
123452->6

The above hash function is not good at all. Say we have set of keys all having the same digits, then all the keys will collide and the rest of the locations will remain empty.

2) F(key) = (rand()*key) % 10

#include <bits/stdc++.h>
using namespace std;

//hash function 2
int main()
{
    int n;
    cout << "enter number\n";
    cin >> n;

    //f(n)=rand()*n
    cout << n << "->" << (rand() * n) % 10;

    return 0;
}

Output:

enter number
103456
103456->2

The above hash function is good as we are multiplying random integers and bringing randomness, the collision rate will be less.

Comparison of the above hash functions

#include <bits/stdc++.h>
using namespace std;

//comparing goodness of hash function 1 &  2
int main()
{

    //set of input numbers
    vector<int> arr{ 12345, 234245, 1223123, 765845, 345234, 234534, 98675, 34523, 123, 3245 };

    //using hash function 1
    cout << "using hash function 1\n";
    for (int a : arr) {
        cout << a << "->" << to_string(a).length() % 10 << endl;
    }

    //using hash function 2
    cout << "\n\nusing hashh function 2\n";
    for (int a : arr) {
        cout << a << "->" << (rand() * a) % 10 << endl;
    }

    return 0;
}

Output:

using hash function 1
12345->5
234245->6
1223123->7
765845->6
345234->6
234534->6
98675->5
34523->5
123->3
3245->4


using hashh function 2
12345->9
234245->4
1223123->3
765845->-1
345234->-4
234534->4
98675->6
34523->0
123->5
3245->-9



Comments and Discussions!

Load comments ↻





Copyright © 2024 www.includehelp.com. All rights reserved.