Sort elements by frequency

In this article, we are going to see how to sort an array based on frequency? This problem is a classic problem in hashing and a variety of interview problems is based on this core concept. In this article also, we are going to see the instance of sorting a map based on its value instead of a key.
Submitted by Radib Kar, on July 05, 2020

Prerequisite:

Problem statement:

Sort an array based on frequencies. The element having maximum frequency will come first. If two elements have the same frequency then the element coming first will appear first in the sorted array two.

Example:

Input array:
[1, 2, 3, 2, 3, 2, 1]

After sorting frequency wise:
[2, 2, 2, 1, 1, 3, 3]

2 has the maximum frequency 
thus, it came first and since 1 and 3 have the same frequency, 
so, 1 came first preserving the order in the input array.

Solution:

The idea is to create a hash table first which will store the frequency of unique elements. That can be created like below:

Hash function, h(x)=x but instead of storing key we will store the count

Initially, hash table is empty.

For each key in input array,
	Hash[key]++
End for

If we create the hash table from the above input array, it will be,

Key	Frequency
1	2
2	3
3	2

So now we need to sort the map based on the value instead of the key, but for the same frequencies, we need to preserve the key order(it’s order not key itself).

To preserve the key order we require another hash table that would store the first position of the key. We will store that like below:

Say this hash table is named first_occurrence

For each key in input array,
	If first_occurence[key]==0: //not assigned
	first_occurence[key]=key index+1
End for

So after processing the input the hash table first_occurrence would be:

Key	First occurrence
1	1(0+1)
2	2(1+1
3	3(2+1)

So now rest of the task is to sort the hash table based on the map and based on first_occurrence hash table.

We will use the priority queue to sort this map based on our defined comparator where we will implement logic to sort by frequency value and first_occurrence hash table.

How to pass this in the comparator is our main challenge or area of our discussion?

You can observe a thing that we have designed a class for this problem and kept the first_occurrence hash table as a data member, not any local member to a function. Thing is not the thing I do in general. So, there must be something, that made me design a class and bring the OOP concept within it. This is because I need to feed the first_occurrence hash table into the comparator and the comparator is a lambda function. Like below,

auto comp=[](pair<int,int> a, pair<int,int> b){
	//function body
}

But this time if you notice the implementation, we have 
auto comp=[this](pair<int,int> a, pair<int,int> b){
	//function body
}

This helps you to pass the data members to the lambda function or it's known as capturing of the lambda function.

That's why we designed the class and made first_occurrence as data members to feed into the lambda function. Now we can perform our logic similarly as we do in comparators.

The lambda comparator would be like below which will give us a sorted map based on our requirement via the priority_queue.

auto comp=[this](pair<int,int> a, pair<int,int> b){          
    if(a.second<b.second)
        return true;
    else if(a.second>b.second)
        return false;
    else{
        if(first_occurence[a.first]<first_occurence[b.first])
            return false;
        else
            return true;
    }
};

So after pushing all pairs(a map element is pair, <key, element>) we will have our priority queue as,

Key	Frequency
2	3
1	2
3	2

Now, pop each element from the priority queue and append the key as many times as the frequency,

So at iteration 1
We will pop <2,3>
So vector will be 2 2 2
So at iteration 2
We will pop <1,2>
So vector will be 2 2 2 1 1
So at iteration 3
We will pop <3,2>
So vector will be 2 2 2 1 1 3 3
Queue Is empty
So our sorted array based on frequency is
2 2 2 1 1 3 3

C++ implementation:

#include <bits/stdc++.h>
using namespace std;

class Includehelp {
public:
    unordered_map<int, int> first_occurence;
    void print(vector<int> arr)
    {
        for (auto i : arr) {
            cout << i << " ";
        }
        cout << endl;
    }

    vector<int> sort_by_frequency(vector<int> arr)
    {
        //create the hashmap
        unordered_map<int, int> mymap;

        for (int i = 0; i < arr.size(); i++) {
            mymap[arr[i]]++;
            if (first_occurence[arr[i]] == 0)
                first_occurence[arr[i]] = i + 1;
        }

        //now to sort based on frequency lets use priority queue

        //comparator of priority queue using lambda function
        auto comp = [this](pair<int, int> a, pair<int, int> b) {

            if (a.second < b.second)
                return true;
            else if (a.second > b.second)
                return false;
            else {
                if (first_occurence[a.first] < first_occurence[b.first])
                    return false;
                else
                    return true;
            }
        };

        priority_queue<pair<int, int>, vector<pair<int, int> >, decltype(comp)> pq(comp);
        for (auto& ij : mymap) {
            pq.push(ij);
        }

        //sorted outcome
        vector<int> result;

        while (!pq.empty()) {
            pair<int, int> p = pq.top();
            pq.pop();
            int no = p.first;
            int freq = p.second;
            for (int i = 0; i < freq; i++)
                result.push_back(no);
        }
        return result;
    }
};

int main()
{
    int n;
 
    cout << "Enter number of elements\n";
    cin >> n;
 
    vector<int> arr(n, 0);
    cout << "Enter the elements\n";
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
 
    Includehelp* obj = new Includehelp();
    arr = obj->sort_by_frequency(arr);
 
    cout << "Printing after sorting by frequency\n";
    obj->print(arr);

    return 0;
}

Output:

Enter number of elements
8
Enter the elements
1 2 3 1 2 3 4 5
Printing after sorting by frequency
1 1 2 2 3 3 4 5 





Comments and Discussions

Ad: Are you a blogger? Join our Blogging forum.





Languages: » C » C++ » C++ STL » Java » Data Structure » C#.Net » Android » Kotlin » SQL
Web Technologies: » PHP » Python » JavaScript » CSS » Ajax » Node.js » Web programming/HTML
Solved programs: » C » C++ » DS » Java » C#
Aptitude que. & ans.: » C » C++ » Java » DBMS
Interview que. & ans.: » C » Embedded C » Java » SEO » HR
CS Subjects: » CS Basics » O.S. » Networks » DBMS » Embedded Systems » Cloud Computing
» Machine learning » CS Organizations » Linux » DOS
More: » Articles » Puzzles » News/Updates


© https://www.includehelp.com some rights reserved.