Clustering: Introduction, Types, and Advantages in Machine Learning

Machine Learning | Clustering: In this tutorial, we will learn about clustering and its types. By Akashdeep Singh. Last updated: April 17, 2023

What is Clustering in Machine Learning?

Clustering is the process of dividing objects into groups (clusters) so that the data points within each group are similar to one another. Think of how items are arranged in a mall: similar items are grouped together, so you will not find them mixed with unrelated items (e.g., onions are not placed in the fruit section).

Where Is Clustering Used?

Recommendation systems can use clustering to group similar products or customers. For example, Amazon recommends products based on a customer's past purchases, and Netflix recommends movies and shows based on watch history. Clustering is also used for image segmentation, grouping web pages, and information retrieval. In retail, for example, clustering helps analyze customer shopping behavior, the response to sales campaigns, and customer attention.

Types of Clustering

This tutorial covers three common types of clustering: exclusive clustering, overlapping clustering, and hierarchical clustering.

1. Exclusive clustering

In exclusive (hard) clustering, each data point belongs to exactly one cluster, so no point is shared between clusters. K-means clustering is an example of exclusive clustering.
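To make "exactly one cluster" concrete, here is a minimal sketch of a hard assignment step. It assumes the cluster centers are already known and that similarity is measured by Euclidean distance; the names `assign_exclusive`, `points`, and `centers` are hypothetical and chosen only for illustration. Each point receives a single label: the index of its nearest center.

```python
import math

def assign_exclusive(points, centers):
    """Return one cluster index per point: the index of its nearest center."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        # min() over the indices picks exactly one cluster;
        # on a tie, the lowest index wins.
        labels.append(min(range(len(centers)), key=lambda i: distances[i]))
    return labels

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centers = [(1.0, 1.5), (8.5, 8.0)]  # hypothetical, precomputed centers
print(assign_exclusive(points, centers))  # [0, 0, 1, 1]
```

Because every point gets one integer label, no point can appear in two clusters.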

2. Overlapping clustering

In overlapping (soft) clustering, a data point can belong to more than one cluster, usually with a degree of membership in each. Fuzzy C-means clustering is an example of overlapping clustering.
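The sketch below shows how a point can belong to several clusters at once, using the membership update of fuzzy C-means. It assumes the centers are already known, uses Euclidean distance, and uses a fuzzifier `m = 2` (a common default). It covers only the membership step; the full algorithm alternates this step with recomputing the centers. The function name `fuzzy_memberships` is hypothetical.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of `point` in each cluster (fuzzy C-means update).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    to center i and m > 1 is the fuzzifier. The memberships sum to 1.
    """
    distances = [math.dist(point, c) for c in centers]
    on_center = [d == 0.0 for d in distances]
    if any(on_center):
        # A point sitting exactly on a center belongs fully to it
        # (shared equally if several centers coincide with the point).
        share = 1.0 / sum(on_center)
        return [share if hit else 0.0 for hit in on_center]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]

u = fuzzy_memberships((2.0, 0.0), [(0.0, 0.0), (6.0, 0.0)])
print([round(x, 3) for x in u])  # [0.8, 0.2]
```

The point is twice as close to the first center, so it belongs mostly, but not only, to the first cluster.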

3. Hierarchical clustering

Hierarchical clustering builds a hierarchy (tree) of nested clusters: clusters at one level are formed by merging or splitting clusters at the adjacent level. Hierarchical clustering is divided into two approaches: agglomerative and divisive.

  • In agglomerative clustering, each data point initially forms its own cluster. The two most similar clusters are then merged, and this process repeats until all data points belong to a single cluster. Agglomerative clustering follows a bottom-up approach.
  • Divisive clustering is the opposite of agglomerative clustering. All data points start in one cluster, which is repeatedly split until each data point forms its own cluster. Divisive clustering follows a top-down approach.
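The bottom-up (agglomerative) process can be sketched naively as follows. This is an illustration under stated assumptions, not an efficient implementation: it measures similarity with single linkage (the Euclidean distance between the closest members of two clusters), and it accepts a hypothetical `n_clusters` parameter so the merging can stop early; `n_clusters=1` runs the process all the way to one cluster, as described above.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters=1):
    """Naive bottom-up clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the
    two closest clusters until `n_clusters` clusters remain.
    Returns the clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]  # each point is its own cluster

    def single_link(a, b):
        # Distance between the closest pair of members of clusters a and b.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest (most similar).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(points, n_clusters=3))  # [[0, 1], [2, 3], [4]]
print(agglomerative(points))                # [[0, 1, 2, 3, 4]]
```

A divisive method would run in the opposite direction, starting from one cluster and splitting it; it is not shown here.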

K-means Clustering Algorithm

K-means is a clustering algorithm that groups similar data points into K clusters, where K is the number of clusters and is chosen in advance. Each "cluster center" (centroid) is the arithmetic mean of all the points belonging to that cluster, and each point is assigned to the cluster whose center is closest to it.
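The following is a minimal sketch of Lloyd's algorithm, the standard iterative procedure for k-means. It alternates the two rules described above: assign each point to its nearest center, then move each center to the mean of its points. Assumptions: Euclidean distance, the first K points are used as initial centers (deterministic but order-sensitive; libraries usually use random or k-means++ initialization), and `kmeans` and `max_iters` are hypothetical names.

```python
import math

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    # Assumption: the first k points serve as the initial centers.
    centers = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are final
        labels = new_labels
        # Update step: each center becomes the arithmetic mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels

points = [(1, 1), (8, 8), (1, 2), (9, 8), (2, 1), (8, 9)]
centers, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print([tuple(round(v, 2) for v in c) for c in centers])  # [(1.33, 1.33), (8.33, 8.33)]
```

Changing `k` and rerunning shows directly how the number of clusters affects the grouping.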

Advantages of K-means Clustering Algorithm

  • Easy to implement
  • Relatively fast and efficient
  • Has one main parameter to tune, K, and the effect of changing K is easy to see

