Home » DBMS

Normalization in Database Management System

DBMS | Normalization: In this tutorial, we will learn about the normalization, different types of the normalization with the examples. By IncludeHelp Last updated : May 29, 2023

What is Normalization in DBMS?

Normalization is a process of organizing the data in a database to avoid data redundancy and improve data integrity. Normalization of the database is a process of arranging the data in the database. To remove data repetition (redundancy) and unwanted features such as addition, Update, and deletion anomalies, and a normalization is a systematic approach to decomposing tables. It is a multi-step approach that places information in tabular form, eliminating redundant information from the tables.

What happens if the database table is not Normalize?

If a table is not normalized properly and has data redundancy, it not only requires additional memory space but also makes it very difficult to maintain and Update the database records without data loss. If the database is not normalized, insertion, Updation, and anomalies deletion will be difficult.

Purpose of Normalization

Normalization is primarily used for two reasons.

To remove redundant data.
To ensure data dependencies, i.e. data is logically processed and making sense.

Types of DBMS Normalization

It is possible to classify databases by their normalization level, from level 1 to level 5. This means that the easiest and most basic method of normalizing data is level 1 (First Normal Form or 1NF), up to 5NF, the most complex one.

The most common form of Normalization is as follows -

First Normal Form
Second Normal Form
Third Normal Form
BCNF
Fourth Normal Form

1) First Normal Form (1NF)

There are some criteria to keep the database table in first normal form. These are as follows -

A table will be in 1NF, if it contains an atomic value.
It needs to note that a table attribute is unable to hold several values. It must only contain an attribute with a single value.
The multi-value attribute, the composite attribute, and their combinations are disallowed by the first normal type.

Example: Table STUDENT is not in 1NF because of multi-valued attribute ST_PHONE.

STUDENT table:

ST_ID	ST_NAME	ST_PHONE	ST_STATE
101	Ram	8877665544, 8989265471	MP
102	Rex	8574783832	Maharashtra
103	Kapil	7390372389, 8589830302	Delhi

The decomposition of the STUDENT table into 1NF has been shown below:

ST_ID	ST_NAME	ST_PHONE	ST_STATE
101	Ram	8877665544	MP
101	Ram	8989265471	MP
102	Rex	8574783832	Maharashtra
103	Kapil	7390372389	Delhi
103	Kapil	8589830302	Delhi

2) Second Normal Form (2NF)

There are several conditions for keeping the second normal shape of the database table. These are the following -

In the 2NF row, the table must be 1NF.
In the second normal type, all non-key attributes depending on the primary key are completely functional.

Example: let's we assume that teachers' data and the subjects they teaches can be stored by a school. A teacher can teach more than one subject in a school.

TEACHER table:

TEACHER_ID	SUBJECT	TEACHER_AGE
T1	Hindi	45
T1	Science	45
T2	Maths	35
T3	English	28
T3	SST	28

The non-prime attribute TEACHER AGE in the provided table is based on TEACHER ID, which is the right subset of a candidate key. That is why it violates the 2NF law.

We split it down into two tables to translate the given table into 2NF:

TEACHER_ID	TEACHER_AGE
T1	45
T2	35
T3	28

TEACHER_SUBJECT table:

TEACHER_ID	SUBJECT
T1	Hindi
T1	Science
T2	Math
T3	English
T3	SST

3) Third Normal Form (3NF)

In the 3NF, the table must be 2NF.
There should no transitive dependency for non-prime attributes

If at least one of the following conditions holds X->Y in any non trivial function dependency, a Relationship is in 3NF:

X is a Super key.
Y is a primary attribute.

If A->B and B->C are two FDs then A->C is called transitive dependency.

The normalization of the relationship of 2NF with 3NF requires the elimination of transitive dependencies. If there is a transitive dependence, by putting the attribute(s) in a new relation along with a copy of the determinant, we delete the transitively dependent attribute(s) from the relation.

Consider the examples given below.

Example-1:
In relation TEACHER given in following table,

TEACHER_ID	TEACHER_NAME	TEACHER_ZIP_CODE	TEACHER_CITY
101	Rex	Z1	Noida
102	Rohan	Z2	Boston
103	Puneet	Z3	Chicago
104	Jack	Z4	Norwich
105	Amit	Z5	Bhopal

Super key in the table above:

{TEACHER_ID}, {TEACHER_ID, TEACHER_NAME}, {TEACHER_ID, TEACHER_NAME, TEACHER_ZIP_CODE}....so on

Candidate key: {TEACHER_ID}

Non-prime attributes: In the given table, all attributes except TEACHER_ID are non-prime.

Here, TEACHER_STATE & TEACHER_CITY dependent on TEACHER_ZIP_CODE and TEACHER_ZIP_CODE dependent on TEACHER_ID. The non-prime attributes (TEACHER_STATE, TEACHER_CITY) transitively dependent on super key (TEACHER_ID). It violates the rule of third normal form.

That's why we need to move the TEACHER_CITY and TEACHER_STATE to the new <TEACHER_ZIP> table, with TEACHER_ZIP_CODE as a Primary key

TEACHER table:

TEACHER_ID	TEACHER_NAME	TEACHER_ZIP_CODE
101	Rex	Z1
102	Rohan	Z2
103	Puneet	Z3
104	Jack	Z4
105	Amit	Z5

TEACHER_ZIP table:

TEACHER_ZIP_CODE	TEACHER_CITY
Z1	Noida
Z2	Boston
Z3	Chicago
Z4	Norwich
Z5	Bhopal

4) Boyce Codd normal form (BCNF)

There are several conditions for keeping the Boyce Codd normal form which forms of the database table. These are the following -

The modern variant of 3NF is the BCNF. Stricter than 3NF, it is.
If each functional dependency X ->Y, X is the upper key of the table, a table is in BCNF.
The table should be in 3NFF for BCNF.

Example: Let's presume that there is a business in which learners work in more than one department.

STUDENT table:

ST_ID	ST_COUNTRY	ST_SPECIALISATION	DEPT_TYPE	ST_SPECIALISATION_NO
101	India	Engineering	D101	S1
101	India	IT	D101	S2
102	UK	Architecture	D201	S3
102	UK	Maths	D201	S4

In the above table Functional dependencies are as follows:

ST_ID → ST_COUNTRY
ST_SPECIALISATION → {DEPT_TYPE, ST_SPECIALISATION_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither ST_SPECIALISATION nor ST_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

ST_COUNTRY table:

ST_ID	ST_COUNTRY
101	India
101	India

ST_SPECIALISATION table:

ST_SPECIALISATION	DEPT_TYPE	ST_SPECIALISATION_NO
Engineering	D101	S1
IT	D101	S2
Architecture	D201	S3
Maths	D201	S4

ST_SPECIALISATION_MAPPING table:

ST_ID	ST_SPECIALISATION
D101	S1
D101	S2
D201	S3
D201	S4

Functional dependencies:

ST_ID → ST_COUNTRY
ST_SPECIALISATION → {DEPT_TYPE, ST_SPECIALISATION_NO}

Candidate keys:

For the first table: ST_ID
For the second table: ST_SPECIALISATION
For the third table: {ST_ID, ST_SPECIALISATION}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

5) Fourth normal form (4NF)

There are several conditions for keeping the Fourth normal form which forms of the database table. These are the following -

If it is in Boyce Codd Normal Form and has no multi-value dependence, a table will be in 4NF.
For dependency A -> B, if multiple values of B exist for a single value of A, then the table would be a multi-valued dependency.

Example

STUDENT

STU_ID	COURSE	HOBBY
101	Maths	Dancing
101	Physics	Singing
102	Chemistry	Dancing
103	Science	Cricket
104	English	Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no database between COURSE and HOBBY.

In the STUDENT table, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID	COURSE
101	Maths
101	Physics
102	Chemistry
103	Science
104	English

STUDENT_HOBBY:

STU_ID	HOBBY
101	Dancing
101	Singing
102	Dancing
103	Cricket
104	Hockey

6) Fifth normal form (5NF)

There are several conditions for keeping the Fifth normal form which forms of the database table. These are the following -

If it is in 4NF and does not have any join dependence, a table is in 5NF and joining should be lossless.
5NF is fulfilled when in order to prevent duplication, all tables are broken into as many tables as possible.
5NF is often referred to as the normal form of Project-join (PJ/NF).

Example:

SUBJECT	LECTURER	SEMESTER
Physics	Mr. Kabir	I
Physics	Mrs. Hemlata	II
Chemistry	Mr. Ramkumar	I
Chemistry	Mr. Amar Singh	I
Math	Mr. Lal K.	II

In the table above for semester 1, Ram takes both computer and math classes, but for semester 2, he does not take math classes. In this case, all these fields must be combined to classify valid data.

We add a new Semester as Semester 3 to suppose but do not know about the topic and who will take that topic so that we leave Lecturer and Subject as NULL. But together all three columns serve as the primary key, so we can't leave two more columns blank.

So we can decompose the table above into 3 tables Table 1, Table 2 & Table 3 to transform the table above into 5NF:

Table 1:

SEMESTER	SUBJECT
I	Physics
II	Physics
I	Chemistry
II	Math

Table 2:

SUBJECT	LECTURER
Physics	Mr. Kabir
Physics	Mrs. Hemlata
Chemistry	Mr. Ramkumar
Chemistry	Mr. Amar Singh
Math	Mr. Lal K.

Table 3:

SEMSTER	LECTURER
I	Mr. Kabir
II	Mrs. Hemlata
I	Mr. Ramkumar
I	Mr. Amar Singh
II	Mr. Lal K.

How do we tackle the actual process of database normalization?

This is a very general process, the details of are as follows:

Step 1: For each value, build separate tables, or locate duplicate fields in a table, position them in different tables, and connect a key to each of them.

Step 2: In different tables, create relations between values. For example, between the Colors table and the Sizes table for a clothing product.

Step 3: Introduce relations between the columns of the main key and the columns of the non-key.

Advantages of Normalization of Databases

Followings are the key advantages of Normalization of database,

A logical map:
The collection and mapping of data is more logically organized and thus its value for every department using the tables is doubled.
Data consistency:
The reliability of the data is improved, and the information stored would be more reliable. It avoids the following errors: storing outdated versions, finding duplicated data in different sections of your business, and finding different types of links between product data without a clear hierarchy.
Sound Data Relation:
The creation of a full data structure that demonstrates how information from various tables relates to each other is showing the relational data. The identification of data links and the correction of any inaccessibility or inconsistency of information inside the product database will be facilitated.

Keys and Their Types in DBMS

What is Chomsky Normal Form?