Normalization in Database Management System

DBMS | Normalization: In this tutorial, we will learn about the normalization, different types of the normalization with the examples. By IncludeHelp Last updated : May 29, 2023

What is Normalization in DBMS?

Normalization is a process of organizing the data in a database to avoid data redundancy and improve data integrity. Normalization of the database is a process of arranging the data in the database. To remove data repetition (redundancy) and unwanted features such as addition, Update, and deletion anomalies, and a normalization is a systematic approach to decomposing tables. It is a multi-step approach that places information in tabular form, eliminating redundant information from the tables.

What happens if the database table is not Normalize?

If a table is not normalized properly and has data redundancy, it not only requires additional memory space but also makes it very difficult to maintain and Update the database records without data loss. If the database is not normalized, insertion, Updation, and anomalies deletion will be difficult.

Purpose of Normalization

Normalization is primarily used for two reasons.

  • To remove redundant data.
  • To ensure data dependencies, i.e. data is logically processed and making sense.

Types of DBMS Normalization

It is possible to classify databases by their normalization level, from level 1 to level 5. This means that the easiest and most basic method of normalizing data is level 1 (First Normal Form or 1NF), up to 5NF, the most complex one.

The most common form of Normalization is as follows -

  1. First Normal Form
  2. Second Normal Form
  3. Third Normal Form
  4. BCNF
  5. Fourth Normal Form

1) First Normal Form (1NF)

There are some criteria to keep the database table in first normal form. These are as follows -

  • A table will be in 1NF, if it contains an atomic value.
  • It needs to note that a table attribute is unable to hold several values. It must only contain an attribute with a single value.
  • The multi-value attribute, the composite attribute, and their combinations are disallowed by the first normal type.

Example: Table STUDENT is not in 1NF because of multi-valued attribute ST_PHONE.

STUDENT table:

ST_ID ST_NAME ST_PHONE ST_STATE
101Ram8877665544, 8989265471MP
102Rex8574783832Maharashtra
103Kapil7390372389, 8589830302Delhi

The decomposition of the STUDENT table into 1NF has been shown below:

ST_ID ST_NAME ST_PHONE ST_STATE
101Ram8877665544MP
101Ram8989265471MP
102Rex8574783832Maharashtra
103Kapil7390372389Delhi
103Kapil8589830302Delhi

2) Second Normal Form (2NF)

There are several conditions for keeping the second normal shape of the database table. These are the following -

  • In the 2NF row, the table must be 1NF.
  • In the second normal type, all non-key attributes depending on the primary key are completely functional.

Example: let's we assume that teachers' data and the subjects they teaches can be stored by a school. A teacher can teach more than one subject in a school.

TEACHER table:

TEACHER_IDSUBJECTTEACHER_AGE
T1Hindi45
T1Science45
T2Maths35
T3English28
T3SST28

The non-prime attribute TEACHER AGE in the provided table is based on TEACHER ID, which is the right subset of a candidate key. That is why it violates the 2NF law.

We split it down into two tables to translate the given table into 2NF:

TEACHER_IDTEACHER_AGE
T145
T235
T328

TEACHER_SUBJECT table:

TEACHER_IDSUBJECT
T1Hindi
T1Science
T2Math
T3English
T3SST

3) Third Normal Form (3NF)

  • In the 3NF, the table must be 2NF.
  • There should no transitive dependency for non-prime attributes

If at least one of the following conditions holds X->Y in any non trivial function dependency, a Relationship is in 3NF:

  • X is a Super key.
  • Y is a primary attribute.

If A->B and B->C are two FDs then A->C is called transitive dependency.

The normalization of the relationship of 2NF with 3NF requires the elimination of transitive dependencies. If there is a transitive dependence, by putting the attribute(s) in a new relation along with a copy of the determinant, we delete the transitively dependent attribute(s) from the relation.

Consider the examples given below.

Example-1:
In relation TEACHER given in following table,

TEACHER_IDTEACHER_NAMETEACHER_ZIP_CODETEACHER_CITY
101RexZ1Noida
102RohanZ2Boston
103PuneetZ3Chicago
104JackZ4Norwich
105AmitZ5Bhopal

Super key in the table above:

{TEACHER_ID}, {TEACHER_ID, TEACHER_NAME}, {TEACHER_ID, TEACHER_NAME, TEACHER_ZIP_CODE}....so on

Candidate key: {TEACHER_ID}

Non-prime attributes: In the given table, all attributes except TEACHER_ID are non-prime.

Here, TEACHER_STATE & TEACHER_CITY dependent on TEACHER_ZIP_CODE and TEACHER_ZIP_CODE dependent on TEACHER_ID. The non-prime attributes (TEACHER_STATE, TEACHER_CITY) transitively dependent on super key (TEACHER_ID). It violates the rule of third normal form.

That's why we need to move the TEACHER_CITY and TEACHER_STATE to the new <TEACHER_ZIP> table, with TEACHER_ZIP_CODE as a Primary key

TEACHER table:

TEACHER_IDTEACHER_NAMETEACHER_ZIP_CODE
101RexZ1
102RohanZ2
103PuneetZ3
104JackZ4
105AmitZ5

TEACHER_ZIP table:

TEACHER_ZIP_CODETEACHER_CITY
Z1Noida
Z2Boston
Z3Chicago
Z4Norwich
Z5Bhopal

4) Boyce Codd normal form (BCNF)

There are several conditions for keeping the Boyce Codd normal form which forms of the database table. These are the following -

  • The modern variant of 3NF is the BCNF. Stricter than 3NF, it is.
  • If each functional dependency X ->Y, X is the upper key of the table, a table is in BCNF.
  • The table should be in 3NFF for BCNF.

Example: Let's presume that there is a business in which learners work in more than one department.

STUDENT table:

ST_IDST_COUNTRYST_SPECIALISATIONDEPT_TYPEST_SPECIALISATION_NO
101IndiaEngineeringD101S1
101IndiaITD101S2
102UKArchitectureD201S3
102UKMathsD201S4

In the above table Functional dependencies are as follows:

  1. ST_ID → ST_COUNTRY
  2. ST_SPECIALISATION → {DEPT_TYPE, ST_SPECIALISATION_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither ST_SPECIALISATION nor ST_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

ST_COUNTRY table:

ST_IDST_COUNTRY
101India
101India

ST_SPECIALISATION table:

ST_SPECIALISATION DEPT_TYPE ST_SPECIALISATION_NO
EngineeringD101S1
ITD101S2
ArchitectureD201S3
MathsD201S4

ST_SPECIALISATION_MAPPING table:

ST_IDST_SPECIALISATION
D101S1
D101S2
D201S3
D201S4

Functional dependencies:

  1. ST_ID → ST_COUNTRY
  2. ST_SPECIALISATION → {DEPT_TYPE, ST_SPECIALISATION_NO}

Candidate keys:

For the first table: ST_ID
For the second table: ST_SPECIALISATION
For the third table: {ST_ID, ST_SPECIALISATION}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

5) Fourth normal form (4NF)

There are several conditions for keeping the Fourth normal form which forms of the database table. These are the following -

  • If it is in Boyce Codd Normal Form and has no multi-value dependence, a table will be in 4NF.
  • For dependency A -> B, if multiple values of B exist for a single value of A, then the table would be a multi-valued dependency.

Example

STUDENT

STU_IDCOURSEHOBBY
101MathsDancing
101PhysicsSinging
102ChemistryDancing
103ScienceCricket
104EnglishHockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no database between COURSE and HOBBY.

In the STUDENT table, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_IDCOURSE
101Maths
101Physics
102Chemistry
103Science
104English

STUDENT_HOBBY:

STU_IDHOBBY
101Dancing
101Singing
102Dancing
103Cricket
104Hockey

6) Fifth normal form (5NF)

There are several conditions for keeping the Fifth normal form which forms of the database table. These are the following -

  • If it is in 4NF and does not have any join dependence, a table is in 5NF and joining should be lossless.
  • 5NF is fulfilled when in order to prevent duplication, all tables are broken into as many tables as possible.
  • 5NF is often referred to as the normal form of Project-join (PJ/NF).

Example:

SUBJECTLECTURERSEMESTER
PhysicsMr. KabirI
PhysicsMrs. HemlataII
ChemistryMr. RamkumarI
ChemistryMr. Amar SinghI
MathMr. Lal K.II

In the table above for semester 1, Ram takes both computer and math classes, but for semester 2, he does not take math classes. In this case, all these fields must be combined to classify valid data.

We add a new Semester as Semester 3 to suppose but do not know about the topic and who will take that topic so that we leave Lecturer and Subject as NULL. But together all three columns serve as the primary key, so we can't leave two more columns blank.

So we can decompose the table above into 3 tables Table 1, Table 2 & Table 3 to transform the table above into 5NF:

Table 1:

SEMESTERSUBJECT
IPhysics
IIPhysics
IChemistry
IIMath

Table 2:

SUBJECTLECTURER
PhysicsMr. Kabir
PhysicsMrs. Hemlata
ChemistryMr. Ramkumar
ChemistryMr. Amar Singh
MathMr. Lal K.

Table 3:

SEMSTERLECTURER
IMr. Kabir
IIMrs. Hemlata
IMr. Ramkumar
IMr. Amar Singh
IIMr. Lal K.

How do we tackle the actual process of database normalization?

This is a very general process, the details of are as follows:

Step 1: For each value, build separate tables, or locate duplicate fields in a table, position them in different tables, and connect a key to each of them.

Step 2: In different tables, create relations between values. For example, between the Colors table and the Sizes table for a clothing product.

Step 3: Introduce relations between the columns of the main key and the columns of the non-key.

Advantages of Normalization of Databases

Followings are the key advantages of Normalization of database,

  • A logical map:
    The collection and mapping of data is more logically organized and thus its value for every department using the tables is doubled.
  • Data consistency:
    The reliability of the data is improved, and the information stored would be more reliable. It avoids the following errors: storing outdated versions, finding duplicated data in different sections of your business, and finding different types of links between product data without a clear hierarchy.
  • Sound Data Relation:
    The creation of a full data structure that demonstrates how information from various tables relates to each other is showing the relational data. The identification of data links and the correction of any inaccessibility or inconsistency of information inside the product database will be facilitated.



Comments and Discussions!

Load comments ↻






Copyright © 2024 www.includehelp.com. All rights reserved.