# External Merge Sorting Algorithm

In this tutorial, we will learn the basic concept of external merge sorting, walk through an example, and look at its algorithm. By Abhishek Kataria Last updated : August 12, 2023

## What is external sorting?

**External sorting is a technique for sorting data that resides in secondary memory: the data is loaded into main memory part by part, each part is sorted there, and the sorted part is written to an intermediate file**. Finally, these files are merged to produce the sorted data. External sorting thus makes it possible to sort data sets that are too large for main memory. Because all the data cannot be held in main memory at once, the rest must stay on secondary storage such as a hard disk or compact disk.

## Requirement of external sorting

**External sorting** is required when the data to be sorted does not fit into main memory. It consists of two phases:

- **Sorting phase:** Portions of the data that fit in main memory are sorted one at a time, and each sorted portion is written to an intermediate file.
- **Merge phase:** The sorted intermediate files are combined into a single larger sorted file.

**One of the best examples of external sorting is external merge sort.**

## What is external merge sorting?

The **external merge sort** is a technique in which the data is split into intermediate files, each intermediate file is sorted independently, and the sorted files are then merged to produce the sorted data.

## Example of external merge sorting

Suppose there are 10,000 records to be sorted using the external merge sort method. Suppose also that main memory can hold 500 records at a time and that each block contains 100 records, so memory holds 5 blocks.

In this example, 5 blocks (500 records) are loaded into main memory at a time, sorted, and written to an intermediate file. This step is repeated 20 times (10,000 / 500) to cover all the records, producing 20 sorted intermediate files. We then merge pairs of intermediate files through main memory, repeating until a single sorted output remains.
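The sorting phase of this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the records are random integers, plain lists stand in for the intermediate files, and the function name `make_sorted_runs` is hypothetical rather than part of any library.

```python
import random

def make_sorted_runs(records, memory_capacity):
    """Split records into memory-sized chunks and sort each chunk.

    Each returned list stands in for one sorted intermediate file.
    """
    runs = []
    for start in range(0, len(records), memory_capacity):
        chunk = records[start:start + memory_capacity]  # "load" at most memory_capacity records
        chunk.sort()                                     # sort the chunk in main memory
        runs.append(chunk)                               # "write" the sorted chunk out
    return runs

random.seed(0)  # fixed seed so the sketch is reproducible
records = [random.randint(0, 1_000_000) for _ in range(10_000)]
runs = make_sorted_runs(records, memory_capacity=500)

print(len(runs))     # number of intermediate files: 20
print(len(runs[0]))  # records per file: 500
```

With 10,000 records and room for 500 in memory, the loop runs 20 times, matching the count in the example above.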

## What is two-way merge sorting?

**Two-way merge sort** is a technique that works in two stages:

**Stage 1**: First, break the records into blocks and sort each block individually, with the help of two input tapes.

**Stage 2**: Then merge the sorted blocks to create a single sorted file, with the help of two output tapes.

It follows that **two-way merge sort** uses two input tapes and two output tapes to sort the data.

## Algorithm for two-way merge sort

**Step 1)** Divide the elements into blocks of size M, where M is the number of records that fit in main memory. Sort each block and write it to disk as a sorted run.

**Step 2)** Merge two runs

- Read the first record of each of the two runs.
- Compare the two records and select the smaller one.
- Write the selected record to the output tape, then read the next record from the run it came from.

**Step 3)** Repeat step 2 to get longer and longer runs on alternate tapes, until a single sorted list remains.
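The three steps can be sketched in Python as follows. This is a minimal in-memory illustration, not a tape or disk implementation: lists stand in for runs, and the names `merge_two_runs` and `two_way_merge_sort` are hypothetical helpers chosen for this sketch.

```python
def merge_two_runs(run_a, run_b):
    """Step 2: merge two sorted runs into one longer sorted run."""
    merged = []
    i = j = 0
    while i < len(run_a) and j < len(run_b):
        # Compare the current front record of each run and emit the smaller one.
        if run_a[i] <= run_b[j]:
            merged.append(run_a[i])
            i += 1
        else:
            merged.append(run_b[j])
            j += 1
    # One run is exhausted; copy whatever remains of the other.
    merged.extend(run_a[i:])
    merged.extend(run_b[j:])
    return merged

def two_way_merge_sort(records, block_size):
    """Return (sorted_records, number_of_merge_passes)."""
    # Step 1: cut the input into blocks and sort each block into a run.
    runs = [sorted(records[s:s + block_size])
            for s in range(0, len(records), block_size)]
    passes = 0
    # Step 3: keep merging pairs of runs until only one run is left.
    while len(runs) > 1:
        next_runs = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_runs.append(merge_two_runs(runs[k], runs[k + 1]))
            else:
                next_runs.append(runs[k])  # odd run out is carried to the next pass
        runs = next_runs
        passes += 1
    return (runs[0] if runs else []), passes

data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0]
result, merge_passes = two_way_merge_sort(data, block_size=2)
print(result)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(merge_passes)  # 3
```

Ten records with a block size of 2 give 5 initial runs, which shrink to 3, then 2, then 1 over three merge passes.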

## Analysis

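Here **N** is the total number of records and **M** is the block size from Step 1, so there are N/M initial runs. Each merge pass halves the number of runs, so the algorithm requires about **log₂(N/M)** merge passes (rounded up) in addition to the initial run-creation pass. Every pass processes all **N** records, which gives a **time complexity of O(N log(N/M))**.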
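As a quick check of the pass count, the short Python sketch below applies the formula to the earlier example of 10,000 records and 500-record runs. The function name `count_merge_passes` is a hypothetical helper, and the calculation assumes a strict two-way merge.

```python
import math

def count_merge_passes(n_records, run_size):
    """Number of two-way merge passes needed after the initial runs are created."""
    initial_runs = math.ceil(n_records / run_size)
    if initial_runs <= 1:
        return 0  # zero or one run: nothing to merge
    # (k - 1).bit_length() equals ceil(log2(k)) for every integer k >= 1,
    # and avoids floating-point rounding.
    return (initial_runs - 1).bit_length()

passes = count_merge_passes(10_000, 500)
print(passes)      # 5 merge passes for 20 initial runs
print(passes + 1)  # 6 passes in total, counting the run-creation pass
```

The 20 initial runs shrink to 10, 5, 3, 2 and finally 1, which is the 5 merge passes the formula predicts.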
