Back To Course Outline - COSI335
Back To Chapter Two
Merge Sort
"Divide - and - Conquer" Strategy
    • Divide into 2 halves
    • Conquer: Sort the 2 halves separately.  (This is a recursive step.)
    • Combine:  "Merge" the 2 sorted halves.
Divide it until you are able to merge.
Keep breaking problem to smaller and smaller pieces and the problem is small enough to figure out,
and then sort back to get large piece.  Two large pieces merge to complete the sorting at the end.
When you merge two pieces, you do the comparison.

Example:
Sort the following items:
7, 3, 8, 5, 2, 6, 4, 1.

7, 3, 8, 5, 2, 6, 4, 1. 7, 3, 8, 5 7, 3 7 3, 7 3, 5, 7, 8 1, 2, 3, 4, 5, 6, 7, 8
3
8, 5 8 5, 8
5
2, 6, 4, 1 2, 6 2 2, 6 1, 2, 4, 6
6
4, 1 4 1, 4
1

Algorithm:
 
MergeSort(A, start, last)
{
   mid =(start + last)/2
 if(last > start+1)
 { 
  MergeSort(A, start, mid)
  MergeSort(A, mid, last)
  Merge(A, start, mid, last)
 }
}

Note: 
The indexes of array or vector to be are
[start, last).

Merge(A, start, mid, last)
{
  • while both sublists are not exhausted, compare two elements from sublists, and copy the smaller to vector "temp".

  • copy the tail of the sublist that is not exhausted to "temp". 

  • copy elements from temporary vector to original list.

}

IMPLEMENTATIONS:
Click here to see merge sort on pages 866 to 922, Data Structures, Ford/Topp.

Time Complexity Analysis:
 
Steps
Time
Divide into 2 halves
1 step
Sort the 2 halves separately
2T(n/2) steps
"Merge" the 2 sorted halves
O(n) steps

T(n) = maximum # of steps that the algorithm takes on an input of length n.
What is T(1)?
     Sort 1 data taking 0 step.  So, T(1) = 0.
The following inequality is a "Recurrence Relation" since the function T(n) is on the both sides.
T(n) <= 1 + 2T(n/2) + O(n)
T(n) <= 2T(n/2) + O(n)
T(n) <= 2T(n/2) + cn,   where c = some fixed constant
Assume n is a power of 2,
         T(n) <= 2T(n/2) + cn
                 <= 2(2T(n/4)+c(n/2)) + cn
                 <= 4T(n/4)+cn + cn
                 <= 4(2T(n/8)+c(n/4))+cn + cn
                 <= 8T(n/8)+cn+cn + cn
                 <= 23T(n/23)+3cn
                 <= 24T(n/24)+4cn
                  ........
                 <= 2lgN T(1)+lgN(cn)
                 <= c*n*lgN
                  =O(n lgN)
O(nlgN) grows more slowly than O(n2),
Therefore merge sort asymptotically beats insertion sort in the worst case.

Space Complexity Analysis:
1. How many arrays seat in recursion stack?
    Depth of recursion stack = the height of the tree
   = O(log2 n)   Click here to see recursion stack.
2. Assume of space for an item on recursion stack

3. Space Complexity Analysis:
    Total Space = O(log2 n) + O(n) = O(n)
    since O(n) is much larger than O(log2 n), O(n) dominates O(log2 n).
Exercise:
Change base:   log2 n = (log10 n) / (log10 2)
 
n n log2 n n2
2 =21 2*1 = 2 2*2 = 4
4=22    
8=23    
16=24    
32=25