In terms of time complexity, we have seen two types of sorting algorithms: slower ones that run in \(O(n^2)\) time, and faster ones that run in \(O(n\log n)\) time.
You might wonder why all fast sorting algorithms run in \(O(n\log n)\) time.
Now let’s answer this question. The short answer is: that’s not a coincidence, and \(O(n\log n)\) time is indeed the fastest we can get for comparison-based sorting (non-comparison sorts such as counting sort and radix sort can beat this bound under extra assumptions on the input, but they are beyond the scope of this course). This is because you need at least \(\sim n \log n\) comparisons to figure out the exact ordering of \(n\) numbers.
To see why this is the case, consider the number of all possible orderings for \(n\) distinct numbers, which is \(n!\). A priori you don’t know which ordering is the sorted one, and the job of a sorting algorithm is to use comparisons to narrow the number of possible orderings down to 1. Each comparison (such as \(a_1 ?\ a_2\)) can at best halve the number of remaining possibilities; the first one splits them exactly in half (there are exactly \(\frac{n!}{2}\) orderings where \(a_1 < a_2\) and another \(\frac{n!}{2}\) where \(a_1 > a_2\)). Let’s say \(a_1 < a_2\); then we compare \(a_1 ?\ a_3\), whose result would further reduce the remaining \(\frac{n!}{2}\) possible orderings by (at most) half.
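Here is a quick sketch (my own check in Python, not part of the notes) of this counting argument for \(n = 4\): enumerate all \(4! = 24\) orderings and filter them by comparison outcomes.

    from itertools import permutations
    from math import factorial

    n = 4
    orderings = list(permutations(range(n)))    # all n! candidate orderings
    assert len(orderings) == factorial(n)       # 24

    def before(p, i, j):
        # True if element i comes before element j in the candidate ordering p
        return p.index(i) < p.index(j)

    # First comparison a_1 ? a_2: an exact 50/50 split.
    branch = [p for p in orderings if before(p, 0, 1)]
    print(len(branch), len(orderings) - len(branch))    # 12 12

    # Second comparison a_1 ? a_3 on the a_1 < a_2 branch: an 8/4 split,
    # so the worse branch still keeps at least half of the 12 candidates.
    branch2 = [p for p in branch if before(p, 0, 2)]
    print(len(branch2), len(branch) - len(branch2))     # 8 4

Note the second split is 8 vs. 4 rather than 6 vs. 6: exact halving is the best case, which is why the argument gives a lower bound.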
So you can draw a decision tree like this (like recursion trees we have seen so far):
                 n! possible orderings
                      a_1 ? a_2
                   <  /       \  >
                     /         \
   n!/2 possible orderings   n!/2 possible orderings
          a_1 ? a_3                 a_2 ? a_3
             ...                       ...
How many levels (or comparisons) do you need to reach a unique ordering? Since each level at best halves the number of remaining orderings, the tree must be deep enough that \(2^h \geq n!\), so its height (taking logarithms base 2) is
\[h = \log (n!)\]
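To get a feel for the numbers, here is a quick evaluation (my own sketch, not from the notes) of this lower bound for a few sizes:

    from math import factorial, log2

    # Any comparison-based sort needs at least log2(n!) comparisons
    # (rounded up) in the worst case.
    for n in (4, 10, 100, 1000):
        h = log2(factorial(n))
        print(f"n = {n:>4}   log2(n!) = {h:8.1f}   n*log2(n) = {n * log2(n):8.1f}")

So for \(n = 10\), any comparison sort needs at least 22 comparisons in the worst case. Note how \(\log(n!)\) tracks \(n \log n\) from below.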
Now let’s simplify this \(h\). It’s obvious that \((\frac{n}{2})^\frac{n}{2} < n! < n^n\), which can be shown geometrically:
|****--->| n
|****--> | n-1
|****->  | ...
|****>   | n/2
|--->    | ...
|-->     | ...
|->      | 2
|>       | 1
+--------+

- the triangle (the --> bars, one of length \(i\) for each \(i\) from \(1\) to \(n\)) represents \(n!\), the product of all the bar lengths;
- the small square on the top-left corner (****) is \((\frac{n}{2})^\frac{n}{2}\): the top \(\frac{n}{2}\) bars each have length at least \(\frac{n}{2}\), which is why \(n!\) exceeds it;
- the big square that contains the triangle is \(n^n\): all \(n\) bars have length at most \(n\).
So \[(\frac{n}{2})^\frac{n}{2} < n! < n^n\]
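A quick numerical sanity check (my own sketch) of this sandwich inequality:

    from math import factorial

    # Verify (n/2)^(n/2) < n! < n^n for a few even n.
    for n in (4, 8, 16):
        assert (n // 2) ** (n // 2) < factorial(n) < n ** n
        print(f"n = {n}: {(n // 2) ** (n // 2)} < {factorial(n)} < {n ** n}")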
Taking the logarithm of the above, we get:
\[ \log (\frac{n}{2})^\frac{n}{2} < h = \log(n!) < \log n^n\]
\[\frac{n}{2} \log \frac{n}{2} < h = \log(n!) < n \log n\]
Both sides grow on the order of \(n \log n\) (the left side expands to \(\frac{n}{2} \log n - \frac{n}{2}\), which is still \(\Theta(n \log n)\)), so \(h\) grows as \(n \log n\) asymptotically, i.e., you need on the order of \(n \log n\) comparisons to figure out a unique ordering by comparison.
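To see the bound at work, here is a sketch (my own instrumentation using only the standard library; the Counted wrapper is hypothetical, not course code) that counts the comparisons Python’s built-in sort, a comparison-based algorithm, actually performs:

    import random
    from math import factorial, log2

    comparisons = 0

    class Counted:
        def __init__(self, v):
            self.v = v
        def __lt__(self, other):
            # CPython's list.sort()/sorted() only use <, so this counts
            # every comparison the sort makes
            global comparisons
            comparisons += 1
            return self.v < other.v

    n = 1000
    data = [Counted(v) for v in random.sample(range(10 * n), n)]
    sorted(data)
    print(f"comparisons made:      {comparisons}")
    print(f"lower bound log2(n!) = {log2(factorial(n)):.0f}")   # ~8529

On random input the count typically lands a little above \(\log_2(n!) \approx 8529\): a comparison sort can approach the bound, but never beat it.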
In complexity analysis, we say the lower bound of comparison-based sorting is \(\Omega(n \log n)\): Big-\(O\) is for upper bounds, Big-\(\Omega\) for lower bounds, and Big-\(\Theta\) for tight bounds, where \(f(n) = \Theta(g(n))\) if and only if \(f(n) = O(g(n))\) and \(f(n) = \Omega(g(n))\). See Wikipedia for details.