Topological Sort

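Topological ordering is one of the most fundamental and widely used concepts in graph theory and graph algorithms. You probably didn’t notice it, but we use topological orderings in many real-life scenarios, such as planning which courses to take given their prerequisites, or installing a software package along with everything it depends on.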

Definition and Theorem

topological order

Now let’s define this concept rigorously. For a directed graph \(G=(V,E)\), an ordering of the nodes is a topological ordering if and only if for every edge \((u,v)\in E\), \(u\) comes before \(v\) in that ordering. More formally, a topological ordering of \(G\) is an ordering of the nodes \(v_1, v_2, \ldots, v_n\) such that for every edge \((v_i, v_j)\in E\) we have \(i<j\). In other words, all edges must point “forward” in a topological order.

For example, a topological ordering on courses provides an order to take courses that respects the prerequisites: when taking course \(v\), all the courses that are required by it (not just \(v\)’s immediate prerequisites, but also its “recursive ancestors”) have already been taken.
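The definition translates directly into a check that every edge points forward. Below is a minimal sketch; the function name `is_topological_order` and the three-course example are illustrative assumptions, and the check assumes `ordering` lists every node exactly once.

```python
def is_topological_order(ordering, edges):
    """Return True if every edge (u, v) has u before v in `ordering`."""
    # position[v] is the index i such that v = v_i in the ordering
    position = {node: i for i, node in enumerate(ordering)}
    return all(position[u] < position[v] for u, v in edges)


# Hypothetical prerequisite chart: a -> b, b -> c, a -> c
edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_topological_order(["a", "b", "c"], edges))  # True
print(is_topological_order(["b", "a", "c"], edges))  # False: edge (a, b) points backward
```

The check takes time linear in the number of nodes plus edges, because each edge is examined once after the positions are indexed.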

One of the most important theorems in graphs is the following:

\(G\) has a topological ordering if and only if \(G\) is a DAG.

Proof.

Given a directed graph, how do we find a topological order (if it has one)? This process is known as “topological sort” (because, like sorting, it returns an ordering), and there are two classical algorithms for this task: BFS style (bottom-up) and DFS style (recursive top-down).

Finding Topological Order: BFS style

Imagine you have just become a college freshman. You’re given the course prerequisite chart, which is (guaranteed to be) a DAG. How would you plan which courses to take? Well, which course can you take first? It must be a course without any prerequisites, in other words, a node with in-degree 0. Take it (say it is course \(v\)). By doing so, you also satisfy one prerequisite of every course \(u\) for which there is an edge \((v,u)\). Basically, delete all outgoing edges from \(v\) (there is actually no need to delete them; just decrease the in-degree of each such \(u\)). Then find another course with in-degree 0, and continue until you’ve taken all courses.

This is a simple variant of the BFS algorithm we’ve seen before. The only change is that the queue contains only nodes whose in-degree is 0: when you process the edge \((v,u)\), you decrease the in-degree of \(u\), and if it drops to 0, you add \(u\) to the queue.

If at any point the queue becomes empty before you have taken all courses, you know there is a cycle, and so no topological ordering exists (there is no course you can take right now, yet there are still courses you need to take).

Here is the pseudocode:

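from collections import deque

def order(V, E):
    adj = {v: [] for v in V}    # adjacency list of outgoing edges
    indeg = {v: 0 for v in V}   # number of unmet prerequisites
    for (v, u) in E:
        adj[v].append(u)
        indeg[u] += 1
    Q = deque(v for v in V if indeg[v] == 0)
    order = []
    while Q:
        v = Q.popleft()
        order.append(v) # add v to the topological ordering
        for u in adj[v]: # only v's outgoing edges
            indeg[u] -= 1
            if indeg[u] == 0: # can take course u now
                Q.append(u)
    if len(order) == len(V):
        return order
    else:
        return None # cyclic, no topological order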

For example, consider the following DAG:

     > B ---> C
    /  ^   /
  /    | /
A ---> D ---> E
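To see the BFS-style process at work on this example, the sketch below records the queue after each course is taken. The edge list is my reading of the diagram (A→B, A→D, D→B, B→C, D→C, D→E) and is an assumption about the drawing; the function name `bfs_order_with_trace` is likewise illustrative.

```python
from collections import deque

# Edges as read off the diagram above (an assumption about the drawing).
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("A", "D"), ("D", "B"), ("B", "C"), ("D", "C"), ("D", "E")]


def bfs_order_with_trace(nodes, edges):
    """Zero-in-degree BFS; also returns (node taken, queue afterwards) per step."""
    adj = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for v, u in edges:
        adj[v].append(u)
        indeg[u] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order, trace = [], []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            indeg[u] -= 1
            if indeg[u] == 0:  # all prerequisites of u are now satisfied
                queue.append(u)
        trace.append((v, list(queue)))
    return order, trace


order, trace = bfs_order_with_trace(NODES, EDGES)
for taken, waiting in trace:
    print(f"take {taken}; queue is now {waiting}")
print(order)  # ['A', 'D', 'B', 'E', 'C']
```

With a first-in-first-out queue and this edge order, the result is A, D, B, E, C. A different queue discipline or edge order could yield another valid ordering.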

Finding Topological Order: DFS style

Another algorithm is top-down. Let’s say you want to install software package \(v\), but it depends on \(u_1\) and \(u_2\). So you need to recursively satisfy \(u_1\) first, and after conquering \(u_1\) (which might involve installing 100+ packages), you conquer \(u_2\) recursively (with memoization, of course, so nothing is installed twice). Now you can finally install \(v\) (as in a post-order traversal), so you append \(v\) to the end of the topological order built so far. In general, a node is added to the topological order at the moment it is time to install it. Note that as in DFS, you need an outer loop over all nodes because (a) you don’t know a priori which node is the “sink” node in the graph, and (b) there could be more than one sink node (e.g., C and E above).

Recursion tree for the above example (it could start at any node):

Note that the DFS returns a different topological order from BFS.

Here is the pseudocode:

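from collections import defaultdict

def order(V, E):
    # Assumes the graph is a DAG; this version does not detect cycles.
    def dfs_order(v):
        if v in visited: # already taken
            return
        visited.add(v)
        for u in prereqs[v]: # satisfy prereqs first
            dfs_order(u)
        order.append(v) # now take v

    visited = set()
    order = []
    prereqs = defaultdict(list) # incoming edges: prereqs[v] lists every u with (u, v) in E
    for (u, v) in E:
        prereqs[v].append(u)
    for v in V:
        if v not in visited:
            dfs_order(v)
    return order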
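The DFS pseudocode assumes the input is a DAG. One common way to also detect cycles is to distinguish nodes that are still being processed from nodes that are finished; the sketch below adds this as an extension that is not part of the pseudocode above. The name `dfs_topological_order` and the state labels are illustrative, and the edge list is again my reading of the earlier diagram.

```python
from collections import defaultdict


def dfs_topological_order(nodes, edges):
    """DFS-style topological sort; returns None if a cycle is found."""
    prereqs = defaultdict(list)  # prereqs[v] lists every u with an edge (u, v)
    for u, v in edges:
        prereqs[v].append(u)

    state = {}  # node -> "visiting" (on the recursion stack) or "done"
    order = []

    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "visiting":
            return False  # v depends on itself through a chain: cycle
        state[v] = "visiting"
        for u in prereqs[v]:
            if not visit(u):
                return False
        state[v] = "done"
        order.append(v)  # all prerequisites of v are already in `order`
        return True

    for v in nodes:
        if not visit(v):
            return None
    return order


# Edges as read off the earlier diagram (an assumption about the drawing).
edges = [("A", "B"), ("A", "D"), ("D", "B"), ("B", "C"), ("D", "C"), ("D", "E")]
print(dfs_topological_order(["A", "B", "C", "D", "E"], edges))  # ['A', 'D', 'B', 'C', 'E']
print(dfs_topological_order(["x", "y"], [("x", "y"), ("y", "x")]))  # None
```

Because the recursion follows chains of prerequisites, very deep dependency chains could exceed Python's default recursion limit; an explicit stack would avoid that.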

Unique vs. Multiple Topological Orders

In general (as in the above example), a DAG is likely to have more than one topological order, just as in college two students in the same major can take the same set of courses in different orders. Let’s study the following questions: