MST (Prim's and Kruskal's)
First let us define the problem statement clearly. Given a weighted undirected graph (think about why it would be difficult for a directed graph), a spanning tree is defined as a tree that contains all the nodes of the graph. A Minimum Spanning Tree (MST) is defined as a spanning tree with the lowest total cost (weight).
Formally, a spanning tree is an acyclic subset of the edges that connects all nodes.
Note that there cannot be any cycles since we need it to be a tree. Moreover, if all the edges have non-negative weights and if there were cycles, we could remove one edge of the cycle (still preserving connectedness) and reduce the weight.
An example of an MST is:
Can an MST be used to find shortest paths? That is, between two nodes, does the shortest path between them lie on the MST?
No! An MST minimizes the total weight but not the shortest distance between two nodes. For example, in the graph above, the shortest path between the top-left and bottom-left nodes is 9 (direct path) but the path in the MST is 4 + 1 + 8 = 13 > 9.
So, MST is used for “minimizing the total cost” problems. In general, it is important to realize when to use shortest path algorithms and when to use MST algorithms.
Is an MST for a graph unique?
No! If edges have the same weight, it is possible that there are multiple MSTs with the same minimum weight.
But, if all the edge weights are distinct, then the MST must be unique (because there is no ambiguity in sorting the edges in Kruskal’s algorithm or choosing the minimum outgoing edge from a vertex in Prim’s).
Hence, if all edge weights in a connected graph G are distinct, then G has a unique minimum spanning tree.
To simplify the analysis and avoid the annoyingly subtle case of non-distinct edge weights, we consider the case where all the weights are distinct. Hence, we can refer to an MST of a graph as being the MST.
Some basic properties of an MST:
- There are no cycles.
- When an MST is cut along any edge, the two resulting spanning trees are MSTs for their respective sets of nodes.
Cycle property: For every cycle, the maximum weight edge is not in the MST.
Proof (cut and paste argument): Assume the heaviest edge of the cycle is in the MST. Now, remove this maximum weight edge. This cuts the MST into 2 trees. Since the removed edge lies on a cycle, some other edge of that cycle crosses the cut between the two trees. Use this edge to rejoin them. The total weight is lower since you replaced a heavier edge with a lighter one, and you still get a spanning tree. Hence, it contradicts the fact that the initial spanning tree was an MST.
In the proof above, we implicitly used the idea that for any cycle, an even number of edges are cut across a cut (for any line drawn, the cycle intersects the line at an even number of points)
It is important to note that the inverse of the cycle property is not true in general, i.e., the statement “For every cycle, the minimum weight edge is always in the MST” is false! In fact, a minimum weight edge of one cycle can be the maximum weight edge of another cycle and hence, by the cycle property itself, it cannot be in the MST. Hence, for every cycle, the minimum weight edge may or may not be in the MST.
Cut property: Let us define a cut of a graph as a partition of the vertices into 2 disjoint subsets. An edge is said to cross a cut if it has one vertex in each of the two disjoint subsets. Then, for every partition of the nodes, the minimum weight edge across the cut is in the MST.
Proof is similar to the cut and paste argument made as in the cycle property. Suppose not. Then another edge that crosses the cut must be in the MST (which has a weight higher than our minimum weight edge that crosses the cut). We can “cut and paste” this edge to get an MST of lower weight. So, our assumption was incorrect. Hence, proved.
A direct consequence of the cut property is that for every vertex, the minimum outgoing edge is always part of the MST. This follows necessarily because the cut property holds for any partition. In particular, we can create partitions - each of which divides the graph into 2 sets: one containing exactly one node, and the other containing the rest of the nodes.
The inverse of the cut property is not true in general, i.e., the statement “The maximum outgoing edge is never part of the MST” is false. It is easy to come up with examples to prove this (e.g. in the graph above, look at the rightmost node: both its edges are in the MST; in particular, the maximum outgoing edge is in the MST). Hence, for every node, the maximum outgoing edge may or may not be part of the MST.
Can an MST be used to find the smallest maximum edge path between two nodes? That is, if we define the weight of a path to be the maximum weight of all the edges in the path, does the shortest path between two nodes lie on the MST?
Yes! At every step, we are trying to put the minimum weight edge into the MST. So, naturally, all the edges in the MST have low weight. This leads to the minimisation of the maximum edge weight on the path between any two nodes. (Note that an MST minimizes the maximum edge weight present in the path, but NOT the total sum of the weights along the path.)
Every MST algorithm relies fundamentally on the following two rules:
Red rule: If C is a cycle with no red arcs, then colour the maximum weight edge in C red.
Blue rule: If D is a cut with no blue arcs across the cut, then colour the minimum weight edge in D blue.
In the end, all the blue edges will form an MST. The above generic algorithm is just a direct consequence of the cycle property and the cut property. So, we can greedily build the MST by repeatedly applying the red rule or the blue rule to an arbitrary edge.
Prim’s algorithm is based on the cut property and is quite simple to understand:
1. Pick an arbitrary node.
2. Add it to a set sptSet (which will store all the nodes already connected by the spanning tree).
3. Look (read: relax) at all the outgoing edges of this node (essentially, observe the edges that are across the cut) and update the distances of the other nodes in the priority queue (setting the distance of each node to be the minimum of its current distance and the weight of the edge connecting this node and that node).
4. Pick the minimum weight edge connecting a node in sptSet to one outside the sptSet. Add this outside node to the sptSet and add the connecting edge to a set that stores the edges of the MST.
5. Look at all the outgoing edges of the newly added node and update the distances of the other nodes in the priority queue.
6. Pick the minimum node in the priority queue (the one with the least distance between it and any node already in the sptSet).

Repeat steps 2-6 until the priority queue is empty (all nodes are in the sptSet).
Note: The priority queue stores the nodes outside the sptSet prioritised by their minimum distance to any node inside the sptSet.
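The steps above can be sketched in Python. This is a minimal sketch, not the canonical implementation: the adjacency-list format (node mapped to a list of (neighbour, weight) pairs) is an assumption, and since Python's heapq has no decreaseKey, the sketch re-inserts nodes and lazily skips stale entries instead.

```python
import heapq

def prims_mst(graph, start):
    """Return (total_weight, mst_edges) for a connected undirected graph.

    `graph` maps each node to a list of (neighbour, weight) pairs.
    Stale heap entries are skipped in place of a decreaseKey operation.
    """
    spt_set = set()                # nodes already connected by the tree
    mst_edges = []
    total = 0
    # entries are (distance-to-tree, node, parent); start is its own parent
    pq = [(0, start, start)]
    while pq:
        weight, node, parent = heapq.heappop(pq)
        if node in spt_set:        # stale entry: a lighter edge was found later
            continue
        spt_set.add(node)
        if parent != node:         # every node except the start adds one edge
            mst_edges.append((parent, node))
            total += weight
        for neighbour, w in graph[node]:
            if neighbour not in spt_set:   # edge across the current cut
                heapq.heappush(pq, (w, neighbour, node))
    return total, mst_edges
```

With this lazy-deletion workaround, the heap may hold up to E entries rather than V, giving O(E log E) = O(E log V) overall, matching the analysis below.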
A key invariant of Prim’s algorithm is that at every step, the edges chosen so far form an MST for the set of nodes connected by the tree. That is, it greedily builds the MST. This is in contrast to Kruskal’s algorithm, in which the edges at an arbitrary step need not form a tree (since they may be disconnected). So, if you stop running Prim’s algorithm midway, you still get an MST for a subset of the original graph; this is not true for Kruskal’s, where you get multiple MSTs for multiple subsets of the graph.
Another observation is that every node in the graph is either in the priority queue or in the sptSet. That is, the two form a partition of the set of nodes. Initially, other than the starting node, the priorities of all other nodes are Integer.MAX_VALUE since we don’t know the distances.
The decreaseKey operation takes O(log n) when there are n elements in the PQ implemented using an AVL tree. In this case, the size of the PQ is at most V since it stores the vertices. We look at each vertex once (total cost O(V)) when we delete it from the PQ (total cost O(V log V)). We look at each edge once and perform at most one decreaseKey operation per edge (total cost O(E log V)). Adding all the costs together and assuming E ≥ V - 1 (the graph is connected), the running time of Prim’s is O(E log V).
Notice how we analysed the algorithm above by assigning each cost of an operation to an edge or a vertex. So, rather than thinking in terms of each node and each edge, we think from a higher level and view the total cost of an operation as being dependent on the number of edges or vertices. This makes it much easier to analyse.
It is worthwhile to compare Prim’s algorithm to Dijkstra’s since they are quite similar (mostly because Dijkstra also came up with this algorithm for making an MST)
| Prim’s | Dijkstra’s |
| --- | --- |
| Maintain a set of visited nodes (sptSet) | Maintain a set of visited nodes (those not in the priority queue) |
| Greedily grow the set by adding a node connected via the lightest edge | Greedily grow the set by adding the neighbouring node that is closest to the source |
| Use a PQ to order nodes by edge weight | Use a PQ to order nodes by distance from the source |
Greedily, in this case, means that at every step of the algorithm, we make the best decision possible with the current information. Greedy algorithms can solve optimisation problems; both shortest path and MST are optimisation (minimisation, in particular) problems.
Kruskal’s is also a greedy algorithm - it adds the lowest weight edges to the MST first. Skip the edges that connect 2 nodes already connected by a blue edge, since otherwise they would form a cycle (and such an edge would, in fact, be the maximum weight edge in that cycle, since you are looking at edges in increasing order of weight).
1. Sort edges by weight from smallest to biggest.
2. Consider edges in ascending order:
   - If both endpoints are already in the same tree, skip this edge (colour it red) (perform a Find operation).
   - Otherwise, join the two nodes with a blue edge (colour it blue) (perform a Union operation).
We can use a Union-Find data structure to keep track of which nodes are in the same blue tree.
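A minimal sketch of Kruskal’s with such a union-find (path compression plus union by rank). The edge-list format (weight, u, v) and the function name are assumptions for illustration:

```python
def kruskals_mst(num_nodes, edges):
    """Return (total_weight, mst_edges).

    `edges` is a list of (weight, u, v) tuples with nodes 0..num_nodes-1.
    """
    parent = list(range(num_nodes))
    rank = [0] * num_nodes

    def find(x):                        # find the root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):                    # union by rank; False if already joined
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1
        return True

    total, mst_edges = 0, []
    for weight, u, v in sorted(edges):  # consider edges in ascending order
        if union(u, v):                 # blue edge: joins two different trees
            total += weight
            mst_edges.append((u, v))
        # else: red edge - both endpoints already in the same blue tree
    return total, mst_edges
```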
Say, we are using the most optimized Union-Find data structure, which takes O(α(V)) for each union/find operation. We perform at most E such operations, resulting in a total cost of O(E α(V)). BUT, the most expensive part of Kruskal’s algorithm is actually to sort the edges! This takes O(E log E) and so, the running time of Kruskal’s is dominated by this factor and hence, is O(E log E).
Note that E is O(V²) (in case of a clique - worst case) and so, log E = O(log V²) = O(2 log V) = O(log V).
Hence, Prim’s and Kruskal’s have the same asymptotic running time.
What if all the edges have the same weight?
Well, then any spanning tree is a minimum spanning tree. So, you can just use BFS or DFS and find an MST in O(V + E) time.
In particular, if all the edge weights are w, then the cost of an MST is w(V - 1) (because any tree with V nodes has V - 1 edges).
How can we improve Kruskal’s algorithm if we know that all the edges have weights from {1, …, k}, where k is not too big (say, k = 10)?
The most expensive part of Kruskal’s algorithm is sorting the edges. In this case, we can use counting sort (with an array of size k) to sort the edges in O(E + k) time. Hence, the running time of Kruskal’s is O(E α(V)) because for each edge, performing union takes O(α(V)) and performing find takes O(α(V)). Remember that α is the inverse of Ackermann’s function (it grows even more slowly than the iterated logarithm) but it is not a constant. It is just an extremely slowly growing function.
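The counting (bucket) sort step can be sketched as follows; a minimal sketch, assuming integer weights in 1..max_weight and the (weight, u, v) edge format:

```python
def sort_edges_by_small_weight(edges, max_weight):
    """Bucket/counting sort of (weight, u, v) edges in O(E + max_weight).

    Assumes every weight is an integer in 1..max_weight.
    """
    buckets = [[] for _ in range(max_weight + 1)]
    for edge in edges:
        buckets[edge[0]].append(edge)   # drop each edge into its weight's bucket
    ordered = []
    for bucket in buckets:              # concatenate buckets in weight order
        ordered.extend(bucket)
    return ordered
```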
A cool feature of such a graph would be that we can also optimize Prim’s algorithm for this. We can use an array of size 10 as a priority queue of linked lists, where slot i holds a linked list of the nodes whose lightest edge to the sptSet has weight i. Then, decreaseKey would simply move a node to a new linked list. In this case, the running time of Prim’s would be O(V + E) because:
- insert and deleteMin are performed O(V) times (cost: O(1) each, since deleteMin just looks at the minimum bucket and takes any node from it - we can store a pointer that remembers the minimum bucket if you don’t want to traverse the array each time).
- decreaseKey is performed at most O(E) times and takes O(1), because we can look up the node (e.g. in a hash table), delete it from its current linked list and move it to the new slot’s linked list. We don’t need to traverse the old linked list to find the node if we simply use a hash table. So, the total cost of decreaseKey is O(E).
Hence, the total cost is O(V + E).
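The bucket-based priority queue can be sketched as follows. It is a sketch under stated assumptions: the graph is connected, weights are integers in 1..max_weight, Python sets stand in for the linked lists plus hash-table lookup, and the graph format (node mapped to (neighbour, weight) pairs) is assumed:

```python
def prims_small_weights(graph, start, max_weight):
    """Prim's with an array of buckets for integer edge weights 1..max_weight.

    `graph` maps node -> list of (neighbour, weight); must be connected.
    """
    INF = max_weight + 1
    dist = {v: INF for v in graph}        # lightest known edge into the tree
    dist[start] = 0
    buckets = [set() for _ in range(INF + 1)]  # bucket i: nodes with dist i
    buckets[0].add(start)
    in_tree = set()
    total = 0
    lo = 0                                # pointer to the minimum non-empty bucket
    for _ in range(len(graph)):
        while not buckets[lo]:
            lo += 1                       # advance to the next non-empty bucket
        node = buckets[lo].pop()          # deleteMin: any node in the min bucket
        in_tree.add(node)
        total += dist[node]
        for neighbour, w in graph[node]:
            if neighbour not in in_tree and w < dist[neighbour]:
                buckets[dist[neighbour]].discard(neighbour)  # decreaseKey:
                buckets[w].add(neighbour)                    # move to new bucket
                dist[neighbour] = w
                lo = min(lo, w)           # the minimum bucket may have dropped
    return total
```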
A natural question to ask is: if we can use the bucket-based variant to find an MST in O(V + E), why does this not work for Dijkstra? The answer is subtle: even though the maximum weight of any edge is 10, the total path length can be up to 10(V - 1), since we are concerned with the total distance from the source and not just the weight of each edge. We know that the maximum diameter of a graph is V - 1 (think of a line graph) and if each edge has weight 10, the total path length is 10(V - 1). It is (generally) not feasible to have an array of such a large size, especially since it is a function of the maximum edge weight and not a function of the number of edges.
MST is more difficult to define for a directed graph - does it require that every node be connected to every other node through some set of edges in the MST? Or is one-way connectedness between a pair of nodes sufficient?
We define a rooted spanning tree as a tree in which every node is reachable on a path from the root (and there are no cycles obviously).
This is a much harder problem to solve now since:
- the cut property does not hold
- the cycle property does not hold
- the generic MST algorithm does not work
As a special case, however, consider a directed acyclic graph with one root (we define a root as a node with no incoming edges).
For every node except the root, add the minimum weight incoming edge. Then, obviously, there are no cycles since the graph itself is acyclic. Exactly one edge is chosen for every vertex except the root, so there are V - 1 edges. Hence, it is a tree. Moreover, it is an MST! This should seem obvious once you realise that you have chosen the minimum weight incoming edge for each node: every node has to have at least one incoming edge in the spanning tree, so this is the minimum spanning tree. Hence, we can find an MST for a directed acyclic graph with one root in O(V + E) time (for every vertex, look at all its incoming edges; each edge is considered once, each vertex is considered once).
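For this special case (a DAG with a single root), picking the minimum weight incoming edge for every non-root node is all that is needed. A minimal sketch; the (weight, u, v) format for a directed edge u → v and the function name are assumptions:

```python
def dag_min_spanning_arborescence(edges, root):
    """Minimum spanning tree of a rooted DAG in O(V + E).

    For every node except the root, keep the minimum weight incoming edge.
    `edges` is a list of (weight, u, v) for a directed edge u -> v.
    """
    best_in = {}                     # node -> (weight, u, v) of lightest incoming edge
    for weight, u, v in edges:       # each edge is considered exactly once
        if v != root and (v not in best_in or weight < best_in[v][0]):
            best_in[v] = (weight, u, v)
    chosen = list(best_in.values())  # exactly V - 1 edges, one per non-root node
    total = sum(w for w, _, _ in chosen)
    return total, chosen
```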
Define a MaxST to be a spanning tree of maximum weight. How to find a MaxST?
Negate the edges. Find MinST. This is your MaxST.
Notice that having negative edges does not really matter. Our MST algorithms work perfectly fine even for negative edges - this is because only the relative weights of the edges matter. So, if you have a graph with negative edges: you can simply run Prim’s or Kruskal’s, OR you can add a positive constant to each edge to make every edge weight positive.
Or, another neat way to find a MaxST would be to run Kruskal’s in reverse. That is, sort the edges in descending order so that you add the heaviest weight edges first.
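The reverse-sorted Kruskal’s idea can be sketched as follows (a minimal, self-contained sketch with a plain union-find; the edge format is an assumption). It is equivalent to negating all the weights and finding a minimum spanning tree:

```python
def max_spanning_tree(num_nodes, edges):
    """Maximum spanning tree weight via Kruskal's with edges sorted descending.

    `edges` is a list of (weight, u, v) with nodes 0..num_nodes-1.
    """
    parent = list(range(num_nodes))

    def find(x):                      # find the root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for weight, u, v in sorted(edges, reverse=True):  # heaviest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # joins two different trees: keep it
            parent[ru] = rv
            total += weight
    return total
```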
What happens if you add a constant to the weight of every edge? Does it change the MST?
No! In fact, the MST only depends on the relative weights of different edges. By adding a constant (or multiplying by a positive constant) c, you are not changing the order of the edges (when sorted by weight). It might be easier to think of Kruskal’s algorithm to explain this - when you sort the edges, you only care about the relative weights. Adding a constant does not change this sorted order. Hence, Kruskal’s will inspect the edges in the same order, resulting in the same MST.
This is different in case of shortest paths since there we consider the weight of a path to be the sum of the weights of all the edges. So, adding a constant to each edge weight will result in unequal weights being added to paths of differing lengths. In particular, shorter paths are effectively prioritised, since the amount of weight added to a path is proportional to the number of edges on it (a path with k edges gains an extra weight of kc).
Other MST algorithms do exist (e.g. Boruvka’s ) and are slightly optimized for modern requirements and parallelizability.
The currently best known algorithm is that of Chazelle (2000), an MST algorithm that runs in O(E α(E, V)) time, where α is the classical functional inverse of Ackermann’s function.
Despite its relatively obscure origin, early Western algorithms researchers were aware of Boruvka’s algorithm, but dismissed it as being “too complicated”. As a result, despite its simplicity and efficiency, most algorithms and data structures textbooks unfortunately do not even mention Boruvka’s algorithm. This omission is a serious mistake; Boruvka’s algorithm has several distinct advantages over other classical MST algorithms.
Boruvka’s algorithm often runs faster than its O(E log V) worst-case running time. The number of components in F can drop by significantly more than a factor of 2 in a single iteration, reducing the number of iterations below the worst-case bound.
A slight reformulation of Boruvka’s algorithm (actually closer to Boruvka’s original presentation) runs in O(E) time for a broad class of interesting graphs, including graphs that can be drawn in the plane without edge crossings. In contrast, the time analysis for the other two algorithms applies to all graphs.
Boruvka’s algorithm allows for significant parallelism; in each iteration, each component of F (subgraph of MST) can be handled in a separate independent thread. This implicit parallelism allows for even faster performance on multicore or distributed systems. In contrast, the other two classical MST algorithms (i.e., Prim’s and Kruskal’s) are intrinsically serial.
But the converse (of the earlier observation that cutting an MST along an edge yields two MSTs) is not true. That is, if two MSTs of disjoint sets of nodes are joined by the smallest edge between them, the resulting spanning tree need not be an MST. Think about what would happen if you decided to split the set of nodes poorly to begin with.