SSSP (Bellman-Ford and Dijkstra)
SSSP is a common acronym for Single-Source Shortest Path. This means that we are interested in finding the shortest path from one node in the graph (called the source) to all other nodes in the graph. Many algorithms can solve this problem efficiently, but we will discuss only 2 of them.
Firstly, it is important to remember that BFS is able to find the SSSP. Why, then, do we need another algorithm? Because BFS only works for unweighted graphs: it finds the minimum number of hops, not the minimum distance. Moreover, BFS does not explore every path in the graph (although neither do these SSSP algorithms - in fact, any algorithm that tries to explore all possible paths between 2 nodes must take exponential time).
So, for weighted graphs, each edge has a weight associated with it. You can think of the weight as the cost of travelling from one node to another via that edge. In most cases, we are interested in finding the path with the smallest sum of weights. You could define shortest paths in other ways too (say, the lowest product of all weights, the number of edges whose weight is greater than 5, the smallest maximum-weight edge along the entire path, etc.).
We store the edge weights in the adjacency list too. Formally, we define a weight function w: E → R that assigns a real number to each edge.
We denote the shortest distance from u to v as δ(u, v). Our aim is to find δ(s, v) for all v ∈ V for a given source s.
The key idea for most (if not all!) shortest path algorithms is the triangle inequality: δ(S, C) ≤ δ(S, A) + δ(A, C).
This means that the shortest distance from S to C cannot exceed the shortest distance from S to A plus that from A to C. Because if it did, we could just go from S to C via A and we would get a shorter distance. In particular, the inequality holds with equality if A lies on a shortest path between S and C.
This is not the triangle inequality of Euclidean geometry: in a graph, the sum of 2 "sides" of a triangle can be less than the third side, because the edge weights are arbitrary. That is not a violation of the (graph) triangle inequality.
As we have to find the minimum distance, this is an optimisation problem. Optimisation problems can be solved either by brute force (Bellman-Ford, abbreviated BF - also brute force lol) or greedily (Dijkstra).
BF is essentially a brute force algorithm.
We start by maintaining an estimate for each distance. Our aim is that by the end of the algorithm, all our estimates will precisely equal the actual shortest distances. (From now, we drop the prefix “shortest” and let distance refer to the shortest distance)
Our invariant is: for every node, the estimate is always greater than or equal to the distance.
In the beginning, since we don’t know any of the distances, we mark all estimates to be infinity and the estimate of the source to be 0 (the distance from the source to itself is 0). Then, we start to “relax” the edges. Relaxation is a very important step in all shortest-path algorithms.
In Bellman-Ford, we do not care about the order of relaxations.
Relaxing an edge means updating the estimate of a node if we have found a shorter path to that node (using the triangle inequality).
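The relax step can be sketched in Python as follows (a minimal sketch; `dist` and `parent` are assumed to be dictionaries mapping each node to its current estimate and predecessor):

```python
import math

def relax(u, v, weight, dist, parent):
    """Relax edge (u, v): if going through u gives a shorter path to v,
    update v's estimate (this is just the triangle inequality)."""
    if dist[u] + weight < dist[v]:
        dist[v] = dist[u] + weight
        parent[v] = u  # remember the predecessor so the path can be recovered
        return True    # the estimate was updated
    return False       # no improvement
```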
All the relax step does is this: if you have found a shorter path from the source to v via u, then update the estimate of v so that you now follow this path.
If you want to recover the shortest path too, you can keep track of the parent each time you update the distance.
But the key question is: is it enough to relax each edge once to find SSSP?
No! Because you are performing the relaxations in an arbitrary order, it is possible that some relaxations have no effect because other relaxations should have been performed first (i.e., the update needs to propagate along the path). In short, the number of times you need to relax depends on the order of the edges.
So, how many times do we need to relax each edge in the worst case?
We can terminate early if we are sure that relaxing all the edges again would not update any more distances. In particular, we can terminate early if an entire iteration of relaxing all edges has no effect on the distances.
Running time: O(VE)
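Putting the relax step, the iteration bound, and the early termination together, a compact Bellman-Ford sketch (assuming the graph is given as a list of (u, v, w) edge triples over vertices 0..n-1):

```python
import math

def bellman_ford(n, edges, source):
    """Bellman-Ford: relax every edge up to n-1 times; O(VE) overall.
    n: number of vertices (0..n-1); edges: list of (u, v, w) triples."""
    dist = [math.inf] * n
    dist[source] = 0
    for _ in range(n - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:   # relax edge (u, v)
                dist[v] = dist[u] + w
                updated = True
        if not updated:                 # early termination: estimates converged
            break
    return dist
```

Note that the edges are scanned in whatever order the list happens to be in; correctness does not depend on that order, only the number of iterations does.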
But why does Bellman Ford work? Why is it able to find the shortest paths?
Firstly, notice that the shortest path from any node to any other node cannot contain a loop/cycle (we deal with negative cycles soon). This is because we could eliminate the cycle and obtain a path that is no longer (and strictly shorter, if the cycle has positive weight).
Does every node at 1 hop from the source have the correct shortest path after 1 iteration? No! Only those nodes whose shortest path from the source consists of a single edge are guaranteed to have their correct estimates.
This is an easy example to show that after 1 iteration, a node 1 hop away from the source need not have its correct estimate.
Does this algorithm also work for graphs with negative weights?
Yes. At no step along the way are we assuming that weights have to be positive.
BUT, shortest paths are not well-defined in graphs with negative-weight cycles, because it is then possible to keep going around the negative cycle to lower the distance, and the algorithm would never converge. (In fact, it is meaningless to talk about shortest paths in such graphs.)
If we don’t have any cycles in a graph, how would you try to find the longest path distances from a source node to every other node?
This is arguably one of the most commonly implemented SSSP algorithms because it runs faster than BF (Bellman-Ford) and is pretty easy to code out.
Dijkstra’s algorithm is an example of a greedy algorithm. In a greedy method, a problem is solved in stages by making a sequence of decisions, considering one input at a time, to arrive at an optimal solution. There are predefined procedures that we use for getting an optimal solution (in this case, the SSSP).
Greedy algorithm: Make the best decision at every step and you will get the optimal solution in the end.
Is there always a right order to relax the edges (assuming non-negative weights) such that if we follow this correct order, we only need to relax each edge once?
In fact, we don’t even need to go in BFS order. We just need to ensure that before relaxing all the outgoing edges of a node, all the incoming edges of that node have been relaxed. So, the order in which we relax the edges is exactly the order in which the edges occur in the actual shortest path from the source to a node. (This works even for negative weight edges.) But this only works for DAGs: in a general graph there are cyclic dependencies, so you may not be able to relax any edge at all if you strictly follow this rule. Dijkstra does not use this property. Dijkstra allows you to relax a node even when some of its incoming edges have not been relaxed, because you are sure that this node’s estimate is already the least it can be: you relaxed all the incoming edges from nodes with lower estimates when you relaxed those nodes, and the remaining incoming edges come from nodes with higher estimates, which cannot lower this node’s estimate since edge weights are non-negative.
Now that we know a correct order to relax the edges exists, we can come up with an algorithm (Dijkstra’s) to exploit this property and only relax each edge once.
(Note that in the context of Dijkstra, when we say “relax a node”, we mean “relax all the outgoing edges from the node”)
Dijkstra’s algorithm is very similar to Prim’s algorithm for minimum spanning trees. Like Prim’s MST, we generate an SPT (shortest path tree) with a given source as the root. We maintain two sets: one contains vertices included in the shortest-path tree, and the other contains vertices not yet included. At every step of the algorithm, we pick the vertex from the second set (not yet included) that has the minimum distance from the source. (Observe the difference between Dijkstra’s and Prim’s: Dijkstra stores a node’s distance estimate to the source, while Prim stores a node’s distance estimate to the nearest node already in the spanning tree set.)
Create a set sptSet (shortest path tree set) that keeps track of vertices included in the shortest-path tree, i.e., whose minimum distance from the source is calculated and finalized. Initially, this set is empty.
Assign a distance value to all vertices in the input graph. Initialize all distance values as INFINITE. Assign distance value as 0 for the source vertex so that it is picked first.
While sptSet doesn’t include all vertices
Invariant: Once a vertex has been added to the sptSet, its distance never changes again (and in particular never reduces). That is, the sptSet contains the set of vertices whose shortest path distances have already been found. This is the reason why we only need to relax each edge once (no relaxation can improve an estimate once you are sure you have found the shortest path).
In other words, we are building the shortest paths greedily.
HOWEVER, the above invariant is true only if ALL the edges have non-negative weights. An easy example of why it wouldn’t work for a graph with negative weights is given below:
Consider the following case:
This is an important distinction between BF and Dijkstra - BF can handle negative edges as long as there is no negative cycle (in fact, it can even detect negative cycles), but Dijkstra cannot.
At each step, we are sure that our estimate is greater than or equal to the correct distance. We are assuming non-negative weights for Dijkstra’s algorithm.
In other words, Dijkstra ensures that the order in which the edges are relaxed is precisely the order of the edges in the shortest path from a source to a node. It greedily builds up the shortest path tree.
This relies heavily on the fact that extending a path does not make it shorter, which is true whenever our edges have non-negative weights.
Dijkstra optimises the process of relaxation by relaxing edges in the “correct” order. Instead of arbitrarily picking edges to be relaxed, we maintain a data structure that tells us exactly which edge to relax. Using this “correct ordering”, we only need to relax each edge once!
To get the path information using Dijkstra, we can create a parent array, update the parent array when distance is updated (i.e., when relaxing an edge leads to updation of distance of destination vertex) and use it to show the shortest path from source to different vertices.
Moreover, if we are interested only in the shortest distance from the source to a single target, we can stop the loop when the picked minimum distance vertex is equal to the target (recall our invariant).
To be able to get the minimum key (here, we let the priority be our current estimate of the node’s distance from the source), the data structure that stores our vertices must support the following operations (our PQ stores the nodes, NOT EDGES! Storing edges would be pretty pointless since it does not give us any information about the distances to nodes from the source):
isEmpty() - checks if there are any more elements in the data structure
contains(key k) - checks if the key k exists in the data structure
decreaseKey(key k, priority p) - reduces the priority of key k to be equal to p (so that we can decrease the priority when the estimate is reduced)
deleteMin() - deletes and returns the key with the minimum priority (in this case, because we are interested in the node whose estimate is the least)
insert(key k, priority p) - inserts a key k with priority p
Notice that a priority queue is an abstract data type - bunch of operations to be supported - and can be implemented in many ways.
We will try to break down the steps in Dijkstra to come up with the overall time complexity:
All the cost of the algorithm essentially depends on how expensive the relaxation process is.
Cannot be used for graphs with negative weights
“Computer Science is no more about computers than astronomy is about telescope”
“There should be no such thing as boring mathematics.”
“Elegance is not a dispensable luxury but a factor that decides between success and failure.”
“Simplicity is a prerequisite for reliability”
BFS, DFS, and Dijkstra all basically use the same algorithm but different data structures to store the order of nodes to be explored. All three follow this general outline:
Maintain a set of explored vertices.
Add vertices to the explored set by following edges that go from a vertex in the explored set to a vertex outside the explored set.
BFS: take an edge from the vertex that was discovered least recently - use a queue - gives SSSP for unweighted graphs
DFS: take an edge from the vertex that was discovered most recently - use a stack - does not give shortest paths
Dijkstra: take an edge from the vertex that is closest to the source - use a priority queue - gives SSSP for graphs with non-negative edges
Let us suppose that we have a DAG and we want to find the shortest path. Obviously we can still use BF (and Dijkstra too if the edges are non-negative). But can we do better since we are sure that there are no cycles? (generally, if we have more constraints - a specific kind of problem - it is much easier to solve than a general problem. For example, longest path problems are NP-hard on general graphs but are super easy on DAGs, as you shall soon see)
Again, we aim to relax each edge exactly once. For this, we need to find the “right order” of relaxations.
Observe that BFS would not work (just as before)
To be sure that a node has a correct estimate, we only process a node after all its incoming edges have been relaxed (this should bring flashbacks of Kahn’s algorithm to mind)
We have already learnt how to find such an ordering - Topological sort!!!
If we arrange the nodes from left to right such that all the edges point towards the right, then we can simply start from the source and relax all the nodes from left to right (relaxing a node is equivalent to relaxing all its outgoing edges). Observe that if a node appears to the left of the source in the topological sort, there is no path from the source to that node (since all edges point right).
Does this also work if there are negative weights?
Yes! We are not assuming that the weights are positive at any step.
How can we find the longest path in a DAG?
How about finding the longest path in a general graph (which can have cycles)?
Finding a longest path in a general graph (with possibly cycles) is NP-hard - if you could find the longest simple path, then you could decide if there is a path that visits every vertex. Any polynomial time algorithm for longest path thus implies a polynomial time algorithm for Hamiltonian Path. Hence, by reduction from Hamiltonian Path (which we know to be NP-hard), we can conclude that finding the longest simple path is also NP-hard.
Q. Why is longest path NP-hard but shortest path easily solvable? Why does it not work to negate all edges and run a SSSP algorithm to find longest path?
Ans: There is a very subtle issue here.
We have spent quite some time discussing the shortest path problem and how certain algorithms do not work in the presence of negative cycles. It's a common source of confusion (or at least, lack of intuition), and I think the main issue comes from not defining terms carefully enough.
The following discussion deals with directed weighted graphs and uses these definitions:
A walk is a sequence of edges joining a sequence of vertices.
A trail is a walk in which all edges are distinct.
A path is a walk in which all vertices are distinct. (Note that if a walk is a path, then it is automatically a trail.)
A simple path is a path. They are the same thing.
We are concerned with the shortest path problem. The SSSP variant asks for the shortest path from a source node to every node in the graph while the APSP variant asks for the shortest path between any two nodes in the graph. These paths should not have repeated vertices.
So far, we have learnt a handful of algorithms that "solve" SSSP or APSP, some of which are useful for special graphs (e.g. BFS when all edge weights are equal, relaxing edges in topological order when we are given a DAG). Here, we shall focus on the more general Dijkstra's algorithm, Bellman-Ford algorithm and Floyd-Warshall algorithm. Note that
Dijkstra's algorithm fails when the graph has negative edge weights.
Bellman-Ford algorithm fails when the graph has negative cycles.
Floyd-Warshall algorithm fails when the graph has negative cycles.
The lecture explained in detail why Dijkstra's algorithm fails when there are negative edge weights. In summary, the assumption "all edge weights are non-negative" is used in proving the correctness of the algorithm. Without this assumption, we can construct a graph for which the algorithm fails.
Some might argue that Bellman-Ford algorithm does not "fail" on graphs with negative cycles. Paths in such graphs can be infinitely short because we can repeatedly walk around the negative cycles, so there really is no "shortest path" and hence there is no meaningful output.
However, according to the definitions given above, a path must not have repeated vertices. Therefore, we are not allowed to repeatedly walk around the negative cycles in the first place. It turns out that the shortest paths of graphs with negative cycles are still well-defined. For instance, consider the graph below:
If we were to run Bellman-Ford algorithm on this graph, we would not be able to correctly find the shortest path from node s to node v. In this case, the desired shortest path exists and has length 0, achieved via the path s->u1->u2->u3->v. Any attempt to walk through the negative cycle more than once is illegal. In particular, s->u1->u2->u3->u1->u2->u3->v is not a valid path. Instead, it falls under the category of a walk.
The reason negative cycles were brought up in the lecture is to show that Bellman-Ford algorithm does not work for every graph. In particular, it fails to find the shortest paths in graphs with negative cycles. The algorithm does not "remember" whether a node has been visited, and so it might incorrectly keep decreasing the estimates of the nodes. It is important to note that in such cases the shortest paths still exist, but they cannot be found using Bellman-Ford algorithm. The algorithm does nothing more than accurately detect these negative cycles.
Although this was not mentioned in the lectures, it is interesting to note that Floyd-Warshall algorithm fails for graphs with negative cycles as well. Consider the following graph:
Let P0 = { }, P1 = { a }, P2 = { a, b }, P3 = { a, b, c }. Note that the shortest path from every node to itself must be 0. However, from the recurrence given in the lecture, S[c, c, P2] = min(S[c, c, P1], S[c, b, P1] + S[b, c, P1]) = min(0, -4 - 2) = -6. This answer corresponds to the invalid path c->a->b->c, therefore the output of Floyd-Warshall algorithm in this case will be incorrect. We can similarly use the algorithm to detect negative cycles though.
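To see this concretely, here is a standard in-place Floyd-Warshall sketch run on a graph consistent with the numbers above. The individual edge weights are an assumption chosen to match S[c, b, P1] = -4 and S[b, c, P1] = -2 (namely c->a = -2, a->b = -2, b->c = -2); a negative entry on the diagonal of the result flags the negative cycle:

```python
import math

def floyd_warshall(n, w):
    """w: dict mapping (u, v) -> weight. Returns the n x n distance matrix."""
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0                     # distance from every node to itself
    for (u, v), wt in w.items():
        d[u][v] = wt
    for k in range(n):                  # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# a = 0, b = 1, c = 2; the cycle c -> a -> b -> c has total weight -6
a, b, c = 0, 1, 2
d = floyd_warshall(3, {(c, a): -2, (a, b): -2, (b, c): -2})
```

On this graph every diagonal entry of `d` ends up negative, which is impossible in a graph without negative cycles, so the diagonal check doubles as a negative cycle detector.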
In summary, notice how we never really had an algorithm that solves the shortest path problem for general graphs in polynomial time. If such an algorithm existed, we could similarly solve the longest path problem in polynomial time by negating all edge weights and running this algorithm. Indeed, as mentioned during the lecture, the longest (simple) path problem is NP-complete (or NP-hard, depending on how the problem is formulated). This also implies that the shortest (simple) path problem for general graphs is NP-hard.
Observe that the length (number of edges) of the longest possible path from the source to a node is V - 1 (think of a line graph, whose diameter is V - 1). Obviously, you wouldn't repeat vertices when trying to find the shortest path. Here, we are assuming that there are no negative cycles.
Consider finding the shortest path from the source s to a vertex v. Let p = s → v1 → v2 → … → v be the shortest path from s to v. Then, observe that after each iteration of relaxing all the edges, at least one more vertex along the path gets its correct estimate (i.e., estimate = distance). For example, after the 2nd iteration, at least v1 and v2 will have the right estimates. So, since the longest possible length of a shortest path (read: diameter of the graph) is V - 1, we need to relax all the edges at most V - 1 times to ensure that we have found the correct distances from the source to all other nodes.
But sometimes you don’t need to wait for all iterations of relaxations? When can you terminate early?
Each edge is relaxed at most V - 1 times. (We abuse notation - the hallmark of a true computer scientist - and simply write V instead of |V|.)
Another key invariant is that after 1 iteration, the estimate of the first node on the shortest path is always correct. After k iterations, the estimates of the first k nodes along the shortest path are correct. (This can be proved inductively.)
Let T be a shortest path tree of graph G rooted at source s. After iteration k, all nodes which are at most k hops away from s on tree T have their correct estimates equal to the shortest distance.
If p is the shortest path from u to v, and p goes through w, then the portion of p from u to w is also a shortest path from u to w (and similarly from w to v). That is, a shortest path between two nodes is made up of shortest paths between its intermediate nodes.
This follows from the triangle inequality: if the portion of p from u to w were not a shortest path from u to w, we could replace it with a shorter u-to-w path and obtain a path from u to v shorter than p, a contradiction.
A cool byproduct of Bellman-Ford is that it is able to detect negative cycles using this very property. We proved that for a graph with no negative cycles, the path length (the number of edges, not the actual distance) of any shortest path is at most V - 1. Running Bellman-Ford for a V-th iteration should not affect any of our estimates (as we expect them to be correct by then). So, if we observe any updates to the estimates in the V-th iteration of relaxations, we have discovered that our graph has a negative cycle. Many implementations of Bellman-Ford, hence, run the algorithm for V iterations, with the last iteration being used to check if any negative cycles are present.
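This check is a small extension of Bellman-Ford: run the usual V - 1 iterations, then use one extra pass to see whether anything still improves (a sketch, assuming the same (u, v, w) edge-list representation as before; it detects negative cycles reachable from the source):

```python
import math

def has_negative_cycle(n, edges, source):
    """Run Bellman-Ford for n-1 iterations, then use an n-th pass as the check."""
    dist = [math.inf] * n
    dist[source] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] != math.inf and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # n-th iteration: any further improvement means a reachable negative cycle
    return any(dist[u] != math.inf and dist[u] + w < dist[v]
               for u, v, w in edges)
```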
A really neat trick to find the longest path distances: negate all the edge weights and then run Bellman-Ford to find the shortest paths in the negated graph. Negate the resulting distances to recover the actual longest distances. It is important that the graph be a tree or a DAG (no cycles), or else after negation we may end up with negative weight cycles. Or just use DAG_SSSP with negated edges in O(V + E) time.
In BF, once the estimate of a node has been set, it can be updated multiple times throughout the iterations until it finally reaches the minimum distance (this is an important distinction between BF and Dijkstra). So, if we stop the BF algorithm after k iterations, only the nodes whose shortest path lengths are at most k hops are guaranteed to have their correct estimates. (However, note that if by luck we relax the edges in the perfect order in the first iteration itself, all the nodes will have the correct estimates. So, we cannot make claims like “after the first k iterations, only those nodes that are within k hops from the source on the shortest path will have the correct estimates”.)
In Dijkstra, once we pop a node from the priority queue, we can be sure that its estimate will not change. In fact, if you’re interested in finding the shortest path from a source to a single vertex v, you can terminate Dijkstra as soon as you pop v from the priority queue.
We saw in BF that we had to relax each edge at most V - 1 times to be sure that all our estimates were correct. We are now trying to get the shortest paths by relaxing each edge only once! First, we need to ask ourselves:
Yes! A right order always exists if there are no negative weight cycles. Assume that T is the shortest path tree (with root being the source s) of a graph with no negative cycles. Now, if we relax the tree edges in BFS order starting from s, we ensure that before relaxing the edges out of a child, we have relaxed the edges of its parent, and so the parent has a correct estimate. We can relax non-tree edges in any order. (This proves that a right ordering exists but does not provide a useful way to find such an ordering, since we don’t know the shortest path tree before running Dijkstra lmao circular dependency.)
Note that relaxing the edges of the graph in BFS order from the source would not work, because we may have to relax the same edge multiple times (and hence get a greater time complexity). Think about why it wouldn’t work: you would be assuming that fewer hops implies lower distance, which is not necessarily true.
Pick a vertex u which is not in sptSet and has the minimum distance value. (We use a priority queue of nodes (NOT EDGES) to achieve this, prioritised by their distance estimates.)
Include u in sptSet.
Update the distance values of all adjacent vertices of u. To do this, iterate through all adjacent vertices: for every adjacent vertex v, if the sum of the distance value of u (from the source) and the weight of edge (u, v) is less than the distance value of v, then update the distance value of v. (In other words, relax all the outgoing edges of u.)
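The steps above can be sketched with Python’s heapq. This is a lazy-deletion variant: instead of a decreaseKey operation, we push duplicate entries and skip stale ones when they are popped. It assumes an adjacency list graph[u] = [(v, w), ...] with non-negative weights:

```python
import heapq
import math

def dijkstra(graph, source):
    """graph: dict u -> list of (v, w) with w >= 0. Returns dict of distances."""
    dist = {u: math.inf for u in graph}
    dist[source] = 0
    pq = [(0, source)]              # (estimate, node)
    done = set()                    # sptSet: vertices with finalized distances
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:               # stale duplicate entry, skip it
            continue
        done.add(u)                 # u's estimate is now final
        for v, w in graph[u]:       # relax all outgoing edges of u
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

The lazy-deletion trick trades a slightly larger heap (up to E entries instead of V) for not needing a decreaseKey operation, which heapq does not provide.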
Let A be the source vertex and say we start running Dijkstra. We would first mark the distances of B and C as 4 and 3 respectively. Then, we would look at C (the unexplored node with the minimum estimate). Since it has no outgoing edges, we move on. (But since we “explored” C, we don’t expect the distance of C to change again, according to our invariant.) Then we look at B and update the distance of C to be 2, since the distance of C was greater than the distance of B plus the cost of the edge from B to C. Hence, our invariant no longer holds. It is easy to see why Dijkstra would fail for graphs with negative edges (even though in this case the final distances would be correct, that is not true in general).
Run Dijkstra from the source. Since the node is explored (popped) before the negative edge leading into it is relaxed, the update of its distance happens only after it has already been finalized, and this change is not propagated to the shortest paths containing it as their intermediate node. In particular, its distance would still remain 2 at the end of Dijkstra’s algorithm even though the correct answer would be 0.
Claim: When a node is taken out of the priority queue, its estimate is correct.
Suppose not. Let S be the set of all nodes that have been explored and whose distance estimates are correct (inductive hypothesis). Since node v was taken out of the priority queue, among all the remaining nodes, v has the lowest estimate. The shortest path from the source to v cannot go through vertices not in S, because otherwise their distances would have been smaller than that of v, and hence they would have been removed from the PQ before v. But since v was popped off before them, the shortest path from the source to v must go through only vertices in S, and since we have already explored all vertices in S and their estimates are correct (inductive hypothesis), the estimate of v must also be correct.
Recall that in BF, we had to relax each edge V - 1 times to be sure that we had found the correct distances. This was because we picked the edges to be relaxed in an arbitrary order.
For the sake of understanding Dijkstra’s deeply, let us assume that we don’t know how the PQ is implemented. Say, we only know that the decreaseKey operation takes O(T_dk) time while the insert and deleteMin operations take O(T_pq) time. Let contains and isEmpty() take O(1). (This is not unreasonable to assume, since contains can be implemented using a hash table - whenever you insert, add the key to the hash table too. Moreover, even if we don’t have the contains function, we can just call decreaseKey, which can delete the key if it exists and reinsert it with the new priority.)
Firstly, each edge is relaxed exactly once. Hence, relax is called exactly E times.
Our priority queue stores the nodes, with their estimates as the priorities. So, the maximum size of the PQ is V at any time. (In fact, V - 1, since you remove the starting vertex at the very beginning.)
Each node is inserted into the PQ exactly once and deleted exactly once. So, insert and deleteMin are called V times each; if each such operation takes O(T_pq) time, this contributes O(V · T_pq). Since there are E edges, decreaseKey is called at most E times (in case every relaxation leads to an update of the estimate); if each decreaseKey takes O(T_dk) time, this contributes O(E · T_dk). So, the overall time complexity is O(V · T_pq + E · T_dk).
The actual costs of these operations depend on the implementation of the PQ. For example, if we use an AVL tree, then insert, deleteMin, contains, and decreaseKey (just search, delete, and insert again to decrease the key) all take O(log V). So, the total running time of Dijkstra’s when an AVL tree is used as the PQ is O((V + E) log V). (Notice that the size of the priority queue, and hence the cost of each operation, is upper bounded in terms of V.) For a connected graph, E ≥ V - 1, and so we can simply write O(E log V).
However, if we use an array as a PQ, insert, decreaseKey, and contains take O(1) (we can use a hashmap to map a node to an integer if the keys of the nodes are not actually integers), while deleteMin takes O(V) since we need to traverse the entire array.
Think of the time complexity of using a hash table to implement a priority queue: insert and decreaseKey would take O(1) but deleteMin would take O(V) since it stores nodes in an unordered fashion. Since deleteMin would be called exactly V times, and decreaseKey would be called at most E times, the running time would be O(V² + E), which is O(V²) since E cannot exceed V² in the case of a directed graph. Observe that this is identical to simply using an array (after using a hash function to map each node to a distinct integer in the range 0 to V - 1), and so there is no point in using a hash table over an ordinary array - don’t complicate it unnecessarily.
Question: If we have a graph with negative edges and the most negative weight is -c, can’t we just add c to every edge, use Dijkstra, and then adjust the results to find the shortest path lengths?
No! Absolutely not! When you add a constant to each edge, you are essentially prioritising the paths that have fewer hops: you increase each path’s total length by the constant times the number of edges along that path. That is, if the actual shortest path has 2 hops and you add 100 to each of the edges (so 200 to that path), while there is another (longer) path only 1 hop away, you add only 100 to that path, and you can end up with the wrong answer.
But, multiplying each of the edges by a positive constant c (reweighting by a positive factor) preserves the shortest paths, as every path’s total length is scaled by the same factor c, independent of how many edges are on the path.
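A tiny numeric check of both claims. The graph is an assumption for illustration: a direct s-to-t edge of weight 3, and a 2-hop s-to-t path of weights 1 and 1 (total 2). Adding a constant flips which path is shorter; scaling by a constant does not:

```python
# Two s-to-t paths, represented just by their edge weight lists.
direct = [3]        # 1 hop, total 3
two_hop = [1, 1]    # 2 hops, total 2 (the true shortest path)

def path_len(path, shift=0, scale=1):
    """Total path length after reweighting each edge w -> scale * w + shift."""
    return sum(scale * w + shift for w in path)

# Original graph: the 2-hop path is shorter (2 < 3).
assert path_len(two_hop) < path_len(direct)

# Add +100 to every edge: the 2-hop path pays the shift twice, so the
# order flips (103 < 202) and Dijkstra-on-the-shifted-graph is wrong.
assert path_len(direct, shift=100) < path_len(two_hop, shift=100)

# Multiply every edge by 100: the order is preserved (200 < 300).
assert path_len(two_hop, scale=100) < path_len(direct, scale=100)
```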
Our key insight is that if we relax the edges in the order that they appear in the shortest path from the source to a node, we only need to relax each edge once for the correct estimate to propagate to every node on the path. If you’re using u as an intermediate node on the path from the source to a node v, then before you look at the outgoing edges of u, you need to make sure that u has its correct estimate. If it does, you only need to look at the outgoing edges of u once, because no further reduction of the estimate of u can happen (and hence your estimates of the neighbours of u are also correct).
The algorithm works because if a path from the source to v goes through u, then we can be sure that u is relaxed before v (since u must occur before v in the topological ordering).
To obtain a topological sort, we can run a post-order DFS, appending each node to a list when it finishes, and then reverse the list to get the final ordering. This takes O(V + E) time. Then, all we need to do is iterate through every vertex in this order and relax its outgoing edges, which also takes O(V + E) time.
Hence, we can find SSSP on a DAG in O(V + E) time!!!
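A sketch of DAG SSSP putting the two phases together (post-order DFS for the topological order, then one pass of relaxations; assumes graph[u] = [(v, w), ...] over vertices 0..n-1 and that the graph really is a DAG):

```python
import math

def dag_sssp(n, graph, source):
    """SSSP on a DAG in O(V + E); works even with negative edge weights."""
    # Phase 1: topological sort via post-order DFS, then reverse.
    order, seen = [], [False] * n

    def dfs(u):
        seen[u] = True
        for v, _ in graph[u]:
            if not seen[v]:
                dfs(v)
        order.append(u)            # u is finished: record it in post-order

    for u in range(n):
        if not seen[u]:
            dfs(u)
    order.reverse()                # reverse post-order = topological order

    # Phase 2: relax each node's outgoing edges exactly once, left to right.
    dist = [math.inf] * n
    dist[source] = 0
    for u in order:
        if dist[u] != math.inf:    # skip nodes unreachable from the source
            for v, w in graph[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
    return dist
```

(The recursive DFS is for clarity; for very deep DAGs an iterative DFS avoids hitting Python’s recursion limit.)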
Yes! It’s super easy! Just negate the weights of each edge and use the same algorithm above (relaxation in topological order). Then, negate the distances at the end to get the longest path distances. So, we can find the longest path in a DAG in O(V + E) time!
No! If we negate the edges now, it is possible for negative cycles to be present, and all our algorithms fail to find shortest paths in the presence of negative cycles. In fact, if you think carefully about it, if there exists a negative cycle in the negated graph, there exists a positive cycle in the original graph. So, you could keep going around that cycle and the longest path distance would be unbounded. So, we should ideally specify that we mean the longest “simple path” (no repeated vertices).