BFS and DFS
Last updated
Last updated
Very simple yet important algorithm for traversing a graph and all its nodes.
Start from a node - visit all its neighbours. Then, for each of the neighbours, visit all its neighbours, and so on.
A BFS traversal covers all nodes and all edges (however, it is important to remember that it does not travel all paths between 2 nodes! There can be an exponential number of paths between any two nodes (even in a DAG) and so visiting all paths between two nodes necessarily takes exponential time! It is, nearly always, a terrible idea to brute force all paths between two nodes to determine some property.)
BFS gives you the shortest path from one node to all other nodes (i.e., SSSP) for an unweighted graph (directed is allowed).
All the edges in the parent BFS form a tree (since you don’t visit repeated vertices) - this is called a BFS tree. The order in which the edges are inserted into this BFS tree while running BFS is known as the BFS order of edges (with a given starting node).
The edges not included in the BFS algorithm (not travelled by) are called cross-edges.
Running time: (explained later)
If a graph is disconnected, you need to start BFS again from another node to cover the remaining nodes that have not yet been visited.
Note that BFS can only solve shortest path problems for unweighted graphs (or when all the weights are the same)
Follow a path until you get stuck
Backtrack until you find a new edge (a node that was not fully explored yet)
Recursively explore it.
Be careful to not repeat a vertex
Note: DFS does not give you the shortest path from one vertex to another (for a general graph, even if it is unweighted and undirected).
But if you think about it, DFS can be used to give the shortest path from a node to another node in an unweighted tree (since there is only one path between any two nodes anyway and DFS can find that path. With little modification, we can store the “depth” of DFS currently being run to determine the length of the path between a node to another node in an unweighted tree)
All the parent edges in the DFS form a tree (since you don’t visit repeated vertices) - this forms a DFS tree.
pre-order DFS: add node to a list/array the first time you see it (when you start exploring it)
post-order DFS: add node to a list/array once you have explored it completely (i.e., all its outgoing edges have been explored)
Actually, if we think from a high level, both BFS and DFS are exactly the same algorithm. They just use a different data structure to store the current nodes being visited. In particular, BFS uses a queue while DFS uses a stack.
It is very important to remember what BFS and DFS do: they visit every node, they visit every edge, but they DO NOT visit every path between 2 nodes.
In fact, traversing every path (even every shortest path) between two nodes takes exponential time (since there can be an exponential number (exponential in the number of nodes) of paths between two nodes)
We can define four types of edges in a graph after performing DFS:
Tree edge: It is an edge which is present in the tree obtained after applying DFS on the graph. All green edges are tree edges in the graph below.
Observe that an undirected graph cannot have forward edges and cross edges. Try it out yourself.
How would you identify a back edge in DFS? When you’re exploring a node, mark it as “exploring” (since it is still on the recursion stack). When you’re done exploring it entirely, mark it as “not exporing”. If while exploring a node, you reach any other node that is currently on the stack (still being explored), that edge is a back edge.
This gives us a linear time algorithm to detect a cycle (rather than using Bellman ford or what not)
Why is the running time and not ?
is a very loose upper bound (obtained by simply looking at the two for loops and thinking: we visit each node once and for each node, we look at all its outgoing edges, which can be (this itself should raise your suspicions). Hence, ). On careful analysis of the algorithm above, we observe that every statement can be associated with either a vertex or an edge. In particular, each vertex is only a part of 1 frontier and so the number of times that we repeat the statements within the while loop is at most . Moreover, the total number of edges is and we only check each edge exactly once. So, even though it appears to have 2 nested for-loops, the running time is because some frontiers are small, others are big. Some nodes have high degree, others have a low degree. Thinking from a higher level of abstraction, we know that BFS visits each node exactly once and each edge exactly once. So, the running time should be
Moreover, it is important to note that the running time of BFS is when we are using an adjacency list. If we decide to use an adjacency matrix, it takes us time to enumerate all the neighbours of a node. Since, we need to do this for every node, we end up with a BFS algorithm. Think of it this way: for each node, you need to look at all the entries corresponding to its row (recall that will give a row of numbers, with indicating an absence of an edge and indicating its presence). You need to do this for each node. So, you look at each element of the adjacency matrix once, resulting in an algorithm.
The running time of DFS is also . Again, it should be obvious that DFS-visit
is called on each node exactly once - for every call, we enumerate its neighbours. In total, we enumerate all edges twice (number of neighbours enumerated = number of edges), once at each end-point of the edge. Hence, .
Similar to BFS, this is only true if we use an adjacency list. If we use an adjacency matrix, it takes us to enumerate all edges for every node (not in total!). So, the running time using an adjacency matrix would be
Cross edge: An edge connecting such that and have no “ancestor-descendent” relationship between them in the original graph. Cross edges are not part of the DFS tree. The edge is a cross edge below.
Back edge: It is an edge such that is ancestor of node u but the edge is not not part of DFS tree. The edge is a back edge in the graph below.
Forward edge: It is an edge such that is a descendant of but the edge is not not part of the DFS tree. The edge is a forward edge in the graph below.
Another cool thing about classifying these edges is that we can use them to detect cycles: a graph has a cycle the DFS of the graph has a back edge. (pretty easy to understand)
If you consider the “start” of exploration of a node as an opening parantheses and the “completion” of exploration of a node as a closing parantheses then you can observe that when you run DFS you get a balanced parantheses. That is, for each closing paranthesis, its matching/corresponding paranthesis lies before it. For example, you might get something like: but not like . This is part of the depth first nature of DFS.