Introduction
Last updated
Last updated
A graph consists of 2 types of elements - nodes (vertices) and edges (arcs).
Except an empty graph (which consists of no nodes and no edges), every graph must have at least 1 node.
Each edge connects 2 nodes in a graph. Each edge is unique, i.e., there cannot be 2 edges between the same pair of nodes. In this definition, we also disallow self-loops (an edge from a node to itself)
A multigraph can consist of multiple edges between the same pair of nodes. In this module, we do not consider multigraph to be a type of graph.
A hypergraph is a graph in which each edge can connect more than 2 nodes but each edge is unique. We don’t consider a hypergraph to be a type of graph in this module.
A simple path consists of at least 2 nodes and intersects each node at most once. That is, a node cannot be in a path more than one time. A path can be described in terms of the nodes, which indicates the order in which the nodes were visited.
2 nodes are said to be connected if there is a path between them. The graph is said to be connected if every pair of nodes is connected.
If a graph is not connected, it is said to be disconnected. A disconnected graph has multilpe connected components.
A simple cycle is a “path” (well, not actually because one node appears twice) that starts and ends at the same node. A cycle must have more than 2 nodes and cannot contain repeated edges.
A (unrooted) tree is simply a connected graph with no cycles.
Some important properties of a tree:
There is a unique path between any two nodes of a tree.
Adding an edge between any two nodes of a tree creates a cycle
A forest is a graph with no cycles.
The degree of a node is the number of adjacent edges. The degree of the graph is the maximum degree of any node in the graph.
There are some special kinds of graphs as follows:
A bipartite graph is a graph whose set of nodes can be divides into 2 sets such that there is no edge connecting 2 nodes from the same set. Informally, a graph is bipartite if is possible to “colour” nodes using only 2 colours such that no 2 adjacent nodes have the same colour.
It is not always obvious to determine whether a graph is bipartite or not but the following theorem helps: A graph is bipartite if, and only if, it does not contain any odd-length cycles.
A planar graph with n nodes has a maximum of 3n - 6 edges. This should intuitively make some sense because there cannot be too many edges without intersecting.
A graph H is said to be a subgraph of graph G iff every vertex in H is also a vertex in H, every edge in H is also an edge in G, and every edge in H has the same endpoints as it has in G.
A corollary of the above theorem is that the total degree of the graph is always even. It also follows that in any graph, there are an even number (possibly 0) of nodes with odd degree.
A closed walk is a walk that starts and ends at the same vertex.
In general, the problem of finding a hamiltonian circuit or proving that none exists is an NP-hard problem (which might seem a little surprising since finding an euler circuit is pretty simple and the two problems appear quite similar).
When we use graphs to model real world problems we need to make the following decisions:
What do our nodes represent? generally, nodes represent the possible states of a problem.
What do our edges represent? generally, edges represent the transition between two states in a problem
Are the edges directed or undirected? (If directed, what does the direction represent)
Are the edges weighted or unweighted? (If weighted, what does the edge weight represent?)
Should you use an adjancecy matrix or adjacency list (or an edge list too!) representation?
What are you trying to find in the graph? (e.g. shortest path, longest path, minimum vertex cover, maximum independent set, minimum spanning tree, topological sort, SCCs, )
Which algorithm will work for our problem? Do we need to modify the algorithm in any way?
Do we need to store any additional information at each node (i.e., augment our graph) to help us get the answer?
It is always important to understand why the algorithm works - why does it give the correct output? For example, why does running Dijkstra give us the shortest path from a node to all other nodes? Which key property (read Invariant!) is obeyed throughout the algorithm?
For example, if we consider the facebook network - we can let each user represent a node and each “friendship” represent an edge.
Similarly, for puzzles we can let each state of the puzzle be a node and each of the adjacent states (reachable within 1 move) to be its adjacent nodes. I other words, an edge represents a move. This is particularly useful for puzzles like rubicks cube.
There are 2 popular ways to represent a graph to solve problems
Adjacency List
Adjacency Matrix
It consists of nodes stored in an array and a linked list for every node that stores all its neighbours.
In the above representation, we can see that e and f are adjacent to a while b is not adjacent to a.
If the graph is undirected, its adjacency matrix is symmetric.
As a basic rule of thumb, if the graph is dense, it is better to use an adjacency matrix (since you won’t be wasting a lot of space) and to use an adjacency list when the graph is sparse (very few edges).
Adjacency matrix is fast at answering queries related to: “is there an edge between 2 nodes?”. Adjacency list takes longer.
Adjacency list is fast for “enumerating all the neighbours of a node” while adjacency matrix is slower.
Adjacency list is also fast for a “find me any neighbour of a given node” query.
Read this section after you've learnt (almost) all the graph-related algorithms.
There are some common patterns / themes in how real-world problems are solved using graphs, and they're often natural optimisations to
Instead of running an algorithm (e.g. Dijkstra) from multiple different nodes, think if you can create a dummy node (super-source) that connects to all the sources you want to run your algorithm from.
If you want to find the shortest distance from every node to a particular destination node, you don’t need to run Dijkstra from all the nodes! Just reverse the edges, and run Dijkstra from the destination node to get the distance between each node and the destination.
When you want to maximize the sum of weights, think if you can negate the edge weights and use a minimisation algorithm.
A common theme of graph problems is to duplicate the graph or transform the graph in some way to get certain desirable properties (e.g. remove cycles from the graph). Think if transforming a graph helps solve your problem. For example, creating multiple copies of each node to represent different "states" of your problem.
So more mathematically, we can express any graph as a set of vertices and edges, i.e., where and . Each edge denotes that it connects nodes and . For all edges .
A tree with nodes has edges.
The diameter of a graph is the maximum distance between any pair of nodes, following the shortest path. In other words, diameter is a max-min property - it is the maximum (over all pairs of nodes) of the minimum distance between the 2 nodes. where is the length of the shortest path between and .
We define a sparse graph when and a dense graph when .
A clique is a complete graph (denoted by where is the number of nodes) - there exists an edge between any pair of nodes. So, there are edges. The degree of each node is , the diameter of the graph is .
A line (or path) is a graph in which the nodes are arranged like a line. The diameter is and the degree is .
A cycle is a graph in which all the nodes form a cycle. The degree of the graph is and the diameter is .
A graph is said to be planar if we can draw it on a 2-D surface like paper without any intersecting edges. It is not always easy to determine whether a graph is planar. But using the “map 4-colouring theorem” we know that if we can colour all the nodes of a graph using at most 4 colours such that no 2 adjacent nodes have the same colour, it must be planar. Another theorem is Kuratowski’s theorem: A finite graph is planar if, and only if, it does not contain a subgraph that is a subdivision of the complete graph or the complete bipartite graph .
A complete bipartite graph is a bipartite graph on two disjoint sets and such that every vertex in connects to every vertex in . If and , the complete bipartite graph is denoted as .
Handshake Theorem: If the vertices of are , where , then the total degree of the graph = the number of edges in .
A walk from to is a finite alternating sequence of adjacent vertices and edges of . The number of edges is the length of the walk.
A trail from to is a walk from to that does not contain any repeated edge.
A path from to is a trail that does not contain a repeated vertex.
Connected Component: A graph is a connected component of iff:
is a subgraph of .
is connected.
No connected subgraph of has as a subgraph and contains vertices or edges that are not in .
Let be a graph. An Euler circuit for is a circuit that contains every vertex and every edge of . An Eulerian graph is a graph that contains an Euler circuit. If a graph has an Euler circuit, then every vertex of the graph has positive even degree.
In fact, If a graph is connected and the degree of every vertex of is a positive even integer, then has an Euler circuit. Further, we can make a stronger claim: A graph **has an Euler circuit iff is connected and every vertex of has positive even degree.
Given a graph , a Hamiltonian circuit for is a simple circuit/cycle that includes every vertex of . (That is, every vertex appears exactly once, except for the first and the last, which are the same.) A Hamiltonian graph (also called Hamilton graph) is a graph that contains a Hamiltonian circuit.
If a graph has a Hamiltonian circuit, then has a subgraph with the following properties:
contains every vertex of .
is connected.
has the same number of edges as vertices.
Every vertex of has degree 2.
Let be an undirected graph with ordered vertices . The adjacency matrix of is the matrix over the set of non-negative integers such that: the number of edges connecting and for all . In this module,
The adjacency matrix of an undirected graph is symmetric, i.e., for all .
If is a graph with adjacency matrix , then for each positive integer and for all integers where is the number of vertices, then the entry of gives the number of walks of length from vertex to vertex . In particular, .
Euler’s formula: For a connected planar simple (multiple edges between nodes are not allowed) graph * with and , if we let * be the number of faces (including the outer area), then .
What is the diameter of an ( cube?
(Another common way is to use an edge list: simply a list of triplets which indicates that there is an edge of weight from node to node .)
Space consumption: for the array. since each edge appears twice - once et each entry of the endpoints. Total:
E.g. For a cycle, each vertex has exactly 2 edges and so, the space consumed is
In an adjacency matrix, the entry denotes whether or not there is an edge between node and node . That is, , where denotes the adjacency matrix of the graph whose edge set is
If the adjacency matrix of a graph is , then represents the matrix in which each entry denotes the number of walks of length from node to node .
The size of the matrix is and so, the space consumption is also for all kinds of graphs (including cycles)
Generally, if you use an adjacency matrix for an algorithm, you will probably need to visit all elements of the matrix to “travel all edges”. So, it might be better to use an adjacency list in such a case since it would take which might be less than if the graph is sparse.
If you want to maximize the path length where length is defined as the product of edge weights and your products are guaranteed to be between and (say, probabilities), you can take negative logarithm of each edge weight and use regular shortest path algorithms. Realize that maximising the is equivalent to maximizing since is a monotonically increasing function.
Whenever you transform the graph, make sure each node captures all the properties of the state that you need to determine which nodes you can visit from the current node. Traversing an edge should not depend on the path used to get to . itself must capture all the information necessary to determine this! So, storing stuff like number of hops travelled so far is generally not a good idea. Instead store stuff like “minimum number of hops to get from to the destination” or “minimum superpowers needed to solve the maze” (if superpowers are being used to break walls in the maze). Edges must be deterministic! No if-statements should be used to decide whether an edge is valid or not!