Introduction

Terminology

A graph consists of 2 types of elements - nodes (vertices) and edges (arcs).

Except an empty graph (which consists of no nodes and no edges), every graph must have at least 1 node.

Each edge connects 2 nodes in a graph. Each edge is unique, i.e., there cannot be 2 edges between the same pair of nodes. In this definition, we also disallow self-loops (an edge from a node to itself)

A multigraph can consist of multiple edges between the same pair of nodes. In this module, we do not consider multigraph to be a type of graph.

A hypergraph is a graph in which each edge can connect more than 2 nodes but each edge is unique. We don’t consider a hypergraph to be a type of graph in this module.

So more mathematically, we can express any graph as a set of vertices and edges, i.e., G=(V,E)G = (V,E) where V>0|V| > 0 and E{(v,w):vV,wV}E \subseteq \{(v,w): v \in V, w \in V \}. Each edge e=(v,w)e = (v,w) denotes that it connects nodes vv and ww. For all edges e1,e2E:e1e2e_1, e_2 \in E: e_1 \ne e_2.

A simple path consists of at least 2 nodes and intersects each node at most once. That is, a node cannot be in a path more than one time. A path can be described in terms of the nodes, which indicates the order in which the nodes were visited.

2 nodes are said to be connected if there is a path between them. The graph is said to be connected if every pair of nodes is connected.

If a graph is not connected, it is said to be disconnected. A disconnected graph has multilpe connected components.

A simple cycle is a “path” (well, not actually because one node appears twice) that starts and ends at the same node. A cycle must have more than 2 nodes and cannot contain repeated edges.

A (unrooted) tree is simply a connected graph with no cycles.

Some important properties of a tree:

  1. A tree with nn nodes has n1n-1 edges.

  2. There is a unique path between any two nodes of a tree.

  3. Adding an edge between any two nodes of a tree creates a cycle

A forest is a graph with no cycles.

The degree of a node is the number of adjacent edges. The degree of the graph is the maximum degree of any node in the graph.

The diameter of a graph is the maximum distance between any pair of nodes, following the shortest path. In other words, diameter is a max-min property - it is the maximum (over all pairs of nodes) of the minimum distance between the 2 nodes. max(u,vδ(u,v))max(\forall u, v \quad \delta(u,v)) where δ(u,v)\delta(u,v) is the length of the shortest path between uu and vv.

We define a sparse graph when E=O(V)E = O(V) and a dense graph when E=O(V2)E = O(V^2).

There are some special kinds of graphs as follows:

A clique is a complete graph (denoted by KnK_n where nn is the number of nodes) - there exists an edge between any pair of nodes. So, there are (n2)=n(n1)2\dbinom{n}{2} = \dfrac{n(n-1)}{2} edges. The degree of each node is n1n - 1, the diameter of the graph is 11.

A line (or path) is a graph in which the nodes are arranged like a line. The diameter is n1n - 1 and the degree is 22.

A cycle is a graph in which all the nodes form a cycle. The degree of the graph is 22 and the diameter is n2\lfloor \frac{n}{2} \rfloor.

A bipartite graph is a graph whose set of nodes can be divides into 2 sets such that there is no edge connecting 2 nodes from the same set. Informally, a graph is bipartite if is possible to “colour” nodes using only 2 colours such that no 2 adjacent nodes have the same colour.

It is not always obvious to determine whether a graph is bipartite or not but the following theorem helps: A graph is bipartite if, and only if, it does not contain any odd-length cycles.

A graph is said to be planar if we can draw it on a 2-D surface like paper without any intersecting edges. It is not always easy to determine whether a graph is planar. But using the “map 4-colouring theorem” we know that if we can colour all the nodes of a graph using at most 4 colours such that no 2 adjacent nodes have the same colour, it must be planar. Another theorem is Kuratowski’s theorem: A finite graph is planar if, and only if, it does not contain a subgraph that is a subdivision of the complete graph K5K_5 or the complete bipartite graph K3,3K_{3,3}.

A planar graph with n nodes has a maximum of 3n - 6 edges. This should intuitively make some sense because there cannot be too many edges without intersecting.

A complete bipartite graph is a bipartite graph on two disjoint sets UU and VV such that every vertex in UU connects to every vertex in VV. If U=m|U| = m and V=n|V| = n, the complete bipartite graph is denoted as Km,nK_{m,n}.

A graph H is said to be a subgraph of graph G iff every vertex in H is also a vertex in H, every edge in H is also an edge in G, and every edge in H has the same endpoints as it has in G.

Handshake Theorem: If the vertices of GG are v1,v2,,vnv_1, v_2, \dots, v_n, where n0n \geq 0, then the total degree of the graph = i=1ndeg(vi)=2×(\sum_{i=1}^{n}deg(v_i) = 2 \times (the number of edges in G)G).

A corollary of the above theorem is that the total degree of the graph is always even. It also follows that in any graph, there are an even number (possibly 0) of nodes with odd degree.

A walk from vv to ww is a finite alternating sequence of adjacent vertices and edges of GG. The number of edges is the length of the walk.

A trail from vv to ww is a walk from vv to ww that does not contain any repeated edge.

A path from vv to ww is a trail that does not contain a repeated vertex.

A closed walk is a walk that starts and ends at the same vertex.

Connected Component: A graph HH is a connected component of GG iff:

  1. HH is a subgraph of GG.

  2. HH is connected.

  3. No connected subgraph of GG has HH as a subgraph and contains vertices or edges that are not in HH.

Let GG be a graph. An Euler circuit for GG is a circuit that contains every vertex and every edge of GG. An Eulerian graph is a graph that contains an Euler circuit. If a graph has an Euler circuit, then every vertex of the graph has positive even degree.

In fact, If a graph GG is connected and the degree of every vertex of GG is a positive even integer, then GG has an Euler circuit. Further, we can make a stronger claim: A graph GG **has an Euler circuit iff GG is connected and every vertex of GG has positive even degree.

Given a graph GG, a Hamiltonian circuit for GG is a simple circuit/cycle that includes every vertex of GG. (That is, every vertex appears exactly once, except for the first and the last, which are the same.) A Hamiltonian graph (also called Hamilton graph) is a graph that contains a Hamiltonian circuit.

If a graph GG has a Hamiltonian circuit, then GG has a subgraph HH with the following properties:

  1. HH contains every vertex of GG.

  2. HH is connected.

  3. HH has the same number of edges as vertices.

  4. Every vertex of HH has degree 2.

In general, the problem of finding a hamiltonian circuit or proving that none exists is an NP-hard problem (which might seem a little surprising since finding an euler circuit is pretty simple and the two problems appear quite similar).

Let GG be an undirected graph with ordered vertices v1,v2,,vnv_1, v_2, \dots, v_n. The adjacency matrix of GG is the n×nn \times n matrix A=(ai,j)A = (a_{i,j}) over the set of non-negative integers such that: ai,j=a_{i,j} = the number of edges connecting viv_i and vjv_j for all i,j=1,2,,ni,j = 1,2, \dots, n. In this module, A[i][j]={1, if there exists an edge between node i and j0, otherwiseA[i][j] = \begin{cases}1, \text{ if there exists an edge between node i and j} \\ 0, \text{ otherwise}\end{cases}

The adjacency matrix of an undirected graph is symmetric, i.e., ai,j=aj,ia_{i,j} = a_{j,i} for all i,ji,j.

If GG is a graph with adjacency matrix AA, then for each positive integer mm and for all integers i,j=1,2,,ni,j = 1, 2, \dots, n where nn is the number of vertices, then the i,ji,j entry of AmA^m gives the number of walks of length mm from vertex ii to vertex jj. In particular, Am[i][j]>0     there exists a path of length m from node i to node jA^m[i][j] > 0 \implies \text{ there exists a path of length m from node i to node j}.

Euler’s formula: For a connected planar simple (multiple edges between nodes are not allowed) graph G=(V,E)*G = (V, E)* with e=E*e = |E|* and v=V*v = |V|*, if we let f*f* be the number of faces (including the outer area), then f=ev+2f = e - v + 2.

Modelling

When we use graphs to model real world problems we need to make the following decisions:

  1. What do our nodes represent? generally, nodes represent the possible states of a problem.

  2. What do our edges represent? generally, edges represent the transition between two states in a problem

  3. Are the edges directed or undirected? (If directed, what does the direction represent)

  4. Are the edges weighted or unweighted? (If weighted, what does the edge weight represent?)

  5. Should you use an adjancecy matrix or adjacency list (or an edge list too!) representation?

  6. What are you trying to find in the graph? (e.g. shortest path, longest path, minimum vertex cover, maximum independent set, minimum spanning tree, topological sort, SCCs, )

  7. Which algorithm will work for our problem? Do we need to modify the algorithm in any way?

  8. Do we need to store any additional information at each node (i.e., augment our graph) to help us get the answer?

It is always important to understand why the algorithm works - why does it give the correct output? For example, why does running Dijkstra give us the shortest path from a node to all other nodes? Which key property (read Invariant!) is obeyed throughout the algorithm?

For example, if we consider the facebook network - we can let each user represent a node and each “friendship” represent an edge.

Similarly, for puzzles we can let each state of the puzzle be a node and each of the adjacent states (reachable within 1 move) to be its adjacent nodes. I other words, an edge represents a move. This is particularly useful for puzzles like rubicks cube.

What is the diameter of an (n×n×n)n \times n \times n) cube? θ(n2logn)\theta \left(\dfrac{n^2}{logn}\right)

Representation

There are 2 popular ways to represent a graph to solve problems

  1. Adjacency List

  2. Adjacency Matrix

(Another common way is to use an edge list: simply a list of triplets (u,v,w)(u,v,w) which indicates that there is an edge of weight ww from node uu to node vv.)

Adjacency List

It consists of nodes stored in an array and a linked list for every node that stores all its neighbours.

In the above representation, we can see that e and f are adjacent to a while b is not adjacent to a.

Space consumption: V|V| for the array. 2E2|E| since each edge appears twice - once et each entry of the endpoints. Total: O(V+E)O(V + E)

E.g. For a cycle, each vertex has exactly 2 edges and so, the space consumed is O(V)O(V)

Adjacency Matrix

In an adjacency matrix, the i,ji,j entry denotes whether or not there is an edge between node ii and node jj. That is, A[v][w]=1    (v,w)EA[v][w] = 1 \iff (v,w) \in E, where AA denotes the adjacency matrix of the graph whose edge set is E.E.

If the graph is undirected, its adjacency matrix is symmetric.

If the adjacency matrix of a graph is AA, then AkA^k represents the matrix in which each entry denotes the number of walks of length kk from node ii to node jj.

The size of the matrix is V×VV \times V and so, the space consumption is also O(V2)O(V^2) for all kinds of graphs (including cycles)

As a basic rule of thumb, if the graph is dense, it is better to use an adjacency matrix (since you won’t be wasting a lot of space) and to use an adjacency list when the graph is sparse (very few edges).

Trade-offs

  1. Adjacency matrix is fast at answering queries related to: “is there an edge between 2 nodes?”. Adjacency list takes longer.

  2. Adjacency list is fast for “enumerating all the neighbours of a node” while adjacency matrix is slower.

  3. Adjacency list is also fast for a “find me any neighbour of a given node” query.

Generally, if you use an adjacency matrix for an algorithm, you will probably need to visit all O(V2)O(V^2) elements of the matrix to “travel all edges”. So, it might be better to use an adjacency list in such a case since it would take O(V+E)O(V+E) which might be less than O(V2)O(V^2) if the graph is sparse.

Tips and Tricks

Read this section after you've learnt (almost) all the graph-related algorithms.

There are some common patterns / themes in how real-world problems are solved using graphs, and they're often natural optimisations to

  1. Instead of running an algorithm (e.g. Dijkstra) from multiple different nodes, think if you can create a dummy node (super-source) that connects to all the sources you want to run your algorithm from.

  2. If you want to find the shortest distance from every node to a particular destination node, you don’t need to run Dijkstra from all the nodes! Just reverse the edges, and run Dijkstra from the destination node to get the distance between each node and the destination.

  3. When you want to maximize the sum of weights, think if you can negate the edge weights and use a minimisation algorithm.

  4. If you want to maximize the path length where length is defined as the product of edge weights and your products are guaranteed to be between 00 and 11 (say, probabilities), you can take negative logarithm of each edge weight and use regular shortest path algorithms. Realize that maximising the f(x)f(x) is equivalent to maximizing logf(x)logf(x) since loglog is a monotonically increasing function.

  5. A common theme of graph problems is to duplicate the graph or transform the graph in some way to get certain desirable properties (e.g. remove cycles from the graph). Think if transforming a graph helps solve your problem. For example, creating multiple copies of each node to represent different "states" of your problem.

  6. Whenever you transform the graph, make sure each node captures all the properties of the state that you need to determine which nodes you can visit from the current node. Traversing an edge (u,v)(u,v) should not depend on the path used to get to uu. uu itself must capture all the information necessary to determine this! So, storing stuff like number of hops travelled so far is generally not a good idea. Instead store stuff like “minimum number of hops to get from uu to the destination” or “minimum superpowers needed to solve the maze” (if superpowers are being used to break walls in the maze). Edges must be deterministic! No if-statements should be used to decide whether an edge is valid or not!

Last updated

Was this helpful?