Introduction

Terminology

A graph consists of 2 types of elements - nodes (vertices) and edges (arcs).

Except for the empty graph (which consists of no nodes and no edges), every graph must have at least 1 node.

Each edge connects 2 nodes in a graph. Each edge is unique, i.e., there cannot be 2 edges between the same pair of nodes. In this definition, we also disallow self-loops (an edge from a node to itself).
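
The two restrictions above (no repeated edges, no self-loops) can be checked mechanically. The following is a minimal sketch, assuming the graph is undirected and given as a list of (u, v) pairs; the function name is_simple is made up for illustration.

```python
def is_simple(edges):
    """Return True if an undirected edge list has no self-loops and no repeated edges."""
    seen = set()
    for u, v in edges:
        if u == v:
            return False  # self-loop: an edge from a node to itself
        key = frozenset((u, v))  # undirected, so (u, v) and (v, u) are the same edge
        if key in seen:
            return False  # second edge between the same pair of nodes
        seen.add(key)
    return True
```

For example, is_simple([("a", "b"), ("b", "a")]) is False because both pairs describe the same undirected edge.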

A multigraph can have multiple edges between the same pair of nodes. In this module, we do not consider a multigraph to be a type of graph.

A hypergraph is a generalisation of a graph in which each edge can connect more than 2 nodes, but each edge is still unique. We don’t consider a hypergraph to be a type of graph in this module.

A simple path consists of at least 2 nodes and visits each node at most once. That is, a node cannot appear in a path more than once. A path can be described by listing its nodes in the order in which they are visited.

2 nodes are said to be connected if there is a path between them. The graph is said to be connected if every pair of nodes is connected.

If a graph is not connected, it is said to be disconnected. A disconnected graph has multiple connected components.
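
To make the idea of connected components concrete, here is a minimal sketch that groups nodes into components using a breadth-first search. It assumes the graph is undirected and stored as a dict mapping each node to its neighbours; the name connected_components is an illustrative choice.

```python
from collections import deque


def connected_components(adj):
    """adj: dict mapping every node to an iterable of its neighbours (undirected)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue  # already placed in an earlier component
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components
```

The graph is connected exactly when this returns a single component (for a non-empty graph).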

A simple cycle is a “path” (well, not actually because one node appears twice) that starts and ends at the same node. A cycle must have more than 2 nodes and cannot contain repeated edges.

A (unrooted) tree is simply a connected graph with no cycles.

Some important properties of a tree:

There is a unique path between any two nodes of a tree.
Adding an edge between any two nodes of a tree creates a cycle.
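
The definition of a tree (connected, no cycles) can be tested directly. The sketch below uses a union-find structure, which is one possible technique and not something these notes prescribe: an edge whose endpoints are already connected would close a cycle. It assumes a non-empty node collection and an undirected edge list; is_tree is a hypothetical name.

```python
def is_tree(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected and acyclic."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the representative, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v were already connected, so this edge closes a cycle
        parent[root_u] = root_v
        components -= 1
    return components == 1  # exactly one component means the graph is connected
```

Taking a tree and adding one more edge between two of its nodes makes is_tree return False, which matches the second property above.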

A forest is a graph with no cycles.

The degree of a node is the number of edges incident to it (i.e., edges that have the node as an endpoint). The degree of the graph is the maximum degree of any node in the graph.
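
As a small illustration, the degrees can be read straight off an adjacency structure. This sketch assumes a simple undirected graph stored as a dict from node to a set of neighbours; both function names are made up for the example.

```python
def degrees(adj):
    """Degree of every node: the size of its neighbour set."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def graph_degree(adj):
    """Degree of the graph: the maximum node degree (0 for an empty graph)."""
    return max(degrees(adj).values(), default=0)
```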

There are some special kinds of graphs as follows:

A bipartite graph is a graph whose set of nodes can be divided into 2 sets such that there is no edge connecting 2 nodes from the same set. Informally, a graph is bipartite if it is possible to “colour” the nodes using only 2 colours such that no 2 adjacent nodes have the same colour.

It is not always easy to determine whether a graph is bipartite, but the following theorem helps: A graph is bipartite if, and only if, it does not contain any odd-length cycles.
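
The 2-colouring view leads to a simple test. The following is a sketch, assuming an undirected graph stored as a dict from node to a list of neighbours: it colours nodes breadth-first and gives up as soon as two adjacent nodes would need the same colour (which happens exactly when there is an odd-length cycle). The name two_colour is illustrative.

```python
from collections import deque


def two_colour(adj):
    """Return a dict node -> 0 or 1 if the graph is bipartite, otherwise None."""
    colour = {}
    for start in adj:  # loop over all nodes so disconnected graphs are handled too
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in colour:
                    colour[nxt] = 1 - colour[node]  # give the neighbour the other colour
                    queue.append(nxt)
                elif colour[nxt] == colour[node]:
                    return None  # adjacent nodes share a colour: not bipartite
    return colour
```

A 4-cycle gets a valid colouring, while a triangle (an odd cycle) yields None.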

A planar graph (a graph that can be drawn on a plane without any edges crossing) with n ≥ 3 nodes has a maximum of 3n - 6 edges. This should intuitively make some sense because there cannot be too many edges without some of them intersecting.

A graph H is said to be a subgraph of graph G iff every vertex in H is also a vertex in G, every edge in H is also an edge in G, and every edge in H has the same endpoints as it has in G.
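
The subgraph condition translates almost word for word into set comparisons. This is a minimal sketch, assuming simple undirected graphs whose edges are identified by their endpoint pairs (so the "same endpoints" requirement holds automatically); is_subgraph and its parameters are hypothetical names.

```python
def is_subgraph(h_nodes, h_edges, g_nodes, g_edges):
    """Return True if H = (h_nodes, h_edges) is a subgraph of G = (g_nodes, g_edges)."""
    h_nodes, g_nodes = set(h_nodes), set(g_nodes)
    h_edge_set = {frozenset(e) for e in h_edges}  # undirected: order of endpoints ignored
    g_edge_set = {frozenset(e) for e in g_edges}
    # H must itself be a graph: both endpoints of each of its edges belong to H.
    endpoints_ok = all(e <= h_nodes for e in h_edge_set)
    return h_nodes <= g_nodes and h_edge_set <= g_edge_set and endpoints_ok
```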

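Every edge contributes exactly 2 to the sum of the degrees of all nodes (1 for each endpoint), so the sum of the degrees equals twice the number of edges. A corollary is that the total degree of the graph is always even. It also follows that in any graph, there are an even number (possibly 0) of nodes with odd degree.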
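
These parity facts are easy to check on a small example. The sketch below assumes an undirected graph given as an edge list; the helper name degree_counts and the sample edges are made up for illustration.

```python
from collections import Counter


def degree_counts(edges):
    """Count, for every node, how many edges have it as an endpoint."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # each edge adds 1 to the degree of both endpoints
        deg[v] += 1
    return deg


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]  # hypothetical sample graph
deg = degree_counts(edges)
total_degree = sum(deg.values())
odd_nodes = sorted(v for v, d in deg.items() if d % 2 == 1)
print(total_degree, odd_nodes)
```

Here the total degree is 8 (twice the 4 edges) and the odd-degree nodes are a and d, an even number of them.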

A closed walk is a walk that starts and ends at the same vertex.

In general, the problem of finding a Hamiltonian circuit or proving that none exists is an NP-hard problem (which might seem a little surprising since finding an Euler circuit is pretty simple and the two problems appear quite similar).

Modelling

When we use graphs to model real world problems we need to make the following decisions:

What do our nodes represent? Generally, nodes represent the possible states of a problem.
What do our edges represent? Generally, edges represent the transitions between states in a problem.
Are the edges directed or undirected? (If directed, what does the direction represent?)
Are the edges weighted or unweighted? (If weighted, what does the edge weight represent?)
Should you use an adjacency matrix or an adjacency list (or even an edge list) representation?
What are you trying to find in the graph? (e.g. shortest path, longest path, minimum vertex cover, maximum independent set, minimum spanning tree, topological sort, strongly connected components (SCCs))
Which algorithm will work for our problem? Do we need to modify the algorithm in any way?
Do we need to store any additional information at each node (i.e., augment our graph) to help us get the answer?

It is always important to understand why the algorithm works - why does it give the correct output? For example, why does running Dijkstra give us the shortest path from a node to all other nodes? Which key property (read Invariant!) is obeyed throughout the algorithm?

For example, if we consider the Facebook network, we can let each node represent a user and each edge represent a “friendship”.

Similarly, for puzzles we can let each state of the puzzle be a node and each of the adjacent states (reachable within 1 move) be its adjacent nodes. In other words, an edge represents a move. This is particularly useful for puzzles like the Rubik's Cube.
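
With puzzles, the graph is usually never stored explicitly; instead, a function generates the adjacent states of a given state on demand. The sketch below illustrates this with a sliding-tile puzzle rather than a Rubik's Cube, purely because it is smaller. It assumes a state is a tuple of tiles in row-major order with 0 marking the blank; the function name neighbours is illustrative.

```python
def neighbours(state, width=3):
    """Return all states reachable in one move by sliding a tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, width)
    height = len(state) // width
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < height and 0 <= c < width:  # stay on the board
            target = r * width + c
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]  # one move = one edge
            result.append(tuple(nxt))
    return result
```

Any graph algorithm that only needs "give me the neighbours of this node" can then run on the puzzle by calling this function instead of looking up a stored adjacency list.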

Representation

There are 2 popular ways to represent a graph when solving problems:

Adjacency List
Adjacency Matrix

Adjacency List

It consists of nodes stored in an array and a linked list for every node that stores all its neighbours.

In the above representation, we can see that e and f are adjacent to a while b is not adjacent to a.
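
A minimal sketch of an adjacency list in code, assuming an undirected graph given as an edge list. Python lists stand in for the linked lists, and the edges are hypothetical, chosen only so that e and f are adjacent to a while b is not.

```python
from collections import defaultdict


def build_adjacency_list(edges):
    """Map each node to the list of its neighbours (undirected graph)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)  # record the edge in both directions
        adj[v].append(u)
    return adj


edges = [("a", "e"), ("a", "f"), ("b", "e")]  # hypothetical example edges
adj = build_adjacency_list(edges)
print(adj["a"])
```

Note that a node with no edges never appears in this edge list, so isolated nodes would need to be added to the structure separately.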

Adjacency Matrix

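An adjacency matrix for a graph with n nodes is an n × n matrix in which the entry in row i and column j is 1 if there is an edge from node i to node j, and 0 otherwise (for a weighted graph, the entry can hold the edge weight instead). If the graph is undirected, its adjacency matrix is symmetric.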
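
A minimal sketch of building an adjacency matrix for an undirected, unweighted graph, assuming the nodes are numbered 0 to n - 1 (the function names are illustrative). Because each edge is written into both matrix[u][v] and matrix[v][u], the result is symmetric.

```python
def build_adjacency_matrix(n, edges):
    """n x n matrix with 1 where an edge exists and 0 elsewhere (undirected graph)."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # mirror entry, since the edge has no direction
    return matrix


def is_symmetric(matrix):
    """Check that matrix[i][j] == matrix[j][i] for every pair i < j."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(i + 1, n))
```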

As a basic rule of thumb, if the graph is dense, it is better to use an adjacency matrix (since you won’t be wasting a lot of space) and to use an adjacency list when the graph is sparse (very few edges).

Trade-offs

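Adjacency matrix is fast at answering queries of the form “is there an edge between these 2 nodes?” since it is a single lookup. Adjacency list takes longer because the neighbour list of one of the nodes has to be scanned.
Adjacency list is fast for “enumerating all the neighbours of a node” since it only touches the actual neighbours, while adjacency matrix is slower because it has to scan an entire row of n entries.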
Adjacency list is also fast for a “find me any neighbour of a given node” query.
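
The trade-offs above can be seen by writing the same queries against both representations. This is a sketch under the assumption that nodes are numbered 0 to n - 1, the matrix is a list of lists of 0/1, and the adjacency list is a list of neighbour lists; all function names are made up for illustration.

```python
def has_edge_matrix(matrix, u, v):
    return matrix[u][v] == 1  # single lookup: O(1)


def has_edge_list(adj, u, v):
    return v in adj[u]  # scans u's neighbour list: O(degree of u)


def neighbours_matrix(matrix, u):
    # Has to look at every entry in row u: O(n), even if u has few neighbours.
    return [v for v, flag in enumerate(matrix[u]) if flag]


def neighbours_list(adj, u):
    return list(adj[u])  # touches only the actual neighbours: O(degree of u)


def any_neighbour_list(adj, u):
    return adj[u][0] if adj[u] else None  # first stored neighbour, if any: O(1)
```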

Tips and Tricks

Read this section after you've learnt (almost) all the graph-related algorithms.

There are some common patterns / themes in how real-world problems are solved using graphs, and they're often natural optimisations to

Instead of running an algorithm (e.g. Dijkstra) from multiple different nodes, consider whether you can create a dummy node (super-source) that connects to all the sources you want to run your algorithm from.
If you want to find the shortest distance from every node to a particular destination node, you don’t need to run Dijkstra from all the nodes! Just reverse the edges, and run Dijkstra from the destination node to get the distance between each node and the destination.
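When you want to maximise the sum of weights, consider whether you can negate the edge weights and use a minimisation algorithm. Keep in mind that the minimisation algorithm must then be able to handle negative weights (Dijkstra, for instance, cannot).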
A common theme of graph problems is to duplicate the graph or transform the graph in some way to get certain desirable properties (e.g. remove cycles from the graph). Consider whether transforming the graph helps solve your problem. For example, you can create multiple copies of each node to represent different "states" of your problem.
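
The first two tips can be sketched in a few lines. The code below is an illustration, not a reference implementation: it assumes a directed graph stored as a dict from node to a list of (neighbour, weight) pairs with non-negative weights, and node labels that can be compared with each other (e.g. all strings). The helper names and the "__super__" label are hypothetical.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path to node was already found
        for nxt, weight in adj.get(node, []):
            new_dist = d + weight
            if new_dist < dist.get(nxt, float("inf")):
                dist[nxt] = new_dist
                heapq.heappush(heap, (new_dist, nxt))
    return dist


def with_super_source(adj, sources, super_source="__super__"):
    """Copy of adj with a dummy node joined to every source by a zero-weight edge."""
    extended = {node: list(out) for node, out in adj.items()}
    extended[super_source] = [(s, 0) for s in sources]
    return extended


def reverse_edges(adj):
    """Copy of adj with the direction of every edge flipped."""
    rev = {node: [] for node in adj}
    for u, out in adj.items():
        for v, weight in out:
            rev.setdefault(v, []).append((u, weight))
    return rev


adj = {"a": [("b", 4)], "b": [("d", 1)], "c": [("d", 2)], "d": []}
# Distance from the nearest of the sources a and c, in a single run.
nearest = dijkstra(with_super_source(adj, ["a", "c"]), "__super__")
# Distance from every node to the destination d, in a single run.
to_d = dijkstra(reverse_edges(adj), "d")
```

In this example, nearest["d"] is 2 (reached via c) and to_d["a"] is 5 (a to b to d), each obtained from one run of Dijkstra instead of one run per node.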

