🧑‍💻
CS2040S Data Structures and Algorithms
  • Algorithms
    • Overview
    • Time Complexity
    • Binary Search
  • Sorting Algorithms
  • Order Statistics
  • Interval Searching
  • Orthogonal Range Queries
  • Random Permutation Generation
  • Disjoint Set Union
  • Data Structures
    • Binary Search Tree
    • Trie
    • Hash (Symbol) Table
    • (a, b)-Trees
    • (k, d)-Trees
    • Heap
  • Graphs
    • Introduction
    • BFS and DFS
    • DAG and Topological Sort
    • SCC (Tarjan's and Kosaraju's)
    • SSSP (Bellman-Ford and Dijkstra)
    • MST (Prim's and Kruskal's)
  • Dynamic Programming
    • Concept
    • APSP Floyd-Warshall
    • Longest Increasing Subsequence
    • 0-1 Knapsack
    • Prize Collecting
    • Vertex Cover on a Tree
Powered by GitBook
On this page
  • Problem
  • Attempt 1 : Graph Modelling
  • Attempt 2 : Dynamic Programming
  • Faster, Faster, Faster

Was this helpful?

  1. Dynamic Programming

Longest Increasing Subsequence

PreviousAPSP Floyd-WarshallNext0-1 Knapsack

Last updated 8 months ago

Was this helpful?

Problem

Given a sequence of integers (say, an array), output the length of an increasing subsequence (not necessarily contiguous) of maximum length.

For example, if the input sequence was {8,3,6,4,5,7,7}\{8,3,6,4,5,7,7\}{8,3,6,4,5,7,7}, the answer would be 444 since the LIS would be {3,4,5,7}\{3,4,5,7\}{3,4,5,7}.

Attempt 1 : Graph Modelling

Think of the elements in the sequence as nodes. Draw a directed edge between the nodes (u,v)(u,v)(u,v) only if v>uv > uv>u and vvv occurs after uuu in the sequence. This forms a DAG. Notice that in this case the topological ordering is the order in which the elements occur in the sequence. The problem is now essentially to find the longest path in the DAG. We can run our DAG SSSP from every node with the edges reversed.

Our DAG SSSP takes O(V+E)O(V+E)O(V+E) time to run once. In the worst case, E=O(V2)E = O(V^2)E=O(V2) (think of an increasing sequence) and so, each SSSP takes O(n2)O(n^2)O(n2) where nnn is the length of our input sequence. We need to run this nnn times, and so our algorithm takes O(n3)O(n^3)O(n3).

Observe that we are recomputing a lot here! In particular, we relax each edge at most nnn times. Can we do better?

In fact, we can do much better. Let us try to think in terms of DP and break our problem into smaller subproblems.

The trivial case is when the sequence is of length 111. Then, we know that the LIS is of length 1.

For each outgoing edge (in reverse topological order of nodes), find the maximum LIS of the destination node and add 1. That gives you the LIS starting at that node.

Formally, define S[i]=LIS(A[i…n])S[i] = LIS(A[i \dots n])S[i]=LIS(A[i…n]) starting at A[i]A[i]A[i] where A[1,…n]A[1, \dots n]A[1,…n] is the input array. In terms of our table view, at each index iii of our table, we store the length of the longest increasing subsequence starting at index iii of the array. Our problem is to find the maximum entry in the table.

Our DP recurrence relation is:

  1. Base Case: S[n]=1S[n] = 1S[n]=1(length of LIS in an array of length 1 is 1)

  2. Recurrence: S[i]=(max(i,j)∈ES[j])+1S[i] = (max_{(i,j) \in E} S[j]) + 1S[i]=(max(i,j)∈E​S[j])+1.

Some pseudocode is as follows:

The running time of the above algorithm is essentially O(V+E)=O(n2)O(V+E) = O(n^2)O(V+E)=O(n2) (Be sure to be able to prove this)

Attempt 2 : Dynamic Programming

Let’s stop thinking about this as a graph now.

Define S[i]=LIS(A[1…i])S[i] = LIS(A[1\dots i])S[i]=LIS(A[1…i]) ending at A[i]A[i]A[i]

So, each index iii in the dp table stores the length of the LIS using the first iii elements.

Then,

  1. S[1]=1S[1] = 1S[1]=1 (length of LIS ending at first index)

  2. S[i]=(max(j<i, A[j]<A[i])S[j])+1S[i] = (max_{(j < i, \ A[j] < A[i])} S[j]) + 1S[i]=(max(j<i, A[j]<A[i])​S[j])+1 (Look at everything to your left which is smaller than you. Find the maximum length of LIS ending at all those indices. Add 1 to get your maximum length (since you are essentially extending the sequence). If there is nothing to the left of you that is smaller than you, you cannot extend any of those LISs and so, you need to start a new LIS and your value will be 1.

Note that “ending at index iii” necessarily means that iii is a part of the LIS. It does not include those LIS which come before it but do not contain this element itself (as then you wouldn’t know whether to extend this LIS or not since you don’t know the maximum element of the LIS) - To be able to extend an LIS, all you need to know is the current maximum element in the LIS. Here, index iii stores the maximum element of the LIS (since it ends at index iii). This is also why to determine the LIS of the entire array, you need to look at every element in the end again. We cannot be sure that the LIS contains the last element and so, we cannot simply look at the last index.

Some pseudocode:

Again, this algorithm is O(n2)O(n^2)O(n2), which should be fairly obvious by the 2 nested for-loops.

Another way of looking at it would be to realise that there are nnn subproblems (where a subproblem is defined as the length of the LIS ending at a particular index). A subproblem at index iii takes O(i)O(i)O(i) (since you need to look at all previously solved subproblems, and take the maximum). Then, ∑i=1nO(i)=O(n2)\sum_{i = 1}^n O(i) = O(n^2)∑i=1n​O(i)=O(n2).

The key invariant that we are maintaing is that at every iteration of the outer-loop, S[i]S[i]S[i] has the correct value of the maximum length of LIS ending at index iii.

Faster, Faster, Faster

It is possible to solve LIS in O(nlog⁡n)O(n\log n)O(nlogn) using binary search to solve subproblems faster. The key observation that allows us to use binary search is that as we extend the sequence, the length of the LIS can only increase or stay the same - hence, there is a monotonic behaviour of LIS.