Longest Increasing Subsequence
Last updated
Last updated
Given a sequence of integers (say, an array), output the length of an increasing subsequence (not necessarily contiguous) of maximum length.
For example, if the input sequence was , the answer would be since the LIS would be .
Think of the elements in the sequence as nodes. Draw a directed edge between the nodes only if and occurs after in the sequence. This forms a DAG. Notice that in this case the topological ordering is the order in which the elements occur in the sequence. The problem is now essentially to find the longest path in the DAG. We can run our DAG SSSP from every node with the edges reversed.
Our DAG SSSP takes time to run once. In the worst case, (think of an increasing sequence) and so, each SSSP takes where is the length of our input sequence. We need to run this times, and so our algorithm takes .
Observe that we are recomputing a lot here! In particular, we relax each edge at most times. Can we do better?
In fact, we can do much better. Let us try to think in terms of DP and break our problem into smaller subproblems.
The trivial case is when the sequence is of length . Then, we know that the LIS is of length 1.
For each outgoing edge (in reverse topological order of nodes), find the maximum LIS of the destination node and add 1. That gives you the LIS starting at that node.
Formally, define starting at where is the input array. In terms of our table view, at each index of our table, we store the length of the longest increasing subsequence starting at index of the array. Our problem is to find the maximum entry in the table.
Our DP recurrence relation is:
Base Case: (length of LIS in an array of length 1 is 1)
Recurrence: .
Some pseudocode is as follows:
The running time of the above algorithm is essentially (Be sure to be able to prove this)
Letās stop thinking about this as a graph now.
Define ending at
So, each index in the dp table stores the length of the LIS using the first elements.
Then,
(length of LIS ending at first index)
(Look at everything to your left which is smaller than you. Find the maximum length of LIS ending at all those indices. Add 1 to get your maximum length (since you are essentially extending the sequence). If there is nothing to the left of you that is smaller than you, you cannot extend any of those LISs and so, you need to start a new LIS and your value will be 1.
Note that āending at index ā necessarily means that is a part of the LIS. It does not include those LIS which come before it but do not contain this element itself (as then you wouldnāt know whether to extend this LIS or not since you donāt know the maximum element of the LIS) - To be able to extend an LIS, all you need to know is the current maximum element in the LIS. Here, index stores the maximum element of the LIS (since it ends at index ). This is also why to determine the LIS of the entire array, you need to look at every element in the end again. We cannot be sure that the LIS contains the last element and so, we cannot simply look at the last index.
Some pseudocode:
Again, this algorithm is , which should be fairly obvious by the 2 nested for-loops.
Another way of looking at it would be to realise that there are subproblems (where a subproblem is defined as the length of the LIS ending at a particular index). A subproblem at index takes (since you need to look at all previously solved subproblems, and take the maximum). Then, .
The key invariant that we are maintaing is that at every iteration of the outer-loop, has the correct value of the maximum length of LIS ending at index .
It is possible to solve LIS in using binary search to solve subproblems faster. The key observation that allows us to use binary search is that as we extend the sequence, the length of the LIS can only increase or stay the same - hence, there is a monotonic behaviour of LIS.