Prize Collecting
Last updated
We know that finding a longest path in a general graph is an NP-hard problem. But finding the heaviest path of bounded length, where length is defined as the number of hops (the number of edges in the path), is not.
Problem: Given a directed weighted graph (with possibly positive and negative cycles), find the maximum weight path containing at most k edges.
This can also be framed as a prize-collecting problem: if each edge weight represents a prize and you have k units of energy left, find the maximum amount of prizes you can collect by walking around the graph (you can repeat edges and gain the prize again). We shall now refer to this as the prize-collecting problem.
(As an aside, to detect positive weight cycles, you can negate the edges and run Bellman-Ford and see if it converges after |V| - 1 iterations. If it does not, the graph contains a positive weight cycle.)
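This check can be sketched as follows; a minimal sketch assuming the graph is given as a list of (u, v, w) edge triples over nodes 0 .. n - 1 (these names are illustrative, not fixed by the notes):

```python
def has_positive_cycle(num_nodes, edges):
    """Detect a positive-weight cycle by negating edges and running
    Bellman-Ford for |V| - 1 rounds, then checking for convergence."""
    # Negate the weights: a positive cycle becomes a negative cycle.
    neg = [(u, v, -w) for (u, v, w) in edges]
    # Start every node at distance 0 so cycles anywhere are reachable.
    dist = [0] * num_nodes
    for _ in range(num_nodes - 1):  # |V| - 1 relaxation rounds
        for u, v, w in neg:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed, Bellman-Ford did not converge:
    # there is a negative cycle in the negated graph, i.e. a positive
    # cycle in the original graph.
    return any(dist[u] + w < dist[v] for u, v, w in neg)
```

For example, a 3-cycle with weights 2, 2, 2 is reported as a positive cycle, while a 2-cycle with total weight -1 is not.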
Transform the input graph into a DAG. How? Make k + 1 copies of each node (duplicate the graph k + 1 times) and draw edges to connect nodes between 2 adjacent copies if there is an edge between those nodes in the original graph. So, all edges point right and the path length is at most k since each edge takes you from one copy to the next. An example is shown below:
Obviously there cannot be any cycles since an edge only points to a node in the next graph (hence, it forms a DAG).
Observe that our newly transformed graph contains (k + 1)V nodes and kE edges.
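A minimal sketch of this transform, assuming the same (u, v, w) edge-list representation and indexing the copies as levels 0 .. k (the (node, level) keying is just one convenient choice):

```python
def build_layered_dag(num_nodes, edges, k):
    """Return an adjacency dict mapping (u, i) -> list of ((v, i+1), w):
    the copy of u in level i points to the copy of v in level i + 1."""
    adj = {(v, i): [] for v in range(num_nodes) for i in range(k + 1)}
    for i in range(k):  # only levels 0 .. k-1 have outgoing edges
        for u, v, w in edges:
            adj[(u, i)].append(((v, i + 1), w))
    return adj
```

Counting keys and list entries gives (k + 1)V nodes and kE edges.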
Then the problem becomes equivalent to finding the longest path (in terms of sum of weights) in the DAG. We are certain that the maximum path length is k. So, we can run the DAG SSSP with negated edges for each source (to see where we should start from).
Because of the obvious symmetry of the graph (since each copy is a duplicate of the others), we only need to run SSSP from one set of nodes (i.e., all nodes of the first copy). We don't need to run it from the nodes of every copy, even though we are allowed to have paths of length less than k, because in the end, we will look at all the nodes and take the maximum value.
This will take O(kV(V + E)) time since we are running DAG_SSSP (which takes O(k(V + E)) time on the new graph for each source) V times. (Whenever you transform a graph, do not forget to recompute the number of nodes and edges in the new graph.)
We can optimize this by creating a dummy node to act as the super source to run the DAG_SSSP from. We connect this dummy node (with zero-weight edges) to all the nodes of the first copy from which we want to find the heaviest paths (maximum prizes). This is a very common technique to use when you are running an algorithm multiple times for different sources: just create a dummy node. Below, the blue node is the "super source" dummy node:
If we use a dummy node, we are running DAG_SSSP only once from the dummy node (to find the heaviest path in the graph from all the source vertices), which takes O(k(V + E)) time because you have added V more edges to connect the dummy node, plus O(kV) time to look at each node in the end to find the maximum prize we can collect. We need to look at all the nodes and not just the last level of nodes because we are allowed to travel fewer than k hops as per the problem. So, the total running time becomes O(kE) (assuming the graph is connected, so E >= V - 1).
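Since the levels of the layered graph are already in topological order, the whole pipeline (super source, negated edges, DAG shortest paths, scan of all levels) can be sketched without materialising the layered graph explicitly. This is an illustrative sketch, with dist[i][v] standing for the shortest negated distance from the dummy source to v's copy in level i:

```python
def heaviest_path_at_most_k(num_nodes, edges, k):
    """Max total prize over paths with at most k edges, via negated-edge
    shortest paths on the (implicit) layered DAG."""
    INF = float("inf")
    dist = [[INF] * num_nodes for _ in range(k + 1)]
    # The dummy super source reaches every level-0 copy with weight 0.
    dist[0] = [0] * num_nodes
    for i in range(1, k + 1):  # relax level by level (topological order)
        for u, v, w in edges:
            if dist[i - 1][u] != INF and dist[i - 1][u] - w < dist[i][v]:
                dist[i][v] = dist[i - 1][u] - w  # negated edge weight
    # Scan every level, not just the last: fewer than k hops is allowed.
    return max(-d for row in dist for d in row if d != INF)
```

On a 2-cycle with both edge weights 5 and k = 3, this walks back and forth to collect 15; with only negative edges available, it collects 0 by taking no edge at all.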
If you know the optimal solution for t - 1 steps, then it is easy to compute the optimal solution for t steps.
Define D(v, t) to be the maximum prize you can collect starting at node v and travelling exactly t steps (notice how we modified our subproblem by ensuring exactly t steps rather than at most; this makes it easier to reason about later. If you had kept it "at most", you wouldn't know how many steps you had actually taken and so, you wouldn't be able to use that to solve larger subproblems and fill the memo table correctly).
Trivially, our base case is D(v, 0) = 0 for all nodes v. That is, if you cannot travel anymore, you cannot win any more prizes.
The crucial recurrence relation is:

D(v, t) = max over all edges (v, u) in E of { w(v, u) + D(u, t - 1) }

where 1 <= t <= k and w(v, u) is the cost of the edge from v to u.
In other words, if you know the maximum prize you can win with t - 1 steps left from each of the neighbours, just take the maximum over all the neighbours, adding the prize won by travelling to each neighbour.
For every node, you know that the maximum you can earn in 0 steps is 0.
Then for each node, you check how much you can collect if you have exactly 1 step: look at all its neighbours and pick the maximum weight edge.
Then for each node (say, v), you check how much you can collect in exactly 2 steps: look at all the neighbours, and find the neighbour u for which w(v, u) + D(u, 1) is maximum among all neighbours of v. This means starting from v, if you have to travel exactly 2 steps, you should go from v to u and then from u to some other node (depending on the neighbours of u).
In other words, for each node, calculate how much you would collect in exactly t steps if you went to each of your neighbours (essentially, just brute force). But you have already computed how much you can collect from your neighbours in t - 1 steps, so you don't need to recompute - this is exactly why we fill up the memo table from bottom to top, in increasing order of t for all the nodes. This ensures that we have all the information we need to make the right decision. We look at all possible solutions and pick the best one at every step - so, this is brute force implemented using DP to avoid the unnecessary recomputation.
Notice that we find the maximum in the entire memo table and not just the last row because the number of edges in the longest path needs to be at most k and not necessarily equal to k.
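The bottom-up fill described above can be sketched directly from the recurrence; D[t][v] is the memo table, and the names are again illustrative:

```python
def prize_collect_dp(num_nodes, edges, k):
    """D[t][v] = maximum prize collectable starting at v in exactly
    t steps; the answer is the maximum over the entire table."""
    NEG_INF = float("-inf")
    D = [[NEG_INF] * num_nodes for _ in range(k + 1)]
    D[0] = [0] * num_nodes  # base case: with 0 steps you win nothing
    for t in range(1, k + 1):  # increasing t: smaller subproblems first
        for v, u, w in edges:  # edge v -> u carrying prize w
            if D[t - 1][u] != NEG_INF:  # u must admit t - 1 more steps
                D[t][v] = max(D[t][v], w + D[t - 1][u])
    # Maximum over the whole table, since at most k steps are allowed.
    return max(max(row) for row in D)
```

Note how each row of this table plays the role of one level of the transformed graph from the earlier approach.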
Realize that the main part of this problem was to spot the subproblems and identify the correct recurrence relation - once you do that, the problem is essentially solved.
There are O(kV) subproblems: each node has to store k + 1 possible values regarding the prizes you can win starting from that node if you travel exactly t steps, for t = 0, ..., k. In the worst case, each node is connected to all other nodes and we need to look at all other nodes, for every node, k times, giving us an O(kV^2) complexity. But, on closer inspection, this is a pessimistic bound that is only tight for very dense graphs. In general, the bound is pretty loose.
A more detailed analysis is as follows:
Each edge is looked at k times, once each time we determine a node's value for a given step count using the neighbour's value. In terms of a table view, think of a 2-D table. There are k + 1 rows and V columns. Each row is constructed based on the previous row (i.e., row t requires row t - 1's answers). Moreover, while filling any row, we look at exactly E entries of the table (in total for the entire row) since we look precisely at each of the neighbours of every node, which adds up to E edges when we consider all the nodes. So, it takes us about O(kE) to fill the table. Then, to find the maximum value in the table, it takes us O(kV) (the size of the table) time. Overall, the time complexity becomes O(k(V + E)).