(k, d)-Trees
Last updated
Last updated
A kd-tree is another simple way to store geometric data in a tree (very useful for finding nearest neighbour problems). Let’s think about 2-dimensional data points, i.e., points (x,y) in the plane. The basic idea behind a kd-tree is that each node represents a rectangle of the plane. A node has two children which divide the rectangle into two pieces, either vertically or horizontally. For example, some node v in the tree may split the space vertically around the line x = 10: all the points with x-coordinates ≤ 10 go to the left child, and all the points with x-coordinates > 10 go to the right child.
Typically, a kd-tree will alternate splitting the space horizontally and vertically. For example, nodes at even levels split the space vertically and nodes at odd levels split the space horizon-tally. This helps to ensure that the data is well divided, no matter which dimension is more important. All the points are stored at the leaves. When you have a region with only one node, instead of dividing further, simply create a leaf.
Here is an example of a kd-tree that contains 10 points:
How do you search for a point in a kd-tree? What is the running time?
Start at the root. At each node, there is a horizontal or a vertical split. If it is a horizontal split, then compare the x-coordinate to the split value, and branch left or right. Similarly, for a vertical split. The running time is just , the height of the tree.
You are given an (unordered) array of points. What would be a good way to build a kd-tree? Think about what would keep the tree nicely balanced. What is the running time of the construction algorithm?
Solution: Basic approach: We can think of the construction recursively. At a given node, we have a set of points, and we need to split it horizontally or vertically. (We have no choice: that depends on whether it is an even or odd level.) Therefore, you might sort the data by the x or y coordinate (depending on whether it is a horizontal or vertical split), choose the median as the split value, and then partition the points among the left and right children. The running time of this is ), since you spend at every level of the tree to do the partitioning, i.e., the recurrence is
How to do better: Instead of sorting at every level, we could either (1) choose a random split key, or (2) Use QuickSelect to find the Median. Then, the partitioning step is only , and so the total cost is
How would you find the element with the minimum (or maximum) x-coordinate in a kd-tree? How expensive can it be, if the tree is perfectly balanced?
Solution: To find the minimum, if you are at a horizontal split, it is easy: simply recurse on the left child. But, if you are at a vertical node, you have to recurse on both children, since the minimum could be in either the top half or the bottom half. (Write out the recursive pseudocodde.) To find the running time, let’s look at the recurrence from taking two steps of the search (one horizontal and one vertical): . At each step down the tree, the number of points divides in half, i.e., after one step and after two steps. After two steps of the search, there are two more recursive searches to do. Solving this recurrence, you get a recursion tree that is depth , each node has cost and there are nodes in that tree (these many nodes are candidates to be considered for the minimum x-coordinate), so the total cost is