15.4. KD Trees — CS3 Data Structures & Algorithms (2024)

15.4.1. KD Trees

The kd tree is a modification to the BST that allows for efficient processing of multi-dimensional search keys. The kd tree differs from the BST in that each level of the kd tree makes branching decisions based on a particular search key associated with that level, called the discriminator. In principle, the kd tree could be used to unify key searching across any arbitrary set of keys, such as name and zipcode. But in practice, it is nearly always used to support search on multi-dimensional coordinates, such as locations in 2D or 3D space. We define the discriminator at level \(i\) to be \(i \bmod k\) for \(k\) dimensions. For example, assume that we store data organized by \(xy\)-coordinates. In this case, \(k\) is 2 (there are two coordinates), with the \(x\)-coordinate field arbitrarily designated key 0, and the \(y\)-coordinate field designated key 1. At each level, the discriminator alternates between \(x\) and \(y\). Thus, a node \(N\) at level 0 (the root) would have in its left subtree only nodes whose \(x\) values are less than \(N_x\) (because \(x\) is search key 0, and \(0 \bmod 2 = 0\)). The right subtree would contain nodes whose \(x\) values are greater than \(N_x\). A node \(M\) at level 1 would have in its left subtree only nodes whose \(y\) values are less than \(M_y\). There is no restriction on the relative values of \(M_x\) and the \(x\) values of \(M\)'s descendants, because branching decisions made at \(M\) are based solely on the \(y\)-coordinate. Figure 15.4.1 shows an example of how a collection of two-dimensional points would be stored in a kd tree.
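The level-by-level branching rule can be sketched as follows. This is a minimal illustration only; the class and method names here are not from the book's KD class.

```java
// Illustrative sketch of the discriminator rule: at level i, the
// branching decision uses key (i mod k).
class DiscriminatorDemo {
    static final int K = 2;   // number of dimensions: key 0 is x, key 1 is y

    // Discriminator used at a given level of the tree.
    static int discriminator(int level) { return level % K; }

    // True if a point with key "p" belongs in the left subtree of a
    // node with key "n" appearing at the given level.
    static boolean goesLeft(int[] p, int[] n, int level) {
        int d = discriminator(level);
        return p[d] < n[d];   // equal values branch right
    }
}
```

For the tree of Figure 15.4.1, point (15, 70) goes left of the root \(A\) at (40, 45) because its \(x\) value is smaller, while (69, 50) goes right regardless of its \(y\) value.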

Figure 15.4.1: Example of a kd tree.(a) The kd tree decomposition for a \(128 \times 128\)-unitregion containing seven data points.(b) The kd tree for the region of (a).

In Figure 15.4.1, the region containing the pointsis (arbitrarily) restricted to a \(128 \times 128\) square, andeach internal node splits the search space.Each split is shown by a line, vertical for nodes with\(x\) discriminators and horizontal for nodes with \(y\)discriminators.The root node splits the space into two parts;its children further subdivide the space into smaller parts.The children’s split lines do not cross the root’s split line.Thus, each node in the kd tree helps to decompose the space intorectangles that show the extent of where nodes can fall in thevarious subtrees.

Searching a kd tree for the record with a specified xy-coordinateis like searching a BST, except that each level of thekd tree is associated with a particular discriminator.

Example 15.4.1

Consider searching the kd tree for a record located at \(P = (69, 50)\). First compare \(P\) with the point stored at the root (record \(A\) in Figure 15.4.1). If \(P\) matches the location of \(A\), then the search is successful. In this example the positions do not match (\(A\)'s location (40, 45) is not the same as (69, 50)), so the search must continue. The \(x\) value of \(A\) is compared with that of \(P\) to determine in which direction to branch. Because \(A_x\)'s value of 40 is less than \(P\)'s \(x\) value of 69, we branch to the right subtree (all cities with \(x\) value greater than or equal to 40 are in the right subtree). \(A_y\) does not affect the decision on which way to branch at this level. At the second level, \(P\) does not match record \(C\)'s position, so another branch must be taken. However, at this level we branch based on the relative \(y\) values of point \(P\) and record \(C\) (because \(1 \bmod 2 = 1\), which corresponds to the \(y\)-coordinate). Because \(C_y\)'s value of 10 is less than \(P_y\)'s value of 50, we branch to the right. At this point, \(P\) is compared against the position of \(D\). A match is made and the search is successful.

If the search process reaches a NULL pointer, then that point is not contained in the tree. Here is a kd tree search implementation, equivalent to the findhelp function of the BST class. The KD class's private member D stores the number of dimensions for the key.

private E findhelp(KDNode<E> rt, int[] key, int level) {
  if (rt == null) return null;
  E it = rt.element();
  int[] itkey = rt.key();
  if ((itkey[0] == key[0]) && (itkey[1] == key[1]))
    return rt.element();
  if (itkey[level] > key[level])
    return findhelp(rt.left(), key, (level+1)%D);
  else
    return findhelp(rt.right(), key, (level+1)%D);
}

Inserting a new node into the kd tree is similar toBST insertion.The kd tree search procedure is followed until a NULL pointer isfound, indicating the proper place to insert the new node.
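The insertion logic can be sketched as follows. This is a hedged, stripped-down version (the book's KDNode stores an element along with its key; this sketch keeps only the key, and the class name is illustrative). Equal key values branch right, matching the search rule above.

```java
// Minimal sketch of kd tree insertion for two dimensions.
class KDInsertDemo {
    static final int D = 2;              // number of dimensions

    static class Node {
        int[] key;
        Node left, right;
        Node(int[] k) { key = k; }
    }

    // Follow the search path, branching on each level's discriminator,
    // until a null link is found; the new node is inserted there.
    static Node inserthelp(Node rt, int[] key, int level) {
        if (rt == null) return new Node(key);
        if (key[level] < rt.key[level])
            rt.left = inserthelp(rt.left, key, (level + 1) % D);
        else
            rt.right = inserthelp(rt.right, key, (level + 1) % D);
        return rt;
    }

    public static void main(String[] args) {
        Node root = null;
        root = inserthelp(root, new int[]{40, 45}, 0);  // record A
        root = inserthelp(root, new int[]{15, 70}, 0);  // record B
        root = inserthelp(root, new int[]{10, 50}, 0);  // Example 15.4.2
        // (10, 50): x = 10 < 40, so left to B; y = 50 < 70, so B's left child
        System.out.println(root.left.left.key[0]);      // prints 10
    }
}
```

Tracing Example 15.4.2 with this sketch: the point (10, 50) branches left of \(A\) on \(x\), then left of \(B\) on \(y\), and becomes \(B\)'s left child.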

Example 15.4.2

Inserting a record at location (10, 50) in thekd tree of Figure 15.4.1 first requires a searchto the node containing record \(B\).At this point, the new record is inserted into \(B\)’s leftsubtree.

Deleting a node from a kd tree is similar to deleting from a BST,but slightly harder.As with deleting from a BST, the first step is to find the node(call it \(N\)) to be deleted.It is then necessary to find a descendant of \(N\) which can beused to replace \(N\) in the tree.If \(N\) has no children, then \(N\) is replaced with aNULL pointer.Note that if \(N\) has one child that in turn has children, wecannot simply assign \(N\)’s parent to point to \(N\)’schild as would be done in the BST.To do so would change the level of all nodes in the subtree, and thusthe discriminator used for a search would also change.The result is that the subtree would no longer be a kd tree because anode’s children might now violate the BST property for thatdiscriminator.

Similar to BST deletion, the record stored in \(N\) should be replaced either by the record in \(N\)'s right subtree with the least value of \(N\)'s discriminator, or by the record in \(N\)'s left subtree with the greatest value for this discriminator. Assume that \(N\) was at an odd level and therefore \(y\) is the discriminator. \(N\) could then be replaced by the record in its right subtree with the least \(y\) value (call it \(Y_{min}\)). The problem is that \(Y_{min}\) is not necessarily the leftmost node, as it would be in the BST. A modified search procedure to find the least \(y\) value in the right subtree must be used to find it instead. The implementation for findmin is shown next. A recursive call to the delete routine will then remove \(Y_{min}\) from the tree. Finally, \(Y_{min}\)'s record is substituted for the record in node \(N\).

private KDNode<E> findmin(KDNode<E> rt, int descrim, int level) {
  KDNode<E> temp1, temp2;
  int[] key1 = null;
  int[] key2 = null;
  if (rt == null) return null;
  temp1 = findmin(rt.left(), descrim, (level+1)%D);
  if (temp1 != null) key1 = temp1.key();
  if (descrim != level) { // If not on the discriminator's level, search both subtrees
    temp2 = findmin(rt.right(), descrim, (level+1)%D);
    if (temp2 != null) key2 = temp2.key();
    if ((temp1 == null) || ((temp2 != null) && (key1[descrim] > key2[descrim]))) {
      temp1 = temp2;
      key1 = key2;
    }
  } // Now, temp1 has the smaller value
  int[] rtkey = rt.key();
  if ((temp1 == null) || (key1[descrim] > rtkey[descrim]))
    return rt;
  else
    return temp1;
}

In findmin, on levels using the minimum value’s discriminator,branching is to the left.On other levels, both children’s subtrees must be visited.Helper function min takes two nodes and a discriminator asinput, and returns the node with the smaller value in thatdiscriminator.
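The text mentions a helper function min but does not show it. A plausible sketch, under the assumption that it simply compares two (possibly null) nodes on one discriminator, might look like this (the node layout here is illustrative, not the book's KDNode):

```java
// Hypothetical sketch of the "min" helper described in the text:
// given two nodes, return the one with the smaller value in the
// given discriminator, treating null as "no candidate".
class MinHelperDemo {
    static class Node {
        int[] key;
        Node(int[] k) { key = k; }
    }

    static Node min(Node a, Node b, int descrim) {
        if (a == null) return b;
        if (b == null) return a;
        return (a.key[descrim] <= b.key[descrim]) ? a : b;
    }
}
```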

Note that we can replace the node to be deleted with the least-valued node from the right subtree only if the right subtree exists. If it does not, then a suitable replacement must be found in the left subtree. Unfortunately, it is not satisfactory to replace \(N\)'s record with the record having the greatest value for the discriminator in the left subtree, because this new value might be duplicated. If so, then we would have equal values for the discriminator in \(N\)'s left subtree, which violates the ordering rules for the kd tree. Fortunately, there is a simple solution to the problem. We first move the left subtree of node \(N\) to become the right subtree (i.e., we simply swap the values of \(N\)'s left and right child pointers). At this point, we proceed with the normal deletion process, replacing the record of \(N\) to be deleted with the record containing the least value of the discriminator from what is now \(N\)'s right subtree.
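Putting the pieces together, a deletion routine along the lines just described might look like the following. This is a sketch under stated assumptions (a minimal node type, two dimensions, distinct keys, and node relinking rather than record copying), not the book's implementation.

```java
// Sketch of kd tree deletion with the left/right subtree swap.
class KDDeleteDemo {
    static final int D = 2;

    static class Node {
        int[] key;
        Node left, right;
        Node(int[] k) { key = k; }
    }

    // Find the node with the least value in the given discriminator.
    static Node findmin(Node rt, int descrim, int level) {
        if (rt == null) return null;
        Node best = findmin(rt.left, descrim, (level + 1) % D);
        if (descrim != level) {      // must also search the right subtree
            Node r = findmin(rt.right, descrim, (level + 1) % D);
            if (best == null || (r != null && r.key[descrim] < best.key[descrim]))
                best = r;
        }
        if (best == null || rt.key[descrim] <= best.key[descrim]) return rt;
        return best;
    }

    static Node removehelp(Node rt, int[] key, int level) {
        if (rt == null) return null;
        if (rt.key[0] == key[0] && rt.key[1] == key[1]) {
            if (rt.right == null) {  // no right subtree: swap children so
                rt.right = rt.left;  // the replacement search can always
                rt.left = null;      // look in the right subtree
            }
            if (rt.right == null) return null;   // leaf: just unlink it
            Node min = findmin(rt.right, level, (level + 1) % D);
            rt.right = removehelp(rt.right, min.key, (level + 1) % D);
            min.left = rt.left;      // replacement takes over rt's position
            min.right = rt.right;
            return min;
        }
        if (key[level] < rt.key[level])
            rt.left = removehelp(rt.left, key, (level + 1) % D);
        else
            rt.right = removehelp(rt.right, key, (level + 1) % D);
        return rt;
    }
}
```

Deleting the root of a small tree with root (40, 45), left child (15, 70), and right child (70, 10) promotes (70, 10), the node with least \(x\) in the right subtree, to the root.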

Assume that we want to print out a list of all records that are within a certain distance \(d\) of a given point \(P\). We will use Euclidean distance, that is, point \(P\) is defined to be within distance \(d\) of point \(N\) if \(\sqrt{(P_x - N_x)^2 + (P_y - N_y)^2} \leq d\).[1]

If the search process reaches a node whose key value for the discriminator is more than \(d\) above the corresponding value in the search key, then it is not possible that any record in the right subtree can be within distance \(d\) of the search key, because all key values in that dimension are too great. Similarly, if the current node's key value for the discriminator is more than \(d\) below that of the search key, then no record in the left subtree can be within the radius. In such cases, the subtree in question need not be searched, potentially saving much time. In the average case, the number of nodes that must be visited during a range query is linear in the number of data records that fall within the query circle.

Example 15.4.3

We will now find all cities in the kd tree of Figure 15.4.2 within 25 units of the point (25, 65). The search begins with the root node, which contains record \(A\). Because (40, 45) is exactly 25 units from the search point, it will be reported. The search procedure then determines which branches of the tree to take. The search circle extends to both the left and the right of \(A\)'s (vertical) dividing line, so both branches of the tree must be searched. The left subtree is processed first. Here, record \(B\) is checked and found to fall within the search circle. Because the node storing \(B\) has no children, processing of the left subtree is complete. Processing of \(A\)'s right subtree now begins. The coordinates of record \(C\) are checked and found not to fall within the circle. Thus, it should not be reported. However, it is possible that cities within \(C\)'s subtrees could fall within the search circle even if \(C\) does not. As \(C\) is at level 1, the discriminator at this level is the \(y\)-coordinate. Because \(65 - 25 > 10\), no record in \(C\)'s left subtree (i.e., records above \(C\)) could possibly be in the search circle. Thus, \(C\)'s left subtree (if it had one) need not be searched. However, cities in \(C\)'s right subtree could fall within the circle. Thus, search proceeds to the node containing record \(D\). Again, \(D\) is outside the search circle. Because \(25 + 25 < 69\), no record in \(D\)'s right subtree could be within the search circle. Thus, only \(D\)'s left subtree need be searched. This leads to comparing record \(E\)'s coordinates against the search circle. Record \(E\) falls outside the search circle, and processing is complete. So we see that we only search subtrees whose rectangles intersect the search circle.

Figure 15.4.2: Searching in the kd tree of Figure 15.4.1.(a) The kd tree decomposition for a \(128 \times 128\)-unitregion containing seven data points.(b) The kd tree for the region of (a).

Here is an implementation for the region search method.

private void rshelp(KDNode<E> rt, int[] point, int radius, int lev) {
  if (rt == null) return;
  int[] rtkey = rt.key();
  if (InCircle(point, radius, rtkey))
    System.out.println(rt.element());
  if (rtkey[lev] > (point[lev] - radius))
    rshelp(rt.left(), point, radius, (lev+1)%D);
  if (rtkey[lev] < (point[lev] + radius))
    rshelp(rt.right(), point, radius, (lev+1)%D);
}

When a node is visited, function InCircle is used to check the Euclidean distance between the node's record and the query point. It is not enough to simply check that the differences between the \(x\)- and \(y\)-coordinates are each less than the query distance, because the record could still be outside the search circle, as illustrated by Figure 15.4.3.

Figure 15.4.3: Function InCircle must check the Euclidean distancebetween a record and the query point.It is possible for a record \(A\) to have \(x\)- and\(y\)-coordinates each within the query distance of the querypoint \(C\), yet have \(A\) itself lie outside the querycircle.
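A plausible version of InCircle (its exact signature in the KD class is an assumption) compares squared distances, avoiding the square root as the footnote suggests:

```java
// Sketch of an InCircle test for two dimensions.
class InCircleDemo {
    // True if coord lies within (or on) the circle of the given radius
    // centered at point.  Comparing squared values avoids the square root.
    static boolean inCircle(int[] point, int radius, int[] coord) {
        int dx = point[0] - coord[0];
        int dy = point[1] - coord[1];
        return dx * dx + dy * dy <= radius * radius;
    }
}
```

For the query of Example 15.4.3, record \(A\) at (40, 45) is exactly 25 units from (25, 65) and passes the test, while a hypothetical point such as (44, 84), whose \(x\)- and \(y\)-offsets are each under 25, still fails it, which is exactly the situation Figure 15.4.3 warns about.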


[1] A more efficient computation is \((P_x - N_x)^2 + (P_y - N_y)^2 \leq d^2\). This avoids performing a square root function.
