sklearn.neighbors.KDTree (2024)

class sklearn.neighbors.KDTree(X, leaf_size=40, metric='minkowski', **kwargs)

KDTree for fast generalized N-point problems

Read more in the User Guide.

Parameters:
X : array-like of shape (n_samples, n_features)

n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles then data will not be copied. Otherwise, an internal copy will be made.

leaf_size : positive int, default=40

Number of points at which to switch to brute-force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size, a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size, except in the case that n_samples < leaf_size.

metric : str or DistanceMetric64 object, default='minkowski'

Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. A list of valid metrics for KDTree is given by the attribute valid_metrics. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for more information on any distance metric.

Additional keywords are passed to the distance metric class. Note: callable functions in the metric parameter are NOT supported for KDTree and BallTree. Function call overhead will result in very poor performance.
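As a quick illustration of these parameters (a sketch for this page, not part of the upstream docstring): leaf_size changes only speed and memory, never the answer, and metric may be any name listed in valid_metrics.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))

# Two trees over the same data with different leaf sizes return
# identical neighbors; only build/query cost differs.
tree_small = KDTree(X, leaf_size=2)
tree_large = KDTree(X, leaf_size=40)
d1, i1 = tree_small.query(X[:5], k=4)
d2, i2 = tree_large.query(X[:5], k=4)

# A non-default metric, e.g. Chebyshev (L-infinity) distance.
tree_cheb = KDTree(X, metric='chebyshev')
dist, ind = tree_cheb.query(X[:1], k=3)
```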
Attributes:
data : memory view

The training data

valid_metrics : list of str

List of valid distance metrics.
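A small sketch of these attributes (illustrative, not from the upstream docstring), assuming X is a C-contiguous array of doubles so that no internal copy is made:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = np.ascontiguousarray(rng.random_sample((10, 3)))  # C-contiguous doubles
tree = KDTree(X)

# `data` is a memory view over the training points; wrap it for ndarray ops.
data = np.asarray(tree.data)

# `valid_metrics` lists the metric names KDTree accepts.
metrics = tree.valid_metrics
```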

Examples

Query for k-nearest neighbors

>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> dist, ind = tree.query(X[:1], k=3)
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0.          0.19662693  0.29473397]

Pickle and unpickle a tree. Note that the state of the tree is saved in the pickle operation: the tree need not be rebuilt upon unpickling.

>>> import numpy as np
>>> import pickle
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> s = pickle.dumps(tree)
>>> tree_copy = pickle.loads(s)
>>> dist, ind = tree_copy.query(X[:1], k=3)
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0.          0.19662693  0.29473397]

Query for neighbors within a given radius

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
3
>>> ind = tree.query_radius(X[:1], r=0.3)
>>> print(ind)  # indices of neighbors within distance 0.3
[3 0 1]

Compute a Gaussian kernel density estimate:

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = KDTree(X)
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([ 6.94114649,  7.83281226,  7.2071716 ])

Compute a two-point auto-correlation function:

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = KDTree(X)
>>> tree.two_point_correlation(X, r)
array([ 30,  62, 278, 580, 820])

Methods

get_arrays()

Get data and node arrays.

get_n_calls()

Get number of calls.

get_tree_stats()

Get tree status.

kernel_density(X, h[, kernel, atol, rtol, ...])

Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.

query(X[, k, return_distance, dualtree, ...])

Query the tree for the k nearest neighbors.

query_radius(X, r[, return_distance, ...])

Query the tree for neighbors within a radius r.

reset_n_calls()

Reset number of calls to 0.

two_point_correlation(X, r[, dualtree])

Compute the two-point correlation function.

get_arrays()

Get data and node arrays.

Returns:
arrays: tuple of array

Arrays for storing tree data, index, node data and node bounds.
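A minimal sketch of get_arrays (illustrative; the exact contents of the returned arrays are implementation details of the tree):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((20, 3))
tree = KDTree(X, leaf_size=2)

# Four internal arrays: training data, index permutation,
# per-node data, and node bounds.
arrays = tree.get_arrays()
data, idx, node_data, node_bounds = arrays
```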

get_n_calls()

Get number of calls.

Returns:
n_calls: int

Number of distance computation calls.
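The call counter is typically used together with reset_n_calls; a sketch (not from the upstream docstring):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((20, 3))
tree = KDTree(X, leaf_size=2)

tree.reset_n_calls()        # zero the counter
tree.query(X[:1], k=3)
calls = tree.get_n_calls()  # distance computations recorded since the reset
```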

get_tree_stats()

Get tree status.

Returns:
tree_stats: tuple of int

(number of trims, number of leaves, number of splits)
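A sketch of reading the counters (illustrative; the counts accumulate as queries traverse the tree):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((20, 3))
tree = KDTree(X, leaf_size=2)
tree.query(X[:1], k=3)

# (n_trims, n_leaves, n_splits) accumulated while querying the tree.
stats = tree.get_tree_stats()
```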

kernel_density(X, h, kernel='gaussian', atol=0, rtol=1E-8, breadth_first=True, return_log=False)

Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.

Parameters:
X : array-like of shape (n_samples, n_features)

An array of points to query. Last dimension should match dimension of training data.

h : float

The bandwidth of the kernel.

kernel : str, default='gaussian'

Specify the kernel to use. Options are:
- 'gaussian'
- 'tophat'
- 'epanechnikov'
- 'exponential'
- 'linear'
- 'cosine'

atol : float, default=0

Specify the desired absolute tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies abs(K_true - K_ret) < atol + rtol * K_ret. The default is 0, so the accuracy is governed by rtol alone.

rtol : float, default=1e-8

Specify the desired relative tolerance of the result, with the same error bound as above. The default is 1e-8.

breadth_first : bool, default=True

If True (default), use a breadth-first search. If False, use a depth-first search. Breadth-first is generally faster for compact kernels and/or high tolerances.

return_log : bool, default=False

Return the logarithm of the result. This can be more accurate than returning the result itself for narrow kernels.

Returns:
density : ndarray of shape X.shape[:-1]

The array of (log-)density evaluations.
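A short check (a sketch, not part of the upstream docstring) that return_log=True returns the logarithm of the same estimate, which is the numerically safer choice for narrow kernels:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(42)
X = rng.random_sample((100, 3))
tree = KDTree(X)

dens = tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
# Exponentiating the log-density recovers the plain density.
log_dens = tree.kernel_density(X[:3], h=0.1, kernel='gaussian',
                               return_log=True)
```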

query(X, k=1, return_distance=True, dualtree=False, breadth_first=False, sort_results=True)

Query the tree for the k nearest neighbors.

Parameters:
X : array-like of shape (n_samples, n_features)

An array of points to query.

k : int, default=1

The number of nearest neighbors to return.

return_distance : bool, default=True

If True, return a tuple (d, i) of distances and indices; if False, return array i.

dualtree : bool, default=False

If True, use the dual tree formalism for the query: a tree is built for the query points, and the pair of trees is used to efficiently search this space. This can lead to better performance as the number of points grows large.

breadth_first : bool, default=False

If True, then query the nodes in a breadth-first manner. Otherwise, query the nodes in a depth-first manner.

sort_results : bool, default=True

If True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order.

Returns:
i : if return_distance == False
(d, i) : if return_distance == True

d : ndarray of shape X.shape[:-1] + (k,), dtype=double

Each entry gives the list of distances to the neighbors of the corresponding point.

i : ndarray of shape X.shape[:-1] + (k,), dtype=int

Each entry gives the list of indices of neighbors of the corresponding point.
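A quick sanity check (illustrative, not from the upstream docstring) that the single-tree and dual-tree traversals return the same sorted neighbors:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))
tree = KDTree(X, leaf_size=2)

# Default single-tree query vs. dual-tree query over the same points.
d1, i1 = tree.query(X, k=3)
d2, i2 = tree.query(X, k=3, dualtree=True)
```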

query_radius(X, r, return_distance=False, count_only=False, sort_results=False)

Query the tree for neighbors within a radius r.

Parameters:
X : array-like of shape (n_samples, n_features)

An array of points to query.

r : distance within which neighbors are returned

r can be a single value, or an array of values of shape x.shape[:-1] if different radii are desired for each point.

return_distance : bool, default=False

If True, return distances to neighbors of each point; if False, return only neighbors. Note that unlike the query() method, setting return_distance=True here adds to the computation time: not all distances need to be calculated explicitly for return_distance=False. Results are not sorted by default: see the sort_results keyword.

count_only : bool, default=False

If True, return only the count of points within distance r; if False, return the indices of all points within distance r. If return_distance==True, setting count_only=True will result in an error.

sort_results : bool, default=False

If True, the distances and indices will be sorted before being returned. If False, the results will not be sorted. If return_distance == False, setting sort_results = True will result in an error.

Returns:
count : if count_only == True
ind : if count_only == False and return_distance == False
(ind, dist) : if count_only == False and return_distance == True

count : ndarray of shape X.shape[:-1], dtype=int

Each entry gives the number of neighbors within a distance r of the corresponding point.

ind : ndarray of shape X.shape[:-1], dtype=object

Each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of a k-neighbors query, the returned neighbors are not sorted by distance by default.

dist : ndarray of shape X.shape[:-1], dtype=object

Each element is a numpy double array listing the distances corresponding to indices in ind.
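A sketch (not from the upstream docstring) of two behaviors described above: passing one radius per query point, and sorting results, which requires return_distance=True:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))
tree = KDTree(X, leaf_size=2)

# A different radius for each of the two query points.
ind = tree.query_radius(X[:2], r=np.array([0.1, 0.5]))

# Sorted output: distances come back in ascending order per point.
ind_s, dist_s = tree.query_radius(X[:1], r=0.5, return_distance=True,
                                  sort_results=True)
```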

reset_n_calls()

Reset number of calls to 0.

two_point_correlation(X, r, dualtree=False)

Compute the two-point correlation function

Parameters:
X : array-like of shape (n_samples, n_features)

An array of points to query. Last dimension should match dimension of training data.

r : array-like

A one-dimensional array of distances

dualtree : bool, default=False

If True, use a dualtree algorithm. Otherwise, use a single-tree algorithm. Dual tree algorithms can have better scaling for large N.

Returns:
counts : ndarray

counts[i] contains the number of pairs of points with distance less than or equal to r[i].
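A sketch (illustrative, not from the upstream docstring) checking that the single-tree and dual-tree algorithms count the same pairs, and that the counts are cumulative in r:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((30, 3))
tree = KDTree(X)
r = np.linspace(0, 1, 5)

# Dual-tree counting: same result, better scaling for large N.
counts = tree.two_point_correlation(X, r)
counts_dual = tree.two_point_correlation(X, r, dualtree=True)
```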
