- class sklearn.neighbors.KDTree(X, leaf_size=40, metric='minkowski', **kwargs)
KDTree for fast generalized N-point problems
Read more in the User Guide.
- Parameters:
- X: array-like of shape (n_samples, n_features)
n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles then data will not be copied. Otherwise, an internal copy will be made.
- leaf_size: positive int, default=40
Number of points at which to switch to brute-force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size, a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size, except in the case that n_samples < leaf_size.
- metric: str or DistanceMetric64 object, default='minkowski'
Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. A list of valid metrics for KDTree is given by the attribute valid_metrics. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for more information on any distance metric.
- Additional keywords are passed to the distance metric class.
- Note: Callable functions in the metric parameter are NOT supported for KDTree and Ball Tree. Function call overhead will result in very poor performance.
- Attributes:
- data: memory view
The training data
- valid_metrics: list of str
List of valid distance metrics.
Examples
Query for k-nearest neighbors
>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> dist, ind = tree.query(X[:1], k=3)
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0. 0.19662693 0.29473397]
Pickle and unpickle a tree. Note that the state of the tree is saved in the pickle operation: the tree does not need to be rebuilt upon unpickling.
>>> import numpy as np
>>> import pickle
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> s = pickle.dumps(tree)
>>> tree_copy = pickle.loads(s)
>>> dist, ind = tree_copy.query(X[:1], k=3)
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0. 0.19662693 0.29473397]
Query for neighbors within a given radius
>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
3
>>> ind = tree.query_radius(X[:1], r=0.3)
>>> print(ind)  # indices of neighbors within distance 0.3
[3 0 1]
Compute a gaussian kernel density estimate:
>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = KDTree(X)
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([ 6.94114649, 7.83281226, 7.2071716 ])
Compute a two-point auto-correlation function
>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = KDTree(X)
>>> tree.two_point_correlation(X, r)
array([ 30, 62, 278, 580, 820])
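Check the effect of leaf_size and of a non-default metric. The sketch below is illustrative rather than authoritative: the data set size, the leaf_size values, and the choice of "chebyshev" (assumed here to be among the entries of valid_metrics) are arbitrary. It builds two trees that differ only in leaf_size and confirms that they return the same distances, then builds a third tree with a different metric.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((200, 3))  # 200 points in 3 dimensions

# Two trees over the same data that differ only in leaf_size.
small_leaves = KDTree(X, leaf_size=2)
large_leaves = KDTree(X, leaf_size=50)
dist_small, _ = small_leaves.query(X[:5], k=4)
dist_large, _ = large_leaves.query(X[:5], k=4)

# leaf_size is a speed/memory trade-off, so the distances should agree.
same_distances = np.allclose(dist_small, dist_large)

# A tree using the Chebyshev (maximum coordinate difference) metric
# instead of the default Minkowski metric with p = 2.
cheb_tree = KDTree(X, metric="chebyshev")
dist_cheb, _ = cheb_tree.query(X[:5], k=4)

print(same_distances)
print(dist_cheb.shape)
```

Because the Chebyshev distance between two points never exceeds their Euclidean distance, each k-th neighbor distance from cheb_tree is at most the corresponding Euclidean one.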
Methods
get_arrays()
Get data and node arrays.
get_n_calls()
Get number of calls.
get_tree_stats()
Get tree status.
kernel_density(X,h[,kernel,atol,rtol,...])
Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.
query(X[,k,return_distance,dualtree,...])
query the tree for the k nearest neighbors
query_radius(X,r[,return_distance,...])
query the tree for neighbors within a radius r
reset_n_calls()
Reset number of calls to 0.
two_point_correlation(X,r[,dualtree])
Compute the two-point correlation function
- get_arrays()
Get data and node arrays.
- Returns:
- arrays: tuple of array
Arrays for storing tree data, index, node data and node bounds.
- get_n_calls()
Get number of calls.
- Returns:
- n_calls: int
number of distance computation calls
- get_tree_stats()
Get tree status.
- Returns:
- tree_stats: tuple of int
(number of trims, number of leaves, number of splits)
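The introspection methods can be combined as in the following sketch. It assumes, following the descriptions above, that get_arrays() returns four arrays in the order data, index, node data, node bounds; the variable names and the data set are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))
tree = KDTree(X, leaf_size=5)

# Unpack the arrays that back the tree (names chosen for this example).
data, index, node_data, node_bounds = tree.get_arrays()

# Count the distance computations performed by one query.
tree.reset_n_calls()
tree.query(X[:10], k=3)
calls_after_query = tree.get_n_calls()

# Statistics reported as (number of trims, number of leaves, number of splits).
n_trims, n_leaves, n_splits = tree.get_tree_stats()

# Resetting puts the call counter back to 0.
tree.reset_n_calls()
calls_after_reset = tree.get_n_calls()

print(calls_after_query, (n_trims, n_leaves, n_splits), calls_after_reset)
```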
- kernel_density(X, h, kernel='gaussian', atol=0, rtol=1E-8, breadth_first=True, return_log=False)
Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.
- Parameters:
- X: array-like of shape (n_samples, n_features)
An array of points to query. Last dimension should match dimension of training data.
- h: float
The bandwidth of the kernel.
- kernel: str, default="gaussian"
Specify the kernel to use. Options are 'gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', and 'cosine'. Default is kernel = 'gaussian'.
- atol: float, default=0
Specify the desired absolute tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies abs(K_true - K_ret) < atol + rtol * K_ret. The default is zero (i.e. machine precision).
- rtol: float, default=1e-8
Specify the desired relative tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies abs(K_true - K_ret) < atol + rtol * K_ret. The default is 1e-8.
- breadth_first: bool, default=True
If True (default), use a breadth-first search. If False, use a depth-first search. Breadth-first is generally faster for compact kernels and/or high tolerances.
- return_log: bool, default=False
Return the logarithm of the result. This can be more accurate than returning the result itself for narrow kernels.
- Returns:
- density: ndarray of shape X.shape[:-1]
The array of (log)-density evaluations.
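The parameters above can be exercised as in this sketch, which is an illustration under assumed values (bandwidth 0.2, 100 random points) rather than a recommended configuration. It evaluates the same density directly and in log form, switches to a compact kernel, and loosens rtol.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(42)
X = rng.random_sample((100, 3))
tree = KDTree(X)
query_points = X[:5]

# Density and log-density for the same kernel and bandwidth.
density = tree.kernel_density(query_points, h=0.2, kernel="gaussian")
log_density = tree.kernel_density(
    query_points, h=0.2, kernel="gaussian", return_log=True
)

# A compact kernel: only points within the bandwidth contribute.
tophat = tree.kernel_density(query_points, h=0.2, kernel="tophat")

# A looser relative tolerance trades accuracy for less work.
loose = tree.kernel_density(query_points, h=0.2, kernel="gaussian", rtol=1e-2)

print(density.shape)
print(np.allclose(np.log(density), log_density, atol=1e-6))
```

The log form is mainly useful when the kernel is so narrow that the plain density would underflow.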
- query(X, k=1, return_distance=True, dualtree=False, breadth_first=False, sort_results=True)
query the tree for the k nearest neighbors
- Parameters:
- X: array-like of shape (n_samples, n_features)
An array of points to query.
- k: int, default=1
The number of nearest neighbors to return.
- return_distance: bool, default=True
If True, return a tuple (d, i) of distances and indices. If False, return array i.
- dualtree: bool, default=False
If True, use the dual tree formalism for the query: a tree is built for the query points, and the pair of trees is used to efficiently search this space. This can lead to better performance as the number of points grows large.
- breadth_first: bool, default=False
If True, then query the nodes in a breadth-first manner. Otherwise, query the nodes in a depth-first manner.
- sort_results: bool, default=True
If True, then distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order.
- Returns:
- i: if return_distance == False
- (d, i): if return_distance == True
- d: ndarray of shape X.shape[:-1] + (k,), dtype=double
Each entry gives the list of distances to the neighbors of the corresponding point.
- i: ndarray of shape X.shape[:-1] + (k,), dtype=int
Each entry gives the list of indices of neighbors of the corresponding point.
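The query options can be compared side by side, as in the following sketch. The data and the values of k and leaf_size are arbitrary choices for illustration; the point is that return_distance, dualtree and breadth_first change what is returned or how the search proceeds, not which neighbors are found.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((100, 3))
tree = KDTree(X, leaf_size=10)

# Default call: distances and indices, sorted by increasing distance.
dist, ind = tree.query(X[:5], k=3)

# Indices only.
ind_only = tree.query(X[:5], k=3, return_distance=False)

# Same search using the dual tree formalism.
dist_dual, ind_dual = tree.query(X[:5], k=3, dualtree=True)

# Same search, visiting nodes breadth-first.
dist_bf, ind_bf = tree.query(X[:5], k=3, breadth_first=True)

print(dist.shape, ind.shape)
print(ind[:, 0])  # each query point is a training point, so it finds itself first
```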
- query_radius(X, r, return_distance=False, count_only=False, sort_results=False)
query the tree for neighbors within a radius r
- Parameters:
- X: array-like of shape (n_samples, n_features)
An array of points to query.
- r: distance within which neighbors are returned
r can be a single value, or an array of values of shape X.shape[:-1] if different radii are desired for each point.
- return_distance: bool, default=False
If True, return distances to neighbors of each point. If False, return only neighbors. Note that unlike the query() method, setting return_distance=True here adds to the computation time. Not all distances need to be calculated explicitly for return_distance=False. Results are not sorted by default: see the sort_results keyword.
- count_only: bool, default=False
If True, return only the count of points within distance r. If False, return the indices of all points within distance r. If return_distance == True, setting count_only=True will result in an error.
- sort_results: bool, default=False
If True, the distances and indices will be sorted before being returned. If False, the results will not be sorted. If return_distance == False, setting sort_results = True will result in an error.
- Returns:
- count: if count_only == True
- ind: if count_only == False and return_distance == False
- (ind, dist): if count_only == False and return_distance == True
- count: ndarray of shape X.shape[:-1], dtype=int
Each entry gives the number of neighbors within a distance r of the corresponding point.
- ind: ndarray of shape X.shape[:-1], dtype=object
Each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of a k-neighbors query, the returned neighbors are not sorted by distance by default.
- dist: ndarray of shape X.shape[:-1], dtype=object
Each element is a numpy double array listing the distances corresponding to indices in ind.
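A radius query with a different radius per point might look like the sketch below. The radii and data are arbitrary illustrative values. It shows the three return forms described above: counts only, indices only, and sorted (ind, dist) pairs.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((100, 3))
tree = KDTree(X, leaf_size=10)

# One radius per query point; shape must match X[:3].shape[:-1].
radii = np.array([0.2, 0.3, 0.4])

# Number of neighbors inside each radius.
counts = tree.query_radius(X[:3], r=radii, count_only=True)

# Indices of those neighbors, in no particular order.
ind = tree.query_radius(X[:3], r=radii)

# Indices and distances, sorted by increasing distance.
ind_sorted, dist_sorted = tree.query_radius(
    X[:3], r=radii, return_distance=True, sort_results=True
)

print(counts)
print([len(i) for i in ind])
```

Each element of ind and dist_sorted is its own array, since different points can have different numbers of neighbors.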
- reset_n_calls()
Reset number of calls to 0.
- two_point_correlation(X, r, dualtree=False)
Compute the two-point correlation function
- Parameters:
- X: array-like of shape (n_samples, n_features)
An array of points to query. Last dimension should match dimension of training data.
- r: array-like
A one-dimensional array of distances.
- dualtree: bool, default=False
If True, use a dualtree algorithm. Otherwise, use a single-tree algorithm. Dual tree algorithms can have better scaling for large N.
- Returns:
- counts: ndarray
counts[i] contains the number of pairs of points with distance less than or equal to r[i].
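To make the meaning of counts concrete, the following sketch compares the tree result against a brute-force count over all ordered pairs (query point, training point), including each point paired with itself when the query set is the training set. The brute-force part is an illustration written for this example, not part of the library, and the distances in r are arbitrary.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((30, 3))
r = np.array([0.2, 0.5, 0.8])

tree = KDTree(X)
counts = tree.two_point_correlation(X, r)
counts_dual = tree.two_point_correlation(X, r, dualtree=True)

# Brute force: full pairwise Euclidean distance matrix, then count
# the entries that are less than or equal to each radius.
diffs = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diffs ** 2).sum(axis=-1))
brute = np.array([(pairwise <= radius).sum() for radius in r])

print(counts)
print(brute)
```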