blockingpy.nnd_blocker.NNDBlocker
- class blockingpy.nnd_blocker.NNDBlocker[source]
A blocker class that uses the Nearest Neighbor Descent (NND) algorithm.
This class implements blocking functionality using the pynndescent library’s NNDescent algorithm for efficient approximate nearest neighbor search.
- Parameters:
None
- index
The NNDescent index used for querying
- Type:
pynndescent.NNDescent or None
See also
BlockingMethod: Abstract base class defining the blocking interface
pynndescent.NNDescent: The underlying nearest neighbor descent implementation
Notes
For more details about the algorithm and implementation, see:
https://pynndescent.readthedocs.io/en/latest/api.html
https://github.com/lmcinnes/pynndescent
Methods
__init__(): Initialize the NNDBlocker instance.
block(x, y, k, verbose, controls): Perform blocking using the NND algorithm.
- block(x, y, k, verbose, controls)[source]
Perform blocking using the NND algorithm.
- Parameters:
x (DataHandler) – Reference dataset containing features for indexing
y (DataHandler) – Query dataset to find nearest neighbors for
k (int) – Number of nearest neighbors to find
verbose (bool, optional) – If True, print detailed progress information
controls (dict) –
Algorithm control parameters with the following structure:
{
    'random_seed': int,
    'nnd': {
        'metric': str,
        'k_search': int,
        'metric_kwds': dict,
        'n_threads': int,
        'tree_init': bool,
        'n_trees': int,
        'leaf_size': int,
        'pruning_degree_multiplier': float,
        'diversify_prob': float,
        'init_graph': array-like or None,
        'init_dist': array-like or None,
        'low_memory': bool,
        'max_candidates': int,
        'max_rptree_depth': int,
        'n_iters': int,
        'delta': float,
        'compressed': bool,
        'parallel_batch_queries': bool,
        'epsilon': float
    }
}
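A controls dictionary matching the structure above might be assembled as in the following sketch. The values are illustrative assumptions, not documented defaults, and the helper unknown_nnd_keys is hypothetical (it is not part of blockingpy). It only checks key names against the list documented here. Whether omitted keys fall back to defaults is not stated in this section.

```python
# Keys documented for the 'nnd' sub-dictionary of `controls`.
DOCUMENTED_NND_KEYS = {
    "metric", "k_search", "metric_kwds", "n_threads", "tree_init",
    "n_trees", "leaf_size", "pruning_degree_multiplier", "diversify_prob",
    "init_graph", "init_dist", "low_memory", "max_candidates",
    "max_rptree_depth", "n_iters", "delta", "compressed",
    "parallel_batch_queries", "epsilon",
}


def unknown_nnd_keys(controls):
    """Hypothetical helper: return 'nnd' keys that are not documented above."""
    return set(controls.get("nnd", {})) - DOCUMENTED_NND_KEYS


# Illustrative values only (assumptions, not the library's defaults).
controls = {
    "random_seed": 42,
    "nnd": {
        "metric": "euclidean",
        "k_search": 30,
        "n_trees": 5,
        "n_iters": 10,
        "epsilon": 0.1,
    },
}

# An empty set means every key name matches the documented structure.
typos = unknown_nnd_keys(controls)
```

Checking key names before calling block can catch misspellings such as 'n_tree' early, instead of leaving them to surface inside the blocker.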
- Returns:
DataFrame containing the blocking results with columns:
- 'y': indices from the query dataset
- 'x': indices of matched items from the reference dataset
- 'dist': distances to matched items
- Return type:
pandas.DataFrame
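To make the result layout concrete, the sketch below produces rows with the same three columns using an exact brute-force search in plain Python. This is not how NNDBlocker works internally (it relies on pynndescent's approximate search), and the function name brute_force_block is hypothetical. It assumes Euclidean distance and a single nearest neighbor per query row.

```python
import math


def brute_force_block(x_rows, y_rows):
    """Exact 1-nearest-neighbor search, one result row per query item.

    Each row uses the documented column names: 'y' (query index),
    'x' (matched reference index), and 'dist' (distance to the match).
    """
    results = []
    for y_idx, y_vec in enumerate(y_rows):
        # Euclidean distance from this query row to every reference row.
        dists = [math.dist(y_vec, x_vec) for x_vec in x_rows]
        # Index of the closest reference row.
        x_idx = min(range(len(dists)), key=dists.__getitem__)
        results.append({"y": y_idx, "x": x_idx, "dist": dists[x_idx]})
    return results


# Toy data: three reference points and two query points.
x_rows = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
y_rows = [(0.9, 1.2), (4.0, 5.0)]
rows = brute_force_block(x_rows, y_rows)
```

Each dictionary in rows corresponds to one row of the returned DataFrame: the first query point pairs with reference index 1 and the second with reference index 2.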
Notes
The algorithm builds an approximate nearest neighbor index using random projection trees and neighbor descent. Approximation quality is controlled at index-build time by parameters such as n_trees and n_iters, and at query time by epsilon.