blockingpy.nnd_blocker.NNDBlocker

class blockingpy.nnd_blocker.NNDBlocker

A blocker class that uses the Nearest Neighbor Descent (NND) algorithm.

This class implements blocking functionality using the pynndescent library’s NNDescent algorithm for efficient approximate nearest neighbor search.
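The core idea of nearest neighbor descent is that "a neighbor of a neighbor is likely also a neighbor." The pure-Python sketch below illustrates that idea. It is a simplified version and not pynndescent's implementation. It uses Euclidean distance only, starts from a random graph instead of random projection trees, and skips the sampling and other optimizations a real library applies. The function name `nn_descent` is hypothetical.

```python
import math
import random


def nn_descent(points, k, n_iters=20, seed=0):
    """Simplified NN-Descent sketch (hypothetical helper): approximate k-NN graph.

    Returns one list per point of (distance, neighbour_index) pairs,
    sorted by distance, holding the k closest neighbours found so far.
    """
    rng = random.Random(seed)
    n = len(points)

    # 1. Start from a random graph: k distinct random neighbours per point.
    graph = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((math.dist(points[i], points[j]), j) for j in others))

    def try_insert(i, j):
        # Replace i's current worst neighbour with j if j is closer.
        if i == j or any(nb == j for _, nb in graph[i]):
            return 0
        d = math.dist(points[i], points[j])
        if d >= graph[i][-1][0]:
            return 0
        graph[i][-1] = (d, j)
        graph[i].sort()
        return 1

    for _ in range(n_iters):
        # 2. Candidate pool per point: its neighbours plus its reverse neighbours.
        pools = [{j for _, j in graph[i]} for i in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                pools[j].add(i)

        # 3. Local join: every pair inside a pool is tried as mutual neighbours.
        updates = 0
        for pool in pools:
            members = list(pool)
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    u, v = members[a], members[b]
                    updates += try_insert(u, v) + try_insert(v, u)

        # 4. Stop early once a full pass changes no neighbour list.
        if updates == 0:
            break
    return graph


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
graph = nn_descent(points, k=10)
print(graph[0][:3])  # three closest neighbours found for point 0
```

Each pass only compares points that share a candidate pool. Its cost therefore depends on pool sizes rather than on comparing all pairs of points. pynndescent bounds these pools through its `max_candidates` parameter.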

Parameters:

None

Attributes:

index

The NNDescent index used for querying

Type:

pynndescent.NNDescent or None

See also

BlockingMethod

Abstract base class defining the blocking interface

pynndescent.NNDescent

The underlying nearest neighbor descent implementation

Notes

For more details about the algorithm and implementation, see:

  • https://pynndescent.readthedocs.io/en/latest/api.html

  • https://github.com/lmcinnes/pynndescent

__init__()

Initialize the NNDBlocker instance.

Creates a new NNDBlocker with an empty index.

Methods

__init__()

Initialize the NNDBlocker instance.

block(x, y, k, verbose, controls)

Perform blocking using the NND algorithm.

block(x, y, k, verbose, controls)

Perform blocking using the NND algorithm.

Parameters:
  • x (DataHandler) – Reference dataset containing features for indexing

  • y (DataHandler) – Query dataset to find nearest neighbors for

  • k (int) – Number of nearest neighbors to find

  • verbose (bool, optional) – If True, print detailed progress information

  • controls (dict) –

    Algorithm control parameters with the following structure:

        {
            'random_seed': int,
            'nnd': {
                'metric': str,
                'k_search': int,
                'metric_kwds': dict,
                'n_threads': int,
                'tree_init': bool,
                'n_trees': int,
                'leaf_size': int,
                'pruning_degree_multiplier': float,
                'diversify_prob': float,
                'init_graph': array-like or None,
                'init_dist': array-like or None,
                'low_memory': bool,
                'max_candidates': int,
                'max_rptree_depth': int,
                'n_iters': int,
                'delta': float,
                'compressed': bool,
                'parallel_batch_queries': bool,
                'epsilon': float
            }
        }
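The sketch below shows a controls dict with the documented structure. It also defines a small deep-merge helper for changing a few settings without dropping the remaining keys. The values in `EXAMPLE_CONTROLS` are illustrative placeholders and are not BlockingPy's defaults. Both `EXAMPLE_CONTROLS` and `with_overrides` are hypothetical names and are not part of BlockingPy.

```python
import copy

# Illustrative controls dict following the documented structure.
# NOTE: values are placeholders, NOT BlockingPy's defaults.
EXAMPLE_CONTROLS = {
    "random_seed": 42,
    "nnd": {
        "metric": "euclidean",
        "k_search": 30,
        "metric_kwds": {},
        "n_threads": 1,
        "tree_init": True,
        "n_trees": 8,
        "leaf_size": 30,
        "pruning_degree_multiplier": 1.5,
        "diversify_prob": 1.0,
        "init_graph": None,
        "init_dist": None,
        "low_memory": True,
        "max_candidates": 60,
        "max_rptree_depth": 200,
        "n_iters": 10,
        "delta": 0.001,
        "compressed": False,
        "parallel_batch_queries": False,
        "epsilon": 0.1,
    },
}


def with_overrides(base, overrides):
    """Hypothetical helper: deep copy of `base` with nested `overrides` applied."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so {"nnd": {"metric": ...}} keeps the other "nnd" keys.
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


controls = with_overrides(EXAMPLE_CONTROLS, {"nnd": {"metric": "cosine", "epsilon": 0.2}})
```

A nested dict such as `metric_kwds` is merged key by key rather than replaced. To replace it wholesale, assign it directly on the copy.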

Returns:

DataFrame containing the blocking results with columns:

  • 'y': indices from the query dataset

  • 'x': indices of the matched items from the reference dataset

  • 'dist': distances to the matched items

Return type:

pandas.DataFrame
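The sketch below makes the result layout concrete. It produces the same three columns using an exact brute-force search, which can also serve as ground truth on small data. It is a dependency-free stand-in with several simplifications. It returns a list of dicts instead of a pandas DataFrame, takes plain coordinate tuples instead of DataHandler objects, and uses Euclidean distance only. `exact_knn_rows` is a hypothetical helper name.

```python
import math


def exact_knn_rows(x_points, y_points, k):
    """Hypothetical exact k-NN blocking: one row per (query, reference) pair.

    x_points: reference points (indexed side); y_points: query points.
    """
    rows = []
    for y_idx, query in enumerate(y_points):
        # Rank every reference point by Euclidean distance to this query.
        ranked = sorted(
            (math.dist(query, ref), x_idx) for x_idx, ref in enumerate(x_points)
        )
        # Keep the k closest, in the documented y / x / dist layout.
        for dist, x_idx in ranked[:k]:
            rows.append({"y": y_idx, "x": x_idx, "dist": dist})
    return rows


reference = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
queries = [(0.0, 1.0)]
rows = exact_knn_rows(reference, queries, k=2)
print(rows)
```

Passing `rows` to `pandas.DataFrame(rows)` produces a frame with `y`, `x` and `dist` columns.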

Notes

The algorithm builds an approximate nearest neighbor index by initializing a neighbor graph with random projection trees and refining it with nearest neighbor descent. The quality of the approximation can be tuned through parameters such as n_trees, n_iters, and epsilon. Larger values generally trade speed for accuracy.
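Approximation quality is commonly measured as recall: the share of the exact k nearest neighbor pairs that the approximate blocking also returns. The sketch below is a hypothetical helper, `recall_at_k`. It works on rows keyed by the documented `y` and `x` columns, for example a result DataFrame converted with `to_dict("records")`. Ties in distance can make the exact neighbor set ambiguous, which may lower recall slightly.

```python
def recall_at_k(approx_rows, exact_rows):
    """Hypothetical helper: share of exact (y, x) pairs also found approximately."""
    exact_pairs = {(row["y"], row["x"]) for row in exact_rows}
    approx_pairs = {(row["y"], row["x"]) for row in approx_rows}
    if not exact_pairs:
        return 1.0  # nothing to recover
    return len(exact_pairs & approx_pairs) / len(exact_pairs)


approx = [{"y": 0, "x": 1, "dist": 0.5}, {"y": 1, "x": 2, "dist": 0.7}]
exact = [{"y": 0, "x": 1, "dist": 0.5}, {"y": 1, "x": 0, "dist": 0.6}]
print(recall_at_k(approx, exact))  # 0.5
```

Recomputing recall while raising n_trees, n_iters or epsilon shows the speed/accuracy trade-off on a given dataset.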