blockingpy.annoy_blocker.AnnoyBlocker

class blockingpy.annoy_blocker.AnnoyBlocker

Blocking with Spotify Annoy (Approximate Nearest Neighbors Oh Yeah).

__init__()

Methods

__init__()

block(x, y, k, verbose, controls)

Perform blocking using the Annoy algorithm.

Attributes

METRIC_MAP

METRIC_MAP = {'angular': 'angular', 'dot': 'dot', 'euclidean': 'euclidean', 'hamming': 'hamming', 'manhattan': 'manhattan'}

block(x, y, k, verbose, controls)

Perform blocking using the Annoy algorithm.

Parameters:
  • x (DataHandler) – Reference dataset containing features for indexing

  • y (DataHandler) – Query dataset to find nearest neighbors for

  • k (int) – Number of nearest neighbors to find

  • verbose (bool, optional) – If True, print detailed progress information

  • controls (dict) –

    Algorithm control parameters with the following structure:

    {
        'random_seed': int,
        'annoy': {
            'distance': str,
            'seed': int,
            'path': str,
            'n_trees': int,
            'build_on_disk': bool,
            'k_search': int
        }
    }
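As a rough illustration of the structure above, a controls dictionary might be assembled as in the following sketch. All values are illustrative assumptions, not library defaults, and the file name given for 'path' is hypothetical. The check against METRIC_MAP simply reuses the metric names listed under Attributes; whether the library validates the dictionary this way is not stated here.

```python
# Metric names copied from AnnoyBlocker.METRIC_MAP as listed under Attributes.
METRIC_MAP = {
    "angular": "angular",
    "dot": "dot",
    "euclidean": "euclidean",
    "hamming": "hamming",
    "manhattan": "manhattan",
}

# Illustrative values only (assumptions, not documented defaults).
controls = {
    "random_seed": 42,
    "annoy": {
        "distance": "angular",     # must be one of the METRIC_MAP keys
        "seed": 42,
        "path": "annoy_index.ann", # hypothetical file name
        "n_trees": 50,
        "build_on_disk": False,
        "k_search": 10,
    },
}

# Sanity checks on the shape of the dictionary before passing it to block().
expected_keys = {"distance", "seed", "path", "n_trees", "build_on_disk", "k_search"}
missing_keys = expected_keys - set(controls["annoy"])
metric_is_known = controls["annoy"]["distance"] in METRIC_MAP
```

Checking the dictionary up front keeps typos in key names or metric names from surfacing only after the index build has started.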

Returns:

DataFrame containing the blocking results with columns:

  • 'y': indices from the query dataset

  • 'x': indices of matched items from the reference dataset

  • 'dist': distances to the matched items

Return type:

pandas.DataFrame

Notes

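The method builds an Annoy index from the reference dataset (x) and then queries it for the k nearest neighbors of each point in the query dataset (y). Because Annoy is an approximate method, the returned neighbors are not guaranteed to be the exact nearest ones.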
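To make the shape of the result concrete, the sketch below uses an exact brute-force search in plain Python as a stand-in for Annoy. It is not the library's implementation: the function name brute_force_block and the toy data are hypothetical, Euclidean distance is chosen only for simplicity, and emitting one row per (query, neighbor) pair is an assumption about how the k neighbors map to rows.

```python
import math


def brute_force_block(x_rows, y_rows, k):
    """Exact k-NN stand-in: for each query row in y_rows, find the k closest rows in x_rows.

    Returns a list of dicts with the same keys as the documented result columns.
    """
    results = []
    for y_idx, query in enumerate(y_rows):
        # Euclidean distance from this query to every reference row.
        dists = [(math.dist(query, ref), x_idx) for x_idx, ref in enumerate(x_rows)]
        # Keep the k smallest distances (ties broken by reference index).
        for dist, x_idx in sorted(dists)[:k]:
            results.append({"y": y_idx, "x": x_idx, "dist": dist})
    return results


# Toy data (hypothetical): three reference points, two query points.
reference = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
queries = [[0.9, 0.1], [4.0, 4.0]]

rows = brute_force_block(reference, queries, k=1)
for row in rows:
    print(row)
```

An Annoy index replaces the inner loop over all reference rows with a tree-based search, trading exactness for speed on large datasets.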