blockingpy.annoy_blocker.AnnoyBlocker
- class blockingpy.annoy_blocker.AnnoyBlocker
Blocking with Spotify Annoy (Approximate Nearest Neighbors Oh Yeah).
Methods
- __init__()
- block(x, y, k, verbose, controls) – Perform blocking using the Annoy algorithm.
Attributes
- METRIC_MAP = {'angular': 'angular', 'dot': 'dot', 'euclidean': 'euclidean', 'hamming': 'hamming', 'manhattan': 'manhattan'}
- block(x, y, k, verbose, controls)
Perform blocking using the Annoy algorithm.
- Parameters:
x (DataHandler) – Reference dataset containing features for indexing
y (DataHandler) – Query dataset to find nearest neighbors for
k (int) – Number of nearest neighbors to find
verbose (bool, optional) – If True, print detailed progress information
controls (dict) – Algorithm control parameters with the following structure:
    {
        'random_seed': int,
        'annoy': {
            'distance': str,
            'seed': int,
            'path': str,
            'n_trees': int,
            'build_on_disk': bool,
            'k_search': int
        }
    }
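As an illustration of the structure above, a controls dictionary might be assembled and sanity-checked as in the following sketch. All values are arbitrary placeholders rather than documented defaults, and check_controls is a hypothetical helper that is not part of blockingpy. The only facts taken from this page are the key names, their types, and the keys of METRIC_MAP.

```python
# Copy of the METRIC_MAP attribute documented for AnnoyBlocker.
METRIC_MAP = {
    "angular": "angular",
    "dot": "dot",
    "euclidean": "euclidean",
    "hamming": "hamming",
    "manhattan": "manhattan",
}

# Types listed for the nested 'annoy' entry in the documentation above.
EXPECTED_TYPES = {
    "distance": str,
    "seed": int,
    "path": str,
    "n_trees": int,
    "build_on_disk": bool,
    "k_search": int,
}

# Placeholder values chosen for illustration only (assumptions).
controls = {
    "random_seed": 2025,
    "annoy": {
        "distance": "angular",
        "seed": 2025,
        "path": "annoy_index_dir",  # hypothetical path
        "n_trees": 50,
        "build_on_disk": False,
        "k_search": 10,
    },
}


def check_controls(controls):
    """Hypothetical helper: return a list of problems found in `controls`."""
    problems = []
    if "random_seed" in controls and not isinstance(controls["random_seed"], int):
        problems.append("random_seed must be int")
    annoy = controls.get("annoy", {})
    for key, expected in EXPECTED_TYPES.items():
        # Only keys that are present are checked; none is assumed mandatory.
        if key in annoy and not isinstance(annoy[key], expected):
            problems.append(f"annoy.{key} must be {expected.__name__}")
    distance = annoy.get("distance")
    if isinstance(distance, str) and distance not in METRIC_MAP:
        problems.append(f"annoy.distance must be one of {sorted(METRIC_MAP)}")
    return problems


problems = check_controls(controls)
```

An empty list from check_controls means the dictionary matches the documented key types and uses a distance name that appears in METRIC_MAP.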
- Returns:
DataFrame containing the blocking results with columns:
- 'y': indices from query dataset
- 'x': indices of matched items from reference dataset
- 'dist': distances to matched items
- Return type:
pandas.DataFrame
Notes
This method builds an Annoy index from the reference dataset (x) and then searches that index for the k nearest neighbors of each point in the query dataset (y).
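To make the shape of the result concrete, the sketch below reproduces the 'y', 'x', 'dist' layout with an exact brute-force search over plain Python lists. It is a stand-in, not the library's implementation: the real method relies on an approximate Annoy index, accepts DataHandler inputs, and returns a pandas.DataFrame. The function name brute_force_block, the Euclidean metric, and the toy data are assumptions made for illustration.

```python
import math


def brute_force_block(x, y, k):
    """Exact k-NN stand-in for Annoy (assumption, not blockingpy code).

    For each query row in `y`, find the `k` closest reference rows in `x`
    by Euclidean distance and emit one record per (query, match) pair.
    """
    rows = []
    for y_idx, query in enumerate(y):
        # Distance from this query point to every reference point.
        dists = [(math.dist(query, ref), x_idx) for x_idx, ref in enumerate(x)]
        # Keep the k smallest distances.
        for dist, x_idx in sorted(dists)[:k]:
            rows.append({"y": y_idx, "x": x_idx, "dist": dist})
    return rows


# Toy reference (x) and query (y) data, invented for this example.
x = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
y = [[0.9, 0.1], [4.0, 4.0]]

result = brute_force_block(x, y, k=1)
for row in result:
    print(row)
```

Each record pairs a query index with the index of a matched reference item and the distance between them, which corresponds to one row of the DataFrame described under Returns. With an approximate index such as Annoy, the matches may occasionally differ from the exact ones computed here.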