blockingpy.voyager_blocker.VoyagerBlocker
- class blockingpy.voyager_blocker.VoyagerBlocker[source]
A class for performing blocking using the Voyager algorithm from Spotify.
This class implements blocking functionality using Spotify’s Voyager algorithm for efficient approximate nearest neighbor search. It supports multiple distance metrics and is designed for high-dimensional data.
- Parameters:
None
- index
The Voyager index used for nearest neighbor search
- Type:
voyager.Index or None
- x_columns
Column names of the reference dataset
- Type:
array-like or None
- METRIC_MAP
Mapping of distance metric names to Voyager Space types
- Type:
dict
See also
BlockingMethod – Abstract base class defining the blocking interface
voyager.Index – The underlying Voyager index implementation
- Raises:
ValueError – If path is provided in the controls but is incorrect
Notes
For more details about the Voyager algorithm and implementation, see: https://github.com/spotify/voyager
- __init__()[source]
Initialize the VoyagerBlocker instance.
Creates a new VoyagerBlocker with empty index.
Methods
__init__() – Initialize the VoyagerBlocker instance.
block(x, y, k, verbose, controls) – Perform blocking using the Voyager algorithm.
Attributes
- METRIC_MAP = {'cosine': voyager.Space.Cosine, 'euclidean': voyager.Space.Euclidean, 'inner_product': voyager.Space.InnerProduct}
- block(x, y, k, verbose, controls)[source]
Perform blocking using the Voyager algorithm.
- Parameters:
x (DataHandler) – Reference dataset containing features for indexing
y (DataHandler) – Query dataset to find nearest neighbors for
k (int) – Number of nearest neighbors to find
verbose (bool, optional) – If True, print detailed progress information
controls (dict) –
Algorithm control parameters with the following structure:
    {
        'random_seed': int,
        'voyager': {
            'distance': str,
            'k_search': int,
            'path': str,
            'M': int,
            'ef_construction': int,
            'max_elements': int,
            'num_threads': int,
            'query_ef': int
        }
    }
- Returns:
DataFrame containing the blocking results with columns:
- 'y': indices from query dataset
- 'x': indices of matched items from reference dataset
- 'dist': distances to matched items
- Return type:
pandas.DataFrame
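As a rough illustration of the 'y' / 'x' / 'dist' layout described above, the sketch below builds equivalent rows with an exact, brute-force Euclidean search in plain Python. It is not the Voyager algorithm and does not call blockingpy: the helper name `nearest_match_rows` and the toy data are hypothetical, and it covers only the case of one nearest match per query.

```python
import math

def nearest_match_rows(x, y):
    """Hypothetical helper: exact nearest-neighbour search, one row per query.

    Mimics only the column layout ('y', 'x', 'dist') of the blocking result;
    the real method uses an approximate Voyager index instead.
    """
    rows = []
    for y_idx, query in enumerate(y):
        # Euclidean distance from this query to every reference vector.
        scored = [(math.dist(query, ref), x_idx) for x_idx, ref in enumerate(x)]
        # Smallest distance wins; ties resolve to the lower reference index.
        dist, x_idx = min(scored)
        rows.append({"y": y_idx, "x": x_idx, "dist": dist})
    return rows

# Toy data (made up for this sketch).
reference = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
queries = [[0.9, 0.1], [4.0, 5.0]]

rows = nearest_match_rows(reference, queries)
for row in rows:
    print(row)
```

Each dictionary corresponds to one row of the returned DataFrame: the query index, the index of its matched reference item, and the distance between them.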
Notes
The algorithm uses a graph-based approach for approximate nearest neighbor search. The quality of approximation can be controlled through parameters like ef_construction and query_ef.
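The controls structure documented for block() can be assembled as an ordinary nested dictionary. The sketch below is one way to do that, assuming a hypothetical helper `make_voyager_controls`; the numeric values are illustrative only and are not the library's defaults. Only a subset of the documented keys is set, and whether omitted keys fall back to defaults is not stated in this reference, so treat that as an assumption.

```python
# Metric names taken from the keys of METRIC_MAP documented above.
ALLOWED_DISTANCES = {"cosine", "euclidean", "inner_product"}

def make_voyager_controls(distance="cosine", ef_construction=200,
                          query_ef=100, random_seed=42):
    """Hypothetical helper: build a controls dict in the documented shape.

    All default values here are illustrative, not blockingpy defaults.
    """
    if distance not in ALLOWED_DISTANCES:
        raise ValueError(
            f"Unknown distance {distance!r}; expected one of {sorted(ALLOWED_DISTANCES)}"
        )
    return {
        "random_seed": random_seed,
        "voyager": {
            "distance": distance,
            # Search breadth while the index graph is built.
            "ef_construction": ef_construction,
            # Search breadth at query time.
            "query_ef": query_ef,
        },
    }

# Widen both search parameters relative to the illustrative defaults.
controls = make_voyager_controls(distance="euclidean",
                                 ef_construction=400, query_ef=200)
print(controls)
```

In graph-based indexes of this kind, raising ef_construction or query_ef generally trades speed for better recall, so these two keys are the natural ones to tune when the approximate matches look too loose.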