GPU SUPPORT (blockingpy-gpu)

GPU (CUDA) accelerated version is available via blockingpy-gpu package. The available GPU indexes are from FAISS GPU (Flat, IVF, IVFPQ, CAGRA).

Requirements

  • OS: Linux or Windows 11 with WSL2 (Ubuntu)

  • Python: 3.10

  • GPU: Nvidia with driver supporting CUDA ≥ 12.4

  • Tools: conda/mamba + pip

To use the package install FAISS-GPU via conda/mamba, then install blockingpy-gpu with pip.

# 1) Env
mamba create -n blockingpy-gpu python=3.10 -y
conda activate blockingpy-gpu
conda config --env --set channel_priority flexible

# 2) Install FAISS GPU (nightly cuVS build) - this version was tested
mamba install -y \
  -c pytorch/label/nightly -c rapidsai -c conda-forge \
  "faiss-gpu-cuvs=1.11.0" "libcuvs=25.4.*"

# 3) Install BlockingPy and the rest of deps with pip (or poetry, uv etc.)
pip install blockingpy-gpu

What’s included vs CPU build

GPU backends

  • FAISS-GPU: flat, ivf, ivfpq, cagra.

CPU backends also available in blockingpy-gpu

  • FAISS (CPU), HNSW (hnswlib), Voyager, Annoy, NND (pynndescent).

Not included

  • mlpack backends (k-d tree, LSH) are not shipped in blockingpy-gpu.

Distances

L2, cosine, inner product

Index Configuration

Works the same as in CPU Blockingpy. Here are the defaults:

control_ann = {
    "gpu_faiss": {
            "index_type": "flat", #ivf, ivfpq, cagra
            "k_search": 30,
            "distance": "cosine",
            "path": None,

            "ivf_nlist": 100,
            "ivf_nprobe": 10,

            "ivfpq_nlist": 100,
            "ivfpq_m": 8,
            "ivfpq_nbits": 8,
            "ivfpq_nprobe": 10,
            "ivfpq_useFloat16": False,
            "ivfpq_usePrecomputed": False,
            "ivfpq_reserveVecs": 0,
            "ivfpq_use_cuvs": False,

            "cagra": {
                "graph_degree": 64,
                "intermediate_graph_degree": 128,
                "build_algo": "ivf_pq",
                "nn_descent_niter": 20,
                "itopk_size": 64,
                "max_queries": 0,
                "algo": "auto",
                "team_size": 0,
                "search_width": 1,
                "min_iterations": 0,
                "max_iterations": 0,
                "thread_block_size": 0,
                "hashmap_mode": "auto",
                "hashmap_min_bitlen": 0,
                "hashmap_max_fill_rate": 0.5,
                "num_random_samplings": 1,
                "seed": 0x128394,
            },
        },
}

From here everything works the same as in blockingpy. You can pass ann='gpu_faiss' to the block method and pass the controls dict with 'index_type' and that’s all. You can find an example of blockingpy-gpu workflow here

For more info about FAISS GPU and the indexes see here.