blockingpy.text_encoders.embedding_encoder.EmbeddingEncoder
- class blockingpy.text_encoders.embedding_encoder.EmbeddingEncoder(model='minishlab/potion-base-8M', normalize=None, max_length=512, emb_batch_size=1024, show_progress_bar=False, use_multiprocessing=True, multiprocessing_threshold=10000)
Dense-vector encoder that wraps model2vec.StaticModel.
The encoder converts a pandas.Series of text strings into a DataHandler whose data attribute is a C-contiguous np.ndarray of shape (n_samples, embedding_dim) and whose cols are the synthetic column names emb_0 … emb_{d-1}, where d is embedding_dim.
- __init__(model='minishlab/potion-base-8M', normalize=None, max_length=512, emb_batch_size=1024, show_progress_bar=False, use_multiprocessing=True, multiprocessing_threshold=10000)
Methods
__init__([model, normalize, max_length, ...])
fit(X[, y])            No-op fit for scikit-learn compatibility.
fit_transform(X[, y])  Fit the encoder on X and return the transformed matrix.
transform(X)           Encode X into dense numeric vectors.
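The output contract described above (a C-contiguous array of shape (n_samples, embedding_dim), column names emb_0 … emb_{d-1}, and a no-op fit) can be illustrated with a minimal sketch. It does not use blockingpy or model2vec: ToyEncoder and ToyHandler are hypothetical stand-ins invented for this example, the vectors come from a hash rather than a trained model, and the input is a plain list of strings where the real encoder expects a pandas.Series.

```python
import hashlib
from typing import NamedTuple, Sequence

import numpy as np


class ToyHandler(NamedTuple):
    """Hypothetical stand-in for DataHandler: only `data` and `cols`."""
    data: np.ndarray
    cols: list


class ToyEncoder:
    """Hypothetical stand-in for EmbeddingEncoder; no real model is loaded."""

    def __init__(self, dim: int = 4) -> None:
        # A SHA-256 digest has 32 bytes, so this toy supports dim <= 32.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def fit(self, X: Sequence[str], y=None) -> "ToyEncoder":
        # No-op: nothing is learned, the method exists for API compatibility.
        return self

    def transform(self, X: Sequence[str]) -> ToyHandler:
        # Derive a deterministic pseudo-embedding from each string's hash.
        rows = []
        for text in X:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            rows.append([b / 255.0 for b in digest[: self.dim]])
        data = np.array(rows, dtype=np.float32).reshape(len(rows), self.dim)
        # Guarantee row-major (C-contiguous) memory layout.
        data = np.ascontiguousarray(data)
        cols = [f"emb_{i}" for i in range(self.dim)]
        return ToyHandler(data=data, cols=cols)

    def fit_transform(self, X: Sequence[str], y=None) -> ToyHandler:
        # Same result as calling fit() and then transform().
        return self.fit(X, y).transform(X)


texts = ["john smith", "jon smyth", "alice jones"]
encoder = ToyEncoder(dim=4)
out = encoder.fit_transform(texts)
same = encoder.fit(texts).transform(texts)

print(out.data.shape)                  # (n_samples, embedding_dim)
print(out.cols)                        # synthetic column names
print(out.data.flags["C_CONTIGUOUS"])  # memory-layout check
```

Because fit is a no-op in this sketch, out and same hold identical arrays; with the real encoder the values would come from the configured model2vec model instead of a hash.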
- fit_transform(X, y=None)
Fit the encoder on X and return the transformed matrix.
Equivalent to calling fit() followed by transform().
- Parameters:
X – Series of input strings.
y – Ignored.
- Returns:
The encoded feature matrix together with its column names.
- Return type: