blockingpy.text_encoders.text_transformer.TextTransformer

class blockingpy.text_encoders.text_transformer.TextTransformer(**control_txt)

Facade for selecting a concrete TextEncoder based on a control dictionary.

Parameters:

**control_txt – Configuration mapping. Must contain the key encoder, set to one of the registry keys ('shingle' or 'embedding'). An optional sub-mapping named after a registry key may supply keyword arguments specific to that encoder.
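
The dispatch described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the library's implementation: ShingleEncoderStub, EmbeddingEncoderStub, REGISTRY, select_encoder and the options n_shingles and model are hypothetical names chosen for illustration. Only the encoder key and the two registry keys come from the description above.

```python
from typing import Any, Callable, Dict


class ShingleEncoderStub:
    """Hypothetical stand-in for a shingle-based encoder."""

    def __init__(self, n_shingles: int = 2) -> None:
        self.n_shingles = n_shingles


class EmbeddingEncoderStub:
    """Hypothetical stand-in for an embedding-based encoder."""

    def __init__(self, model: str = "some-model") -> None:
        self.model = model


# Registry keys match the ones named in the parameter description.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "shingle": ShingleEncoderStub,
    "embedding": EmbeddingEncoderStub,
}


def select_encoder(**control_txt: Any) -> Any:
    """Pick an encoder by control_txt['encoder'] and pass its sub-mapping as kwargs."""
    key = control_txt.get("encoder")
    if key not in REGISTRY:
        raise ValueError(f"Unknown encoder {key!r}; expected one of {sorted(REGISTRY)}")
    # The sub-mapping named after the chosen key carries encoder-specific options;
    # sub-mappings for other encoders are ignored.
    specific = control_txt.get(key, {})
    return REGISTRY[key](**specific)


control = {"encoder": "shingle", "shingle": {"n_shingles": 3}, "embedding": {"model": "x"}}
encoder = select_encoder(**control)
```

In this sketch a missing or unknown encoder key raises ValueError. Whether the library reports the error the same way is an assumption.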

__init__(**control_txt)

Methods

  • __init__(**control_txt)

  • fit(X[, y]) – Learn stateful parameters from X.

  • fit_transform(X[, y]) – Fit the encoder on X and return the transformed matrix.

  • transform(X) – Convert raw strings into a numeric feature matrix.

fit(X, y=None)

Learn stateful parameters from X.

The default implementation is a no-op that returns self; override in subclasses that need to build a vocabulary or train a model.

Parameters:
  • X – Series of input strings to learn from.

  • y – Ignored. Present for scikit-learn API compatibility.

Returns:

The instance itself (self), to allow method chaining.

Return type:

TextEncoder
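
The contrast between the default no-op and an overriding subclass might look like the sketch below. BaseEncoderSketch and VocabEncoderSketch are hypothetical classes, not the library's TextEncoder, and plain lists of strings stand in for the Series input.

```python
from typing import List, Optional


class BaseEncoderSketch:
    """Hypothetical base class mirroring the described default fit()."""

    def fit(self, X: List[str], y: Optional[object] = None) -> "BaseEncoderSketch":
        # Default: nothing to learn; return self so calls can be chained.
        return self


class VocabEncoderSketch(BaseEncoderSketch):
    """Hypothetical subclass that learns a vocabulary during fit()."""

    def __init__(self) -> None:
        self.vocabulary_: List[str] = []

    def fit(self, X: List[str], y: Optional[object] = None) -> "VocabEncoderSketch":
        # Collect the distinct lowercase tokens, sorted for a stable order.
        self.vocabulary_ = sorted({tok for text in X for tok in text.lower().split()})
        return self


stateless = BaseEncoderSketch()
fitted = VocabEncoderSketch().fit(["Anna Smith", "anna smyth"])
```

Because both versions return self, a call such as VocabEncoderSketch().fit(texts) can be followed directly by further method calls.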

fit_transform(X, y=None)

Fit the encoder on X and return the transformed matrix.

Equivalent to calling fit() followed by transform().

Parameters:
  • X – Series of input strings.

  • y – Ignored.

Returns:

The encoded feature matrix together with its column names.

Return type:

DataHandler
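
The stated equivalence between fit_transform() and fit() followed by transform() can be checked on a toy encoder. This is a sketch under assumptions: EncodedSketch is a hypothetical stand-in for the returned wrapper (its field names data and cols are illustrative), and LengthEncoderSketch is not part of the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EncodedSketch:
    """Hypothetical stand-in for the returned wrapper: a matrix plus column names."""

    data: List[List[int]]
    cols: List[str]


class LengthEncoderSketch:
    """Toy encoder with two features per string: character count and word count."""

    def fit(self, X, y=None):
        # Stateless: nothing to learn.
        return self

    def transform(self, X):
        rows = [[len(t), len(t.split())] for t in X]
        return EncodedSketch(data=rows, cols=["n_chars", "n_words"])

    def fit_transform(self, X, y=None):
        # Same result as calling fit() and then transform() on the same input.
        return self.fit(X, y).transform(X)


texts = ["john doe", "jane"]
one_step = LengthEncoderSketch().fit_transform(texts)
two_step = LengthEncoderSketch().fit(texts).transform(texts)
```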

transform(X)

Convert raw strings into a numeric feature matrix.

Subclasses must implement this method.

Parameters:

X – Series of raw text to encode.

Returns:

Wrapper containing the encoded matrix and its feature names.

Return type:

DataHandler
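
As an illustration of what turning raw strings into a numeric feature matrix with feature names can mean, the sketch below counts character n-grams (shingles) per string. It is a simplified stand-in, not the library's shingle encoder: shingle_counts and its parameter n are hypothetical, and it returns a plain tuple of lists rather than a DataHandler.

```python
from typing import List, Tuple


def shingle_counts(texts: List[str], n: int = 2) -> Tuple[List[List[int]], List[str]]:
    """Count character n-grams per string; returns (matrix, feature_names)."""
    # Strings shorter than n yield no n-grams and therefore an all-zero row.
    grams_per_text = [
        [t[i : i + n] for i in range(len(t) - n + 1)] for t in texts
    ]
    # One column per distinct n-gram, sorted for a stable column order.
    feature_names = sorted({g for grams in grams_per_text for g in grams})
    index = {g: j for j, g in enumerate(feature_names)}
    matrix = [[0] * len(feature_names) for _ in texts]
    for row, grams in zip(matrix, grams_per_text):
        for g in grams:
            row[index[g]] += 1
    return matrix, feature_names


matrix, names = shingle_counts(["abab", "abc"])
# names  -> ['ab', 'ba', 'bc']
# matrix -> [[2, 1, 0], [1, 0, 1]]
```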