blockingpy.controls.controls_txt
- blockingpy.controls.controls_txt(controls, **kwargs)
Create a configuration dictionary for text-processing operations.
- Parameters:
controls (dict) – Dictionary of control parameters to override defaults
**kwargs (dict) – Additional keyword arguments for direct parameter updates
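The way `controls` and `**kwargs` override defaults can be pictured as a recursive merge over nested sections. The sketch below is hypothetical: `merge_controls` is not part of blockingpy, the precedence (keyword arguments applied after `controls`) is an assumption, and the `DEFAULTS` values are placeholders, not the library's real defaults.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def merge_controls(defaults: dict[str, Any],
                   controls: dict[str, Any] | None = None,
                   **kwargs: Any) -> dict[str, Any]:
    """Hypothetical sketch: overlay user settings onto nested defaults."""

    def overlay(base: dict[str, Any], updates: dict[str, Any]) -> None:
        for key, value in updates.items():
            # Recurse into nested sections such as 'embedding' or 'shingle'
            # so a partial update keeps the untouched default entries.
            if isinstance(value, dict) and isinstance(base.get(key), dict):
                overlay(base[key], value)
            else:
                base[key] = value

    result = deepcopy(defaults)  # never mutate the caller's defaults
    overlay(result, controls or {})
    overlay(result, kwargs)  # assumption: kwargs take precedence over controls
    return result


# Placeholder defaults for illustration only (not blockingpy's real values).
DEFAULTS = {
    "encoder": "shingle",
    "shingle": {"n_shingles": 2, "lowercase": True},
}

cfg = merge_controls(DEFAULTS, {"shingle": {"n_shingles": 3}}, encoder="embedding")
print(cfg)
# {'encoder': 'embedding', 'shingle': {'n_shingles': 3, 'lowercase': True}}
```

A deep merge matters here because a user who changes only `n_shingles` would otherwise lose every other default in the `shingle` section.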
- Returns:
Configuration dictionary with the following structure:
{
    'encoder': str,
    'embedding': {
        'model': str,
        'normalize': bool,
        'max_length': int,
        'emb_batch_size': int,
        'show_progress_bar': bool,
        'use_multiprocessing': bool,
        'multiprocessing_threshold': int,
    },
    'shingle': {
        'n_shingles': int,
        'lowercase': bool,
        'strip_non_alphanum': bool,
        'max_features': int,
    },
}
- Return type:
dict
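The return structure above can be checked mechanically. The following sketch is hypothetical (`validate_txt_controls` and `SCHEMA` are not part of blockingpy) and encodes only the key names and types listed in the structure, plus the two encoder values named in the notes.

```python
from __future__ import annotations

from typing import Any

# Key names and types as listed in the documented return structure.
SCHEMA: dict[str, dict[str, type]] = {
    "embedding": {
        "model": str,
        "normalize": bool,
        "max_length": int,
        "emb_batch_size": int,
        "show_progress_bar": bool,
        "use_multiprocessing": bool,
        "multiprocessing_threshold": int,
    },
    "shingle": {
        "n_shingles": int,
        "lowercase": bool,
        "strip_non_alphanum": bool,
        "max_features": int,
    },
}


def validate_txt_controls(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the structure matches."""
    problems: list[str] = []
    encoder = cfg.get("encoder")
    if encoder not in ("shingle", "embedding"):
        problems.append(f"encoder must be 'shingle' or 'embedding', got {encoder!r}")
    for section, fields in SCHEMA.items():
        values = cfg.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section {section!r}")
            continue
        for name, expected in fields.items():
            value = values.get(name)
            # bool is a subclass of int, so reject it explicitly for int fields.
            wrong_type = not isinstance(value, expected)
            bool_as_int = expected is int and isinstance(value, bool)
            if wrong_type or bool_as_int:
                problems.append(f"{section}.{name} should be {expected.__name__}")
    return problems
```

Collecting all problems instead of raising on the first one makes it easier to spot several misconfigured fields at once.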
Notes
Configuration options:
- encoder: Type of text encoder ('shingle' or 'embedding').
- For 'embedding', the following parameters apply:
model: Pretrained model identifier or path
normalize: Normalize output vectors if True
max_length: Maximum sequence length for encoding
emb_batch_size: Batch size for encoding
show_progress_bar: Show progress bar if True
use_multiprocessing: Use multiprocessing if True
multiprocessing_threshold: Number of input texts above which multiprocessing is used (only when use_multiprocessing is True)
- For 'shingle', the following parameters apply:
n_shingles: Number of consecutive characters to combine
max_features: Maximum number of features to keep
lowercase: Convert text to lowercase if True
strip_non_alphanum: Remove non-alphanumeric characters if True
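To make the embedding options concrete, here is a sketch of how `emb_batch_size`, `normalize`, `use_multiprocessing`, and `multiprocessing_threshold` might act on a list of texts. It is hypothetical: `fake_encode_batch` stands in for a real embedding model, and the rule that multiprocessing turns on only when the number of texts exceeds the threshold is an assumption, not confirmed blockingpy behavior.

```python
from __future__ import annotations

import math


def fake_encode_batch(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a model: maps each text to a toy 2-d vector."""
    return [[float(len(t)), float(sum(c.isdigit() for c in t))] for t in batch]


def plan_multiprocessing(n_texts: int, use_multiprocessing: bool, threshold: int) -> bool:
    # Assumption: parallel encoding is used only for inputs larger than the threshold.
    return use_multiprocessing and n_texts > threshold


def embed(texts: list[str], emb_batch_size: int, normalize: bool) -> list[list[float]]:
    vectors: list[list[float]] = []
    # Send the input to the model in batches of at most emb_batch_size texts.
    for start in range(0, len(texts), emb_batch_size):
        vectors.extend(fake_encode_batch(texts[start:start + emb_batch_size]))
    if normalize:
        # Scale each vector to unit L2 norm; zero vectors are left unchanged.
        normed = []
        for v in vectors:
            norm = math.sqrt(sum(x * x for x in v))
            normed.append([x / norm for x in v] if norm > 0 else v)
        vectors = normed
    return vectors


print(embed(["abc", "1234"], emb_batch_size=1, normalize=True))
print(plan_multiprocessing(20_000, use_multiprocessing=True, threshold=10_000))
```

Batching bounds memory use per model call, while the threshold avoids paying process start-up costs on small inputs.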
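The shingle options describe character n-gram features. A minimal sketch of that preprocessing follows; `shingles` and `shingle_vocabulary` are hypothetical helpers, and two details are assumptions rather than documented behavior: whitespace counts as non-alphanumeric (so it is stripped), and `max_features` keeps the most frequent shingles across all texts, similar to the `max_features` option of scikit-learn's `CountVectorizer`.

```python
from __future__ import annotations

import re
from collections import Counter


def shingles(text: str, n_shingles: int, lowercase: bool,
             strip_non_alphanum: bool) -> list[str]:
    """Split text into overlapping runs of n_shingles consecutive characters."""
    if lowercase:
        text = text.lower()
    if strip_non_alphanum:
        # Remove everything except Unicode letters and digits (spaces included).
        text = re.sub(r"[\W_]+", "", text)
    return [text[i:i + n_shingles] for i in range(len(text) - n_shingles + 1)]


def shingle_vocabulary(texts: list[str], *, n_shingles: int, lowercase: bool,
                       strip_non_alphanum: bool, max_features: int) -> list[str]:
    """Keep the max_features most frequent shingles across all texts."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(shingles(text, n_shingles, lowercase, strip_non_alphanum))
    return [s for s, _ in counts.most_common(max_features)]


print(shingles("Ab, c!", n_shingles=2, lowercase=True, strip_non_alphanum=True))
# ['ab', 'bc']
```

Lowercasing and stripping punctuation before shingling make records such as "Ab, c!" and "abc" produce identical features, which is what lets near-duplicate records land in the same block.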