(evaluation_metrics)=
# Evaluation Metrics

In this section we explain the evaluation metrics used to assess blocking quality in BlockingPy.

## Notation and Terminology

In the context of blocking evaluation, we use the following notation:

### Basic Counts

- **TP** (True Positives): Number of record pairs correctly identified as matches - pairs that are both predicted matches and true matches, also known as Correct Links

- **TN** (True Negatives): Number of record pairs correctly identified as non-matches - pairs that are both predicted non-matches and true non-matches, also known as Correct Non-Links 

- **FP** (False Positives): Number of record pairs incorrectly identified as matches - pairs that are predicted matches but are true non-matches, also known as False Links

- **FN** (False Negatives): Number of record pairs incorrectly identified as non-matches - pairs that are predicted non-matches but are true matches, also known as False Non-Links


### Block-Related Notation
For deduplication:

- **n**: Total number of records in the dataset
- **$B_i$**: The i-th block
- **|$B_i$|**: Size (number of records) of block i
- **$\binom{n}{2}$**: Total number of possible record pairs in a dataset of size n

For record linkage:

- $\sum_{i} |B_{i,x}| \cdot |B_{i,y}|$ is the number of comparisons after blocking
- $|B_{i,x}|$ is the number of unique records from dataset X in i-th block
- $|B_{i,y}|$ is the number of unique records from dataset Y in i-th block
- $m$ and $n$ are the sizes of the two original datasets being linked

The blocking outcome can be represented in a confusion matrix as follows:

|               | Predicted Match   | Predicted Non-Match |
|---------------|------------------|---------------------|
| True Match    | TP              | FN                 |
| True Non-Match| FP              | TN                 |

## Evaluation Metrics

### Classification Metrics

#### Precision
Fraction of correctly identified pairs among all pairs predicted to be in the same block:

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

#### Recall
Fraction of actual matching pairs that were correctly identified:

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

#### F1 Score
Harmonic mean of precision and recall:

$$
\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

#### Accuracy
Fraction of all correct predictions:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

#### Specificity
Fraction of actual non-matching pairs correctly identified:

$$
\text{Specificity} = \frac{TN}{TN + FP}
$$

#### False Positive Rate (FPR)
Fraction of actual non-matching pairs incorrectly predicted as matches:

$$
\text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity}
$$

#### False Negative Rate (FNR)
Fraction of actual matching pairs incorrectly predicted as non-matches:

$$
\text{FNR} = \frac{FN}{FN + TP} = 1 - \text{Recall}
$$

### Blocking Efficiency Metrics

#### Reduction Ratio (RR)
Measures how effectively the blocking method reduces the number of comparisons needed. The formula differs for deduplication and record linkage scenarios:

For deduplication (comparing records within one dataset):

$
\text{RR}_{\text{dedup}} = 1 - \frac{\sum_{i} \binom{|B_i|}{2}}{\binom{n}{2}}
$

where:
- $\sum_{i} \binom{|B_i|}{2}$ is the number of comparisons after blocking
- $\binom{n}{2}$ is the total possible comparisons without blocking
- $n$ is the total number of records in the dataset

For record linkage (comparing records between two datasets):

$
\text{RR}_{\text{link}} = 1 - \frac{\sum_{i} |B_{i,x}| \cdot |B_{i,y}|}{m \cdot n}
$

where:
- $\sum_{i} |B_{i,x}| \cdot |B_{i,y}|$ is the number of comparisons after blocking
- $|B_{i,x}|$ is the number of unique records from dataset X in i-th block
- $|B_{i,y}|$ is the number of unique records from dataset Y in i-th block
- $m$ and $n$ are the sizes of the two original datasets being linked

A reduction ratio closer to 1 indicates greater reduction in the comparison space, while a value closer to 0 indicates less reduction.

## Important Considerations

When evaluating blocking performance, it's crucial to understand that not all metrics carry equal importance due to the nature of the blocking procedure. Blocking serves as a preliminary step in the record linkage/deduplication pipeline, designed to reduce the computational burden while maintaining the ability to find true matches in subsequent steps.

Key priorities in blocking evaluation should focus on:

- **Recall** : High recall is critical as any true matches missed during blocking cannot be recovered in later stages of the linkage process. A blocking method should prioritize maintaining high recall even if it means lower precision.
- **Reduction Ratio** : This metric is essential as it directly measures how effectively the blocking method reduces the computational complexity of the subsequent matching process.
- **FNR** : Critical as False Negative pairs can not be adressed in the later stages of entity matching procedure.

As for other metrics:

- **Accuracy and Specificity** : Those should usually be high since most pairs fall into the **TN** category due to the nature of blocking.

- **Precision** : Low precision scores would be adressed in the later stages of entity matching procedure as most False Positive pairs would be eliminated during one-to-one comparison.

- **F1 score and FPR** : Same reasons as above.

Therefore, when evaluating blocking results, focus on achieving high recall and a good reduction ratio while accepting that other metrics may show values that would be considered poor in a final matching context.