Evaluation Metrics

In this section we explain the evaluation metrics used to assess blocking quality in BlockingPy.

Notation and Terminology

In the context of blocking evaluation, we use the following notation:

Basic Counts

TP (True Positives): Number of record pairs correctly identified as matches - pairs that are both predicted matches and true matches, also known as Correct Links
TN (True Negatives): Number of record pairs correctly identified as non-matches - pairs that are both predicted non-matches and true non-matches, also known as Correct Non-Links
FP (False Positives): Number of record pairs incorrectly identified as matches - pairs that are predicted matches but are true non-matches, also known as False Links
FN (False Negatives): Number of record pairs incorrectly identified as non-matches - pairs that are predicted non-matches but are true matches, also known as False Non-Links

Block-Related Notation

For deduplication:

n: Total number of records in the dataset
\(B_i\): The i-th block
\(|B_i|\): Size (number of records) of block i
\(\binom{n}{2}\): Total number of possible record pairs in a dataset of size n

For record linkage:

\(\sum_{i} |B_{i,x}| \cdot |B_{i,y}|\) is the number of comparisons after blocking
\(|B_{i,x}|\) is the number of unique records from dataset X in the i-th block
\(|B_{i,y}|\) is the number of unique records from dataset Y in the i-th block
\(m\) and \(n\) are the sizes of the two original datasets being linked

The blocking outcome can be represented in a confusion matrix, where a pair counts as a predicted match when both records are placed in the same block:

	Predicted Match	Predicted Non-Match
True Match	TP	FN
True Non-Match	FP	TN
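As an illustration of how the four counts arise in the deduplication case, the sketch below enumerates every unordered record pair and compares block membership with ground truth. It is a minimal example in plain Python, not BlockingPy's implementation: the function name `confusion_counts`, the inputs `block_ids` and `true_ids`, and the toy data are all assumptions made for this example, and it assumes each record belongs to exactly one block.

```python
from itertools import combinations


def confusion_counts(block_ids, true_ids):
    """Count TP, FP, FN, TN over all unordered record pairs (deduplication).

    block_ids[i] is the block assigned to record i (hypothetical input);
    true_ids[i] is the ground-truth entity of record i (hypothetical input).
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(block_ids)), 2):
        same_block = block_ids[i] == block_ids[j]   # predicted match
        same_entity = true_ids[i] == true_ids[j]    # true match
        if same_block and same_entity:
            tp += 1
        elif same_block:
            fp += 1
        elif same_entity:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Toy data: six records, three blocks, three true entities.
block_ids = [0, 0, 0, 1, 1, 2]
true_ids = ["a", "a", "b", "b", "c", "c"]
counts = confusion_counts(block_ids, true_ids)
print(counts)  # (1, 3, 2, 9)
```

The four counts always sum to \(\binom{n}{2}\), here 15 pairs for 6 records. The pairwise loop is quadratic in the number of records, so it is only suitable for small illustrations.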

Evaluation Metrics

Classification Metrics

Precision

Fraction of correctly identified pairs among all pairs predicted to be in the same block:

\[ \text{Precision} = \frac{TP}{TP + FP} \]

Recall

Fraction of actual matching pairs that were correctly identified:

\[ \text{Recall} = \frac{TP}{TP + FN} \]

F1 Score

Harmonic mean of precision and recall:

\[ \text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

Accuracy

Fraction of all record pairs that were classified correctly:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

Specificity

Fraction of actual non-matching pairs correctly identified:

\[ \text{Specificity} = \frac{TN}{TN + FP} \]

False Positive Rate (FPR)

Fraction of actual non-matching pairs incorrectly predicted as matches:

\[ \text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity} \]

False Negative Rate (FNR)

Fraction of actual matching pairs incorrectly predicted as non-matches:

\[ \text{FNR} = \frac{FN}{FN + TP} = 1 - \text{Recall} \]
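The formulas above translate directly into code. The following is a minimal sketch in plain Python, not BlockingPy's API: the function name `classification_metrics`, the returned dictionary keys and the example counts are assumptions chosen for illustration. Returning NaN when a denominator is zero is also a choice made here, not something the definitions prescribe.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the classification metrics defined above from pair counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
    }


# Hypothetical counts for a small dataset with 15 record pairs.
metrics = classification_metrics(tp=1, fp=3, fn=2, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts, precision is 0.25 and recall is 1/3, and the identities FPR = 1 - Specificity and FNR = 1 - Recall hold as expected.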

Blocking Efficiency Metrics

Reduction Ratio (RR)

Measures how effectively the blocking method reduces the number of comparisons needed. The formula differs for deduplication and record linkage scenarios:

For deduplication (comparing records within one dataset):

\[ \text{RR}_{\text{dedup}} = 1 - \frac{\sum_{i} \binom{|B_i|}{2}}{\binom{n}{2}} \]

where:

\(\sum_{i} \binom{|B_i|}{2}\) is the number of comparisons after blocking
\(\binom{n}{2}\) is the total possible comparisons without blocking
\(n\) is the total number of records in the dataset
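The deduplication formula can be sketched in a few lines of plain Python. This is an illustrative example rather than BlockingPy's implementation: the function name `reduction_ratio_dedup` and the input `block_ids` are assumptions, and the sketch assumes each record is assigned to exactly one block.

```python
from collections import Counter
from math import comb


def reduction_ratio_dedup(block_ids):
    """Reduction ratio for deduplication.

    block_ids[i] is the block assigned to record i (hypothetical input).
    """
    n = len(block_ids)
    if n < 2:
        raise ValueError("at least two records are needed to form a pair")
    block_sizes = Counter(block_ids).values()
    # Pairs compared after blocking: all within-block pairs.
    comparisons_after = sum(comb(size, 2) for size in block_sizes)
    return 1 - comparisons_after / comb(n, 2)


# Blocks of size 3, 2 and 1 give 3 + 1 + 0 = 4 comparisons out of 15.
rr_dedup = reduction_ratio_dedup([0, 0, 0, 1, 1, 2])
print(round(rr_dedup, 4))  # 0.7333
```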

For record linkage (comparing records between two datasets):

\[ \text{RR}_{\text{link}} = 1 - \frac{\sum_{i} |B_{i,x}| \cdot |B_{i,y}|}{m \cdot n} \]

where:

\(\sum_{i} |B_{i,x}| \cdot |B_{i,y}|\) is the number of comparisons after blocking
\(|B_{i,x}|\) is the number of unique records from dataset X in the i-th block
\(|B_{i,y}|\) is the number of unique records from dataset Y in the i-th block
\(m\) and \(n\) are the sizes of the two original datasets being linked

A reduction ratio closer to 1 indicates greater reduction in the comparison space, while a value closer to 0 indicates less reduction.
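The record linkage variant can be sketched the same way in plain Python. Again, this is an illustration and not BlockingPy's implementation: the function name `reduction_ratio_link` and the inputs `blocks_x` and `blocks_y` are assumptions, and each record is assumed to belong to exactly one block.

```python
from collections import Counter


def reduction_ratio_link(blocks_x, blocks_y):
    """Reduction ratio for record linkage.

    blocks_x[i] is the block of record i in dataset X and blocks_y[j] is
    the block of record j in dataset Y (both hypothetical inputs).
    """
    m, n = len(blocks_x), len(blocks_y)
    if m == 0 or n == 0:
        raise ValueError("both datasets must contain at least one record")
    sizes_x = Counter(blocks_x)
    sizes_y = Counter(blocks_y)
    # Only blocks containing records from both datasets generate comparisons.
    shared_blocks = sizes_x.keys() & sizes_y.keys()
    comparisons_after = sum(sizes_x[b] * sizes_y[b] for b in shared_blocks)
    return 1 - comparisons_after / (m * n)


# Blocks 0 and 1 are shared: 2*1 + 1*2 = 4 comparisons out of 4*5 = 20.
rr_link = reduction_ratio_link([0, 0, 1, 2], [0, 1, 1, 3, 3])
print(round(rr_link, 4))  # 0.8
```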

Important Considerations

When evaluating blocking performance, it’s crucial to understand that not all metrics carry equal importance due to the nature of the blocking procedure. Blocking serves as a preliminary step in the record linkage/deduplication pipeline, designed to reduce the computational burden while maintaining the ability to find true matches in subsequent steps.

Key priorities in blocking evaluation should focus on:

Recall: High recall is critical because any true matches missed during blocking cannot be recovered in later stages of the linkage process. A blocking method should prioritize maintaining high recall even if it means lower precision.
Reduction Ratio: This metric is essential because it directly measures how effectively the blocking method reduces the computational complexity of the subsequent matching process.
FNR: Critical because false negative pairs cannot be addressed in the later stages of the entity matching procedure.

As for other metrics:

Accuracy and Specificity: These are usually high because, by the nature of blocking, most pairs fall into the TN category.
Precision: Low precision is addressed in the later stages of the entity matching procedure, since most false positive pairs are eliminated during one-to-one comparison.
F1 score and FPR: The same reasoning applies, since both depend on the number of false positives.

Therefore, when evaluating blocking results, focus on achieving high recall and a good reduction ratio while accepting that other metrics may show values that would be considered poor in a final matching context.
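To make these priorities concrete, the sketch below works through a purely hypothetical deduplication scenario. Every number in it is invented for illustration and does not come from any BlockingPy benchmark: 10,000 records, 1,000 true matching pairs, and a blocking result that yields 100,000 candidate pairs containing 950 of the true matches.

```python
from math import comb

# All figures below are invented for illustration only.
n_records = 10_000
total_pairs = comb(n_records, 2)      # 49,995,000 possible pairs
true_matches = 1_000                  # hypothetical number of true matching pairs
candidate_pairs = 100_000             # hypothetical number of pairs sharing a block

tp = 950                              # true matches kept by blocking
fp = candidate_pairs - tp             # non-matches that share a block
fn = true_matches - tp                # true matches split across blocks
tn = total_pairs - tp - fp - fn       # non-matches correctly kept apart

recall = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / total_pairs
reduction_ratio = 1 - candidate_pairs / total_pairs

print(f"recall={recall:.4f} precision={precision:.4f}")
print(f"accuracy={accuracy:.4f} reduction_ratio={reduction_ratio:.4f}")
```

In this scenario recall is 0.95 and the reduction ratio is about 0.998, while precision is below 0.01. Accuracy is also about 0.998, but only because true negatives dominate the pair space, which is why it says little about blocking quality.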