Kickstart ML with Python snippets

Evaluating classification models

Precision, Recall, F-measure, and Weighted F-measure

These metrics are commonly used to evaluate the performance of classification models, particularly in situations where the classes are imbalanced.

Precision

Precision measures the accuracy of the positive predictions made by the model. It is the ratio of true positive predictions to the total number of positive predictions (both true positives and false positives).

$$ \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} $$

  • True Positives (TP): Correctly predicted positive instances.
  • False Positives (FP): Incorrectly predicted positive instances.

Precision answers the question: "Of all instances predicted as positive, how many were actually positive?"
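
As a quick illustration, precision can be computed by hand from paired label lists; this sketch uses the same toy labels as the scikit-learn example at the end of this page:

# Toy labels (same as in the scikit-learn example further down)
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # predicted labels

# Count true positives and false positives
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 4
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1

precision = tp / (tp + fp)
print(f"Precision: {precision:.2f}")  # 0.80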

Recall (Sensitivity or True Positive Rate)

Recall measures the ability of the model to identify all relevant positive cases. It is the ratio of true positive predictions to the total number of actual positive instances (both true positives and false negatives).

$$ \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} $$

  • False Negatives (FN): Actual positive instances that were incorrectly predicted as negative.

Recall answers the question: "Of all actual positive instances, how many were correctly predicted as positive?"
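
Recall can be verified the same way, this time counting false negatives instead of false positives (the toy labels are restated so the snippet runs on its own):

y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # predicted labels

# Count true positives and false negatives
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 4
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

recall = tp / (tp + fn)
print(f"Recall: {recall:.2f}")  # 0.80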

F-measure (F1-score)

The F-measure, or F1-score, is the harmonic mean of precision and recall. It provides a single metric that balances the trade-off between the two.

$$ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

The F1-score is useful when you need a balance between precision and recall and when you have an uneven class distribution.
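
The harmonic mean matters here: unlike the arithmetic mean, it is dragged down sharply by whichever of the two values is smaller. A small sketch with illustrative numbers:

# A model with high precision but poor recall
precision, recall = 0.95, 0.10

arithmetic_mean = (precision + recall) / 2          # 0.525 (looks deceptively decent)
f1 = 2 * precision * recall / (precision + recall)  # ~0.18 (exposes the weak recall)

print(f"Arithmetic mean: {arithmetic_mean:.3f}")
print(f"F1 (harmonic mean): {f1:.3f}")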

Weighted F-measure (Fβ-score)

The weighted F-measure, or Fβ-score, generalizes the F1-score by introducing a weighting factor \( \beta \) that balances the relative importance of precision and recall.

$$ F_\beta = (1 + \beta^2) \times \frac{\text{Precision} \times \text{Recall}}{(\beta^2 \times \text{Precision}) + \text{Recall}} $$

  • Beta (β): A parameter that controls how much weight recall receives relative to precision.
    • If \( \beta = 1 \), the Fβ-score is equivalent to the F1-score (equal weight to precision and recall).
    • If \( \beta > 1 \), recall is given more weight.
    • If \( \beta < 1 \), precision is given more weight.

The Fβ-score is useful when you want to emphasize precision or recall more heavily, depending on the specific needs of your application.
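
To get a feel for the effect of \( \beta \), here is a small sketch evaluating the formula above for a few values (the precision and recall figures are illustrative and match the worked example below):

def f_beta(precision, recall, beta):
    """Weighted F-measure: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 0.91, 0.83
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {f_beta(precision, recall, beta):.3f}")
# beta=0.5: 0.893  (precision-leaning, higher because precision > recall here)
# beta=1.0: 0.868  (the F1-score)
# beta=2.0: 0.845  (recall-leaning)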

Example Calculation

Let's illustrate these metrics with a confusion matrix example:

                    Predicted Positive    Predicted Negative
Actual Positive             50                    10
Actual Negative              5                    35
  • True Positives (TP): 50
  • False Positives (FP): 5
  • True Negatives (TN): 35
  • False Negatives (FN): 10
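
These counts can be extracted programmatically with scikit-learn's confusion_matrix, which for binary labels returns rows as actual classes and columns as predicted classes, in the order [[TN, FP], [FN, TP]]. A sketch reproducing the counts above with synthetic labels:

import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic labels chosen to reproduce the counts above
y_true = np.array([1] * 60 + [0] * 40)                       # 60 actual positives, 40 actual negatives
y_pred = np.array([1] * 50 + [0] * 10 + [1] * 5 + [0] * 35)  # TP=50, FN=10, FP=5, TN=35

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # TP=50, FP=5, TN=35, FN=10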

Precision

$$ \text{Precision} = \frac{TP}{TP + FP} = \frac{50}{50 + 5} = \frac{50}{55} \approx 0.91 $$

Recall

$$ \text{Recall} = \frac{TP}{TP + FN} = \frac{50}{50 + 10} = \frac{50}{60} \approx 0.83 $$

F1-score

$$ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 2 \times \frac{0.91 \times 0.83}{0.91 + 0.83} \approx 0.87 $$

Fβ-score (example with \( \beta = 2 \))

$$ F_2 = (1 + 2^2) \times \frac{\text{Precision} \times \text{Recall}}{(2^2 \times \text{Precision}) + \text{Recall}} = 5 \times \frac{0.91 \times 0.83}{(4 \times 0.91) + 0.83} \approx 0.84 $$
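
As a sanity check, all four metrics can be recomputed from the raw counts; the small differences from the figures above come from rounding precision and recall before combining them:

tp, fp, fn = 50, 5, 10

precision = tp / (tp + fp)                              # 50/55 ≈ 0.909
recall = tp / (tp + fn)                                 # 50/60 ≈ 0.833
f1 = 2 * precision * recall / (precision + recall)      # ≈ 0.870
f2 = 5 * precision * recall / (4 * precision + recall)  # ≈ 0.847

print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}  F2={f2:.3f}")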

Summary of Metrics

  • Precision: Indicates the correctness of positive predictions.
  • Recall: Indicates the coverage of actual positive instances.
  • F1-score: Balances precision and recall.
  • Fβ-score: Provides a weighted balance of precision and recall, emphasizing one over the other based on the value of \( \beta \).

Python Example

Here’s how you can calculate these metrics using the sklearn library:

from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

# Sample data
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # Actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # Predicted labels

# Calculate Precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

# Calculate Recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

# Calculate F1-score
f1 = f1_score(y_true, y_pred)
print("F1-score:", f1)

# Calculate F-beta score with beta = 2
fbeta = fbeta_score(y_true, y_pred, beta=2)
print("F2-score:", fbeta)
                            
