Kickstart ML with Python snippets
Understanding the basics of ROC curves

True Positive Rate (TPR) or Sensitivity or Recall:
 TPR measures the proportion of actual positives that are correctly identified by the model.
 Formula: $$ \text{TPR} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} $$

False Positive Rate (FPR):
 FPR measures the proportion of actual negatives that are incorrectly identified as positives by the model.
 Formula: $$ \text{FPR} = \frac{\text{False Positives (FP)}}{\text{False Positives (FP)} + \text{True Negatives (TN)}} $$

Threshold:
 The classification threshold determines the point at which the model's predicted probability is converted into a binary decision (positive or negative).
 By varying this threshold, you can generate different TPR and FPR values, which are then plotted to form the ROC curve.
How to Construct an ROC Curve:

Calculate TPR and FPR:
 For a series of threshold values (typically from 0 to 1), calculate the TPR and FPR.

Plot the Points:
 Plot the TPR against the FPR for each threshold value.

Connect the Points:
 Connect these points to form the ROC curve.
Interpreting the ROC Curve:

Diagonal Line (45degree line):
 This represents a random classifier with no discrimination capability (TPR = FPR).

Above the Diagonal:
 Points above the diagonal indicate betterthanrandom performance.

Closer to the TopLeft Corner:
 The closer the ROC curve is to the topleft corner, the better the model's performance. This point represents high TPR and low FPR.
Area Under the ROC Curve (AUC  ROC):
 AUC (Area Under the Curve):
 The AUC value represents the overall ability of the model to discriminate between positive and negative classes.
 AUC ranges from 0 to 1:
 0.5: Represents a random classifier.
 1.0: Represents a perfect classifier.
 >0.5: Represents a model better than random.
Python Example:
Here’s how you can plot an ROC curve using the sklearn
library:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# Generate a binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a logistic regression model
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
# Predict probabilities
y_prob = model.predict_proba(X_test)[:, 1]
# Compute ROC curve and ROC area
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
# Plot ROC curve
plt.figure()
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', lw=2, linestyle='')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
Interpreting the ROC Curve and AUC

ROC Curve Shape:
 The curve starts at (0, 0) and ends at (1, 1).
 A curve that bows towards the topleft corner indicates good performance. The model achieves high TPR while maintaining low FPR.

Diagonal Line (Random Classifier):
 The dotted line from (0, 0) to (1, 1) represents a random classifier with no skill. Points above this line indicate betterthanrandom performance.

Area Under the Curve (AUC):
 The AUC value (0.5 to 1.0) provides a single measure of overall model performance. An AUC of 0.93, for example, indicates excellent model performance.
Practical Tips
 Threshold Selection: Depending on the application, you might choose different thresholds to balance between TPR and FPR. For example, in medical diagnostics, you might prefer a higher TPR (sensitivity) even at the cost of a higher FPR.
 Comparison of Models: ROC curves are useful for comparing the performance of different models. The model with the higher AUC is generally considered better.
 Understanding Tradeoffs: The ROC curve helps visualize the tradeoff between sensitivity (TPR) and specificity (1  FPR). This is crucial in applications where the cost of false positives and false negatives differs.