# Imports

Matplotlib is used for plotting, and scikit-learn to create the predictions:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve
```

# Setup

Create a classifier, set or learn a cutoff for it (the default is zero), and run cross-validated predictions with it. *Importantly*, ask for scores, not classes, as the returned values by setting *`method="decision_function"`*.

```python
my_clf = ...  # some classifier
my_threshold = ...  # some cutoff value
y_scores = cross_val_predict(
    my_clf, X_train, y_train, cv=3, method="decision_function"
)
```

Some classifiers (e.g., Random Forest) only expose a `predict_proba` method - in those cases, simply adapt the above `method` parameter and use the positive-class probabilities as the scores.

# Calculate Score Curves

Use the true binary class labels and the classifier scores for those instances as input to the scikit-learn function **`precision_recall_curve`** from the `metrics` package:

```python
precisions, recalls, scores = precision_recall_curve(y_train, y_scores)
```

Note that `precisions` and `recalls` contain one more entry than `scores`, which is why the last entry of each is dropped when plotting against `scores` below.

If you want to show the chosen threshold, you need to find it on the precision and recall axes, too:

```python
idx = (scores >= my_threshold).argmax()  # first index where score >= threshold
```

# Precision and Recall Plots

With this in place, you can plot the two curves for precision and recall separately:

```python
plt.plot(scores, precisions[:-1], "b--", label="Precision", linewidth=2)
plt.plot(scores, recalls[:-1], "g-", label="Recall", linewidth=2)
plt.vlines(my_threshold, 0, 1.0, "k", "dotted", label="Threshold")
plt.plot(scores[idx], precisions[idx], "bo")
plt.plot(scores[idx], recalls[idx], "go")
plt.grid()
plt.xlabel("Classifier Score")
plt.legend(loc="center right")
plt.show()
```

![[Precision-Recall Plot.png]]

# Precision/Recall Curve

If you would rather plot the precision-recall curve itself, that can be done with the same data:

```python
plt.plot(recalls, precisions, linewidth=2, label="Precision/Recall curve")
plt.plot([recalls[idx], recalls[idx]], [0., precisions[idx]], "k:")
plt.plot([0., recalls[idx]], [precisions[idx], precisions[idx]], "k:")
plt.plot([recalls[idx]], [precisions[idx]], "ko", label="Threshold")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.axis([0, 1, 0, 1])
plt.grid()
plt.legend(loc="lower left")
plt.show()
```

![[Precision-Recall Curve.png]]

# Analysis: AUC PR

To calculate the exact area under the precision-recall curve, use the `auc` function from the `metrics` package:

```python
from sklearn.metrics import auc

auc(recalls, precisions)
```

To get the exact precision and recall at the threshold:

```python
print("Precision@Threshold =", precisions[idx])
print("Recall@Threshold =", recalls[idx])
```
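# Appendix: `predict_proba` Variant

The Setup section mentions adapting the `method` parameter for classifiers without a `decision_function`. A minimal sketch of that adaptation follows; the Random Forest, the synthetic `make_classification` data standing in for `X_train`/`y_train`, and all hyperparameters are illustrative assumptions, not part of the note above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Synthetic, imbalanced stand-in data for X_train / y_train.
X_train, y_train = make_classification(
    n_samples=1000, weights=[0.9, 0.1], random_state=42
)

forest_clf = RandomForestClassifier(n_estimators=100, random_state=42)

# predict_proba returns one column per class; use the positive-class
# column as the score, since Random Forest has no decision_function.
y_probas = cross_val_predict(
    forest_clf, X_train, y_train, cv=3, method="predict_proba"
)
y_scores = y_probas[:, 1]

precisions, recalls, scores = precision_recall_curve(y_train, y_scores)
```

The only difference from the decision-function path is the extra column selection; the resulting `y_scores` feed into `precision_recall_curve` exactly as before, with the thresholds now living on the [0, 1] probability scale.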
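# Appendix: Learning the Cutoff

The Setup section says the threshold can be "set or learned" but leaves `my_threshold = ...` open. One common way to learn it is to pick the lowest score that reaches a target precision; the sketch below assumes that strategy, a 90% target, an `SGDClassifier`, and synthetic `make_classification` data - all hypothetical choices, not prescribed by the note.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Synthetic, imbalanced stand-in data for X_train / y_train.
X_train, y_train = make_classification(
    n_samples=2000, weights=[0.9, 0.1], random_state=42
)

my_clf = SGDClassifier(random_state=42)
y_scores = cross_val_predict(
    my_clf, X_train, y_train, cv=3, method="decision_function"
)
precisions, recalls, scores = precision_recall_curve(y_train, y_scores)

# Learn a cutoff: the lowest score whose precision reaches the target.
target_precision = 0.90
idx = (precisions[:-1] >= target_precision).argmax()
my_threshold = scores[idx]

print("Threshold =", my_threshold)
print("Precision@Threshold =", precisions[idx])
print("Recall@Threshold =", recalls[idx])
```

Because `scores` is sorted ascending, the `idx` found here is the same index used in the plotting snippets above, so the learned threshold can be dropped straight into them.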