# Imports

Using Matplotlib for plotting, and scikit-learn to create the predictions:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve
```

# Setup

Create a classifier, set or learn a cutoff for it (the default is zero), and run (cross-validated) predictions with it. *Importantly*, ask for scores, not classes, as the returned values by setting *`method="decision_function"`*.

```python
my_clf = ...        # some classifier
my_threshold = ...  # some cutoff value

y_scores = cross_val_predict(
    my_clf, X_train, y_train, cv=3, method="decision_function"
)
```

Some classifiers (e.g., Random Forest) only expose a `predict_proba` method; in those cases, simply adapt the above `method` parameter (see the sketch at the end of this note).

# Calculate Score Curves

Use the true binary class labels and the classifier scores for those instances as the input to scikit-learn's **`roc_curve`** function from the `metrics` package:

```python
fprs, tprs, thresholds = roc_curve(y_train, y_scores)
```

If you want to show the chosen threshold, you need to find it on the fall-out and recall axes, too. Note that `roc_curve` returns the thresholds in *decreasing* order:

```python
idx = (thresholds <= my_threshold).argmax()  # first index at or below the cutoff
```

# Receiver Operating Characteristic Plot

With this in place, you can plot the ROC curve:

```python
plt.figure(figsize=(6, 5))
plt.plot(fprs, tprs, linewidth=2, label="ROC curve")
plt.plot([0, 1], [0, 1], 'k:', label="Random classifier's ROC curve")
plt.plot([fprs[idx]], [tprs[idx]], "ko", label="Threshold")
plt.xlabel('False Positive Rate (Fall-Out)')
plt.ylabel('True Positive Rate (Recall)')
plt.grid()
plt.axis([0, 1, 0, 1])
plt.legend(loc="lower right", fontsize=13)
plt.show()
```

![[ROC Curve.png]]

# Analysis: AUC ROC

To get the area under the ROC curve, use the classifier scores (not the thresholds returned by `roc_curve`):

```python
from sklearn.metrics import roc_auc_score

roc_auc_score(y_train, y_scores)
```

And to simply show the metrics at the chosen threshold:

```python
print("Fall-Out@Threshold =", fprs[idx])
print("Recall@Threshold =", tprs[idx])
```
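If you would rather *learn* the cutoff from the curve itself instead of fixing it up front (the "learn a cutoff" case mentioned in Setup), one common heuristic is to maximize Youden's J statistic (TPR − FPR). This is a minimal sketch that reuses the `fprs`, `tprs`, and `thresholds` arrays computed above; it is one option among many, not part of the original recipe:

```python
# Sketch: pick the cutoff that maximizes Youden's J = TPR - FPR.
# Reuses fprs, tprs, thresholds from the roc_curve call above.
j_idx = (tprs - fprs).argmax()
my_threshold = thresholds[j_idx]

print("Learned threshold =", my_threshold)
print("Recall@Threshold =", tprs[j_idx])
print("Fall-Out@Threshold =", fprs[j_idx])
```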
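Finally, for classifiers without a `decision_function` (e.g., Random Forest), here is a sketch of the adaptation mentioned in Setup. It assumes a binary problem and relies on scikit-learn's convention that the positive class is the second column of the probability matrix; `my_clf`, `X_train`, and `y_train` are as defined above:

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve

# Ask for class probabilities instead of decision scores.
# cross_val_predict returns an (n_samples, n_classes) array here.
y_probas = cross_val_predict(
    my_clf, X_train, y_train, cv=3, method="predict_proba"
)
y_scores = y_probas[:, 1]  # probability of the positive class as the score

# Everything downstream is unchanged:
fprs, tprs, thresholds = roc_curve(y_train, y_scores)
```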