## Prologue
### Precision/Recall and the Confusion Matrix
True positives are also referred to as **hits**, false positives as **errors**, and false negatives as **misses**.

Source: https://en.wikipedia.org/wiki/Precision_and_recall
Therefore:
- Precision = hits / (hits + errors)
- Recall = hits / (hits + misses)
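For example (made-up counts), with 8 hits, 2 errors, and 4 misses: $\text{Precision} = \frac{8}{8 + 2} = 0.8$ and $\text{Recall} = \frac{8}{8 + 4} \approx 0.67$.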
This section only contains set evaluation metrics. If you are looking to measure ranked results, take a look at [ranking evaluation metrics](Ranking%20evaluation%20metrics.md), instead.
## Confusion Matrix
CM | Pos. Pred. | Neg. Pred.
-- | -- | --
**Pos. Inst.** | $TP$ | $FN$
**Neg. Inst.** | $FP$ | $TN$
Columns and rows:
- **Predictions** of the class "Positive" are counted in the Pos. Pred. *column*.
- **Instances** of the class "Negative" are counted in the Neg. Inst. *row*.
- Etc.
Cells:
- $TP$ : True Positives (hits)
- $FP$ : False Positives (errors)
- $FN$ : False Negatives (misses)
- $TN$ : True Negatives (correct rejections)
A perfect predictor only has counts in the $TP$ and $TN$ cells, while a predictor that gets no class right has zeros in both $TP$ and $TN$. (In that case you can simply invert its predictions, at least for binary classifiers.)
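A minimal Python sketch of how these counts can be tallied from paired true/predicted labels (the function name and the 0/1 label encoding are illustrative, not from any particular library):
```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, FP, FN, TN for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # hit
            else:
                fp += 1  # error
        else:
            if truth == positive:
                fn += 1  # miss
            else:
                tn += 1  # correct rejection
    return tp, fp, fn, tn

# Dummy labels for illustration:
print(confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # (2, 1, 1, 2)
```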
## Hit quality evaluation
- **Precision** / positive predictive value PPV: $\frac{TP}{TP + FP}$
- **Recall** / **Sensitivity** / Hit rate / true positive rate TPR: $\frac{TP}{TP + FN}$
- **Specificity** / true negative rate TNR: $\frac{TN}{TN + FP}$
- Negative precision / negative predictive value NPV: $\frac{TN}{TN + FN}$
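A sketch of these four rates from the confusion-matrix counts (illustrative function; denominators assumed non-zero):
```python
def hit_quality(tp, fp, fn, tn):
    return {
        "precision (PPV)":      tp / (tp + fp),
        "recall (TPR)":         tp / (tp + fn),
        "specificity (TNR)":    tn / (tn + fp),
        "neg. precision (NPV)": tn / (tn + fn),
    }

print(hit_quality(tp=8, fp=2, fn=4, tn=6))
# precision = 0.8, recall ≈ 0.67, specificity = 0.75, NPV = 0.6
```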
## Miss quality evaluation
- False discovery rate FDR (1 - Precision): $\frac{FP}{FP + TP}$
- Miss rate / false negative rate FNR (1 - Recall): $\frac{FN}{FN + TP}$
- **Fall-out** / false positive rate FPR (1 - Specificity): $\frac{FP}{FP + TN}$
- False omission rate FOR (1 - Negative precision): $\frac{FN}{FN + TN}$
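Under the same assumptions, the complements in code:
```python
def miss_quality(tp, fp, fn, tn):
    return {
        "FDR (1 - precision)":      fp / (fp + tp),
        "FNR (1 - recall)":         fn / (fn + tp),
        "FPR (1 - specificity)":    fp / (fp + tn),
        "FOR (1 - neg. precision)": fn / (fn + tn),
    }

print(miss_quality(tp=8, fp=2, fn=4, tn=6))
# FDR = 0.2, FNR ≈ 0.33, FPR = 0.25, FOR = 0.4
```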
## Holistic evaluation measures
Often, you will want to optimize for more than one of the metrics presented above.
The following options exist:
### Matthews Correlation Coefficient
$\frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
The result ranges from -1 to 1. If it is 0, the predictions are no better than random; if it is negative, making the exact opposite predictions would have been more successful.
MCC should be your default metric for discrete set evaluations, unless the relevant and irrelevant sets are balanced (in which case Accuracy also works) or you cannot determine the set of True Negatives (in which case use an $F_\beta$-Score).
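A sketch of MCC computed straight from the counts (returning 0.0 when the denominator collapses is a common convention, assumed here):
```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(tp=8, fp=2, fn=4, tn=6))  # ≈ 0.41: positively correlated, far from perfect
print(mcc(tp=0, fp=4, fn=8, tn=0))  # -1.0: exact opposite predictions
```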
### Accuracy
$\frac{TP + TN}{TP + TN + FP + FN}$
The less balanced the positive and negative sets are, the less indicative Accuracy becomes.
Unless you have nearly equal-sized relevant and irrelevant sets, you should prefer MCC.
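A sketch of Accuracy, with a made-up 1:99 imbalanced example that shows why:
```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + tn + fp + fn)

# A classifier that always predicts "negative" never finds the single
# relevant item, yet still scores 0.99 on this imbalanced set:
print(accuracy(tp=0, fp=0, fn=1, tn=99))  # 0.99
```
For the same counts, MCC is undefined (its denominator is zero) and is conventionally reported as 0, matching the intuition that this predictor carries no information.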
### $F_\beta$-Score
$\frac{(1 + \beta^2) \times TP}{(1 + \beta^2) \times TP + \beta^2 \times FN + FP}$
The larger $\beta$ is, the more important *Recall* becomes. For the balanced $\beta=1$ case, the simplified $F_1$-Score is:
$\frac{2 \times TP}{2 \times TP + FN + FP}$
Alternatively:
$2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$
The $F_\beta$-Score is useful if both Recall and Precision matter to you (possibly, with unequal weighting), while you cannot determine the set of True Negatives.
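A sketch of $F_\beta$ from the counts (illustrative function; $\beta > 1$ favours Recall, $\beta < 1$ favours Precision):
```python
def f_beta(tp, fp, fn, beta=1.0):
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

print(f_beta(tp=8, fp=2, fn=4))            # F1 = 16/22 ≈ 0.73
print(f_beta(tp=8, fp=2, fn=4, beta=2.0))  # F2 ≈ 0.69, weights the misses more heavily
```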