The Relationship Between Precision-Recall and ROC Curves

Authors

Davis, Jesse
Goadrich, Mark

Type

Technical Report

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Abstract

Receiver Operator Characteristic (ROC) curves and Precision-Recall (PR) curves are commonly used to present results for binary decision problems in machine learning. When the class distribution is close to being uniform, ROC curves have many desirable properties. However, when dealing with a highly skewed dataset, PR curves give a more accurate picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space. We prove that a curve dominates in ROC space if and only if it dominates in PR space. An important corollary to this proof is the notion of an achievable PR curve, and we show an efficient algorithm for computing the achievable PR curve. While it cannot be called a convex hull, this curve has properties much like the convex hull in ROC space. Finally, we show that differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, an algorithm that optimizes the area under the ROC curve is not guaranteed to optimize the area under the PR curve.
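
The point about interpolation can be illustrated with a minimal sketch. It assumes only the standard definitions, recall = TP / (total positives) and precision = TP / (TP + FP). The function names and the example counts are hypothetical and are not taken from the report. The sketch interpolates linearly between two confusion-matrix counts, which is what a straight line in ROC space corresponds to, and compares the resulting precision with a straight line drawn directly between the two PR points.

```python
def pr_point(tp, fp, total_pos):
    """Return (recall, precision) for given true/false positive counts."""
    recall = tp / total_pos
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    return recall, precision


def interpolate_counts(tp_a, fp_a, tp_b, fp_b, total_pos, steps):
    """PR points obtained by interpolating linearly in (TP, FP) counts."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        tp = tp_a + t * (tp_b - tp_a)
        fp = fp_a + t * (fp_b - fp_a)
        points.append(pr_point(tp, fp, total_pos))
    return points


# Hypothetical data: 20 positives, point A = (TP=5, FP=5), point B = (TP=10, FP=30).
point_a = pr_point(5, 5, 20)
point_b = pr_point(10, 30, 20)

# Midpoint reached by interpolating the underlying counts.
curve = interpolate_counts(5, 5, 10, 30, total_pos=20, steps=2)
true_mid = curve[1][1]

# Midpoint of a straight line drawn directly between A and B in PR space.
naive_mid = (point_a[1] + point_b[1]) / 2

print("precision from interpolated counts:", true_mid)
print("precision from a straight PR line: ", naive_mid)
```

With these counts, both midpoints sit at recall 0.375, but the interpolated counts give precision 0.30 while the straight PR line gives 0.375. In this example the straight line overstates the achievable precision.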

Citation

TR1551
