DOI: 10.1145/3821214 ISSN: 1049-331X
Effort-aware ROC Curves: a Comprehensive Approach to the Evaluation of Software Defect Prediction Models
Luigi Lavazza, Sandro Morasca, Gabriele Rotoloni
Background: Effort-aware metrics (EAMs) have the merit of including software analysis effort in the evaluation of software defect prediction (SDP) models. Yet almost all EAMs concentrate only on the effectiveness of SDP models, i.e., how well they identify actually defective modules, and ignore their efficiency, i.e., their ability to avoid classifying non-defective modules as defective. Efficiency is important because modules incorrectly classified as defective undergo useless analysis, which wastes effort and undermines developers' confidence in the usefulness of SDP models.
Aim: We aim to represent the performance of SDP models along several dimensions, while taking into consideration the effort needed for module analysis. Specifically, we look for indicators that account for both effectiveness and efficiency.
Method: We extend Receiver Operating Characteristic (ROC) curves to explicitly represent analysis effort. To this end, we add iso-effort lines to the ROC space and iso-effort points to ROC curves. Iso-effort lines and points show where each point of the ROC space (corresponding to a classifier) is positioned with respect to analysis effort. We also define effort-aware metrics based on the area under the ROC curve.
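As an illustration only, the idea of iso-effort lines and points can be sketched under a simple assumption that the abstract does not state: analysis effort is taken to be proportional to the number of modules classified as defective (true positives plus false positives). With prevalence ρ (the fraction of actually defective modules), the effort fraction at a ROC point is then e = ρ·TPR + (1 − ρ)·FPR, so each iso-effort line is a straight line in ROC space. The function names below are hypothetical and the effort model may differ from the one used in the paper.

```python
# Hypothetical sketch. ASSUMPTION: effort is proportional to the number of
# modules flagged as defective (TP + FP); the paper may use another model.

def effort_fraction(tpr, fpr, prevalence):
    """Fraction of all modules flagged for analysis at ROC point (fpr, tpr)."""
    return prevalence * tpr + (1.0 - prevalence) * fpr


def iso_effort_tpr(fpr, effort, prevalence):
    """TPR on the iso-effort line for a given FPR (may fall outside [0, 1])."""
    return (effort - (1.0 - prevalence) * fpr) / prevalence


def iso_effort_point(roc, effort, prevalence):
    """Find where a piecewise-linear ROC curve crosses an iso-effort line.

    `roc` is a list of (fpr, tpr) points ordered from (0, 0) to (1, 1).
    Effort is linear in (fpr, tpr), so interpolating along a segment is exact.
    Returns None if the requested effort is not reached by the curve.
    """
    prev = roc[0]
    e_prev = effort_fraction(prev[1], prev[0], prevalence)
    for cur in roc[1:]:
        e_cur = effort_fraction(cur[1], cur[0], prevalence)
        if e_prev <= effort <= e_cur:
            if e_cur == e_prev:
                return prev
            t = (effort - e_prev) / (e_cur - e_prev)
            return (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
        prev, e_prev = cur, e_cur
    return None


# Made-up ROC curve and prevalence, for illustration only.
example_roc = [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
example_point = iso_effort_point(example_roc, effort=0.35, prevalence=0.2)
```

Under this assumed model, the point returned for a given effort level shows how much recall (TPR) a model achieves, and how many non-defective modules it flags (FPR), when a fixed share of the modules is analyzed.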
Results: We show that the proposed effort-aware ROC curves support systematic and complete evaluation of SDP models’ performance. Specifically, we show that a single effort-aware ROC curve provides the same information as multiple traditional EAMs. In addition, effort-aware ROC curves provide evaluation-oriented information in a coherent and visually intuitive way. We also show the practical application of the proposed techniques to well-known software defectiveness datasets.
Conclusions: The proposed technique can effectively support both researchers and developers in comparing and selecting SDP models.