DOI: 10.1145/3808165 ISSN: 2994-970X

The Effect of Complexity and Provenance on Code Review Decisions: Evidence from a Controlled Experiment

Neha Singh, Francesco Sovrano, Vincent Hellendoorn, Alberto Bacchelli

A code revision is a proposed modification to a specific code snippet under review, created in response to a reviewer’s comment. Modern code review platforms, such as GitHub, allow participants to provide these revisions inline as concrete change suggestions that others can accept or reject with a single action. While this feature promises efficiency, it may also shape how developers evaluate changes. We hypothesize that under higher code complexity, which increases cognitive load and uncertainty, and depending on suggestion provenance (human vs. AI), reviewers may rely more on heuristic judgments and readily available suggestions, potentially reducing review effectiveness. To test our hypothesis, we present the results of a between-subjects experiment with 385 participants, who were asked to review a changeset, which included accepting or rejecting a proposed code revision. The study tested for the effects of code complexity (low vs. high) and provenance labels (human vs. AI), while controlling for revision correctness. We analyzed developers’ review decisions through compliance patterns: acceptance of correct or rejection of incorrect code revisions (appropriate-compliance), acceptance of incorrect code revisions (over-compliance), and rejection of correct code revisions (under-compliance). We found that higher code complexity significantly (p < .05) increases over-compliance, with reviewers more frequently accepting incorrect suggestions. In contrast, provenance labels had no statistically supported effect on review outcomes. We also found no statistically supported evidence that provenance moderates the effect of complexity.
This work contributes: (i) empirical evidence that higher code complexity increases the likelihood of accepting incorrect revision suggestions, (ii) an analysis of provenance showing no main effect on overall compliance, and (iii) clarification that the effect of complexity does not statistically depend on whether revisions are AI- or human-labeled, with any observed differences treated as preliminary and exploratory. Together, these results highlight the need for review systems that surface complexity cues and support more deliberate evaluation of suggested revisions, especially in cognitively demanding contexts.

Data and Materials: https://doi.org/10.5281/zenodo.19481940
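
The three compliance patterns defined in the abstract amount to a mapping from two booleans (whether the reviewer accepted the revision, and whether the revision was correct) to a category. The following is a minimal sketch of that mapping, not code from the study's materials; the names `Compliance`, `classify`, and `tally` are hypothetical, and the sample decisions in the usage note are made-up toy inputs, not experimental data.

```python
from collections import Counter
from enum import Enum


class Compliance(Enum):
    """Compliance patterns as named in the abstract."""
    APPROPRIATE = "appropriate-compliance"
    OVER = "over-compliance"
    UNDER = "under-compliance"


def classify(accepted: bool, revision_correct: bool) -> Compliance:
    """Map one review decision to its compliance pattern (hypothetical helper)."""
    # Accepting a correct revision or rejecting an incorrect one is appropriate.
    if accepted == revision_correct:
        return Compliance.APPROPRIATE
    # Otherwise the decision and the ground truth disagree:
    # accepted but incorrect is over-compliance,
    # rejected but correct is under-compliance.
    return Compliance.OVER if accepted else Compliance.UNDER


def tally(decisions):
    """Count patterns over an iterable of (accepted, revision_correct) pairs."""
    return Counter(classify(accepted, correct) for accepted, correct in decisions)
```

As a usage example, `tally([(True, True), (True, False), (False, True), (False, False)])` counts two appropriate-compliance decisions, one over-compliance decision, and one under-compliance decision. Comparing such counts across complexity and provenance conditions would require the statistical analysis described in the paper, which this sketch does not attempt.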
