Assessing replication success: a taxonomy and systematic comparison of replication success measures for prospective replications
Dennis Kondzic, Samuel Pawel, Jerome Hoffmann, Mathias Twardawski, Leonhard Held, Marie-Ann Sengewald, Steffi Pohl
Abstract
Replication studies are recognized as essential to the scientific process, and numerous measures have been developed to quantify replication success. Most of these measures were developed for post hoc replications, in which the primary study has already been conducted and is sometimes assumed to show a specific result (e.g., statistical significance). Consequently, methodological studies have focused on evaluating replication success measures for such replications. However, recent work emphasizes the value of prospective replications, in which primary and replication studies are planned simultaneously. Such replications allow researchers to control study characteristics and thus investigate which characteristics cause effect heterogeneity. This study provides replication success measures for prospective replications and guidelines for choosing among them. We present a taxonomy of measures based on the research questions they address and evaluate existing frequentist and Bayesian approaches for their applicability to prospective replications. We illustrate their application using an example from social psychology. In simulations, we compare the statistical properties of measures that address the same research question. The results indicate that there is almost always a trade-off between error types; thus, no single measure emerged as clearly superior in all settings. We highlight the assumptions and strengths of each measure and offer recommendations for choosing a measure based on replication goals.