When to Adjust for Multiple Testing: A Unifying Guiding Principle
Sabine Hoffmann, Simon Lemster, Gary Collins, Alexander Hapfelmeier, Georg Heinze, Andreas Mayr, Matthias Schmid, Juliane C. Wilcke, Anne‐Laure Boulesteix
ABSTRACT
Most original articles published in the medical literature report the results of multiple statistical tests. In a few simple cases, there is agreement on whether to adjust for the number of performed tests. For many cases encountered in practice, however, this is less clear, and the recommendations in the literature are contradictory, framed along different dimensions, or otherwise confusing. This lack of clear guidance may impair the conduct and interpretation of analyses and encourage questionable research practices, ultimately jeopardizing the credibility of medical research. In this article, we refine, illustrate, and discuss a unifying guiding principle to assist both statisticians and applied researchers in deciding whether to adjust for multiple testing and, if so, over which set of tests. The principle is that multiple testing should be adjusted for if and only if authors, when reporting and interpreting their findings, put more emphasis on the results of one or several of the tests because of their small p‐value(s). We relate this principle to previously proposed rules and show how it can guide and clarify the choice of adjustment strategies in three complex multiple testing settings.