Assessing the foundations of forensic identification evidence: A critical examination of proficiency test design and results
Nicholas Scurich, Thomas D. Albright
Proficiency testing is widely used to assess expertise in medicine, engineering, and other high-stakes fields. In forensic science, such tests are often cited in court as evidence that pattern-comparison methods are accurate and reliable. Yet the scientific value of proficiency testing depends on test design, scoring rules, and the extent to which results support valid inferences about real-world performance. A recent proficiency test of forensic firearm identification (the comparison of bullets and cartridge cases to determine whether they were fired from a particular gun) reported a false-positive error rate of nearly 20%. The broader significance of that result extends beyond the reported figure. We identify several limitations that can arise in forensic proficiency testing: use of items that may not reflect casework difficulty, reliance on consensus scoring rather than independently verified ground truth, inconsistent treatment of inconclusive responses, variation in test materials across examinees, and procedures vulnerable to contextual bias, including nonblind verification. These features make it difficult to distinguish examiner performance from properties of the test itself and weaken claims that reported scores estimate operational error rates. Properly constructed tests could measure individual differences in performance, identify sources of error, evaluate decision thresholds, and provide courts with more meaningful evidence about reliability. Although our focus is on firearms, these challenges extend to other pattern-comparison disciplines, including fingerprints and handwriting. We conclude that comprehensive, evidence-based reform of forensic proficiency testing is warranted.