DOI: 10.1093/bjd/ljag086.656 ISSN: 0007-0963

PS41 Item response theory differential item functioning of the Dermatology Life Quality Index across 13 European countries

Jeffrey Johns, Sam Salek, Faraz Ali, Florence Dalgard, Jörg Kupfer, Andrew Y Finlay

Abstract

We investigated differential item functioning (DIF) in the Dermatology Life Quality Index (DLQI) using data from a 13-country European multicentre cross-sectional study of dermatology outpatients. The Mantel–Haenszel (MH) method, with continuity correction and item purification, was used to determine whether DLQI items performed differently by gender, and noncompensatory differential item functioning (NCDIF) was used to determine whether DIF effects were noncompensatory (i.e. not offset by other items). For each item, the direction of DIF was determined with Raju's signed area measure and its magnitude with Raju's unsigned area measure. Data simulated from item response theory (IRT) parameters were compared with observed data to evaluate the impact of DIF on score interpretation, using the DLQI severity bands and the DLQI score > 10 threshold (used internationally in guidelines to determine eligibility for biologics). Although several DLQI items (6, 8, 9 and 10) showed significant DIF (MH χ² adjusted P < 0.05; effect size ΔMH of 1–1.5), their NCDIF values were all below the critical value of ≈ 0.054, indicating limited noncompensatory impact at the item level. The differential test functioning (DTF) index, i.e. the sum of compensatory DIF (CDIF) values across all items in the test, was 0.072, indicating that overall DIF was negligible because positive and negative item-level DIF effects offset each other. A DTF of ≈ 0.07 indicates that the expected total DLQI score of either group (male or female) differs by < 0.1 points because of DIF. The simulated data showed minimal effects on sum scores. Based on modelling of scores over 200 000 patient samples, male respondents had a 0.9% lower chance than female respondents of scoring > 10, with an effect size of 0.036 for real data and 0.055 for simulated data (both extremely small). Although DIF was found in some items, its impact on score banding and on the DLQI > 10 cut point appears to be negligible in real-world applications, providing reassurance over this aspect of the DLQI's validation.
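The abstract combines two DIF frameworks: Mantel–Haenszel screening (α_MH, ΔMH, continuity-corrected χ²) and Raju's DFIT indices (NCDIF, CDIF, DTF). The Python sketch below illustrates both computations under stated assumptions, and is not the authors' code or data. The MH part uses the classic 2×2 form for a dichotomised item, whereas DLQI items are scored 0–3 and would normally call for a generalised MH (Mantel) procedure; the matching-score stratification and item purification steps are not shown. The DFIT part assumes a graded response model with item parameters already linked to a common scale across groups. The function names (`mantel_haenszel`, `grm_expected_score`, `dfit_indices`) and all parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Stratum = Tuple[int, int, int, int]  # (A, B, C, D) counts for one score level
ItemParams = Tuple[float, Sequence[float]]  # hypothetical GRM (a, [b1, b2, b3])


def mantel_haenszel(strata: Sequence[Stratum]) -> Tuple[float, float, float]:
    """MH DIF statistics for one dichotomised item.

    A/B: reference group endorsing / not endorsing the item,
    C/D: focal group endorsing / not endorsing the item,
    one 2x2 table per level of the (purified) matching score.
    Returns (alpha_MH, delta_MH, continuity-corrected MH chi-square).
    """
    num = den = 0.0
    sum_a = sum_ea = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a single respondent adds nothing to the variance term
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_ea += n_ref * m1 / n
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # ETS delta scale; < 0 favours reference group
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    return alpha, delta, chi2


def grm_expected_score(theta: float, a: float, bs: Sequence[float]) -> float:
    """Expected item score (0..len(bs)) under a graded response model."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs)


def dfit_indices(
    thetas_focal: Sequence[float],
    params_focal: Sequence[ItemParams],
    params_ref: Sequence[ItemParams],
) -> Tuple[List[float], List[float], float]:
    """NCDIF and CDIF per item, plus DTF, evaluated at focal-group thetas.

    Item parameters for both groups must already be on a common (linked) scale.
    """
    n = len(thetas_focal)
    # d[j][i]: focal minus reference expected score, person j, item i
    d = [
        [grm_expected_score(t, *pf) - grm_expected_score(t, *pr)
         for pf, pr in zip(params_focal, params_ref)]
        for t in thetas_focal
    ]
    big_d = [sum(row) for row in d]  # test-level difference per person
    mean_big_d = sum(big_d) / n
    ncdif, cdif = [], []
    for i in range(len(params_focal)):
        di = [row[i] for row in d]
        mean_di = sum(di) / n
        ncdif.append(sum(x * x for x in di) / n)  # mean squared item difference
        cov = sum((x - mean_di) * (y - mean_big_d) for x, y in zip(di, big_d)) / n
        cdif.append(cov + mean_di * mean_big_d)
    dtf = sum(x * x for x in big_d) / n  # equals sum(cdif) by construction
    return ncdif, cdif, dtf


if __name__ == "__main__":
    rng = random.Random(1)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    base = [-0.5, 0.5, 1.5]  # hypothetical thresholds, 4-category (0-3) item
    ref = [(1.5, base), (1.5, base)]
    # Item 1 shifted one way for the focal group, item 2 the other way
    foc = [(1.5, [b - 0.3 for b in base]), (1.5, [b + 0.3 for b in base])]
    ncdif, cdif, dtf = dfit_indices(thetas, foc, ref)
    print("NCDIF:", [round(x, 4) for x in ncdif])
    print("CDIF:", [round(x, 4) for x in cdif], "DTF:", round(dtf, 4))
```

In the usage example the two hypothetical items shift in opposite directions, so each person's item-level differences have opposite signs. DTF is the mean of the squared summed differences, so the negative cross terms pull it below the sum of the item NCDIF values. This is the compensation the abstract describes.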
