The Lab Fingerprint of HIV Comorbidities
Solomon Russom, Dimitrios Kollias, Saeid Pourroostaei Ardakani, Qianni Zhang

Despite the success of antiretroviral therapy, people living with HIV remain at heightened risk of multimorbidity spanning cardiovascular, renal, hepatic, oncologic and neuropsychiatric domains. We investigate whether routinely collected electronic health record (EHR) data (30 laboratory variables plus seven demographic/social descriptors) can support early, multi-label classification of recorded comorbidities in a real-world cohort of 2200 HIV-positive patients receiving continuous care at a major London hospital. We benchmark classical machine learning and deep learning models under two settings: a demographic-aware configuration that includes sensitive attributes (age, gender, race and continent of birth) and a demographic-unaware configuration that excludes them. XGBoost yields the best macro-F1 performance, and demographic-aware variants consistently outperform their unaware counterparts. Permutation feature importance reveals physiologically coherent drivers (e.g., creatinine/eGFR for renal and cardiometabolic labels, hemoglobin for hematologic labels, albumin for respiratory labels) and suggests that the relative contribution of demographic variables varies across comorbidity categories. These findings indicate that (i) routinely collected EHR data contain informative patterns associated with the multi-label comorbidity profiles of people living with HIV and (ii) carefully governed use of demographic context can improve accuracy while motivating transparent consideration of fairness and bias.
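
To make the two evaluation terms in the abstract concrete, the following minimal sketch computes macro-F1 over a multi-label indicator matrix and estimates permutation feature importance as the mean drop in macro-F1 after shuffling one feature column. It is not the authors' pipeline: `toy_model`, the synthetic rows, the column layout and the thresholds (eGFR below 60, hemoglobin below 12) are hypothetical placeholders chosen only for illustration, and a real study would score a trained classifier such as XGBoost on held-out data.

```python
import random


def f1_binary(y_true, y_pred):
    """F1 for one binary label; returns 0.0 when TP, FP and FN are all zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(Y_true, Y_pred):
    """Unweighted mean of per-label F1 across the columns of two indicator matrices."""
    n_labels = len(Y_true[0])
    per_label = [
        f1_binary([row[j] for row in Y_true], [row[j] for row in Y_pred])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def toy_model(X):
    """Hypothetical rule-based stand-in for a trained multi-label classifier.

    Assumed columns: 0 = eGFR, 1 = hemoglobin, 2 = noise feature the model ignores.
    Output labels: [renal, hematologic]. Thresholds are illustrative only.
    """
    return [[int(row[0] < 60), int(row[1] < 12)] for row in X]


def permutation_importance(model, X, Y, col, n_repeats=20, seed=0):
    """Mean drop in macro-F1 when column `col` is shuffled across rows."""
    rng = random.Random(seed)
    baseline = macro_f1(Y, model(X))
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the matrix with only the chosen column permuted.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - macro_f1(Y, model(X_perm)))
    return sum(drops) / n_repeats


# Worked macro-F1 example: two labels, four patients (synthetic).
Y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
score = macro_f1(Y_true, Y_pred)  # label 0: F1 = 2/3, label 1: F1 = 4/5

# Synthetic feature rows; labels are generated by the same toy rule,
# so the unpermuted baseline macro-F1 is 1.0 by construction.
X = [
    [95.0, 14.1, 0.3],
    [45.0, 13.5, 0.9],
    [88.0, 10.2, 0.1],
    [52.0, 11.0, 0.7],
    [70.0, 12.8, 0.5],
    [30.0, 9.5, 0.2],
]
Y = toy_model(X)

imp_egfr = permutation_importance(toy_model, X, Y, col=0)
imp_noise = permutation_importance(toy_model, X, Y, col=2)
print(f"macro-F1: {score:.3f}")
print(f"importance eGFR: {imp_egfr:.3f}, noise: {imp_noise:.3f}")
```

Because macro-F1 averages per-label scores without weighting by prevalence, rare comorbidity labels count as much as common ones, which is one plausible reason to prefer it in an imbalanced multi-label setting. In this toy setup the ignored noise column has an importance of exactly zero, while shuffling eGFR degrades the renal label and therefore lowers the score.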