Infrared Thermography and Machine Learning for Mastitis Detection in Dairy Cows: A Pilot Case Study in Egyptian Farms
Aya S. Elmasry, Eman A. Elwakeel, Ali M. Allam, Marwa F. A. Attia, Alaa. T. Elmaria, Elsayed. E. M. Badr, Sobhy M. A. Sallam
Mastitis is a major and costly dairy disease that reduces milk yield and quality and harms animal welfare. This study evaluated infrared thermography (IRT) combined with machine learning (ML) for non-invasive mastitis screening in dairy cows and explored links with biological and feeding-system variables in Egyptian farms. A total of 976 thermal udder images obtained from 488 Holstein cows were used, including 708 healthy and 268 mastitic images. Images were captured before milking, processed with CLAHE, resized to 224 × 224 pixels, and split using cow-level grouping before augmentation to prevent animal-level data leakage. The training set contained 780 original images and was augmented to a balanced 4708-image set (2354 per class), while the held-out test set remained unaugmented, with 196 original images (142 healthy and 54 mastitic). EfficientNetB3 with global average and max pooling extracted 3072 thermal features, and ten ML classifiers were evaluated. In the image-level hold-out evaluation, MLP achieved the best performance (accuracy = 86.22%, AUC = 0.9184, sensitivity = 74.07%, specificity = 90.85%), followed by SVM (accuracy = 83.67%, AUC = 0.8963). A separate group-based five-fold cross-validation yielded a more conservative AUC of 0.6812 ± 0.1323 and accuracy of 0.6244 ± 0.0642. Logistic regression analyses did not identify statistically significant associations between model predictions and somatic cell count (SCC), California Mastitis Test (CMT), blood biomarkers, or nutritional variables at p < 0.05. Ration A (Delta Misr) showed a higher observed mastitis incidence (20/40; 50.0%) than Ration B (Copenhagen; 16/45; 35.6%), but nutritional predictors were not statistically significant, indicating that farm-level confounding should be considered.
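The reported hold-out percentages and ration incidences can be checked with simple arithmetic. The short Python sketch below does this under one clearly labeled assumption: the true-positive and true-negative counts (40 and 129) are not stated in the abstract; they are the integers implied by the reported MLP sensitivity and specificity on the 54 mastitic and 142 healthy test images. The helper name `pct` is illustrative only.

```python
# Hold-out test set composition reported in the abstract.
N_MASTITIC = 54
N_HEALTHY = 142

# ASSUMPTION: these counts are not given in the abstract. They are the
# integers implied by the reported MLP sensitivity and specificity.
TRUE_POSITIVES = 40
TRUE_NEGATIVES = 129


def pct(numerator, denominator, digits=2):
    """Return numerator/denominator as a percentage, rounded to `digits`."""
    return round(100.0 * numerator / denominator, digits)


sensitivity = pct(TRUE_POSITIVES, N_MASTITIC)   # share of mastitic images detected
specificity = pct(TRUE_NEGATIVES, N_HEALTHY)    # share of healthy images cleared
accuracy = pct(TRUE_POSITIVES + TRUE_NEGATIVES, N_MASTITIC + N_HEALTHY)

# Observed mastitis incidence per ration, using the counts in the abstract.
ration_a = pct(20, 40, digits=1)
ration_b = pct(16, 45, digits=1)

print(sensitivity, specificity, accuracy)  # 74.07 90.85 86.22
print(ration_a, ration_b)                  # 50.0 35.6
```

Under that assumption, the three MLP figures are mutually consistent: 40 + 129 = 169 correct predictions out of 196 test images gives the reported accuracy.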
Overall, IRT with ML remains a promising non-invasive screening approach, but broader multicenter datasets and independent external validation are needed before routine farm deployment.
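The cow-level grouping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function `split_by_cow`, the toy records, and the 20% test fraction are hypothetical choices made for illustration (the reported 196 of 976 test images is close to that fraction). The idea shown is that all images of one cow go to the same side of the split, and any augmentation would then be applied to the training side only. Unlike a production pipeline, this sketch does not stratify by class.

```python
import random
from collections import defaultdict


def split_by_cow(records, test_fraction=0.2, seed=0):
    """Split (cow_id, image_name, label) records so no cow is in both sets."""
    by_cow = defaultdict(list)
    for record in records:
        by_cow[record[0]].append(record)

    # Shuffle cows, not images, so the split is made at the animal level.
    cow_ids = sorted(by_cow)
    random.Random(seed).shuffle(cow_ids)

    n_test = max(1, round(len(cow_ids) * test_fraction))
    test_cows = set(cow_ids[:n_test])

    train = [r for c in cow_ids if c not in test_cows for r in by_cow[c]]
    test = [r for c in cow_ids if c in test_cows for r in by_cow[c]]
    return train, test


# Hypothetical toy data: 10 cows with 2 thermal images each (label 1 = mastitic).
records = [
    (f"cow{c:02d}", f"cow{c:02d}_img{i}.png", int(c % 3 == 0))
    for c in range(10)
    for i in range(2)
]

train, test = split_by_cow(records)
train_cows = {r[0] for r in train}
test_cows = {r[0] for r in test}

# Leakage check: the two sets must not share any animal.
assert train_cows.isdisjoint(test_cows)
print(len(train), len(test))  # 16 4
```

Splitting before augmentation matters because augmented copies of an image are near-duplicates; if they were created first and then split at the image level, copies of the same udder could land in both training and test sets and inflate the hold-out scores.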