Beta Normalization Aggregation-Based Ensemble Learning for Lung Cancer Classification: Evaluation on CT and Histopathological Images
Mobarak Abumohsen, Enrique Costa-Montenegro, Silvia García-Méndez, Amani Yousef Owda, Majdi Owda
Early and accurate detection of lung cancer (LC) is a primary challenge in clinical diagnostics and plays a vital role in treatment of the disease. Although various deep learning (DL) techniques have been proposed, existing methods focus mainly on a single imaging modality, either computed tomography (CT) or histopathological images, which limits their generalization, diversity, and applicability. To mitigate these issues, the present work develops a modality-independent ensemble DL framework that is evaluated independently on CT and histopathological image datasets for LC classification. The framework is built on the Beta Normalization Aggregation (BNA) technique, and the performance of three state-of-the-art pre-trained convolutional neural network (CNN) architectures was compared on the two imaging modalities. Based on this comparative analysis of the performance metrics, Xception, DenseNet121, and MobileNetV2 were chosen to build the ensemble model. Predictions generated by the selected CNN models are aggregated using the proposed BNA strategy, which improves classification robustness, prediction confidence, and discriminative capability. Experiments on public datasets confirm the strong performance of the model. On the CT dataset, the proposed BNA ensemble achieved a testing accuracy of 97.45%, with a precision of 97.88%, recall of 97.45%, F1-score of 97.45%, and an AUC of 0.9986. On the histopathological dataset, the framework achieved an accuracy of 99.80%, with precision, recall, and F1-score all reaching 99.80%, and an AUC of 1.0000. These results demonstrate the effectiveness, robustness, and generalizability of the proposed BNA framework.
Further analysis of the results using t-SNE plots, confusion matrices, ROC curves, and confidence distributions provided additional insight into the feature separability, classification performance, and prediction confidence of the proposed framework.
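
To make the aggregation step concrete, the following is a minimal sketch of how class probabilities from several CNNs can be fused into one ensemble prediction. The abstract does not give the BNA formula, so this sketch substitutes a plain weighted average of normalized probabilities (soft voting) as a stand-in; it should not be read as the BNA method. The function names `normalize`, `aggregate`, and `predict`, the uniform default weights, and the example probability vectors are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def aggregate(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-model class probabilities by weighted averaging.

    NOTE: this is generic soft voting, used here as a placeholder.
    It is an assumption, not the BNA rule from the paper.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # assumed: equal trust in each model
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")

    w = normalize(weights)
    fused = [0.0] * n_classes
    for wi, probs in zip(w, model_probs):
        for c, p in enumerate(normalize(probs)):
            fused[c] += wi * p
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical softmax outputs for one image from the three selected backbones.
xception_probs = [0.7, 0.2, 0.1]
densenet121_probs = [0.4, 0.5, 0.1]
mobilenetv2_probs = [0.6, 0.3, 0.1]

fused = aggregate([xception_probs, densenet121_probs, mobilenetv2_probs])
label = predict(fused)
print(fused, label)
```

In this illustration one backbone disagrees with the other two, and the fused distribution still favors the majority class. Any scheme that reweights or rescales the per-model confidences before fusion would replace the uniform `weights` and the simple `normalize` call used here.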