DOI: 10.3390/biomedicines14071439 ISSN: 2227-9059

A Multimodal Biomedical Transformer Fusion Network for Disease-Level Rare-Disease-Inheritance Classification Using Ontology-Enriched Text, Metadata, and Gene Associations

Mahmood A. Mahmood, Khalaf Alsalem

Background/Objectives: Inheritance classification in rare diseases remains challenging because curated knowledge is incomplete, heterogeneous, and imbalanced across inheritance categories. Disease-level inheritance modeling can support knowledge organization, annotation review, and hypothesis generation in rare-disease resources. This paper introduces RareFusion-Net, a multimodal benchmark framework for disease-level inheritance classification, and evaluates whether integrating ontology-enriched disease text, structured epidemiological metadata, and gene-association information improves prediction in curated rare-disease knowledge bases. RareFusion-Net is intended for knowledge modeling, not individual patient diagnosis. Methods: We developed RareFusionBalanced, a gated multimodal fusion model that combines biomedical disease descriptions, structured metadata, and gene-related information using auxiliary supervision. Ontology-enriched disease text was treated as the dominant semantic modality, while tabular and gene modalities were incorporated as complementary evidence when available. Robustness was improved using balanced regularization, selective transformer fine-tuning, dropout, weight decay, label smoothing, early stopping, and prediction aggregation across random seeds. Evaluation included accuracy, macro-F1, micro-F1, macro-AUC, mean average precision, calibration metrics, class-wise analysis, statistical testing, and ablation experiments. Results: RareFusionBalanced achieved 0.7382 test accuracy, 0.6284 macro-F1, 0.7382 micro-F1, 0.9183 macro-AUC, and 0.6686 mean average precision. Calibration was favorable, with an expected calibration error of 0.0395 and a Brier-OVR of 0.0528. The multimodal model slightly outperformed TextOnly-TransformerBalanced, but improvement over the best TF-IDF baseline was not statistically significant. Ablation showed ontology-enriched text as the strongest modality, with gene associations adding complementary value. 
Conclusions: RareFusion-Net provides a practical benchmark for ontology-aware rare-disease inheritance modeling. The results suggest a selective multimodal benefit while highlighting minority-class difficulty, limited statistical superiority, and the need for external validation and improved biological interpretability.
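
For readers unfamiliar with the calibration metrics reported in the Results, the following is a minimal sketch of how an expected calibration error and a one-vs-rest Brier score could be computed from predicted class probabilities. It is not the authors' implementation. The function names, the choice of 10 equal-width confidence bins, the top-label form of the calibration error, and the averaging of the Brier score over both samples and classes are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE with equal-width confidence bins (binning is an assumption)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)        # probability of the predicted class
    predicted = probs.argmax(axis=1)      # predicted class index
    correct = (predicted == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of samples that fall into the bin.
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


def brier_ovr(probs, labels):
    """One-vs-rest Brier score: squared error against one-hot targets,
    averaged over samples and classes (averaging scheme is an assumption)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    one_hot = np.eye(probs.shape[1])[labels]
    return float(np.mean((probs - one_hot) ** 2))
```

Both functions take an array of shape (n_samples, n_classes) whose rows sum to 1 and an integer label vector. Lower values indicate better calibration for both metrics, and a perfectly confident, perfectly correct classifier scores 0 on each.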
