DOI: 10.3390/molecules31122162 ISSN: 1420-3049

Machine-Learning-Driven Molecular Design and Structure–Property–Performance Relationships in Pharmaceutical Chemistry

Aisulu Zh. Kabdraisova, Almagul K. Umbetova, Gulfairuz Zh. Kairalapova, Yuliya A. Litvinenko, Larissa R. Sassykova, Nazym S. Yelibayeva, Gauhar Sh. Burasheva, Aliya E. Berganayeva, Zhanibek S. Assylkhanov, Meruyert D. Dauletova, Dmitriy Yu. Korulkin, Marzhan A. Baiburkutova, Aigerim M. Sadvakas

This review examines the emerging role of machine learning (ML) in pharmaceutical chemistry, with emphasis on molecular design, synthetic feasibility, and structure–property–performance (SPP) relationships. By enabling pre-synthesis prediction of physicochemical properties, reaction pathways, and pharmaceutical performance, ML can reduce empirical trial-and-error experimentation and support more efficient exploration of chemical space. A structured narrative review design with PRISMA-aligned systematic search elements was used to evaluate 101 studies, enabling transparent literature identification, eligibility screening, and thematic synthesis across heterogeneous ML applications in pharmaceutical chemistry. The review addresses structure–property relationships (SPRs) and property–performance relationships (PPRs), with emphasis on key pharmaceutical endpoints such as solubility, permeability, stability, dissolution, and bioavailability. An integrated SPP framework is proposed to connect molecular structure, intermediate properties, and final performance outcomes while incorporating retrosynthetic analysis, experimental feedback, and closed-loop optimization. Recent frontier developments are also discussed, including molecular foundation models, multimodal language–graph models, diffusion-based molecular generation, E(3)-equivariant models, and MolMIM-like latent-space optimization. Further topics include co-folding and joint ligand–protein modeling, Boltz-2-like affinity prediction, AlphaFold 3-related biomolecular interaction modeling, and absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction. Key limitations include dataset leakage, benchmark inconsistency, assay variability, conformational and protonation-state effects, reproducibility challenges, regulatory constraints, and the gap between computational prediction and prospective experimental validation. Future progress is expected to depend on hybrid physics–ML models, uncertainty-aware prospective validation, autonomous experimentation, explainable artificial intelligence, and sustainability-aware molecular design. Overall, ML is evolving from a predictive tool into a chemically informed decision-support framework for rational, synthesis-aware, and experimentally validated pharmaceutical development.
