Similarity‐Driven Framework for Efficient Polymer Property Prediction Under Data Scarcity Scenarios
Amaia Elizaran Mendarte, Gustavo A. Schwartz
ABSTRACT
Predicting polymer properties directly from chemical structure is essential for the rational design of advanced materials. Artificial neural networks (ANNs) have emerged as powerful tools for modeling quantitative structure–property relationships and can achieve high predictive accuracy when large datasets are available, but their applicability is often constrained by data scarcity. Here, we present a similarity‐driven approach that addresses data limitations and enhances ANN‐based prediction of the glass transition temperature (Tg) of atactic acrylates. Building on the similarity principle, which holds that structurally similar molecules exhibit similar properties, we develop two data‐efficient frameworks. First, structural similarity methods (embedding‐based and edit distance) build the similarity space solely from SMILES‐encoded polymer structures, without any property information; the edit distance method achieves a mean absolute percentage error (MAPE) of 5.9% (mean absolute error, MAE, of ~18 K) using only five property‐informed samples. Second, the chemical similarity method uses both SMILES representations and Tg values to generate the similarity space and achieves an average MAPE of 4.6% (~13 K) by predicting from the Tg values of the five nearest neighbors. These findings show that, for the dataset studied, this strategy for handling data scarcity yields Tg predictions that are more accurate than those of traditional ANN methods, which have a MAPE of 8.7% (~24 K).
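To make the edit distance idea concrete, the following is a minimal sketch of a nearest-neighbor Tg estimate from SMILES strings, together with the MAPE metric quoted above. It is an illustration under stated assumptions, not the authors' implementation: the character-level Levenshtein distance, the unweighted mean over the k nearest neighbors, and the function names `levenshtein`, `knn_predict`, and `mape` are all assumptions, and the Tg numbers in the demo are placeholders rather than measured values.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance between two strings (e.g. SMILES)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def knn_predict(query: str, labelled: list[tuple[str, float]], k: int = 5) -> float:
    """Assumed predictor: unweighted mean Tg of the k labelled SMILES
    closest to the query in edit distance."""
    nearest = sorted(labelled, key=lambda pair: levenshtein(query, pair[0]))[:k]
    return mean(tg for _, tg in nearest)


def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Placeholder (SMILES, Tg in K) pairs; the Tg values are NOT real measurements.
    labelled = [
        ("C=CC(=O)OC", 300.0),
        ("C=CC(=O)OCC", 290.0),
        ("C=CC(=O)OCCCC", 270.0),
    ]
    print(knn_predict("C=CC(=O)OCCC", labelled, k=2))
```

An unweighted mean is the simplest choice here; distance-weighted averaging or a learned model on top of the neighbor set would be natural alternatives.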