Advanced Representations and Generative AI for Polyolefins
Jiayi Song, Shan Ye, Rongjuan Cong, Shoupeng Qiao, Linge Ma, Shuai Shao
Abstract
The properties of polyolefins are governed not by the repeat unit alone but by ensemble-level variables, including molecular weight distribution, branching frequency, comonomer incorporation, and tacticity. This distributional complexity poses challenges for artificial intelligence and machine learning methods, which have been developed predominantly for well-defined, monodisperse polymer systems. This review summarizes recent artificial intelligence-driven polyolefin research in three major parts: molecular representation, property prediction, and inverse design. String-based notations (simplified molecular-input line-entry system, or SMILES; BigSMILES; and Generative-BigSMILES), graph-based encodings, and descriptor-based methods are introduced with respect to their capacity to capture the stochastic ensemble character of polyolefins. Datasets collected from experimental characterization, established databases, computational simulations, and literature mining are discussed, along with the application of feed-forward networks, graph neural networks, transformers, and large language models to predict properties such as glass transition temperature, thermal conductivity, density, and mechanical response. Surrogate-optimization approaches and generative models, including variational autoencoders, transformer generators, and large language model-based pipelines, are also reviewed for producing polymer candidates under target property constraints. Because polyolefin-specific studies remain limited in certain areas, methods developed for general polymers are also discussed.
The representation gap and the scarcity of polyolefin-specific datasets with detailed microstructural characterization are identified as the primary bottlenecks. Future directions are outlined, including ensemble-aware representations, multiscale physics-machine learning integration, and the extension of inverse design from repeat-unit-level generation to ensemble-level specification.