Composition Descriptors and Cultivar Transferability in Machine-Learning Models of Ultrasonication-Induced Functional Properties of Rice Flour
Hyeonbin Oh, Jung-Hyun Nam, Bo-Ram Park, Kyung Mi Kim, Ha Yun Kim, Yong Sik Cho

Flow-cell ultrasonication of gelatinized rice flour slurries alters the water solubility, viscosity, and retrogradation of pregelatinized rice flour in a cultivar-dependent manner; these properties are important for plant-based beverages and convenience foods. We tested whether cultivar-level composition descriptors (amylose, protein, and fiber) can represent cultivar-associated variation in ultrasonication responses, while keeping three questions separate: process-only prediction, within-domain cultivar representation, and unseen-cultivar transfer. Six rice cultivars were processed across nine amplitude-time combinations and two slurry concentrations. Water solubility index, apparent viscosity at a shear rate of 50 s⁻¹, and setback viscosity were modeled using ElasticNet, partial least squares regression, support vector regression, random forest, and extreme gradient boosting. Three input formulations were compared: process variables alone, process variables plus composition descriptors, and process variables plus cultivar identity. Repeated nested group cross-validation showed that process variables alone predicted the responses insufficiently and that adding composition descriptors improved prediction substantially. In within-domain validation, composition descriptors and cultivar identity performed comparably under nonlinear algorithms. However, cultivar identity is undefined for absent cultivars, so only the composition-descriptor model could be assessed for unseen-cultivar transfer, and its leave-one-cultivar-out performance remained uncertain. Cross-fitted Shapley additive explanations showed that predictions drew on both process and composition variables. Within the validated cultivar-process domain, this approach can screen cultivar-process combinations for beverage and convenience-food applications, but replacing categorical source identifiers with continuous descriptors requires explicit transfer validation.
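
The distinction between cultivar identity and composition descriptors under leave-one-cultivar-out validation can be illustrated with a minimal sketch. It is not the authors' pipeline. The cultivar labels (`cv1` to `cv6`), the composition values, and the helper names are hypothetical placeholders, and the design is assumed to be a full factorial without replicates (6 cultivars x 9 amplitude-time combinations x 2 concentrations). The sketch shows only that a one-hot identity code built from the training cultivars carries no information for the held-out cultivar, whereas its composition descriptors remain defined.

```python
from itertools import product

# Six placeholder cultivar labels (hypothetical; not the cultivars used in the study).
CULTIVARS = [f"cv{i}" for i in range(1, 7)]

# Placeholder (amylose, protein, fiber) values; NOT measurements from the study.
COMPOSITION = {
    c: (10.0 + 2 * i, 6.0 + 0.2 * i, 1.0 + 0.1 * i)
    for i, c in enumerate(CULTIVARS)
}

# Assumed design: one row per cultivar x amplitude-time combination (9)
# x slurry concentration (2), with no replicates.
ROWS = [(c, combo, conc) for c, combo, conc in product(CULTIVARS, range(9), range(2))]


def leave_one_cultivar_out(rows):
    """Yield (held_out, train_idx, test_idx), holding out all rows of one cultivar."""
    for held_out in sorted({r[0] for r in rows}):
        train = [i for i, r in enumerate(rows) if r[0] != held_out]
        test = [i for i, r in enumerate(rows) if r[0] == held_out]
        yield held_out, train, test


def one_hot(cultivar, known):
    """Encode a cultivar against the categories seen during training."""
    return [int(cultivar == k) for k in known]


def summarize_folds(rows):
    """Describe what each input formulation can say about the held-out cultivar."""
    summary = []
    for held_out, train, test in leave_one_cultivar_out(rows):
        known = sorted({rows[i][0] for i in train})
        summary.append({
            "held_out": held_out,
            "n_train": len(train),
            "n_test": len(test),
            # All zeros: the held-out cultivar is not among the training categories.
            "identity_code": one_hot(held_out, known),
            # Still defined: continuous descriptors exist for any measured cultivar.
            "composition": COMPOSITION[held_out],
        })
    return summary


if __name__ == "__main__":
    for fold in summarize_folds(ROWS):
        print(fold)
```

A defined descriptor vector does not by itself make the prediction accurate; the held-out cultivar may lie outside the composition range covered by the training cultivars, which is why transfer must be validated explicitly.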