DOI: 10.1002/ail2.70036 ISSN: 2689-5595

Transformer‐Based Contextual Modeling for Predicting Calories From Recipes

Md. Siam Ansary, Amina Brinto

ABSTRACT

Accurate estimation of calorie content from text‐based recipes is a challenging problem in automated dietary analysis and nutrition‐aware computing. Existing methods typically rely on shallow lexical features, sentence‐level embeddings, or modular regression pipelines that struggle to capture the fine‐grained contextual and compositional information present in culinary text. In this paper, we propose an enhanced transformer‐based regression framework that leverages contextual token representations and token‐level attention pooling to predict calorie values directly from unstructured recipe descriptions. Unlike stacking‐based or embedding‐only approaches, the proposed model is fine‐tuned end‐to‐end, enabling task‐specific adaptation of representations and explicit weighting of calorie‐relevant tokens. Extensive experiments using five‐fold cross‐validation demonstrate that the proposed model significantly outperforms traditional lexical baselines, pretrained sentence embedding methods, and the previously introduced stacking‐based transformer regression framework across all evaluation metrics. The proposed model achieves lower prediction error and higher explained variance, indicating superior calibration and generalization. These findings highlight the importance of token‐level modeling and attention‐based aggregation for text‐based nutritional prediction. The proposed framework establishes a new state of the art in text‐based caloric estimation and provides a strong foundation for future research in automated nutritional analysis and intelligent dietary support systems.
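
As a rough illustration of what token‐level attention pooling followed by a regression head might look like, the sketch below uses plain Python lists in place of transformer outputs. It is not the authors' implementation: the function names `attention_pool` and `predict_calories`, the single scoring vector `score_weights`, and the linear head (`head_weights`, `head_bias`) are all assumptions made for illustration, and in a real model the token vectors would come from a fine‐tuned transformer encoder.

```python
import math

def attention_pool(token_vecs, score_weights, mask=None):
    """Collapse per-token vectors into one vector using softmax attention weights.

    token_vecs    : list of equal-length float lists (one per token); stands in
                    for contextual token representations from an encoder.
    score_weights : hypothetical learned vector that scores each token.
    mask          : 1 for real tokens, 0 for padding (defaults to all real).
    """
    if mask is None:
        mask = [1] * len(token_vecs)
    # One scalar relevance score per token (dot product with the scoring vector).
    scores = [sum(w * x for w, x in zip(score_weights, vec)) for vec in token_vecs]
    # Numerically stable softmax restricted to unmasked tokens.
    peak = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of token vectors, one dimension at a time.
    dim = len(token_vecs[0])
    pooled = [sum(a * vec[d] for a, vec in zip(weights, token_vecs)) for d in range(dim)]
    return pooled, weights

def predict_calories(token_vecs, score_weights, head_weights, head_bias, mask=None):
    """Hypothetical linear regression head applied to the pooled vector."""
    pooled, weights = attention_pool(token_vecs, score_weights, mask)
    prediction = sum(w * p for w, p in zip(head_weights, pooled)) + head_bias
    return prediction, weights

# Toy input: two real tokens and one padding position, 2-dimensional vectors.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
toy_mask = [1, 1, 0]
toy_prediction, toy_weights = predict_calories(
    toy_tokens,
    score_weights=[0.0, 0.0],      # zero scores give uniform attention
    head_weights=[100.0, 300.0],
    head_bias=50.0,
    mask=toy_mask,
)
```

With zero scoring weights the two real tokens receive equal attention and the padding position receives none, so the pooled vector is a plain average of the real tokens. In end‐to‐end fine‐tuning, the scoring vector, the head, and the encoder would all be updated against the regression loss, which is what would let the weights concentrate on calorie‐relevant tokens.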
