Polygon‐Aware Deep Learning Framework for Meal‐Level Nutrition Estimation From Food Images
Amani Tahsin Yasin, Elham Tahsin Yasin, Murat Koklu
ABSTRACT
Accurate estimation of meal‐level nutritional content from food images is a challenging yet essential task for automated dietary assessment. Although recent computer vision methods have achieved promising results in food recognition and segmentation, many existing approaches rely on coarse bounding‐box representations, which limits their suitability for quantitative nutrition estimation. This study proposes a polygon‐aware, instance‐level framework for meal nutrition estimation that integrates precise food segmentation with region‐specific feature extraction. The proposed approach employs a YOLOv8n‐based instance segmentation model to localize individual food items in real‐world meal images, followed by polygon‐aware extraction of shape, color, and texture features exclusively from the segmented food regions. Instance‐level features are aggregated using area‐weighted pooling to generate a meal‐level representation, which is then passed to several regression models for nutrition estimation: Random Forest, XGBoost, LightGBM, Ridge Regression, and a CNN‐based regressor. The framework is evaluated on the FoodBD v2 dataset, which provides polygon annotations and nutritional labels for carbohydrates, protein, fat, fiber, calories, and glycemic load. The segmentation model achieves a mask mAP@0.5 of 0.4232 and an mAP@0.5:0.95 of 0.3400, whereas qualitative overlap analysis on 20 representative samples yields a mean intersection over union (IoU) of 0.9434; instance‐level classification accuracy is strong (F1‐score of 0.87). Across all regression models, polygon‐aware features consistently improve nutrition estimation, yielding an average performance gain of 11.63%, with the best results obtained using XGBoost. Statistical significance testing confirms that the improvements are robust (p < 0.05 for most nutrients). Overall, the results highlight the effectiveness of fine‐grained, polygon‐level representations for reliable and explainable image‐based nutrition estimation, providing a scalable foundation for real‐world dietary monitoring applications.
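The area‐weighted pooling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version below is an assumption: each instance's weight is its polygon area (computed with the shoelace formula) divided by the total segmented area, and the meal‐level vector is the weighted mean of the instance feature vectors. The function names `polygon_area` and `area_weighted_pool` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Absolute area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0  # fewer than three vertices encloses no area
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def area_weighted_pool(
    polygons: Sequence[Sequence[Point]],
    features: Sequence[Sequence[float]],
) -> List[float]:
    """Pool per-instance feature vectors into one meal-level vector.

    Assumption (not confirmed by the paper): weight_i = area_i / sum(areas).
    """
    if len(polygons) != len(features):
        raise ValueError("need exactly one feature vector per polygon")
    areas = [polygon_area(p) for p in polygons]
    total = sum(areas)
    if total == 0.0:
        raise ValueError("total segmented area is zero")
    dim = len(features[0])
    pooled = [0.0] * dim
    for area, feat in zip(areas, features):
        if len(feat) != dim:
            raise ValueError("feature vectors must share one length")
        weight = area / total
        for j in range(dim):
            pooled[j] += weight * feat[j]
    return pooled


# Two hypothetical food instances: a 2x2 square and a 2x6 rectangle.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rectangle = [(0.0, 0.0), (6.0, 0.0), (6.0, 2.0), (0.0, 2.0)]
meal_vector = area_weighted_pool([square, rectangle], [[1.0, 0.0], [3.0, 4.0]])
```

In this sketch the larger instance covers three quarters of the segmented area, so it contributes three quarters of the pooled vector. Weighting by area reflects the intuition that larger portions tend to contribute more to a meal's nutritional content, although pixel area is only a proxy for portion size.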
Practical Applications
The proposed method can support automated dietary monitoring by estimating nutritional information directly from meal images captured using mobile devices. This approach may assist consumers, nutrition professionals, and digital health platforms in assessing meal composition without manual food logging. With further validation on diverse food datasets, the framework could be integrated into nutrition tracking applications for real‐world dietary assessment.