LLM-Guided Automated Feature Engineering for Time Series Data with Temporal Leakage Control
Maryam Khanian Najafabadi, Bushra Naeem, Touraj Khodadadi, Saman Shojae Chaeikar, Zawar Shah
This study proposes a time series-aware Large Language Model (LLM)-driven feature engineering framework for tabular prediction tasks. Existing automated feature engineering methods, including LLM-based approaches such as CAAFE and OCT-Tree and established libraries such as tsfresh and Featuretools, can generate useful features for general tabular data but do not explicitly address temporal availability constraints in time series settings. This can lead to data leakage when variables that are only available after the prediction event are used directly during model training. To address this limitation, the proposed framework classifies variables into antecedent features, consequent features, and historical aggregated features. The key innovation is that consequent variables are not discarded to prevent leakage but are instead routed into a leakage-safe historical aggregation pipeline, recovering predictive signal from post-event variables through temporally valid past values. The framework guides an LLM to generate structured feature engineering configurations, applies temporally valid transformations, performs feature selection, and evaluates the selected features using predictive models. A formal leakage control mechanism ensures that all aggregations use strictly past observations, applied within entity groups and before the temporal train–validation–test split. The framework is evaluated on two time series tabular tasks: Tesla stock prediction and English Premier League match outcome prediction. The results show that the proposed approach improves predictive performance compared with raw-feature baselines and selected existing automated feature engineering methods. On the Tesla dataset, the framework reduced MAE compared with both the baseline and the reported OCT-Tree result. On the EPL dataset, it improved accuracy compared with the odds-only baseline and the reported Azure ML preprocessing result. These findings suggest that combining LLM reasoning with explicit temporal constraints is a practical direction for automated feature engineering in time series tabular machine learning.
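The idea of routing a consequent variable through a strictly-past aggregation can be illustrated with a minimal pandas sketch. This is not the authors' implementation: the function name `add_past_rolling_mean`, the column names (`team`, `date`, `goals`), the toy data, and the choice of a rolling mean are all assumptions made for illustration. Goals scored are known only after a match, so the raw column would leak. Shifting by one row within each entity group before aggregating means each row sees only earlier matches of the same team.

```python
import pandas as pd


def add_past_rolling_mean(df, entity_col, time_col, value_col, window=3):
    """Add a rolling mean of value_col built only from strictly earlier rows
    of the same entity (hypothetical helper, not from the paper)."""
    # Sort so that "previous row" means "previous point in time" per entity.
    out = df.sort_values([entity_col, time_col]).copy()
    out[f"{value_col}_past_mean_{window}"] = (
        out.groupby(entity_col)[value_col]
        # shift(1) drops the current row's value, so the window ends at t-1.
        .transform(lambda s: s.shift(1).rolling(window, min_periods=1).mean())
    )
    return out


# Toy data (invented for illustration): goals are a consequent variable.
matches = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"] * 2),
    "goals": [2, 0, 4, 1, 1, 3],
})

features = add_past_rolling_mean(matches, "team", "date", "goals", window=2)
print(features)
```

The first match of each team has no history, so its feature is missing rather than filled from the current or a future row. Computing the feature on the full chronologically sorted table and only then splitting by time mirrors the ordering described above, since each value depends solely on earlier observations.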