Predicting Thoroughbred Yearling Auction Prices with Machine Learning: Evidence from the Keeneland September Sale
Yanchao Yang, John Clarke, Tapan Mandal, Thinh Nguyen, Trung Pham
Abstract
We apply machine learning methods to predict Thoroughbred yearling auction prices at the Keeneland September Sale (2020–2024). Our sample includes 5,788 yearling prices with pedigree data. We fit both linear and tree-based models to log prices and tune their hyperparameters by cross-validation. We select Ridge regression (α = 1.451) as the primary model for interpretation, given its stability and interpretability. The Ridge regression explains approximately 54% of out-of-sample variation (R² ≈ 0.5403). Sire and dam reputation emerge as the dominant predictors. The results provide pricing benchmarks and show how reputation and session structure shape Thoroughbred yearling auction prices.
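As an illustration of the workflow summarized above, the following minimal sketch tunes a Ridge penalty by k-fold cross-validation on log prices and reports a held-out R². It is not the authors' pipeline: the data are synthetic, the three features (stand-ins for sire reputation, dam reputation, and session) and their coefficients are assumptions made up for the example, and the helper names `ridge_fit`, `r2_score`, and `cv_select_alpha` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc'Xc + alpha * I) coef = Xc'yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def cv_select_alpha(X, y, alphas, k=5, seed=0):
    """Return the alpha with the highest mean k-fold R^2, and that score."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_r2 = []
    for alpha in alphas:
        scores = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            coef, intercept = ridge_fit(X[train], y[train], alpha)
            scores.append(r2_score(y[test], X[test] @ coef + intercept))
        mean_r2.append(np.mean(scores))
    best = int(np.argmax(mean_r2))
    return alphas[best], mean_r2[best]


# Synthetic stand-in data (assumption): three standardized features playing
# the roles of sire reputation, dam reputation, and session.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 3))
log_price = 10.0 + X @ np.array([0.6, 0.4, -0.3]) + rng.normal(scale=0.7, size=n)

# Hold out the last 200 rows for the out-of-sample evaluation.
X_train, y_train = X[:800], log_price[:800]
X_test, y_test = X[800:], log_price[800:]

alphas = np.logspace(-3, 3, 13)  # candidate penalties (assumed grid)
best_alpha, cv_r2 = cv_select_alpha(X_train, y_train, alphas)
coef, intercept = ridge_fit(X_train, y_train, best_alpha)
test_r2 = r2_score(y_test, X_test @ coef + intercept)

print(f"selected alpha: {best_alpha:.4g}")
print(f"cross-validated R^2: {cv_r2:.3f}")
print(f"held-out R^2: {test_r2:.3f}")
```

Because the target is the log price, each coefficient reads as an approximate proportional change in price per unit change in the feature, which is one reason a penalized linear model is convenient for interpretation.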