A Nonlinear Approach to the Performance Creation Mechanism of Startup Knowledge Resources: Identifying Time-Lag Effects and Growth Thresholds Using Machine Learning and Explainable AI
Won Gyu Lee, Eunji Choi
This study examines how the resource configurations of early-stage startups are associated with intellectual property (IP) management capability. To achieve this objective, a dual analytical framework integrating hierarchical regression analysis (OLS) with machine learning techniques (XGBoost and SHAP) is employed. Because conventional linear models may not capture complex associations, the analysis also explores potential nonlinear patterns among key variables, which are interpreted as exploratory, model-based tendencies rather than as causal or temporal effects. The empirical findings reveal several important insights. First, the results of the linear regression analysis indicate that the main effects of simple quantitative indicators, such as firm age and organizational size, are not statistically significant. The interaction between the startup period and pre-startup education (H1) is the only relationship to approach statistical significance, although it is borderline and not robust to alternative variable coding. This pattern suggests that IP management capability is associated not with the quantity of inputs but with the preparedness of the entrepreneur’s knowledge resources. Second, the explainable artificial intelligence (XAI)-based analysis surfaces nonlinear patterns that are not captured by conventional linear models. Specifically, the model-estimated contribution of entrepreneurial education is comparatively small among firms in their first two years and larger among firms around the third year, and the model-estimated contribution of organizational size diminishes once the firm reaches roughly thirty employees. These inflections are model-based tendencies observed in SHAP dependence plots and are corroborated by formal segmented (breakpoint) regressions (spline terms p = 0.010 and p = 0.002).
Methodologically, the study shows how integrating hierarchical regression with explainable machine learning (XGBoost and SHAP) can reveal nonlinear and threshold patterns that conventional linear models overlook. Building on this, it proposes resource latency as an interpretive lens, rather than an established construct, for age-related patterns in startup resource utilization, to be examined in future longitudinal research. From a practical perspective, the findings suggest the value of sustained support during the early scale-up period and of more systematic management structures as firms grow, while recognizing that these patterns are cross-sectional associations.
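As a rough illustration of what a segmented (breakpoint) regression checks, the sketch below fits a piecewise-linear model with a single hinge term at a fixed knot. It is a minimal example under stated assumptions, not the study's estimation procedure: the function name `fit_segmented`, the synthetic data, the noise level, and the knot at 30 employees are all hypothetical and chosen only to mirror the size threshold described above.

```python
import numpy as np


def fit_segmented(x, y, knot):
    """OLS fit of y = b0 + b1*x + b2*max(0, x - knot).

    b2 is the change in slope after the knot; a value near zero
    suggests no inflection at that point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hinge = np.maximum(0.0, x - knot)  # zero before the knot, linear after it
    X = np.column_stack([np.ones_like(x), x, hinge])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic illustration only (not the study's data): the outcome rises
# with headcount up to 30 employees and is flat afterwards.
rng = np.random.default_rng(0)
employees = rng.uniform(1, 60, size=500)
outcome = 1.0 + 0.05 * employees - 0.05 * np.maximum(0.0, employees - 30)
outcome += rng.normal(0.0, 0.01, size=500)

b0, b1, b2 = fit_segmented(employees, outcome, knot=30)
print(f"slope before knot: {b1:.3f}, slope after knot: {b1 + b2:.3f}")
```

In this setup the coefficient on the hinge term plays the role of the spline term: testing whether it differs from zero is one way to ask whether the slope changes at the candidate threshold. Here the knot is fixed in advance; estimating its location from the data would require a search over candidate knots or a dedicated breakpoint routine.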