Utilizing Large Language Models for Text-Based Industry Classification
James Offutt

This study develops a novel, dynamic industry classification system, rooted in Artificial Intelligence (AI), by using Large Language Model (LLM) technology to analyze and compare firms’ product descriptions as found in Securities and Exchange Commission (SEC) 10-Q and 10-K filings. Unlike traditional static classification systems such as the Standard Industrial Classification (SIC) or the North American Industry Classification System (NAICS), the proposed method dynamically quantifies the degree of competition and customer-supplier relationships between firms. It uses a 210x210 similarity matrix to compile the relationship scores as a starting point for further analysis. This enhanced metric contributes to the literature and aids in the identification of portfolio correlations, providing nuanced firm-to-firm insights that other methodologies have not fully captured. In turn, this assists investors in risk management and provides insights into behavioral finance by highlighting how news perception affects market dynamics. It also has potential implications for merger and acquisition strategy, supply chain analysis, and policymaking. The methodology employs Ordinary Least Squares (OLS) regression and pairwise correlation analysis to evaluate the efficacy of the LLM measurement against the SIC and NAICS codes. The LLM measurement outperformed the other methods across most models. In the few cases where it did not, the models had low observation counts, lower