DOI: 10.1145/3797145 ISSN: 2994-970X

Small Is Beautiful: A Practical and Efficient Log Parsing Framework

Minxing Wang, Yintong Huo

Log parsing serves as the fundamental step in log analysis, splitting logs into constant templates and dynamic variables. While recent semantic-based parsers leveraging large language models (LLMs) have shown superior generalizability over prior syntax-based methods, their effectiveness depends critically on the scale of the underlying model. This dependency causes a significant performance collapse when smaller, more practical LLMs are used, creating a major barrier to real-world adoption, where data privacy and computational constraints necessitate compact and resource-efficient models.

In a typical semantic parsing pipeline, the parsing cache is a critical component that stores the set of observed templates to quickly route incoming logs. The design of this cache is therefore paramount to the parser’s overall effectiveness. Motivated by this observation, we improve parsing accuracy based on two insights: 1) designing a more flexible cache updating strategy that can rectify prior errors, and 2) including an explicit validation process that proofreads templates before they are added to the cache, preventing error injection. In particular, we propose EFParser, an unsupervised LLM-based log parser comprising template extraction, template correction, and validated templates. To mitigate the impact of degraded capabilities in smaller LLMs, we design a dual cache with an adaptive updating mechanism. When the LLM generates a new template, this module determines whether it is a novel pattern or a variation of an existing one. If it is a variation, the module merges the templates, thereby maintaining consistency and correcting the cache. Furthermore, we integrate a correction module that acts as a gatekeeper, validating and refining every LLM-generated template to ensure that only high-quality, accurate patterns are cached. Evaluation on public large-scale datasets demonstrates that EFParser outperforms all baseline methods by an average of 12.5% across all evaluation metrics when running on smaller LLMs, and its performance even exceeds that of some baseline methods using large-scale LLMs, highlighting the advantages of systematic architectural design. Moreover, despite the additional processing steps, the average processing time remains shorter than that of most semantic-based baselines. The superior performance on smaller LLMs, combined with computational efficiency, demonstrates that EFParser has significant potential for real-world deployment.
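
To make the cache behaviour described above concrete, the following is a minimal Python sketch of a template cache that validates a candidate before admitting it and merges it into an existing entry when it looks like a variation. It is an illustration under stated assumptions, not EFParser's implementation: the names `TemplateCache`, `matches`, and `merge`, the `<*>` wildcard, whitespace tokenization, and the `max_diff` threshold are all hypothetical, and the sketch uses a single list rather than attempting to reproduce the paper's dual cache.

```python
from typing import List, Optional

WILDCARD = "<*>"  # assumed placeholder for a dynamic variable


def matches(template: str, log: str) -> bool:
    """True if the log fits the template token by token (whitespace split)."""
    t, l = template.split(), log.split()
    return len(t) == len(l) and all(a == WILDCARD or a == b for a, b in zip(t, l))


def merge(a: str, b: str, max_diff: int = 1) -> Optional[str]:
    """Merge two templates if they differ in at most max_diff token positions.

    Differing positions become wildcards. Returns None when the templates
    look like distinct patterns (assumed heuristic, not the paper's rule).
    """
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return None
    merged = [x if x == y else WILDCARD for x, y in zip(ta, tb)]
    diff = sum(1 for x, y in zip(ta, tb) if x != y)
    return " ".join(merged) if diff <= max_diff else None


class TemplateCache:
    def __init__(self) -> None:
        self.templates: List[str] = []

    def lookup(self, log: str) -> Optional[str]:
        """Route an incoming log to the first cached template it matches."""
        return next((t for t in self.templates if matches(t, log)), None)

    def add(self, candidate: str, source_log: str) -> Optional[str]:
        """Validate a candidate template, then merge it or store it as new."""
        # Gatekeeper: reject a template that does not match its own source log.
        if not matches(candidate, source_log):
            return None
        # Variation of an existing template: merge, which also corrects the cache.
        for i, existing in enumerate(self.templates):
            merged = merge(existing, candidate)
            if merged is not None:
                self.templates[i] = merged
                return merged
        # Novel pattern: store as a new entry.
        self.templates.append(candidate)
        return candidate
```

For example, if an earlier, under-abstracted template `Connected to 10.0.0.1` is already cached and the model later proposes `Connected to <*>`, `add` replaces the cached entry with the merged form instead of storing a near-duplicate. A purely positional heuristic like this one can over-merge templates that differ in a single constant token, which is one reason a real parser would need a more careful variation test.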
