Semantic Analysis of Technical Documentation: Systematic Review, Formal Task Definition, and Transformer-Based NER Implementation
Alexander Echin, Alla G. Kravets, Elena Safonova, Dmitry A. Skorobogatchenko, Danila Karasev

The increasing complexity and volume of technical documentation, including requirements specifications, patents, and engineering reports, create significant challenges for manual analysis and knowledge extraction. This paper presents a systematic review of methods for semantic content analysis of technical documents, with a particular focus on Natural Language Processing (NLP) techniques and Transformer-based models. The study formalizes the task of structured information extraction and provides a mathematical description of Named Entity Recognition (NER) as a core subtask. A practical case study demonstrates an end-to-end NER pipeline for Russian-language technical requirements, built on ruRoberta-large via spaCy-transformers. The results highlight both the potential and the limitations of current approaches and emphasize the critical role of annotation consistency and document format normalization. This work contributes to the development of intelligent systems for engineering documentation analysis and outlines key directions for future research.