DOI: 10.1097/gscm.0000000000000017 ISSN: 2837-8806

Chinese text recognition and knowledge graph of Shen Nong Ben Cao Jing based on BERT pre-trained language models

Lin Tong, Xu Tong, Lei Lei, Ziling Zeng, Sihong Liu, Lei Zhang, Cheng Wang, Hongjun Yang, Huamin Zhang

Background:

The research and utilization of ancient traditional Chinese medicine (TCM) books are relatively limited at present. With the rapid development of artificial intelligence (AI), knowledge graph technology has shed new light on this field.

Objective:

To construct a knowledge graph of Shen Nong Ben Cao Jing, analyze basic knowledge of materia medica, explore implicit knowledge, and visualize the results, as well as to provide methodological references for the study of ancient TCM books.

Methods:

The types of knowledge entities and the relationships between entities in Shen Nong Ben Cao Jing were analyzed. A training corpus was produced using the BIO sequence labeling method, with the text labeled in a self-developed CNLP text labeling system. The BERT model was used to recognize named entities, and the relationships between entities were set based on rules and semantic associations. After knowledge fusion, the data were imported into the Neo4j-community 4.4.9 graph database using the Cypher language for storage and visualization, completing the construction of the knowledge graph.
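
The labeling and import steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example sentence, the entity labels (HERB, FLAVOR, Herb, Flavor), the relationship type HAS_FLAVOR, and all function names are hypothetical, since the abstract does not list the 14 entity types or 15 relationship types. The sketch shows character-level BIO tagging, decoding tags back into entities (as would be done on BERT predictions), and building a parameterized Cypher MERGE statement.

```python
from typing import Dict, List, Tuple


def bio_tags(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Character-level BIO tags; each span is (start, end_exclusive, label)."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters
    return tags


def decode_entities(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Turn a BIO tag sequence (e.g. model predictions) back into entities."""
    entities: List[Tuple[str, str]] = []
    current, label = "", None
    for ch, tag in zip(text, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((current, label))
            current, label = ch, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            current += ch
        else:
            if current:
                entities.append((current, label))
            current, label = "", None
    if current:
        entities.append((current, label))
    return entities


def merge_statement(head_label: str, rel_type: str, tail_label: str,
                    head: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher MERGE query plus parameters for one triple.

    Node labels and relationship types cannot be passed as Cypher
    parameters, so they are placed in the query text; names are parameters.
    """
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical example sentence: "ginseng, flavor sweet".
text = "人参味甘"
tags = bio_tags(text, [(0, 2, "HERB"), (3, 4, "FLAVOR")])
entities = decode_entities(text, tags)
query, params = merge_statement("Herb", "HAS_FLAVOR", "Flavor",
                                entities[0][0], entities[1][0])
```

Using MERGE rather than CREATE keeps repeated mentions of the same herb or property from producing duplicate nodes, which is one simple way to approximate the knowledge fusion step at import time.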

Results:

The knowledge graph of Shen Nong Ben Cao Jing included 5,273 nodes and 11,064 relationships. The schema layer contained 14 entity types and 15 relationship types. Through queries, knowledge can be visualized in terms of classification, property, and the seven mutual relationships of herbal combination.

Conclusion:

The knowledge graph constructed in this study directly reflects the knowledge recorded in Shen Nong Ben Cao Jing and the relationships within it, and is suitable for knowledge mining and intuitive, multi-dimensional display of ancient TCM books.
