Traditional chinese medicine synonymous term conversion: A bidirectional encoder representations from transformers-based model for converting synonymous terms in traditional chinese medicine
Release time: Aug 29,2023
Reading volume: 901
Background: The medical records of traditional Chinese medicine (TCM) contain numerous synonymous terms with different descriptions, which is not conducive to computer-aided data mining of TCM. However, there is a lack of models available to normalize synonymous TCM terms. Therefore, construction of a synonymous term conversion (STC) model for normalizing synonymous TCM terms is necessary.
Methods: Based on the neural networks of bidirectional encoder representations from transformers (BERT), four types of TCM STC models were designed: Models based on BERT and text classification, text sequence generation, named entity recognition, and text matching. The superior STC model was selected on the basis of its performance in converting synonymous terms. Moreover, three misjudgment inspection methods for the conversion results of the STC model based on inconsistency were proposed to find incorrect term conversion: Neuron random deactivation, output comparison of multiple isomorphic models, and output comparison of multiple heterogeneous models (OCMH).
Results: The classification-based STC model outperformed the other STC task models. It achieved F1 scores of 0.91, 0.91, and 0.83 for performing symptoms, patterns, and treatments STC tasks, respectively. The OCMH method showed the best performance in misjudgment inspection, with wrong detection rates of 0.80, 0.84, and 0.90 in the term conversion results for symptoms, patterns, and treatments, respectively.
Conclusion: The TCM STC model based on classification achieved superior performance in converting synonymous terms for symptoms, patterns, and treatments. The misjudgment inspection method based on OCMH showed superior performance in identifying incorrect outputs.
[FULL TEXT]
Methods: Based on the neural networks of bidirectional encoder representations from transformers (BERT), four types of TCM STC models were designed: Models based on BERT and text classification, text sequence generation, named entity recognition, and text matching. The superior STC model was selected on the basis of its performance in converting synonymous terms. Moreover, three misjudgment inspection methods for the conversion results of the STC model based on inconsistency were proposed to find incorrect term conversion: Neuron random deactivation, output comparison of multiple isomorphic models, and output comparison of multiple heterogeneous models (OCMH).
Results: The classification-based STC model outperformed the other STC task models. It achieved F1 scores of 0.91, 0.91, and 0.83 for performing symptoms, patterns, and treatments STC tasks, respectively. The OCMH method showed the best performance in misjudgment inspection, with wrong detection rates of 0.80, 0.84, and 0.90 in the term conversion results for symptoms, patterns, and treatments, respectively.
Conclusion: The TCM STC model based on classification achieved superior performance in converting synonymous terms for symptoms, patterns, and treatments. The misjudgment inspection method based on OCMH showed superior performance in identifying incorrect outputs.
[FULL TEXT]