Knowledge-Based Recurrent Neural Network for TCM Cerebral Palsy Diagnosis

Cerebral palsy is one of the most prevalent neurological disorders and the most frequent cause of disability. Identifying the syndrome from patients' symptoms is the key to traditional Chinese medicine (TCM) cerebral palsy treatment. Artificial intelligence (AI) is advancing quickly in several sectors, including TCM. AI can considerably enhance the dependability and precision of diagnoses, broadening the use of effective treatment methods. Thus, for cerebral palsy, it is necessary to build a decision-making model to aid in the syndrome diagnosis process. While the recurrent neural network (RNN) model has the potential to capture the correlation between symptoms and syndromes from electronic medical records (EMRs), it lacks TCM knowledge. To make the model benefit from both TCM knowledge and EMRs, unlike the ordinary training routine, we begin by constructing a knowledge-based RNN (KBRNN) based on the cerebral palsy knowledge graph for domain knowledge. More specifically, we design an evolutionary algorithm for extracting knowledge from the cerebral palsy knowledge graph. Then, we embed the knowledge into tensors and inject them into the RNN. In addition, the KBRNN can benefit from the labeled EMRs. We use EMRs to fine-tune the KBRNN, which improves prediction accuracy. Our study shows that knowledge injection can effectively improve model performance. The KBRNN achieves 79.31% diagnostic accuracy with only knowledge injection. Moreover, the KBRNN can be further trained on the EMRs. The results show that the accuracy of the fully trained KBRNN is 83.12%.


Introduction
Cerebral palsy is a leading cause of disability and can be challenging to cure throughout life [1]. TCM theory plays an active role in the treatment of cerebral palsy. Symptoms are crucial in clinical diagnosis and treatment [2]. During clinical diagnosis, doctors apply TCM theories to identify the syndrome from patients' symptoms, a process that is heavily influenced by the doctor's previous experience. AI-assisted TCM diagnosis relies primarily on digital data obtained by modern electronic instruments, making TCM diagnosis more quantitative, objective, and standardized [3].
Thus, a computer-aided decision-making model for diagnosis is needed to offset the uncertainty introduced by human factors. Over the past two decades, advancements in sensor, detector, and transducer technologies have made it possible for AI to learn from digital information. Thus, AI-assisted TCM diagnosis has become a burgeoning field of research [4]. In earlier research, the AI approaches employed in TCM diagnosis were mostly limited to traditional machine-learning algorithms and their modified forms, such as support vector machine (SVM), random forest (RF), AdaBoost, and decision tree (DT). Wang [5] used a Bayesian classifier to model the relationships between the human pulse and diagnosis. Zhang et al. [6] studied quantitative correlations between diseases and the physical appearance of the human tongue. In these conventional machine learning methods, the features are extracted by specialists with extensive TCM clinical expertise. Deep learning technology has grown rapidly in recent years. Unlike traditional machine learning methods, deep learning models can learn diagnostic features directly from the raw data: they comprise more complex hierarchical multilayer networks of artificial neurons that automatically discover valuable features from the original data. Hu et al. [7] proposed a classifier using the Shannon energy envelope, Hilbert transform, and deep convolutional neural networks (DCNN) for the analysis of the human pulse. Combining the characteristics of basic image processing and deep learning, Fu et al. [8] presented a computerized tongue coating nature diagnosis method using deep neural networks. Hou et al. [9] proposed a neural network for tongue color classification, which is more practical and accurate than the traditional one. Although these previous studies attained a high level of accuracy, they considered only single-modal data and only a portion of the patients' information.
Therefore, recent studies are expected to introduce more comprehensive data. Yang et al. [10] developed a novel deep neural network that uses multiview features of gene data to identify disease genes. Dai et al. [11] proposed a multimodal deep learning framework based on the four diagnostic methods of TCM. These approaches effectively compensate for the limited information in a single modality and improve the accuracy of the model.
With the rise of medical digitalization, hospital information systems have accumulated a considerable volume of EMR data, which documents the patients' situation completely in text form. There is increasing interest in applying machine learning techniques to decision-making models for medical diagnosis and treatment. Liang et al. [12] adopted the deep belief network (DBN) to acquire feature representations from EMRs and then combined it with an SVM for supervised learning on the labeled data. Similarly, various supervised machine learning algorithms such as random forest and logistic regression were used in [13] to build ischemic stroke classifiers. Although these ML-based methods outperform conventional techniques such as rule-based algorithms by using massive datasets, they ignore domain-specific knowledge. The knowledge graph (KG), known as ontology in early research, serves as an excellent way to inject domain-specific knowledge into ML models. The KG is a multirelational graph composed of entities and relationships containing a large amount of prior knowledge [14,15]. Gone et al. [16] built on advances in graph embedding learning techniques, decomposed the medicine recommendation task into a link prediction process, and proposed the safe medicine recommendation framework. Abdelaziz et al. [17] developed a large-scale similarity-based framework that predicts drug-drug interactions through text and graph embedding algorithms. These studies fully exploit the domain knowledge in the knowledge graph, but they cannot benefit from large-scale labeled data. In other words, an exceptional specialist should possess not just sound professional knowledge but also extensive experience.
For the TCM cerebral palsy diagnosis model to benefit from both the knowledge graph and the EMR, we propose a two-step model called KBRNN to achieve this purpose. In the first step, we extract evidence-based diagnostic knowledge from cerebral palsy KG by using intelligent optimization algorithms and represent this knowledge as tensors.
Then, we inject the knowledge into the RNN by converting the tensors into the parameters of the RNN. This yields the knowledge-based RNN (KBRNN), which can then be fine-tuned on the TCM data.
Our key contributions are listed as follows: (

Knowledge Graph Inference and Its Applications.
The knowledge graph contains a large amount of prior knowledge [18], which can provide external information for various downstream tasks [19]. For medical tasks, Yang et al. [20] introduced link prediction for syndrome diagnosis by breaking medical records down into multiple symptoms based on the KG. Zheng et al. [21] learned relational embeddings from nodes in the KG to access medical knowledge and used them to improve the classifier's performance through a medical knowledge attention mechanism. Zhang and Che. [22] constructed a Parkinson's disease KG and leveraged KG completion methods to predict drug candidates. Yang et al. [23] pretrained the embeddings of entities on a large-scale domain-specific corpus while learning the knowledge embeddings of entities via a joint TransC-TransE model. Lin et al. [24] combined the context provided by medical entity descriptions with the embeddings of medical entities and relations and user embeddings to learn patient similarities through a convolutional neural network. Lin et al. [25] utilized graph representation learning models to obtain the embedding vectors of the entities and then applied the embeddings to study patient similarities. These works used joint representations to bring the entity and word vector spaces closer. However, for KGs with large numbers of entities, processing the entities and their relationships leads to higher time complexity.
Furthermore, there is also some research about inference on the KG directly, without embedding the relations and entities. El-Shafai et al. [26] provided a method that simulates syndrome differentiation through Bayes and TF-IDF on a knowledge graph to achieve automated diagnosis in TCM. Yao et al. [27] presented an ontology-based model that utilized ontology attributes for training the neural network for medicine side-effect prediction. Xie et al. [28] applied the TF-IDF to the TCM KG and proposed a knowledge-based syndrome reasoning model.

Neural Network with Knowledge Enhancement.
Lin et al. [29] proposed a trigger matching network that is trained with additional annotations and whose output serves as the attention of the sequence labeler. Luo et al. [30] combined a neural network with regular expressions (RE) to improve supervised learning for natural language processing. Jiang et al. [31] proposed FA-RNN, a recurrent neural network that incorporates the benefits of both neural networks and regular expression rules. Finally, Jiang et al. [32] transformed regular expressions into neural networks to combine the two approaches for slot filling.
Figure 1 shows the two-step routine for constructing a KBRNN, i.e., knowledge extraction and knowledge injection. In knowledge extraction, an evolutionary algorithm is designed to extract high-scoring knowledge from the KG. A part of the EMRs is used to score the knowledge. In knowledge injection, the knowledge is converted to a tensor in the knowledge embedding module. Then, the tensor decomposition module decomposes the knowledge tensor into the parameters of the RNN. This gives us the KBRNN, which incorporates domain knowledge.

Notation.
To focus on diagnosing the syndrome from the patients' symptoms, we reconstruct a sub-KG K, shown in Figure 2, based on the KG proposed by [33]. In this sub-KG K, we retain only the symptom and syndrome entities related to this research and exclude other entities, such as acupoints, formulas, and herbs, that are not related to diagnosis. For ease of description, we give each symptom a unique and continuous ID starting from 0 and denote the symptom by "SYM"+ID. Similarly, we use "SYN"+ID to refer to a syndrome.
As a KG, K consists of entities E and relations R.
E: a set of entities, |E| = N. There are three types of entities (main symptom, additional symptom, and syndrome).
R: the set of relationships between entities.
In data processing and knowledge extraction, two common operations on K should be mentioned here.
(i) E Query(SYNi, E′): returns the set of all entities in E′ that are connected to SYNi.
(ii) E match(sentence): outputs, in order of appearance, the aliases of the entities that appear in the sentence.
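The two operations can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the sub-KG is stored as a dictionary from each syndrome to its connected entities and that aliases are stored as a dictionary from surface strings to entity IDs; the names `query`, `match`, `kg_edges`, and `aliases` are hypothetical and do not come from the original implementation.

```python
def query(kg_edges, syn, candidates):
    """Return the entities in `candidates` that are connected to `syn`."""
    return {e for e in kg_edges.get(syn, set()) if e in candidates}


def match(aliases, sentence):
    """Return entity IDs whose aliases occur in `sentence`, in order of appearance."""
    hits = []
    for alias, entity in aliases.items():
        start = sentence.find(alias)
        while start != -1:
            hits.append((start, entity))
            start = sentence.find(alias, start + 1)
    # Sorting by start position restores the order in which symptoms were written.
    return [entity for _, entity in sorted(hits)]


# Toy data, invented for the example.
kg_edges = {"SYN2": {"SYM3", "SYM5", "SYM7"}, "SYN3": {"SYM2", "SYM4", "SYM7"}}
aliases = {"fever": "SYM5", "night sweats": "SYM3"}

connected = query(kg_edges, "SYN2", {"SYM5", "SYM7", "SYM9"})
ordered = match(aliases, "night sweats and fever")
```

Overlapping aliases (one alias being a substring of another) are not handled here; a real matcher would need a longest-match rule.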
In this study, each EMR contains two parts: the description of the main symptoms and the description of the additional symptoms. During data preprocessing, we splice the two parts of each EMR to get a sentence s and convert each EMR to a symptom-level sentence by E match(s). The EMR sentence corresponding to an EMR labeled as SYNi is defined as

Definitions and Task Complexity.
This section explains how the knowledge extraction task is treated as an optimization problem.
Above all, we define what "knowledge" in the KG is. For the SYN2 shown in Figure 2, one piece of knowledge about SYN2, denoted Knowl_SYN2, can be acquired by (2), with the result given in (3).
Equation (3) can be directly converted to a regular expression (RE) as in (4), where "|" is the OR operator and "+" means one or more occurrences.
Clearly, the sentence s_SYN2 = <SYM5, SYM3, SYM7, SYN2>, labeled as SYN2, can be recognized by RE_SYN2. However, REs carry the risk of a wrong diagnosis. For example, RE_SYN2 may also recognize the sentence s_SYN3 = <SYM4, SYM2, SYM7>, labeled as SYN3. A seemingly feasible remedy is to enumerate the subsets of E_main_sym and E_add_sym, splice them to generate Knowl_SYNi as candidate solutions, and filter the useful Knowl_SYNi by verifying them against the EMR sentences of each syndrome. But the time complexity of this enumeration is prohibitively high, since the number of subsets grows exponentially with the number of symptoms. Fortunately, too much knowledge injection complicates the diagnosis model, which will be discussed further in Section 3.4.2. Thus, for a specific syndrome SYNi and a Knowl_SYNi scoring function V, it is enough to find the "top-k Knowl_SYNi", i.e., the k highest-scoring Knowl_SYNi among all the Knowl_SYNi of each syndrome.
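The misrecognition risk can be made concrete with a small Python sketch. It assumes, as an illustration only, that a piece of knowledge is a pair of symptom sets (main, additional) and that its RE has the shape "one or more main symptoms followed by one or more additional symptoms"; the exact form of equations (3) and (4) is not reproduced here, and `knowl_to_regex`, `recognizes`, and the chosen symptom sets are hypothetical. The trailing syndrome label of the sentence is omitted.

```python
import re


def knowl_to_regex(main_syms, add_syms):
    """Build an RE: one or more main symptoms, then one or more additional ones."""
    main_alt = "|".join(re.escape(s) for s in sorted(main_syms))
    add_alt = "|".join(re.escape(s) for s in sorted(add_syms))
    # Every token is followed by one space, so "SYM1" can never match "SYM10".
    return re.compile(rf"(?:(?:{main_alt}) )+(?:(?:{add_alt}) )+")


def recognizes(pattern, symptoms):
    """Check whether the whole symptom sequence is accepted by the RE."""
    text = "".join(tok + " " for tok in symptoms)
    return pattern.fullmatch(text) is not None


# An overly permissive piece of knowledge for SYN2 (invented for the example).
re_syn2 = knowl_to_regex({"SYM4", "SYM5"}, {"SYM2", "SYM3", "SYM7"})

ok_syn2 = recognizes(re_syn2, ["SYM5", "SYM3", "SYM7"])    # sentence labeled SYN2
false_hit = recognizes(re_syn2, ["SYM4", "SYM2", "SYM7"])  # sentence labeled SYN3
```

Both calls return True, which is exactly the kind of wrong diagnosis that a scoring function over candidate knowledge has to penalize.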

Evidence-Based Complementary and Alternative Medicine
Because evolutionary algorithms perform well at finding near-optimal solutions in large solution spaces, we design an evolutionary algorithm for knowledge extraction. Figure 3 shows the main steps of the algorithm.
Our knowledge extraction method is based on the combination of two well-known extensions to the standard genetic strategy. On the one hand, we repeatedly reinitialize the candidate solutions when the search reaches a state of stagnation. On the other hand, we utilize parallel computing in the process of evolution. While the former effectively mitigates the evolutionary algorithm's tendency to become trapped in local optima, the latter significantly improves efficiency by allowing the scores of the solutions to be calculated in parallel. Moreover, assigning individuals to different computational cores can be viewed as a multiple-population evolution strategy, which improves the algorithm's robustness.
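The reinitialization strategy can be sketched in Python as follows. This is a generic, single-process illustration under several assumptions that are not taken from the paper: candidates are bit tuples, selection keeps the better half, variation is one-point crossover plus a single bit flip, and stagnation means no improvement for `patience` generations. Parallel scoring is omitted; in practice the call to `score` is the step that could be distributed across cores.

```python
import random


def evolve(score, n_bits, pop_size=20, generations=200, patience=10, top_k=3, seed=0):
    """Evolutionary search that restarts the population when it stagnates."""
    rng = random.Random(seed)

    def new_candidate():
        return tuple(rng.randint(0, 1) for _ in range(n_bits))

    population = [new_candidate() for _ in range(pop_size)]
    archive = {}  # best candidates seen across all restarts: candidate -> score
    best, stale = float("-inf"), 0
    for _ in range(generations):
        scored = sorted(((score(c), c) for c in population), reverse=True)
        for value, cand in scored[:top_k]:
            archive[cand] = value
        if scored[0][0] > best:
            best, stale = scored[0][0], 0
        else:
            stale += 1
        if stale >= patience:
            # Stagnation: discard the population and start again from random candidates.
            population = [new_candidate() for _ in range(pop_size)]
            best, stale = float("-inf"), 0
            continue
        parents = [cand for _, cand in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = list(a[:cut] + b[cut:])       # one-point crossover
            child[rng.randrange(n_bits)] ^= 1     # single bit-flip mutation
            children.append(tuple(child))
        population = parents + children
    return sorted(archive.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy objective: maximize the number of ones in the bit tuple.
result = evolve(lambda cand: sum(cand), n_bits=8)
```

The archive keeps the best candidates across restarts, so a reinitialization never loses a good solution that was already found.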
There are two problem-specific modules in the evolutionary algorithm, i.e., the generator and the evaluator. The following sections detail their specific implementation.

Generator.
The generator module creates the initial set of Knowl_SYNi candidate solutions. Each solution is a triple τ = <φ, ψ, v>, where v is the score of the solution, calculated by the evaluator module and initialized to 0. The generator produces a list Γ = [τ_0, τ_1, τ_2, ..., τ_(l−1)] of solutions τ_r = <φ_r, ψ_r, 0> by initializing φ_r and ψ_r randomly, where l is the length of Γ, l > k, and 0 ≤ r < l.
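A minimal Python sketch of such a generator is given below. It assumes, as an illustration only, that φ and ψ are binary masks over the main and additional symptoms of a syndrome; the text does not confirm this encoding, and `generate_population` and its parameters are hypothetical names.

```python
import random


def generate_population(n_main, n_add, l, k, rng):
    """Create l random solutions (phi, psi, score) with the score initialized to 0."""
    if l <= k:
        raise ValueError("the population size l must exceed k")
    population = []
    for _ in range(l):
        phi = tuple(rng.randint(0, 1) for _ in range(n_main))  # assumed: mask over main symptoms
        psi = tuple(rng.randint(0, 1) for _ in range(n_add))   # assumed: mask over additional symptoms
        population.append((phi, psi, 0))
    return population


gamma = generate_population(n_main=4, n_add=6, l=10, k=6, rng=random.Random(0))
```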

Evaluator.
The evaluator calculates the score of each τ in Γ. We select n EMR sentences as test cases to compute the score of τ. This section details the scoring algorithm.
For a syndrome aliased SYNi, the evaluator divides the n EMR sentences into two disjoint sets, EMR_true and EMR_false, where |EMR_true| = a, |EMR_false| = b, and a + b = n. A sentence is assigned to EMR_true if and only if it is labeled as SYNi.
With these sets, the variables t_r, f_r, and c_r of a solution τ_r = <φ_r, ψ_r, v_r> are defined as follows:
(i) t_r ∈ N: the number of sentences in EMR_true that are recognized by τ_r
(ii) f_r ∈ N: the number of sentences in EMR_false that are recognized by τ_r
(iii) c_r ∈ N: the number of symptoms in τ_r
As explained in Section 3.3.1, a high-scoring solution corresponds to an RE that recognizes the maximum number of s_SYNi while containing a minimal number of symptoms. In addition, misrecognition is not allowed. This provides us with the fundamental form of the scoring function equation.
c_r can be calculated as equation.
The following describes the calculation of t_r and f_r. We maintain two matrices, TP and FP, with the following rules.
Then, we obtain t_r and f_r from equation (7) and equation (8), respectively,
where ∘ denotes the element-wise product and A ∈ [1]^m.
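The counting step can be illustrated with a short Python sketch. The scoring formula used here, t_r minus a small penalty per symptom and zero whenever any misrecognition occurs, is a hypothetical stand-in that only mirrors the stated requirements (reward recognition, penalize size, forbid misrecognition); it is not the paper's equation. The recognizer is passed in as a plain predicate, and `evaluate` and `penalty` are illustrative names.

```python
def evaluate(recognizes, n_symptoms, labeled_sentences, target, penalty=0.1):
    """Return (t, f, c, score) for one candidate solution and one target syndrome."""
    emr_true = [s for s, label in labeled_sentences if label == target]
    emr_false = [s for s, label in labeled_sentences if label != target]
    t = sum(1 for s in emr_true if recognizes(s))    # correctly recognized sentences
    f = sum(1 for s in emr_false if recognizes(s))   # misrecognized sentences
    c = n_symptoms                                   # size of the candidate
    # Hypothetical score: any misrecognition zeroes the score.
    score = 0.0 if f > 0 else t - penalty * c
    return t, f, c, score


data = [
    (["SYM5", "SYM3"], "SYN2"),
    (["SYM5", "SYM7"], "SYN2"),
    (["SYM4", "SYM2"], "SYN3"),
]
allowed = {"SYM3", "SYM5", "SYM7"}
strict = evaluate(lambda s: set(s) <= allowed, len(allowed), data, "SYN2")
greedy = evaluate(lambda s: True, 5, data, "SYN2")
```

The strict recognizer covers both SYN2 sentences without touching the SYN3 sentence, while the recognizer that accepts everything is scored zero because it misrecognizes one sentence.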

Convert the Knowledge to KBRNN.
With the knowledge extraction algorithm detailed in Section 3.3, we get the top-k Knowl_SYNi for each syndrome. A Knowl_SYNi can be converted to an RE as in (3) and (4), detailed in Section 3.3.1. We formally treat the syndrome diagnosis task as a text classification problem, i.e., given an EMR sentence as the input of the KBRNN, the output is the syndrome corresponding to the sentence. For this task, as usual, we further process the EMR sentences as follows: we add "BOS" and "EOS" at the two ends of each EMR sentence to mark its start and end, and we fill the sentence with "PAD"s so that all sentences have the same length. Accordingly, to ensure that the RE corresponding to a Knowl_SYNi can recognize these new sentences, we add "$*" at both ends of the RE, where "$" is the wildcard and "*" is the Kleene star operator. Take (3) as an example.
Equation (4), which corresponds to (3), can be rewritten as the following equation:
In the following section, we illustrate the implementation of the KBRNN, which is generated by injecting Knowl_SYNi into the RNN.
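The sentence preprocessing and the wildcard wrapping can be sketched in Python as follows. Two details are assumptions made for the example: "PAD" tokens are appended after "EOS", and the wildcard "$" is rendered in Python regex syntax as "any single token" (`\S+` followed by a space). The function names and the toy core expression are hypothetical.

```python
import re


def pad_sentence(symptoms, max_len):
    """Add BOS/EOS markers and right-pad with PAD tokens up to max_len."""
    body = ["BOS"] + list(symptoms) + ["EOS"]
    if len(body) > max_len:
        raise ValueError("sentence is longer than max_len")
    return body + ["PAD"] * (max_len - len(body))


def wrap_with_wildcards(core_regex):
    """Surround an RE with 'wildcard, Kleene star' on both sides."""
    any_tokens = r"(?:\S+ )*"  # zero or more arbitrary tokens
    return re.compile(any_tokens + core_regex + any_tokens)


core = r"(?:SYM5 )+(?:SYM3 )+"           # toy RE over space-terminated tokens
padded = pad_sentence(["SYM5", "SYM3"], max_len=6)
text = "".join(tok + " " for tok in padded)

bare_match = re.compile(core).fullmatch(text) is not None
wrapped_match = wrap_with_wildcards(core).fullmatch(text) is not None
```

Without the wrapping, the added "BOS", "EOS", and "PAD" tokens prevent the RE from matching the whole sentence; with it, the same knowledge still recognizes the padded sentence.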

Embedding the Knowledge via Finite-State Automaton.
A finite-state automaton (FSA) is an abstract model of computation that changes from one state to another in response to inputs. An FSA can be used to recognize sentences. Given a sentence s = <'BOS', sym_1, sym_2, sym_3, ..., sym_n, 'EOS'> and an FSA Λ, we feed the elements of s into Λ in order. Λ recognizes s if and only if the state transition sequence starts from the start state and ends in a final state.
There are two types of FSA: the nondeterministic finite automaton (NFA) and the deterministic finite automaton (DFA). "Deterministic" indicates that, given a state and an input, there is a unique transition to the next state. With Thompson's construction algorithm [34], an RE can be converted into an NFA. Then, the NFA can be converted to a unique DFA with a minimum number of states, called the m-DFA, by the power set construction algorithm and the DFA minimization algorithm.
Each of the k × |E_syn| Knowl_SYNi obtained by the algorithm in Section 3.3 can be converted to an m-DFA. Then, we merge all the m-DFAs by adding a new start state q_ε and adding empty transitions from q_ε to the start states of all m-DFAs. This new FSA is denoted as A, which can be defined formally as a 5-tuple: A = <Q, Σ, δ, q_ε, F′>.
Based on the above definition, we can represent A equivalently by the matrices T, S, and F.
Now, we obtain the knowledge embedding 〈T, S, F〉.
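One common way to realize such an embedding, used here as an assumption rather than as the paper's exact definition, is to let T hold one K×K transition matrix per input symbol, S be a one-hot start vector, and F be an indicator vector of final states. Recognition then reduces to a chain of vector-matrix products. The Python sketch below uses plain lists; `build_tsf` and `accept_count` are hypothetical names, and the tiny DFA is invented for the example.

```python
def build_tsf(n_states, vocab, transitions, start, finals):
    """Encode an automaton as per-symbol transition matrices T, start S, finals F."""
    T = {w: [[0] * n_states for _ in range(n_states)] for w in vocab}
    for (src, word), dst in transitions.items():
        T[word][src][dst] = 1
    S = [1 if q == start else 0 for q in range(n_states)]
    F = [1 if q in finals else 0 for q in range(n_states)]
    return T, S, F


def accept_count(sentence, T, S, F):
    """Compute S^T * T[w_1] * ... * T[w_n] * F, the number of accepting paths."""
    h = S[:]
    for word in sentence:
        h = [sum(h[i] * T[word][i][j] for i in range(len(h))) for j in range(len(h))]
    return sum(h_j * f_j for h_j, f_j in zip(h, F))


# Toy DFA for: BOS, one or more SYM5, one or more SYM3, EOS.
vocab = ["BOS", "SYM5", "SYM3", "EOS"]
transitions = {
    (0, "BOS"): 1,
    (1, "SYM5"): 2, (2, "SYM5"): 2,
    (2, "SYM3"): 3, (3, "SYM3"): 3,
    (3, "EOS"): 4,
}
T, S, F = build_tsf(5, vocab, transitions, start=0, finals={4})
accepted = accept_count(["BOS", "SYM5", "SYM3", "EOS"], T, S, F)
rejected = accept_count(["BOS", "SYM3", "EOS"], T, S, F)
```

For a single DFA the result is 1 or 0. For a merged automaton with several start branches, the same product counts how many of the merged automata accept the sentence.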

Inject the Knowledge Embedding into RNN.
For a sentence s = <s_1, s_2, s_3, ..., s_x>, Out(s) denotes the number of m-DFAs that recognize s, which can be expressed as the following equation:
Here, we extend the approach in [31], which used canonical polyadic decomposition (CPD) to decompose T into E_R ∈ R^(V×r), D_1 ∈ R^(K×r), and D_2 ∈ R^(K×r), where r is a hyperparameter. As shown in [31], the decomposition becomes more accurate as r approaches the rank of T, whereas an r that is too large leads to higher space complexity. In this work, the rank of T is positively related to the number of symptoms in Knowl_SYNi.
That is why we must keep the number of symptoms in Knowl_SYNi to a minimum. E_R has a dimension equal to the size of the input set Σ and can be considered the word embedding of each input word. In this work, we integrate the BERT [35] embedding into E_R. Let w_t be the word embedding of s_t, u_t be the 768-dimensional word embedding generated by bert-base-chinese, and v_t be the embedding of s_t in E_R. The BERT embedding can be integrated by equation (11). Here, β ∈ [0, 1] is a hyperparameter, and G ∈ R^(D×r) is a trainable matrix.
With the CPD result, equation (10) can be rewritten in a recurrent form similar to the formal definition of an RNN, as the following equation:
So far, we have obtained the RNN injected with knowledge.
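Why the decomposition yields a recurrent form can be checked numerically. The Python sketch below assumes the FA-RNN-style update h_t = ((h_{t-1} D_1) ∘ v_t) D_2^T, where v_t is the row of E_R for the current input; this form is an assumption, since the equation itself is not reproduced in the text. Instead of running a CPD solver, the tensor T is built from random factors so that the decomposition is exact by construction; all function names are hypothetical.

```python
import random


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def tensor_from_factors(E_R, D1, D2):
    """T[v][i][j] = sum_c E_R[v][c] * D1[i][c] * D2[j][c] (rank-r CP form)."""
    V, K, r = len(E_R), len(D1), len(E_R[0])
    return [[[sum(E_R[v][c] * D1[i][c] * D2[j][c] for c in range(r))
              for j in range(K)] for i in range(K)] for v in range(V)]


def step_tensor(h, T_v):
    """One automaton step with the full transition matrix: h' = h * T[v]."""
    K = len(h)
    return [sum(h[i] * T_v[i][j] for i in range(K)) for j in range(K)]


def step_recurrent(h, e_v, D1, D2):
    """The same step using only the factors: h' = ((h D1) ∘ e_v) D2^T."""
    K, r = len(D1), len(e_v)
    a = [e_v[c] * sum(h[i] * D1[i][c] for i in range(K)) for c in range(r)]
    return [sum(a[c] * D2[j][c] for c in range(r)) for j in range(K)]


rng = random.Random(0)
V, K, r = 4, 3, 5
E_R, D1, D2 = rand_matrix(V, r, rng), rand_matrix(K, r, rng), rand_matrix(K, r, rng)
T = tensor_from_factors(E_R, D1, D2)

h_a = h_b = [1.0, 0.0, 0.0]
for word_id in [2, 0, 3, 1]:
    h_a = step_tensor(h_a, T[word_id])
    h_b = step_recurrent(h_b, E_R[word_id], D1, D2)
max_gap = max(abs(x - y) for x, y in zip(h_a, h_b))
```

The two hidden states agree up to floating-point error, while the recurrent form stores only V·r + 2·K·r parameters instead of the V·K·K entries of T.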

Datasets.
We collect the dataset from a project by the National Key Research and Development Program of the Chinese Academy of Traditional Chinese Medicine, "Chinese Medicine Data Center and Health Cloud Platform Building." The EMR data are mainly from the Hospital Information System (HIS) and include admission records, course records, discharge summaries, and medical records of cerebral palsy patients within a specific time frame. These data come from clinically valid cases and have been desensitized to protect patients' private information.
The original EMR data have several flaws, including a nonstandard format and diverse expressions. A team of professionals was invited to tag the EMR data manually so that they could be organized into structured data for further research. Data tagging was carried out by two people working together to prevent errors caused by the limited expertise of a single individual. Nonstandard data remain in the structured data after the tagging process. For instance, a particular symptom may have several distinct expressions. In data standardization, numerous professional terms are first standardized and sorted out collaboratively by a group of individuals. Then, a medical specialist summarizes the standard terms included in the medical records. In the end, 988 symptoms and 15 syndromes were standardized. According to traditional Chinese medicine, these symptoms may be further subdivided into main symptoms and additional symptoms. The main symptoms generally represent the patients' overall condition, whereas the additional symptoms relate to complications, which are a significant diagnostic criterion for syndrome types.
Thus, we obtained 5514 labeled diagnostic records from 1755 patients. Each record has three fields: main symptoms, additional symptoms, and the syndrome as the label.

Experimental Steps.
We divide the EMR dataset randomly into the following four parts:
(i) Pre-set (20%): the pretraining dataset, used to score knowledge in the knowledge extraction algorithm
(ii) Train-set (50%): the training dataset, used for training the models
(iii) Dev-set (20%): the validation dataset, a set of examples used to tune hyperparameters
(iv) Test-set (10%): the test dataset, used for testing the performance of the trained model
During the knowledge extraction phase, we execute the evolutionary algorithm, utilize the pre-set data for knowledge scoring, and obtain the top-k Knowl_SYNi (k = 6 in practice) for each syndrome. We removed some Knowl_SYNi that scored poorly because the sample sizes of the corresponding syndromes were insufficient.
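The four-way random split can be sketched in a few lines of Python. The seed, the use of integer arithmetic for the part sizes, and the choice to give the rounding remainder to the test-set are assumptions of this example, not details from the paper.

```python
import random


def split_dataset(records, seed=0):
    """Shuffle and split into pre-set (20%), train-set (50%), dev-set (20%), test-set (rest)."""
    items = list(records)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_pre, n_train, n_dev = n * 20 // 100, n * 50 // 100, n * 20 // 100
    pre = items[:n_pre]
    train = items[n_pre:n_pre + n_train]
    dev = items[n_pre + n_train:n_pre + n_train + n_dev]
    test = items[n_pre + n_train + n_dev:]  # remaining ~10%, absorbs rounding
    return pre, train, dev, test


pre, train, dev, test = split_dataset(range(100))
```

Because the split is purely random, rare syndromes may be under-represented in the smaller parts; a stratified split would be an alternative when that matters.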
After knowledge embedding and injection, we obtain an untrained KBRNN, i.e., one that has not yet been trained on the train-set. As baselines, we adopt conventional models that are frequently used in text classification and compare them with KBRNN. For each baseline, we feed the hidden representation produced by the model into a multilayer perceptron (MLP) and use the cross-entropy loss as the objective function.

Experimental Results.
We compare KBRNN with RNN [36], LSTM [37], GRU [38], 4-layer CNN [39], and 4-layer DAN [40], as well as their bidirectional variants. We use the cross-entropy loss as the objective function and feed the hidden representation generated by these models into a 3-layer MLP to obtain the label logits. For each dataset, we tune the learning rates. The experiments examine two aspects:
(1) The contribution of knowledge extraction to KBRNN: we use the pre-set as the training set of the baselines and compare the performance of the untrained KBRNN and the baselines on the test-set
(2) The ability of KBRNN to benefit from labeled data: we utilize both the pre-set and the train-set (50%, 100%) as the training set and fine-tune the untrained KBRNN with the train-set
Table 1 displays the classification accuracy of the KBRNN and the baseline models on the test-set after training with varying amounts of training data. The KBRNN achieves 79.31% diagnostic accuracy with only the injection of knowledge extracted from the KG based on the pre-set, and this rises to 83.12% with sufficient training on 100% of the train-set. The result shows that the untrained KBRNN outperforms all the baselines that are trained only on the pre-set. It is also better than some of the baselines trained with 50% of the train-set (Figure 4). We believe that KBRNN obtains considerable a priori knowledge from the knowledge graph through injection. The classification result on the full samples using the fully trained KBRNN is shown as the confusion matrix in Figure 5, which provides good insight into how often samples of each of the fifteen syndromes are correctly classified or misclassified by the proposed model. We find that the number of samples varies greatly across syndrome types and that the true positive rate is maintained at a high level even for the syndrome with a large number of samples. As with other models, KBRNN can benefit from expanding the training set while keeping its accuracy advantage.

Discussion and Conclusions
TCM, as a complementary field of medicine outside the modern medicine system, has played a significant role in cerebral palsy syndrome diagnosis. In this work, we propose a knowledge-based RNN (KBRNN) for cerebral palsy syndrome diagnosis. Our major contribution is building an evolutionary algorithm to extract the diagnosis knowledge from the KG. In particular, we also present the method of injecting the TCM knowledge into the RNN. Compared with the simple KG inference or the rule-based methods, as a neural network model, the KBRNN can be further trained by EMR data, which makes the KBRNN more generalized.
On the other hand, compared with the traditional neural network model, KBRNN can benefit from TCM knowledge. Specifically, with the help of TCM knowledge, KBRNN outperforms previous neural approaches in the scene where only a few EMRs are available, and it remains competitive in rich-resource settings.
In conclusion, KBRNN can benefit from two sources, i.e., knowledge extracted from the cerebral palsy knowledge graph and labeled EMRs. We show that KBRNN achieves higher accuracy in syndrome diagnosis tasks with knowledge injection alone. Moreover, the performance of KBRNN can be further improved by training with a large amount of labeled EMRs, which outperforms the current model.

Data Availability
All data included in this study are available upon request by contact with the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.