A Multigranularity Text Driven Named Entity Recognition CGAN Model for Traditional Chinese Medicine Literatures

Recognizing Traditional Chinese Medicine (TCM) entities in different types of literature is a challenging research problem and the foundation for converting the large amount of TCM knowledge held in unstructured texts into structured formats. The lack of large-scale annotated data makes the application of conventional deep learning models to TCM text knowledge extraction unsatisfactory, and some unsupervised methods rely on auxiliary data such as domain dictionaries. We propose a multigranularity text-driven NER model based on Conditional Generation Adversarial Network (MT-CGAN) to implement TCM NER with a small-scale annotated corpus. In the model, a multigranularity text features encoder (MTFE) is designed to extract rich semantic and grammatical information from multiple dimensions of TCM texts. By differentiating the conditional constraints of the generator and discriminator of MT-CGAN, the synchronization between the generated tag labels and the named entities is guaranteed. Furthermore, seeds of different TCM text types are introduced into our model to improve the precision of NER. We compare our method with baseline methods on 4 kinds of gold-standard datasets to illustrate its effectiveness. The experimental results show that the precision, recall, and F1 score of our method are higher than those of the state-of-the-art methods by 0.24∼8.97%, 0.89∼12.74%, and 0.01∼10.84%, respectively. MT-CGAN is able to extract entities from different types of TCM literature effectively. Our experimental results indicate that the proposed approach has a clear advantage in processing TCM texts with more entity types, higher sparsity, less regular features, and a small-scale corpus.


Introduction
Named Entity Recognition (NER) in Traditional Chinese Medicine (TCM) texts is an important task of TCM knowledge extraction, which refers to extracting the instances of TCM domain concepts from large-scale unstructured TCM texts and identifying their concept types. The grammatical form of classical Chinese and the semantic peculiarities of TCM texts make NER in the TCM field more complex than in other fields.
At present, the mainstream NER methods for both general domain texts and TCM texts have deficiencies. Because they do not consider the domain characteristics of TCM texts, NER methods for generic domain texts generalize poorly and give unsatisfactory recognition results on TCM texts. Compared with other methods, the NER models based on deep learning in the field of TCM texts have achieved better results, but they tend to need a large-scale training corpus. The highly specialized nature of TCM texts leads to the high cost of training corpus annotations. The lack of large-scale annotated data makes the application of conventional deep learning models to TCM text knowledge extraction unsatisfactory. How to use a limited annotated corpus to realize NER for massive TCM texts has become an urgent problem to be solved.
Researchers have actively explored NER in domains that lack annotated data. Jia et al. [1] considered NER in TCM as a text span detection task, classifying all text spans and obtaining the relevant knowledge entities from the sentences based on the classification results. The approach does not require a large-scale annotated corpus but relies on a high-quality domain dictionary. Han Yuanbo Tao [2] combined Generative Adversarial Networks (GAN) with the BiLSTM-Attention-CRF model to solve the problems of lack of annotation data for domain NER tasks and inconsistent annotation of entities in the same document. However, this model requires a large-scale crowdsourced annotated dataset, although the cost of a crowdsourced annotation dataset is lower than that of a professional annotation dataset.
To solve the above problems, this paper introduces the generative adversarial idea to NER tasks and proposes a Multigranularity Text Features Fused NER model based on CGAN (MT-CGAN). The model takes advantage of two facts: the Conditional Generation Adversarial Network (CGAN) model can generate data with the same distribution characteristics as the real data [3], and self-attention is good at capturing the internal correlation of data or features. Using this model, we extract features at different granularities of TCM texts and improve domain NER performance when only a small amount of annotated corpus is available. We summarize the contributions as follows:

(i) A differential feature-constrained NER model for TCM texts is proposed based on the generative adversarial idea. The model addresses the lack of annotation data in TCM texts and the inapplicability of NER models from other domains to the NER tasks of TCM texts.

(ii) A Multigranularity Text Features Encoder (MTFE) is designed to extract grammatical and semantic features embedded in different dimensions and feature spaces of text from words, sentences, paragraphs, and chapters, respectively, to enhance the multidimensional feature learning capability of the model.

(iii) An improved U-NET is added to the generator of the MT-CGAN to hierarchically extract the features of the seed label. The extracted features are also passed to the decoding layer via skip connections to achieve feature enhancement of the generation process. This further reduces the randomness of the generated content and speeds up the convergence of the model.

The paper is organized as follows. Section 2 introduces the work related to the NER tasks in TCM texts and multigranularity features extraction. Section 3 describes the details of the MT-CGAN. Section 4 describes the experiments for the performance evaluation of the MT-CGAN. Section 5 concludes the paper.

NER of TCM Texts Based on Deep Learning.
Deep learning has become a mainstream technique for solving NER tasks because of its ability to discover hidden features. Deng et al. [4] combined BiLSTM and CRF and applied the combination to Chinese medicine text NER; the model improved the accuracy of NER because of the powerful ability of the LSTM model to extract global contextual semantic information. Li et al. [5] proposed a model called BiLSTM-Att-CRF by integrating attention into BiLSTM networks and showed that this model can avoid the information loss caused by distance. An et al. [6] proposed a Multihead Self-attention-based BiLSTM-CRF model (MUSA-BiLSTM-CRF) for Chinese clinical named entity recognition. The proposal of the Transformer model [7] opened the era of dominance of pretrained models in NLP [8,9]. As a typical representative of pretrained models, BERT [10] has shown good performance in many NLP tasks [11,12]. Qu et al. [13] combined the BERT model with BiLSTM-CRF for the NER task of TCM texts to address fuzzy entity recognition and scarce labeled data in the field of traditional Chinese medicine, and the performance improved compared with the BiLSTM-CRF model. Zhang et al. [14] proposed a semisupervised embedded Semi-BERT-BiLSTM-CRF model. Xiong et al. [15] proposed a method based on a relational graph convolutional network (RGCN), which utilized multisource knowledge in a unified manner for Chinese Clinical NER. All of these NER models employed large-scale annotated samples or pretrained models to maximize performance.
Compared with other deep learning models with end-to-end training, GAN can learn the distribution of real samples and has better robustness, which can alleviate the problem of insufficient labeled samples in deep learning.

Conditional Generation Adversarial Network.
Generative Adversarial Networks (GAN) is a generation framework. Because it can use the adversarial sample [16] to generate data with the same distribution as the real target, it is applied to a variety of generation tasks. For example, Cao [17] used it to restore images and graphs. The original GAN has some limitations, mainly that the process by which the generator produces samples is uncontrollable. For this reason, Mirza and Simon [18] proposed CGAN. It aims to change the random generation of the target object and to control the data generation process by incorporating specific conditions into the generator and discriminator inputs. It solves the problem that GAN cannot generate specific attributes in NLP. Feng [19] proposed a Conditional Wasserstein Generative Adversarial Network model (CWGAN) for NER tasks by combining CGAN and a modified Wasserstein Generative Adversarial Network (WGAN-GP). They verified the effectiveness of the CWGAN on generic domain NER tasks. However, this model does not perform well in the NER task of TCM texts.
At the same time, by analyzing the characteristics of current NER models based on the generative adversarial idea, we find that these models are all based on feature extraction from a single granularity of text. As the carrier of knowledge, text with different granularity contains different information. Text features can be mined more comprehensively through multigranularity text feature extraction.

Computational Intelligence and Neuroscience

Multigranularity Features Extraction.
It was found that a multigranularity cognitive mechanism can improve the NER task. Zhang and Yang [20] designed a Lattice-LSTM model to encode characters and words, respectively, which used information on both words and word order and avoided word segmentation errors. Gao et al. [21] proposed contextual multifeature embedding (CMFE) and used it to construct a multifeature semantic fusion model (MFSFM) to realize citation entity recognition. Sun [22] described a method of recognizing Chinese clinical named entities based on multistrategy fusion. Li et al. [23] proposed a Flat-Lattice Transformer model, which transformed the Lattice structure into Flat-Lattice to link the semantic information of characters and words together for NER. Multigranularity feature extraction can obtain richer semantic information than single-granularity feature extraction. In NER tasks, whether the semantic information can be fully extracted is the key to the performance of the model. Therefore, we propose the MT-CGAN to solve the problem of NER in TCM texts.

Model
In this paper, we view the NER task in TCM as a sequence annotation task and use the generative adversarial idea to construct the MT-CGAN model for implementing NER in TCM texts with a small-scale corpus. The task is described as follows: given an annotated corpus text, $X = \{x_0, x_1, \dots, x_i, \dots, x_n\}$ is the text sequence and $L = \{l_0, l_1, \dots, l_i, \dots, l_n\}$ is the sequence of named entity labels for the corresponding text sequence $X$, where $x_i$ represents a character and $l_i$ represents the entity category label corresponding to $x_i$. The role of the generator of the model is to generate the sequence of annotations $L' = \{l_0', l_1', \dots, l_i', \dots, l_n'\}$ corresponding to $X$.
The discriminator guides the model training by scoring $L'$ so that the generator can generate a sequence of entity labels that approximates $L$. The following is a detailed description of the model.
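As a minimal illustration of the task formulation, the following sketch aligns a character sequence X with a label sequence L and reads the entities back out. The BIO tagging scheme, the tag names `HERB` and `FLAVOR`, and the helper `extract_entities` are assumptions made for this example only; the label inventory actually used is the one summarized in Table 1.

```python
def extract_entities(chars, labels):
    """Collect (entity_text, entity_type) pairs from a BIO-style label sequence."""
    entities, current, current_type = [], [], None
    for ch, lab in zip(chars, labels):
        if lab.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], lab[2:]
        elif lab.startswith("I-") and current and lab[2:] == current_type:
            # Continuation of the currently open entity.
            current.append(ch)
        else:
            # "O" or an inconsistent tag closes the open entity.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


X = list("人参味甘")                          # x_0 ... x_n, one character each
L = ["B-HERB", "I-HERB", "O", "B-FLAVOR"]    # l_0 ... l_n, hypothetical tags
print(extract_entities(X, L))
```

Each label $l_i$ is attached to exactly one character $x_i$, which is why the generated sequence $L'$ must stay synchronized with $X$.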

Model Structure.
The MT-CGAN model proposed in this paper consists of the generator and the discriminator, and the structure is shown in Figure 1. The generator of the MT-CGAN consists of the MTFE and a modified U-NET (conditionally enhanced U-NET, C-U-NET). The discriminator of the MT-CGAN consists of multiple Convolutional Neural Networks (CNN) [24], normalization operations of each layer, and activation functions.
In this paper, the text attention data generated by the MTFE replaces the white noise input of the CGAN generator as the main input of the model generator in the text feature extraction process. Moreover, to improve the quality of label sequence generation, we fuse the multigranularity text features extracted by the MTFE with the TCM text entity distribution features extracted by the encoder of the C-U-NET and use the result as the conditional input Y to the model's generator.
In the discriminator part of the MT-CGAN, we use a small set of real sample labels L as the conditional input of the discriminator, which improves the ability of the model to discriminate authenticity and accelerates the convergence of the model training.
The generator of the MT-CGAN generates a sequence of entity labels in vector form, and a Conditional Random Field (CRF) [25] is added to convert the vector into an intuitive symbolic representation. When the MT-CGAN training reaches a steady state, the model loads the trained generator to generate a sequence L′ of text labels. Then L′ is used as the input for the CRF to obtain the mapping of text labels T. Finally, the generated text labels T are synchronized with the corresponding text to form the final text annotation sequence.
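The step from a vector-form label sequence to symbolic labels can be sketched with Viterbi decoding, the standard way a linear-chain CRF selects the best tag path. This is an illustration under assumptions: the function `viterbi`, the three-tag set, and all scores are invented for the example, and the CRF configuration used with the MT-CGAN is not specified here.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag path.

    emissions[t][b]   : score of tag b at position t
    transitions[a][b] : score of moving from tag a to tag b
    """
    n, k = len(emissions), len(emissions[0])
    score = list(emissions[0])
    back = []
    for t in range(1, n):
        new_score, pointers = [], []
        for b in range(k):
            # Best previous tag for ending in tag b at position t.
            best_a = max(range(k), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            pointers.append(best_a)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(k), key=lambda b: score[b])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


tags = ["O", "B", "I"]                       # hypothetical tag set
emissions = [[0.1, 2.0, 0.0], [0.0, 0.5, 1.0], [1.5, 0.0, 0.2]]
transitions = [[0.0, 0.0, -10.0],            # O -> I is strongly penalized
               [0.0, 0.0, 0.5],
               [0.0, 0.0, 0.5]]
print([tags[i] for i in viterbi(emissions, transitions)])
```

The transition scores let the decoder reject label sequences that are individually plausible per character but invalid as a whole, such as an inside tag directly after an outside tag.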

Multigranularity Text Features Encoder.
The MTFE of the MT-CGAN consists of four feature encoders at the granularities of characters, sentences, paragraphs, and chapters to extract text features from different dimensions.
The Character-Granularity Feature Encoder (CGFE) is implemented based on the GRU [26] network and is used to extract the position information h c among characters in text.
The Sentence-Granularity Feature Encoder (SGFE) is realized based on Transformer's encoder for extracting text features h s embedded in the textual context within a sentence. The Paragraph-Granularity Feature Encoder (PGFE), also based on Transformer's encoder, is introduced into this model to extract the deeper latent semantic and syntactic features h p contained among different sentences within a text segment. A local central feature within a chapter is important to NER in TCM texts, so the Article-Granularity Feature Encoder (AGFE) based on CNN is designed to extract central features h a from the global scope of the chapter text. The SGFE and PGFE adopt the attention mechanism, which can generate attention data for the corresponding granularity of text while extracting deep semantic features of the text context. Because these data reflect the contextual relevance among tokens, we fuse the attention data generated by the SGFE and PGFE to form the main input to the generator of the MT-CGAN.

Character-Granularity Feature Encoder.
To ensure that the generated sequence of text labels is consistent with the original text sequence, this paper designs the CGFE based on GRU to extract the relative position feature h c of characters in TCM texts and uses h c to constrain the order in which entity labels are generated.
As shown in Figure 2(a), the CGFE consists of an embedding layer, GRU, a fully connected layer, and an activation function. The GRU, which can retain information in a sequence for long periods, is used to encode character position information in text. It includes a reset gate and an update gate. The extent to which the preceding information is retained is determined by both. The smaller the value of the reset gate is, the more previous information is discarded, while the larger the value of the update gate, the more is retained.
Figure 1: Model framework. || is a splicing operation, + is a contraposition addition operation. C_gra, S_gra, P_gra, and A_gra denote characters, sentences, paragraphs, and chapters of text, respectively. FCL represents a fully connected layer. CNNL represents the convolutional neural network layer, and CEL denotes the cross-entropy loss function.

The equation of the reset gate is as follows:

$$r_i = \sigma\left(W^{(r)} e_i + U^{(r)} p_{i-1}\right) \tag{1}$$

where $r_i$ is the reset gate, $e_i$ is the vector of the current input token, $W^{(r)}$ and $U^{(r)}$ are the reset gate weight matrices, $p_{i-1}$ is the relative position information of the previous token, and $\sigma$ is the Sigmoid function. The equation of the update gate is as follows:

$$z_i = \sigma\left(W^{(z)} e_i + U^{(z)} p_{i-1}\right) \tag{2}$$

where $z_i$ is the update gate, and $W^{(z)}$ and $U^{(z)}$ are the weight matrices of the update gate. The linear transformation parameters of the update gate and the reset gate are changed in the MT-CGAN model to obtain the candidate hidden state $p_i'$ and the output $p_i$, respectively. $p_i'$ and $p_i$ are calculated by (3) and (4), respectively.

$$p_i' = \tanh\left(W e_i + r_i \odot \left(U p_{i-1}\right)\right) \tag{3}$$

where $W$ and $U$ are the weight matrices of the candidate state, $\odot$ is the Hadamard product, and the data is mapped into $(-1, 1)$ by the Tanh activation function.

$$p_i = (1 - z_i) \odot p_{i-1} + z_i \odot p_i' \tag{4}$$

where $(1 - z_i) \odot p_{i-1}$ represents the forgetting of the relative position information of the previous character and $z_i \odot p_i'$ represents the remembering of the current position information. The text character sequence passes through two GRU layers to obtain the character relative position feature sequence $P$:

$$P = \{p_1, p_2, \dots, p_i, \dots, p_n\} \tag{5}$$

After the fully connected and Tanh activation operations, the final character position feature h c of the text sequence is obtained.
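The gate computations can be traced with a small pure-Python sketch of one GRU step. It follows the standard bias-free GRU form with the update convention $p_i = (1 - z_i) \odot p_{i-1} + z_i \odot p_i'$ described in the text; the function names and the candidate-state matrices `W` and `U` are naming assumptions of this sketch, and the two stacked layers, the embedding layer, and the final fully connected layer are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gru_step(e_i, p_prev, W_r, U_r, W_z, U_z, W, U):
    """One GRU step: returns p_i from the input e_i and previous state p_prev."""
    # Reset gate r_i and update gate z_i.
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, e_i), matvec(U_r, p_prev))]
    z = [sigmoid(a + b) for a, b in zip(matvec(W_z, e_i), matvec(U_z, p_prev))]
    # Candidate state p_i' = tanh(W e_i + r_i * (U p_{i-1})).
    Up = matvec(U, p_prev)
    cand = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(W, e_i), r, Up)]
    # Output p_i = (1 - z_i) * p_{i-1} + z_i * p_i'.
    return [(1 - zi) * pp + zi * c for zi, pp, c in zip(z, p_prev, cand)]


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the previous state is simply halved.
print(gru_step([1.0, -1.0], [0.4, -0.2], zeros, zeros, zeros, zeros, zeros, zeros))
```

Running the step over every character of a sentence, with the state carried forward, yields the sequence $P$ of relative position features.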

Sentence-Granularity and Paragraph-Granularity Feature Encoder.
The entities in the field of TCM texts have their own characteristics. According to Yurui [27], concepts in the field of TCM texts are characterized by heterogeneity; they have obvious multimeaning and fluidity, and the same concept can have different functions. The boundaries between the categories of entities, attributes, and relations are ambiguous. For example, the word 经脉 appears 29 times throughout the text of Qian Jin Fang, and its connotation and scope vary in different contexts, indicating the meanings of Channel of Qi-Blood, Channel Number, and Menses, respectively. Therefore, to achieve accurate NER on TCM texts, the context of the text must be fully used.
The contextual semantic spaces constructed within a sentence and among different sentences are different, and the different semantic spaces have different representations for TCM knowledge and play different roles for NER in the TCM domain. The token level and sentence level semantics are equally important. Therefore, it is necessary to obtain deep-level latent semantic features of text from contexts of different scope by using text of both sentence and paragraph granularities as objects.
Transformer can find latent recognition patterns from data and has been proven to have the ability to extract deep semantic features embedded in a text [28]. Thanks to its self-attention mechanism, the Transformer can give the same attention to all elements of the input text sequence and can understand the relationship among different elements of input sequences, even when the elements are far away from each other [29]. Transformer models can model the relationships among words in long sentences or paragraphs, so the sentence-granularity and paragraph-granularity text feature encoders proposed in this paper are both implemented based on the encoder of Transformer.
In the following, we use the SGFE as an example to illustrate how attention data and contextual features of text are extracted based on Transformer. As shown in Figure 2(b), the SGFE is composed of a stack of six identical encoding layers $EL_i$, where $i = 1, \dots, 6$ is the index of the encoding layer. Each encoding layer yields $a_i$ and $h_i$, which represent the attention data and contextual feature data, respectively. The outputs of the SGFE are $a_6$ and $h_6$ obtained from $EL_6$. Each encoding layer has two sublayers. The first is a multihead self-attention mechanism, and the second is a fully connected feedforward network. The input of $EL_1$ is a vector matrix $E_s = \{e_0^s, \dots, e_i^s, \dots, e_n^s\}$, where $e_i^s$ denotes the vector corresponding to the $i$th word in the input text sequence. The following describes how the attention data and contextual feature data of a text sequence are calculated.
(1) Attention Data. Multihead self-attention consists of 8 attention layers running in parallel. Each attention layer calculates a different attention data fragment of the input text from a different semantic space, and the eight fragments are then concatenated to form the attention data. The equation for calculating the attention data fragment is as follows:

$$\mathrm{head}_h = \mathrm{Attention}(Q_h, K_h, V_h) = \mathrm{softmax}\left(\frac{Q_h K_h^{T}}{\sqrt{d_k}}\right) V_h \tag{6}$$

where $Q_h$ represents the query matrix, $K_h$ represents the key matrix, and $V_h$ represents the value matrix, obtained by transforming $E_s$ through the linear matrices $W_h^{Q} \in \mathbb{R}^{d_k \times d_{model}}$, $W_h^{K} \in \mathbb{R}^{d_k \times d_{model}}$, and $W_h^{V} \in \mathbb{R}^{d_v \times d_{model}}$, respectively; $h$ represents the index of the attention head, and $d_k = d_v$.
The output data of the multihead self-attention are obtained by concatenating the attention data fragments according to the following:

$$\mathrm{MultiHead}(E_s) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_8)\, W^{O} \tag{7}$$

where $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ is the weight matrix.
(2) Contextual Feature Data. In the calculation of the output $h_i$ of $EL_i$, Linear is the fully connected operation, LayerNorm represents the layer normalization operation, and $f$ is the input of the layer: when $i = 1$, $f$ represents the vector matrix $E_s$, and when $i = 2, \dots, 6$, it represents the output of the previous encoding layer $h_{i-1}$.
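To make the attention computation concrete, the sketch below evaluates scaled dot-product attention for a single head in pure Python. It assumes the standard Transformer formulation, softmax of $Q K^{T} / \sqrt{d_k}$ applied to $V$; the function names and the toy matrices are illustrative, and the learned projections that produce $Q_h$, $K_h$, and $V_h$ from $E_s$ are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [x / total for x in exps]


def attention(Q, K, V):
    """Return (attention weights, attended values) for one head."""
    d_k = len(K[0])
    # One row of weights per query: similarity to every key, scaled by sqrt(d_k).
    weights = [
        softmax([sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K])
        for q_row in Q
    ]
    # Each output row is the weight-averaged combination of the value rows.
    out = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return weights, out


Q = K = [[1.0, 0.0], [0.0, 1.0]]   # two tokens with orthogonal toy representations
V = [[1.0, 2.0], [3.0, 4.0]]
weights, out = attention(Q, K, V)
print(weights)
print(out)
```

The `weights` matrix corresponds to the attention data that reflects contextual relevance among tokens, while `out` feeds the rest of the encoding layer.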

Article-Granularity Feature Encoder.
Usually, different TCM articles focus on different themes. The core sentence of a chapter is considered the carrier of the overall idea of the chapter, and it exists in some parts of the chapter. The CNN is a feedforward network model structure based on artificial neural networks. It has the advantages of local connection and weight sharing, which greatly reduce the number of parameters that need to be learned in the network. Through multilayer nonlinear transformations, a CNN can learn the implicit features in a large-scale text [30]. So the AGFE based on CNN is designed to extract the core features h a of a chapter-level text. As shown in Figure 2(c), the AGFE consists of three one-dimensional convolutional layers, a maximum pooling layer, and a fully connected layer. First, each one-dimensional convolution operation obtains the features of the corresponding text fragment; then, after maximum pooling and a fully connected operation with dropout, the final output h a of the AGFE is obtained.
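The convolution-then-pooling pipeline can be illustrated on a single-channel toy signal. This is a deliberately reduced sketch: the kernel values, the ReLU activation, and the function names are assumptions, and the fully connected layer with dropout is omitted.

```python
def conv1d(seq, kernel):
    """Valid (no padding, stride 1) one-dimensional convolution."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k)) for i in range(len(seq) - k + 1)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def agfe_sketch(seq, kernels):
    """Apply the convolutional layers in order, then global max pooling."""
    x = seq
    for kernel in kernels:
        x = relu(conv1d(x, kernel))
    return max(x)


# A toy chapter-level signal with a local peak that plays the role of a core sentence.
signal = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
print(agfe_sketch(signal, [[0.5, 0.5]] * 3))
```

Max pooling keeps the strongest local response wherever it occurs, which matches the idea that the core sentence may sit anywhere in the chapter.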

Seed of Entity Distribution Features.
Different types of TCM texts cover different types of entities, and the distribution of entity label sequences reflects different patterns, which can enhance the effectiveness of entity label generation. The model proposed in this paper can obtain entity tagging distribution features based on the annotated corpus, which we call the seed of entity distribution features.
We refer to the classification criteria of the 12 first-level categories given in the General catalogue of Ancient Chinese Medicine books and select three basic categories of Chinese medical texts, together with the additional category of comprehensive Chinese medical texts, as the research object. We then analyze the entities contained in these four categories of texts and summarize 19 typical entity types in total, as shown in Table 1.
As shown in Figure 3, the construction steps of the seed for a certain type of TCM text are as follows.
Step 1: Use the word embeddings of the 19 typical entities to generate 19 initial vectors $v_i$ ($i = 1, \dots, 19$) that are independent of text type.
Step 2: Let $f_i$ ($i = 1, \dots, 19$) be the number of times entity type $i$ occurs in the annotated corpus, and use $f_i$ to calculate the probability $p_i$ ($i = 1, \dots, 19$) of each type of entity. The calculated probabilities are then multiplied with the corresponding initial entity vectors $v_i$ ($i = 1, \dots, 19$) and spliced into a vector that reflects the distribution features of the entities in this corpus, which is used as the input to the Label Seed Feature Encoder (LSFE).

In Figure 4, H represents text features and A represents sentence-grained and paragraph-grained text attention features, both generated by the MTFE. Red arrows indicate convolutional layers (including convolution, normalization, and activation). Green arrows indicate transposed convolutional layers (including transposed convolution, normalization, and activation).
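The two seed-construction steps can be sketched as follows. The sketch assumes that the probability of an entity type is its relative frequency, $p_i = f_i / \sum_j f_j$, which is one natural reading of Step 2; the function name `build_seed`, the three entity types, and the two-dimensional vectors are toy stand-ins for the 19 types and their word embeddings.

```python
def build_seed(counts, init_vectors):
    """Weight each initial entity vector by its relative frequency and concatenate."""
    total = sum(counts)
    probs = [c / total for c in counts]          # p_i, assumed to be f_i / sum(f)
    seed = []
    for p, v in zip(probs, init_vectors):
        seed.extend(p * x for x in v)            # p_i * v_i, spliced into one vector
    return probs, seed


counts = [6, 3, 1]                               # f_i for three hypothetical entity types
init_vectors = [[1.0, 2.0], [0.0, 4.0], [10.0, 10.0]]
probs, seed = build_seed(counts, init_vectors)
print(probs)
print(seed)
```

Because the initial vectors are shared across text types, only the frequency weights change from one corpus to another, so the seed encodes the entity distribution of the corpus it was built from.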

The Generator Overview.
The generator of the MT-CGAN consists of the MTFE and C-U-NET, and it is used to generate the sequence of entity labels corresponding to TCM texts. The C-U-NET is composed of the LSFE, the Multi-Granular Feature Fusion Layer (MGFFL), and the Feature Augmented Label Generation Decoder (FALG Decoder). The C-U-NET model is the main framework of the generator network. Overall, the encoding and decoding processes of the generator form a hierarchically symmetrical U-shaped structure, as shown in Figure 4. The encoder of the generator consists of the MTFE and the LSFE of the C-U-NET. The role of the LSFE is to encode the features of seeds that reflect the distribution patterns of different categories of TCM entities and to output them hierarchically to the decoder, as shown in Figure 4. The decoder of the C-U-NET acts as the decoder of the generator.
In the traditional U-NET [31], no data from other spaces is introduced between the encoder and the decoder. By adding an MGFFL between its encoder and decoder, we have improved the U-NET to form the C-U-NET.

The C-U-NET.
The encoding process of the encoder in the C-U-NET adopts a downsampling network structure. Each encoding layer passes the features extracted from that layer to the corresponding decoding layer. The decoding process is enhanced by using feature information passed from the corresponding layer of the LSFE.
(1) Label Seed Feature Encoder. The LSFE is composed of 6 CNN layers. Each of the first five layers contains a convolutional layer, a layer normalization, and a ReLU activation function. The sixth layer consists of a convolutional layer, a layer normalization, and a tanh activation function. The final layer uses tanh as the activation function to improve training efficiency and to preserve the feature information of the labels more fully.
(2) FALG Decoder. The role of the FALG Decoder is to decode feature sequences by convolution and transposed convolution, and it also contains 6 layers. The first layer is composed of a transposed convolutional layer, a layer normalization, and a ReLU activation function. Layers 2 to 5 each consist of a transposed convolutional layer, a convolutional layer, two layer normalizations, and two ReLU activation functions. The 6th layer has only one normalization layer, and one of its activation functions is replaced with tanh.
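The symmetric U shape can be checked with the usual output-length formulas for convolution and transposed convolution. The kernel size 4, stride 2, padding 1, and input length 64 below are assumptions chosen so that each layer halves or doubles the length; the actual layer parameters are those listed in the parameter tables, and the extra convolutions in decoder layers 2 to 5 are treated here as length-preserving.

```python
def conv_len(n, k=4, s=2, p=1):
    """Output length of a 1-D convolution (dilation 1)."""
    return (n + 2 * p - k) // s + 1


def tconv_len(n, k=4, s=2, p=1):
    """Output length of a 1-D transposed convolution (dilation 1, no output padding)."""
    return (n - 1) * s - 2 * p + k


def u_shape(n, depth=6):
    """Feature lengths along a depth-layer encoder and the mirrored decoder."""
    enc = [n]
    for _ in range(depth):
        enc.append(conv_len(enc[-1]))
    dec = [enc[-1]]
    for _ in range(depth):
        dec.append(tconv_len(dec[-1]))
    return enc, dec


enc, dec = u_shape(64)
print(enc)   # lengths after each of the 6 encoding layers
print(dec)   # lengths after each of the 6 decoding layers
```

Matching lengths at mirrored depths are what allow the features of an encoding layer to be combined with the input of the corresponding decoding layer.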

The Discriminator.
The discriminator is composed of 7 convolutional layers. A layer normalization and a LeakyReLU activation function are added after each convolutional layer, and the Sigmoid activation function is used in the last layer.
The LeakyReLU activation function maximizes the retention of features contained in the data. The Sigmoid, as a saturated activation function, is placed at the last layer of the network to compress the range of real values of the model and improve the accuracy of the discriminator. To meet the requirements of CGAN, the conditional and primary inputs of the discriminator are the real label sequence feature L and the generated label sequence feature L′, respectively.

Confrontation Loss Function.
To improve the accuracy with which the generator produces named entity label sequences, the generator and discriminator are constructed with different constraints as input. The CGAN converts the unsupervised generator model into a supervised or semisupervised generator model and converts the GAN loss probability representation into a conditional probability formula using conditions as restrictions. The discriminator optimizes the generator in terms of the truthfulness of the generated labels. In the loss function, D is the discriminator, G is the generator, E is the expected value over the samples, E L∼P data (L) is the expectation over samples drawn from the real data, T is the sequence of entity labels generated by the generator, and L CrossEntropy is the cross-entropy loss function.
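For reference, the generic conditional GAN objective introduced with CGAN [18] has the following form, written in the original notation of real sample $x$, noise $z$, and condition $y$. In the MT-CGAN, the real label sequence $L$ plays the role of $x$ and the attention data $A$ replaces the noise $z$; how the cross-entropy term $L_{CrossEntropy}$ is weighted against the adversarial terms is not assumed here.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D\left(G(z \mid y)\right)\right)\right]
```

The first term rewards the discriminator for accepting real samples under the condition, and the second rewards it for rejecting generated ones, which is the quantity the generator tries to reduce.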

Adversarial Training Process.
As shown in Algorithm 1, the training of the MT-CGAN is divided into two steps. First, the generator is fixed and the discriminator is trained. Then, the discriminator is fixed and the generator is trained. In the training process, the two steps alternate until a stable state is reached.
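The alternating schedule can be outlined as a small loop skeleton. The two update callbacks are placeholders introduced for this sketch (hypothetical names); in a real implementation each would run one optimizer step on the corresponding network while the other network's parameters stay frozen.

```python
def train_alternating(iterations, k, update_discriminator, update_generator):
    """Alternate k discriminator updates with one generator update per iteration."""
    for _ in range(iterations):
        for _ in range(k):
            update_discriminator()   # generator held fixed
        update_generator()           # discriminator held fixed


log = []
train_alternating(
    iterations=3,
    k=2,
    update_discriminator=lambda: log.append("D"),
    update_generator=lambda: log.append("G"),
)
print(log)
```

Training stops in practice when the two losses stabilize rather than after a fixed number of iterations; the fixed count here only keeps the sketch short.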

Model Parameters and Datasets
Experimental parameters. The MT-CGAN is implemented using the PyTorch 1.3.1 library. The Adam optimizer is used during training to address gradient sparsity and gradient oscillation in natural language tasks. The learning rate of the generator and discriminator is set to 1e−4, the learning rate decay is 0.95, the dropout is 0.3, and the number of training epochs is set to 20. All models are trained and tested on a single NVIDIA RTX 3090. Datasets. We use four books, Shennong's Classic of Material Medical, Medical Cases of Famous Doctors in Different Periods of China, Miraculous Pivot, and Syndrome in TCM, as the original texts for NER annotation and annotate four types of corpus as the experimental dataset. The scale of the annotated corpus for each book is shown in Table 2.
The Shennong's Classic of Material Medical, one of the four classics of Chinese medicine, was written in the Han Dynasty and is the earliest known book on herbals, containing 365 species of herb-medicine. The Medical Cases of Famous Doctors in Different Periods of China is a collection of Chinese medical cases recorded by famous doctors from all periods of Chinese history. There are more than 18,000 medical cases in total, and each case introduces the patient's [...]
[...] Table 1; the convolutional layer parameters and transposed convolutional layer parameters of the FALG Decoder refer to Table 2.

Experiments and Results
For the TCM NER task, we chose several typical methods with which to compare the proposed approach. We selected three baselines: the BiLSTM-CRF architecture [4], the BERT-BiLSTM-CRF architecture [13], and a variant of the BERT model architecture that can fuse multigranularity information of text (Roberta-c) [33]. These methods are described as follows.
BiLSTM-CRF [4] consists of BiLSTM and CRF and was one of the most popular model architectures for handling NER tasks before pretrained models were proposed.
BERT-BiLSTM-CRF [13] consists of BERT, BiLSTM, and CRF. As a representative of pretrained models, BERT has proven to be one of the most advanced models for representation learning in NLP [32]. The combination of BERT and BiLSTM-CRF is one of the currently popular architectures for NER models.
Roberta-c [33] is a variant of the BERT model and outperforms the BERT model in Chinese NLP tasks. C denotes a word-character set in a self-attention module, which can integrate character and word information.
This section presents the comparative experiments and the analysis of the results of the MT-CGAN against the three baseline models on the datasets. Precision (P), Recall (R), and F1 score (F1) are used as the evaluation metrics. The performance of the different methods on the test set is presented in Tables 3 and 4.
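The three metrics can be computed from sets of gold and predicted entities as in the sketch below. Exact-match scoring at the entity level, with each entity represented as a `(start, end, type)` tuple, is an assumption of this example, as are the function name `prf1` and the toy annotations.

```python
def prf1(gold, pred):
    """Entity-level precision, recall, and F1 with exact span-and-type matching."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # entities predicted exactly right
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


gold = [(0, 2, "HERB"), (3, 4, "FLAVOR"), (6, 8, "SYMPTOM")]
pred = [(0, 2, "HERB"), (3, 4, "HERB"), (6, 8, "SYMPTOM"), (9, 10, "HERB")]
print(prf1(gold, pred))
```

In this toy case two of the four predictions are correct and two of the three gold entities are found, so a wrong type on a correct span counts against both precision and recall.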
According to the data presented in Tables 3 and 4, the overall performance of the proposed model is superior to that of the baseline models, except on the Medical Cases of Famous Doctors in Different Periods of China dataset. This shows that our model has an overall advantage across various types of TCM texts, which is also the strongest advantage of our proposed model. Although the performance of the MT-CGAN does not improve significantly on both the [...] Table 3 and Table 4, the P values of the proposed model are 90.26%, 90.36%, 77.73%, and 78.45%, respectively, on the four datasets, all of which are higher than those of the other comparison models. The model benefits from the fact that the MT-CGAN can extract the contextual semantic relations of text at different granularities.
We also observed that the performance of MT-CGAN on both the Miraculous Pivot and Syndrome in TCM datasets was significantly improved. Especially on the Syndrome in TCM, the values of F1 and P of our proposed method are improved by 2.35∼5.70% and 3.83∼61.2%, respectively.
The results indicate that the proposed model performs well with a small-scale annotated corpus and without relying on external resources.

Ablation Study.
In this section, we ablate different components of the MT-CGAN to better understand their relative importance.
We used the annotated corpus of the Syndrome in TCM as the dataset to evaluate combination strategies at different granularities. As shown in Table 5, the F1 value is highest when the features of all four granularities are used together as input. The F1 value improves significantly when character-granularity features are added. The main reason is that the MT-CGAN uses a GRU to extract the relative position features among characters at the character granularity; these features constrain the order of entity label generation and play a decisive role in the model.
We also conducted a comparative experiment on whether the seed for different types of TCM texts improves the effectiveness of named entity recognition. The results, shown in Table 6, indicate that the seed enhances the effectiveness of entity label generation. When random entity label features are used as input, the F1 value is 75.16. In contrast, the F1 value increases by 2.19 when the seed of the corresponding text type is used as input.

Comparative Experiments with Different Forms of Main and Conditional Input.
(1) for number of training iterations do
(2)   for k steps do
(3)     Sample minibatch of m Z samples {A|Y′_1, A|Y′_2, ..., A|Y′_m} from the Z prior p_g(A|Y′)
(4)     Sample minibatch of m examples {L′_1, L′_2, ..., L′_m} from the data generating distribution p_data(L′)
(5)     Update the discriminator by ascending its stochastic gradient:
(6)       $\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(L'_i) + \log\left(1 - D(G(A|Y'_i))\right) \right]$
(7)   end for
(8)   Sample minibatch of m Z samples {A|Y′_1, A|Y′_2, ..., A|Y′_m} from the Z prior p_g(A|Y′)
(9)   Update the generator by descending its stochastic gradient:
(10)    $\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(A|Y'_i))\right)$
(11) end for
(12) The gradient-based updates can use any standard gradient-based learning rule. We used Adam in our experiments.
ALGORITHM 1: The training process. Minibatch stochastic gradient descent training of MT-CGAN. The number of steps to apply to the discriminator, k, is a hyperparameter. We used k = 1. Z is the fusion of the text's multigranularity features, the seed of entity distribution features, and attention data for different types of TCM texts.

We verified the validity of our approach by experimentally comparing different forms of main and conditional input. As shown in Table 7, the F1 value of the model improved by 5.73% when the attention data of the text, rather than white noise, was used as the main input to the generator. The experimental results validate that the use of textual attention data can guide the model to generate more accurate labels.
As shown in Table 8, compared with using the same conditional input for the generator and the discriminator, the F1 value of the MT-CGAN improves by 4.56% and the number of training rounds needed for the model to reach a steady state is reduced by about 5 epochs. The experimental results demonstrate that using the fused data Y as the conditional input of the generator improves the accuracy of NER and speeds up the convergence of the model.
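The alternating update schedule of Algorithm 1 can be sketched on a toy problem. This is not the MT-CGAN itself: the scalar generator `G(z) = a*z + c`, the logistic discriminator `D(x) = sigmoid(w*x + b)`, the Gaussian stand-ins for the Z samples and the real labels, and all names below are assumptions made for illustration, and plain gradient steps replace the Adam optimizer used in the paper.

```python
import math
import random


def sigmoid(t: float) -> float:
    t = max(-30.0, min(30.0, t))  # clamp so that log(1 - D) stays finite
    return 1.0 / (1.0 + math.exp(-t))


class ToyGAN:
    """Scalar stand-in: D(x) = sigmoid(w*x + b), G(z) = a*z + c."""

    def __init__(self) -> None:
        self.w, self.b = 0.1, 0.0  # discriminator parameters (theta_d)
        self.a, self.c = 1.0, 0.0  # generator parameters (theta_g)

    def D(self, x: float) -> float:
        return sigmoid(self.w * x + self.b)

    def G(self, z: float) -> float:
        return self.a * z + self.c


def d_objective(m: ToyGAN, real, z) -> float:
    """(1/m) * sum[log D(x_i) + log(1 - D(G(z_i)))]."""
    total = sum(math.log(m.D(x)) + math.log(1.0 - m.D(m.G(zi)))
                for x, zi in zip(real, z))
    return total / len(real)


def d_step(m: ToyGAN, real, z, lr: float) -> None:
    """Discriminator update: ascend the gradient of d_objective."""
    gw = gb = 0.0
    for x, zi in zip(real, z):
        g = m.G(zi)
        gw += (1.0 - m.D(x)) * x - m.D(g) * g
        gb += (1.0 - m.D(x)) - m.D(g)
    m.w += lr * gw / len(real)
    m.b += lr * gb / len(real)


def g_step(m: ToyGAN, z, lr: float) -> None:
    """Generator update: descend the gradient of (1/m) * sum log(1 - D(G(z_i)))."""
    ga = gc = 0.0
    for zi in z:
        dg = -m.D(m.G(zi)) * m.w  # derivative of log(1 - D(G)) w.r.t. G
        ga += dg * zi
        gc += dg
    m.a -= lr * ga / len(z)
    m.c -= lr * gc / len(z)


def train(iterations: int = 200, k: int = 1, batch: int = 16,
          lr: float = 0.05, seed: int = 0) -> ToyGAN:
    rng = random.Random(seed)
    m = ToyGAN()
    for _ in range(iterations):
        for _ in range(k):  # k discriminator steps per generator step
            z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
            real = [rng.gauss(3.0, 0.5) for _ in range(batch)]
            d_step(m, real, z, lr)
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g_step(m, z, lr)
    return m
```

The structure to note is the nesting: the inner loop runs k discriminator updates on fresh minibatches before each single generator update, and with k = 1 the two networks simply alternate.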

The Comparison of Model Performance on Different Scales of Annotated Corpus.
In this paper, we verified the performance of the MT-CGAN with small-scale samples by reducing the scale of the training dataset. We conducted comparison experiments with the Roberta-c model on different proportions of the original annotated datasets of the Medical Cases of Famous Doctors in Different Periods of China and Miraculous Pivot, respectively. As shown in Figure 5, the MT-CGAN maintains relatively stable F1 values as the training data shrinks, which indicates that the MT-CGAN has better recognition effects than the baseline model on a small-scale annotated corpus. This demonstrates the effectiveness of the MT-CGAN, which is based on generative adversarial ideas, for small-scale sample NER tasks.
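The protocol of training on reduced proportions of the annotated data can be sketched as follows. The proportions, the uniform random sampling, and the `fit_and_score` callback (a placeholder for training a model on the subset and returning its test F1) are assumptions for illustration; the paper does not specify how the subsets were drawn.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def subsample(train: Sequence[T], proportion: float, seed: int = 0) -> List[T]:
    """Draw a uniform random subset containing the given share of the training set."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    rng = random.Random(seed)
    n = max(1, round(len(train) * proportion))
    return rng.sample(list(train), n)


def learning_curve(train: Sequence[T],
                   proportions: Sequence[float],
                   fit_and_score: Callable[[List[T]], float],
                   seed: int = 0) -> Dict[float, float]:
    """Score a model at each training-set proportion.

    fit_and_score is a hypothetical callback: it should train on the
    subset it receives and return the F1 value on a fixed test set.
    """
    return {p: fit_and_score(subsample(train, p, seed)) for p in proportions}
```

Keeping the test set fixed while only the training subset shrinks makes the resulting F1 values comparable across proportions, which is what a plot such as Figure 5 relies on.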

Conclusions
In this paper, we propose the MT-CGAN to achieve TCM NER under the condition of a small-scale annotated corpus.
To ensure that the model can learn multiple dimensions of TCM text information, the MTFE is designed in the MT-CGAN to extract grammatical and semantic information from four granularities: words, sentences, paragraphs, and chapters. Moreover, so that the MT-CGAN produces accurate named entity labels when processing different types of TCM texts, the seed of entity distribution features is introduced as an input of the model. We verify the performance of the MT-CGAN on corpora of different types of TCM texts, and the experimental results show that the model performs NER accurately. The main advantage of the MT-CGAN lies in handling texts that contain more entity types, have a sparse entity distribution, and have a more random and less regular linguistic form. Although our approach outperforms previous methods, our model still falls short in some situations. For example, the R and F1 scores in our experimental results are lower than those of certain models, so we will work on improving the R and F1 values of the model in the future.

Data Availability
Some data can be found within the article. Other data can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.