An Entity Relation Extraction Method for Few-Shot Learning on the Food Health and Safety Domain

In recent years, entity relation extraction has been a critical technique to help people analyze complex structured text data. However, there is no advanced research in food health and safety to help people analyze the complex concepts between food and human health and their relationships. This paper proposes an entity relation extraction method FHER for the few-shot learning in the food health and safety domain. For few-shot learning in the food health and safety domain, we propose three methods that effectively improve the performance of entity relationship extraction. The three methods are applied to the self-built data sets FH and MHD. The experimental results show that the method can effectively extract domain-related entities and their relations in a small sample size environment.


Introduction
Food is inextricably linked to human health, particularly the nutritional components that can significantly improve health. For instance, increased spermidine intake protects against cancer, metabolic disease, heart disease, and neurodegeneration [1]. A higher intake of whole grains and dietary fiber is associated with a decreased risk of death from liver cancer and disease [2]. In addition, contaminants in food or an excessive amount of artificial additives have a detrimental effect on human health and even cause various diseases. For instance, interference with the intestinal microbiota's metabolites caused by food contaminants (polycyclic aromatic hydrocarbons, polychlorobiphenyls, brominated flame retardants, dioxins, pesticides, and heterocyclic amines) may promote the establishment of an inflammatory state in the intestine [3]. Food emulsifiers and thickeners influence the intestinal microbiota, mucosal barriers, and inflammatory pathways, and there are numerous possible pathogenic mechanisms [4].
With the proliferation of available human health data, humans use predictive or classification models to extract useful information, which helps people enjoy better health protection. Typical of this available human health data is the electronic health record (HER), the largest source of medical text data. One of the critical points in analyzing these unstructured textual data is to extract vital medical concepts, so many named entity recognition methods and entity relation extraction methods have emerged. Wan et al. proposed an ElMo-ET-CRF model approach for extracting medical named entities from Chinese electronic medical records (CEMR) using dynamic context-dependent ElMo character embedding to merge more lexical, syntactic, and semantic information that alleviate the long context-dependency problem [5]. Luo et al. proposed a novel tagging scheme considering overlapping relations to solve the overlapping problem in biomedical texts and then built the Att-BiLSTM-CRF model to extract the entities and their relations according to the rules [6]. Fei et al. proposed a new crossgraph neural model for joint extraction of overlapping entity relations in biomedical texts. ey treat the entity relation extraction task as a relational triad prediction and construct entity graphs by enumerating possible candidate entity spans [7].
It is possible to help explicitly store the relationships between complex medical concepts for use in subsequent tasks by extracting them. Similarly, for the food health and safety domain, food regulators sample food products on the market and examine the ingredients and form structured data. However, in addition to structured data, more unstructured textual data goes beyond the act of food sampling and inspection to include announcements of food safety events, news reports, and knowledge about food and human health. Cenikj et al. proposed a method to detect the relationship between food and disease entities from text. ey explored the feasibility of migration learning using a pretrained model based on BERT and achieved good results with few-shot learning [8]. Popovski et al. proposed a rule engine for extracting food concepts called FoodIE, a rulebased named entity recognition method with rule content describing food entities' computational linguistic and semantic information [9].
However, the food health and safety field does not have many available resources like the biomedical field, so fewshot learning becomes an effective method to improve information extraction. Qu et al. proposed a Bayesian metalearning method for relation extraction in few-shot learning environment, applying graphical neural networks to global relation graphs to improve accuracy [10]. Sainz et al. reformulated relation extraction as an entailment task with good results for zero-or small-sample relation extraction tasks [11]. e essential technique for extracting valid information in the food health and safety domain is entity relation extraction. is task requires the model to identify entities and relations in a sentence correctly and combine them correctly. e complexity of the task and the small sample size together contribute to the difficulty of the entity relation extraction task with few-shot learning. For supervised deep learning models, small sample size can lead to an underfitting of the model. In other words, there is not enough information to train a valid model. In addition, the model tends to converge at a local optimum, leading to poorer practical results of the model.
To solve the underfitting problem, we propose two methods to enrich the semantic features of the input text. e first is to disassemble Chinese characters into radicals (https://en.wikipedia.org/wiki/ Radical_(Chinese_characters)) and construct semantic input at the radical level. As pictographs, radicals are part of the structure of Chinese characters. e radicals have specific meanings; in other words, they represent part of the semantic information of the constituent characters. erefore, constructing semantic input at the level of radicals can enrich the semantic information of the input text very well [12,13]. e second approach to enrich semantic information is to fuse input text characters and word vectors. As units with independent meanings, characters are often given new meanings after forming words, and word information can significantly complement the interpretation of words within words [14]. For instance, the Chinese character "瘦 (lean)," "肉 (meat)," and "精 (essence)" form the word "瘦肉 精," which stands for a drug that is banned from being added to animal feed. In their study, Chen and Hu also verified that Chinese words contain rich semantic information [15]. Tran et al. combined the advantages of character-and word-level translation and proposed a new method for translating Chinese into Vietnamese [16].
To solve the problem of the model falling into local optimality, we propose a text noise removal model to help the model converge quickly to the global optimum. As shown in Figure 1, for a given input text: "Enrofloxacin belongs to the third generation of quinolones, is a class of synthetic broad-spectrum antibacterial drugs, according to the "National Food Safety Standards Maximum Residue Limits for Veterinary Drugs in Food," the maximum residue limit of enrofloxacin in the skin and meat of fish is 100 μg/ kg." ere are four groups of entity relations combinations in the sentence.
We need to extract the two entity relation groups below the sentence in Figure 1 (index values 3 and 4 in Table 1). e two entity relation groups above the sentence (index values 1 and 2 in Table 1) are weakly related to food health and safety. In this paper, we believe that we need to accurately propose the entity relation groups that are food-health-related. e entity relation groups that are weakly related to food health and safety will become noise to affect the extraction effect.
To effectively extract entities and relations for texts in the food health and safety domain in a small-sample learning environment, this paper proposes a method for entity relation extraction in the food health and safety domain, FHER (food health entity relation extraction).
Our contribution can be summarized as follows: (1) To reduce the influence of noise in the text, we propose a text denoising method for domain entity relation extraction to convert the text denoising task into a sequence prediction task, which effectively reduces the correct range of entity relation extraction. (2) To address the lack of semantic information due to the small sample size under the few-shot learning, we propose constructing the input at the part-head level to enrich the semantic information. (3) We propose the fused character and word method to improve the ambiguity problem caused by errors in Chinese word separation boundaries.

Related Works
ere are two approaches in current research [17]. Traditionally, a pipeline approach is used to extract entity mentions using a named entity recognizer and then predict the relationship between each pair of extracted entity mentions [18,19].
is approach inevitably brings the problem of error propagation. To alleviate the error propagation problem, Yu and Lam proposed a joint extraction approach that performs two subtasks simultaneously [20]. 2 Computational Intelligence and Neuroscience e joint extraction approach is gradually becoming mainstream in entity relation extraction tasks. Geng et al.
proposed an end-to-end joint entity and relation extraction method based on a combined attention mechanism of convolutional and recurrent neural networks [21]. Wan et al. proposed a region-based hypergraph network (RHGN) for joint entity and relation extraction, introducing the concept of regional hyper nodes to enhance contextual connections [22]. Wei et al. proposed a cascading pointer labelling approach, which solves the problem of overlapping entity relations [23]. Qiao et al. proposed a joint entity relation extraction model BERT-BiLSTM-LSTM for agriculture and verified that the BERT model has better migration in agriculture [24].
Due to the scarcity of textual data in the food health and safety domain, few-shot learning can effectively improve the effectiveness of entity relation extraction. Most of the current research on small-sample learning has focused on relationship extraction tasks. For relation extraction in a fewshot learning environment, Qu et al. propose a Bayesian meta-learning method that applies graphical neural networks to global relation graphs to improve accuracy [10]. Sainz et al. reformulate relation extraction as an entailment task with good results for zero-or small-sample relation extraction tasks [11].
For the entity relation extraction task in food health and safety, we propose a joint entity relation extraction method FHER with noise removal and feature enhancement, which mainly addresses how to perform effective entity relation extraction in a few-shot learning environment with high noise and low data volume. Our model adopts a cascading pointer annotation approach [23] to mitigate the entity overlap problem in the entity relation extraction task.

Overview.
e FHER method divides the whole entity relation extraction task into five parts: the first is a text noise removal model, the second is the construction of a radical level input, the third is the fusion of character and word features, the fourth is the prediction of possible subjects in a sentence, and the last is the prediction of relations and corresponding objects. Figure 2 shows the overall structure of the model. e input sentence In passes through the BERT Encoder layer to obtain vector H c . e prediction sequence P s is obtained after passing through the text noise removal model. e position vector E can be obtained through the binary function. en, the input In r of sentence In at the radical level is constructed, and the vector H r is obtained after passing through the BERT encoder layer. e vectors H c and H r and the position vector E are fed into the fusion function to obtain the vector H cw after fusing the characters and words. Using the position vector S and the fusion vector H cw , we can obtain the possible object positions corresponding to each relation and thus the combination of entity relations present in the sentence.
We define the entity relation extraction task in the food health and safety domain: unstructured input text, output extracted human health-related concepts and their relationships, and further form food-and health-related knowledge triad. Formally, for a given sentence x, all possible triples T s, r, o { } are extracted, with s representing the subject in sentence x, o representing the object in sentence x, and r representing the relationship between the subject and the object. For sentence x, the probability of all possible triples T that it contains is as follows: (1) where, according to the chain rule, s ∈ T denotes the subject appearing in the triplet T, T|s denotes the triplet T containing the subject, (r, o) ∈ T|s denotes the combination of objects and relations appearing in the triplet containing the subject, and R is the set of relations. For a given training set D, the probability that all sentences may contain the triplet T is as follows: e goal of the entity relation extraction task in the food health and safety domain is to find all possible triples (s, r, o) in the data set D.

BERT Encoder.
e implementation of the language model basis of the model is the encoder layer (BERT encoder). e BERT model consists of a multilayer bidirectional transformer consisting of an encoder that learns the valid information in the context very well [25]. We use a pretrained BERT model using a Chinese corpus from the food domain to encode the context of the input sentence In. Formally, for a sentence of length M can be represented as We input these tokens into the same BERT encoder layer as the trained one to obtain a vector representation H c of the input sentences as follows:

Text Noise Removal Model.
We summarized the possible relationships between concepts related to food health and safety into 12 kinds by reading many food health and safety information. Table 2 lists the details of the relations since the text contains combinations of entity relations with low relevance to the domain; these irrelevant combinations of entity relations act as noise. It will interfere with identifying entity relation combinations in the food health and safety domain. To reduce the influence of noise in the text, we propose a text denoising method for domain entity relation extraction to convert the text denoising task into a sequence prediction task, which effectively reduces the correct range of entity relation extraction. Figure 3 shows the structure of the text noise removal model structure.
We transformed the text noise removal task into a sequence prediction model. To capture the sentence context information, we used the BiLSTM layer for further feature extraction. en, considering the dependencies between the labels, we use the undirected probabilistic statistical graphical model CRF layer to handle the exact sequence    Computational Intelligence and Neuroscience labelling problem. Each character is assigned a BIO tag (B represents the beginning of an entity, I represents In an entity, and O represents Out of an entity). At the same time, we modified this tagging method to predict the possible types of entities (see Figure 4). With the BERT encoder, we obtain the vector representation H c of the sentence, which is then fed to the BiLSTM layer to obtain the intermediate vector representation H Bi as follows: For the input sequence In and the corresponding predicted label sequence Y, the model is trained to obtain the prediction sequence result P s as follows: 3.4. Construction of the Radical Feature. As the smallest semantic unit, radicals themselves have particular semantic meanings. Chinese characters are usually composed of smaller primary radicals, which are the most basic units that constitute the meaning of Chinese characters. In essence, this radical semantic information helps make characters with similar radical sequences (writing order) close to each other in the vector space, so it can be used to enrich the semantic information of the word vector and enhance the model effect. As shown in Figure 5, the sentence "Olive oil contains unsaturated fatty acids" can be represented at the radical level as "木木水口月一食口月月酉." e radical for the word fat is "月." When "月" is used as a radical, it can be interpreted as relating to the moon or meat. e radical of the character "酸" is "酉," and its meaning is related to alcohol and fermentation. e meaning of the decomposed radicals can enhance the semantic information of the word itself.
Formally, for the input sentence In, the corresponding radical input In r is generated based on the radical decomposition, and for each character w ri in In, the radical is added to In r if there is a radical, or directly to In r if it is a unique word. Finally, we get the input In r at the radical level as follows: In r � w r1 , w r2 , w r3 , . . . , w rM .
Similarly, the vector representation H r corresponding to the input In r can be obtained through the BERT encoder layer as follows: e sequence prediction result P s obtained from the text noise removal model is converted into a position vector E by a binary function I(·). I(·) will mark the food health-related tag positions as 1 and the irrelevant tag positions as 0.
e input vectors H c and H r and the position vector E are input to the fusion function G(·).
e function G(·) multiplies the position vectors E with the input vectors and concatenates them. At last, the vector H cr is calculated as follows: 3.5. Character and Word Fusion. As in Section 3.1, we use the BERT encoder layer to get the vector representation of the input sentence H w . e difference is that the BERT pretraining model used in the previous section is to segment the text into Chinese characters before training, while the BERT pretraining model used here is first to segment the text into Chinese words before training. e length of the input sentence after Chinese word separation is N.
Because of the difference between Chinese and English word construction methods, different characters will be combined into words in Chinese word construction. en the generated words will generate new meanings. So, in the food health and safety domain, especially in few-shot learning, we need to use as much prior knowledge as possible to help us improve the recognition, so we fuse the characterbased vector representation H c and the word-based vector representation H w .
We multiply the position vector E obtained from the text noise removal model with H w to obtain the vector H we . en we concatenate H cr to H w according to the corresponding position to obtain the fused vector H cw (see equation (11)), where the K(·) function expands the vector H cr to the same length as H w . For example, the vector for the word "食物" is A, and the vectors for the characters "食" and "物" are B and After combining characters and words, we can predict the location in the sentence where the subject may be. Because the entity relation extraction task itself suffers from the problem of overlapping entity relations, according to the view proposed by Wei et al. [23]: for a given subject s, any relation related to s (relations in T) corresponds to the corresponding object o in the sentence, while all other relations necessarily have no corresponding object in the sentence, that is, the set of corresponding objects is the empty set. So we first predict the possible positions of the subject in the sentence.
For a given input, we predict the location of potential subjects. Two binary classifiers form a subject marker, which marks potential subjects' starting and end positions. e subject marker is expressed as follows: where P sub head i denotes the probability that each position in the input vector may be the start position of the subject and P sub tail i in denotes the probability that each position in the input vector may be the end position of the subject. If this probability is greater than the threshold τ set in advance, it is for the position is a candidate for the start or end of the subject, marked as 1, and the rest is marked as 0. K head and K tail are learnable parameters, and b head and b tail are deviation parameters.

Relation Extraction.
We obtain all possible sets of entity relations in the input sentence by object identification of each subject to all relations. After the subject tagger, we obtain the candidate subject position information, incorporated into the fusion vector as follows: Similarly, the object tagger predicts the potential object location, consisting of two binary classifiers. e object tagger is expressed as follows: e process of predicting the location of the object is similar to that of predicting the location of the subject. P obj head i in equation (15) denotes the probability that each position in the input vector may be the start position of the object, and P obj tail i in equation (16) denotes the probability that each position in the input vector may be the end position of the object. K r head and K r tail are learnable parameters, and b r head and b r tail are deviation parameters. All relations in the set R have an object tagger to mark the start and end positions of candidate objects.

Data Set and Evaluation Metrics.
is paper uses an independently constructed food health-related data set FH (Food Health Dataset). Twelve types of food-health-related relationships are defined, and then "OTHER" relationship types are defined to represent less relevant relationships to food health in the sample data. Table 2 lists the 12 relationship types, the Chinese and English names of the included relationships, and their abbreviations. In addition, the paper constructs a small sample data set MHD (Medical Health Dataset) of medical-health-related data to measure the method's generalization performance. e data set MHD consists of 5 relationship types with 200 sentences. e training set contains 180 sentences, and the test set contains 20 sentences. e total number of valid annotated sentences in the FH data set is 1,420. is paper divides the FH data set into a training set and a test set. Figure 6 shows the details of the data set.  Figure 4: Modified BIO tagging method. e input text characters can be tagged as the beginning and middle positions of the entity and outside the entity. e "Food" prefix indicates that the entity is related to food health and safety, and the "Noise" prefix indicates that the entity is a noise that is not related to food health and safety.

Computational Intelligence and Neuroscience
To improve the performance of the model, we introduced more data samples from the public domain into the data set FH to form the incremental data set FH++, which is expanded from 1,420 sentences to 11,327 sentences to get better results in the text noise removal model. For the entity relation extraction task, we remove the "Other" relationships from the FH data set to form the FFH (filtered food health data) data set (http://39.96.33.199:9009/FFH_dataset.zip) used for entity relation extraction experiments.
For the evaluation metrics used in the experiments, we used three commonly used evaluation metrics, precision (P), recall (R), and F1 value, to measure the experimental results from different aspects. eir calculation criteria are as follows: in the formula, TP i indicates that the model correctly predicts the number of relationships present in given sentence In i , FP i indicates that the model incorrectly predicts the number of relations that do not exist in given sentence In i , and FN i denotes the number of relations that the model failed to predict for given sentence In i contained in the sentence.
For the text-to-noise model, we use the metric P to measure whether the model is predicting the location sequence correctly, the metric R to measure the model's ability to predict the location of entities in a complete way, and the metric F1 to measure model performance in aggregate. We focus more on metric P than on metrics R and F1, as it is more important to find the noise accurately.
For the entity relation extraction task, we use indicator P to measure how well the model extracts the correct combination of entity relationships in a sentence, indicator R to measure the model's ability to extract coverage in the face of sentences containing multiple relationships, and indicator F1 to measure the model's performance in aggregate.

Implementation Details.
e pretrained BERT model we implemented in the deep learning framework PyTorch uses the Chinese pretrained BERT model (BERT-wmm) published by Cui et al. [26] with its default hyperparameter settings. e models in this research are implemented on a PC with the configuration of Intel(R) Xeon(R) CPU 3.50 GHz, 64 Gb RAM, and GTX3090 graphics. Table 3 records the other parameters used in the experiment. e parameters used for the text noise removal model, construction of features at the part-head level, the fusion of character and word features, and entity relation extraction experiments are listed in Table 4.

Result of Text Noise Removal.
e following are the experimental results of text noise removal, and the model is trained iteratively on the data set FH++. Figure 7 records the loss values and the final precision/recall curves during the training process, and it can be observed that the loss values gradually decrease, and the model gradually converges as the number of iterative training increases. e precision/recall curves indicate that the model has a good performance. Table 5 records the precision, recall, and F1 values for the text noise removal experiments on FH and the incremental data set FH++. e model can obtain better convergence and accuracy when training with the incremental data set. e positive and negative values in the experimental results shown in this paper are confidence intervals calculated by combining multiple results at a confidence level of 95%. Table 6 records the experimental results obtained on the FH++ data set where we tried different structures to remove the noise, containing precision, recall, and F1 values. In Table 6, both the CNN and CNN-BiLSTM models use the softmax function as the activation function to transform the denoising task into a classification task. e results show that

Result of Entity Relation Extraction.
For the entity relation extraction task, Table 7 records the effect of the model on the data set FFH, which contains the values of precision, recall, and F1 for a given sentence containing one to five relationships. e experimental results show that the extraction effectiveness of the model decreases as the number of relations in the input sentences increases, which is related to the imbalance of the data itself, with most of the samples in the data set having less than three relations. However, the overall recognition effect is still good. e FHER method achieves 91.32% accuracy, 82.21% recall, and 86.52% F1 value on the FFH data set and achieves good extraction results in few-shot learning.
We did ablation experiments on the proposed denoising model to verify the effectiveness of few-shot learning for denoising. In Table 8, N represents the number of relations in the sentence, and the value in the table is the precision values for the entity relation extraction task. e experimental results show a significant decrease in recognition precision after the denoising model for a sentence containing multiple relations because the model identifies group or entity relations that we do not want. erefore, denoising the input samples in few-shot learning is an effective method.
Similarly, we conducted ablation experiments to add prefix features to enhance and fuse character and word features. Table 9 records the experimental results. According to the experimental results, richer semantic information can effectively improve the effect of entity relation extraction during few-shot learning.
e NIC model represents no features at the constructive radical level in the model, and the NCW model indicates that there are no fused characters and words in the model, and the whole model uses a vector of characters as units.
At last, we test the generalization performance of the method on the data set MDH. e experimental results show that constructing the input at the radical level and fusing characters and words can achieve good results even with smaller data set volumes. Table 10 records the results of the method FHER on the data set MDH.

Discussion.
We discuss the degree of semantic information in the text after its transformation into a radical.
ere are as many as 3,500 commonly used Chinese characters (http://www.gov.cn/gzdt/att/att/site1/20130819/ tygfhzb.pdf ). ese 3,500 characters can be combined in vast amounts of meaningful words, making the vector space generated after representation learning of Chinese through language models more complex. In contrast, the radicals, which are the main constituents of Chinese characters, can     For the entity groups "olive oil" and "unsaturated fatty acid" and their radical representations, we used the Chinese pretrained language model BERT-wmm to calculate their similarity based on the distance (see Figure 8). We collected subject-object combinations from the data set FFH. e sum of subject and object text lengths ranged from 4 to 20, so text similarity at the character level and the radical level was calculated between each pair of subjects and objects in them. Figure 9 shows the results of the text-similarity calculation. e experiment proves that the decomposition of Chinese characters into radical representations reduces the complexity of the text to a certain extent.

Conclusions
is paper proposes a method for entity relation extraction in food health and safety. ree methods are proposed for entity relation extraction under small sample learning to improve the entity relation extraction effect. Firstly, we propose removing text noise according to domain specialization. e experimental results show that removing entity relations in the text unrelated to the domain can significantly improve the effect of entity relation extraction. en, we construct text features at the radical level by disassembling individual Chinese characters from the characters' structure.
e experimental results show that the input at the radical level has low complexity and contains rich semantic features. Finally, we fuse the input text's character vectors and word vectors from the Chinese character constructions. e experimental results show that fusing characters and words can effectively enrich the semantic information in the input vectors. e above method has a good extraction effect in few-shot learning. In the method FHER, the effect of the entity relationship extraction task is subject to a threshold, which leads to an unstable performance of the model. In the future, we will further explore the influence of noise in the text on the degree of deep learning models, the semantic relationships, and differences between the three different levels of Chinese radicals, characters, and words. Moreover, we attempt to transfer the proposed method to other application fields such as agricultural image recognition, greenhouse environmental time-series prediction, and food safety risk assessment [27][28][29][30][31].
Data Availability e FFH data sets used to support the findings of this study are included within the article.    Figure 9: Results of text similarity calculations at the character level and the radical level. e horizontal coordinates represent the sum of the text lengths of the subject and object, while the vertical coordinates represent the similarity results. e blue bars represent radicals, and the red bars represent characters. e text similarity calculated at the radical level is higher than that at the character level.