Design and Implementation of a Medical Question and Answer System Based on Deep Learning

Medical services play a pivotal role in people's lives and in the national economy. Although the number of healthcare facilities is growing every year, there are still major problems in terms of access and patient-flow pressure. Therefore, there is an urgent need for complementary medical services that ease patient flow, reduce patients' psychological burden, and enable them to receive timely medical advice. This article designs and implements a medical Q&A system based on deep learning. We took a retrieval-based approach, building the Q&A database from manually reviewed data collected by a web crawler and building the answer generation model with the Seq2Seq algorithm and the TF-IDF model. The resulting medical question and answer system enables effective Q&A and gives relevant medical advice. On real datasets, the algorithm proposed in this paper provides users with accurate answers more quickly than conventional search methods.


Introduction
Artificial intelligence technologies, represented by machine learning, are now widely used across industrial sectors. Deep learning is a key technology in machine learning that has made great breakthroughs in computer vision, natural language processing, and speech processing, in some cases surpassing human professionals. However, the medical industry is more traditional and slower to modernise than social media and e-commerce, even though the development of modern medical technology can greatly improve people's quality of life.
With the rapid development of medical information technology, technologies such as deep learning and natural language processing have also developed rapidly, and intelligent diagnostic technology based on artificial intelligence will bring about a huge change to the medical industry. Answering users' questions quickly, accurately, and concisely in natural language is an important problem to be solved in medical Q&A systems. Traditional database search methods are unable to meet the demands of search efficiency and accuracy, whereas Q&A in natural language can provide users with accurate answers more quickly than conventional search methods.
This algorithm will improve the efficiency of medical services, promote the development of medical question and answer systems, and provide users with more accurate and faster answers to everyday medical questions. The rest of the paper is organized as follows: Section 2 introduces the relevant materials and methods. The general design, the key technologies of the medical Q&A system, and the application results and data analysis are explained in Section 3. Finally, Section 4 concludes the work of this paper.

Related Studies.
In the late 1980s, the discovery of the backpropagation algorithm for artificial neural networks gave impetus to the development of machine learning and sparked a wave of machine learning based on statistical models. This wave continues to this day. In the 1990s, machine learning models such as the support vector machine, boosting, and maximum entropy methods were developed and achieved good results in both theory and practice. Research on shallow artificial neural networks was comparatively quiet during this time because theoretical analysis was difficult and training methods took time to mature. The rapid growth of the Internet since 2000 has created a strong need for intelligent parsing and prediction of massive amounts of information, and shallow learning models have achieved good results in online applications such as CTR prediction, content-oriented recommendation, web search ranking, and spam filtering. Deep learning is the most active field of artificial intelligence and has achieved fruitful results in speech recognition, computer vision, and natural language processing in recent years [1].
One application is intelligent customer service, and companies are now launching their own intelligent Q&A systems. Examples include Google's Google Assistant and Apple's Siri. They can answer some basic natural language questions and can also follow simple user instructions. In China, many manufacturers have developed their own intelligent Q&A software, such as Huawei's HiAssistant and Xiaomi's "Xiaoai classmate." Another commonly used kind of Q&A system is the smart speaker, such as the Tmall Genie developed by Alibaba, which answers basic natural language questions and can carry out some basic commands. Such voice-interaction-based Q&A systems can solve some of the problems of everyday life, but they rely mainly on general knowledge gathered from the Internet and target open-domain questions rather than medical ones.
Most current medical Q&A systems use knowledge graphs: medical knowledge is stored in a non-relational database as entities and relations [2], and medical advice is produced by search and reasoning. Izcovich et al. [3] developed a graphical medical Q&A system based on GRADE. Oyelade et al. [4] collected relevant information based on the patient's symptom profile in order to conduct an initial specialist consultation. In addition, there have been many achievements in this field in China in recent years. For example, Xin [5] built a community health Q&A system using natural language processing techniques and various machine learning methods, while Chao [6] used big data analysis and deep learning techniques to build a disease guidance system that helps patients with consultation and guidance. Elytai et al. [7] used a joint learning model to perform knowledge extraction and a stack-propagation framework to recognise medical questions in the input and quickly return accurate medical answers to the user. Hu [8] implemented CMQA, a Chinese medical Q&A system that understands user semantics well and generates SPARQL queries.
However, natural language processing in Chinese is complex, and existing theories and industrialized results are not yet well suited to the low tolerance for error that characterises medical problems. Therefore, further research in the field of medical Q&A is still needed. The Seq2Seq model is often used in machine translation, chatbots, text summarization, automatic generation of picture descriptions, and the composition of classical poems. In addition, the Seq2Seq model can also be applied to speech recognition, search intent completion, and recommendation. In the search recommendation scenario, when the user has typed the first part of a query, the model takes that part as input and predicts the likely remainder, which improves search efficiency. On this basis, we propose a medical Q&A system based on deep learning and build a sequence-to-sequence (Seq2Seq) architecture.

General Design of the Medical Q&A System.
The medical Q&A system developed in this work uses a hierarchical architecture consisting of four layers: data layer, model layer, functional layer, and interaction layer. The advantages of a layered architecture are that it reduces the coupling between layers and facilitates the standardisation of work; specific layers can be replaced, and each level can be analysed without detailed knowledge of the others, which enhances the reusability and modifiability of the system. The architecture of the medical Q&A system based on deep learning [9][10][11] is shown in Figure 1.
The data layer manages and processes the data for the training corpus, laying a solid foundation for the design of an appropriate training set for the model layer. The training corpus is recorded as question-answer pairs in which the question is the set of symptoms of a condition and the answer is the name of the condition; the symptoms of each condition are combined both in sequential order and in reverse order to form a series of question-answer pairs.
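The pairing scheme described above can be sketched as follows. This is only one possible reading of "sequentially and in reverse order": the function name `build_pairs`, the comma-joined question format, and the sample symptoms are all assumptions made for illustration, not details taken from the paper.

```python
def build_pairs(disease, symptoms):
    """Return (question, answer) pairs for one disease.

    The question joins the symptoms in listed order and in reverse order;
    the answer is always the disease name. (Hypothetical helper.)
    """
    forward = ", ".join(symptoms)
    backward = ", ".join(reversed(symptoms))
    # A single symptom reads the same both ways, so avoid a duplicate pair.
    questions = [forward] if forward == backward else [forward, backward]
    return [(question, disease) for question in questions]


# Hypothetical sample entry, not real corpus data.
pairs = build_pairs(
    "cough variant asthma",
    ["dry cough", "cough after exercise", "night cough"],
)
for question, answer in pairs:
    print(question, "->", answer)
```

Emitting both orders roughly doubles the number of training pairs per condition and makes the model less dependent on the order in which a user lists symptoms.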
At the model layer, the prepared training set undergoes secondary processing: text segmentation, similarity operations, and text feature extraction. The Seq2Seq model is then trained so that the loss gradually decreases and better accuracy is achieved.
At the functional layer, natural language processing technology is used to extract text features from the input text and to analyse them to produce predictive text; the generated text is then post-processed with TF-IDF to improve accuracy.
The interaction layer provides access to the underlying layers in order to meet user requirements. This part consists mainly of the front-end interface and the human-computer interaction, which displays information and receives the user's interactive actions through the terminal. The interaction layer is the top layer of the whole system and is the level that is directly accessible to the user.

Key Technologies for Medical Q&A Systems
The intelligent Q&A system developed in this work uses the Seq2Seq model based on the encoder-decoder architecture [12,13].
The Seq2Seq model is essentially an encoder-decoder construct: the encoder transforms a variable-length input sequence into a fixed-length vector, and the decoder converts this fixed-length vector into a variable-length output sequence. The difficulty of capturing the true meaning of long input sequences can be largely overcome by introducing the attention mechanism. The workflow of the Seq2Seq model is shown in Figure 2.
In the experiment, the first step was to use crawlers to obtain, from one of the health sites, the names, aliases, affected sites, infectiousness, affected population, symptoms, complications, responsible departments, clinical management, treatments, common drugs, etc., of different emergency-department diseases. The data were analysed and filtered to obtain 3600 sets of questions and answers, which were put into lists and saved in data files for easy correction and training. The model was built using the TensorFlow 2.0 framework. This model is a variant of the RNN that improves the neural network's ability to extract information from long text [14], achieving better results than using the LSTM alone.

TF-IDF Model.
TF-IDF is a statistical method for assessing the importance of a word to a document within a document set or corpus. It combines two values, TF (term frequency) and IDF (inverse document frequency), and the TF-IDF weight is their product. The basic idea of IDF is that the fewer documents contain the term t, that is, the smaller n is, the larger the IDF and the better t distinguishes between classes. If the number of documents of a given class that contain t is m and the number of documents of other classes that contain t is k, then the number of all documents containing t is n = m + k. When m is large, n is also large, so the IDF takes a smaller value, which suggests that t does not have good classification performance. In practice, however, when a word occurs many times within one category of documents, it characterises that category well; such words should receive a higher weight and be used to distinguish these documents from others.
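As a minimal sketch of the weighting described above, the following computes a textbook TF-IDF, with TF taken as the term count divided by the document length and IDF as log(N / n), where N is the number of documents and n the number of documents containing the term. This particular variant, the function name `tf_idf`, and the toy documents are assumptions; the paper's exact formula may differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # n for each term t: the number of documents that contain t at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical): "fever" appears in every document.
docs = [
    ["cough", "fever", "cough"],
    ["fever", "headache"],
    ["fever", "rash"],
]
w = tf_idf(docs)
print(w[0])
```

A term that appears in every document ("fever" here) gets an IDF of log(1) = 0 and so carries no weight, while "cough", which is frequent in one document and absent elsewhere, gets the largest weight in that document.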

Bahdanau Attention.
TensorFlow provides two attention mechanisms, Bahdanau attention (additive) and Luong attention (multiplicative); the former is used in this system [15,16].
Bahdanau attention, an additive attention mechanism, scores a linear combination of the decoder hidden state and the encoder output at every position, thus improving the decoding of the sequence [17,18]. It is essentially a two-layer fully connected network with a tanh activation function and an output layer of dimension 1. It works as follows: the encoder generates a hidden state vector for each input vector; an alignment score is calculated for each encoder output $x_i$ using the decoder hidden state $s_{t-1}$ of the previous moment; the alignment scores are converted into a probability distribution by softmax; the context vector $c_t$ is obtained by weighting the encoder outputs at each position with this distribution; the context vector $c_t$ and the embedding of the previous moment's decoder output $\hat{y}_{t-1}$ are concatenated as the current moment's decoder input, and the RNN generates the new output and hidden state. During training, where the real target sequence $y = (y_1, \ldots, y_m)$ is available, $y_{t-1}$ is usually used instead of $\hat{y}_{t-1}$ as the decoder input at moment t.
At time t, the hidden state of the decoder is denoted as $s_t = f(s_{t-1}, c_t, y_{t-1})$. In the standard additive form, where $W_s$, $W_x$, and $v$ are learned parameters, the attention score of the hidden state $s_{t-1}$ for each encoder output in X, the resulting weights, and the context vector are
$$e_{t,i} = v^{\top} \tanh(W_s s_{t-1} + W_x x_i), \qquad \alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_j \exp(e_{t,j})}, \qquad c_t = \sum_i \alpha_{t,i} x_i.$$
As shown in Figure 3, the blue part is the encoder and the red part is the decoder. Whereas the traditional encoder-decoder algorithm uses a single context vector, the attention mechanism generates a separate context vector for each output step. Each context vector is a weighted sum over every word_x of Input_Sentence, where the weight vector is the attention vector, indicating the importance of each word_x of Input_Sentence at the point in time when word_y of Output_Sentence is produced. Finally, the current context vector is combined with the current y to give the final result.
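One decoding step of additive attention can be sketched in plain Python as follows: score each encoder output against the previous decoder state with a tanh layer and a one-dimensional output layer, apply softmax, and take the weighted sum as the context vector. The parameter names `W_s`, `W_x`, and `v`, the helper names, and the toy numbers are assumptions for illustration; a real system would use learned weights and a tensor library such as TensorFlow.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_attention(s_prev, enc_outputs, W_s, W_x, v):
    """Return (attention weights, context vector) for one decoder step.

    s_prev: previous decoder hidden state s_{t-1}
    enc_outputs: list of encoder outputs x_i, one per input position
    """
    proj_s = matvec(W_s, s_prev)
    scores = []
    for x in enc_outputs:
        proj_x = matvec(W_x, x)
        hidden = [math.tanh(a + b) for a, b in zip(proj_s, proj_x)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over positions (shifted by the maximum for numerical stability).
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(enc_outputs[0])
    context = [
        sum(alpha * x[d] for alpha, x in zip(alphas, enc_outputs))
        for d in range(dim)
    ]
    return alphas, context


# Toy example with identity projections (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, context = additive_attention(
    [0.5, -0.5], encoder_outputs, identity, identity, [1.0, 1.0]
)
print(alphas, context)
```

The weights always sum to 1, so the context vector is a convex combination of the encoder outputs; positions with higher alignment scores contribute more.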

Experimental Evaluation Indicators.
The trained model can be used to predict new text, but this alone does not tell us whether the results are satisfactory. No model is perfect, and there is always room for improvement. Once the model has finished training, we can judge how good it is by making predictions on the available data and comparing them with the true values [19]. Therefore, we need metrics that measure how similar the predicted results are to the expected results.
Common evaluation metrics for text classification tasks include accuracy, precision, recall, and F1-score to name a few.
(1) Accuracy. Accuracy is the most basic evaluation metric: the percentage of correctly classified test samples out of all test samples. Its advantages are that it is simple to calculate, easy to understand, and applicable to both binary and multi-class classification. However, when the data are unbalanced, it is not a good measure of model quality. The formula is as follows:
$$\text{Accuracy} = \frac{\text{answer}_{\text{true}}}{\text{answer}_{\text{all}}}. \tag{4}$$
(2) Precision. A classification model outputs a prediction for every sample. Assume that sample A is predicted as Class_1. There are only two cases: A is Class_1 (the prediction is correct) or A is not Class_1 (the prediction is wrong). When all data are predicted, some Class_1 samples will be wrongly predicted as other classes, and some non-Class_1 samples will be wrongly predicted as Class_1. The confusion matrix is shown in Table 1. The denominator of precision is the number of test samples classified as Class_1, and the numerator is the number of those samples that actually belong to Class_1.
(3) Recall. Recall is also derived from the above table. Its denominator is the number of test samples that actually belong to Class_1; its numerator, as for precision, is the number of samples predicted as Class_1 that actually belong to Class_1.
(4) F1-Score. In practical experiments, we want both precision and recall to be high, but in practice there is usually a trade-off between the two, so the F1 value, the harmonic mean of precision and recall, is used to combine them. The larger the F1 value, the better the model.
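The four metrics above can be computed for a single class from paired lists of true and predicted labels, as in the following sketch. The function name `class_metrics` and the sample labels are hypothetical; the definitions follow the usual confusion-matrix counts of true positives (TP), false positives (FP), and false negatives (FN).

```python
def class_metrics(y_true, y_pred, positive):
    """Accuracy over all samples, plus precision, recall, and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for five test questions.
y_true = ["asthma", "asthma", "flu", "flu", "asthma"]
y_pred = ["asthma", "flu", "flu", "asthma", "asthma"]
metrics = class_metrics(y_true, y_pred, positive="asthma")
print(metrics)
```

In this toy example there are two true positives, one false positive, and one false negative for "asthma", so precision, recall, and F1 all equal 2/3, while accuracy is 3/5.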
The similarity is the result of a similarity calculation between the answer predicted by the model and the original test data.

Algorithm Description.
When designing the Q&A system, we initially used the TF-IDF algorithm, which calculates the similarity between the user input and the existing statements in the database and returns the answer with the highest similarity; however, its computation time grows as the data grow. After reviewing the literature, we therefore chose the widely used Seq2Seq model for the implementation. Seq2Seq is generative: a trained model predicts possible output values, producing one word or piece of text per prediction.
Since the most likely value is predicted each time, the value is not necessarily what the user wants or the correct value. If the predicted result is restricted to a known set of values, the content of the output can be controlled and its accuracy improved. Thus, the two methods can be combined: result A predicted by Seq2Seq is passed to TF-IDF, which outputs the result B from the existing database that is most similar to A, as shown in Figure 4. Table 2.
Secondly, all test samples were checked manually: of 410 test samples in total, 361 were predicted correctly by the model, a rate of 88.05%.
Because the Seq2Seq model relies on the textual features of the input sentence, inaccurate results can occur if the information in the training set is not precise enough or if the user's text does not correspond to the kind of input the model was trained on. For this reason, a simple TF-IDF algorithm is introduced to perform text similarity analysis. We take the output of Seq2Seq and apply TF-IDF as a secondary operation so that the final output always belongs to the data stored in the database. For example, for the input "coughing and dry cough after strenuous exercise" (a symptom of cough variant asthma), the Seq2Seq model predicts "exercise-induced asthma," a condition that is not named in the database; the TF-IDF algorithm then outputs "cough variant asthma," which improves the accuracy of the output. The comparison results are shown in Table 3.
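The second-stage step described above, mapping a generated answer A to the most similar database entry B, can be sketched as a TF-IDF cosine-similarity lookup. This is an illustration under assumptions rather than the system's actual code: whitespace tokenisation, a smoothed IDF (so that words absent from the database keep a finite weight), the function names, and the database entries other than "cough variant asthma" are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse TF-IDF vector using a smoothed IDF: log((1 + N) / (1 + n)) + 1."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total)
        * (math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0)
        for term, count in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def snap_to_database(generated, database):
    """Return the database entry most similar to the generated answer."""
    docs = [entry.lower().split() for entry in database]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    query = tfidf_vector(generated.lower().split(), doc_freq, n_docs)
    scored = [
        (cosine(query, tfidf_vector(doc, doc_freq, n_docs)), entry)
        for doc, entry in zip(docs, database)
    ]
    return max(scored, key=lambda item: item[0])[1]


# Hypothetical database of condition names.
database = ["cough variant asthma", "acute bronchitis", "allergic rhinitis"]
print(snap_to_database("exercise-induced asthma", database))
```

Because the returned value is always drawn from `database`, the generative model can no longer emit a condition name that the knowledge base does not contain, at the cost of one similarity pass over the stored entries.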
As the information used in this system is taken from the emergency department, it is well suited to handling emergencies. The medical information provided to the user comes from the medical knowledge base, which is rigorously hand-selected, and answers are output only from the data available in this database, which helps to ensure their accuracy. As calling the model takes a long time, the files for model calculation are placed on Tencent Cloud servers, which speeds up model calculation and reduces the Q&A response time. Simulations of the calculations show that the method gives relatively good answers.
Given the specific nature of the model's training set, which requires the user to input symptoms for the model to be invoked most accurately, a guided input module was added to the system. The initial values in the module are the twenty most common symptoms, from which the user can select. Each selected symptom is added to the input box, and the values in the guided input module change to the remaining symptoms of the diseases that exhibit the selected symptom. The input box monitors the user's input in real time, so if the user types symptoms directly, the values in the guided input module change accordingly. It is recommended that the user select 3-5 symptoms before sending, as this makes the answers more accurate. After sending, the symptoms on the right revert to their initial values.
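The behaviour of the guided input module can be sketched as a filter over a disease-to-symptoms mapping. The knowledge base below, the function names, ranking "most common" by the number of diseases a symptom appears in, and requiring candidate diseases to match all selected symptoms are assumptions for illustration.

```python
from collections import Counter

# Hypothetical knowledge base: disease name -> set of symptoms.
DISEASE_SYMPTOMS = {
    "cough variant asthma": {"dry cough", "cough after exercise", "night cough"},
    "acute bronchitis": {"dry cough", "fever", "chest tightness"},
    "influenza": {"fever", "headache", "muscle ache"},
}


def initial_suggestions(kb, top_n=20):
    """The top_n symptoms that occur in the largest number of diseases."""
    counts = Counter(symptom for symptoms in kb.values() for symptom in symptoms)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts, key=lambda symptom: (-counts[symptom], symptom))
    return ranked[:top_n]


def next_suggestions(kb, selected):
    """Remaining symptoms of the diseases that exhibit every selected symptom."""
    chosen = set(selected)
    if not chosen:
        return initial_suggestions(kb)
    remaining = set()
    for symptoms in kb.values():
        if chosen <= symptoms:
            remaining |= symptoms - chosen
    return sorted(remaining)


print(next_suggestions(DISEASE_SYMPTOMS, ["dry cough"]))
print(next_suggestions(DISEASE_SYMPTOMS, ["dry cough", "fever"]))
```

Each selection narrows the candidate diseases, so the suggestion list shrinks toward symptoms that discriminate between the remaining candidates; clearing the selection restores the initial list.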

Conclusions
This article designs and implements a medical Q&A system based on deep learning. On real datasets, the algorithm proposed in this paper provides users with accurate answers more quickly than conventional search methods. The system has been validated through several experiments and has achieved good results [20]. The added guided input enables the user to select information accurately, which in turn helps the user to quickly locate disease information and learn how to administer medicine. However, the system has some limitations. Because medical Q&A content is specialised and demands high quality, and because the system's ability to learn autonomously is not yet optimal, in practice the user is currently limited to selecting or entering symptoms in order to use the system most effectively. In terms of modelling, the accuracy of the model has not been maximised, and adjustments to certain parameter values could be considered in the future. In terms of data, because the amount of first aid data is small, the quality of the training set would need to be improved by consulting professionals and revising it manually; in addition, a library of common question and answer statements could be added to diversify user input [21]. Intelligent diagnostic technology based on artificial intelligence will bring about a huge change to the medical industry [21,22,23].

Data Availability
No data were used to support the findings of this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.