Deep Personalized Medical Recommendations Based on the Integration of Rating Features and Review Sentiment Analysis

With the rapid development of big data in mobile services, an increasing number of websites have begun to provide users with recommendations in various areas, such as shopping, tourism, food, and medical treatment. However, medical recommendation systems still face challenges, such as the lack of personalized recommendations and the problem of data sparseness, which seriously restrict their effectiveness. In this paper, we propose a personalized medical recommendation method based on a convolutional neural network that integrates revised ratings and review text, called revised rating and review based on a convolutional neural network (RR&R-CNN). First, the review text is divided into user and doctor datasets, and BERT is used to obtain vectorized representations of them. Moreover, the original rating features are revised by adding the sentiment analysis values of the review text. Then, the vectorized review text and the revised rating features are spliced together and input into the convolutional neural network to extract the deep nonlinear feature vectors of both users and doctors. Finally, we use a factorization machine for feature interaction. We conduct comparison experiments on a Yelp dataset in the "Health & Medical" category. The experimental results show that RR&R-CNN outperforms the compared traditional methods.


Introduction
With the advent of the age of information intelligence, mobile computing and recommendation systems are developing rapidly. However, in the face of massive amounts of information, the personalized demands of users are constantly increasing, and thus, the requirements for recommendation results and performance are also increasing. Traditional recommendation methods have difficulty satisfying users' needs. Since deep learning can learn deep nonlinear feature representations, it is of great significance to study recommendation systems combined with deep learning [1,2]. Recommendation systems now cover many aspects of people's lives, and recommendations for medical services are gradually developing as well. However, some problems cannot be ignored in the field of medical recommendation systems. (1) Currently, most medical software or platforms only provide an open interactive window for users to communicate and seek medical treatment online. Moreover, due to the excessive number of experts and the complex needs of users, a platform cannot accurately recommend suitable doctors for users and lacks personalized recommendations [3]. (2) Like those in other fields, medical recommendation systems also face the problem of sparse rating data [4], since the doctors evaluated by each user account for only a small part of the total number of doctors. One solution to this problem is to add review text [5], as it expresses a user's emotions and explains his/her rating. However, the accuracy of user ratings can be distorted by users' sentiments. Sometimes, a user provides a poor evaluation of a doctor but gives him/her a high rating. For example, we find a negative description in a 4-star review, which affects the objectivity and accuracy of the evaluation of this doctor. Therefore, it is necessary to combine the original rating with the sentiment analysis value of the review.
To address the inaccuracy of ratings and the static coding of review text in medical recommendation systems, this paper makes the following main contributions.
RR&R-CNN uses the revised rating and the review text as input to the model. The revised rating combines the original rating with the review sentiment analysis to provide better and more objective recommendations, mainly because a user's rating and evaluation of a doctor sometimes do not match, while the review text expresses his/her emotional tendencies. Meanwhile, we use the pretrained BERT model [6] to obtain the vector representation of the review text. This model can handle polysemy, which helps us understand the semantics and emotions in the review text and improves the accuracy of our rating prediction. In the input layer of the convolutional neural network model, a user's review text about doctors and the revised rating are combined as input features. We conduct experiments on a Yelp dataset, and the experimental results show that RR&R-CNN effectively improves recommendation performance.
The remainder of this paper is structured as follows. Section 2 presents the related works on recommendation systems based on deep learning, the representation of review text and sentiment analysis, and their applications in recommendation systems. Section 3 describes the components of our network model in detail. In Section 4, we introduce the design of our experiments and the analysis of the experimental results. Finally, we conclude the paper in Section 5.

Related Work
2.1. Deep Learning-Based Recommender System. A recommendation system can help users deal with information overload and provide them with personalized suggestions, content, and services [7]. Traditional recommendation algorithms cannot learn the deep features of users and items by using shallow models. In recent years, deep learning has made rapid progress and development in many fields. It can not only perform deep nonlinear representations on massive user and item datasets but also automatically learn features from multisource heterogeneous data, especially for unstructured data such as text and pictures to extract feature information [8]. Thus, deep learning has become a crucial research hotspot in recommendation systems.
Recommendation systems based on deep learning usually take user and item-related data, such as explicit and implicit feedback data, as input features, and they generate item recommendations for users from the implicit representations of users and items obtained through the deep learning model [8]. A convolutional neural network is a typical model in deep learning. It is a special feedforward neural network with convolution and pooling operations that can effectively capture global and local features and significantly improve efficiency and accuracy [9]. Convolutional neural networks are widely used in recommendation systems, mainly for learning feature representations from multisource data such as images, text, audio, and video. For review text, convolutional neural networks have been widely used because of their powerful feature extraction ability. Kim et al. [10] used convolutional neural networks to obtain deep hidden vectors from the text information of commodities and used multilayer convolution to obtain the correlation between words, which improved the accuracy of rating prediction in the recommendation system. Zheng et al. [11] proposed a parallel convolutional neural network that processes both user and product text information. A convolutional neural network extracts the deep features of users and products, and a factorization machine is then introduced as the corresponding rating estimator.

2.2. Representation of Review Text and Its Application in Recommendation Systems. Review text usually contains information about a user's emotional expression and product features, which is of great reference value for other users in making decisions [12]. In medical recommendations, user reviews generally include detailed and personalized evaluations of doctors, such as feelings about the medical environment and doctors' abilities and attitudes [13]. In the rating prediction task, we need to extract the necessary information from the review text and convert it into a form that a computer can process. This task involves the representation method of review text, which converts the text into a vector representation. Widely used vector representation methods include word2vec [14], GloVe [15], ELMo [16], and BERT [6].
The core idea of the word2vec model is to obtain a vectorized representation of a word through its context. Although the structure of this method is simple, it can still learn high-quality word embeddings. GloVe builds on word2vec by also using statistical information in the corpus. These two methods have achieved good results, but both belong to static coding; that is, the same word has the same vector in different contexts, which may lead to deviations in the understanding of semantics [17]. The ELMo model, which was proposed in early 2018, dynamically generates word vectors according to context. Its network structure adopts a two-layer bidirectional long short-term memory (LSTM) language model, which effectively alleviates the problem of polysemy. In October of the same year, the BERT model was introduced; it uses dual-task pretraining and fine-tuning strategies and is pretrained on a large amount of unlabeled text. The BERT model has made substantial progress in 11 NLP tasks [6]. The difference between BERT and ELMo is that the representation of each word in the ELMo model is based on the current entire sentence, while the BERT model considers the full text of the context when generating word vectors. In [18], a pretrained BERT model is used to replace the previous word embedding layer, which can obtain a more accurate semantic understanding and further improve performance.

2.3. Sentiment Analysis and Its Application in Recommendation Systems. Emotional factors play a vital role in a user's decision-making process. The feedback provided by users in the face of different types of information reflects their emotional state at the time, and this change in emotional state can directly affect their thinking [19]. Users' emotions are expressed explicitly through product ratings and implicitly through product reviews. Therefore, a recommendation system can use emotional factors to analyze a user's behavior and then judge his/her choice in terms of different items [20]. However, Zhang et al. [21] compared user ratings with the sentiment tendencies of the corresponding review text and found that the former cannot completely replace the latter. For example, as shown in Table 1, the review text expresses the negative sentiment that doctors and nurses are both rude and impatient. The user's rating of 3 stars is therefore not a faithful evaluation but rather an encouraging gesture, which directly affects the accuracy of the doctor's evaluation. Therefore, it is of great significance to study the combination of rating and review sentiment analysis. At present, quite a few researchers have proposed related methods. In [22], Abbasi-Moud et al. used semantic clustering and sentiment analysis to extract user preference information from user review text. In [23], Zhao et al. calculated the sentiment deviation of the user's review text and integrated it into matrix factorization to improve the accuracy of rating prediction. Lei et al. [24] showed that users' social emotions, including users' emotional similarity, interpersonal emotional influence, and item reputation, play a key role in improving rating prediction. In location recommendation, Wang et al. [25] proposed a system that recommends green spaces to users.
They converted the contents of tweets into polarity values reflecting users' emotions by using an emotion-relevance dictionary and inferred users' attitudes towards green spaces from these polarity values. Zhao et al. [26] proposed a method for mining points of interest that combines geographic attributes and emotional attributes. The emotional attributes are mainly based on sentiment analysis of text information. In [27], Zheng et al. proposed a new preference prediction mechanism from the perspective of users' subjective emotions, taking into account the influence of changes in user emotions over time on user behavior in sequence recommendation. In addition, Zhang et al. [28] proposed a method of prefiltering opinions using a comprehensive measurement of a user's emotional orientation and original rating level. This method uses word2vec and LSTM to model the sentiment of the review text and predict a sentiment tendency rating for each user's review of an item, and then it performs a weighted sum with the original rating to correct the deviation between the original rating and the user's emotional tendency. The HFT model [29] uses matrix factorization to obtain the latent factors of the ratings and a topic distribution to mine the latent topics of the review texts. A mapping between latent factors and latent topics is then established to achieve model fusion.

RR&R-CNN: Medical Recommendation Model Based on a Convolutional Neural Network's Integration of Rating and Review Sentiment Analysis
In this paper, we divide the review text into a user dataset and a doctor dataset. Then, we use a pretrained BERT model to obtain a vectorized representation of the review text. Moreover, we combine the original rating data of the user and the doctor with the sentiment analysis value of the review text to revise the rating. Finally, the vectorized representation of the review text and the revised rating are combined as the input data of the convolutional neural network. In this way, the method can discover the high-level features of users and doctors and provide personalized doctor recommendation services for users.
3.1. Model Structure. RR&R-CNN consists of the following three parts, and its specific architecture is shown in Figure 1.
(1) Input layer: the review text is divided into a user review dataset {UserT_1, UserT_2, ..., UserT_d} and a doctor review dataset {DoctorT_1, DoctorT_2, ..., DoctorT_d}, and we use the pretrained BERT model to vectorize the review text; for the rating data, we use the sentiment analysis value of the review text combined with the original rating as the revised rating, and we splice it with the embedding value of the review text
(2) Feature extraction layer: based on the feature extraction layer of the convolutional neural network, we use parallel CNNs to extract the deep features of users and doctors. This layer includes the convolutional layer, pooling layer, and fully connected layer
(3) Rating prediction layer: we interact the extracted features of the user and the doctor through the factorization machine to obtain the predicted value, and we use the loss function to calculate the error between the predicted value and the true value
3.2. Preprocessing of the Input Layer. We perform sentiment analysis on the review text to obtain the sentiment polarity value. We use the TextBlob sentiment analysis tool, which is a library for natural language processing tasks. The range of its sentiment polarity is [-1, 1], where -1 denotes negative and 1 denotes positive. We use a formula to combine the sentiment polarity value obtained from the review text with the original rating, whose range is [1, 5].
Table 1: Review text for a 3-star rating.
"I've come here very often and in the past it was great and I loved it but it seems like things have changed a lot. The staff in the front were very nice but the nurses and doctors were rude and impatient. The doctor did not listen to what I was saying and had to repeat myself often. He seemed angry at me the whole time and made me uncomfortable. If it wasn't for good past experiences I would've given 2 stars."
The original rating and the polarity value are combined according to Equation (1) [30], where R_ij represents the original rating of doctor j by user i, with a value in [1, 5]. S_ij represents the sentiment polarity of the review text calculated by using TextBlob. RS_ij represents the final rating after combining the sentiment analysis value of the review text with the original rating. ρ is a parameter used to adjust the weight between the original rating and the sentiment polarity value.
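The blending of R_ij and S_ij can be sketched as follows. This is only an illustration under stated assumptions: the exact form of Equation (1) is not reproduced here, so the linear rescaling of the polarity from [-1, 1] onto the 1 to 5 rating scale and the function name `revise_rating` are hypothetical. Only the roles of the symbols and the behavior at the end points (ρ = 1 keeps only the original rating, ρ = 0 keeps only the sentiment) are taken from the text.

```python
def revise_rating(rating: float, polarity: float, rho: float = 0.8) -> float:
    """Blend an original rating R_ij with a sentiment polarity S_ij (illustrative)."""
    if not 1.0 <= rating <= 5.0:
        raise ValueError("rating must lie in [1, 5]")
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Assumption: map polarity -1 -> 1 star, 0 -> 3 stars, +1 -> 5 stars.
    sentiment_rating = 3.0 + 2.0 * polarity
    # rho weights the original rating; (1 - rho) weights the sentiment term.
    return rho * rating + (1.0 - rho) * sentiment_rating
```

In practice, the polarity argument could be obtained with TextBlob as `TextBlob(text).sentiment.polarity`; it is passed in as a plain number here so that the sketch has no third-party dependency.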
We load the pretrained BERT model into RR&R-CNN as a part of the model to update the vector representation of the review text. For user reviews, we convert the user review text {UserT_1, UserT_2, ..., UserT_d} (where d is the maximum number of reviews) into vectors {UserO_1, UserO_2, ..., UserO_d} through BERT. The number of embedding dimensions is 768. For doctor reviews, we use the same method. An important reason for using BERT is that it can handle the polysemy problem, which static word2vec embeddings cannot.
Finally, we use direct splicing; that is, the embedded vector corresponding to a sentence is concatenated with the corresponding revised rating. Compared with the original input, one dimension is added, so the width of the convolution kernel is increased by 1. The spliced review vectors and revised ratings form the input of the convolutional neural network.
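The splicing step can be illustrated with a short sketch, assuming each review has already been mapped to a 768-dimensional vector. The function name `splice` and the use of plain Python lists are assumptions made for illustration; the model itself would operate on tensors.

```python
from typing import List

EMBED_DIM = 768  # BERT-Base embedding size, as stated in the text


def splice(review_vectors: List[List[float]],
           revised_ratings: List[float]) -> List[List[float]]:
    """Append each review's revised rating to its embedding vector."""
    if len(review_vectors) != len(revised_ratings):
        raise ValueError("need exactly one revised rating per review vector")
    spliced = []
    for vector, rating in zip(review_vectors, revised_ratings):
        if len(vector) != EMBED_DIM:
            raise ValueError("unexpected embedding size")
        # The result has EMBED_DIM + 1 = 769 entries per review.
        spliced.append(list(vector) + [float(rating)])
    return spliced
```

Each output row has 769 entries, which is why a convolution kernel spanning the full input width must be one column wider than it would be for the review text alone.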

3.3. Feature Extraction Layer. The feature extraction layer mainly uses a convolutional neural network to extract deep features and includes a convolutional layer, a pooling layer, and a fully connected layer.

3.3.1. Convolutional Layer. After splicing the sentence vector obtained by BERT with the revised rating feature, the convolution kernel is used to convolve it. The convolutional layer is composed of convolution kernels of different sizes. Because the weights of a convolution kernel are shared, a convolution kernel can extract only a certain type of feature, so we need to use convolution kernels of different sizes, which represent different feature extractors. After a convolution kernel is applied, a column vector is obtained, which represents the features extracted by the convolution kernel from the sentence. Each convolution kernel performs a convolution operation to obtain a feature map. The use of the convolution operation and the activation function to obtain the corresponding feature map is expressed as Equation (2):
$$c_i = f\left(w * UO_{i:i+t-1} + b\right) \quad (2)$$
where c_i represents the operation result of the convolution kernel, w is the weight of the convolution kernel, b is the bias term of the convolution kernel, UO_{i:i+t-1} represents the vector matrix, * is the convolution operation, and f(x) is the activation function. In this paper, we use the ReLU activation function. The representation of all feature maps obtained after the convolution operation is denoted C (Equation (3)).
3.3.2. Pooling Layer. Since we use convolution kernels with different heights in the convolutional layer, the vector dimensions obtained after convolution will be different. The pooling layer follows the convolutional layer; it aims to reduce the number of features and decreases the amount of calculation required by reducing dimensions. In this paper, we use the max-pooling layer. The principle of the max-pooling layer is to select the largest value in the feature map. This maximum value represents the most important feature that we capture and the final feature that we want to select. The new feature map discards other, less important features. The max-pooling operation is given by Equation (4). After the pooling operation, we obtain a fixed-size feature representation.
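The convolution and max-pooling steps can be sketched in plain Python as follows. This is a minimal illustration, not the model code: the names `conv_feature_map` and `max_over_time` are hypothetical, the kernel is assumed to span the full input width with stride 1 and no padding, and, as in most deep learning frameworks, the "convolution" is computed as a sliding dot product.

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    """ReLU activation, f(x) = max(0, x)."""
    return x if x > 0.0 else 0.0


def conv_feature_map(inputs: Matrix, kernel: Matrix, bias: float) -> List[float]:
    """Slide a t x v kernel down the rows of an n x v input (stride 1, no padding)."""
    t, n = len(kernel), len(inputs)
    if t == 0 or t > n:
        raise ValueError("kernel height must be between 1 and the input height")
    if any(len(row) != len(kernel[0]) for row in inputs + kernel):
        raise ValueError("kernel width must equal input width")
    feature_map = []
    for i in range(n - t + 1):
        total = bias
        for r in range(t):
            total += sum(x * w for x, w in zip(inputs[i + r], kernel[r]))
        feature_map.append(relu(total))  # c_i = f(w * UO_{i:i+t-1} + b)
    return feature_map


def max_over_time(feature_map: List[float]) -> float:
    """Max pooling: keep only the largest activation of one feature map."""
    return max(feature_map)
```

Because max pooling returns a single number per kernel regardless of the kernel height, concatenating the pooled values of all kernels gives the fixed-size representation mentioned above.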

3.3.3. Fully Connected Layer. We input the results of the pooling layer into the fully connected layer, multiply them by the weight matrix of the fully connected layer, add a bias, and adopt a random dropout strategy, that is, randomly deactivating neurons in the neural network during training. In each iteration, only part of the neurons are used to train the model and obtain the values of w and b, and approximately half of the neurons are dropped each time, which is equivalent to training different neural networks. Thus, the co-dependence among neurons is reduced, which mitigates overfitting.
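The dropout strategy can be sketched as follows. The text describes only the random removal of neurons; the rescaling of the surviving activations by 1 / (1 - rate), often called inverted dropout, and the function name `dropout` are assumptions made for this illustration.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], rate: float,
            rng: Optional[random.Random] = None) -> List[float]:
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # A unit is dropped when its random draw falls below `rate`.
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]
```

A call such as `dropout(hidden, rate=0.2, rng=random.Random(0))` would be applied only during training; at prediction time the activations are left unchanged.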

3.4. Prediction Layer. Although the above outputs can be regarded as the features of users and items, they may lie in different feature spaces and thus cannot be compared directly. Therefore, a prediction layer is set at the end of the model to couple U_u and N_i: the features of user u and item i are spliced into a single vector, z = (u, i). We introduce the factorization machine [31] to perform feature interaction, and the user's predicted rating for a doctor is obtained after training. Given N training samples, Equation (5) is used for training. We use the tf.nn.l2_loss function as the training objective. This function computes half the sum of the squared elements of a tensor, here the error between the true and predicted values. In Equation (6), input y represents the true value, and prediction represents the predicted value.
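The feature interaction and the loss can be sketched as follows. Equations (5) and (6) are not reproduced here, so this sketch assumes the standard second-order factorization machine form; the parameter names `w0`, `w`, and `v` and both function names are hypothetical. The `l2_loss` helper mirrors what `tf.nn.l2_loss` computes when it is given the difference between predictions and true values.

```python
from typing import List


def fm_predict(z: List[float], w0: float, w: List[float],
               v: List[List[float]]) -> float:
    """Second-order factorization machine score for one spliced vector z.

    v[i] is the k-dimensional latent vector of feature i.
    """
    linear = w0 + sum(wi * zi for wi, zi in zip(w, z))
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        # Uses sum_{i<j} a_i a_j = 0.5 * ((sum a_i)^2 - sum a_i^2), with a_i = v[i][f] * z[i].
        s = sum(v[i][f] * z[i] for i in range(len(z)))
        s_sq = sum((v[i][f] * z[i]) ** 2 for i in range(len(z)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def l2_loss(predictions: List[float], targets: List[float]) -> float:
    """Half the sum of squared errors between predictions and true values."""
    return 0.5 * sum((p - y) ** 2 for p, y in zip(predictions, targets))
```

The identity in the inner loop avoids enumerating all feature pairs, so the cost grows linearly with the length of z rather than quadratically.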

Experimental Data Set and Evaluation Metrics. This experiment uses a Yelp dataset (https://www.yelp.com/dataset), which has a total of 66,859 million data points. We randomly select 1,000,000 pieces of data, and then we find, screen, and analyze the "Health & Medical" data according to the category from these 1,000,000 pieces of data. The dataset used in this experiment is thus the medical-related subset, which contains 69,813 reviews made by 57,953 users about 4,658 doctors. The density of the experimental dataset is 0.026%. In the experiment, 80% of the dataset is randomly selected as the training set, 10% is used as the validation set, and the remaining 10% is used as the test set. The experiment was repeated three times to obtain the average value. The root mean square error (RMSE) is used as the standard to evaluate the performance of the algorithm, as it evaluates model accuracy by calculating the difference between the predicted rating and the true rating, and it is one of the most commonly used evaluation indicators in recommendation systems. Generally, the smaller the value of the RMSE is, the better the recommendation performance. Equation (7) is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{r}_n - r_n\right)^2} \quad (7)$$
In the above formula, \hat{r}_n represents the nth predicted rating, r_n represents the nth true rating, and N represents the number of samples.
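The evaluation metric and the reported density can be checked with a few lines of code. This is a minimal sketch; the function names `rmse` and `density` are chosen here for illustration, and density is taken to mean the number of reviews divided by the number of possible user-doctor pairs.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def density(num_reviews: int, num_users: int, num_doctors: int) -> float:
    """Fraction of user-doctor pairs that have a review."""
    return num_reviews / (num_users * num_doctors)
```

With the figures given above, `density(69813, 57953, 4658)` is about 0.00026, which agrees with the stated density of 0.026%.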

Comparison Method. We select two traditional methods, the latent factor model (LFM) [32] and latent Dirichlet allocation (LDA) [21], and three deep learning methods, deep cooperative neural networks (DeepCoNN) [11], convolutional matrix factorization (ConvMF) [10], and a variant of DeepCoNN (DeepCoNN+rating). First, we select the LFM, which contains only rating features and no review text information, for comparison. The LFM uses matrix factorization to predict the rating of unknown commodities. Second, we select the LDA model, which contains only review text information, for comparison. LDA is used to learn the document-topic distribution matrix from the review text as the latent feature of the item, and the optimal solution is found through gradient descent for recommendation. Third, we select the deep learning model DeepCoNN for comparison. DeepCoNN convolves the user review text and the commodity review text to predict the rating. Fourth, we select a deep learning model, ConvMF, that involves both ratings and reviews. This model combines a CNN and probabilistic matrix factorization (PMF) to predict the rating. Fifth, we add a revised rating feature to DeepCoNN to obtain the "DeepCoNN+rating" algorithm, which combines the original rating with the sentiment analysis rating of the review, and we use this algorithm for comparison. Finally, RR&R-CNN improves the fifth method by using the pretrained BERT model instead of word2vec static encoding for the review text, which can effectively capture semantic information, and it also revises the original rating by performing sentiment analysis on the review text. This method can effectively improve recommendation performance.
The difference between RR&R-CNN and DeepCoNN is that the former uses BERT to vectorize the review text, while the latter uses word2vec; moreover, the former adds the revised rating features, spliced with the embedded values of the review text, to the input layer and feeds them into the convolutional network, whereas the latter uses only review data in the input layer.

Experimental Environment and Parameter Settings.
The experimental environment configuration of this paper is as follows. The system version is Ubuntu 18.04, the GPU model is RTX2080Ti/11 GB, and the CUDA version is 10.2. In addition, the deep learning framework is TensorFlow 1.14. The specific experimental parameter configuration is as follows.

Embedded Representation. DeepCoNN uses the word2vec vectors pretrained on Google News [33] to embed the review text. These vectors were trained on approximately 100 billion words from Google News, cover 3 million words and phrases, and have 300 dimensions. The continuous bag-of-words model is used for training, and the vectors are not updated during training. Words in a sentence that are not in the pretrained dictionary are represented by random vectors instead. ConvMF also uses word2vec word embeddings. RR&R-CNN uses the pretrained BERT model "BERT-Base, cased_L-12_H-768_A-12." This model uses a 12-layer transformer with 12 attention heads and has approximately 110 million parameters; the maximum length of the selected review text is 256, and the embedding dimension of the word vector is 768. The optimizer is Adam, and the initial learning rate is 1e-5.

Parameter Setting of Networks. The RR&R-CNN convolution kernel size is t × v, where t is the number of words covered by the convolution kernel in the vertical direction, taking the values 3, 4, and 5 with 100 convolution kernels for each size, and v is the dimension of the embedded word vector. To prevent the model from overfitting, we add a dropout operation with a dropout rate of 0.2 to the network structure of the model. The number of latent factors for users and doctors in the factorization machine is 64, selected from {16, 32, 64, 128}. The L2 regularization parameter in the loss function is 0.6. The values of the parameters in DeepCoNN and ConvMF are consistent with those in the original papers.

Traditional Model Parameters. In the comparative experiment with the LDA model, which uses only the review text data, the topic dimension k is set to 10, hyperparameter α is 0.2, and β is 0.1. In the comparative experiment with the LFM, which uses only rating data, the number of hidden factors is set to 20, the regularization coefficient is 0.01, and the learning rate is 0.1. Table 2 shows a comparison of the experimental results.

Experimental
Each experiment reported in Table 2 is repeated three times, and the average value is taken. The best results are shown in italic. The experimental results show that, among the traditional methods, the LDA model, which considers only the review text, performs worse than the LFM, which uses only the rating. This shows that considering only the review text does not guarantee that the learned features will be helpful for rating prediction, which means that it cannot be completely independent of rating features. Moreover, the performance of the deep learning models is better than that of the traditional methods, which demonstrates the advantages of deep learning. The DeepCoNN model considers the review text of the user and the doctor. The ConvMF model uses only the review text of the doctor but adds the rating information of the user, and it performs better than DeepCoNN, which indicates that combining rating features with review text is indeed better than using the review text alone. RR&R-CNN, which is based on a deep learning model and combines the review text with the revised rating features, reduces the prediction error by 4.2% compared with DeepCoNN, which indicates that our recommendation model performs better.
To further examine the effectiveness of each module in RR&R-CNN, we conduct an ablation study with DeepCoNN+rating and DeepCoNN-BERT. In Table 3, DeepCoNN is the baseline, DeepCoNN+rating adds the revised rating feature to the baseline, and DeepCoNN-BERT replaces word2vec with BERT in the baseline. The experimental results show that adding the revised rating feature and using BERT to embed the review text simultaneously is better than using either method alone. This demonstrates that the combination of the two factors is effective.
As shown in Figure 2, we set the value of ρ from 0 to 1, with a step size of 0.1. When ρ is 0, only the sentiment polarity value of a user's review text is considered; when ρ is 1, only a user's original rating is considered. It can be seen from the data displayed in the figure that the RMSE value is the smallest when ρ is 0.8. When ρ is greater than 0, the corresponding RMSE value decreases to varying degrees. This confirms our hypothesis that using a combination of the original rating and the sentiment analysis value of the review text, instead of the original rating data alone, can effectively alleviate the problem in which users sometimes have a low evaluation of doctors but give them high ratings.
As shown in Figure 3, with ρ fixed at 0.8, the number of convolution kernels is varied from 10 to 400. When the number of convolution kernels is 100, the RMSE value is the lowest, and when the number of convolution kernels is greater than 100, the RMSE shows an upward trend; that is, the performance becomes worse, so the number of convolution kernels is set to 100.
The Number of Iterations. As shown in Figure 4, we perform experiments with different numbers of iterations. Up to epoch 6, both rmse_train and rmse_valid decrease continuously, and the RMSE is lowest at epoch 6. When the number of epochs is greater than 6, rmse_train keeps decreasing while rmse_valid increases, indicating that the model is overfitting the training set, so the number of epochs is set to 6.
4.4.5. The Value of FM_k. As shown in Figure 5, the experiment is carried out for different k values {16, 32, 64, 128}. With the number of epochs fixed at 6, the RMSE value is the lowest when k is 32.

Conclusions
To solve the problem of how to better recommend doctors according to the personalized demands of users, we propose RR&R-CNN. (1) In contrast to the traditional distributed text representation, RR&R-CNN uses the pretrained BERT model to obtain the vectorized representation of the review text, which addresses the problem of polysemy. (2) Regarding the rating features, because a user's evaluation sometimes does not match his/her rating, we use the revised user and doctor ratings as the final representation. The revised rating and the representation of the review text are spliced together as the input of the convolutional neural network for deep feature extraction. (3) As the model is divided into a user network and a doctor network, the factorization machine is used to perform the feature interaction between the two networks, and the loss function is used to reduce the error in rating prediction.
The experimental results show that the recommendation performance is improved after adding the revised rating features and that the BERT vectorization of the review text is better than word2vec vectorization. However, a disadvantage of the BERT model is that it consumes a large amount of time and memory. At present, there are some methods for compressing the BERT model and reducing its complexity. In future work, we will evaluate these methods; if they indeed reduce the time and memory consumed by the experiment while improving recommendation performance, we will incorporate them into the existing method. In addition, BERT variants that incorporate medical-domain knowledge, such as ClinicalBERT [34] and BioBERT [35], will also be explored in our follow-up research.

Conflicts of Interest
The authors declare that they have no conflicts of interest.   Wireless Communications and Mobile Computing