Next-Day Medical Activities Recommendation Model with Double Attention Mechanism Using Generative Adversarial Network

Medical activities recommendation is a key aspect of an intelligent healthcare system, which can assist doctors with little clinical experience in clinical decision making. Medical activities recommendation can be seen as a kind of temporal set prediction. Previous studies on this problem are based on Recurrent Neural Networks (RNNs), which neither incorporate personalized medical history nor differentiate between the impacts of medical activities. To address these issues, this paper proposes a Next-Day Medical Activities Recommendation (NDMARec) model. Specifically, our model first introduces an inpatient day embedding method based on soft-attention, which balances the impact of different medical activities to obtain a joint representation of the medical activities that occurred within the same day. Then, a fusion module is designed to combine features of inpatient days and medical history to achieve personalization. These features are learned by the self-attention mechanism, which alleviates the long-term dependency problem of RNNs. Finally, adversarial training is introduced to improve the generalization ability of our model. Extensive experiments on a real dataset from a hospital show that NDMARec outperforms both classical and state-of-the-art methods.


Introduction
As populations grow and societies develop, the demand for high-quality healthcare services continues to rise, while regional and national differences in healthcare quality continue to widen. Young physicians who lack clinical experience have difficulty making effective clinical decisions when faced with unfamiliar conditions. In addition, public health emergencies, such as the COVID-19 outbreak, have had a dramatic impact on the healthcare system. In recent years, artificial intelligence (AI) has shown great potential for development. Therefore, if AI technology can be effectively used to achieve accurate recommendations for clinical medical activities, the overall quality of healthcare services will be greatly improved. The motivation for this work is to train recommendation models that assist young physicians who lack clinical experience with related diseases in making effective clinical decisions in the face of unfamiliar diseases. Such a model helps inexperienced physicians plan follow-up treatment by recommending next-day medical activities.
Previous research on Electronic Medical Records (EMRs) mostly focused on risk prediction [1] and readmission prediction [2]. In recent years, related research has focused more on the prediction of the next clinical event [3]. The difficulty these studies encounter is how to weigh the accuracy of the results against their intelligibility. Neural network models usually have higher accuracy than simple statistical models but are not as intelligible. Next-day medical activities recommendation does not simply predict a clinical event but recommends a set containing multiple medical activities, which can be seen as a special next-basket recommendation [4] or temporal set prediction [5]. The difficulty of the temporal set prediction problem is how to efficiently represent a set and capture the temporal relationships between different sets. Next-day medical activities recommendation models likewise need good comprehensibility, which is the key to medical application models, and the difficulty of the temporal set prediction problem is also present in our study. Thus, it is still a challenging task in the medical domain.
Firstly, it is difficult to achieve an effective representation of the medical activity set. The representation should reflect the medical activities that occurred on that day, but the importance of these medical activities differs. For example, a patient takes vitamin B12 and capecitabine (a chemotherapy drug) on a given day. Obviously, the latter is more important. Hence, it is a challenge to highlight information on important medical activities without losing information on ancillary medical activities when generating the inpatient day embedding. Secondly, the relationship between inpatient days is difficult to describe, as it includes both chronological and causal relationships. For example, if a patient is allergic to a medication that is tried first, a new medication is substituted in the next day's medical activities. Thirdly, when doctors formulate medical activities for inpatients, they are influenced by the earlier medical history in addition to the current condition. A stand-alone recommendation model learns only the treatment pattern between inpatient days, and the individual features of inpatients are not emphasized. Inspired by [6], a generative adversarial network is used to solve this problem. The goal is to train the recommendation network to learn effective features with which it can recommend medical activities that better match the individual characteristics of inpatients. The discriminator specifically distinguishes between machine-recommended medical activities and real next-day medical activities and is used as an adversary for the recommendation network. If the discriminator is able to distinguish between the recommended medical activities and the real medical activities, the recommendation network is penalized.
Here, the recommended medical activities should be in line with the real distribution of next-day medical activities. Firstly, the recommended set of medical activities should include as much of the actual set of next-day medical activities as possible. Secondly, the elements in the intersection of the two sets should be ranked as high as possible in the recommended set of medical activities, i.e., the corresponding recommendation scores should be as high as possible. Finally, the recommended medical activities should be personalized as much as possible, rather than consisting of generic medical activities such as general food and nursing care.
To address the above issues, we propose a new attention-based neural network for next-day medical activities recommendation, which consists of four components: inpatient day embedding, multihead self-attention mechanism, information fusion, and generative adversarial network. A soft-attention mechanism is used to generate the inpatient day embedding and balance the impact of different medical activities. Latent relationships within inpatient day sequences and word features of the medical history are extracted by multihead self-attention to alleviate long-term dependency and enable parallel computing. Medical activity and medical history information are fused, which allows personalized information about the inpatient to be added when recommending next-day medical activities. A generative adversarial network is used in the model training stage to improve the generalization ability of the recommendation network.
The contributions of this paper can be summarized as follows: (i) A novel recommendation model for next-day medical activities is proposed, which considers the personalized impact of medical history and the importance of different medical activities and formalizes them as attention factors. (ii) A generative adversarial network is used to improve the quality of recommendations, which forces the network to learn features that can be used to recommend activities whose distribution resembles that of the activities that actually occurred. (iii) Extensive experiments on a real dataset from a top-notch hospital validate the superiority and interpretability of NDMARec.
The rest of this paper is organized as follows: Section 2 reviews previous work related to the problem studied in this paper. Section 3 presents the formalization of the problem. Section 4 describes in detail each component proposed by our framework and model. Section 5 evaluates the proposed approach through experiments. Finally, Section 6 concludes the whole paper.

Next-Basket Recommendation.
Next-basket recommendation has been studied in different domains. In the field of e-commerce, next-basket recommendation is an important part of many websites. [7] proposes a traditional model that mixes Markov chains and factorization. [4] proposes a dynamic recurrent basket model based on RNN and uses max pooling to get the set embedding. [8] proposes an encoder-decoder framework that uses average pooling to get the set embedding and uses an attention mechanism to apply information from different input sets to different output sets. Max pooling causes information about auxiliary medical activities to be discarded, while average pooling does not emphasize information about important medical activities. Given the limitations of both, we propose a set embedding representation based on a soft-attention mechanism to avoid this information loss. [5] obtains the set embedding by constructing a heterogeneous graph that considers semantic relationships between sets, items, users, and categories. DNA sequences in the biological field are another kind of sequence data, and [9, 10] carry out effective research on DNA sequence patterns. However, in the medical field, similar problems have rarely been studied. [3] handles high-dimensional input vectors by linear dimensionality reduction and predicts the diagnosis of the next admission based on RNN. [11] predicts the next clinical event based on LSTM and attention mechanisms. Both deal only with structured information, i.e., medical activities or diagnostic results, and do not make use of unstructured information, i.e., medical history texts. It is undeniable that medical history information has a profound impact on the design of subsequent healthcare strategies. In addition, RNN-based models process sequential data serially, which inevitably results in slow computation. The serialized computation utilizes only the temporal information of the sequence of hospitalization days and ignores the interactions among hospitalization days.

Attention Mechanism.
The attention mechanism is a feature extraction method. Before it was proposed, other feature extraction methods, such as Hidden Markov Models [12] and Genetic Algorithms [13], were applied in the field of gesture recognition. From a conceptual perspective, the attention mechanism selectively filters a small amount of important information out of a large volume of information and focuses on it while ignoring the unimportant information. It is implemented by training the model to assign a different weight coefficient to each piece of information and then computing a weighted sum to obtain the overall representation. The attention mechanism is an essential concept in neural networks. Since the introduction of the Transformer [14], the self-attention mechanism has been widely used and has even outperformed CNNs and RNNs on many vision-related tasks [15] and language-related tasks [16]. Transformer-based variants have also achieved great success in the field of text processing [17,18]. The self-attention mechanism also achieves advanced performance in recommendation systems [19,20]. Compared with RNN and LSTM, which specialize in sequence data, self-attention is more likely to capture long-term dependency in sequences and facilitates parallel computation.

Generative Adversarial Network.
A generative adversarial network is a machine learning framework consisting of two neural networks [21]. It is implemented as two networks competing against each other, one of which is a generator network that captures features of real sample data to generate fake data. The other is the discriminator network, which observes both real and fake data to discriminate the authenticity of the input. The application of generative adversarial networks showed great success in image generation research [22]. Although there are applications in the field of recommendation systems [23,24], the application of adversarial learning in temporal set recommendation is an unexplored task. Inspired by the application of generative adversarial networks in image multi-tag recommendation [6], we introduce generative adversarial networks into our study.

Data Description and Preprocessing.
The EMR dataset used in this study comes from the electronic medical record database of a top-notch hospital. As shown in Figure 1, each medical record contains the following information: (i) Medical history. The medical history is textual data that includes the patient's past history, current medical history, and allergy history. (ii) Medical activity. This information includes daily medical activities such as medications, biochemical tests, etc. In our collected dataset, following the hospital's EMR system, we regard a patient's admission process as one EMR record. Table 1 shows the demographics of patients in the dataset. From the table, we can first see that all breast cancer patients are female. Secondly, patients are mainly concentrated in the age range of 40-70 years, and most patients are from rural areas. Finally, the vast majority of patients are negative for HCV, and many patients did not have previous CT or MRI examinations.
To integrate and utilize the above data, we conduct the following preprocessing steps. Firstly, inspired by [25], past history, current medical history, and allergy history are combined and meaningless characters are removed. Secondly, duplicate activities are removed and the English and Chinese names of medical activities are unified. Thirdly, records with too few or too many inpatient days are removed. Finally, each record is truncated to 70% to 80% of its inpatient days, and the medical activities that occurred on the last day after truncation are regarded as the real next-day medical activities, since the purpose of our study is not to recommend medical activities for the last day of hospitalization. Table 2 shows more detailed statistics of our data.
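The truncation step above can be illustrated with a minimal sketch. It assumes each record is a list of per-day activity sets and that the truncation ratio is drawn uniformly from the 70% to 80% range; the source does not say how the ratio is chosen, and the function name `truncate_record` is hypothetical.

```python
import random


def truncate_record(days, low=0.7, high=0.8, rng=None):
    """Split one admission into (input_days, next_day_target).

    days: list of sets, one set of medical activity names per inpatient day.
    The uniform draw of the ratio is an assumption, not the paper's stated rule.
    """
    rng = rng or random.Random(0)
    ratio = rng.uniform(low, high)
    # Keep at least two days: one input day and one target day.
    keep = max(2, int(len(days) * ratio))
    kept = days[:keep]
    # The last kept day is the label; everything before it is the input.
    return kept[:-1], kept[-1]


record = [{"activity_%d" % i} for i in range(10)]
inputs, target = truncate_record(record, low=0.75, high=0.75)
```

With ten inpatient days and a ratio of 0.75, seven days are kept: six serve as input and the seventh is the next-day target.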

Problem Formalization.
Formally, we define all the unique inpatients and medical activities in the entire dataset as $\mathcal{U} = \{u_1, u_2, \ldots, u_N\}$ and $\mathcal{V} = \{v_1, v_2, \ldots, v_M\}$, with sizes $N$ and $M$, respectively. Let $\mathcal{T} = \{t_1, t_2, \ldots, t_N\}$ be the set of medical histories, with each $t_i$ corresponding to $u_i$. $\mathcal{W} = \{w_1, w_2, \ldots, w_J\}$ is a lexicon of words that appear in medical histories, and each $t_i$ is a word sequence of indefinite length.
[Figure 1: Example medical record. (I) Past history: diabetes mellitus, heart disease. (II) Current medical history: She underwent a modified radical surgery for "left breast cancer" due to a "left breast mass". (III) Allergic history: pollen allergies, hives.]
$f$ is a mapping from the set of medical activities that occurred in a day to a vector of fixed size $d_e$, where $V \subseteq \mathcal{V}$ is a set containing the medical activities that occurred on that day. The goal of our study is to learn a recommendation model with learned parameters $W$ that, given the medical history and the medical activities since admission, outputs $V_{i,k+1}$, the set of next-day medical activities recommended by the model. This problem can be formalized as a supervised learning task, which first trains a recommendation model on a large number of historical records (including medical history and medical activities since admission) and then recommends the set of medical activities for the next day based on the trained model. If the input consists only of medical activities, then all the model learns is the treatment pattern without personalized information. We therefore jointly learn from the medical history to address personalization. The key idea is to learn to focus on personalized, patient-specific information. The joint optimization of the proposed recommendation network allows these aspects of personalization to be learned simultaneously.

Proposed Method
The framework of the proposed NDMARec is shown in Figure 2, which consists of four parts: inpatient day embedding, encoder based on multihead self-attention, generative adversarial network, and fusion module.
The first component, inpatient day embedding, is motivated by the medical common sense that the medical activities that occur during hospitalization differ in importance. Therefore, the concept of inpatient day embedding is proposed, which is obtained by dynamically weighting the included medical activities. Specifically, the model is trained to assign different weight coefficients to different medical activities, enhancing the information of important medical activities while suppressing but not discarding the information of ancillary medical activities. This approach effectively deals with the information loss problem caused by the pooling methods of [4,8]. The second component, inspired by [14], learns latent information from inpatient days and medical history through multihead self-attention; it takes inpatient day embeddings and word embeddings as input. It effectively deals with the limitations of RNN-based models [4,5,8]. The self-attention module introduces position encoding into the inpatient day vector as temporal information. The self-attention mechanism captures the interactions between inpatient days. This module implements parallel processing by matrix multiplication to speed up the computation. The detailed rationale is given in Section 4.2.2. The third component consists of a generator and a discriminator, which generate medical activities based on hidden features and discriminate real from generated activities, respectively. The fourth component fuses the latent features learned by the second component to achieve personalization.

Inpatient Day Embedding.
A key issue in next-day medical activities recommendation is how to represent the medical activities that occurred on a day as a vector. The vector should be neither too long nor too sparse. The aim is to represent as much information as possible with a short dense vector, while paying attention to the importance of different medical activities within the same day, i.e., their influence on subsequent medical activities. Most existing studies use an $M$-dimensional multi-hot vector $s = [s_1, s_2, \ldots, s_M]$ for this purpose, where $s_j = 1$ if $v_j \in V$, the set of medical activities that occurred on that day, and $s_j = 0$ otherwise. This type of representation has two drawbacks: (1) the dimension of $s$ is too large while active elements are sparse; (2) the importance of different medical activities is not considered.
To overcome the above issues, an embedding method based on soft-attention is designed. Association rule analysis shows that certain medical activities always occur within the same day due to the synergy between them [26]. Based on this, we propose to learn a dense, low-dimensional vector for each medical activity, as in word embedding. Formally, let $W_e \in \mathbb{R}^{d_e \times M}$ denote the embedding matrix for medical activities, where $d_e$ is the embedding dimension. Each medical activity $v_i$ is encoded as a one-hot column vector $v \in \mathbb{R}^{M}$, in which the entry corresponding to $v_i$ is 1 and the other entries are 0. Then, $e = W_e v$ denotes the embedding vector for medical activity $v$. It is well known that embeddings can encode objects with low-dimensional vectors and still preserve their meaning [27]. Considering the different importance of the medical activities that occurred on a day, they should be treated differently to enhance the information of important medical activities and avoid losing the information of auxiliary medical activities. Therefore, the proposed embedding method computes the inpatient day embedding as an attention-weighted combination of the activity embeddings, where the trainable parameters $q \in \mathbb{R}^{d_e}$, $b \in \mathbb{R}^{d_e}$, and $W_{v2d} \in \mathbb{R}^{d_e \times d_e}$ control the weights, and $\alpha_i$ is the attention weight corresponding to medical activity $v_i$.
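The soft-attention embedding described above might be sketched as follows in NumPy. The exact scoring function is not recoverable from the text, so this sketch assumes a common additive form, $\alpha = \mathrm{softmax}(q^{\top}\tanh(W_{v2d} e_i + b))$; the `tanh` nonlinearity, the function names, and the random parameter values are assumptions for illustration only.

```python
import numpy as np


def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()


def inpatient_day_embedding(activity_ids, W_e, W_v2d, b, q):
    """Attention-weighted sum of the embeddings of one day's activities."""
    E = W_e[:, activity_ids].T               # (n, d_e): one row per activity
    scores = np.tanh(E @ W_v2d.T + b) @ q    # (n,): assumed additive scoring
    alpha = softmax(scores)                  # attention weights, sum to 1
    return alpha @ E, alpha                  # day vector (d_e,), weights (n,)


rng = np.random.default_rng(0)
M, d_e = 6, 4                                # toy sizes, not the paper's
W_e = rng.normal(size=(d_e, M))
W_v2d = rng.normal(size=(d_e, d_e))
b = np.zeros(d_e)
q = rng.normal(size=d_e)
day_vec, alpha = inpatient_day_embedding([0, 3, 5], W_e, W_v2d, b, q)
single_vec, _ = inpatient_day_embedding([2], W_e, W_v2d, b, q)
```

Because every weight is strictly positive, no activity is discarded outright, unlike max pooling; a day with a single activity reduces to that activity's embedding.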

Encoder Based on Multihead Self-Attention.
RNN is the most commonly used model for temporal data mining, and it usually processes sequence data in order. Existing solutions to problems similar to ours are mostly based on RNN or its variants such as LSTM and GRU, but none of them completely solves the issue of long-term dependency. This means that an RNN-based network structure may not take full advantage of some of the medical activities at the time of the patient's initial admission, thus reducing model performance. As a result, in this study, multihead self-attention is applied to inpatient day sequences and medical histories because it focuses on every inpatient day or word simultaneously, which not only improves parallelism but also alleviates long-term dependency. Multiple attention heads enable the network to capture richer information. The inpatient days and the medical history are processed separately by two neural networks with the same structure. The two differ only in the way the final information is aggregated. As shown in Figure 2, one uses attention-based aggregation while the other uses average aggregation. Next, the structure of the encoder based on multihead self-attention is described in detail.

Position Encoding.
To exploit the temporal order of inpatient days and the sequential order of words in the medical history, position encodings are defined for the inpatient day embeddings and the word embeddings, respectively, to represent relative positions. As in Equation (4), the inpatient day embedding and the word embedding are each summed with their position encoding to form the input to the downstream task.
Figure 3 shows the detailed structure of the self-attention mechanism. The left and right of Figure 3 show the input vector produced by Equation (4) and the output vector produced by the self-attention mechanism, respectively. After the position encoding is added, the inpatient day embedding and the word embedding are mapped to $q$, $k$, and $v$ through the three mapping matrices ($W_Q$, $W_K$, and $W_V$) of their respective modules.
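As an illustration of the position encoding step in this subsection, the sketch below uses the sinusoidal encoding of the Transformer [14]. This is an assumption: the exact form of Equation (4) is not recoverable from the text, and the function names are hypothetical.

```python
import numpy as np


def sinusoidal_position_encoding(length, d_model):
    """Transformer-style encoding: sin on even dimensions, cos on odd ones."""
    pos = np.arange(length)[:, None]                  # (length, 1)
    i = np.arange(d_model)[None, :]                   # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe


def add_position(embeddings):
    """Sum embeddings (length, d_model) with their position encoding."""
    length, d_model = embeddings.shape
    return embeddings + sinusoidal_position_encoding(length, d_model)


pe = sinusoidal_position_encoding(5, 32)   # 5 inpatient days, d_e = 32
```

The same helper would apply to both the inpatient day sequence and the word sequence of the medical history, since each is a (length, $d_e$) matrix.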

Multihead Self-Attention and Residual Connection.
Here, $W_Q$, $W_K$, $W_V \in \mathbb{R}^{d_e \times d_k}$ denote the mapping matrices applied to the inpatient day embedding or the word embedding, and $b_Q$, $b_K$, $b_V \in \mathbb{R}^{d_k}$ denote the corresponding biases. Each $q$ is multiplied by the matrix $K$ formed by concatenating the transposes of all $k$ in the sequence, and the result is passed through softmax to obtain its self-attention weights over the elements at the positions of this sequence. These weights are then multiplied by all $v$ and summed to obtain the output at the current position.
Compared with RNN, the self-attention mechanism attends to the whole sequence simultaneously, which enables parallel computation and reduces running time. We compute the outputs of all elements of the entire sequence simultaneously by concatenating the query vectors into a matrix $Q$:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where $\sqrt{d_k}$ is the scaling factor used to alleviate gradient vanishing. Multiple attention heads are used to capture richer information and features, and the learned features are combined as output:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(H_1, \ldots, H_h)W_o,$$
where $H_i = \mathrm{Attention}(Q, K, V)$ and $W_o \in \mathbb{R}^{h \cdot d_k \times d_e}$, $h$ denotes the number of self-attention heads, and $\mathrm{Concat}(\cdot)$ is a vector concatenation operation.
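The scaled dot-product and multihead computation described above can be sketched in NumPy. The row-vector convention ($Q = XW_Q + b_Q$), the toy sizes, and the random parameters are assumptions for illustration; this is not the authors' PyTorch implementation.

```python
import numpy as np


def softmax_rows(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention for one head."""
    d_k = Q.shape[-1]
    weights = softmax_rows(Q @ K.T / np.sqrt(d_k))   # (n, n), rows sum to 1
    return weights @ V, weights


def multihead_self_attention(X, heads, W_o):
    """X: (n, d_e); heads: list of (W_Q, b_Q, W_K, b_K, W_V, b_V) tuples."""
    outputs = []
    for W_Q, b_Q, W_K, b_K, W_V, b_V in heads:
        H, _ = attention(X @ W_Q + b_Q, X @ W_K + b_K, X @ W_V + b_V)
        outputs.append(H)                            # each head: (n, d_k)
    return np.concatenate(outputs, axis=-1) @ W_o    # (n, d_e)


rng = np.random.default_rng(0)
n, d_e, d_k, h = 5, 8, 4, 2
X = rng.normal(size=(n, d_e))
shapes = [(d_e, d_k), (d_k,)] * 3
heads = [tuple(rng.normal(size=s) for s in shapes) for _ in range(h)]
W_o = rng.normal(size=(h * d_k, d_e))
out = multihead_self_attention(X, heads, W_o)
_, attn_weights = attention(X @ heads[0][0], X @ heads[0][2], X @ heads[0][4])
```

All positions are processed in a few matrix products, which is the source of the parallelism contrasted with the step-by-step recurrence of an RNN.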
As shown in Figure 2, following most network structures of this kind, the residual connection [28] and layer normalization [29] are also applied in our model. The purpose of layer normalization is to normalize the state of the hidden layer to a standard normal distribution to accelerate convergence. The residual connection is essentially an additive node. In back propagation, it passes the gradient from the upper layer directly to the lower layer, preserving the original state of the gradient, which reduces the risk of vanishing and exploding gradients and keeps the whole network in an active learning state.
In the feed-forward layer, $W_{F1} \in \mathbb{R}^{d_e \times d_f}$ and $W_{F2} \in \mathbb{R}^{d_f \times d_e}$ are trainable parameters, $\mathrm{ReLU}(\cdot)$ is the activation function, and $h$ is the resulting hidden vector.
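A minimal sketch of the residual connection, layer normalization, and feed-forward layer follows. It assumes the post-norm ordering of the original Transformer and omits biases and the learnable gain and shift of layer normalization; the ordering and these omissions are assumptions, since the corresponding equation is not recoverable from the text.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no gain or shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def feed_forward(x, W_F1, W_F2):
    """Two linear maps with a ReLU in between (biases omitted)."""
    return np.maximum(x @ W_F1, 0.0) @ W_F2


def encoder_sublayers(x, attn_out, W_F1, W_F2):
    y = layer_norm(x + attn_out)                          # residual around attention
    return layer_norm(y + feed_forward(y, W_F1, W_F2))    # residual around FFN


rng = np.random.default_rng(0)
n, d_e, d_f = 5, 8, 16
x = rng.normal(size=(n, d_e))
attn_out = rng.normal(size=(n, d_e))      # stands in for the attention output
W_F1 = rng.normal(size=(d_e, d_f))
W_F2 = rng.normal(size=(d_f, d_e))
hidden = encoder_sublayers(x, attn_out, W_F1, W_F2)
```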

Attention-Based Aggregation of Inpatient Day Information.
There is a time dependency between next-day medical activities and previously occurring medical activities. Specifically, we consider that different sequences of inpatient days may have different effects on different next-day medical activities. For example, if an inpatient suffers an adverse reaction to an injectable drug, the patient's subsequent medical activities will be altered as a result. In addition to changing the medication, measures need to be taken to mitigate the adverse reaction to ensure smooth follow-up treatment. As mentioned previously, the impact of the medical activities that occurred previously differs. In addition, the medical activities that occurred on the last day are more influential. Hence, we design an attention module that focuses on previous inpatient days and leverages their different impacts.
Here, $e_{iK} = \mathrm{MLP}(\mathrm{Concat}(h_i, h_K))$, where MLP is a multilayer perceptron, $h_i$ and $h_K$ are the hidden vectors calculated from Equation (9), and $K$ is the number of days since admission.
Journal of Healthcare Engineering
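The attention-based aggregation of inpatient days can be sketched as below. The score $e_{iK} = \mathrm{MLP}(\mathrm{Concat}(h_i, h_K))$ follows the text; normalizing the scores with softmax and summing the weighted hidden vectors are assumptions, as are the one-hidden-layer MLP and all parameter names.

```python
import numpy as np


def mlp_score(pair, W1, b1, w2, b2):
    """One-hidden-layer MLP that maps Concat(h_i, h_K) to a scalar score."""
    hidden = np.maximum(pair @ W1 + b1, 0.0)
    return float(hidden @ w2 + b2)


def aggregate_days(H, W1, b1, w2, b2):
    """H: (K, d) hidden vectors of K inpatient days; the last row is day K."""
    h_last = H[-1]
    scores = np.array(
        [mlp_score(np.concatenate([h_i, h_last]), W1, b1, w2, b2) for h_i in H]
    )
    scores -= scores.max()                      # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha                     # aggregated vector, day weights


rng = np.random.default_rng(0)
K, d, m = 4, 6, 5
H = rng.normal(size=(K, d))
W1, b1 = rng.normal(size=(2 * d, m)), np.zeros(m)
w2, b2 = rng.normal(size=m), 0.0
z_day, alpha_day = aggregate_days(H, W1, b1, w2, b2)
H_same = np.tile(H[0], (K, 1))                  # identical days
z_same, alpha_same = aggregate_days(H_same, W1, b1, w2, b2)
```

Scoring each day against the last day reflects the statement that the most recent day is the most influential reference point; identical days receive uniform weights.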

Medical History Information Aggregation.
Personalization is an important feature of recommendation systems [30]. For this study, the emphasis of personalization is that the model recommends next-day activities that are specific to the patient's condition, rather than simply learning a generic treatment model. Hence, the model incorporates medical history information in its recommendations to improve performance. We adopt an average pooling strategy to aggregate the hidden states of the words in the medical history learned by the multihead self-attention in Section 4.2.2.
Here, $J$ indicates the number of word hidden states output by the multihead self-attention in Section 4.2.2.

Generative Adversarial Network.
In this study, we explore adversarial learning for next-day medical activity recommendation. The idea is to use an additional adversarial loss for medical activity prediction, thus encouraging the recommended medical activities to have a distribution similar to the ground truth. Specifically, if the recommended results are not similar to the actual distribution of the ground truth, the adversarial loss penalizes the recommendation network. The losses of the generator G and the discriminator D are incorporated into our model separately, where the medical activities from the generator network are considered generated labels. The discriminator distinguishes the generated labels from the ground truth labels, that is, the real next-day medical activities. The purpose of G is to fool D by generating medical activities that resemble the real activities that occurred on the next day, which is achieved by minimizing the generator loss $L_G$. Here, $z^d_i$ is the output of Section 4.2.3, which is the inpatient day information used for next-day medical activity recommendation. G predicts the generalized medical activities $V_G$. It consists of two linear layers, followed by a fully connected layer for prediction. A cross-entropy loss is also used, in which $p_i$ indicates whether a medical activity occurred on the next day and $\hat{p}_i$ is the corresponding prediction of G.
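The adversarial objectives described above can be illustrated with the standard GAN losses. The sketch assumes the usual binary cross-entropy form and the non-saturating generator loss; the paper's exact loss equations are not recoverable from the text, so these forms and the function names are assumptions.

```python
import numpy as np


def bce(pred, target, eps=1e-12):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def discriminator_loss(d_real, d_fake):
    """D should output 1 for real next-day sets and 0 for generated ones."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_adversarial_loss(d_fake):
    """G is penalized when D assigns low probability to its output."""
    return bce(d_fake, np.ones_like(d_fake))


# Supervised term: G's per-activity probabilities against the multi-hot truth.
p_true = np.array([1.0, 0.0, 1.0, 0.0])
p_hat = np.array([0.9, 0.2, 0.7, 0.1])
supervised = bce(p_hat, p_true)

# D's outputs on a batch of real and generated sets (toy numbers).
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.2])
```

When D separates real from generated sets well (`d_fake` near 0), `generator_adversarial_loss` is large, which is the penalty on the recommendation network that the text describes.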
The task of D is to distinguish whether the input is generated or ground truth. If D can distinguish between the ground truth and the recommendation results, the recommendation network is penalized. D is trained with the loss in Equation (15), where $V_{Ti}$ is the set of medical activities that occurred on the next day. D consists of two linear layers with ReLU and uses the sigmoid activation function at the end.

Information Fusion.
As previously illustrated, our study extracts information about the patient's medical history in addition to considering medical activities in the recommendation process. This has not been done by most studies on temporal set prediction and next-basket recommendation [31], which focus more on sequential patterns. Inspired by [32], this study integrates the personalized information of medical history with the sequential pattern of medical activities to generate the final recommendation results. Notably, our study employs an additive fusion strategy controlled by hyperparameters, which eliminates a large number of multiplicative computations in fusing features, speeding up the computation while achieving competitive results compared to the concatenation method. The final recommended results for next-day medical activities are obtained by fusing the two sources of features. Here, $W_d$ and $W_{txt} \in \mathbb{R}^{d_e \times M}$ are learnable parameters, $\mathbf{1} \in \mathbb{R}^{M \times 1}$ is an all-one vector, $\odot$ denotes the elementwise Hadamard product, $\lambda$ is a hyperparameter, and $\beta \in \mathbb{R}^{M \times 1}$ is a vector composed of 0s and 1s, where 1 means the corresponding dimension of $l_p$ is nonzero. $\alpha$ is a weight factor used to balance the information from long-term orders and the learned features. The weight $\alpha$ is calculated as follows:

Loss Function.
We build an end-to-end model to jointly train the above parts and choose Mean Square Error (MSE) as the loss function. In the objective function to be minimized, $y$ and $\hat{y}$ denote the ground truth and the recommended next-day medical activities, respectively, and c is a hyperparameter.

Experiments
In this section, we conduct extensive experiments to validate the effectiveness of the proposed method. We first describe the evaluation metrics and the comparison baselines. Then, the performance of our method is compared with that of classical and state-of-the-art methods. Finally, the effectiveness of each module of our method is verified by an ablation study, and the interpretability of the model is discussed by visualizing the attention coefficients during inpatient day embedding generation and during next-day medical activity recommendation.

Experimental Settings.
We omit the dataset description since it has been introduced in Section 3.1. Other experimental settings will be described in the following parts.

Evaluation Metrics.
A ranked list of the top-K items is generated from the model output, with K set to 10, 15, 20, and 25, respectively. We use Recall and NDCG to evaluate our method. Next-day medical activities recommendation can be regarded as a special kind of multilabel classification problem.
(i) Recall is a widely used measure for multilabel classification [33]. For each patient, recall is the fraction of the real next-day medical activities that appear in the top-K recommendation list. (ii) NDCG is a measure that considers the ranking order of recommendation results [34]. For each patient, NDCG is the discounted cumulative gain of the top-K recommendation list normalized by that of the ideal ranking. We adopt the average recall and NDCG of all inpatients as metrics.
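The two metrics can be computed per patient as in the sketch below, which uses the standard definitions with binary relevance. The function names are hypothetical, and the convention of returning 0 for an empty ground-truth set is an assumption.

```python
import math


def recall_at_k(ranked, truth, k):
    """Fraction of the real next-day activities found in the top-k list."""
    if not truth:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in truth)
    return hits / len(truth)


def ndcg_at_k(ranked, truth, k):
    """DCG of the top-k list divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in truth
    )
    ideal_hits = min(k, len(truth))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]        # model output, best first
truth = {"a", "c", "x"}              # real next-day activities
r = recall_at_k(ranked, truth, 3)    # 2 of the 3 true items are in the top 3
n = ndcg_at_k(ranked, truth, 3)
```

Averaging `recall_at_k` and `ndcg_at_k` over all inpatients gives the reported metrics for each K.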

Compared Methods.
We compare our method with the following baselines, including both classical and state-of-the-art methods: (i) PersonalTOP: it counts the medical activities that have occurred since admission for each inpatient and then recommends the most frequent ones. (ii) ItemTransfer: it first constructs a transfer relationship (represented by an adjacency matrix) between the medical activities of different hospitalization days since the admission of a given patient and then recommends medical activities for the next day based on the medical activities of the last day. (iii) DREAM [4]: it is an earlier method that uses deep neural networks for next-basket recommendation. DREAM uses max pooling to generate the basket embedding and uses RNN to generate recommendation results. (iv) Sets2sets [8]: it uses average pooling to generate the set embedding and designs a GRU-based encoder-decoder framework for multi-period prediction. (v) DHNTSP [5]: it is the state-of-the-art method in temporal set prediction based on LSTM, which designs a set representation method based on a heterogeneous information network.
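The two classical baselines can be sketched in a few lines. The counting rule for PersonalTOP follows the description above; for ItemTransfer, counting transitions between adjacent days and summing the counts from the last day's activities is an assumption about how the adjacency matrix is built and used, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def personal_top(days, k):
    """Recommend the k activities this patient has received most often."""
    counts = Counter(a for day in days for a in sorted(day))
    return [a for a, _ in counts.most_common(k)]


def item_transfer(days, k):
    """Score candidates by transitions observed between adjacent days."""
    trans = defaultdict(Counter)         # sparse adjacency: trans[a][b] = count
    for prev, nxt in zip(days, days[1:]):
        for a in prev:
            for b in nxt:
                trans[a][b] += 1
    scores = Counter()
    for a in sorted(days[-1]):           # start from the last day's activities
        for b, c in sorted(trans[a].items()):
            scores[b] += c
    return [b for b, _ in scores.most_common(k)]


history = [{"x", "y"}, {"x", "z"}, {"x", "z"}]
top2 = personal_top(history, 2)                      # x: 3 times, z: 2 times
alternating = [{"a"}, {"b"}, {"a"}, {"b"}, {"a"}]
next_after_a = item_transfer(alternating, 1)         # "a" is always followed by "b"
```

PersonalTOP ignores day order entirely, whereas ItemTransfer uses only first-order transitions, which matches the contrast drawn between the two baselines in the performance analysis.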

Configuration of Our Method.
We divide our dataset into training, validation, and test sets across inpatients with ratios of 70%, 10%, and 20%. PyTorch is used to build our model, and Adam [35] is adopted as the optimizer. The numbers of stacked self-attention layers applied to the inpatient day sequence and the medical history are 1 and 2, respectively. The embedding dimension $d_e$ is set to 32. The hyperparameters λ and c are both set to 0.5.

Performance Analysis.
To demonstrate the effectiveness of our next-day medical activities recommendation model, we compare NDMARec with all comparison methods. The results are shown in Table 3 and Figure 4. The proposed NDMARec model achieves better performance in most cases. In addition, there are some interesting findings in these comparison experiments. Firstly, PersonalTOP achieves competitive performance. This is because many medical activities in our dataset are long-term medical orders, which means their frequency is high. So even though PersonalTOP does not consider time dependency, it achieves comparable performance.
Secondly, ItemTransfer achieves better performance than PersonalTOP because it considers the transfer relationships of medical activities between adjacent inpatient days. This shows that capturing the transfer relationships between medical activities can improve performance.
Thirdly, although DREAM and Sets2sets use neural networks to model inpatient days, they do not achieve better performance. This is because they do not consider the importance of individual medical activities when generating the inpatient day embedding. The max pooling used by DREAM loses information about ancillary medical activities, while the average pooling used by Sets2sets fails to highlight important medical activities. Unlike DREAM, Sets2sets also introduces an attention mechanism, but it does not achieve a significant performance improvement. This is because the objective function of Sets2sets is designed to emphasize medical activities that occur less frequently, which may suppress the prediction of medical activities that belong to long-term medical orders.
Finally, in most cases, NDMARec outperforms the other methods. The previous models have neither a medical history feature extraction module nor an information fusion module. They only use structured data, i.e., daily medical activities since admission, as input to recommend next-day medical activities. NDMARec, however, takes as input not only medical activities but also unstructured data, i.e., medical history text. The two kinds of features are fused to recommend next-day medical activities. Beyond the introduction of medical history information, the Ablation Study section below also demonstrates the contribution of each module of NDMARec. Compared to PersonalTOP and ItemTransfer, NDMARec captures the dynamic temporal dependency of medical activities between inpatient days. The attention mechanism is also used to differentiate the impact of inpatient days. Compared to DREAM, Sets2sets, and DHNTSP, NDMARec alleviates the long-term dependency problem through multihead self-attention, which attends to the medical activities of every day simultaneously.
In addition, an embedding method based on soft-attention is adopted to balance the importance of different medical activities when generating the inpatient day embedding, and the use of generative adversarial networks allows our network to learn features that are more conducive to recommending accurate medical activities. Both of these result in better performance of our method.
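The difference between the three ways of building an inpatient day embedding can be illustrated on toy vectors. This is only a sketch: the dot-product scoring against a single query vector is an assumed form of soft-attention chosen for brevity, not necessarily the exact formulation used in NDMARec, and all names are hypothetical.

```python
import math


def max_pool(embs):
    """DREAM-style pooling: keep the maximum of each dimension."""
    return [max(col) for col in zip(*embs)]


def avg_pool(embs):
    """Sets2sets-style pooling: average each dimension."""
    return [sum(col) / len(embs) for col in zip(*embs)]


def soft_attention_pool(embs, query):
    """Weighted sum of activity embeddings with softmax weights.

    Assumption: the score of an activity is its dot product with a
    (normally learned) query vector.
    """
    scores = [sum(q * x for q, x in zip(query, e)) for e in embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(embs[0])
    pooled = [sum(w * e[i] for w, e in zip(weights, embs)) for i in range(dim)]
    return pooled, weights


# Three activities on one day, each with a 2-dimensional embedding.
activity_embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]  # stands in for a learned parameter

day_max = max_pool(activity_embs)
day_avg = avg_pool(activity_embs)
day_att, att_weights = soft_attention_pool(activity_embs, query)
```

Max pooling keeps only the largest value per dimension and average pooling weights every activity equally, while the attention weights let the embedding lean towards the activities that score highest without discarding the rest.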

Ablation Study.
To verify the effect of the components of our model, we design the following simplified variants of our model:
(i) NDMARec-MP: imitating DREAM, it takes the max value of each dimension of the medical activity embeddings in the inpatient day embedding component, which loses a lot of information.
(ii) NDMARec-AP: it applies average pooling to the medical activity embeddings to obtain the inpatient day embedding, which causes important medical activities not to be highlighted.
(iii) NDMARec-NH: it removes the component that handles medical history, which means that the effect of medical history is not considered.
(iv) NDMARec-NA: it removes the generative adversarial network during training.
(v) NDMARec-A-B: A and B denote the number of stacked self-attention layers applied to the inpatient days and medical history, respectively.
The results of the ablation study are shown in Table 4 and Figure 5. From the results, we can draw the following conclusions. Firstly, the inpatient day embedding method based on soft-attention outperforms average pooling and max pooling, which indicates that our embedding method adaptively selects the more important medical activities. Secondly, the fusion of medical history significantly improves performance. This indicates that fusing medical history with the medical activities increases the personalization of the recommendation results and improves performance. Thirdly, the use of generative adversarial networks in the training phase improves the learning ability of the model. This shows that the joint use of generators and discriminators can help our model learn effective features, which could improve the generalization ability of the recommendation network by penalizing it. Finally, we explore the effect of the number of stacked self-attention layers. Surprisingly, the model performance does not improve as the number of layers increases.
The best performance is achieved when the numbers of self-attention layers handling inpatient days and medical history are set to 1 and 2, respectively. This may be because the dependency relations in inpatient day sequences are not as complex as those of sentences in machine translation tasks, and the words in the medical history text are carefully cleaned and preprocessed. Therefore, a small number of layers is sufficient to obtain good performance, while too many layers could lead to overfitting. Similar observations can be found in [36].

Visualization and Interpretability.
To discuss the interpretability of our method, we randomly selected a sample whose daily medical activities are shown in Table 5. The attention coefficients for this sample, both during the generation of the inpatient day embedding and during the recommendation of next-day medical activities, are visualized in Figure 6. The red part represents medical activities that occurred during the day, the green part represents inpatient days, and the bottom arrow denotes the order of the inpatient days. The shade of red indicates the importance of the medical activity for that day, and the shade of green denotes the influence of that day on the recommendation of medical activities for the next day.
From Table 5 and Figure 6, it is easy to observe that our model does focus attention on certain important medical activities. That is, when generating the inpatient day embedding, the attention mechanism adaptively enhances certain important medical activities (e.g., Matrine and sodium chloride injection on the 4th day, and Pantoprazole sodium for injection and Cisplatin injection on the 7th day) and suppresses the features of ancillary medical activities (e.g., Common food and Tertiary care on the 2nd day). This behavior is consistent with medical theory and demonstrates the validity and interpretability of the inpatient day embedding module. The distribution of attention when recommending next-day medical activities also matches the design goals of this study. The bottom left corner of Figure 6 indicates that the recommendation model based on multihead self-attention does not suffer from the long-term dependency problem of RNN-based models. That is, our model does not ignore the medical activities that occur when the patient is first admitted to the hospital. Moreover, it assigns a relatively high attention coefficient to them, which indicates that the medical activities that occurred on the first day of admission continue to influence the planning and implementation of subsequent treatment strategies. The visualization also reveals an interesting phenomenon: a higher number of medical activities on a given day does not imply a greater contribution of that day to the recommendation of medical activities for the following day. This is also true in actual clinical care.
A cancer patient may only take oral anticancer drugs on a certain day, but the subsequent medical activities must be arranged around alleviating the side effects of those drugs. This means that the day on which the oral anticancer drugs are taken has a significant impact on the subsequent medical activities, and the attention coefficient allocated to that day becomes correspondingly larger. The significantly larger attention coefficient assigned to the current last day demonstrates that our method focuses on the influence of the patient's current-day medical activities when recommending next-day medical activities. This is in line with the idea of considering the user's current interests in session-based recommendation [37,38]. However, whereas most studies in session-based recommendation explicitly combine current interests with long-term user preferences, our method relies entirely on the self-attention mechanism to make the model autonomously emphasize current information, which illustrates that our recommendation model can automatically discover the importance of the current inpatient day.
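The observation that the first inpatient day can still receive a high attention coefficient follows from how self-attention scores every pair of days directly, regardless of their distance in the sequence. The sketch below computes single-head scaled dot-product attention weights on toy day embeddings. It assumes identity query and key projections and a single head for brevity, whereas the model described here uses learned projections and multiple heads; the function names and values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def self_attention_weights(day_embs):
    """Return one row of attention weights per inpatient day.

    Row i gives how strongly day i attends to every day. The scores use
    scaled dot products with identity projections (an assumption).
    """
    scale = math.sqrt(len(day_embs[0]))
    rows = []
    for q in day_embs:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in day_embs]
        rows.append(softmax(scores))
    return rows


# Four inpatient days; the first and last days have similar embeddings.
day_embs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.2]]
weights = self_attention_weights(day_embs)
last_day_weights = weights[-1]  # how the current day attends to all days
```

Here the last day attends more strongly to the first day than to the two intermediate days, because the weight depends on embedding similarity rather than on the number of steps between the days.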

Conclusion
In this paper, we propose a new end-to-end model (called NDMARec) to recommend next-day medical activities for inpatients based on their medical history and the medical activities that have already occurred. NDMARec extracts dependencies between different inpatient days and fuses medical histories for recommendation. Features of medical activities and medical histories are learned by an encoder based on multihead self-attention. In addition, we incorporate a generative adversarial network to enhance the learning capability of the model during the training phase. The results of multiple comparative experiments and an ablation study demonstrate the better performance of our model and the effectiveness of each module.
Research on medical activity recommendation is currently relatively scarce. The main constraints are the difficulty of obtaining real data, the complexity of the data, and the diversity of real clinical scenarios. There is also a lot of information that our study does not yet use, such as patient survival and readmission rates. In the future, we will consider including patient survival and readmission rates when recommending next-day medical activities to further improve the accuracy and validity of the recommendations. In addition, we will deepen cooperation with hospitals to train different recommendation models for different diseases and thereby enhance the personalization of the recommendation results.

Data Availability
The EMR dataset used in this study was obtained from the electronic medical record database of Liaoning Cancer Hospital in China. Because of patient privacy issues, the data are not directly disclosed.

Conflicts of Interest
The authors declare that they have no conflicts of interest.