Research on Disease Prediction Method Based on R-Lookahead-LSTM

Cardiovascular disease is one of the most serious threats to human health in the world today, so establishing a high-quality disease prediction model is of great significance for its prevention and treatment. In the feature selection stage, three new strong feature vectors are constructed for the disease prediction setting and added to the original data set, and the relationships between feature vectors are analyzed using a correlation coefficient heat map. A random forest algorithm is then introduced for feature selection, yielding an importance ranking of the features. To further improve the predictive performance of the model, a cardiovascular disease prediction model based on R-Lookahead-LSTM is proposed. In this model, the stochastic gradient descent algorithm used for the fast-weight part of the Lookahead algorithm is replaced with the Rectified Adam (RAdam) algorithm; the Tanh activation function in the input gate is replaced with the Softsign activation function to promote convergence; and the resulting R-Lookahead algorithm is used to optimize the long short-term memory (LSTM) network model. The improved LSTM model therefore stabilizes sooner, and it is applied to cardiovascular disease prediction.


Introduction
In recent years, the global information technology revolution has entered a new round of rapid development with the vigorous growth of emerging technologies such as cloud computing, big data, artificial intelligence, and the internet of things. These technologies have injected new vitality into the development and transformation of traditional industries and are gradually being applied in transportation, logistics, education, and other fields. In the medical and health care field, the continuous accumulation of medical data is moving the traditional industry toward smart medical care. At this stage, the data-processing capacity of computers has grown by orders of magnitude, diversifying the application scenarios of artificial intelligence in health care, including medical image analysis, medical record and literature analysis, auxiliary diagnosis and treatment, drug development, and disease prediction. The medical internet of things [1,2], among many other developments, has promoted the formation of a patient-centered medical data network. With the recent rapid progress of artificial intelligence, machine learning, as its main research method, has also advanced greatly, with outstanding achievements in speech recognition, risk analysis, disease diagnosis, and other areas. Its high predictive accuracy provides a new research direction for disease prediction.
Prediction models have gradually developed from the original models based on expert rules to models based on statistical analysis and then to models based on machine learning, with increasing efficiency and accuracy. With the development of science and technology, prediction models for ischemic heart disease [3], liver disease prediction models [4], Bayesian disease prediction models [5], and acute coronary syndrome diagnosis and prediction models [6] continue to emerge; these early disease prediction models were built mainly with machine learning algorithms. At this stage, researchers mostly build models to predict diseases with a single machine learning algorithm, and accuracy and prediction performance still need improvement. Reference [7] proposed a hybrid method based on random forest and multivariate adaptive regression splines to establish a disease prediction model. Reference [8] established an MNN heart disease prediction model. Reference [9] proposed a new method for CAD diagnosis based on feature extraction from heart rate signals. Reference [10] proposed a disease prediction model using a support vector machine classification algorithm and a web service framework for medical data analysis. Reference [11] proposed a disease prediction model based on dynamic sampling and transfer learning. References [12,13] proposed an LSTM-based multivariate time series prediction model and a time series knowledge graph link prediction model, respectively, which achieved clear advantages in computational cost and prediction performance. Reference [14] proposed an online medical decision support system for predicting chronic kidney disease. Reference [15] proposed an enhanced feature-level deep convolutional neural network model. Researchers have gone from initially using a single machine learning algorithm to integrating statistical and mathematical models into machine learning models, which further improved the prediction performance of the models. Reference [16] established two models to predict a blood disease. Reference [17] constructed a deep learning-based disease prediction model on five different types of medical data. Reference [18] combined feature selection techniques with five classification algorithms. Reference [19] proposed an adaptive signal processing method based on a Levenberg-Marquardt filter for a functional link artificial neural network. Reference [20] proposed an integrated deep learning method combining deep learning with a feature fusion model. More and more researchers are joining the study of prediction models. To improve predictive performance, many use different models to predict the same disease or use different medical data sets to build a model, which improves the accuracy of the models to a certain extent. Neural network models have gradually emerged in the forecasting field, but their parameters are difficult to tune, and the choice of optimization algorithm also causes differences in forecasting results. Many prediction models only improve the structure of the neural network and do not analyze the differences in results caused by different optimization algorithms. Even when the input data set and model structure are exactly the same, the results obtained with different optimization algorithms are not the same.
To address the problem of low model accuracy caused by the choice of optimization algorithm, this paper integrates a gradient descent optimization algorithm with the long short-term memory network model to establish a combined disease prediction model, helping people become more aware of their physical condition while also supporting doctors in intervening for patients. The second section of this paper expounds the theoretical basis of the proposed model; the third section introduces the basic process and main steps of model establishment; the fourth section carries out the experimental analysis, mainly including comparative experiments on the optimization algorithms and on LSTM models trained with different optimization algorithms, through which the proposed algorithm is compared and analyzed from different aspects; the fifth section summarizes the paper.

Random Forest Feature Selection Algorithm.
The random forest algorithm [21] uses the Gini index to evaluate feature importance. First, the contribution of each feature vector in the data set to each tree in the random forest is computed; then the average over trees is taken; and finally, the contributions are sorted in descending order. The Gini index of node m is

$$GI_m = \sum_{k} P_k (1 - P_k) = 1 - \sum_{k} P_k^2,$$

where k is the category and P_k is the sample proportion of category k at the node. The importance score of feature x_j at node m is the decrease in Gini index produced by the split:

$$VIM_{jm} = GI_m - GI_l - GI_r,$$

where GI_l and GI_r are the Gini indices of the two new child nodes produced by the split.
If the nodes at which feature x_j appears in tree i form the set M, then the importance score of feature x_j in the i-th tree is

$$VIM_{ij} = \sum_{m \in M} VIM_{jm}.$$

Assuming there are n trees in the random forest, the overall importance score is

$$VIM_j = \sum_{i=1}^{n} VIM_{ij}.$$

Finally, the obtained feature importance scores are normalized:

$$VIM_j^{\mathrm{norm}} = \frac{VIM_j}{\sum_{j'} VIM_{j'}}.$$
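To make the node-level computation concrete, the following minimal Python sketch computes the Gini index of a node from its sample labels; the function name and example data are illustrative only, not part of the proposed method.

```python
import numpy as np

def gini_index(labels):
    """Gini index of a node: GI = 1 - sum_k P_k^2, per the formula above."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # P_k: per-category sample proportion
    return 1.0 - np.sum(p ** 2)

# Example: a node holding 6 diseased (label 1) and 4 healthy (label 0) samples
print(gini_index(np.array([1] * 6 + [0] * 4)))  # 0.48
```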

LSTM Algorithm.
To address the long-term dependency problem of the recurrent neural network model, Hochreiter and Schmidhuber improved on the RNN and proposed the long short-term memory network (LSTM) model, which better mitigates gradient explosion and gradient vanishing [22].

Compared with the ordinary recurrent neural network model, the LSTM model adds a cell state and introduces input gates, output gates, and forget gates to store and control information. The cell state judges whether information is useful; the input gate determines how much of the current information is retained; the forget gate determines how much of the previous state is retained; and the output gate determines the output according to the current state of the network. As shown in Figure 1, compared with the RNN model, the LSTM has a cell state C_t in addition to the hidden state h_t at each moment. For the t-th time step, the forget gate, input gate, and output gate are denoted f_t, i_t, and o_t, respectively. The core of the LSTM model is the cell state, which is controlled through sigmoid activations and elementwise multiplications.
The sigmoid function outputs a number in the range (0,1), indicating how much information can pass through to the cell state. The LSTM model processes data in the following four steps:

Step 1. Forget gate. The forget gate controls the degree of forgetting: through the output probability of the sigmoid function, it controls whether the current cell state forgets the hidden state of the previous moment. The forget gate takes the hidden state output h_{t-1} of the previous moment and the current input x_t as input and produces the output vector f_t, which determines how much of the previous cell state C_{t-1} is retained:

$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f),$$

where f_t is the output of the forget gate and σ is the sigmoid function.
Step 2. Input gate. The input gate is responsible for the current input, deciding which new information to add to the cell state. First, the current input information is jointly determined by h_{t-1} and x_t. Then, h_{t-1} and x_t produce the new candidate cell state $\tilde{C}_t$ through the Tanh network layer, and the input gate assigns each component of $\tilde{C}_t$ a weight, generally in (0,1), to control how much new information is added to the network:

$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i),$$
$$\tilde{C}_t = \tanh(w_c \cdot [h_{t-1}, x_t] + b_c).$$

Step 3. Cell state update. The cell state C_{t-1} is updated to obtain the new cell state C_t:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$$

where ⊙ is the Hadamard product.
Step 4. Output gate. Using h_{t-1} and x_t, the gate value o_t is computed; the new cell state C_t is adjusted by the Tanh activation function and multiplied by this vector, finally giving the cell output h_t of the current model:

$$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o),$$
$$h_t = o_t \odot \tanh(C_t).$$

When training the LSTM model, eight groups of parameters must be learned: the four weight matrices w_f, w_i, w_o, and w_c and the four bias terms b_f, b_i, b_o, and b_c corresponding to f_t, i_t, o_t, and $\tilde{C}_t$. For both the RNN algorithm and the LSTM algorithm, the widely used BPTT (back-propagation through time) algorithm [23] is used for training. There are five main steps:

Step 1: initialize the weight parameters.
Step 2: use the formulas above to forward-compute the output value of each neuron.
Step 3: compute the error value of each neuron in reverse.
Step 4: compute the gradient of each weight parameter from the corresponding error value.
Step 5: use the optimization algorithm to update the weights, iterating until the error converges below the specified threshold.
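To make the four steps concrete, the following minimal numpy sketch performs one LSTM forward step; the function and variable names mirror the notation above and are ours, not any library's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step; W holds w_f, w_i, w_o, w_c (each of shape
    (hidden, hidden + input)) and b holds the matching bias vectors."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["w_f"] @ z + b["b_f"])       # Step 1: forget gate
    i_t = sigmoid(W["w_i"] @ z + b["b_i"])       # Step 2: input gate...
    c_tilde = np.tanh(W["w_c"] @ z + b["b_c"])   # ...and candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde           # Step 3: cell state update (Hadamard)
    o_t = sigmoid(W["w_o"] @ z + b["b_o"])       # Step 4: output gate
    h_t = o_t * np.tanh(c_t)                     # hidden state output
    return h_t, c_t
```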

Lookahead Algorithm.
Most gradient descent algorithms are improvements of the SGD algorithm, including the Adagrad algorithm, the Adadelta algorithm, the RMSProp algorithm, the Adam algorithm, the Nadam algorithm, and the AdaBound and AMSBound algorithms [24]. These optimization algorithms improve the exploratory training process by incorporating momentum methods or by adapting the learning rate, and finally allow the model to converge. The Lookahead algorithm [25] differs from the above algorithms in that it adopts a completely new design: it maintains two sets of weights and interleaves between them. Simply put, it allows the faster set of weights to keep exploring forward while the slower set trails behind, resulting in better long-term stability. The Lookahead algorithm flow is as follows:

Step 1: back up the existing model parameters φ_{t-1}.
Step 2: starting from φ_{t-1}, update the fast weights for k steps with the SGD algorithm to obtain θ_{t,k}.
Step 3: update the model (slow) weights.

The Lookahead algorithm maintains two sets of weights: fast weights and slow weights. First, an inner optimizer (generally a stochastic gradient descent optimization algorithm) iteratively updates the fast weights k times; then, the slow weights are updated in the direction of the final fast weights. The fast weights move forward quickly to explore better, while the slow weights maintain stability; the two cooperate to explore the whole space in more detail, enabling the Lookahead algorithm to explore faster and more stably and to converge. When the fast weights explore slowly near a minimum, the update of the slow weights prompts Lookahead to explore better new regions, thereby improving accuracy. The fast-weight and slow-weight updates are

$$\theta_{t,i} = \theta_{t,i-1} + A(L, \theta_{t,i-1}, d),$$
$$\phi_t = \phi_{t-1} + \alpha(\theta_{t,k} - \phi_{t-1}),$$

where A is the inner optimizer applied to the objective L on minibatch d, and α is the slow-weight step size.
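The interplay of the two weight sets can be sketched in a few lines of Python; `grad_fn` and the toy quadratic below are our stand-ins for a real minibatch gradient, not part of the original algorithm.

```python
import numpy as np

def lookahead(phi, grad_fn, inner_step, k=5, alpha=0.5, outer_steps=100):
    """Lookahead sketch: k fast-weight steps, then one slow-weight interpolation."""
    for _ in range(outer_steps):
        theta = phi.copy()                      # synchronize fast weights with slow weights
        for _ in range(k):                      # inner optimizer explores ahead
            theta = inner_step(theta, grad_fn(theta))
        phi = phi + alpha * (theta - phi)       # slow weights move toward last fast weights
    return phi

# Toy usage: minimize f(w) = ||w||^2 with noisy gradients and SGD as inner optimizer
rng = np.random.default_rng(0)
noisy_grad = lambda w: 2 * w + 0.01 * rng.standard_normal(w.shape)
sgd_step = lambda w, g: w - 0.1 * g
print(lookahead(np.ones(3), noisy_grad, sgd_step))   # approaches the origin
```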

RAdam Algorithm.
The rectified adaptive moment estimation algorithm, RAdam [26], is one of the latest optimization algorithms, with fast training speed and good convergence. The Adam algorithm [27] cannot make accurate adaptive momentum choices early in training due to the lack of data; convergence is very fast when optimizing with Adam, but it easily falls into local optima. Therefore, a warm-up phase must be added at the start of training. However, the number of warm-up steps must be tuned manually, the loss value differs for different step counts, and it also differs across data sets. The RAdam algorithm therefore improves on the Adam algorithm: based on the design of a rectifier function, it can dynamically and adaptively turn the adaptive learning rate on or off to adjust the gradient, which makes the initial learning rate more robust and avoids the drastic early-training changes caused by not yet having learned from enough data. The RAdam algorithm flow is as follows:

Step 1: initialize. The step sizes are $\{\alpha_t\}_{t=1}^{T}$, the exponential decay rates of the moment estimates are β_1 and β_2, the initial parameters are θ_0, the stochastic objective function is f_t(θ), the initial time step is t = 1, the moment estimates are initialized to m_0 = 0 and v_0 = 0, and the maximum length of the simple moving average (SMA) is

$$\rho_\infty = \frac{2}{1 - \beta_2} - 1.$$

Step 2: compute the gradient g_t at the t-th step, update the second-order moment estimate v_t, update the first-order moment estimate m_t, and compute

$$\rho_t = \rho_\infty - \frac{2 t \beta_2^t}{1 - \beta_2^t}.$$
Step 3: if ρ_t > 4, update the parameter θ_t with the adaptive learning rate and the variance rectification term

$$r_t = \sqrt{\frac{(\rho_t - 4)(\rho_t - 2)\,\rho_\infty}{(\rho_\infty - 4)(\rho_\infty - 2)\,\rho_t}};$$

otherwise, update the parameter with non-adaptive momentum.
Step 4: determine whether the parameters have converged. If so, stop training; otherwise, let t = t + 1 and repeat steps 2 to 4 until the parameters converge.
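The following Python sketch condenses steps 2 to 4 into a single update function, following the standard RAdam formulas; the function name and the default hyperparameters are our assumptions.

```python
import numpy as np

def radam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update at (1-indexed) step t; m and v are carried between calls."""
    m = beta1 * m + (1 - beta1) * g              # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g          # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    rho_inf = 2.0 / (1.0 - beta2) - 1.0          # maximum SMA length
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t > 4:                                # variance tractable: rectified adaptive step
        v_hat = np.sqrt(v / (1 - beta2 ** t))
        r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf) /
                      ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        theta = theta - lr * r_t * m_hat / (v_hat + eps)
    else:                                        # fall back to non-adaptive momentum
        theta = theta - lr * m_hat
    return theta, m, v
```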

Disease Prediction Model Based on R-Lookahead-LSTM

Improved LSTM Algorithm.
In this section, the LSTM model is improved: the Softsign activation function replaces the Tanh activation function in the input gate, which helps the model converge quickly. The Softsign function is

$$\mathrm{softsign}(x) = \frac{x}{1 + |x|}.$$

The improved LSTM model structure is shown in Figure 2. The improved LSTM processes data in the following four steps:

Step 1: the sigmoid activation function in the forget gate controls which historical information the cell state discards. The forget gate takes the state output h_{t-1} of the hidden layer at the previous moment and the current input x_t as input and outputs a vector f_t, which determines how much of the state C_{t-1} of the previous neuron to retain:

$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f).$$

Step 2: the input gate is responsible for the degree to which new information is added to the cell state. The information to be added is first determined by the state output h_{t-1} of the hidden layer at the previous moment and the current input x_t. Then, h_{t-1} and x_t produce the candidate cell state $\tilde{C}_t$ through the Softsign network layer, and the input gate assigns each component of $\tilde{C}_t$ a weight between 0 and 1 to control how much new information the network adds:

$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i),$$
$$\tilde{C}_t = \mathrm{softsign}(w_c \cdot [h_{t-1}, x_t] + b_c).$$

Step 3: update the cell state information C_{t-1} to obtain the new cell state information C_t:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$

Step 4: the output gate determines the output value h_t. Using h_{t-1} and x_t, the gate value o_t is computed; C_t is adjusted by the Tanh function and multiplied by this vector, finally giving the unit output of the current neural network:

$$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o),$$
$$h_t = o_t \odot \tanh(C_t).$$

When training the improved LSTM model, eight groups of parameters must be learned: the four weight matrices w_f, w_i, w_o, and w_c and the four bias terms b_f, b_i, b_o, and b_c corresponding to f_t, i_t, o_t, and $\tilde{C}_t$. The training process is as follows:

Step 1: initialize the weight parameters.
Step 2: use the formulas above to forward-compute the output value of each neuron.
Step 3: compute the error value of each neuron in reverse.
Step 4: compute the gradient of each weight parameter from the corresponding error value.
Step 5: use the R-Lookahead optimization algorithm to update the weights, iterating until the error converges below the specified threshold.

Improved Lookahead Algorithm.
To train the LSTM, the weights and bias terms of the model must be solved and updated continuously so that they approach or reach their optimal values, minimizing the loss function. This section optimizes the LSTM model using the improved Lookahead optimization algorithm (the R-Lookahead algorithm). The Lookahead algorithm maintains two sets of weights: the fast weights and the slow weights. This section improves the fast-weight part, using the rectified adaptive moment estimation algorithm to adjust the gradient.
The RAdam algorithm can dynamically switch adaptive momentum on and off so that the whole training process stabilizes quickly, avoiding the violent oscillation caused by the limited number of samples early in training. The R-Lookahead algorithm is shown in Algorithm 1.
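A minimal Python transcription of Algorithm 1, reusing the radam_step sketch above as the fast-weight optimizer; the loop bounds and the `grad_fn` placeholder for a minibatch gradient are our assumptions.

```python
import numpy as np

def r_lookahead(phi, grad_fn, k=5, alpha=0.5, outer_steps=200):
    """R-Lookahead sketch: RAdam drives the fast weights, Lookahead the slow weights."""
    m, v, t = np.zeros_like(phi), np.zeros_like(phi), 0
    for _ in range(outer_steps):
        theta = phi.copy()                 # synchronize: theta_{t,0} <- phi_{t-1}
        for _ in range(k):                 # k inner RAdam steps on minibatch gradients
            t += 1
            theta, m, v = radam_step(theta, grad_fn(theta), m, v, t)
        phi = phi + alpha * (theta - phi)  # outer update: phi_t <- phi_{t-1} + alpha(theta_{t,k} - phi_{t-1})
    return phi
```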

R-Lookahead-LSTM Disease Prediction Model.
As the main algorithm of this section, the LSTM algorithm can, to a certain extent, solve the gradient explosion and vanishing problems in the RNN model training process. The model structure of the LSTM algorithm optimizes and improves on that of the RNN algorithm by setting an output gate, an input gate, and a forget gate in each neuron. For data samples, on the one hand, the LSTM algorithm mitigates the long-term dependency problem of the RNN algorithm to a certain extent; on the other hand, the Softsign activation function in the improved LSTM algorithm improves the convergence speed of the model, stabilizing it sooner. However, owing to the overall structure of neural network models, no matter how they are optimized and reconstructed, training in some application scenarios will still get stuck in local optima to a greater or lesser degree. In view of these problems, this section uses the R-Lookahead algorithm to optimize and improve the LSTM model and proposes the R-Lookahead-LSTM model for predicting cardiovascular disease. The workflow of the R-Lookahead-LSTM model is shown in Figure 3, and the model building process is shown in Algorithm 2.

Data Preprocessing.
The cardiovascular disease data set selected in this paper includes three types of data: objective facts, examination results from physical examinations, and information provided by patients, with 11 input variables and 1 target variable. This paper aims to predict whether a sample has cardiovascular disease from body-related features: if the sample has cardiovascular disease, the label is 1; if the sample is healthy, the label is 0. This is a classic binary classification data set, which meets the requirements for data sets in this paper.
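As a concrete starting point, the data set can be loaded as follows; this is a sketch assuming a local copy of the Kaggle file named in the Data Availability section, whose columns (including the target "cardio") follow the Kaggle schema and are semicolon-separated.

```python
import pandas as pd

df = pd.read_csv("cardio_train.csv", sep=";")   # Kaggle file uses ";" as separator
print(df.shape)                                  # 11 input variables plus id and the target
print(df["cardio"].value_counts())               # label 1 = diseased, 0 = healthy
```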

Construction of Strong Feature Vectors.
The purpose of this paper is to build a predictive model suitable for cardiovascular disease. If the model's input relies only on the feature vectors in the data set, strong features for the disease prediction setting are lacking and the prediction results are not ideal. To generate new strong features, this paper applies feature derivation: it analyzes the existing feature vectors in the data set, mines the underlying laws and data structures of the cardiovascular disease prediction problem, and constructs a series of new features. The process of constructing the new features is described in detail below.
For the prediction of cardiovascular disease, this paper first selects height and weight for targeted study, analyzes their relationship with whether the disease is diagnosed, and constructs new features based on that relationship. Figures 4 and 5 are scatter plots of height, weight, and disease, depicting the distribution of height and weight in the cardiovascular disease data set; they reflect the relationship between height, weight, and disease, as well as the overall pattern. As the graphs show, at the same height, people with heavier weights are more likely to have the disease; on further analysis, obesity may be an important factor leading to cardiovascular disease. In order for the prediction model to fit this point more accurately, a new feature vector, body mass index (BMI), is added to consider height and weight jointly:

$$\mathrm{BMI} = \frac{\mathrm{weight\ (kg)}}{\mathrm{height\ (m)}^2}.$$

Mean arterial pressure (MAP) is the mean arterial blood pressure level over a cardiac cycle. Pulse pressure (PP) is the pressure difference between ap_hi and ap_lo. Studies have shown that MAP and PP may be risk factors for cardiovascular disease. To make the predictive ability of the model more accurate, the new feature vectors mean arterial pressure (MAP) and pulse pressure (PP) are added:

$$\mathrm{MAP} = \mathrm{ap\_lo} + \frac{\mathrm{ap\_hi} - \mathrm{ap\_lo}}{3},$$
$$\mathrm{PP} = \mathrm{ap\_hi} - \mathrm{ap\_lo}.$$
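Under the formulas above, the three strong features can be derived with pandas; column names follow the Kaggle file, and heights are assumed to be recorded in centimeters and weights in kilograms.

```python
def add_strong_features(df):
    """Append BMI, PP, and MAP columns derived from the raw measurements."""
    df = df.copy()
    height_m = df["height"] / 100.0               # convert cm to m
    df["BMI"] = df["weight"] / height_m ** 2      # body mass index
    df["PP"] = df["ap_hi"] - df["ap_lo"]          # pulse pressure
    df["MAP"] = df["ap_lo"] + df["PP"] / 3.0      # mean arterial pressure
    return df

df = add_strong_features(df)
```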

Data Processing.
Because of the huge amount of data in the cardiovascular disease data set and the nonuniform feature types, which make the research work difficult, the data must be processed first. This paper first checks for duplicate values: 24 duplicate samples are found in the data set. Duplicates contribute nothing to training the model, so they are deleted. A box plot is used to detect outliers. Looking closely at the data, we noticed that the youngest sample was about 29 years old, the shortest was 55 cm tall, the lightest weighed 10 kg, the tallest was 250 cm, the heaviest weighed 200 kg, the smallest ap_hi was -150, and the smallest ap_lo was -70. To handle such outliers, we delete samples whose height or weight falls below the 5th or above the 95th percentile and delete samples with negative blood pressure. The normal range for diastolic blood pressure is 60-80 mmHg, and the normal range for systolic blood pressure is 90-120 mmHg, so this paper also removes ap_hi outliers over 200 and ap_lo outliers under 50. The cholesterol and blood sugar fields are kept in the form given in the data set, because the ordering of the values already represents the sample's level on these features. At this point, the data set is basically processed, but two problems remain. First, the continuous variables and the categorical variables do not share the same dimension; if the original values were used directly, features would influence the prediction results to different degrees and would not be comparable. Second, some features are categorical variables whose numerical size would mislead the model to a certain extent, so the data need to be normalized and one-hot encoded. In addition, to see the relationships between features more intuitively, this paper uses the Pearson correlation coefficient to draw a heat map.
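A sketch of these cleaning rules in pandas, continuing from the `df` above; the stated bounds follow the text, while the percentile-trim implementation detail is our reading.

```python
df = df.drop_duplicates()                      # remove the 24 duplicate samples

for col in ("height", "weight"):               # trim below 5th / above 95th percentile
    lo, hi = df[col].quantile([0.05, 0.95])
    df = df[df[col].between(lo, hi)]

df = df[df["ap_lo"] >= 50]                     # drops negative and sub-50 diastolic values
df = df[df["ap_hi"] <= 200]                    # drops systolic outliers over 200
```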
(1) Normalization. In large-scale data analysis projects, data often come from different sources, with different dimensions and scales that cannot be compared directly, so normalization is required to eliminate the resulting bias. After the original data are normalized, all indicators are on the same order of magnitude, and the input features and target predicted values approximately obey a normal distribution, which helps suppress outliers and noise in the data and suits comprehensive comparative evaluation. This section normalizes the "age," "height," "weight," "ap_hi," "ap_lo," "MAP," and "PP" fields so that features of different magnitudes do not drown out the effect of other features on the target. The normalization formula is

$$x' = \frac{x - \mu}{\sigma},$$

where μ and σ are the mean and standard deviation of the feature.

(2) One-Hot Encoding. In machine learning algorithms, we often encounter categorical features whose values are not continuous but discrete and unordered, so such data must be digitized. Plotting the height distribution with a violin plot, we found that the height of class 2 is consistently higher than that of class 1, so class 1 represents women and class 2 represents men. Since 2 is always numerically higher than 1, this would mislead the model to some extent, so the gender field is one-hot encoded in this paper.
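The two transformations can be sketched with scikit-learn and pandas; the z-score form matches the normalization formula above.

```python
from sklearn.preprocessing import StandardScaler
import pandas as pd

num_cols = ["age", "height", "weight", "ap_hi", "ap_lo", "MAP", "PP"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])   # x' = (x - mu) / sigma

df = pd.get_dummies(df, columns=["gender"])   # one-hot encode gender (classes 1 and 2)
```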
(3) Heat Map. From the correlation coefficients indicated by the colors of the squares in the heat map, the strength of the correlation between variables can be judged, and the relationships among features observed more intuitively. The correlation coefficient is computed as

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)}\sqrt{D(Y)}} = \frac{E[(X - E(X))(Y - E(Y))]}{\sqrt{D(X)}\sqrt{D(Y)}},$$

where ρ is the correlation coefficient, Cov is the covariance, E is the mathematical expectation, and D is the variance. From Figure 6, we can easily see the relationships between variables, which lays the foundation for subsequent feature selection.
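The heat map itself can be produced from the Pearson correlation matrix, e.g. with seaborn; the plot styling here is our choice.

```python
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr(method="pearson")              # pairwise Pearson coefficients
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature correlation heat map")
plt.tight_layout()
plt.show()
```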

Feature Selection.
After feature construction and data processing, this paper uses a random forest algorithm to model all samples for feature selection, using the Scikit-learn library in Python. First, the RandomForestClassifier module in the Scikit-learn library is loaded to model all the data. Once the model is built, the feature_importances_ attribute of the module outputs the importance of all features: during tree construction, it calculates and records the change in Gini index at each node after a feature split, and a final unified normalization yields the importance of each feature. The feature importance ranking is shown in Table 1.
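A sketch of that procedure; the target column name "cardio", the dropped "id" column, and the forest size are assumptions of this example.

```python
from sklearn.ensemble import RandomForestClassifier

X = df.drop(columns=["cardio", "id"], errors="ignore")   # candidate features
y = df["cardio"]                                          # disease label

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

# feature_importances_ is already normalized to sum to 1
ranking = sorted(zip(X.columns, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```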
The feature importance ranking produced by the random forest feature selection algorithm reveals several important points. First, the three new feature vectors constructed in this paper, MAP, PP, and BMI, rank relatively high, which shows that the constructed feature vectors play a very important role in the prediction of cardiovascular disease. Second, the ranking is basically consistent with the result of the heat map, which makes the important features selected by the random forest algorithm convincing. This paper selects the top 12 features as the input vector of the cardiovascular disease prediction model constructed here. Removing the lower-ranked features through feature selection speeds up the subsequent prediction algorithm while improving the prediction accuracy and overall performance of the resulting model.

Confusion Matrix.
The confusion matrix is a metric for judging the results of a classification model and is part of model evaluation. Confusion matrices are mostly used to judge the quality of classifiers and are suited to classification models. The confusion matrix is shown in Table 2.
True positives (TP) and true negatives (TN) measure the ability of a classification model to predict correctly whether a patient has the disease, while false positives (FP) and false negatives (FN) count the wrong predictions produced by the model. The accuracy rate represents the overall predictive ability of the machine learning model and measures the success of the disease prediction model's predictions. The recall measures the sensitivity of a disease prediction model, with the aim of recalling potential cases. The F-score is the weighted harmonic mean of precision and recall and is often used to evaluate the quality of classification models; F1 combines precision and recall, and a higher value indicates a more effective method. This paper evaluates the proposed model with these different performance indicators, whose formulas are shown in Table 3.
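Since Table 3 is not reproduced in this text, the standard definitions of these indicators are given here for reference:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$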

MCC Value.
The Matthews correlation coefficient (MCC) [28] is a contingency-matrix method for calculating the Pearson product-moment correlation coefficient between actual and predicted values.
The key advantage of the Matthews correlation coefficient is that the classifier must make correct predictions for most negative cases and most positive cases to get a high score, independent of their proportions in the entire data set.
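For reference, the standard formula for the MCC over the confusion matrix entries is

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$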

Optimization Algorithm Experiment Comparison.
To demonstrate the advantages of the improved Lookahead algorithm proposed in this paper, the RMSprop algorithm [29], Adam algorithm, RAdam algorithm, and Lookahead algorithm are used as comparison algorithms, with parameters tuned for the comparison experiments. The parameter settings of each optimization algorithm are shown in Table 4.
To compare the performance of the above optimization algorithms, this section tests the RMSprop, Adam, RAdam, Lookahead, and R-Lookahead algorithms, using accuracy and loss as evaluation indicators. The accuracy and loss curves of the optimization algorithms are shown in Figures 7 and 8. The results show that as the number of iterations increases, the accuracy of each optimization algorithm keeps rising and the loss value keeps falling. Among them, the RAdam algorithm improves on Adam: it can dynamically turn the adaptive learning rate on or off to adjust the gradient according to the variance of the adaptive rate, so at 500 iterations its accuracy and loss are better than Adam's. The Lookahead algorithm maintains two sets of weights to achieve faster convergence, and the R-Lookahead algorithm builds on it as follows.

Algorithm 1: R-Lookahead algorithm.
Require: initial parameters φ, objective function L
Require: synchronization period k, slow-weight step size α, inner optimizer RAdam
for t = 1, 2, ... do
    Synchronize parameters: θ_{t,0} ← φ_{t-1}
    for i = 1, 2, ..., k do
        Sample minibatch of data d ∼ D
        θ_{t,i} ← θ_{t,i-1} + RAdam(L, θ_{t,i-1}, d)
    end for
    Perform outer update: φ_t ← φ_{t-1} + α(θ_{t,k} − φ_{t-1})
end for
return parameters φ

Algorithm 2: R-Lookahead-LSTM algorithm.
Step 1: process the data, analyze data correlations, and use the random forest ensemble learning algorithm for feature selection, thereby determining the feature vector required to build the model.
Step 2: divide the processed data set into a training set and a test set at a ratio of 7:3.
Step 3: determine the structure of the LSTM model for the training samples, fixing the number of network layers and the initialization parameters through experimental tests.
Step 4: train the model, using the R-Lookahead optimization algorithm to optimize the loss function of the LSTM model.
Step 5: use the test set samples as model input to test the prediction performance of the R-Lookahead-LSTM disease risk prediction model.
Step 6: evaluate the prediction performance of the model with multiple indicators, including accuracy, recall, F1-score, specificity, and MCC value.
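Step 2 of Algorithm 2 corresponds to a standard 7:3 split, which can be sketched as follows; the stratification choice and random seed are our assumptions.

```python
from sklearn.model_selection import train_test_split

# X, y as prepared in the feature-selection step
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)   # 7:3 train/test split
```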

At this point, the traditional Lookahead algorithm, whose fast-weight part uses the SGD algorithm, also achieves good results: the accuracy is 0.7747 and the loss is 0.4226. When the SGD algorithm in the fast-weight part of the Lookahead algorithm is replaced with RAdam, speed improves to a certain extent, training is more stable, and the loss value is also smaller, at 0.3928. The specific experimental results are shown in Tables 5 and 6.

Comparison of LSTM Models with Different Optimization Algorithms.
The optimization problem is one of the most important research topics in the field of machine learning. Even when the model structure and input data are exactly the same, different deep learning optimization algorithms give different results. To better compare the effects of the RMSprop, Adam, RAdam, Lookahead, and R-Lookahead algorithms on the LSTM model, this section combines each of the above optimization algorithms with the model established in this paper and analyzes the resulting LSTM models in detail. Experiments show that the Lookahead algorithm and the Lookahead algorithm using RAdam achieve good results, with accuracies of 0.828296 and 0.857287, respectively, while the traditional LSTM performs worse. The specific results are shown in Table 7 and Figures 9-11.

Summary
The disease prediction model is proposed to help doctors make better-informed and more accurate decisions. To scientifically and effectively assist medical decision-making, this paper proposes the R-Lookahead-LSTM model for disease prediction. To highlight its prediction performance, this study compares the proposed model in detail with LSTM, RMSprop-LSTM, Adam-LSTM, RAdam-LSTM, and Lookahead-LSTM, analyzing accuracy, recall, F1-score, MCC value, and other aspects of the prediction models. Compared with the traditional LSTM model, the accuracy of the proposed model increased by 0.117417, the precision by 0.108828, the recall by 0.116046, the F1-score by 0.112320, and the MCC value by 0.235585, a significant improvement. For a disease prediction model, good accuracy indicates good classification performance; a relatively high recall, that is, recalling possible case samples, has very important practical significance. The MCC value shows that the proposed model makes correct predictions for most negative cases and most positive cases and achieves the desired effect.
The whole experimental process clearly shows that the proposed model has good classification performance and achieves the desired effect, providing a new method for the field of disease prediction.

Data Availability
In this paper, we used the cardiovascular disease data set, which can be obtained from the following website: https://www.kaggle.com/sulianova/cardiovascular-disease-dataset#cardio_train.csv.

Conflicts of Interest
The authors declare that there are no conflicts of interest.