Prediction of Bronchopneumonia Inpatients' Total Hospitalization Expenses Based on BP Neural Network and Support Vector Machine Models

Objective. To predict the total hospitalization expenses of patients with bronchopneumonia using a BP neural network (BPNN) model and a support vector machine (SVM) model. Methods. Data from a total of 355 patients with bronchopneumonia from January 2018 to December 2020 were collected and collated. The data set was randomly divided into a training set (n = 249) and a test set (n = 106) at a ratio of 7 : 3. The BPNN model and SVM model were constructed to analyze the predictors of total hospitalization expenses, and the effectiveness of the two prediction models was compared. Results. The top three influencing factors and their importance for predicting total hospitalization cost in the BPNN model were hospitalization days (0.477), age (0.154), and discharge department (0.083). The top three factors in the SVM model were hospitalization days (0.215), age (0.196), and marital status (0.172). The areas under the curve of the BPNN and SVM models were 0.838 (95% CI: 0.755~0.921) and 0.889 (95% CI: 0.819~0.959), respectively. Conclusion. Both the BPNN model and the SVM model can predict the total hospitalization expenses of patients with bronchopneumonia, but the SVM model predicts better than the BPNN model.


Introduction
Bronchopneumonia (also known as lobular pneumonia) is one of the most common respiratory infections [1], with an incidence rate of more than 20% [2]. It is a severe disease that threatens people's health in China, and it accounts for a large proportion of hospitalized infections [3,4]. Bronchopneumonia is often caused by bacteria, viruses, molds, Mycoplasma pneumoniae, and other pathogens. It can also be a mixed infection by viruses and bacteria [5]. After onset, inflammation of the lung tissue thickens the respiratory membrane and blocks the lower respiratory tract, impairing ventilation and gas exchange. The clinical manifestations are fever, cough, and shortness of breath [2].
Among children, the age group with the highest incidence of bronchopneumonia is 5~9 years, and the age of onset is gradually decreasing [6]. The infection affects patients' quality of life and places an economic burden on families. In addition, the disease puts pressure on the national medical insurance fund [7,8]. Therefore, strengthening research on the cost of bronchopneumonia and formulating effective intervention measures can reduce the economic burden on patients and on medical insurance [9].
Data mining is a process that combines artificial intelligence and database technology to extract potentially valuable information from a large amount of complex and fuzzy data [10-14]. The application of artificial intelligence in the medical field is gradually maturing. The BP neural network (BPNN) model [15] and the support vector machine (SVM) model [16] have no special requirements for data distribution and have a certain fault tolerance. In addition, they are widely used to handle complex relationships between data and can seek the optimal solution under the current information [17]. Thus, this study used these two models to predict the total hospitalization cost of patients with bronchopneumonia and compared the prediction efficiency of the two models.

General Information.
A total of 355 patients whose principal discharge diagnosis on the front page of the medical record was bronchopneumonia were collected from a grade III class hospital in Anhui province from January 2018 to December 2020. Inclusion criteria: (1) inpatients; (2) the diagnosis was bronchopneumonia. Exclusion criteria: (1) the length of stay was 1 day; (2) the total cost exceeded 40,000 yuan.

Research Indicators.
The research indicators initially included were medical payment method, number of hospitalizations, sex, age, ethnicity, occupation, marital status, admission route, admission situation, transfer between departments, discharge department, actual hospitalization days, implementation of clinical pathway management, completion of the clinical pathway, complications, critical or serious illness during hospitalization, agreement between outpatient and discharge diagnoses, agreement between admission and discharge diagnoses, admission condition, and presence of other diagnoses. The dependent variable was the total hospitalization expenses.

Partition of Data Set.
Because the construction of deep learning models depends on training with a large amount of data, uneven data distribution can be a problem, so the data set needs to be preprocessed. To limit overfitting, this study validated the models with 10-fold cross-validation. The data set was divided at a ratio of 7 : 3 into a training set and a test set [18]: 70% of the patients formed the training set (n = 249) and 30% formed the test set (n = 106).
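The partition can be sketched as follows. This is a minimal illustration on placeholder record ids, not the authors' procedure: the random seeds, the helper names `split_train_test` and `k_fold_partition`, and the choice to form the 10 folds inside the training set are assumptions, since the text does not say how the 7 : 3 split and the 10-fold cross-validation were combined.

```python
import random


def split_train_test(ids, n_train, seed=0):
    """Shuffle record ids and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def k_fold_partition(ids, k=10, seed=0):
    """Partition ids into k near-equal, non-overlapping folds."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Placeholder ids standing in for the 355 patient records.
patient_ids = list(range(355))

# 7 : 3 split with the sizes reported in the paper.
train_ids, test_ids = split_train_test(patient_ids, n_train=249)

# Assumption: cross-validation folds are drawn from the training set only.
folds = k_fold_partition(train_ids, k=10)

print(len(train_ids), len(test_ids))   # 249 106
print(sorted(len(f) for f in folds))   # one fold of 24, nine folds of 25
```

Fixing the training size at 249 rather than passing a fraction avoids rounding ambiguity, because 30% of 355 is 106.5.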

Construction of Prediction Model
2.4.1. BP Neural Network Model. Total hospitalization expenses were used as the output variable, and the variables that were statistically significant in the univariate analysis were used as input variables. The hidden layer activation function was the hyperbolic tangent function, and the output layer activation function was the identity function. The data set was divided into the training set and the test set, and the BPNN prediction model was constructed. The accuracy of the network was calculated on the verification set, where the relative error is the ratio of the sum of squared residuals to the sum of squared deviations of the dependent variable from its mean. The prediction accuracy is 1 - relative error [19]. After network training was completed, the importance of each input variable for predicting the target variable was assessed to reflect the relative effect of each input variable. The specific process is shown in Figure 1.
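As a rough illustration of a network of this kind, the sketch below trains a single-hidden-layer network with a hyperbolic tangent hidden layer and an identity output using forward propagation, error back-propagation, and gradient-descent weight updates. It then computes the relative error and the accuracy as 1 - relative error. The synthetic data, the hidden-layer size, the learning rate, the number of epochs, and the squared-error loss are all assumptions made for illustration; the paper does not report them, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 249 rows, 4 scaled predictors, one continuous target.
X = rng.normal(size=(249, 4))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=249)).reshape(-1, 1)

n_hidden = 5   # assumed; the paper does not state the hidden-layer size
lr = 0.05      # assumed learning rate
W1 = rng.normal(scale=0.5, size=(4, n_hidden))
b1 = np.zeros((1, n_hidden))
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
b2 = np.zeros((1, 1))


def forward(inputs):
    """Forward pass: tanh hidden layer, identity output layer."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def relative_error(y_true, y_pred):
    """Residual sum of squares divided by the sum of squares about the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return ss_res / ss_tot


rel_err_before = relative_error(y, forward(X)[1])

for epoch in range(2000):
    h, y_hat = forward(X)
    # Backward pass for a (half) mean squared error loss.
    delta_out = (y_hat - y) / len(X)                  # identity output: derivative is 1
    delta_hid = (delta_out @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    # Gradient-descent weight update.
    W2 -= lr * (h.T @ delta_out)
    b2 -= lr * delta_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ delta_hid)
    b1 -= lr * delta_hid.sum(axis=0, keepdims=True)

rel_err = relative_error(y, forward(X)[1])
accuracy = 1.0 - rel_err
print(rel_err_before, rel_err, accuracy)
```

With this definition, a model that always predicts the mean of the dependent variable has a relative error of 1 and an accuracy of 0, so the accuracy plays the same role as the coefficient of determination.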
First, the algorithm is propagated forward, and the output values of each neuron in the hidden layer and the output layer are calculated. Then, back propagation is carried out to calculate the error of each hidden layer neuron, where δ_j is the sum of the error information of all neurons in layer j + 1. Finally, the weights of the neurons are updated.
2.4.2. Support Vector Machine Model. In this study, the total hospitalization cost was a continuous variable, so the dependent variable had to be discretized before the SVM was fitted. The main parameters of the SVM model are the penalty coefficient C and the kernel function parameter σ. The choice of these parameters strongly affects the learning performance of the SVM: reasonable parameter values give higher training accuracy and stronger generalization ability. Therefore, this study first screened out the optimal combination of C and σ and then established the SVM model under that combination. To select the optimal combination, the data were first normalized and then input into the SVM model for verification. If the verification was inconsistent, the parameters were updated and verified again until the optimal combination was screened out. The SVM model building process is shown in Figure 2.
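One common way to carry out this kind of parameter screening is an exhaustive grid search with cross-validation, sketched below with scikit-learn. The synthetic data, the candidate values of C and σ, the use of a Gaussian (RBF) kernel, min-max normalization, 10 folds, and AUC as the selection score are assumptions for illustration; the paper does not report these details. Note that scikit-learn parameterizes the RBF kernel by gamma rather than σ, with gamma = 1 / (2σ²).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the discretized cost outcome (0 = lower, 1 = higher).
X, y = make_classification(n_samples=249, n_features=8, random_state=0)

# Putting the scaler in the pipeline refits the normalization on each
# training fold, so no information leaks from the validation fold.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Hypothetical candidate grid; sigma is converted to scikit-learn's gamma.
sigmas = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
param_grid = {
    "svc__C": [0.1, 1.0, 10.0, 100.0],
    "svc__gamma": list(1.0 / (2.0 * sigmas ** 2)),
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)

best_C = search.best_params_["svc__C"]
best_sigma = float(np.sqrt(1.0 / (2.0 * search.best_params_["svc__gamma"])))
print(best_C, best_sigma, round(search.best_score_, 3))
```

After the search, `search.best_estimator_` is the pipeline refitted on all of the supplied data with the selected C and σ, which can then be applied to a held-out test set.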
2.5. Statistical Analysis. Univariate analysis was conducted on the relationship between each included research indicator and the total hospitalization expenses. Variables that were significant in the univariate analysis were then included in the BPNN model and SVM model as independent variables. After the training of the built models, the

Analysis Results of Research Indicators.
Univariate analysis was performed on the variables initially included. Medical payment method, number of hospitalizations, age, marital status, admission situation, critical illness during hospitalization, agreement between admission and discharge diagnoses, presence of other diagnoses, discharge department, and surgical treatment were significantly associated with total hospitalization cost (P < 0.05). Details are shown in Table 1. However, gender, ethnicity, occupation, admission route, transfer between departments, complications, and discharge mode were not significantly associated with total hospitalization cost (P > 0.05) (Table 2).
As age may confound the effect of marital status, a stratified analysis of marital status was conducted by age (≤25 years old and >25 years old). The results showed that, once age was controlled, the correlation between marital status and total hospitalization cost was not statistically significant (P > 0.05).

Distinction between BPNN Model and SVM Model.
The area under the curve (AUC) of the BPNN model was 0.838 (95% CI: 0.755~0.921), which meets the prediction accuracy requirements. In comparison, the AUC of the SVM model was 0.889 (95% CI: 0.819~0.959) (Figure 5). Both prediction models achieved a good prediction effect, but the prediction efficiency of the SVM model was higher than that of the BPNN model.
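For readers who want to reproduce this kind of comparison, the sketch below computes an AUC from class labels and model scores using the rank (Mann-Whitney) formulation and attaches a normal-approximation 95% confidence interval based on the Hanley and McNeil standard error. The toy labels and scores are hypothetical, and the paper does not state which interval method was used, so the choice of this formula is an assumption.

```python
import math


def auc_mann_whitney(labels, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation interval using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to [0, 1] because the normal approximation can overshoot.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy example: 1 = higher-cost class; scores are hypothetical model outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc, n_pos, n_neg = auc_mann_whitney(labels, scores)
low, high = hanley_mcneil_ci(auc, n_pos, n_neg)
print(auc, (low, high))   # AUC is 14/16 = 0.875 for this toy data
```

With only eight cases the interval is very wide, which illustrates why the width of the interval matters as much as the point estimate when two models are compared on a modest test set.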

Discussion
Bronchopneumonia is an infectious disease with a high incidence in China, especially among children. Its clinical manifestations are fever, cough, and shortness of breath, which affect normal life [20]. It not only affects patients' quality of life but also places an economic burden on families and puts pressure on the national medical insurance fund. Symptomatic treatment is a common intervention for bronchopneumonia [21]. However, because patients are young, comply poorly with treatment, and show strong stress reactions, hospitalization expenses increase. Therefore, predicting the total hospitalization expenses of patients with bronchopneumonia and their related indicators is of clinical significance.
At present, the methods used in research on hospitalization expenses mainly include traditional statistical methods, improved statistical methods, and machine learning methods [22,23]. Traditional statistical methods place strict requirements on the data, such as normality and independence. Although nonparametric methods have no strict requirements on data characteristics, their efficiency is reduced because they do not use the sample information to the maximum extent [24]. The improved statistical methods combine other theories with the traditional methods and overcome the inevitable defects of traditional methods to a certain extent. However, for some complex data, such as hierarchical data, subdepartment data, and doctor data within the hospital, the improved methods are more complicated to calculate [25]. Before machine learning, a careful and in-depth preanalysis of the included data set is needed; otherwise, the results may be misleading. As a grey-box method, data mining can produce correct results as long as researchers correctly master the input format of the data and the way of reading the results, so it has strong practicability [26].
Computational and Mathematical Methods in Medicine
In this study, the BPNN model and SVM model were used to analyze the total hospitalization cost of patients with bronchopneumonia, and good prediction results were obtained. The analysis of influencing factors showed that the length of stay and the discharge department are two significant factors affecting the cost, which has practical guiding significance. The length of hospitalization is related to the severity of the disease and the effect of treatment, so it is necessary to improve the accuracy of treatment and the rational use of antibacterial drugs in clinical practice. Different antibiotics can be selected for different patients at the beginning of admission according to their sputum culture results [27,28]. For example, for older patients who have been exposed to antibiotics for a long time, higher-grade antibiotics can be selected to improve the treatment effect. For those in good physical condition who are sensitive to antibiotics, lower-grade antibiotics can be selected as appropriate [29]. The different treatment methods and medication habits of doctors in different departments lead to differences in total hospitalization expenses [30]. Therefore, it is recommended that doctors select appropriate treatment plans for patients.
At the same time, the patient's age is also a major factor affecting hospitalization expenses. This is mainly because older patients have more comorbidities, relatively poor resistance, and low sensitivity to drugs, so relatively more drugs are used during treatment, resulting in prolonged hospitalization and increased hospitalization expenses [31]. The results of the SVM model in this study showed that marital status was a major factor affecting the total hospitalization cost. However, the association was not statistically significant after stratified analysis controlling for confounding factors.
The AUC of the two prediction models was also compared in our study. The AUC was 0.838 (95% CI: 0.755~0.921) for the BPNN model and 0.889 (95% CI: 0.819~0.959) for the SVM model, which further suggests that the SVM model predicts better than the BPNN model. The reason may be that the SVM seeks the optimal solution under the existing information and handles high-dimensional and local extremum problems well [32]. The SVM overcomes defects of the BPNN method, such as the difficulty of determining a reasonable network structure and the existence of local optima, especially for data whose dependent variable is categorical, and it has been used effectively in practice [17].
There are several limitations to our study. First, the sample was small and came from a single region. In addition, the incompleteness of the included predictors may directly affect the prediction results of the models. In future research, the sample size and the number of predictors will be increased.

Conclusion
In conclusion, the BPNN and SVM prediction models can effectively predict the total hospitalization cost of patients with bronchopneumonia, and the most critical factor affecting the total cost of hospitalization is the length of stay. Therefore, shortening the length of stay may reduce the financial burden on patients.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
All authors declare no conflicts of interest in this paper.