Big Data-Enabled Analysis of DRGs-Based Payment on Stroke Patients in Jiaozuo, China

Stroke is the leading cause of mortality in China, with 2 million deaths annually. According to the National Health Commission of the People's Republic of China, the annual in-hospital costs for stroke patients in China reach ¥20.71 billion. Multivariate stepwise linear regression is a prevalent big data analysis tool that uses statistical significance to determine the explanatory variables. Against this background, this paper analyzes the factors that influence the in-hospital costs of stroke patients under diagnosis related groups- (DRGs-) based payment in Jiaozuo city of Henan province, China, in order to provide theoretical guidance for medical payment and medical resource allocation in the city. The medical records of 3,590 stroke patients were collected from the First Affiliated Hospital of Henan Polytechnic University, a Class A tertiary comprehensive hospital in Jiaozuo city, between 1 January 2019 and 31 December 2019. Using classical statistical analysis and multivariate linear regression, this study investigates how age, gender, length of stay (LoS), and outcomes influence the in-hospital costs of stroke patients. The essential findings are as follows: (1) age, LoS, and outcomes have significant effects on the in-hospital costs of stroke patients; (2) gender is not a statistically significant influence factor on the in-hospital costs of stroke patients; (3) DRGs classification of stroke patients manifests not only a reduced mean LoS but also a peculiar shape of the distribution of LoS.


Introduction
Stroke is an acute cerebrovascular disease, which is characterized by the sudden numbness of some parts of the body, such as the face, arm, or leg. It is a group of diseases in which brain tissue is damaged because blood flow to the brain is interrupted by blocked blood vessels or by the sudden rupture of blood vessels in the brain [1]. With the development of society and changes in lifestyle, stroke has become one of the main causes of disability and death in the world. According to the World Health Organization (WHO), stroke is the second leading cause of death in the world, accounting for 11.3% of total deaths, and nearly 5.8 million people have died of stroke [2,3]. The Global Burden of Stroke (GBS) reported three characteristic levels of stroke mortality across low-income, lower-middle-income, upper-middle-income, and high-income countries [4]. The mortality rate of stroke in low-income countries is about 43 per 100,000 population. In lower-middle-income and high-income countries, the mortality rate of stroke is about 62 per 100,000 population, while the mortality rate in upper-middle-income countries reaches 117 per 100,000 population.
This is more than twice as high as in low-income countries. Recently, as the number of stroke patients has grown, in-hospital costs for stroke have been increasing year by year. In Western countries, the expenditure on stroke accounts for 2%-4% of total healthcare expenditure [5]. In 2008, the total expense of stroke patients in the United States reached $40.9 billion, accounting for 2.6% of the annual healthcare expenditure. The total annual expenditure on stroke in the United Kingdom was as much as £25.6 billion in 2019 [6]. Excluding institutionalization costs, the overall expenditure on stroke in the European Union (EU) was €45 billion, accounting for 28% of the total expenditure on cardiovascular diseases [7]. The aggregate national healthcare expenditure of Brazil on ischemic stroke was about £326.9 million from 2006 to 2007 [8]. This cost accounted for a large proportion of the overall healthcare expenditure since only 18% of Brazilians had private health insurance. In 2012, the entire economic expenditure on stroke was $5 billion in Australia, and the loss of healthy life and the total burden of disease cost in 2012 was $49.4 billion [9]. In Korea, the total economic expenditure on stroke was $3.53 billion, of which $1.74 billion and $1.79 billion were direct costs and indirect costs, respectively [10]. The authors in [11] presented a detailed analysis of financial data on the direct in-hospital costs of stroke treatment in Lebanon.
In China, stroke is the leading cause of mortality and disability among adults. In recent years, the incidence, morbidity, mortality, and disability-adjusted life years (DALY) of stroke in China have been on the rise, and the disease burden caused by stroke is quite serious [12-17]. The Global Burden of Disease (GBD) showed that the number of stroke patients in China was 4.03 million in 2016, and the average growth rate was 8.3% in 1997-2016. With the ageing of society, the acceleration of urbanization, and the prevalence of unhealthy lifestyles among residents, the incidence and mortality of stroke are increasing dramatically in China [18]. The DALY caused by stroke accounted for 4.6% and 9.71% of the DALY of all diseases in the world and in China, respectively [19], so the disease burden of stroke in China was more than twice the global average. In 2014, the total cost of stroke outpatient and inpatient services in China was ¥20.71 billion, and the average annual growth rate of the medical cost of stroke was 24.96% [20]. Due to the imbalance in economic level and specialized stroke resources among the regions of China, the prevention and management of stroke face great challenges in Henan province, China. Henan is located in the center of China, and its resident population in 2018 was 96.05 million, accounting for 6.9% of the total population of China [21]. According to the Chinese Stroke Association (CSA), the incidence of stroke in Henan province in 2012 was 3.22%, which was the highest among all provinces of China [22]. The number of stroke patients was about 1.5 million, and the direct and indirect costs of stroke were ¥10 billion. Jiaozuo is located in the northwest of Henan, and its population was 3.55 million in 2019. The differences between urban and rural areas and the regional imbalance in stroke prevention and treatment are more significant in Henan, and they are prominent problems that need to be solved at present.
Therefore, it is of profound importance to investigate the costs of stroke in Jiaozuo. Table 1 presents the incidence and mortality of stroke among people aged 25 to 74 from 1987 to 1993. Table 1 reveals that Henan has higher incidence and mortality than the average of China.
How to curb the rapid growth of stroke costs has long been a worldwide concern and has sparked a great deal of research interest. The pioneering work was conducted by Fetter et al. at Yale University, who originally proposed diagnosis related groups (DRGs) [24]. DRGs were first used to monitor the quality of medical services in medical institutions [25]. Then, the second generation of DRGs was realized in 1983 and was introduced as the basis of hospital payment in healthcare systems. It was shown that, due to the application of DRGs in the United States from 1983 to 1990, the proportion of the total medical expenditure of gross domestic product (GDP) decreased from 16-18% to 7-8% [26]. Since then, DRGs have been adopted by most developed countries as the prospective payment system (PPS) to control in-hospital cost [27-32]. It has been shown that DRGs-based PPS can not only improve the utilization efficiency of medical resources but also reduce the length of stay (LoS) in hospital and the stroke burden.
The development of DRGs in China can be divided into three stages [33]: (1) exploring; (2) piloting; and (3) completing. The first stage lasted from the 1980s to 2001. In this stage, DRGs were introduced and explored by Beijing and Tianjin to provide a strong financial incentive for the healthcare of public hospitals [34,35]. The second stage was motivated by the new rural cooperative medical system (NCMS) for rural Chinese residents [36], the 'Notice on Piloting the Simplified DRG-PPS' of China's Ministry of Health (CMH), and the advanced Beijing-DRGs (BJ-DRGs); in this stage, DRGs-PPS/genuine DRGs were piloted in some public hospitals of China [37,38]. In the third stage, CMH released the '2008 Quality Supervision and Management Manual of Simplified DRGs-PPS' of the Central People's Government of the People's Republic of China (CPGPRC) and the 'Five Key Reforms on Medicine and Health' to assist the promotion of DRGs across China [39,40]. In [41], DRGs-based payment was conducted on a trial basis in selected hospitals in Shanghai. Therefore, unlike the DRGs in developed countries, the DRGs currently being adopted by China can be regarded as a transitional version of other countries' DRGs [33].
Henan has the largest population in China, especially the largest rural population, so balancing medical resources between rural and urban areas is an urgent task. In 2018, the Health Commission of Henan Province (HCHP) issued the 'Notification on the Key Points of Provincial Medical and Political Work', which stated that quality control of the first page of medical records should be strengthened [42]. Jiaozuo city is in the northwest of Henan province. Its total area is 4,071 square kilometres, and the city has a permanent population of 3.5971 million. In addition, its urbanization ratio has reached 60.94%, and in 2019 Jiaozuo announced that it would promptly explore DRGs payment systems [43].
In addition, with the development of information and computer technologies, the massive growth of data poses a great challenge for further applications. To address this problem, mining and analyzing collected data to extract valuable information have become main tasks. In this respect, multivariate linear regression analysis has been identified as an effective way of evaluating big data, since it uses statistical significance to determine the explanatory variables. Big data has been extensively adopted in many fields, such as information communication, finance, security, energy, and electricity [19-23]. Its successful application in these fields has provided favorable conditions and experience for its application in the healthcare field.
Motivated by the above discussion, this paper aims to investigate the effects of China Healthcare Security-DRGs (CHS-DRGs)-based hospital payment on stroke patients in the Jiaozuo area of Henan province. In the following, we simply use "DRGs" to refer to "CHS-DRGs". Several influence factors of the in-hospital cost of stroke patients, such as age, gender, LoS, and outcomes, are analyzed via multivariate linear regression analysis, which is one of the big data algorithms [11]. In addition, in contrast to existing works, we also study the distributions of length of stay (LoS) for DRGs-based stroke patients, since they reveal the burden not only on stroke patients but also on the medical resources of hospitals. This also gives hospitals an incentive to reassign medical resources among different stroke patients. The synthesis indicates that DRGs can save medical costs, improve medical service quality, and reduce LoS. Finally, we analyze the impact of outcomes on the in-hospital cost with boxplots.

Description of Data.
This study is based on data collected at the First Affiliated Hospital of Henan Polytechnic University in Jiaozuo; 3,590 discharged stroke cases from 1 January 2019 to 31 December 2019 were selected. To avoid an extreme casemix, four kinds of cases are excluded: (1) LoS smaller than 1 day; (2) LoS larger than 60 days; (3) in-hospital cost smaller than ¥7,000; and (4) in-hospital cost larger than ¥300,000. We select the data of 2019 because the Department of Human Resources and Social Security of Henan Province (DHRSSHP) has issued the notification that DRGs have been determined as the payment method of Henan public hospitals [44]. The extracted data include admission number, age, gender, occupation, dates and modes of admission and discharge, admission condition, expense, and stroke diagnosis (ICD-10 codes: I60-I63). In addition, patients transferred from other hospitals and patients dying before discharge are excluded from the extracted data.
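The exclusion rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Case` record and its field names are hypothetical and do not reflect the hospital's actual data schema, and the sample values are made up. Boundary values (LoS of exactly 1 or 60 days, cost of exactly ¥7,000 or ¥300,000) are assumed to be retained, following the "smaller than" and "larger than" wording.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout; field names are assumptions for illustration.
    los_days: int
    cost_yuan: float
    transferred_in: bool = False
    died_before_discharge: bool = False


def keep_case(case: Case) -> bool:
    """Return True if the case passes every exclusion rule stated in the text."""
    if case.los_days < 1 or case.los_days > 60:
        return False
    if case.cost_yuan < 7_000 or case.cost_yuan > 300_000:
        return False
    if case.transferred_in or case.died_before_discharge:
        return False
    return True


def apply_exclusions(cases):
    """Filter a list of cases, preserving the original order."""
    return [c for c in cases if keep_case(c)]


# Made-up sample records, one per exclusion rule plus two valid cases.
sample = [
    Case(14, 18_644.21),
    Case(0, 9_000.0),                          # LoS smaller than 1
    Case(61, 50_000.0),                        # LoS larger than 60
    Case(10, 6_999.0),                         # cost smaller than 7,000
    Case(20, 300_001.0),                       # cost larger than 300,000
    Case(12, 20_000.0, transferred_in=True),   # transferred from another hospital
    Case(22, 40_481.09),
]
kept = apply_exclusions(sample)
```

Keeping the rules in a single predicate makes it easy to audit each threshold against the text and to report how many records each rule removes.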

Statistical Analysis of Data.
The objective of this paper is to analyze the effects of pertinent influence factors of DRGs-related stroke patients on the in-hospital cost in Jiaozuo public hospitals of Henan province. As in China Healthcare Security-DRGs (CHS-DRGs) [45], the data of the stroke patients can be divided into four stroke-related groups based on the major diagnosis, LoS, admission mode, type of stroke, and outcomes. According to the above discussion, the number of effective records is 3,590. The characteristics of the stroke-related groups in the Jiaozuo hospital are illustrated in Table 2. In this study, readmission and priority patients are omitted since there are only two such patients. For convenience, LQ and UQ denote the lower quartile and upper quartile, respectively. The average payment includes cure fee, bed fee, western medicine fee, Chinese medicine fee, examination fee, emission fee, surgery fee, lab fee, inspection fee, sanitary materials fee, and other fees.
Groups I60-I62 are of the hemorrhagic type, while group I63 is of the ischemic type. As shown in Table 2, I61 has the largest LQ, UQ, and median of LoS, which are 12 days, 34 days, and 22 days, respectively. Next, the LQ, UQ, and median of LoS in I62 are 11 days, 29 days, and 21 days, respectively. I63 has the smallest LQ, UQ, and median of LoS, which are 8 days, 17 days, and 14 days. Although I60 has a smaller LQ of LoS (9 days), UQ of LoS (24 days), and median LoS (17 days) than I61 and I62, it is the most expensive group due to its larger treatment fee, western medicine fee, and sanitary materials fee. The expenses of the other three groups are ¥40,481.09, ¥45,414.56, and ¥18,644.21, respectively. Compared with I60, I61, and I62, I63 has a large proportion of outpatient service. In this group, outpatient service and emergency treatment account for 62.27% and 37.73%, respectively.
Table 2 clearly shows that all groups have large proportions of cure and improvement. Compared with the other groups, I62 has the largest death rate, about 5.71%, which is about twice that of I61 and about six times that of I63. Finally, the ischemic group has the lowest in-hospital cost of all groups because it has the shortest LoS and the largest proportion of improvement.

Measurement of LoS.
To analyze the effects of DRGs-related stroke on the LoS and in-hospital cost, we investigate the distribution of the LoS of in-hospital stroke patients, who can be divided into one most expensive group (I60), two more expensive groups (I61 and I62), and one least expensive group (I63). The distributions of LoS of the four groups are presented as histograms in order to distinguish their empirical distributions. In addition, the in-hospital patients can be divided into hemorrhagic stroke and ischemic stroke, and the LoS distributions of these two types are likewise illustrated by histograms. The effects of age, gender, and LoS on cure, improvement, and death are analyzed for the two types. Note that the "other" outcome category is removed since it includes many kinds of cases.
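As an illustration of the histogram-based description above, the following sketch computes the share of patients at each LoS value, the peak of that distribution, and the share of patients within a LoS interval. It assumes one-day bins and uses made-up LoS values; the bin width behind the paper's figures is not stated, and the function names are hypothetical.

```python
from collections import Counter


def los_density(los_days):
    """Map each LoS value (in days) to the percentage of patients with that LoS."""
    counts = Counter(los_days)
    total = len(los_days)
    return {day: 100.0 * n / total for day, n in sorted(counts.items())}


def peak(density):
    """Return (day, percentage) of the highest bar in the histogram."""
    day = max(density, key=density.get)
    return day, density[day]


def share_between(density, low, high):
    """Percentage of patients whose LoS lies in the closed interval [low, high]."""
    return sum(pct for day, pct in density.items() if low <= day <= high)


# Made-up LoS values for a single toy group, not data from the study.
toy_los = [8, 10, 12, 14, 14, 14, 15, 17, 20, 31]
density = los_density(toy_los)
peak_day, peak_pct = peak(density)
share_10_to_20 = share_between(density, 10, 20)
```

Expressing the bars as percentages rather than raw counts makes groups of very different sizes comparable on the same vertical scale.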

Measurement of Outcomes.
To evaluate the performance of the hospital in treating stroke patients, four outcomes are considered, namely, cure, improvement, unhealed, and death. These outcomes are determined on the basis of discharge mode. Moreover, we investigate the distributions of cure, improvement, unhealed, death, and others according to the type of stroke (hemorrhagic and ischemic), as shown in Table 3. This comparison is adopted for the following two reasons: (1) classifying stroke into hemorrhagic and ischemic keeps the datasets representative; (2) some sampling errors can be controlled by avoiding categories with a small number of stroke patients. Therefore, the unhealed outcome is removed for these two types because it contains very few stroke patients. The variation, the central tendency, and the outliers of the outcomes of the stroke patients are evaluated by boxplots. The variation is characterized by the first quartile and the third quartile, while the central tendency is measured by the median. The outliers are the data points outside the boxplot whiskers.
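The boxplot quantities named above (first quartile, median, third quartile, and outliers beyond the whiskers) can be computed as in the following sketch. It assumes the common Tukey convention that whiskers extend 1.5 times the interquartile range beyond the quartiles; the text does not state which whisker rule was used, and the cost values below are made up.

```python
import statistics


def boxplot_summary(values, whisker=1.5):
    """Return quartiles, median, and outliers under the 1.5 * IQR whisker rule.

    The whisker factor is an assumption (Tukey's convention), not a value
    taken from the paper.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Made-up hospitalization expenses (yuan) for one outcome category.
toy_costs = [8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 90000]
summary = boxplot_summary(toy_costs)
```

In this toy sample the single very expensive case falls above the upper fence and is reported as an outlier, while the median is unaffected by it.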

Results
In this section, we explore the influence factors of the hospitalization expenses of stroke patients. First, taking age, gender, LoS, and outcome as independent variables, we study the effects of DRGs of stroke patients on the hospitalization expense via multivariate linear regression analysis.
Then, the distributions of LoS of DRGs-related stroke patients for I60, I61, I62, and I63 are discussed. Finally, we present the effects of the outcomes of stroke patients on the hospitalization expenses.

Stroke-Related DRGs.
To obtain more insight, we use multivariate linear regression analysis [46] to analyze the influences on the total in-hospital costs of the DRGs stroke patients, as shown in Table 4. In this table, we take total in-hospital costs as the dependent variable and the patient's age, gender, LoS, and outcomes as the independent variables. From the regression we conclude that, although the unstandardized coefficient of gender is not statistically significant, the multiple linear regression model is statistically significant (P value less than 0.05), with R = 0.539, R² = 0.291, and ∆R = 0.291. In addition, the unstandardized coefficients of age, LoS, and outcomes are statistically significant.
As shown in Figures 1-4, the mean LoS of DRGs-related stroke inpatients for I60, I61, I62, and I63, based on the stroke patient data from the First Affiliated Hospital of Henan Polytechnic University, is 17, 22, 21, and 14 days, respectively. Figures 1-4 show the histograms of the density of LoS for the different groups according to DRGs. In Figures 1 and 3, there are some gaps due to the small number of stroke patients. The histograms in Figures 1-3 have almost the same peak density of LoS, namely 13.30%, 13.30%, and 12.58%, respectively, while the peak density in Figure 4 is about 19.95%. This means that ischemic stroke patients have a higher peak of LoS than the other three groups. Finally, we can also conclude that DRGs classification of the stroke patients, using LoS as a grouping variable, manifests not only a reduced mean LoS but also a peculiar shape of the distribution of LoS. In all groups, most stroke patients stay in hospital between 10 and 20 days.
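For readers who want to see the mechanics of the regression described in this subsection, the following is a minimal ordinary least squares sketch in pure Python. It is not the software or the stepwise procedure used in the study: it only fits all four predictors at once by solving the normal equations and reports R². The toy records, the numeric coding of gender and outcome, and all function names are assumptions made for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... and return (coefficients, R squared)."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Toy predictors: (age, gender coded 0/1, LoS in days, outcome coded 0/1/2).
rows = [(60, 0, 10, 0), (72, 1, 14, 1), (55, 1, 8, 0), (80, 0, 21, 2),
        (67, 1, 12, 1), (49, 0, 9, 0), (75, 0, 17, 2)]
# Toy costs generated from known coefficients, with a zero effect for gender.
true_beta = [5000.0, 50.0, 0.0, 1200.0, 3000.0]
costs = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]

beta, r_squared = fit_ols(rows, costs)

# Consistency check on the reported statistics: R is the square root of R squared.
r_from_reported_r2 = math.sqrt(0.291)
```

Because the toy costs are generated without noise, the fit recovers the generating coefficients and R² is essentially 1; with real data, R² = 0.291 would mean that the four predictors explain about 29% of the variance in cost. The last line confirms that the reported R = 0.539 is consistent with R² = 0.291.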

Analysis of Outcomes on Hospitalization Expenses.
In this subsection, the boxplots are provided to demonstrate the impacts of outcomes on the in-hospital cost. For analytical accuracy, we remove other terms since they include different influence factors. In addition, the values of hospitalization expenses larger than 200,000 and smaller than 7,000 are removed. Figures 5-8 show boxplots of the hospitalization expenses versus different outcomes, namely, cure, improvement, unhealed, and death. As shown in Figures 5-8

Discussion
Recently, the rapid increase of in-hospital cost has become a common problem in China. Stroke has become the leading cause of death among urban and rural residents in China. It not only leads to the loss of labor ability but also dramatically increases the cost of diagnosis and hospitalization. Its burden of disease puts enormous economic pressure on families and society. In addition, payment systems have significant effects on the treatment behaviors of stroke patients and hospitals. DRGs are an effective way to improve the quality of service and reduce unnecessary use of medical resources; they have been identified as an important direction for China's medical reform and have been adopted as the main payment system [46]. Therefore, it is of great significance to analyze the influencing factors of stroke hospitalization expenses for social and economic benefits. Based on the multivariate linear regression analysis, we show that stroke is a common cardiovascular disease in the Jiaozuo area. Among all the influence factors, gender is the least influential and has no statistically significant effect on the hospitalization expense.
In the future hospital payment system, gender should not be included in the group pricing of the payment standard. Moreover, age, LoS, and outcomes are the three significant influence factors that should be included in the payment standard. Finally, we can conclude that the DRGs yield favorable coefficient of variation (CV) and reduction in variation (RIV) results for medical expense control, which is consistent with the existing literature [47]. Motivated by this, DRGs are the main direction of Jiaozuo's medical reform.
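The two grouping-quality measures mentioned above can be sketched as follows. The definitions used here are the ones commonly applied when evaluating DRG groupings and are stated as assumptions, since the paper does not spell them out: CV is the standard deviation of cost within a group divided by the group mean, and RIV is the share of the total sum of squares of cost that is removed by grouping. The group labels are borrowed only as names, and the cost values are made up.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (lower means more homogeneous)."""
    return statistics.stdev(values) / statistics.mean(values)


def reduction_in_variation(groups):
    """(total SS - within-group SS) / total SS for a dict of group -> cost values."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = statistics.mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = sum(
        (v - statistics.mean(vals)) ** 2 for vals in groups.values() for v in vals
    )
    return (ss_total - ss_within) / ss_total


# Made-up costs in thousand yuan for two toy groups.
toy_groups = {"I63": [10.0, 12.0, 14.0], "I60": [30.0, 32.0, 34.0]}
cv_i63 = coefficient_of_variation(toy_groups["I63"])
riv = reduction_in_variation(toy_groups)
```

A small CV within each group and an RIV close to 1 both indicate that the grouping separates cases into cost-homogeneous classes, which is the property a payment standard relies on.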
In general, LoS can be classified into value-adding patient days and non-value-adding patient days [48]. The value-adding patient days are the days that are meaningful to the diagnosis and prognosis of patients and that patients or payers of medical expenses are willing to pay for. Non-value-adding patient days are those days that are not necessary, that only increase the cost of hospitalization, and that are meaningless to the patient's diagnosis.
Thus, LoS is an important influence factor of the hospitalization expenses of stroke patients. Note that throughout this paper we use LoS to refer to value-adding patient days. By optimizing medical resource allocation and improving medical service, LoS can be shortened and the hospitalization expense can then be reduced. The LoS of most stroke patients is between 10 and 30 days, and the peaks of the density of LoS for hemorrhagic and ischemic stroke are lower than 14% and 20%, respectively.
Among influence factors, outcomes have significant impacts on the hospitalization expenses, among which improvement and unhealed are the most important ones.
This is because improvement has large interquartile intervals and unhealed has the largest mean. Therefore, it is of great significance to analyze the impact of outcomes on hospitalization expenses in Jiaozuo area of Henan province.

Conclusion and Future Work
The economic conditions and level of development of China do not yet meet the regulatory requirements for full implementation of DRGs. Improved DRGs schemes have been explored in some cities of China, and they will be gradually piloted across most provinces of China. Jiaozuo of Henan province, as a pilot city of DRGs, has made many important explorations. With the existing resources, however, the execution efficiency of diagnosis related groups in Jiaozuo can be greatly improved by first screening for common and frequent diseases and then investigating the related diseases in each disease group and the key factors through which they affect medical expenses, following a simplified standard-setting process based on standardized clinical pathways and accurate cost accounting.
Using a big data related algorithm, this work analyzes the influence factors of DRGs-based stroke patients on in-hospital costs; however, how to design a DRGs model for the payment of the Jiaozuo hospital will be our further work. In addition, to further enhance the accuracy of the analysis, some advanced machine learning algorithms should be involved, such as decision trees, neural networks, and support vector machines. Our analysis methods can be extended to other chronic diseases (hypertension, coronary heart disease, cancer, and diabetes), which are set aside as our future work.

Data Availability
The data used to support this study are available from the corresponding author upon request.