Evidence-Based Cutoff Threshold Values from Receiver Operating Characteristic Curve Analysis for Knee Osteoarthritis in the 50-Year-Old Korean Population: Analysis of Big Data from the National Health Insurance Sharing Service

We aimed to investigate the characteristics of patients with osteoarthritis (OA), using the data of all Koreans registered in the National Health Insurance Sharing Service Database (NHISS DB), and to provide ideal alternative cutoff thresholds for alleviating OA symptoms. Patients with OA (codes M17 and M17.1–M17.9 in the Korean Standard Classification of Disease and Causes of Death) were analyzed using SAS software. Optimal cutoff thresholds were determined using receiver operating characteristic curve analysis. The 50-year age group was the most OA pathogenic group (among 40~70 years, n = 2088). All exercise types affected the change of body mass index (p < 0.05) and the sex difference in blood pressure (BP) (p < 0.01). All types of exercise positively affected the loss of waist circumference and the balance test (standing time on one leg in seconds) (p < 0.01). The cutoff threshold for the time in seconds from standing up from a chair to walking 3 m and returning to the same chair was 8.25 (80% sensitivity and 100% specificity). By using the exercise modalities, categorized multiple variables, and the cutoff threshold, an optimal alternative exercise program can be designed for alleviating OA symptoms in the 50-year age group.


Introduction
According to the database (DB) of the Health Insurance Review and Assessment Service, the proportion of patients with diseases of the musculoskeletal system was 50% in 2003 and 71% in 2013. In particular, knee diseases were the 10th most common cause of hospitalization, involving costs of about 2500 USD per person and ranking second in terms of total medical expenses (http://www.hira.or.kr). In addition, medical treatments take an average of 8 days to complete. Furthermore, the number of patients with OA increases steadily and OA pathogenesis shows a pattern of occurring even in younger age groups, which necessitates the establishment of prognostic and treatment programs for knee OA.
Several trials to identify multiple causes of knee OA have been performed to establish individualized and specific treatments for curing OA symptoms [1,2].
As many cases have an unspecified etiology (causality; i.e., OA has multiple causes such as knee anterior cruciate ligament, meniscus damage, and quadriceps muscle weakness), providing individualized solutions for OA is difficult [3][4][5]. Because the causality between knee damage and OA cannot be clearly identified and it is difficult to find solutions for OA (many factors lead to aggravation of knee OA), prevention of OA is considered the ideal option [6]. To reach a reasonable solution to this issue, understanding the various characteristics of OA from diverse aspects is important to determine related factors affecting the pathogenesis of the disease and to decide appropriate optimal cutoff threshold points for improving prognostic programs (e.g., exercise prescriptions).
BioMed Research International
The total analyzed population from the NHISS DB showed sex differences in demographic characteristics. NHISS DB, National Health Insurance Sharing Service database; BMI, body mass index; SGOT AST, serum glutamic oxaloacetic transaminase and aspartate aminotransferase; SGPT ALT, serum glutamic pyruvic transaminase and alanine aminotransaminase; Gamma GTP, gamma glutamyl transpeptidase.
Big data analysis of health information is possible in Korea because health-care providers are required to register their patients' injury or disease information through the National Health Insurance Sharing Service (NHISS) for reimbursement of medical service costs. The NHISS DB provides standardized health and medical information from unilateral medical check records of the entire Korean population. This DB was electronically organized, digitalized, and formatted from 2002 to 2013.
To address the above-mentioned issues and provide beneficial information about patients with OA for ameliorating OA symptoms from various aspects by using NHISS big data, we first set the following research objectives.
In this study, we aimed to examine the most prevalent ages of patients with OA among the elderly Korean population and characterize various features of OA according to four different exercise types, in order to provide evidence-based proper cutoff points for ameliorating OA pathogenesis. We examined three hypotheses in this study, as follows: (1) There would be specific age ranges of pathogenic OA in the elderly Korean population. (2) Variables related to comprehensive and physical function will show the overall characteristics of OA in relation to different exercise modalities. (3) There are appropriate cutoff thresholds for variables of pathogenic OA, and these can provide beneficial information for designing a reasonable prognostic program for improving OA symptoms. The demographic, exercise, and functional test characteristics of each age group were subsequently investigated. The flow of the study is described in Figure 1.

Data Source and Subject Population.
A total of 514,866 patients (approximately 10% of the whole population aged >40 years from the NHISS DB pool) were randomly selected from patients registered in the NHISS. After the age of 40 years, Korean adults are required to undergo life cycle-based health checks, and the NHISS DB stores their data, including exercise information and basic physiological health screening data (Table 1). An individualized design based on the NHISS data of patients aged >40 years was used because OA-related symptoms with clinical evidence were observed at this age [7]. Patients with OA codes M17 and M17.1-M17.9 in the KSCDCD (http://kssc.kostat.go.kr/ksscNew_web/index.jsp) were extracted by using SAS software ( (3)) in the most OA pathogenic age group (hypothesis (1)).

Statistical Analysis.
All data are presented as mean ± standard deviation. In the most OA pathogenic group, reciprocal effects and interactions were investigated by analyzing the established independent and dependent variables to understand the characteristics of OA. Statistically well-defined variables were then selected to provide the optimal cutoff thresholds in the most OA pathogenic patient group. As described in the Categorization of Variables section, the four exercise types were treated as independent variables, and the 10 comprehensive and 5 physical function-related variables were treated as dependent variables. Each of the four exercise types was tested for its effect on, and interaction with, the 15 dependent variables by using two-way univariate analysis of variance (ANOVA) in the 50-year OA group (to verify hypothesis (2)). Logistic regression between patients with and without OA in the 50-year age group was used to identify risk factors for the presence of OA, and these variables were then analyzed by using receiver operating characteristic (ROC) curve analysis to determine the optimal cutoff thresholds for avoiding OA pathogenesis (to examine hypothesis (3)). SAS version 9.4 (SAS Institute, Cary, NC, USA) and SPSS version 18.0 were used for all statistical analyses. A p value of < 0.05 was considered to indicate a statistically significant difference in all analyses.
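The two-way ANOVA step can be illustrated with a minimal sketch. The analyses in this study were run in SAS and SPSS; the pure-Python function below is only an illustration for the balanced case (equal observations per cell), and the function name `two_way_anova`, the factor levels, and the BMI values are hypothetical, not NHISS data. It returns F statistics only, since p values would need an F-distribution routine from a statistics package.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction (illustrative sketch).

    `cells` maps (level_a, level_b) to a list of observations; every cell
    must hold the same number (at least 2) of observations.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    keys = list(product(levels_a, levels_b))
    n = len(next(iter(cells.values())))  # replicates per cell
    if len(levels_a) < 2 or len(levels_b) < 2:
        raise ValueError("each factor needs at least two levels")
    if n < 2 or any(len(cells.get(k, ())) != n for k in keys):
        raise ValueError("need a balanced design with >= 2 observations per cell")

    grand = mean(x for k in keys for x in cells[k])
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    mean_ab = {k: mean(cells[k]) for k in keys}

    # Sums of squares for both main effects, the interaction, and the residual.
    ss = {
        "a": n * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values()),
        "b": n * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values()),
        "ab": n * sum(
            (mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2 for a, b in keys
        ),
        "error": sum((x - mean_ab[k]) ** 2 for k in keys for x in cells[k]),
        "total": sum((x - grand) ** 2 for k in keys for x in cells[k]),
    }
    df = {"a": len(levels_a) - 1, "b": len(levels_b) - 1}
    df["ab"] = df["a"] * df["b"]
    df["error"] = len(keys) * (n - 1)
    ms_error = ss["error"] / df["error"]
    if ms_error == 0:
        raise ValueError("no within-cell variation; F statistics are undefined")
    f_stat = {t: (ss[t] / df[t]) / ms_error for t in ("a", "b", "ab")}
    return {"ss": ss, "df": df, "f": f_stat}


# Hypothetical BMI values by sex (factor a) and exercise frequency (factor b).
bmi = {
    ("F", "daily"): [22.0, 23.0, 22.5],
    ("F", "rare"): [24.0, 25.0, 24.5],
    ("M", "daily"): [23.5, 24.5, 24.0],
    ("M", "rare"): [25.0, 26.5, 25.5],
}
result = two_way_anova(bmi)
print(result["f"])
```

Real registry data are rarely balanced, so a statistical package with an appropriate sums-of-squares convention would be needed in practice; the sketch only shows how the sex effect, the exercise effect, and their interaction are separated.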

Pathogenesis of OA in Each Age Group.
The subjects, aged from their 40s to 70s, were divided into four age groups with increments of 10 years (40-49, 50-59, 60-69, and 70-79 years). A linear trend graph was examined to compare the frequency of OA pathogenesis in each group (Figure 2). The angle of the graph for the 40-49- and 50-59-year age groups was 7.4° and 8.9°, respectively, and these groups showed an increasing OA pathogenesis pattern; however, the 60-69-year (angle, −3.6°) and 70-79-year (angle, −17.7°) age groups showed decreasing patient numbers. Thus, our first hypothesis was supported by the observation that OA pathogenesis occurred most frequently in the 50-year age group.
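How a trend angle of this kind could be obtained is sketched below. The text does not state how the angles were computed, so this assumes, as one plausible reading, that each angle is the arctangent of a least-squares slope of patient counts against age within a group. The function name `trend_angle_degrees` and the counts are hypothetical, and the resulting angle depends on how both axes are scaled.

```python
import math


def trend_angle_degrees(xs, ys):
    """Angle (in degrees) of the least-squares line of ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A positive angle means counts rise with age; a negative angle means they fall.
    return math.degrees(math.atan(slope))


# Hypothetical yearly patient counts within one 10-year age group.
ages = list(range(50, 60))
counts = [40, 42, 41, 44, 45, 44, 47, 48, 47, 50]
print(round(trend_angle_degrees(ages, counts), 1))
```

Because the arctangent of a slope changes with the units on each axis, angles are only comparable between age groups when all groups are plotted on the same scale.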

Four Independent Variables (Exercise Modalities) on Comprehensive Variables.
A sex difference was observed only in Walk30 Wek Freq ID according to age difference within the 50∼59-year age group (p < 0.01). All exercise types significantly affected the change of BMI; however, Walk30 showed a sex difference (p < 0.05). The interaction between sex and exercise type had a significant effect on the change of BMI (p < 0.01). Sex affected BP in all exercise types (p < 0.01); however, exercise frequency did not seem to affect the change of BP in any exercise type. The four exercise types seemed more related to hypertension, cardiopathy, stroke, diabetes, cancer, and others than to tuberculosis, hepatitis, and hepatism in HCHK PMH CD2. Higher frequencies of exercise did not seem to affect the change of disease history, which depended more on the type of exercise (p < 0.01).
Male patients with OA aged 50∼59 years had higher values of SGOT AST and SGPT ALT than female patients with OA. Higher exercise frequency was associated with lower values of SGOT AST and SGPT ALT. Significant effects of sex and exercise type, and significant interactions between sex and exercise type, were found for SGOT AST and SGPT ALT (p < 0.01). The highest value of SGOT AST was seen in men with the lowest exercise frequency. TOT CHOLE in men was lower than that in women (p < 0.01). Only Exerci Freq had no relation to TOT CHOLE (p < 0.01). For improving TOT CHOLE, Mov20 or Mov30, not Exerci Freq, may be a better option. Figure 3(b) shows that daily Exerci Freq for improving TOT CHOLE was more effective in men. All types of exercise significantly affected the changes of WAIST (p < 0.01).

Figures 3(a) and 3(b) show representative variables among 10 variables that had clearly common patterns according to the relationship to sex difference in Exerci Freq.

Physical Function-Related Variables according to Independent Variables (Four Exercise Types).
All exercise types affected the changes in ELD LLFX SEC (p < 0.01), with patterns showing that daily exercise seemed to decrease the time in ELD LLFX SEC (Figure 3(c)). In ELD LLFX YN (presence or absence of gait disability), patients with OA with gait disability did not exercise at all, whereas all patients with OA without gait disability performed exercise. A sex difference was seen in all types of exercise (p < 0.01) except Mov20 Wek freq. The balance test ELD STF SEC was affected by all exercise types (p < 0.01). Another balance test, ELD STFX MTHD, showed a significant difference in the effect of all exercise types (p < 0.05). Sex, exercise type, and the interaction between sex and exercise type had significant effects on FALL (p < 0.01). Female patients with less exercise experienced falls more frequently, and this difference showed a decreasing tendency as the women exercised more (Figure 3(d)). A clear difference was observed between the daily exercise and less exercise groups (Figure 3(d)). Panels (c) and (d) of Figure 3 represent similar patterns of each variable in the four exercise types.
We thus examined hypothesis (2) and concluded that there are specific high-risk dependent variables with the effect of exercise modalities in the incidence of OA.

Which Variables Affect OA Pathogenesis and What Are the Evidence-Based Optimal Cutoff Points of the Variables for Preventing and Alleviating OA Symptoms?
Randomly selected patients without OA registered in the NHISS were compared with patients with OA by using logistic regression analysis. Among the variables, we found that FALL (p < 0.05) and ELD LLFX SEC (p < 0.01) significantly affected the pathogenesis of OA (Table 3). For 50-year-old patients with OA, ROC curve analysis provided an ideal cutoff value for ELD LLFX SEC of 8.25 s (80% sensitivity and 100% specificity, area under the curve 0.861) (Figure 4). A cutoff value was not meaningful for FALL because its responses were binary: 1 (no experience of falls) or 2 (with experience of falls). Hypothesis (3) was also clearly examined by using these results.
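The way a single cutoff with its sensitivity and specificity can be read off an ROC analysis is sketched below. The text does not state which optimality criterion was used, so the sketch assumes Youden's J (sensitivity + specificity − 1) and assumes that longer times indicate OA. The function name `roc_cutoff` and the timings are hypothetical; the values are synthetic, chosen only so that the example lands on a cutoff of similar size, and are not NHISS data.

```python
def roc_cutoff(values, has_condition):
    """Return (cutoff, sensitivity, specificity, auc) maximizing Youden's J.

    A case is called positive when its value is >= cutoff. Candidate cutoffs
    are the midpoints between consecutive distinct observed values.
    """
    pos = [v for v, y in zip(values, has_condition) if y]
    neg = [v for v, y in zip(values, has_condition) if not y]
    distinct = sorted(set(values))
    if not pos or not neg or len(distinct) < 2:
        raise ValueError("need both classes and two or more distinct values")

    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cutoff = (lo + hi) / 2
        sensitivity = sum(v >= cutoff for v in pos) / len(pos)
        specificity = sum(v < cutoff for v in neg) / len(neg)
        j = sensitivity + specificity - 1  # Youden's J statistic
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)

    # AUC as the probability that a random positive case has a larger value
    # than a random negative case, counting ties as one half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    _, cutoff, sensitivity, specificity = best
    return cutoff, sensitivity, specificity, auc


# Synthetic timed-walk results in seconds (1 = OA, 0 = no OA).
seconds = [6.0, 6.5, 7.0, 7.5, 8.0, 7.0, 8.5, 9.0, 9.5, 10.0]
oa = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff, sensitivity, specificity, auc = roc_cutoff(seconds, oa)
print(cutoff, sensitivity, specificity, auc)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different threshold from the same data, so the criterion should be reported alongside the cutoff.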

Discussion
We obtained the following OA-related results from patients registered in the NHISS by clustering the retrospective cohort of Koreans aged > 40 years.
(1) OA pathogenesis was found most frequently in the 50-year age group. (2) The four analyzed exercise types affected the change of BMI (p < 0.05) and affected BP with a sex difference (p < 0.01); however, exercise frequency did not affect the change of BP. Exercise type and exercise frequency were important for SGOT and SGPT. Only Exerci Freq did not affect the change of cholesterol (p < 0.01), and all exercise types affected the loss of WAIST (p < 0.01). All types of exercise positively affected the balance test (p < 0.01). (3) The optimal cutoff threshold for ELD LLFX SEC was 8.25 s (80% sensitivity and 100% specificity).

Big Data Analysis as a Useful Tool.
The NHISS DB is unprecedented worldwide because all Korean citizens are mandatorily registered in it. Therefore, the entire medical history of the whole Korean population can be traced at any time [8].
Use of the NHISS DB overcomes the flaws of previous data sources, such as private information and the limited contents of large-scale DBs. This DB allows researchers to conduct cross-sectional, retrospective, and prospective studies of individual patients derived from the DB cohort.
By using the large-scale NHISS DB, we were able to extract patients with OA according to the study design and to compare different age groups in terms of the pathogenicity of OA. In this study, we found that OA pathogenesis occurred most frequently in the 50-year age group among the randomly selected Koreans with OA aged >40 years (n = 2088; nonoverlapping patients with only one prescription for OA) from the total cohort of 514,866 patients in the NHISS DB. Interestingly, the 60- and 70-year age groups showed the opposite pattern to the 40- and 50-year age groups. The number of patients with OA in the 60- and 70-year age groups unexpectedly declined, and we presumed that patients in these age groups more often neglect visiting the hospital or have a higher mortality rate than the younger age groups, which affected our results. We believe that this DB will be useful in promoting national health and welfare.

Individualized Exercise Prescription for Reducing the Incidence of OA.
A specific exercise prescription program can be designed on the basis of these data. BP, cholesterol, and HCHK PMH showed a pattern that is more related to the specific intensity, time, and frequency of exercise. The results showed that tailor-made exercise prescriptions are possible, and thus this study shows that the DB can be an indispensable tool for improving the symptoms of OA. Our results comprehensively include microlevel to macrolevel data, from blood tests to physical function tests. Most modernized countries face emerging issues related to falls in the aging society [9]. In Korea, 13% of 828 urban and 32% of 2295 rural elderly dwellers experienced falls in 1 year [9,10]. The socioeconomic cost of fall injuries was calculated to be 343,614,988,000 Korean won [11]. To relieve this socioeconomic and individual burden, a more individualized exercise prescription can be an ideal option. According to this study, falls occurred more often in 50-year-old women with OA than in 50-year-old men with OA; thus, performing any kind of exercise on a daily basis can be an ideal option for reducing falls in patients with OA.

Optimal Cutoff Threshold for OA Pathogenesis.
The finding that OA occurred most frequently in the 50-year age group was used in further analysis to identify factors influencing the pathogenesis of the disease. We found that FALL and ELD LLFX SEC (p < 0.01) were associated with the occurrence of OA in the 50-year age group, and we suggest that this is associated with the muscle strength of the lower limb [12] (Table 3).
The results of four static and dynamic physical function tests were examined in this study. Interestingly, only ELD LLFX SEC, and not the three other similar functional tests, was significantly associated with OA in the 50-year age group (p < 0.01). This suggests that physically dynamic weight-bearing load (ELD LLFX SEC) rather than static weight-bearing load (ELD STF SEC; ELD STFX MTHD) beneficially suppresses damage to the knee joints [13]. According to the provided cutoff value, an immediate lower-limb strengthening exercise program should be developed to improve OA symptoms [14].
In conclusion, we obtained the following multifaceted results from the large-scale analysis of the NHISS DB, which can help in developing evidence-based OA preventive methods such as individualized exercise prescription programs.
(1) Pathogenic OA was found most frequently in the 50-year age group. (2) All four exercise types investigated affected the change of BMI (p < 0.05) and had a sex-different effect on BP (p < 0.01); however, exercise frequency did not affect the change of BP. Exercise type and exercise frequency were important for SGOT and SGPT. Only Exerci Freq did not affect the change of cholesterol (p < 0.01), and all exercise types affected the loss of WAIST (p < 0.01). All types of exercise positively affected the balance test (p < 0.01). (3) The optimal cutoff thresholds for SGOT AST and ELD STF SEC were 18.75 (88% sensitivity and 85% specificity) and 14.75 (75% sensitivity and 100% specificity), respectively.