Multivariate Radiological-Based Models for the Prediction of Future Knee Pain: Data from the OAI

In this work, we present the potential of X-ray-based multivariate prognostic models to predict the onset of chronic knee pain. A case-control study was conducted using quantitative X-ray assessments of joint space width (JSW) and paired semiquantitative central X-ray scores from the Osteoarthritis Initiative (OAI). Pain assessments of the right knee at the baseline and 60-month visits were used to screen case and control subjects. Scores were analyzed at the time of pain incidence (T-0), one year before incidence (T-1), and two years before incidence (T-2). Multivariate models were built with a cross-validated feature selection tool based on elastic-net regularized generalized linear models. Univariate differences between cases and controls were reported as AUC, C-statistics, and odds ratios. Univariate analysis indicated that medial osteophytes were significantly more prevalent in cases than in controls: C-stat = 0.62, 0.62, and 0.61 at T-0, T-1, and T-2, respectively. The multivariate JSW models significantly predicted pain: AUC = 0.695, 0.623, and 0.620 at T-0, T-1, and T-2, respectively. The semiquantitative multivariate models predicted pain with C-stat = 0.671, 0.648, and 0.645 at T-0, T-1, and T-2, respectively. Multivariate models derived from plain X-ray radiography assessments may be used to identify subjects at risk of developing knee pain.


Introduction
Knee pain is the most common and disabling symptom of osteoarthritis (OA) [1,2]. The disease affects 1 in every 10 adults over 60 years of age in the United States, and its incidence is increasing due to changes in lifestyle and life expectancy [3][4][5][6][7]. The prevalence and symptomatic importance of pain in OA subjects make pain prediction an important task in the management of OA patients. Pain is a late manifestation of pathological change in joint tissues; therefore, early detection of the pathological process may be used to determine who is at risk of developing OA-related pain. Such early detection of the underlying pathology may be possible with noninvasive procedures such as medical imaging. Medical imaging has proved to be an important and effective tool in OA diagnosis; it is also the most common first-hand source of information for physicians and a proven means of OA staging [8][9][10][11][12][13].
Owing to its maturity, simplicity, and broad deployment, X-ray radiography is the primary imaging modality used in OA diagnosis and staging. Radiological OA has been defined as the presence of bone alterations (osteophytes) and reduced joint space [14]. These findings have been correlated with the joint symptoms of pain and stiffness [15], but the prognostic power of these bony changes has not been properly examined in longitudinal studies [16][17][18][19]. The biggest challenge in correlating radiological findings with symptomatic OA is the multifactorial origin of joint pain and the subjective perception of pain [20]. Another challenge has been the lack of standardized image assessment procedures that allow proper evaluation and comparison of OA studies. To overcome these limitations, validated subject questionnaires [21,22] and standardized image assessments have been developed [23][24][25][26].
The Osteoarthritis Initiative (OAI) has been collecting clinical data from thousands of OA patients, subjects at risk, and control subjects using validated questionnaires and standardized image assessment procedures. The OAI effort provides important information that will offer a better understanding of the disease process.
In this work, the OAI quantitative X-ray assessments of joint space width, together with the central readings of Kellgren and Lawrence (K&L) grades and Osteoarthritis Research Society International (OARSI) scores, are explored in their association with concurrent and future knee pain. The number of radiological findings reported by the OAI central image assessments is large: osteophytes, bone attrition, and reduced joint space at the medial and lateral aspects of the joint. The OAI quantitative image analysis of joint space adds a further set of measurements, which makes data exploration a challenge. The association of this large array of radiological features with pain is difficult to capture with simple statistical tools. Advanced feature selection and bioinformatics tools provide proven methods for this problem [27][28][29][30]; they automatically build simple multivariate models that best describe the association of radiological features with pain.
This work uses these bioinformatics tools to build radiological multivariate models of future and concurrent joint pain, with the objective of finding which radiological features or models can be used to determine which people with radiological OA findings are at greater risk of developing knee pain. The paper is organized as follows: after this introduction, Patients and Methods describes the subject selection; Data Acquisition presents the process of image feature acquisition; Statistical Analysis explains the data transformations and analyses; Results presents the numerical results in tables; finally, the Discussion and Conclusions, including future work, are presented.

Patients and Methods
Study Population. "Data used in the preparation of this article were obtained from the Osteoarthritis Initiative (OAI) database, which is available for public access at http://www.oai.ucsf.edu/datarelease/." All subjects were selected from the OAI database. Based on the available information, this study was restricted to the right knee.
Because this is a pain prediction study, the development of chronic pain in the right knee was used as the outcome variable. A group of subjects was selected using the five-year screening information. No subject could present chronic pain as a symptom at the baseline visit or be taking pain medication at that visit.
Control subjects were selected using the following criteria: no pain reported as a symptom from the baseline visit to the 60-month visit, no symptomatic status from the baseline visit to the 60-month visit, and no pain medication from the baseline visit to the 60-month visit. Case subjects were selected using the following criteria: no pain reported as a symptom at the baseline visit, no symptomatic status at the baseline visit, no pain medication at the baseline visit, and development of chronic right knee pain at some time point after baseline and up to the 60-month visit.
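The selection criteria above can be expressed as a small classification rule. The following Python sketch is illustrative only: the `Visit` record and its fields (`chronic_pain`, `symptomatic`, `pain_medication`) are hypothetical stand-ins for the corresponding OAI variables, not actual OAI column names, and visits are assumed to be labeled by months since baseline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    """One clinic visit (hypothetical fields standing in for OAI variables)."""
    month: int             # months since baseline, e.g. 0, 12, ..., 60
    chronic_pain: bool     # chronic right knee pain reported at this visit
    symptomatic: bool      # right knee symptomatic status at this visit
    pain_medication: bool  # subject taking pain medication at this visit


def classify_subject(visits: List[Visit]) -> Optional[str]:
    """Return 'control', 'case', or None if the subject is excluded."""
    window = sorted((v for v in visits if 0 <= v.month <= 60), key=lambda v: v.month)
    if not window or window[0].month != 0:
        return None  # no baseline visit, so the subject cannot be screened
    baseline, follow_up = window[0], window[1:]
    if not follow_up:
        return None  # no follow-up visits to observe pain onset or its absence
    # Both groups: no pain, no symptomatic status, no medication at baseline.
    if baseline.chronic_pain or baseline.symptomatic or baseline.pain_medication:
        return None
    # Controls: remain pain-free, asymptomatic, and unmedicated up to 60 months.
    if all(not (v.chronic_pain or v.symptomatic or v.pain_medication) for v in follow_up):
        return "control"
    # Cases: develop chronic right knee pain after baseline, up to 60 months.
    if any(v.chronic_pain for v in follow_up):
        return "case"
    return None  # e.g. medicated or symptomatic without chronic pain
```

Subjects who fall into neither group (for example, those who start pain medication without developing chronic pain) are simply excluded, which mirrors the two criteria lists rather than adding new ones.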
Only subjects with a complete quantitative or semiquantitative X-ray assessment were included in the final analysis. Because of this last condition, two different groups were created, one for the quantitative study and one for the semiquantitative study. All demographic information is presented in Table 1, and the selection process is described in detail in Figure 1.

Data Acquisition
This analysis used right knee assessments from two OAI datasets, "central assessment of longitudinal knee X-rays for quantitative JSW" (version 1.6) and "central reading of knee X-rays for K-L grade and individual radiographic features of knee" (version 1.6); the outcome, chronic pain, was defined by the OAI variable "right knee symptom status." These data were preanalyzed by two different radiologist groups associated with the OAI, and the images were graded using the OARSI grading scale [25,31] and the semiquantitative K-L grading scale [26,32,33]. A description of the assessed features and their IDs is presented in Table 2.
All X-ray images were assessed using the OAI method: automated computational software, together with an external reader, delineated the margins of the femoral condyle and the tibial plateau. An example of the software output is presented in Figure 2.
Using an anatomical coordinate system, an objective measurement location is determined. An example of the reader line is presented in Figure 3. According to OAI information, a study of longitudinal knee radiographs suggested that $x = 0.200$ to $x = 0.275$ may be the optimal range for measuring medial JSW($x$), where $x$ is the location in this anatomical coordinate system; an example of these measurements is presented in Figure 4.
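To make the notion of JSW at a fixed location concrete, the sketch below computes JSW($x$) from two delineated margins by linear interpolation. It is a hypothetical illustration, not the OAI software: each margin is assumed to be a list of ($x$, $y$) points sorted by the anatomical coordinate $x$, image $y$ is assumed to increase downward (so the tibial margin lies below the femoral margin), and `mm_per_unit` is an assumed scale factor converting the vertical gap to millimetres.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (x, y) points sorted by x


def interp(curve: Curve, x: float) -> float:
    """Linearly interpolate the margin height y at anatomical coordinate x."""
    xs = [p[0] for p in curve]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"x={x} lies outside the delineated margin")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return curve[i][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def jsw_at_locations(femur: Curve, tibia: Curve, locations: Sequence[float],
                     mm_per_unit: float = 1.0) -> Dict[float, float]:
    """JSW(x): vertical gap between tibial and femoral margins at each x.

    Assumes image y grows downward, so the tibial margin has the larger y.
    """
    return {x: (interp(tibia, x) - interp(femur, x)) * mm_per_unit for x in locations}


# Hypothetical margins, evaluated at the medial locations discussed above.
femur = [(0.0, 10.0), (0.5, 11.0), (1.0, 12.0)]
tibia = [(0.0, 15.0), (1.0, 15.0)]
print(jsw_at_locations(femur, tibia, [0.200, 0.275]))
```

Measuring at fixed coordinates, rather than at the point of minimum width, is what makes repeated measurements comparable across visits in this kind of scheme.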
All semiquantitative variables assessed in this work followed the standard OAI protocol. This protocol includes Kellgren and Lawrence (K&L) grades and individual radiographic features (IRFs), such as osteophytes and joint space narrowing at specific anatomic locations, graded according to published atlases.
In general, two expert readers independently assessed each film, blinded to each other's readings and to the subject's clinical data. Baseline and follow-up films were viewed simultaneously; the baseline film was identified, but the follow-up films were randomly ordered so that the readers were blinded to their chronological order.

Statistical Analysis
For both quantitative and semiquantitative data, three groups were built using the time of pain incidence as a reference: T-0, using the radiological data of the subject at the time of chronic pain development; T-1, using the radiological data of the subject one year before chronic pain development; and T-2, using the radiological data of the subject two years before chronic pain development. Seventeen quantitative variables and nineteen semiquantitative variables were explored in this work. The groups were analyzed using univariate and multivariate techniques. In both cases, for the quantitative data, the allometric association of height and gender with joint space width was adjusted using linear regression [34], a common technique in the related literature. All quantitative data were Z-normalized using the rank-based inverse normal transform [35] with the standard levels of normalization reported in the literature [36].
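The rank-based inverse normal transform replaces each value by a normal quantile of its rank, $z_i = \Phi^{-1}\big((r_i - c)/(n - 2c + 1)\big)$. The offset $c$ used in this work is not stated; the sketch below assumes Blom's $c = 3/8$, a common choice, and assigns tied values their average rank. It uses only the Python standard library (`statistics.NormalDist`).

```python
from statistics import NormalDist
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values: Sequence[float], c: float = 3 / 8) -> List[float]:
    """Map values to standard-normal scores via their ranks (Blom offset assumed)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


print(rank_inverse_normal([2.9, 3.4, 4.1, 3.4, 5.0]))
```

Because only ranks enter the transform, skewed JSW distributions and outliers end up on a common, approximately normal scale, which suits the regression models that follow.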
Seventeen quantitative features were measured in the right knee assessments; they are described in Table 2. To remove height and gender bias, all image features from the quantitative datasets were adjusted for height and gender using the linear regression

$$\mathrm{JSW}_{\mathrm{adj}} = \mathrm{JSW} - \left(\beta_0 + \beta_1\,\mathrm{Height} + \beta_2\,\mathrm{Gender}\right),$$

where $\mathrm{JSW}_{\mathrm{adj}}$ represents the adjusted measurement, $\mathrm{JSW}$ is the original measurement, and $\beta_0$, $\beta_1$, and $\beta_2$ are the coefficients obtained from the linear regression.
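The height and gender adjustment amounts to fitting an ordinary least-squares regression of each JSW feature on height and gender and keeping the residuals. The following standard-library sketch assumes gender is coded numerically (e.g., 0/1) and solves the normal equations directly; the function names are illustrative, not taken from the study's R code.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def adjust_height_gender(jsw: Sequence[float], height: Sequence[float],
                         gender: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (JSW_adj, [b0, b1, b2]) with JSW_adj = JSW - (b0 + b1*H + b2*G)."""
    rows = [[1.0, h, g] for h, g in zip(height, gender)]  # design matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, jsw)) for i in range(3)]
    b0, b1, b2 = solve(xtx, xty)  # OLS via the normal equations X'X b = X'y
    adjusted = [y - (b0 + b1 * h + b2 * g) for y, h, g in zip(jsw, height, gender)]
    return adjusted, [b0, b1, b2]


height = [1.60, 1.70, 1.80, 1.65, 1.75]  # toy heights in metres
gender = [0, 1, 0, 1, 1]                 # toy numeric gender coding
adjusted, beta = adjust_height_gender([5.0, 4.2, 6.1, 3.9, 4.8], height, gender)
print(adjusted, beta)
```

Because the model includes an intercept, the adjusted values are centred at zero; any remaining variation is the part of JSW not explained linearly by height and gender.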
Because the outcome variable is binary, the univariate analysis in both cases (quantitative and semiquantitative) was performed with logistic regression on each feature presented in Table 2. For each feature, a generalized linear model was fitted, and the odds ratio and the area under the receiver operating characteristic (ROC) curve (AUC) were calculated. An ROC curve was constructed for each quantitative analysis; it plots sensitivity against 1 - specificity for a binary classifier as the discrimination threshold is varied. Semiquantitative and quantitative data were analyzed independently, and in both analyses the groups defined by time point (T-0, T-1, and T-2) were tested independently to avoid bias.
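As an illustration of the per-feature univariate analysis, the sketch below fits a one-predictor logistic regression by Newton-Raphson, reports the odds ratio $e^{\beta_1}$ per unit of the (normalized) feature, and computes the AUC as the Mann-Whitney probability that a case scores higher than a control, with ties counted as one half. It is a minimal sketch, not the R routines used in the study, and it assumes the classes are not perfectly separated (otherwise the maximum-likelihood estimate does not exist). The toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_univariate_logistic(x: Sequence[float], y: Sequence[int],
                            max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood (b0, b1) for P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # Fisher information X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: (X'WX)^-1 X'(y - p)
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC (C-statistic) as P(case score > control score), ties counted 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


x = [0, 1, 2, 3, 4, 5, 1, 2, 3, 4]  # toy feature values
y = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]  # 1 = case, 0 = control
b0, b1 = fit_univariate_logistic(x, y)
print(f"odds ratio per unit = {math.exp(b1):.3f}, AUC = {auc(x, y):.3f}")
```

For a single predictor with a positive slope, the fitted probabilities are a monotone function of the feature, so the AUC can be computed on the raw feature values directly.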
In the multivariate analysis, the best combination of features for the quantitative and semiquantitative prediction models was selected with a multivariate search strategy that used elastic-net regularized generalized linear models with the LASSO penalty [37][38][39] as the classifier and 10-fold cross-validation for feature selection, a common approach in classification studies. Accuracy, AUC, C-stat, and the confusion matrix were obtained. To minimize the error of the prediction model, lambda was set to lambda.min [37], the value that minimizes the mean cross-validated error. Lambda for the quantitative models was 0.037, 0.029, and 0.069 at T-0, T-1, and T-2, respectively. Lambda for the semiquantitative models was 0.062, 0.062, and 0.048 at T-0, T-1, and T-2, respectively.
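For reference, the penalized objective minimized by elastic-net regularized logistic regression (in the form used by the glmnet family of implementations) is shown below, where $\alpha \in [0,1]$ mixes the ridge and LASSO penalties and $\alpha = 1$ gives the LASSO named in the text. With $K = 10$ folds, lambda.min is the $\lambda$ with the smallest mean cross-validated error, and the features with nonzero $\beta_j$ at that $\lambda$ form the selected model. The error measure used in this study (for example, deviance or misclassification rate) is not stated, so $\mathrm{CVerr}_k$ is left generic here.

$$
\min_{\beta_0,\,\boldsymbol\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\big(\beta_0+\mathbf{x}_i^{\top}\boldsymbol\beta\big)-\log\big(1+e^{\beta_0+\mathbf{x}_i^{\top}\boldsymbol\beta}\big)\Big] + \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\boldsymbol\beta\rVert_2^2+\alpha\,\lVert\boldsymbol\beta\rVert_1\Big]
$$

$$
\lambda_{\min} = \arg\min_{\lambda}\; \frac{1}{K}\sum_{k=1}^{K}\mathrm{CVerr}_k(\lambda), \qquad K = 10
$$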
All statistical analyses were performed in R and associated packages [40].

Results
Descriptive statistics for each quantitative and semiquantitative feature across the time points are presented in Tables 3 and 4. In the univariate analysis, no quantitative feature was predictive on its own. Among the semiquantitative features, "Osteophytes (OARSI grades 0-3) femur medial compartment (XROSFM)" was predictive. The complete statistical results for all individual features are presented in Tables 5 and 6.
Multivariate analysis of the quantitative data yielded three predictive models. At the time of pain incidence, a six-feature model obtained the best accuracy and AUC. One year before pain incidence, a two-feature model obtained the best accuracy and AUC. Two years before pain incidence, a two-feature model obtained the best accuracy and AUC; the resulting curves are presented in Figure 5. Complete results and the statistical analysis of each model are presented in Table 7.
Computational and Mathematical Methods in Medicine
Multivariate analysis of the semiquantitative data yielded three predictive models. At the time of pain incidence, a four-feature model obtained the best accuracy and C-stat. One year before pain incidence, a two-feature model obtained the best accuracy and C-stat. Two years before pain incidence, a three-feature model obtained the best accuracy and C-stat. Complete results and the statistical analysis of each model are presented in Table 8.

Discussion
This case-control longitudinal analysis of subjects with chronic right knee pain found an association between radiographic evidence of early OA changes and the future onset of chronic pain symptoms. Specifically, particular radiological changes in knee anatomy were present at least two years before the onset of chronic pain in a selected group of patients. These results may therefore indicate that specific changes in joint space and bony structure are risk factors for the future development of OA-related pain. These findings reinforce the conclusions of several population-based studies reporting that persons with radiographic knee OA are at higher risk of developing pain than persons without radiological OA [11-13, 16, 18, 41, 42]. The reported radiological features may be added to well-known risk factors for OA severity such as varus-valgus malalignment [43]. The quantitative multivariate predictive models presented in Table 7 may indicate an association between medial cartilage abnormalities and chronic pain. The presence of lateral and medial osteophytes in the semiquantitative multivariate models reported in Table 8 was associated with the development of chronic pain. Changes in medial JSW (JSW($x$) at $x = 0.275$ or $x = 0.300$), bony damage, and chondrocalcinosis were present two years before the occurrence of pain. As expected, the individual features (Tables 5 and 6) were less predictive than the multivariate models, given that OA is a whole-organ disease that affects several tissues at the same time [44]. Compared with our previous efforts [45,46], the AUC increased from 0.652 to 0.695 at T-0 and from 0.617 to 0.623 at T-1, and decreased from 0.674 to 0.620 at T-2. Furthermore, the process took less than 2 min of computation time, compared with 48 h for the previous approach on the same machine.
The models obtained with LASSO were also more stable, because the LASSO fitting process is deterministic, whereas the genetic algorithm used in the original work is stochastic.
Our study has several limitations. First, pain is a subjective outcome that varies from person to person. Second, we limited inclusion to subjects who were not taking pain medications; therefore, the number of subjects who developed pain during the observation period was small. Third, we limited the exploration to right knee findings, and unilateral pain may be affected by symptoms in the contralateral knee. Given these limitations, the findings cannot be generalized, and external validation is required to assess the clinical applicability of the models.

Conclusions
Even though pain is a very complex and subjective clinical outcome, the systematic analysis of objective radiological features yielded multivariate models indicating that certain anatomical features precede the development of knee pain. A biomarker based on those features may help physicians choose the best therapy or course of action for patients who present them. This could have particular impact in developing countries, where access to high-level health care is very restricted. These results suggest that multivariate models obtained by computational methods can make better use of radiological characteristics, improving the prospects for an effective computer-assisted diagnosis and/or treatment selection system.