Item Response Theory Analysis of Two Questionnaire Measures of Arthritis-Related Self-Efficacy Beliefs from Community-Based US Samples

Using item response theory (IRT), we examined the Rheumatoid Arthritis Self-efficacy scale (RASE) collected from a People with Arthritis Can Exercise (PACE) RCT (346 participants) and 2 subscales of the Arthritis Self-efficacy scale (ASE) collected from an Active Living Every Day (ALED) RCT (354 participants) to determine which one better identifies low arthritis self-efficacy (SE) in community-based adults with arthritis. The item parameters were estimated in Multilog using the graded response model. The 2 ASE subscales were adequately explained by one factor. There was evidence for 2 locally dependent item pairs; one item from each pair was removed when we reran the model. The exploratory factor analysis results for the RASE indicated more than one factor, and a content-based categorization of the items led to a 9-factor solution. To perform the IRT analysis, one item from each of the 9 subfactors was selected. Both scales were effective at measuring a range of arthritis SE.


Introduction
The benefits of physical activity for arthritis outcomes are well established [1][2][3][4][5]. High self-efficacy (SE) has been shown to be associated with better arthritis health outcomes, including adherence to physical activity recommendations [6]. In fact, SE is one of the most important psychosocial determinants of physical activity behavior [7][8][9][10][11]. Bandura's well-known definition of SE is based on social cognitive theory and "focuses on the individual's personal confidence beliefs about his or her capacity to undertake behavior or behaviors that may lead to desired outcomes, such as health" [12]. SE is a task-specific or behavior-specific construct, meaning that to increase physical activity one need only focus on SE for physical activity [13,14].
Recent literature suggests the importance of evaluating both SE for a specific task and SE for disease self-care [6,12].
More specifically, Marks et al. suggested that to be effective, interventions should focus not only on increasing SE for a specific task (e.g., physical activity) but also on enhancing arthritis SE (i.e., disease self-care) [6,12]. This approach is supported by Kovar et al.'s intervention study evaluating a walking program in patients with knee osteoarthritis. They found that enhancing both physical activity SE and SE for arthritis self-care led to improvements in function without an increase in symptoms [15].
Because SE is modifiable, there is increasing interest in interventions. If effective interventions are to be designed to increase SE for arthritis self-management, then accurate measurement of SE is crucial. An ongoing challenge has been identifying people with low SE for disease self-management in sample populations of persons with chronic diseases like arthritis [6,12]. To assess the precision of SE measurement, we examined two arthritis SE scales using item response theory (IRT) in participants from two community-based randomized controlled trials (RCTs) on physical activity in adults with arthritis.
IRT represents "a diverse family of models designed to represent the relation between an individual's item response and underlying latent trait" [16]. IRT has several notable benefits. First, in the context of health outcomes and disability, IRT models allow for the differential weighting of items in terms of their severity. Second, IRT provides item and test information functions. Information functions describe not only how much information is provided by a given item or test, but also where along the latent trait that information is provided. This knowledge can play a crucial role when choosing a scale for a particular purpose. One scale may measure low levels of SE very well but fail to adequately assess higher levels. We hypothesized that the two SE scales studied here would possess different measurement characteristics. These differences will provide guidance in determining which measure is preferred in a given situation, with the overall goal of increasing the precision of SE measurement.

Materials and Methods
The first RCT was of the People with Arthritis Can Exercise (PACE) program [17]. The PACE project team worked in conjunction with the NC Arthritis Program and with community facilities throughout the state, including senior centers, assisted living communities, community centers, churches, and wellness centers, to recruit participants. The project conducted classes and assessments at 18 sites in counties throughout North Carolina. Class enrollment at the sites ranged from 6 to 34 participants, with a total of 346 participants recruited. The participants had to be exercising <3 times a week for <20 minutes at a time to enroll. The baseline assessments were conducted from August 2003 to November 2003. Participants had a mean age of 70 years; 90% were female, 75% were Caucasian, and 60% had more than a high school degree. Both the baseline and eight-week follow-up assessments involved administering self-report measures on symptoms, function (including physical performance tests), physical activity, and psychosocial outcomes. At the end of the 8-week intervention, study participants in the intent-to-treat analysis showed decreased pain and fatigue and increased arthritis SE [17].
Active Living Every Day (ALED) is a 20-week lifestyle program designed to teach the behavioral skills needed to become and stay physically active [18,19]. The goal of the second RCT was to evaluate ALED compared to a delayed control in individuals with arthritis. The ALED instructors were recruited with the help of the North Carolina Area Agencies on Aging. The instructors were trained in Chapel Hill, NC in December 2003 by one of the original program developers from the Cooper Institute. Three hundred fifty-four sedentary (exercising <3 times a week) participants enrolled from 17 urban and rural sites, recruited in a manner similar to that described for PACE above. The participants had a mean age of 69 years; approximately 80% were female, 75% were Caucasian, and >50% had more than a high school education. Self-report assessments covered function (including physical performance), symptoms, physical activity, and psychosocial outcomes at baseline and 20 weeks. Two-level regression models (with site as the second level) were used to determine adjusted mean outcome values for the intervention and control groups at 20 weeks. In the intent-to-treat analyses, the intervention group showed improvement over the control group for all outcomes and significant changes for several outcomes, including gait speed, 2-minute step, and scores on the Community Healthy Activities Model Program for Seniors (CHAMPS) physical activity scale [18].

Measures.
The 28-item Rheumatoid Arthritis SE scale (RASE) was completed by PACE participants at baseline and at the 8-week follow-up; this study uses the baseline data. The RASE measures confidence in one's ability to perform specific self-management behaviors and is used for individuals with all forms of arthritis, even though it was initially developed for individuals with rheumatoid arthritis [20,21]. The scale is self-administered and takes approximately ten minutes to complete. RASE scores are created by summing the 28 items, each with a five-point Likert response format, yielding a possible range of 28 to 140 points. Higher scores indicate higher SE for arthritis self-management [20,21]. The RASE has demonstrated sensitivity to change following a self-management education program (+5.2, SD 15.5) [20]. The mean baseline RASE score in the PACE study was 105.05 (SD 12.66).
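The summed scoring rule described above can be illustrated with a short sketch. The coding of each response as an integer from 1 to 5 is inferred from the stated 28 to 140 range, and the function name is hypothetical; the handling of missing responses is not described in the text and is not addressed here.

```python
def rase_total(responses):
    """Sum 28 RASE item responses, each assumed to be coded 1-5."""
    if len(responses) != 28:
        raise ValueError("expected 28 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)


# The extremes reproduce the possible range stated in the text.
lowest = rase_total([1] * 28)
highest = rase_total([5] * 28)
```

Under this coding, a respondent choosing the lowest category on every item scores 28 and one choosing the highest category on every item scores 140.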
The 5-item Pain (PSE) and the 6-item Other Symptoms (OSE) subscales from the Arthritis SE scale (ASE) were collected from the ALED participants at baseline and at the 20-week follow-up; again, this study uses the baseline data. The ASE was developed by Lorig and colleagues to measure a respondent's SE for arthritis self-management behaviors (e.g., decreasing pain, keeping pain from interfering with normal activities, and dealing with the frustration of having arthritis) [22]. These two subscales are estimated to take approximately five minutes to complete. The 9-item Function subscale is the third subscale of the ASE but was not collected in ALED [21]. The items were scored with a 10-point response format, with 1 representing "very uncertain" and 10 "very certain." Lorig et al. found both the 5-item PSE and the 6-item OSE subscales to be sensitive to change when evaluating the Arthritis Self-Management course using the ASE [22]. The mean baseline scores in the ALED study were PSE 6.63 (SD 2.06) and OSE 6.94 (SD 2.14). Table 1 displays the items from the RASE and ASE utilized in this study.

Analysis.
The goal of this series of analyses was to obtain IRT-based item parameters for both the ASE and RASE. Our original intention was to perform a unidimensional IRT analysis of both scales. Although published literature suggests that each scale exhibits multidimensionality, it is often the case that different approaches will yield different results [20,22]. Even if the scales are found to be multidimensional, there are a number of strategies available to handle such a scale. We therefore performed the analyses with an eye towards identifying unidimensional scales, while being mindful of the potential for multiple dimensions.

Table 1: Content of the Rheumatoid Arthritis Self-efficacy scale (RASE) and the Arthritis Self-efficacy scale (ASE).
ASE subscales, item number and content: 5-item Pain (items 1-5) and 6-item Other Symptoms (items 6-11)
(1) decrease your pain quite a bit?
(2) continue most of your daily activities?
(3) keep arthritis pain from interfering with your sleep?
(4) make a small to moderate reduction in your arthritis pain by using methods other than taking extra medication?
(5) make a large reduction in your arthritis pain by using methods other than taking extra medication?
(6) control your fatigue?
(7) do something to help yourself feel better if you are feeling blue?
(8) regulate your activity so as to be active without aggravating your arthritis?
(9) deal with the frustrations of arthritis?
(10) As compared with other people with arthritis like yours, how certain are you that you can manage arthritis pain during your daily activities?
(11) manage your arthritis symptoms so that you can do the things you enjoy doing?
RASE item number and content
(1) use relaxation techniques to help with the pain.
(2) think about something else to help with pain.
(3) use my joints carefully (joint protection) to help with pain.
(4) think positively to help with pain.
(5) avoid doing things that cause pain.
(6) wind down and relax before going to bed, to improve my sleep.
(7) have a hot drink before bed, to improve my sleep.
(8) use relaxation before bed, to improve my sleep.
(9) pace myself and take my arthritis into account to help deal with tiredness.
(10) accept fatigue as part of my arthritis.
(11) use gadgets to help with mobility, household tasks, or personal care.
(12) ask for help to deal with the difficulties of doing everyday tasks.

Exploratory and confirmatory factor analyses (EFA and CFA) were used to assess the extent to which a one-dimensional model could adequately explain the observed item responses. EFAs were conducted in CEFA using ordinary least squares (OLS) estimation, polychoric correlations, and oblique quartimax rotations (where necessary) [23]. In the EFA we focused on the scree plots, and if there was evidence of more than one factor, we then examined the resulting factor loading matrix. The CFAs were conducted in LISREL, again with polychoric correlations, but this time using diagonally weighted least squares (DWLS) estimation to provide correct fit indices (see Wirth and Edwards, 2007, for a more detailed description) [24,25]. There are a number of fit indices available when conducting structural equation modeling-based CFA, but we have found that a combination of the root mean square error of approximation (RMSEA), comparative fit index (CFI), and root mean square residual (RMSR) provides a good balance of information regarding how well the model accounts for the observed data [26,27]. Once a sufficiently unidimensional set of items had been identified, an IRT analysis was performed on each scale using the graded response model (GRM) as implemented in the Multilog software package [28,29]. Following the IRT analysis we examined the estimated item parameters, standard error curve (SEC), and test information function (TIF) to better understand both how individual items contribute to the scale and how the scale functions as a whole. Prior to any factor analytic or IRT analyses we collapsed any category that was chosen by fewer than 2% of the respondents.
This led to no collapsing on the ASE (which was surprising, given that each item had 10 response categories) and minimal collapsing on the RASE. This study was approved by the University of North Carolina Biomedical institutional review board and it was conducted with the understanding and the consent of the human subjects.
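The category-collapsing step described in the Analysis section can be sketched as follows. The text states only the 2% rule; merging a sparse category into an adjacent one and recoding to consecutive integers are assumptions made here for illustration, and the function name is hypothetical.

```python
from collections import Counter


def collapse_sparse_categories(responses, threshold=0.02):
    """Merge response categories chosen by fewer than `threshold` of respondents.

    Assumption (not stated in the text): a sparse category is merged into
    the adjacent lower category, or into the next higher one if it is the
    lowest. Categories that nobody chose are simply absent from the result.
    Returns the recoded responses and the old -> new category mapping.
    """
    n = len(responses)
    counts = Counter(responses)
    # Each group lists the original categories that will share one new code.
    groups = [[c] for c in sorted(counts)]

    def size(group):
        return sum(counts[c] for c in group)

    while len(groups) > 1:
        sparse = [i for i, g in enumerate(groups) if size(g) / n < threshold]
        if not sparse:
            break
        i = sparse[0]
        j = i - 1 if i > 0 else i + 1
        lo, hi = min(i, j), max(i, j)
        groups[lo:hi + 1] = [groups[lo] + groups[hi]]

    mapping = {c: new for new, g in enumerate(groups, start=1) for c in g}
    return [mapping[r] for r in responses], mapping


# Hypothetical item: category 1 was chosen by 1 of 100 respondents.
example = [1] * 1 + [2] * 39 + [3] * 60
recoded, mapping = collapse_sparse_categories(example)
```

In this made-up example, category 1 falls below the 2% threshold and is merged with category 2, leaving two response categories for the item.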

Results and Discussion
The analyses proceeded differently for the ASE and the RASE scales and in light of this we present the results from each in separate sections below.

ASE Results and Discussion.
The initial validation study on the ASE found evidence for two and three factors. We focused on the items comprising what Lorig et al. titled the PSE and OSE subscales [22]. Although these were found to constitute two separate factors in the original study, our results suggest that they are adequately explained by one factor. The scree plot from these 11 items is shown in Figure 1 and suggests that there is one dominant factor. A one-factor model was fit in a CFA framework to assess model fit. The fit of the one-factor model to the 11 items was poor (RMSEA = 0.14, CFI = 0.96, RMSR = 0.6), at least judging by the RMSEA, which is the fit index we tend to focus on. There was some evidence in this solution for two locally dependent item pairs (Items 1 and 2, and Items 4 and 5). LISREL automatically calculates modification indices (MIs) for parameters that are constrained in a particular model. In theory, they are chi-square distributed with one degree of freedom and represent the expected improvement in model fit if a particular parameter were freely estimated. The covariances among the residuals are typically constrained to zero in CFA models. Large MI values for particular residual covariances suggest that, even after accounting for their shared relationship to the latent construct, items are more related to one another than the model predicts. We removed one item from each pair (Items 1 and 5) and reran the model with the remaining nine items. This model seems to adequately explain the observed data (RMSEA = 0.06, CFI = 1.0).
Before moving to an IRT analysis, we wanted to be sure that a two-factor model was not more appropriate for these data. We fit a basic two-factor model and then, when the same evidence for locally dependent pairs arose, we added correlated errors to accommodate that excess covariance. Although the two-factor model with two correlated errors fit well (RMSEA = 0.06, CFI = 0.99, RMSR = 0.2), the correlation between the two factors was estimated at 0.95. A correlation of this magnitude strongly suggests that those two factors are, in fact, one factor.
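Because a modification index is, in theory, chi-square distributed with one degree of freedom, its size can be judged against that reference distribution. The sketch below uses the identity that the upper-tail probability of a chi-square variable with one degree of freedom equals erfc(sqrt(x/2)). The MI value of 10 is hypothetical; no MI values are reported in the text.

```python
import math


def chi2_1df_pvalue(mi):
    """Upper-tail probability that a chi-square(1) variable exceeds `mi`."""
    if mi < 0:
        raise ValueError("a modification index cannot be negative")
    return math.erfc(math.sqrt(mi / 2.0))


# Conventional 5% critical value for one degree of freedom.
CRITICAL_05 = 3.841

# Hypothetical modification index, chosen only for illustration.
p_hypothetical = chi2_1df_pvalue(10.0)
```

An MI well above 3.84 therefore points to a residual covariance that the model constrains to zero but that the data suggest is nonzero, which is how the locally dependent pairs above were flagged.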
Based on the strength of the factor analytic results we performed a unidimensional IRT analysis. In keeping with the results from the one-factor CFA, we omitted Items 1 and 5 from the IRT analysis. The parameter estimates from that analysis are given in Table 2. Although some of the slope parameters are high, subsequent analyses suggest that they are not inflated due to local dependence. The SEC and TIF for the modified 9-item version of the ASE are shown in Figure 2. As can be seen there, the resulting scale provides highly reliable scores between −2.5 and 2 standard deviations. The precision quickly drops as scores increase above 2, as indicated by the increasing standard error curve and decreasing information curve. The marginal reliability for the nine-item scale was 0.95.
Table 2: IRT parameters for the 9-item modified version of the 5-item Pain and 6-item Other Symptoms subscales from the Arthritis Self-efficacy scale (ASE).
The factor analytic results suggest that, despite published literature to the contrary, the PSE and OSE subscales from the ASE can be adequately accounted for by one underlying construct [22]. We identified two locally dependent pairs of items and dealt with this by removing two items. In addition to alleviating the local dependence, this has the added benefit of shortening the scale slightly.
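To make the link between item parameters, the test information function, and the standard error curve concrete, the following sketch computes these quantities under Samejima's graded response model. It assumes the logistic metric without a scaling constant, and all parameter values are made up for illustration; they are not the estimates reported in Table 2, and the function names are hypothetical.

```python
import math


def _cumulative(theta, a, thresholds):
    """P(response >= k) at each category boundary, padded with 1 and 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item; thresholds must be increasing."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info


def standard_error(theta, items):
    """Standard error = 1 / sqrt(test information) for (a, thresholds) pairs."""
    total = sum(grm_item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical nine-item scale: identical slopes and thresholds for brevity.
hypothetical_items = [(2.0, [-1.5, -0.5, 0.5, 1.5])] * 9
se_middle = standard_error(0.0, hypothetical_items)
se_high = standard_error(3.0, hypothetical_items)
```

With thresholds concentrated between −1.5 and 1.5, the standard error is smaller in the middle of the trait range than at 3 standard deviations, which mirrors the general pattern described for Figure 2, where precision drops at high scores.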

RASE Results and Discussion.
The EFA results showed not only one dominant eigenvalue (11.0), but also two other sizeable subsequent eigenvalues (2.9 and 2.1). A three-factor solution was estimated, but the resulting factors did not appear coherent from a substantive standpoint. One- and three-factor models were fit in a CFA framework to provide fit indices. The one-factor model did not fit particularly well (RMSEA = 0.09, CFI = 0.95, RMSR = 0.11), but a three-factor model with a few cross loadings provided an appreciably better fit (RMSEA = 0.5, CFI = 0.98, RMSR = 0.08). Table 3 contains the factor loadings from this three-factor model. Despite the reasonable fit of this model, we found the lack of substantive coherence to be troubling.
The original validation study of the RASE suggested that it had eight factors and an additional three "orphan" items which did not load on any of those eight factors [20]. We attempted to replicate their final model in a CFA framework, but the estimator converged to an inadmissible solution. Although several attempts were made to modify this model, all resulting solutions were inadmissible.
At this point, we went back to the items themselves and performed our own categorization process, in which the number of factors and the factor structure were determined based on a reading of the items. This led us to a nine-factor solution. We fit this model in a CFA framework and the model fit quite well (RMSEA = 0.03, CFI = 0.99, RMSR = 0.08). In an attempt to better understand the structure of this scale, we then fit a second-order factor model in which a higher-order factor underlies the nine lower-order factors. While no direct comparisons between this and the base nine-factor CFA are possible (the models are unfortunately not nested), we note that the second-order model did account for these data reasonably well (RMSEA = 0.05, CFI = 0.98, RMSR = 0.1).
These results suggest that although there may be one common construct underlying the responses to the items found on the RASE, it operates through nine subfactors. To the extent that there are different numbers of items representing each of these subfactors, the resulting summed score will be a weighted combination of them. In an effort to avoid this weighting and to see if it would be possible to perform a unidimensional IRT analysis on a subset of the 28-item RASE, we selected one item from each of the nine subfactors. When choosing items, we tried to balance statistical characteristics (choosing items with high factor loadings in earlier analyses) and content validity (ensuring that the resulting collection of items had face validity). The fit of a one-dimensional model for these nine items was then assessed using CFA. This model fit the data well (RMSEA = 0.06, CFI = 0.99, RMSR = 0.06), which suggests that for this nine-item subset, unidimensionality is a plausible assumption. An IRT analysis was then conducted on those nine items. The resulting scale had a marginal reliability of 0.84 and, with the exception of one item, all slopes were greater than one (item parameters are provided in Table 4). As indicated in Figure 3, the nine-item subset has a relatively uniform level of measurement precision (standard errors between 0.3 and 0.4) between −3 and +2 standard deviations.
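As a rough check on how the reported standard errors relate to reliability, the following sketch applies the common approximation reliability = 1 − SE², which assumes the latent trait is scaled to unit variance. That scaling and the function name are assumptions made for illustration, not details taken from the Multilog output.

```python
def reliability_from_se(se, trait_variance=1.0):
    """Approximate reliability implied by a standard error of measurement.

    Assumes the latent trait has variance `trait_variance` (1.0 by default).
    """
    return 1.0 - (se ** 2) / trait_variance


# Standard errors at the two ends of the range reported for the 9-item RASE.
rel_at_se_03 = reliability_from_se(0.3)
rel_at_se_04 = reliability_from_se(0.4)
```

Under this approximation, standard errors of 0.3 and 0.4 correspond to reliabilities of about 0.91 and 0.84, which is in line with the marginal reliability of 0.84 reported for the nine-item subset.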
The factor analytic work for the RASE was substantially more complex than for the ASE. Neither the one-dimensional model we were hoping for nor the eight-dimensional model presented in the literature provided an adequate explanation of the RASE data. We went back to the item content and created our own "bins" into which the items appeared to fall, which led us to a nine-factor model. This model had good fit to the data, and an additional higher-order model also had good fit. As previously mentioned, these two results suggest that while there may be nine subfactors, they are all related to some overarching latent factor. We proceeded by choosing one item from each subfactor to serve as the representative item for the subfactor on a shortened RASE.

Limitations.
The two populations here are from the Southeastern US, and both have similar, somewhat homogeneous demographics (i.e., primarily female, educated, and Caucasian). The reliance of these self-efficacy measures on retrospective recall is a limitation, especially for the RASE, whose directions include "even if you are not actually doing it at the moment" [20]. The scales were analyzed only cross-sectionally because the analyses proved complex enough that determining the ability of each scale to detect change would have been too in-depth for one manuscript. Cross-population comparisons were not possible because we did not have data on both measures in one sample. We originally planned to equate these two arthritis SE scales, but the wording variations, although slight, were enough to preclude common-item equating procedures [30]. Although we were not successful, our results may be informative to future researchers who want to use common-item procedures on these scales.

Conclusion
We acknowledge that there are more complex solutions for a scale like the RASE. However, the alternative proposed here (the modified 9-item RASE) has the virtue of being shorter, representative of the construct of interest, and easy to implement with currently existing IRT software. In summary, these results show that, if necessary, unidimensional IRT could be used with a scale exhibiting the complex hierarchical structure of the RASE. While the 9-item modified version of the two ASE subscales presented here is very effective at measuring much of the range of arthritis self-efficacy, it is not precise for individuals with very high levels (>2 standard deviations above the mean) of arthritis self-efficacy. The same holds for our modified 9-item RASE. However, because we would anticipate only a very small proportion of individuals (roughly 2.5%) to have scores this high, this is not a serious weakness. It could become problematic if either scale were used to assess a highly effective treatment. In that case, either scale may exhibit a ceiling effect that could mask improvement beyond a certain level. Although any comparison between the scales must be made with caution, it does appear that the 9-item modified version of the two ASE subscales is able to provide more precise estimates than the modified 9-item RASE. This study is a first step towards increasing the precision of identifying those people with arthritis and low SE. This information may better inform SE-enhancing interventions [12].
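The "roughly 2.5%" figure can be checked with a short calculation, assuming arthritis SE scores follow a standard normal distribution in the population. That distributional assumption and the function name are ours, introduced only for this illustration.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


share_above_2sd = upper_tail(2.0)
share_above_196sd = upper_tail(1.96)
```

Under normality, about 2.3% of individuals fall more than 2 standard deviations above the mean, and 2.5% fall above 1.96 standard deviations, so the approximate figure quoted in the text is reasonable.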