Assessing Physical Performance in Centenarians: Norms and an Extended Scale from the Georgia Centenarian Study

Centenarians display a broad variation in physical abilities, from independence to bed-bound immobility. This range of abilities makes it difficult to evaluate functioning using a single instrument. Using data from a population-based sample of 244 centenarians (M Age = 100.57 years, 84.8% women, 62.7% institutionalized, and 21.3% African American) and 80 octogenarians (M Age = 84.32 years, 66.3% women, 16.3% institutionalized, and 17.5% African American), we (1) provide norms on the Short Physical Performance Battery (SPPB) and (2) extend the range of this scale using performance on additional tasks and item response theory (IRT) models, reporting information on concurrent and predictive validity of this approach. Under the original SPPB scoring criteria, 73.0% of centenarian men and 86.0% of centenarian women are classified as severely impaired. Results suggest that conventional norms for older adults need substantial revision for centenarian populations and that item response theory methods can help address the floor and ceiling effects found with any single measure.


Introduction
The oldest old display a broad range and variability of physical and cognitive abilities [1][2][3][4][5][6]. This large range of performance presents a significant measurement problem to researchers. For example, about one-third of centenarians perform well cognitively, in the range of those who are in their 60s and 80s; on the other hand, about 50% of centenarians have some form of dementia, and about one-third have moderate to severe dementia [7]. Similar measurement issues are present for physical function. Handgrip strength in the oldest old varies from <5 kg to >30 kg [5,8]. Some centenarians live independently and perform all physical and instrumental activities of daily living, while others are immobile and bed bound [5,8]. Few norms exist for physical performance among centenarians, and a central problem is that current functional performance batteries display both floor and ceiling effects.
The purpose of this paper is to (1) present normative data for centenarians on the Short Physical Performance Battery (SPPB) and (2) provide evidence for the validity of an extended SPPB scaling that addresses floor and ceiling effects by combining data from instruments with different levels of scaling into one continuous scale developed using item response theory (IRT). For those expected to perform well, we chose to use the SPPB [9,10]. It has been used in several large epidemiological studies and has been shown to have predictive validity for those with moderate to high levels of mobility disability and morbidity. For physically weak and nonambulatory participants, we chose to use items on the Physical Performance Mobility Exam (PPME) that are not included on the SPPB [11].

Participants.
Participants were 244 centenarians and near centenarians (aged 98 and older) and 80 octogenarians recruited from 44 counties in northeast Georgia, with full details described elsewhere [2]. Because the study was population based, there were no exclusions, although all centenarians were required to provide blood samples to be included. Overall, the recruitment rate (the proportion of those contacted who participated) was 67.2% for centenarians and 46.0% for octogenarians. Further, our sample represents an estimated 19.6% of the entire population of centenarians in this geographic area. The Georgia Centenarian Study (GCS) employed internationally established criteria for age verification [12], using multiple convergent, credible sources and public records, such as birth and marriage certificates of the individuals as well as of their offspring and relatives, to create a consistent chronology. Driver's licenses, Social Security documents, census records, and death records of offspring were also used.

Materials and Procedure.
A complete list of measures included in the GCS appears elsewhere [13].

Short Physical Performance Battery (SPPB).
It is a valid measure of lower extremity mobility, predictive of mortality and institutionalization in community-dwelling older adults with a broad range of abilities [9]. The SPPB consists of (1) three standing balance measures (tandem, semi-tandem, and side-by-side stands), (2) five continuous chair stands, and (3) a 2.44-meter walk. The scaling was developed by dividing the performance times of the original population, the Established Populations for Epidemiologic Studies of the Elderly (EPESE), into quartiles scored from 1 (the lowest) through 4 (the highest), with 0 assigned to nonperformers. The three balance tests are treated as a hierarchy of difficulty when assigning a single score of zero to four for standing balance. Individuals unable to complete a task are given a score of zero on that task. Completed timed tasks are assigned scores from one to four based on time, with the shortest times receiving a score of four. The component scores are summed to give a total score ranging from zero to 12. Poor performance is a risk factor for mortality in epidemiological studies of community-dwelling populations in their eighth and ninth decades [10].
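
The SPPB scoring rule can be sketched in Python as follows. This is a minimal illustration rather than the study's scoring code: the function names are hypothetical, the cut-off times are placeholders chosen for the example (the EPESE and GCS cut-offs actually used are reported in Table S1), and the standing balance score is assumed to have been assigned already as a value from 0 to 4.

```python
from bisect import bisect_left

# Placeholder cut-offs in seconds: ascending upper bounds for scores 4, 3, and 2.
# These values are illustrative assumptions, not the cut-offs in Table S1.
CHAIR_CUTOFFS = [11.1, 13.6, 16.6]
WALK_CUTOFFS = [3.1, 4.0, 5.6]


def score_timed_task(seconds, cutoffs):
    """Return 0 for a nonperformer (seconds is None), otherwise 1-4.

    Shorter times earn higher scores; a time equal to a cut-off
    falls in the faster (higher-scoring) category.
    """
    if seconds is None:
        return 0
    return 4 - bisect_left(cutoffs, seconds)


def sppb_total(balance_score, chair_seconds, walk_seconds):
    """Sum the three component scores (each 0-4) into a 0-12 total."""
    if not 0 <= balance_score <= 4:
        raise ValueError("balance score must be between 0 and 4")
    return (balance_score
            + score_timed_task(chair_seconds, CHAIR_CUTOFFS)
            + score_timed_task(walk_seconds, WALK_CUTOFFS))
```

Under these placeholder cut-offs, a participant with a balance score of 2 who completes the chair stands in 15.0 seconds but cannot attempt the walk would receive 2 + 2 + 0 = 4.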

Physical Performance Mobility Exam (PPME).
It was developed and validated on hospitalized patients and includes lower functioning tasks in addition to those on the SPPB described above [11]. The additional tasks include (1) bed mobility, to assess the ability to move from lying to sitting positions, (2) transferring from sitting on the edge of a bed to sitting in a chair, and (3) stepping up one step with or without the use of a bed handrail. This measure used a 3-level scoring system: 0 was assigned to nonperformers; 1 was assigned to those completing the task without assistance in ≥10 seconds (bed mobility), with assistance (transfer), or with use of the handrail (step-up); and 2 was assigned to those completing the task in <10 seconds (bed mobility), without assistance (transfer), or without use of the handrail (step-up).
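
The 3-level rule for the additional PPME tasks can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that a nonperformer is recorded as a missing time (bed mobility) or as not completed (transfer and step-up); it is an illustration of the rule as described here, not the instrument's official scoring procedure.

```python
def score_bed_mobility(seconds):
    """0 = could not perform (None), 1 = completed in >= 10 s, 2 = completed in < 10 s."""
    if seconds is None:
        return 0
    return 2 if seconds < 10 else 1


def score_with_help(completed, used_help):
    """Shared rule for transfer (help = assistance) and step-up (help = handrail).

    0 = not completed, 1 = completed with help, 2 = completed without help.
    """
    if not completed:
        return 0
    return 1 if used_help else 2
```

For example, a participant who moves from lying to sitting in 12 seconds, transfers with assistance, and cannot step up would score 1, 1, and 0 on the three tasks.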

GCS Composite Scale (GCS).
It was developed using item response theory (IRT) methodology based on SPPB scores (using GCS cut-off values for timed tasks) along with PPME scores and grip strength. Participants' latent ability was estimated as a z-score from the difficulty of each test item and participants' responses to them. These scores were then rescaled into 11 even division points (2-12), with 1 assigned to nonperformers. (Figure S1 shows the information provided by each task as a function of latent ability. Table S1 shows the time cut-offs that define quartiles in the EPESE and GCS data sets.)
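
The rescaling step can be illustrated with a brief Python sketch. It assumes that "even division points" means equal-width bins spanning the observed range of latent z-scores among participants who performed at least one task; the text does not specify the binning rule, so this reading, the function name, and the example values are all assumptions.

```python
def rescale_latent_scores(thetas, performed, n_bins=11):
    """Map latent-ability z-scores onto 2..(n_bins + 1); nonperformers get 1.

    Assumption: equal-width bins over the observed z-score range of performers.
    """
    observed = [t for t, p in zip(thetas, performed) if p]
    if not observed:
        return [1 for _ in thetas]
    low, high = min(observed), max(observed)
    width = (high - low) / n_bins
    scores = []
    for theta, did_perform in zip(thetas, performed):
        if not did_perform:
            scores.append(1)
        elif width == 0:
            scores.append(2)  # degenerate case: all performers share one z-score
        else:
            # The top of the range is clamped into the last bin.
            bin_index = min(int((theta - low) / width), n_bins - 1)
            scores.append(2 + bin_index)
    return scores


# Made-up z-scores for five performers and one nonperformer.
example = rescale_latent_scores(
    [-2.0, -1.0, 0.0, 1.0, 2.0, 0.0],
    [True, True, True, True, True, False],
)
```

With these made-up values, the lowest performer maps to 2, the highest to 12, and the nonperformer to 1.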

Direct Assessment of Functional Status (DAFS).
It is a clinician-rated scale based on performance on time orientation, communication, transportation, preparing for grocery shopping, financial skills, grocery shopping, dressing and grooming, and eating [14]. The transportation, preparing for grocery shopping, and grocery shopping tasks of the DAFS were omitted because of their increased physical demands and the low likelihood that centenarians were currently engaged in these activities. Each activity of daily living (ADL) task on the DAFS was scored on a dichotomous scale based on the participant's successful completion of the functional task. The basic ADL (BADL) score was calculated by summing the grooming, dressing, and eating scales (possible range = 0-23 points, with higher scores representing higher functional status); the instrumental ADL (IADL) score was calculated by summing the time orientation, communication, and financial skills scales (possible range = 0-58 points, with higher scores representing higher functional status). The DAFS has been validated with community-dwelling samples [15] and older adults with dementia [14].

Grip Strength.
It was assessed using the Jamar (Detecto, Jackson, MI) hand grip dynamometer. After adjusting the handle to the second metatarsal, while sitting in a chair with the arm allowed to hang down at the side, maximal grip strength was tested three consecutive times on both the right and left hands. Peak force to the nearest tenth of a kilogram (0.1 kg) was calculated for each hand. Analyses use the average peak value across both hands (average values correlated r > .97 with values obtained from each hand).

Knee Extensor Strength.
It was tested using a manual muscle manometer (Nichols, LaFayette IN). The participant was positioned in a straight-backed chair with the lower leg hanging freely so that the foot did not touch the floor and with the arms folded across the chest to avoid use of the upper body. The participant was then asked to straighten the leg as forcefully as possible while the administrator maintained stability. Peak force to the nearest tenth of a kilogram (0.1 kg) was calculated for each leg. Analyses use the average peak value across both legs (average values correlated r > .98 with values obtained from each leg).
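
The strength summary used in the analyses (peak value per side, then the average across sides) amounts to the following arithmetic, shown here as a minimal Python sketch. The function name and trial values are hypothetical, and the sketch assumes that each side's peak is the maximum of its trials, rounded to 0.1 kg.

```python
def average_peak(left_trials, right_trials):
    """Average of the per-side peak forces (kg), each peak rounded to 0.1 kg."""
    left_peak = round(max(left_trials), 1)
    right_peak = round(max(right_trials), 1)
    return (left_peak + right_peak) / 2


# Made-up trials in kg for the left and right sides.
summary = average_peak([10.2, 11.0, 10.8], [12.4, 12.9, 12.1])
```

Here the peaks are 11.0 kg and 12.9 kg, giving an average peak of 11.95 kg.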

Test Administration.
Based on results from pilot testing with 10 centenarians (not included in this sample), administration of the SPPB and PPME was originally tailored to reduce participant burden using a decision rule based on participant ambulatory ability. If participants could stand, only the items of the SPPB and the step-up task of the PPME were administered. Otherwise, if they were unable to stand, only the bed mobility and transfer tasks were administered. During testing of the current sample, it was determined that these tasks were not strictly hierarchical for this population. As a result, the protocol was changed so that all tasks were offered to all participants. In most cases in which participants were administered only one scale or the other, it was possible to reconstruct ability on the nonadministered test by working with data from participants administered both scales as well as detailed administration notes provided by interviewers. (Procedures for completing these partial datasets are described fully under "Missing Values" in the Supplementary Material of this paper; see Supplementary Material available online at doi: 10.1155/2010/310610.) Conclusions were not altered by whether partial cases were included or excluded.

Statistical Analysis.
SPSS (Version 17.0, Chicago, IL), Stata 11.1 (StataCorp, College Station, TX), and MULTILOG (Scientific Software International, Lincolnwood, IL) were used for all analyses. Descriptive statistics were used to determine means and standard deviations. T-tests were used to compare mean differences between age groups. Pearson's r was used for zero-order correlations, followed by comparisons of Fisher's z-transformed values across age groups [14] and for dependent correlation coefficients [14,16]. Item response theory was used to develop the GCS Composite Score. Significance level was set at P < .05.
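
For readers unfamiliar with the comparison of correlations across independent groups, the standard Fisher z test can be sketched in Python as below. This is the textbook procedure, not a reproduction of the SPSS or Stata routines used in the study; the function name is hypothetical, and the correlation values in the example are made up (only the group sizes of 244 and 80 come from this sample). The test for dependent correlations is a different procedure and is not shown.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two independent Pearson correlations.

    Returns the z statistic and a two-sided P value based on the
    standard normal distribution.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's z transformation
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Made-up correlations of .70 (n = 244) and .40 (n = 80).
z_stat, p_val = fisher_z_compare(0.70, 244, 0.40, 80)
```

A P value below .05 from this test would indicate, at the significance level used here, that the two groups differ in the strength of the correlation.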

Comparison of Physical Performance Data across Age Groups.
Table 1 presents descriptive statistics for octogenarians and centenarians. As can be seen, octogenarians have significantly higher (P < .001) physical performance than centenarians on leg strength, grip strength, the SPPB, the PPME, and the IRT-derived physical performance measure. Consistent with the population-based nature of this study, a higher proportion of the centenarian sample was female and institutionalized compared with the octogenarian sample, but there were no differences in race.
Figure 1 compares the proportion of the Georgia Centenarian Study sample in each of the four scoring categories reported in [5,9]. For comparative purposes, we present our results alongside those derived from the EPESE sample for men and women aged 70 to 79 [9]. As can be seen, a large proportion of centenarians (73.0% of men and 86.0% of women) fall into the severely disabled category, whereas none could be classified as having no disability (0% for both men and women). Among octogenarians, 22.2% of men and 30.2% of women were in the most disabled category, whereas 14.8% of men and 9.4% of women were classified as having no disability. (Supplemental Table S2 provides norms by gender and age group on each performance scale. Table S3 presents the age group proportions of the sample performing at floor and ceiling for the three scales. Table S4 describes characteristics of the sample performing at the floor on each scale.)
Table 2 presents zero-order correlations among physical performance measures for octogenarians (above diagonal) and centenarians (below diagonal). For octogenarians, the GCS Composite scale generally shows correlations of similar magnitude with each of the other measures. GCS Composite scores correlate more highly with DAFS BADL scores than do SPPB scores, but there are no other differences. In contrast, among centenarians, GCS Composite scores correlate more highly than SPPB and PPME scores with DAFS BADL and IADL scores, leg extensor strength, and grip strength. The PPME is more highly correlated with DAFS BADL scores than the SPPB, but there are no other differences between the SPPB and PPME for this age group. (Figure S2 shows a scatterplot of GCS Composite scores against SPPB and PPME scores.)

Evidence for Predictive Validity of the GCS Scale.
Predictive validity is an important criterion for any measure of physical performance in centenarians. The distribution of time to mortality by SPPB, PPME, and GCS Composite scores is shown in Table 3, with mortality grouped as occurring within 0-6, 7-12, 13-24, or 25+ months from interview. Both the SPPB and PPME show some irregularity, with some higher performers dying earlier and some low performers still alive. In sharp contrast, the GCS Composite scale shows a regular progression of mortality: no high performers died within 6 months, and the proportions of those who died at successively longer times following assessment follow a more systematic stepwise pattern.
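
The kind of tabulation summarized in Table 3 can be sketched in Python as follows. The function names, score bands, and records are hypothetical, and the sketch assumes that time is recorded in whole months from interview to death or to the end of follow-up; how survivors were handled is not described in this section.

```python
from collections import Counter


def mortality_interval(months):
    """Assign whole months since interview to one of the four reporting intervals."""
    if months < 0:
        raise ValueError("months must be non-negative")
    if months <= 6:
        return "0-6"
    if months <= 12:
        return "7-12"
    if months <= 24:
        return "13-24"
    return "25+"


def tabulate_mortality(records):
    """Count participants by (score band, mortality interval).

    records: iterable of (score_band, months) pairs.
    """
    return Counter((band, mortality_interval(months)) for band, months in records)


# Made-up records for illustration only.
table = tabulate_mortality([("low", 3), ("low", 5), ("high", 13), ("high", 30)])
```

Dividing each cell count by its score-band total gives the row proportions that can then be compared across scales.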

Discussion
Because of the vast range of functioning observed, centenarians present unique challenges to evaluation and assessment, particularly in the context of population-based research. We set out to provide norms for physical performance in centenarians using established scales and to demonstrate the concurrent and predictive validity of an extended scale developed through IRT using the SPPB, PPME, and grip strength.
With regard to normative functioning, severe impairment is the modal category when the SPPB is used as the criterion, and no centenarians performed at the highest levels on that scale. Centenarians score significantly lower than octogenarians on every indicator of physical performance. At the same time, however, use of a measure intended for more severely impaired populations did not solve the problem. Rather, many centenarians performed at the ceiling on the PPME. Thus, a scale that combines the information provided at each end of the continuum is essential. By combining the tasks from two psychometrically sound instruments (SPPB and PPME) and adding a measure of grip strength in order to provide information about those with the very lowest physical performance, we were able to capture a larger range of abilities, particularly among those in the lowest functioning range. Although many approaches to scaling could have been used, we adopted IRT methodology because of its origins in scaling measures across disparate ability levels when underlying true values are unknown.
In terms of concurrent validity, our GCS Composite scale performed favorably compared with either the SPPB or PPME measures, correlating more highly with observed performance on BADLs and IADLs among centenarians, as well as grip strength and leg extensor strength. Equally importantly, it performed as well as these scales among octogenarians, suggesting that our methodology was sufficient to capture the wide differences in physical performance between these age groups.
Finally, the GCS Composite scale also had favorable properties in terms of predictive validity, with higher scores associated with progressively longer time to mortality. The SPPB and PPME did not show this systematic pattern.
A primary limitation of this study was the missing data that resulted from the initial attempts to limit participant burden. This was addressed through statistical and field note procedures to recover a full complement of data. Likewise, it would have been desirable to have test-retest data on our instrument, but this was not generally possible because of the taxing nature of providing physical performance data in a study that was already divided into five 2-hour sessions. A strength of this study is that data from this population-based sample of the oldest old provide information on the order and patterning of the most commonly measured tasks. The study also provides a single performance scale with negligible floor or ceiling effects. Given the rapid growth of the centenarian population, having high quality normative data available to researchers and clinicians is of the utmost importance.