Large-Scale Survey of Unselected Automated Visual Fields in a Major Reading Center: Patterns and Data Analysis

A prospective, randomized study was conducted to survey a large number of automated perimetry examinations in a central reading institute, obtaining practical information on unselected referred patients and their clinician “consumers”. Visual field records of 1041 patients were obtained, each evaluated by one of three glaucoma specialists. Statistical analysis was applied to demographics, physician characteristics, test reliability, and visual field scores. Reliability was scored on a scale of 1 (excellent) to 5 (uninterpretable). Data from earlier examinations of these patients were also analyzed. The large majority of patients (70.4%) were referred due to glaucoma, ocular hypertension, or suspected glaucoma. Most of the patients were tested with threshold strategies: FastPac 24-2 or 30-2 (88.9%), Full Threshold (0.7%), and 10-2 (0.5%). Short-wavelength automated perimetry (SWAP) was performed in only 7 patients. The Swedish Interactive Testing Algorithm (SITA) was applied in 1.0% of cases. More than half (56.8%) of the population had a reliability score of 1, and 22.7% had a score of 2, indicating a valid, clinically useful result for 79.4% of patients. Linear regression analyses indicated that the mean deviation (MD) was a better predictor of the visual field score than the corrected pattern standard deviation (CPSD), for the entire group and for each visual field score subgroup.


Introduction
Visual field evaluation is critical in glaucoma and neuro-ophthalmic diagnosis and follow-up. All the major studies of the last decade or so, including EMGT [1], AGIS [2], OHTS [3], and CIGTS [4], have used visual field criteria to establish eligibility and for follow-up. Each of those studies created its own scoring system and its own criteria for defining progression. In all cases the reliability requirements were high, and the patients needed to be available for fairly frequent exams.
In the real world of clinical practice, our patients are often less than perfect candidates: they may not return exactly on schedule for their exams, and the community-based ophthalmologists may not all be "up to speed" on perimetry interpretation. The study scoring systems are not always available or practical for everyday clinical use.
In our country, most visual field examinations are referred to diagnostic centers. The Mor Institute provides comprehensive state-of-the-art diagnostic services to 500,000 patients per year. It receives referrals from about 150 ophthalmologists nationwide and performs more than 10,000 perimetry examinations annually. One of three glaucoma specialists reads the printouts and returns a written evaluation to the referring physician.
We sought to utilize this large group of unselected patients' examinations to gain insight into real everyday practice and to determine how to improve test results and patient care as well as physician practice patterns. To the best of our knowledge, no similar survey of this scale has been reported to date.

Table 1: Visual field scoring system.
Score | Degree of damage | Description
1 | No damage | Normal visual field
2 | Mild damage | Enlarged blind spot, slight nasal or temporal depression, mild peripheral constriction, and mild diffuse or nonspecific changes
3 | Moderate damage | Changes in one quadrant or moderate nasal depression, relative central, or cecocentral scotomas
4 | Advanced damage | Changes in two quadrants above or below the midline or advanced nasal depression. Deep central or cecocentral depression
5 | Severe damage | Significant defects above and below the midline
6 | End stage | Tubular field with or without a temporal island

Materials and Methods
The present prospective study included all automated perimetry examinations conducted at the Mor Institute during one month (August 2002). The participating technicians were experienced and had been trained both to monitor fixation and to encourage the patients during testing. The printouts were read by three of the authors (L. Zborowski-Naveh, M. Lusky, and D. D. Gaton). The readers evaluated the printout as usual and sent a letter with the results to the referring physician. They then completed the survey questionnaire. The right eye or the only eye of each patient was included in the survey.
The survey queried four categories of information: patient-related characteristics (age, gender, and referring diagnosis); physician-related characteristics (test algorithm and test target size requested); test results (reliability score, reasons for loss of reliability, visual field score, mean deviation, and corrected pattern standard deviation); and longitudinal data, where applicable (number of previous examinations, change in reliability score, and change in visual field score).
Reliability was scored on a scale of 1 (excellent) to 5 (uninterpretable). The reader assigned a score on the basis of four parameters: fixation loss, false-positive errors, false-negative errors, and white scotomas. For any field with a reliability score greater than 1, the reason was cited as any of the above, alone or in combination.
A linear visual field scoring system was designed to correspond to practical clinical usage. The criteria for score assignment are provided in Table 1.
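For readers who want to tabulate scores programmatically, the six-level scale of Table 1 can be sketched as a simple lookup. This is only an illustration: the dictionary layout and the function name `damage_degree` are assumptions introduced here and are not part of the study's methods.

```python
# Visual field damage scores as listed in Table 1.
# The dictionary layout and function name are illustrative assumptions.
FIELD_SCORES = {
    1: ("No damage", "Normal visual field"),
    2: ("Mild damage", "Enlarged blind spot, slight nasal or temporal "
        "depression, mild peripheral constriction, and mild diffuse or "
        "nonspecific changes"),
    3: ("Moderate damage", "Changes in one quadrant or moderate nasal "
        "depression, relative central, or cecocentral scotomas"),
    4: ("Advanced damage", "Changes in two quadrants above or below the "
        "midline or advanced nasal depression. Deep central or cecocentral "
        "depression"),
    5: ("Severe damage", "Significant defects above and below the midline"),
    6: ("End stage", "Tubular field with or without a temporal island"),
}


def damage_degree(score: int) -> str:
    """Return the degree-of-damage label for a visual field score (1-6)."""
    if score not in FIELD_SCORES:
        raise ValueError(f"visual field score must be 1-6, got {score}")
    return FIELD_SCORES[score][0]
```

Because the scale is ordinal and linear, higher integers always denote more damage, which is what allows mean scores to be compared across groups.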

Data Analysis.
Data were analyzed with SPSS for Windows, version 15.0.1 (SPSS Inc., Chicago, IL, USA). For continuous variables (age and visual field score), descriptive statistics were calculated and reported as mean ± standard deviation. Categorical variables (specific diagnosis and patient gender) were described using frequency distributions. Continuous variables were compared by reader and by age group using one-way analysis of variance (ANOVA). Continuous variables were compared by diagnostic group (glaucoma versus ocular hypertension) and by gender using the t-test for independent samples. Associations between continuous variables and age were analyzed with Pearson's correlation coefficients. Previous test results were compared to present results using the t-test for paired samples. All tests were two-sided and considered significant at .
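The analyses themselves were run in SPSS. As a rough illustration of what three of the named statistics compute, the following standard-library Python sketch implements Pearson's correlation coefficient and the t statistics for independent and paired samples. The function names are hypothetical, and the pooled-variance form of the independent-samples test is an assumption, since the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def independent_t(a, b):
    """Student's t statistic for two independent samples.

    Assumes equal variances (pooled estimate); this choice is an assumption.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def paired_t(before, after):
    """t statistic for paired samples, based on the within-pair differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))
```

Each function returns only the test statistic; converting it to a two-sided P value requires the t distribution with the appropriate degrees of freedom, which a statistics package supplies.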

Results
During the study period, 1041 visual field records were obtained.

Patient-Related Data.
The mean age of the surveyed population was 61.87 (±17.10) years (range 6-93). Only 18.2% were younger than 50 years; 70.8% were aged 50 to 79 years, and 11.9% were aged 80 years or more. Significant differences were noted in the proportion and age of men and women referred for testing. Women comprised 56.6% of the survey population and were of mean age 59.95 (±17.42) years. Men comprised 43.4% of the population and were of mean age 64.32 (±16.38) years. There were more women than men in each decade of life except the ninth ( ). The distribution of referring diagnoses by number of patients and mean age is shown in Table 2. The large majority of patients (70.4%) were referred for reasons related to glaucoma: known glaucoma, ocular hypertension (OHT), or suspected glaucoma. There was a significant difference in mean patient age among the diagnostic groups ( ), with the youngest patients referred mainly for neurology/headache and myopia. Clinically, the most significant variation was the 5-year difference between the glaucoma and the OHT/suspected glaucoma groups ( ). Forty-two of the 395 patients (10.6%) with OHT/suspected glaucoma were aged 50 years or less, and an additional 5 were aged 51 to 55 years.

Physician-Related Data.
The referring physician specified the test algorithm to be used. Most of the patients were tested with threshold strategies: FastPac 24-2 or 30-2 (88.9%), full threshold (0.7%), and 10-2 programs (0.5%). The 3-zone 120-point screening program was specified by 6.7% of physicians, and the macula program with red and white targets by 1.1%. Short-wavelength (blue-yellow) automated perimetry (SWAP) was performed in only 7 patients; 6 of them had OHT/suspected glaucoma and were aged 55 years or less. The Swedish interactive testing algorithm (SITA) was applied in 1.0% of cases.

Test Results
Reliability.
The mean reliability score was 1.68. There was a small but statistically significant difference in mean score between men (1.71) and women (1.66) ( ). The tests of more than half (56.8%) the survey population had a reliability score of 1, and 22.7% had a score of 2, indicating a valid result for 79.4% of patients. An additional 15.8% had a reliability score of 3, which can at times still provide clinically useful information. Reliability was correlated with patient age. When children were excluded, we found a steady decrease in the proportion of "excellent" reliability test scores (score 1 by age group), from 91.7% for patients in the third decade to 47.1% for patients in the ninth decade ( ). The decrease in reliability with age was sustained when we grouped patients with test scores 1 or 2 together. Nevertheless, even in the over-80-year group, 72.6% of patients had a valid test result (Figure 1). The most common reason cited by the readers for low test reliability was loss of fixation (90.3% of cases), followed by false-negatives (25%) and false-positives (7.0%).

Visual Field Score.
Each field was assigned a damage score, and each of the glaucoma specialists (L. Zborowski-Naveh, M. Lusky, and D. D. Gaton) read about one-third of the fields. No statistically significant difference was found among the mean scores of the 3 readers (Table 3). The mean score for the entire group was 2.01. The distribution of visual field scores is shown in Figure 2. Table 4 summarizes the mean visual field scores by diagnosis. The myopia group (which was also the youngest) had the highest score, and the hydroxychloroquine group the lowest. Although the difference in mean score between the glaucoma group (2.28) and the OHT/suspected glaucoma group (1.92) was not large, it was statistically significant ( ).

Correlation of Visual Field and Reliability Scores.
The visual field score was positively correlated with the reliability score ( 6): the lower the reliability, the greater the chance of a defective visual field.

Mean Deviation (MD).
The mean MD for the 935 size III threshold tests was −3.30 dB (±3.57 dB). The mean MD in the group of patients aged 40 years or less was − 68 ± 4 4 dB. However, from the fifth through the ninth decade, there was a steady decrease in MD, from −2.3 dB to −4.24 dB ( ).

Corrected Pattern Standard Deviation (CPSD).
The mean CPSD for the 935 size III threshold tests was 1.94 dB. A nonsignificant trend toward an increase in CPSD with age was observed. There was a significant correlation of the MD with the CPSD ( , ), and of both the MD and the CPSD with the visual field score ( , and .54, , resp.). Linear regression analyses indicated, however, that the MD was a better predictor of the visual field score than the CPSD, for the entire group and for each visual field score subgroup.
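One common way to compare two candidate predictors of the same outcome is to fit a separate single-predictor linear regression for each and compare the coefficients of determination (R²). The sketch below illustrates that idea with the Python standard library only. It is not the authors' SPSS procedure, the function names are hypothetical, and the sample values are made up purely to exercise the code.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of the least-squares line y ~ x (equals the squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def better_predictor(candidates, outcome):
    """Return the name of the candidate predictor with the highest R^2."""
    return max(candidates, key=lambda name: r_squared(candidates[name], outcome))


# Hypothetical values for illustration only; not data from the survey.
md_db = [-1.0, -2.5, -4.0, -6.5, -9.0]
cpsd_db = [1.2, 1.0, 2.5, 2.0, 3.5]
field_score = [1, 2, 3, 4, 5]

best = better_predictor({"MD": md_db, "CPSD": cpsd_db}, field_score)
```

With a single predictor, R² is the share of variance in the visual field score explained by that index, so the index with the larger R² is the better linear predictor in this simple sense.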

Longitudinal Data.
A previous visual field test was available for comparison for 608 of the 1041 patients (58.1%) (Table 5). Analysis of the number of patients in whom the disease progressed yielded a significant difference between the glaucoma and the OHT/suspected glaucoma groups (25% versus 15.8%, resp., ). In 505 of the 608 cases, the previous or the first in a series of visual field exams was available. The current mean reliability score for this subgroup was 1.75. The mean reliability score assigned to the earlier visual field test was 1.63. This difference was not statistically significant, suggesting the absence of a "learning effect" for reliability.

Discussion
This survey is unique in the literature of perimetry in that it included a very large population of entirely unselected cases. Our purpose was to extract useful clinical information that could be applied to daily practice. All other studies were either limited to small groups without pathology [5] or selected study groups with exclusion of unreliable fields [1][2][3][4][5][6][7].
The finding that glaucoma was the main reason for the test (70%) was expected. The progressive worsening of the MD with age has also been described in the literature [5]. Other results were less obvious.

Demographics.
Female patients accounted for 56.6% of the whole population; however, this rate is only slightly higher than the proportion of women in the general population (53.9% for those 45 years old or more) [8]. The mean age of the women was significantly lower than that of the men (59.95 and 64.32 years, resp.), which may reflect an actual gender-related difference in the age distribution of patients with these pathologies. Some of the earlier population-based studies of the incidence or prevalence of glaucoma [9][10][11] did not show any gender- or age-related difference, whereas others reported a higher prevalence in men [12][13][14]. Only in the Hispanic population in the United States [15] was a higher prevalence noted in women.
Our finding may also be explained by a bias in diagnostic referrals or a difference in utilization of primary health care services. The possibility that the difference is a result of chance cannot be entirely discounted either. This question merits further study.
The higher mean age (by more than 5 years) of the glaucoma group compared with the OHT/suspected glaucoma group, combined with the significantly higher visual field damage score in the patients with glaucoma, supports the notion that glaucoma is preceded either by OHT or by suspicious disc changes.

Algorithm Requested.
The low rate of requests for the SITA program (1%) suggests a certain degree of conservatism among our referring physicians and highlights an important area in which educational activities may be applied.
The SWAP program was used in only 7 patients, although we identified 47 patients with OHT or suspected glaucoma who were 55 years old or less at the time of examination. SWAP testing can detect visual field damage earlier than standard white-on-white perimetry [16], and it is particularly suitable for younger patients because of their low incidence of nuclear sclerosis. Yet it was not applied in most of the patients in our population who might have benefited from it, possibly owing to a lack of clinician awareness at the time of patient recruitment to the study.

Reliability and Visual Field.
About 80% of our random, unselected population were able to cooperate well enough for the clinician to obtain a reliable field (Figure 1). Test reliability in an additional 15% was rated intermediate, that is, sufficient for the clinician to obtain partially useful, if not entirely accurate, information. Even the group of patients in the ninth decade of life had a 72.6% rate of valid field results. These data suggest that clinicians should not hesitate to test the elderly with automated perimetry. The Ocular Hypertension Treatment Study (OHTS) [3] reported better reliability results than ours, with 79% of patients having reliable fields by strict criteria and 97% by slightly more liberal criteria, corresponding roughly to our levels 1 and 2. Although the trend in our study was similar, the difference in reliability was probably attributable to the more restricted patient population in the OHTS.
As in the OHTS, the most common reason for loss of reliability in our study was fixation loss. Accordingly, reliability correlated with the visual field score. These findings suggest a confounding effect of reliability on test results. Therefore, focusing efforts on improving fixation of the examinees will result in better reliability and "cleaner" test results.
Patients with advanced field loss may find it difficult to maintain fixation, and even slight movements at the margins of deep scotomas can easily produce false-negative results. This might explain some of the shared variability in reliability and damage. In addition, a true decrease in reliability can be the cause of a falsely poor result, producing a confounding effect. Therefore, it is important to improve reliability for each examination, especially in patients with advanced field loss.
Previous studies have described a learning effect of repeated visual field testing on test results, even when the initial reliability was good [17][18][19]. Our data, however, do not support a specific learning effect. In general, reliability or lack thereof persisted from the first test to the later one. We do not suggest that repeated attempts should not be made to improve test reliability, only that these attempts may not always succeed as intended. Learning effects may be stronger if the examinations are closely spaced in time. Further analysis of the time elapsing between repeated examinations should be made in future studies.
The MD and the CPSD both correlated with the visual field score, but the MD was a better predictor. This is contrary to our expectations of a lower predictive value of the MD, given its greater sensitivity to lens changes or refractive errors than the CPSD [10,20], at least at the level of early to moderate field damage.

Implications for Physicians.
This study was not designed to evaluate the criteria of the community physicians for requesting perimetry. However, the fact that 42.5% of the fields were normal might suggest that the "threshold" for referral was fairly low and that a healthy degree of suspicion was being maintained.
In our survey, the referring physician assigned the diagnosis; consequently, there was no standardization of the diagnostic criteria or diagnostic confirmation. Nevertheless, we found a clear and significant difference in the mean visual field score between the glaucoma group (2.28) and the OHT/suspected glaucoma group (1.92), in addition to a significantly higher rate of deterioration in the glaucoma group (25% versus 15.8%). The findings were not necessarily confirmed by repeated fields, but the statistics of progression could imply that more aggressive treatment of glaucoma in the community is needed. This is consistent with the results of the Advanced Glaucoma Intervention Study (AGIS) [21] and others [13,22], which demonstrated a benefit for consistently lower pressure in preventing further field loss in patients with glaucoma.

Summary
In summary, we describe the results of a large-scale survey of unselected patients undergoing automated perimetry. We were able to gain significant insight into reliability according to patient age and, perhaps, the behavior of disease progression in this group of patients as well as into visual field scores according to referring clinical diagnosis. In addition, the actual practice patterns of a large and representative group of ophthalmologists were delineated.