How Accurately Can Prostate Gland Imaging Measure the Prostate Gland Volume? Results of a Systematic Review

Aim: The measurement of the volume of the prostate gland can influence many clinical decisions. Various imaging methods have been used to measure it. Our aim was to conduct the first systematic review of their accuracy. Methods: The literature describing the accuracy of imaging methods for measuring the prostate gland volume was systematically reviewed. Articles were included if they compared volume measurements obtained by medical imaging with a reference volume measurement obtained after removal of the gland by radical prostatectomy. Correlation and concordance statistics were summarised. Results: 28 articles describing 7768 patients were identified. The imaging methods were ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI). Wide variations were noted, but most articles about US and CT reported correlation coefficients between 0.70 and 0.90, while those describing MRI suggested slightly higher accuracy, at 0.80-0.96. Where concordance was reported, it was similar; over- and underestimation of the prostate volume were variably reported. Most studies showed evidence of at least moderate bias, and the quality of the studies was highly variable. Discussion: The reported correlations were moderate to high in strength, indicating that imaging is sufficiently accurate when quantitative measurements of prostate gland volume are required. MRI was slightly more accurate than the other methods.


Introduction
There are many clinical situations in the management of prostate diseases in which measurement of the prostate gland volume (PGV) has a role [1][2][3]. In some of these, the measurement does not need a high level of accuracy, and simply detecting that the prostate is enlarged can be sufficient, for example, when a general practitioner is choosing medication to treat benign prostatic hyperplasia (BPH). More precise measurements of the PGV may be required in other situations, for example, to calculate prostate specific antigen (PSA) density. For radiation oncologists, the PGV is used to determine the suitability of prostate cancer patients for low dose rate brachytherapy and the number of brachytherapy seeds to order. In those situations, a more accurate measure of the PGV is required, and it is usually obtained by medical imaging.
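PSA density is conventionally calculated as the serum PSA concentration divided by the PGV, so any relative error in the imaged volume passes directly into the density. The minimal Python sketch below illustrates this; the function name and the example values are hypothetical and are not taken from any reviewed study.

```python
def psa_density(psa_ng_per_ml: float, pgv_ml: float) -> float:
    """PSA density = serum PSA (ng/mL) / prostate gland volume (mL)."""
    if pgv_ml <= 0:
        raise ValueError("prostate gland volume must be positive")
    return psa_ng_per_ml / pgv_ml


# Hypothetical patient: PSA 6.0 ng/mL, true PGV 40 mL.
true_density = psa_density(6.0, 40.0)  # 0.15
# The same patient if imaging underestimates the volume by 20% (32 mL).
underestimated_density = psa_density(6.0, 32.0)  # 0.1875
# A 20% volume underestimate inflates the density by 1/0.8 - 1 = 25%.
relative_inflation = underestimated_density / true_density - 1
```

Because the volume is the denominator, underestimating the PGV always inflates the calculated PSA density, and overestimating it always deflates it.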
A number of imaging methods have been used to estimate the PGV, including ultrasound (US), performed either transrectally or suprapubically (TRUS, SPUS), computed tomography (CT), and magnetic resonance imaging (MRI). Although many publications have described their accuracy, these have never been systematically reviewed, making the methods difficult to compare. Our aim was to review the literature to determine whether imaging is accurate enough to measure the PGV in a planned future study of the effects of neoadjuvant androgen deprivation therapy (NADT). Tools for reporting reviews were followed where relevant [4][5][6]. The proposal for the review was submitted for registration to PROSPERO [7], but the review was completed before a response was received. Ethics committee approval was not required, and no funding was obtained for this study.
The patient populations studied were men undergoing imaging of the prostate for any reason, including those attending health services for prostate conditions. The interventions reviewed were US, CT, and MRI, recognising that there are variations in the way each of these can be used to measure the PGV. All study designs were considered, and the outcome was any quantitative measure of accuracy when compared against the reference standard, namely in vitro measurement of the PGV after radical prostatectomy.
Multiple medical literature databases, including CINAHL Plus, Embase, Medline, PubMed, and ScienceDirect, were searched in August 2018 for abstracts containing the terms "prostate volume" and "imaging OR US OR CT OR MRI" and "prostatectomy". No other review protocol or similar previous publication existed. Titles and abstracts were reviewed by both authors, and relevant full-text articles were obtained for further review. The results were then tabulated so that the range of results could be seen, including correlations, concordance, and tendencies to over- or underestimate. For each study, the date of publication, the number of patients, and the average age of the patients were tabulated.
Although relevant articles had been published over a period of more than 50 years, we arbitrarily adopted a time limit of 22 years (since 1995), as we assumed that the extensive developments in imaging and reference technology would make articles published before then less relevant. Titles published only in abstract form, or relating to animal studies, were also excluded. Several articles have compared the accuracy of other, less invasive imaging methods, including SPUS, transperineal US, CT, and MRI, with TRUS. However, unless these involved a comparison against an in vitro reference method, they were not considered further here. For the same reason, we excluded several articles that compared different formulae used to calculate the PGV from standard imaging measurements [8][9][10] and one study that compared in vivo and ex vivo MRI measurements (all showing high correlation) [11]. We also excluded many articles describing other aspects of the measurement of the PGV, such as interobserver variation or the ability to detect disease.
No source data were extracted for meta-analysis, and an assessment of publication bias was not considered necessary. However, the tools for reporting reviews, particularly the QUADAS-2 tool, encourage review authors to develop review-specific bias and quality assessments [6]. We considered that the authors of each study might report more favourable results if they performed most of the imaging themselves, or if those undertaking the reference measurement were not blinded to the results of the imaging. A bias score was therefore derived from these two factors (total score 0-2, a higher score indicating greater potential for bias). The quality of each study was also assessed by considering the imaging measurement (use of either a planimetric calculation or an autosegmentation method), the reference measurement (use of a fresh specimen with the seminal vesicles removed), the number of patients (more than 50), and whether both concordance and correlation were reported (total score 0-4, a higher score indicating higher quality).
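The two review-specific scores described above can be sketched in Python. This is a minimal illustration, assuming one point per factor or criterion (the text gives only the score ranges, 0-2 and 0-4) and treating "blinding not stated" as unblinded; the `StudyAssessment` class and its field names are hypothetical, not the authors' extraction form.

```python
from dataclasses import dataclass


@dataclass
class StudyAssessment:
    # Hypothetical field names, one per criterion described in the text.
    authors_did_imaging: bool          # authors performed most of the imaging themselves
    reference_blinded: bool            # reference measurement stated to be blinded to imaging
    planimetric_or_autoseg: bool       # imaging volume by planimetry or autosegmentation
    fresh_specimen_sv_removed: bool    # fresh specimen with seminal vesicles removed
    n_patients: int
    reports_correlation: bool
    reports_concordance: bool


def bias_score(s: StudyAssessment) -> int:
    """Total 0-2; higher = greater potential for bias (1 point per factor, assumed)."""
    return int(s.authors_did_imaging) + int(not s.reference_blinded)


def quality_score(s: StudyAssessment) -> int:
    """Total 0-4; higher = higher quality (1 point per criterion, assumed)."""
    return (
        int(s.planimetric_or_autoseg)
        + int(s.fresh_specimen_sv_removed)
        + int(s.n_patients > 50)
        + int(s.reports_correlation and s.reports_concordance)
    )


# Hypothetical study: authors did their own imaging, no blinding stated,
# planimetric volumes, formalin-fixed specimen, 120 patients, correlation only.
example = StudyAssessment(
    authors_did_imaging=True,
    reference_blinded=False,
    planimetric_or_autoseg=True,
    fresh_specimen_sv_removed=False,
    n_patients=120,
    reports_correlation=True,
    reports_concordance=False,
)
```

For this hypothetical study, `bias_score(example)` gives 2 (maximum potential bias) and `quality_score(example)` gives 2 of a possible 4.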

Results
The search strategy initially generated 758 titles. Selected abstracts were reviewed blindly by both authors, and 57 were considered relevant. Full-text versions of these articles were obtained, but only 11 had usable data. Secondary searching through 43 titles generated a further 17 articles, giving a total of 28 articles. Some of these reported measurements from more than one imaging method, describing a total of 33 comparisons between the PGV measured by an imaging method and by the reference method. The search strategy is described in Figure 1.
The 28 articles described studies with a wide range of sample sizes (5 to 1844 patients), with a combined total of 7768 patients. The patients were from countries all over the world, mostly the USA and Korea, but also five different European countries and Australia. The publication dates were spread across the whole period from 1995 to 2018. The results were tabulated by imaging method, as shown in Tables 1 (US), 2 (CT), and 3 (MRI). Ages, weights, and volumes were rounded to the nearest whole number.
Two articles included both US and CT, and these appear in both Tables 1 and 2 [26,28]. Four articles included both US and MRI; in three of these, both imaging methods were compared with the reference standard, so all three appear in both Tables 1 and 3 [20,22,29]. In the fourth article, the TRUS measurements were not compared with a reference standard, so its results appear only in the MRI table, Table 3 [39].
The 18 articles relating to the use of US are shown in Table 1. They were published between 1995 and 2016 and included a total of 4792 patients. All of these used TRUS, and two also used SPUS [26,28]. The correlation coefficients most commonly fell in the range of 0.70-0.90, indicating high levels of correlation.
Only two articles related to the use of CT [26,28]. They involved 223 patients in total and were published in 2013 and 2014. Both also included results for TRUS, as shown in Table 2. Only one of these [28] reported a correlation coefficient, at 0.78. Both indicated that the CT volumes were generally larger than the TRUS volumes and less accurate. Both also assessed SPUS and found little difference between SPUS and TRUS.
There were 13 articles that related to the use of MRI, as shown in Table 3. They included 3388 patients and were published between 2003 and 2018. Correlation coefficients commonly lay between 0.80 and 0.96, a slightly higher range than for TRUS and CT. Four articles that described both MRI and TRUS all indicated slightly better results for MRI [13,20,22,29].
While reviewing the articles, we made various observations about the methods used. The articles often applied geometric terms to describe the shape of the prostate in order to calculate the PGV from each imaging method. The term "ellipsoid" was often used, meaning a three-dimensional shape with three perpendicular axes. The term "spheroid" was sometimes used, meaning that two of the axes are equal in length. The term "prolate spheroid" was also sometimes used, meaning that these two equal axes are shorter than the third axis (a rugby-ball shape). To convert the measurements of the three axes to a volume, the ellipsoid calculation (EC) was often made by applying the standard formula (height × length × width × π/6). A wide variety of modifications of this formula were used. Other articles used a planimetric calculation (PC, or volumetry), which involves contouring the periphery of the gland on consecutive 3-5 mm slices, either axial or sagittal, and summing the slice volumes (each contoured area multiplied by the slice thickness).
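The two calculation approaches described above can be contrasted in a short Python sketch. It implements only the unmodified ellipsoid formula (not the modifications used in individual articles) and a basic slice-summation form of planimetry that assumes contiguous slices with no gap. The function names and example dimensions are hypothetical, and axes are assumed to be in centimetres so that volumes come out in millilitres (1 cm³ = 1 mL).

```python
import math
from typing import Sequence


def ellipsoid_volume_ml(height_cm: float, length_cm: float, width_cm: float) -> float:
    """Ellipsoid calculation (EC): height x length x width x pi/6 (cm^3 = mL)."""
    return height_cm * length_cm * width_cm * math.pi / 6


def planimetric_volume_ml(slice_areas_cm2: Sequence[float], slice_thickness_cm: float) -> float:
    """Planimetric calculation (PC): sum over slices of contoured area x slice thickness.

    Simple step-section summation; assumes contiguous slices with no interslice gap.
    """
    return sum(area * slice_thickness_cm for area in slice_areas_cm2)


# Hypothetical measurements, for illustration only.
ec = ellipsoid_volume_ml(4.0, 5.0, 3.0)  # 60 x pi / 6 = 10 x pi, about 31.4 mL
pc = planimetric_volume_ml([2.0, 8.0, 12.0, 10.0, 4.0], slice_thickness_cm=0.5)  # 18.0 mL
```

The EC reduces the gland to three diameters and a fixed shape factor (π/6 ≈ 0.52), whereas the PC follows the contoured outline slice by slice, which is why it does not depend on the gland being ellipsoidal.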
The reference tests were laboratory (in vitro) assessments of prostatectomy specimens, which could be analysed either by weighing the specimen or by measuring displacement. Specimens were weighed either fresh or after formalin fixation. In some articles, the specimen was weighed after removal of fat, seminal vesicles, or remnants of the vasa deferentia. Some articles subtracted a standard weight for the seminal vesicles from the prostate weight, an approach that might be expected to be less accurate in unusually large or small prostates. Also, in some articles, the weight of the prostate was converted to a volume by applying a standard value for the specific gravity of prostate tissue (1.05 g/mL). In some articles, the volume was determined by fluid displacement or by measuring the maximum dimensions and using these to calculate an ellipsoid. These variations in the imaging and reference tests were recorded in the tables, and they appeared to make little or no difference to the accuracy measures.
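The weight-to-volume conversion used by some articles can be sketched as follows. This is a minimal illustration assuming the 1.05 g/mL specific gravity quoted above; the optional fixed seminal vesicle subtraction mirrors the practice described, but the function name and the 7.5 g value in the example are hypothetical and are not taken from any reviewed article.

```python
SPECIFIC_GRAVITY_G_PER_ML = 1.05  # standard value for prostate tissue quoted in the text


def specimen_volume_ml(specimen_weight_g: float, subtracted_sv_weight_g: float = 0.0) -> float:
    """Convert a prostatectomy specimen weight (g) to a volume (mL).

    Optionally subtracts a fixed seminal vesicle weight first; the size of that
    fixed weight is a hypothetical input in this sketch.
    """
    prostate_weight_g = specimen_weight_g - subtracted_sv_weight_g
    if prostate_weight_g <= 0:
        raise ValueError("subtracted weight must be less than the specimen weight")
    return prostate_weight_g / SPECIFIC_GRAVITY_G_PER_ML


# Hypothetical 60 g specimen with an assumed 7.5 g subtracted for the seminal vesicles.
vol = specimen_volume_ml(60.0, subtracted_sv_weight_g=7.5)  # 52.5 g / 1.05 g/mL, about 50 mL
```

Because the subtraction is a constant, any mismatch between the assumed and actual seminal vesicle weight produces the same absolute volume error whatever the gland size, so its relative effect is largest for small glands.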
The bias and quality scores revealed that no article was completely free of bias: in nearly all of the articles, the authors conducted the imaging assessment themselves, and it was rarely stated that those undertaking the reference measurement were blinded to the results of the imaging measurement. Quality scores generally improved with the date of publication. There was no indication that bias or quality had a major influence on the reported accuracy of the imaging methods used for PGV measurement.

Discussion
We found that no previous review of this topic had been performed and that the accuracy of imaging as a method of measuring the PGV was most commonly described by correlation statistics that were moderate to high, generally between 0.70 and 0.96. Overall, these results suggest that imaging is an accurate test for quantitatively measuring the PGV and could be used in a study of the effects of NADT. Of the various imaging methods, TRUS was the most commonly studied. It had been studied long before our cutoff date of 1995, but its accuracy could be expected to depend on technical factors such as the image acquisition time and the image resolution, which have improved over time. Immobilisation of the patient may also have improved, especially if the lithotomy position is used rather than the lateral decubitus position. There were only two CT articles, both of which suggested that CT overestimated the PGV. MRI articles only appeared from 2003 onwards, but MRI appeared slightly more accurate, including in all three articles that directly compared TRUS and MRI. TRUS could be expected to be more operator dependent than MRI, and TRUS measurements are likely to be affected more by pressure on the prostate from the balloon than MRI measurements are by an endorectal coil (ERC), although the ERC also involves a balloon that can affect the volume [40]. MRI software may include multifeature active shape models (MFAs), which provide an accurate, automated method of planimetric measurement [32]. The software may also include sophisticated mechanisms for aligning ex vivo images of the prostate with in vivo images, providing an additional means of assessing the PGV [41].
For the articles that described the EC method of volume measurement, findings were inconsistent about which planes or axes to use. Some showed that prostate dimensions measured on a midsagittal plane were more accurate than those measured on an axial plane, for both TRUS [22] and MRI [30], although an earlier TRUS study had found no difference [16]. Several articles showed that the PC method was more accurate than EC for TRUS and MRI [22,30,32,38]. When PC was performed by automated methods, it was just as accurate as, and faster than, manual PC [32,33,39].
Regarding the tendency to over- or underestimate the PGV, seven articles described this tendency without dividing the patients into those with larger or smaller prostates, and they found mixed results. For TRUS, four comparisons showed underestimation and one showed overestimation. With CT, both showed overestimation, while with MRI, four showed underestimation. Four articles divided patients into those above or below the median value; three found that imaging tended to overestimate smaller glands and underestimate larger glands, while the remaining one found the reverse. Underestimation of larger PGVs was the most consistent finding. The optimal way to assess how over- and underestimation vary with volume is with Bland-Altman statistical methods, as these can show how the pattern changes across the range of volumes [42,43]. Few articles in this review used this method [32,36].
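The two ways of examining over- and underestimation discussed above can be sketched in Python. The first function computes standard Bland-Altman quantities (per-patient means and differences, the mean difference or bias, and 95% limits of agreement at bias ± 1.96 SD); the second reproduces a median split. Splitting on the reference volume is an assumption, because the articles' choice of median is not specified here. Function names and data are hypothetical; the volumes are invented for illustration and are not data from any reviewed study. Plotting is omitted to keep the sketch dependency-free; a Bland-Altman plot would show `diffs` against `means`.

```python
import statistics
from typing import Sequence


def bland_altman(imaging_ml: Sequence[float], reference_ml: Sequence[float]):
    """Return per-patient means and differences (imaging - reference), the mean
    difference (bias), and the 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [i - r for i, r in zip(imaging_ml, reference_ml)]
    means = [(i + r) / 2 for i, r in zip(imaging_ml, reference_ml)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def mean_difference_by_median(imaging_ml: Sequence[float], reference_ml: Sequence[float]):
    """Mean imaging-minus-reference difference for glands at or below, and above,
    the median reference volume (splitting on the reference is an assumption)."""
    median = statistics.median(reference_ml)
    smaller = [i - r for i, r in zip(imaging_ml, reference_ml) if r <= median]
    larger = [i - r for i, r in zip(imaging_ml, reference_ml) if r > median]
    return statistics.mean(smaller), statistics.mean(larger)


# Invented volumes (mL), for illustration only.
imaging = [22, 31, 38, 55, 60, 72]
reference = [20, 28, 36, 58, 66, 80]
means, diffs, bias, limits = bland_altman(imaging, reference)
small_mean, large_mean = mean_difference_by_median(imaging, reference)
# A positive small_mean and a negative large_mean indicate overestimation of
# smaller glands and underestimation of larger ones.
```

The overall bias here is close to zero (about -1.7 mL), yet the median split and the trend of `diffs` across `means` reveal opposite errors at the two ends of the volume range. A single summary figure can hide this pattern.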
Our review had some limitations. Firstly, the methods used to perform the imaging, to calculate the volume, and to compare it with the reference methods all varied widely, making it difficult to combine them. Secondly, there were variations in the reference test methods used, with many using specimen weight rather than volume. Thirdly, none of the articles were completely free of bias, and none achieved maximum potential quality. However, none of these limitations seem likely to affect the conclusions we have drawn.
Future studies of PGV measurement should use MRI with planimetric calculation when the highest level of accuracy is needed. Ideally, a 3-tesla scanner would be used to achieve optimal image quality, without an ERC, as an ERC can distort the PGV. The volumes of individual zones within the prostate could also be studied, as these can be affected differently by different diseases and treatments. When a method of PGV measurement is being assessed, multiple operators and blinding should be incorporated to avoid bias. The reference method would ideally involve assessment of the PGV by displacement as soon as the prostate is removed, avoiding the effects of shrinkage during fixation and the need for a volume conversion factor when weight is used. Extraneous tissue should be removed, including the seminal vesicles and remnants of the vasa deferentia. Measures of both correlation and concordance should be reported, and Bland-Altman plots should be presented to show agreement graphically, including under- and overestimation.
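A short Python sketch illustrates why correlation and concordance should both be reported. Pearson's r measures only how closely paired volumes follow a straight line. Lin's concordance correlation coefficient is one common concordance statistic (the reviewed articles may have used others), and it also penalises systematic over- or underestimation. The function names are hypothetical and the data are invented.

```python
import math
from typing import Sequence


def _moments(x: Sequence[float], y: Sequence[float]):
    """Means, population variances and covariance of paired measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: strength of the linear association only."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / math.sqrt(sxx * syy)


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient: 2*sxy / (sxx + syy + (mx - my)^2),
    which also penalises systematic differences in location and scale."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Invented example: imaging reads every gland 20% larger than the reference.
reference = [20.0, 30.0, 40.0, 50.0, 60.0]
imaging = [1.2 * v for v in reference]
r = pearson_r(reference, imaging)    # about 1.0: perfect linear correlation
ccc = lins_ccc(reference, imaging)   # 480 / 552, about 0.87: imperfect agreement
```

A method that consistently overestimates by 20% would appear perfect if only correlation were reported, whereas the concordance coefficient exposes the systematic error.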

Conclusions
Our study suggests that the use of imaging to measure the PGV is still a topic of significant interest and that no previous systematic reviews have been undertaken. The correlations between the PGV measured by imaging and by the reference methods mostly ranged from 0.70 to 0.96, which is accurate enough for some of the purposes that require quantitative PGV measurement. MRI was slightly more accurate than the other methods.