Measurement of Weight in Clinical Trials: Is One Day Enough?

Background. Weight is typically measured on a single day in research studies. This practice assumes negligible day-to-day weight variability, although little evidence exists to support this assumption. We compared the precision of measuring weight on one versus two days among control participants in the Weight Loss Maintenance trial. Methods. Trained staff measured weight on two separate days at baseline, 12 months, and 30 months (2004–2007). We calculated the standard deviation (SD) of mean weight change from baseline to the 12- and 30-month visits using (a) the first and (b) both daily weights from each visit and conducted a variance components analysis (2009). Results. Of the 316 participants with follow-up measurements, mean (SD) age was 55.8 (8.5) years, BMI was 30.8 (4.5) kg/m2, 64% were women, 36% were black, and 50% were obese. At 12 months, the SD of mean weight change was 5.1 versus 5.0 kg using one versus two days of weight measurements (P = .76), while at 30 months the corresponding SDs were 6.3 and 6.3 kg (P = .98). We observed similar findings within subgroups of BMI, sex, and race. Day-to-day variability within individuals accounted for <1% of variability in weight. Conclusions. Measurement of weight on two separate days has no advantage over measurement on a single day in studies with well-standardized weight measurement protocols.


Introduction
Short- and long-term variability in outcomes can affect power in research studies. Weight is usually measured on a single day per study visit (i.e., time point in study: baseline and follow-up) in clinical trials of weight loss or maintenance. However, day-to-day variability in weight within an individual may decrease precision and thereby increase the necessary sample size. Because blood pressure shows this type of variability, sample size is optimized by using at least 3 sets of daily blood pressure measurements per study visit [1].
We aimed to evaluate the hypothesis that measurement of weight on one day is sufficiently precise compared to two days by (1) comparing the standard deviation of weight change based on one versus two days of measurements and (2) evaluating components of weight variability using data from the Weight Loss Maintenance (WLM) Trial [2].

Methods
Study Design.
WLM was an NHLBI-supported, 4-site, randomized controlled trial comparing 2 weight loss maintenance interventions to a self-directed control group in two phases: an initial 6-month weight-loss phase common to all participants (phase 1) and a randomized, 30-month weight-maintenance phase (phase 2). This study uses data from phase 2. Data collection occurred between 2004 and 2007. Methods and results have been published previously [2][3][4]. Participants had a body mass index (BMI) between 25 and 45 kg/m2, treated hypertension, hyperlipidemia, or both, access to a telephone and the Internet, no active cardiovascular disease, diabetes, or medical condition(s) precluding participation, absence of >9 kg weight loss in the prior 3 months, no use of weight loss medication, and no history of weight loss surgery [2]. Participants provided informed consent, and institutional review boards at each study site approved the protocol [2]. Participants were eligible for phase 2 if they lost ≥4 kg during the initial weight loss phase [2].
The current study is limited to the 316 (out of 342) control group participants who attended either the 12- or the 30-month follow-up visit in phase 2. This group received printed lifestyle guidelines (for diet and physical activity) while meeting with an interventionist at randomization [3] and after the 12-month visit [2].

Study Measurements.
Weight was measured during screening, at the conclusion of phase 1 (baseline), and at 12 and 30 months postrandomization into phase 2. Height was measured once during screening. Trained and certified staff used high-quality calibrated digital scales and calibrated stadiometers, and participants wore light clothes and no shoes [2].
On any given day, weight was measured as the average of two independent measurements. For the baseline and for the 12- and 30-month visits, this process was repeated on two separate days (day 1 and day 2). At baseline, these repeated measurements could have been separated by as much as 32 days, and for purposes of this analysis, we restricted the sample to participants whose two baseline weight measurements were taken within 21 days of each other. Only 5 (1.6%) out of a possible 314 participants were excluded from the 12-month analysis for this reason, and only 4 (1.4%) out of a possible 285 participants were excluded from the 30-month analysis. Because the baseline weights were taken during the period of active weight loss intervention when the cohort as a whole was still losing weight, we adjusted the initial (day 1) weights to have the same mean as the final (day 2) weights by adding to each person's day-1 weight the mean change in weight across all participants between days 1 and 2. For the 12- and 30-month visits, adjustment for temporal drift between days 1 and 2 was not needed since participants were not undergoing an active weight loss intervention. The mean differences between the two weights at 12 and 30 months were −0.02 and −0.03 kg, respectively.
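The baseline drift adjustment described above can be illustrated with a minimal sketch. The function name and the example weights are hypothetical and are not taken from the trial data or from the original SAS analysis; the sketch only assumes that each participant has one day-1 and one day-2 baseline weight.

```python
from statistics import mean


def adjust_day1_for_drift(day1, day2):
    """Shift day-1 baseline weights so that their mean equals the day-2 mean.

    day1, day2: lists of weights in kg, one entry per participant, same order.
    """
    # Mean change in weight across all participants between day 1 and day 2.
    mean_change = mean([d2 - d1 for d1, d2 in zip(day1, day2)])
    # Add the same cohort-level shift to every participant's day-1 weight.
    return [d1 + mean_change for d1 in day1]


# Hypothetical example values (kg), not data from the trial.
day1 = [90.0, 82.5, 101.0]
day2 = [89.4, 82.0, 100.3]
adjusted = adjust_day1_for_drift(day1, day2)
```

Because every participant receives the same shift, the adjustment aligns the two cohort means without changing the differences between participants on day 1.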
Participants were not required to fast, and weight measurement did not occur at a specific time of day. However, each participant was required to present in a fasting state for at least one day per visit which included laboratory testing; this visit would likely occur in the morning.

Statistical Methods.
Analyses were completed using SAS 9.2. Using (a) the first and (b) both daily weights from each visit, we calculated the mean and variance in weight change from baseline to 12 months and from baseline to 30 months. We summarized the variability in weight change as its standard deviation and used F-tests to compare the one-day and two-day estimates. We conducted similar analyses stratified by screening BMI (25–29.9 kg/m2 and ≥30 kg/m2), sex, and race (African American and non-African American). We also conducted a nested analysis of variance to estimate between-person, between-visit, and day-to-day (within-visit) variability in weight. Within-day variability was not formally assessed but was expected to be negligible. A P value <.05 was considered statistically significant. Analyses were performed in 2009.
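The one-day versus two-day comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the original SAS analysis: the function names and example weights are hypothetical, and the sketch stops at the variance ratio (the F statistic). A P value would additionally require the F distribution, which is not in the Python standard library.

```python
from statistics import mean, stdev


def weight_change_sd(baseline_days, followup_days, n_days):
    """SD across participants of weight change, using the first n_days
    daily weights from each visit.

    baseline_days, followup_days: lists of (day1, day2) tuples in kg,
    one tuple per participant, in the same order.
    """
    changes = [
        mean(f[:n_days]) - mean(b[:n_days])
        for b, f in zip(baseline_days, followup_days)
    ]
    return stdev(changes)


def variance_ratio(sd_a, sd_b):
    """F statistic: the larger variance divided by the smaller one."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical example values (kg), not data from the trial.
baseline = [(90.0, 89.8), (82.5, 82.9), (101.0, 100.6), (75.2, 75.0)]
month12 = [(92.1, 92.4), (80.0, 79.7), (104.5, 104.9), (76.0, 76.3)]

sd_one = weight_change_sd(baseline, month12, n_days=1)  # day 1 only
sd_two = weight_change_sd(baseline, month12, n_days=2)  # mean of days 1 and 2
f_stat = variance_ratio(sd_one, sd_two)
```

A variance ratio close to 1 indicates that averaging over two days does little to reduce the spread of weight change.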

Results
Although the standard deviation of weight change differed for measurements taken 12 and 30 months apart, it was essentially unchanged regardless of whether weight was measured as the mean of one or two days of measurements (Table 1). For example, the standard deviation for 30-month weight change for all participants was 6.3 kg when weight was measured on a single day (day 1) and 6.3 kg when weight was measured on two separate days (days 1 and 2, P = .98). We found no statistically significant differences overall or in any of the subgroups studied. Day-to-day variability (i.e., the variability of mean weight change based on a single day compared with that based on both days 1 and 2 of each visit) within individuals accounted for <1% of total variability in weight, while between-visit variability within individuals accounted for 6–17% of the variability in weight change depending on the subgroup (Table 1).
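The percentages above are shares of total variance. As a minimal sketch of how such shares could be estimated, the code below applies a standard method-of-moments (expected mean squares) decomposition to a balanced design with days nested within visits and visits nested within persons. The source does not describe the exact SAS procedure or model used, so the balanced design, the treatment of all levels as random, the function name, and the example weights are all assumptions made for illustration.

```python
from statistics import mean


def nested_variance_components(data):
    """Method-of-moments variance components for a balanced nested design.

    data[i][j][k] is the weight (kg) of person i at visit j on day k.
    Returns (person, visit_within_person, day_within_visit) variances.
    """
    n, v, d = len(data), len(data[0]), len(data[0][0])
    visit_means = [[mean(days) for days in person] for person in data]
    person_means = [mean(vm) for vm in visit_means]
    grand_mean = mean(person_means)

    # Mean squares for each level of the hierarchy.
    ms_person = v * d * sum(
        (pm - grand_mean) ** 2 for pm in person_means
    ) / (n - 1)
    ms_visit = d * sum(
        (visit_means[i][j] - person_means[i]) ** 2
        for i in range(n) for j in range(v)
    ) / (n * (v - 1))
    ms_day = sum(
        (data[i][j][k] - visit_means[i][j]) ** 2
        for i in range(n) for j in range(v) for k in range(d)
    ) / (n * v * (d - 1))

    # Solve the expected-mean-square equations; truncate negatives at 0.
    var_day = ms_day
    var_visit = max((ms_visit - ms_day) / d, 0.0)
    var_person = max((ms_person - ms_visit) / (v * d), 0.0)
    return var_person, var_visit, var_day


# Hypothetical weights (kg): 3 persons x 3 visits x 2 days, not trial data.
data = [
    [[90.0, 90.2], [92.0, 91.8], [94.1, 94.3]],
    [[80.0, 79.8], [79.0, 79.2], [83.0, 82.8]],
    [[100.0, 100.2], [104.0, 103.8], [103.0, 103.2]],
]
var_person, var_visit, var_day = nested_variance_components(data)
total = var_person + var_visit + var_day
share_day = var_day / total  # fraction of variance due to day-to-day changes
```

In this hypothetical data set the two daily weights within a visit differ by only 0.2 kg, so the day-to-day share of total variance is very small relative to the between-person and between-visit components.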

Discussion
In this study, which was conducted with a rigorous weight measurement protocol, the use of one versus two measurement days per study visit (time point) had essentially no impact on the variability of weight change in adults measured over 12 and 30 months. Correspondingly, day-to-day variability in weight accounted for less than 1% of the total variability in weight.
A limitation of the study is that the baseline weights were measured while the participants were still engaged in an active weight-loss intervention. We minimized the added variability in weight caused by this by adjusting for temporal drift and by including only participants whose day 1 and day 2 measurements at the baseline visit were taken within 21 days of each other. Also, we evaluated the contribution of day-to-day variability in weight to variability in weight change over 12 and 30 months. Day-to-day variability in weight may be of more significance in shorter studies, which limits the generalizability of our results to studies of brief duration. To our knowledge, this is the first study of its kind to evaluate day-to-day variability in weight as it applies to measurement in clinical studies. Especially pertinent is our subgroup analysis showing the stability of day-to-day weight variability across subgroups defined by BMI, sex, and race. Finally, WLM used a stringent weight measurement protocol that included calibrated digital scales, trained staff, and duplicate within-day measurements to minimize the probability of measurement or transcription error [2]. Observed day-to-day variability could be expected to be greater in studies with less stringent weight measurement protocols, though our findings suggest that such increased variability would likely be due to measurement error rather than to true day-to-day variability.
In conclusion, we show that in the setting of a well-standardized weight measurement protocol with negligible measurement error, there is little day-to-day variability in weight. Weight measurement on a single day is sufficiently precise for use in a clinical trial.