Computing the Average Body Mass Index: A Study with Systematic Sampling Using Auxiliary Information

Background. Body mass index (BMI) is widely used to measure body fat. Sometimes, during a clinical survey, measurements of different body parts may be available while the actual weight and height are not. In this article, we show a method to estimate the body mass index using the measurements of different body parts. Systematic sampling should be applied only if the given population is logically homogeneous, because systematic sample units are uniformly distributed over the population. Methods. The method of estimation for the mean of the study variable under systematic sampling using auxiliary information has been used to estimate the body mass index (BMI). We have also shown the effect of observational error on the estimation. The measurements of different body parts are taken as auxiliary variables. The correlation coefficient between BMI and the circumference of different body parts has been obtained. The efficacy of the methods in the estimation of BMI has been assessed in terms of mean square error. The observations available on different body parts are also assumed to be recorded with observational error. Thus, we propose a method of estimation of BMI in the presence of observational error. A simulation study has been conducted to demonstrate the effect of observational error on the estimation of body mass index. Results. The properties of the proposed estimation method have been derived under a large-sample approximation, and the conditions under which the proposed method is more efficient are found. We assume the presence of observational error in the study of 252 men. The efficiency of the difference estimators is better in the presence of observational error. Also, the presence of observational error does not change the properties of the estimators. Conclusions. The study provides an easy and simple approach to obtain the BMI estimate with and without observational error. Thus, the suggested method may be used by statisticians for this problem and for many other similar problems in the estimation of a mean.


Background
In a survey, it often happens that the data are observed with some error, which is termed measurement error or observational error. It is defined as the discrepancy between the observed value and the true value of a sampled unit. There are several examples of real-life situations in which data are obtained with errors [1,2]. The observational error in the context of linear and nonlinear regression models has also been thoroughly discussed in the literature [3][4][5][6]. Several research studies have been performed on observational errors in the ratio, product, and regression methods of estimation, which are available in [7][8][9][10][11]. If the data are systematically distributed, systematic sampling has the convenient feature of selecting every kth element after choosing the first element at random. Many authors have done pioneering work using systematic sampling at the estimation stage (see [12][13][14][15][16]). The estimation of parameters for certain natural populations is convenient using systematic sampling [17,18]. Auxiliary variables are commonly used through ratio, product, and regression estimators. In the case of estimating the volume of timber, the ratio estimator proposed under systematic sampling suggested that the leaf area or the girth of the tree may be taken as the auxiliary variable [19]. Product estimators in the context of systematic sampling have been discussed in [20]. Some pioneering works in systematic sampling have been introduced in [21,22].
A study was conducted to derive a prediction equation for body fat percentage in men (n = 252, age 22-81 years) from simple body measurements [23]. Body density was determined by underwater weighing, and body fat percentage was derived from it. The dataset includes the following variables, given in ([24], pp. 45-48), for observational techniques: density determined from underwater weighing, percent body fat from Siri's equation [25], age in years, weight in lbs, height in inches, and circumference of the neck, chest, abdomen, hip, thigh, knee, ankle, bicep, arm, and wrist in centimetres. In this article, we propose a different method to estimate the body mass index rather than the already established multiple regression method in [23,26]. The body mass index is highly correlated with the body part measurements, so if BMI is not known for a large population, it can be estimated using the ratio, product, and difference estimators under sampling. We attempt to estimate the body mass index in place of body fat by using one of the auxiliary variables. The circumference of the hip, thigh, knee, ankle, bicep, arm, or wrist can each be taken as a single auxiliary variable to estimate the body mass index. The correlation coefficient for each auxiliary variable has been obtained. Since the data are natural, we used systematic sampling. An optimal sample size has been estimated using the body mass index for a dietetic supplement [27]. The method of estimation for the mean of the study variable under systematic sampling using auxiliary information has been used to estimate the body mass index (BMI). We have also shown the effect of observational error on the estimation. The measurements of different body parts are taken as auxiliary variables. The correlation coefficient between BMI and the circumference of different body parts has been obtained. The efficacy of the methods in the estimation of BMI has been assessed in terms of mean square error. The observations available on different body parts are also assumed to be recorded with observational error. Thus, we also propose a method of estimation of BMI in the presence of observational error. A simulation study has been conducted to demonstrate the effect of observational error on the estimation of body mass index.
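As a minimal sketch of how the study variable and its correlation with an auxiliary variable could be computed from the recorded weight (lbs) and height (inches), the following Python fragment may help. The conversion factor 703 is the usual approximation for imperial units; the function names and the five records are hypothetical and are not taken from the dataset.

```python
import math

# Approximate factor converting lb / in^2 to kg / m^2.
LB_PER_SQ_INCH_TO_BMI = 703.0


def bmi_from_imperial(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches."""
    return LB_PER_SQ_INCH_TO_BMI * weight_lb / height_in ** 2


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (weight in lb, height in inches, hip circumference in cm).
records = [
    (154.0, 67.5, 94.0),
    (173.0, 72.0, 98.5),
    (184.5, 70.0, 101.0),
    (210.0, 71.0, 107.5),
    (140.0, 66.0, 90.0),
]

bmi = [bmi_from_imperial(w, h) for w, h, _ in records]
hip = [c for _, _, c in records]
r_hip_bmi = pearson_r(hip, bmi)
print(r_hip_bmi)
```

The same `pearson_r` call could be repeated for each circumference in turn to rank candidate auxiliary variables.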
Suppose the population consists of N units $u = (u_1, u_2, \ldots, u_N)$ from a finite population. The population is divided into k intervals such that $N = nk$. To select a sample, the first unit is selected at random from the first k units, and every kth unit thereafter is included.
This sampling method is similar to selecting one cluster at random out of k clusters (each cluster containing n units), formed such that the ith cluster contains the serially numbered units $i, i + k, i + 2k, \ldots, i + (n - 1)k$. After sampling n units, we observe both the study and auxiliary variables. In this article, we consider a situation where each data value may be observed with error. In order to compute the effect of observational error, it is assumed that $(x_{ij}, y_{ij})$ are the observed values instead of the true values $(X_{ij}, Y_{ij})$ for the $ij$th unit ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$). These values are expressible in additive form as $x_{ij} = X_{ij} + V_{ij}$ and $y_{ij} = Y_{ij} + U_{ij}$. We consider that the errors $(U, V)$ are normally distributed with mean zero and variances $(\sigma^2_U, \sigma^2_V)$. We assume that the error variables U and V are uncorrelated with each other and with X and Y. This implies that the corresponding covariance terms vanish.
Let $\mu_{Ysy}$, $\mu_{Xsy}$ be the population means and $\sigma^2_{Ysy}$, $\sigma^2_{Xsy}$ be the population variances of the study and the auxiliary variables, respectively, and let $\rho$ be the correlation coefficient between the study and auxiliary variables. The population means are $\mu_{Ysy} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}$ and $\mu_{Xsy} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}$. The sample means $\bar{y}_{sy}$ and $\bar{x}_{sy}$ of the observed data are unbiased estimators of the population means $\mu_{Ysy}$ and $\mu_{Xsy}$, respectively. (1) To determine the variance, the sample means are expressed in terms of the error terms $e_0$ and $e_1$, which are defined by $\bar{y}_{sy} = \mu_{Ysy}(1 + e_0)$ and $\bar{x}_{sy} = \mu_{Xsy}(1 + e_1)$.
We can write (2)
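The selection scheme and the additive error model described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names, the toy population of 20 serial numbers, and the error standard deviation 0.5 are hypothetical, and the start index is 0-based whereas the text numbers units from 1.

```python
import random


def systematic_sample(units, k, start):
    """Return the units at positions start, start + k, start + 2k, ...

    `start` is a 0-based index in [0, k); it plays the role of the
    randomly chosen first unit among the first k units.
    """
    if not 0 <= start < k:
        raise ValueError("start must satisfy 0 <= start < k")
    return units[start::k]


def add_observational_error(true_values, sigma, rng):
    """Observed value = true value + N(0, sigma^2) error, as in x = X + V."""
    return [value + rng.gauss(0.0, sigma) for value in true_values]


rng = random.Random(1)           # seeded only to make the sketch repeatable
N, k = 20, 5                     # N = n * k with n = 4
population = list(range(1, N + 1))

start = rng.randrange(k)         # random start among the first k units
sample = systematic_sample(population, k, start)
observed = add_observational_error(sample, 0.5, rng)
print(start, sample, observed)
```

Because only the start is random, there are exactly k possible samples, which is what makes the cluster interpretation in the text work.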

Methods
Three well-known forms of the estimator are used to estimate the body mass index. We use the ratio estimator [19], the product estimator [20], and the difference estimator under systematic sampling.
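For orientation, the three estimators can be sketched in Python, assuming the usual textbook forms with a known population mean of the auxiliary variable. These forms agree with the expansions $\mu_{Ysy}(1 + e_0)(1 + e_1)^{-1}$ and $\mu_{Ysy}e_0 - b e_1 \mu_{Xsy}$ used later in the text, but the function names and the numbers in the example are hypothetical.

```python
def ratio_estimate(y_bar, x_bar, mu_x):
    """Ratio estimator: scale the sample mean of y by mu_x / x_bar."""
    return y_bar * mu_x / x_bar


def product_estimate(y_bar, x_bar, mu_x):
    """Product estimator: scale the sample mean of y by x_bar / mu_x."""
    return y_bar * x_bar / mu_x


def difference_estimate(y_bar, x_bar, mu_x, b):
    """Difference estimator: shift y_bar by b times (mu_x - x_bar)."""
    return y_bar + b * (mu_x - x_bar)


# Hypothetical sample means: BMI 25.0, hip circumference 100.0 cm,
# with a known population mean hip circumference of 99.0 cm.
y_bar, x_bar, mu_x = 25.0, 100.0, 99.0
print(ratio_estimate(y_bar, x_bar, mu_x))
print(product_estimate(y_bar, x_bar, mu_x))
print(difference_estimate(y_bar, x_bar, mu_x, b=0.3))
```

The ratio form suits a positively correlated auxiliary variable and the product form a negatively correlated one, while the difference form leaves the weight b free to be chosen.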
The mean square error of the ratio estimator is given as equation (4), the mean square error of the product estimator is obtained in [20], and the variance of the difference estimator is given as equation (6).

The Proposed Estimation under Observational Error
The observations recorded during data collection are obtained with some error. We consider the severity of misleading inference based on data obtained with observational error. In this section, we propose ratio, product, difference, and mean estimators when the data are recorded with observational errors. In the previous section, we used well-known methods of estimation; in this section, we derive the expressions for the mean square error and variance of all estimators when the data are observed with error.
When the observations are recorded with observational error, the variance of the mean estimator involves the term $\sigma^2_{Usy}$, which is the variance due to observational error.
There are situations when both the study variable and the auxiliary variable are observed with observational error. In that case, we propose the ratio estimator given in equation (8). In order to obtain the bias and mean square error, we can write equation (8) as $\bar{y}_{Rsym} = \mu_{Ysy}(1 + e_0)(1 + e_1)^{-1}$.
For the bias of the estimator, equation (10) is expanded to give equation (11). Taking the expectation of equation (11), we get the bias of the estimator as

$\operatorname{Bias}(\bar{y}_{Rsym}) = \frac{1}{k\mu_{Ysy}}\left[R^{2}\left(\sigma^{2}_{Xsy} + \sigma^{2}_{Vsy}\right) - \rho R \sigma_{Xsy}\sigma_{Ysy}\right]$. (12)

For the mean square error, equation (10) is rewritten as equation (13); taking the expectation of equation (13) gives the mean square error, equation (14). We can obtain the result under no observational error by putting $\sigma^2_{Usy}$ and $\sigma^2_{Vsy}$ equal to zero. This gives the same result as obtained in [19]. From equations (4) and (14), we can write that the MSE in the presence of observational error is always higher than without it.

The product estimator proposed under the consideration of observational error is given in equation (15). To obtain the bias and mean square error, equation (15) is rewritten as equation (16). For the bias, by taking the expectation of equation (16), we get

$\operatorname{Bias}(\bar{y}_{Psym}) = 2\rho R \sigma_{Ysy}\sigma_{Xsy}$. (17)

For the mean square error, equation (16) is rewritten as equation (18); taking the expectation of equation (18) gives the mean square error, equation (19). By setting $\sigma^2_{Usy}$ and $\sigma^2_{Vsy}$ equal to zero, we obtain the MSE without observational error, which is the same as obtained in [20]. From equations (4) and (19), we can conclude that the MSE is always higher in the presence of observational error.

Mathematical Problems in Engineering

The difference-type estimator proposed under the influence of observational error is given in equation (20). In order to obtain the variance, we can write equation (20) as $\bar{y}_{dsym} - \mu_{Ysy} = \mu_{Ysy}e_0 - b e_1 \mu_{Xsy}$.
Squaring both sides of equation (22) and taking the expectation gives equation (23), from which the variance of the estimator follows as equation (24). Substituting the value of b in equation (24) gives the minimum variance of the estimator, equation (25). From equations (6) and (25), we can write that the MSE in the presence of observational error is always higher than without it. By putting $\sigma^2_{Usy}$ and $\sigma^2_{Vsy}$ equal to zero, we obtain the MSE under no observational error, which is the same as given in equation (6).
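The minimisation over b can be illustrated with a generic derivation. It is a sketch under the assumption that the difference estimator has the form $\bar{y}_{sy} + b(\mu_{Xsy} - \bar{x}_{sy})$, and the shorthand $V(b)$, $\operatorname{Var}$, and $\operatorname{Cov}$ is introduced here for illustration; it does not reproduce the article's own expressions in terms of $\sigma^2_{Ysy}$, $\sigma^2_{Xsy}$, $\sigma^2_{Usy}$, and $\sigma^2_{Vsy}$.

```latex
\begin{align*}
V(b) &= \operatorname{Var}(\bar{y}_{sy})
        - 2b\,\operatorname{Cov}(\bar{y}_{sy}, \bar{x}_{sy})
        + b^{2}\operatorname{Var}(\bar{x}_{sy}), \\
\frac{dV}{db} &= -2\operatorname{Cov}(\bar{y}_{sy}, \bar{x}_{sy})
        + 2b\,\operatorname{Var}(\bar{x}_{sy}) = 0
  \quad\Longrightarrow\quad
  b^{*} = \frac{\operatorname{Cov}(\bar{y}_{sy}, \bar{x}_{sy})}
               {\operatorname{Var}(\bar{x}_{sy})}, \\
V(b^{*}) &= \operatorname{Var}(\bar{y}_{sy})
        - \frac{\operatorname{Cov}(\bar{y}_{sy}, \bar{x}_{sy})^{2}}
               {\operatorname{Var}(\bar{x}_{sy})}.
\end{align*}
```

Under the stated assumption that the errors are uncorrelated with each other and with the true values, the error variances would enter only the two variance terms of the observed means and not the covariance, which is consistent with the minimum variance being larger when observational error is present.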

Results
A numerical study has been carried out to show the efficacy of the proposed methods. We have taken the data from https://lib.stat.cmu.edu/datasets/bodyfat. This is a comprehensive dataset that lists estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. From this population, two sets of samples for k = 10, 25 have been chosen using systematic sampling.
In this manuscript, we also consider the presence of observational error in the sample data. For the study of observational error, we have conducted a simulation study. A hypothetical population of 5000 units has been generated by using the mean and variance of the original data under study. The data matrices on X, Y, u, and v have been generated using a multivariate normal distribution for four variables with the mean vector $(\mu_Y, \mu_X, 0, 0)$ and the corresponding covariance matrix. Two sets for k = 10, 25 have been chosen by using systematic sampling. The means and variances have been computed for all the auxiliary variables. The mean square error and the variance have been computed. The above process has been replicated 5000 times, and the corresponding grand mean has been obtained. The percent relative efficiency (PRE) of an estimator $\phi$ ($= \bar{y}_{Rsym}, \bar{y}_{Psym}, \bar{y}_{dsym}$) with respect to the usual unbiased estimator $\bar{y}_{sym}$ has been calculated. The results of the numerical and simulation studies are given in Tables 1 and 2. Table 1 shows the MSE and PRE for the body fat dataset described above. From the table, we can see that for all the body measurements, the ratio and difference estimators perform better than the usual estimator. In all cases, the hip circumference gives the maximum efficiency among the body measurements, as it has the maximum correlation coefficient with the body mass index. After the hip, the thigh circumference gives the next highest efficiency in the estimation.
The abdomen circumference also has a good correlation with the body mass index, so it also gives good efficiency. The ankle and forearm circumferences have lower correlation coefficients with the body mass index and consequently give lower efficiency in the estimation. The wrist circumference has the minimum correlation coefficient with the body mass index. The mean square error for the wrist is the largest; thus, it is better not to use the wrist circumference in the estimation of the body mass index. Table 2 shows the results for the data with error variances $(\sigma^2_U, \sigma^2_V) = (0.5, 0.1)$. The MSE in the presence of observational error is always higher for all the estimators. The above results for the different body measurements follow the same trends in the presence of observational error. Hence, the properties of the estimators do not change in the presence of observational error, but the value of the mean square error is larger. With respect to sample size, the mean square error is smaller when the sample is large, i.e., when k is small. When k is large, the sample is small and the MSE is high for all the proposed estimators and all the body measurements. This result can be seen from Tables 1 and 2.
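The simulation design can be sketched in Python as follows. This is a simplified illustration, not the procedure used to produce Tables 1 and 2: it runs a single replicate per setting rather than 5000, compares only the mean and ratio estimators, and uses hypothetical population parameters (means 25 and 100, standard deviations 3.5 and 7, correlation 0.9). It also assumes the common definition PRE = 100 × MSE(usual estimator) / MSE(estimator).

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def make_population(size, mu_y, mu_x, sd_y, sd_x, rho, rng):
    """Draw (Y, X) pairs from a bivariate normal with correlation rho."""
    ys, xs = [], []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sd_x * z1)
        ys.append(mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return ys, xs


def design_mse(estimates, target):
    """Mean squared deviation of the k possible estimates from the target."""
    return mean([(e - target) ** 2 for e in estimates])


def pre(mse_reference, mse_estimator):
    """Percent relative efficiency (assumed standard definition)."""
    return 100.0 * mse_reference / mse_estimator


def run_once(k, sd_u, sd_v, rng, size=5000):
    # Hypothetical parameters; the article uses the moments of the real data.
    true_y, true_x = make_population(size, 25.0, 100.0, 3.5, 7.0, 0.9, rng)
    mu_y, mu_x = mean(true_y), mean(true_x)
    # Additive observational errors: y = Y + U and x = X + V.
    obs_y = [y + rng.gauss(0.0, sd_u) for y in true_y]
    obs_x = [x + rng.gauss(0.0, sd_v) for x in true_x]
    mean_estimates, ratio_estimates = [], []
    for start in range(k):  # every possible systematic sample
        y_bar = mean(obs_y[start::k])
        x_bar = mean(obs_x[start::k])
        mean_estimates.append(y_bar)
        ratio_estimates.append(y_bar * mu_x / x_bar)
    mse_mean = design_mse(mean_estimates, mu_y)
    mse_ratio = design_mse(ratio_estimates, mu_y)
    return mse_mean, mse_ratio, pre(mse_mean, mse_ratio)


rng = random.Random(2024)
for k in (10, 25):
    for sd_u, sd_v in ((0.0, 0.0), (math.sqrt(0.5), math.sqrt(0.1))):
        mse_mean, mse_ratio, pre_ratio = run_once(k, sd_u, sd_v, rng)
        print(k, round(sd_u, 3), round(sd_v, 3), mse_mean, mse_ratio, pre_ratio)
```

Averaging the returned values over many replicates would mimic the grand mean described in the text; a single replicate is noisy because only k samples exist for each population.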

Conclusions
We have given a different approach to estimating BMI from the available method [21]. This study applies systematic sampling using auxiliary variables. The different body measurements are used as auxiliary variables. From the study, we may conclude that the difference estimator under systematic sampling has the maximum efficiency in the estimation of the body mass index. The efficacy of the methods depends on the correlation between the body mass index and the circumferences of the different body parts. The correlation coefficients for the hip, abdomen, and thigh measurements are high, so these variables provide better estimates of the body mass index when the circumferences of these parts are used as auxiliary variables. The circumferences of the wrist, forearm, and ankle have the lowest correlation coefficients with the body mass index and thus should not be used in the estimation of BMI. From the tables, we can also conclude that the ratio estimator and the difference estimator are always more efficient than the unbiased mean estimator. So it is better to use the ratio and difference methods of estimation with the different body measurements as auxiliary variables. In this article, we assume the presence of observational error in the study of 252 men. The efficiency of the difference estimators is better in the presence of observational error. Also, the presence of observational error does not change the properties of the estimators. Tables 1 and 2 show the effect of observational error on the mean square error. The study provides an easy and simple approach to obtain the BMI estimate with and without observational error. Thus, the suggested method may be used by statisticians for this problem and for many other similar problems in the estimation of parameters of a natural population.

Limitations of Study
The present study proposes a simple method to estimate BMI. However, the current methodology is confined to homogeneous populations, natural populations, or populations from close geographical areas. The strengths include the fact that BMI is cheap and relatively easy to use. The weaknesses include the fact that BMI percentiles are not extensively used, and the classification of BMI percentiles may not satisfactorily define the risk of comorbid conditions. In addition, percentiles are not optimal for stratifying children and adolescents with a very high BMI. In spite of these limitations, BMI and BMI percentiles have immense utility in the clinical setting and have the potential to be even more useful as BMI is used more frequently and more suitably by primary care providers.

Abbreviations
BMI: Body mass index
MSE: Mean square error
PRE: Percent relative efficiency

Data Availability
All the relevant data are included in the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.