Coverage Properties of a Neural Network Estimator of Finite Population Total in High-Dimensional Space

This paper addresses the problem of nonparametric estimation of the finite population total, particularly for high-dimensional datasets. The coverage properties of a robust finite population total estimator based on a feedforward backpropagation neural network, developed under a superpopulation model, are computed, and the estimator's performance is evaluated on simulated datasets against existing model-based estimators that can handle high-dimensional data. The results presented in this paper show good performance in terms of bias, MSE, and mean absolute error for the feedforward backpropagation neural network estimator as compared to the other identified estimators of the finite population total in high-dimensional settings. In this regard, the paper recommends the proposed estimator for estimating population parameters such as the population total in the presence of high-dimensional datasets.


Introduction
Assume that there is a finite population of N unique and identifiable units, U = {1, 2, . . . , N}. Let each population unit have a variable of interest Y. It is assumed that an auxiliary variable X ∈ R^d exists which is closely related to Y and is known for the entire population (i.e., X_1, X_2, . . . , X_N). Researchers encounter the problem of estimating a population function (i.e., a function of the Y's), for instance, the population total.
In estimating the population total T, a sample s is selected so that the pairs (x_{ij}, y_i), i = 1, 2, . . . , n and j = 1, 2, . . . , d, are obtained from the variables X and Y.
These are then used at the design stage, the estimation stage, or both. For these auxiliary variables, a superpopulation model [1, 2] can be used at the estimation stage of inference. It should be noted that all these methods are based on simple statistical models that describe the underlying relationships between the survey and auxiliary variables (linear regression models). Hansen [3] showed that, under a parametric superpopulation model, misspecification can lead to substantial errors in inference. To solve this problem, nonparametric regression involving robust estimators in finite population sampling has been proposed [4][5][6].
When applying nonparametric kernel-based regression estimators over a finite range to estimate finite population parameters, one of the most common problems encountered is bias at the edges [7]. It is also known that kernel and polynomial regression estimators provide good estimates of population totals when x ∈ R^d and d = 1 [6, 8].
Although high-dimensional auxiliary information can be accommodated by the aforementioned estimators, the sparsity of regressors in the design space makes kernel methods and local polynomials infeasible, as performance deteriorates significantly as the dimension increases [8][9][10]. This poor performance is due to the curse of dimensionality: a phenomenon, induced by the sparsity of data in high-dimensional spaces, in which the fastest attainable rates of convergence of regression function estimators towards their target curve decrease as the dimension of the regressor vector grows. Friedman [11] provided an overview of the concept. Given this challenge, one has to use different nonparametric estimators to retain a large degree of flexibility. Using recursive covering in a model-based approach [12] or generalized additive modelling in a model-assisted framework [13] is one way around the curse of dimensionality when dealing with multivariate auxiliary information. These estimation methods, however, come at the cost of reduced flexibility, with an associated risk of increased bias [9][10][11][14].
In this regard, a robust nonparametric estimator of the finite population total based on a feedforward backpropagation neural network is proposed in this paper to address the shortcomings of the estimation approaches identified above. Although kernel and local approximators share the approximation property of artificial neural networks (ANNs), they usually require a much larger number of components to achieve similar approximation accuracy [15]. As a consequence, ANNs are regarded as an efficient method for performing parametric and nonparametric functional analysis.

Neural Network Estimator of Finite Population Total
In describing this estimator, the procedure provided in [16] is followed. Let Y be the survey variable associated with an auxiliary variable X, assumed to follow a superpopulation model under a model-based approach. A commonly used working model for the finite population is

y_i = m(x_i) + ε_i, i = 1, 2, . . . , N, (2)

where x_{ij} ∈ R^d, i = 1, 2, . . . , N, j = 1, 2, . . . , d, are the auxiliary variables, m(·) is an unknown smooth regression function, and ε_1, ε_2, . . . , ε_N are i.i.d. with mean zero. The finite population total can then be written as

T = Σ_{i∈s} y_i + Σ_{j∈r} y_j,

where s denotes the sample units and r the nonsampled units. Assume that y_i is given according to equation (2). Consider estimating m(x) based on a feedforward backpropagation neural network. The neurons, which act as the basic building blocks, can be considered as nonlinear transformations of the input variables x = (x_1, . . . , x_d). A feedforward neural network with at least one layer of hidden units is considered a complex network, and variants that allow for information feedback can also be specified. Without loss of generality, the paper concentrates on the structure presented in equation (4), which is commonly used for a wide range of applications and has the appealing feature of being implemented in standard statistical software.
In the simplest case of one hidden layer with H ≥ 1 neurons, the network function can be written as

f_H(x; θ) = v_0 + Σ_{h=1}^{H} v_h ψ(w_h^T x + b_h), (4)

with w_h = (w_{1h}, . . . , w_{dh}) ∈ R^d, where θ denotes the vector of all M(H) = (d + 1)H + H + 1 network weight parameters and ψ: R → R is a given activation function. For regression problems, sigmoid functions that resemble the distribution function of a genuine random variable typically produce good results. The logistic sigmoid and the bipolar sigmoid are two extensively used sigmoid functions that can be employed depending on the required output. Whenever the goal is to approximate functions that map into probability space, the logistic function is preferred. The activation function can be viewed as a smooth equivalent of the indicator function, with the input signals "squashed" between zero and one. The logistic function

ψ(u) = 1/(1 + e^{−u})

tends to one (zero) as its argument approaches infinity (negative infinity). As a result, based on the received input signals, the logistic activation function creates partially on/off signals.
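The one-hidden-layer network function and the logistic activation can be sketched in plain Python as follows (an illustrative sketch only; the function and argument names are not from the paper):

```python
import math

def logistic(u):
    """Logistic sigmoid psi(u) = 1 / (1 + exp(-u)): squashes input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def f_H(x, v0, v, W, b):
    """One-hidden-layer network f_H(x; theta) with H = len(v) neurons.

    v0   : output bias
    v[h] : hidden-to-output weight of neuron h
    W[h] : d-dimensional input weight vector of neuron h
    b[h] : bias of neuron h
    Total parameter count is (d + 1)H + H + 1, matching M(H) in the text.
    """
    return v0 + sum(
        v[h] * logistic(sum(W[h][j] * x[j] for j in range(len(x))) + b[h])
        for h in range(len(v))
    )
```

With all input weights and biases at zero, each neuron outputs logistic(0) = 0.5, so the network reduces to v0 plus half the sum of the output weights, which is a quick sanity check on the parameter layout.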
For this work, f_H(x; θ) specifies a mapping from the input space R^d to the one-dimensional output space.
By the universal approximation property of neural networks [17][18][19], for each continuous function m, any ε > 0, and any compact set C ⊆ R^d, there exists a network function f_H satisfying

sup_{x∈C} |f_H(x; θ) − m(x)| < ε.

This suggests that any regression function m(x) can be approximated well with a sufficiently large number of neurons and the right parameters θ. Therefore, a nonparametric estimate of m(x) is obtained by first choosing H, which serves as a tuning parameter and determines the smoothness of the estimate. The parameter θ is then estimated from the data by nonlinear least squares,

θ̂_n = arg min_θ Σ_{i∈s} (y_i − f_H(x_i; θ))², (8)

so that θ̂_n minimizes the sample version of the criterion D(θ) [20]. Under suitable conditions, θ̂_n converges in probability, as n → ∞ with H held constant, to the parameter vector minimizing D(θ). Under some stronger assumptions, asymptotic normality of θ̂_n, and hence of the estimator m̂(x) = f_H(x; θ̂_n) of the regression function m(x), also follows, and the estimation error θ̂_n − θ can be broken down into two asymptotically independent parts. Due to the universal approximation property of neural networks, f_H(x; θ) converges to the regression function m(x) as H → ∞; hence, as H grows with n at an adequate rate, f_H(x; θ̂_n) becomes a consistent nonparametric estimator of m(x). As a result of these findings, Were and Orwa [16] showed that the corresponding estimator of the finite population total is

T_NN = Σ_{i∈s} y_i + Σ_{j∈r} m̂_n(x_j), (12)

which is the proposed estimator for the finite population total, where m̂_n(x_j) = f_H(x_j; θ̂_n). As noted in [16], T_NN is a model-based estimator, so all inference is with respect to the model for the y_i's, not the survey design. This estimator is identical to that proposed in [5], except that the kernel-based regression is replaced by the neural network. Lastly, this estimator can be used to estimate the population totals of a finite population as long as each of the unsampled elements has the same distribution as the sample.
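The two ingredients of the estimator, the nonlinear least-squares criterion behind θ̂_n and the plug-in total T_NN, can be sketched as follows (a minimal sketch; `sse`, `t_nn`, and the toy regression function are illustrative names, not from the paper):

```python
def sse(theta, xs, ys, f):
    """Sample nonlinear least-squares criterion: sum over the sample s of
    (y_i - f(x_i; theta))^2. theta_hat_n is its minimizer."""
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))

def t_nn(y_sampled, x_nonsampled, f, theta_hat):
    """Model-based total T_NN: observed y's for the sample s, plus fitted
    values f(x_j; theta_hat) for every nonsampled unit j in r."""
    return sum(y_sampled) + sum(f(x, theta_hat) for x in x_nonsampled)

# Toy one-parameter "network" f(x; theta) = theta * x, just to exercise the API.
f_toy = lambda x, theta: theta * x
```

Any fitted regression, not only a neural network, can be plugged into `t_nn`; the model-based character of the estimator lies entirely in replacing the unobserved y_j by model predictions.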
It should be noted that:
(1) Under certain conditions, if the activation function ψ(u) is Lipschitz continuous and strictly increasing, then the neural network estimator T_NN of the population total T given by (12), with m̂_n(x) = f_H(x; θ̂_n) and θ̂_n given by (8), is consistent in the sense that

(1/N)(T_NN − T) → 0 in probability

as N, n → ∞ with (n/N) → π ∈ (0, 1), provided that the number of neurons H_n and the bound Δ_n on the network weights satisfy H_n, Δ_n → ∞ at suitable rates governed by α, where α determines how fast the tail probabilities of the ε_i and y_i decrease. White [19] showed that an appropriate choice for Δ_n is such that Δ_n → ∞ as n → ∞ with Δ_n = o(n^{1/4}), i.e., n^{−1/4}Δ_n → 0 as n → ∞.
(2) Under certain conditions, the mean squared error E(T_NN − T)², where T denotes the true population total, reduces to an expression driven by the error variance var(ε_i), which is estimated by the mean of the squared sample residuals y_i − m̂_n(x_i), i ∈ s. For the details and complete proofs of these properties, see [16].

Coverage Properties
In order to compute and understand the coverage properties of the proposed estimator and to compare it against other existing nonparametric regression estimators, its performance is compared, through a simulation study, to that of three identified estimators that can handle high-dimensional data: multivariate adaptive regression splines (MARS), the generalized additive model (GAM), and local polynomial (LP) regression. Scenarios where the true regression function is a two-dimensional linear function, a two-dimensional quadratic function, or a three-dimensional mixed function are considered. For all of the simulations performed, data are generated according to model (2). The auxiliary variable vector X ∈ R^d was generated as an i.i.d. Uniform(0, 1) random vector. The errors ε were generated i.i.d. normal with noise levels σ = 0.1, 0.4. The hyperbolic tangent (tanh) is used as the activation function in the neural network. From a population of size 10,000, 1,000 samples of sizes 4,000 and 8,000 were generated using simple random sampling. Because the hypothesized relationship between the study variable and the auxiliary variable must be preserved in the simulation, sampling is done by index, so that each sampled y_i retains its associated x_i.
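The data-generating step above can be sketched as follows (an illustrative sketch only; the paper's exact test functions are not reproduced, so a simple two-dimensional linear mean function stands in for them):

```python
import random

random.seed(1)
N, n, d, sigma = 10_000, 4_000, 2, 0.1

def m(x):
    # Illustrative two-dimensional linear mean function (a stand-in, not
    # the paper's actual test function).
    return x[0] + x[1]

# Population: i.i.d. Uniform(0,1) auxiliary vectors and y_i = m(x_i) + eps_i.
X = [[random.random() for _ in range(d)] for _ in range(N)]
Y = [m(x) + random.gauss(0.0, sigma) for x in X]
T = sum(Y)  # true finite population total

# SRSWoR by index, so each sampled y_i keeps its associated x_i.
s = random.sample(range(N), n)
s_set = set(s)
r = [i for i in range(N) if i not in s_set]  # nonsampled units
```

Repeating the sampling step 1,000 times and recomputing each estimator on every replicate yields the replicated estimates that the unconditional and conditional performance measures are computed from.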
Tables 1-3 summarize the findings of this simulation study: the unconditional bias (UB), unconditional mean square error (UMSE), unconditional relative mean square error (URMSE), and unconditional mean absolute error (UMAE) for the estimators at different sample sizes. The MAE reveals how near the estimate being examined is to the true value, while the MSE and RMSE represent the estimator's precision. For example, if T_NN's UMSE and URMSE are comparatively small, it can reasonably be considered "better" or "more desirable" than the other estimators. The bias of a population total estimator is the deviation of the estimator's expected value from the true total value. All of the estimators of the finite population total discussed here are biased, but T_NN is the least biased. T_NN is the most efficient estimator of the finite population total across all models and sample sizes, closely followed by T_MARS. Because of their relatively large bias values, the generalized additive estimator and the local polynomial regression estimator overestimate the finite population total under all models.
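Given the replicated total estimates for one estimator, the unconditional measures can be computed as sketched below (a sketch under stated assumptions: the paper does not spell out its URMSE convention, so scaling the MSE by the squared true total is assumed here):

```python
def unconditional_metrics(estimates, T):
    """UB, UMSE, URMSE, UMAE of one estimator over repeated samples.

    estimates : list of total estimates, one per simulation replicate
    T         : true finite population total
    """
    R = len(estimates)
    ub = sum(t - T for t in estimates) / R            # unconditional bias
    umse = sum((t - T) ** 2 for t in estimates) / R   # unconditional MSE
    urmse = umse / T ** 2                             # relative MSE (assumed: MSE / T^2)
    umae = sum(abs(t - T) for t in estimates) / R     # unconditional MAE
    return ub, umse, urmse, umae
```

Running this once per estimator (T_NN, T_MARS, T_GAM, T_LP) and per scenario produces the rows of a table like Tables 1-3.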
In addition, T_NN has the lowest mean square error, relative mean square error, and mean absolute error, followed closely by T_MARS. It is also observed that, as the sample size increases, all the estimators record a significant improvement in their performance in estimating the finite population total. The significant reduction in the bias and mean square error of the local polynomial regression estimator is noteworthy; this follows the argument of Stone [10] that, to improve the efficiency of a local smoother in high-dimensional spaces, one has to use a large sample size. The neural network estimator still outperforms the other estimators, with significant reductions in bias, mean square error, relative mean square error, and mean absolute error as the sample size increases. The results in Table 3, which reports the performance of the estimators under the three-dimensional mixed model, are noteworthy. Compared to the two-dimensional case, the performance of all the estimators has marginally worsened, as indicated by a marginal increase in biases, mean square errors, relative mean square errors, and mean absolute errors across all the estimators of the finite population total. The generalized additive estimator and the local polynomial regression estimator again record poor performance on all these measures, whereas T_NN attains the lowest biases, mean square errors, relative mean square errors, and mean absolute errors, followed closely by T_MARS.
Here too, with increasing sample size all the estimators record a significant improvement in their performance in estimating the finite population total. For instance, the local polynomial regression estimator shows a significant reduction in bias and mean square error as the sample size increases, while the neural network estimator remains the estimator of choice. The estimator's conditional performance was also assessed and compared to that of the other identified finite population total estimators in high-dimensional space. To do this, the 1,000 simple random samples were sorted by the criterion of the sample means of the X_s values. The samples were then grouped into sets of twenty, such that the first set contains the samples with the lowest sample means of X_s, the second set those with larger sample means than the first, and so on, until the last set, which contains the samples with the largest sample means of X_s. In each group, the bias, mean square error, relative mean square error, and mean absolute error were computed. The group conditional bias (CB), conditional mean square error (CMSE), conditional relative mean square error (CRMSE), and conditional mean absolute error (CMAE) of the finite population total estimators T_NN, T_MARS, T_GAM, and T_LP are plotted against the group average values of X (denoted Xbar) for the fifty groups of means of X_s. The conditional findings for the estimators under the two-dimensional linear model, the two-dimensional quadratic model, and the three-dimensional mixed model are shown in Figures 1-3. The bias characteristics of the various estimators differ significantly in the majority of circumstances. A closer look at the plots reveals that T_NN and T_MARS have lower levels of bias overall, as seen by the proximity of their curves to the horizontal (no bias) line at 0.0 on the vertical axis.
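The sort-and-group procedure behind the conditional measures can be sketched as follows (illustrative Python; only conditional bias and MSE are shown, and the names are not from the paper):

```python
def conditional_groups(xbar_per_sample, estimates, T, group_size=20):
    """Sort replicate samples by their sample mean of X, split into
    consecutive groups, and compute per-group conditional bias and MSE.

    xbar_per_sample : mean of X_s for each replicate (the sorting criterion)
    estimates       : matching total estimates, one per replicate
    T               : true finite population total
    """
    order = sorted(range(len(xbar_per_sample)), key=lambda i: xbar_per_sample[i])
    results = []
    for g in range(0, len(order), group_size):
        idx = order[g:g + group_size]
        xbar = sum(xbar_per_sample[i] for i in idx) / len(idx)  # group average Xbar
        cb = sum(estimates[i] - T for i in idx) / len(idx)      # conditional bias
        cmse = sum((estimates[i] - T) ** 2 for i in idx) / len(idx)
        results.append((xbar, cb, cmse))
    return results
```

Plotting CB and CMSE against the group Xbar values, one curve per estimator, reproduces the structure of Figures 1-3.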
Consequently, despite the complex structure of the high-dimensional data, the proposed neural network estimator maintains good conditional performance across the groups.

Conclusion
In this paper, the coverage properties of an estimator of the finite population total based on a feedforward backpropagation neural network in nonparametric regression have been studied. Properties such as the bias, mean squared error, and mean absolute error have been computed for the case of high-dimensional datasets through simulation, and the findings were compared with those of existing estimators, namely multivariate adaptive regression splines (MARS), the generalized additive model (GAM), and local polynomial (LP) regression, which can handle high-dimensional data.
From the results, the following observations and conclusions have been made: (i) The neural network estimator estimates the finite population total better than all the other robust estimators in the high-dimensional case. (ii) The performance of the local polynomial estimator in estimating the finite population total becomes poor as the dimension of the data increases. (iii) For all the estimators, as the sample size increases, the biases, mean square errors, relative mean square errors, and mean absolute errors decrease for all the models considered.
(iv) For all the estimators, as the dimension increases, the biases, mean square errors, relative mean square errors, and mean absolute errors increase for all the models considered.
To this end, the main conclusion is that the estimator of the finite population total based on the feedforward backpropagation neural network yields results with great precision, and it is therefore recommended for estimating finite population totals. It should be noted that the proposed estimator has been considered in the case of simple random sampling without replacement (SRSWoR). An extension to other sampling techniques, such as stratified sampling, may be pursued since they rely on SRSWoR within strata, and it is hypothesized that efficiency will be improved compared to other existing estimators in the literature.

Data Availability
The data used are artificial data generated by the simulation process using the specified models.