In practical applications of stratified random sampling, the investigator must select a sample that maximizes the precision of the estimated finite population mean under a cost constraint. Allocating the sample size becomes complicated when more than one characteristic is observed on each selected unit. In many real-life situations, a cost function that is linear in the stratum sample size $n_h$ is a poor approximation to the actual survey cost when the cost of traveling between selected units within a stratum is significant. In this paper, the sample allocation problem in multivariate stratified random sampling with a proposed nonlinear cost function is formulated as a multiobjective integer nonlinear programming problem. A solution procedure based on the extended lexicographic goal programming approach is proposed. A numerical example illustrates the computational details and compares the efficiency of the proposed compromise allocation with that of other allocations.
1. Introduction
In sample surveys in agriculture, marketing, industry, social research, and similar fields, more than one characteristic is usually observed on each sampled unit of the population. For reasons of economy and efficiency, stratified random sampling is more suitable than other survey designs for obtaining information from a heterogeneous population. The theory of stratified random sampling deals with the properties of estimators constructed from a stratified random sample and with the best (optimum) choice of the sample sizes to be selected from the various strata, either to maximize the precision of the estimator for a fixed cost or to minimize the cost of the survey for a fixed precision. Sample sizes selected according to these criteria are known as the “optimum allocation.” In general, the variance of the study variate differs from stratum to stratum, and this variation provides the basis for selecting the optimum sample sizes.
Tschuprow [1] and Neyman [2] independently proposed an allocation procedure that minimizes the variance of the sample mean under a cost function that is linear in the sample size $n_h$ in stratified random sampling. Neyman [2] used the Lagrange multiplier optimization technique to obtain the optimum sample sizes for a single study variable. In stratified sampling, the allocation problem becomes complicated when more than one characteristic is observed on each selected unit of a finite population. An allocation that is optimum for one characteristic may not be optimum for the others unless the characteristics are highly correlated. A compromise allocation criterion is therefore needed that produces an allocation which is optimum for all characteristics in some sense, for example, an allocation that minimizes the trace of the variance-covariance matrix of the estimators of the population means, an allocation that minimizes the weighted average of the variances, or an allocation that maximizes the total relative efficiency of the estimators compared with the corresponding individual optimum allocations (Varshney et al. [3]). Many authors, such as Dalenius [4, 5], Ghosh [6], Folks and Antle [7], Chromy [8], Bethel [9], Jahan et al. [10, 11], Khan et al. [12], Khan et al. [13, 14], Ansari et al. [15], Khan et al. [16], and Varshney et al. [17], have used different compromise criteria to solve the allocation problem in stratified random sampling.
The cost of the survey is an important factor in allocating the sample to the various strata. The linear cost function used in stratified sampling is
$$C = c_0 + \sum_{h=1}^{L} c_h n_h, \tag{1}$$
where $C$ denotes the total budget available for the survey, $c_h$ ($h = 1, 2, \ldots, L$) is the per-unit measurement cost in the $h$th stratum, $c_0$ is the fixed cost of the survey, and $n_h$ is the number of sample units selected in the $h$th stratum. In many practical situations, both the measurement cost and the travel cost within strata are important components of the survey cost, and a nonlinear cost function that includes both is a better approximation to the actual cost of the survey. Beardwood et al. [18] showed that the length of the shortest route among $k$ randomly dispersed destinations within a region is asymptotically proportional to $\sqrt{k}$ for large $k$. Accordingly, Varshney et al. [17] used the following nonlinear cost function for large sample sizes:
$$C = c_0 + \sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h \sqrt{n_h}, \tag{2}$$
where $t_h$ is the travel cost within the $h$th stratum. The problem of finding the shortest route among the $n_h$ selected units in the $h$th stratum is often called the “shortest route problem” in the operations research literature. If the route map and the route lengths are given for each stratum, the shortest route among the $n_h$ units can be found whether $n_h$ is small or large. This shortest route can be used with confidence for practical purposes (Beardwood et al. [18]).
Consider the following proposed nonlinear cost function:
$$C' = \sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h n_h^{\delta}, \tag{3}$$
where $C' = C - c_0$ and $\delta$ represents the effect of travel within strata on the cost function. The value of $\delta$ is determined by solving the shortest route problem using the methods discussed by Hillier and Lieberman [19]. The cost function in (2) is the particular case of the proposed cost function (3) with $\delta = 0.5$.
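The cost function (3) is straightforward to evaluate for a candidate allocation. The following is a minimal sketch, not the authors' implementation: the function name `survey_cost` and its argument layout are assumptions made for illustration, and the unit costs are the ones assumed later in the numerical example.

```python
def survey_cost(n, c, t, delta):
    """Variable survey cost C' = sum(c_h * n_h) + sum(t_h * n_h**delta), as in (3).

    n, c, t are sequences with one entry per stratum; delta is the travel exponent.
    """
    if not (len(n) == len(c) == len(t)):
        raise ValueError("n, c and t must have one entry per stratum")
    measurement = sum(ch * nh for ch, nh in zip(c, n))
    travel = sum(th * nh ** delta for th, nh in zip(t, n))
    return measurement + travel


# Unit costs from the numerical example; the allocation is one row of Table 3.
c = [12, 8, 6, 10]
t = [6, 4, 3, 5]
n = [4, 11, 21, 7]

print(survey_cost(n, c, t, delta=1.0))  # travel cost linear in n_h
print(survey_cost(n, c, t, delta=0.5))  # square-root travel cost, i.e. (2) without c_0
```

With `delta=0.5` the function reduces to the variable part of (2), which is a quick way to check the claim that (2) is a special case of (3).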
Generally, the Lagrange multiplier technique (LMT) is used to determine the sample sizes. However, the constraint $2 \le n_h \le N_h$, with $n_h$ ($h = 1, 2, \ldots, L$) an integer, is neglected when LMT is used. To obtain integer sample sizes, the continuous solution is rounded, which may violate optimality, feasibility, or both. Since integer values of $n_h$ are needed for practical implementation, we do not use LMT and instead use integer programming to obtain integer stratum sample sizes $n_h$.
In this paper, we discuss a compromise allocation based on minimizing the coefficients of variation of the regression estimators of the population means in a multivariate stratified random sampling design under the proposed nonlinear cost function (3). The problem is formulated as a multiobjective integer nonlinear programming problem. The extended lexicographic goal programming technique is applied to solve the formulated allocation problem. The AlphaECP solver of the GAMS optimization software (Rosenthal [20]) is used to solve a numerical example that illustrates the computational details of the allocation procedure.
2. Formulation of the Problem
Consider a population of $N$ units divided into $L$ mutually exclusive strata of sizes $N_h$ ($h = 1, 2, \ldots, L$) such that $\sum_{h=1}^{L} N_h = N$. A simple random sample of size $n_h$ is drawn independently from each stratum. Suppose that $p \ge 2$ characteristics $Y_{ji}$ ($i = 1, 2, \ldots, N_h$; $j = 1, 2, \ldots, p$) are observed on each unit in the $h$th stratum and that the population means of the $p$ characteristics are to be estimated. Let $\bar{y}_{jh}$ and $\bar{x}_{jh}$ be the sample means and $\bar{Y}_{jh}$ and $\bar{X}_{jh}$ the population means of the study variable $Y_{jh}$ and the auxiliary variable $X_{jh}$, respectively, for the $j$th characteristic in the $h$th stratum. $S_{yjh}^2$ and $S_{xjh}^2$ are the population variances and $S_{yxjh}$ is the population covariance of the $j$th study and auxiliary variables in the $h$th stratum. $b_{jh} = s_{yxjh}/s_{xjh}^2$ and $\beta_{jh} = S_{yxjh}/S_{xjh}^2$ are the sample and population regression coefficients, and $W_h = N_h/N$ is the stratum weight.
Consider the estimator
$$\bar{y}_{j,lrs} = \sum_{h=1}^{L} W_h \bar{y}_{j,lrh}, \tag{4}$$
where $\bar{y}_{j,lrh} = \bar{y}_{jh} + b_{jh}\left(\bar{X}_{jh} - \bar{x}_{jh}\right)$.
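The mean square error (MSE) of $\bar{y}_{j,lrs}$ is given as
$$\mathrm{MSE}(\bar{y}_{j,lrs}) = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)\left[S_{yjh}^2 - 2\beta_{jh} S_{yxjh} + \beta_{jh}^2 S_{xjh}^2\right], \tag{5}$$
$$\mathrm{MSE}(\bar{y}_{j,lrs}) = \sum_{h=1}^{L} \frac{W_h^2 U'_{jh}}{n_h} - \sum_{h=1}^{L} \frac{W_h^2 U'_{jh}}{N_h}, \tag{6}$$
where
$$U'_{jh} = S_{yjh}^2 - 2\beta_{jh} S_{yxjh} + \beta_{jh}^2 S_{xjh}^2. \tag{7}$$
If we ignore the second term on the right-hand side of (6), because it is independent of the sample size $n_h$, then
$$\mathrm{MSE}(\bar{y}_{j,lrs}) = \sum_{h=1}^{L} \frac{W_h^2 U'_{jh}}{n_h}. \tag{8}$$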
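Since different characteristics are measured in different units, we need a measure of precision that is independent of the unit of measurement. Therefore, the coefficient of variation is used instead of the mean square error; that is,
$$\mathrm{C.V}(\bar{y}_{j,lrs}) = \sqrt{\frac{\mathrm{MSE}(\bar{y}_{j,lrs})}{\bar{Y}_j^2}} \tag{9}$$
or
$$\mathrm{C.V}(\bar{y}_{j,lrs}) = Z_j = \sqrt{\sum_{h=1}^{L} \frac{u'_{jh}}{n_h}}, \tag{10}$$
where
$$u'_{jh} = \frac{W_h^2 U'_{jh}}{\bar{Y}_j^2}. \tag{11}$$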
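As a quick check on (10), the coefficient of variation can be evaluated directly from the coefficients $u'_{jh}$ and an allocation. The sketch below is illustrative only: the function name `coefficient_of_variation` is an assumption, the coefficients are the $u'_{jh}$ values reported in Table 2, and the allocation is the first row of Table 3. Taking the square root of the sum reproduces the tabulated values of $Z_1$ and $Z_2$ to the reported precision.

```python
import math


def coefficient_of_variation(u, n):
    """Z_j = sqrt(sum_h u'_jh / n_h), the C.V. in (10) for one characteristic."""
    if len(u) != len(n):
        raise ValueError("u and n must have one entry per stratum")
    return math.sqrt(sum(u_h / n_h for u_h, n_h in zip(u, n)))


# u'_jh from Table 2 (j = 1: corn, j = 2: oats)
u1 = [0.000066, 0.000809, 0.001212, 0.000332]
u2 = [0.000181, 0.009411, 0.023390, 0.000610]

# Allocation reported in Table 3 for Y1 at delta = 0.5, C' = 300
n = [2, 9, 18, 5]

z1 = coefficient_of_variation(u1, n)
z2 = coefficient_of_variation(u2, n)
print(round(z1, 5), round(z2, 5))
```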
The sample sizes $n_h$ are determined under the proposed nonlinear cost function (3) so as to minimize the coefficients of variation of the estimators of the population means of all characteristics $Y_j$ ($j = 1, 2, \ldots, p$). This problem may be formulated as the multiobjective integer nonlinear programming problem
$$\begin{aligned}
\text{Minimize} \quad & (Z_1, Z_2, \ldots, Z_p) \\
\text{subject to} \quad & \sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h n_h^{\delta} \le C', \\
& 2 \le n_h \le N_h, \\
& n_h \text{ are integers}, \quad n_h \in F; \quad h = 1, 2, \ldots, L,
\end{aligned} \tag{12}$$
where $F$ represents the feasible region that satisfies all constraints and sign restrictions. Any solution within the feasible region can be implemented in practice.
3. Extended Lexicographic Goal Programming
Romero [21] proposed the extended lexicographic goal programming method, which provides a general framework that covers, and allows mixtures of, the most common methods for solving multiobjective decision-making problems. It also encompasses distance-based multicriteria decision-making techniques. Romero [22] extended this work to a more general form of the objective (achievement) function. Goal programming is a technique used by decision makers to optimize more than one objective under a set of constraints. All specified objectives are included in the model, and the decision maker tries to minimize the deviations from the specified objectives.
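Consider the following individual optimum problem for the $j$th characteristic:
$$\begin{aligned}
\text{Minimize} \quad & Z_j \\
\text{subject to} \quad & \sum_{h=1}^{L} c_h n_{jh} + \sum_{h=1}^{L} t_h n_{jh}^{\delta} \le C', \\
& 2 \le n_{jh} \le N_h, \\
& n_{jh} \text{ are integers}, \quad n_{jh} \in F, \\
& h = 1, 2, \ldots, L, \quad j = 1, 2, \ldots, p.
\end{aligned} \tag{13}$$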
Let $Z_j^*$ be the individual optimum value of $Z_j$ obtained by solving the above problem. These optimum values $Z_j^*$ serve as the goals that the multiobjective mathematical program tries to achieve. Let $\hat{Z}_j$ be the values of the objectives obtained by applying the multiobjective optimization method. Clearly $\hat{Z}_j \ge Z_j^*$, so $\hat{Z}_j - Z_j^* \ge 0$ is the increase in $Z_j$ caused by the compromise among the objectives. Suppose that this increase is at most $d_j^+ \ge 0$. To achieve the specified goals, we must have
$$\hat{Z}_j - Z_j^* \le d_j^+ \tag{14}$$
or
$$\hat{Z}_j - d_j^+ \le Z_j^*. \tag{15}$$
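In the goal programming method, we minimize the deviations $d_j^+$ subject to the additional constraint (15). To solve the multiobjective allocation problem (12), the extended lexicographic goal programming approach uses the following mathematical model:
$$\begin{aligned}
\text{Minimize} \quad & \alpha \lambda + (1 - \alpha) \sum_{j=1}^{p} d_j^+ \\
\text{subject to} \quad & d_j^+ \le \lambda, \\
& Z_j - d_j^+ \le Z_j^*, \\
& \sum_{h=1}^{L} c_h n_{hc} + \sum_{h=1}^{L} t_h n_{hc}^{\delta} \le C', \\
& 2 \le n_{hc} \le N_h, \\
& n_{hc} \text{ are integers}, \quad n_{hc} \in F, \quad n_{hc} \ge 0, \quad d_j^+ \ge 0, \\
& h = 1, 2, \ldots, L, \quad j = 1, 2, \ldots, p,
\end{aligned} \tag{16}$$
where $\alpha$ is a constant with $0 \le \alpha \le 1$, $\lambda$ bounds the largest deviation, $d_j^+$ is the positive deviational variable of the $j$th objective, and $Z_j$ is evaluated at the compromise allocation $n_{hc}$.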
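For a fixed allocation, the optimal deviational variables in (16) are simply $d_j^+ = \max(0, Z_j - Z_j^*)$ and $\lambda = \max_j d_j^+$, so the objective can be evaluated in closed form. The sketch below illustrates this under that observation; the function name `elgp_objective` is hypothetical, and the input values are rounded figures taken from Tables 3 and 6 for $\delta = 1$, $C' = 500$.

```python
def elgp_objective(z, z_star, alpha):
    """Value of alpha*lambda + (1 - alpha)*sum(d_j^+) in (16) for given Z_j and goals Z_j^*.

    For fixed Z_j, the smallest admissible deviations are d_j^+ = max(0, Z_j - Z_j^*)
    and the smallest admissible lambda is their maximum.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    deviations = [max(0.0, zj - zs) for zj, zs in zip(z, z_star)]
    lam = max(deviations)
    return alpha * lam + (1.0 - alpha) * sum(deviations), deviations, lam


# Goals Z_j^* (Table 3) and compromise values (Table 6) at delta = 1, C' = 500
z_star = [0.01398, 0.04271]
z_hat = [0.01480, 0.04299]

value, deviations, lam = elgp_objective(z_hat, z_star, alpha=0.2)
print(value, deviations, lam)
```

Setting `alpha=1` minimizes only the largest deviation (a minimax criterion), while `alpha=0` minimizes the sum of deviations; intermediate values trade the two off.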
4. Some Other Compromise Allocations
In this section, some other compromise allocations are discussed for the sake of comparison with the proposed allocation.
4.1. Cochran Compromise Allocation
Cochran [23] proposed a compromise allocation criterion that averages, over the characteristics, the individual optimum allocations $n_{jh}^*$ ($j = 1, 2, \ldots, p$) obtained as solutions to the integer nonlinear programming problem (INLPP) (13).
Cochran’s compromise allocation is given by
$$n_h = \frac{1}{p} \sum_{j=1}^{p} n_{jh}^*. \tag{17}$$
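The averaging in (17) generally produces non-integer values, so a rounding rule is needed in practice. The sketch below assumes rounding half up, which is an assumption on our part (the text does not state the rule) but is consistent with the allocations reported in Table 4. The helper names `cochran_allocation` and `survey_cost` are hypothetical. The example also shows why the averaged allocation need not satisfy the cost constraint.

```python
import math


def cochran_allocation(individual_allocations):
    """Average the individual optimum allocations over characteristics, as in (17).

    Rounding half up to integers is an assumption made for this sketch.
    """
    p = len(individual_allocations)
    strata = range(len(individual_allocations[0]))
    means = [sum(alloc[h] for alloc in individual_allocations) / p for h in strata]
    return [math.floor(m + 0.5) for m in means]


def survey_cost(n, c, t, delta):
    """Variable cost C' from (3)."""
    return sum(ch * nh + th * nh ** delta for ch, th, nh in zip(c, t, n))


c = [12, 8, 6, 10]  # per-unit measurement costs from the numerical example
t = [6, 4, 3, 5]    # travel costs from the numerical example

# Individual optimum allocations for Y1 and Y2 at delta = 2, C' = 1500 (Table 3)
n_star = [[3, 10, 13, 7], [2, 10, 16, 3]]
n_c = cochran_allocation(n_star)
print(n_c, survey_cost(n_c, c, t, delta=2.0))
```

For this case the averaged allocation costs more than the budget of 1500, even though both individual allocations are feasible, because rounding up and the convexity of the travel term for $\delta > 1$ both push the cost upward.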
4.2. Khan et al. Compromise Allocation
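The compromise allocation of Khan et al. [13] is obtained by minimizing a weighted sum of the variances. In terms of the objectives $Z_j$, the mathematical model of the Khan et al. [13] compromise allocation is given by
$$\begin{aligned}
\text{Minimize} \quad & \sum_{j=1}^{p} \alpha'_j Z_j \\
\text{subject to} \quad & \sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h n_h^{\delta} \le C', \\
& 2 \le n_h \le N_h, \\
& n_h \text{ are integers}, \quad n_h \in F, \\
& h = 1, 2, \ldots, L, \quad j = 1, 2, \ldots, p,
\end{aligned} \tag{18}$$
where $\alpha'_j = \sum_{h=1}^{L} S_{jh}^2 \big/ \sum_{j=1}^{p} \sum_{h=1}^{L} S_{jh}^2$ are the relative weights proposed by Khan et al.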
5. Numerical Example
The data are taken from the agricultural census in the state of Iowa conducted by the National Agricultural Statistics Service, USDA, Washington DC, as reported by Khan et al. [16]. We assume that $c_1 = 12$, $c_2 = 8$, $c_3 = 6$, $c_4 = 10$, $t_1 = 6$, $t_2 = 4$, $t_3 = 3$, and $t_4 = 5$.
Let
$Y_1$ denote the quantity of corn harvested in 2002;
$Y_2$ denote the quantity of oats harvested in 2002;
$X_1$ denote the quantity of corn harvested in 1997;
$X_2$ denote the quantity of oats harvested in 1997.
The population means are $\bar{Y}_1 = 474973.90$, $\bar{X}_1 = 405654.19$, $\bar{Y}_2 = 1576.25$, and $\bar{X}_2 = 2116.70$. The detailed summary of the data is given in Tables 1 and 2.
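Table 1: Data summary.

| $h$ | $N_h$ | $W_h$ | $S_{y1h}^2$ | $S_{x1h}^2$ | $S_{y2h}^2$ | $S_{x2h}^2$ |
|---|---|---|---|---|---|---|
| 1 | 8 | 0.0808 | 29267524195.5 | 21601503189.8 | 777174.1 | 1154134.2 |
| 2 | 34 | 0.3434 | 26079256582.8 | 19734615816.7 | 4987812.9 | 7056074.8 |
| 3 | 45 | 0.4545 | 42362842460.8 | 27129658750.0 | 1074510.6 | 2082871.3 |
| 4 | 12 | 0.1212 | 30728265336.9 | 17258237358.5 | 388378.5 | 732004.9 |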
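Table 2: Data summary.

| $h$ | $S_{x1y1h}$ | $S_{x2y2h}$ | $\beta_{1h}$ | $\beta_{2h}$ | $u'_{1h}$ | $u'_{2h}$ |
|---|---|---|---|---|---|---|
| 1 | 24360422802.3 | 902170.6 | 1.1249 | 0.7834 | 0.000066 | 0.000181 |
| 2 | 22003466630.3 | 5813439.5 | 1.1150 | 0.8239 | 0.000809 | 0.009411 |
| 3 | 33367597192.0 | 1285355.6 | 1.2300 | 0.6171 | 0.001212 | 0.023390 |
| 4 | 21033769867.3 | 456991.5 | 1.2188 | 0.4243 | 0.000332 | 0.000610 |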
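The allocation problem formulated as a multiobjective integer nonlinear programming problem is
$$\text{Minimize} \quad \begin{pmatrix} Z_1 = \sqrt{\dfrac{0.000066}{n_1} + \dfrac{0.000809}{n_2} + \dfrac{0.001212}{n_3} + \dfrac{0.000332}{n_4}} \\[2ex] Z_2 = \sqrt{\dfrac{0.000181}{n_1} + \dfrac{0.009411}{n_2} + \dfrac{0.023390}{n_3} + \dfrac{0.000610}{n_4}} \end{pmatrix} \tag{19}$$
subject to
$$\begin{aligned}
& 12 n_1 + 8 n_2 + 6 n_3 + 10 n_4 + 6 n_1^{\delta} + 4 n_2^{\delta} + 3 n_3^{\delta} + 5 n_4^{\delta} \le C', \\
& 2 \le n_1 \le 8, \quad 2 \le n_2 \le 34, \quad 2 \le n_3 \le 45, \quad 2 \le n_4 \le 12, \\
& n_1, n_2, n_3, \text{ and } n_4 \text{ are integers}.
\end{aligned} \tag{20}$$
Consider
$$\text{Minimize} \quad Z_1 = \sqrt{\frac{0.000066}{n_{11}} + \frac{0.000809}{n_{12}} + \frac{0.001212}{n_{13}} + \frac{0.000332}{n_{14}}} \tag{21}$$
subject to
$$\begin{aligned}
& 12 n_{11} + 8 n_{12} + 6 n_{13} + 10 n_{14} + 6 n_{11}^{\delta} + 4 n_{12}^{\delta} + 3 n_{13}^{\delta} + 5 n_{14}^{\delta} \le C', \\
& 2 \le n_{11} \le 8, \quad 2 \le n_{12} \le 34, \quad 2 \le n_{13} \le 45, \quad 2 \le n_{14} \le 12, \\
& n_{11}, n_{12}, n_{13}, \text{ and } n_{14} \text{ are integers}.
\end{aligned} \tag{22}$$
Consider
$$\text{Minimize} \quad Z_2 = \sqrt{\frac{0.000181}{n_{21}} + \frac{0.009411}{n_{22}} + \frac{0.023390}{n_{23}} + \frac{0.000610}{n_{24}}} \tag{23}$$
subject to
$$\begin{aligned}
& 12 n_{21} + 8 n_{22} + 6 n_{23} + 10 n_{24} + 6 n_{21}^{\delta} + 4 n_{22}^{\delta} + 3 n_{23}^{\delta} + 5 n_{24}^{\delta} \le C', \\
& 2 \le n_{21} \le 8, \quad 2 \le n_{22} \le 34, \quad 2 \le n_{23} \le 45, \quad 2 \le n_{24} \le 12, \\
& n_{21}, n_{22}, n_{23}, \text{ and } n_{24} \text{ are integers}.
\end{aligned} \tag{24}$$
$Z_1^*$ and $Z_2^*$ are the coefficients of variation under the individual optimum allocations at different values of $\delta$ and $C'$; they are given in Table 3.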
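Table 3: Individual optimum allocation.

| $\delta$ | $C'$ | Allocation | $n_1$ | $n_2$ | $n_3$ | $n_4$ | Used $C'$ | $Z_1^*$ | $Z_2^*$ | Trace = $Z_1^* + Z_2^*$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 300 | $Y_1$ | 2 | 9 | 18 | 5 | 298.00 | 0.01602 | 0.05057 | 0.06659 |
| 0.5 | 300 | $Y_2$ | 2 | 10 | 22 | 2 | 298.27 | 0.01830 | 0.04899 | 0.06729 |
| 1 | 500 | $Y_1$ | 4 | 11 | 21 | 7 | 498.00 | 0.01398 | 0.04584 | 0.05982 |
| 1 | 500 | $Y_2$ | 2 | 13 | 29 | 3 | 498.00 | 0.01601 | 0.04271 | 0.05872 |
| 1.5 | 850 | $Y_1$ | 4 | 12 | 15 | 9 | 847.56 | 0.01420 | 0.04950 | 0.06376 |
| 1.5 | 850 | $Y_2$ | 2 | 14 | 21 | 3 | 833.18 | 0.01610 | 0.04560 | 0.06170 |
| 2 | 1500 | $Y_1$ | 3 | 10 | 13 | 7 | 1470.00 | 0.01560 | 0.05374 | 0.06934 |
| 2 | 1500 | $Y_2$ | 2 | 10 | 16 | 3 | 1467.00 | 0.01733 | 0.05193 | 0.06926 |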
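Because the stratum sizes in this example are small, the individual optimum problems (21)-(24) can also be solved by exhaustive enumeration of all integer allocations. The sketch below does this in plain Python; it is offered as an independent cross-check under the stated data, not as the AlphaECP procedure used in the paper, and the names `cost`, `cv`, and `individual_optimum` are hypothetical. An exhaustive search returns a global optimum, so its objective value can be no worse than that of any tabulated feasible allocation.

```python
import itertools
import math

# u'_jh from Table 2 and the costs and stratum sizes of the numerical example
U = {
    "Y1": (0.000066, 0.000809, 0.001212, 0.000332),
    "Y2": (0.000181, 0.009411, 0.023390, 0.000610),
}
UNIT_COST = (12, 8, 6, 10)
TRAVEL_COST = (6, 4, 3, 5)
STRATUM_SIZE = (8, 34, 45, 12)


def cost(n, delta):
    """Variable cost C' of allocation n under the cost function (3)."""
    return sum(c * nh + t * nh ** delta for c, t, nh in zip(UNIT_COST, TRAVEL_COST, n))


def cv(u, n):
    """Coefficient of variation Z_j for coefficients u and allocation n."""
    return math.sqrt(sum(uh / nh for uh, nh in zip(u, n)))


def individual_optimum(u, delta, budget):
    """Enumerate all integer allocations with 2 <= n_h <= N_h and keep the best feasible one."""
    grid = itertools.product(*(range(2, size + 1) for size in STRATUM_SIZE))
    feasible = (n for n in grid if cost(n, delta) <= budget)
    return min(feasible, key=lambda n: cv(u, n))


results = {name: individual_optimum(u, delta=1.0, budget=500.0) for name, u in U.items()}
for name, n in results.items():
    print(name, n, round(cv(U[name], n), 5), cost(n, 1.0))
```

The grid has about 112,000 points, so enumeration takes well under a few seconds; for larger strata a dedicated integer nonlinear solver would be the practical choice.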
5.2. Proposed Compromise Allocation
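We used the extended lexicographic goal programming model (16) to allocate the sample to the different strata, taking into account the two characteristics $Y_1$ and $Y_2$. Consider
$$\text{Minimize} \quad 0.2\,\lambda + (1 - 0.2)\left(d_1^+ + d_2^+\right) \tag{25}$$
subject to
$$\begin{aligned}
& d_1^+ \le \lambda, \quad d_2^+ \le \lambda, \\
& \sqrt{\frac{0.000066}{n_{1c}} + \frac{0.000809}{n_{2c}} + \frac{0.001212}{n_{3c}} + \frac{0.000332}{n_{4c}}} - d_1^+ \le Z_1^*, \\
& \sqrt{\frac{0.000181}{n_{1c}} + \frac{0.009411}{n_{2c}} + \frac{0.023390}{n_{3c}} + \frac{0.000610}{n_{4c}}} - d_2^+ \le Z_2^*, \\
& 12 n_{1c} + 8 n_{2c} + 6 n_{3c} + 10 n_{4c} + 6 n_{1c}^{\delta} + 4 n_{2c}^{\delta} + 3 n_{3c}^{\delta} + 5 n_{4c}^{\delta} \le C', \\
& 2 \le n_{1c} \le 8, \quad 2 \le n_{2c} \le 34, \quad 2 \le n_{3c} \le 45, \quad 2 \le n_{4c} \le 12, \\
& n_{hc} \in F \ (h = 1, 2, 3, 4) \text{ are integers}, \quad d_1^+, d_2^+ \ge 0.
\end{aligned} \tag{26}$$
Let $\hat{Z}_1$ and $\hat{Z}_2$ be the coefficients of variation under the proposed allocation at various values of the constants $\delta$ and $C'$; they are given in Table 6.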
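Model (25)-(26) can be explored by enumeration in the same spirit. For any fixed allocation the best deviations are $d_j^+ = \max(0, Z_j - Z_j^*)$ and $\lambda = \max_j d_j^+$, so the model reduces to minimizing one function over the feasible integer grid. The sketch below is a hedged illustration for $\delta = 1$, $C' = 500$: the goals $Z_j^*$ are the rounded values from Table 3, the function names are hypothetical, and the result need not coincide with the AlphaECP solution reported in the paper.

```python
import itertools
import math

U1 = (0.000066, 0.000809, 0.001212, 0.000332)  # u'_1h, Table 2
U2 = (0.000181, 0.009411, 0.023390, 0.000610)  # u'_2h, Table 2
UNIT_COST = (12, 8, 6, 10)
TRAVEL_COST = (6, 4, 3, 5)
STRATUM_SIZE = (8, 34, 45, 12)
Z_STAR = (0.01398, 0.04271)  # rounded goals from Table 3 at delta = 1, C' = 500
ALPHA = 0.2                  # weight used in (25)


def cost(n, delta):
    """Variable cost C' of allocation n under the cost function (3)."""
    return sum(c * nh + t * nh ** delta for c, t, nh in zip(UNIT_COST, TRAVEL_COST, n))


def cv(u, n):
    """Coefficient of variation Z_j for coefficients u and allocation n."""
    return math.sqrt(sum(uh / nh for uh, nh in zip(u, n)))


def achievement(n):
    """alpha*lambda + (1 - alpha)*sum(d_j^+) with the smallest admissible d_j^+ and lambda."""
    deviations = [max(0.0, cv(u, n) - goal) for u, goal in zip((U1, U2), Z_STAR)]
    return ALPHA * max(deviations) + (1.0 - ALPHA) * sum(deviations)


def elgp_allocation(delta, budget):
    """Exhaustively minimize the achievement function over feasible integer allocations."""
    grid = itertools.product(*(range(2, size + 1) for size in STRATUM_SIZE))
    feasible = (n for n in grid if cost(n, delta) <= budget)
    return min(feasible, key=achievement)


n_c = elgp_allocation(delta=1.0, budget=500.0)
print(n_c, cost(n_c, 1.0), round(cv(U1, n_c), 5), round(cv(U2, n_c), 5))
```

Because the cost constraint is enforced before any candidate is scored, every allocation this search returns is feasible by construction, which is the property the paper contrasts with Cochran's averaging.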
5.3. Khan et al. Compromise Allocation
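We applied model (18) to find the compromise allocation proposed by Khan et al. Consider
$$\text{Minimize} \quad 0.999944 \sqrt{\frac{0.000066}{n_1} + \frac{0.000809}{n_2} + \frac{0.001212}{n_3} + \frac{0.000332}{n_4}} + 0.000056 \sqrt{\frac{0.000181}{n_1} + \frac{0.009411}{n_2} + \frac{0.023390}{n_3} + \frac{0.000610}{n_4}} \tag{27}$$
subject to
$$\begin{aligned}
& 12 n_1 + 8 n_2 + 6 n_3 + 10 n_4 + 6 n_1^{\delta} + 4 n_2^{\delta} + 3 n_3^{\delta} + 5 n_4^{\delta} \le C', \\
& 2 \le n_1 \le 8, \quad 2 \le n_2 \le 34, \quad 2 \le n_3 \le 45, \quad 2 \le n_4 \le 12, \\
& n_1, n_2, n_3, \text{ and } n_4 \text{ are integers}.
\end{aligned} \tag{28}$$
The values $\hat{Z}_1$ and $\hat{Z}_2$ are the coefficients of variation under the Khan et al. compromise allocation, obtained by solving the above model at different values of the constants $\delta$ and $C'$; they are given in Table 5.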
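The weights 0.999944 and 0.000056 in (27) follow from the definition of $\alpha'_j$ in Section 4.2. The sketch below recomputes them, assuming that $S_{jh}^2$ in that definition refers to the study-variable variances $S_{yjh}^2$ of Table 1; this reading is an assumption, although it reproduces the reported weights. The function name `khan_weights` is hypothetical.

```python
def khan_weights(variances_by_characteristic):
    """Relative weights alpha'_j = sum_h S_jh^2 / sum_j sum_h S_jh^2."""
    totals = [sum(stratum_variances) for stratum_variances in variances_by_characteristic]
    grand_total = sum(totals)
    return [total / grand_total for total in totals]


# Study-variable variances S_yjh^2 from Table 1 (assumed to be the S_jh^2 of the weights)
S2_Y1 = [29267524195.5, 26079256582.8, 42362842460.8, 30728265336.9]
S2_Y2 = [777174.1, 4987812.9, 1074510.6, 388378.5]

weights = khan_weights([S2_Y1, S2_Y2])
print([round(w, 6) for w in weights])
```

Since corn quantities are several orders of magnitude larger than oat quantities, almost all of the weight falls on $Y_1$, which explains why the Khan et al. allocations in Table 5 nearly coincide with the individual optimum allocations for $Y_1$ in Table 3.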
6. Discussion
In this section, the proposed compromise allocation is compared with the Cochran compromise allocation, the Khan et al. compromise allocation, and the individual optimum allocations. The comparison is based on the trace of the variance-covariance matrix of the estimates of the finite population means under each allocation. We assume that the characteristics are independent, so that the covariances are zero. Table 3 gives the individual optimum allocations. Tables 4 and 5 give the Cochran and Khan et al. compromise allocations discussed in Sections 4 and 5. The proposed compromise allocation is given in Table 6.
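Table 4: Cochran’s compromise allocation.

| $\delta$ | $C'$ | $n_{1c}$ | $n_{2c}$ | $n_{3c}$ | $n_{4c}$ | Used $C'$ | $\hat{Z}_1$ | $\hat{Z}_2$ | Trace = $\hat{Z}_1 + \hat{Z}_2$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 300 | 2 | 10 | 20 | 4 | 308.55 | 0.01605 | 0.04851 | 0.06456 |
| 1.0 | 500 | 3 | 12 | 25 | 5 | 498.00 | 0.01429 | 0.04361 | 0.05790 |
| 1.5 | 850 | 3 | 13 | 18 | 6 | 829.25 | 0.01438 | 0.04675 | 0.06113 |
| 2.0 | 1500 | 3 | 10 | 15 | 5 | 1510.00 | 0.01581 | 0.05180 | 0.06761 |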
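Table 5: Khan et al. compromise allocation.

| $\delta$ | $C'$ | $n_{1c}$ | $n_{2c}$ | $n_{3c}$ | $n_{4c}$ | Used $C'$ | $\hat{Z}_1$ | $\hat{Z}_2$ | Trace = $\hat{Z}_1 + \hat{Z}_2$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 300 | 2 | 9 | 18 | 5 | 298.00 | 0.01602 | 0.05057 | 0.06659 |
| 1 | 500 | 4 | 11 | 21 | 7 | 498.00 | 0.01398 | 0.04584 | 0.05982 |
| 1.5 | 850 | 4 | 13 | 15 | 8 | 844.91 | 0.01418 | 0.04903 | 0.06321 |
| 2.0 | 1500 | 4 | 12 | 12 | 5 | 1495.00 | 0.01585 | 0.05386 | 0.06971 |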
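Table 6: Proposed compromise allocation.

| $\delta$ | $C'$ | $n_{1c}$ | $n_{2c}$ | $n_{3c}$ | $n_{4c}$ | Used $C'$ | $\hat{Z}_1$ | $\hat{Z}_2$ | Trace = $\hat{Z}_1 + \hat{Z}_2$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 300 | 2 | 11 | 19 | 3 | 299.49 | 0.01676 | 0.04879 | 0.06557 |
| 1.0 | 500 | 2 | 16 | 23 | 4 | 495.00 | 0.01480 | 0.04299 | 0.05779 |
| 1.5 | 850 | 2 | 11 | 22 | 5 | 848.58 | 0.01473 | 0.04583 | 0.06056 |
| 2.0 | 1500 | 3 | 9 | 15 | 6 | 1491.00 | 0.01575 | 0.05260 | 0.06835 |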
Table 4 shows that the Cochran compromise allocation gives higher trace values for $\delta = 1$ and $1.5$ than the proposed compromise allocation in Table 6. For $\delta = 0.5$ and $2.0$, the Cochran compromise allocation gives a slightly lower trace but is infeasible, because the corresponding cost exceeds the available budget. Table 5 shows that the Khan et al. compromise allocation gives higher trace values than the proposed compromise allocation at every value of $\delta$. Table 7 compares the proposed compromise allocation with the case in which the individual optimum allocation for one characteristic is used for both characteristics. The comparison is based on the percentage relative efficiency (PRE), given by
$$\mathrm{PRE} = \frac{T_I}{T_C} \times 100, \tag{29}$$
where $T_I$ is the trace under the individual optimum allocation and $T_C$ is the trace under the proposed compromise allocation. Table 7 shows that the proposed compromise allocation provides more efficient estimates of the population means than either individual optimum allocation, since every PRE exceeds 100.
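The PRE values can be recomputed from the trace columns of Tables 3 and 6. The short sketch below does so for $\delta = 1$, $C' = 500$; the function name is hypothetical and the inputs are the rounded tabulated traces, so small differences in the last digit relative to Table 7 are possible for other rows.

```python
def percentage_relative_efficiency(trace_individual, trace_compromise):
    """PRE = T_I / T_C * 100, as in (29)."""
    if trace_compromise <= 0:
        raise ValueError("trace_compromise must be positive")
    return trace_individual / trace_compromise * 100.0


# Traces at delta = 1, C' = 500: individual optima (Table 3) and proposed allocation (Table 6)
trace_y1, trace_y2, trace_proposed = 0.05982, 0.05872, 0.05779

pre_y1 = percentage_relative_efficiency(trace_y1, trace_proposed)
pre_y2 = percentage_relative_efficiency(trace_y2, trace_proposed)
print(round(pre_y1, 2), round(pre_y2, 2))
```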
Table 7: PRE of proposed compromise allocation to individual optimum allocation.

| $\delta$ | $C'$ | $Y_1$ | $Y_2$ |
|---|---|---|---|
| 0.5 | 300 | 101.59 | 102.65 |
| 1.0 | 500 | 103.51 | 101.61 |
| 1.5 | 850 | 105.28 | 101.88 |
| 2.0 | 1500 | 101.45 | 101.33 |
7. Conclusion
On the basis of the comparison made in Section 6, we conclude that the extended lexicographic goal programming approach always yields a feasible solution, which Cochran’s compromise method does not guarantee, and that, in the numerical example, it gives more efficient results than the Khan et al. compromise approach and the individual optimum allocation approach.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] Tschuprow, A. A. On the mathematical expectation of the moments of frequency distributions in the case of correlated observations.
[2] Neyman, J. On the two different aspects of representative method: the method of stratified sampling and method of purposive selection.
[3] Varshney, R., Ahsan, M. J., and Khan, M. G. M. An optimum multivariate stratified sampling design with nonresponse: a lexicographic goal programming approach.
[4] Dalenius, T. The problem of optimum stratification-II.
[5] Dalenius, T.
[6] Ghosh, S. P. A note on stratified random sampling with multiple characters.
[7] Folks, J. L. and Antle, C. E. Optimum allocation of sampling units to strata when there are R responses of interest.
[8] Chromy, J. R. Design optimization with multiple objectives. Proceeding of the Survey Research Section, American Statistical Association, Washington, DC, USA, 1987, pp. 194-199.
[9] Bethel, J. An optimum allocation algorithm for multivariate surveys.
[10] Jahan, N., Khan, M. G. M., and Ahsan, M. J. A generalized compromise allocation.
[11] Jahan, N., Khan, M. G. M., and Ahsan, M. J. Optimum compromise allocation using dynamic programming.
[12] Khan, E. A., Khan, M. G. M., and Ahsan, M. J. Optimum stratification: a mathematical programming approach.
[13] Khan, M. G. M., Khan, E. A., and Ahsan, M. J. An optimal multivariate stratified sampling design using dynamical programming.
[14] Khan, M. G. M., Khan, E. A., and Ahsan, M. J. Optimum allocation in multivariate stratified sampling in presence of non-response.
[15] Ansari, A. H., Najmussehar, and Ahsan, M. J. On multiple response stratified random sampling design.
[16] Khan, M. G. M., Maiti, T., and Ahsan, M. J. An optimal multivariate stratified sampling design using auxiliary information: an integer solution using goal programming approach.
[17] Varshney, R., Najmussehar, and Ahsan, M. J. Estimation of more than one parameters in stratified sampling with fixed budget.
[18] Beardwood, J., Halton, J. H., and Hammersley, J. M. The shortest path through many points. Mathematical Proceedings of the Cambridge Philosophical Society, vol. 55, 1959, pp. 299-327. MR0109316.
[19] Hillier, F. S. and Lieberman, G. J.
[20] Rosenthal, R. E.
[21] Romero, C. Extended lexicographic goal programming: a unifying approach.
[22] Romero, C. A general structure of achievement function for a goal programming model.
[23] Cochran, W. G.