Estimation of Sensitive Proportion by Randomized Response Data in Successive Sampling

This paper considers the problem of estimating binomial proportions of sensitive or stigmatizing attributes in a population of interest. Randomized response techniques have been suggested for protecting the privacy of respondents and reducing response bias when eliciting information on sensitive attributes. In many sensitive-question surveys, the same population is sampled repeatedly on several occasions. In this paper, we apply a successive sampling scheme to improve the estimation of the sensitive proportion on the current occasion.


Introduction
Social surveys sometimes include stigmatizing or sensitive topics, such as habitual tax evasion, sexual behaviour, substance abuse, and excessive gambling, for which it is difficult to obtain valid and trustworthy information. If respondents are asked directly about controversial matters, the result is often refusal or untruthful answers, especially when they have engaged in stigmatizing behaviour. To overcome this difficulty, Warner [1] introduced the randomized response technique to estimate the proportion of people bearing such a stigmatizing or sensitive characteristic in a given community. This technique allows the respondent to answer sensitive questions truthfully without revealing embarrassing behaviour. Following the pioneering work of Warner [1], several researchers have made important contributions in this area, such as Christofides [2,3], Singh [4], Kim and Elam [5], Huang [6,7], Singh and Sedory [8], Chang and Kuo [9], and Arnab et al. [10]. All these results are based on a sample taken on one occasion, which is not the case in the present study.
In many sensitive-question surveys, the same population is sampled repeatedly on several occasions, so that the development over time can be followed. In such situations, a successive sampling scheme can be an attractive way to improve estimators of the level at a point in time or to measure the change between two time points. In successive sampling on two occasions, previous theory [11,12] aimed at providing the optimum estimator of the mean on the current (second) occasion. Successive sampling has also been discussed in some detail by Narain [13], Raj [14], Singh [15], Ghangurde and Rao [16], Okafor [17], Arnab and Okafor [18], Biradar and Singh [19], G. N. Singh and V. K. Singh [20], Artes et al. [21], and Singh et al. [22]. However, no effort has been made to estimate the proportion of a sensitive attribute in an infinite population on the current occasion. This motivated the authors to consider the problem of estimating binomial proportions of sensitive or stigmatizing attributes in the population of interest in successive sampling on two occasions. In addition, cluster sampling is usually preferred when the target population is geographically diverse. In this paper, we use the rotation cluster sample design to construct a class of estimators for the case of a randomized response survey.
The rest of the paper is organized as follows. In Section 2, we propose a new survey method using the Simmons model with cluster rotation sampling. In Section 3, the corresponding formulas for this survey method are derived. In Section 4, the method and formulas are applied in a survey of premarital sexual behaviour among students at Soochow University. Section 5 contains the conclusion.

Simmons Model.
The Simmons model, which is based on Warner's randomized response technique, was put forward by Horvitz et al. [23]. The basic idea is to establish a random association between the respondents and two unrelated questions. The Simmons design consists of two unrelated questions, A and B, to be answered on a probability basis, where A is "Do you possess the sensitive characteristic?" and B is a nonsensitive question such as "Is your birthday number odd?" The two questions A and B are presented to respondents with preset probabilities P and 1 − P, respectively. Simple random sampling with replacement (SRSWR) is assumed. The selected respondent is asked to select question A or B with the randomizing device and to report "yes" if his or her actual status matches the selected question and "no" otherwise.
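As a hedged sketch of how this design yields an estimate, the standard unrelated-question argument can be written out as follows. The symbols are illustrative assumptions and are not taken from the text: $P$ is the probability of receiving question A, $\pi_A$ is the sensitive proportion, $\pi_B$ is the proportion answering "yes" to question B (assumed known), $\lambda$ is the probability of a "yes" answer, and $\hat{\lambda}$ is the observed proportion of "yes" answers among $n$ respondents drawn by SRSWR.

```latex
\begin{align*}
\lambda &= P\,\pi_A + (1 - P)\,\pi_B, \\
\hat{\pi}_A &= \frac{\hat{\lambda} - (1 - P)\,\pi_B}{P}, \\
\operatorname{Var}(\hat{\pi}_A) &= \frac{\lambda(1 - \lambda)}{n P^{2}}.
\end{align*}
```

The estimator is unbiased when $\pi_B$ is known, and its variance grows as $P$ decreases, which is the price paid for the added privacy.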

Simmons Model in Cluster Rotation Sampling.
In the following, sampling on two occasions is considered in order to estimate the population proportion with a sensitive characteristic on the second occasion when the rotation sampling units are clusters. The sampling steps for the Simmons model under partial cluster rotation are as follows.
Firstly, the population is divided into primary sampling units (clusters), and the units within the clusters are the secondary sampling units (persons).
Secondly, on the first occasion a random sample of clusters is drawn with replacement from the population. Each person within the drawn clusters is asked to select question A or B and to report "yes" if his or her actual status matches the selected question and "no" otherwise, using the Simmons model.
Thirdly, on the second occasion some of the clusters selected on the first occasion are retained at random, and the remaining clusters are replaced by a fresh selection, so that the total number of clusters is unchanged. All the people within the clusters on the second occasion are surveyed using the Simmons model.
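The three steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `rotate_clusters`, the symbols `n` (clusters per occasion) and `m` (retained clusters), and the frame of 40 classes are all hypothetical, and the fresh clusters are assumed to be drawn with replacement from the whole population.

```python
import random

def rotate_clusters(population, n, m, rng):
    """Return (first, retained, fresh) cluster lists for a two-occasion rotation design."""
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and n")
    # First occasion: n clusters drawn with replacement.
    first = rng.choices(population, k=n)
    # Second occasion: retain m of the first-occasion draws at random ...
    keep_idx = rng.sample(range(n), m)
    retained = [first[i] for i in keep_idx]
    # ... and replace the other n - m draws by a fresh selection (assumed with replacement).
    fresh = rng.choices(population, k=n - m)
    return first, retained, fresh

rng = random.Random(0)
classes = [f"class_{i:02d}" for i in range(1, 41)]  # hypothetical frame of 40 classes
first, retained, fresh = rotate_clusters(classes, n=12, m=3, rng=rng)
second = retained + fresh  # clusters surveyed on the second occasion
print(len(first), len(retained), len(fresh), len(second))
```

Every person in the clusters of `first` and `second` would then answer through the randomizing device.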

The Estimator of the Population Proportion on the Second Occasion and Its Variance.
Consider a random sample of clusters with replacement drawn from the population which consists of clusters and the th cluster of units ( = 1, 2, . . . , ).
On the second (current) occasion of the clusters selected on the first occasion are retained at random and the remaining = − of the clusters are replaced by a fresh selection. Let 1, be the number of the th retained cluster (including units) with the sensitive characteristic under study on the first occasion ( = 1, 2, . . . , ) and let 1, + be the number of the th rotated cluster (including + units) with the sensitive characteristic under study on the first occasion, respectively ( = 1, 2, . . . , ). 2, is the number of the th retained cluster (including units) with the sensitive characteristic under study on the second (current) occasion ( = 1, 2, . . . , ) and 2, + is the number of the th fresh cluster (including + units) with the sensitive characteristic under study on the second (current) occasion ( = 1, 2, . . . , ). Similarly, let 1, be the proportion of the th retained cluster with the sensitive characteristic under study on the first occasion ( = 1, 2, . . . , ) and let 1, + be the proportion of the th rotated cluster with the sensitive characteristic under study on the first occasion ( = 1, 2, . . . , ), respectively. 2, is the proportion of the th retained cluster with the sensitive characteristic under study on the second (current) occasion ( = 1, 2, . . . , ) and 2, + is the proportion of the th fresh cluster with the sensitive characteristic under study on the second (current) occasion ( = 1, 2, . . . , ). Assume that the variance and the correlation coefficient between the first occasion and second occasion are constants and the overall correction coefficient is ignored.
Define the following: 1 : the population proportion of the sensitive characteristic on the first occasion; The following is according to the formula and results given by Cochran [24].
The estimator of 1 iŝ The estimator of 1 iŝ The estimator of 1 iŝ The estimator of 2 iŝ The estimator of 2 iŝ Consider a generalized estimator̂2 of the population proportion of the sensitive characteristic on the second occasion or current occasion aŝ where , , , and are suitable constants. We have Because the estimator̂2 of 2 is an unbiased estimator of 2 , we have + = 0 and + = 0. Hence, the estimator (6) takes the form The variance of estimator̂2 is Other covariance terms are zero. Minimizing the variance of estimator̂2 with respect to and when is sufficiently large, Then we get ( 1 2 ) = ( We derive One has for + = .

Theorem 1. Under the Simmons model in partial clusters rotation, one haŝ
and the variance of estimator̂2 is
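The closed forms of the theorem did not survive in this text, so the following is only a hedged illustration based on the classical two-occasion results for element sampling in Cochran [24]; the paper's cluster version may differ. The assumed symbols are `s2` (common variance), `rho` (correlation between occasions), `n` (units per occasion), and `u` (fresh units on the second occasion). The inputs take 12 clusters from the survey design and read the reported values 0.004262 and 0.936 as the variance and correlation estimates, which is itself an assumption.

```python
import math

def combined_variance(s2, rho, n, u):
    """Classical variance of the optimally weighted current-occasion estimator."""
    return s2 * (n - u * rho**2) / (n**2 - u**2 * rho**2)

def optimum_unmatched_fraction(rho):
    """Fraction u/n of fresh units that minimizes the classical variance."""
    return 1.0 / (1.0 + math.sqrt(1.0 - rho**2))

def optimum_variance(s2, rho, n):
    """Classical variance attained at the optimum rotation fraction."""
    return s2 * (1.0 + math.sqrt(1.0 - rho**2)) / (2.0 * n)

s2, rho, n = 0.004262, 0.936, 12  # values as read from the survey section
# Scan every feasible number of fresh clusters and keep the best one.
variances = {u: combined_variance(s2, rho, n, u) for u in range(n + 1)}
best_u = min(variances, key=variances.get)
print(best_u, variances[best_u])
print(optimum_unmatched_fraction(rho), optimum_variance(s2, rho, n))
```

With no rotation (`u = 0`) or complete rotation (`u = n`) the expression reduces to `s2 / n`, so any gain comes from partial rotation; under these inputs the scan favours replacing 9 of the 12 clusters.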

Computational and Mathematical Methods in Medicine
Remark 2. In practice, the and 2 ℎ are unknown. The estimator of iŝ And the estimator of 2 ℎ is

Theorem 3. Under the Simmons model in partial clusters rotation, one has the optimum rotation rate as And the optimum variance of estimator̂2 is

In practice, the cost of a sample survey is usually represented by the following simple function, according to Cochran [24]: where is the total cost of sampling, 0 is the fundamental cost of the survey, 1 is the average fundamental cost of investigating one retained cluster on the second occasion, and 2 is the average fundamental cost of investigating one fresh cluster on the second occasion.
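Given the verbal definitions above, the cost function is presumably linear in the numbers of retained and fresh clusters. A hedged sketch, in which every symbol is an assumption ($C_T$ for the total cost, $c_0$ for the fundamental cost, $c_1$ and $c_2$ for the per-cluster costs, and $m$ and $u$ for the numbers of retained and fresh clusters on the second occasion), is:

```latex
\[
C_T = c_0 + c_1\, m + c_2\, u .
\]
```

Under such a function, a fixed budget $C_T$ constrains the pair $(m, u)$, so the rotation rate and the sample size have to be chosen jointly.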

Theorem 4. Under the given cost of sample survey , one has
And the estimation of sample size in partial clusters rotation is

The Estimator of ℎ, .
Let ℎ, be the proportion of the selected th cluster (including units) with the nonsensitive characteristic under study on the ℎth occasion; ℎ, and ℎ, denote the number and the proportion of "yes" answers in the th cluster, respectively, wherêℎ , = ℎ, / , ℎ = 1, 2, ( = 1, 2, . . . , ).

Survey Design.
The survey concerns premarital sexual behavior among students at the Dushu Lake Campus of Soochow University. We regard every class as a cluster, with 45 persons per class on average. On the first occasion (2011), 12 classes were drawn at random from all the classes. All the persons in the
In our design, each person was asked to draw a ball at random, with replacement, from a bag containing 6 red balls and 4 white balls, so the probability of drawing a red ball was known (0.6). If a red ball was drawn, the respondent was asked the sensitive question A, "Are you a member of the group having premarital sexual behavior?" If a white ball was drawn, the respondent answered the nonsensitive question B, "Is your student number odd?" The respondent reports "yes" if his or her actual status matches the selected question and "no" otherwise.
All questionnaires from both occasions were checked to ensure that they were completed independently and that no questions were omitted. The response rate of the survey was 100%, with no invalid questionnaires. All data were processed and analyzed with Excel 2003 and SAS 9.13.

Result of the Survey.
According to (31), we get the sample proportion of the undergraduate students who have premarital sexual behavior ℎ, , ℎ = 1, 2, as is shown in Table 1.
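Formula (31) is not reproduced in this text, so the following is a hedged sketch of how a per-class proportion could be computed with the standard unrelated-question estimator. The function name `simmons_estimate`, the class counts (13 "yes" answers among 45 students), and the value 0.5 for the proportion of odd student numbers are hypothetical; only the 0.6 probability of the sensitive question comes from the survey design.

```python
def simmons_estimate(yes_count, n_respondents, p_sensitive, pi_unrelated):
    """Unrelated-question estimate of the sensitive proportion in one cluster."""
    if n_respondents <= 0:
        raise ValueError("n_respondents must be positive")
    if not 0 < p_sensitive <= 1:
        raise ValueError("p_sensitive must lie in (0, 1]")
    # Observed proportion of "yes" answers, mixing both questions.
    lam_hat = yes_count / n_respondents
    # Remove the expected contribution of the nonsensitive question, then rescale.
    return (lam_hat - (1 - p_sensitive) * pi_unrelated) / p_sensitive

# Hypothetical class: 45 students, 13 "yes" answers, red-ball probability 0.6,
# and an assumed 0.5 share of odd student numbers.
est = simmons_estimate(13, 45, 0.6, 0.5)
print(round(est, 4))  # 0.1481
```

In small clusters the raw estimate can fall below 0 when few "yes" answers are observed, so per-class values are best read together with the pooled estimate.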

The Estimator of the Population Proportion on the Second Occasion and Its Variance.
By (1), the estimate of the population proportion with premarital sexual behavior on the first occasion is 0.142933.
According to the results of the investigation of premarital sexual behavior among students at the Dushu Lake Campus of Soochow University on the second occasion, from formulae (4) and (5), By (23) and (24), we obtain the variance estimate 0.004262 and the correlation coefficient estimate 0.936, respectively.
So the 95% confidence interval of the population proportion with premarital sexual behavior is [0.1353, 0.1961].
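The point estimate on the second occasion is not legible in this text, but if the interval is a symmetric normal-approximation interval (an assumption), its centre and standard error can be backed out from the reported endpoints. The helper names below are hypothetical.

```python
from statistics import NormalDist

def normal_ci(estimate, std_error, level=0.95):
    """Symmetric normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

def implied_estimate_and_se(lower, upper, level=0.95):
    """Recover the centre and standard error implied by a symmetric interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * z)

center, se = implied_estimate_and_se(0.1353, 0.1961)
print(round(center, 4), round(se, 4))  # 0.1657 0.0155
```

Under that assumption the interval corresponds to an estimate of about 0.1657 with a standard error of about 0.0155.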

Discussion and Conclusion
To sum up, in this study we proposed a new sampling method for sensitive-question surveys that are repeated over time, which is the first attempt made by the authors in this direction. The corresponding formulas for the estimator of the population proportion with the sensitive characteristic and for its variance under the proposed sampling method are provided. In addition, formulas for the optimal rotation rate and for the sample size under a given survey cost are given.
The method and corresponding formulas were successfully applied in the premarital sex survey at the Dushu Lake Campus of Soochow University. In short, the proposed sampling method and corresponding formulas have important theoretical and practical value for continuous surveys of sensitive questions.

Proofs of Theorems
Proof of Theorem 1. Using the optimum values of and given by (16) and (19), estimator̂2 reduces to (21).