Partial Randomized Response Model for Simultaneous Estimation of Means of Two Sensitive Variables

In this study, a new partial randomized response model (RRM) is proposed for simultaneously estimating the population means of two quantitative sensitive variables. The utility of the proposed model under stratification is also explored. The efficiency of the proposed model under simple and stratified random sampling is compared numerically with that of a competitor model. A real data set was collected from students of the statistics and animal sciences departments of Quaid-i-Azam University, Islamabad, Pakistan, through direct questioning, the proposed partial RRM, and a competitor randomized response device. The proposed partial RRM performs better than the competitor RRM under both simple and stratified random sampling.


Introduction
The social survey is one of the leading mechanisms for obtaining reliable data on the attitudes, behaviors, and opinions of a human population. Sometimes facts about individuals are inaccessible to investigators because of social stigma; such facts are considered sensitive information. When asked directly, respondents may consciously or unconsciously provide incorrect information on stigmatizing characteristics. To reduce this bias and procure reliable data, Warner [1] developed a randomized response model (RRM) to estimate the population proportion of a sensitive attribute. In Warner's [1] model, a randomly selected proportion $P$ of respondents is asked the sensitive question, and the remaining proportion $(1 - P)$ is asked the complement of the sensitive question. The researcher does not know whether a respondent answered the sensitive question or its complement. Greenberg et al. [2] extended Warner's idea to the estimation of the mean of a quantitative sensitive variable. Other developments in mean estimation are due to Eichhorn and Hayre [3]; Bar-Lev et al. [4]; Gupta et al. [5]; Gupta et al. [6]; Hussain et al. [7]; Singh et al. [8]; Singh and Suman [9]; Lee and Hong [10]; Narjis and Shabbir [11,12]; and Muneer et al. [13].
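As a minimal sketch of the mechanism just described, the following Python code simulates Warner's design and applies the usual moment estimator $\hat{\pi} = (\hat{\lambda} - (1 - P)) / (2P - 1)$, where $\hat{\lambda}$ is the observed proportion of "yes" answers. The function names and the simulated population are illustrative assumptions, not part of the cited work.

```python
import random


def warner_estimate(yes_count, n, p):
    """Moment estimator of the sensitive proportion under Warner's design.

    p is the probability that a respondent is asked the sensitive question.
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("P = 0.5 makes the design uninformative")
    lam_hat = yes_count / n  # observed proportion of "yes" answers
    return (lam_hat - (1 - p)) / (2 * p - 1)


def simulate_warner(n, pi, p, seed=0):
    """Count "yes" answers from n simulated respondents (hypothetical data)."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < pi
        asked_sensitive = rng.random() < p
        # "yes" if asked about the trait and has it,
        # or asked the complement and does not have it
        answer = has_trait if asked_sensitive else not has_trait
        yes += answer
    return yes
```

For example, 46 "yes" answers out of 100 with $P = 0.7$ give an estimated proportion of 0.4, since $(0.46 - 0.3) / 0.4 = 0.4$.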
Scrambled randomized response models are built on the idea of obtaining masked rather than actual responses. Masking can be achieved by adding a random component to, subtracting it from, or multiplying it with the actual response. Scrambled response models may be categorized as full, partial, and optional models. In a full RRM, all respondents are requested to provide a scrambled response, whereas in a partial RRM, a randomly selected group of respondents is requested to provide the truthful response and the remaining respondents are requested to provide a scrambled response. In an optional RRM, respondents are requested to provide a scrambled response if they consider the question sensitive and the truthful response if they consider it nonsensitive. Mangat and Singh [14] and Gupta et al. [15] introduced the partial RRM and the optional RRM, respectively. The purpose of all RRMs is to protect privacy and increase cooperation.
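The three categories can be contrasted in a short sketch. It assumes additive masking (response = true value + scrambling value) purely for illustration; the function names and the `finds_sensitive` flag are hypothetical.

```python
import random


def full_response(y, s):
    """Full RRM: every respondent reports the scrambled value."""
    return y + s


def partial_response(y, s, t, rng):
    """Partial RRM: the randomization device asks for the truth with probability t."""
    return y if rng.random() < t else y + s


def optional_response(y, s, finds_sensitive):
    """Optional RRM: the respondent scrambles only if the question feels sensitive."""
    return y + s if finds_sensitive else y
```

In the partial model the choice between truth and scrambling is made by the device, whereas in the optional model it is made by the respondent.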
Researchers in the field of social, medical, and environmental sciences have well documented situations in which one may wish to estimate two dependent sensitive characteristics at the same time. For example, a researcher may be interested in estimating the proportion of the population with income greater than a specific amount, weighted according to whether they are tax evaders or not. Another example is estimating the proportion of gamblers who are also involved in robbery, or the proportion of induced abortions among females who have had premarital sexual relations. Christofides [16] introduced an RR model to estimate the proportions of two dependent sensitive attributes at the same time. Studies on the estimation of two dependent sensitive attributes have been reported by Lee et al. [17]; Batool and Shabbir [18]; Ewemooje and Amahia [19]; Ewemooje [20]; and Ewemooje et al. [21].
Similarly, surveys on household spending that record household income and expenditure on different commodities comprise sensitive questions. For example, an economist may be interested in estimating the average difference between the amount spent on alcoholic beverages and tobacco items and the amount spent on food and nonfood items. Or the interest may lie in estimating people's actual income and the income reported in their tax returns. However, the choice of RR models is very limited in such situations, where one needs to estimate the means of two quantitative sensitive variables at the same time. Recently, Ahmed et al. [22] introduced a full scrambled RR model for the simultaneous estimation of the means of two quantitative sensitive variables. Hussain and Murtaza [23] wrote a corrigendum to Ahmed et al. [22] that provides the correct expressions for $E(S_1^3)$, $E(S_2^3)$, and $\sigma^2$, respectively. The notation and terminology are given in Sections 1.1 and 1.2.

Notations and Terminology under Simple Random Sampling.
Suppose a sample $s$ of size $n$ is drawn by simple random sampling with replacement (SRSWR) from a finite population $\Omega = \{1, 2, 3, \ldots, N\}$. Let $Y_{1i}$ and $Y_{2i}$ be two quantitative sensitive variables of interest with unknown means and variances, which we wish to estimate. Assume that $S_1$ and $S_2$ are two scrambling variables that are independent of both sensitive variables and of each other, and that the distributions of the scrambling variables are known. Let where, $a$ and $b$ are nonnegative integers.
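Because the scrambling distributions are known, the joint moments $E(S_1^a S_2^b)$ can be computed directly. The sketch below does this numerically for two independent Poisson scrambling variables, the family used later in the application. Reading $e_{ab}$ as shorthand for $E(S_1^a S_2^b)$ is an assumption based on how the symbol is used in equation (34), and the helper names are hypothetical.

```python
import math


def poisson_pmf(lam, kmax=200):
    """Truncated Poisson pmf as {value: probability}; the tail beyond kmax is negligible here."""
    pmf = {}
    prob = math.exp(-lam)
    for k in range(kmax + 1):
        pmf[k] = prob
        prob *= lam / (k + 1)
    return pmf


def raw_moment(pmf, order):
    """E(S^order) for a discrete distribution given as {value: probability}."""
    return sum((k ** order) * prob for k, prob in pmf.items())


def joint_moment(pmf1, pmf2, a, b):
    """E(S1^a S2^b); it factorizes because S1 and S2 are independent."""
    return raw_moment(pmf1, a) * raw_moment(pmf2, b)
```

With Poisson means 5 and 2, for instance, `joint_moment(poisson_pmf(5.0), poisson_pmf(2.0), 1, 1)` is the product of the two means.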

Lemma 1.
The moments of order four or less of the scrambling variables $S_1$ and $S_2$, that is, the expressions of $E(S_1^a S_2^b)$ with $a$ and $b$ nonnegative integers and $(a + b) \le 4$, are given by Let
The interest of researchers is always to obtain the true response from a population. To this end, Mangat and Singh [14] proposed an ingenious partial RRT model by injecting an element of truthful responses into Warner's [1] model. Gupta and Thornton [24] described the partial RRT model for quantitative variables, and many others have contributed improvements in this area. It is important to note that almost all RRT models of this type can estimate only one quantitative sensitive variable at a time. In this article, therefore, we propose an additive partial RRT model to estimate two quantitative sensitive variables simultaneously.
The proposed model is an extension of the model of Ahmed et al. [22] under simple and stratified random sampling.
The basic purpose of this study is to obtain truthful responses from some proportion of respondents and to increase efficiency.
This paper is organized as follows. In Section 2, we present some existing RRT models. In Section 3, we introduce a partial randomized response model under SRSWR and compare it numerically with the model of Ahmed et al. [22]. In Section 4, we present a partial randomized response model under stratification and compare it numerically with the stratified model of Ahmed et al. [22]. In Section 5, an application to real-life data is given, and the proposed partial RRM is compared with the model of Ahmed et al. [22], using the direct response technique as the benchmark. Finally, Section 6 provides a conclusion.

RRM in Literature
In this section, we consider the following existing RRMs.

Model under Simple Random Sampling.
Ahmed et al. [22] proposed an additive and multiplicative model for simultaneously estimating the means of two quantitative sensitive variables. Two responses are taken from each respondent: a scrambled response and a fake response. The scrambled response from the $i$th respondent is obtained as For the second response, each respondent is requested to rotate a spinner and respond accordingly: the respondent reports the value of scrambling variable $S_1$ when the pointer lands in the shaded area, and the value of scrambling variable $S_2$ otherwise. Let $P$ be the proportion of the shaded area and $(1 - P)$ the proportion of the non-shaded area of the spinner. Thus, the second response from the $i$th respondent is given by From equations (25) and (26), generate the response $Z^{A}_{2i}$ as follows: The unbiased estimators of the population means $\mu_{y_1}$ and $\mu_{y_2}$ are given by and The variances of the estimators $\hat{\mu}^{A}_{y_1}$ and $\hat{\mu}^{A}_{y_2}$ are given by and where and

$$\sigma_{Z_1^A Z_2^A} = (\mu_{y_1}^2 + \sigma_{y_1}^2)\{P e_{30} + (1 - P) e_{21}\} + (\mu_{y_2}^2 + \sigma_{y_2}^2)\{P e_{12} + (1 - P) e_{03}\} + 2(\mu_{y_1}\mu_{y_2} + \sigma_{y_1 y_2})\{P e_{21} + (1 - P) e_{12}\} - (\mu_{y_1} e_{10} + \mu_{y_2} e_{01})\left[\{P e_{20} + (1 - P) e_{11}\}\mu_{y_1} + \{P e_{11} + (1 - P) e_{02}\}\mu_{y_2}\right]. \quad (34)$$
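Equation (34) can be evaluated mechanically once the moments are known. The following sketch transcribes it into Python; the bracket grouping is a reading of the flattened equation (second-moment terms minus the product of the two response means), and the function name and the dictionary layout `e[(a, b)]` for $e_{ab}$ are assumptions made for illustration.

```python
def cov_z1_z2(mu1, mu2, var1, var2, cov12, p, e):
    """Covariance of the two responses, transcribed from equation (34).

    e maps (a, b) to the scrambling moment e_ab (assumed to be E(S1^a S2^b)).
    """
    q = 1 - p
    # E(Z1 * Z2): terms in the second moments of Y1 and Y2 and their product
    cross_moment = (
        (mu1 ** 2 + var1) * (p * e[3, 0] + q * e[2, 1])
        + (mu2 ** 2 + var2) * (p * e[1, 2] + q * e[0, 3])
        + 2 * (mu1 * mu2 + cov12) * (p * e[2, 1] + q * e[1, 2])
    )
    mean_z1 = mu1 * e[1, 0] + mu2 * e[0, 1]
    mean_z2 = (
        (p * e[2, 0] + q * e[1, 1]) * mu1
        + (p * e[1, 1] + q * e[0, 2]) * mu2
    )
    return cross_moment - mean_z1 * mean_z2
```

As a sanity check on this reading, setting every $e_{ab} = 1$ (a degenerate scrambling variable equal to 1) collapses the expression to $\sigma_{y_1}^2 + \sigma_{y_2}^2 + 2\sigma_{y_1 y_2}$, the variance of $Y_1 + Y_2$.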

Model under Stratified Random Sampling.
From Ahmed et al. [22], in stratified random sampling, the scrambled response from the $i$th respondent of the $h$th stratum is obtained as The second response from the $i$th respondent of the $h$th stratum is obtained as The unbiased estimators of the population means $\mu_{y_1}$ and $\mu_{y_2}$ are given by and The variances of the estimators $\hat{\mu}^{A}_{y_1(st)}$ and $\hat{\mu}^{A}_{y_2(st)}$ are given by and where $\sigma^2$ and and

Proposed Partial RRM under Simple Random Sampling
In this section, we propose a partial randomized response model for the simultaneous estimation of the means of two quantitative sensitive variables. In the proposed partial RRM, each respondent selected in the sample is requested to give two responses by using two randomized response (RR) devices. RR Device I provides the scrambled or true response on the sensitive variables, whereas RR Device II provides the fake response, which is free from the sensitive variables. RR Device I bears two types of statements: (i) report the additive true value of both sensitive variables, say $(Y_{1i} + Y_{2i})$, with probability $T$, and (ii) report the scrambled response as, with probability $(1 - T)$. Mathematically, each respondent is requested to report the response $Z_{1i}$ as The partial randomized response $Z_{1i}$ of the $i$th sampled respondent is given by For the second response, each respondent is requested to use RR Device II, which is the same as in equation (26); thus, the fake response $Z_i$ from equation (26) for the $i$th sampled respondent is given by where $\gamma_i$ and $\beta_i$ are Bernoulli random variables with means $P$ and $T$, respectively, which are known parameters. From equations (47) and (48), we generate the response $Z_{2i}$ as follows: The generated response $Z_{2i}$ for the $i$th sampled respondent is given by Taking expected values on both sides of equations (47) and (50), we have and $E(Z_{2i}) = PT(e_{10}\mu_{y_1} + e_{10}\mu_{y_2}) + \cdots$ From equations (51) and (52), by the method of moments, we have and where Solving equations (53) and (54) by Cramer's rule, we obtain unbiased estimators of $\mu_{y_1}$ and $\mu_{y_2}$, respectively, given by and

Theorem 1. The variances of the proposed estimators $\hat{\mu}_{y_1}$ and $\hat{\mu}_{y_2}$ are, respectively, given by and and

Proof. Note that the variance expressions for the two estimators $\hat{\mu}_{y_1}$ and $\hat{\mu}_{y_2}$ can be obtained through the formula or On substituting the values of $E(S_1^2)$, $E(S_2^2)$, and $E(S_1 S_2)$ in equation (63), we have equation (59).
The variance $\sigma^2_{Z_2}$ is given by or On substituting the values of $E(S_1^2)$, $\ldots$, and $E(S_1 S_2^3)$ in equation (65), we have equation (60). The covariance $\sigma_{Z_1 Z_2}$ between $Z_{1i}$ and $Z_{2i}$ is given by or On substituting the values of The unbiased estimators for $V(\hat{\mu}_{y_1})$ and $V(\hat{\mu}_{y_2})$ are, respectively, given by and where

Remark 1. When $T = 0$, the proposed partial RRM reduces to the model of Ahmed et al. [22].
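The method-of-moments step can be sketched in code under an explicitly assumed form of the model, since the display equations are not available here. The assumption is $Z_{1i} = \beta_i (Y_{1i} + Y_{2i}) + (1 - \beta_i)(S_1 Y_{1i} + S_2 Y_{2i})$, $Z_i = \gamma_i S_1 + (1 - \gamma_i) S_2$, and $Z_{2i} = Z_{1i} Z_i$. This form is inferred from equation (34) and from the surviving leading term $PT(e_{10}\mu_{y_1} + e_{10}\mu_{y_2})$ of $E(Z_{2i})$; it is not confirmed by the source. Under it, $E(Z_{1i})$ and $E(Z_{2i})$ are linear in $\mu_{y_1}$ and $\mu_{y_2}$, and the $2 \times 2$ system is solved by Cramer's rule. All function names are hypothetical.

```python
def moment_coefficients(p, t, e):
    """Coefficients a11, a12, a21, a22 of E(Z1) and E(Z2) in (mu_y1, mu_y2).

    Assumed model (not confirmed by the source):
      Z1 = beta*(Y1 + Y2) + (1 - beta)*(S1*Y1 + S2*Y2),  Z2 = Z1 * Z,
      Z  = gamma*S1 + (1 - gamma)*S2,  E(beta) = t,  E(gamma) = p.
    e maps (a, b) to E(S1^a S2^b).
    """
    q, u = 1 - p, 1 - t
    fake_mean = p * e[1, 0] + q * e[0, 1]  # E(Z) for the fake response
    a11 = t + u * e[1, 0]
    a12 = t + u * e[0, 1]
    a21 = t * fake_mean + u * (p * e[2, 0] + q * e[1, 1])
    a22 = t * fake_mean + u * (p * e[1, 1] + q * e[0, 2])
    return a11, a12, a21, a22


def estimate_means(z1_bar, z2_bar, p, t, e):
    """Solve the two moment equations for (mu_y1, mu_y2) by Cramer's rule."""
    a11, a12, a21, a22 = moment_coefficients(p, t, e)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("moment equations are singular for these design parameters")
    mu1 = (z1_bar * a22 - a12 * z2_bar) / det
    mu2 = (a11 * z2_bar - a21 * z1_bar) / det
    return mu1, mu2
```

With $T = 0$ the coefficients reduce to $e_{10}$, $e_{01}$, $P e_{20} + (1 - P) e_{11}$, and $P e_{11} + (1 - P) e_{02}$, which is consistent with Remark 1 and with the mean terms in equation (34). A determinant close to zero signals design parameters for which the two estimators would be very unstable.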

Percent Relative Efficiency under SRSWR.
In this section, we compute the percent relative efficiency (PRE) of the proposed estimators $\hat{\mu}_{y_1}$ and $\hat{\mu}_{y_2}$ over the estimators $\hat{\mu}^{A}_{y_1}$ and $\hat{\mu}^{A}_{y_2}$, respectively, as and We performed a simulation study with a FORTRAN program to verify the superiority of the proposed partial RRM and to identify situations in which the proposed method might be more efficient than the method of Ahmed et al. [22]. The simulation results give a large number of situations in which the PRE($i$), $i = 1, 2$, values of the proposed partial RRM exceed 100. However, we present only a few values of PRE(1) and PRE(2) in Table 1, for the parameter values $P = 0.5$, various values of $T$, $\mu_{y_1} = 20$, $\mu_{y_2} = 30$, two values of $\sigma_{y_1}$ and $\sigma_{y_2}$, $\theta_1 = 4$, $\theta_2 = 7.2$, $\delta_{20} = 2$, $\delta_{02} = 4$, $\delta_{30} = -0.17$, $\delta_{03} = -1.65$, $\delta_{40} = 12.13$, and $\delta_{04} = 52.29$. An efficiency comparison using the scrambling variables employed earlier in the Ahmed et al. [22] model was also carried out, but the proposed model is less efficient for those values. Thus, we conclude that the efficiency of the proposed partial RRM can be increased or decreased by using different scrambling variables.
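The PRE comparison can be sketched as a one-line computation. The defining formula is not available above, so the code assumes the conventional form, the variance of the competitor estimator divided by the variance of the proposed estimator, multiplied by 100. The function name is hypothetical.

```python
def percent_relative_efficiency(var_competitor, var_proposed):
    """PRE of the proposed estimator over the competitor (assumed conventional form).

    Values above 100 mean the proposed estimator has the smaller variance.
    """
    if var_proposed <= 0:
        raise ValueError("variance of the proposed estimator must be positive")
    return 100.0 * var_competitor / var_proposed
```

Under this convention, a competitor variance of 2.0 against a proposed variance of 1.6 gives a PRE of 125.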

Proposed Partial RRM under Stratified Random Sampling
In this section, we present a partial randomized response model under stratification; a subsample is drawn in each stratum by SRSWR. Each sampled respondent in the $h$th stratum is requested to give two responses by using two randomized response (RR) devices. RR$_h$ Device I provides the scrambled or true response on the sensitive variables, whereas RR$_h$ Device II provides the fake response, which is free from the sensitive variables. RR$_h$ Device I bears two types of statements: (i) report the additive true value of both sensitive variables, say $(Y_{1hi} + Y_{2hi})$, with probability $T_h$, and (ii) report the scrambled response as, Mathematically, the $i$th respondent of the $h$th stratum is requested to report the response $Z_{1hi}$ as: The partial randomized response $Z_{1hi}$ of the $i$th sampled respondent of the $h$th stratum is given by and for the second response, each respondent is requested to use RR$_h$ Device II, which is the same as in equation (36); thus, the fake response $Z_{hi}$ from equation (36) for the $i$th sampled respondent of the $h$th stratum is given by where $\gamma_{hi}$ and $\beta_{hi}$ are Bernoulli random variables with means $P_h$ and $T_h$, respectively, which are known parameters. From equations (73) and (74), we generate the response $Z_{2hi}$ as follows: Taking expected values on both sides of equations (73) and (76), we have and From equations (77) and (78), by the method of moments, we have and where Solving equations (79) and (80) by Cramer's rule, we obtain unbiased estimators of $\mu_{y_1}$ and $\mu_{y_2}$, respectively, given by and

Theorem 2. The variances of the proposed estimators $\hat{\mu}_{y_1(st)}$ and $\hat{\mu}_{y_2(st)}$ are, respectively, given by and where and and

The proof is simple and is therefore omitted.
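Once a mean has been estimated within each stratum, the stratum results are combined into a population-level estimate. The sketch below assumes the standard stratified form, a weighted sum with weights $W_h = N_h / N$, and the corresponding variance $\sum_h W_h^2 V_h$ for independently sampled strata; the paper's own expressions are not available above, so this is a generic illustration with hypothetical names.

```python
def stratum_weights(stratum_sizes):
    """Weights W_h = N_h / N from the stratum population sizes."""
    total = sum(stratum_sizes)
    return [size / total for size in stratum_sizes]


def stratified_estimate(stratum_estimates, stratum_sizes):
    """Weighted sum of the within-stratum estimates."""
    weights = stratum_weights(stratum_sizes)
    return sum(w * est for w, est in zip(weights, stratum_estimates))


def stratified_variance(stratum_variances, stratum_sizes):
    """Variance of the weighted sum when strata are sampled independently."""
    weights = stratum_weights(stratum_sizes)
    return sum(w ** 2 * var for w, var in zip(weights, stratum_variances))
```

For two strata of sizes 300 and 700 with stratum estimates 10 and 20, the combined estimate is $0.3 \times 10 + 0.7 \times 20 = 17$.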

Corollary 2.
The variances of the proposed estimators $\hat{\mu}_{y_1(st)}$ and $\hat{\mu}_{y_2(st)}$ under different methods of sample allocation are as follows: and (ii) Proportional allocation: and (iii) Optimum allocation: and are estimated by using a linear cost function such as $C = C_0 + \sum_{h=1}^{L} n_h C_h$ (where $C_0$ is the fixed cost and $C_h$ is the variable cost in each stratum) and the compromised variance as:
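The allocation rules can be illustrated with the textbook single-variable versions. This is a sketch under assumptions: proportional allocation sets $n_h = n W_h$, and cost-optimal allocation under the linear cost function $C = C_0 + \sum_h n_h C_h$ sets $n_h \propto W_h \sigma_h / \sqrt{C_h}$. The paper works with a compromised variance for two estimators, which is not reproduced here; `sds` stands in for whatever per-stratum standard deviation enters that variance, and all names are hypothetical.

```python
import math


def proportional_allocation(n, weights):
    """n_h = n * W_h (real-valued; round as needed in practice)."""
    return [n * w for w in weights]


def optimum_allocation(budget, fixed_cost, weights, sds, unit_costs):
    """Allocation minimizing sum(W_h^2 * sd_h^2 / n_h) for a fixed total cost.

    Uses the linear cost C = C0 + sum(n_h * C_h), giving n_h proportional to
    W_h * sd_h / sqrt(C_h).
    """
    scale = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, sds, unit_costs))
    k = (budget - fixed_cost) / scale
    return [k * w * s / math.sqrt(c) for w, s, c in zip(weights, sds, unit_costs)]
```

The returned sample sizes exhaust the variable budget exactly, that is, $\sum_h n_h C_h = C - C_0$.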

Percent Relative Efficiency under Stratification.
In this section, we compute the percent relative efficiency (PRE) of the proposed partial RRM over the model of Ahmed et al. [22] under stratification, using proportional allocation. For the numerical comparison, we use a real data set taken from Rosner [25], the childhood respiratory disease study of Boston. We consider $Y_1$ = AGE of a child and $Y_2$ = FEV (forced expiratory volume), both treated as sensitive variables, as in Ahmed et al. [22]. The population is subdivided into two strata on the basis of gender. The PREs of the proposed estimators $\hat{\mu}_{y_1(st)}$ and $\hat{\mu}_{y_2(st)}$ with respect to the Ahmed et al. [22] estimators $\hat{\mu}^{A}_{y_1(st)}$ and $\hat{\mu}^{A}_{y_2(st)}$, respectively, are defined as: and The results are presented in Table 2. The scrambling variables are the same as in Ahmed et al. [22] for both strata, that is, PRE(1) and PRE(2) are free from the sample size. The numerical comparison shows that the efficiency of the proposed partial RRM may be increased by choosing appropriate values of the design parameters $(P_h, T_h)$. We also observe that when $T_h = 0$, the proposed model reduces to the model of Ahmed et al. [22].
We also compute the PRE of the proposed partial RRM under SRSWR over the proposed partial RRM under stratification to observe the gain in efficiency due to stratification. For both estimators, the PRE is almost 100 at different values of the design parameters. This is because the variation between strata is almost the same and the randomization devices are identical.
In the next section, we consider an application to a real data set.

Application of Real Data Set
Hussain et al. [26] estimated the average total number of classes that were missed by the students and Gjestvang and Singh [27] considered the problem of estimating the average GPA of students. In this study, we simultaneously estimate the average total number of classes that were missed by the students and average GPA of the students, by using the proposed partial RRM under SRSWR.
We took a sample of 80 students from the Stat 317, Stat 629, and Zoo 203 classes at Quaid-i-Azam University, Islamabad, to estimate the average GPA and the average total number of classes missed by the students. We generated 20 random numbers for each of $S_1$ and $S_2$ from Poisson distributions with means 5 and 2, respectively. To collect the data through the proposed RRM, two decks of cards were used: Deck I, a deck of yellow cards, and Deck II, a deck of blue cards. The yellow cards bore two different types of statements, with $T = 0.5$, to obtain the scrambled and true responses, whereas the blue cards bore values of the two scrambling variables, with $P = 0.7$, to obtain the fake responses.
Deck I, the deck of yellow cards, consisted of 20 cards. On 10 cards, we wrote the statement:

Please report: GPA scored in last semester
and, on the remaining 10 cards, we wrote the statement: Please report: GPA scored in last semester In this process, 20 values of $S_i$, $i = 1, 2$, were written on each card.
Deck II, the deck of blue cards, consisted of 20 cards, of which 14 cards had the values of $S_1$ and the remaining six cards had the values of $S_2$, with the statement "kindly report one selected random number." Before starting the data collection process, we highlighted the importance of randomized response methods and explained the proposed partial RRM and the model of Ahmed et al. [22] to the students. Each student was requested to draw one card from Deck I and report the response requested on the card on yellow paper. The process was then repeated for Deck II, with the response reported on blue paper. Then, each student was given a pink card describing the model of Ahmed et al. [22], which was similar to equation (103), and was requested to provide the scrambled response on pink paper. After marking the three responses on three papers of different colors, the students were advised to staple the papers together and put them into the box lying on the table. Finally, all students wrote their actual GPA and the true total number of classes they had missed during the last semester on white paper, without disclosing their identity. Table 3 presents the responses obtained from the students. Table 4 presents the results of the survey. The estimates of the means $\mu_{y_1}$ and $\mu_{y_2}$ from the proposed partial RRM are closer to the estimates based on true responses than the estimates obtained from the model of Ahmed et al. [22]. We noted that the standard errors are large because of the small sample size; thus, we suggest that a large-scale sample survey be conducted in the future to reach more realistic outcomes. The estimates from the randomized response models indicate that students are more reluctant to admit, through direct questioning, the total number of classes they missed to do something unrelated to university study, whereas GPA is a less sensitive question for students.
In conclusion, one can see that the proposed method of collecting scrambled data on sensitive issues can be used safely and securely.

Total classes missed during last semester, from direct response: 6, 7, 2, 9, 4, 10, 5, 3, 6, 0, 0, 5, 12, 6, 0, 8, 4, 2, 5, 3, 3, 4, 3, 3, 2, 12, 12, 6, 6, 7, 2, 4, 5, 4, 0, 6, 10, 4, 3, 4, 3, 9, 4, 6, 5, 12, 2, 5, 5, 5, 6, 6, 3.3, 8, 5, 8, 4, 0, 6, 0, 0, 7, 3, 5, 10, 4, 7, 9, 5, 9, 6, 3, 10, 0, 5, 2, 6, 5, 6
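The way the two card decks encode the design parameters can be sketched as follows. The card counts (20 cards per deck, 10 "true" cards in Deck I, 14 $S_1$ cards in Deck II) come from the description above; the card labels and function names are hypothetical, and the sketch covers only the random choice of statement, not the values written on the cards.

```python
import random


def build_decks(n_cards=20, n_true=10, n_s1=14):
    """Deck I (yellow): true vs scrambled statement. Deck II (blue): S1 vs S2 value."""
    deck_one = ["true"] * n_true + ["scrambled"] * (n_cards - n_true)
    deck_two = ["S1"] * n_s1 + ["S2"] * (n_cards - n_s1)
    return deck_one, deck_two


def design_probabilities(deck_one, deck_two):
    """Return (T, P) implied by the composition of the two decks."""
    t = deck_one.count("true") / len(deck_one)
    p = deck_two.count("S1") / len(deck_two)
    return t, p


def draw_cards(deck_one, deck_two, rng):
    """One respondent draws a single card from each deck, with replacement across respondents."""
    return rng.choice(deck_one), rng.choice(deck_two)
```

With the default counts, `design_probabilities(*build_decks())` gives $T = 0.5$ and $P = 0.7$, matching the values used in the survey.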

Conclusion
In the present paper, we have suggested a partial randomized response model for estimating two population means simultaneously. Through a simulation study and a real-life data application, it is observed that the proposed partial RRM performs better than the model of Ahmed et al. [22]. The superiority of the suggested partial RRM under stratification is revealed through numerical comparison: the proposed partial RRM under stratified random sampling performs better than the stratified model of Ahmed et al. [22]. Moreover, we observed that the design parameters play an important role in increasing or decreasing the efficiency of the suggested models. The main advantage of the proposed partial RRM is that it enables researchers to collect truthful responses from at least some proportion of respondents. Thus, the proposed partial randomized response model is recommended for use in practice as an alternative to the randomized response model of Ahmed et al. [22].

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.