An Efficient Approximation Method for Calculating Confidence Level of Negative Survey

The confidence level of a negative survey is one of its key scientific problems. Existing work analyses the confidence level with generating functions and calculates it with a greedy algorithm, in order to evaluate how dependable a negative survey is. However, that method is complex and inefficient. This study presents an efficient approximation method for calculating the confidence level of a negative survey. Based on the central limit theorem and the Bayesian method, the approximation obtains the results efficiently.


Introduction
Artificial immune systems simulate the mechanisms of the biological immune system to model and design effective algorithms for solving complex problems. The negative selection principle [1] is one of the unique mechanisms of the biological immune system: an immature T cell dies if it matches self as it grows and survives if it does not. Inspired by this principle, the negative selection algorithm [2] was proposed and has been used for network security, virus detection [3,4], and anomaly detection [5].
Similarly, the negative survey [6], which is also inspired by the negative selection principle, is a novel and promising indirect questioning method for protecting privacy when collecting sensitive data [7]. A negative survey consists of a question and $c$ ($c \ge 3$) categories for the interviewees to select from. In contrast to traditional surveys, participants are required to select a category that does not agree with the fact [6,8]; that is, each participant randomly selects one of the other $c - 1$ untrue categories. For convenience, the category that agrees with the fact is called the positive category, and the other $c - 1$ categories that do not agree with the fact are called negative categories [6].
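The answering rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from [6]; the function names `negative_answer` and `run_negative_survey`, the 0-based category indices, and the example data are assumptions made here.

```python
import random

def negative_answer(true_category, c, rng):
    """Return a category index in range(c) that differs from true_category,
    chosen uniformly among the other c - 1 categories."""
    if c < 3:
        raise ValueError("a negative survey needs at least 3 categories")
    # Draw one of c - 1 slots and skip over the true category.
    k = rng.randrange(c - 1)
    return k if k < true_category else k + 1

def run_negative_survey(true_categories, c, seed=0):
    """Collect one negative answer per interviewee and count answers per category."""
    rng = random.Random(seed)
    counts = [0] * c
    for t in true_categories:
        counts[negative_answer(t, c, rng)] += 1
    return counts

# Hypothetical example: 6 interviewees, 3 categories.
truth = [0, 0, 1, 2, 2, 2]
r = run_negative_survey(truth, c=3)
```

No interviewee ever reports the true category, yet the per-category counts still carry enough information to estimate the positive survey.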
The negative survey method can attain a higher degree of privacy protection at lower power cost and boost participants' confidence. The main computation in collecting sensitive data with a negative survey is the reconstruction of the corresponding positive survey at the central processor. The privacy-preserving properties of a negative survey do not rely on anonymity, cryptography, or legal contracts, but on participants never revealing their private information. The method is also suited to collecting data at high speed on low-powered mobile devices such as smartphones and tablets [9].
The positive survey can be reconstructed from the result of a negative survey. For a survey consisting of a question and $c$ ($c \ge 3$) categories for $n$ interviewees to select from, a negative survey result is $R = (r_1, r_2, \ldots, r_c)$, where $r_i$ is the number of interviewees who select category $i$ in the negative survey. Meanwhile, the original positive survey is $T = (t_1, t_2, \ldots, t_c)$, where $t_i$ is the number of interviewees belonging to category $i$. Define $v_{i,j}$ as the probability that category $j$ is chosen given that a respondent positively belongs to category $i$, where $\sum_{j=1}^{c} v_{i,j} = 1$ and $v_{i,i} = 0$. Define the probability matrix $V$ as in Formula (1), so that $R = TV$ and $T = RV^{-1}$:

$$V = \begin{pmatrix} 0 & v_{1,2} & \cdots & v_{1,c} \\ v_{2,1} & 0 & \cdots & v_{2,c} \\ \vdots & \vdots & \ddots & \vdots \\ v_{c,1} & v_{c,2} & \cdots & 0 \end{pmatrix}. \tag{1}$$

In consequence, the positive survey $T$ can be reconstructed from a negative survey $R$ as $T = RV^{-1}$. Generally, $v_{i,j} = 1/(c-1)$ for $i \ne j$, which means that the probability of selecting negative categories follows the uniform distribution [6]. Following the work in [6], Xie et al. proposed the Gaussian Negative Survey (GNS) [10], where the probabilities of selecting negative categories (i.e., $v_{i,j}$) follow a Gaussian distribution centered at the corresponding positive category. The GNS can attain higher accuracy but offers weaker privacy protection.
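For the uniform case, the inverse of the probability matrix has a simple closed form. The following worked derivation is a sketch under notation assumed here: $V$ is the $c \times c$ probability matrix with entries $1/(c-1)$ off the diagonal and 0 on it, $R = (r_1, \ldots, r_c)$ and $T = (t_1, \ldots, t_c)$ are the row vectors of negative and positive counts, and $n$ is the number of interviewees. The identity matrix $I$ and the all-ones matrix $J$ are introduced only for this illustration.

```latex
\begin{aligned}
V &= \frac{1}{c-1}\,(J - I), \qquad J^2 = cJ, \\
(J - I)\bigl(J - (c-1)I\bigr) &= cJ - (c-1)J - J + (c-1)I = (c-1)\,I, \\
V^{-1} &= J - (c-1)\,I, \\
t_j &= \sum_{i=1}^{c} r_i \bigl[V^{-1}\bigr]_{i,j}
     = \sum_{i=1}^{c} r_i - (c-1)\,r_j
     = n - (c-1)\,r_j .
\end{aligned}
```

So in the uniform case no numerical matrix inversion is needed: each category is reconstructed from its own negative count and the total $n$.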
The traditional reconstruction method in [6] may produce a reconstructed positive survey with negative values. To address this problem, two methods [11] were proposed that reconstruct positive surveys without negative values. In [12], Bao et al. analysed the confidence level with generating functions and proposed a greedy algorithm for calculating it. But this method is complex and inefficient and cannot match the high efficiency of the negative survey itself.
In this study, an efficient approximation method is proposed to calculate the confidence level of a negative survey. This work reinforces the efficiency of the negative survey.
In the remainder of this study, Section 2 introduces the related work. Section 3 describes the problem. Section 4 describes the efficient approximation method. Section 5 presents simulation experiments. Section 6 discusses some open problems of the approximation method, and Section 7 concludes the study.

Related Work
In this study, the probability of selecting negative categories follows the uniform distribution (i.e., $v_{i,j} = 1/(c-1)$ for $i \ne j$), as in the general negative survey of [6,8,11,12]. This section therefore introduces the related work on negative surveys [6,8,11,12]. For convenience, some definitions are given in the following list:
$n$: the number of interviewees for surveys.
$c$: the number of categories in surveys.
$r_i$: the number of interviewees selecting category $i$ in negative survey.
Define $n$ as the number of interviewees participating in the negative survey and $c$ as the number of categories. The results of the negative survey are $R = (r_1, r_2, \ldots, r_c)$, where $r_i$ ($1 \le i \le c$, $c \ge 3$) represents the total number of participants who select the $i$th category in the negative survey. Similarly, the real positive survey is $T = (t_1, t_2, \ldots, t_c)$, and $n = \sum_{i=1}^{c} r_i = \sum_{i=1}^{c} t_i$. In [6,8], the reconstructed positive survey can be calculated by Formula (2):

$$\hat{t}_i = n - (c-1)\,r_i. \tag{2}$$

In this study, a positive category $i$ with $n$ interviewees, $c$ categories, and proportion $p_i = t_i/n$ is written as PS($n$, $c$, $p_i$) for simplicity, and the corresponding negative category, with proportion $q_i = r_i/n$, is written as NS($n$, $c$, $q_i$). In terms of proportions, Formula (2) reads $\hat{p}_i = 1 - (c-1)\,q_i$. Although $\hat{p}_i$ is an unbiased estimate of $p_i$, it can be observed that $\hat{p}_i < 0$ when $q_i > 1/(c-1)$. Therefore, this traditional method is sometimes impractical.
Following the traditional method in [6,8], two methods were proposed for reconstructing the positive survey in [11]. Method I [11] uses an iterative method to reconstruct the positive survey. Its advantage is that no negative values appear in the reconstructed positive survey; that is, $\hat{p}_i > 0$ ($1 \le i \le c$). But this method only uses an implicit function to reconstruct the positive survey approximately, and its accuracy lacks a theoretical basis.
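The traditional reconstruction for the uniform negative survey, and the way it can produce negative values, can be illustrated as follows. This is a minimal sketch assuming the per-category rule "estimate = $n - (c-1) r_i$"; the function name `reconstruct_positive` and the example counts are hypothetical.

```python
def reconstruct_positive(r):
    """Traditional reconstruction for a uniform negative survey:
    the estimate for category i is n - (c - 1) * r_i."""
    n, c = sum(r), len(r)
    return [n - (c - 1) * r_i for r_i in r]

# Hypothetical negative-survey counts with n = 100 and c = 3.
r = [10, 40, 50]
t_hat = reconstruct_positive(r)        # all estimates are non-negative

# Here r_3 / n = 0.6 exceeds 1 / (c - 1) = 0.5, so the third estimate is negative.
r_bad = [5, 35, 60]
t_bad = reconstruct_positive(r_bad)
```

The estimates always sum to $n$, but nothing constrains an individual estimate to stay non-negative, which is the weakness the two methods of [11] address.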
Method II [11] eliminates the negative values by adjusting the reconstructed positive survey. It sets each negative value in the reconstructed positive survey to 0 and then rescales the values of the other categories proportionally so that the sum of the reconstructed positive survey is unchanged. This method is more efficient than Method I, but it also lacks theoretical analysis. In [12], the confidence level of the negative survey is analysed with generating functions and calculated with a greedy algorithm.
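The adjustment described for Method II can be sketched as below. This is one reading of the description above (clip negative values to 0, then rescale the remaining values proportionally so that the total is preserved); the procedure in [11] may differ in detail, and the function name `clip_and_rescale` and the input values are hypothetical.

```python
def clip_and_rescale(t_hat):
    """Set negative entries to 0, then rescale the remaining entries
    proportionally so that the total stays equal to the original sum."""
    n = sum(t_hat)
    clipped = [max(0.0, float(t)) for t in t_hat]
    s = sum(clipped)
    if s == 0.0:
        raise ValueError("no positive entries to rescale")
    return [t * n / s for t in clipped]

# Hypothetical reconstructed survey with one negative value (sum is 100).
adjusted = clip_and_rescale([90, 30, -20])
```

In this example the two positive categories keep their 3:1 ratio while the total remains 100.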

Problem Formulation
Efficiency is one of the greatest advantages of collecting data with the negative survey method, because each participant only needs to send one of her or his negative categories (i.e., untrue information). Because the positive survey reconstructed from a negative survey has inexact values, two issues are important: the confidence level and the efficiency. Using generating functions to calculate the confidence level exactly [12] is unnecessary and inefficient when the values reconstructed from the negative survey are themselves inexact. More importantly, the exact calculation is so complicated that [12] resorts to a greedy algorithm.
This study proposes an efficient method, based on the central limit theorem and the Bayes method, to calculate the confidence level approximately; this approximation can reinforce the efficiency of the negative survey. The core idea is to approximate the original distribution with a normal distribution for fast calculation (more details in Section 4). The Bayes method is then used to calculate the confidence level of each category in the negative survey, based on an analysis of the distribution of possible positive survey results.

The Efficient Method of Approximation
This section presents the proposed efficient approximation method for calculating the confidence level. In Section 4.1, the central limit theorem is used to calculate the approximate distribution of $r_i$. In Section 4.2, the Bayes method is used to estimate the probability density function of $p_i$. In Section 4.3, the confidence level is calculated based on the Bayes method.

The Distribution of Negative Survey
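Theorem 1 gives the distribution of category $i$ in the negative survey when that of the positive survey is known.

Theorem 1. If a positive category is PS($n$, $c$, $p_i$), then, when $n$ is sufficiently large, $r_i$ approximately follows the normal distribution

$$r_i \sim N(\mu, \sigma^2), \qquad \mu = \frac{n(1-p_i)}{c-1}, \qquad \sigma^2 = \frac{n(1-p_i)(c-2)}{(c-1)^2}. \tag{3}$$

Proof. Each of the $n(1-p_i)$ interviewees who do not belong to category $i$ selects category $i$ with probability $1/(c-1)$, so $r_i$ follows the binomial distribution $B(n(1-p_i),\, 1/(c-1))$. So $\mu = n(1-p_i)/(c-1)$, and

$$\sigma^2 = n(1-p_i)\cdot\frac{1}{c-1}\cdot\left(1-\frac{1}{c-1}\right) = \frac{n(1-p_i)(c-2)}{(c-1)^2}.$$

Owing to the De Moivre-Laplace central limit theorem, $r_i$ follows the normal distribution as $n$ goes to infinity; that is,

$$\lim_{n\to\infty} P\!\left(\frac{r_i-\mu}{\sigma} \le x\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$

So $r_i \sim N(\mu, \sigma^2)$ approximately. In consequence, $r_i$ follows the normal distribution when $n$ goes to infinity, and Theorem 1 and Formula (3) are both valid.

Define $f(q_i \mid p_i)$ to be the conditional probability density function for $q_i$ with given $p_i$. Since $q_i = r_i/n$,

$$f(q_i \mid p_i) = \frac{n}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(nq_i-\mu)^2}{2\sigma^2}\right). \tag{7}$$

Figure 1 illustrates the function curve of Formula (7) varying with $p_i$, $n$, or $c$.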
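The quality of the normal approximation can be checked numerically against an exact binomial probability. The sketch below assumes the binomial model in which each of the $n(1-p_i)$ interviewees outside category $i$ selects it with probability $1/(c-1)$; that model, the function names, and the example values ($n = 1000$, $c = 4$, $p_i = 0.4$) are assumptions made here for illustration.

```python
import math

def normal_params(n, c, p_i):
    """Mean and standard deviation of r_i under the binomial model
    r_i ~ B(n * (1 - p_i), 1 / (c - 1)) assumed in this sketch."""
    m = n * (1.0 - p_i)          # interviewees outside category i
    prob = 1.0 / (c - 1)         # chance that one of them selects category i
    return m * prob, math.sqrt(m * prob * (1.0 - prob))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def density_q_given_p(q_i, n, c, p_i):
    """Approximate density of q_i = r_i / n given p_i (change of variable r_i = n * q_i)."""
    mu, sigma = normal_params(n, c, p_i)
    return n * normal_pdf(n * q_i, mu, sigma)

def binomial_pmf(k, m, prob):
    return math.comb(m, k) * prob ** k * (1.0 - prob) ** (m - k)

# Hypothetical example: 600 interviewees lie outside category i.
n, c, p_i = 1000, 4, 0.4
mu, sigma = normal_params(n, c, p_i)
exact = binomial_pmf(200, 600, 1.0 / 3.0)   # exact P(r_i = 200)
approx = normal_pdf(200, mu, sigma)         # normal approximation at the same point
```

For these values the two numbers agree closely, while the normal density needs only a constant number of arithmetic operations per evaluation.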

The Distribution of Reconstructed Positive Survey.
There are some differences between reconstructing a positive survey from a given negative survey and traditional parameter estimation. The reason is that the given negative survey result is only one sample from its original positive survey. In consequence, we use the Bayes method to reconstruct the positive survey. The distribution of the reconstructed $p_i$ is given in Theorem 2.
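Theorem 2. If a negative category is NS($n$, $c$, $q_i$), the probability density function of the corresponding PS($n$, $c$, $p_i$) is

$$f(p_i \mid q_i) = \frac{\dfrac{1}{\sigma(p_i)}\exp\!\left(-\dfrac{(nq_i-\mu(p_i))^2}{2\sigma^2(p_i)}\right)}{\displaystyle\int_0^1 \frac{1}{\sigma(x)}\exp\!\left(-\frac{(nq_i-\mu(x))^2}{2\sigma^2(x)}\right)dx}, \tag{8}$$

where $\mu(x) = n(1-x)/(c-1)$ and $\sigma^2(x) = n(1-x)(c-2)/(c-1)^2$.

Proof. Define $f(p_i)$ to be the prior probability density function of $p_i$, and let $f(q_i \mid p_i)$ be the conditional probability density function for $q_i$. According to the Bayes formula for probability density functions, the probability density function of $p_i$ with given $q_i$ is

$$f(p_i \mid q_i) = \frac{f(q_i \mid p_i)\,f(p_i)}{f(q_i)}. \tag{9}$$

Suppose that we have no knowledge of $p_i$. Based on the Bayesian assumption, the prior probability density function $f(p_i)$ can be taken as the uniform distribution $U(0, 1)$. In this case, the density function $f(q_i)$ can be calculated by Formula (10), and $f(q_i \mid p_i)$ is given by Formula (7):

$$f(q_i) = \int_0^1 f(q_i \mid p_i)\,f(p_i)\,dp_i = \int_0^1 f(q_i \mid p_i)\,dp_i. \tag{10}$$

So the conditional probability density function of $p_i$ with given $q_i$ is

$$f(p_i \mid q_i) = \frac{f(q_i \mid p_i)}{\int_0^1 f(q_i \mid p_i)\,dp_i}. \tag{11}$$

Combining Formula (7) and Formula (11) yields Formula (8), so Theorem 2 is valid.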
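The posterior density can be evaluated numerically by normalising the likelihood over a grid on $(0, 1)$. The sketch below assumes a uniform prior and a normal likelihood whose mean and variance come from the binomial model $B(n(1-p_i),\, 1/(c-1))$; the midpoint-rule integration, the function names, and the example values are choices made here, not taken from the source.

```python
import math

def likelihood(q_i, p_i, n, c):
    """Normal approximation of f(q_i | p_i) under the binomial model assumed here."""
    m = n * (1.0 - p_i)
    prob = 1.0 / (c - 1)
    var = m * prob * (1.0 - prob)
    if var <= 0.0:
        return 0.0                   # degenerate case p_i = 1
    mu = m * prob
    return n * math.exp(-0.5 * (n * q_i - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_grid(q_i, n, c, cells=2000):
    """Posterior density of p_i on a midpoint grid over (0, 1) with a uniform prior."""
    h = 1.0 / cells
    grid = [(k + 0.5) * h for k in range(cells)]
    like = [likelihood(q_i, p, n, c) for p in grid]
    evidence = sum(like) * h         # midpoint-rule estimate of f(q_i)
    if evidence == 0.0:
        raise ValueError("likelihood underflowed on the whole grid")
    return grid, [v / evidence for v in like]

# Hypothetical negative category: n = 1000, c = 3, q_i = 0.233.
grid, post = posterior_grid(0.233, 1000, 3)
mode = grid[post.index(max(post))]   # lies close to 1 - (c - 1) * q_i = 0.534
```

Because the normalising integral is one-dimensional, a fixed grid suffices and the cost is linear in the number of grid cells.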
Figure 3 illustrates the function curve of $f(p_i \mid q_i)$ for different values of $q_i$, $n$, or $c$. Figures 3(a) and 3(b) show that a smaller $q_i$ concentrates $p_i$ more closely around $1 - (c-1)\,q_i$; Figure 3(c) shows that a greater $n$ has the same effect, and Figure 3(d) shows that a smaller $c$ does, too. In addition, Figures 3(a) and 3(b) also show that a greater $q_i$ may lead to $1 - (c-1)\,q_i < 0$, in which case the corresponding $p_i$ is 0 with high probability.

The Confidence Level.
In this subsection, an approximation method is used for calculating the confidence level of the reconstructed positive survey.
Proof. According to Theorem 2, Theorem 3 is obviously valid.
(2) When $q_i \ge (1 - l/2)/(c-1)$, where $l$ denotes the length of the confidence interval, the confidence level first increases with $q_i$ (Figure 4(d)). Because $p_i$ is 0 with high probability in this case, the confidence level then decreases sharply (Figure 4(d)). Such values of $q_i$ are nearly impossible because the prior probability of attaining such a large value of $q_i$ is very low, and $r_i$ may be a survey error (if $r_i > \mu + 3\sigma$, as described in Section 4.1).
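One way to compute a confidence level from the posterior density is to integrate it over the confidence interval. The sketch below is written under stated assumptions rather than as the exact procedure of this section: the confidence level is taken as the posterior mass of an interval of the given length centred at $1 - (c-1) q_i$, the interval is clipped to $[0, 1]$, the prior is uniform, and the likelihood is the normal approximation of the binomial model $B(n(1-p_i),\, 1/(c-1))$. All function names and example values are hypothetical.

```python
import math

def likelihood(q_i, p_i, n, c):
    """Normal approximation of f(q_i | p_i) under the binomial model assumed here."""
    m = n * (1.0 - p_i)
    prob = 1.0 / (c - 1)
    var = m * prob * (1.0 - prob)
    if var <= 0.0:
        return 0.0
    mu = m * prob
    return n * math.exp(-0.5 * (n * q_i - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def confidence_level(q_i, n, c, length, cells=4000):
    """Posterior mass of the interval of the given length centred at the
    estimate 1 - (c - 1) * q_i, clipped to [0, 1] (an assumption of this sketch)."""
    p_hat = 1.0 - (c - 1) * q_i
    lo = max(0.0, p_hat - length / 2.0)
    hi = min(1.0, p_hat + length / 2.0)
    h = 1.0 / cells
    total = inside = 0.0
    for k in range(cells):
        p = (k + 0.5) * h            # midpoint of the k-th grid cell
        w = likelihood(q_i, p, n, c)
        total += w
        if lo <= p <= hi:
            inside += w
    if total == 0.0:
        raise ValueError("likelihood underflowed on the whole grid")
    return inside / total

# Hypothetical negative category with c = 3, q_i = 0.233, and interval length 0.1.
cl_100 = confidence_level(0.233, 100, 3, 0.1)
cl_1000 = confidence_level(0.233, 1000, 3, 0.1)
```

With these inputs the confidence level grows with the number of interviewees, in line with the behaviour reported for the case where the interval lies inside $[0, 1]$.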

Simulation Experiments
In this section, some examples of negative surveys (similar to those in [12]) are specially designed to verify the approximation method. In Tables 1 and 2, the confidence level is calculated independently for each category when the confidence interval (abbreviated as CI) length is $l = 0.1$. As indicated in Table 1, the confidence interval is $(\hat{p}_i - l/2,\ \hat{p}_i + l/2)$ when $q_i < (1 - l/2)/(c-1) = 0.475$. In this case, the confidence level increases with $n$ and decreases with $q_i$. As shown in Table 2, the behaviour of the confidence level is more varied and complicated. If $\hat{p}_i < 0$, the confidence level is very small when $n$ is large. That means an excessively large $q_i$ may even be a survey error, because the prior probability of attaining such a large value of $q_i$ is very low. In addition, when $\hat{p}_i$ is negative, the second method in [11] is needed to correct the reconstructed positive survey.
Table 3 shows the confidence levels of seven groups of negative surveys. Each entry consists of three values, which are the confidence levels of the individual categories. It is worth noting that the confidence levels of the last three groups are lower when $n = 1000$. The reason is that the probability of getting such a large value of $q_i$ is rather low if $n = 1000$, so the survey results may be faulty.
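A simple Monte Carlo experiment gives an independent sanity check on how the reliability of the reconstruction depends on the number of interviewees. The sketch below does not reproduce Tables 1 to 3: it measures the empirical frequency with which the estimate $1 - (c-1) q_i$ falls within half the CI length of the true proportion, for hypothetical positive surveys. The function names, the positive surveys, the seed, and the number of trials are all assumptions made here.

```python
import random

def simulate_q_i(t, i, rng):
    """Simulate one negative survey for a known positive survey t and return q_i = r_i / n.
    Only interviewees outside category i can select it, each with probability 1 / (c - 1)."""
    n, c = sum(t), len(t)
    others = n - t[i]
    r_i = sum(1 for _ in range(others) if rng.randrange(c - 1) == 0)
    return r_i / n

def empirical_coverage(t, i, length, trials, seed=0):
    """Fraction of simulated surveys whose estimate 1 - (c - 1) * q_i
    lies within length / 2 of the true proportion p_i."""
    rng = random.Random(seed)
    n, c = sum(t), len(t)
    p_i = t[i] / n
    hits = 0
    for _ in range(trials):
        p_hat = 1.0 - (c - 1) * simulate_q_i(t, i, rng)
        if abs(p_hat - p_i) < length / 2.0:
            hits += 1
    return hits / trials

# Hypothetical positive surveys with the same proportions but different sizes.
cov_small = empirical_coverage([50, 30, 20], i=0, length=0.1, trials=1000)
cov_large = empirical_coverage([500, 300, 200], i=0, length=0.1, trials=1000)
```

The larger survey yields a much higher hit rate for the same interval length, which matches the qualitative trend that the confidence level increases with the number of interviewees.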

Discussion
In this study, we propose an efficient approximation method for calculating the confidence level of negative surveys, but some work remains for future study.
Firstly, this approximation method is based on the central limit theorem, which is valid when $n$ is sufficiently large. However, how large counts as "sufficiently large" depends on the value of the category proportion. So "sufficiently large" cannot be measured in $n$ alone and should be measured in both the proportion and $n$. If the proportion or its complement is small, the Poisson Distribution is the better approximation distribution rather

Figure 1: The function curve of $f(q_i \mid p_i)$ with different values of $p_i$, $n$, or $c$.

Figure 3: The function curve of $f(p_i \mid q_i)$ with different values of $q_i$, $n$, or $c$.

Figure 4: The confidence level of estimated $p_i$ varying with $q_i$, $n$, or $c$.