Computing Simplicial Depth by Using Importance Sampling Algorithm and Its Application

Simplicial depth (SD) plays an important role in discriminant analysis, hypothesis testing, machine learning, and engineering computations. However, the computation of simplicial depth is hugely challenging because the exact algorithm is an NP problem with dimension d and sample size n as input arguments. ,e approximate algorithm for simplicial depth computation has extremely low efficiency, especially in high-dimensional cases. In this study, we design an importance sampling algorithm for the computation of simplicial depth. As an advanced Monte Carlo method, the proposed algorithm outperforms other approximate and exact algorithms in accuracy and efficiency, as shown by simulated and real data experiments. Furthermore, we illustrate the robustness of simplicial depth in regression analysis through a concrete physical data experiment.


Introduction
With the development of computer technology and multivariate statistical analysis, scientists deal with a large amount of multidimensional data in many fields, such as biogenetics and industrial engineering. e demand for multivariate data analysis tools has become increasingly urgent. As a powerful multivariate nonparametric and robust statistical tool, the statistical depth function extends the concept of one-dimensional data order statistics and provides the central-outward sorting of multivariate data [1][2][3][4]. In recent years, the interest of researchers in statistical depth has increased due to the extensive application of the statistical depth function in multivariate statistical analysis, robust estimation, discriminant analysis, hypothesis testing, machine learning, economics, and hydrological data analysis [5,6]. e first statistical depth function concept, which was proposed by Tukey in 1975, is known as the halfspace depth (also known as the Tukey depth) [7][8][9]. e other concepts of the statistical depth function include projection depth [3,10], simplicial depth (SD) [11,12], and regression depth [13,14]. Zuo and Serfling defined a general structural property of the statistical depth function [1]. Among the many concepts of this statistical depth function, SD is a relatively attractive one not only because of its simple form and ability to achieve the maximum depth value in the center and satisfy monotonicity but also because of its important applications in sign test and centralization test [1,12].
However, the computation of SD is complicated. e exact calculation of SD is an NP problem, which is only feasible when the dimension is no higher than three. Serfling and Wang emphasized that the computation of SD for higher-dimensional data still requires further study [12]. e computation and application of the statistical depth function are active research topics.
Similarly, Monte Carlo (MC) methods have become important statistical, computational tools that are widely used in finance, engineering computation, genetic biology, computational chemistry, and other related fields [15][16][17][18]. As a critical MC strategy, the importance sampling (IS) method concentrates most of the test samples in the important area of the objective function by introducing the transfer probability density function [15,19]. is method dramatically improves the computational efficiency and is an important MC acceleration algorithm. In this study, we apply an efficient IS algorithm to the approximate computation of SD and demonstrate the advantages of such an algorithm over other MC methods and exact algorithms through simulated and real data examples. Furthermore, we extend the SD to regression analysis and obtain a robust estimation of regression analysis. e results of a real physical data experiment show that the estimation based on the SD method is more robust than that based on the traditional least squares (LS) method. e remainder of this paper is organized as follows. In Section 2, we review the preliminary concept and existing algorithms for SD. Section 3 describes the IS algorithm used for the computation of SD. e advantages of the IS algorithm are illustrated through simulated data examples in Section 4. e extension of SD to regression analysis and a real data experiment are presented in Section 5. Lastly, the conclusions are provided in Section 6.

Preliminary of SD and the State of the Art
In this section, we present the preliminary of SD and the existing algorithms for its computation.
Consider a sample set X n � X 1 , , where X n is one sample of size n in R d and x is a given point in R d . e sample version [11] of SD of x with respect to the sample set X n is expressed as where 1 A { } denotes the indicator function of event A, and S[X i 1 , . . . , X i d+1 ] denotes the simplex determined by the d + 1 sample points X i 1 , . . . , X i d+1 .
Serfling and Wang stated that no algorithms are faster than simply generating all simplices and counting the ones enclosing the given point (using O(n d+1 ) complex time) when dimension d ≥ 5 [12]. erefore, designing an efficient approximate algorithm for the computation of SD is necessary.
A direct MC method for the computation of SD contains two steps: (1) randomly selecting d + 1 points from X n and then (2) taking the average of the points that enclose the given point x (i.e., using SD(x, X n ) to estimate the true SD value SD(x, X n )).
where X i 1 , . . . , X i d+1 is randomly chosen from X n and M is the trying number for the estimation. Another approach for the computation of SD is the use of the IS algorithm, which is the proposed method in this study. e computation of SD is an expectation computation. erefore, SD can be estimated by the IS algorithm. e simple MC method uses the randomly selected d + 1 points to estimate the SD, whereas the IS approach selects d + 1 points with a high probability that they contain the given point x. eoretically, the results of the latter will have a smaller variance than those of the former. e simulated data examples in Section 4 illustrate the advantage of the IS algorithm over the MC method.

New Algorithm for SD in R d
3.1. Overview of the IS Algorithm. Many engineering problems can be expressed as computations of a multidimensional integral. Using the MC method to compute the integral involves drawing samples from a uniform distribution on a regular area and using the sample mean to approximate the true integral. In higher-dimensional cases, the efficiency of the MC method is extremely low if the region where the target function is not equal to zero is extraordinarily sparse. On the contrary, the IS algorithm draws most samples in the important area.
is strategy improves the efficiency of the integral computation. e IS algorithm plays an important role in the field of statistical physics, molecular simulation, and Bayesian statistics.
For example, we want to compute the integral of h(x) on region A; that is, and the integral computation (3) can be treated as an expectation calculation: where X is a random variable (r.v.) with its own probability density function (p.d.f.) π(x); that is, X ∼ π(x). If X 1 , . . . , X n denote samples with size n from X, the MC method draws X from a uniform distribution on region A. From the Law of Large Numbers [20], the sample mean can be used to estimate the expectation in (4) as where S(A) is the area of A and X is the r.v. from the uniform distribution on A (X ∼ U(x)). However, the efficiency of the MC method (5) will be extremely low if region A is extremely wide or sparse (especially in high-dimensional cases). By contrast, the IS method uses a special p.d.f. g(x) instead of π(x) in (4) to compute mean μ and utilizes the corresponding sample mean to estimate the expectation in (4): and the variance Var(μ) � 1/nVar(h(X i )/g(X i )), which means that we can choose appropriate g(x) close to h(x) to reduce the variance of μ. In extreme situations, if we select , the variance of μ will drop to zero, and μ is equal to the exact value A h(x)dx. However, we cannot directly use the IS method defined in (6) during such an extreme situation because we do not know the exact value of A h(x)dx in advance.

Mathematical Problems in Engineering
Nevertheless, it gives us a significant hint that the closer g(x) and h(x) are, the more accurate the IS method's result is. e steps of the IS method for the computation of integral (3) are listed as follows: (1) Draw the samples ∞ from g(x).
(3) Use the mean of the computed weights to estimate the integral in (3): e following theorem shows that the IS estimator in (7) is unbiased.
e IS estimator μ in (7) is an unbiased estimator of μ.
Proof. To prove that the IS estimator is unbiased, we only need to show that the expectation of μ is equal to μ: Because ω i is a r.v. and We obtain the expression E(μ) � μ, which verifies that the IS estimator μ in (7) is unbiased. So we complete the proof of this theorem.
Aside from being an unbiased estimator of the integral presented in (3), the IS estimator exhibits a more efficient and powerful integral computation than the MC estimator defined in (5), especially in higher-dimensional cases. □ 3.2. IS Algorithm for SD Computation. We use the previously described IS method to compute the SD. Using the definition of SD in (1) is not appropriate in computing the SD value of a data point with respect to a dataset. e MC method in (6) becomes extremely inefficient when dimension p or sample size n is excessively large because the number of simplices containing the original data point decreases with the increase in p or n.
e IS algorithm can transform the original p.d.f. into a highly efficient one that can construct the simplex containing the original data point. In the computation of SD, the MC method randomly selects p data points to construct the simplex, whereas the IS method chooses the data points that are likely to let the original data point inside the simplex. Figure 1 is a 2D example that is composed of 20 sample data points. e data point x 0 is used to compute the SD value. After sampling the two data points (x 1 and x 2 ), only two more (x 3 or x 4 ) are needed to construct the simplex that contains the original data point x 0 . In this illustrated example, we do not need to count all the simplices after getting x 1 and x 2 ; only x 3 or x 4 is considered as the final vertex of the simplices containing x 0 .
We list the details of the IS algorithm for the computation of SD in high-dimensional cases. Suppose that X n is a sample with size n in R d (i.e., X n � X 1 , X 2 , . . . , X n ) and x is a given point in R d , (d ≥ 2). e data points are in general position (i.e., any d data points can define a unique d − 1-dimensional hyperplane in R d ). e procedure of using the IS algorithm to compute SD (i.e., the computation of SD(x, X n ) ) is summarized as follows: (1) Set the IS parameters, including the number of samples tries N.
(i) Randomly choose d sample points from X 1 , X 2 , . . . , X n , and denote them as . . , d, and compute the simplex data point set U t k (i.e., the datasets consist of the possible data points that can construct a simplex containing the original data point x).
Replace the k-th data point X t k with the original data point x to obtain a dataset P t k with size d . Compute the unique director d t k which is perpendicular to the hyperplane determined by P t k . Project all data points X 1 , X 2 , . . . , X n and x along d t k , and compute the projected value (3) e sample mean of ω t (t � 1, . . . , N) can be treated as the IS estimator of SD(x, X n ); that is, Theorem 2. e computational complexity of using the IS algorithm to calculate SD is O Nd 5 n , (11) where N is the number of samples tries of the IS algorithm, d is the dimension of the sample data, and n is the sample size.
Proof. According to the steps for computing SD using the IS algorithm, we need to compute every ω i for i � 1, . . . , N. For every ω i , every selected sample data point X t k for k � 1, . . . , d must be replaced. e computational complexity of finding the unique director perpendicular to the hyperplane is O(d 3 ), whereas that of projecting all data points to the unique director is O(dn). e total computational complexity is O(Nd 5 n).
en we complete the proof of this theorem. eorem 2 shows that the computational complexity of the IS algorithm for the computation of SD is a polynomial with dimension d and sample size n as its input arguments. While all other exact algorithms for the computation of SD are NP problems, especially when the dimension d ≥ 5, there is no algorithm that can run faster than simply generating all simplices and computing the exact SD value (i.e., using O(n d+1 ) time) [12]. According to the definition of the IS algorithm in (7) and eorem 1, the IS estimator defined in (10) is an unbiased estimator of SD(x, X n ).

2D Simulated Data Example.
In the simulated data experiment, we compare the computed SD results of the IS, exact, and approximate algorithms, including the MC method. e simulated dataset is sampled from a 2D multivariate normal distribution (i.e., N( 0 → 2 , E 2 ), where 0 → 2 is 2D zeros vector and E 2 is a 2D unit matrix), and the sample size is 100.
We used the exact algorithm [21], the MC method, and the IS algorithm to compute the SD. e selected points x are (0, 0), (0.5, 0.5), (1, 1), and (2, 2). We used the exact and approximate algorithms to compute the SD of x with respect to the dataset. e number of random simplices was set to 100 for the MC and IS algorithms. All computations were repeated 50 times. e computed results (mean, standard deviation (sd), and total CPU time (s)) are summarized in Table 1 and Figure 2.
Since there is an exact algorithm for the SD computation in the 2D case, we can evaluate the accuracy of the IS and MC methods through their mean values and sd values. Moreover, the total CPU time consumed by every algorithm can reflect its efficiency. So, in this experiment, we use these three indicators (mean, sd, and total CPU time) to compare the performances of these algorithms (exact, MC, and IS methods) for the computation of SD. e results reveal that (1) the exact algorithm consumes less CPU time (approximately 0.1 s), (2) the approximate algorithms (MC and IS) can achieve accurate results because their means are extremely close to the exact value, (3) IS performs better than MC as indicated by the smaller sd of the results of the former compared with those of the latter under the same CPU time, (4) all computed SD results from exact and approximate algorithms are zeros at point (2, 2), which means that (2, 2) is outside the data cloud, and (5), with the exact algorithm, the simulated example also indicates that the IS algorithm can obtain highly accurate results.

Higher-Dimensional Simulated Data Example.
In this subsection, we compute the SD of different data points by using the MC and IS algorithms in 3D and five-dimensional simulated dataset. We did not use the exact algorithm [21] because it cannot obtain any result within three hours.
In the 3D case, the dataset was sampled from N( 0 → 3 , E 3 ), and the sample size was 1000. We used MC and IS methods to compute the SD of points (0, 0, 0), (0.5, 0.5, 0.5), and

Mathematical Problems in Engineering
(1, 1, 1). We set the number of random simplices to 100 and repeated the computation 50 times. e computed results are summarized in Table 2 and Figure 3. Because the exact algorithm cannot get any computed SD results within three hours when dimension d ≥ 3, we can only use MC and IS methods for the computation of SD in this subsection. ree indicators (mean, sd, and total CPU time) are summarized for the evaluation of the approximate methods. e mean values can be seen as the final computed SD results and the sd reflects the accuracy of the method (the smaller, the more accurate). e total CPU time reflects the efficiency of the method because it is more efficient if the method consumes less CPU time in the same computation of SD. Table 2 and Figure 3 indicate that (1) the computed SD results decrease when the data points are changed from (0, 0, 0) to (1, 1, 1); the data point (0, 0, 0) is deeper than the data point (1, 1, 1) with respect to the dataset; (2) the two methods have similar computational efficiencies because they consume almost the same total CPU time; (3) the sd obtained by the IS method is smaller than that calculated by the MC method, which means that the former is more accurate than the latter in this case.
In the five-dimensional case, the dataset was sampled from N( 0 → 5 , E 5 ), and the sample size was 1000. We used MC and IS methods to compute the SD of points (0, 0, 0, 0, 0), (0.5, 0.5, 0.5, 0.5, 0.5), and (1, 1, 1, 1, 1). e number of random simplices was 100, and the computations were repeated 50 times. e computed results (mean, sd, and total CPU time in s) are presented in Table 3 and Figure 4. Table 3 and Figure 4 show that (1) the computed SD values decrease when the data points are changed from (0, 0, 0, 0, 0) to (1, 1, 1, 1, 1), thereby suggesting that the former is deeper than the latter; (2) the SD values in the fivedimensional examples are slightly smaller than those in 3D examples because the sparsity of the data points increases when the dimension is increased from three to five; (3) the IS algorithm performs better than the MC approach as indicated by the smaller sd of the results of the former compared with those of the latter; (4) the two approximate algorithms consume almost the same CPU time; (5) even after using 100 random simplices, the MC algorithm cannot find any simplex containing point, whereas the IS algorithm can identify many simplices. In conclusion, the IS method outperforms the MC method in terms of accuracy in these simulated examples.
We also evaluated the MC and IS methods with other numbers of random samples tries in different datasets. e findings show that the result's accuracy increases with the

Application to Regression and Real Data Example
One of the most important extensions of SD is the robust estimation of regression based on SD. To demonstrate the relevant concept, we consider the linear regression model as follows: where random variables X and Y are in R 1 , ε ∼ N(0, σ 2 ), and α, β, and σ 2 are unknown parameters. Considering that SD(x, X n ) can measure the depth of x with respect to X n , we extend the definition of SD to regression (12) and determine the simplicial regression depth: where θ � (α, β) are the parameters, W n � (Y n , X n ) are the samples of the model defined in (12), and r i (θ) � Y i − α − βX i is the residual based on the i-th sample and A r i (θ), r j (θ), r k (θ) � 1, r i (θ), r j (θ), r k (θ)have alternating signs, 0, otherwise.
e SD based estimator of (12) can be defined as the maximum of SD(θ, W n ); that is, We consider the physical experiment data concerning the relationship between the atmospheric pressure and boiling point of water, which was discussed by a Scottish physicist named James D. Forbes [22]. In the mid-nineteenth century, this experiment can illustrate whether the simple measurement of the boiling point of water can substitute for the direct reading of the barometric pressure.
e dataset was collected in the Alps in Scotland (Table 4 and Figure 5). e linear regression model in (12) was used to fit the Forbes dataset. We used LS and SD methods to estimate the parameters of the model in (12). e function "lm" in R Stats package ("stat") can be used to determine the LS estimator of Table 3: Results (mean, sd, and total CPU time in s) were obtained by MC and IS methods in five-dimensional experiments. the model in (12). For the SD based method, we combined quasi-Newton [23] and IS methods to find the maximum point of (15). Moreover, we performed three statistical tests (i.e., the R square value, normality test, and the test of goodness of fit [24]) for every fitted regression model to get a more insightful analysis. e R square (or adjusted R square) value from the significance test gives the percentage that the dependent variable (Y) can be explained by the fitted model (α + βX) (see (12)). e normality test is used to test whether the residuals of the fitted model obey normal distribution which is the basis of other statistical tests. For example, under the assumption of normality, the F statistic value in the test of goodness of fit can be used to determine whether the fitted regression model makes sense. We first used the LS method and SD approaches to compute the linear regression model with the original Forbes dataset (Table 4, denoted as original data in this section). e computed regression results are summarized in Table 5 and Figure 5(a); their corresponding statistical tests are summarized in Table 6 and Figure 6. Table 5 and Figure 5(a) show that the LS and SD estimators obtained the very similar intercept parameter and slope parameter. is finding suggests that the SD method can capture the same accurate regression results compared with LS method. e statistical test results have also confirmed the finding since the results from LS and SD methods were also very similar. ey have very high R square values which indicate more than variance of the dependent variable that can be explained by the fitted model. Under significance level 0.01, we accept the assumption of normality and they pass the goodness of fit test (i.e., the p value of F statistic is almost zero). In addition, if one needs a higher level of significance (such as 0.05) in this example, then some statistical techniques (e.g., Box-Cox transformation or strong influence points detection) can be used to improve the regression model (see more details in [22]). However, this is another research topic and there is a lack of sample points in this example; we only focus on the robustness of the regression model computed from different methods, especially when the dataset is contaminated, and that is what we do in the next experiment.
In the following experiment, we worked with a contaminated dataset from Forbes data. We intentionally changed the pressure of the 16th data point from 29.88 to 59.76. e new dataset was denoted as the contaminated data ( Figure 5(b)). We compared the SD and LS methods' performances in the linear regression model with the contaminated dataset. e regression results are presented in Table 5 and Figure 5(b). eir corresponding statistical tests are summarized in Table 6 and Figure 7.    Table 6: e statistical tests for regression analysis with original data and contaminated data using LS and SD methods.
Original data Contaminated data   e results show that the LS estimator is greatly influenced by the contaminated data point, whereas the SD estimator can maintain satisfactory performance. e slope parameter estimated by the LS estimator changes from 0.522 9 to 1.026 6, which cannot reflect the actual variation trend of the pressure-temperature curve. By contrast, the SD estimator is not affected by the contaminated data point and can still provide the actual variation trend. e estimated slope parameters obtained using SD method for two different datasets are 0.508 6 and 0.508 5, respectively. e statistical test results show that, under the influence of the contaminated data point, the residuals of the fitted models from the two methods do not pass the normality test. However, the R square (or Adjusted R square) value from the SD method (0.991 7) is much large than that of the LS method (0.765 0) which means that the regression line from the SD method can explain more percentage of the variance of dependence variable compared with that of the LS method.
ese results imply that the SD estimator outperforms the LS estimator in the contaminated dataset experiment in terms of robustness.

Conclusions
e concept of statistical depth plays an important role in mathematical sciences, engineering, regression analysis, and life sciences. In this study, we computed the SD using the IS method and found that this new approach performs better than other exact and MC methods in terms of accuracy and efficiency. e simulated and real data examples illustrated the advantage of this new method. Finally, we tested the SD method based regression analysis through a concrete physical data example. e result indicated the excellent robustness of the proposed method compared with the LS estimation.
Given the many favorable properties of the proposed method, further research can be conducted on different angles. First, the IS parameter (i.e., number of sample tries N) plays an important role in the computation of SD, so the determination of N before the performance of IS algorithm is yet to be thoroughly investigated. Second, the IS method for the SD computation can be improved by sampling the data points via other more important simplices (not the last data point in the possible simplices). ird, with the development of modern computer science, multicore high-performance computer is gaining popularity. erefore, the IS method can be extended to a parallel computation based version. Lastly, the approximate algorithms (advanced MC methods) for other statistical depths (e.g., halfspace depth, projection depth, and regression depth) can be further explored.

Data Availability
e experimental data used to support the findings of this study are included within the article.

Conflicts of Interest
e authors declare that they have no conflicts of interest.