Two General Extension Algorithms of Latin Hypercube Sampling

To reserve original sampling points and thereby reduce the number of simulation runs, two general extension algorithms of Latin Hypercube Sampling (LHS) are proposed. The extension algorithms start with an original LHS of size m and construct a new LHS of size m + n that contains as many of the original points as possible. In order to obtain a strict LHS of the larger size, some original points might have to be deleted. The relationship among the original sampling points in the new LHS structure is represented by a simple undirected acyclic graph. The basic general extension algorithm reserves the most original points but costs too much time. Therefore, a general extension algorithm based on a greedy algorithm is proposed to reduce the extension time, although it cannot guarantee to retain the most original points. These algorithms are illustrated by an example and applied to evaluating sample means to demonstrate their effectiveness.


Introduction
Latin Hypercube Sampling (LHS) is one of the most popular sampling approaches and is widely used in the fields of simulation experiment design [1], uncertainty analysis [2], adaptive metamodeling [3], reliability analysis [4], and probabilistic load flow analysis [5]. Compared with other random or stratified sampling algorithms, LHS has better space-filling, robustness, and convergence properties. The extension of LHS is to obtain an LHS of a larger size that preserves a preexisting LHS (the original LHS). There are at least two situations that call for such an extension, especially for time-consuming simulation systems. One is sequential sampling for sequential analysis, adaptive metamodeling, and so on. The other arises when the original LHS is subsequently found to be too small, so that generating a new, larger LHS from scratch would be time-consuming. However, the LHS structure makes it difficult to increase the sample size based on an original LHS while simultaneously keeping the stratification properties of LHS.
A special extension case is integral-multiple extension, where the size of the new LHS is an integral multiple of that of the original sample. Tong [6] proposed integral-multiple extension algorithms for stratified sampling methods including LHS. Sallaberry et al. [7] gave a two-multiple extension algorithm of LHS with correlated variables. Later, two related techniques appeared in papers on "nested Latin hypercube design" [8] and "nested orthogonal array-based Latin hypercube design" [9]. A nested Latin hypercube design with two layers is defined as a Latin hypercube design that contains a smaller Latin hypercube design as a subset. A special integral-multiple extension method, the extended LHS method, was illustrated in [10], where the new LHS contains several smaller LHSs. Related papers were produced by Vorechovsky [11-13], where the new sample size is a multiple of the original sample size. The integral-multiple extension algorithms have the good feature that they obtain a strict LHS of larger size while preserving all the original sampling points.
In this study, we consider the general extension of LHS, where the new sample size is more controllable and the algorithm can be applied more widely. Wang [14] and Blatman and Sudret [15] obtained an approximate LHS of a larger size, which might have two or more original points falling into the same variable interval. Wei [16] also proposed a general extension algorithm that yields an approximate LHS, which might have no point in some variable interval. However, a sample is an LHS if and only if there is exactly one point in each variable interval. An approximate LHS does not satisfy this definition and is harmful to extensions with additional criteria, such as correlated variables, maximizing the minimum distance, orthogonal arrays, and so on.
In this paper, we aim to obtain a strict extension of LHS (ELHS) rather than an approximate one, where the new LHS contains as many of the original sampling points as possible. The extension algorithm includes two parts: the reservation of original sampling points and the generation of new ones. As the generation of new sampling points is almost the same as in integral-multiple extension, the reservation of original sampling points is the main problem to discuss. The relationship among the original sampling points in the new LHS is expressed by a graph, so the reservation problem can be solved by graph theory.
In Section 2, the procedure of LHS and the mathematical description of the extension problem of LHS are given, which form the basis of the extension algorithms. In Section 3, a basic general extension algorithm of LHS (BGELHS) is proposed based on graph theory, which reserves the most original sampling points; a general extension algorithm of LHS based on a greedy algorithm (GGELHS) is then introduced to balance the number of simulation runs against the generation time of the ELHS. In Section 4, the proposed extension algorithms are illustrated by an application that evaluates sample means. Finally, the conclusion and some thoughts on future research are given.

Latin Hypercube Sampling and Extension Problem
The LHS procedure for a sample of size m over d input variables is as follows. (a) Divide the cumulative probability range [0, 1] of every variable into m nonoverlapping intervals of equal probability 1/m. (b) Generate one cumulative probability value in each interval:

Prob_ij = (π_ij − 1 + r_ij) / m, (1)

where π_ij is the interval order number of the ith point in the jth variable and r_ij is a uniform random number ranging from 0 to 1. All the probability values can be noted as the matrix Prob = (Prob_ij)_{m×d}. (c) Transform each probability into a sample value by the inverse of the cumulative distribution function F_j(⋅): x_ij = F_j^{-1}(Prob_ij). Then the sample matrix is X = (x_ij)_{m×d}. (d) The m values of each variable are paired randomly, or in some prescribed order, with the m values of the other variables. Each row of the resulting sample matrix of LHS is a sampling point.
Figure 1 shows an example of an LHS of size 5, where each interval of each input variable contains one sampling point.
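Steps (a)-(d) above can be sketched in a few lines of Python (a minimal illustration under our own naming, not the paper's code):

```python
import random

def lhs(m, inv_cdfs, seed=None):
    """Generate a Latin Hypercube Sample of size m.

    inv_cdfs: one inverse-CDF callable per input variable.
    Returns a list of m sampling points (rows)."""
    rng = random.Random(seed)
    columns = []
    for inv_cdf in inv_cdfs:
        # (a)/(b): one probability per interval [(k-1)/m, k/m), placed uniformly;
        # shuffling the interval orders per column also gives the random
        # pairing across variables of step (d)
        perm = list(range(1, m + 1))
        rng.shuffle(perm)
        probs = [(k - 1 + rng.random()) / m for k in perm]
        # (c): transform probabilities through the inverse CDF
        columns.append([inv_cdf(p) for p in probs])
    return [list(point) for point in zip(*columns)]

# example: two variables, both uniform on [0, 1] (identity inverse CDF)
sample = lhs(5, [lambda p: p, lambda p: p], seed=1)
```

By construction, every variable has exactly one value in each of the m probability intervals, which is the defining property used throughout the paper.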

Extension Problem of LHS
where card(⋅) denotes the number of elements in a set. The objective in (5) is to obtain an ELHS of size m + n that contains as many of the original LHS points as possible. Figure 2 shows the original LHS points of Figure 1 in the structure of an ELHS of size 7; it can be seen that the original sampling points do not accord with the new structure. So we need to delete as few original points as possible and then create some new sampling points. The reserved points together with the new ones form the ELHS of size 7.

General Extension Algorithms of LHS
The general extension problem can be described as (5).
In this section, two general extension algorithms of LHS are demonstrated to obtain an ELHS. The problem of selecting the reserved points is transformed into the problem of finding a maximum independent set, which is a graph theory problem. The generation procedure is based on a subtraction rule [14]. These two stages constitute BGELHS in Section 3.1. As BGELHS costs too much time to obtain an ELHS, GGELHS is proposed in Section 3.2.

Basic General Extension Algorithm of LHS.
Some original sampling points might have to be deleted before generating the new sample. At the beginning of BGELHS, the original sample is transformed into a graph. Here the definition of a simple graph is introduced.
Definition A (see [17]). A simple graph G = (V, E) is a graph with no loops or multiple edges, where V(G) is the vertex set and E(G) is the edge set. No loop means that there is no edge from a vertex to itself. No multiple edges means that there is at most one edge between two vertices.

Let the original sample matrix be X(old)_{m×d} and denote the sampling points as 1, 2, . . . , m by their row order. Transform the sampling points into the vertices of a graph as follows: every point is a vertex, and two vertices are linked by an edge if the corresponding two sampling points lie in the same interval of the ELHS structure in some variable. The result is a simple undirected graph. Ordering the sampling points of Figure 2 from left to right yields the simple undirected graph shown in Figure 3. The original problem in (5) is to delete the fewest vertices and their incident edges so that the remaining vertices share no edge, which can be proved by reduction. Independent sets, maximal independent sets, and maximum independent sets are introduced to solve this problem.
Definition B (see [17]). Suppose that there is a simple graph G = (V, E), where V is the set of vertices and E is the set of edges. If a set of vertices I ⊆ V(G) contains no two vertices joined by an edge, the set I is called an independent set.

Definition C (see [17]). Suppose that there is a simple graph G = (V, E) and that I and I' are independent sets. If, for every vertex v ∈ V with v ∉ I, the set I ∪ {v} is not an independent set, then I is a maximal independent set. If card(I) ≥ card(I') for every independent set I', then I is a maximum independent set. The number of elements in a maximum independent set is the independence number of the graph G, noted as α(G).
From the definitions, reserving the most original sampling points can be transformed into computing a maximum independent set; that is, deleting the fewest vertices so that there is no edge between any two remaining vertices. In graph theory, a simple graph can be expressed as an adjacency matrix A_{m×m} = (a_ij), which records the edges between vertices: a_ij = 1 if there is an edge between vertex v_i and vertex v_j, and a_ij = 0 otherwise. Obviously, A is symmetric with zero diagonal. So the reservation problem is transformed into the problem of finding a maximum independent set; that is, deleting the fewest vertices so that the adjacency matrix of the remaining vertices is a zero matrix. Suppose that there is an LHS of size m with sample matrix X(old)_{m×d} = {x(old)_ij}, order matrix L(old)_{m×d} = {l(old)_ij} (l(old)_ij is the interval order number of x(old)_ij in the jth variable), cumulative probability matrix P(old)_{m×d} = {p(old)_ij} (p(old)_ij is the cumulative probability of x(old)_ij, i.e., Prob_ij in (1)), and random matrix R(old)_{m×d} = {r(old)_ij} (r(old)_ij is the random value of x(old)_ij, i.e., r_ij in (1)). The objective is to obtain an ELHS of size m + n, noted as X(new)_{(m+n)×d}. The detailed steps of the BGELHS algorithm are as follows.
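Building the adjacency matrix from the cumulative probabilities can be sketched as follows (a minimal illustration; the function and argument names are ours, not the paper's):

```python
def adjacency(probs, new_size):
    """Adjacency matrix A of the original points under the new ELHS
    structure: a_ij = 1 when points i and j fall into the same interval
    of some variable.  probs[i][k] is the cumulative probability of
    point i in variable k; new_size is m + n."""
    m = len(probs)
    # interval index of each point in the (m+n)-interval structure
    pos = [[min(int(p * new_size), new_size - 1) for p in row]
           for row in probs]
    A = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if any(pos[i][k] == pos[j][k] for k in range(len(pos[i]))):
                A[i][j] = A[j][i] = 1
    return A
```

The matrix is symmetric with a zero diagonal, matching the properties stated above; the `min(..., new_size - 1)` clamp only guards the boundary case p = 1.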
Step 1. Compute the positions of the original sampling points in the ELHS structure.
Step 2. Reserve the most original sampling points, noted as the sample matrix X(2)_{m'×d}. (a) Compute the adjacency matrix A_{m×m} of the original LHS from the positions obtained in Step 1: a_ij = 1 if points i and j fall into the same interval of some variable in the ELHS structure, and a_ij = 0 otherwise. (b) Enumerate the subsets of vertices in order of nondecreasing size, B_1, B_2, . . . , B_K with K = 2^m, where each B_k is a combination of vertices and the number of elements in B_k is not more than that in B_l for 1 ≤ k < l ≤ K (e.g., B_1 = null, meaning no element). Let k = 1 and delete the point(s) corresponding to the element(s) of B_k; then judge whether the adjacency matrix A_k of the remaining points equals the zero matrix. If A_k = 0, record the order numbers of the remaining points, whose count m' is the number of reserved points, and stop. Otherwise, set k = k + 1, delete the points corresponding to the elements of B_k, and repeat Step 2-(b). (c) Form the reserved sample matrix X(2)_{m'×d} from the remaining points.
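Step 2-(b) amounts to an exhaustive search for a maximum independent set. A compact sketch (searching from the largest subset downward rather than enumerating deletion sets, which finds the same answer; names are ours):

```python
from itertools import combinations

def max_independent_set(A):
    """Exhaustive search over vertex subsets, largest first; returns a
    maximum independent set of the graph with adjacency matrix A.
    Mirrors BGELHS Step 2-(b): up to 2^m subsets in the worst case."""
    m = len(A)
    for size in range(m, 0, -1):
        for subset in combinations(range(m), size):
            if all(A[i][j] == 0 for i in subset for j in subset if i < j):
                return list(subset)
    return []
```

On the five-point graph of Figure 3 (where the fourth point conflicts with the second and third, as reconstructed from Example 2's degree vector), this returns the four reserved points [1 2 3 5] in the paper's 1-indexed numbering.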
Step 3. Generate the new sampling points. (a) Generate a new order matrix of size (m + n) × d in which every column is a random permutation of the integers from 1 to m + n. (b) From every column, delete the elements that coincide with the corresponding column of the order matrix of the reserved sample X(2)_{m'×d}, yielding a matrix of size (m + n − m') × d that lists the intervals the new points must fill.
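The subtraction rule of this step can be sketched as follows (a minimal illustration; the names are ours, not the paper's):

```python
import random

def new_orders(reserved, total, seed=None):
    """Step 3 sketch (subtraction rule): for each variable, draw a random
    permutation of 1..total and remove the interval orders already
    occupied by the reserved points; what remains are the intervals the
    new sampling points must fill.

    reserved: per-variable lists of occupied interval orders.
    total: the new sample size m + n."""
    rng = random.Random(seed)
    result = []
    for occupied in reserved:
        perm = list(range(1, total + 1))
        rng.shuffle(perm)                       # random permutation of 1..total
        taken = set(occupied)
        result.append([o for o in perm if o not in taken])
    return result
```

Because the permutation is random, the leftover orders are assigned to the new points in random order, which preserves the random pairing of step (d) of the LHS procedure.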
Example 1. BGELHS is illustrated with the original LHS of Figure 1. Let x1 have a uniform distribution on [0, 10] and let x2 have a triangular distribution on [5, 10] with mode at 7.5. The distribution functions are F1(x) = x/10 and F2(x) = (x − 5)²/12.5 for 5 ≤ x ≤ 7.5, F2(x) = 1 − (10 − x)²/12.5 for 7.5 ≤ x ≤ 10, with inverses F1⁻¹(p) = 10p and F2⁻¹(p) = 5 + √(12.5p) for p ≤ 0.5, F2⁻¹(p) = 10 − √(12.5(1 − p)) for p > 0.5. The sample matrix and the related matrices follow from (1). The objective is to obtain an ELHS of size 7 that contains as many of the original sampling points as possible, so m = 5 and n = 2.
From Step 1, the positions of the original sampling points in the new structure are computed. From Step 2-(a), the adjacency matrix A_{5×5} is obtained. Then the point(s) corresponding to the element(s) of B_k, k = 1, 2, . . . , 32, are deleted in turn. At the fifth iteration, k = 5 and B_5 contains only the fourth point; deleting it (i.e., deleting the fourth row and fourth column of A_{5×5}) leaves the remaining adjacency matrix A_{4×4} equal to the zero matrix. The reserved orders are therefore [1 2 3 5] and m' = 4. From Step 2-(c), the reserved sample is extracted. From Step 3, an order matrix is generated, the occupied orders are removed, and the two new sampling points are created. Finally, the ELHS of size 7 is obtained, which is shown in Figure 4.

General Extension Algorithm of LHS Based on Greedy Algorithm.
In graph theory, obtaining a maximum independent set is an NP-hard problem.
Step 2-(b) of BGELHS takes the most time, and the time complexity of BGELHS is O(2^m). This means that the running time of BGELHS grows exponentially with the number of sampling points. In this section, GGELHS is proposed to reduce the time of the extension algorithm. First, the definition of degree is introduced.
Definition D (see [17]). In a simple graph G = (V, E), the number of edges incident to a vertex v is the degree of v, noted as d(v). d(v) indicates the number of vertices linked with the vertex v.

The basic idea of the greedy algorithm is to delete, one at a time, a vertex whose degree d(v) is maximal, until every remaining vertex has degree 0. The steps of GGELHS are the same as those of BGELHS except for Step 2-(b), whose details are given as follows.
Step 2-(b): compute the degree of every vertex, d(v_i) = Σ_j a_ij, i = 1, 2, . . . , m. Delete a vertex v_i whose degree is maximal: delete the corresponding row and column of the adjacency matrix (A_{m×m} at the beginning), note the remaining matrix as A', and record the order of v_i. If A' is not the zero matrix, set A = A' and repeat Step 2-(b). Otherwise, the recorded orders are the points to delete, and the remaining points, whose number is m', are reserved.

In Step 2-(a) of GGELHS, computing the adjacency matrix A_{m×m} requires m(m − 1)/2 operations, and it is easy to see that the time complexities of the other steps do not exceed O(m). So the time complexity of GGELHS is O(m²), which is much less than that of BGELHS. However, this algorithm obtains a maximal independent set rather than a maximum independent set; that is, it cannot guarantee to reserve the most original sampling points. As computing the degrees of the original sampling points costs little, GGELHS is less time-consuming and more practical. From the result above, we know that Step 2-(b) obtains the maximum independent set in this example, which means that the algorithm deletes the fewest original sampling points. But the greedy algorithm cannot guarantee the same effect in every circumstance.
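The greedy Step 2-(b) can be sketched as follows (a minimal illustration under our own naming; ties in degree are broken by lowest index here, a detail the paper does not specify):

```python
def greedy_reserve(A):
    """GGELHS Step 2-(b) sketch: repeatedly delete a vertex of maximum
    degree until no edges remain among the survivors; returns the
    indices of the reserved vertices (a maximal, not necessarily
    maximum, independent set)."""
    alive = list(range(len(A)))
    while True:
        deg = [sum(A[v][u] for u in alive) for v in alive]
        worst = max(deg)
        if worst == 0:
            return alive                  # no edges left: stop
        del alive[deg.index(worst)]       # drop first max-degree vertex
```

Each pass costs O(m), and at most m passes are made, consistent with the O(m²) bound stated above.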

Application
BGELHS and GGELHS can be applied to many other tasks, such as adaptive metamodel building and sequential experiment design and analysis. In Section 4, two test examples are used to evaluate sample means [6]. The test functions are shown in (22), where x_i ∈ [0, 1], i = 1, 2, 3, 4, and each x_i obeys a uniform distribution. (c) Compute the average of the outputs of every sample. (d) Compute the standard deviation of these averages about their mean over all replications.
(e) Increase the sample size by the increment. If the size exceeds the maximum, stop; otherwise, generate samples by the compared algorithms and go to step (b).
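Steps (c)-(e) can be sketched as follows. Plain Monte Carlo sampling stands in for the compared algorithms, and the test function is a hypothetical placeholder, since the paper's functions in (22) are not recoverable from the extracted text:

```python
import math
import random

def mean_convergence(f, dim, sizes, replications=10, seed=0):
    """For each sample size, draw `replications` samples, average the
    outputs of each sample (step (c)), and return the standard deviation
    of those averages (step (d)); smaller means better-converged
    sample means."""
    rng = random.Random(seed)
    stds = []
    for n in sizes:
        means = []
        for _ in range(replications):
            outs = [f([rng.random() for _ in range(dim)]) for _ in range(n)]
            means.append(sum(outs) / n)                    # step (c)
        mu = sum(means) / len(means)
        stds.append(math.sqrt(sum((v - mu) ** 2 for v in means)
                              / len(means)))               # step (d)
    return stds

# hypothetical test function; sizes follow step (e): 10, 20, ..., 100
stds = mean_convergence(lambda x: sum(x), dim=4, sizes=range(10, 101, 10))
```

The standard deviation of the sample means shrinks as the size grows, which is the convergence behavior plotted in Figure 5.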
In this application, the initial sample size is 10, the increment is 10, the maximum size is 100, and the number of replications is 10 for MCS, CLHS, BGELHS, and GGELHS. Figure 5 shows the results of these four algorithms. It can be concluded that the convergence rates of the two extension algorithms are the same as that of CLHS and better than that of MCS; therefore, the extension algorithms preserve the good convergence of CLHS. Compared with CLHS, an outstanding advantage of the extension algorithms is the smaller number of runs. Figure 6 shows the number of function runs in this application. The run count of CLHS is (10 + 20 + 30 + ⋅⋅⋅ + 100) × 10 = 5500. BGELHS saves 3302 runs for function 1 and 3325 runs for function 2; GGELHS saves 3263 runs for function 1 and 3259 runs for function 2. Therefore, BGELHS and GGELHS save more than half of the runs compared with CLHS.
The total times of these algorithms are shown in Figure 7. The times of MCS and CLHS are almost the same. The time of GGELHS is one to two orders of magnitude greater than those of MCS and CLHS because of Step 2. BGELHS costs the most time, which is a serious problem in application; GGELHS is easy to apply in practice.

Conclusion
A drawback of LHS is its extension problem, and this paper mainly discusses how to obtain a new strict LHS that contains as many of the original sampling points as possible. There are three problems to solve: how to express the relationship among the original sampling points in the ELHS structure, how to reserve the most original sampling points, and how to generate the new sampling points. In this paper, BGELHS and GGELHS are proposed to solve these problems. BGELHS can reserve the most original points in theory; however, it is difficult to apply in practice because it is time-consuming. GGELHS cannot guarantee to reserve the most original points, but its time complexity is much lower than that of BGELHS. Therefore, BGELHS is suitable for the analysis of time-consuming systems when the running time of the system is much more valuable than the generation time of the ELHS, while GGELHS can be applied in more situations. A simple example and an application to the evaluation of sample means show the effectiveness of these algorithms.
This paper illustrates the relationship between ELHS and graph theory, so some conclusions of graph theory can be applied directly to the general extension of LHS, which is the main contribution of this paper. The relationship among the original sampling points in the ELHS is expressed in the form of an adjacency matrix, but the reverse does not hold, because some information about the original sampling points is lost, including the dimensions of the variables, the values of the sampling points, and so on. Therefore, BGELHS can be improved based on this lost information and further conclusions of graph theory, which is under investigation. On the other hand, LHS is often applied with additional criteria for a better distribution or space filling, such as correlated variables, maximizing the minimum distance, and minimizing discrepancy. The general extension of LHS under these criteria is also future work.

Figure 2: Distribution of generated sampling points in the new structure.

Figure 3: The simple undirected acyclic graph of Figure 2.

Example 2.
To compare BGELHS and GGELHS, we consider the same problem as in Example 1. Since GGELHS is almost the same as BGELHS except for Step 2-(b), we mainly give Step 2-(b) of GGELHS. From Step 2-(b), the degrees of the five points are d = [0 1 1 2 0]. The fourth point has the maximum degree, so it is deleted, and the remaining adjacency matrix A_{4×4} is the zero matrix. So the reserved orders are [1 2 3 5] and m' = 4. BGELHS runs Step 2-(b) five times, while GGELHS runs it only once.
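The degree computation of this example can be reproduced directly; the edge list below is reconstructed from the stated degree vector [0 1 1 2 0], i.e., the fourth point conflicts with the second and third:

```python
# Figure 3 graph, 0-indexed
A = [[0] * 5 for _ in range(5)]
for i, j in [(1, 3), (2, 3)]:
    A[i][j] = A[j][i] = 1

deg = [sum(row) for row in A]        # degrees of the five points
worst = deg.index(max(deg))          # index 3, i.e., the fourth point
keep = [v for v in range(5) if v != worst]
# after deleting it, no edges remain among the reserved points
edges_left = sum(A[i][j] for i in keep for j in keep)
```

A single deletion empties the graph, which is why GGELHS needs only one pass of Step 2-(b) here.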

Extension Problem of LHS. In this paper, the extension problem of LHS has two purposes. One is to construct a strict LHS of a larger size than the original LHS. The other is to reserve the most original sampling points in the new LHS. Assume that there is an original LHS with sample set S = {s_1, s_2, . . . , s_m}, where s_i is a sampling point, i = 1, 2, . . . , m.