Least Square Support Tensor Regression Machine Based on Submatrix of the Tensor

For the tensor regression problem, a novel method, called least square support tensor regression machine based on submatrix of a tensor (LS-STRM-SMT), is proposed to handle tensor regression more efficiently. First, we develop the least square support matrix regression machine (LS-SMRM) and propose a fixed point algorithm to solve it. Then LS-STRM-SMT for tensor data is proposed. Inspired by the relation between color photographs and gray images, we reformulate the tensor training set and form the new model (LS-STRM-SMT) for the tensor regression problem. With the introduction of projection matrices and another fixed point algorithm, we turn the LS-STRM-SMT model into several related LS-SMRM models, which are solved by the algorithm for LS-SMRM. Since the fixed point algorithm is used twice when solving the LS-STRM-SMT problem, we call the algorithm the dual fixed point algorithm (DFPA). LS-STRM-SMT is compared with several typical support tensor regression machines (STRMs). From a theoretical point of view, our algorithm has fewer parameters and its computational complexity should be lower, especially when the rank K of the submatrices is small. The numerical experiments indicate that our algorithm performs better.


Introduction
In the past decades, data in the form of matrices or, more generally, multiway arrays (tensors) have found an increasing number of applications. For example, all raster images are essentially digital readings of a grid of sensors, and matrix analysis is widely applied in image processing, for example, to photorealistic images of faces [1], palms [2], and medical images [3]. In web search, a large number of tensors that stand for images [4] can be found easily. Therefore, tensor data analysis [5], particularly the regression problem [6,7], has become one of the most important topics for face recognition [8], palmprint recognition [9], and so on.
Tensor data have drawn great attention. Recently, several tensor learning approaches for regression [10,11] have appeared, but the majority of them work on vector spaces that are derived by stacking the original tensor elements in a more or less arbitrary order. This vectorization of the data causes several problems. First, the structural information is destroyed. Second, vectorizing a tensor may produce an extremely high-dimensional vector, which may lead to high computational complexity, overfitting, and large memory requirements. The remaining methods mainly take advantage of the decomposition of a matrix [12] or tensor [6], which reduces the computational complexity and the dimensionality at the expense of a slight decline in accuracy, but the structural information is still destroyed. So a more reasonable method is needed, one that preserves the underlying structural information and avoids overfitting, high dimensionality, and high computational complexity.
Considering that a color photograph can be expressed as a third-order tensor, of which each frontal slice is a gray image that contains almost all the information of the color photograph, we take advantage of this by introducing the submatrix of a tensor and an abstract vector space when solving tensor learning problems for regression. That is, each tensor sample can be regarded as an abstract vector [13] whose elements are submatrix-type features. Gathering the same feature of the different tensor samples, we construct new submatrix training sets and the same number of related training models, from which we get an equal number of weight submatrices. The weight tensor is then obtained from them. Besides, we improve the fixed point algorithm [14] via projection matrices [15], including a series of left projection matrices and a right projection matrix. The improved algorithm is called the dual fixed point algorithm (DFPA). The projection matrices not only link the training models together but also reduce the computational complexity and the memory requirement. That is to say, by fixing the projection matrices we turn the LS-STRM-SMT problem into a series of least square support matrix regression machines (LS-SMRMs) and then solve the LS-SMRM problems with the fixed point algorithm. The numerical experiments indicate that our method and algorithm perform better.
The paper is organized as follows. In Section 2, notations and preliminaries are introduced, such as definitions related to tensors. In Section 3, we propose LS-SMRM for matrix regression problems and the fixed point algorithm for it. In Section 4, we propose the LS-STRM-SMT model and develop the DFPA to solve it. Computational comparisons on both UCI data sets and artificial data are reported in Section 5, and conclusions are given in Section 6.

Notations and Preliminaries
Here, we give a brief description of the notations used in the later sections. Boldface capital letters, for example, $\mathbf{A}$, boldface lowercase letters, for example, $\mathbf{a}$, and lowercase letters, for example, $a$, are used to denote matrices, vectors, and scalars, respectively. Tensors, regarded as multidimensional arrays, are denoted by Euler script calligraphic letters, for example, $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where $N$ denotes the order of the tensor. Just as the $i$th element of a vector $\mathbf{x} \in \mathbb{R}^{I}$ is denoted by $x_i$, $i = 1, 2, \ldots, I$, the elements of an $N$-order tensor $\mathcal{X}$ are denoted by $x_{i_1 i_2 \cdots i_N}$, $i_n = 1, 2, \ldots, I_n$, $n = 1, 2, \ldots, N$.
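For an $N$-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the $n$-mode matricization, also known as unfolding or flattening, is denoted by

$$\mathrm{mat}_n(\mathcal{X}) = \mathcal{X}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}. \tag{1}$$

It is clear that the elements of the tensor can be reordered into a matrix in this way. Conversely, a mapping function (2) is defined to recover a tensor from its unfolding matrix. In particular, when $N = 3$, we have

$$\mathcal{X}_{(1)} = \big(\mathcal{X}(:,:,1), \mathcal{X}(:,:,2), \ldots, \mathcal{X}(:,:,I_3)\big). \tag{3}$$

The inner product of two tensors of the same size, $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, is defined as

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}\, y_{i_1 i_2 \cdots i_N}. \tag{4}$$

The Frobenius norm of a tensor is thus defined as

$$\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}, \tag{5}$$

and it can be shown that

$$\|\mathcal{X}\|_F = \|\mathcal{X}_{(n)}\|_F, \quad n = 1, 2, \ldots, N. \tag{6}$$

The CANDECOMP/PARAFAC decomposition, referred to as the CP decomposition, factorizes an $N$-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ into a linear combination of $R$ rank-one tensors, written as

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{u}_r^{(1)} \circ \mathbf{u}_r^{(2)} \circ \cdots \circ \mathbf{u}_r^{(N)}, \tag{7}$$

where the operator $\circ$ is the outer product of vectors and the factor matrices $\mathbf{U}^{(n)} = [\mathbf{u}_1^{(n)}, \mathbf{u}_2^{(n)}, \ldots, \mathbf{u}_R^{(n)}] \in \mathbb{R}^{I_n \times R}$, $n = 1, 2, \ldots, N$, are of the size $I_n \times R$. For convenience, every tensor in the following is a third-order tensor unless stated otherwise.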
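The operations of this section can be illustrated with a short NumPy sketch. The function names (`unfold`, `fold`, `inner`, `frobenius`, `cp_to_tensor`) are illustrative assumptions, not notation from the paper, and the column ordering of the unfolding is chosen so that the mode-1 unfolding of a third-order tensor places its frontal slices side by side.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows follow axis `mode`, the other axes are
    flattened with the earlier index varying fastest."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.reshape(moved, (tensor.shape[mode], -1), order="F")

def fold(matrix, mode, shape):
    """Inverse of `unfold`: rebuild the tensor of the given shape."""
    rest = [s for n, s in enumerate(shape) if n != mode]
    full = np.reshape(matrix, [shape[mode]] + rest, order="F")
    return np.moveaxis(full, 0, mode)

def inner(x, y):
    """Inner product of two tensors of the same size."""
    return float(np.sum(x * y))

def frobenius(x):
    """Frobenius norm, the square root of <x, x>."""
    return float(np.sqrt(inner(x, x)))

def cp_to_tensor(factors):
    """Sum of R rank-one tensors; factors[n] has shape (I_n, R)."""
    rank = factors[0].shape[1]
    out = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(rank):
        term = factors[0][:, r]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, r])  # outer product of vectors
        out += term
    return out
```

For a second-order tensor, `cp_to_tensor([V, U])` reduces to `V @ U.T`, which is the form of the weight matrix used in the next section.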

Least Square Support Matrix Regression Machine (LS-SMRM)
In this section, we propose the least square support matrix regression machine, abbreviated as LS-SMRM, for the regression problem with matrix input. Given a training set

$$\{(\mathbf{X}_i, y_i)\}_{i=1}^{m}, \tag{8}$$

where $\mathbf{X}_i \in \mathbb{R}^{I_1 \times I_2}$ is the input and $y_i \in \mathbb{R}$ is the output, $i = 1, 2, \ldots, m$, our task is to find a predictor

$$f(\mathbf{X}) = \langle \mathbf{W}, \mathbf{X} \rangle + b, \tag{9}$$

where $\mathbf{W}$ denotes the weight matrix and $b$ is the bias. For a new input matrix, we can predict its output through this predictor.
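In order to obtain predictor (9), we develop the following optimization problem:

$$\min_{\mathbf{W}, b, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{W}\|_F^2 + \frac{C}{2}\sum_{i=1}^{m} \xi_i^2 \quad \text{s.t.} \quad y_i - \langle \mathbf{W}, \mathbf{X}_i \rangle - b = \xi_i, \ i = 1, 2, \ldots, m, \tag{10}$$

where $C > 0$ is the penalty parameter. According to the CP decomposition (7), matrices $\mathbf{V}$ and $\mathbf{U}$ can be found such that

$$\mathbf{W} = \mathbf{V}\mathbf{U}^{T} = \sum_{k=1}^{K} \mathbf{v}_k \mathbf{u}_k^{T}, \tag{11}$$

where $\mathbf{V} = (\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_K) \in \mathbb{R}^{I_1 \times K}$ and $\mathbf{U} = (\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_K) \in \mathbb{R}^{I_2 \times K}$. Substituting (11) into (10) yields optimization problem (12) in $\mathbf{V}$, $\mathbf{U}$, and $b$.

The fixed point algorithm is applied to solve optimization problem (12). When $\mathbf{U} = (\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_K)$ is fixed, we need to compute $\mathbf{v}_k$, $k = 1, 2, \ldots, K$, and $b$. First, with the vectors $\mathbf{f}_i$ defined in (13), optimization problem (12) is equivalent to problem (14) in the variables $\mathbf{v}$, $b$, and $\boldsymbol{\xi}$. With the definitions in (15), problem (14) is reformulated as problem (16). The Lagrange function of problem (16) is expressed in (17), where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_m)^{T}$ is the Lagrangian multiplier vector. The KKT system of (17) consists of (18), (19), and (20). Rewriting (18), (19), and (20) in matrix form, we obtain linear system (21), whose coefficients are given in (22). $b$ and $\boldsymbol{\alpha}$ are obtained by solving linear system (21). Then $\mathbf{v}$ is obtained according to (18), and the left projection matrix $\mathbf{V}$ is recovered through the relations among $\mathbf{v}$, (13), and (15). In summary, when we fix $\mathbf{u}_k$, $k = 1, 2, \ldots, K$, the solution of optimization problem (12) can be computed by solving linear system (21) directly.

Similarly, when $\mathbf{V} = (\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_K)$ is fixed, we can rewrite optimization problem (12) and derive its optimal $\mathbf{U}$ and $b$ by solving another linear system. That is to say, we need to compute $\mathbf{u}_k$, $k = 1, 2, \ldots, K$, and $b$. First, with the vectors $\hat{\mathbf{g}}_i$ defined in (23), optimization problem (12) is equivalent to problem (24) in the variables $\hat{\mathbf{u}}$, $b$, and $\boldsymbol{\xi}$. With the definitions in (25), problem (24) is reformulated as problem (26). The Lagrange function of problem (26) is expressed in (27), where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_m)^{T}$ is the Lagrangian multiplier vector. The KKT system of (27) consists of (28), (29), and (30). Rewriting (28), (29), and (30) in matrix form, we obtain linear system (31), whose coefficients are given in (32).

Repeating the two updates until convergence, the weight matrix $\mathbf{W}$ is obtained by (11), and the predictor is $f(\mathbf{X}) = \langle \mathbf{V}\mathbf{U}^{T}, \mathbf{X} \rangle + b$. The procedure is summarized in the following algorithm.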
Algorithm 1 (fixed point algorithm for LS-SMRM).
Output. Left and right projection matrices V, U, and bias b.
(1) Update V and b.
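The alternating scheme described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the LS-SVM-style objective $\frac{1}{2}\|\mathbf{V}\mathbf{U}^{T}\|_F^2 + \frac{C}{2}\sum_i \xi_i^2$ with $y_i = \langle \mathbf{V}\mathbf{U}^{T}, \mathbf{X}_i \rangle + b + \xi_i$, random initialization of $\mathbf{U}$, full column rank of the fixed factor, and a fixed number of iterations. The names `half_step`, `fit_ls_smrm`, `objective`, and `predict` are hypothetical.

```python
import numpy as np

def half_step(feats, y, gram, C):
    """Fix one projection matrix and solve for the other one and the bias.

    feats[i] is X_i @ U (or X_i.T @ V) and gram is U.T @ U (or V.T @ V).
    Solves the linear system [[0, 1^T], [1, Omega + I/C]] [b; alpha] = [0; y].
    """
    m = len(y)
    feats_g = feats @ np.linalg.inv(gram)             # F_i G^{-1}
    omega = np.einsum("iak,jak->ij", feats, feats_g)  # Omega_ij = <F_i, F_j G^{-1}>
    system = np.zeros((m + 1, m + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = omega + np.eye(m) / C
    sol = np.linalg.solve(system, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    factor = np.einsum("i,iak->ak", alpha, feats_g)   # sum_i alpha_i F_i G^{-1}
    return factor, b

def objective(Xs, y, V, U, b, C):
    """Assumed objective: 0.5*||V U^T||_F^2 + 0.5*C*sum of squared residuals."""
    W = V @ U.T
    resid = y - (np.einsum("ab,iab->i", W, Xs) + b)
    return 0.5 * np.sum(W * W) + 0.5 * C * np.sum(resid ** 2)

def fit_ls_smrm(Xs, y, K=2, C=1.0, n_iter=20, seed=0):
    """Alternate between the V-update (U fixed) and the U-update (V fixed)."""
    Xs = np.asarray(Xs, dtype=float)                  # shape (m, I1, I2)
    y = np.asarray(y, dtype=float)
    Xt = np.transpose(Xs, (0, 2, 1))                  # transposed inputs for the U-update
    U = np.random.default_rng(seed).standard_normal((Xs.shape[2], K))
    history = []
    for _ in range(n_iter):
        V, b = half_step(Xs @ U, y, U.T @ U, C)       # update V and b
        U, b = half_step(Xt @ V, y, V.T @ V, C)       # update U and b
        history.append(objective(Xs, y, V, U, b, C))
    return V, U, b, history

def predict(X, V, U, b):
    """Predictor f(X) = <V U^T, X> + b."""
    return float(np.sum((V @ U.T) * X) + b)
```

Because each half step minimizes the assumed objective exactly over one block of variables, the recorded objective values should not increase from one iteration to the next, which gives a simple way to monitor convergence.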

LS-STRM Based on Submatrix of the Tensor (LS-STRM-SMT)
In this section, we propose the least square support tensor regression machine, abbreviated as LS-STRM, for the regression problem with tensor input. With the introduction of the submatrix of the tensor, the LS-STRM problem is turned into $I_3$ independent LS-SMRM problems. However, the right projection matrices should be made equal to fit the practical situation. That is to say, we need to solve $I_3$ LS-SMRM problems with the same right projection matrix. To show the effectiveness of the proposed algorithm, we provide a detailed analysis in this section.

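Formulation. Given a training set

$$\{(\mathcal{X}_i, y_i)\}_{i=1}^{m}, \tag{35}$$

where $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is the input and $y_i \in \mathbb{R}$ is the output, $i = 1, 2, \ldots, m$, our task is to find a predictor

$$f(\mathcal{X}) = \langle \mathcal{W}, \mathcal{X} \rangle + b, \tag{36}$$

where $\mathcal{W}$ denotes the weight tensor and $b$ is the bias. For a new input tensor, we can predict its output through predictor (36). For convenience, we set $b = 0$.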
In particular, when $N = 3$, the submatrices of a third-order tensor are the frontal slices.
However, we do not construct the model for training set (35) directly. We transform training set (35) into $I_3$ training sets similar to training set (8) by introducing the submatrix of the tensor and then construct $I_3$ regression problems. That is, for training set (35), every input tensor $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, $i = 1, 2, \ldots, m$, can be regarded as an abstract vector

$$\mathcal{X}_i = \big(\mathbf{X}_i^{(1)}, \mathbf{X}_i^{(2)}, \ldots, \mathbf{X}_i^{(I_3)}\big),$$

where $\mathbf{X}_i^{(j)}$, $j = 1, 2, \ldots, I_3$, is the $j$th frontal slice of the tensor $\mathcal{X}_i$. Next, we take the $j$th frontal slice of each tensor $\mathcal{X}_i$, $i = 1, 2, \ldots, m$, and construct the following training sets:

$$\{(\mathbf{X}_i^{(j)}, y_i^{(j)})\}_{i=1}^{m}, \quad j = 1, 2, \ldots, I_3, \tag{39}$$

where $y_i^{(j)} \in \mathbb{R}$ denotes the $j$th element of $\mathbf{y}_i$, and $\mathbf{y}_i = (y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(I_3)})^{T}$ is a vector that obeys a normal distribution with mean $y_i$ and variance $\sigma^2$.
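The construction of the slice-wise training sets can be sketched as follows. The function name `slice_training_sets` and the parameters `sigma` and `seed` are hypothetical; the sketch assumes that the samples are stored in one array of shape `(m, I1, I2, I3)` and that the slice-wise outputs are drawn independently around each $y_i$, which is one reading of the distributional statement above.

```python
import numpy as np

def slice_training_sets(tensors, targets, sigma=0.0, seed=0):
    """Split m third-order samples into I3 matrix training sets.

    tensors: array of shape (m, I1, I2, I3); targets: array of shape (m,).
    Returns a list of I3 pairs (Xs_j, y_j), where Xs_j has shape (m, I1, I2)
    and y_j[i] is drawn from a normal distribution with mean targets[i] and
    standard deviation sigma (sigma=0 simply copies the target).
    """
    rng = np.random.default_rng(seed)
    tensors = np.asarray(tensors, dtype=float)
    targets = np.asarray(targets, dtype=float)
    m, I3 = tensors.shape[0], tensors.shape[3]
    y_slices = targets[:, None] + sigma * rng.standard_normal((m, I3))
    # The j-th training set gathers the j-th frontal slice of every sample.
    return [(tensors[:, :, :, j], y_slices[:, j]) for j in range(I3)]
```

Each returned pair has the same form as training set (8), so any solver for the matrix problem can be applied to it unchanged.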
According to training sets (39), $I_3$ optimization problems (40) are constructed, in which $\mathbf{W}_j = \mathcal{W}(:, :, j) = \mathbf{V}_j \mathbf{U}_j^{T}$ and $\mathbf{X}_i^{(j)} = \mathcal{X}_i(:, :, j)$, $j = 1, 2, \ldots, I_3$, $i = 1, 2, \ldots, m$. So we can get $\mathcal{W}_{(1)} = (\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_{I_3})$, and then the weight tensor is obtained from $\mathcal{W}_{(1)}$ by the mapping in (41). It is clear that the $I_3$ models are independent, but this is contrary to the actual situation. In order to reflect the relation among them, we set

$$\mathbf{U}_1 = \mathbf{U}_2 = \cdots = \mathbf{U}_{I_3} = \mathbf{U}, \tag{42}$$

so that the models fit the practical situation better. The $I_3$ models can then be expressed together as optimization problem (43), which is the problem we need to solve.

Dual Fixed Point Algorithm (DFPA) for LS-STRM-SMT.
Fixing $\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_{j-1}, \mathbf{V}_{j+1}, \ldots, \mathbf{V}_{I_3}$, $\mathbf{U}$ and optimizing $\mathbf{V}_j$, optimization problem (43) can be reformulated as problem (44). That is to say, instead of solving optimization problem (43) directly, we solve a series of problems of the form (44), which are in fact LS-SMRMs.
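A simplified alternating scheme in the spirit of the DFPA can be sketched as follows. It is an illustration under stated assumptions and not necessarily the authors' exact algorithm: the joint objective is assumed to be $\sum_j \big[\frac{1}{2}\|\mathbf{V}_j\mathbf{U}^{T}\|_F^2 + \frac{C}{2}\sum_i (y_i^{(j)} - \langle \mathbf{V}_j\mathbf{U}^{T}, \mathbf{X}_i^{(j)} \rangle)^2\big]$ with $b = 0$, each $\mathbf{V}_j$ is updated in closed form with $\mathbf{U}$ fixed, and the shared $\mathbf{U}$ is then updated from all slices together. The names `solve_factor`, `joint_objective`, and `fit_dfpa` are hypothetical.

```python
import numpy as np

def solve_factor(feats, y, gram, C):
    """Minimise 0.5*tr(Q G Q^T) + 0.5*C*sum_i (y_i - <Q, F_i>)^2 over Q (bias 0)."""
    feats_g = feats @ np.linalg.inv(gram)             # F_i G^{-1}
    omega = np.einsum("iak,jak->ij", feats, feats_g)  # Omega_ij = <F_i, F_j G^{-1}>
    alpha = np.linalg.solve(omega + np.eye(len(y)) / C, y)
    return np.einsum("i,iak->ak", alpha, feats_g)

def joint_objective(X, Y, Vs, U, C):
    """Assumed joint objective over all frontal slices."""
    W = np.stack([V @ U.T for V in Vs], axis=2)       # weight tensor, I1 x I2 x I3
    resid = Y - np.einsum("abj,iabj->ij", W, X)
    return 0.5 * np.sum(W ** 2) + 0.5 * C * np.sum(resid ** 2)

def fit_dfpa(X, Y, K=2, C=1.0, n_iter=20, seed=0):
    """X: (m, I1, I2, I3) inputs; Y: (m, I3) slice-wise outputs."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    I2, I3 = X.shape[2], X.shape[3]
    U = np.random.default_rng(seed).standard_normal((I2, K))
    history = []
    for _ in range(n_iter):
        gram_u = U.T @ U
        # One matrix problem per frontal slice, all sharing the same U.
        Vs = [solve_factor(X[..., j] @ U, Y[:, j], gram_u, C) for j in range(I3)]
        # Shared right projection matrix, updated from all slices at once.
        feats = np.concatenate(
            [np.transpose(X[..., j], (0, 2, 1)) @ Vs[j] for j in range(I3)])
        gram_v = sum(V.T @ V for V in Vs)
        U = solve_factor(feats, Y.T.reshape(-1), gram_v, C)
        history.append(joint_objective(X, Y, Vs, U, C))
    W = np.stack([V @ U.T for V in Vs], axis=2)
    return W, Vs, U, history
```

The sketch replaces the inner fixed point iteration by a closed-form update because, with $\mathbf{U}$ fixed and $b = 0$, the assumed objective separates over the slices; this keeps the example short while retaining the shared right projection matrix.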

Extension.
For a more general tensor, that is, $N > 3$, we can also take advantage of the submatrix of the tensor to turn the tensor learning problem for regression into $I_N$ LS-SMRM problems that can be solved by Algorithm 1. The details are as follows. Given a training set (49), where $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ ($N > 3$) is the input and $y_i \in \mathbb{R}$ is the output, $i = 1, 2, \ldots, m$, our task is to find a predictor (50), where $\mathcal{W} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ ($N > 3$) denotes the weight tensor and $b$ is the bias. For a new input tensor, we can predict its output through predictor (50). The new training sets (51) are constructed through (1), where $\mathrm{mat}_1(\mathcal{X}_i(:, :, \ldots, :, j)) \in \mathbb{R}^{I_1 \times (I_2 I_3 \cdots I_{N-1})}$ denotes the $j$th submatrix of the $i$th sample and $y_i^{(j)}$ denotes the $j$th element of $\mathbf{y}_i$, which is an $I_N$-dimensional vector that obeys a normal distribution with mean $y_i$ and variance $\sigma^2$. Then we can get an $I_N$-dimensional abstract vector whose elements are matrices of the size $I_1 \times (I_2 I_3 \cdots I_{N-1})$. According to mapping function (2), these matrices can be reformulated as an equal number of $(N-1)$-order tensors. The abstract vector with $(N-1)$-order tensor elements is then the $N$-order tensor that we need.
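For an $N$-order sample, the extraction of the $I_N$ submatrices and the inverse mapping can be sketched as follows. The function names `submatrices` and `from_submatrices` are hypothetical, and the column ordering of the unfolding (earlier index varying fastest) is an assumption.

```python
import numpy as np

def submatrices(tensor):
    """Return the I_N matrices mat_1(X(:, ..., :, j)), j = 1, ..., I_N.

    Each matrix has shape I_1 x (I_2 * ... * I_{N-1}).
    """
    I1, IN = tensor.shape[0], tensor.shape[-1]
    return [np.reshape(tensor[..., j], (I1, -1), order="F") for j in range(IN)]

def from_submatrices(mats, shape):
    """Fold each matrix back into an (N-1)-order tensor and stack the results
    along the last axis to recover the N-order tensor of the given shape."""
    subs = [np.reshape(M, shape[:-1], order="F") for M in mats]
    return np.stack(subs, axis=-1)

# Example: a fourth-order sample gives I_4 = 5 submatrices of size 2 x 12.
sample = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
mats = submatrices(sample)
restored = from_submatrices(mats, sample.shape)
```

The same two functions can be applied to the weight side: the $I_N$ weight matrices returned by the matrix solver are folded back and stacked to give the $N$-order weight tensor.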

Numerical Experiment
In the following numerical experiments, we use four groups of vector data from the UCI database and one artificial tensor data set to evaluate our algorithm. The UCI data sets are Slump, Ticdata2000, ConcreteData, and BlogData. The artificial data are generated by the function "rand" in Matlab. We reformulate the vector data into matrices or tensors by rearranging the order of the vector's elements. The detailed statistical characteristics are listed in Table 1.
Since our method is solved by a fixed point algorithm, or in other words in an alternating way, the final solution depends closely on the initialization. We introduce two different initialization strategies. The first initialization is random: we generate random elements to form matrices of the corresponding sizes. In the following experiments, this kind of initialization is used unless otherwise specified. The second initialization is empirical: we set $\mathbf{V}_j = [\mathbf{I}_{K \times K}, \mathbf{0}_{K \times (I_1 - K)}]^{T}$, $j = 1, 2, \ldots, I_3$. When solving the LS-STRM-SMT, the penalty parameter $C$ is selected from the set $\{2^{-8}, 2^{-7}, \ldots, 2^{7}, 2^{8}\}$. Another important parameter that may influence the result is $K$. The experiments show that the mean square error (MSE), which we use to measure the quality of the solutions to the LS-STRM-SMT problems, decreases as $K$ grows. That is, the larger $K$ is, the smaller the MSE is. However, this effect is not very pronounced, and the computational complexity rises quickly as $K$ increases. We give the MSE for different $K$ on two data sets, and in the other experiments we simply set $K = 2$. The parameter $C$ is obtained by fivefold cross validation.

In this section, we solve tensor learning regression problems for third-order data with our LS-STRM-SMT algorithm and compare it with some other regression methods. Three questions are discussed. First, how does the MSE for different data change as $K$ increases? Second, how does our algorithm compare with support vector regression (SVR) [16], least square support vector regression (LSSVR) [17], regularized matrix regression (RMR) [18], and lasso regression [19,20]? Third, does our algorithm converge?
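The parameter selection described above can be sketched as follows. The penalty grid, the empirical initialization, and the fivefold split follow the text; the names `PENALTY_GRID`, `empirical_init`, `mse`, and `select_penalty` are hypothetical, and the ridge regressor is only a stand-in model used to keep the example self-contained, not the method of this paper.

```python
import numpy as np

# Candidate penalty parameters 2^-8, 2^-7, ..., 2^8.
PENALTY_GRID = [2.0 ** p for p in range(-8, 9)]

def empirical_init(I1, K):
    """Empirical initialization [I_K, 0]^T of a left projection matrix (I1 x K)."""
    return np.vstack([np.eye(K), np.zeros((I1 - K, K))])

def mse(y_true, y_pred):
    """Mean square error between two vectors."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def select_penalty(fit, predict, X, y, grid=PENALTY_GRID, n_folds=5, seed=0):
    """Pick the penalty with the smallest cross-validated MSE.

    fit(X_train, y_train, C) returns a model; predict(model, X_test) returns
    predictions. Both are supplied by the caller.
    """
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = {}
    for C in grid:
        errs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = fit(X[train], y[train], C)
            errs.append(mse(y[test], predict(model, X[test])))
        scores[C] = float(np.mean(errs))
    return min(scores, key=scores.get), scores

# Stand-in model (an assumption for illustration): ridge regression on vectors,
# minimising 0.5*||w||^2 + 0.5*C*||y - X w||^2.
def ridge_fit(X, y, C):
    return np.linalg.solve(np.eye(X.shape[1]) / C + X.T @ X, X.T @ y)

def ridge_predict(w, X):
    return X @ w
```

A tensor model can be plugged in by passing its own `fit` and `predict` callables, for example wrappers around a solver for the matrix or tensor problem.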
The information on the data used is listed in Table 1. Tables 2 and 3 show the influence of different $K$ for artificial data and Ticdata2000, respectively, and the comparison of our algorithm with the others is in Table 4. From Tables 2 and 3, we see that, as $K$ increases, the MSE tends to decrease gradually. But the decrease is not very pronounced, and the largest $K$ does not always give the best performance. In Figure 2, (a) and (b) show the relation between $K$ and MSE more intuitively. From (a) and (b) in Figure 3, which show the average MSE of different $K$ for artificial data and Ticdata2000, respectively, we conclude that when $K$ becomes large, the change in MSE is small, so we can set $K = 2$ to reduce the computational complexity.
In Table 4, the numbers in parentheses are the variances, while the numbers outside the parentheses are the MSEs of the different algorithms on the different data sets. Since a vector sample can be reformulated as matrices (tensors) of different sizes, numbers in italic font are averages over the different data sizes. Bold numbers denote the best performance. In Table 4, our LS-STRM-SMT has the best performance on the data sets considered, which is also visible in Figure 4. More details are included in Tables 5, 6, and 7.
In order to show the convergence behavior, we compute the objective value of our method for artificial data and Ticdata2000. The objective function values over 20 iterations are shown in Figure 1. There are two main observations. First, the objective values of our algorithm are extremely small, which suggests that our algorithm performs well to some degree. Second, as the number of iterations increases, the objective value tends to a fixed number, which indicates the convergence of our algorithm.
In Figure 1, the abscissa and the ordinate represent the iterations and the objective value, respectively, and the curves with different colors and marks in the same figure show the relationship between objective value and iterations for the same data at different data sizes.
In Figure 2, the abscissa and the ordinate represent the parameter $K$ and the MSE, respectively, and the line graphs with different colors and marks in the same figure show the relationship between the parameter $K$ and the MSE for the same data at different data sizes.
Figure 3 shows the relationship between the average MSE and $K$ for the artificial data and Ticdata2000, respectively.
Figure 4 is plotted based on Tables 2, 3, and 4. In (a), the solid line stands for the performance of our algorithm, while the dashed line stands for the best performance of the other algorithms. In (b), the blue bars represent the performance of our algorithm, while the brown bars represent the best performance of the other algorithms. Figure 4 shows that our method performs better.

Conclusion
In this paper, we propose a novel method for tensor learning regression problems, called least square support tensor regression machine based on submatrix of the tensor (LS-STRM-SMT), which is inspired by the idea of multiple rank multilinear SVM for matrix data classification (MRMLSVM). We also give a dual fixed point algorithm (DFPA) to solve the problems. However, LS-STRM-SMT is not a straightforward extension of MRMLSVM. On one hand, MRMLSVM is a method for matrix data classification, while LS-STRM-SMT is used to solve tensor data regression problems; on the other hand, LS-STRM-SMT and MRMLSVM are connected by the introduction of the submatrix of the tensor, which reformulates the LS-STRM-SMT problem as a series of LS-SMRM problems that can be solved in a way similar to MRMLSVM. Moreover, LS-STRM-SMT differs from traditional approaches that destroy the structural information of the tensor; it preserves the structural information of the tensor to some extent, which may be the reason why our method performs better than some other proposed algorithms. Besides, a small parameter $K$ decreases the number of unknowns, which helps to avoid overfitting to some degree. In addition, from a practical point of view, the preliminary numerical experiments also show that LS-STRM-SMT performs better.
Figure 1: Convergence of LS-STRM-SMT for artificial data and for Ticdata2000.

Figure 2: MSE of different K for artificial data and Ticdata2000.

Table 1: Characteristics of the different data sets.

Table 2: MSE of different K for Ticdata2000.

Table 3: MSE of different K for artificial data.

Table 4: MSE of different algorithms for different data sets.

Table 5: RMR for Slump and ConcreteData of different data size, respectively.

Table 6: RMR for artificial data, Ticdata2000, and BlogData of different data size, respectively.

Table 7: LS-STRM-SMT for BlogData of different data size.