Gradient-Based Iterative Solutions for Sylvester Tensor Equations

This paper is concerned with the numerical solution of the Sylvester tensor equation, which includes the Sylvester matrix equation as a special case. By applying the hierarchical identification principle proposed by Ding and Chen (2005) and by using tensor arithmetic concepts, an iterative algorithm and a modification of it are established to solve the Sylvester tensor equation. Convergence analysis indicates that the iterative solutions always converge to the exact solution for an arbitrary initial value. Finally, some examples are provided to show that the proposed algorithms are effective.


Introduction
This paper is concerned with the solution of the following equation:

X ×₁ A + X ×₂ B + X ×₃ C = D, (1)

where the matrices A ∈ R^{n×n}, B ∈ R^{p×p}, and C ∈ R^{q×q}, and the tensors X, D ∈ R^{n×p×q}. The properties of the operators ×₁, ×₂, and ×₃ are described in detail in Section 2. When X is a 2-mode tensor, that is, a matrix, (1) reduces to

AX + XBᵀ = D. (2)

This is just a Sylvester matrix equation, which has received much attention, often motivated by applications in control theory, model reduction, signal processing, filtering, and system identification [1][2][3][4][5][6][7]. So we call (1) a Sylvester tensor equation in this paper. The simplest direct approach for solving the matrix equation (2) is the Kronecker product method [8], which needs a huge amount of memory; for example, if n = p = 100, the dimension of the new coefficient matrix is 10000. An alternative class of direct methods first transforms the coefficient matrices into special forms, such as the Schur canonical form [9] and the Hessenberg-Schur form [1], see also [10][11][12], and then solves another matrix equation which may be readily computed. Such methods have been widely adopted; for example, a 3D Schur decomposition was developed to solve the Sylvester tensor equation (1) (for more details see [13]). However, these methods require transformations of the coefficient matrices, which incur considerable cost.
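To make the memory issue of the Kronecker product method concrete, the following NumPy sketch (illustrative only; the paper's experiments use MATLAB) solves the matrix equation (2) by forming the np × np Kronecker system, using the standard column-major identity vec(AX + XBᵀ) = (I_p ⊗ A + B ⊗ I_n) vec(X):

```python
# Illustrative sketch: the Kronecker product method for A X + X B^T = D.
# The assembled system is (np) x (np), which is why memory grows so fast.
import numpy as np

n, p = 4, 3
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
B = rng.standard_normal((p, p))
X_true = rng.standard_normal((n, p))
D = A @ X_true + X_true @ B.T

# vec(A X + X B^T) = (I_p kron A + B kron I_n) vec(X), column-major vec
K = np.kron(np.eye(p), A) + np.kron(B, np.eye(n))
x = np.linalg.solve(K, D.flatten(order='F'))
X = x.reshape((n, p), order='F')
assert np.allclose(X, X_true)
```

For n = p = 100 the matrix K above would already be 10000 × 10000, which illustrates why iterative methods that avoid forming K are attractive.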
Iterative algorithms for solving matrix equations have been studied extensively; see, for example, [14][15][16][17][18][19][20][21]. In particular, Ding and Chen proposed several simple iterative methods for matrix equations in [22][23][24]. These methods, resembling the classical Jacobi and Gauss-Seidel iterations for linear systems, are easy to implement and cost little per step. Gradient-based iterative algorithms and least-squares-based iterative algorithms have also been proposed in [25][26][27] for (coupled) matrix equations, which include the Sylvester matrix equations and Lyapunov matrix equations as special cases. However, to the best of the authors' knowledge, hardly any iterative method for the Sylvester tensor equation has appeared in the literature so far.
In this paper, we first extend the gradient-based iterative algorithm for the Sylvester matrix equation (2) to the Sylvester tensor equation (1), based on previous work in [22]. The basic idea of the algorithm is to regard the unknown tensor as a system parameter tensor by applying the hierarchical identification principle. Then, we present a more efficient modified iterative method. The convergence properties of the two algorithms are proved. Numerical examples are given to show that the two iterations are effective.
The rest of this paper is organized as follows. In Section 2, we describe the tensor notations and common operations used throughout the paper. An extension and a modification of the gradient-based iterative algorithm to the Sylvester tensor equation (1) are presented in Section 3. The convergence of the algorithms is analyzed in Section 4. Section 5 contains some numerical experiments to illustrate the theoretical results of this paper. Finally, some conclusions are outlined in Section 6.

Tensor Notations and Basic Operations
In this section, we briefly review some concepts and notations that are used throughout the paper. Matrices are denoted by capital letters, for example, A, B, and C. Tensors are denoted by Euler script letters, for example, X, Y, and Z. The n × n identity matrix is denoted by I_n. The order of a tensor is the number of dimensions, also known as ways or modes. As special cases, a vector is a 1-mode tensor and a matrix is a 2-mode tensor. The operator vec(⋅) stacks the columns of a matrix (or, more generally, the entries of a tensor in column-major order) to form a vector. ‖ ⋅ ‖ denotes the Frobenius norm unless otherwise stated.

Tensor-Matrix Multiplication
An important operation for a tensor is the tensor-matrix multiplication [28][29][30]. The 1-mode product of a tensor X ∈ R^{n×p×q} by a matrix U ∈ R^{J×n}, denoted by X ×₁ U, is a J × p × q tensor whose entries are given by

(X ×₁ U)(j, i₂, i₃) = Σ_{i₁=1}^{n} X(i₁, i₂, i₃) U(j, i₁).

The 2-mode multiplication of a tensor by a matrix V ∈ R^{J×p} is defined similarly; 3-mode multiplication is analogous and is omitted here. It follows immediately from the definition that m-mode and n-mode multiplication commute if m ≠ n:

(X ×_m U) ×_n V = (X ×_n V) ×_m U.

If the modes are the same, then

(X ×_n U) ×_n V = X ×_n (VU).

Inner Product and Tensor Norm
The inner product of two tensors X, Y ∈ R^{n×p×q} is defined by

⟨X, Y⟩ = Σ_{i₁,i₂,i₃} X(i₁, i₂, i₃) Y(i₁, i₂, i₃),

and the norm induced by this inner product is ‖X‖ = ⟨X, X⟩^{1/2}. For two tensors X and Y, the computation of ‖X − Y‖ can be simplified as

‖X − Y‖² = ‖X‖² − 2⟨X, Y⟩ + ‖Y‖².

Moreover, n-mode multiplication commutes with respect to the inner product via the transpose, that is,

⟨X ×_n U, Y⟩ = ⟨X, Y ×_n Uᵀ⟩.

Finally, the tensor norm of n-mode multiplication is mutually consistent, that is,

‖X ×_n U‖ ≤ ‖U‖ ‖X‖.
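These operations can be checked numerically. The sketch below (an illustrative NumPy implementation via `einsum`, not the paper's MATLAB Tensor Toolbox code) verifies the commutativity of mode products over distinct modes and the inner-product identity:

```python
# NumPy sketch of the n-mode tensor-matrix product and two of its properties.
import numpy as np

def mode_mult(X, A, mode):
    """Multiply a 3-mode tensor X by a matrix A along the given mode (1, 2, 3)."""
    subs = {1: 'ji,ibc->jbc', 2: 'ji,aic->ajc', 3: 'ji,abi->abj'}
    return np.einsum(subs[mode], A, X)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
Y = rng.standard_normal((3, 4, 5))
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))

# m-mode and n-mode multiplication commute when m != n
lhs = mode_mult(mode_mult(X, A, 1), B, 2)
rhs = mode_mult(mode_mult(X, B, 2), A, 1)
assert np.allclose(lhs, rhs)

# <X x_n A, Y> = <X, Y x_n A^T>
inner = lambda U, V: np.sum(U * V)
assert np.isclose(inner(mode_mult(X, A, 1), Y), inner(X, mode_mult(A := A, Y, 1).T if False else mode_mult(Y, A.T, 1)))
```

The `einsum` subscripts simply contract the chosen mode of the tensor against the second index of the matrix, which is exactly the definition above.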

Iterative Algorithms
In this section, we apply the tensor arithmetic concepts and the hierarchical identification principle to solve the Sylvester tensor equation (1). Firstly, we consider the existence and uniqueness of the solution of (1). By using the Kronecker product, we have

(I_q ⊗ I_p ⊗ A + I_q ⊗ B ⊗ I_n + C ⊗ I_p ⊗ I_n) vec(X) = vec(D),

which is equivalent to (1). This vectorized representation immediately leads to the following result.

Theorem 1. Equation (1) has a unique solution if and only if λ_i(A) + λ_j(B) + λ_k(C) ≠ 0 for all i, j, and k, where λ_i(A), λ_j(B), and λ_k(C) denote the eigenvalues of A, B, and C, respectively.
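The vectorized form and the eigenvalue condition can be illustrated with a small NumPy sketch (illustrative, with column-major vec so the first tensor index varies fastest; the assembled Kronecker matrix is denoted K here):

```python
# Sketch: vectorizing X x1 A + X x2 B + X x3 C = D and checking that the
# eigenvalues of the Kronecker coefficient matrix are the sums
# lam_i(A) + lam_j(B) + lam_k(C).
import numpy as np

n, p, q = 3, 4, 2
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n))
B = rng.standard_normal((p, p))
C = rng.standard_normal((q, q))
X_true = rng.standard_normal((n, p, q))

D = (np.einsum('ai,ijk->ajk', A, X_true)
     + np.einsum('bj,ijk->ibk', B, X_true)
     + np.einsum('ck,ijk->ijc', C, X_true))

In_, Ip, Iq = np.eye(n), np.eye(p), np.eye(q)
K = (np.kron(Iq, np.kron(Ip, A))
     + np.kron(Iq, np.kron(B, In_))
     + np.kron(C, np.kron(Ip, In_)))

x = np.linalg.solve(K, D.flatten(order='F'))
X = x.reshape((n, p, q), order='F')
assert np.allclose(X, X_true)

# eigenvalues of K are the sums of eigenvalues of A, B, and C
sums = (np.linalg.eigvals(A)[:, None, None]
        + np.linalg.eigvals(B)[None, :, None]
        + np.linalg.eigvals(C)[None, None, :])
assert np.allclose(np.sort(np.linalg.eigvals(K).real),
                   np.sort(sums.real.ravel()))
```

The last assertion is the numerical counterpart of Theorem-style unique solvability: K is nonsingular exactly when none of the eigenvalue sums vanishes.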
In the case of the Sylvester matrix equation (2), this corresponds to the well-known condition λ_i(A) + λ_j(B) ≠ 0 for any i and j.
In order to derive the iterative algorithms for solving (1), we need to introduce residual tensors for the subsystems below. According to the hierarchical identification principle [22], (1) is decomposed into three subsystems:

X ×₁ A = M₁, X ×₂ B = M₂, X ×₃ C = M₃,

where

M₁ := D − X ×₂ B − X ×₃ C,
M₂ := D − X ×₁ A − X ×₃ C,
M₃ := D − X ×₁ A − X ×₂ B.

Consider the following least squares problems:

J₁(X) = ½‖X ×₁ A − M₁‖², J₂(X) = ½‖X ×₂ B − M₂‖², J₃(X) = ½‖X ×₃ C − M₃‖².

Note that the derivatives of J_i (i = 1, 2, 3) with respect to X are derived in [30] and are given by

∂J₁/∂X = (X ×₁ A − M₁) ×₁ Aᵀ,
∂J₂/∂X = (X ×₂ B − M₂) ×₂ Bᵀ,
∂J₃/∂X = (X ×₃ C − M₃) ×₃ Cᵀ.

Minimizing J_i(X) (i = 1, 2, 3) by a gradient step leads to the following recursive equations:

X_k^{(1)} = X_{k−1} + μ (M₁ − X_{k−1} ×₁ A) ×₁ Aᵀ,
X_k^{(2)} = X_{k−1} + μ (M₂ − X_{k−1} ×₂ B) ×₂ Bᵀ,
X_k^{(3)} = X_{k−1} + μ (M₃ − X_{k−1} ×₃ C) ×₃ Cᵀ,

where μ is the iterative step length and is given in the following section. Substituting the expressions for M₁, M₂, and M₃ into these recursions and replacing the unknown variable X in them with its estimate at time (k − 1), we have

X_k^{(1)} = X_{k−1} + μ E_{k−1} ×₁ Aᵀ,
X_k^{(2)} = X_{k−1} + μ E_{k−1} ×₂ Bᵀ,
X_k^{(3)} = X_{k−1} + μ E_{k−1} ×₃ Cᵀ,

where E_{k−1} := D − X_{k−1} ×₁ A − X_{k−1} ×₂ B − X_{k−1} ×₃ C. Taking the average of X_k^{(1)}, X_k^{(2)}, and X_k^{(3)}, we obtain Algorithm 1.
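The resulting gradient-based iteration can be sketched as follows. This is an illustrative NumPy implementation (the paper's experiments use the MATLAB Tensor Toolbox); the step length used here is a conservative assumed choice, not necessarily the paper's exact formula, and symmetric positive definite test matrices are used so that convergence is guaranteed:

```python
# Sketch of the gradient-based iteration (GI): averaged subsystem updates.
import numpy as np

def mode_mult(X, A, mode):
    """Multiply a 3-mode tensor X by a matrix A along the given mode."""
    subs = {1: 'ji,ibc->jbc', 2: 'ji,aic->ajc', 3: 'ji,abi->abj'}
    return np.einsum(subs[mode], A, X)

def gi(A, B, C, D, mu, iters=1000):
    """Averaged gradient iteration for X x1 A + X x2 B + X x3 C = D."""
    X = np.zeros_like(D)
    for _ in range(iters):
        R = (D - mode_mult(X, A, 1) - mode_mult(X, B, 2)
               - mode_mult(X, C, 3))
        # average of the three subsystem updates X + mu * R x_i (.)^T
        X = X + (mu / 3.0) * (mode_mult(R, A.T, 1)
                              + mode_mult(R, B.T, 2)
                              + mode_mult(R, C.T, 3))
    return X

rng = np.random.default_rng(3)
n, p, q = 3, 4, 2
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); B = B @ B.T + p * np.eye(p)
C = rng.standard_normal((q, q)); C = C @ C.T + q * np.eye(q)
X_true = rng.standard_normal((n, p, q))
D = (mode_mult(X_true, A, 1) + mode_mult(X_true, B, 2)
     + mode_mult(X_true, C, 3))

# conservative step length (assumed here, not the paper's exact value)
mu = 1.0 / (np.linalg.norm(A, 2) ** 2 + np.linalg.norm(B, 2) ** 2
            + np.linalg.norm(C, 2) ** 2)
X_gi = gi(A, B, C, D, mu)
assert np.linalg.norm(X_gi - X_true) / np.linalg.norm(X_true) < 1e-6
```

Each iteration costs only a few mode products, so no Kronecker matrix is ever formed.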
Remark 2. Note that when we compute X_k^{(2)} in Algorithm 1, X_k^{(1)} has already been computed. Similarly, when we compute X_k^{(3)}, both X_k^{(1)} and X_k^{(2)} have been computed. Hence, we can take full advantage of the information in the X_k^{(i)} (i = 1, 2, 3) to update X_{k−1} and obtain a more efficient algorithm, given as Algorithm 2.
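One plausible reading of this remark is a Gauss-Seidel-style sweep in which each subsystem update uses the freshest available iterate. The following NumPy sketch implements that reading (our interpretation, not necessarily the authors' exact Algorithm 2), again with an assumed conservative step length and symmetric positive definite test matrices:

```python
# Sketch of a modified gradient iteration (MGI): sequential subsystem updates.
import numpy as np

def mode_mult(X, A, mode):
    """Multiply a 3-mode tensor X by a matrix A along the given mode."""
    subs = {1: 'ji,ibc->jbc', 2: 'ji,aic->ajc', 3: 'ji,abi->abj'}
    return np.einsum(subs[mode], A, X)

def mgi(A, B, C, D, mu, iters=500):
    """Gauss-Seidel-style sweep: each mode update sees the freshest iterate."""
    X = np.zeros_like(D)
    for _ in range(iters):
        for M, mode in ((A, 1), (B, 2), (C, 3)):
            R = (D - mode_mult(X, A, 1) - mode_mult(X, B, 2)
                   - mode_mult(X, C, 3))
            X = X + mu * mode_mult(R, M.T, mode)
    return X

rng = np.random.default_rng(3)
n, p, q = 3, 4, 2
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); B = B @ B.T + p * np.eye(p)
C = rng.standard_normal((q, q)); C = C @ C.T + q * np.eye(q)
X_true = rng.standard_normal((n, p, q))
D = (mode_mult(X_true, A, 1) + mode_mult(X_true, B, 2)
     + mode_mult(X_true, C, 3))

# conservative step length (assumed here, not the paper's exact value)
mu = 1.0 / (np.linalg.norm(A, 2) ** 2 + np.linalg.norm(B, 2) ** 2
            + np.linalg.norm(C, 2) ** 2)
X_mgi = mgi(A, B, C, D, mu)
assert np.linalg.norm(X_mgi - X_true) / np.linalg.norm(X_true) < 1e-6
```

Because the residual is recomputed after every mode update, the fresh information propagates within a sweep, which matches the faster convergence reported for MGI in the experiments.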

Convergence Analysis
In this section, we discuss the convergence properties of the algorithms in the previous section. The following theorem provides a sufficient condition to guarantee the convergence of Algorithm 1.
Based on Theorem 3, we have the following theorem.
Algorithm 1: The gradient-based iterative algorithm (GI).
Algorithm 2: The modified gradient-based iterative algorithm (MGI).
Proof. The proof is similar to that of Theorem 3 and is hence omitted.

Now we consider the convergence rate of Algorithm 1. From the recursions above, it is not difficult to obtain the following residual error equation in vectorized form:

vec(X̃_k) = (I − W/3) vec(X̃_{k−1}),

where X̃_k := X_k − X denotes the error and W := μKᵀK, with K the Kronecker coefficient matrix of the vectorized equation in Section 3. We observe that the closer the eigenvalues of W are to 3, the closer the eigenvalues of I − W/3 tend to zero, and hence the faster the error X̃_k converges to zero. So the convergence rate of Algorithm 1 depends on the condition number of the associated system [22]; that is, Algorithm 1 has a fast convergence rate for a small condition number. The analysis of Algorithm 2 is similar and is hence omitted here.
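This error recursion can be checked numerically. The sketch below (under the assumed notation above, with a conservative assumed step length) assembles the Kronecker coefficient matrix K, forms the iteration matrix I − (μ/3)KᵀK, and verifies that its spectral radius is below one:

```python
# Numerical check of the GI error recursion e_k = (I - (mu/3) K^T K) e_{k-1}.
import numpy as np

n, p, q = 3, 3, 3
rng = np.random.default_rng(4)
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); B = B @ B.T + p * np.eye(p)
C = rng.standard_normal((q, q)); C = C @ C.T + q * np.eye(q)

In_, Ip, Iq = np.eye(n), np.eye(p), np.eye(q)
K = (np.kron(Iq, np.kron(Ip, A))
     + np.kron(Iq, np.kron(B, In_))
     + np.kron(C, np.kron(Ip, In_)))

# conservative assumed step length, as in the earlier sketches
mu = 1.0 / (np.linalg.norm(A, 2) ** 2 + np.linalg.norm(B, 2) ** 2
            + np.linalg.norm(C, 2) ** 2)
T = np.eye(K.shape[0]) - (mu / 3.0) * K.T @ K
rho = max(abs(np.linalg.eigvals(T)))
print(f"spectral radius = {rho:.4f}, cond(K) = {np.linalg.cond(K):.2f}")
assert rho < 1.0
```

A smaller condition number of K pushes the eigenvalues of (μ/3)KᵀK closer together (and, for a well-chosen μ, closer to 1), shrinking the spectral radius and speeding up convergence, which is the qualitative statement above.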

Numerical Experiments
This section gives some examples to illustrate the performance of the proposed algorithms.All the implementations are based on the codes from the MATLAB Tensor Toolbox developed by Bader and Kolda [31,32].
Then the exact solution X is known explicitly.
Taking X₀ = 0 and step length μ = 2.8/(‖A‖₂² + ‖B‖₂² + ‖C‖₂²), and since the exact solution is known in this example, we plot the values ‖X_k − X‖/‖X‖ to indicate the convergence of the sequence {X_k}. These values are plotted on a logarithmic scale; if the curve is approximately a decreasing line, this indicates that the convergence of the sequence is at least linear. From Figure 1, we can see that the two algorithms converge linearly and that the MGI algorithm converges much faster than the GI algorithm.
Remark 6. The effect of changing the step lengths is illustrated in Figure 2. We observe that the larger the step length, the faster the convergence of the algorithm. However, if the step length is too large, the algorithm may diverge. How to choose the best step length is still an issue to be studied [22].
Example 7. This example comes from [22], except that C and D are set to a random matrix and tensor, respectively. The MATLAB codes which generate these matrices and compute the step lengths are as follows. Since the exact solution in this example is unknown, we plot the relative residual errors ‖D − X_k ×₁ A − X_k ×₂ B − X_k ×₃ C‖/‖D‖. From Figure 3, we can see that the convergence of the GI algorithm tends to slow down after some iteration steps. The MGI algorithm converges linearly, and much faster than the GI method.
Example 8. In this example, we consider the effect of the condition number of the coefficient matrix on the two algorithms. The stopping criterion is set to 10⁻⁶. The MATLAB codes which generate these matrices are as follows. For each parameter setting, the number of iterations and the corresponding condition number of the coefficient matrix are outlined in Table 1.
The results in Table 1 confirm the analysis at the end of Section 4. In particular, we see that the convergence rate becomes faster as the condition number decreases.

Conclusions
We have presented gradient-based iterative algorithms for solving the Sylvester tensor equation, which includes the Sylvester matrix equation as a special case. They are based on the hierarchical identification principle and tensor arithmetic concepts. The convergence analysis indicates that the iterative solutions given by the proposed algorithms converge to the exact solution for any initial value. This statement is also confirmed by the given numerical examples. In addition, the idea of this paper can easily be extended to study iterative solutions of higher-order Sylvester tensor equations, in which X is an N-mode tensor.

Figure 1: Comparison of convergence curves using GI and MGI algorithms.

Figure 2: The effect of changing the step length.

Figure 3: Comparison of convergence curves using GI and MGI algorithms.

Table 1: The effect of the condition number of the coefficient matrix on the two algorithms.