On the Degrees of Freedom of Mixed Matrix Regression

With the increasing prominence of big data in modern science, the data of interest are often complex and stochastic. To deal with mixed matrix and vector data, this paper focuses on the mixed matrix regression model. We establish the degrees of freedom of the underlying stochastic model, a key ingredient in constructing adaptive selection criteria for efficiently selecting the optimal model fit. Under some mild conditions, we prove that the degrees of freedom of the mixed matrix regression model are the sum of the degrees of freedom of the Lasso and of regularized matrix regression. Moreover, we establish the degrees of freedom of nuclear-norm regularized multivariate regression. Furthermore, we prove that the estimates of the degrees of freedom of the underlying models possess the consistency property.


Introduction
With the increasing prominence of large-scale data in modern science, the data of interest are increasingly complex and may come in the form of a matrix rather than a vector. At the same time, the random noise is not always normal. Such complex stochastic data are frequently collected in a large variety of research areas such as information technology, engineering, medical imaging and diagnosis, and finance [1–7]. For instance, a well-known example is the study of an electroencephalography data set on alcoholism. The study consists of 122 subjects in two groups, an alcoholic group and a normal control group, and each subject was exposed to a stimulus. Voltage values were measured from 64 channels of electrodes placed on the subject's scalp at 256 time points, so each sampling unit is a 256 × 64 matrix. To address scientific questions arising from such data, sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. A variety of statistical models lead to the estimation of matrices with rank constraints, and the true signal can often be well approximated by a low rank matrix. Recently, Zhou and Li [5] proposed the so-called regularized matrix regression model, based on spectral regularization, to deal with data in matrix form. This model includes the well-known Lasso as a special case; see [8] for more details. Moreover, one of the main results in [5] gives the degrees of freedom of the proposed model under an orthonormality assumption.
The degrees of freedom of the underlying stochastic model are an important topic. To evaluate the performance of a model in data analysis, we need to choose the optimal tuning parameter within the model. Many methods have been proposed to solve this problem; popular choices include $C_p$, AIC, and BIC [9–11], as well as a computationally expensive alternative, cross-validation. Efron [11] showed that $C_p$ is an unbiased estimate of the prediction error and that in most cases $C_p$ selects the tuning parameter more accurately than cross-validation; thus $C_p$ and AIC outperform cross-validation. The fundamental idea behind $C_p$, AIC, and BIC is connected with the concept of degrees of freedom.
Degrees of freedom are easily understood in the linear model: there, they equal the number of predictor variables. However, if there are constraints on the predictors, the degrees of freedom no longer correspond exactly to the number of variables; see, for example, [5, 12–18]. After Stein [12] introduced Stein's unbiased risk estimation, analytical forms of the degrees of freedom of various models were derived in the vector case. For instance, Hastie and Tibshirani [13] showed that the degrees of freedom of a linear smoother equal the trace of the prediction matrix. In general, it is difficult to derive the degrees of freedom of a given model. Ye [15] in 1998 and Shen and Ye [16] in 2002 used computational methods to estimate the degrees of freedom; a drawback, however, is that the computational cost grows with the size of the data. For the high-dimensional vector case, Zou et al. [14] derived the degrees of freedom of the Lasso. Furthermore, Tibshirani and Taylor [17, 18] derived the degrees of freedom of the generalized Lasso.
For the matrix case, however, there are only a few results on the degrees of freedom of matrix regression. Obtaining the analytical form of the degrees of freedom of our model is essential both in theory and in practice, so it is important to study the degrees of freedom in the matrix case in the big data era. Notice that, besides Zhou and Li's work [5] on the degrees of freedom of regularized matrix regression, Yuan [19] obtained the degrees of freedom in low rank matrix estimation, covering both rank constraints and nuclear-norm regularization. Note that Yuan [19] considered only rank-constrained multivariate regression, and Zhou and Li [5] did not consider the mixed case, which combines matrix and vector predictors. If we use the nuclear norm as the penalty, what are the degrees of freedom of that model? If the variables are mixed, what are the degrees of freedom of that model?
We answer the above questions affirmatively in this paper. Firstly, we prove that the degrees of freedom of the mixed matrix regression model are the sum of the degrees of freedom of the Lasso and of regularized matrix regression; this result can be used to construct adaptive selection criteria for efficiently selecting the optimal model fit. Then, following the same idea, we establish the degrees of freedom of nuclear-norm regularized multivariate regression. It is worth noting that Zou et al. [14] not only gave the unbiased estimate of the degrees of freedom of the Lasso model but also proved the consistency of that estimate, an interesting and important contribution. Based on their work, we finally prove that the estimates of the degrees of freedom given in this paper are consistent.
Our paper is organized as follows. In Section 2, we introduce the primary model, basic concepts, and notation used in the paper. In Section 3, we compute the degrees of freedom of model (3). In Section 4, we give the degrees of freedom of multivariate regression with nuclear-norm regularization. In Section 5, we verify the consistency of the estimates. We conclude the paper with a discussion of potential future research in Section 6.

Preliminaries
In this section, we introduce our model and basic concepts. First we present the mixed matrix regression model; then, for ease of discussion and understanding, we collect some basic facts and notation.
Suppose $y \in \mathbb{R}$ is the response variable, $x \in \mathbb{R}^{p_0}$ is the prediction vector, and $X \in \mathbb{R}^{p_1 \times p_2}$ is the prediction matrix; these are observed. Let $z \in \mathbb{R}^{p_0}$ and $B \in \mathbb{R}^{p_1 \times p_2}$ be the unknown coefficient vector and matrix. The statistical model of matrix regression is given as
$$y = \langle x, z \rangle + \langle X, B \rangle + \varepsilon, \qquad (1)$$
where $\langle X, B \rangle = \sum_{j,k} X_{jk} B_{jk}$ is the sum of the products of the corresponding elements of $X$ and $B$, and $\varepsilon$ is the prediction error of the model. Suppose we take $n$ samples, so that
$$y_i = \langle x_i, z \rangle + \langle X_i, B \rangle + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (2)$$
Note that, for real data, $B$ and $z$ often have special structure: $B$ has low rank and $z$ is sparse. In this case, we define the mixed matrix regression model as
$$\min_{B, z} \ \frac{1}{2} \sum_{i=1}^{n} \big( y_i - \langle x_i, z \rangle - \langle X_i, B \rangle \big)^2 + \lambda_1 \|B\|_* + \lambda_2 \|z\|_1, \qquad (3)$$
where $\|B\|_*$ denotes the nuclear norm of $B$ and $\|z\|_1$ the $\ell_1$ norm of $z$.

If $B = 0$ in (3), we recover the Lasso model, whose theory, including algorithms and degrees of freedom, is well developed. In statistical parlance, the Lasso uses an $\ell_1$ penalty, which has the effect of forcing some of the coefficient estimates to be exactly zero when the tuning parameter $\lambda_2$ is sufficiently large. Hence the Lasso yields sparse models that involve only a subset of the variables, performing variable selection; it has been widely used in statistics and machine learning. If $z = 0$ in (3), we recover the regularized matrix regression model mainly studied in [5].

Now we review some basic results on the degrees of freedom. Based on Stein's unbiased risk estimation, Efron et al. [20] showed that the effective degrees of freedom of any fitting procedure $g$ admit a rigorous definition under a differentiability condition on the estimate $\hat{y}$ of $y$ based on $g$, where $y = (y_1, y_2, \ldots, y_n)^T$ denotes the response vector. That is, given a method $g$, let $\hat{y} = g(y)$ denote its fit. Then, under the differentiability of $\hat{y}$, the degrees of freedom of $g$ are given by
$$\mathrm{df}(\hat{y}) = \mathrm{tr}\left\{ \frac{\partial \hat{y}}{\partial y^T} \right\}.$$
This means that the degrees of freedom of $g$ are the trace of the Jacobian matrix, which is a special case of Definition 1 below.
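To make the definition concrete, the following minimal sketch (ours, not from the paper) checks numerically that, for a linear smoother $\hat{y} = Hy$, the trace of the Jacobian above reduces to $\mathrm{tr}(H)$, in line with the result of Hastie and Tibshirani [13] recalled in the Introduction; ridge regression is used purely as an illustrative smoother, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, alpha = 50, 10, 2.0
X = rng.standard_normal((n, p))

# Hat matrix of the ridge smoother y_hat = H y.
H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)

# Degrees of freedom as the trace of the Jacobian of y_hat w.r.t. y: tr(H).
df_trace = np.trace(H)

# Finite-difference check of tr(d y_hat / d y), perturbing one y_i at a time.
y = rng.standard_normal(n)
eps = 1e-6
I = np.eye(n)
df_fd = sum((H @ (y + eps * I[i]) - H @ y)[i] / eps for i in range(n))

print(df_trace, df_fd)  # the two values agree up to numerical error
```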
Once we have the degrees of freedom, we can form the three well-known information criteria $C_p$, AIC, and BIC under the normal noise case. Following the standard forms (see, e.g., [14]), with $\sigma^2$ the noise variance,
$$C_p = \frac{\|y - \hat{y}\|^2}{\sigma^2} - n + 2\,\mathrm{df}(\hat{y}), \qquad \mathrm{AIC} = \frac{\|y - \hat{y}\|^2}{n\sigma^2} + \frac{2}{n}\,\mathrm{df}(\hat{y}), \qquad \mathrm{BIC} = \frac{\|y - \hat{y}\|^2}{n\sigma^2} + \frac{\log n}{n}\,\mathrm{df}(\hat{y}).$$
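As a hedged illustration of how these criteria are used for tuning-parameter selection, the sketch below computes all three from a fit and a degrees-of-freedom estimate; the function name and the assumption that the noise variance sigma2 is known are ours.

```python
import numpy as np

def selection_criteria(y, y_hat, df, sigma2):
    """C_p, AIC, and BIC for a fit y_hat with estimated degrees of freedom df."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    cp = rss / sigma2 - n + 2.0 * df
    aic = rss / (n * sigma2) + 2.0 * df / n
    bic = rss / (n * sigma2) + np.log(n) * df / n
    return cp, aic, bic
```

One then selects the tuning parameter that minimizes the chosen criterion over a grid.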
In order to deal with the degrees of freedom of the mixed matrix regression model, we define an operator that simplifies the expression of the optimization problem, together with the Jacobian matrix of a matrix function.

Definition 1. Suppose $F$ is a matrix function,
$$F: \mathbb{R}^{m \times n} \to \mathbb{R}^{p \times q}, \qquad X \mapsto Y = F(X).$$
Then one defines the Jacobian matrix as $DF(X) = \partial\,\mathrm{vec}(F(X)) / \partial\,\mathrm{vec}^T(X)$.

Definition 2. Let $\mathbb{X} = (X_1, \ldots, X_n)$ collect the sampled prediction matrices, and define the operator $\star$ by
$$\mathbb{X} \star B = \big( \langle X_1, B \rangle, \ldots, \langle X_n, B \rangle \big)^T \in \mathbb{R}^n.$$
It is easy to verify that the operator $\star$ is linear.

Let $y = (y_1, \ldots, y_n)^T$ denote the response vector, and let $Z \in \mathbb{R}^{n \times p_0}$ denote the matrix whose $i$th row is $x_i^T$. Then we can rewrite the mixed matrix regression model (3) as
$$\min_{B, z} \ \frac{1}{2} \| y - \mathbb{X} \star B - Z z \|^2 + \lambda_1 \|B\|_* + \lambda_2 \|z\|_1. \qquad (9)$$
Let $\mathcal{B} = (B, z)$ denote the unknown coefficients, and let $\mathcal{A} = (\mathbb{X}, Z)$ denote the prediction data, where, with a slight abuse of notation, $\mathbb{X}$ also stands for the $n \times p_1 p_2$ matrix whose $i$th row is $\mathrm{vec}^T(X_i)$, so that $\mathbb{X} \star B = \mathbb{X}\,\mathrm{vec}(B)$. Our paper is based on the assumptions that $\mathcal{A}^T \mathcal{A}$ is of full rank and that the matrix data and the vector data are independent, that is, $\mathbb{X}^T Z = 0$.
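The operator of Definition 2 is straightforward to implement; in the sketch below (ours), the matrices $X_1, \ldots, X_n$ are stored as a three-way NumPy array and the claimed linearity is verified numerically.

```python
import numpy as np

def star(Xs, B):
    """Operator of Definition 2: (X_1,...,X_n) star B = (<X_1,B>,...,<X_n,B>)^T."""
    return np.array([np.sum(Xi * B) for Xi in Xs])

# Numerical check that the operator is linear in B.
rng = np.random.default_rng(1)
Xs = rng.standard_normal((5, 3, 4))  # n = 5 sample matrices of size 3 x 4
B1, B2 = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
assert np.allclose(star(Xs, 2.0 * B1 - 3.0 * B2),
                   2.0 * star(Xs, B1) - 3.0 * star(Xs, B2))
```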

The Unbiased Estimate of the Degrees of Freedom
We begin with the least squares estimate for our mixed matrix regression, which is the optimal solution of the problem
$$\min_{B, z} \ \frac{1}{2} \| y - \mathbb{X} \star B - Z z \|^2.$$
Since $\mathcal{A}^T \mathcal{A}$ is of full rank and $\mathbb{X}^T Z = 0$, this problem decouples and the solution is $\mathrm{vec}(\hat{B}_{LS}) = (\mathbb{X}^T \mathbb{X})^{-1} \mathbb{X}^T y$ and $\hat{z}_{LS} = (Z^T Z)^{-1} Z^T y$.
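The decoupling used here can be sketched as follows (our illustration, valid under the stated assumptions that $\mathcal{A}^T\mathcal{A}$ is of full rank and $\mathbb{X}^T Z = 0$): the joint least squares problem splits into a matrix part and a vector part that can be solved separately.

```python
import numpy as np

def mixed_least_squares(Xmat, Z, y, p1, p2):
    """Least squares fit of model (9) with lambda1 = lambda2 = 0.

    Xmat is n x (p1*p2) with i-th row vec(X_i)^T; Z is n x p0 with i-th row
    x_i^T. When Xmat.T @ Z = 0 and the stacked design has full column rank,
    the joint fit decouples into two separate least squares problems.
    """
    vecB = np.linalg.lstsq(Xmat, y, rcond=None)[0]  # matrix part
    z = np.linalg.lstsq(Z, y, rcond=None)[0]        # vector part
    return vecB.reshape((p1, p2), order="F"), z     # column-major vec convention
```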

We are now ready to present our main result in this section.

Theorem 3. Let $\hat{\mathcal{B}}_{LS} = (\hat{B}_{LS}, \hat{z}_{LS})$ be the usual least squares estimate of $\mathcal{B}$, and assume that $\hat{B}_{LS}$ has distinct positive singular values $\sigma_1 > \sigma_2 > \cdots > \sigma_q > 0$, where $q = \min\{p_1, p_2\}$. Then an unbiased estimate of the degrees of freedom of model (9) is
$$\widehat{\mathrm{df}} = \sum_{i=1}^{q} \left[ \mathbf{1}(\sigma_i > \lambda_1) + |p_1 - p_2| \, \frac{(\sigma_i - \lambda_1)_+}{\sigma_i} \right] + 2 \sum_{i \neq j} \frac{\sigma_i (\sigma_i - \lambda_1)_+}{\sigma_i^2 - \sigma_j^2} + \|\hat{z}\|_0,$$
where $\hat{z}$ is the estimate of $z$ and $\|\hat{z}\|_0$ is the number of nonzero elements of $\hat{z}$. Clearly, $\mathrm{df} = E(\widehat{\mathrm{df}})$ is the degrees of freedom of mixed matrix regression.
Theorem 3 is an immediate consequence of the following two propositions, whose proofs are relegated to the Appendix for ease of presentation.

Proposition 4. For any $\lambda_1 \geq 0$, an unbiased estimate of the degrees of freedom of the regularized matrix regression model equals $\mathrm{tr}\{D\hat{B}(\hat{B}_{LS})\}$, given by
$$\widehat{\mathrm{df}}_1(\lambda_1) = \sum_{i=1}^{q} \left[ \mathbf{1}(\sigma_i > \lambda_1) + |p_1 - p_2| \, \frac{(\sigma_i - \lambda_1)_+}{\sigma_i} \right] + 2 \sum_{i \neq j} \frac{\sigma_i (\sigma_i - \lambda_1)_+}{\sigma_i^2 - \sigma_j^2},$$
where $\hat{B}_{LS}$ is the usual least squares estimate of $B$, assumed to have distinct positive singular values $\sigma_1 > \sigma_2 > \cdots > \sigma_q > 0$.

Proposition 5. For any $\lambda_2 \geq 0$, an unbiased estimate of the degrees of freedom of the Lasso equals $\mathrm{tr}\{D\hat{z}(\hat{z}_{LS})\}$, given by $\|\hat{z}(\lambda_2)\|_0$, the number of nonzero elements of the Lasso estimate.
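Under our reconstruction of the expressions above (the matrix part coincides with the known divergence formula for singular value thresholding; see Candès, Sing-Long, and Trzasko, 2013), the unbiased estimate of Theorem 3 can be computed directly from the singular values of the least squares estimate and the Lasso fit, as in the following sketch; all function names are ours.

```python
import numpy as np

def df_matrix_part(sigma, lam, p1, p2):
    """Unbiased df estimate for the nuclear-norm (matrix) part, Proposition 4.

    sigma: distinct positive singular values of the least squares estimate,
    of length q = min(p1, p2); lam: the tuning parameter lambda_1.
    """
    s = np.asarray(sigma, dtype=float)
    q = len(s)
    val = float(np.sum(s > lam))  # indicator terms 1{sigma_i > lam}
    val += abs(p1 - p2) * float(np.sum(np.maximum(s - lam, 0.0) / s))
    for i in range(q):            # interaction terms over pairs i != j
        for j in range(q):
            if i != j:
                val += 2.0 * s[i] * max(s[i] - lam, 0.0) / (s[i] ** 2 - s[j] ** 2)
    return val

def df_mixed(sigma, lam1, z_hat, p1, p2):
    """Theorem 3: matrix part plus the number of nonzero Lasso coefficients."""
    return df_matrix_part(sigma, lam1, p1, p2) + np.count_nonzero(z_hat)
```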
Very recently, Yuan [19] studied the degrees of freedom of multivariate regression with a low rank constraint via the optimization model
$$\min_{B} \ \frac{1}{2} \| Y - XB \|_F^2 \quad \text{subject to} \quad \mathrm{rank}(B) \leq r.$$
Since rank-constrained optimization is in general hard to compute (such problems are NP-hard), one usually relaxes the rank constraint to nuclear-norm regularization, which yields the nuclear-norm regularized multivariate regression model
$$\min_{B} \ \frac{1}{2} \| Y - XB \|_F^2 + \lambda \|B\|_*. \qquad (22)$$
Following the same technique as in the proof of Theorem 3, we can easily obtain the degrees of freedom of nuclear-norm regularized multivariate regression; we omit the proof for brevity.

Theorem 6. Assume that the design matrix $X$ in (22) has full column rank. Let $\hat{B}_{LS} \in \mathbb{R}^{q_1 \times q_2}$ be the usual least squares estimate and assume that it has distinct positive singular values $\sigma_1 > \sigma_2 > \cdots > \sigma_r > 0$, where $r \leq q = \min\{q_1, q_2\}$. With the convention $\sigma_i = 0$ for $i > r$ (such terms read as zero), the following expression is an unbiased estimate of the degrees of freedom of the regularized fit (22):
$$\widehat{\mathrm{df}}(\lambda) = \sum_{i=1}^{q} \left[ \mathbf{1}(\sigma_i > \lambda) + |q_1 - q_2| \, \frac{(\sigma_i - \lambda)_+}{\sigma_i} \right] + 2 \sum_{i \neq j} \frac{\sigma_i (\sigma_i - \lambda)_+}{\sigma_i^2 - \sigma_j^2}.$$
Thus $\mathrm{df} = E(\widehat{\mathrm{df}}(\lambda))$ is the degrees of freedom of nuclear-norm regularized multivariate regression.
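As a sanity check (ours) on this form of the estimate, one can compare the closed-form expression with a finite-difference computation of the divergence of singular value thresholding, which is the regularized fit of (22) when the design is orthonormal (taken as the identity here for simplicity):

```python
import numpy as np

def svt(Y, lam):
    """Soft-threshold the singular values of Y at level lam."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(2)
p1, p2, lam = 4, 6, 0.5
Y = rng.standard_normal((p1, p2))

# Divergence of the fit w.r.t. the data, entry by entry, by central differences.
eps, div = 1e-5, 0.0
for i in range(p1):
    for j in range(p2):
        E = np.zeros((p1, p2))
        E[i, j] = eps
        div += (svt(Y + E, lam) - svt(Y - E, lam))[i, j] / (2.0 * eps)

# Closed-form expression evaluated at the singular values of Y.
s = np.linalg.svd(Y, compute_uv=False)
q = min(p1, p2)
closed = (np.sum(s > lam)
          + abs(p1 - p2) * np.sum(np.maximum(s - lam, 0.0) / s)
          + 2.0 * sum(s[i] * max(s[i] - lam, 0.0) / (s[i] ** 2 - s[j] ** 2)
                      for i in range(q) for j in range(q) if i != j))
print(div, closed)  # the two numbers should agree closely
```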

Consistency of the Unbiased Estimate
The consistency of an estimate is important because it implies that the estimate converges to the true value in probability.
In this section, we prove the consistency of the estimates of the degrees of freedom given in the preceding sections. We first prove the consistency of the unbiased estimate $\widehat{\mathrm{df}}$ for regularized matrix regression. To do so, we need the following proposition on the continuity of $\widehat{\mathrm{df}}$.

Proposition 7. An unbiased estimate of the degrees of freedom of the regularized matrix regression model is
$$\widehat{\mathrm{df}}(\lambda) = \sum_{i=1}^{q} \left[ \mathbf{1}(\sigma_i > \lambda) + |p_1 - p_2| \, \frac{(\sigma_i - \lambda)_+}{\sigma_i} \right] + 2 \sum_{i \neq j} \frac{\sigma_i (\sigma_i - \lambda)_+}{\sigma_i^2 - \sigma_j^2},$$
where $\sigma_i$, $i = 1, \ldots, q$, are the singular values of the least squares estimate. In this case, $\widehat{\mathrm{df}}$ is continuous only on $\{\lambda \mid \lambda \neq \sigma_i, \ i = 1, 2, \ldots, q\}$.
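The discontinuity asserted in Proposition 7 can be seen numerically. In the sketch below (ours), we take $p_1 = p_2$ so the middle term vanishes; only the indicator terms are discontinuous in $\lambda$, and the estimate drops by exactly 1 as $\lambda$ crosses a singular value from below, matching the one-sided limits discussed below.

```python
import numpy as np

# Only the indicators 1{sigma_i > lam} are discontinuous in lam; the remaining
# terms involve (sigma_i - lam)_+ continuously, so df_hat jumps down by exactly
# 1 as lam crosses a singular value from below.
sigma = np.array([3.0, 2.0, 1.0])     # distinct positive singular values
for lam in (2.0 - 1e-8, 2.0 + 1e-8):  # approach sigma_2 = 2 from both sides
    indicators = float(np.sum(sigma > lam))
    smooth = 2.0 * sum(sigma[i] * max(sigma[i] - lam, 0.0)
                       / (sigma[i] ** 2 - sigma[j] ** 2)
                       for i in range(3) for j in range(3) if i != j)
    print(lam, indicators + smooth)   # left value exceeds right value by 1
```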

Recall the notion of consistency. Suppose the quantity to be estimated is $\theta(\lambda)$, and we use a statistical method to obtain an estimate $T_n(\lambda)$, which is a function of the sample size $n$. If $T_n(\lambda)$ is a consistent estimate of $\theta(\lambda)$, then, as the sample size increases, $T_n(\lambda)$ converges to $\theta(\lambda)$ in probability; that is, for any $\epsilon > 0$,
$$\lim_{n \to \infty} P\big( |T_n(\lambda) - \theta(\lambda)| > \epsilon \big) = 0.$$

From the expression in Proposition 7, for each $i$ we get
$$\lim_{\lambda \to \sigma_i^-} \widehat{\mathrm{df}}(\lambda) = \lim_{\lambda \to \sigma_i^+} \widehat{\mathrm{df}}(\lambda) + 1.$$
Clearly, $\widehat{\mathrm{df}}$ is not continuous at $\{\sigma_i, \ i = 1, 2, \ldots, q\}$.

Now we show that the unbiased estimate $\widehat{\mathrm{df}}$ is consistent for the true degrees of freedom.

Theorem 8. Let $\sigma_i$, $i = 1, \ldots, q$, be the singular values of the least squares estimate of the regularized matrix regression model, and suppose $\lambda_n^* \to \lambda^* > 0$, where $\lambda^*$ is not equal to any of the singular values. Then $\widehat{\mathrm{df}}(\lambda_n^*) - \mathrm{df}(\lambda_n^*) \to 0$ in probability.

Indeed, by assumption and Proposition 7, $\widehat{\mathrm{df}}$ is a continuous function on $\{\lambda \mid \lambda \neq \sigma_i, \ i = 1, 2, \ldots, q\}$. If a sequence $\lambda_n^*$ satisfies $\lambda_n^* \to \lambda^* \neq \sigma_i$, $i = 1, 2, \ldots, q$, the continuous mapping theorem implies $\lim_{n \to \infty} \widehat{\mathrm{df}}(\lambda_n^*) = \widehat{\mathrm{df}}(\lambda^*)$; in particular, $\widehat{\mathrm{df}}(\lambda_n^*)$ converges to $\widehat{\mathrm{df}}(\lambda^*)$ in probability. By the dominated convergence theorem, we then obtain $\widehat{\mathrm{df}}(\lambda_n^*) - \mathrm{df}(\lambda_n^*) \to 0$ in probability.

Notice that, for the vector case, Zou et al. [14] not only gave the unbiased estimate of the degrees of freedom of the Lasso model but also proved the following consistency of that estimate.

Proposition 9 (see [14]). For the Lasso model, if $\lambda_n^*/n \to \lambda^* > 0$ with $\lambda^*$ a non-transition point, then $\widehat{\mathrm{df}}(\lambda_n^*) - \mathrm{df}(\lambda_n^*) \to 0$ in probability.

Based on Theorems 3 and 8 and Proposition 9, we can easily show that the estimate of the degrees of freedom of the mixed matrix regression model given in Theorem 3 is consistent as well.

Conclusions

In this paper, we obtain the degrees of freedom of the mixed matrix regression model. Moreover, we prove that the obtained estimates of the degrees of freedom are consistent. Note that our results on the degrees of freedom are derived under the assumption that the prediction matrix and vector are independent. However, if they are not independent but linearly or nonlinearly related, or if the number of samples is smaller than the number of variables, what is the analytical form of the degrees of freedom? We leave this as a future research topic.