Key-Frame Detection and Super-Resolution of Hyperspectral Video via Sparse-Based Cumulative Tensor Factorization

Thanks to the rapid development of hyperspectral sensors, hyperspectral videos (HSV) can now be collected with high temporal and spectral resolutions and used to monitor dynamic phenomena that are invisible to conventional cameras, such as chemical gas plumes. However, using such large-scale sequential data effectively is challenging, because processing them directly places heavy demands on computation and memory. This paper presents a key-frame and target detection algorithm based on cumulative tensor CANDECOMP/PARAFAC (CP) factorization (CTCF) to select the frames in which the target appears, and a novel super-resolution (SR) method based on sparse-based tensor Tucker factorization (STTF) to improve the spatial resolution. In the CTCF method, the HSV sequence is treated as a cumulative tensor and the correlation between adjacent frames is exploited through CP tensor approximation. In the proposed STTF-based SR method, each HSV frame is treated as a third-order tensor, so that the frame super-resolution problem is transformed into the estimation of dictionaries along three dimensions and of a core tensor. A regularizer that promotes a sparse core tensor is incorporated to model the high spatial-spectral correlations. The estimation of the core tensor and of the dictionaries along three dimensions is formulated as a sparse-based Tucker factorization of each HSV frame. Experimental results on a real HSV data set demonstrate the superiority of the proposed CTCF and STTF algorithms over the state-of-the-art target detection and SR approaches used for comparison.

Target detection is essentially a binary classification task whose purpose is to label every image pixel as target or background. In HSIs, pixels whose spectral signature differs significantly from that of their neighboring background pixels are defined as spectral anomalies. Anomaly detectors are statistical or pattern recognition methods used to detect such distinct pixels. It is worth mentioning that spectral anomaly detection approaches [19][20][21][22], such as the Reed-Xiaoli (RX) algorithm [23], neither assume nor use prior information about the target spectral signature. In this paper, however, we focus on the detection of invisible gas plumes, and the spectral characteristics of the desired target are assumed to be known. In such cases, signature-based target detection algorithms are used instead of anomaly detection. In these algorithms, the spectral characteristics of the target can be represented by a target subspace or a single target spectrum [24]. Likewise, the characteristics of the background can be expressed statistically by a Gaussian distribution or by a subspace describing the local or global background statistics. In this category, the matched subspace detector (MSD) [25] is one of the most typical algorithms. In the MSD, the target pixel vectors are represented by a linear combination of the target spectral signature and the background spectral signature, which stand for the target subspace spectra and the background subspace spectra, respectively. Then, the generalized likelihood ratio test (GLRT) is applied, using projection matrices associated with the background subspace and the target-and-background subspace. Finally, the GLRT output is compared with a preset threshold to decide whether the target is absent or present. Moving from the pixel level to the subpixel level, a single pixel may contain several distinct pure materials (endmembers); such a pixel is known as a mixed pixel. The presence of mixed pixels is a difficult problem caused by the low spatial resolution of HSIs. Accordingly, some unmixing approaches [26][27][28] have been designed to compute the fractional abundances of the endmembers. In [29], a hyperspectral unmixing approach based on constrained matrix factorization (CMF) was proposed. Unlike conventional methods, each column vector of the endmember matrix is represented as a nonnegative linear combination of pixel spectra. After the endmember matrix and the corresponding fractional abundance matrix are obtained by solving optimization problems, the abundance map of the target endmember gives the detection result.
As mentioned before, HSIs often suffer from low spatial resolution. To acquire an HSI, the number of sun photons in each spectral band has to exceed a minimum value, and the number of spectral bands in an HSI is so large that spatial resolution has to be sacrificed. Therefore, super-resolution (SR) techniques have aroused great interest in the last decade. Generally, the SR methods for HSIs can be classified into four categories: Bayesian [30], component analysis [31], deep learning [32], and sparse representation. Because of the limited length of this paper, we focus on sparse-based algorithms. In such HSI super-resolution schemes, images are expressed by dictionaries and the corresponding sparse coefficients. On the basis of the spatial-spectral sparsity of HSIs, the dictionaries and sparse coefficients are estimated jointly [33]. Huang et al. [34] introduced a method, based on sparse matrix factorization, for fusing multispectral images (MSIs) with different spectral and spatial resolutions. Akhtar et al. [35] presented an MSI-HSI fusion approach using sparse coding and Bayesian dictionary learning. Moreover, some algorithms based on matrix factorization [36][37][38] or unmixing [39] can also be regarded as sparse representation schemes, because the source images are decomposed into a basis and the corresponding coefficients. Yokoya et al. proposed a coupled nonnegative matrix factorization (CNMF) algorithm [40], in which unmixing techniques are employed to yield the endmember matrices and the high-resolution (HR) abundance matrices of the HSI. In [41], Lanaras et al. suggested a joint scheme to solve the spectral unmixing problems. In [42], Zhang et al. fused the low-resolution (LR) HSI and the HR-MSI based on group spectral embedding and low-rank factorization.
However, schemes based on matrix factorization cannot fully exploit the spatial-spectral correlations of HSIs. Treating HSIs as tensors is considered preferable, because an HSI is naturally expressed as a third-order tensor. In this paper, a detection algorithm based on cumulative tensor CP factorization (CTCF) is proposed. The sequential HSV data is expressed as a four-dimensional (4D) cumulative tensor, and factor matrices are obtained by decomposing the original 4D tensor using CP factorization. When a new frame arrives and is appended along the time dimension of the original tensor, the 4D cumulative tensor is updated together with the factor matrices. A CP tensor approximation of the new frame is then computed from the updated factor matrices, and the fitness between the new frame and its approximation is calculated. By comparing the fitness with a preset threshold, we decide whether the new frame continues to be used to update the cumulative tensor or whether it is a key-frame in which the target is present. The CTCF-based method exploits not only the spatial-spectral correlations of the HSIs, by applying a tensor model, but also the temporal correlation between adjacent frames of the HSV.
On the other hand, tensor-based analysis has also been widely used in HSI super-resolution [43][44][45]. To the best of our knowledge, most SR algorithms enhance spatial resolution by fusing a high-resolution MSI (HR-MSI) and a low-resolution HSI (LR-HSI) of the same scene. Unfortunately, this is less practical in real applications: in some situations, the LR-HSI is the only data available. In this paper, we suggest an SR algorithm using sparse-based tensor Tucker factorization (STTF). Inspired by the Tucker factorization and its related works, the HSV frames are represented as third-order tensors, which are approximated by the multiplication of a core tensor with the dictionaries along three dimensions (i.e., the dictionaries of the height mode, the width mode, and the spectral mode; they are named "three modes dictionaries" for short in the rest of this paper). Then, the SR problem is transformed into the estimation of the three modes dictionaries and of the core tensor. Specifically, the spatial information is represented by the height mode dictionary and the width mode dictionary, the spectral information is represented by the spectral mode dictionary, and the correlations among the three modes dictionaries are modeled by the core tensor. HSIs are generally self-similar, so a sparse prior can be imposed on the core tensor; the estimation of the core tensor and the three modes dictionaries is then formulated as the STTF of the LR and HR HSV frames. In each iteration of STTF, the core tensor and the dictionaries are all updated, and accurate estimates are obtained when convergence is achieved.

The remainder of this paper is organized as follows. Section 2 presents the materials and methods, including the basic notations and preliminaries of tensors and tensor factorization, the proposed CTCF approach for key-frame detection, and the proposed STTF method for the key-frame super-resolution problem. In Section 3, experimental results on real HSV data and the corresponding discussions are given. The paper is summarized in Section 4 with ideas for future work along the path presented here.

Mathematical Problems in Engineering

Tensor Notations and Preliminaries
2.1.1. Tensor Notations. In this paper, vectors are denoted by boldface lowercase letters ($\mathbf{a}, \mathbf{b}, \mathbf{c}, \ldots$), matrices by boldface capital letters ($\mathbf{A}, \mathbf{B}, \mathbf{C}, \ldots$), and tensors by bold Euler script letters ($\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$). A tensor is a multidimensional array, denoted by $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. Here, $\mathcal{A}$ is an $N$th-order tensor and $I_n$ ($1 \le n \le N$) is the dimension of the $n$th mode. Vectors are first-order tensors and matrices are second-order tensors. We use $\mathcal{A}(i_1, \ldots, i_{n-1}, :, i_{n+1}, \ldots, i_N)$ to denote a mode-$n$ fiber, i.e., a vector obtained from $\mathcal{A}$ by varying the index $i_n$ with all other indices fixed. The mode-$n$ unfolding matrix of $\mathcal{A}$ is generated by arranging all the mode-$n$ fibers as the columns of a matrix, denoted by $\mathbf{A}_{(n)} \in \mathbb{R}^{I_n \times I_1 \cdots I_{n-1} I_{n+1} \cdots I_N}$.
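An important operation between a tensor and a matrix is the $n$-mode product, which is defined as

$$\mathcal{F} = \mathcal{A} \times_n \mathbf{B}, \tag{1}$$

where $\mathbf{B} \in \mathbb{R}^{J_n \times I_n}$ and $\mathcal{F} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N}$. The elements of $\mathcal{A}$ are denoted by $a_{i_1 i_2 \cdots i_N}$, so the elements of $\mathcal{F}$ are computed by

$$f_{i_1 \cdots i_{n-1} j_n i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} a_{i_1 i_2 \cdots i_N}\, b_{j_n i_n}. \tag{2}$$

Given the definition of the $n$-mode product, we can obtain

$$\mathbf{F}_{(n)} = \mathbf{B}\,\mathbf{A}_{(n)}. \tag{3}$$

For successive multiplications of a tensor by matrices in distinct modes, the result is not affected by the order of multiplication:

$$\mathcal{A} \times_m \mathbf{B} \times_n \mathbf{C} = \mathcal{A} \times_n \mathbf{C} \times_m \mathbf{B} \quad (m \ne n). \tag{4}$$

If the two modes are the same, equation (4) becomes

$$\mathcal{A} \times_n \mathbf{B} \times_n \mathbf{C} = \mathcal{A} \times_n (\mathbf{C}\mathbf{B}). \tag{5}$$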
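Suppose that $\mathbf{E}_n \in \mathbb{R}^{J_n \times I_n}$ ($1 \le n \le N$) is a collection of matrices; we define the tensor $\mathcal{G} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ as

$$\mathcal{G} = \mathcal{A} \times_1 \mathbf{E}_1 \times_2 \mathbf{E}_2 \cdots \times_N \mathbf{E}_N. \tag{6}$$

The vectorized form of equation (6) is

$$\mathbf{g} = (\mathbf{E}_N \otimes \mathbf{E}_{N-1} \otimes \cdots \otimes \mathbf{E}_1)\,\mathbf{a}, \tag{7}$$

where $\mathbf{g} = \mathrm{vec}(\mathcal{G}) \in \mathbb{R}^{J}$ ($J = \prod_{n=1}^{N} J_n$) and $\mathbf{a} = \mathrm{vec}(\mathcal{A}) \in \mathbb{R}^{I}$ ($I = \prod_{n=1}^{N} I_n$) are the vectors obtained by stacking the mode-1 fibers of the tensors $\mathcal{G}$ and $\mathcal{A}$. The Kronecker product is denoted by the symbol "$\otimes$." Moreover, given the tensor $\mathcal{A}$, $\|\mathcal{A}\|_0$ represents the $\ell_0$-norm, which equals the number of nonzero elements of $\mathcal{A}$, while the $\ell_1$-norm and the Frobenius norm are

$$\|\mathcal{A}\|_1 = \sum_{i_1, \ldots, i_N} |a_{i_1 i_2 \cdots i_N}|, \qquad \|\mathcal{A}\|_F = \sqrt{\sum_{i_1, \ldots, i_N} a_{i_1 i_2 \cdots i_N}^2}. \tag{8}$$

Finally, the definition of a rank-one tensor is introduced. The $N$th-order tensor $\mathcal{A}$ is rank-one if it can be written as the outer product of $N$ vectors, i.e., $\mathcal{A} = \mathbf{a}_1 \circ \mathbf{a}_2 \circ \cdots \circ \mathbf{a}_N$. The symbol "$\circ$" denotes the vector outer product [46].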
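The unfolding, the $n$-mode product, and the Kronecker form of a multi-mode product can be checked numerically. The following is a minimal NumPy sketch rather than code from the paper: the helper names `unfold`, `fold`, `mode_n_product`, and `vec` are hypothetical, modes are zero-indexed, and the column ordering of the unfolding (earlier modes vary fastest) is an assumed convention chosen so that it agrees with a vec(·) that stacks mode-1 fibers.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode-n fibers become columns (earlier modes vary fastest)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def fold(M, n, shape):
    """Inverse of unfold for a tensor with the given full shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, [shape[n]] + rest, order="F"), 0, n)

def mode_n_product(T, B, n):
    """n-mode product T x_n B, computed through the identity F_(n) = B T_(n)."""
    shape = list(T.shape)
    shape[n] = B.shape[0]
    return fold(B @ unfold(T, n), n, shape)

def vec(T):
    """Stack mode-1 fibers, i.e. column-major (Fortran) ordering."""
    return T.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
E = [rng.standard_normal((2, 3)),
     rng.standard_normal((6, 4)),
     rng.standard_normal((7, 5))]

# G = A x_1 E_1 x_2 E_2 x_3 E_3
G = A
for n, E_n in enumerate(E):
    G = mode_n_product(G, E_n, n)

# Kronecker form: vec(G) = (E_3 kron E_2 kron E_1) vec(A)
kron = np.kron(E[2], np.kron(E[1], E[0]))
assert np.allclose(vec(G), kron @ vec(A))

# Products in distinct modes commute
G_swapped = mode_n_product(mode_n_product(A, E[2], 2), E[0], 0)
assert np.allclose(G_swapped, mode_n_product(mode_n_product(A, E[0], 0), E[2], 2))
```

Forming the Kronecker matrix explicitly is only practical for small sizes such as these; the loop of mode products gives the same result without building it.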

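Tensor Factorizations. CANDECOMP/PARAFAC (CP) factorization decomposes a tensor into a sum of rank-one component tensors [47]. For example, given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, we may formulate it as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r, \tag{9}$$

where $R$ is a positive integer and $\mathbf{a}_r \in \mathbb{R}^{I}$, $\mathbf{b}_r \in \mathbb{R}^{J}$, and $\mathbf{c}_r \in \mathbb{R}^{K}$ ($r = 1, 2, \ldots, R$). The elements of the tensor $\mathcal{X}$ can be computed by

$$x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}, \quad i = 1, \ldots, I,\ j = 1, \ldots, J,\ k = 1, \ldots, K. \tag{10}$$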
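The factorization result can be expressed by the factor matrices of the three dimensions. Factor matrices collect the vectors of the rank-one components as columns; i.e., $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_R]$, and likewise $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_R]$ and $\mathbf{C} = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_R]$. Following [48], the CP model can be concisely represented as

$$\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!] \equiv \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r. \tag{11}$$

On the basis of the factor matrices, the mode-$n$ unfolding matrices $\mathbf{X}_{(n)}$ ($n = 1, 2, 3$) of $\mathcal{X}$ can be represented as

$$\mathbf{X}_{(1)} \approx \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}, \quad \mathbf{X}_{(2)} \approx \mathbf{B}(\mathbf{C} \odot \mathbf{A})^{T}, \quad \mathbf{X}_{(3)} \approx \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{T}, \tag{12}$$

where the symbol "$\odot$" denotes the Khatri-Rao product [49]. In this way, loss functions can be defined on the approximation of the mode-$n$ unfolding matrices, and the factor matrices of the CP factorization can then be obtained by solving the corresponding optimization problem.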
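The relation between the rank-one sum, the factor matrices, and the Khatri-Rao form of the unfoldings can be illustrated with a small example. This is a sketch under assumptions: `khatri_rao`, `unfold`, and `cp_to_tensor` are hypothetical helper names, modes are zero-indexed, and the unfolding uses the column ordering in which earlier modes vary fastest.

```python
import numpy as np

def khatri_rao(C, B):
    """Khatri-Rao (column-wise Kronecker) product of C (K x R) and B (J x R)."""
    K, R = C.shape
    J = B.shape[0]
    # Row k*J + j of the result holds C[k, r] * B[j, r] for every column r.
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def unfold(T, n):
    """Mode-n unfolding with earlier modes varying fastest along the columns."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def cp_to_tensor(A, B, C):
    """Build the tensor whose entries are sum_r a_ir * b_jr * c_kr."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 6, 3
A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
X = cp_to_tensor(A, B, C)

# The same tensor as an explicit sum of R rank-one outer products
X_sum = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
            for r in range(R))
assert np.allclose(X, X_sum)

# Unfoldings expressed through the factor matrices
assert np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T)
assert np.allclose(unfold(X, 1), B @ khatri_rao(C, A).T)
assert np.allclose(unfold(X, 2), C @ khatri_rao(B, A).T)
```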
Tucker factorization is another popular tensor decomposition approach [50]. It decomposes a tensor into a core tensor multiplied by a matrix along each mode. Thus, in the same case as above, where $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, the factorization can be described as

$$\mathcal{X} \approx \mathcal{Z} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C}, \tag{13}$$

where $\mathbf{A} \in \mathbb{R}^{I \times P}$, $\mathbf{B} \in \mathbb{R}^{J \times Q}$, and $\mathbf{C} \in \mathbb{R}^{K \times R}$ are factor matrices, which can be regarded as the principal components in each mode. Therefore, Tucker factorization is a form of higher-order principal component analysis (PCA). The tensor $\mathcal{Z} \in \mathbb{R}^{P \times Q \times R}$ is the core tensor, and its elements indicate the level of correlation between the different components. Similar to (11), the Tucker model can be concisely represented by $\mathcal{X} \approx [\![\mathcal{Z}; \mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. Elementwise, equation (13) can be represented as

$$x_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} z_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}, \quad i = 1, \ldots, I,\ j = 1, \ldots, J,\ k = 1, \ldots, K. \tag{14}$$
The Tucker factorization is illustrated in Figure 2.
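As an illustration of the Tucker model, the sketch below builds a tensor from a core and three factor matrices, first through the elementwise sum and then through three successive mode products, and checks that both agree. The function name `tucker_to_tensor` and all sizes are hypothetical choices made for this example.

```python
import numpy as np

def tucker_to_tensor(Z, A, B, C):
    """Elementwise Tucker model: x_ijk = sum_pqr z_pqr * a_ip * b_jq * c_kr."""
    return np.einsum("pqr,ip,jq,kr->ijk", Z, A, B, C)

rng = np.random.default_rng(2)
I, J, K = 6, 5, 4          # tensor size
P, Q, R = 3, 2, 2          # core size
Z = rng.standard_normal((P, Q, R))
A = rng.standard_normal((I, P))
B = rng.standard_normal((J, Q))
C = rng.standard_normal((K, R))

X = tucker_to_tensor(Z, A, B, C)

# The same tensor through successive mode products Z x_1 A x_2 B x_3 C
step = np.tensordot(A, Z, axes=(1, 0))                          # shape (I, Q, R)
step = np.moveaxis(np.tensordot(B, step, axes=(1, 1)), 0, 1)    # shape (I, J, R)
step = np.moveaxis(np.tensordot(C, step, axes=(1, 2)), 0, 2)    # shape (I, J, K)
assert np.allclose(X, step)

# A CP model is the special case of a superdiagonal core tensor.
R_cp = 3
core = np.zeros((R_cp, R_cp, R_cp))
core[np.arange(R_cp), np.arange(R_cp), np.arange(R_cp)] = 1.0
A2, B2, C2 = (rng.standard_normal((d, R_cp)) for d in (I, J, K))
assert np.allclose(tucker_to_tensor(core, A2, B2, C2),
                   np.einsum("ir,jr,kr->ijk", A2, B2, C2))
```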

The Proposed CTCF-Based Detection Method.
In this subsection, the optimization problem of updating the factor matrices is presented, followed by the proposed cumulative tensor CP factorization (CTCF) of third-order tensors. It is then extended to $N$th-order tensors. The CTCF-based detection method is described at the end of this subsection, with its flowchart shown in Figure 3.

CP Tensor Approximation by Factor Matrices.
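Similar to equation (12), the mode-$n$ unfolding matrix of $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be approximated by factor matrices; i.e.,

$$\mathbf{X}_{(n)} \approx \mathbf{A}^{(n)} \left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T}, \tag{15}$$

where the factor matrices $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}$ are obtained by CP factorization. The corresponding loss function is

$$L = \frac{1}{2} \left\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T} \right\|_F^2. \tag{16}$$

The alternating least squares (ALS) algorithm is often applied to obtain the factor matrices by solving, for each mode in turn with the other factor matrices fixed, the following optimization problem:

$$\mathbf{A}^{(n)} \leftarrow \arg\min_{\mathbf{A}^{(n)}} \frac{1}{2} \left\| \mathbf{X}_{(n)} - \mathbf{A}^{(n)} \left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T} \right\|_F^2. \tag{17}$$

When the tensor is updated, the new tensor can be computed from the updated factor matrices given by equation (17).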
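For a third-order tensor, the ALS procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cp_als`, the random initialization, and the stopping rule based on the change in relative error are assumptions, and the least-squares step uses a pseudoinverse of the Hadamard product of Gram matrices for numerical safety.

```python
import numpy as np

def khatri_rao(C, B):
    """Khatri-Rao product of C (K x R) and B (J x R), giving a (K*J x R) matrix."""
    K, R = C.shape
    return (C[:, None, :] * B[None, :, :]).reshape(K * B.shape[0], R)

def unfold(T, n):
    """Mode-n unfolding with earlier modes varying fastest along the columns."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def cp_als(X, rank, n_iter=100, tol=1e-8, seed=0):
    """Rank-`rank` CP factorization of a third-order tensor by ALS.

    Returns the list of factor matrices and the final relative error.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in X.shape]
    norm_x = np.linalg.norm(X)
    prev_err = np.inf
    err = np.inf
    for _ in range(n_iter):
        for n in range(3):
            lo, hi = [factors[m] for m in range(3) if m != n]
            kr = khatri_rao(hi, lo)                  # later mode first
            gram = (hi.T @ hi) * (lo.T @ lo)         # equals kr.T @ kr
            factors[n] = unfold(X, n) @ kr @ np.linalg.pinv(gram)
        approx = np.einsum("ir,jr,kr->ijk", *factors)
        err = np.linalg.norm(X - approx) / norm_x
        if abs(prev_err - err) < tol:                # error has stopped changing
            break
        prev_err = err
    return factors, err

# Usage on a noisy low-rank tensor (sizes chosen only for illustration)
rng = np.random.default_rng(4)
truth = [rng.standard_normal((s, 3)) for s in (8, 7, 6)]
X = np.einsum("ir,jr,kr->ijk", *truth) + 0.01 * rng.standard_normal((8, 7, 6))
factors, err = cp_als(X, rank=3)
print(f"relative error after ALS: {err:.4f}")
```

Each inner update is the closed-form least-squares solution for one factor matrix with the other two held fixed, which is why the loop alternates over the three modes.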

CTCF of Third-Order Tensor.
Generally, an image is a second-order tensor, so a sequence of images forms a third-order tensor, i.e., a video, which adds a temporal dimension to the two spatial dimensions. When a new video frame arrives and is appended along the time dimension of the original tensor, the result is defined as a three-dimensional (3D) cumulative tensor. As the number of new frames increases, the 3D cumulative tensor is updated frame by frame.
In conventional CP tensor approximation, whenever a new frame is appended along the time dimension, the ALS algorithm has to be rerun to approximate the new cumulative tensor, which is a time-consuming process. In addition, the temporal correlation between neighboring frames is not exploited in the decomposition of the cumulative tensor. This paper proposes CTCF to update the CP factorization of the original cumulative tensor, obtain the updated factor matrices, and approximate the new frame.
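Given an original 3D cumulative tensor $\mathcal{X}_{ori} \in \mathbb{R}^{I \times J \times T_{ori}}$, the result of its CP factorization is denoted by $\mathcal{X}_{ori} \approx [\![\mathbf{A}_{ori}, \mathbf{B}_{ori}, \mathbf{C}_{ori}]\!]$. When a new tensor $\mathcal{X}_{new} \in \mathbb{R}^{I \times J \times T_{new}}$ is appended along the time dimension, the updated cumulative tensor is $\mathcal{X} \in \mathbb{R}^{I \times J \times (T_{ori} + T_{new})}$, whose CP factorization is $\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. We focus on obtaining $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ by updating $\mathbf{A}_{ori}$, $\mathbf{B}_{ori}$, and $\mathbf{C}_{ori}$. The update is carried out in an alternating way. First, the temporal factor matrix $\mathbf{C}$ is computed while the factor matrices $\mathbf{A}$ and $\mathbf{B}$ are fixed; i.e.,

$$\mathbf{C} \leftarrow \arg\min_{\mathbf{C}} \frac{1}{2} \left\| \mathbf{X}_{(3)} - \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{T} \right\|_F^2 = \arg\min_{\mathbf{C}^{(1)}, \mathbf{C}^{(2)}} \frac{1}{2} \left\| \begin{bmatrix} \mathbf{X}_{ori(3)} - \mathbf{C}^{(1)}(\mathbf{B} \odot \mathbf{A})^{T} \\ \mathbf{X}_{new(3)} - \mathbf{C}^{(2)}(\mathbf{B} \odot \mathbf{A})^{T} \end{bmatrix} \right\|_F^2, \tag{18}$$

where $\mathbf{C}$ is divided into two blocks, $\mathbf{C}^{(1)} \in \mathbb{R}^{T_{ori} \times R}$ and $\mathbf{C}^{(2)} \in \mathbb{R}^{T_{new} \times R}$. Because $\mathbf{A}$ and $\mathbf{B}$ are fixed as $\mathbf{A}_{ori}$ and $\mathbf{B}_{ori}$, the first row of (18) is minimized if $\mathbf{C}^{(1)} = \mathbf{C}_{ori}$. To minimize the second row, according to (12), the optimal solution of $\mathbf{C}^{(2)}$ is

$$\mathbf{C}_{new} = \mathbf{X}_{new(3)} \left((\mathbf{B} \odot \mathbf{A})^{T}\right)^{\dagger},$$

where the symbol "$\dagger$" denotes the Moore-Penrose pseudoinverse of a matrix [51]. So, $\mathbf{C}$ can be updated by appending $\mathbf{C}_{new}$, which is represented by

$$\mathbf{C} \leftarrow \begin{bmatrix} \mathbf{C}_{ori} \\ \mathbf{C}_{new} \end{bmatrix} = \begin{bmatrix} \mathbf{C}_{ori} \\ \mathbf{X}_{new(3)} \left((\mathbf{B} \odot \mathbf{A})^{T}\right)^{\dagger} \end{bmatrix}. \tag{19}$$

Second, the factor matrix $\mathbf{A}$ is computed while the factor matrices $\mathbf{B}$ and $\mathbf{C}$ are fixed. Similar to (16), the loss function for estimating $\mathbf{A}$ is written as

$$L = \frac{1}{2} \left\| \mathbf{X}_{(1)} - \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T} \right\|_F^2. \tag{20}$$

Differentiating $L$ with respect to $\mathbf{A}$, we have

$$\frac{\partial L}{\partial \mathbf{A}} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}(\mathbf{C} \odot \mathbf{B}) - \mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B}). \tag{21}$$

To simplify equation (21), denote $\mathbf{P} = \mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})$ and $\mathbf{Q} = (\mathbf{C} \odot \mathbf{B})^{T}(\mathbf{C} \odot \mathbf{B})$; thus, when $\partial L / \partial \mathbf{A} = 0$, we have $\mathbf{A} = \mathbf{P}\mathbf{Q}^{-1}$. According to [47], $\mathbf{Q}$ can be rewritten as

$$\mathbf{Q} = (\mathbf{C}^{T}\mathbf{C}) \circledast (\mathbf{B}^{T}\mathbf{B}), \tag{22}$$

where "$\circledast$" denotes the Hadamard (elementwise) product. To compute $\mathbf{P}$, we also divide $\mathbf{X}_{(1)}$ and $\mathbf{C}$ into two blocks; i.e.,

$$\mathbf{P} = \begin{bmatrix} \mathbf{X}_{ori(1)} & \mathbf{X}_{new(1)} \end{bmatrix} \left( \begin{bmatrix} \mathbf{C}_{ori} \\ \mathbf{C}_{new} \end{bmatrix} \odot \mathbf{B} \right) = \mathbf{X}_{ori(1)}(\mathbf{C}_{ori} \odot \mathbf{B}) + \mathbf{X}_{new(1)}(\mathbf{C}_{new} \odot \mathbf{B}). \tag{23}$$

Figure 1: CP factorization of a third-order tensor.

Figure 3: Flowchart of CTCF-based detection method.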

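Since $\mathbf{B}$ is fixed as $\mathbf{B}_{ori}$, the first term of equation (23) contains only information from the original tensor, which can be expressed by

$$\mathbf{P}_{ori} = \mathbf{X}_{ori(1)}(\mathbf{C}_{ori} \odot \mathbf{B}_{ori}), \tag{24}$$

so equation (23) is rewritten as

$$\mathbf{P} = \mathbf{P}_{ori} + \mathbf{X}_{new(1)}(\mathbf{C}_{new} \odot \mathbf{B}_{ori}). \tag{25}$$

Hence, $\mathbf{P}$ can be updated from $\mathbf{P}_{ori}$ using the mode-1 unfolding matrix of $\mathcal{X}_{new}$ and the factor matrix $\mathbf{C}_{new}$ obtained above. Generally, $\mathbf{P}$ is initialized from $\mathcal{X}(\tau) \in \mathbb{R}^{I \times J \times \tau}$, a small leading part of $\mathcal{X}_{ori}$, and then updated iteratively by (25). Analogously, with $\circledast$ denoting the Hadamard (elementwise) product, the update of $\mathbf{Q}$ can be represented by

$$\mathbf{Q} = \mathbf{Q}_{ori} + (\mathbf{C}_{new}^{T}\mathbf{C}_{new}) \circledast (\mathbf{B}_{ori}^{T}\mathbf{B}_{ori}). \tag{26}$$

The update of $\mathbf{A}$ may be summarized as

$$\mathbf{P} \leftarrow \mathbf{P}_{ori} + \mathbf{X}_{new(1)}(\mathbf{C}_{new} \odot \mathbf{B}_{ori}), \quad \mathbf{Q} \leftarrow \mathbf{Q}_{ori} + (\mathbf{C}_{new}^{T}\mathbf{C}_{new}) \circledast (\mathbf{B}_{ori}^{T}\mathbf{B}_{ori}), \quad \mathbf{A} \leftarrow \mathbf{P}\mathbf{Q}^{-1}. \tag{27}$$

Finally, the update of the factor matrix $\mathbf{B}$ may likewise be expressed by

$$\mathbf{U} \leftarrow \mathbf{U}_{ori} + \mathbf{X}_{new(2)}(\mathbf{C}_{new} \odot \mathbf{A}), \quad \mathbf{V} \leftarrow \mathbf{V}_{ori} + (\mathbf{C}_{new}^{T}\mathbf{C}_{new}) \circledast (\mathbf{A}^{T}\mathbf{A}), \quad \mathbf{B} \leftarrow \mathbf{U}\mathbf{V}^{-1}, \tag{28}$$

where $\mathbf{U} = \mathbf{X}_{(2)}(\mathbf{C} \odot \mathbf{A})$ and $\mathbf{V} = (\mathbf{C} \odot \mathbf{A})^{T}(\mathbf{C} \odot \mathbf{A})$. To make the process clearer, the proposed CTCF of a third-order tensor is summarized in Algorithm 1.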
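The incremental update of the temporal factor followed by the two nontemporal factors can be sketched in NumPy as below. This is an illustrative reading of the update rules under assumptions, not the authors' code: the names `ctcf_init` and `ctcf_update` are hypothetical, the auxiliary matrices are initialized from the whole original tensor rather than from a small leading part, and pseudoinverses replace explicit inverses.

```python
import numpy as np

def khatri_rao(C, B):
    """Khatri-Rao product of C (K x R) and B (J x R), giving a (K*J x R) matrix."""
    K, R = C.shape
    return (C[:, None, :] * B[None, :, :]).reshape(K * B.shape[0], R)

def unfold(T, n):
    """Mode-n unfolding with earlier modes varying fastest along the columns."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def ctcf_init(X_ori, A, B, C):
    """Auxiliary matrices P, Q (for A) and U, V (for B) from an initial CP model."""
    P = unfold(X_ori, 0) @ khatri_rao(C, B)
    Q = (C.T @ C) * (B.T @ B)
    U = unfold(X_ori, 1) @ khatri_rao(C, A)
    V = (C.T @ C) * (A.T @ A)
    return P, Q, U, V

def ctcf_update(X_new, A, B, C, P, Q, U, V):
    """Absorb new frames X_new (I x J x T_new) into an existing CP model."""
    # Temporal factor: least-squares rows for the new frames, A and B fixed
    C_new = unfold(X_new, 2) @ np.linalg.pinv(khatri_rao(B, A).T)
    # Factor A: accumulate P and Q, then solve A = P Q^{-1}
    P = P + unfold(X_new, 0) @ khatri_rao(C_new, B)
    Q = Q + (C_new.T @ C_new) * (B.T @ B)
    A = P @ np.linalg.pinv(Q)
    # Factor B: accumulate U and V using the updated A
    U = U + unfold(X_new, 1) @ khatri_rao(C_new, A)
    V = V + (C_new.T @ C_new) * (A.T @ A)
    B = U @ np.linalg.pinv(V)
    C = np.vstack([C, C_new])
    return A, B, C, P, Q, U, V

# Usage: an exactly low-rank video whose last two frames arrive later
rng = np.random.default_rng(3)
I, J, T_ori, T_new, R = 5, 4, 6, 2, 2
A0 = rng.standard_normal((I, R))
B0 = rng.standard_normal((J, R))
C0 = rng.standard_normal((T_ori + T_new, R))
X_full = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
X_ori, X_new = X_full[:, :, :T_ori], X_full[:, :, T_ori:]

state = ctcf_init(X_ori, A0, B0, C0[:T_ori])
A1, B1, C1, *state = ctcf_update(X_new, A0, B0, C0[:T_ori], *state)
assert np.allclose(np.einsum("ir,jr,kr->ijk", A1, B1, C1), X_full)
```

The point of the design is that each update touches only the new frames and a few small R x R or I x R matrices, so the cost does not grow with the number of frames already absorbed.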

CTCF of Nth-Order Tensor.
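On the basis of Section 2.2.2, we extend CTCF to higher-order tensors. Suppose an $N$th-order cumulative tensor $\mathcal{X}_{ori} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_{N-1} \times T_{ori}}$, where the last dimension is the temporal dimension. The CP factorization of $\mathcal{X}_{ori}$ is represented as

$$\mathcal{X}_{ori} \approx [\![\mathbf{A}^{(1)}_{ori}, \ldots, \mathbf{A}^{(N-1)}_{ori}, \mathbf{A}^{(N)}_{ori}]\!]. \tag{29}$$

Similar to Section 2.2.2, the temporal factor matrix $\mathbf{A}^{(N)}$ is updated first, with the other $N - 1$ factor matrices fixed. As in (17), the optimization problem of estimating $\mathbf{A}^{(N)}$ is formulated as

$$\mathbf{A}^{(N)} \leftarrow \arg\min_{\mathbf{A}^{(N)}} \frac{1}{2} \left\| \mathbf{X}_{(N)} - \mathbf{A}^{(N)} \left(\mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T} \right\|_F^2. \tag{30}$$

We also separate the original part from the newly added part; i.e.,

$$\mathbf{A}^{(N)} \leftarrow \arg\min_{\mathbf{A}^{(N)}} \frac{1}{2} \left\| \begin{bmatrix} \mathbf{X}_{ori(N)} - \mathbf{A}^{(N)}_{ori} \left(\mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T} \\ \mathbf{X}_{new(N)} - \mathbf{A}^{(N)}_{new} \left(\mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T} \end{bmatrix} \right\|_F^2. \tag{31}$$

The original part is minimized by fixing the first $N - 1$ factor matrices, and the new part $\mathbf{A}^{(N)}_{new}$ is obtained from $\mathcal{X}_{new}$ in the same way as $\mathbf{C}_{new}$ in (19). The updates of the nontemporal factor matrices $\mathbf{A}^{(n)}$ ($n \in [1, N - 1]$) follow those of the factor matrices $\mathbf{A}$ and $\mathbf{B}$ in Section 2.2.2. The loss function $L^{(n)}$ for estimating $\mathbf{A}^{(n)}$ is the same as (16). Letting $\partial L^{(n)} / \partial \mathbf{A}^{(n)} = 0$ and introducing the matrices $\mathbf{P}^{(n)}$ and $\mathbf{Q}^{(n)}$, the update of $\mathbf{A}^{(n)}$ may be summarized as

$$\mathbf{A}^{(n)} \leftarrow \mathbf{P}^{(n)} \left(\mathbf{Q}^{(n)}\right)^{-1}, \tag{32}$$

where

$$\mathbf{P}^{(n)} \leftarrow \mathbf{P}^{(n)}_{ori} + \mathbf{X}_{new(n)} \left(\mathbf{A}^{(N)}_{new} \odot \mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right), \quad \mathbf{Q}^{(n)} \leftarrow \mathbf{Q}^{(n)}_{ori} + \left(\mathbf{A}^{(N)T}_{new} \mathbf{A}^{(N)}_{new}\right) \circledast \left( \circledast_{i \ne n,\ i \le N-1} \mathbf{A}^{(i)T} \mathbf{A}^{(i)} \right). \tag{33}$$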

CTCF-Based Detection Method.
In HSV, the sequential data is expressed as a 4D cumulative tensor whose temporal dimension grows as new frames are added. Whenever a new frame arrives, the CP factorization of the original cumulative tensor is updated to obtain the factor matrices of the new cumulative tensor, and the CP tensor approximation of the newly added frame is obtained at the same time. If the target is absent, the CP tensor approximation leads to a small error, since the background information is similar between adjacent frames. Conversely, if the error is large, the target is likely to be present. We define the fitness between the new frame and its approximation in equation (34). If the fitness is smaller than the threshold, the target is deemed to appear in the new frame. Otherwise, the new frame is appended along the temporal dimension and used to update the original cumulative tensor. The original 4D cumulative tensor is denoted by $\mathcal{X}_{ori} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times n}$, where $n$ denotes the number of frames in the initial video.
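The factor matrices of the four dimensions are represented as

$$\mathcal{X}_{ori} \approx [\![\mathbf{A}^{(1)}_{ori}, \mathbf{A}^{(2)}_{ori}, \mathbf{A}^{(3)}_{ori}, \mathbf{A}^{(4)}_{ori}]\!],$$

where $\mathbf{A}^{(1)}_{ori} \in \mathbb{R}^{I_1 \times R}$, $\mathbf{A}^{(2)}_{ori} \in \mathbb{R}^{I_2 \times R}$, $\mathbf{A}^{(3)}_{ori} \in \mathbb{R}^{I_3 \times R}$, and $\mathbf{A}^{(4)}_{ori} \in \mathbb{R}^{n \times R}$, and $R$ denotes the number of rank-one component tensors in the CP factorization. When a new frame $\mathcal{X}_{new} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is appended along the temporal dimension of the original 4D cumulative tensor, the 4D cumulative tensor is updated and denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times (n+1)}$. The factor matrices of $\mathcal{X}$ are expressed by

$$\mathcal{X} \approx [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)}, \mathbf{A}^{(4)}]\!],$$

where $\mathbf{A}^{(4)} \in \mathbb{R}^{(n+1) \times R}$. We use CTCF to update $\mathbf{A}^{(1)}$, $\mathbf{A}^{(2)}$, $\mathbf{A}^{(3)}$, and $\mathbf{A}^{(4)}$ and obtain the approximations of $\mathcal{X}$ and $\mathcal{X}_{new}$, where $\hat{\mathcal{X}}_{new} \approx [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)}, \mathbf{A}^{(4)}(n + 1, :)]\!]$. This is the specific case of the $N$th-order CTCF with $N = 4$. We define the fitness between $\mathcal{X}_{new}$ and $\hat{\mathcal{X}}_{new}$ as

$$\mathrm{fitness}(\mathcal{X}_{new}, \hat{\mathcal{X}}_{new}) = 1 - \frac{\|\mathcal{X}_{new} - \hat{\mathcal{X}}_{new}\|_F}{\|\mathcal{X}_{new}\|_F}. \tag{34}$$

If the target does not appear, the approximation error is small and the fitness is large. Given a preset threshold $\eta$, when $\mathrm{fitness}(\mathcal{X}_{new}, \hat{\mathcal{X}}_{new}) > \eta$, we decide that the target is absent. Then, the nontarget frame is appended along the temporal dimension and the updated 4D cumulative tensor becomes the new original 4D cumulative tensor, which can be expressed as

$$\mathcal{X}_{ori} \leftarrow \mathcal{X}. \tag{35}$$

If the target appears, the approximation error is large and the fitness is smaller than $\eta$. The residual between $\mathcal{X}_{new}$ and $\hat{\mathcal{X}}_{new}$ is the approximation of the target tensor; i.e.,

$$\mathcal{X}_{target} \approx \mathcal{X}_{new} - \hat{\mathcal{X}}_{new}. \tag{36}$$

The target of each frame is shown in 2D form by taking the maximum value of every spectrum. In this way, the proposed CTCF-based detection method can extract not only the key-frames in which the target is present, but also the approximate region of the target in every key-frame. The flowchart of the proposed method is shown in Figure 3. In Section 3, experiments on real HSV data are conducted and the proposed method is compared with some representative techniques.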
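The key-frame decision for one incoming frame can be sketched as follows. This is a hedged illustration: it assumes the fitness is one minus the relative Frobenius error, which matches the description that a small approximation error gives a large fitness, and the function names, the tie-breaking at exactly the threshold, and the toy data are hypothetical.

```python
import numpy as np

def approximate_frame(A1, A2, A3, a4_row):
    """CP approximation of one frame from three factor matrices and the
    row of the temporal factor matrix that belongs to that frame."""
    return np.einsum("ir,jr,kr,r->ijk", A1, A2, A3, a4_row)

def fitness(X_new, X_hat):
    """One minus the relative Frobenius error of the approximation."""
    return 1.0 - np.linalg.norm(X_new - X_hat) / np.linalg.norm(X_new)

def detect(X_new, X_hat, eta=0.9):
    """Return (is_key_frame, target_map).

    A frame whose fitness exceeds eta is treated as background. Otherwise
    the residual approximates the target, shown in 2D by taking the
    maximum over the spectral axis (assumed here to be the last axis).
    """
    if fitness(X_new, X_hat) > eta:
        return False, None
    residual = X_new - X_hat
    return True, residual.max(axis=2)

# Toy example: a flat rank-one background and a frame with one bright pixel
A1, A2, A3 = np.ones((4, 1)), np.ones((5, 1)), np.ones((3, 1))
X_hat = approximate_frame(A1, A2, A3, np.array([2.0]))

background_frame = X_hat.copy()
target_frame = X_hat.copy()
target_frame[1, 2, :] += 10.0        # hypothetical plume pixel

print(detect(background_frame, X_hat)[0])   # background frame
is_key, target_map = detect(target_frame, X_hat)
print(is_key, np.unravel_index(target_map.argmax(), target_map.shape))
```

In a full pipeline, frames classified as background would be passed on to the cumulative update, while key-frames would be set aside for super-resolution.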

2.3. The Proposed STTF-Based Super-Resolution Method. In Section 2.2, we presented an approach to detect the frames of the HSV in which the target appears, together with the approximate region of the target. However, as discussed in Section 1, there has to be a tradeoff between spectral resolution and spatial resolution in HSI imaging systems [52]. The spatial resolution is usually low, since high spectral resolution is required in HSIs and HSV. We are therefore interested in improving the spatial resolution of the targets after the detection process. Instead of fusing an HR-MSI and an LR-HSI, we handle the target SR problem using only the data at hand, which is more practical in real cases.

Problem Formulation.
In this subsection, HSIs are represented as 3D tensors with three indexes $(H, W, S)$, which stand for the height, width, and spectral modes. $\mathcal{X} \in \mathbb{R}^{H \times W \times S}$ denotes the HR-HSI and $\mathcal{Y} \in \mathbb{R}^{h \times w \times S}$ denotes the LR-HSI, where $W > w$ and $H > h$. The goal is to estimate $\mathcal{X}$ from $\mathcal{Y}$.
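There are two significant characteristics of HR-HSIs [53]: first, spectral vectors can be well approximated in low-dimensional subspaces; second, HSIs are spatially self-similar. This means that sparsity exists in both the spectral and spatial dimensions. Inspired by sparse representation [54], the low dimensionality in the spectral domain makes it possible to form a spectral mode dictionary $\mathbf{S}$ with few atoms; the self-similarity in the spatial domain guarantees sparse representations of the height and width modes with spatial dictionaries $\mathbf{H}$ and $\mathbf{W}$. In this way, the conventional Tucker factorization is transformed into the multiplication of a core tensor by the three modes dictionaries. The factorization is illustrated in Figure 4. The HR-HSI is represented as

$$\mathcal{X} = \mathcal{Z} \times_1 \mathbf{H} \times_2 \mathbf{W} \times_3 \mathbf{S}, \tag{37}$$

where $\mathbf{H} \in \mathbb{R}^{H \times z_h}$, $\mathbf{W} \in \mathbb{R}^{W \times z_w}$, and $\mathbf{S} \in \mathbb{R}^{S \times z_s}$. The variables $z_h$, $z_w$, and $z_s$ denote the numbers of atoms (i.e., the numbers of columns) of $\mathbf{H}$, $\mathbf{W}$, and $\mathbf{S}$, respectively. The core tensor $\mathcal{Z} \in \mathbb{R}^{z_h \times z_w \times z_s}$ contains the coefficients of $\mathcal{X}$ over the three modes dictionaries. Equation (37) thus incorporates the information of the separate modes into a unified framework. The LR key-frame $\mathcal{Y}$ of the HSV can be seen as a spatially downsampled version of the HR-HSI $\mathcal{X}$, which is written as

$$\mathcal{Y} = \mathcal{X} \times_1 \mathbf{D}_1 \times_2 \mathbf{D}_2, \tag{38}$$

where $\mathbf{D}_1 \in \mathbb{R}^{h \times H}$ and $\mathbf{D}_2 \in \mathbb{R}^{w \times W}$ are the downsampling matrices of the height and width modes. Substituting (37) into (38), $\mathcal{Y}$ is represented by

$$\mathcal{Y} = \mathcal{Z} \times_1 (\mathbf{D}_1\mathbf{H}) \times_2 (\mathbf{D}_2\mathbf{W}) \times_3 \mathbf{S} = \mathcal{Z} \times_1 \mathbf{H}^{*} \times_2 \mathbf{W}^{*} \times_3 \mathbf{S}, \tag{39}$$

where $\mathbf{H}^{*} = \mathbf{D}_1\mathbf{H} \in \mathbb{R}^{h \times z_h}$ and $\mathbf{W}^{*} = \mathbf{D}_2\mathbf{W} \in \mathbb{R}^{w \times z_w}$ denote the downsampled dictionaries of the height and width modes. To recover $\mathcal{X}$, we focus on estimating the dictionaries $\mathbf{H}$, $\mathbf{W}$, and $\mathbf{S}$ and the core tensor $\mathcal{Z}$.

ALGORITHM 1: CTCF of third-order tensor.
Input: original 3D cumulative tensor $\mathcal{X}_{ori} \in \mathbb{R}^{I \times J \times T_{ori}} \approx [\![\mathbf{A}_{ori}, \mathbf{B}_{ori}, \mathbf{C}_{ori}]\!]$; new tensor $\mathcal{X}_{new} \in \mathbb{R}^{I \times J \times T_{new}}$
Step 1: the new tensor is appended along the time dimension and $\mathcal{X} \in \mathbb{R}^{I \times J \times (T_{ori} + T_{new})}$ is obtained
Step 2: decompose $\mathcal{X}$ by CP factorization, $\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$
Step 3: update $\mathbf{C}$ by (19), with $\mathbf{A}$ and $\mathbf{B}$ fixed
Step 4: update $\mathbf{A}$ by (27), with $\mathbf{B}$ and $\mathbf{C}$ fixed
Step 5: update $\mathbf{B}$ by (28), with $\mathbf{A}$ and $\mathbf{C}$ fixed
Step 6: estimate $\mathcal{X}$ from the updated $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$
Output: approximation of the updated cumulative tensor $\mathcal{X}$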
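Returning to the super-resolution formulation, the statement that downsampling the HR frame is equivalent to downsampling the two spatial dictionaries can be verified with a short sketch. The block-averaging operator built by `averaging_matrix`, the helper `tucker`, and all sizes are assumptions made for illustration; the paper does not specify the downsampling matrices here.

```python
import numpy as np

def tucker(Z, H, W, S):
    """Core tensor multiplied by the height, width, and spectral dictionaries."""
    return np.einsum("pqr,ip,jq,kr->ijk", Z, H, W, S)

def averaging_matrix(n_low, factor):
    """Hypothetical downsampling matrix: each LR sample averages `factor` HR samples."""
    D = np.zeros((n_low, n_low * factor))
    for i in range(n_low):
        D[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return D

rng = np.random.default_rng(0)
h, w, bands, factor = 8, 8, 5, 2          # LR size, band count, scale factor
zh, zw, zs = 6, 6, 3                      # numbers of dictionary atoms
H = rng.standard_normal((h * factor, zh))
W = rng.standard_normal((w * factor, zw))
S = rng.standard_normal((bands, zs))
Z = rng.standard_normal((zh, zw, zs))
D1, D2 = averaging_matrix(h, factor), averaging_matrix(w, factor)

X = tucker(Z, H, W, S)                                  # HR frame
Y_direct = np.einsum("hi,wj,ijk->hwk", D1, D2, X)       # downsample the frame
Y_dict = tucker(Z, D1 @ H, D2 @ W, S)                   # downsample the dictionaries
assert np.allclose(Y_direct, Y_dict)
```

Because the core tensor and the spectral dictionary are shared between the two resolutions, estimating them from the LR frame constrains the HR frame as well.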

The Proposed STTF-Based SR Algorithm.
Since $\mathcal{Y}$ is a downsampled version of $\mathcal{X}$, recovering $\mathcal{X}$ from $\mathcal{Y}$ is a typical inverse problem that is severely ill-posed. Some prior knowledge of $\mathcal{X}$ is therefore needed to regularize the super-resolution problem. In HSI processing, spectral sparsity is a widely used regularizer for a variety of ill-posed problems [55][56][57][58]. In such regularization, each spectral vector is expressed as a linear combination of a small number of distinct spectral signatures. However, these schemes only take advantage of sparsity in the spectral domain. In the proposed algorithm, taking the self-similarity of HSIs into account, sparsity regularization is extended to the spatial domain through the sparse-based tensor Tucker factorization (STTF). In STTF, the HR-HSI admits a joint sparse representation in terms of the core tensor and the three modes dictionaries.
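On the basis of equation (39), the HSV frame super-resolution is formulated as a constrained least-squares optimization problem:

$$\min_{\mathbf{H}, \mathbf{W}, \mathbf{S}, \mathcal{Z}} \left\| \mathcal{Y} - \mathcal{Z} \times_1 \mathbf{H}^{*} \times_2 \mathbf{W}^{*} \times_3 \mathbf{S} \right\|_F^2 \quad \text{s.t.} \quad \|\mathcal{Z}\|_0 \le \theta, \tag{40}$$

where $\|\cdot\|_F$ represents the Frobenius norm and $\theta$ denotes the maximum number of nonzero elements of $\mathcal{Z}$. Because of the $\ell_0$-norm constraint, equation (40) is nonconvex. To make the optimization tractable, the $\ell_0$-norm is replaced by the $\ell_1$-norm and (40) is transformed into an unconstrained version:

$$\min_{\mathbf{H}, \mathbf{W}, \mathbf{S}, \mathcal{Z}} \left\| \mathcal{Y} - \mathcal{Z} \times_1 \mathbf{H}^{*} \times_2 \mathbf{W}^{*} \times_3 \mathbf{S} \right\|_F^2 + \lambda \|\mathcal{Z}\|_1, \tag{41}$$

where $\lambda$ is the parameter of the sparse regularizer. Equation (41) is also nonconvex, and the solutions for $\mathbf{H}$, $\mathbf{W}$, $\mathbf{S}$, and $\mathcal{Z}$ are not unique. Nonetheless, if we focus on only one variable with the other variables fixed, the objective function in equation (41) is convex. Inspired by [59,60], equation (41) can be solved by a proximal alternating optimization scheme, which is guaranteed to converge under certain conditions. Concretely, $\mathbf{H}$, $\mathbf{W}$, $\mathbf{S}$, and $\mathcal{Z}$ are updated iteratively by

$$\begin{aligned} \mathbf{H} &= \arg\min_{\mathbf{H}} f(\mathbf{H}, \mathbf{W}_{pre}, \mathbf{S}_{pre}, \mathcal{Z}_{pre}) + \alpha \|\mathbf{H} - \mathbf{H}_{pre}\|_F^2, \\ \mathbf{W} &= \arg\min_{\mathbf{W}} f(\mathbf{H}, \mathbf{W}, \mathbf{S}_{pre}, \mathcal{Z}_{pre}) + \alpha \|\mathbf{W} - \mathbf{W}_{pre}\|_F^2, \\ \mathbf{S} &= \arg\min_{\mathbf{S}} f(\mathbf{H}, \mathbf{W}, \mathbf{S}, \mathcal{Z}_{pre}) + \alpha \|\mathbf{S} - \mathbf{S}_{pre}\|_F^2, \\ \mathcal{Z} &= \arg\min_{\mathcal{Z}} f(\mathbf{H}, \mathbf{W}, \mathbf{S}, \mathcal{Z}) + \alpha \|\mathcal{Z} - \mathcal{Z}_{pre}\|_F^2, \end{aligned} \tag{42}$$

where $(\cdot)_{pre}$ denotes the estimate from the previous iteration and $\alpha$ denotes a positive number. Equation (41) defines the objective function $f(\mathbf{H}, \mathbf{W}, \mathbf{S}, \mathcal{Z})$. The optimizations of $\mathbf{H}$, $\mathbf{W}$, $\mathbf{S}$, and $\mathcal{Z}$ are presented in detail in the appendix. The conjugate gradient (CG) method [61] and the alternating direction method of multipliers (ADMM) [62] are used in these optimizations.

Since equation (41) is nonconvex, a careless initialization may lead to poor local minima. In this paper, we initialize the spatial dictionaries $\mathbf{H}^{*}$ and $\mathbf{W}^{*}$ from the unfoldings $\mathbf{Y}_{(1)}$ and $\mathbf{Y}_{(2)}$ using the dictionary-updates-cycles KSVD (DUC-KSVD) method [63], which promotes sparse representations. Then, the spectral dictionary $\mathbf{S}$ is initialized by the simplex identification via split augmented Lagrangian (SISAL) algorithm [64], which efficiently identifies a minimum-volume simplex that contains the spectral vectors. The proposed STTF-based SR algorithm is summarized in Algorithm 2.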
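To make the role of the $\ell_1$ term and the proximal weight $\alpha$ concrete, the sketch below solves the core-tensor subproblem with the dictionaries held fixed. Note the substitution: the paper solves its subproblems with CG and ADMM, whereas this example uses plain iterative soft-thresholding (ISTA), chosen only because it is short. The function names, step-size rule, and problem sizes are assumptions.

```python
import numpy as np

def synth(Z, H, W, S):
    """Frame synthesized from the core tensor and the three dictionaries."""
    return np.einsum("pqr,ip,jq,kr->ijk", Z, H, W, S)

def soft_threshold(Z, t):
    """Proximal operator of t * ||Z||_1 (elementwise shrinkage toward zero)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def objective(Y, Z, H, W, S, lam):
    """Data-fit term plus l1 penalty on the core tensor."""
    return np.linalg.norm(Y - synth(Z, H, W, S)) ** 2 + lam * np.abs(Z).sum()

def update_core_ista(Y, H, W, S, Z_pre, lam, alpha, n_iter=200):
    """ISTA on ||Y - synth(Z)||_F^2 + lam*||Z||_1 + alpha*||Z - Z_pre||_F^2."""
    # Lipschitz constant of the gradient of the smooth part
    spec = np.linalg.norm(H, 2) * np.linalg.norm(W, 2) * np.linalg.norm(S, 2)
    L = 2.0 * spec ** 2 + 2.0 * alpha
    Z = Z_pre.copy()
    for _ in range(n_iter):
        residual = synth(Z, H, W, S) - Y
        grad = 2.0 * np.einsum("ijk,ip,jq,kr->pqr", residual, H, W, S)
        grad += 2.0 * alpha * (Z - Z_pre)
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z

rng = np.random.default_rng(5)
H = rng.standard_normal((6, 8))       # stands in for the LR height dictionary
W = rng.standard_normal((6, 8))       # stands in for the LR width dictionary
S = rng.standard_normal((4, 3))
Z_true = np.zeros((8, 8, 3))
Z_true[1, 2, 0], Z_true[5, 5, 2] = 3.0, -2.0    # sparse ground-truth core
Y = synth(Z_true, H, W, S)

Z0 = np.zeros_like(Z_true)
Z1 = update_core_ista(Y, H, W, S, Z0, lam=0.1, alpha=1e-3)
print(objective(Y, Z0, H, W, S, 0.1), objective(Y, Z1, H, W, S, 0.1))
```

With a step size of 1/L the penalized objective cannot increase from one iteration to the next, which makes the routine a safe if slow stand-in for the ADMM solver.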

Experimental Data Set.
To highlight the advantages of HSIs, we choose an invisible gas plume as the target. The proposed algorithms can reasonably be extended to other types of data. In this section, the HSV data set is acquired by the infrared imaging spectrometer "HyperCam-LW." Sulfur hexafluoride (SF$_6$) is chosen as the target, since it is an odorless and colorless gas with a distinct absorption peak in the LWIR range. The HSV data set consists of 60 infrared hyperspectral frames. The imaging interval is 4.8 s, and the wavelength of the data ranges from 7.8 μm to 11.8 μm.
In the SR method, only the middle 128 × 128 pixels are used in the experiment (specifically, columns 71 to 198) for reasons connected with the algorithm process. We also remove spectral bands 41 to 127 because of water vapor absorption and extremely low SNR. As a result, the size of the input LR-HSI is 128 × 128 × 40.

Compared Methods.
For CTCF-based detection method, we compare it with two representative methods: MSD (matched subspace detector) [25] and CMF (constrained matrix factorization) [29]. For STTF-based SR method, we compare it with three state-of-the-art algorithms: bicubic interpolation, sparse representation-based SR method [54], and sequence information-based SR method [65].

Qualitative and Quantitative Metrics. For detection methods, receiver operating characteristic (ROC) curves [66] are used to evaluate the performance. Generally, one detector outperforms another if the area under its ROC curve is larger [67]. As suggested in [68], the area under the ROC curve (AUC) is therefore also calculated as a measure of the performance of these detection methods. Usually, a better detector obtains a higher AUC value.
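As a small illustration of the AUC metric, the sketch below computes it from detector scores and ground-truth labels using the pairwise-ranking interpretation: the AUC equals the probability that a randomly chosen target pixel scores higher than a randomly chosen background pixel, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical, and the quadratic loop is meant for clarity rather than for full-size frames.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and binary labels (1 = target)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both target and background samples are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # target ranked above background
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy detector output for four pixels
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))
```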
For SR algorithms, since we process the LR-HSI directly, there is no original HR-HSI (i.e., ground truth) for reference. Thus, some popular quantitative metrics are not available, such as RMSE (root-mean-square error) [69], PSNR (peak signal-to-noise ratio), and SAM (spectral angle mapper). In this section, entropy and average gradient are introduced to evaluate the performance of the SR methods.

Entropy.
Super-resolution aims to introduce more useful information into images, so we may measure the performance of SR methods by calculating the information contained in the experimental results. The entropy is expressed as

$$E = -\sum_{i=0}^{n} P(i) \log_2 P(i), \tag{43}$$

where $P(i)$ denotes the probability of grey value $i$ in the image and $n$ denotes the maximum grey value (the grey values range over 0 ∼ 255). The larger the entropy value of the image, the richer the information contained in the image.
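The entropy of a grey-level image can be computed from its histogram, as in the following sketch. It assumes an 8-bit image with integer grey values and a base-2 logarithm, so the result is in bits; the function name `image_entropy` is hypothetical.

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy (in bits) of the grey-level histogram of an integer image."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                      # grey values that never occur contribute zero
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((16, 16), dtype=np.uint8)                  # a single grey value
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)      # every grey value once
print(image_entropy(flat), image_entropy(ramp))
```

A constant image carries no information under this measure, while an image that uses all 256 grey values equally often reaches the maximum of 8 bits.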

Average Gradient.
Another way to assess the performance of super-resolution is to measure the change in the amount of detail in the image. We may evaluate the experimental results by the average gradient, since it reflects the ability of the image to express details and measures its clarity. The gradient increases if the grey level varies quickly in one direction of the image. The average gradient is formulated as

$$\bar{G} = \frac{1}{(m-1)(n-1)} \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \sqrt{\frac{(f_{i+1,j} - f_{i,j})^2 + (f_{i,j+1} - f_{i,j})^2}{2}}, \tag{44}$$

where $m$ and $n$ denote the height and width of the image, respectively, and $f_{i,j}$ denotes the grey value of pixel $(i, j)$ in the image. The larger the average gradient of the image, the clearer the image. In addition, the visual quality of the output images is an important qualitative metric.
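A direct implementation of the average gradient is sketched below. It assumes forward differences in the two spatial directions and normalization by the number of interior difference pairs, which is one common form of this metric; the function name `average_gradient` is hypothetical.

```python
import numpy as np

def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences of a 2D image."""
    f = np.asarray(img, dtype=float)
    dx = f[1:, :-1] - f[:-1, :-1]     # difference along the height direction
    dy = f[:-1, 1:] - f[:-1, :-1]     # difference along the width direction
    return float(np.sqrt((dx ** 2 + dy ** 2) / 2.0).mean())

flat = np.full((8, 8), 100.0)                         # no detail at all
ramp = np.tile(np.arange(8.0)[:, None], (1, 8))       # grey level grows by 1 per row
print(average_gradient(flat), average_gradient(ramp))
```

For the ramp image every vertical difference is 1 and every horizontal difference is 0, so the metric equals the square root of one half.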

Parameters Setting.
In MSD, we pick 463 spectra of the gas target and 846 spectra of the background from the 12th frame of the HSV to build the training set. The sizes of the target subspace and the background subspace are 127 × 112 and 127 × 115, respectively. In CMF, the number of endmembers is 3, the sparsity of the factor matrices is 2, and the number of iterations is 3. In the proposed CTCF-based method, the factorization of the original cumulative tensor is obtained by ALS, with a tensor rank of 3, a maximum of 100 iterations, and a reconstruction error tolerance of $10^{-8}$; in the update stage, the fitness threshold is 0.9. In the proposed STTF-based SR method, the number of iterations is 5; the parameter $\alpha$, which is the weight in (42), is set to $\alpha = 10^{-3}$; the parameter $\lambda$, which controls the sparsity of $\mathcal{Z}$, is set to $\lambda = 10^{-5}$; the parameter $\mu$ is set to $\mu = 10^{-2}$; and the size of $\mathcal{Z}$ is set by $z_h = 240$, $z_w = 240$, and $z_s = 12$. These parameters were chosen after a sufficient number of experiments to balance efficiency and stability.

Experimental Results and Discussion.
In this subsection, we show the experimental results of the various methods for detection and super-resolution. After processing the HSV by the proposed CTCF-based method, we compute the values of Frobenius norm of each frame, which are presented in Figure 5. It is obvious that target gas appears in the 12th frame and disappears in the 51st frame. Figure 6 compares the ROC curves of test methods on four frames in detail, and Figure 7 illustrates the general trends of ROC curves of MSD, CMF, and CTCF, respectively. As can be seen from Figures 6 and 7, the proposed CTCF-based detection algorithm outperforms the other two methods. e AUC values of three approaches are shown in Table 1. In each row, the bold value represents the highest AUC value. Although the AUC values of CMF in some frames are better, we can see that the AUC values of CMF in some other frames are very low (less than 0.98). On the contrast, all the results of CTCF lie in the range of 0.98 to 1. From the average value and the variance (the bold value represents the highest value), we can conclude that the proposed method is superior and more stable. e graphical results are illustrated in Figure 8. e target of each key-frame is shown in 2D form (grey image) by taking the maximum value of every spectrum. To save the length of the paper, we choose 8 frames to show the comparison of three detectors, which are shown in Figure 9.
The first to eighth rows present the detection results of the chosen frames, whose frame numbers are 15, 18, 22, 28, 31, 39, 48, and 50. The higher the grey level of a pixel in the image, the closer it is to the target. It is apparent that our method extracts more accurate targets. Table 2 shows the entropy and average gradient of the key-frames obtained by four SR algorithms. Since the sequence-based method needs 5 LR frames to form 1 HR frame, the range of compared frames is changed from 12∼50 to 14∼48. In each row, the bold values represent the highest entropy value and the highest average gradient value. From Table 2, we can conclude the following: first, although interpolation can add more information to the frame, the details of the target are lost; second, sparse representation SR and sequence information SR have almost the same entropy, but the latter offers more details because its HR dictionary is formed from several LR dictionaries; finally, the proposed STTF-based SR method outperforms the other three methods in both metrics. Figure 10 presents the visual quality of the results obtained by the four tested methods. We choose the 16th, 21st, 34th, and 47th frames as representatives. The smaller image, of size 128 × 128, is the LR frame in 2D form. The larger ones, of size 256 × 256, are the SR results of the different algorithms. As can be seen from Figure 10, the proposed approach yields clearer outputs with sharper edges and more textures. A drawback is the "checkerboard artifacts," which may be caused by the deconvolution operations in the method. We intend to address this in future work.
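Entropy and average gradient are no-reference quality measures, so they can be evaluated on the SR outputs without HR ground truth. The sketch below uses commonly used definitions (Shannon entropy of the grey-level histogram, and the mean of the root-mean-square of horizontal and vertical forward differences). The exact definitions, bin count, and normalisation behind Table 2 are not stated in the text, so those choices, along with the function names, are assumptions.

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of the grey-level histogram of a 2D frame.

    The bin count is an assumed parameter, not taken from the paper.
    """
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return 0.0  # a constant image carries no information
    hist, _ = np.histogram(img, bins=bins, range=(lo, hi))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    img = np.asarray(img, dtype=float)
    dx = img[:-1, 1:] - img[:-1, :-1]   # horizontal difference
    dy = img[1:, :-1] - img[:-1, :-1]   # vertical difference
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))
```

Under these definitions, a larger entropy indicates a richer grey-level distribution and a larger average gradient indicates sharper local detail, which is how the two columns of Table 2 are read.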

Conclusions
In this paper, targeting hyperspectral video, we propose a novel key-frame and target detection method based on cumulative tensor CP factorization, termed CTCF, and a super-resolution algorithm based on sparse-based tensor Tucker factorization, called STTF. Unlike conventional methods based on matrix factorization, CTCF treats the hyperspectral video (HSV) as a 4D cumulative tensor and approximates newly added frames by updating the factor matrices. To overcome the limits of conventional methods and make super-resolution (SR) more practical, STTF exploits the sparsity of HSV frames and factorizes each frame as a sparse core tensor multiplied by dictionaries along three modes. In this way, the spatial resolution of the LR-HSI is enhanced directly without HR samples. The experimental results demonstrate that the proposed CTCF and STTF methods outperform the compared state-of-the-art algorithms.
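To make the Tucker model concrete, the following minimal sketch reconstructs a frame from a sparse core tensor and three mode dictionaries via n-mode products. The helper names, the toy dimensions, and the random data are illustrative assumptions; dictionary learning and the sparsity-promoting optimization of STTF are not reproduced here.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """n-mode product: apply `matrix` along axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def tucker_reconstruct(core, H, W, S):
    """Frame = core x_1 H x_2 W x_3 S (height, width, spectral modes)."""
    x = mode_product(core, H, 0)
    x = mode_product(x, W, 1)
    return mode_product(x, S, 2)

# Toy example with assumed sizes (far smaller than a real HSV frame).
rng = np.random.default_rng(0)
core = rng.standard_normal((6, 6, 3))
core[np.abs(core) < 1.0] = 0.0          # keep only large entries: sparse core
H = rng.standard_normal((16, 6))        # height-mode dictionary
W = rng.standard_normal((16, 6))        # width-mode dictionary
S = rng.standard_normal((10, 3))        # spectral-mode dictionary
frame = tucker_reconstruct(core, H, W, S)   # shape (16, 16, 10)
```

Because the spatial dictionaries determine the output height and width, applying spatially finer dictionaries to the same core yields a frame on a finer grid, which is the intuition behind enhancing resolution without HR samples.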
In future work, we will focus on tensor-factorization-based target tracking methods that can extract the target region more accurately and clearly. For super-resolution, we aim to exploit nonlocal similarities, which have been widely used in inverse problems, in the tensor factorization framework. Besides target tracking and super-resolution, region of interest (ROI) approaches will be investigated to make HSV target recognition more efficient and full-featured. Inspired by [70] and other related works, we believe that research on chemical gas detection methods will benefit agricultural applications of HSI/HSV. These studies will be of great significance for the Internet of Things (IoT), smart agriculture, pollution monitoring, etc.
The conjugate gradient (CG) method is utilized to solve (A.3). Under certain conditions, CG converges after several iterations. In our experiments, we found that the solution of (A.3) is well approximated after 20 iterations.
(2) Optimization of W: when H, S, and $\mathcal{Z}$ are fixed, the optimization of W in (42) is expressed by (A.4), where $W_{pre}$ denotes the estimate of the width-mode dictionary from the previous iteration. Similar to the optimization of H, (A.4) can be rewritten as (A.5), where $Y_{(2)}$ denotes the mode-2 unfolding matrix of $\mathcal{Y}$ and $M_w = (\mathcal{Z} \times_1 H^* \times_3 S)_{(2)}$. Equation (A.5) is also quadratic and can be solved by computing a general Sylvester matrix equation, i.e., (A.6). Likewise, CG is used to solve (A.6).
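As a hedged illustration of solving a Sylvester-type matrix equation with CG, the sketch below treats the generic form $AXB + \mu X = C$ with symmetric positive semidefinite $A$ and $B$ and $\mu > 0$. This form is an assumption chosen for illustration; the exact coefficient matrices of (A.3) and (A.6) are not reproduced here. Under these assumptions the linear operator $X \mapsto AXB + \mu X$ is symmetric positive definite with respect to the Frobenius inner product, so standard CG applies directly to the matrix variable.

```python
import numpy as np

def cg_sylvester(A, B, C, mu, n_iter=20, tol=1e-10):
    """Solve A X B + mu X = C by conjugate gradient.

    Assumes A and B are symmetric positive semidefinite and mu > 0
    (an illustrative setting, not the paper's exact equation).
    """
    X = np.zeros_like(C, dtype=float)
    R = np.array(C, dtype=float)        # residual for the zero initial guess
    P = R.copy()                        # first search direction
    rs_old = np.sum(R * R)
    for _ in range(n_iter):
        if np.sqrt(rs_old) < tol:
            break
        LP = A @ P @ B + mu * P         # apply the linear operator to P
        step = rs_old / np.sum(P * LP)
        X += step * P
        R -= step * LP
        rs_new = np.sum(R * R)
        P = R + (rs_new / rs_old) * P   # next conjugate direction
        rs_old = rs_new
    return X
```

The operator is applied with two small matrix products per iteration, so the large Kronecker-structured system matrix is never formed explicitly. The default of 20 iterations mirrors the iteration count reported above.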
(3) Optimization of S: when H, W, and $\mathcal{Z}$ are fixed, the optimization with respect to S in (42) can be formulated as (A.7), where $S_{pre}$ denotes the estimate of the spectral-mode dictionary from the previous iteration. Following the same procedure as in the two subsections above, we obtain (A.8) and the corresponding matrix equation (A.9). We apply CG to solve (A.9), and convergence is achieved in a few iterations.
(4) Optimization of $\mathcal{Z}$: when H, W, and S are fixed, the optimization of $\mathcal{Z}$ in (42) can be written as (A.10), where $\mathcal{Z}_{pre}$ denotes the estimate of the core tensor from the previous iteration. Equation (A.10) is convex, so we can employ ADMM to solve the optimization problem. Introducing the splitting variables $\mathcal{Z}_1 = \mathcal{Z}$ and $\mathcal{Z}_2 = \mathcal{Z}$, (A.10) can be transformed into the equivalent constrained form (A.11). Equation (A.11) is a typical form of optimization problem that corresponds to the standard ADMM. In the augmented Lagrangian function for (A.11), $\beta$ denotes the Lagrangian multiplier and $\mu$ denotes the penalty parameter. The process of ADMM is formulated as (A.14). Here, the optimizations of $\mathcal{Z}_1$ and $\mathcal{Z}_2$ are independent because the function $g(\cdot)$ is decoupled with respect to these variables. Next, (A.14) is discussed in more detail.
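The alternating structure of ADMM described above can be sketched on a simplified vector problem: minimize $\tfrac{1}{2}\|Ez_2 - y\|_2^2 + \lambda\|z_1\|_1$ subject to $z_1 = z_2$. This is the standard lasso splitting, used here only as an assumed stand-in; the paper's problem (A.10) involves tensor variables and additional terms that are not reproduced. The names `admm_lasso` and `soft_threshold`, and the use of a scaled dual variable `u`, are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(E, y, lam, mu, n_iter=200):
    """ADMM for min 0.5*||E z2 - y||^2 + lam*||z1||_1  s.t.  z1 = z2.

    mu is the penalty parameter; u is the scaled dual variable.
    Simplified illustration, not the paper's exact update rules.
    """
    n = E.shape[1]
    z1 = np.zeros(n)
    u = np.zeros(n)
    lhs = E.T @ E + mu * np.eye(n)      # fixed system matrix of the z2 step
    Ety = E.T @ y
    for _ in range(n_iter):
        # Quadratic subproblem: closed-form least-squares update.
        z2 = np.linalg.solve(lhs, Ety + mu * (z1 - u))
        # Sparse subproblem: soft-thresholding.
        z1 = soft_threshold(z2 + u, lam / mu)
        # Dual update on the constraint residual.
        u = u + z2 - z1
    return z1
```

Each subproblem touches only one of the split variables, which is what makes the two updates simple: one is a linear solve and the other is an elementwise shrinkage.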
(ii) Update $\mathcal{Z}_2$: based on (A.13), we have (A.17). Based on (6) and (7), (A.17) is equivalent to (A.18), where the vectors $y = \mathrm{vec}(\mathcal{Y})$, $z_2 = \mathrm{vec}(\mathcal{Z}_2)$, $z_1 = \mathrm{vec}(\mathcal{Z}_1)$, and $\boldsymbol{\beta} = \mathrm{vec}(\beta)$ are the vectorized forms of the tensors $\mathcal{Y}$, $\mathcal{Z}_2$, $\mathcal{Z}_1$, and $\beta$, respectively, and the matrix $E = S \otimes W^* \otimes H^*$. Equation (A.18) has a closed-form solution, denoted by (A.19). However, $E \in \mathbb{R}^{hwS \times z_h z_w z_s}$ is so large that (A.19) is too expensive to compute directly. We rewrite the first term of (A.19) as (A.20), where $P_i$ and $Q_i$ ($i = 1, 2, 3$) denote the eigenvector matrices and eigenvalue matrices of $H^{*T}H^*$, $W^{*T}W^*$, and $S^T S$, respectively. Thus, $(Q_3 \otimes Q_2 \otimes Q_1 + \mu I)^{-1}$ is diagonal and can be computed easily. Moreover, the operations of $P_i$ and $P_i^T$ are $i$-mode products, and the multiplication in (A.20) is elementwise. Finally, $E^T y$ in the second term of (A.19) can be computed by (A.21).
(iii) Update $\beta$: based on (A.14), $\beta$ is updated by (A.22).

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.