
Journal of Electrical and Computer Engineering (JECE), Hindawi
ISSN 2090-0155, 2090-0147
DOI: 10.1155/2017/8191537 (Article ID 8191537)

Research Article

Improved Collaborative Representation Classifier Based on l2-Regularized for Human Action Recognition

Shirui Huo, Tianrui Hu (ORCID: http://orcid.org/0000-0002-3081-6751), Naiyang Guan

City University of Hong Kong, Kowloon Tong, Hong Kong; Beijing University of Posts and Telecommunications, Beijing, China; State Key Laboratory of Coal Resources and Safe Mining, China University of Mining & Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

Received 10 April 2017; Revised 15 August 2017; Accepted 28 September 2017; Published 20 November 2017

Copyright © 2017.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Human action recognition is an important and challenging task. Depth motion maps (DMMs), obtained by projecting depth images onto three orthogonal planes, and deep convolutional neural network (DCNN) features are discriminative descriptors that characterize the spatiotemporal information of a specific action in a sequence of depth images. In this paper, we propose a unified improved collaborative representation framework in which the probability that a test sample belongs to the collaborative subspace of all classes is well defined and can be computed. The improved collaborative representation classifier (ICRC), based on l2 regularization, maximizes the likelihood that a test sample belongs to each class, and a theoretical analysis of ICRC shows that the final classification is obtained by computing the likelihood for each class. Combined with DMM and DCNN features, experiments on depth image-based action recognition with the MSRAction3D and MSRGesture3D datasets show that the proposed distance-based representation classifier outperforms state-of-the-art methods, including SRC, CRC, and SVM.


1. Introduction

Human action recognition has been studied in the computer vision community for decades because of its applications in video surveillance [1], human-computer interaction [2], and motion analysis [3]. Before the Microsoft Kinect, research focused on human action recognition from RGB video. Kinect sensors, however, provide an affordable way to capture RGB and depth (D) images in real time, offering better geometric cues and lower sensitivity to illumination changes for action recognition. In [1], a bag of 3D points and a graphical model are used to characterize the spatial and temporal information in depth images. In [3], depth maps are projected onto three orthogonal planes to form three depth motion maps (DMMs) that capture body shape and motion; DMMs are a discriminative feature for describing the spatiotemporal information of a specific action in a sequence of depth images. Although depth-based methods are compelling for practical applications, and a few deep-learned features for depth-based action recognition have been proposed, performance is still far from satisfactory because of large variations in motion. In this paper, we focus on exploiting the structure of a representation-based model to improve multiclass classification with the handcrafted DMM descriptor. In [4], three-channel deep convolutional neural networks (DCNNs) are trained to extract features from depth map sequences after weighted DMMs are projected onto three orthogonal planes at several temporal scales; this DCNN-based method achieved results close to the state of the art on the MSRAction3D and MSRGesture3D datasets. DCNNs have proven effective, achieving state-of-the-art results in image recognition, segmentation, detection, and retrieval. Motivated by this success, we also use a DCNN as a feature extractor and feed its features to our classifier model.

Among representation models, sparse representation has been applied successfully to image restoration [5], compressive sensing [6, 7], morphological component analysis [8], and super-resolution [9, 10]. With advances in representation-based classifiers, many pattern recognition problems in computer vision have been effectively solved by sparse coding or sparse representation methods in recent decades. In particular, the linear model can be written as $y = A\alpha$ [11], where y is the data, α is a sparse coefficient vector, and A is a given matrix formed by an overcomplete set of samples. Because of the great success of sparse coding algorithms in image processing, sparse representation-based classifiers, such as sparse representation classification (SRC) and collaborative representation classification (CRC), have attracted increasing attention.

The basic idea of SRC/CRC is to code the test sample over a set of training samples; in SRC, the code is constrained to be sparse and is computed by l1-minimization. In [12], Wright et al. proposed the basic SRC model, which exploits the discriminative nature of sparse representation and is based on the idea that a new signal can be represented as a linear combination of previously observed ones. Building on SRC, Yang and Zhang proposed a Gabor occlusion dictionary-based SRC that significantly reduces the computational cost [13]. In [14], the authors combined sparse representation with linear spatial pyramid matching for image classification. Rather than using the entire training set, Zhang and Li [15] proposed learning a dictionary for SRC. In [16], an l1-graph is constructed by sparsely representing each sample over the remaining samples. Yang et al. [14] also proposed a method that preserves the l1-graph and uses a subspace to address misalignment in image classification. SRC has also been applied to illumination-robust recognition [17], image-plane transformation [18], and other tasks. However, Zhang et al. [20] argued that the good performance of SRC should be largely attributed to the collaborative representation of a test sample by training samples across all classes, and proposed the more effective CRC. In summary, SRC/CRC assigns the class label according to the reconstruction error (residual) of each class-specific subspace, and many modified models and solvers for SRC/CRC have been proposed for visual recognition tasks, including augmented Lagrange multiplier, proximal gradient, gradient projection, iterative shrinkage-thresholding, and homotopy methods [19]. Recently, some researchers [20, 21] have questioned the role of l1-regularized sparsity in pattern classification. In contrast, l2-regularized representation can perform comparably to l1-regularized representation for classification at a much lower computational cost.

Motivated by modifications of CRC, in this paper we present an improved collaborative representation classifier (ICRC) based on l2 regularization for human action recognition. Using the three-view DMM descriptor, ICRC maximizes the likelihood that a test sample belongs to each class, and the final classification is obtained by comparing these class likelihoods. Experiments on human action classification with the MSRAction3D and MSRGesture3D datasets demonstrate that the algorithm outperforms state-of-the-art methods, including SRC, CRC, and SVM. The rest of the paper is organized as follows. Section 2 introduces the DMM and DCNN feature descriptors. Section 3 details the ICRC-based action classifier, and Section 4 presents the experimental results of our approach on the relevant datasets. Section 5 concludes the paper.

2. Feature Descriptors

2.1. Using Depth Motion Maps

In this section, we describe the feature descriptor based on depth motion maps (DMMs). DMMs are generated by projecting each depth map onto three orthogonal Cartesian planes, aligned with the front (f), side (s), and top (t) views (denoted $p^f$, $p^s$, and $p^t$, respectively), and accumulating the motion energy of the projected maps. For each projected map, the motion energy is computed by thresholding the difference between consecutive maps. The binary map of motion energy provides a strong cue to the action category being performed and indicates the regions where movement occurs in each temporal interval. We use all frames, rather than a selected subset, to compute the motion information. Considering both the discriminability and the robustness of the descriptor, we use the l1-norm of the absolute difference between frames to define the salient information in a depth sequence, because the l1-norm is invariant to the length of the depth sequence and retains more salient information than other norms (e.g., l2). The DMM of each view is

$$\mathrm{DMM}_{\{f,s,t\}} = \sum_{i=1}^{N-v} \left| p_{i+v}^{\{f,s,t\}} - p_{i}^{\{f,s,t\}} \right|, \tag{1}$$

where v is the frame interval, i is the frame index, and N is the total number of frames in the depth sequence. When the sum in (1) includes only the differences that exceed a threshold, the choice of v has little effect on the local pattern histograms computed on the DMMs.
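The accumulation in (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the projection scheme in the hypothetical helper `project_views` (front view = raw depth, side and top views = binary occupancy over quantized depth bins) and the parameters `num_bins` and `threshold` are assumptions made for the example.

```python
import numpy as np


def project_views(depth, max_depth, num_bins=256):
    """Project one depth frame (H x W) onto the front, side, and top planes.

    Assumed projection scheme (illustrative only): the front view is the raw
    depth map; the side view marks occupied (row, depth-bin) cells and the top
    view marks occupied (depth-bin, column) cells. Zero depth is background.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    front = depth.copy()
    side = np.zeros((h, num_bins))
    top = np.zeros((num_bins, w))
    rows, cols = np.nonzero(depth > 0)
    # Quantize foreground depth values into num_bins bins.
    bins = np.minimum((depth[rows, cols] / max_depth * (num_bins - 1)).astype(int),
                      num_bins - 1)
    side[rows, bins] = 1.0
    top[bins, cols] = 1.0
    return front, side, top


def compute_dmms(frames, v=1, threshold=0.0, num_bins=256):
    """Compute (DMM_f, DMM_s, DMM_t) following Eq. (1) for an (N, H, W) sequence."""
    frames = np.asarray(frames, dtype=np.float64)
    n = frames.shape[0]
    if not 1 <= v < n:
        raise ValueError("frame interval v must satisfy 1 <= v < N")
    max_depth = frames.max() if frames.max() > 0 else 1.0  # shared depth scale
    views = [project_views(f, max_depth, num_bins) for f in frames]
    dmms = []
    for k in range(3):  # 0 = front, 1 = side, 2 = top
        maps = np.stack([view[k] for view in views])
        # |p_{i+v} - p_i| for i = 1, ..., N - v (0-based slicing below).
        diff = np.abs(maps[v:] - maps[:-v])
        diff[diff <= threshold] = 0.0  # keep only motion energy above the threshold
        dmms.append(diff.sum(axis=0))
    return tuple(dmms)
```

A static pose produces all-zero DMMs, while a pixel that moves across frames leaves a trail of accumulated energy in the front and top views.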

2.2. Using Deep Convolutional Neural Networks

In this section, we train three deep convolutional neural networks (DCNNs), one on each of the three projected DMM planes, and fuse the three networks by combining the softmax outputs of their fully connected layers. The layer configuration of the three networks is shown schematically in Figure 1; each network has five convolutional layers and three fully connected layers. Implementation details are given in Section 4.1.2.

Figure 1

Architecture of the three DCNNs used to extract features from a depth action sequence.

3. Action Classifier Based on ICRC

To combine the DMM-based feature descriptors with a powerful classifier, we present an improved collaborative representation classifier (ICRC) for human action recognition.

3.1. l2-Regularized Collaborative Representation Classifier

The basic idea of SRC is to represent a test sample by sparsely selecting a small number of atoms from an overcomplete dictionary that contains all training samples [12]. Let $A_j \in \mathbb{R}^{d \times n_j}$ denote the $n_j$ training samples from class j, and suppose there are C classes. The dictionary $A = [A_1, A_2, \ldots, A_C] \in \mathbb{R}^{d \times n}$ contains the training samples of all classes, where $A_j$ (j = 1, 2, …, C) holds the training samples of class j, $n = \sum_{j=1}^{C} n_j$ is the total number of training samples, and d is the dimension of each sample. A query sample $y \in \mathbb{R}^d$ can be represented as $y = A\alpha$, where α is a sparse coefficient vector.

More specifically, in collaborative representation, each data point in the collaborative subspace is represented as a linear combination of the samples in A, where $\alpha = [\alpha_1^T, \alpha_2^T, \ldots, \alpha_C^T]^T$ is the $n \times 1$ representation vector associated with the training samples and $\alpha_j$ (j = 1, 2, …, C) is the subvector corresponding to $A_j$. In SRC, α is obtained by solving the convex l1-norm minimization problem

$$\min_{\alpha} \|y - A\alpha\|_2^2 + \theta\|\alpha\|_1 \quad \text{s.t.} \quad y = A\alpha, \tag{2}$$

where θ is a positive scalar that balances the sparsity term and the residual. The class-specific residual is computed as

$$e_j = \|y - A_j\alpha_j\|_2, \tag{3}$$

where $\alpha_j$ is the coefficient vector corresponding to class j, and the identity of y is given by the class with the lowest residual:

$$\mathrm{class}(y) = \arg\min_j e_j. \tag{4}$$
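To make (2)–(4) concrete, the sketch below solves a relaxed form of (2) with iterative shrinkage-thresholding (ISTA, one of the solver families listed in the Introduction) and then applies the residual rule. Dropping the equality constraint and the helper names `ista_l1` and `residual_classify` are assumptions of this illustration; it is not presented as the solver used in the paper.

```python
import numpy as np


def ista_l1(A, y, theta, n_iter=500):
    """Approximately minimize ||y - A a||_2^2 + theta * ||a||_1 with ISTA.

    The equality constraint in (2) is relaxed here (an assumption of this sketch).
    """
    # Step size 1 / Lipschitz constant of the gradient 2 A^T (A a - y).
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = a - step * 2.0 * A.T @ (A @ a - y)  # gradient step on the residual term
        a = np.sign(z) * np.maximum(np.abs(z) - step * theta, 0.0)  # soft-thresholding
    return a


def residual_classify(A, labels, y, alpha):
    """Eqs. (3)-(4): return the class whose columns give the smallest residual."""
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - A[:, labels == c] @ alpha[labels == c]) for c in classes
    ])
    return classes[np.argmin(residuals)], residuals
```

Here `labels` holds the class index of each column of A, so boolean masks pick out $A_j$ and $\alpha_j$ without reordering the dictionary.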

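For more details on SRC/CRC, see [12]. Because l1-regularized minimization is computationally expensive, (2) is approximated by

$$\min_{\alpha} \|y - A\alpha\|_2^2 + \lambda\|L\alpha\|_2^2 \quad \text{s.t.} \quad y = A\alpha, \tag{5}$$

which represents y over A with a coefficient vector α of small weighted l2-norm. In (5), L is the Tikhonov regularization matrix [27] used to compute the coefficient vector, and λ is the regularization parameter. The l2-regularization term $\|L\alpha\|_2^2$ induces a weaker form of sparsity in α than l1-norm minimization. The diagonal matrix L and the coefficient vector α are computed as follows [21]:

$$L = \begin{bmatrix} \|y - h_1\|_2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \|y - h_n\|_2 \end{bmatrix}, \qquad \alpha = Py, \tag{6}$$

where $h_i$ is the i-th training sample (the i-th column of A) and $P = (A^TA + \lambda L^TL)^{-1}A^T$. Because L depends on y, P must be computed for each test sample; only when L does not depend on y (e.g., the identity matrix, as in standard CRC) can P be precomputed. The identity of y is then determined from α by (3) and (4).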
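A minimal NumPy sketch of (6) followed by the residual rule (3)–(4) is shown below. It solves the linear system instead of forming P explicitly, which is a standard numerical choice rather than something stated in the paper; the function names are hypothetical.

```python
import numpy as np


def tikhonov_matrix(A, y):
    """Diagonal L of (6): L_ii = ||y - h_i||_2 with h_i the i-th column of A."""
    return np.diag(np.linalg.norm(A - y[:, None], axis=0))


def crc_code(A, y, lam):
    """alpha = (A^T A + lam L^T L)^{-1} A^T y, computed by a linear solve."""
    L = tikhonov_matrix(A, y)  # depends on y, so it is rebuilt per test sample
    return np.linalg.solve(A.T @ A + lam * (L.T @ L), A.T @ y)


def crc_classify(A, labels, y, lam):
    """Code y with (6), then apply the residual rule (3)-(4)."""
    alpha = crc_code(A, y, lam)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - A[:, labels == c] @ alpha[labels == c]) for c in classes
    ])
    return classes[np.argmin(residuals)], alpha
```

Training samples far from y receive large diagonal weights in L and are therefore penalized more heavily, which is how the Tikhonov term injects locality into an otherwise dense l2 code.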

3.2. The Proposed ICRC Method

Based on the training set, we propose an improved collaborative representation classifier with an l2-regularized term. It assigns different probabilities to the data point according to α by adding the term $\sum_{j=1}^{C}\|A\alpha - A_j\alpha_j\|_2^2$, which seeks, inside the subspace of each class j, a point $A_j\alpha_j$ close to the common point $A\alpha$. The first two terms, $\|y - A\alpha\|_2^2 + \lambda\|L\alpha\|_2^2$, still form an l2-regularized collaborative representation term, which encourages finding a point $A\alpha$ close to y in the collaborative subspace. Therefore, (5) is rewritten as

$$\min_{\alpha} \|y - A\alpha\|_2^2 + \lambda\|L\alpha\|_2^2 + \frac{\gamma}{C}\sum_{j=1}^{C}\|A\alpha - A_j\alpha_j\|_2^2 \quad \text{s.t.} \quad y = A\alpha. \tag{7}$$

The parameters λ and γ balance the three terms and can be set from the training data. A new representation vector α is then obtained by solving (7).
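The text does not spell out how (7) is solved. One possible derivation, assuming the equality constraint is relaxed so that the objective is minimized without it (with $y = A\alpha$ enforced, the first term would vanish), sets the gradient to zero. The notation $\bar{A}_j$ is introduced only for this sketch: it is A with every column outside class j set to zero, so that $A_j\alpha_j = \bar{A}_j\alpha$.

$$
\begin{aligned}
f(\alpha) &= \|y - A\alpha\|_2^2 + \lambda\|L\alpha\|_2^2
  + \frac{\gamma}{C}\sum_{j=1}^{C}\big\|(A - \bar{A}_j)\alpha\big\|_2^2, \\
\nabla f(\alpha) &= -2A^T(y - A\alpha) + 2\lambda L^T L\alpha
  + \frac{2\gamma}{C}\sum_{j=1}^{C}(A - \bar{A}_j)^T(A - \bar{A}_j)\alpha = 0, \\
\hat{\alpha} &= \Big(A^T A + \lambda L^T L
  + \frac{\gamma}{C}\sum_{j=1}^{C}(A - \bar{A}_j)^T(A - \bar{A}_j)\Big)^{-1} A^T y.
\end{aligned}
$$

With γ = 0 this reduces to $\alpha = Py$ from (6). For λ > 0 and a test sample y that differs from every training column, $\lambda L^T L$ is positive definite, so the matrix being inverted is positive definite and the solution is unique.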

When γ = 0, (7) reduces to the CRC formulation (5), and α is determined by the first two terms alone. When γ > 0, the first two terms are the same for all classes, so the class-specific term $\sum_{j=1}^{C}\|A\alpha - A_j\alpha_j\|_2^2$ further adjusts each $\alpha_j$ through $A_j$, yielding a more precise and more stable representation vector α.

Since the first two terms are shared by all classes, they can be omitted from the decision rule, which is then based on the last term alone and expressed as a probability exponent:

$$\mathrm{class}(y) = \arg\max_j \exp\left(-\|A\alpha - A_j\alpha_j\|_2^2\right). \tag{8}$$

The proposed l2-regularized method thus maximizes the likelihood that a test sample belongs to each class, and the final classification selects the class with the maximum likelihood; the experiments in the following section evaluate this rule. The classifier defined by (7) and (8) is called the improved collaborative representation classifier (ICRC).
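The sketch below combines a closed-form solution of (7) with the decision rule (8). It assumes, as in the derivation above, that the equality constraint is relaxed; the function names are hypothetical, and the code is an illustration of the formulas rather than the authors' implementation.

```python
import numpy as np


def icrc_code(A, labels, y, lam, gamma):
    """Closed-form minimizer of (7) with the equality constraint relaxed (assumption)."""
    classes = np.unique(labels)
    C = len(classes)
    L = np.diag(np.linalg.norm(A - y[:, None], axis=0))  # Tikhonov matrix of (6)
    M = A.T @ A + lam * (L.T @ L)
    for c in classes:
        B = A.copy()
        B[:, labels == c] = 0.0  # B @ alpha = A alpha - A_j alpha_j
        M += (gamma / C) * (B.T @ B)
    return np.linalg.solve(M, A.T @ y)


def icrc_classify(A, labels, y, lam, gamma):
    """Decision rule (8): pick the class maximizing exp(-||A alpha - A_j alpha_j||^2)."""
    alpha = icrc_code(A, labels, y, lam, gamma)
    classes = np.unique(labels)
    common = A @ alpha  # the common point A alpha in the collaborative subspace
    dists = np.array([
        np.sum((common - A[:, labels == c] @ alpha[labels == c]) ** 2) for c in classes
    ])
    likelihood = np.exp(-dists)
    return classes[np.argmax(likelihood)], alpha, likelihood
```

Because exp is monotonic, taking the arg max of the likelihood in (8) is equivalent to taking the arg min of the squared distance; the exponential form only makes the probabilistic reading explicit. Setting `gamma=0.0` reproduces the coefficients of the CRC sketch given with (6).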

4. Experimental Results

To verify the effectiveness of the proposed ICRC algorithm for action recognition with DMM descriptors of depth sequences, we carry out experiments on two challenging depth-based action datasets, MSRAction3D [1] and MSRGesture3D [1].

4.1. Feature Descriptors

4.1.1. DMMs

The MSRAction3D [1] dataset consists of depth image sequences captured by a Microsoft Kinect v1 camera. It contains 20 actions performed by 10 subjects facing the camera, and each subject performed each action 2 or 3 times. The 20 action types are high arm wave, horizontal arm wave, hammer, hand catch, forward punch, high throw, draw X, draw tick, draw circle, hand clap, two-hand wave, side boxing, bend, forward kick, side kick, jogging, tennis swing, tennis serve, golf swing, and pick up and throw. Each depth image is 240 × 320 pixels, and the background has been removed from the depth data.

The MSRGesture3D [1] dataset is intended for continuous online human action recognition with a Kinect device. It consists of 12 gestures defined by American Sign Language (ASL); each person performs each gesture 2 or 3 times, giving 333 depth sequences in total. For action recognition on both MSRAction3D and MSRGesture3D, we use features computed from the DMMs; each depth action sequence generates three DMMs corresponding to the three projection views. The DMMs of the high arm wave class from the MSRAction3D dataset are shown in Figure 2, and the DMMs of the ASL Z class from the MSRGesture3D dataset are shown in Figure 3.

Figure 2

Three DMMs of a depth action sequence “ASL Z” from the front (f) view, side (s) view, and top (t) view, respectively.

Figure 3

Three DMMs of a depth action sequence “Swipe left” from the front (f) view, side (s) view, and top (t) view, respectively.

4.1.2. DCNNs

Our implementation of the DCNN features is based on the publicly available MatConvNet toolbox [28] and runs on one NVIDIA Titan X GPU. The network weights are learned by mini-batch stochastic gradient descent. Following [4], the momentum is set to 0.9, the weight decay to 0.0005, and all hidden weight layers use the rectified linear unit (ReLU) activation. At each iteration, a mini-batch of 256 samples is constructed; each image is resized to 256 × 256, and 224 × 224 patches are randomly cropped from the central region of the image for data augmentation. The dropout ratio is 0.5. The networks are initialized from a model pretrained on ILSVRC-2012 and fine-tuned with an initial learning rate of 0.01, which is decreased every 20 epochs. Finally, the three 4096-dimensional feature vectors from the seventh fully connected layer (fc7) are concatenated into a 12288-dimensional vector that is fed to the subsequent classifier.
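Two data-handling steps from this paragraph can be sketched independently of MatConvNet: the random 224 × 224 crop from a 256 × 256 image and the concatenation of the three fc7 vectors into one 12288-dimensional feature. The uniform choice of crop offset and the helper names `random_crop` and `fuse_fc7` are assumptions of this sketch; network training itself is not reproduced here.

```python
import numpy as np


def random_crop(image, size=224, rng=None):
    """Randomly crop a size x size patch from an (H, W, ...) image (e.g., 256 x 256)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top = int(rng.integers(0, h - size + 1))  # uniform offset (assumed)
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]


def fuse_fc7(feat_front, feat_side, feat_top):
    """Concatenate the three per-view fc7 vectors (4096-d each) into one vector."""
    parts = [np.asarray(f, dtype=np.float64).ravel()
             for f in (feat_front, feat_side, feat_top)]
    return np.concatenate(parts)
```

Concatenating rather than averaging keeps the view-specific information separate, so the classifier sees 3 × 4096 = 12288 dimensions.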

4.2. Experiment Setting

We adopt the experimental setup of [1] and divide the actions in the MSRAction3D dataset into three subsets: AS1: horizontal wave, hammer, forward punch, high throw, hand clap, bend, tennis serve, and pickup throw; AS2: high wave, hand catch, draw x, draw tick, draw circle, two-hand wave, forward kick, and side boxing; AS3: high throw, forward kick, side kick, jogging, tennis swing, tennis serve, golf swing, and pickup throw. In each of AS1, AS2, and AS3, two-thirds of the samples are used for training and one-third for testing, and the performance on MSRAction3D is reported as the average accuracy (Accu., %) over the three subsets. For MSRGesture3D, we follow the experimental setting reported in [26, 29, 30]: all 12 gestures are evaluated with leave-one-subject-out cross-validation.
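The leave-one-subject-out protocol and the averaging of accuracies can be sketched as below. The helper names are hypothetical, and averaging per-fold accuracies (rather than pooling all test samples) is an assumption of this sketch; the same averaging applies to the three MSRAction3D subsets.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (subject, train_idx, test_idx), holding out one subject per fold."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        yield s, np.flatnonzero(subject_ids != s), np.flatnonzero(subject_ids == s)


def mean_accuracy(true_per_fold, pred_per_fold):
    """Average of per-fold accuracies in percent (also usable across AS1-AS3)."""
    accs = [100.0 * np.mean(np.asarray(t) == np.asarray(p))
            for t, p in zip(true_per_fold, pred_per_fold)]
    return float(np.mean(accs))
```

For example, folds with accuracies of 50% and 100% average to 75%.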

4.3. Recognition Results with DMMs and ICRC

We concatenate the sign, magnitude, and center features computed on the DMMs to form the final feature representation. The compared methods are similar to those in [29, 30], and the sizes of SI and block are set to the values reported in [26]. All 20 actions are used; half of the subjects (1, 3, 5, 7, and 9) are used for training and the remaining subjects for testing. Table 1 lists the recognition accuracies of our method and existing approaches: our method achieves the highest accuracy on both datasets among the compared methods.
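The cross-subject split described above (subjects 1, 3, 5, 7, and 9 for training, the rest for testing) amounts to a membership test on each sample's subject ID. The constant and function name below are illustrative only.

```python
import numpy as np

# Subjects used for training in the cross-subject protocol described above.
TRAIN_SUBJECTS = (1, 3, 5, 7, 9)


def cross_subject_split(subject_ids, train_subjects=TRAIN_SUBJECTS):
    """Return (train_idx, test_idx) by whether each sample's subject is a training subject."""
    subject_ids = np.asarray(subject_ids)
    is_train = np.isin(subject_ids, train_subjects)
    return np.flatnonzero(is_train), np.flatnonzero(~is_train)
```

Splitting by subject rather than by sample ensures that no performer appears in both the training and the test sets.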

Table 1

Comparison of recognition accuracies (%) on the MSRAction3D and MSRGesture3D datasets.

Method	MSRAction3D (%)	MSRGesture3D (%)
DMM-HOG [3]	85.5	89.2
Random Occupancy [22]	86.5	88.5
Actionlet Ensemble [23]	88.2	88.2
Depth Cuboid [24]	89.3	90.5
Vemulapalli [25]	89.5	87.7
DMM + CRC [26]	92.3	92.5
DMM + ICRC (proposed)	93.8	94.8

Figures 4 and 5 show the per-class recognition rates on the two datasets. In the MSRAction3D dataset, 14 classes reach a 100% recognition rate, and in the MSRGesture3D dataset, 3 classes achieve the best performance. All experiments are carried out in MATLAB 2016b on an Intel i7-6500U computer with 8 GB RAM; the average processing speed is about 26 frames per second, which roughly meets real-time requirements.

Figure 4

Recognition rates (%) of the 20 classes in the MSRAction3D dataset (averaged over the three subsets).

Figure 5

Recognition rates (%) of the 12 classes in the MSRGesture3D dataset.

4.4. Comparison with DCNN Features and ICRC

To further evaluate the proposed classifier, we also extract deep features with the DCNN model described above and feed the resulting 12288-dimensional vectors to ICRC for action recognition. Table 2 shows that DCNN features are as effective here as in other popular tasks such as image classification and object detection: they improve the accuracy by about 6.2 and 5.2 percentage points on MSRAction3D and MSRGesture3D, respectively. This also highlights the importance of effective features for the ICRC classifier.

Table 2

Comparison of recognition accuracies (%) of DMM + ICRC and DCNN + ICRC on the MSRAction3D and MSRGesture3D datasets.

Method	MSRAction3D (%)	MSRGesture3D (%)
DMM + ICRC (proposed)	93.8	94.8
DCNN + ICRC (proposed)	99.99	100.0

5. Conclusion

In this paper, we propose an improved collaborative representation classifier (ICRC) based on l2 regularization for human action recognition. DMM and DCNN feature descriptors serve as effective action representations. For the action classifier, ICRC builds on collaborative representation with an additional class-specific regularization term. The key idea is to impose a subspace constraint on the solution. The experimental results on MSRAction3D and MSRGesture3D show that the proposed algorithm performs favorably against state-of-the-art methods, including SRC, CRC, and SVM. Future work will further integrate deep-learned networks into the depth image representation and evaluate the approach on more complex action recognition datasets such as MSR3DActivity, UTKinect-Action, and NTU RGB+D.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the State Key Laboratory of Coal Resources and Safe Mining under Contracts SKLCRSM16KFD04 and SKLCRSM16KFD03, in part by the National Natural Science Foundation of China under Contract 61601466, and in part by the Fundamental Research Funds for the Central Universities under Contract 2016QJ04.

References

[1] Zhang, Liu, "Action recognition based on a bag of 3D points," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '10), San Francisco, Calif, USA, June 2010, pp. 9–14, doi:10.1109/cvprw.2010.5543273.

[2] Wang, Yang, Pang, A. G. Hauptmann, "Semi-supervised multiple feature analysis for action recognition," IEEE Transactions on Multimedia, vol. 16, no. 2, pp. 289–298, 2014, doi:10.1109/TMM.2013.2293060.

[3] Yang, Zhang, Tian, "Recognizing actions using depth motion maps-based histograms of oriented gradients," in Proceedings of the 20th ACM International Conference on Multimedia (MM 2012), Japan, November 2012, pp. 1057–1060, doi:10.1145/2393347.2396382.

[4] Wang, Gao, Zhang, Tang, Ogunbona, "Deep convolutional neural networks for action recognition using depth map sequences," Computer Vision and Pattern Recognition, arXiv preprint, https://arxiv.org/abs/1501.04686, 2015.

[5] W. E. Vinje, J. L. Gallant, "Sparse coding and decorrelation in primary visual cortex during natural vision," Science, vol. 287, no. 5456, pp. 1273–1276, 2000, doi:10.1126/science.287.5456.1273.

[6] Candès, "Compressive sampling," in Proceedings of the International Congress of Mathematicians, 2006, pp. 1433–1452.

[7] Guan, Tao, Luo, Yuan, "NeNMF: an optimal gradient method for nonnegative matrix factorization," IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882–2898, 2012, doi:10.1109/TSP.2012.2190406.

[8] J.-L. Starck, Elad, Donoho, "Redundant multiscale transforms and their application for morphological component separation," Advances in Imaging and Electron Physics, vol. 132, pp. 287–348, 2004, doi:10.1016/S1076-5670(04)32006-9.

[9] Yang, Wright, Huang, "Image super-resolution as sparse representation of raw image patches," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1–8, doi:10.1109/CVPR.2008.4587647.

[10] Dong, Zhang, Shi, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 1838–1857, 2011, doi:10.1109/TIP.2011.2108306.

[11] Huang, Aviyente, "Sparse representation for signal classification," in Proceedings of NIPS, Vancouver, Canada, December 2006, pp. 609–616.

[12] Wright, A. Y. Yang, Ganesh, S. S. Sastry, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009, doi:10.1109/TPAMI.2008.79.

[13] Yang, Zhang, "Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary," in Proceedings of the 11th European Conference on Computer Vision (ECCV '10), Crete, Greece, Springer, 2010, pp. 448–461, doi:10.1007/978-3-642-15567-3_33.

[14] Yang, Gong, Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '09), June 2009, pp. 1794–1801, doi:10.1109/CVPRW.2009.5206757.

[15] Zhang, "Discriminative K-SVD for dictionary learning in face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), June 2010, pp. 2691–2698, doi:10.1109/CVPR.2010.5539989.

[16] Cheng, Yang, Yan, T. S. Huang, "Learning with l1-graph for image analysis," IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010, doi:10.1109/TIP.2009.2038764.

[17] Wagner, Wright, Ganesh, Zhou, "Towards a practical face recognition system: robust registration and illumination by sparse representation," in Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops 2009), USA, June 2009, pp. 597–604, doi:10.1109/CVPRW.2009.5206654.

[18] J. Z. Huang, X. L. Huang, Metaxas, "Simultaneous image transformation and sparse representation recovery," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), Anchorage, Alaska, USA, June 2008, pp. 1–8, doi:10.1109/cvpr.2008.4587640.

[19] A. Y. Yang, Ganesh, Zhou, Sastry, "A review of fast l1-minimization algorithms for robust face recognition," Defense Technical Information Center, 2010, doi:10.21236/ADA525384.

[20] Zhang, Yang, Feng, "Sparse representation or collaborative representation: which helps face recognition?" in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), Barcelona, Spain, November 2011, pp. 471–478, doi:10.1109/ICCV.2011.6126277.

[21] Berkes, B. L. White, Fiser, "No evidence for active sparsification in the visual cortex," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), Canada, December 2009, pp. 108–116.

[22] Wang, Liu, Chorowski, Chen, "Robust 3D action recognition with random occupancy patterns," in Computer Vision—ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7–13, 2012, Proceedings, Part II, Springer, Berlin, Germany, 2012, pp. 872–885, doi:10.1007/978-3-642-33709-3_62.

[23] Wang, Liu, Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), Providence, RI, USA, June 2012, pp. 1290–1297, doi:10.1109/cvpr.2012.6247813.

[24] Xia, J. K. Aggarwal, "Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), Portland, Ore, USA, IEEE, June 2013, pp. 2834–2841, doi:10.1109/cvpr.2013.365.

[25] Vemulapalli, Arrate, Chellappa, "Human action recognition by representing 3D skeletons as points in a Lie group," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), Columbus, Ohio, USA, June 2014, pp. 588–595, doi:10.1109/cvpr.2014.82.

[26] Chen, Liu, Zhang, Han, Junjun, Liu, "3D action recognition using multi-temporal depth motion maps and Fisher vector," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), USA, July 2016, pp. 3331–3337.

[27] A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-Posed Problems, John Wiley & Sons, New York, NY, USA, 1977.

[28] MatConvNet, http://www.vlfeat.org/matconvnet/.

[29] Yang, Zhang, Yang, Chen, Yang, "Action recognition using completed local binary patterns and multiple-class boosting classifier," in Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition (ACPR 2015), Malaysia, November 2016, pp. 336–340, doi:10.1109/ACPR.2015.7486521.

[30] A. W. Vieira, E. R. Nascimento, G. L. Oliveira, Liu, M. F. M. Campos, "On the improvement of human action recognition from depth map sequences using space-time occupancy patterns," Pattern Recognition Letters, vol. 36, no. 1, pp. 221–227, 2014, doi:10.1016/j.patrec.2013.07.011.