HuAc: Human Activity Recognition Using Crowdsourced WiFi Signals and Skeleton Data

The combination of WiFi-based and vision-based human activity recognition has attracted increasing attention in the human-computer interaction, smart home, and security monitoring fields. We propose HuAc, a system that combines WiFi-based and Kinect-based activity recognition, to sense human activity in an indoor environment with occlusion, weak light, and different perspectives. We first construct a WiFi-based activity recognition dataset named WiAR to provide a benchmark for WiFi-based activity recognition. Then, we design a subcarrier selection mechanism according to the sensitivity of subcarriers to human activities. Moreover, we optimize the spatial relationship of adjacent skeleton joints and draw out a corresponding relationship between CSI-based and skeleton-based activity recognition. Finally, we explore the fusion of CSI and crowdsourced skeleton joints to make human activity recognition robust. We implemented HuAc using commercial WiFi devices and evaluated it in three kinds of scenarios. Our results show that HuAc achieves an average accuracy of greater than 93% on the WiAR dataset.


Introduction
Human activity recognition is an important research problem in the social life, pervasive computing, and security monitoring fields [1][2][3]. Daily activities [4] are an important means of communication in daily life: we can communicate through body language, such as hand and head movements, rather than speech. Therefore, human activity recognition systems have been proposed in terms of application demand, technical support, and auxiliary devices.
Previous works on activity recognition fall roughly into three categories: wearable-based, vision-based, and WiFi-based. Wearable-based sensing is popular and widely used in elder healthcare, smart sensing, sports applications, and tracking [1,5,6]. Researchers leverage the information collected by sensors to recognize human behavior and analyze health conditions. However, this approach has several limitations: it burdens users, interferes with daily routines, and relies on sensors with limited power. Vision-based activity recognition is also popular and achieves high accuracy, but lighting, shadowing, privacy protection, and viewing angle increase the difficulty of activity recognition and constrain its application fields. Microsoft released Kinect, which provides skeleton information using built-in sensors [7,8]. Although Kinect-based activity recognition solves the lighting problem and can track the skeleton joints of an activity with high accuracy, it cannot recognize an activity that is only partially observed because of a crowded room, the presence of obstacles, or a user outside the monitoring range.
With the growing coverage of WiFi signals and the improvement of wireless infrastructure in public places, WiFi-based activity recognition systems [4,9,10,11] leverage the change pattern of WiFi signals reflected by a human body to recognize the activity. WiFi-based activity recognition systems [12][13][14] not only ease the burden that wearable-based systems place on users but can also sense activity in the presence of obstacles, unlike Kinect-based works. For example, WiVi [14] can sense the user's behavior through a wall, and RF-Capture [11] tracks the 3D positions of a human body when the person is completely occluded and captures the human figure without wearable devices.
We are interested in BodyScan system [15], and it is estimated on the idea of the combination of the wearable [...]
(iv) We implement the fusion framework of CSI and skeleton data to sense the activity and address the respective limitations of CSI-based and skeleton-based activity recognition. Experimental results show that HuAc achieves an accuracy of greater than 93%.
The rest of this paper is organized as follows. We introduce the related work in Section 2. Section 3 introduces preliminaries of WiFi-based activity recognition, and we give an overview of HuAc in Section 4. Section 5 describes the Kinect module, and the WiFi module is presented in Section 6. Section 7 describes the process of human activity recognition. Section 8 evaluates the performance of the HuAc system, and we give a case study about a motion-sensing game using WiFi signals in Section 9. Section 10 lists several discussions, and we conclude the paper in Section 11.

Related Work
In this section, we divide related work on human activity recognition into two categories: Kinect-based and WiFi-based.

Kinect-Based Activity Recognition.
Vision-based activity recognition has been proposed and developed in the computer vision field. With the release of Kinect, researchers explore human activity recognition using the depth information and skeleton joint data provided by Kinect [7,8,16]. Biswas and Basu [8] leverage the histogram of depth information to recognize eight gestures. Moreover, the differences between consecutive frames yield a motion profile that describes various gestures. Other works [7,16] combine depth information with color images to improve the accuracy of gesture recognition. The limitations of Kinect-based activity recognition include the restricted sensing field, overlapping skeleton joints, and position dependence. The HuAc system explores the spatial relationship of skeleton joints to describe the trajectory of an activity and combines it with CSI to improve the robustness of human activity recognition in a dynamic environment.

WiFi-Based Activity Recognition.
Early works [17][18][19] explore the attenuation characteristics of WiFi signals to locate a person and count the number of people in an indoor environment. Researchers study the signal pattern reflected by a human body to sense human behavior [11,20,21,22]. These works describe human behavior recognition using coarse-grained RSSI information. For example, WiGest [18] studies the relationship between RSSI fluctuation and gestures to control media player actions without training. Therefore, we explore the relationship between RSSI fluctuation and human movement to detect the presence of an activity.
Given the requirements of practical applications and the limitations of RSSI, an increasing number of researchers have begun to explore fine-grained channel state information (CSI) to sense human behavior. Compared with RSSI, CSI can capture tiny behaviors [2,9,23,24,25,26,27,28] in terms of location, speed, and direction. The WiFall system [2] detects a fall by learning the specific CSI pattern. E-eyes [9] recognizes walking activity and in-place activity by adopting the moving variance of CSI and a fingerprint technique. Walking activity causes significant pattern changes of the CSI amplitude over time, since it involves significant body movements and location changes. In-place activity (such as watching TV) only involves relatively small body movements and does not cause significant amplitude changes with repetitive patterns. The relationship between an activity and the place where it occurs motivates a novel idea for human activity recognition. CARM [10] shows the correlation between CSI values and human activity by constructing a CSI-speed model and a CSI-activity model. WiDance [28] explores the Doppler shifts reflected by human behavior to predict the motion direction for exergames. We design a system that combines Kinect-based and WiFi-based methods to recognize an activity in different environments such as gaming systems, supermarkets, and elder health applications.

RSSI and CSI.
[Figure 4: Skeleton joints tracked by Kinect. Recoverable labels: (4) right elbow, (5) right hand, (6) left shoulder, (7) left elbow, (8) left hand, (9) torso, (10) right hip, (11) left hip, (12) right knee, (13) right foot, (14) left knee, (15) left foot.]
Received Signal Strength Indicator (RSSI) [29] at the packet level represents the signal-to-interference-plus-noise ratio (SINR) over the channel bandwidth as follows:
$$\mathrm{RSSI} = 10 \log_{10}\left(\lVert V \rVert^{2}\right),$$
where $V$ is the signal voltage. RSSI is the received signal strength in decibels (dB) and is mapped to distance according to the log-distance path loss model to roughly locate users or devices.
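As a rough illustration of the RSSI-to-distance mapping, the following Python sketch inverts the log-distance path loss model, RSSI(d) = RSSI(d0) - 10 n log10(d / d0). The function names are hypothetical, and the reference power at 1 m (-40 dBm) and the path loss exponent (3.0) are assumed values chosen for the example, not measurements from this work.

```python
import math


def distance_to_rssi(distance_m, rssi_at_d0=-40.0, d0=1.0, path_loss_exponent=3.0):
    """Predict RSSI (dBm) at a distance with the log-distance path loss model."""
    return rssi_at_d0 - 10.0 * path_loss_exponent * math.log10(distance_m / d0)


def rssi_to_distance(rssi_dbm, rssi_at_d0=-40.0, d0=1.0, path_loss_exponent=3.0):
    """Invert the model to get a rough distance estimate in meters."""
    return d0 * 10 ** ((rssi_at_d0 - rssi_dbm) / (10.0 * path_loss_exponent))


if __name__ == "__main__":
    for rssi in (-40.0, -55.0, -70.0):
        print(rssi, "dBm ->", round(rssi_to_distance(rssi), 2), "m")
```

In an indoor environment the exponent varies with the room layout, so such an estimate is only coarse, which is consistent with the limitation of RSSI noted above.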
Channel State Information (CSI) depicts multipath propagation at the granularity of OFDM subcarriers in the frequency domain. It contains amplitude and phase measurements as follows:
$$h = |h| e^{j\theta},$$
where $|h|$ and $\theta$ are the amplitude and phase, respectively. The variable $h$ is the CSI value of each subcarrier. We study the characteristics of each subcarrier to sense activity in the following work.
[...] females and five males, and the height of human body ranges from 150 cm to 185 cm.
According to the room layout, the environmental complexity is divided into three levels: empty environment, normal environment, and complex environment. First, the empty environment contains no people or furniture. We obtain high-quality WiFi signals from the empty room because of the lower noise and treat it as the baseline of the WiAR dataset. Then, the normal environment contains furniture and working people. Compared with the empty environment, the multipath effect caused by the furniture enriches the collected WiFi signals. Finally, a complex environment with furniture and moving people increases the difficulty of human activity recognition. The performance of the WiAR dataset is given in Section 8.
In our work, we collect WiFi signals and crowdsourced skeleton joints to reduce the training burden of collecting an activity dataset. We obtain the activity label with the help of Kinect users. The framework of crowdsourced WiFi signals and skeleton joints is shown in Figure 2.
The Impact of Indoor Environment on WiFi Signals Has a Difference with Time.
RSSI and CSI remain stable in a static indoor environment, and RSSI fluctuation ranges from 0 dB to 5 dB (empty environment: 0-3 dB; home environment: 0-7 dB; office: 0-5 dB; dynamic environment: 5-10 dB). Although RSSI changes sharply with environmental change, it cannot describe fine-grained changes of the indoor environment because of the multipath effect. However, CSI is able to sense fine-grained environmental change and detect what happened in an indoor environment. Specifically, RSSI only can find the environmental change and cannot sense how [...]

It Is Hard to Distinguish Similar Activities.
Existing works [2,15,45] explore the recognition of similar activities. For example, WiFall [2] extracts seven features to describe fall behavior because similar activities cause similar CSI patterns, and it is difficult to distinguish them using anomaly detection alone. The later RT-Fall system adopts the CSI phase difference to segment fall and fall-like activities because the phase difference of CSI is a more sensitive signature than CSI amplitude for activity recognition. The phase of CSI depends on the variation of the LOS (Line-of-Sight) length. Therefore, the key to recognizing similar activities lies in the physical difference between them.

The Same Activity Operated by Different People Has Various Signal Patterns.
According to our observations, the amplitude of CSI reflected by the same activity changes continuously across different times and environments. Therefore, we cannot recognize an activity with high accuracy from the amplitude of CSI alone. The changing pattern of signals reflected by an activity can describe the characteristic of the activity, as verified by Smokey [25]. Therefore, we explore the changing pattern of signals to recognize an activity.

The Impact of Activity with Different Directions on Activity Recognition.
In order to explore the impact of direction on activity recognition, we design a simple experiment on the playground because the playground does not have a rich multipath effect or other wireless devices. We explore the impact of four directions (east, west, north, and south) on the change pattern of signals, and the difference between facing the AP and having the back to the AP is the largest. Moreover, the CSI data we collect on the playground contain less noise than those collected in an indoor environment.

Framework of HuAc.
The HuAc framework consists of the Kinect-based module and the WiFi-based module, as shown in Figure 3. We describe the details of each module in turn. The Kinect module consists of preprocessing and posture analysis. We detect the overlap of skeleton joints using a statistical method and normalize the skeleton joints. In order to obtain effective features of skeleton joints, we analyze the postures of an activity according to the sequence of skeleton joints. Moreover, we design a selection method of skeleton joints named SSJ according to the result of posture analysis. Finally, we extract features from the effective skeleton joints and also consider the spatial relationship of adjacent joints as auxiliary information to sense human activity.
The WiFi module consists of preprocessing and feature extraction. In the preprocessing stage, we detect and remove the outlier data of an activity sequence according to the variance of RSSI reflected by an activity. After removing outlier data, we leverage the weighted moving average to smooth the activity data. For feature extraction, we first analyze the amplitude distribution of CSI reflected by an activity to evaluate the sensitivity of each subcarrier to the activity. Then, we use the k-means algorithm to cluster effective subcarriers. Finally, we extract important features from the effective subcarriers to improve the stability of human activity recognition.
We use the combination of the CSI features set and the skeleton features set as the input of SVM to recognize human activity. By comparing the predicted label with the training label, we give feedback to the previous stages of the HuAc framework.

Kinect Module
We describe the details of the Kinect module for human activity recognition. The Kinect module contains preprocessing and posture analysis.

Preprocessing.
The collected skeleton data contain empty values due to the overlap of skeleton joints or occlusion in the motion-sensing game. Therefore, we need to detect the overlapping joints and replace the invalid values by recovering the true values of the overlapping joints. We leverage the relationship between the coordinates of adjacent joints to detect the overlapping joints. We discard the sample of an activity when the percentage of invalid joints exceeds the threshold.
After recovering the invalid data, we normalize the coordinates of skeleton joints to compensate for differences in people's height and in the distance between the user and the sensor. The work [7] extracts 11 joints (except right shoulder, left shoulder, right hip, and left hip) from 15 joints in Figure 4, and we explore 30 subcarriers with the similar pattern reflected by a human body. Therefore, we select 15 joints to match the 15 subcarriers. Let $J_i$ be one of the 15 joints detected by the Kinect, and the coordinates vector $\mathbf{f}$ is given by
$$\mathbf{f} = [\mathbf{j}_1, \mathbf{j}_2, \ldots, \mathbf{j}_{15}],$$
where $\mathbf{j}_i$ is the vector containing the 3D normalized coordinates of the $i$th joint $J_i$ detected by Kinect. Thus,
$$\mathbf{j}_i = \frac{J_i}{s} + T,$$
where $s$ is the scale factor which normalizes the skeleton according to the distance $h$ between the neck and the torso joints of a reference skeleton, and
$$s = \frac{\lVert J_{\mathrm{neck}} - J_{\mathrm{torso}} \rVert}{h}.$$
The translation matrix, $T$, is needed to set the origin of the coordinate system to the torso. After the preprocessing phase, we obtain high-quality skeleton data.
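The normalization step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in HuAc: it assumes that each joint is first divided by a scale factor derived from the neck-torso distance and then translated so that the torso becomes the origin, that the joints are stored in a 0-based list with the neck at index 1 and the torso at index 8, and that the reference neck-torso distance is 1.0. The function name and its parameters are hypothetical.

```python
import math


def normalize_skeleton(joints, neck_idx=1, torso_idx=8, ref_neck_torso=1.0):
    """Scale and translate one skeleton frame.

    joints: list of (x, y, z) tuples, one per joint (assumed joint order).
    Returns a new list in which the torso is at the origin and the
    neck-torso distance equals ref_neck_torso.
    """
    dist = math.dist(joints[neck_idx], joints[torso_idx])
    if dist == 0:
        raise ValueError("neck and torso overlap; the frame cannot be normalized")
    s = dist / ref_neck_torso                      # scale factor
    scaled = [tuple(c / s for c in joint) for joint in joints]
    t = tuple(-c for c in scaled[torso_idx])       # translation to the torso
    return [tuple(c + tc for c, tc in zip(joint, t)) for joint in scaled]


if __name__ == "__main__":
    frame = [(0.1 * i, 1.5 - 0.1 * i, 2.0) for i in range(15)]
    print(normalize_skeleton(frame)[8])  # torso after normalization
```

After this step, skeletons recorded from users of different heights or at different distances from the sensor share the same scale and origin, which is the purpose of the normalization described above.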

Postures Analysis.
An activity consists of a sequence of subactivities over time. According to the skeleton structure, a human body is divided into two parts: the upper body and the lower body. The upper body contains five joints (right elbow, left elbow, right hand, left hand, and head) and two baseline joints (neck and torso), as in Figure 4. The lower body contains four joints (right foot, left foot, right knee, and left knee). We reproduce the tracking of skeleton joints using the QT tool and plot the trajectory chart of each activity. We observe that adjacent joints follow similar tracks in Figure 5, and some joints move slightly under the influence of human activity. For example, when the right elbow and right hand move in the clockwise direction to complete the horizontal arm wave, we observe that the right hip and left hip move slightly.
According to the change of the joint sequence, we can segment an activity into several subactivities in terms of direction and pause factors. The horizontal arm wave consists of four postures (subactivities), as in Figure 6. Each subactivity contains roughly 14 frames, and $F_i$ represents the $i$th frame (packet) of the activity reported by Kinect. We can roughly evaluate the activity according to the sequence of subactivities. Apart from the joints related to each subactivity, the torso and hip joints swing weakly. We neglect the impact of this weak swing on activity recognition. We pay more attention to the selection of skeleton joints in the following section.

SSJ: Selecting Skeleton Joints.
We design a selection method of skeleton joints named SSJ to describe a fine-grained subactivity. After posture analysis, we know the relationship between a subactivity and its key skeleton joints. Using this relationship, we extend the coordinate system of the human skeleton to a miniature coordinate system of the subactivity skeleton. The miniature coordinate system requires a fixed skeleton joint, and different subactivities have different fixed skeleton joints. For example, we observe that the shoulder joint is a fixed joint during the high arm wave. Therefore, we determine the origin of the miniature coordinate system corresponding to the subactivity.

WiFi Module
We introduce the design details of the WiFi module for human activity recognition. The WiFi module consists of preprocessing and feature extraction.
Preprocessing.
Noise in the collected data increases the difficulty of activity recognition because the differences between noise and the WiFi signals reflected by a fine-grained activity are tiny. Outlier data also weaken the quality of the collected data. Therefore, we detect outliers using a variance-based method and remove high-frequency signals using a low-pass filter. Moreover, we reduce the sawtooth wave of the filtered signal by using the weighted moving average. We combine the variance of RSSI with an empirical threshold to detect outliers. After removing outlier data, the activity corresponds to the low-frequency change of CSI according to the waveform of CSI reflected by an activity. Therefore, we adopt the low-pass filter to remove the high-frequency data in Figure 7.
The weighted moving average over the most recent $m$ CSI values is computed as
$$\mathrm{CSI}'_t = \frac{m\,\mathrm{CSI}_t + (m-1)\,\mathrm{CSI}_{t-1} + \cdots + 1 \cdot \mathrm{CSI}_{t-m+1}}{m + (m-1) + \cdots + 1},$$
where $\mathrm{CSI}'_t$ is the averaged new CSI. The value of $m$ decides to what degree the current value is related to historical records. In our study, we select $m$ according to experience and trial. We first set $m$ to 5, which means a length of 5 packets. The weighted moving average algorithm and the median filter have a similar effect on the original signals recorded by the receiver in Figure 7. They can remove spikes from the signals and alleviate sharp changes. As $m$ increases, the output of the weighted moving average algorithm becomes smoother than that of the low-pass filter and the median filter. Finally, we set $m$ to 10 because each activity produces a sharp change within 10 packet periods.
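The weighted moving average can be illustrated with the following Python sketch, in which the newest packet receives weight m and the oldest packet in the window receives weight 1. The function name is hypothetical, and the handling of the first m - 1 packets (shrinking the window to the packets available so far) is an assumption made for this example.

```python
def weighted_moving_average(csi, m=10):
    """Smooth a CSI amplitude series with linearly decaying weights."""
    if m < 1:
        raise ValueError("m must be at least 1")
    smoothed = []
    for t in range(len(csi)):
        k = min(m, t + 1)  # shrink the window at the start of the series
        # csi[t] gets weight k, csi[t-1] gets k-1, ..., csi[t-k+1] gets 1
        numerator = sum((k - j) * csi[t - j] for j in range(k))
        denominator = k * (k + 1) / 2
        smoothed.append(numerator / denominator)
    return smoothed


if __name__ == "__main__":
    raw = [10.0, 10.2, 9.8, 15.0, 10.1, 9.9, 10.0]
    print([round(v, 2) for v in weighted_moving_average(raw, m=5)])
```

A larger m spreads each packet's influence over more outputs, which gives a smoother curve but also delays the response to a genuine change caused by an activity.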

Feature Extraction.
Many related works emphasize the importance of feature extraction for human activity recognition in a dynamic indoor environment. We segment the activity after smoothing CSI and extract the features of each activity according to its characteristics. Kinect-based feature extraction follows the work [3].

Activity Segmentation.
Activity segmentation detects the start and end of an activity and removes the nonactivity packets from a sample which corresponds to the whole activity. We propose two methods to detect the start and end of an activity and improve the robustness of the segmentation algorithm. First, we remove the first second and the last second of the data sequence of an activity to reduce the error of the true activity sequence in our experimental environment. However, this method is invalid in a practical environment because the time at which each activity starts is unknown. Therefore, we leverage the moving variance of CSI to detect the start and end of each activity. The moving variance of CSI describes the difference among the local packets reflected by the activity. The packet sequence of the corresponding activity is defined as $P = \{P_1, P_2, \ldots, P_N\}$. $P$ represents the data sequence (a sample) of an activity, and $P_i$ represents the $i$th packet in the data sequence. We often use the standard deviation instead of the variance of CSI as follows:
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(P_i - \mu\right)^{2}},$$
where $n$ represents the step-size and $\mu$ is the mean value of the samples.
We construct a window for every 10 packets of the packet sequence of each sample and compute the variance of the window. Then, we construct the moving variance histogram and compare the windows by strength. Finally, we can detect the sharp points of each activity and roughly recognize [...] rectangle represents the duration of activity. Moreover, the black dotted line roughly represents the true start and end of the activity. According to our experimental results, detecting the start and end of the activity still causes a small error due to the sensitivity of signals.
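The moving-variance segmentation can be sketched as follows. This is a simplified Python illustration under stated assumptions rather than the exact procedure: the window of 10 packets comes from the text, whereas the threshold of 1.0, the sliding step of one packet, and the function names are assumed for the example. The start is taken as the newest packet of the first window whose standard deviation exceeds the threshold, and the end as the oldest packet of the last such window.

```python
import statistics


def moving_std(packets, window=10):
    """Standard deviation of every run of `window` consecutive packets."""
    return [statistics.pstdev(packets[i:i + window])
            for i in range(len(packets) - window + 1)]


def detect_activity(packets, window=10, threshold=1.0):
    """Return rough (start, end) packet indices of the activity, or None."""
    stds = moving_std(packets, window)
    active = [i for i, s in enumerate(stds) if s > threshold]
    if not active:
        return None
    start = active[0] + window - 1   # newest packet of the first active window
    end = active[-1]                 # oldest packet of the last active window
    return start, end


if __name__ == "__main__":
    quiet = [0.0] * 30
    burst = [5.0 if k % 2 else 0.0 for k in range(20)]
    print(detect_activity(quiet + burst + quiet))
```

The detected boundaries remain approximate: a noisy window can cross the threshold before the activity starts, which matches the small error reported above.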

Subcarrier Selection and Feature Detection.
According to our observation, subcarriers show a similar tendency for the same activity in Figure 9, but they differ in sensitivity. Therefore, we select the subcarriers most clearly affected by an activity using k-means to make human activity recognition robust. Thirty subcarriers are divided into 3 clusters using the k-means algorithm in Figure 10. According to the output of the k-means algorithm on subcarriers, the CSI features we extract include variance, the envelope of CSI, signal entropy, the velocity of signal change, median absolute deviation, the period of motion, and normalized standard deviation. Finally, we construct the features set of CSI.
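A possible shape of the subcarrier selection is sketched below in plain Python. Several points are assumptions made for illustration: the sensitivity of a subcarrier is measured here by the variance of its amplitude series, k-means is run on that single value with evenly spaced initial centroids, and the cluster with the largest centroid is kept. Only three of the listed features are computed, and all function names are hypothetical.

```python
import statistics


def kmeans_1d(values, k=3, iterations=50):
    """Plain 1-D k-means; centroids start at evenly spaced order statistics."""
    ordered = sorted(values)
    centroids = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(statistics.fmean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def select_sensitive_subcarriers(csi_by_subcarrier, k=3):
    """Indices of the subcarriers in the cluster with the highest variance."""
    sensitivity = [statistics.pvariance(series) for series in csi_by_subcarrier]
    labels, centroids = kmeans_1d(sensitivity, k)
    best = max(range(k), key=lambda c: centroids[c])
    return [i for i, lab in enumerate(labels) if lab == best]


def basic_features(series):
    """Three of the listed features for one subcarrier amplitude series."""
    med = statistics.median(series)
    mean = statistics.fmean(series)
    return {
        "variance": statistics.pvariance(series),
        "median_absolute_deviation": statistics.median(abs(x - med) for x in series),
        "normalized_std": statistics.pstdev(series) / mean if mean else 0.0,
    }
```

The features of the selected subcarriers can then be collected into one vector per activity sample to form the CSI features set.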

HuAc: Activity Recognition
We explore the relationship between CSI-based and skeleton-based methods on human activity recognition in Figure 11. The CSI-based method leverages the signal pattern to recognize an activity. The skeleton-based method uses the coordinate change of skeleton joints to recognize the same activity. Our experimental results show that an activity performed with the back to the AP has a more complex CSI pattern and a smaller amplitude than one performed facing the AP. We consider several classification algorithms used in the human activity recognition field, including kNN, Random Forest, Decision Tree, and SVM. In the following sections, we verify that SVM outperforms the others. We therefore select the SVM classification algorithm to recognize the sixteen activities in the WiAR dataset. The CSI features set and the skeleton features set serve as the inputs of SVM to train the optimal model and achieve stable recognition accuracy. The outputs of SVM contain the [...], [...], and [...]. We evaluate the performance of the classification algorithm according to the accuracy and achieve the accuracy of activity recognition using the [...]. According to the match level between [...] and [...], we obtain the false positive rate and the false negative rate. We analyze the result and give feedback to the previous step. According to the feedback, we pay more attention to the activities with low accuracy.
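The evaluation step can be illustrated with a short Python sketch. It assumes feature-level fusion by simple concatenation of the two features sets and computes the accuracy together with one-vs-rest false positive and false negative rates per activity from lists of true and predicted labels. The classifier itself is omitted, and the function names and label strings are hypothetical.

```python
def fuse_features(csi_features, skeleton_features):
    """Feature-level fusion by concatenation (an assumed fusion scheme)."""
    return list(csi_features) + list(skeleton_features)


def evaluate(true_labels, predicted_labels):
    """Accuracy plus per-activity false positive and false negative rates."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    total = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / total
    rates = {}
    for activity in sorted(set(true_labels) | set(predicted_labels)):
        false_pos = sum(p == activity and t != activity for t, p in pairs)
        false_neg = sum(t == activity and p != activity for t, p in pairs)
        positives = sum(t == activity for t in true_labels)
        negatives = total - positives
        rates[activity] = {
            "false_positive_rate": false_pos / negatives if negatives else 0.0,
            "false_negative_rate": false_neg / positives if positives else 0.0,
        }
    return accuracy, rates


if __name__ == "__main__":
    truth = ["wave", "wave", "kick", "kick"]
    predicted = ["wave", "kick", "kick", "kick"]
    print(evaluate(truth, predicted))
```

Activities with a high false negative rate in such a report are the ones that deserve more attention in the feedback step.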

Experimental Data.
We deal with data from three cases. For WiFi-based activity data, we collect activity data in different indoor environments. For skeleton data, we directly use the KARD dataset [3]. For environmental data, we mainly collect data from the empty room, the meeting room, and the office with people present. Our goal is to explore the impact of environmental factors on WiFi signals and to analyze the differences between the effects of an activity and of environmental change on WiFi signals using these three kinds of data. We collect WiFi signals to construct a new dataset named WiAR, which contains 16 activities performed 50 times by ten volunteers. The details of WiAR have been introduced in Section 3. KARD contains RGB video (.avi), depth video (.avi), and 15 skeleton points (.txt). Each volunteer performs 18 activities 3 times each; the volunteers are 20-30 years old and 150-180 cm tall. In this paper, we select only 16 activities as target activities, listed in Table 1.
We design three experimental schemes to analyze the accuracy of activity recognition. First, we collect RSSI and CSI to recognize an activity as the reference point. Second, we leverage the skeleton data of KARD to recognize an activity using our method and the previous method [3] in a similar indoor environment. Third, we propose a fusion scheme in which CSI is combined with skeleton data to recognize an activity. Moreover, we design another experimental scheme in which a volunteer repeats an activity 10 times. The goal of this scheme is to investigate the periodic regularity of the CSI change caused by the same activity.
We study the impact of subcarriers and antennae on the performance of activity recognition using four classification algorithms, as shown in Table 2. SVM outperforms the other classification algorithms, and using the 10 subcarriers obtained by the subcarrier selection mechanism increases the accuracy by 4.26% compared with activity recognition using 30 subcarriers. The three antennae (A, B, and C) increase the diversity of CSI data and keep the activity recognition accuracy above 80%. The four algorithms verify the effectiveness of the WiAR dataset.

Performance of Activity Recognition Using RSSI.
This section evaluates the performance of RSSI for human activity recognition. The difficulty in activity recognition using RSSI is how to deal with the multipath effect caused by the indoor environment and the reflection effect caused by human behavior. We select as a reference an indoor environment that stays static and contains only a volunteer and an operator. We use the RSSI variance as the input of SVM and obtain an average recognition accuracy of 89% in the static environment. When other people move close to the control area of the WiFi signals, the accuracy of activity recognition decreases to 77%, although it remains stable at that level. Several activities have low accuracy, such as two-hand wave, forward kick, side kick, and high throw. The average false positive rate is 8.9% and increases to 15.3% in a dynamic environment. Therefore, human activity recognition using RSSI needs the help of the CSI-based method to improve accuracy and robustness.

Performance of Activity Recognition Using CSI.
This section elaborates on the impact of interference factors on human activity recognition using CSI in four aspects: human diversity, similar activities, different indoor environments, and the size of the training set. Moreover, we keep the position of the volunteers and the distance between the receiver and the transmitter fixed throughout the experiment.
The Impact of Human Diversity on the Accuracy. Human diversity not only enriches the CSI information but also raises the difficulty of activity recognition because different people have different motion styles, such as speed, height, and strength. We achieve an average recognition accuracy of 93.42% for all volunteers in Figure 13(a). We select two volunteers, volunteer A and volunteer B, to verify the impact of human diversity on the accuracy. Volunteer A, who exercises regularly, obtains an average recognition accuracy of 97.1%. Volunteer B, who rarely exercises in daily life, achieves an average recognition accuracy of 92.3%. Therefore, exercise experience increases the differences between activities because the activities are performed in a more standard way, which improves the recognition accuracy.
The Impact of Similar Activity on the Accuracy. We explore two groups of similar activities in Figure 13(b): high arm wave and horizontal arm wave in the first group, and high throw and toss paper in the second. The first group achieves an average recognition accuracy of 92.5% and the second group 94.6%. The false positive rate for similar activities is higher than that for independent activities. For example, forward kick and side kick are also similar activities, and the difference between them is the moving direction. In order to obtain better accuracy, we will consider the impact of moving direction on the signal change in future work.
The Impact of Indoor Environment on the Accuracy. As shown in Figure 12, there are three experimental environments in order of complexity: empty room, meeting room, and office. The accuracy in the three environments is shown in Figure 13(c). The meeting room, with 94.7%, outperforms the other two environments; the accuracy is 93% for the empty room and 87% for the office due to the multipath effect. The average error is 2.6% in the meeting room and 9.8% in the office, where paths are excessively reflected by the body. We will explore the multipath effect in depth using the amplitude and phase of CSI in future work.
The Impact of Training Size on the Accuracy. We design three schemes to analyze the accuracy of human activity recognition using different training sizes in Figure 13(d). We first introduce three activity sets and three training sets. Activity set 1 consists of horizontal arm wave, high arm wave, high throw, and toss paper. Activity set 2 contains two-hand wave and handclap. Activity set 3 consists of phone, draw tick, draw x, and drink water. Moreover, these activity sets come from the same people. As the training size increases, the accuracy of activity recognition improves by about 10% for activity set 1. Activity set 1 has a low accuracy because it contains more similar activities. Although activity set 3 also contains similar activities, its accuracy is better than that of activity set 1 due to the strength of the activities.

Performance between Kinect-Based and WiFi-Based Activity Recognition.
It is hard for the noisy RSSI waveform to remain stable when the control area changes during data collection. Therefore, using the waveform shape of RSSI to recognize an activity is not a good choice at the current level of technology. The waveform pattern of CSI can describe an activity credibly and in a fine-grained way. The mapping relationship between CSI-based and Kinect-based activity recognition for various activities is represented by several parameters shown in Table 3. The environmental factor is evaluated using the number of multipaths and the complexity of the indoor environment. In order to extend the application field of activity sensing, we construct the mapping relationship between CSI-based and Kinect-based activity recognition. The mapping relationship can avoid information loss. For example, once one of the two datasets is lost, the activity recognition system still works by using the information of the other dataset.
We evaluate the performance of human activity recognition on the KARD dataset [3]. The highest recognition rate is 100% (side kick, handclap), while the worst is 80% (high throw). We propose a selection method of skeleton joints named SSJ to improve the accuracy of activity recognition and reduce the computing cost. SSJ achieves an average recognition accuracy of 93.15%. Three activities, high arm wave, draw tick, and sit down, achieve low accuracies of 80%, 75%, and 70%, respectively. Table 4 shows the performance of four methods: CSI-based, KARD-based (skeleton joints), SSJ-based, and HuAc. Table rows in bold font show where the skeleton-based method outperforms the CSI-based method in recognition accuracy. Table rows in italic font show that several activities are sensitive to CSI. HuAc improves the accuracy of activity recognition and increases its stability in a dynamic indoor environment. We will focus on the stability of the activity recognition algorithm or system in future work.

Case Study: Motion-Sensing Game Using WiFi Signals
We introduce an application of our work to motion-sensing games. At present, Kinect has a limited viewing angle (57.5° horizontally and 43.5° vertically) and a limited sensing distance, which ranges from 0.5 m to 4.5 m. Moreover, Kinect loses its sensing ability when a barrier occludes the game user in the control area. An interesting point of our work is that we pay more attention to the activity itself and do not depend on the user location. Kinect, however, needs to adjust the location of the user before activity recognition to sense well. Therefore, we will propose a framework to replace Kinect in the future, once the accuracy of human activity recognition using WiFi satisfies the requirements of an indoor environment. Figure 14 illustrates a motion-sensing game using WiFi signals. One or two people stand between the transmitting and receiving terminals, which allows a longer distance between the TV and the user. The area below the blue dashed line represents the control area; our work can sense human behavior within 10 m and achieves better performance within the black circle. The user performs the same activity as the one shown on the TV set, and the receiving terminal collects the corresponding data. After signal processing, we obtain the recognized activity together with a probability and match it against the activity in the game on the TV set. Once the matching result satisfies the threshold value, the activity is matched successfully in the motion-sensing game using WiFi signals.
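To make the Kinect constraints above concrete, the following minimal sketch checks whether a user position falls inside the stated sensing volume. It assumes that the two angles are full fields of view centered on the sensor's optical axis and that the range limit applies to the Euclidean distance; the function names and the coordinate convention are hypothetical.

```python
import math

# Values taken from the text above.
KINECT_H_FOV_DEG = 57.5
KINECT_V_FOV_DEG = 43.5
KINECT_MIN_RANGE_M = 0.5
KINECT_MAX_RANGE_M = 4.5
WIFI_MAX_RANGE_M = 10.0


def in_kinect_view(x: float, y: float, z: float) -> bool:
    """Check a point given in meters: x right, y up, z forward from the sensor."""
    if z <= 0:
        return False  # behind the sensor
    distance = math.sqrt(x * x + y * y + z * z)
    if not (KINECT_MIN_RANGE_M <= distance <= KINECT_MAX_RANGE_M):
        return False
    # Angular offset from the optical axis in each plane.
    h_angle = math.degrees(math.atan2(abs(x), z))
    v_angle = math.degrees(math.atan2(abs(y), z))
    return h_angle <= KINECT_H_FOV_DEG / 2 and v_angle <= KINECT_V_FOV_DEG / 2


def in_wifi_range(distance_m: float) -> bool:
    """WiFi sensing is described as depending only on distance (within 10 m)."""
    return 0 <= distance_m <= WIFI_MAX_RANGE_M
```

For example, a user standing 6 m straight ahead is outside the Kinect range but still inside the 10 m WiFi sensing range.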

Discussion and Future Work
10.1. Extending to Shadow Recognition. In our research, we consider the relationship between WiFi signals and skeleton data in human activity recognition. We also describe the interesting topic of shadow activity recognition. Shadow is an important issue for vision-based activity recognition and monitoring; WiFi-based activity recognition, however, can sense human behavior through a wall or a shadow. First, we explore the characteristics of CSI to enhance the sensing ability by using a high-precision device. Second, WiFi signals can help vision-based activity recognition improve its ability to sense the environment. In this study, we also need to consider material attenuation. According to our observations, there is a small difference between the impact of wall reflection and that of body reflection on the WiFi signals. WiVi [14] leverages the nulling technique to explore through-wall sensing by using CSI and analyzing the offset of signals caused by the reflection and attenuation of the wall. We recommend that researchers read this paper and the authors' follow-up work [11].
10.2. Extending to Multiple People Activity Recognition. Multiple people activity recognition needs multiple APs to obtain more signal information reflected by the human bodies. At present, existing works can locate targets [46] and count the number of people [19] using CSI in an indoor environment. A Kinect-based activity recognition system recognizes two skeletons (six skeletons for Kinect 2.0) and locates the skeletons of six people. Therefore, the combination of WiFi signals and Kinect facilitates the development of multiple people activity recognition. In the future, we plan to study the characteristics of WiFi signals in depth and propose a novel framework to facilitate the practical application of human activity recognition in social life.
10.3. Data Fusion. Skeleton data capture the position of each joint for each activity and track the trajectory of human behavior. CSI can sense a fine-grained activity in a complex indoor environment without any attached device. The balance point between CSI and skeleton joints and the selection of effective features are important factors for improving the quality of the fused information. Moreover, time synchronization of the fused information is also an important challenge in the human activity recognition field.
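One simple way to approach the time synchronization challenge is nearest-timestamp matching between the two streams. The sketch below is an illustration under assumptions, not the method used in HuAc: it assumes both streams carry timestamps on a shared clock, sorted in ascending order, and the function name `align_nearest` and the tolerance `max_gap` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def align_nearest(csi_times: Sequence[float],
                  skeleton_times: Sequence[float],
                  max_gap: float) -> List[Tuple[int, int]]:
    """Pair each skeleton frame with the nearest CSI sample in time.

    Returns (csi_index, skeleton_index) pairs. Skeleton frames whose
    nearest CSI sample is more than `max_gap` away are dropped.
    """
    pairs: List[Tuple[int, int]] = []
    if not csi_times:
        return pairs
    for j, t in enumerate(skeleton_times):
        i = bisect_left(csi_times, t)
        # The nearest sample is either just before or just after t.
        candidates = [k for k in (i - 1, i) if 0 <= k < len(csi_times)]
        best = min(candidates, key=lambda k: abs(csi_times[k] - t))
        if abs(csi_times[best] - t) <= max_gap:
            pairs.append((best, j))
    return pairs
```

Because CSI is usually sampled much faster than skeleton frames, iterating over the skeleton frames keeps the number of pairs small; the tolerance guards against pairing frames across a gap in either stream.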

Conclusion
In our work, we construct a public WiFi-based activity dataset named WiAR and design HuAc, a novel framework for human activity recognition using CSI and crowdsourced skeleton joints, to improve the robustness and accuracy of activity recognition. First, we leverage the moving variance of CSI to detect the rough start and end of an activity and adopt the distribution of CSI to describe the details of each activity. Moreover, we select several effective subcarriers by using the k-means algorithm to improve the stability of activity recognition. Then, we design the SSJ method on the basis of KARD to recognize similar activities by leveraging the spatial relationship and the angle of adjacent joints. Finally, we address the limitations of CSI-based and skeleton-based activity recognition by using fusion information. Our results show that HuAc achieves an average recognition accuracy of 93% on the WiAR dataset.
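As an illustration of the first step, a rough start and end of an activity can be found by thresholding the moving variance of a CSI amplitude series. This is a minimal sketch under assumptions: the window length and the threshold are hypothetical parameters, and the function names are not taken from the paper.

```python
from statistics import pvariance
from typing import List, Optional, Sequence, Tuple


def moving_variance(signal: Sequence[float], window: int) -> List[float]:
    """Population variance of every length-`window` slice of the signal."""
    return [pvariance(signal[i:i + window])
            for i in range(len(signal) - window + 1)]


def detect_activity_span(signal: Sequence[float], window: int,
                         threshold: float) -> Optional[Tuple[int, int]]:
    """Return rough (start, end) sample indices of the active segment.

    A window counts as active when its variance exceeds `threshold`.
    The span runs from the first sample of the first active window to
    the last sample of the last active window, so it may extend up to
    window - 1 samples beyond the true activity on each side.
    """
    variances = moving_variance(signal, window)
    active = [i for i, v in enumerate(variances) if v > threshold]
    if not active:
        return None
    return active[0], active[-1] + window - 1
```

The result is deliberately coarse, in line with the "rough" detection described above; a finer boundary would need a further refinement step.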

Figure 3: The framework of the HuAc system.

Figure 4: Skeleton structure [7]. (a) A skeleton structure contains 15 skeleton joints. (b) The white circles represent skeleton joints without direction, such as the shoulder and hip. The gray circles represent the neck and the torso, which have a weak effect on upper-body and lower-body activities, except the squat. The black circles represent normal skeleton joints.

Figure 5: High arm wave tracking using skeleton data. The activity has two active joints (right hand, right elbow), and the direction changes with every clockwise movement. However, adjacent joints change only slightly, within a certain range.

6.1.1. Outlier Detection and Removing High Frequency. Outliers have an important impact on the quality of the collected data because they increase or decrease the fluctuation strength of WiFi signals. We analyze the RSSI distribution of an activity to estimate a suitable empirical threshold.
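A threshold-based outlier step of this kind can be sketched as follows. This is only an illustration: the text does not give the threshold value or the exact rule, so the use of the median as the reference level, the replacement of outliers by the median, and the function name `remove_outliers` are all assumptions.

```python
from statistics import median
from typing import List, Sequence


def remove_outliers(samples: Sequence[float], threshold: float) -> List[float]:
    """Replace samples farther than `threshold` from the median by the median.

    `threshold` stands in for the empirical threshold derived from the
    RSSI distribution; its value here is hypothetical.
    """
    center = median(samples)
    return [center if abs(s - center) > threshold else s for s in samples]
```

For instance, with RSSI readings around -40 dBm and a threshold of 10 dB, isolated readings such as -80 dBm or -5 dBm would be replaced, while normal fluctuations are kept.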

Figure 11: Skeleton joint sequence and CSI change of squat behavior. (a)-(c) represent the skeleton sequence of squat behavior. (d) shows the CSI change caused by squat behavior when the user faces the AP and when the user's back is toward the AP.

Figure 13: Performance analysis of activities using CSI. (a) Sixteen activities include horizontal arm wave, high arm wave, two-hand wave, high throw, draw x, draw tick, toss paper, forward kick, side kick, bend, handclap, walk, phone, drink water, sit down, and squat. (b) Four activities contain horizontal arm wave, high arm wave, high throw, and toss paper. (c) The impact of experimental environments on accuracy. (d) The impact of training samples on accuracy of three activity sets.

Table 1.
Each activity is performed 50 times by 10 volunteers which consist of five
3.2. Kinect Technology. Kinect (RGB-D camera) refers to the advanced RGB/depth sensing hardware and the software-based technology that interprets the RGB/depth information. The hardware contains a normal RGB camera, a depth sensor (infrared projector and infrared camera), and a four-microphone array, which is able to provide depth
two-hand wave, high throw, toss paper, draw tick, draw x, hand clap, high arm wave; Empty room, meeting room, office; Router, laptop with 5300 card [2]
6.1.2. Weighted Moving Average. After filtering, the signal still contains a sawtooth wave, because CSI is sensitive to the indoor layout and to human movement, and the CSI fluctuation caused by the environment is hard to distinguish from the fluctuation caused by a fine-grained activity. Therefore, we smooth the CSI data using the weighted moving average as proposed in WiFall [2]. We randomly select 15 subcarriers from the 30 subcarriers, which correspond to the 15 skeleton joints of Kinect technology. Each CSI stream contains 15 subcarriers as {CSI_1, CSI_2, ..., CSI_15}. CSI_{t,1} is the first subcarrier of CSI at time t. {CSI_{1,1}, ..., CSI_{t,1}} indicates the CSI sequence of the first subcarrier in the time period t. The latest CSI has weight m, the second latest m − 1, and so on. The expression of CSI series is shown as follows:
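A minimal sketch of a linearly weighted moving average of the kind described here is given below. It assumes a window of m samples in which the latest sample has weight m, the previous one m − 1, and so on down to 1, with the sum normalized by the total weight. The symbol m, the function name, and the handling of the first few samples (where fewer than m samples exist) are assumptions made for illustration.

```python
from typing import List, Sequence


def weighted_moving_average(series: Sequence[float], m: int) -> List[float]:
    """Smooth one subcarrier's CSI series with linearly decaying weights.

    For each time t, the latest sample gets weight m, the one before it
    m - 1, and so on. Near the start of the series, only the samples
    that exist are used, and the weights are renormalized (assumption).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    smoothed: List[float] = []
    for t in range(len(series)):
        span = min(m, t + 1)  # number of samples available up to time t
        numerator = 0.0
        denominator = 0
        for k in range(span):
            weight = m - k  # latest sample (k = 0) has the largest weight
            numerator += weight * series[t - k]
            denominator += weight
        smoothed.append(numerator / denominator)
    return smoothed
```

Applied to each of the 15 selected subcarriers in turn, this gives one smoothed series per subcarrier; a larger m gives a smoother result at the cost of a slower response to real changes.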

Table 2: Performance comparison by four classification algorithms.

Table 3: Mapping relation between WiFi and Kinect.

Table 4: Accuracy of activity recognition for CSI-based and Kinect-based methods.