Multifeature Fusion Human Motion Behavior Recognition Algorithm Using Deep Reinforcement Learning



Introduction
Human action recognition refers to the recognition and use of human behavior patterns and action categories [1,2].
Through the recognition and analysis of human motion information, the actual motion state of the human body can be obtained, and a variety of related services can be provided according to actual needs. Some scholars abroad have divided human behavior into three levels: movement unit, action, and activity. A movement unit is the elementary component of a movement [3]. An action is a sequence of movement elements combined in a certain order to form a particular type of movement [4,5]. Activity is a broader concept that represents complex movement of the human body closely related to objects and the surrounding environment [6]. Recognition of human behavior is often combined with multifeature fusion and has broad application prospects. It has been developed extensively in many fields, including intelligent prosthetics, elderly monitoring, and exercise detection [7,8]. For smart prostheses, human behavior recognition can determine the phase of a person's gait, so that motor control strategies can be formulated for the different phases to assist movement. For elderly monitoring, it can detect dangerous behaviors such as falls and raise alarms in real time, and it can monitor the various actions of the elderly [9,10]. For exercise detection, it can estimate the calories consumed from the duration and intensity of different behaviors [11], enabling real-time monitoring of human exercise [12].
Literature [13] proposed deep reinforcement learning based on fuzzy reasoning to recognize the control behavior of intelligent traffic lights. Intelligent transportation systems overcome the limitations of traditional transportation systems, have become an important part of smart cities, and are widely used. To improve the efficiency of traffic light control, that work proposed a dynamic intelligent traffic light control system that takes real-time traffic information as input and dynamically adjusts the duration of the traffic lights. The proposed Dynamic and Intelligent Traffic Light Control System (DITLCS) operates in three modes: fair mode, in which all vehicles have equal priority; priority mode, in which different types of vehicles are given different priority levels; and emergency mode, in which emergency vehicles are given the highest priority. Literature [14] studied a local optical flow method based on You Only Look Once (YOLO) for human action recognition. By calculating the optical flow magnitude only in the area where the human target is located, the amount of computation is reduced and computation time is saved. A threshold is then set to identify the person's behavior. Through algorithm design and experimental verification, the walking, running, and falling states of the human body were identified in real-life indoor sports videos. Experimental results show that the algorithm is particularly effective at recognizing jogging behavior. Literature [15] proposed human behavior recognition in multiview videos using deep learning techniques, including convolutional neural networks and long short-term memory networks. A deep network is constructed under a multiview framework to learn long-term dependencies for human behavior recognition from videos.
The use of two cameras as sensors effectively overcomes the problems of occlusion and contour blur and improves the accuracy of the multiview framework. Literature [16] applied a Principal Component Analysis-Long Short-Term Memory (PCA-LSTM) model to human behavior recognition: surface electromyogram (EMG) signals of the human upper limbs are collected and processed, PCA is used to reduce the dimensionality of the data, and the reduced data are fed into a Long Short-Term Memory (LSTM) neural network to classify human behavior, from which the classification efficiency and recognition rate are calculated. The above methods have limitations in recognition accuracy and robustness. In this paper, deep reinforcement learning is combined with multifeature fusion to study human action recognition, and a multifeature fusion human action recognition algorithm based on deep reinforcement learning is proposed. The main contributions of this paper are as follows:
(1) The algorithm can recognize a variety of human behaviors and achieves good recognition results.
(2) A reinforcement learning model is constructed, and a specific structure for the deep reinforcement learning network recognition model is proposed.
(3) The algorithm can quickly lock onto the character region after observation, shows strong learning ability, and is robust overall.
(4) Several indexes are used to verify the effectiveness of the proposed algorithm.

Related Works
Research on multifeature fusion human action recognition has long been carried out abroad, and various noncontact and contact recognition algorithms have been proposed [17,18]. Noncontact algorithms include multifeature fusion human action recognition algorithms based on visual inspection technology; contact algorithms include those based on sensor detection, among others. Although domestic research on this issue started relatively late, breakthrough results have been obtained in recent years, mainly for the recognition of a variety of human behaviors. Literature [19] achieved human-level control through deep reinforcement learning, obtaining effective representations of the environment from high-dimensional sensory input and using them to generalize past experience to new situations. The deep Q-network is an artificial agent trained with a deep neural network that can learn successful policies directly from high-dimensional sensory input through end-to-end reinforcement learning. Literature [20] used multiscale deep reinforcement learning to build a large-scale quantitative image database, integrating preliminary experience in human behavior analysis. That paper also explored the feasibility of a fully automatic whole-body volume analysis workflow based on deep reinforcement learning, as well as the influence of contrast and slice thickness on the calculation of organ volumes. Multiscale DRL is used to detect three-dimensional anatomical landmarks for whole-body organ volumes and to perform three-dimensional organ segmentation. However, its accuracy is not high.
Literature [21] proposed a human behavior recognition algorithm based on the fusion of multiple image features and a conditional random field. The algorithm consists of three cascaded modules. First, a recurrent neural network is constructed; then feature similarity is introduced so that it can be exploited more comprehensively and accurately; finally, the human behavior in the image is recognized by a conditional random field using multiple features. However, the algorithm handles multiple features poorly. Literature [22] proposed a recognition algorithm based on multifeature fusion, in which the change in the distribution of local binary pattern features is related to the projection error. For fast and accurate detection, the research data are taken from a professional facial expression database. The comparison shows that this algorithm is efficient, but its recognition time is long. Literature [23] proposed a recursive neural network technique that uses RGB and skeleton sequences for human activity recognition.
However, its recognition accuracy needs further improvement. Literature [24] proposed a driving assistance algorithm based on multifeature fusion, which establishes a multifeature fusion model from infrared images and a spatiotemporal correlation model based on fuzzy set theory, and comprehensively analyzes and implements the algorithm; however, it takes a long time. Literature [25] constructs a deep learning network that extracts and combines shallow and deep features and uses a neural network for weighted feature fusion to recognize human behavior, but the algorithm design is complex and time-consuming. Literature [26] constructs a multiview human motion map, extracts the histogram of oriented gradients of the image, generates feature vectors with a fusion algorithm, and then separates the image features to complete human behavior recognition, but its accuracy is not high.

Construction of Human Behavior Data.
Several typical human behavior data sets are selected as research data from the benchmark data sets released by multiple human behavior research institutions. First, cluster analysis is performed on the research data using the K-means algorithm. Its clustering criterion function is the within-cluster sum of squared distances between the samples and their cluster centers, where k is the number of clusters and n is the size of the data.
In this algorithm, samples are allocated according to the minimum-distance principle: each sample x in the data set is assigned to the cluster at the minimum distance D_i from it, where O_n refers to the m-th sample. The algorithm finally outputs the clusters, where A_k refers to the k-th output cluster.
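The clustering step can be illustrated with a minimal K-means sketch in Python with NumPy. It is not the authors' code: the function name `kmeans`, the Euclidean distance, the random initialization from the samples, and the convergence test are all assumptions. It shows the two ingredients described above: the minimum-distance assignment of each sample to its nearest center, and the within-cluster sum of squared errors used as the clustering criterion.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means with Euclidean distance (illustrative sketch).

    X: (n, d) array of n samples; k: number of clusters.
    Returns (labels, centers, sse), where sse is the within-cluster
    sum of squared errors used as the clustering criterion.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct samples picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every centre, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Minimum-distance principle: each sample joins its nearest centre.
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment and criterion value for the returned centres.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = float(d2[np.arange(len(X)), labels].sum())
    return labels, centers, sse


# Usage: two well-separated toy blobs.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers, sse = kmeans(X, k=2)
print(labels, round(sse, 4))
```

On these toy points the three samples near the origin form one cluster and the three near (5, 5) form the other; in practice a library implementation with several restarts would normally be preferred over this bare loop.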
In the research data, each video in the selected data sets contains a single behavior category and carries a category label. Each data set is divided into a test subset and a training subset with no intersection between them.
For role-playing data sets with fixed scenes, the test subset and the training subset must contain different roles. The training subset is used for parameter training; after training is completed, the test subset is used to evaluate the model, and the model parameters are not adjusted during testing, as shown in Table 1.
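The requirement that the training and test subsets share no roles (actors) can be enforced by splitting on actor identity rather than on individual videos. A minimal Python sketch follows; the record layout `(video_id, actor, label)`, the function name `split_by_actor`, and the 30% test fraction are illustrative assumptions, not details given in the text.

```python
import random

def split_by_actor(records, test_fraction=0.3, seed=0):
    """Split (video_id, actor, label) records into train/test subsets
    so that no actor appears in both (record layout is an assumption)."""
    actors = sorted({actor for _, actor, _ in records})
    rng = random.Random(seed)
    # Hold out a random subset of actors (at least one) for testing.
    n_test = max(1, round(test_fraction * len(actors)))
    test_actors = set(rng.sample(actors, n_test))
    train = [r for r in records if r[1] not in test_actors]
    test = [r for r in records if r[1] in test_actors]
    return train, test


# Usage: 20 toy videos recorded by five actors p0..p4.
records = [(f"v{i}", f"p{i % 5}", "walk" if i % 2 else "run") for i in range(20)]
train, test = split_by_actor(records)
print(len(train), len(test))
```

Splitting on actors rather than videos prevents the model from being evaluated on people it has already seen during training.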
KTH contains six actions (clapping, waving, boxing, running, jogging, and walking) performed by twenty-five people in four controlled scenes. The samples were obtained by downsampling the original videos to 160 × 120 pixels. The camera was fixed during recording and the background is uniform, but there are variations in lighting, appearance, and clothing, as well as in the scale of the people.
There are nine types of actions in Weizmann: swinging arms to take off, single jump, two-handed swing, one-handed swing, moving to one side, flying jump, jumping, running, and walking. All actions are performed by a single person against a static background.
The data set provides category labels and the corresponding silhouette information of the human body.
There are fourteen types of actions in IXMAS, performed by eleven people, each of whom repeats every action three times. The camera is fixed and lighting changes are small. The videos were shot from five angles: one top view and four side views [27]. Because the viewing angles differ greatly, the same action can look very different across views, which makes recognition relatively difficult.
Hollywood consists of two data sets. The first was shot in a controlled environment and contains relatively little video data. The second consists of action clips extracted from Hollywood movies, covering ten different scenes and twelve action categories. In the extracted videos, the characters' poses, clothes, and expressions vary considerably; there are interfering factors such as occlusion and lighting changes; the overall scene changes greatly; and the camera also moves. Because the samples are close to real scenes, recognizing human behavior on this data set is very challenging [28].
There are 51 behavior categories in HMDB51, including drinking, climbing, golfing, and horse riding. The data set contains both categories that differ greatly from one another and categories that differ only slightly, as well as videos with distinct motion characteristics and large motion ranges alongside videos with weak motion characteristics and small motion ranges [29]. Because the data set was collected from the Internet, the scale and appearance of the characters vary greatly and the backgrounds are complex, which makes it extremely challenging. Examples of samples in this data set are shown in Figure 1.
UCF101 contains 101 human behavior categories collected mainly from YouTube. These 101 behaviors can be grouped into five types: playing musical instruments, human actions [30], human-object interaction, human-human interaction, and sports. Examples of samples in this data set are shown in Figure 2.

Construct an Attention Model.
To address the selection of visual regions and time steps, an attention model is constructed: in the deep reinforcement learning network, a small sampling area is used as the model input, and the location of the next observed area is estimated from the temporal information obtained after this input [31,32]. First, a reinforcement learning model is built, as shown in Figure 3. The reinforcement learning task is described as a Markov decision process: the agent continuously interacts with the external environment to obtain environmental information, that is, the environmental state, which is the environment as perceived by the agent. The agent decides its next action based on this information and influences the environment through the actions it takes.
In the reinforcement learning model constructed in Figure 3, the agent is the core of the model. The agent continuously interacts with the external environment to obtain environmental information, which is the state of the environment (State), and at the same time obtains the reinforcement signal provided by the environment (Reward). The agent decides the next action (Action) based on the perceived information and influences the environment through the action taken. The attention mechanism is realized through this reinforcement learning model, where the environment is the input video sequence, denoted V, its category is denoted y, and the frame input at step t is denoted v_t. The deep reinforcement learning network serves as the agent [33].
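As an illustration of the small sampling area that the agent observes at each step, the sketch below crops a square observation window centred on a location chosen by the agent. The 76 × 76 window size matches the observation window used in the robustness test; the zero padding at frame borders, the (row, column) convention, and the function name `extract_glimpse` are assumptions.

```python
import numpy as np

def extract_glimpse(frame, center, size=76):
    """Crop a size x size observation window centred (approximately, for
    even sizes) at center=(row, col); pixels outside the frame are zero."""
    half = size // 2
    # Zero-pad the two spatial axes; leave any channel axis untouched.
    pad = [(half, half), (half, half)] + [(0, 0)] * (frame.ndim - 2)
    padded = np.pad(frame, pad, mode="constant")
    # Clip the requested centre to the frame so the slice is always full-size.
    r = min(max(int(round(center[0])), 0), frame.shape[0] - 1)
    c = min(max(int(round(center[1])), 0), frame.shape[1] - 1)
    # Original pixel (r, c) sits at (r + half, c + half) in the padded frame,
    # so the window starting at (r, c) there has it at offset (half, half).
    return padded[r:r + size, c:c + size]


# Usage: a 120 x 160 frame (the KTH sample size), glimpse at the centre.
frame = np.arange(120 * 160, dtype=float).reshape(120, 160)
g = extract_glimpse(frame, (60, 80))
print(g.shape)
```

Padding rather than shifting the window keeps the requested location at a fixed offset inside the glimpse, so the agent's action always means the same thing even near the frame border.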
First, the reward function is defined as

$$R = p_T,$$

where R refers to the reward and p_T refers to the reinforcement signal obtained at the final step T. The objective function is then defined as

$$J(\theta) = \mathbb{E}_{p(S;\theta)}\left[R\right],$$

where J(θ) refers to the objective function, S refers to the interaction sequence between the environment and the agent, and θ refers to the policy parameters. Because J(θ) is to be maximized, θ is updated by gradient ascent [34] (equivalently, gradient descent on −J(θ)):

$$\theta_{\text{new}} = \theta_{\text{old}} + \lambda \nabla_\theta J(\theta),$$

where θ_new and θ_old refer to the policy parameters after and before the update, λ refers to the learning rate (step size), and ∇_θ refers to the gradient with respect to θ. The gradient of the objective function is

$$\nabla_\theta J(\theta) = \mathbb{E}_{p(S;\theta)}\left[\nabla_\theta \log p(S;\theta)\, R\right].$$

In practice, this expectation is approximated by Monte Carlo sampling:

$$\nabla_\theta J(\theta) \approx \frac{1}{M}\sum_{m=1}^{M} \nabla_\theta \log p(S^m;\theta)\, R^m,$$

where M refers to the number of sampled episodes used to estimate the expectation and m indexes the samples. Because the environment dynamics do not depend on θ, the policy probability expands over the steps of an episode as

$$\nabla_\theta \log p(S;\theta) = \sum_{t=1}^{T} \nabla_\theta \log \pi(l_t \mid s_{1:t};\theta),$$

where T refers to the sequence length and t indexes the steps. When the location l_t = μ′ is sampled from a Gaussian policy with mean μ and standard deviation σ, the logarithmic derivative of the policy probability is

$$\nabla_\theta \log \pi(l_t \mid s_{1:t};\theta) = \frac{\mu' - \mu}{\sigma^2}\,\nabla_\theta \mu,$$

where μ refers to the estimated (mean) location, μ′ refers to the randomly sampled output, and σ refers to the standard deviation of the Gaussian distribution. Substituting the expanded policy probability gives the gradient of the final objective function:

$$\nabla_\theta J(\theta) \approx \frac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{T} \nabla_\theta \log \pi(l_t^m \mid s_{1:t}^m;\theta)\, R^m.$$

Finally, a baseline b (an estimate of the expected reward) is introduced to reduce the variance and smooth the updated gradient [35]:

$$\nabla_\theta J(\theta) \approx \frac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{T} \nabla_\theta \log \pi(l_t^m \mid s_{1:t}^m;\theta)\left(R^m - b\right).$$

The resulting gradient is used to train the attention mechanism.
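The policy-gradient update can be sketched numerically. The NumPy example below assumes the standard REINFORCE estimator with a Gaussian location policy and a baseline: for each sampled location μ′ drawn around the mean μ with standard deviation σ, the score (μ′ − μ)/σ² is weighted by the baseline-corrected reward and averaged over M episodes of T steps. To keep it self-contained, the network is replaced by a single learnable 2-D mean `theta` and the reward by a hypothetical negative squared distance to a target location; both are illustrative assumptions, not the model of this paper.

```python
import numpy as np

def reinforce_gradient(mu, locations, returns, sigma, baseline):
    """Monte Carlo REINFORCE estimate for a Gaussian location policy.

    mu, locations: (M, T, 2) policy means and sampled locations mu'
    returns:       (M,) episode rewards R^m
    sigma:         standard deviation of the Gaussian policy
    baseline:      scalar b subtracted from every R^m
    Returns the per-step gradient with respect to mu, shape (M, T, 2).
    """
    M = len(returns)
    # d/dmu log N(mu'; mu, sigma^2 I) = (mu' - mu) / sigma^2
    score = (locations - mu) / sigma ** 2
    advantage = (np.asarray(returns) - baseline)[:, None, None]
    return score * advantage / M


# Toy problem (assumed): learn a 2-D mean theta that approaches a target.
rng = np.random.default_rng(0)
target = np.array([0.5, -0.3])
theta = np.zeros(2)
sigma, lr, M, T = 0.2, 0.05, 64, 5
for _ in range(300):
    mu = np.broadcast_to(theta, (M, T, 2))
    locs = mu + sigma * rng.standard_normal((M, T, 2))
    # Hypothetical reward: negative squared distance summed over T steps.
    returns = -((locs - target) ** 2).sum(axis=(1, 2))
    b = returns.mean()  # simple baseline: batch-mean reward
    # mu equals theta at every step, so dJ/dtheta sums the per-step terms.
    grad = reinforce_gradient(mu, locs, returns, sigma, b).sum(axis=(0, 1))
    theta = theta + lr * grad  # gradient ascent on J
print(theta)
```

The batch-mean baseline is only one simple choice for b; a learned value estimate is a common alternative when rewards vary strongly between states.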

Build a Network Recognition Model for Deep Reinforcement Learning.
Based on deep reinforcement learning, the deep reinforcement learning network recognition model is constructed; it is mainly used for multifeature fusion human action recognition through a 3D convolutional network. The specific structure of the model is shown in Figure 4.
In the deep reinforcement learning network recognition model, the human behavior image sequence is first divided into T units. Based on deep reinforcement learning, the human and environment information of each unit is perceived and reinforcement learning is performed on each unit in turn; a 3D convolutional neural network computes the features of each unit, and the features of the multiple units are fused. The Softmax activation function converts the feature values into nonlinear, normalized class scores, and the classification is finally completed by averaging these scores.
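The unit-wise classification flow can be sketched as follows. Here `toy_scorer` is a hypothetical stand-in for the 3D convolutional feature extractor and classifier of Figure 4; the sketch only shows the flow described above: split the sequence into T units, score each unit, apply Softmax, and average the per-unit probabilities before taking the most likely class.

```python
import numpy as np

def softmax(z):
    """Numerically stable Softmax over a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify_sequence(frames, T, unit_scorer):
    """Split frames (F, H, W) into T units along time, score each unit,
    convert scores to probabilities with Softmax, and average them."""
    units = np.array_split(frames, T, axis=0)
    probs = np.stack([softmax(unit_scorer(u)) for u in units])  # (T, C)
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs


def toy_scorer(unit):
    """Hypothetical two-class scorer standing in for the 3D CNN."""
    m = unit.mean()
    return 10.0 * np.array([m, 1.0 - m])


frames = np.full((30, 8, 8), 0.9)
label, mean_probs = classify_sequence(frames, T=5, unit_scorer=toy_scorer)
print(label, mean_probs.round(4))
```

Averaging probabilities rather than raw scores keeps each unit's contribution on the same scale, so a single unit with very large activations cannot dominate the fused decision.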

Human Action Recognition Algorithm Integrating Multiple Features.
The deep reinforcement learning algorithm is used to train the connections between the hidden layer and the output layer, and the human action recognition algorithm is designed and integrated to realize multifeature fusion human action recognition. The hidden layer of the algorithm refers to the deep integration of advanced information technology, making full use of artificial intelligence technology to achieve autonomous decision-making, autonomous execution, and dynamic optimization in the recognition process. The output layer of the algorithm refers to deep mining of the value of the data in the recognition process to support decision-making. The fused human action recognition algorithm is described as follows:
Input: training samples and test samples of the deep reinforcement learning network recognition model.
Output: recognized human behavior records.
(1) In multifeature fusion human action recognition, the artificial intelligence system runs on a level 3 cloud platform that provides back-end services and securely controls basic data and user information. The computer room environment and equipment of all cloud platform servers are managed and inspected on site at all times.
(2) Registration and authentication are required for entry and exit, and a dedicated person supervises the entire process.

Mobile Information Systems
Network security and remote management of the cloud platform must be accessed through SSL VPN, different roles are assigned different access rights, and core device passwords are updated regularly. Public services expose only specific services, applications, and ports.
(3) Access is controlled on the basis of applications and ports; the firewall operates in transparent mode and is dedicated to data filtering.
(4) Port and address translation is performed by a dedicated router.
(5) Suppose that k'_f refers to the root tag tree generated from human behavior difference feature records, f'_s and f'_w refer to node i' and node j' of the first-level subtree of the human action recognition feature tree, and R_w and R_k refer to the same denotation name; equation (14) is then used to distinguish human behaviors with high similarity.
(6) Suppose that f'_per refers to the content recognition of human behaviors; in the formula for f'_per, p_v refers to the tissue recognized in the human body.
The operation flow of the fused human behavior recognition algorithm is shown in Figure 5.

Experimental Environment and Data Set.
Simulation experiments are performed on the designed multifeature fusion human action recognition algorithm based on deep reinforcement learning.
The simulation experiments run on Windows 10 with 8 GB of memory.
The main operating tool in the experimental environment is MAX+PLUS II, which provides interfaces to other industry-standard EDA tools. These interfaces comply with the EDIF 200 and EDIF 300, LPM 2.1 parameterized module library, SDF 2.0, VITAL 95, Verilog HDL, VHDL 1987, and VHDL 1993 standards. Designs can be entered with other EDA tools and then compiled and processed by MAX+PLUS II.
Third-party EDA tools can also be used for device-level and board-level simulation.
From the research data sets, 88 videos are used for model training and 37 videos for model testing. After the experimental parameters are set, the robustness and recognition accuracy of the multifeature fusion human action recognition algorithm based on deep reinforcement learning are tested, together with the impact of camera movement on performance.

Experimental Steps.
(a) Build the simulation environment and tune the operating parameters.
(b) Collect human behavior data from six data sets (KTH, Weizmann, IXMAS, Hollywood, HMDB51, and UCF101) and preprocess them as the data source.
(c) Simulate the human behavior recognition performance of this algorithm, select algorithms from other literature as comparison algorithms for simulation, and the

Experimental Standards.
In the experiment, the parameter settings of the deep reinforcement learning network recognition model are shown in Table 2.
After setting the experimental parameters, the robustness and recognition accuracy of the multifeature fusion human behavior recognition algorithm based on deep reinforcement learning are tested, and the influence of camera movement on the performance is tested.
(1) Recognition accuracy of the algorithm: To make the comparison more informative, existing algorithms are used as baselines in comparison experiments, and their recognition accuracy is measured and compared with that of the proposed algorithm. The comparison algorithms include multifeature fusion human action recognition algorithms based on visual detection technology, sensor detection, and radio frequency electronic tags. Recognition accuracy is calculated as

$$\text{Accuracy} = \frac{Z_{FJ}}{G_{JI}} \times 100\%,$$

where Z_FJ refers to the number of correctly recognized samples and G_JI refers to the total number of test samples.
(2) Algorithm robustness: The robustness of the multifeature fusion human action recognition algorithm based on deep reinforcement learning is tested under two conditions: the influence of initial sampling on performance and the influence of camera movement on performance. For the initial-sampling test, the observation window is 76 × 76 pixels, and the initial observation window is either selected at random or set to the character region.

Table 2: Parameter settings of the deep reinforcement learning network recognition model.
Sequence length value: 10
Model initial parameter acquisition method: CNN
Mini-batch size: 32
Momentum factor: 0.9
Weight decay: 5 × 10^−4
Initial learning rate: 10^−3
Number of training epochs: 128
Frame selection policy hyperparameter: 10
Image block selection policy hyperparameter: 70
Frame relative position: frames 1 to 10
Frame sequence length: 10
Image block threshold: 70 pixels

Figure 7 shows the recognition accuracy comparison. The highest accuracy of the algorithms in literature [13], literature [14], and literature [15] does not exceed 80%, which is significantly lower than that of the proposed algorithm, whose highest recognition accuracy is about 98%. This indicates that the deep reinforcement learning approach adopted in this paper can effectively improve the accuracy of human behavior recognition.
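Recognition accuracy, as defined through Z_FJ (correctly recognized samples) and G_JI (total test samples), is their ratio expressed as a percentage. A minimal Python helper follows; the function name `recognition_accuracy` and the representation of labels as strings are illustrative assumptions.

```python
def recognition_accuracy(predicted, actual):
    """Accuracy in percent: Z_FJ (correctly recognised samples) divided by
    G_JI (total number of test samples), times 100."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    z_fj = sum(p == a for p, a in zip(predicted, actual))  # correct predictions
    g_ji = len(actual)                                      # size of the test set
    return 100.0 * z_fj / g_ji


print(recognition_accuracy(["walk", "run", "box", "walk"],
                           ["walk", "run", "wave", "walk"]))
```

Here three of the four toy predictions match, giving 75%.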

Comparison of Robustness Test.
The specific test results for the effect of initial sampling on performance are shown in Table 3.
According to Table 3, the recognition rate is higher when the character area is used as the initial observation window, but the improvement is not significant. This shows that the algorithm can quickly lock onto the character area after observation, has strong learning ability, and is robust overall.

Comparison of Camera Movement Performance Test.
When testing the impact of camera movement on performance, the initial observation window is randomly selected. The specific test results are shown in Table 4.
According to Table 4, the recognition rate improves slightly after camera movement is eliminated, indicating that camera movement has little impact on performance and that the algorithm performs robustly overall.

Comparison of Recognition Time Consumption.
The human behavior recognition time of the different algorithms is tested on six data sets. The results are shown in Table 5.
According to the comparison of human behavior recognition time in Table 5, the average time consumption of the algorithm in literature [15] is the highest, reaching 49.3 s; the average times of the algorithms in literature [13] and literature [14] differ little from each other, but both exceed 35 s; and the average time of the proposed algorithm is only 12.7 s. This indicates that the proposed algorithm runs faster and is more efficient, which is an advantage in practical applications.
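Using the average times reported from Table 5 (12.7 s for the proposed algorithm, 49.3 s for literature [15], and more than 35 s for literature [13] and [14]), the relative savings work out as follows:

```latex
1 - \frac{12.7}{49.3} = \frac{36.6}{49.3} \approx 0.742, \qquad
\frac{49.3}{12.7} \approx 3.88, \qquad
1 - \frac{12.7}{35} = \frac{22.3}{35} \approx 0.637
```

That is, the proposed algorithm needs about 74% less time than literature [15] (roughly 3.9 times faster) and, since literature [13] and [14] both take more than 35 s, at least about 64% less time than either of them.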

Conclusions
By combining deep reinforcement learning with the study of multifeature fusion human action recognition, a multifeature fusion human action recognition algorithm based on deep reinforcement learning was designed, which improves recognition accuracy and provides stable robustness. The results show that the recognition accuracy of the proposed algorithm is higher than those of four other algorithms. When the person area is used as the initial observation window, the recognition rate is higher and the robustness is stronger. After camera movement is eliminated, the recognition rate improves slightly, indicating good applicability.
Multifeature fusion human action recognition is still developing, and various recognition algorithms continue to be proposed. Human action recognition needs to be optimized continuously as deep reinforcement learning develops, so as to provide a basis for truly accurate multifeature fusion human action recognition. The data sets used in this work are relatively old; they will be updated in future research, and the recognition accuracy of the algorithm will be further improved.
(1) In terms of feature recognition, the proposed algorithm does not address the selection of suitable predictive features for different tasks. Therefore, follow-up research should fully consider the different characteristics of the recognition algorithm and carry out new measurements based on actual developments.
(2) In the recognition process, recognition accuracy is improved through deep reinforcement learning. However, from an experimental perspective, further research should be carried out on the multifeature fusion human action recognition obtained after deep reinforcement learning to improve the prediction effect.

Data Availability
The data used to support the findings of this study are included within the article. Readers can access the data supporting the conclusions of the study from the Weizmann, KTH, IXMAS, Hollywood, HMDB51, and UCF101 data sets.

Conflicts of Interest
The author declares that there are no conflicts of interest with any financial organizations regarding the material reported in this manuscript.