A Novel Detection Framework for Detecting Abnormal Human Behavior

Public safety has long been a widespread concern across society. With the development of video detection technology, detecting abnormal human behavior in videos has become key to preventing public safety incidents. Such detection is particularly important for student groups. Most existing abnormal human behavior detection algorithms target outdoor activity, and their indoor detection performance is not ideal. Students spend most of their time indoors, and modern classrooms are mostly equipped with monitoring equipment. This study focuses on the detection of abnormal behaviors of people indoors and uses a new abnormal behavior detection framework to realize it. First, a background modeling method based on a Gaussian mixture model is used to segment the background of each image frame in the video. Second, block processing is performed on the background-segmented image to obtain the space-time blocks of each frame, and these blocks are used as the basic representation of the detection object. Third, the foreground image features of each space-time block are extracted. Fourth, fuzzy C-means clustering (FCM) is used to detect outliers in the data sample. The contributions of this paper are as follows: (1) An abnormal human behavior detection framework that is effective indoors is used. Compared with existing abnormal human behavior detection methods, the framework differs little in its outdoor detection performance. (2) Compared with other detection methods, the framework has a better detection effect for abnormal human behavior indoors, and the detection performance is greatly improved. (3) The detection framework is easy to implement and has low time complexity.
The experimental results obtained on public and manually created data sets demonstrate that the performance of the detection framework is similar to those of the compared methods in outdoor detection scenarios and that it has a strong advantage in indoor detection. In summary, the proposed detection framework has good practical application value.


Introduction
In recent years, with the frequent occurrence of abnormal group events such as fights, stampedes, riots, and demonstrations, video surveillance equipment has been widely deployed in public places such as railway stations, streets, campuses, and banks. Abnormal behavior detection with traditional video surveillance relies mainly on manual observation. However, long-term continuous observation often leads to staff fatigue, which makes missed detections likely. Machine learning [1][2][3] can be exploited to detect abnormal human behaviors automatically. Compared with traditional manual methods, this approach saves manpower and reduces missed detections.
With the rapid development of machine learning and deep learning, intelligent video surveillance is becoming increasingly mature. Several well-known video surveillance systems [4][5][6] have been proposed and put on the market. The detection of abnormal human behavior using video surveillance has become an important research topic in computer vision in recent years and has attracted widespread attention.
Reference [7] summarized the recognition of different postures of the human body during activities. Reference [8] proposed a new technology and applied it to human behavior analysis and motion detection in video surveillance scenarios. Reference [9] provided a comprehensive overview of current automated monitoring systems for abnormal behavior detection. Reference [10] proposed a method for estimating abnormal human behavior in different environments based on video surveillance. Reference [11] used a set of data codebooks constructed with two-level trees to calculate the similarity between space-time cubes at two scales and then used the linear discriminant analysis model to represent the topics of these scenes. If a test sample does not belong to these topics, it is considered abnormal. The hidden Markov model [12] is a typical inference model that can be applied to abnormal behavior detection in video scenes. Reference [13] used an independent hidden Markov model to construct sparse features in the training phase, and the model could adapt to scene changes by transforming into the most representative model. Reference [14] used a fully convolutional neural network (FCNN) for fast abnormal behavior detection. Reference [15] proposed the Appearance and Motion Deep Net (AMDN) and applied it to abnormal behavior detection in videos.
The above research is mainly based on the detection of abnormal human behavior in outdoor scenes. In campus scenes, students spend most of their time in indoor venues such as classrooms, and the above methods do not perform well when applied to detecting abnormal behavior by people indoors. Moreover, although detection methods based on deep learning are convenient for feature extraction, their detection time cannot meet the needs of realistic applications, and abnormal behavior detection requires high efficiency in terms of detection time. In contrast, machine learning algorithms [16][17][18][19][20][21][22][23][24][25] are less time-consuming. Based on this fact, this paper uses a concise detection framework based on machine learning algorithms. First, a background modeling method based on a Gaussian mixture model is used to segment the background of each image frame in a video. Second, block processing is performed on the background-segmented image to obtain the space-time blocks of each frame. Third, the foreground image features of each space-time block are extracted. Fourth, FCM is used to detect outliers in the data set. The main work of this paper is summarized as follows:
(1) An abnormal human behavior detection framework based on machine learning algorithms is used. Currently popular deep learning algorithms are also applied to the detection and tracking of abnormal pedestrian behavior. However, when the crowd density is high or the color and apparent texture are similar, the detection may fail. In addition, when running behavior by a pedestrian occurs with small probability, a large number of training samples cannot be collected. In this situation, deep learning algorithms are not suitable, and the supervised machine learning algorithm used in this article is more applicable.
(2) The detection framework takes the space-time block as the basic representation of a detection object and extracts the foreground motion features of each space-time block. FCM is used to detect outliers in the data sample and thereby realize the detection of abnormal behavior in the crowd.
(3) To demonstrate the effectiveness of the detection framework, this paper conducts experiments on public data sets and newly constructed data sets. The experimental results show that the detection effect of the proposed framework in indoor environments is significantly better than those of the comparison methods, and the detection effect in outdoor environments is similar to those of the comparison methods. In summary, for places with more indoor scenes, the detection framework used in this paper has better application value.
The remainder of this paper is organized as follows: the second section presents related knowledge, which mainly includes Abnormal Behavior Characteristics, Human Abnormal Behavior Detection Process, and Typical Public Data Set. The third section is Abnormal Human Behavior Detection Framework, which mainly includes Inspection Framework Description, Representation of Detection Objects, Feature Extraction, FCM, and Detection Framework Execution Steps. The fourth section is Experimental Results and Analysis. The fifth section is the conclusion.

Abnormal Behavior Characteristics.
In the process of human movement, normal behavior usually refers to time and space states that exhibit a certain repetitiveness and regularity. These states include walking speed, walking posture, and spatial position. For example, walking and running at a constant speed are considered normal behaviors. However, there is no unified standard for the definition of abnormal behavior. Some scholars believe that all behaviors that do not match predefined normal behaviors are abnormal. Others believe that behaviors that rarely occur or have short durations are abnormal. People's daily behaviors and living habits show that walking, running, and other behaviors follow certain periodic laws. Therefore, for campus monitoring, we define the following characteristics as abnormal behaviors:
(1) Abnormal walking trajectory: the walking trajectory of a person can indicate the purpose of his/her motion, and it is of great significance for safety monitoring systems to accurately detect this behavioral feature. For example, before a morning class, all students walk from the cafeteria or dormitory to the classroom, but some students walk in the opposite direction. It is generally believed that people's destinations should be clear. Therefore, an abnormal walking trajectory is defined in this article as follows: within the camera's field of view, when the walking path of a moving target over a certain period of time shows a loop, stop-and-go, or return pattern, the target is said to exhibit abnormal behavior.
(2) Abnormal walking posture: walking upright is one of the signs of normal human appearance. In daily behavior, most people walk upright, but other behaviors also occur. For example, when an event such as stomach pain occurs, a person bends over and walks slowly. Therefore, we define an abnormal walking posture as follows: when the tested target bends over or suddenly squats during walking, the target's behavior is said to be abnormal.
(3) Abnormal head rotation: according to people's cognitive habits, a person usually looks ahead when walking. Therefore, this paper defines abnormal head rotation as follows: when frequent changes in the orientation of a person's head are detected while the person is moving, his/her behavior is considered abnormal.

Human Abnormal Behavior Detection Process.
The detection process for abnormal human behavior is shown in Figure 1. First, spatiotemporal segmentation is performed on the video to extract features that can describe the characteristics of the target area. Then, during the training phase, normal events are modeled. In the testing phase, the abnormality of the test features is calculated with respect to the normal event model that has been learned, and whether a given behavior is abnormal is judged according to the set abnormality threshold. The two steps of feature extraction and model-based abnormal behavior detection have a great impact on the detection effect.
(1) Preprocessing: during the preprocessing stage, the raw video data are prescreened through preprocessing operations. For example, normalization of the video data and foreground extraction are performed to eliminate useless information in the image and enhance the detectability of valuable information. The data are simplified to the greatest extent possible, and their reliability is enhanced for subsequent processing and analysis.
This type of method mainly uses normal behavior features for training to obtain a model of normal behavior. Under the normal model, the reconstruction error corresponding to anomalies is large, and the reconstruction error corresponding to normal features is small. Therefore, a test behavior feature is judged by comparing its reconstruction error under the normal behavior model with the abnormality threshold. This type of method can be subdivided into two categories: reconstruction models based on sparse coding and reconstruction models based on deep learning.

Typical Public Data Set.
Recently, the number of publicly available data sets for abnormal behavior detection has increased.
The typical public data sets are introduced in Table 1.

Inspection Framework Description.
The detection framework used in this paper is shown in Figure 2. As shown in the figure, the video data are preprocessed first. In this stage, the Gaussian mixture background modeling method [30] is mainly used to eliminate the background of each image frame. In the second stage, the foreground image features of each space-time block are extracted. In the third stage, FCM is used to cluster the extracted feature data sets. In the fourth stage, outliers are obtained according to the clustering results to determine whether the image frame is an abnormal frame.

Representation of Detection Objects.
Taking the space-time block as the detection object, each video frame is first divided into W × H spatial blocks, each of size L × L. Because the focus is on analyzing the movements of the various parts of pedestrians, such as the impact of running legs, the criterion for determining L is that a single spatial block should cover a certain part of a moving target. A space-time block is composed of the spatial blocks at the same position in multiple consecutive frames. The space-time block is used as the basic representation of the detection object not to extract motion information directly from the current space-time block but rather to analyze the motion effect of a space-time block with rich foreground information on the surrounding space-time blocks.
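The block partition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the row-major block numbering, the 0-based indices, and the function names are assumptions introduced here, and the block size used in the usage note is hypothetical.

```python
def grid_shape(frame_width, frame_height, block_size):
    """Return (W, H): the number of L x L blocks per row and per column."""
    return frame_width // block_size, frame_height // block_size


def block_index(x, y, frame_width, block_size):
    """Map pixel (x, y) to the 0-based index j of its spatial block.

    Blocks are numbered row by row (an assumed convention).
    """
    blocks_per_row = frame_width // block_size
    return (y // block_size) * blocks_per_row + (x // block_size)


def space_time_block(frames_blocks, j, start, m):
    """Collect spatial block j from m consecutive frames starting at `start`.

    frames_blocks[t][j] holds whatever is stored for block j of frame t.
    """
    return [frames_blocks[t][j] for t in range(start, start + m)]
```

For a 320 × 240 frame and a hypothetical block size of L = 20, `grid_shape(320, 240, 20)` gives a 16 × 12 grid of spatial blocks.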

Feature Extraction.
The features of a space-time block can be described by histogram statistics of low-level visual information such as the gray gradient, optical flow, and texture of the space-time block [31]. Pedestrians exhibiting "panic running" behavior move fast and have great kinetic energy. Their effect on the surrounding space is more significant than that of a pedestrian walking normally. The speed of pedestrian movement, the range of influence of pedestrian movement, and the distance between pedestrians and a spatial block are the main factors that determine the effects of pedestrians on space.
The feature extraction process for running foreground images based on space-time blocks is shown in Figure 3. First, the foreground motion image is extracted from the video frame sequence by the adaptive Gaussian mixture model, and the spatial blocks are obtained. The spatial blocks are combined with the foreground image and then preprocessed to obtain the foreground motion blocks. Then, the motion representation of each foreground motion block is obtained from the dense optical flow of the video frame. Finally, the effect weight vector of all foreground motion blocks is calculated for each spatial block. The effect weight vectors of the spatial blocks in multiple consecutive frames are averaged to obtain the feature description of a given space-time block. Each step is described in detail below.

Foreground Motion Block.
A foreground motion block can effectively represent the movement information of pedestrians. Denote the $j$-th spatial block by $B_j$, $1 \le j \le W \times H$. A foreground motion block is a spatial block in which foreground pixels appear. The foreground information contained in some of these blocks may be noise, which cannot correctly characterize the motion behavior of an object. To identify such blocks, this paper preprocesses all spatial blocks. Let $b_j$ be the number of foreground pixels in a block; the block is retained as a foreground motion block only when equation (1) is satisfied.

Table 1: Typical public data sets.
UMN [27]: The database contains 3 crowded scenes; the total numbers of frames in the three scenes are 1450, 4415, and 2145, and the resolution of each is 320 × 240.
Subway [28]: The database contains the subway-entrance and subway-exit subsets. The video lasts 1 hour and includes abnormal behaviors such as walking in the wrong direction and fare evasion at the ticket gate.
Avenue [29]: The data set has 16 training videos and 21 test videos, each with a resolution of 640 × 360. Abnormal behaviors include throwing paper and running.

Mathematical Problems in Engineering
Equation (1) indicates that an image block $B_j$ can be used as the $j$-th foreground motion block when the above condition is met. $\lambda$ ($0.1 \le \lambda \le 0.3$) is the comparison threshold for the foreground pixels. The above operation constitutes the preprocessing of the foreground image, and foreground motion blocks can be extracted from the preprocessed foreground image.
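The filtering step can be illustrated with a short sketch. The exact form of equation (1) is not reproduced in this text, so the code assumes that the condition compares the fraction of foreground pixels in a block, $b_j / L^2$, with the threshold $\lambda$; this criterion, the function name, and the binary-mask input are assumptions made for illustration.

```python
def foreground_motion_blocks(mask, block_size, lam=0.2):
    """Return the 0-based indices of blocks kept as foreground motion blocks.

    mask: 2D list of 0/1 values (1 = foreground pixel).
    Assumed criterion: keep block j when b_j / L^2 >= lam.
    """
    height = len(mask)
    width = len(mask[0])
    blocks_per_row = width // block_size
    blocks_per_col = height // block_size
    kept = []
    for by in range(blocks_per_col):
        for bx in range(blocks_per_row):
            # b_j: number of foreground pixels inside the block
            b_j = sum(
                mask[y][x]
                for y in range(by * block_size, (by + 1) * block_size)
                for x in range(bx * block_size, (bx + 1) * block_size)
            )
            if b_j / (block_size * block_size) >= lam:
                kept.append(by * blocks_per_row + bx)
    return kept
```

Under this assumed criterion, raising `lam` toward 0.3 discards blocks that contain only a few isolated foreground pixels, which matches the stated purpose of suppressing noise.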

Foreground Motion Effects Map.
The optical flow vectors of all pixels in a preprocessed spatial block are extracted, and their average is used as the optical flow vector of the current block. This serves as the motion representation of the foreground motion block:

$$c_i = \frac{1}{J} \sum_{j=1}^{J} g_i^j$$

where $c_i$ represents the optical flow vector of the $i$-th foreground motion block, $J$ is the number of pixels in the foreground block, and $g_i^j$ represents the optical flow vector of the $j$-th pixel in the $i$-th foreground motion block. $\|c_i\|$ and $\angle c_i$ represent the magnitude and direction, respectively, of the optical flow of the $i$-th foreground motion block.
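A small sketch of this block-level motion representation follows. It assumes the per-pixel optical flow is already available as (dx, dy) pairs; the function name and the choice to express the direction in [0, 2π) are illustrative assumptions.

```python
import math


def block_flow(pixel_flows):
    """Average the per-pixel flow vectors of one foreground motion block.

    pixel_flows: list of (dx, dy) optical flow vectors, one per pixel.
    Returns (mean vector, magnitude, direction in radians within [0, 2*pi)).
    """
    count = len(pixel_flows)
    cx = sum(dx for dx, _ in pixel_flows) / count
    cy = sum(dy for _, dy in pixel_flows) / count
    magnitude = math.hypot(cx, cy)
    direction = math.atan2(cy, cx) % (2 * math.pi)
    return (cx, cy), magnitude, direction
```

Averaging before taking the magnitude means that pixels moving in opposite directions cancel out, so a block with incoherent motion yields a small flow vector.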

Feature Extraction.
According to the obtained subblock effect measurement, the subblock effect of normal behavior and that of abnormal behavior can be effectively distinguished. Therefore, this paper uses a foreground motion effects map feature to characterize the motion effects of neighboring foreground blocks on space-time blocks.
For a spatial block $B_j$ and a foreground motion block $C_i$, two index variables are defined to measure whether the foreground motion block has an effect on the surrounding spatial block, one based on distance and one based on direction. Here, $\mathrm{dist}(i, j)$ represents the Euclidean distance between the foreground motion block $C_i$ and the spatial block $B_j$, and $\delta_d$ is the distance threshold. $\theta_{ij}$ is the angle between the vector from the foreground block $C_i$ to the spatial block $B_j$ and the optical flow of $C_i$. The interval $(-\pi/2, \pi/2)$ represents the field of view of the foreground motion block in motion. These two index variables measure whether the spatial block $B_j$ is within the influence range of the foreground block $C_i$. The effect weight $w_{ij}$ of the foreground motion block $C_i$ on the spatial block $B_j$ is defined so that, when $B_j$ is within the influence range of $C_i$, the weight is inversely proportional to the distance between the two and directly proportional to the magnitude of the optical flow of $C_i$. $C_i$ represents a pedestrian: when the pedestrian runs vigorously, the weight $w_{ij}$ increases with the magnitude of the optical flow. Multiple foreground blocks with different motion directions contribute different weights to $B_j$, so all effect weights on $B_j$ must be accumulated to form an effective feature representation. To increase the discriminative power of the extracted features and the computational efficiency of the model, the moving direction $\angle c_i$ of the foreground motion block $C_i$ is quantized into an index $k_i \in \{1, 2, \ldots, p\}$, where $p$ is the total number of quantization direction intervals and $k_i$ is the quantization direction index of the optical flow of the $i$-th foreground motion block.
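The influence test, the effect weight, and the direction quantization can be sketched together. The exact formulas are not reproduced in this text, so the code makes labeled assumptions: the weight is taken as flow magnitude divided by distance (the simplest form that is inversely proportional to distance and proportional to flow magnitude), the quantization uses `p` equal-width angular bins over [0, 2π), and all function names are hypothetical.

```python
import math


def effect_weight(ci_pos, ci_flow, bj_pos, dist_threshold):
    """Effect weight of foreground block C_i on spatial block B_j.

    Assumed form: w_ij = |c_i| / dist(i, j) inside the influence range, else 0.
    """
    dx = bj_pos[0] - ci_pos[0]
    dy = bj_pos[1] - ci_pos[1]
    dist = math.hypot(dx, dy)
    speed = math.hypot(ci_flow[0], ci_flow[1])
    if dist == 0 or speed == 0 or dist >= dist_threshold:
        return 0.0
    # cos(theta_ij) > 0 is equivalent to theta_ij lying in (-pi/2, pi/2)
    cos_theta = (dx * ci_flow[0] + dy * ci_flow[1]) / (dist * speed)
    if cos_theta <= 0:
        return 0.0
    return speed / dist


def direction_bin(flow, p):
    """Quantize the flow direction into an index k in 1..p (assumed equal bins)."""
    angle = math.atan2(flow[1], flow[0]) % (2 * math.pi)
    return min(int(angle / (2 * math.pi / p)), p - 1) + 1


def effect_histogram(bj_pos, foreground_blocks, p, dist_threshold):
    """Accumulate effect weights on B_j per quantized flow direction.

    foreground_blocks: list of (position, flow) pairs for all C_i.
    """
    hist = [0.0] * p
    for pos, flow in foreground_blocks:
        w = effect_weight(pos, flow, bj_pos, dist_threshold)
        if w > 0:
            hist[direction_bin(flow, p) - 1] += w
    return hist
```

With these assumptions, a block directly ahead of a fast-moving foreground block receives a large weight in the bin of that block's direction, while blocks behind it or beyond the distance threshold receive none.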
The histogram statistics of the effect weights generated by the foreground blocks for the spatial block $B_j$ are based on the quantization direction of the optical flow of each foreground motion block. Equations (4) and (5) are incorporated into equation (6), and the histogram statistics $f_j$ are calculated. To obtain long-term statistics of the foreground motion effect and make the extracted features highly discriminative, the spatial blocks $B_j, B_{j+1}, \ldots, B_{j+m-1}$ from $m$ consecutive frames are taken as a space-time block $B_j$. According to equation (6), the feature description of each spatial block in $B_j$ can be calculated. The corresponding weight vectors of a given space-time block can be expressed as $h_j, h_{j+1}, \ldots, h_{j+m-1}$. The mean of the feature descriptions of the $m$ frames of spatial blocks is taken as the running foreground effect feature $h_j$ ($1 \le j \le W \times H$) of the space-time block $B_j$.

FCM.
FCM is a classic distance-based clustering algorithm. Given a data set $X = \{x_1, x_2, \ldots, x_n\}$ consisting of $n$ $p$-dimensional samples, the data are grouped into $c$ ($c \in [2, n]$) categories. The center of the $k$-th category is $v_k$, and $V = \{v_1, v_2, \ldots, v_c\}$ is the cluster center matrix. $U = [u_{ki}] \in R^{n \times c}$ is the membership matrix, which satisfies $\sum_{k=1}^{c} u_{ki} = 1$. The objective function of FCM is as follows:

$$J = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}^{m} \|x_i - v_k\|^2 \quad (8)$$

where $m$ is the fuzzy coefficient, $m = 2$. Minimizing $J$ completes the division of the sample set. Applying the Lagrange multiplier method to $J$ yields the iterative update formulas for the memberships and cluster centers:

$$u_{ki} = \frac{1}{\sum_{l=1}^{c} \left( \frac{\|x_i - v_k\|}{\|x_i - v_l\|} \right)^{2/(m-1)}} \quad (9)$$

$$v_k = \frac{\sum_{i=1}^{n} u_{ki}^{m} x_i}{\sum_{i=1}^{n} u_{ki}^{m}} \quad (10)$$

The cluster centers are obtained by iterating equations (9) and (10). FCM is used to cluster the sample set located at $(i, j)$. The optimal number of clusters is $K$, and the resulting cluster centers are denoted $v^l_{(i,j)}$, $1 \le l \le K$. For the image frame $T$ to be detected, the space-time block response feature map $e_{(i,j)}$ located at $(i, j)$ must be obtained. Through equation (11), the distance $d(i, j)$ between it and the cluster center set $v^l_{(i,j)}$, $1 \le l \le K$, can be calculated. A threshold $\varepsilon_1$ is set: when $d(i, j) > \varepsilon_1$, $e_{(i,j)}$ is an outlier, and the space-time block at $(i, j)$ is marked as abnormal. When the number of abnormal points in a frame is greater than the total number of foreground blocks, the frame is considered to be an abnormal frame.
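The clustering step can be illustrated with a compact FCM sketch that uses the standard membership and center updates with $m = 2$. It is a minimal pure-Python illustration, not the authors' implementation. Two details are assumptions made here: the initial centers are supplied by the caller, and the distance from a test feature to the cluster center set is taken as the distance to the nearest center, since equation (11) is not reproduced in this text.

```python
import math


def fcm(samples, centers, m=2.0, iterations=50):
    """Fuzzy C-means on a list of equal-length tuples.

    centers: initial cluster centers (caller-supplied, an assumption here).
    Returns (centers, memberships), where memberships[i][k] is the degree
    to which sample i belongs to cluster k.
    """
    c = len(centers)
    dim = len(samples[0])
    centers = [list(v) for v in centers]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for x in samples:
            dists = [math.dist(x, v) for v in centers]
            if min(dists) == 0.0:
                # The sample coincides with a center: give it full membership.
                zero = dists.index(0.0)
                row = [1.0 if k == zero else 0.0 for k in range(c)]
            else:
                row = [
                    1.0 / sum((dists[k] / dists[l]) ** power for l in range(c))
                    for k in range(c)
                ]
            memberships.append(row)
        # Center update: membership-weighted mean of the samples.
        for k in range(c):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                centers[k] = [
                    sum(w * x[d] for w, x in zip(weights, samples)) / total
                    for d in range(dim)
                ]
    return centers, memberships


def distance_to_centers(feature, centers):
    """Assumed form of d(i, j): distance to the nearest cluster center."""
    return min(math.dist(feature, v) for v in centers)
```

A feature extracted from a test frame would then be compared against the threshold via `distance_to_centers(feature, centers) > eps1`, where `eps1` stands for the threshold $\varepsilon_1$.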

Detection Framework Execution Steps.
The execution steps of the detection framework used in this paper are as follows: H], is obtained by extracting the space-time block at position (i, j).
(2) The FCM algorithm is used to cluster the feature sample set to obtain the cluster center set $v^l_{(i,j)}$, $1 \le l \le K$.
(3) The foreground motion image feature $e_{(i,j)}$ of the space-time block at $(i, j)$ in the frame $T$ to be detected is calculated. The distance $d(i, j)$ from the cluster center set to $e_{(i,j)}$ is then calculated.
(4) When $d(i, j) > \varepsilon_1$, $e_{(i,j)}$ is regarded as an outlier.
(5) The number of outliers in the frame is counted. If the number is greater than the total number of foreground blocks, the frame is determined to be an abnormal frame.
The flowchart of the detection framework used is shown in Figure 4.
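The final decision steps can be sketched as below. The sketch assumes the distances $d(i, j)$ have already been computed for every space-time block of the frame; the dictionary layout, the function names, and the explicit `count_threshold` argument (which stands in for the block count that the outlier count is compared against) are illustrative assumptions.

```python
def outlier_blocks(distances, eps1):
    """Return the block positions whose distance exceeds the threshold eps1.

    distances: dict mapping block position (i, j) to d(i, j).
    """
    return [pos for pos, d in distances.items() if d > eps1]


def is_abnormal_frame(distances, eps1, count_threshold):
    """Flag the frame when the number of outlier blocks exceeds count_threshold."""
    return len(outlier_blocks(distances, eps1)) > count_threshold
```

For example, with `distances = {(0, 0): 0.2, (0, 1): 1.5, (1, 0): 2.0, (1, 1): 0.1}` and `eps1 = 1.0`, two blocks are marked as outliers, so the frame is flagged only if `count_threshold` is below 2.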

Experiment-Related Instructions.
To demonstrate the feasibility and superiority of the proposed detection model, the public UMN data set and a manually generated data set are used in the experiments. The UMN data set is a public data set used for anomaly detection research that mainly includes sudden movements, crowd appearances and disappearances, and aggregations. The data set has a total of 7739 image frames, each of size 320 × 240, and the resolution is low. There are three types of scenes. The first type of scene contains 2 sets of complete anomalies. The second category contains 6 complete abnormal data points. The third category contains 3 sets of complete abnormal data points. The comparison methods are taken from [32], [33], and [34]. The parameter settings for each comparison method are consistent with those in the respective references. The evaluation indicators are the detection accuracy [35] and the area under the ROC curve (AUC) [36].
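For readers who want to reproduce the two evaluation indicators, the following sketch computes frame-level accuracy and AUC. It assumes binary frame labels (1 = abnormal, 0 = normal) and a per-frame anomaly score; the AUC is computed through its rank interpretation (the probability that a random abnormal frame scores higher than a random normal frame), which is equivalent to the area under the ROC curve. The function names are illustrative.

```python
def accuracy(predicted, actual):
    """Fraction of frames whose predicted label matches the ground truth."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    scores: anomaly score per frame; labels: 1 for abnormal, 0 for normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of frames, which is acceptable for a sketch; a sort-based computation would be preferable for long videos.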

UMN Data Set Experiment.
The first 180 frames of 9 video clips in three scenes are extracted as the training set, and the rest are used as the test set. The detection accuracy and AUC values of each method under different scenarios are shown in Table 2. The experimental data in Table 2 show that the performance of each detection method differs considerably across scenarios. The common point is that all algorithms achieve their best detection performance on Scene 1. From the average results over the three scenes, the detection accuracy and AUC obtained by the method in this paper are both greater than those of the comparison methods. The detection accuracy obtained by the method in this paper is improved by 5.2% over that of SF, 1.3% over that of STC, and 2.0% over that of MA. The AUC is increased by 2.0% compared with that of SF, 1.9% compared with that of STC, and 1.5% compared with that of MA. The comparison on these two indicators demonstrates that the method in this paper performs better than the other methods in detecting abnormal human behavior on the UMN data set.
To explore whether the selection of the clustering algorithm in the detection framework is appropriate, we replace the FCM at the end of the framework with the classic K-means clustering algorithm [37] and compare the impacts of the two clustering algorithms on the detection performance of the framework. Both clustering algorithms are applied within the detection framework to detect human abnormalities in the UMN data set. The experimental results are shown in Table 3.
The experimental results show that the detection effect obtained by using FCM in the detection framework is better than that obtained by using K-means. This is because FCM is not sensitive to noise in the data, so the detection result is not easily affected by noise either, which improves the detection accuracy. This is the reason why this paper chooses FCM for use in the detection framework.

Self-Made Data Set Experiment.
To further verify the robustness of the algorithm in this paper, we download    Table 4.
In indoor scenes, the proposed detection method achieves the best detection accuracy and AUC value on the manually created data set. This demonstrates the superiority of the proposed method for indoor scene detection. In outdoor scenes, the MA algorithm has the best detection performance, but the proposed method is close behind, and the performance gap is not large. Moreover, the detection performance of this method in outdoor scenes is better than those of SF and STC.
This shows that the indoor detection performance of this method is significantly improved, that its outdoor detection performance remains competitive, and that the method can meet the needs of actual application scenarios.

Conclusion
Abnormal behavior detection in videos is a research hotspot in the field of smart security. Abnormal behaviors have different definitions depending on the surveillance video scene and the surveillance objects. The detection of abnormal behaviors among a crowd in public places has high research value.
This research focuses on the analysis and detection of the abnormal behaviors of crowds in videos, in particular "panic running" and similar behaviors. For these abnormal behaviors, a detection framework based on machine learning algorithms is used. First, a space-time block is used as the basic representation of the detection object. Second, a background modeling algorithm is used to analyze the effect of the foreground motion block on the surrounding spatial blocks and calculate the characteristics of the foreground motion effects map of the space-time block. Finally, the FCM algorithm is used for clustering, training, and outlier detection. When the number of detected outliers is greater than the number of foreground blocks, the detected frame is determined to be an abnormal frame. The experimental results show that the method used has better detection performance than existing detection algorithms. However, many parameters in the proposed detection framework need to be determined manually, which is a drawback of this research. Addressing this drawback is a future research direction.
Data Availability
The labeled data sets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.