Operator Behavior Analysis System for Operation Room Based on Deep Learning

Human behavior analysis has been a leading technology in computer vision in recent years. The station operation room is responsible for dispatching trains as they enter and leave the station. By analyzing the behaviors of the operators in the operation room, we can judge whether the operators commit violations. However, there is no existing scheme for analyzing operator behavior in the operation room, so we propose an operator behavior analysis system for the station operation room to detect operator violations. This paper proposes an improved target tracking algorithm based on Deep-sort. Actual tests show that the proposed algorithm improves target tracking performance compared with the traditional Deep-sort algorithm. In addition, we put forward detection schemes for three common violations in the operation room: off-position, sleeping, and playing with a mobile phone. Finally, we verify that the proposed algorithm can detect the behaviors of operators in the station operation room in real time.


Introduction
In the railway industry, the operators in the station play a vital role in the safety of train dispatch. If these operators commit violations, they may pose serious potential safety hazards to railway operation. The most common violations are off-position, sleeping, and playing with a mobile phone.
These three violations may lead to serious safety accidents. At present, the most common countermeasure is to assign security officers to monitor the operators through remote monitoring systems, which are composed of monitoring cameras in each operating room. The security officers judge whether each operator commits violations by watching the monitoring screens. However, a railway bureau usually has hundreds of operating rooms, which requires many security officers to meet the monitoring demand. Therefore, an intelligent behavior analysis system is urgently needed to replace manual management in the operation room. The operator behavior analysis system first analyzes the pictures collected by the monitoring camera in the operation room to find the operators and track them. Then, the analysis system uses three behavior analysis methods to judge whether each tracked target commits violations. In addition to the behavior analysis of railway station operators, the system can also be applied to other similar fields.

Related Work
Before analyzing behaviors, we first use an object detection algorithm to locate the operators. Object detection algorithms are mainly divided into two categories: two-stage and one-stage. A two-stage network first extracts object candidate regions from the input image and then uses a classifier to classify all the candidate regions, so the detection speed is relatively slow. This category mainly includes RCNN [1], Fast-RCNN [2], Faster-RCNN [3], and Mask RCNN [4]. A one-stage network finds candidate regions directly from the feature map. Its detection speed is usually faster than that of a two-stage network, but its detection accuracy may suffer. At present, the common algorithms are Yolov1 [5], Yolov2 [6], Yolov3 [7], SSD [8], and RetinaNet [9]. The detection efficiency of Yolov1 is excellent, but its overall accuracy is low. The most significant improvement of Yolov2 is its ability to detect small objects. Yolov3 replaces the backbone network with Darknet53 [7]. The work in [12] solves the problem of end-to-end training. Sun et al. proposed DAN (deep affinity network) [13].
The algorithm can carry out end-to-end training and prediction. However, it introduces a lot of additional computation, so it is inefficient.
Behavior analysis mainly analyzes the behavior of detected objects. Simonyan et al. proposed a two-stream convolutional neural network [14], which significantly improved the accuracy of behavior recognition by incorporating optical flow information. Girdhar et al. [15] added an ActionVLAD layer on top of the two-stream network, but they did not study the recognition of different behaviors across multiple targets. Tran et al. constructed the C3D [16] network using 3D convolution and 3D pooling. Xu et al. proposed the R-C3D [17] network, which extracts behavior keyframes from a video and then identifies the behavior category based on these keyframes. The network can analyze videos of any length.

Behavior Analysis Algorithm
The design of the behavior analysis algorithm for the station operation room is shown in Figure 1. The algorithm includes object detection, target tracking, and behavior analysis. The object detection module uses a deep learning algorithm to detect the positions of the operators. This paper proposes an improved algorithm based on Yolov4 [18]: to improve the detection of small objects, we add SPP modules to the Yolov4 network. In the target tracking process, we introduce the HOG (histogram of oriented gradients) feature and improve the IoU (intersection over union) calculation to strengthen target tracking. Finally, we design three behavior analysis methods to identify off-position, sleeping, and playing with a mobile phone.

Object Detection.
The Yolov4 network is an object detection network proposed by Alexey Bochkovskiy et al. based on the Yolov3 [7] network. The detection network mainly consists of four parts: the CSPDarknet53 [13] backbone, spatial pyramid pooling (SPP) [19], PANet [20], and the Yolov3 head [7]. The CSPDarknet53 network combines cross-stage partial (CSP) [21] connections with Darknet53 [7]. The CSP structure can enhance the learning ability of a CNN while reducing computational cost. The Darknet53 network contains five large residual blocks, each of which contains several residual structures. Adding the CSP structure after each large residual block yields CSPDarknet53.
The SPP can produce a fixed-size output for any input size, which avoids the image distortion caused by nonproportional compression of the input image. The SPP is used in the Yolov4 network to increase the receptive field of the network. The PANet can locate pixels correctly by preserving spatial information, enhancing the ability of instance segmentation. Figure 2 shows the Yolov4 network structure.
The SPP module obtains receptive field information by applying maximum pooling with different kernel sizes and fusing the results. This fusion of receptive fields at different scales effectively enriches the expressive ability of the feature map. Figure 3 shows the structure of the SPP. In the Yolov4 network, the SPP module is located before the final 19 × 19 feature map. In this paper, we also apply the SPP module before the final 38 × 38 and 76 × 76 feature maps to enhance the expression of feature information. Figure 4 shows the improved Yolov4 network structure.
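The multi-scale pooling fusion described above can be sketched without a deep learning framework. The NumPy snippet below (an illustrative simplification, not the actual Yolov4 implementation) applies stride-1, same-padded max pooling with the 5 × 5, 9 × 9, and 13 × 13 kernels commonly used in Yolov4's SPP and concatenates the results with the input along the channel axis, so the spatial size is unchanged while the channel count grows fourfold:

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on a (C, H, W) feature map."""
    pad = k // 2
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp(x, kernels=(5, 9, 13)):
    """Concatenate the input with its pooled versions along the channel axis."""
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=0)

feat = np.arange(2 * 4 * 4, dtype=np.float64).reshape(2, 4, 4)
fused = spp(feat)
print(fused.shape)  # (8, 4, 4): channels grow 4x, spatial size unchanged
```

Because the identity branch is kept alongside the pooled branches, fine spatial detail and large-receptive-field context coexist in the fused map, which is why the module helps at the 38 × 38 and 76 × 76 scales as well.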

Object Tracking.
The most widely used real-time multitarget tracking algorithms in recent years are sort [10] and Deep-sort [11]. Although the sort algorithm is fast, its accuracy decreases when occlusion occurs.
Deep-sort applies a Kalman filter in the image space and then uses the Hungarian algorithm to associate detections frame by frame. It considers both target motion information and appearance information during data association. The association of motion information uses the Mahalanobis distance between the Kalman prediction and the object detection result, d(1)(i, j) = (d_j − y_i)^T S_i^(−1) (d_j − y_i), where d_j is the j-th detection and y_i and S_i are the predicted mean and covariance of the i-th track. The association of appearance information uses the minimum cosine distance between the appearance features of the last 100 successful associations of a track and the feature of the current detection, d(2)(i, j) = min_k (1 − r_j^T r_k^(i)). On top of these, this paper adds a comparison of HOG [22] features when computing the appearance association.
The HOG feature describes the target's contour through gradient and edge directions. Many studies in recent years have shown that this feature can accurately describe the outline of a person. We compare the HOG feature of the previously associated bounding box with the HOG feature of the current bounding box.
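As a rough illustration of this appearance cue, the sketch below computes a heavily simplified HOG descriptor (a single orientation histogram over the whole patch, rather than the cell/block-normalized descriptor of [22] or OpenCV's HOGDescriptor) and compares two patches by cosine distance. It conveys the idea only and is not the paper's implementation:

```python
import numpy as np

def hog_descriptor(patch, bins=9):
    """Simplified HOG: one gradient-orientation histogram over a 2-D grayscale patch."""
    gx = np.zeros_like(patch, dtype=np.float64)
    gy = np.zeros_like(patch, dtype=np.float64)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]    # central differences, x
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]    # central differences, y
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 180.0), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def cosine_distance(d1, d2):
    """Cosine distance between two L2-normalized descriptors."""
    return 1.0 - float(np.dot(d1, d2))

a = np.outer(np.arange(8.0), np.ones(8))  # horizontal stripes: vertical gradients
b = a.T                                   # vertical stripes: horizontal gradients
print(cosine_distance(hog_descriptor(a), hog_descriptor(a)))  # ~0.0 (same patch)
```

A patch compared with itself gives a distance near zero, while the two perpendicular stripe patterns give a distance near one, which is the discriminative behavior the tracker exploits.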

Mathematical Problems in Engineering
In the matching process, Deep-sort uses the IoU to measure the overlap of bounding boxes, IoU = area(box1 ∩ box2) / area(box1 ∪ box2), where box1 is the first bounding box and box2 is the second bounding box.
This standard IoU does not consider the width and height of the bounding boxes, which can lead to false matches between boxes of very different shapes. Therefore, we improve the IoU by introducing the height and width information of the bounding boxes, where h1 and w1 are the height and width of the first bounding box, h2 and w2 are the height and width of the second bounding box, and α is an adjustment coefficient that weights the width-height term.
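Since the recovered text omits the exact improved-IoU formula, the sketch below pairs the standard IoU with one hypothetical width/height-aware variant: the normalized width and height differences, weighted by the coefficient α, are subtracted from the plain IoU. The penalty form is an assumption for illustration, not the paper's formula:

```python
def iou(b1, b2):
    """Standard IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def shape_aware_iou(b1, b2, alpha=0.5):
    """Hypothetical width/height-aware IoU: plain IoU minus an
    alpha-weighted penalty on normalized width and height mismatch."""
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    penalty = abs(w1 - w2) / (w1 + w2) + abs(h1 - h2) / (h1 + h2)
    return iou(b1, b2) - alpha * penalty

box_a = (0.0, 0.0, 2.0, 2.0)
box_b = (1.0, 1.0, 3.0, 3.0)
print(round(iou(box_a, box_b), 4))  # 0.1429 (intersection 1, union 7)
```

For boxes of identical shape the penalty vanishes and the score equals the plain IoU; for boxes of very different aspect ratio the score is pushed down, discouraging the false matches mentioned above.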

Behavior Analysis.
At present, behavior analysis based on deep learning mainly uses object detection to directly identify behaviors such as sleeping and playing with a mobile phone. This approach has poor robustness, and in some cases its results are inaccurate. In this paper, we propose a behavior analysis algorithm based on target tracking and behavior characteristics. We analyze three behaviors: off-position, sleeping, and playing with a mobile phone.

Off-Position Detection.
We consider that the operators have left their work area when the object detection algorithm cannot detect them in N consecutive frames. C_leave is the off-position behavior counter: when no operator appears in the detection result, the counter increases by one; if an operator is detected, the counter is cleared. When the counter satisfies C_leave > T_leave, the operators are considered to be off-position, where T_leave is the off-position behavior threshold.
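The counter logic above can be sketched as a small stateful class (an illustrative sketch; the detector interface and the per-frame call pattern are assumptions, and the toy threshold here is far below the value used in the experiments):

```python
class OffPositionDetector:
    """Off-position counter: C_leave increments on frames with no detected
    operator, resets when an operator reappears, and off-position is
    reported once C_leave exceeds the threshold T_leave."""

    def __init__(self, t_leave=180):
        self.t_leave = t_leave
        self.c_leave = 0

    def update(self, num_operators_detected):
        if num_operators_detected == 0:
            self.c_leave += 1
        else:
            self.c_leave = 0
        return self.c_leave > self.t_leave

det = OffPositionDetector(t_leave=3)
flags = [det.update(n) for n in [0, 0, 0, 0, 1, 0]]
print(flags)  # [False, False, False, True, False, False]
```

Note that a single detected frame fully resets the counter, so brief detector flicker delays an alarm rather than accumulating toward one.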

Sleeping Detection.
The recognition of sleeping behavior is mainly based on the change of the tracked operator's position across frames. Through the target tracking algorithm, we match the target's current detected position against the Deep-sort tracking prediction and obtain their IoU score, which measures how much the target's position changes between frames. C_sleep is the sleeping behavior counter. For the same tracked target, if the IoU score between the position predicted by the tracker and the position given by the object detector is greater than the set threshold (that is, the target has barely moved), the counter increases by one. The counter is cleared if the tracked target disappears or the IoU score falls below the threshold. When the counter satisfies C_sleep > T_sleep, the operator is considered to be sleeping, where T_sleep is the sleeping behavior threshold.
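A minimal sketch of the sleeping counter, assuming the tracker supplies an IoU score per frame; the threshold values are illustrative, not those used in the paper:

```python
class SleepDetector:
    """Counts consecutive frames in which a tracked operator stays still.

    iou_score is the overlap between the tracker's predicted box and the
    detector's box for the same target; sustained high overlap means the
    operator has not moved."""

    def __init__(self, iou_threshold=0.9, t_sleep=5):
        self.iou_threshold = iou_threshold
        self.t_sleep = t_sleep
        self.c_sleep = 0

    def update(self, target_visible, iou_score):
        if target_visible and iou_score >= self.iou_threshold:
            self.c_sleep += 1
        else:
            self.c_sleep = 0  # target moved, or left the frame
        return self.c_sleep > self.t_sleep

det = SleepDetector(iou_threshold=0.9, t_sleep=3)
print([det.update(True, s) for s in [0.95, 0.97, 0.96, 0.98, 0.5]])
# [False, False, False, True, False]
```

The structure mirrors the off-position counter; only the per-frame condition differs.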

Playing Mobile Phone Detection.
We assume that when an operator is playing with a mobile phone, the phone is close to the person. Therefore, by calculating the Euclidean distance between mobile phones and operators, we can judge whether an operator is playing with a mobile phone. First, this paper uses the object detection network proposed above to detect mobile phones. Let the center of the mobile phone be (x_p, y_p) and the center of operator i be (x_i, y_i). We find the operator nearest to the mobile phone through the Euclidean distance d_i = sqrt((x_p − x_i)^2 + (y_p − y_i)^2), and i_min = argmin_i d_i indicates the bounding box of the operator closest to the mobile phone. C_phone is the playing mobile phone behavior counter. If the Euclidean distance between the mobile phone and the nearest operator's bounding box is less than the smaller of the width and height of that bounding box, we consider that the operator is playing with a mobile phone in this frame and add one to the counter.
When the counter satisfies C_phone > T_phone, the operator is considered to be playing with a mobile phone, where T_phone is the playing mobile phone behavior threshold.
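The nearest-operator rule above can be sketched as follows. The (x1, y1, x2, y2) box format and the exact comparison point (the operator's box center) are assumptions for illustration; the per-frame flag would feed the counter C_phone exactly as in the off-position case:

```python
import math

def nearest_operator(phone_center, operator_boxes):
    """Index of the operator whose box center is closest to the phone
    center, by Euclidean distance. Boxes are (x1, y1, x2, y2)."""
    def center(b):
        return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)
    dists = [math.dist(phone_center, center(b)) for b in operator_boxes]
    return min(range(len(dists)), key=dists.__getitem__)

def is_playing_phone(phone_center, operator_boxes):
    """Flag the frame if the phone is closer to the nearest operator's
    center than the smaller of that operator's box width and height."""
    i = nearest_operator(phone_center, operator_boxes)
    b = operator_boxes[i]
    w, h = b[2] - b[0], b[3] - b[1]
    cx, cy = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return i, math.dist(phone_center, (cx, cy)) < min(w, h)

ops = [(0, 0, 100, 200), (300, 0, 400, 200)]
print(is_playing_phone((60, 110), ops))  # (0, True): close to operator 0
```

Normalizing the distance by the operator's box size makes the rule roughly scale-invariant: operators nearer the camera occupy larger boxes, so the allowed phone-to-person distance grows accordingly.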

Experiment Analysis
This paper verifies the feasibility of the proposed algorithm from three aspects: object detection, target tracking, and behavior analysis. The test environment is 8 GB of memory, an Intel Core i5-6500 CPU, and an NVIDIA GTX 1050 graphics card.

Object Detection.
The training environment for object detection is 32 GB of memory, an Intel Xeon E5-2650 CPU, and an NVIDIA GTX 1080 Ti graphics card. For training and testing, we build an operation room dataset. The dataset contains 20000 images, composed of monitoring images taken by dozens of station operation room webcams under different illumination conditions and at different times. The image size is 1280 × 720. The dataset is divided into 70% training images, 20% validation images, and 10% test images. The categories annotated in the dataset are mobile phone and person. The main parameters used in training are shown in Table 1.
In object detection, precision (Pr) and recall (Re) are used as the benchmark to measure the object detection algorithms:
Pr = TP / (TP + FP), Re = TP / (TP + FN),
where TP, FP, and FN are the numbers of true positives, false positives, and false negatives. This paper compares three object detection algorithms: Yolov3, Yolov4, and our method. Table 2 shows the test results of the three algorithms.
Compared with the Yolov4 and Yolov3 detection algorithms, our algorithm improves both precision and recall on our dataset.
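For concreteness, the two metrics can be computed as follows (the TP/FP/FN counts below are made-up illustrative numbers, not results from Table 2):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Illustrative counts: 90 correct detections, 10 spurious boxes,
# 30 missed ground-truth objects.
pr, re = precision_recall(tp=90, fp=10, fn=30)
print(pr, re)  # 0.9 0.75
```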

Object Tracking.
In this paper, we use MOTP (multiple object tracking precision) and MOTA (multiple object tracking accuracy) to measure the ability of the target tracking algorithms. We test the sort algorithm, the Deep-sort algorithm, and our algorithm on our dataset and on the MOT16 dataset [23]. Figure 5 shows the target tracking process. Table 3 shows the results of the three algorithms on the two datasets.
Compared with the original Deep-sort algorithm, our algorithm increases MOTA by 2.7% and MOTP by 1.6% on the MOT16 dataset. On our dataset, MOTA increases by 1.9% and MOTP by 1.1%.

Behavior Analysis.
This article analyzes three violations: off-position, sleeping, and playing with mobile phones. The experimental results are presented below.

Off-Position Detection.
We extract ten off-position videos from the video database and extract one image every second. For the off-position test, the threshold T_leave is 180: if no personnel are detected in 180 consecutive images, the operator is judged to be off-position. Table 4 shows the results of the 10 video tests.
It can be seen from Table 4 that the detected frame count may differ from the actual frame count, because people may not be detected while they are leaving the screen; this does not affect the final detection results. Table 4 also shows that C_leave exceeds the off-position behavior threshold T_leave, so the off-position behavior analysis algorithm proposed in this paper can correctly judge off-position behavior. A screenshot of the test video is shown in Figure 6.

Sleeping Detection.
The sleeping behavior test results are shown in Table 5.
According to Table 5, there are some differences between the counter maximum C_sleep and the total number of sleeping frames in videos 1, 2, 3, and 10. However, C_sleep is still greater than the sleeping behavior threshold T_sleep. Therefore, the sleeping behavior algorithm proposed in this paper can judge the sleeping behavior of the personnel. Figure 7 shows some sleeping behavior detection results.

Playing Mobile Phone Detection.
We extract 10 playing mobile phone videos from the video database and extract one image every second. Figure 8 shows some playing mobile phone detection results. The red rectangle indicates the person playing with a mobile phone, and the yellow rectangle indicates the detected mobile phone.

Conclusion
Aiming at the management and monitoring demands of the operation room, we analyze the actual problems of the operation room and put forward an efficient behavior analysis method based on deep learning. Experimental tests verify the effectiveness of the proposed algorithm. The method proposed in this paper has been widely used in many railway stations.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.