Positioning of the Moving and Static Contacts of the Switch Machine Based on Double-Layer Mask R-CNN

With the continuous development of rail transit, the maintenance of the switch machine is becoming increasingly important, and the contact depth between the moving contact and the static contact in the switch machine is a key part of it. At present, contact depth is mainly measured manually, which suffers from low efficiency and strong subjectivity. The measurement of contact depth based on machine vision includes two steps: positioning of the moving and static contacts, and distance conversion. The positioning result has an important impact on the distance measurement. Therefore, a positioning method for the moving and static contacts based on double-layer Mask R-CNN (DLM) is proposed in this paper: first, the moving contact is roughly positioned by Mask R-CNN to obtain the predicted target area; second, the subgraph of the target area is preprocessed; finally, precise positioning is used to determine the precise positions of the moving and static contacts. The accuracy and robustness of the proposed DLM are verified on internal images of the switch machine.


Introduction
In recent years, technology in the field of rail transit has become increasingly mature, and safety is an essential attribute of rail transit. Trackside facilities such as switch machines play an important role in the safe operation of rail transit. Once the switch machine fails, serious train derailment accidents can occur, so the switch machine must be in good working condition at all times. The switch machine inevitably wears in daily use, so it needs to be inspected and maintained on time. The contact depth of the moving and static contacts of the switch machine determines whether the switch machine can work normally, so it becomes the key to maintenance.
The moving contact and the static contact are indispensable parts for completing this task. As shown in Figure 1, the red hollow boxes mark the static contact areas, one at the top and one at the bottom. The small red solid circle is the static contact in this paper, while the blue solid circle is the moving contact. The moving contact swings up and down with the state of the switch machine and touches the tape spring to conduct current so that the switch machine works. The distance between the yellow lines is the contact depth, which can be calculated from the relative position of the moving and static contacts. The contact effect is determined by the depth to which the moving contact drives into the static contact.
In particular, the contact depth between the moving contact column and the static contact base of the switch machine is the key to maintenance. If the contact depth is not up to the standard, the poor contact can disrupt the electric circuit and lead to serious consequences. At present, manual participation is still needed for the inspection of the switch machine. Because of the complexity of the switch machine structure, inspectors need extensive professional knowledge and testing experience, and the test method is difficult to popularize. These reasons have caused the low efficiency of the maintenance and protection of the switch machine. The maintenance efficiency can be improved by using an automatic method to measure the contact depth of the moving contact and static contact, and artificial intelligence has made great progress in data-driven modeling [1,2]. The automatic detection method is mainly noncontact detection, which does not deform or wear the surface of the device. At present, the common noncontact distance detection methods are mainly divided into ultrasonic detection methods [3,4], laser detection methods [5], measurement based on stereo vision, machine vision measurement, and so on. Ultrasonic and laser detection methods require high reflectivity of the object surface to be measured. If the reflectivity of the object surface does not meet the standard, the measurement effect will be poor. The measurement technology based on stereo vision places stricter requirements on the number and placement of cameras. However, the on-site conditions in the switch machine case do not meet the requirements of the above methods, so they are not suitable for detecting the driving depth of the moving and static contacts of the switch machine. Therefore, the machine vision measurement method is selected.
When machine vision is used for automatic measurement, the first step is accurate positioning of the target areas of the moving and static contacts, and the second is distance measurement based on the image. The result is greatly affected by the accuracy of each step, and the target positioning has the greatest impact on the accuracy. Therefore, it is necessary to select an accurate positioning method.
At present, the methods of target location are divided into traditional methods and modern methods. Traditional target location methods include feature extraction and feature classification. Template matching [6,7] is a common traditional location method. In the aspect of image feature extraction, there are the local binary pattern feature [8], Histogram of Oriented Gradient (HOG) [9,10], Haar feature [11], and other features. After the image features are obtained, a similarity measure [12,13] is used to classify and locate the target. A dynamic positioning algorithm based on template matching [14] is proposed by Yin et al. to detect the area, width, and distance of groove shape. The limitation of this method is that it requires high image quality and cannot recognize the rotation or size change of the matching target. These traditional image feature extraction methods place high normalization requirements on the image and cannot adapt to a complex and changeable environment. In engineering practice, their detection accuracy and anti-noise ability are poor, the robustness of image data processing is weak, and the effect needs to be improved.
With the deepening of image processing and image classification based on deep learning [15][16][17], modern methods of target location have achieved important research results, such as You Only Look Once (Yolo) [18], Convolutional Neural Networks (CNNs), Regional Convolutional Neural Networks (R-CNNs), Fast Region-Convolutional Neural Networks (Fast R-CNNs) [19], Faster Region-Convolutional Neural Networks (Faster R-CNNs) [20], and Mask R-CNN [21][22][23][24]. At present, Mask R-CNN is an excellent deep learning network compared with the previous generation. It adds a segmentation branch and carries out target detection and segmentation tasks synchronously. It introduces Region of Interest Alignment (RoI Align) to replace Region of Interest Pooling (RoI Pooling) in Faster R-CNN, which greatly improves the accuracy of region segmentation. Compared with the traditional methods, Mask R-CNN has a stronger ability of high-level semantic abstraction, the translation invariance of convolutional neural networks, and scale invariance within training, which are also necessary for image classification. Mask R-CNN is one of the most efficient and widely used methods for target location.
When we use Mask R-CNN for one-step positioning of the static contact of the switch machine, we find that the positioning effect of the moving contact is good but the positioning effect of the static contact is poor. This is because the internal structure of the switch machine chassis is complex, there are many parts with similar shapes and colors, and the static contact to be positioned is too small, which leads to low positioning accuracy of the static contact area. Therefore, the static contact area cannot be accurately located by the one-step Mask R-CNN method.
To solve the above problems, this paper proposes to use two Mask R-CNNs to locate the moving and static contacts, divided into rough positioning and precise positioning. The definition of rough positioning in this paper is as follows: Mask R-CNN is used to get the subgraph containing the moving and static contact area so as to reduce the influence of the irrelevant environment and improve the subsequent positioning accuracy.
The definition of precise positioning is as follows: after the subgraph is preprocessed, the Mask R-CNN is used to locate the moving and static contacts and finally get their accurate positions. The problem of inaccurate positioning of the static contact can be effectively solved by DLM.
Compared with R-CNN, the improvement of DLM lies in the following: because of the multistep positioning method, the positioning accuracy of DLM is higher than that of R-CNN, especially for objects that occupy few pixels. When R-CNN locates the target, it uses a rectangular box, while DLM is accurate to the pixel level, which is helpful for the subsequent contact depth measurement. The training process of R-CNN is very complicated, including preprocessing of training samples, parameter fine-tuning, SVM training, and bounding box regression training. Multiple GPUs are used to shorten the DLM training time, which improves the efficiency compared with R-CNN. The main innovations of this paper are as follows.
(1) A machine vision-based automatic positioning method for the moving and static contacts of the switch machine is proposed to carry out accurate positioning in a complex environment.
(2) A multimodel method is proposed for object location and subgraph segmentation. The DLM method is used to solve the problem that it is difficult to locate small pixel objects in large pixel images, and the positioning accuracy is improved.
The remainder of this paper is organized as follows. Section 2 introduces the theory of Mask R-CNN. Section 3 describes the theory of the proposed DLM positioning method for the moving and static contacts of the switch machine. Section 4 presents the experimental results of DLM, and the improvement in accuracy of DLM is verified by comparison with the single Mask R-CNN and Yolo. Section 5 summarizes the paper and discusses future work.

Basic Introduction of Mask R-CNN
Image classification is the basic task of image recognition. Mask R-CNN is mainly used for object detection and instance segmentation. It takes an image as input and outputs object categories and object masks. Its network architecture is shown in Figure 2.
Compared with Faster R-CNN, some changes have been made as follows:
(1) Replace the RoI pooling layer with RoI Align
(2) Add parallel FCN layers
(3) Change the feature extraction network to ResNet101 + FPN to enhance the feature extraction ability
Mask R-CNN adopts a multitask loss function as follows:
$$L = L_{cls} + L_{box} + L_{mask}.$$
The loss function of each RoI consists of three parts: the classification loss of the bounding box $L_{cls}$, the location regression loss of the bounding box $L_{box}$, and the loss of the mask $L_{mask}$. The mask branch has a $K \cdot m^2$ dimensional output for each RoI ($K$ is the number of masks and $m^2$ is the resolution). There are 2 (2 is the number of categories: target and background) binary masks with $m \times m$ resolution during training. For an RoI of a given class, $L_{mask}$ only considers the mask of this target class. The masks of other classes are not included in the loss function. In the feature layer of $K \cdot m^2$, each value is a binary mask value: 0 or 1. For an RoI predicted as the $k$-th class, the $k$-th feature layer with a resolution of $m \times m$ is selected, and then the average binary cross-entropy loss is calculated, which is the loss function of the mask branch.
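The mask loss described above can be illustrated with a minimal pure-Python sketch. It assumes the mask branch output is given as a list of K probability maps (values in [0, 1] after a sigmoid) and that the ground-truth mask is an m x m grid of 0/1 values; the function name `mask_loss` and the clamping constant `eps` are illustrative choices, not taken from the paper.

```python
import math

def mask_loss(pred_masks, gt_mask, gt_class, eps=1e-7):
    """Average binary cross-entropy over the m x m mask of the target class.

    pred_masks: list of K grids (m x m) of predicted probabilities.
    gt_mask:    one m x m grid of 0/1 ground-truth values.
    gt_class:   index of the target class; masks of other classes are ignored.
    """
    pred = pred_masks[gt_class]
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt_mask):
        for p, y in zip(pred_row, gt_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count
```

Only the grid selected by `gt_class` contributes, which mirrors the statement that masks of other classes do not enter the loss.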

Algorithm Process.
This section introduces the basic process of the algorithm. The specific process is shown in Figure 3, and the overall process is divided into two steps. The first step is rough positioning, in which the original image of the switch machine is input into the step 1 Mask R-CNN so as to obtain the moving contact area and eliminate the adverse effect of irrelevant pixels on subsequent measurement. The second step is precise positioning, in which the image of the moving contact area obtained in step 1 is processed to obtain a subgraph of fixed size that includes the moving and static contacts. The subgraph is input into the step 2 Mask R-CNN, and the static contact can be accurately obtained through the pretrained subgraph model for accurate contact depth measurement.
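The two-step flow can be summarized as a short orchestration sketch. The three callables (`rough_model`, `make_subgraph`, `precise_model`) are hypothetical placeholders for the step 1 Mask R-CNN, the preprocessing and cropping stage, and the step 2 Mask R-CNN; returning `None` when rough positioning finds nothing is an assumption added here, not behavior stated in the paper.

```python
def dlm_locate(image, rough_model, make_subgraph, precise_model):
    """Two-stage positioning: rough mask, fixed-size subgraph, precise masks."""
    moving_mask = rough_model(image)              # step 1: moving contact area
    if moving_mask is None:
        return None                               # assumed handling of a missed detection
    subgraph = make_subgraph(image, moving_mask)  # preprocessing and cropping
    return precise_model(subgraph)                # step 2: moving and static contacts
```

Passing the stages in as functions keeps the sketch independent of any particular Mask R-CNN implementation.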
On the basis of Mask R-CNN, the DLM can achieve accurate positioning in a complex environment and solve the problem that the internal structure of the switch machine is complex and ordinary positioning methods perform poorly. The DLM can be used to locate small objects in large images, and the experimental results show that it can improve the positioning accuracy.

Rough Positioning.
In most cases, the manually taken photos of the switch machine do not all have the same orientation. Because the size of the input image is reset in the input layer of the neural network, some images would be compressed there, resulting in deviation in distance measurement. Therefore, before the size is reset, the length and width of the image are compared to decide whether to rotate the image by 90°. The pictures with fixed size and direction are put into the network, model 1 is loaded for testing, and the prediction of the moving contact area is obtained, as shown in Figure 4. The black background is the schematic diagram of the original image, and the white part is the area of interest located by the Mask R-CNN in step 1, which is the moving contact area.
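The orientation check can be sketched in a few lines of pure Python. The sketch assumes that the network expects portrait images (height not smaller than width) and that a clockwise rotation is acceptable; the paper only states that length and width are compared to decide on a 90° rotation, so both choices are assumptions, as is the name `to_portrait`.

```python
def to_portrait(image):
    """Rotate a row-major image 90 degrees clockwise if it is wider than tall.

    image: list of rows, each row a list of pixel values.
    """
    height, width = len(image), len(image[0])
    if width <= height:
        return image  # already in the assumed target orientation
    # Clockwise rotation: reverse the row order, then transpose.
    return [list(row) for row in zip(*image[::-1])]
```

Rotating before resizing keeps the aspect ratio of the contact region intact, which is the concern raised in the paragraph above.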

Precise Positioning.
Because the images in the data set are large, the static contact occupies few pixels in the image, so it is difficult to locate directly from the original image, which easily causes positioning offset and incomplete masks. In order to locate the static contact better, this experiment adopts a step-by-step method. After the location of the moving contact is obtained from the rough positioning, preprocessing and subgraph segmentation are carried out, and then the location of the static contact is predicted from the subgraph. There are four steps in this method as follows:
(1) Gray-scale processing is performed on the prediction area of the moving contact mask. Because the color of the predicted area is different from the background, it is easy to segment the edge after graying.
(2) The gray image is binarized to form a black-and-white image, in which it is easy to find the smallest enclosing circle, as shown in Figure 5.
(3) The smallest enclosing circle of the binarized region is found, which gives the center coordinates of the moving contact.
(4) The center coordinates are used to intercept a small graph with a fixed size; this graph is the subgraph for subsequent processing.
The processing is shown in Figure 5, including graying, binarization, the smallest enclosing circle, and obtaining the final moving contact area.
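The graying, binarization, and smallest-enclosing-circle steps can be sketched without external libraries as follows. The luminance weights, the threshold of 128, and all function names are assumptions for illustration; the paper does not state which implementation was used, and in practice a library routine such as OpenCV's `cv2.minEnclosingCircle` would likely replace the hand-written circle search.

```python
import math
import random

def to_gray(rgb_image):
    """Weighted luminance of each (r, g, b) pixel."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray, threshold=128):
    """1 for pixels at or above the threshold, 0 otherwise."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]

def foreground_points(binary):
    """(x, y) coordinates of all foreground pixels."""
    return [(x, y) for y, row in enumerate(binary)
            for x, v in enumerate(row) if v]

def _circle2(a, b):
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    return (cx, cy, math.hypot(a[0] - cx, a[1] - cy))

def _circle3(a, b, c):
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:  # collinear points: fall back to the widest two-point circle
        return max((_circle2(a, b), _circle2(a, c), _circle2(b, c)),
                   key=lambda t: t[2])
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.hypot(ax - ux, ay - uy))

def _contains(circle, p, eps=1e-9):
    return (circle is not None and
            math.hypot(p[0] - circle[0], p[1] - circle[1]) <= circle[2] + eps)

def smallest_enclosing_circle(points, seed=0):
    """Randomized incremental search; returns (cx, cy, radius)."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    circle = None
    for i, p in enumerate(pts):
        if _contains(circle, p):
            continue
        circle = (p[0], p[1], 0.0)        # p must lie on the boundary
        for j in range(i):
            q = pts[j]
            if _contains(circle, q):
                continue
            circle = _circle2(p, q)       # p and q on the boundary
            for k in range(j):
                if not _contains(circle, pts[k]):
                    circle = _circle3(p, q, pts[k])
    return circle
```

The center of the returned circle plays the role of the moving contact center used to intercept the fixed-size subgraph.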
The fixed-area subgraph is intercepted around the center coordinates of the moving contact column obtained in the previous step, as shown in Figure 7(a). If the moving contact column coordinates obtained by the first model are too close to the image edge, the image area to be intercepted exceeds the image boundary, as shown in Figure 7(b), and the program crashes.
In order to solve this problem, black pixels are filled on each edge of the image, and the moving contact center is shifted by the corresponding number of pixels in both dimensions, which solves the problem that the interception fails when its center is close to the edge of the image. The effect after processing is shown in Figure 7(c). The subgraph required for precise positioning can be successfully obtained through pixel expansion.
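A minimal sketch of the pixel expansion and fixed-size interception is given below. The default padding of 200 pixels follows the value reported in the experiments; the subgraph size is not stated in the paper, so `size` is a hypothetical parameter (assumed even and no larger than twice the padding), and the function names are illustrative.

```python
def pad_image(image, pad, fill=0):
    """Surround a row-major image with `pad` fill-valued pixels on every side."""
    width = len(image[0])
    blank = [fill] * (width + 2 * pad)
    body = [[fill] * pad + list(row) + [fill] * pad for row in image]
    top = [list(blank) for _ in range(pad)]
    bottom = [list(blank) for _ in range(pad)]
    return top + body + bottom

def crop_subgraph(image, center, size, pad=200, fill=0):
    """Crop a size x size window centered on `center` = (x, y).

    The center is shifted by `pad` in both dimensions to match the padded
    image, so windows near the border are filled with black instead of failing.
    """
    half = size // 2
    if half > pad:
        raise ValueError("padding must be at least half the subgraph size")
    cx, cy = center
    padded = pad_image(image, pad, fill)
    px, py = cx + pad, cy + pad
    return [row[px - half:px + half] for row in padded[py - half:py + half]]
```

Because the padding is at least half the window size, the window always lies inside the padded image and the output size never changes.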
By sending the subgraph into the Mask R-CNN in the precise positioning step, we can get the accurate moving contact and static contact areas. The accuracy of positioning determines the accuracy of contact depth measurement, so it is necessary to use the DLM method.

Experimental Environment Configuration.
The experimental environment used in this experiment is as follows: the operating system is Windows 10 Professional 64 bit; the CPU is Intel i7-8700 @ 3.2 GHz; the GPU is NVIDIA GeForce RTX 2080; the deep learning framework is TensorFlow 1.5.0; the memory is 32 GB; and the programming language is Python 3.5.

Data Set Construction.
The pictures used in this paper are switch machine pictures taken manually with mobile phones. Because the pictures have different sizes, the picture size is unified to 960 * 1280. In order to evaluate recognition under special working conditions and poor shooting conditions, operations such as angle transformation, brightness transformation, and adding noise are applied. The image states of the data sets are very different, so the positioning is challenging. The training set of rough positioning consists of 500 internal images of the switch machine, and the training set of precise positioning consists of 1500 subgraphs. There are 6 test sets in total, each with 50 pictures. The method proposed in this paper is run on the six test sets to collect six groups of positioning results, from which the positioning accuracy of the moving and static contacts of the switch machine is obtained.
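Two of the transformations mentioned above, brightness transformation and added noise, can be sketched on a grayscale image in pure Python. The scaling factor, the Gaussian noise model, and the function names are assumptions for illustration, since the paper does not give the parameters it used; angle transformation is omitted from this sketch.

```python
import random

def _clip(value):
    """Round to an integer and clip to the 8-bit range."""
    return min(255, max(0, int(round(value))))

def adjust_brightness(gray, factor):
    """Scale every pixel by `factor` (above 1 brightens, below 1 darkens)."""
    return [[_clip(v * factor) for v in row] for row in gray]

def add_gaussian_noise(gray, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    rng = random.Random(seed)
    return [[_clip(v + rng.gauss(0.0, sigma)) for v in row] for row in gray]
```

Fixing `seed` makes an augmented data set reproducible across training runs.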
This experiment adopts the transfer learning method: a model pretrained on the COCO data set is used as the starting point to speed up the convergence of the model.
In this experiment, whether a sample is positive or negative is determined by the Intersection over Union (IoU). Take the judgment of a candidate anchor box as an example. If the calibration threshold is 0.5, the IoU between each reference box and the ground truth is calculated. If it exceeds the set threshold, the box is a positive sample; otherwise, it is a negative sample. The ground truth is obtained from the coordinates of the upper left corner and the lower right corner of the mask in the training image label. The IoU is calculated as
$$\mathrm{IoU} = \frac{\mathrm{area}(box_{anchor} \cap box_{GT})}{\mathrm{area}(box_{anchor} \cup box_{GT})},$$
where $box_{anchor}$ is the anchor reference box and $box_{GT}$ is the ground truth. Since this experiment involves the location of the region of interest, mask segmentation, and target classification, three judgment thresholds are set. The specific values are shown in Table 1.
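The positive/negative decision can be written as a short sketch. Boxes are assumed to be given as (x1, y1, x2, y2) with the upper left and lower right corners, matching how the ground truth is described; the function names and the default threshold of 0.5 (the example value in the text) are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_positive(anchor, ground_truth, threshold=0.5):
    """An anchor is a positive sample if its IoU exceeds the threshold."""
    return iou(anchor, ground_truth) > threshold
```

For example, an anchor (0, 0, 2, 2) against a ground truth (0, 0, 2, 3) overlaps in 4 of 6 area units, so it counts as positive at a threshold of 0.5.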

Analysis of Single-Layer Mask R-CNN Positioning Results.
In order to verify the high accuracy and robustness of the DLM method, a comparative experiment is designed in this paper using the single-step positioning method. Only a single-layer Mask R-CNN is used to locate the contacts in the switch machine picture. The moving contact area obtained by single positioning is very accurate, which is similar to the rough positioning step in the DLM method. However, for the positioning of the static contact, because the target area is small and the original image is a large 960 * 1280 pixel image, the static contact area is hard to determine. The positioning effect is poorer than that of the DLM method in this article. As shown in Figure 8, the mask of the moving contact is accurate, but the mask of the static contact is incomplete or even missing.

Rough Positioning of Switch Machine Moving Contact.
The image to be predicted is input into the Mask R-CNN in step 1 to get the mask segmentation image. The comparison of the moving contact positioning effect is shown in Figure 9. The region of interest of the moving contact obtained in Figure 9(b) is processed before segmentation. After graying, binarization, and the smallest enclosing circle, the circle obtained by experiment has high overlap with the original image. The specific experimental picture is shown in Figure 10.
It can be seen from Figure 10 that the method mentioned in Section 3.3 can be used to locate the moving contact under extreme conditions, and the mask covers a complete area without causing the center of the circle to shift.

Subgraph Acquisition and Region Recognition.
Before the region segmentation of the moving contact in the original image, pixel expansion is carried out first in order to prevent segmentation failure. The results show that it is effective to expand the image by 200 black pixels in both dimensions, with 200 pixels added to the corresponding coordinates in the segmentation process. The size of the subgraph does not change after segmentation, as shown in Figure 11.
According to the center coordinates of each moving contact after processing the target area, the subgraphs are segmented, and the segmented images are shown in Figure 12. It can be seen that all the subgraphs contain the parts needed for contact depth measurement.

Analysis of Precise Positioning Results.
The subgraph obtained by rough positioning is input into the second stage Mask R-CNN to obtain the region of interest of the static contact, which is an important basis for the subsequent calculation of contact depth.
Among them, the segmentation of the moving contact in the original picture is shown in Figure 13(a). Figure 13(b) shows the key areas around the moving contact identified by the precise positioning step, including the tape spring and the static contact. The area of the tape spring and the area of the static contact are shown in Figures 13(c) and 13(d). By calculating the distance between the center of the cylinder and the region of interest of the paddle and the static contact, the contact depth of the moving and static contacts can be calculated. The multitask loss function curve of Mask R-CNN is shown in Figure 14.
The change of the loss function value of the subgraph model during training is shown in Figure 14.
The number of training sessions is represented by the x-axis, and the loss function value is represented by the y-axis. Figures 14(a)-14(d) show the overall loss function, classification branch loss function, mask branch loss function, and positioning branch loss function, respectively. It can be seen from Figure 14 that the losses tend to converge as the number of training sessions increases.
There are jumps in the middle of each graph because training switches from training only the network heads to training the whole network. Although the loss curves converge overall, the loss may change abruptly once the number of training sessions reaches a certain value, so the trend chart of the loss function should be consulted when selecting the model. Table 2.

$$A_M = \frac{T_M}{50} \times 100\%,$$
where 50 is the number of pictures in each test set. As shown in Figure 15, the number of groups is represented by the x-axis, the accuracy is represented by the y-axis, the positioning result of the moving contact is plotted as the red line, and the positioning result of the static contact is plotted as the blue line. According to the results of DLM, the static contact is more difficult to locate than the moving contact, and there is a positive correlation between the positioning accuracy of the static contact and that of the moving contact. Correlation analysis in SPSS gives an R value of 0.955, which is positive and close to 1, meaning that the two groups of data are strongly positively correlated. The P value is 0.02, less than 0.05, so the positive correlation between the two groups of data obtained from the experiment is significant. This also shows that in the two-step positioning process, the positioning result of each step has an important impact on the subsequent steps. Table 3 shows the positioning results of the three methods for the moving and static contacts of the switch machine. Because Yolo and R-CNN are commonly used target location algorithms, we compare against them to increase the reliability of the article. The values of $T_M$ and $T_S$ are the sums of the six groups of results. The data can be seen intuitively in Figure 16.
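The accuracy formula and a correlation check can be sketched as follows. The paper reports that the correlation was computed in SPSS without naming the statistic, so the use of the Pearson coefficient here is an assumption, as are the function names; the per-group counts in the usage lines are hypothetical placeholders, not the values from Table 2.

```python
import math

def accuracy(correct, total=50):
    """Positioning accuracy in percent for one test set."""
    return 100.0 * correct / total

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-group counts of correct positionings (six groups of 50).
moving_correct = [49, 48, 50, 47, 49, 49]
static_correct = [47, 46, 49, 45, 47, 48]
moving_acc = [accuracy(c) for c in moving_correct]
static_acc = [accuracy(c) for c in static_correct]
r = pearson_r(moving_acc, static_acc)
```

With only six groups the coefficient is sensitive to single values, which is worth keeping in mind when interpreting the reported significance.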
As can be seen from Figure 16, there is no big difference in the positioning accuracy of moving contacts among DLM, Yolo, and Mask R-CNN, whose accuracy rates are 97.33%, 97.67%, and 95.67%, respectively. However, the accuracy for static contacts differs greatly, which means that static contacts occupying fewer pixels in the original image are not easy to locate. The accuracy rate of static contact positioning based on the DLM is 94%, while that of Yolo is 62.33% and that of Mask R-CNN is 5%. The DLM algorithm can therefore be used to solve the problem of static contact positioning accuracy. At present, the algorithm reaches a detection accuracy of 94 percent under normal conditions, and it also has a good detection effect on pictures taken under severe working conditions. At the same time, it can be found from Figure 16 that Yolo is better than Mask R-CNN in the one-step positioning of the static contact. This is because the Yolo positioning result is a rectangular frame that contains the static contact and its adjacent area. Mask R-CNN, in contrast, locates at the pixel level, so its detection area is much smaller than that of Yolo, resulting in the difference in positioning effect.

Conclusion
In this paper, the DLM is proposed to locate the moving and static contacts of the switch machine. First, rough positioning is carried out to obtain the moving contact area in the original image for subsequent measurement; second, precise positioning is carried out. The subgraph obtained from rough positioning is preprocessed, and then secondary positioning is carried out to improve the detection accuracy of the target area. Through the two positioning steps, the static contact area can be accurately obtained, improving the accuracy of automatic contact depth measurement.
The experimental results show that DLM can automatically locate the internal parts of the switch machine box in batches, the positioning accuracy of static contacts is greatly improved, and the robustness is good. The positioning of the moving and static contacts of the switch machine based on DLM promotes the automation of rail transit maintenance work, reduces the work intensity of maintenance personnel, and provides a reference for further research on future inspection work. The inspection method proposed in this article can also be applied to a variety of industrial scenarios to improve operation efficiency. The DLM method of this paper has the following points to be improved:
(1) The DLM is a two-step positioning method, which takes a long time compared with one-step positioning, so the detection efficiency needs to be improved
(2) Before Mask R-CNN training, a large number of labels need to be produced manually, and the training effect is related to label production
(3) The angle correction algorithm can effectively solve the angle interference caused by the portable device

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.