Cluster Analysis Based Arc Detection in Pantograph-Catenary System

The pantograph-catenary system, which ensures the transmission of electrical energy, is a critical component of a high-speed electric multiple unit (EMU) train. The pantograph-catenary arc directly affects the power supply quality. The Chinese Railway High-speed (CRH) is equipped with a 6C system to obtain pantograph videos. However, it is difficult to automatically identify the arc image information in the vast amount of video. This paper proposes an effective approach in which the pantograph video is separated into continuous frame-by-frame images. Because of the interference from the complex operating environment, it is unreasonable to use the arc parameters directly to detect the arc, so an environmental segmentation algorithm is developed to eliminate the interference. Time series within the same environment are analyzed via the cluster analysis technique (CAT) to find the abnormal points, and a simplified arc model is then used to identify arc events accurately. The proposed approach is tested with real pantograph video and performs well.


Introduction
In high-speed electric multiple unit trains, the pantograph-catenary system is one of the critical components in the transmission of electric energy to the train. Any problem that occurs in this system, especially the occurrence of an arc between the pantograph and the catenary, leads to serious damage to the overhead wire and the disruption of traffic on the electrified railway [1,2]. To identify pantograph-catenary arcs, the 6C system on the Chinese railway is equipped with a camera that monitors the operating conditions of the pantograph-catenary system over time. It is important to correctly and efficiently extract the arc images from the vast amount of video and then to use those images to analyze the operating conditions of the pantograph-catenary system [3]. Detecting arc faults is essential for the construction and development of the railway system. The pantograph-catenary system is shown in Figure 1.
The image processing based analysis of the pantograph-catenary arc has been a research hotspot. Edge detection and the Hough transform [4] are used to process the pantograph videos to find the position of the contact wire. Ma et al. [5] obtain the edges of the pantograph slide from the images through wavelet analysis. Recently, automatic approaches based on computer vision and signal processing methods have been more frequently applied [6]. Aydin et al. [7] proposed a new method to detect the pantograph arc based on particle swarm. They used image processing to convert images to time series, which makes arc analysis possible. In [8], a model and image processing based arc detection system is proposed. Arcs occurring in image sequences are modeled during the modeling stage, which yields current and voltage signals that belong to the healthy condition and to the arc occurrence condition. A hierarchical clustering algorithm based on fuzzy c-means is developed in [9] and applied to a set of measured data collected by the pantographs of high-speed trains to detect the presence of electric arcs and classify their magnitudes. Reference [10] collects voltage and current data, which are processed using the SVM algorithm, with the aim of extracting important information to detect arcs. Produced by photoelectric sensors, the edges of the pantograph can be used to detect the abrasion of the pantograph strip [11]. An unsupervised classification technique is used in [12] to find the clusters related to the phototube data to verify the efficiency of the clustering procedure. To adjust the pantograph height in an active pantograph system, [13] proposed a computer vision based control system. With the help of the phototube or photodiode sensors that can emit ultraviolet light, the arc condition can be detected [14]. Based on the theory of binocular stereo vision, a method is proposed in [15] which can not only obtain depth information from the switching arc of the low voltage apparatus but also highlight the arc image edge to obtain more useful information from the arc image. To investigate the impact of discharge on the insulator, a high-speed camera is used in [16] to observe this process and the development of a partial arc on the porcelain insulator surface.
The literature mentioned above proposes many methods to detect the pantograph-catenary arc. However, in the actual operation of the train, there are still some interference factors to consider. Virtually all existing studies use pantograph videos recorded in an ideal environment. Changes in the weather and surrounding objects have a great influence on the pantograph video. For example, when the train enters a tunnel, the background becomes dark and the appearance of the arc in the image can change significantly. These changes of environment affect the accuracy of arc detection. In addition, some arc recognition methods require the installation of additional devices on the roof of the train, which both increases the complexity and decreases the reliability of the system.
In this paper, to capture the image features of the arc after converting the pantograph videos into frame images, an arc parameter is defined to measure the size of the arc in the image. We consider the influence of environmental change and design an environment segmentation algorithm to obtain the time series within the same environment. The abnormal points are identified by a clustering algorithm, and the pantograph-catenary arc review improves the accuracy of arc detection.

Problem Description and Overall Framework
When a pantograph-catenary arc occurs, it produces a strong light that can be seen in the image. The use of image processing methods to analyze the characteristics of the arc light is the first task of arc detection. As the environment on the train track is constantly changing, the environmental segmentation algorithm is the key to ensuring the accuracy of arc detection. The cluster analysis technique (CAT) needs to be able to quickly and effectively identify the arc event, after which the existence of the arc is confirmed by the arc review. The framework of the proposed method is presented in Figure 2.
The proposed method uses image sequences from a camera mounted on the roof of the locomotive, which records the pantograph-catenary video. The method consists of four main stages for the detection of arc events.
Step 1 (image preprocessing). The images decomposed from the pantograph-catenary video are converted to binary images, and the ratio of white pixels to black pixels is taken as the arc parameter, that is, the number of white pixels divided by the number of black pixels. The value of this parameter is a sample of the signal for the current frame.
Step 2 (environmental segmentation). After the overall image sequences are read, an environmental segmentation algorithm is used to distinguish among the different operation environments, which increases the accuracy of arc detection. In addition, the algorithm eliminates the impact of the changing environment on the time series.
Step 3 (clustering analysis). Within the same environment, the time series from the image sequences is divided into normal points and abnormal points by applying CAT. Because of the limitations of the clustering algorithm, not all abnormal points are pantograph-catenary arcs: points where the arc parameter is too low can also be flagged by the clustering algorithm.
Step 4 (pantograph-catenary arc review). The model of arc current with time is simplified into a review model, which is used to check the abnormal points obtained from the clusters. Because an arc has a rising edge and a falling edge when it occurs, the abnormal points that belong to an arc should show the same process and order.
By this method, the proposed system can detect the pantograph-catenary arcs automatically and accurately.

Image Preprocessing
The method proposed in this paper decomposes the pantograph video recorded by the 6C system into frame images, so that continuous video is converted into discrete images that represent the pantograph-catenary region. The Otsu image threshold segmentation algorithm is used to process the images, and the pantograph-catenary area is sheared out for arc analysis. The pantograph video is thus converted into a time series of the arc parameter, which describes the size of the arcs that may occur. The image preprocessing method and time series generation are shown in Figure 3.

Otsu Threshold Method.
As illustrated in Figure 4, the Otsu method is based on the gray level histogram and can be deduced using the least squares method; it is an optimal statistical thresholding scheme. The basic idea is to separate the gray values of the image into two parts with the maximum variance between them. This approach maximizes the separability according to the optimal threshold.
Let $f(x, y)$ be the gray value at position $(x, y)$ in an image of size $M \times N$ with $L$ gray levels, so that $f(x, y) \in [0, L-1]$. If the number of pixels at gray level $i$ is $n_i$, then the probability of occurrence of gray level $i$ is

$$p(i) = \frac{n_i}{M N},$$

where $i = 0, 1, \ldots, L-1$ and $\sum_{i=0}^{L-1} p(i) = 1$. The pixels are divided into two parts according to the gray level determined by the threshold $t$: the background $C_0$ and the object $C_1$. The gray levels of the background $C_0$ range from $0$ to $t-1$, and those of the object $C_1$ range from $t$ to $L-1$. Therefore, the pixels of $C_0$ and $C_1$ are $\{f(x, y) < t\}$ and $\{f(x, y) \ge t\}$, respectively.
The probability of occurrence of $C_0$ is

$$\omega_0 = \sum_{i=0}^{t-1} p(i).$$

The probability of occurrence of $C_1$ is

$$\omega_1 = \sum_{i=t}^{L-1} p(i),$$

where $\omega_0 + \omega_1 = 1$. The average gray value of $C_0$ is

$$\mu_0 = \frac{1}{\omega_0} \sum_{i=0}^{t-1} i \, p(i).$$

The average gray value of $C_1$ is

$$\mu_1 = \frac{1}{\omega_1} \sum_{i=t}^{L-1} i \, p(i).$$

The average gray value of the whole image is

$$\mu = \omega_0 \mu_0 + \omega_1 \mu_1.$$

The variance between the background and the object is

$$\sigma^2(t) = \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2.$$

Let $t$ vary from $0$ to $L-1$, and calculate the between-class variance $\sigma^2(t)$ for each value. The $t$ that yields the maximum variance is the optimal threshold. This method can be used to eliminate the interference.
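The threshold search described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `otsu_threshold` and `binarize`, the flat list of integer gray values, and the default of 256 gray levels are assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold that maximizes the between-class variance.

    pixels: flat list of integer gray values in [0, levels - 1] (assumed layout).
    """
    total = len(pixels)
    hist = [0] * levels
    for g in pixels:
        hist[g] += 1
    p = [n / total for n in hist]                  # probability of each gray level
    mu_total = sum(i * p[i] for i in range(levels))

    best_t, best_var = 0, -1.0
    w0 = 0.0   # probability of the background (gray levels below t)
    m0 = 0.0   # first moment of the background
    for t in range(1, levels):
        w0 += p[t - 1]
        m0 += (t - 1) * p[t - 1]
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue                               # one class is empty, skip
        mu0 = m0 / w0
        mu1 = (mu_total - m0) / w1
        var_between = w0 * (mu0 - mu_total) ** 2 + w1 * (mu1 - mu_total) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map gray values to 1 (white, value >= t) or 0 (black, value < t)."""
    return [1 if g >= t else 0 for g in pixels]
```

When several thresholds give the same variance, this sketch keeps the smallest one; any of them separates the two classes equally well.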

Image Shearing Algorithm.
Prior to detecting the pantograph-catenary arcs, the pantograph region should be determined. As the camera angle and position remain unchanged, the pantograph position is obtained using an image shearing algorithm, and a rectangular region is defined that includes the whole pantograph and part of the contact wire.
Figure 5 shows the rectangular region of different frame images after image shearing. The rectangular region is converted to a binary image by the Otsu method, in which the white pixels are stored as 1's and the black pixels as 0's. Compared with the original pantograph image, image shearing enhances the characteristics of the arc image and thereby improves the accuracy of arc detection. A time series of arc parameter values is obtained after image processing.
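As an illustration of how one frame becomes one sample of the time series, the sketch below shears a fixed rectangle out of an already binarized frame and computes the white-to-black pixel ratio. The helper names `crop` and `arc_parameter`, the list-of-rows image layout, and the rectangle coordinates are hypothetical; the handling of a region with no black pixels is also an assumption.

```python
def crop(image, top, left, height, width):
    """Shear a fixed rectangular region out of a frame (list of pixel rows)."""
    return [row[left:left + width] for row in image[top:top + height]]


def arc_parameter(binary_region):
    """Ratio of white pixels (1) to black pixels (0) in a binary region."""
    white = sum(sum(row) for row in binary_region)
    total = sum(len(row) for row in binary_region)
    black = total - white
    if black == 0:
        return float("inf")   # assumption: an all-white region has no finite ratio
    return white / black


# Toy 4x4 binary frame; the pantograph rectangle is assumed to be rows 1-2, columns 1-2.
frame = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
sample = arc_parameter(crop(frame, top=1, left=1, height=2, width=2))
```

Applying the same rectangle to every frame is reasonable here only because the camera angle and position are fixed.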

Environment Segmentation Algorithm
When the train is running, changes in the surroundings, weather, and light can affect the pantograph video and decrease the accuracy of arc detection. For example, when the train enters a tunnel, the image background quickly becomes dark, and the light from an arc event appears brighter than in the normal case. To overcome the impact of different image backgrounds, an environmental segmentation algorithm is designed that combines the sliding window, the top-down algorithm, and the interclass variance, with the goal of accurately segmenting the images from all the different environments that can occur in the pantograph video. The impact of this algorithm is tested by performing arc clustering for the same time interval under the same environment with and without the environmental segmentation algorithm. The general framework of the environmental segmentation algorithm is depicted in Figure 6.

Sliding Window.
To avoid interference from large fluctuations of single data points when applying the top-down algorithm, the initial data is smoothed by setting a reasonable sliding window value. Adjusting the size of the sliding window changes the degree of smoothness of the discrete data.
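A minimal sketch of the smoothing step follows. The text does not state which statistic is taken over the window, so a centered moving average is assumed here; the function name `smooth` and the truncation of the window at the ends of the series are likewise assumptions.

```python
def smooth(series, window):
    """Centered moving average; the window is truncated at both ends of the series."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


# A single spike is spread over its neighbours and loses most of its height.
smoothed = smooth([1, 1, 10, 1, 1], window=3)
```

A larger window gives a smoother series but also blurs the position of a genuine environment change, which is why the window size needs to be chosen with care.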

Top-Down Algorithm.
As shown in Figure 7, the top-down algorithm is a time series segmentation method in which the starting and ending points are selected first. The algorithm traverses all points between the starting and ending points and finds the point with the maximum vertical distance. If this distance is greater than a predefined threshold, the point is chosen as the next subsection point, and the range of environmental changes is determined.
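The top-down procedure can be sketched as a short recursive function. The sketch assumes that the distance of a point is its vertical distance to the straight line joining the current starting and ending points, which is the usual choice for top-down segmentation; the function names and the recursive splitting of both halves are illustrative assumptions.

```python
def farthest_point(series, start, end):
    """Index strictly between start and end that lies farthest (vertically)
    from the straight line joining series[start] and series[end]."""
    best_idx, best_dist = None, 0.0
    span = end - start
    for i in range(start + 1, end):
        chord = series[start] + (series[end] - series[start]) * (i - start) / span
        dist = abs(series[i] - chord)
        if dist > best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist


def top_down(series, start, end, threshold):
    """Return the sorted split indices whose distance exceeds the threshold."""
    idx, dist = farthest_point(series, start, end)
    if idx is None or dist <= threshold:
        return []
    left = top_down(series, start, idx, threshold)
    right = top_down(series, idx, end, threshold)
    return left + [idx] + right


# Toy series: bright open track, a darker stretch, then open track again.
toy = [5, 5, 5, 5, 1, 1, 1, 1, 5, 5, 5, 5]
splits = top_down(toy, 0, len(toy) - 1, threshold=3.5)
```

With a lower threshold the recursion also reports smaller changes, so the threshold controls how coarse the segmentation is.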

Interclass Variance.
Selecting the optimal segmentation point minimizes the interclass variance value.

Cluster Analysis Algorithm
In this section, a novel cluster analysis algorithm in the area of cluster analysis techniques (CAT) is introduced to quickly and effectively find the arc event. The data points in the time series serve as the initial dataset of CAT, and the value of the arc parameter plays the role of the distance in CAT. The normal points then fall into one category, and the remaining points are abnormal points that may correspond to pantograph-catenary arcs.
The clustering approach adopted here is based on the fact that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from the points with higher densities [17,18]. For each data point $i$, we compute two quantities: its local density $\rho_i$ and its distance $\delta_i$ from the points with higher density. Both quantities depend only on the distances $d_{ij}$ between the data points, which are assumed to satisfy the triangular inequality. The local density $\rho_i$ of data point $i$ is defined as

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c),$$

where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise. $d_c$ is a cutoff distance, and $\rho_i$ is the number of points that are closer than $d_c$ to point $i$. The algorithm is only sensitive to the relative magnitudes of the $\rho_i$ values for different points. This implies that, for large datasets, the results of the analysis are robust with respect to the choice of $d_c$. $\delta_i$ is determined by computing the minimum distance between point $i$ and any other point with a higher density. That is,

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}.$$

For the point with the highest density, we conventionally take $\delta_i = \max_j (d_{ij})$. Note that $\delta_i$ is much larger than the typical nearest neighbor distance only for points that are local or global density maxima. Thus, cluster centers are recognized as the points for which the value of $\delta_i$ is anomalously large.
After the cluster centers have been found, each remaining point is assigned to the same cluster as its nearest neighbor with a higher density. The cluster assignment is performed in a single step, in contrast to other clustering algorithms where an objective function is iteratively optimized.

Pantograph-Catenary Arc Review
After the cluster analysis, the points in the time series are divided into two categories. Most points are in the normal state with no arc, while the others are categorized as abnormal, potentially with an arc. To accurately analyze each arc event, this paper makes use of a simplified arc model to check the arc occurrences.
Analogous to the model of arc current with time, an arc has a rising edge and a falling edge when it occurs. In order of time, the arc parameter gradually rises from a small value to a peak and then falls again. The abnormal points that show this process and order are arc occurrences. This feature of the arc model can be used to identify the pantograph-catenary arc, as shown in Figure 8.
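One way to express this review rule in code is sketched below. It is an interpretation rather than the authors' exact test: consecutive abnormal frames are grouped into runs, each run is extended by one neighbouring frame on each side, and a run is accepted as an arc only if the values rise to an interior peak and then fall. All function names are hypothetical.

```python
def group_runs(indices):
    """Group abnormal frame indices into runs of consecutive frames."""
    runs = []
    for i in sorted(indices):
        if runs and i == runs[-1][-1] + 1:
            runs[-1].append(i)
        else:
            runs.append([i])
    return runs


def has_rise_and_fall(series, run):
    """True if the run, with one neighbour on each side, rises to a peak then falls."""
    lo = max(run[0] - 1, 0)
    hi = min(run[-1] + 1, len(series) - 1)
    window = series[lo:hi + 1]
    peak = window.index(max(window))
    rising = all(a <= b for a, b in zip(window[:peak], window[1:peak + 1]))
    falling = all(a >= b for a, b in zip(window[peak:], window[peak + 1:]))
    return 0 < peak < len(window) - 1 and rising and falling


def review(series, abnormal):
    """Keep only the abnormal runs that look like an arc event."""
    return [run for run in group_runs(abnormal) if has_rise_and_fall(series, run)]


series = [4, 4, 6, 9, 7, 4, 4, 1, 4, 4]
arcs = review(series, abnormal=[2, 3, 4, 7])
```

The abnormally low value at index 7 is rejected because it has no rising edge followed by a falling edge, which matches the observation that low outliers flagged by the clustering step are not arcs.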

Testing and Analysis
To verify the applicability of the proposed method under a complex operating environment, we selected a pantograph video from Chongqing, a region with many mountains and tunnels, as the initial dataset. A pantograph-catenary video from the CRH380D train is used as a case study for arc detection.
Through the process of Figure 3, the video is divided into 674 frame images. Using image processing and the definition of the arc parameter, the images are converted into a time series. As shown in Figure 9, the abscissa is the frame number and the ordinate is the value of the arc parameter, which is a unitless ratio (the same applies to Figures 10, 11, 12, and 13). In this time series, arcs occur at frames 85 to 89, frames 244 to 246, frames 345 to 348, and frames 586 to 591. In addition, the highest peak appears around frames 491 to 496 because of the overexposure of the camera.
As shown in the time series above, the train's environment over this time period can be divided into three sections. When the train was running in the normal environment at the beginning and end of the time series, the arc parameter was high, while lower values were observed in the middle of the time series as the train traveled through a tunnel. The sliding window algorithm is used to smooth the data and to eliminate the interference of single volatile data points with the top-down (TD) algorithm. In Figure 10, the position of the point with the maximum vertical distance is found by the TD algorithm, and the process of environmental change in the time series is divided into sections. This helps to accurately determine the position of the environmental segmentation through the interclass variance. The vertical distance of each point is calculated by processing the time series with the TD algorithm, which finds that the 414th point has the largest vertical distance.
According to the results of the TD algorithm, the initial time series is divided into two parts, each with a single environment change, as shown in Figure 11.
The interclass variance of each point is then calculated for the two parts of the segmented time series. As shown in Figure 12, the 180th and 75th points have the smallest interclass variance.
The time series is then segmented into three parts using environmental segmentation, as shown in Figure 13. Within each part, which represents a single environment, the influence of environmental changes is excluded. In Figure 9, the highest peak, around frames 490 to 498, is due to overexposure of the camera rather than to an arc. The train is in a dark tunnel before frame 491. At the moment the train leaves the tunnel, overexposure of the camera makes the picture too bright. This produces an abnormal image with the highest peak of the arc parameter (frame 493). Compared with the train's running time, the environment change is very short (it usually lasts about 0.3 seconds, or around 10 images). The algorithm removes the portion of the time series recorded during the change, which eliminates interference such as overexposure.
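The removal of the frames recorded during an environment change can be sketched as follows. The function name `split_environments`, the symmetric `margin` around each boundary, and the toy numbers are assumptions for illustration; they are not values reported for the CRH380D video.

```python
def split_environments(series, boundaries, margin):
    """Split a series at the boundary frames and drop `margin` frames on each
    side of every boundary, so transition frames are excluded.

    Returns a list of (first_frame_index, values) pairs, one per environment.
    """
    segments = []
    start = 0
    for b in sorted(boundaries):
        end = b - margin
        if end > start:
            segments.append((start, series[start:end]))
        start = b + margin + 1
    if start < len(series):
        segments.append((start, series[start:]))
    return segments


# Toy example: 20 frames, one environment change at frame 10, 2 frames dropped per side.
parts = split_environments(list(range(20)), boundaries=[10], margin=2)
```

Keeping the first frame index of each segment makes it possible to map abnormal points found inside a segment back to frame numbers in the original video.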
The segmented time series is analyzed using the cluster algorithm, which yields the cluster center, that is, the center of the normal points. Figure 14 shows the decision graph for the first segmented time series analyzed by CAT. The cluster center is the 48th point, with a value of 4.2864 and large values of the local density $\rho$ and the distance $\delta$. The normal points are enclosed by a dashed box in Figure 15, and the rest are abnormal points. The arc occurrences are then detected through the arc checking scheme. The same method is applied to the second and third segments of the time series, as shown in Figure 16.
With this method, pantograph-catenary arc faults can be detected automatically while the train is running. Arc occurrences trigger the train alarm, and the train staff can take appropriate action to ensure the safe operation of the train. If the pantograph-catenary arc faults are serious enough to risk damaging the pantograph slider, the train staff will use a spare pantograph to supply electricity to the train. If small arcs occur frequently, the staff will report the information to the inspection department. Compared with traditional manual monitoring, the method in this paper detects the pantograph-catenary arc quickly and effectively.

Conclusions
The importance of rail transport has increased worldwide in recent years. Power supply units in electrical trains are essential because they provide stable electrical energy to the system. The pantograph-catenary arc directly affects the power supply quality and can lead to the disruption of railway transport. The pantograph-catenary condition can be monitored using the 6C system. However, it is difficult to automatically identify the arc image information in the vast amount of video. The existing methods do not consider the change in the environment surrounding the train, such as the weather, the surroundings, and especially the environment of the operation routes.
This work proposed a novel approach to address the above-mentioned problem. First, the pantograph video is separated into continuous frame-by-frame images through an image processing technique. Second, by defining the ratio between the numbers of white pixels and black pixels as the arc parameter, the frame images are transformed into corresponding parameter values, and the pantograph video is converted into a time series for detecting the pantograph-catenary arc. Third, an environmental segmentation algorithm is developed to eliminate interference. Fourth, the portion of the time series within the same environment is analyzed using the cluster analysis algorithm to obtain the abnormal points that could represent arc events. Finally, the pantograph-catenary arc is detected using the simple arc model to accurately analyze each arc event. The method in this paper can be adapted to different environments and weather conditions without requiring any parameter adjustments. In follow-up work, we will conduct a large-scale trial to refine the algorithm and analyze the damage caused by the detected arcs.

Figure 2: The general framework of the proposed method.

Figure 6: The framework of the environmental segmentation algorithm.

Figure 9: The time series for a video captured under different operating environments.

Figure 10: Processing the time series with the TD algorithm.

Figure 13: Segmented time series after processing with the environmental segmentation algorithm.