Faults Detection for Photovoltaic Field Based on K-Means, Elbow, and Average Silhouette Techniques through the Segmentation of a Thermal Image

Clustering, or grouping, is among the most important image processing methods; it aims to split an image into different groups. Many clustering algorithms have been proposed in the literature, of which K-means is considered among the simplest and most widely used for partitioning an image into regions. In this context, the main objective of this work is to detect and precisely locate the damaged area in photovoltaic (PV) fields by clustering a thermal image with the K-means algorithm. The clustering quality depends on the number of clusters chosen; hence, the elbow method, the average silhouette method, and the NbClust R package are used to find the optimal number K. The simulations carried out show that the K-means algorithm detects the faults in PV panels precisely. The best result is obtained with three clusters, as suggested by the elbow method.


Introduction
Solar energy is considered one of the most important energy sources and has attracted considerable attention worldwide because it provides clean, reliable, and unlimited power. Furthermore, solar energy receives significant investment to develop and improve the productivity of solar panels; this investment was valued at $131.1 billion in 2019 [1]. Solar energy is captured using photovoltaic panels, which are subject to several faults and anomalies that affect the production of PV systems. To this end, several techniques have been proposed in the literature to ensure reliable and efficient PV operation; these techniques fall mainly into two categories: electrical methods and nonelectrical methods.
Electrical methods are mostly based on I-V characteristic analysis [2], power loss analysis (PLA) [3], statistical and signal processing approaches (SSPA) [4], etc. On the other hand, the nonelectrical techniques are based on infrared thermography, which has seen increasing use for monitoring renewable energy systems in recent years [5][6][7]. These techniques use thermal cameras and drones to detect the hottest areas in the PV field, as shown in Figure 1.
In this context, the applications of the machine learning algorithms to detect and locate the damaged area precisely through the segmentation of a thermal image are investigated.
Segmentation, clustering, or grouping of data is an unsupervised machine learning technique that aims to partition data into groups based on the similarity of properties (e.g., color, size, and shape). This technique has been widely studied in a wide range of fields in recent years [8][9][10][11], and several algorithms have been proposed, such as the K-means algorithm (KMA) [12], the fuzzy C-means algorithm (FCM) [13], and the mean shift algorithm (MSA) [14]. Among them, the K-means method is acknowledged as the simplest, fastest, and most popular clustering method. Despite these features, K-means has a disadvantage: the number of clusters K must be chosen well to obtain good results. To address this issue, numerous methods have been proposed, such as the elbow method (EM) [15], the average silhouette method (ASM) [16], and the gap statistic method (GSM) [17].
In this paper, the application of the K-means algorithm to detecting damaged solar panels in the PV field is investigated. More precisely, the K-means algorithm is employed to cluster the thermal image into regions and identify the damaged area. Furthermore, the elbow and average silhouette methods are used to select the optimal number of clusters by running the system with various values of K.
The remainder of this work is structured as follows: Section 2 provides the clustering techniques and K-means algorithm. Elbow, average silhouette, and NbClust methods are explained in Section 3. Simulation results and discussion are presented in Section 4, whereas Section 5 summarizes the main conclusions of the present work.

Clustering Techniques and K-Means Algorithm
2.1. Clustering Techniques. Clustering, or grouping, is one of the most interesting parts of unsupervised machine learning; it has been extensively employed in image processing and has many applications, such as image retrieval and annotation [18]. This technique aims to partition data into regions (clusters) depending on the similarity of their characteristics. As shown in Figure 2, data clustering can be divided into two types: hierarchical [19,20] and partitional [21] clustering. Partitional clustering is the most straightforward technique for grouping data; it provides nonoverlapping clusters, where each object belongs to exactly one group, whereas hierarchical clustering permits clusters to have subclusters, so each object can belong to several groups. Figure 3 shows the clustering algorithms in each category: partitional clustering includes, for instance, the K-means [12] and fuzzy c-means [13] algorithms, whereas hierarchical clustering is split into two types of algorithms, agglomerative and divisive [22].
Image segmentation is an application of clustering and a key step in image processing; it aims to divide an image into different segments based on their intensity. In this context, this paper focuses on using image processing to detect faults in the PV system. In other words, an unsupervised machine learning algorithm is applied to locate the damage in solar panels.
2.2. The K-Means Algorithm. K-means is the simplest unsupervised machine learning algorithm used for clustering; it was proposed by MacQueen in 1967 [23]. The algorithm is based on two main steps: the first is to define the k centroids, and the second is to assign each point to the nearest centroid. The flowchart of the K-means algorithm is shown in Figure 4.
In this paper, the K-means algorithm is used as a clustering technique to detect and precisely locate the faulty areas in a thermal image of the solar field. In other words, the thermal image is partitioned into K groups, and each group contains pixels with similar properties (intensity). The procedure for clustering an image with the K-means algorithm includes the following steps, as noted in [24]. Consider an image with a resolution of $(x, y)$, where $p(x, y)$ is the input pixel and $C_k$ are the cluster centers.
Step 1. Initialize the number of clusters k and the cluster centers.
Step 2. Calculate the distance d between each center and each pixel of the image using the following equation:
$$d = \lVert p(x, y) - C_k \rVert$$
Step 3. Assign every pixel to the nearest center using the calculated distance.
Step 4. Calculate the new position of each center using the following equation:
$$C_k = \frac{1}{n_k} \sum_{p(x, y) \in \text{cluster } k} p(x, y)$$
where $n_k$ is the number of pixels currently assigned to cluster k.
Step 5. Repeat steps 2, 3, and 4 until the centers $C_k$ no longer move.
Step 6. Reconstruct the image by reshaping the pixels of the clusters.
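The six steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the implementation used for the simulations: the function names (`sq_dist`, `kmeans`, `reconstruct`), the random choice of initial centers, the fixed seed, and the iteration cap are all hypothetical choices made here. In practice, a library routine would normally be preferred for full-size images.

```python
import random

def sq_dist(p, c):
    """Squared Euclidean distance between a pixel p and a center c."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def kmeans(pixels, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups (Steps 1 to 5)."""
    rng = random.Random(seed)
    # Step 1: take k pixels at distinct positions as initial centers
    # (an assumption; the text does not fix the initialization rule).
    centers = rng.sample(pixels, k)
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Steps 2-3: attach every pixel to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in pixels]
        # Step 4: move each center to the mean of its pixels.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep center
        # Step 5: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def reconstruct(labels, centers):
    """Step 6: replace every pixel by the center of its cluster."""
    return [centers[lab] for lab in labels]
```

Calling `kmeans(pixels, k=3)` on the flattened pixel list and then reshaping the output of `reconstruct` back to the image dimensions gives the segmented image; the pixels sharing the label of the hottest center would be the candidate fault region.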

Optimal Number of Clusters
3.1. Elbow Method. Many research works in the literature have proposed methods to determine the optimal number of clusters K. The elbow method is considered among the best of these techniques. It is based on the squared distance between the centroid of each cluster and the sample points of that cluster. The sum of squared errors (SSE) is the performance indicator, which is calculated for each value of K using the following equation [15]:
$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x \in \text{cluster } k} \lVert x - C_k \rVert^2$$
where x is a data point in cluster k and $C_k$ is the center of the kth cluster.
The optimal value of K is the point at which the SSE curve stops dropping sharply and bends, forming the "elbow"; beyond this point, adding more clusters reduces the SSE only marginally.
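As a rough illustration, the SSE of a given clustering can be computed as below. The helper `elbow_k` is a hypothetical heuristic added here, not part of the method as described: it picks the K with the largest second difference of the SSE curve, which is one simple way to approximate what is otherwise read off the plot by eye. It assumes the SSE has been evaluated for consecutive values of K.

```python
def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )

def elbow_k(sse_by_k):
    """Pick the K where the SSE curve bends most (largest second difference).

    sse_by_k maps consecutive values of K to their SSE. This heuristic is an
    assumption of this sketch; the elbow is usually identified visually.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

The first and last values of K can never be selected by `elbow_k`, since a bend needs a neighbor on each side; the range of K tried should therefore extend past the expected elbow.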
3.2. Average Silhouette Method. The average silhouette method, as described by Liu and Sarkar [25], has the same purpose as the elbow method. It is based on computing the average silhouette of the observations for different values of k. In other words, it measures the difference between the distance from an object to the other objects in its own cluster and its distance to the objects in other clusters. For object i in a cluster $C_i$, the silhouette width (SW) is defined using the following equation [16]:
$$\mathrm{SW}(i) = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where
$$b_i = \min_{C_j \neq C_i} \frac{1}{|C_j|} \sum_{j \in C_j} d(i, j), \qquad a_i = \frac{1}{|C_i|} \sum_{j \in C_i} d(i, j).$$
The SW is a performance indicator varying from -1 to +1, and the optimal number of clusters corresponds to its highest value.
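A minimal sketch of this computation is given below, following the definitions of a_i and b_i as stated above, with the Euclidean distance assumed for d(i, j). The function name `average_silhouette` is hypothetical. Note that many common implementations divide a_i by |C_i| - 1 so that the object itself is excluded, which gives slightly different values for small clusters.

```python
import math

def average_silhouette(points, labels):
    """Average silhouette width of a clustering (Euclidean distance assumed)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette width needs at least two clusters")
    widths = []
    for p, lab in zip(points, labels):
        # Mean distance from p to the members of every cluster.
        mean_dist = {c: sum(math.dist(p, q) for q in members) / len(members)
                     for c, members in clusters.items()}
        a = mean_dist[lab]                                     # own cluster
        b = min(d for c, d in mean_dist.items() if c != lab)   # nearest other
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)
```

Evaluating this function on the K-means labels obtained for each candidate K and keeping the K with the highest average reproduces the selection rule described above.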

International Journal of Photoenergy
The flowcharts of the elbow method and the average silhouette method are illustrated in Figure 5.

3.3. NbClust Package. To ensure the clustering quality, it is crucial to select the number of clusters suggested by most of the methods cited in the literature. NbClust, a function of the NbClust R package [26], has been developed for this purpose; it selects the optimal number of clusters in a dataset by varying all combinations of the number of clusters, distance measures, and clustering methods. In other words, NbClust is based on thirteen indices, as presented in [26], and it reports, for each candidate number of clusters, how many indices suggest it. The optimal number is then chosen according to the majority rule.
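The majority rule itself is simple to express. The sketch below is written in Python rather than R and does not call NbClust; the function name `majority_rule` and the index names and votes in the usage note are purely hypothetical placeholders, not results from the study.

```python
from collections import Counter

def majority_rule(votes):
    """Return (best_k, n_votes) for the K suggested by the most indices.

    votes maps an index name to the number of clusters it suggests. On a tie,
    the K that was encountered first among the tied values is returned.
    """
    counts = Counter(votes.values())
    best_k, n_votes = counts.most_common(1)[0]
    return best_k, n_votes
```

For example, with the made-up votes `{"index_a": 3, "index_b": 2, "index_c": 3, "index_d": 4, "index_e": 3}`, three of the five indices suggest K = 3, so that value is selected.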

Results and Discussion
In this section, we present the results obtained using the K-means algorithm to detect and locate the faults in PV panels. Before applying this algorithm, the optimal number of clusters K has to be determined; hence, the elbow and average silhouette methods are used.
All implementations are carried out on the thermal image presented in Figure 6, which was obtained, with permission, from an online website [27]. The simulation was performed using Python 3.7.0 on a machine with an 8th-generation i5 processor at 1.8 GHz and 8 GB of RAM (Windows 10, 64 bit).
Before implementing the methods, the image must be converted to a set of vectors in a 3-D color space; to this end, the input image is converted from the RGB (red, green, and blue) color space to HSV (hue, saturation, and value). The L×H×3 image is then transformed into a K×3 matrix with K = L×H, as shown in Figure 7; here K denotes the number of pixels, not the number of clusters.
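This preprocessing step can be sketched with the Python standard library alone. The representation of the image as nested lists of 8-bit (R, G, B) tuples and the function name `image_to_hsv_matrix` are assumptions made for illustration; with an array library the same step would typically be a color conversion followed by a reshape.

```python
import colorsys

def image_to_hsv_matrix(image):
    """Flatten an L x H image of (R, G, B) tuples (0-255) into L*H HSV rows.

    Each output row is (hue, saturation, value), all scaled to [0, 1].
    """
    return [
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        for row in image          # L rows
        for (r, g, b) in row      # H pixels per row
    ]
```

The resulting list has one row per pixel, in row-major order, so the cluster labels computed on it can be reshaped back to L rows of H labels to display the segmented image.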
The results of applying the elbow and average silhouette methods are presented in Figure 8. For the elbow method, it is clear that the elbow point occurs at K = 3; hence, this is the optimal value suggested. On the other hand, according to the silhouette width curve, the highest value corresponds to K = 2, which is the optimal number of clusters proposed by the average silhouette method.
In addition, the NbClust function [28] was also used to find the optimal number of clusters based on 26 indexes, as shown in Figure 9. According to the majority rule, the best number of clusters is 3. The output results are divided into segments depending on the number of clusters. Figure 11 clearly presents excellent results that allow the damaged areas, which appear in segment 3, to be detected and identified. Hence, the elbow method gives a better number of clusters than the average silhouette method.
Four results are presented in Figure 12 in order to validate the effectiveness of the K-means algorithm for detecting the faults in PV panels. As can be seen from these results, the K-means algorithm has successfully detected all the faults present in the four thermal images of PV panels.
Finally, the K-means algorithm with the optimal number of clusters has demonstrated excellent performance and can be integrated into a drone-based thermal system to precisely identify the damaged solar panels in the PV field.

Conclusion
Energy demand has increased rapidly, which requires the improvement of all energy sources, especially solar energy, which provides clean, reliable, and unlimited energy. Accordingly, many studies have addressed the diagnosis and monitoring of PV panel production; among the proposed solutions is the thermal drone system, which allows vast farms of PV panels to be monitored.
In this paper, the application of the K-means algorithm is investigated to cluster a thermal image of PV panels and automatically detect the damaged areas. The quality of clustering depends on selecting an optimal number of clusters; hence, the elbow and the average silhouette methods were used.
Together, the K-means algorithm and the elbow method provide excellent results for precisely detecting the faulty panels. Therefore, this algorithm can be integrated into the thermal drone system to find the faults in the PV system easily and in real time, especially for large PV fields.

Data Availability
The data used to support the findings of this study have not been made available because they are confidential.

Conflicts of Interest
The authors declare that they have no competing interests.

Authors' Contributions
A. Et-taleby carried out the simulations. The results were discussed and evaluated by M. Boussetta and M. Benslimane. A. Et-taleby wrote the first version of the paper, M. Boussetta corrected and modified it afterwards, and, after the whole team had reviewed the article, M. Boussetta submitted it to the journal. All authors of this research paper have directly participated in the planning, execution, or analysis of this study. All authors read and approved the final manuscript.