Fault-Level Grading of Photovoltaic Cells Employing Lightweight Deep Learning Models

The deployment of photovoltaic (PV) cells as a renewable energy resource has grown rapidly in recent years, increasing the need for automatic and fast fault detection systems for PV cells. Before a cell is isolated for repair or replacement, it is critical to judge the severity of the fault that occurred in the PV cell. The aim of this study is the fault-level grading of PV cells using deep neural network models. The experiments are carried out on a publicly available dataset of 2,624 electroluminescence (EL) images of PV cells, labeled with four distinct defect probabilities that define the defect levels. Deep architectures of classical artificial neural networks are developed using hand-crafted texture features extracted from the EL image data. Moreover, optimized convolutional neural network (CNN) architectures are developed with a specific emphasis on lightweight models for real-time processing. Experiments are performed for two binary classification tasks and a multiclass classification task. For the first binary task, the proposed CNN model outperformed the state-of-the-art solution by 1.3% in accuracy with 50% lower computational complexity. In the second binary task, the CPU-based proposed model outperformed the GPU-based solution by 0.9% in accuracy with an 8× lighter architecture. Finally, multiclass categorization of PV cells is performed, achieving state-of-the-art results with 83.5% accuracy. The proposed models offer a lightweight, efficient, and computationally cheaper CPU-based solution for real-time fault-level categorization of PV cells.


Introduction
With the beginning of the 21st century, policies promoting energy efficiency increased public interest in renewable energy, especially solar energy, since it is noiseless and pollution-free. This interest opened the gates for research on achieving optimal performance of solar energy systems. Energy systems, in general, are of two types: active systems and passive systems [1]. Passive systems do not consume energy; rather, they convert energy from one form to another. Photovoltaic (PV) systems are purely passive systems, as electrical energy is generated directly from semiconductors by the photovoltaic effect. With the passage of time, different kinds of faults may occur that decrease PV cell efficiency, such as hotspot faults, diode faults, junction box faults, ground faults, arc faults, and line-line faults [2]. Such faults may arise due to glass breakage, oxidization, delamination of cells, and bubbling [2]. Apart from reducing efficiency, there is a risk of fire as well. For the detection and diagnosis of such faults, several methods are in practice, including signal processing, statistical approaches, current-voltage curve analysis, power loss analysis, and machine learning-based techniques [3]. For fault detection, a comparison between a reference and an observed measurement is made. To maintain the PV system's efficiency, faults must first be detected, followed by isolation or maintenance of the faulty cell. Several techniques for detecting faults such as microcracks in PV systems can be found in the literature [4]. One of them is Laser Beam Induced Current (LBIC), an optical way to observe microcracks in PV cells [5,6]. An AC laser beam with a wavelength ranging from 638 nm to 850 nm is produced by modulating the electric current through the laser diode and is directed at the photosensitive device. This causes direct current (DC) to flow through the semiconductor.
A large current variation when the PV cell position is changed indicates the presence of a defect [7-9]. Another technique, electron beam induced current (EBIC), is a semiconductor analysis technique in which a current is induced in the sample and used as a triggering signal for image generation. The image highlights the local defects of the PV cell. Most electron beam techniques are performed using a scanning electron microscope (SEM) [10]. Photoluminescence (PL) imaging is another method, in which electrons are excited into the conduction band after a photon is absorbed, causing recombination of electron-hole pairs. The image is captured with a CCD camera. The electroluminescence (EL) imaging method is also used for detecting microcracks in wafers and solar cells by employing luminescence imaging. EL is a form of luminescence in which electrons are excited into the conduction band when an electric current is passed through the cell in forward bias mode. The excitation of electrons emits infrared radiation at wavelengths ranging from 950 nm to 1250 nm. The image of the solar cell is then captured with charge-coupled device (CCD) cameras [10]. A defective or disconnected PV cell appears darker [11], so the defect can be conveniently located through visualization. The EL imaging phenomenon is shown in Figure 1. The difference between EL and PL is that the PL technique involves the excitation of electrons using laser light instead of electric current [12]. The EL imaging technique works for finished PV cells, whereas the PL imaging technique is used for both wafers and solar cells.
Convolutional neural networks (CNNs) have been successfully employed for pattern recognition tasks. However, only a few studies in the literature involve CNNs for EL image data classification, particularly defect-level classification. Most of them focused on the binary classification problem, where GPU-based, computationally expensive solutions were proposed. In fact, merely a couple of studies proposed CPU-based, cheaper, real-time solutions. In this paper, we aim to develop lightweight CNN-based models suitable for CPU machines for real-time processing. The contributions of this paper are summarized as follows:
(i) The deep learning capability of a fully connected neural network is tested with hand-crafted features for fault-level classification of PV cells. Data augmentation is performed to balance the class representation, and both preaugmentation and postaugmentation results are presented.
(ii) Two-way binary classification is performed by segregating the data in two separate ways; hence, two independent binary classification results are presented.
(iii) A customized, simpler, and computationally efficient CNN architecture is developed, ensuring real-time classification of EL image data. The proposed CNN model has 50% fewer parameters than the state-of-the-art CNN-based solution.
(iv) The proposed CNN model achieved state-of-the-art results with a 23× shorter training time. Moreover, the state-of-the-art result for multiclass classification of the EL image data is presented.
The rest of the paper is organized as follows: Section 2 presents the related work, the details of the dataset are provided in Section 3, the methodology is explained in Section 4, the performance evaluation strategy and metrics are presented in Section 5, Section 6 includes results and discussion, Section 7 presents a detailed comparative analysis, and finally, the conclusion is given in Section 8.

Related Work
One of the basic approaches for fault detection is the comparison of the observed output power with a reference power; a difference higher than a defined threshold indicates the presence of a fault [13,14]. A study presented the application of a Kalman filter for the prediction of power output [15]. The noisy measurements were taken as input and given to the underlying physical model, which produced the output value with the highest probability. The output was then used to locate faults within the measurements of voltage, current, and power. Artificial intelligence (AI)-based techniques have also been used for fault detection. Bayesian and fuzzy logic algorithms were tested to determine the PV cell output [16]. Both supervised and unsupervised machine learning techniques have been employed for PV system fault diagnosis, such as k-nearest neighbor, decision tree, and support vector machine (SVM) [17]. In one study, an artificial neural network (ANN) model was developed to locate short circuits (SC) in PV cells [18]. In another study, a Bayesian network (BN) was used to describe the causes of the detected faults [19]. Another work claimed an error rate of 0.35-0.55 by combining two approaches, SVM and k-nearest neighbor, for the detection of PV faults [20]. Mohamed and Nassar presented an ANN-based solution for the diagnosis and repair of PV systems [21]. The abovementioned studies involved machine learning techniques for fault detection and diagnosis. However, the features used in those studies were measured readings of current, voltage, and power, which involve tedious manual inspection and recording. Considering the physical inspection scenario, PV systems are in most cases placed at elevated locations, making it difficult to locate the fault and isolate the faulty cell [22]. An unmanned aerial vehicle (UAV) equipped with a thermal camera may be a solution for fault detection in such a scenario [23].
Images captured with a thermal camera reveal the location of faults in solar cells very conveniently. Recent literature shows the usage of thermography for the detection as well as classification of PV faults; however, it has its own limitations [24-26]. Infrared (IR) imaging has also been used for fault detection in solar cells [9,27]. However, it is challenging to find the exact location of a fault and identify microcracks in infrared images due to their relatively low resolution. The authors concluded that a hot region in IR images may result in a false positive. EL imaging, as mentioned earlier, is a technique to identify faults in PV cells, which involves capturing the infrared energy emitted from the cell in the form of a gray-scale image. The resultant EL image provides better resolution than an IR image [28,29]. Various studies have been carried out for fault detection in PV cells, but few considered EL imaging. One study presented a Fourier image reconstruction technique for fault detection in EL images [30]. However, the authors considered limited defects, including finger interruptions, small cracks, and breaks; moreover, it was a complex detection method due to its shape assumptions. Another study [31] used independent component analysis (ICA) for defect detection; however, finger interruptions and cracks were treated as equal defects. In another study, Stromer presented a vesselness algorithm for crack segmentation; however, only cracks larger than 20 mm in size were considered [32]. Recently, deep learning and convolutional neural networks (CNNs) have been employed for defect detection in PV modules. In one study [33], automatic inspection of PV modules was presented using deep learning; however, only visible defects were considered. Similarly, a multispectral CNN was proposed for visible fault detection [34].
A recent study presented fault classification in electroluminescence images with the categories defect-free, microcrack, break, and finger interruption [35]. The dataset was collected from a private company as well as from the public domain. A generative adversarial network (GAN) was used for data augmentation, and pretrained CNN models, originally trained for the ImageNet challenge [36], were used for defect classification. Another study used a CNN model for defect categorization in PV cells [37]; however, IR image data was utilized for this purpose.
Apart from defect detection and categorization, studies have recently been conducted on detecting the level of the defect. One study presented a deep learning approach for defect-level classification in PV cells using EL images [38]. For this purpose, a public dataset of EL images labeled with four distinct defect levels as classes was used [39]. The author used the VGG-19 pretrained CNN model and tuned it via transfer learning. The model produced 88.4% accuracy for binary classification; however, the experiment was performed on a graphics processing unit (GPU)-based machine, making it a computationally expensive solution. The authors also performed SVM-based classification for real-time processing and achieved 82.4% accuracy. Another recent study proposed a light CNN architecture for defect-level binary classification of the same EL image data [40]. The authors considered VGG-11 as the initial CNN architecture and further simplified it to obtain an optimal and light architecture for the classification task. The authors claimed 93.02% classification accuracy; however, it was not mentioned how the original four target classes were reduced to two classes to perform binary classification. Data augmentation was also performed, but there is no mention of the total number of samples, nor evidence of balancing the classes after performing the augmentation.
Concretely, the analysis and classification of defects, as well as real-time fault-level classification in PV images, still demand further research in many aspects: increasing data volume via efficient data augmentation methods; estimating simpler and optimized machine learning algorithms to enhance robustness; and estimating lighter networks for real-time processing. In addition, although the publicly available EL image data [39] has four labeled categories, there is no evidence of multiclass classification in the literature. Therefore, there is a need to develop a system for multilevel categorization as well.

The Data Set
For this research, a publicly available dataset of EL images is used [39]. It consists of 2,624 image samples of healthy as well as faulty PV cells. Each sample is an 8-bit gray-scale image with a resolution of 300 × 300 pixels. These image samples were originally extracted at cell level from mono- and polycrystalline PV modules and were normalized with respect to perspective and size. The details of the dataset are summarized in Table 1.
The original images were initially analyzed by human experts based on the working condition of the cell, and labels were assigned in terms of defect probability. There are four distinct classes, labeled with defect probabilities 0.0, 0.33, 0.66, and 1.0, where defect probability 0.0 represents a fully healthy cell, 0.33 characterizes a less faulty cell, 0.66 represents a medium faulty cell, and 1.0 denotes a full faulty cell. Concretely, the defect probability represents the defect level of the PV cell.

Computational Intelligence and Neuroscience
A few samples from the dataset belonging to the individual categories are shown in Figure 2. There are different types of labeled defects, including microcracks, material defects, finger interruptions, and fractured interconnects, which affect PV cell efficiency. Although the EL image dataset consists of image samples with four class labels, the number of images belonging to each class is not the same. The maximum number of samples belongs to the healthy class (defect probability 0.0), whereas the least represented is the class with defect probability 0.66. The count of original classwise samples in the dataset is given in Table 2.

Artificial Neural Network Model.
Among the data-driven approaches for pattern recognition and classification applications, ANNs have been successfully used over the last couple of decades. Customized architectures of feed-forward neural networks can be estimated to accommodate the complex nature of the input-output relationship of the data. In this work, deep ANN architectures are employed. For the classification of EL image data, the ANN architectures are estimated starting with a single hidden layer and then extended to multiple hidden layers until optimized. The size of each hidden layer is also estimated in the process by observing the cross-validation error. The Levenberg-Marquardt (LM) algorithm is used for training the network, with zero mean square error (MSE) as the convergence criterion. The description of the final estimated architectures is given in the results section.
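The layer-by-layer estimation described above can be illustrated with a small greedy search loop. This is only a sketch of the growth strategy, not the authors' code: the candidate widths, the depth limit, and the `evaluate` callback (which in the actual experiments would wrap LM training and cross-validation) are all assumptions made for illustration.

```python
def grow_architecture(evaluate, layer_sizes=(10, 20, 30), max_depth=5):
    """Greedy depth/width search: extend the hidden-layer list one layer
    at a time, keeping the best width, until the cross-validation error
    (returned by `evaluate(hidden_layers)`, lower is better) stops improving."""
    best_arch, best_err = [], evaluate([])
    while len(best_arch) < max_depth:
        candidates = [best_arch + [w] for w in layer_sizes]
        errs = [evaluate(a) for a in candidates]
        k = min(range(len(errs)), key=errs.__getitem__)
        if errs[k] >= best_err:          # no further improvement: stop growing
            break
        best_arch, best_err = candidates[k], errs[k]
    return best_arch, best_err
```

With a real `evaluate`, the returned architecture would correspond to configurations such as the [30-30-20-20-10] network reported later in the results.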

Hand-Crafted Features.
The ANN must be fed with features. Several kinds of features may be considered, including polynomial features [41-43]; however, we opted for two popular and widely used kinds of hand-crafted features: gray-level co-occurrence matrix (GLCM) features and local binary pattern (LBP) features.

Gray-Level Co-Occurrence Matrix.
A GLCM represents the spatially joint probabilities of pixel intensities in an image. The features computed from the GLCM, originally proposed by Haralick et al. [44], are classic yet effective and provide a texture analysis of the image. The following 22 features are computed from each of four GLCMs computed at angles of 0, 45, 90, and 135 degrees [45]: autocorrelation, contrast, correlation 1, correlation 2, cluster prominence, cluster shade, dissimilarity, energy, entropy, homogeneity, maximum probability, sum of squares, sum average, sum variance, sum entropy, difference variance, difference entropy, information measure of correlation 1, information measure of correlation 2, inverse difference, inverse difference normalized, and inverse difference moment normalized. A total of 88 features were thus extracted from each image.
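As an illustration of how such features are obtained, the sketch below (in Python, whereas the experiments used MATLAB) builds a co-occurrence matrix for each of the four angles and computes a small subset of the 22 features (contrast, energy, homogeneity). It is a minimal example on a tiny quantized image, not the authors' implementation.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Co-occurrence counts for pixel pairs at offset (dx, dy),
    normalized to joint probabilities."""
    g = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                g[img[y, x], img[y2, x2]] += 1
    return g / g.sum()

def haralick_subset(p):
    """Three of the 22 Haralick-style texture features used in the paper."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity

# Offsets (dx, dy) for angles 0, 45, 90, 135 degrees at distance 1
# (x grows rightwards, y grows downwards, so "up" is dy = -1).
offsets = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=int)

features = []
for angle, (dx, dy) in offsets.items():
    features.extend(haralick_subset(glcm(img, dx, dy, levels=4)))
# Computing all 22 features per angle would give the paper's 88-D vector.
```

In practice the 300 × 300 EL images would first be quantized to a fixed number of gray levels before the co-occurrence counts are accumulated.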

Local Binary Patterns.
Local binary patterns (LBP) have also been widely used as features in pattern recognition and computer vision applications. The simplest LBP feature vector is generated as per the following steps: (i) divide the window to be examined into cells (9 × 9 pixels per cell in this work); (ii) compare each pixel with its eight neighbors, writing 1 where the neighbor's value is not below the center value and 0 otherwise, which yields an 8-bit code per pixel; and (iii) accumulate the codes into a histogram. From each image, a 59-dimensional LBP feature vector was computed, corresponding to the 58 uniform patterns plus a single bin for all nonuniform patterns.
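A minimal sketch of the coding and histogram steps is shown below (in Python; the experiments themselves used MATLAB) for the standard 8-neighbour, radius-1 operator that yields a 59-bin uniform histogram. The authors' exact cell configuration is not reproduced here.

```python
import numpy as np

def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch: neighbours thresholded at the centre."""
    c = patch[1, 1]
    nbrs = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
            patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(1 << k for k, v in enumerate(nbrs) if v >= c)

def is_uniform(code):
    """A pattern is 'uniform' if its circular bit string has <= 2 transitions."""
    bits = [(code >> k) & 1 for k in range(8)]
    return sum(bits[k] != bits[(k + 1) % 8] for k in range(8)) <= 2

# 58 uniform codes plus one bin for all non-uniform codes -> 59-D histogram
uniform_codes = [c for c in range(256) if is_uniform(c)]
bin_of = {c: k for k, c in enumerate(uniform_codes)}

def lbp_histogram(img):
    hist = np.zeros(59)
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            code = lbp_code(img[y - 1:y + 2, x - 1:x + 2])
            hist[bin_of.get(code, 58)] += 1   # last bin = non-uniform
    return hist / hist.sum()
```

Applied per cell and concatenated (or, as here, over the whole image), this produces the 59-dimensional descriptor fed to the ANN.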

Customized Convolutional Neural Network.
CNNs have been used successfully in the recent past for several applications, from simple visual recognition tasks up to autonomous driving systems. A CNN consists of convolutional layers, pooling layers, activation functions, and fully connected (FC) layers. The convolutional layer plays a vital role, since it extracts features from the images. The first convolutional layer is connected to the raw pixels and extracts low-level features such as edges, the next layer captures medium-level information, and the subsequent layers extract high-level features. The pooling layer is employed to reduce the size of the learned features by discarding less important information. The FC layer is similar to the one in an ANN, where each neuron from the previous layer is connected to every neuron in the current layer. The number of neurons in the output layer is kept equal to the number of output labels. By customized, we mean that the network is built from scratch: the architecture is selected by increasing its size and recording the classification results on the training data.

It can be observed from Table 3 that the representation of the full faulty class (defect probability 1.0) is almost 50% of that of the healthy class (defect probability 0.0). It can also be seen that the less faulty class (defect probability 0.33) has a significantly small representation, and the least represented of all is the medium faulty class (defect probability 0.66). Since machine learning techniques are data-driven approaches, for any dataset the classifier is inherently biased towards the target class with the largest number of samples. Therefore, to ensure unbiased learning of the classifier, it is important to balance the target classes before model training. This is normally done using data augmentation when the acquisition of new data is not easy.
It is also extremely important that augmentation be applied to the training data only, with the test data separated beforehand. In other words, the classifier may be trained with training data containing augmented samples to balance the target classes; however, the test data should be original, without any augmented samples. Initially, the original data is randomized and divided into training, validation, and testing sets at 70%, 15%, and 15%, respectively. At this stage, the validation and test sets each have 226, 45, 16, and 107 images for the healthy, less faulty, medium faulty, and full faulty classes, respectively; these sets were separated for later use. Next, the training data is augmented to balance the classes so that unbiased training of the classifier can be ensured. For augmentation, affine transformations are performed, including horizontal and vertical translation by ±10 pixels, image rotation by ±90°, horizontal and vertical flips, and an intensity transformation with a variation of ±5% of the original pixel intensity. The numbers of samples in the training data before augmentation belonging to the healthy, less faulty, medium faulty, and full faulty classes were 1056, 205, 74, and 501, respectively. The postaugmentation training data size is limited to 6,000 samples in total, with 1,500 samples per category. The augmented samples have an equal distribution of the four types of transformations. Since the class representation is kept equal after augmentation, the postaugmented training data has the maximum number of augmented samples (1500 − 74 = 1426) for the medium faulty class.
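The split-then-augment protocol above can be sketched as follows. This is an illustration of the protocol, not the authors' code: `augment` is a placeholder for the affine and intensity transforms, and the per-class 70/15/15 split is an assumption about how the stratification was done.

```python
import random

def augment(img, rng):
    """Placeholder for the paper's transforms (±10 px shifts, ±90°
    rotations, flips, ±5% intensity changes)."""
    return img

def split_and_balance(samples, target_per_class=1500, seed=0):
    """Shuffle and split each class 70/15/15, then augment ONLY the
    training portion until every class reaches target_per_class samples."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    by_class = {}
    for img, label in samples:
        by_class.setdefault(label, []).append((img, label))
    for label, items in by_class.items():
        rng.shuffle(items)
        n_tr = int(0.70 * len(items))
        n_val = int(0.15 * len(items))
        train += items[:n_tr]
        val += items[n_tr:n_tr + n_val]
        test += items[n_tr + n_val:]          # test stays augmentation-free
    counts = {}
    for _, label in train:
        counts[label] = counts.get(label, 0) + 1
    for label, items in by_class.items():
        pool = items[:int(0.70 * len(items))]  # draw only from training data
        while counts[label] < target_per_class:
            img, _ = rng.choice(pool)
            train.append((augment(img, rng), label))
            counts[label] += 1
    return train, val, test
```

With class totals matching the preaugmentation counts reported above, this yields a balanced 6,000-sample training set while the validation and test sets contain only original images.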

Evaluation Strategy and Metrics
As mentioned previously in the data augmentation section, the data is divided into training, validation, and test sets at 70%, 15%, and 15%, respectively. Three classification tasks are performed:
(i) Binary Classification. In this task, the data of only two classes are considered: the healthy class and the full faulty class. The data of the remaining two classes are not used in this case.
(ii) Binary Classification with 0.5 as Threshold. In this task, the data of all classes are used but represented with two labels only: healthy and full faulty. For this purpose, samples of the healthy class and the less faulty class are combined and labeled as the healthy class, and samples of the medium faulty and full faulty classes are combined and labeled as the full faulty class. In other words, a defect probability of 0.5 is used as the threshold to convert four classes into two categories, which is why this task is defined as binary classification with 0.5 as threshold.
(iii) Multiclass Classification. In this task, the data are classified as per their original class labels: healthy, less faulty, medium faulty, or full faulty. Concretely, all four classes are considered for multiclass classification.
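The three label mappings above can be summarized in a few lines. This sketch makes explicit how each task derives its targets from the four defect probabilities; the function and label names are our own, not from the paper.

```python
def relabel(defect_prob, task):
    """Map a sample's defect probability onto the label used by each task."""
    if task == "binary":               # only 0.0 and 1.0 samples are kept
        if defect_prob in (0.0, 1.0):
            return "healthy" if defect_prob == 0.0 else "full faulty"
        return None                    # 0.33 / 0.66 samples are discarded
    if task == "binary_0.5":           # all samples, thresholded at 0.5
        return "healthy" if defect_prob < 0.5 else "full faulty"
    if task == "multiclass":           # original four defect levels
        return {0.0: "healthy", 0.33: "less faulty",
                0.66: "medium faulty", 1.0: "full faulty"}[defect_prob]
    raise ValueError(task)
```

Note that in the first task a `None` return means the sample is simply excluded from training and testing.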
The block diagram of the overall methodology is shown in Figure 3. For each of the classification tasks, the estimated optimal network architecture, the choice of hyperparameters, and the corresponding results are discussed in the following section. As described earlier, the results are calculated separately on the preaugmented (original) data as well as the postaugmented data.

Hardware Details.
The experiments are performed on a laptop with the following hardware specifications: an Intel Core i3 CPU with a 2.4 GHz clock speed and 2 GB of RAM. The software used was MATLAB 2018b in a Windows 10 environment.
For the evaluation of results, confusion matrices with true positives, false positives, true negatives, and false negatives are shown. The classwise accuracies and the overall classification accuracy are also presented. Moreover, receiver operating characteristic (ROC) curves along with the area under the curve (AUC) are presented.
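A minimal version of this evaluation bookkeeping (confusion matrix plus overall and classwise accuracy, where classwise accuracy is the per-class recall along each matrix row) can be sketched as follows; the ROC/AUC computation is omitted for brevity.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows: true labels; columns: predicted labels."""
    idx = {l: k for k, l in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def accuracies(m):
    """Overall accuracy and per-class accuracy (diagonal / row sum)."""
    total = sum(sum(row) for row in m)
    overall = sum(m[k][k] for k in range(len(m))) / total
    per_class = [m[k][k] / sum(m[k]) for k in range(len(m))]
    return overall, per_class
```

The same two functions cover both the binary tasks (2 × 2 matrices) and the multiclass task (4 × 4 matrix).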

Results and Discussion
6.1. Deep Feed-Forward Neural Network Results. In this section, the test data results for binary as well as multiclass classification using deep architectures of feed-forward artificial neural networks are presented. Hand-crafted features, i.e., GLCM and LBP, are used for data representation and are therefore fed to the ANN architectures as features. It is worth mentioning that for each task, a number of network architectures were tested; however, only the architectures of the best models are described in the following subsections.

Binary Classification Using Original Data.
For binary classification with the original (preaugmentation) data employing GLCM and LBP features, the network architecture is optimized by varying the number of hidden layers as well as their sizes. The input layer is fed with the features, while the output layer has two neurons. For the GLCM features, the final optimized network architecture has five hidden layers with [30-30-20-20-10] neurons, respectively. Similarly, for the LBP features, the optimized network consists of three hidden layers with 30 neurons in each hidden layer. The test data classification results using the GLCM and the LBP features are presented in Tables 3(a) and 3(b), respectively. It can be observed that the network with LBP features achieved an 8.5% higher overall accuracy than the network fed with GLCM features. A similar pattern can be seen in the individual class accuracies.

Binary Classification Using Augmented Data.
After augmentation, the network is trained using 1,500 images from each class. The estimated optimized architecture for this set of data is composed of three hidden layers for both the GLCM and LBP cases, with 30 and 10 neurons per layer, respectively. It is evident that the architectures using augmented data are shallower than the ones estimated previously for the preaugmented data. This is because of the improved learning of the network due to the large and balanced representation of classes. The accuracy is improved by 10% for GLCM features and 7.8% for LBP features with augmented data, as shown in Table 4. The maximum overall accuracy achieved for binary classification is 92.2%, by the ANN fed with LBP features. Overall, the LBP features combined with the optimized deep ANN architecture produce the best accuracy among the four cases of binary classification (original data with LBP and GLCM; augmented data with LBP and GLCM). Figure 4 presents the ROC curves for the binary classification results obtained in the four discussed cases.

Binary Classification with 0.5 as Threshold Using Original Data.
The results for this task are presented in Table 5. Comparing these results with the binary classification results in Section 6.1.1 (where only two classes of the original data were used: healthy and full faulty) in the context of feature significance, the current results are, conversely, better for GLCM features and worse for LBP features. The reason is the merger of four classes into two, which increases the complexity of the problem and affects the classifier's performance. After data division using a 0.5 defect probability threshold, the GLCM features proved to be better hand-crafted features than the LBP features.

Binary Classification with 0.5 as Threshold Using Augmented Data.
After augmentation, the optimized architectures ended up having three hidden layers, with sizes [30-20-10] and [30-30-30] for the GLCM and LBP features, respectively. It can be observed once again (as discussed in Section 6.1.2) that after augmentation, the results are improved for both feature kinds, where GLCM produced better results than LBP, as shown in Table 6. Compared with the similar scenario in Section 6.1.2 (the postaugmentation binary classification results shown in Table 4), the current results are inferior due to the merging of the four classes into two. Among the four cases discussed in the above subsections, the best results are obtained using augmented data with GLCM features, where the overall accuracy is recorded at 84.8%. Between the GLCM and LBP feature choices, the accuracy for the defective class is the same; however, the accuracy using GLCM features is higher for the healthy class, leading to an increase in overall accuracy in the case of GLCM features, as shown in Table 6. The results for LBP features are improved after augmentation. Figure 5 presents the ROC curves for the binary classification results with the 0.5 threshold obtained in the four cases discussed above. It can be noted that the AUC values for all four cases are similar; however, the best performance is observed for GLCM features using augmented data.

To estimate the CNN architecture for a specific task such as binary classification, we start from scratch with a single convolutional layer, a pooling layer, and a fully connected layer. The complexity is increased progressively until convergence is achieved for the specific task based on the training data results. The minimum filter size for the convolutional layer is chosen as 3 × 3 to start with and is increased to 5 × 5 and up to a maximum of 7 × 7. All filters are applied with stride 1.
Combinations of different filter sizes are tested in the convolutional layers. To estimate the number of filters per convolutional layer, 8 filters were chosen initially and further increased by multiples of 2. Similarly, the number of convolutional layers is increased to achieve improved results. The optimal number of convolutional layers is estimated based on the best training data results. In the pooling layer, a max-pooling scheme with stride 2 is adopted to reduce the size. After the pooling layer, the fully connected layer is added with the softmax activation function. In estimating the optimal CNN architecture, a large number of architectures were tested with different choices of convolutional layers, convolutional filter sizes, and number of pooling layers. During the process, both the accuracy of the network and the computational cost are observed. Therefore, by observing the trade-off between accuracy and computational complexity, the best CNN architecture is selected. Since three separate classification tasks are performed in this research, three independent, task-specific, customized CNN models are estimated.

Multiclass Classification
We develop a customized CNN model for each of the classification problems: binary classification, binary classification with 0.5 as threshold, and multiclass classification. A total of 25 different CNN architectures were tested with a suitable selection of hyperparameters to find the optimized network for the three classification tasks. The final, customized architecture for each of the classification tasks is presented in Figure 6. The optimized CNN architecture for binary classification has the following specifications: three convolutional layers with the ReLU (rectified linear unit) activation function; the optimal filter size is 5 × 5 in all layers; the first and second convolutional layers have 64 filters each, while the third layer has 32 filters. The convolutional layers are followed by a single pooling layer with max-pooling applied with stride 2. Next, two fully connected (FC) layers are added, and finally the softmax function is used for prediction. For the task of binary classification with 0.5 as threshold, the optimized CNN architecture has a Conv-Pool-Conv-Pool-FC layer arrangement, where the first convolutional layer has 64 filters and the second has 32 filters, both with a filter size of 5 × 5. The CNN estimated for multiclass classification has a Conv-Conv-Pool-FC architecture with 32 and 16 filters in the first and second convolutional layers, respectively. In all the estimated CNN architectures, the size of the FC layer is kept at 32 neurons, since this was observed to be a suitable minimum size for the FC layer. The training options and the hyperparameters for the customized CNNs are summarized in Table 9.
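To see why such networks are light, consider the parameter count of the convolutional part of the binary-classification CNN described above. The sketch below counts only convolution weights and biases; the FC-layer counts depend on padding and pooling details not restated here, so they are omitted.

```python
def conv_params(kernel, c_in, c_out):
    """Parameters of one conv layer: kernel*kernel*c_in weights per filter,
    plus one bias per filter."""
    return (kernel * kernel * c_in + 1) * c_out

# Binary-classification CNN from the paper: three 5x5 convolutional layers
# with 64, 64, and 32 filters, applied to a single-channel EL image.
layers = [(5, 1, 64), (5, 64, 64), (5, 64, 32)]
conv_total = sum(conv_params(*layer) for layer in layers)
print(conv_total)  # -> 155360 convolutional parameters
```

Roughly 155 k convolutional parameters is orders of magnitude below VGG-style backbones, which is what makes CPU-based real-time inference feasible.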
We now present the classification results of the estimated CNN models on the test data, shown in Table 10. The CNN developed for binary classification achieved 94.3% accuracy. Compared with the binary classification case, the CNN showed 2.1% higher accuracy than the best ANN-based results, achieved with LBP features (shown in Table 4). Considering the classwise accuracy, the CNN achieved 0.4% and 5.6% higher accuracy for the healthy and faulty classes, respectively. In conclusion, for detecting a faulty PV cell, the CNN is 5.6% more accurate than the ANN-based model. In the second task, binary classification with 0.5 as threshold, the developed CNN achieved 89.3% accuracy, which is 4.5% higher than the deep ANN model for the same task. Similarly, looking at the classwise results, there is a marginal improvement of 1.4% in accuracy for the healthy class; however, the CNN outperforms the ANN by a margin of 11.4% for the faulty class. Since the detection of a faulty cell is more important than that of a healthy cell, the CNN does the job with significantly improved accuracy.
Finally, for multiclass classification, the customized CNN shows an overall improvement of 7.4% over the ANN model (see Table 8(b)). Careful observation of Table 10(c) makes it evident that the CNN confused the majority of misclassified samples with the nearest class. For instance, 19 samples of the healthy class were confused with the less faulty class. This holds for each of the target classes. In contrast, the ANN confused the majority of samples with a far class; e.g., 30 healthy samples were misclassified as fully faulty, as shown in Table 8(b). Therefore, it can be concluded that the CNN obtained better results not only quantitatively but also qualitatively. Figure 7 shows the classification results of a few random samples from the test data with a confidence percentage for the different classification tasks. It is worth mentioning that the confidence percentage does not reflect the defect probability predicted by the CNN. For instance, in Figure 7(a), the CNN correctly predicted a healthy sample (top left) with a confidence percentage of 91.3%. This means the network finds that 91.3% of the content of the image matches the footprint it has learned for the healthy class, represented as confidence in its decision. However, high confidence does not necessarily mean that the prediction is correct: the network may equally well misclassify with a high percentage. This is also the case in the CNN results, as the faulty sample in the bottom left of Figure 7(a) is misclassified as healthy. Similarly, the samples shown in the bottom row of Figure 7(b) are both misclassified by the network. For multiclass prediction, the CNN confused the less faulty and medium faulty classes with each other, and therefore both samples from these classes in the bottom row of Figure 7(c) were misclassified.
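The nearest-class observation above can be made quantitative: with the classes ordered by defect level, count what fraction of misclassified samples fall in a class adjacent to the true one. The following is a minimal sketch with a hypothetical 4-class confusion matrix (the actual counts are in Table 10(c); only the 19 healthy-to-less-faulty figure is taken from the text):

```python
def adjacent_confusion_fraction(cm):
    """Fraction of misclassified samples predicted within one class of the truth.

    cm[i][j] = number of samples with true class i predicted as class j;
    classes are assumed ordered by increasing defect level."""
    wrong = adjacent = 0
    n = len(cm)
    for i in range(n):
        for j in range(n):
            if i != j:
                wrong += cm[i][j]
                if abs(i - j) == 1:
                    adjacent += cm[i][j]
    return adjacent / wrong if wrong else 1.0

# Hypothetical counts; rows/cols: healthy, less faulty, medium faulty, fully faulty
cm = [
    [130, 19, 2, 1],
    [10, 60, 8, 1],
    [2, 9, 50, 6],
    [0, 1, 7, 88],
]
print(round(adjacent_confusion_fraction(cm), 3))  # 0.894
```

A value near 1.0 indicates that almost all errors are "near misses", which is the qualitative advantage claimed for the CNN over the ANN here.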
Overall, the CNN model outperformed all the ANN models on each of the classification tasks. In addition to achieving the best classification accuracy, the developed CNN models also support real-time classification of EL image data, because they are lighter and computationally less expensive. The models are developed from scratch, adopting a bottom-up approach to estimate the architectural depth while considering the complexity-accuracy tradeoff, in contrast to choosing a pretrained CNN and progressively reducing its size for the classification task at hand, as in a few existing studies. The deep learning approach proved effective, since the deep ANN models with hand-crafted features also achieved comparable accuracy on the binary classification task.

Comparison with Existing Studies
In this section, a comparative analysis with existing studies that specifically used the same EL image database is presented.

Classification Accuracy.
The authors of the study [38] performed data augmentation and presented two-way binary classification, similar to the two binary classification strategies employed in this study. In this subsection, a comparison of results for binary classification (using data of two classes: healthy, defect probability 0.0, and fully faulty) is presented, while the comparison for binary classification with the 0.5 threshold is discussed later in Subsection 5.2. For binary classification, the authors of [38] reported 82.44% accuracy using an SVM classifier, while the best results using the proposed methods reach 92.2% and 94.3% accuracy with the ANN and CNN, respectively. Hence, the proposed techniques produce 11.86% higher accuracy in comparison to [38]. In another study [40], the authors presented a VGG-structured CNN classifier for binary classification of the EL image data and achieved a best accuracy of 93.5%, with a four-fold average accuracy of 93.02%. In contrast, the accuracy achieved using the proposed method is 94.3%, which is 0.8% (slightly) higher. Concretely, the proposed customized CNN model outperforms the state-of-the-art results for binary classification of the EL image data in terms of accuracy.

Significance of Hand-Crafted Features.
The authors of [38] extracted scale-invariant feature transform (SIFT), speeded-up robust features (SURF), KAZE, and histogram of oriented gradients (HOG) features from the images, as well as combinational features using dense sampling. In general, such features are suitable for, and primarily used in, object detection and classification tasks where both the complexity and the number of classes are much higher. In comparison, less computationally expensive features, GLCM and LBP, are proposed in this study, and yet better classification results are obtained.
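To illustrate how cheap these texture descriptors are, a grey-level co-occurrence matrix (GLCM) for one pixel offset is just a nested counting loop, and a contrast statistic then summarizes it in a single number. The following is a stdlib-only sketch on a toy quantized image, not the exact feature configuration (offsets, grey levels, statistics) used in the experiments:

```python
def glcm(img, dx=1, dy=0, levels=3):
    """Count co-occurrences of grey levels at offset (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y][x]][img[y2][x2]] += 1
    return m

def contrast(m):
    """GLCM contrast: mean squared grey-level difference of co-occurring pairs."""
    total = sum(sum(row) for row in m)
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total

# Toy 3x3 image quantized to 3 grey levels
img = [[0, 0, 1], [0, 0, 1], [0, 2, 2]]
print(contrast(glcm(img)))  # 1.0
```

Such statistics (contrast, homogeneity, energy, etc.) form compact fixed-length feature vectors, which is what keeps the ANN pipeline lightweight.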

Classifier's Computational Complexity.
The VGG-structured 6-layer CNN architecture, including four convolutional layers, was presented in [40] for binary classification of the EL image data. The authors split the data into an 80 : 20 ratio for training and testing, respectively, and achieved a best accuracy of 93.5%, with a four-fold average accuracy of 93.02%. In comparison, a customized yet lighter CNN architecture with three convolutional layers is proposed here, which achieves 94.3% accuracy. Considering the computational cost, the CNN model presented in [40] had 2,410,208 parameters, whereas the proposed model has 1,331,264 parameters, almost half.
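The "almost half" claim follows directly from the two parameter totals quoted above:

```python
proposed, vgg_based = 1_331_264, 2_410_208

# Proposed model relative to the VGG-structured model of [40]
ratio = proposed / vgg_based
print(f"{ratio:.1%}")  # 55.2%
print(vgg_based - proposed)  # 1078944 fewer parameters
```

At roughly 55% of the reference model's size, the proposed network saves about 1.08 million parameters while gaining 0.8% accuracy.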

Data Augmentation.
The original dataset contains a total of 2,624 EL images. The authors of [38] performed data augmentation to increase the number of image samples and obtained a total of 196,800 samples. In comparison, in this research a total of 6,000 samples are prepared after augmentation, which is almost 33× smaller than in [38]. In the study [40], the authors performed data augmentation to balance the classes; however, no information was provided regarding the data size after augmentation.

Processing Time for Training and Testing.
Considering the processing time, the proposed ANN-based classifier took 296 s for training (using 6,000 samples) and 7.4 s to classify the test data (394 image samples) with an accuracy of 92.12%. Hence, at 18.78 ms per test image, it reflects real-time processing speed. Moreover, the proposed CNN-based classifier took 34 min 52 s for training and 37.2 s for prediction on the test data, i.e., 94.4 ms per test sample. The authors of [40] claimed 8.07 ms to predict an individual test image sample; however, the comparison of prediction times is not straightforward.
Firstly, the hardware used in this research has the following specifications: an Intel Core i3 processor at 2.4 GHz with 2 GB RAM, which is modest compared to the hardware (Intel Core i5 at 3.2 GHz, RAM not specified) reported in [40]. Another reason for the higher accumulated processing time in this research is the image size. In the proposed work, the original image size of 300 × 300 pixels is used, while the authors of [40] resized (downsampled) the images to 100 × 100 pixels before use, which makes each image 9× smaller and therefore reduces the processing time. Although the time to predict the health of a test sample in this study is larger under the described hardware constraint, it still ensures real-time grading of PV cells with higher accuracy. Comparing the convergence time, the training time for the CNN used in [40] is reported as 13 h 45 min, whereas the proposed model took only about 35 min, roughly 23× less time to converge.
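The per-image figures and ratios above are simple divisions over the reported batch times and the 394-sample test set, and can be checked as follows (all inputs taken from the text):

```python
TEST_SAMPLES = 394

def per_sample_ms(total_seconds, n=TEST_SAMPLES):
    """Average per-image prediction time in milliseconds."""
    return total_seconds / n * 1000

print(round(per_sample_ms(7.4), 2))   # ANN, binary test set: 18.78 ms
print(round(per_sample_ms(37.2), 2))  # CNN, binary test set: 94.42 ms
print((300 * 300) / (100 * 100))      # pixel-count ratio, 300x300 vs 100x100: 9.0
print(round((13 * 60 + 45) / 35, 1))  # training-time ratio vs [40]: ~23.6
```

The same division applied to the other reported test times (31.6 s and 28.7 s) yields the 80.2 ms and 72.84 ms figures quoted later.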

Binary Classification with 0.5 as Threshold.
For the case of binary classification with 0.5 as the threshold, the authors of [38] employed a CNN with transfer learning and achieved an accuracy of 88.4%, whereas the proposed CNN model achieved 89.3%. The improvement in accuracy is minor; however, the architectural complexity of the proposed model is much lower. The VGG-19 model with 14 convolutional layers and a large number of filters used in [38] makes it far more computationally complex than the proposed CNN model, which has only two convolutional layers, one pooling layer, and one fully connected layer. Moreover, the training data after augmentation is much smaller, which makes the proposed customized CNN architecture much more efficient. In addition, the study [38] employs a GPU for the experiment, making it a hardware-demanding solution. In contrast, the proposed CNN-based classifier works in real time on a CPU machine and yet achieves better accuracy. The training time for the proposed CNN-based model on this particular binary classification task is 47 min 6 s, and the test time is 31.6 s. Therefore, the prediction for a single test sample takes 80.2 ms, making the proposed classifier suitable for real-time classification of PV cells.

Multiclass Classification.
In addition to the two kinds of binary classification, multiclass classification of the EL image data is presented. The best overall accuracy of 76.1% is achieved by the proposed deep feed-forward neural network with LBP features, while the customized CNN-based classifier achieves 83.5% overall accuracy. The testing time for multiclass classification using the proposed customized CNN is 28.7 s for 394 samples; in other words, it takes 72.84 ms to predict the health of a single PV cell image, making the proposed classifier a real-time-capable solution for multiclass classification on a CPU machine. Multiclass classification of this EL image dataset has not been presented in the existing literature; therefore, state-of-the-art results are established in this category.
A summary of the comparative analysis is presented in Table 11. The results show the significance of the proposed methods over the existing studies, both quantitatively and computationally.

Conclusions
In this study, fault-level binary and multiclass classification of EL image data are presented. Deep ANN models of minimum suitable size are estimated and optimized, with hand-crafted features extracted from the image data. The estimated deep ANN architectures perform best when fed with LBP hand-crafted features, achieving 92.1% and 76.1% accuracy for binary and multiclass classification, respectively. In addition to the ANNs, customized, task-oriented, and lightweight CNN models are developed. The proposed customized CNN model achieved state-of-the-art classification accuracy of 94.3% for binary classification, and likewise 83.5% state-of-the-art accuracy for multiclass classification. Overall, the proposed models achieved better performance than the existing solutions, both quantitatively and computationally. The proposed solution may be used for real-time health assessment of PV solar cells using EL imaging. The results also support the effectiveness of the CNN-based approach for real-time image-based PV cell health classification. As a limitation, the data was balanced by augmentation to equalize the number of samples per class, so the number of augmented images generated differed from class to class. In future work, advanced methods for image augmentation, such as generative adversarial networks (GANs), may be used to produce high-quality augmented samples for better data representation and improved network learning.

Conflicts of Interest
The authors declare that they have no conflicts of interest.