Pixel-Based Image Processing for CIE Standard Sky Classification through ANN



Introduction
Sky conditions are crucial factors when assessing daylighting levels and solar-energy output. The sky is generally classified on the basis of cloud presence into three categories: cloudless, partially cloudy, and overcast. Many models for the calculation of global, direct, and diffuse irradiation and illumination were defined for different sky types based on the values of several climatic parameters [1]. In 2003, the Commission Internationale de L'Éclairage (CIE) adopted the set of 15 standard sky classifications proposed by Kittler et al. in 1998, categorized under 3 sky types, clear, partial, and overcast, each of five grades [2]. These CIE standard skies, which classify a general spectrum of homogeneous skies throughout the world, were standardized in ISO 15469:2004(E)/CIE S 011/E:2003 [3] for the purpose of evaluating indoor visual comfort within buildings [4], solar irradiance calculations [5], and energy efficiency improvements to lighting [6], among other applications. The CIE standard sky classification is based on measurements [7] of the angular distribution of diffuse luminance in the sky vault. Skies within a CIE category have approximately the same well-defined sky luminance and solar radiance patterns.
Devices called sky scanners are used to measure sky luminance patterns. According to the CIE Guide [8], a reliable commercial sky scanner measures luminance from 145 patches of the sky hemisphere. However, various alternative procedures have been developed for CIE standard sky classification [9], due to the scarcity of sky scanners available to gather sky luminance data at ground meteorological stations. For this task, Supervised Machine Learning (SML) procedures such as decision trees (DTs) [11], Support Vector Machines (SVMs) [12], and Artificial Neural Networks (ANNs) [13][14][15], based on accessible meteorological indices [10], have been proposed as effective tools for sky classification.
Over recent years, interest has been expressed in calibrated sky luminance maps for sky classification and cloud detection [16][17][18][19]. A digital camera equipped with a fisheye lens can map at a higher resolution than commercial sky scanners and High Dynamic Range (HDR) images can capture the full sky luminance range [20].
There are also novel image-processing methods that can help to overcome misclassification due to cloud cover. While some studies have had their focus placed on color space, the focus of others has been on the modification and combination of the original monochromatic channels, known as the spectral features. A third alternative, texture filters, adjusts the gray pixel image patterns [21]. The RGB (red, green, and blue) chromaticity color model, a basic standard for computer images, has spectral features that may be adapted to cloud detection (CD) [22]. Shorter sunlight spectrum wavelengths will scatter due to atmospheric particles, giving the sky background a blue appearance [23] where the chromaticity component is mainly blue rather than red. Clouds appear white due to the uniform scattering of visible-light wavelengths, indicating similar amounts of red and blue components. Other models successfully applied to CD include Removal Atmospheric Scattering (RAS) [24], Red-Blue Ratio (RBR) [21], Red-Blue Difference (RBD) [25], and Normalized Red-Blue Ratio (NRBR) [17].
Some strategies have been aimed at adapting the image to the color perception of the human eye. Hue Saturation Value (HSV) [17], Red-difference Chroma (YCbCr) [18], and Intensity Hue Saturation (HIS) [26], among other color spaces, have recently demonstrated their efficacy for CD.
In addition to color space and spectral features, texture procedures use the gray distribution of pixels and their spatial neighborhood to identify objects and regions. These procedures have been shown to be very effective for cloud detection [27], medical image classification [28], and traffic analysis [29]. Gray Level Cooccurrence Matrix (GLCM), Local Range (LR), local Standard Deviation (STD), and local Entropy Matrix (EM) are texture filter procedures that statistically process the textures of images for their classification.
Image processing based on spectral, texture, and color spaces offers various perspectives of the same image. Their combination for image analysis can produce successful applications such as mapping [30] and aerial photographic classification [31]. In this paper, the recently proposed alternatives to the RGB color model are reviewed and compared for the improvement of image-processing methods applied to cloud detection and sky classification using Artificial Neural Network (ANN) algorithms. In some cases, preliminary image processing significantly improved the accuracy of the ANN used to classify the same image dataset. The methods that reduce misclassification will be identified from a detailed study, in which both the CIE standard sky classification (15 types) and the reduced classification of three categories (clear, partial, and overcast sky conditions) were considered. The paper is structured as follows. A complete comparison between several image-processing methods for CIE standard sky classification through ANNs will be presented in Section 2. In Section 3, the acquisition and processing of the experimental data will be described. In Section 4, the fit of the results of the ANN models with actual sky conditions will be verified. The results of the classification algorithms will be discussed in Section 5 and, finally, succinct conclusions on the most efficient image-processing methods will be presented in Section 6.

Review of Image-Processing Methods for Cloud Detection
Table 1 summarizes the main characteristics of the twenty-two pixel image-processing methods that were reviewed and tested in this study, classified in terms of color space, spectral, and texture features. A complete description of all the image-processing methods is given in this section.

Color Spaces.
The RGB color space uses one channel for each of the primary colors: blue, red, and green. Implemented directly in machine learning or with previous processing, this color space will yield spectral features. The primary colors, subchannels R, G, and B, each build up a monochromatic image. A grayscale (GS) image is created when only pixel intensity is recorded. As previously mentioned, the HSV space is modelled on visual human perception, which classifies objects in terms of their luminous intensity (brightness or value) and chromaticity. The chromaticity has two independent parameters, hue and saturation. Hue is the pure color that varies from red to magenta (listed as red, yellow, green, cyan, blue, and magenta). The saturation describes the dilution of a pure color in white (0 = white; 1 = pure color). The hue, saturation, and value channels can also be independently used. Clouds are mostly perceived on a grayscale, due to interactions between sunlight and the atmosphere, so different cloud cover can be analyzed through the saturation channel. This color space has proved itself to be highly effective for sky classification into three categories: blue sky, cloudy sky, and sunset sky [35].
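The role of the saturation channel can be illustrated with a minimal sketch. It relies on the standard-library `colorsys.rgb_to_hsv` conversion; the helper names `hsv_channels` and `saturation_image` and the two sample pixels are illustrative assumptions, not values taken from the dataset of this study.

```python
import colorsys


def hsv_channels(pixel):
    """Convert one 8-bit (R, G, B) pixel to (H, S, V), each in [0, 1]."""
    r, g, b = (component / 255.0 for component in pixel)
    return colorsys.rgb_to_hsv(r, g, b)


def saturation_image(rgb_image):
    """Keep only the saturation channel of an image given as rows of RGB tuples."""
    return [[hsv_channels(pixel)[1] for pixel in row] for row in rgb_image]


# Hypothetical pixels: a deep-blue sky pixel and a near-white cloud pixel.
sky_row = [[(60, 120, 220), (230, 232, 235)]]
saturation = saturation_image(sky_row)
```

Under these assumed pixel values, the blue-sky pixel keeps a high saturation, whereas the cloud pixel is almost fully diluted in white, which is the contrast that the saturation channel exposes.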

Spectral Features Based on the RGB Model.
Unlike the direct implementation of the RGB model, a spectral feature describes the change of tone and color in an image. Its capability of detecting dark clouds from high and transparent cirrus clouds has been demonstrated [25]. The RAS channel was proposed to distinguish atmospheric scatter from atmospheric background light [24]. The RAS channel is obtained from a linear combination of the panchromatic channel (Y), the bright channel (L), and the dark channel (D), defined in Table 1. Channels Y, L, and D can also be independently applied.

Complexity
Different combinations of red and blue channels were proposed for cloud detection. The aim of the Red-Blue Ratio (RBR), which yields small ratios for blue skies and large ratios for clouds, is to recognize thin and opaque cloud cover and clear skies [36]. Heinle et al. [25] noted several problems related to the use of the RBR channel for detecting thick clouds and difficulties with circumsolar pixels. They therefore proposed the RBD (Red-Blue Difference) channel as an alternative. Yamashita et al. [37] performed a full review of the blue and red channels and implemented the sky index or NRBR (Normalized Red-Blue Ratio) for separating the blue sky and cloud areas. These adaptations of the RGB channels have been successfully tested for CD. The green channel is, however, often overlooked in image processing. The Adjusted Red Green Difference (ARGD) [22] was introduced to correct any possible saturation of the blue component. Linear combinations of the spectral features have been proposed in other works, such as C1 [17] and C2 [22], which are listed in Table 1.
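The red-blue features can be sketched per pixel as follows. The exact definitions are those of Table 1, which is not reproduced here, so the forms used below (RBR = R/B, RBD = R − B, and NRBR = (B − R)/(B + R)) are commonly used conventions adopted as assumptions; the function name `red_blue_features` and the `eps` guard against division by zero are likewise illustrative.

```python
def red_blue_features(red, blue, eps=1e-6):
    """Return assumed RBR, RBD, and NRBR values for one pixel (8-bit channels)."""
    return {
        "RBR": red / (blue + eps),                  # small for blue sky, near 1 for clouds
        "RBD": red - blue,                          # strongly negative for blue sky
        "NRBR": (blue - red) / (blue + red + eps),  # assumed sign convention
    }


# Hypothetical pixels: blue sky (R=60, B=220) and white cloud (R=230, B=235).
sky = red_blue_features(60, 220)
cloud = red_blue_features(230, 235)
```

With these assumed pixels, the ratio is close to 0.27 for the sky and close to 0.98 for the cloud, in line with the small-ratio and large-ratio behavior described for RBR.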

Texture Filters.
Texture filters use the gray pixel distribution (grayscale, from 0 to 255, GS matrix) and their spatial neighborhood to identify objects and regions. Texture filters divide the GS matrix into local neighborhoods and apply a mathematical operator to each one: the range for Local Range (LR), the entropy for the Entropy Matrix (EM), and the standard deviation for local Standard Deviation (STD) image processing [34]. Figure 1 shows an example of an LR texture-filtering process. GS is a monochromatic matrix whose elements are $M_{i,j}$. The neighborhood of each element is represented in Figure 1 as the 9 × 9 blue square. Its size is smaller along the GS boundary (elements represented as $m_{i,j}$). The filter function applies a mathematical operator in this neighborhood and the result is stored in position $(i, j)$ of the new matrix.

Local Range Texture Filter.
The purpose of LR filtering is to make the edges and contours of an image visible. The smallest value is subtracted from the highest one within the 9 × 9 neighborhood, as shown in Figure 1. The function saves the result in the LR matrix.
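A minimal, dependency-free sketch of a local range filter is given below. It assumes a grayscale image stored as a list of rows and a square neighborhood that is truncated at the image boundary; the names `neighborhood` and `local_range` and the `radius` parameter (radius 4 corresponds to a 9 × 9 window) are hypothetical and do not reproduce the implementation used in the study.

```python
def neighborhood(gs, i, j, radius):
    """Collect the gray levels around (i, j); the window shrinks at the borders."""
    rows, cols = len(gs), len(gs[0])
    return [
        gs[a][b]
        for a in range(max(0, i - radius), min(rows, i + radius + 1))
        for b in range(max(0, j - radius), min(cols, j + radius + 1))
    ]


def local_range(gs, radius=4):
    """Local Range filter: maximum minus minimum gray level of each neighborhood."""
    result = []
    for i in range(len(gs)):
        row = []
        for j in range(len(gs[0])):
            values = neighborhood(gs, i, j, radius)
            row.append(max(values) - min(values))
        result.append(row)
    return result


# A vertical edge between a dark and a bright region, filtered with a 3 x 3 window.
image = [[10, 10, 200, 200]] * 3
edges = local_range(image, radius=1)
```

In this toy image, only the two columns adjacent to the edge receive a nonzero range, which is how the filter makes contours visible.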

EM Texture Filter.
Entropy is a measure of the randomness of the image texture. The Entropy Matrix (EM) calculates the local entropy of all the GS neighborhoods [34]. The EM value, which is directly proportional to the degree of variation of a pixel with respect to its neighbors, is calculated with
$$EM = -\sum_{k=1}^{N} p_k \log_2 p_k \tag{1}$$
where $p_k$ reflects the occurrence of gray level $k$ and $N$ is the total number of gray levels in the neighborhood. In Figure 2, an image histogram with high variations is shown.
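The local entropy of one neighborhood can be sketched as follows, assuming the usual Shannon form in which $p_k$ is the relative frequency of gray level $k$ within the neighborhood and the logarithm is taken in base 2. The function names and the truncated-window handling at the borders are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of the gray levels in one neighborhood."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_matrix(gs, radius=4):
    """Local entropy of every pixel; the window is truncated at the borders."""
    rows, cols = len(gs), len(gs[0])
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [
                gs[a][b]
                for a in range(max(0, i - radius), min(rows, i + radius + 1))
                for b in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            row.append(entropy(window))
        result.append(row)
    return result


flat = entropy([128] * 9)         # a uniform neighborhood carries no randomness
mixed = entropy([0, 0, 255, 255])  # two equally likely gray levels
```

A homogeneous overcast patch therefore yields an entropy of zero, while a patch split evenly between two gray levels yields exactly one bit.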

Local Standard Deviation (STD) Texture Filter.
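The following equation is used to calculate the local Standard Deviation (STD) within each neighborhood:
$$STD_{i,j} = \sqrt{\frac{1}{N-1}\sum_{k=-1}^{1}\sum_{l=-1}^{1}\left(M_{i+k,j+l} - \bar{M}_{i,j}\right)^{2}} \tag{2}$$
where $N = 9$ is the number of elements in the neighborhood, $k$ and $l$ vary from −1 to 1 to cover the neighborhood matrix, and $\bar{M}_{i,j}$ is the mean gray level of the neighborhood.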
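A possible implementation of the local standard deviation over a 3 × 3 neighborhood (k and l from −1 to 1) is sketched below. It assumes the sample standard deviation, normalized by N − 1, as computed by the standard-library `statistics.stdev`; this normalization and the function name `local_std` are assumptions rather than details confirmed by the text.

```python
import statistics


def local_std(gs):
    """Sample standard deviation of each 3 x 3 neighborhood (truncated at borders)."""
    rows, cols = len(gs), len(gs[0])
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [
                gs[i + k][j + l]
                for k in (-1, 0, 1)
                for l in (-1, 0, 1)
                if 0 <= i + k < rows and 0 <= j + l < cols
            ]
            row.append(statistics.stdev(window))
        result.append(row)
    return result


ramp = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
std = local_std(ramp)
```

For the central pixel of this ramp, the nine neighbors have a mean of 5 and a sum of squared deviations of 60, so the assumed normalization gives the square root of 60/8.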

Experimental Data Acquisition and Processing
As previously stated, the main objective of this work is the analysis of image-processing algorithms for CIE standard sky classification using ANN-processed sky images. The workflow is described in Figure 3 and explained in the following sections.

Experimental Data Acquisition.
The experimental data used in this work were recorded at a meteorological weather station located on the roof of the Higher Polytechnic School building at Burgos University (42°21′04″N; 3°41′20″W; 856 m above mean sea level). A complete description of the meteorological facility may be found elsewhere [1,10,38]. The experimental equipment is shown in Figure 4. The sky luminance distribution for characterization of sky conditions according to the CIE Standard General Sky classification was measured with a commercial MS-321LR sky scanner (EKO Instruments Europe B. V., Den Haag, The Netherlands). The sky scanner was adjusted on a monthly basis for taking measurements from sunrise to sunset. It completed a full scan in four minutes and started a new scan every 10 minutes. The first and last measurements of the day ($\alpha_s \le 5°$) were discarded, as were measurements higher than 50 kcd/m² or lower than 0.1 kcd/m², following the recommended specifications of the sky scanner equipment. The Normalized Luminance method [39], detailed in a previous paper [38], was used to determine the CIE standard sky types over Burgos during the experimental campaign. A total of 1,500 images were selected from the experimental dataset (more than 80,000 sky images), 100 from each CIE sky category, chosen as those with the greatest concordance with the CIE pattern for that category. The dataset used in this study was therefore composed of one hundred sky images for each of the fifteen CIE standard sky categories. This sky classification was used as a reference for the sky conditions.

Data Processing.
The original sky images were processed using the twenty-two different channels chosen for the study and summarized in Table 1. While some channels were directly generated from the sky images, others had to be generated through complementary channels. Figure 5 shows the results of the image-processing methods applied to images of sky conditions classified as clear, partial, and overcast, following the CIE taxonomy. As can be observed, each filter highlights different features of the images. The circumsolar area and the zone nearest the horizon present the greatest difficulties for cloud detection. In Figure 5, it can be seen that the RGB image is sensitive to the circumsolar region and is capable of detecting the solar corona. However, in the RGB image, no differences can be appreciated in dark, homogeneous sky conditions. The appearance of the direct day beam can be a source of errors. Although the blue channel saturated the circumsolar region, both the red and the green channels showed greater sensitivity at detecting cloudy areas. In contrast, the horizon was captured by the Y, D, L, RAS, V, STD, and EM methods, and LR mainly defined the contours. Unlike most of the other channels, the RGB model had difficulty with the directional homogeneity of the images for the detection of overcast sky conditions. The family of RAS methods (RAS, Y, D, and L) appeared to show similar levels of accuracy under all sky conditions, their main differences being near the circumsolar area.

Image Compression.
The high resolution of the original sky images (1158 × 1172 pixels) requires their compression to reduce the dimension of the dataset, improving data storage and subsequent image processing. In this study, the original sky images were compressed to 110 × 110 pixels in each channel. Figure 6 shows the result of the image compression procedure to 0.89% of the original pixel count, which facilitates ANN tuning with no loss of efficiency.
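The 0.89% figure follows directly from the two image sizes, as the short check below shows. The resampling method is not stated in the text, so the nearest-neighbor `resize_nearest` helper is only a hypothetical stand-in used to illustrate the reduction of one channel.

```python
def compression_ratio(source_shape, target_shape):
    """Fraction of the original pixel count retained after compression."""
    return (target_shape[0] * target_shape[1]) / (source_shape[0] * source_shape[1])


def resize_nearest(channel, new_rows, new_cols):
    """Nearest-neighbor downsampling of one channel (assumed method, for illustration)."""
    rows, cols = len(channel), len(channel[0])
    return [
        [channel[i * rows // new_rows][j * cols // new_cols] for j in range(new_cols)]
        for i in range(new_rows)
    ]


ratio = compression_ratio((1158, 1172), (110, 110))  # 12,100 / 1,357,176 pixels
small = resize_nearest([[4 * i + j for j in range(4)] for i in range(4)], 2, 2)
```

Each compressed channel therefore provides 12,100 values per image as input to the classifier.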

ANN for CIE Standard Sky Classification
Artificial Neural Networks (ANNs) are frequently used in meteorology: CIE and cloud classification [40,41], solar irradiance and wind speed forecasting [42][43][44][45][46][47], atmospheric pollution distribution [48,49], and rainfall [50,51]. ANN classification models serve to classify input information into certain categories or targets. A Supervised Machine Learning (SML) neural network is required for CIE standard sky classification where the sky types are previously known.
The model works efficiently when the prediction matches the target. Modelled on the biological concept of neurons, ANN is a very powerful technique for classification problems. Figure 7 shows a conventional ANN structure, which consists of an input layer, a set of several hidden layers, and an output layer. The information from the neurons of the input layer ($X^{0}_{i}$) crosses the hidden layers (one in this work), following unidirectional connections, to the output layer that has one neuron ($X^{2'}_{i}$) per target. Each processing center or neuron is adjusted to the other neurons through an iterative process, using (3):
$$X^{n} = W^{n} X^{n-1} + B \tag{3}$$
where $W^{n}$ is the weighting matrix, $X^{n-1}$ are the input variables, and $B$ is the bias. The Scaled Conjugate Gradient method (SCG) [52] was used to fit the weights (weighting matrix, $W^{n}$) for each iteration. The neuron generates the output, $X^{n'}_{i}$, through the activation function, $f(X^{n}_{i})$, given by the hyperbolic tangent sigmoid transfer function in this study, as shown in (4) [13]:
$$X^{n'}_{i} = f(X^{n}_{i}) = \frac{2}{1 + e^{-2X^{n}_{i}}} - 1 \tag{4}$$
Supervised Machine Learning requires three datasets: training, validation, and test datasets. The training group is used to determine the weighting matrix and the bias in an iterative process. The training is over when the performance of the resulting model, calculated using the validation set, reaches the desired quality. The test data group is used to calculate the performance of the model. Random dataset division is crucial to achieve a reliable performance. A conventional training dataset is randomly selected and consists of 70% of the total data, while the validation set and the test set each represent 15%. The design of the ANN is adapted to the database and the process is simulated. There is no standardized procedure for establishing the most effective number of neurons and hidden layers [42], so experimentation or tuning is needed. In this study, several trials were performed in which the number of neurons (1-100) was varied, searching for the best accuracy, Acc, of the ANN, given by
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
where TP and TN are the correct predictions of the ANN (true positives and true negatives) and FP and FN are the incorrect predictions (false positives and false negatives). Accuracy is rated by the number of correct predictions over the total number of predictions. The neural network structure (number of neurons in the hidden layer) was selected on the basis of highest accuracy. After several trials, the number of hidden layers was fixed at one.
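The forward pass, the accuracy measure, and the 70/15/15 division described above can be sketched in a few lines. This is not the network trained in the study: the weights are placeholders, the Scaled Conjugate Gradient fitting is omitted, and all function names (`layer`, `predict`, `accuracy`, `split_dataset`) and the `seed` parameter are illustrative assumptions.

```python
import math
import random


def layer(weights, inputs, bias):
    """One layer: weighted sum plus bias, followed by the hyperbolic tangent."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def predict(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through one hidden layer; returns the index of the winning output neuron."""
    hidden = layer(w_hidden, x, b_hidden)
    output = layer(w_out, hidden, b_out)
    return max(range(len(output)), key=output.__getitem__)


def accuracy(predictions, targets):
    """Correct predictions over the total number of predictions."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def split_dataset(items, seed=0):
    """Random 70% / 15% / 15% division into training, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train, validation, test = split_dataset(range(1500))
```

Applied to a dataset of 1,500 images, this division yields 1,050 training images and 225 images in each of the validation and test sets.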

Results
In Figure 8, the improved accuracy of the ANN models that used the sky images as their input is shown. Each image had previously been processed by each of the twenty-one image-processing methods summarized in Table 1. The improvement with respect to the RGB space is defined as Δ(Acc) and given by
$$\Delta(Acc) = Acc(\text{channel } x) - Acc(\text{RGB space}) \tag{6}$$
where Acc(channel x) and Acc(RGB space) are the accuracy obtained when the input of the ANN is the set of sky images processed by each method x (x = each image-processing method summarized in Table 1) and by the RGB space, respectively. The accuracy of each ANN and the number of neurons in its hidden layer are shown in Table 4.

As can be seen in Figure 8, HSV is a better color space than RGB for CIE standard sky classification using images, with a small improvement in the accuracy (0.66%) with respect to RGB image processing. The GS color space and the RGB space were equally accurate. The use of the R, G, and B monochromatic channels also improved the accuracy of the ANN for CIE standard sky classification, the G channel being the most suitable for this task. The accuracy of the ANN fitted using the individual channels, H and S, worsened over the RGB color space, while the V channel significantly improved ANN accuracy.
Figure 5: Results of the image-processing methods applied to clear, partial, and overcast CIE standard sky types.

In the spectral feature category, the RAS processing method worsened the sky classification accuracy of the ANN. However, channels Y and L showed better behavior for sky classification, although they used more neurons in the hidden layer. Among the rest of the spectral feature channels, only RBD and C1 significantly improved ANN accuracy. With regard to the texture filters, EM showed little or no advantage over the use of the RGB color space and the other two filters, LR and STD, impaired the accuracy of the resulting neural network. A larger number of neurons in the hidden layer, shown in Table 4, did not in itself increase the accuracy of the ANN, as can be seen from the use of image-processing methods Y and V. Figure 8 shows the results of each ANN classifying the skies into the fifteen CIE standard sky categories. A simpler classification into three categories (clear, overcast, and partial conditions) is often sufficient for many applications, such as luminous efficacy calculations [53] and lighting design in buildings [54]. The fitted results of the ANN sky classification for three categories are shown in Figure 9 and Table 5.
Figure 8: Improvement in ANN accuracy, Δ(Acc), using as input the image processed by each image-processing method summarized in Table 1, over ANN accuracy obtained with the original RGB images as input.
For CIE standard sky classification into three sky categories, lower differences in accuracy can be seen and only the G, the B, and the GS monochromatic channels and the spectral features L and C1 improved ANN accuracy. In all these classification cases, the number of neurons in the hidden layer was lower. The accuracy index was used to group the goodness of fit of the ANN in all categories, although the fitted quality in each individual category was not processed. A confusion matrix analysis is shown in Figures 10-13. In a confusion matrix, when the Supervised Machine Learning algorithm prediction and the target match each other (TP or TN result), the corresponding diagonal boxes of the matrix are colored. When there are no matches between the prediction and the target value (FP and FN), the other boxes of the confusion matrix are filled in. The best image-processing method will have the highest number of colored boxes around the diagonal line of the matrix.
The figures below represent the confusion matrices corresponding to the 15 types of CIE standard skies. Figure 10 shows the confusion matrix of the ANN-calculated RGB-CIE sky classification for the test set (15% of the total dataset). It can be seen that the RGB-CIE classification with machine learning misclassified cloudy and partial skies: few matches are visible in the boxes along the diagonal line. In Figure 10, the CIE standard sky classification into three categories (clear, partial, and overcast sky conditions) is also presented. Those cases classified outside the corresponding category were designated as critical, i.e., clear skies classified as either partial or overcast or vice versa. The same information is shown in Figure 11 for the color space CIE standard sky classification, corresponding to the other color space processing methods under analysis. The red, the green, and the blue channels showed a similar behavior to the RGB color space. The red channel adequately classified CIE standard sky types 7 to 15, in other words, all clear skies and some partial sky types. The HSV color space showed a similar performance in all categories, in contrast to the RGB color space, in which the classification of clear sky types may be highlighted. Hue and saturation channels introduced too much noise, but the value channel showed good performance.
In Figure 12, the confusion matrices are shown for the spectral feature image-processing methods applied to CIE standard sky classification. The RBR and NRBR spectral features introduced noise, but the resultant combination, C1, reduced misclassification, improving on the traditional RGB color space. It therefore appears to be an adequate alternative image-processing method for CIE standard sky classification using sky camera images. The RAS channel theoretically removed atmospheric scattering, but the confusion matrix never reflected a better performance than the RGB color space. The confusion matrix has demonstrated that it cannot distinguish the CIE sky types 1, 3, and 5. The RAS method also introduced too much noise in cloudy-to-partial sky types.
Finally, the confusion matrices are shown in Figure 13 for the texture filter processing methods applied to CIE standard sky classification with ANN. As can be seen, the texture channels generally performed well, especially the EM channel, while LR largely failed for the partial and overcast CIE standard sky classifications.
A detailed study for the CIE standard sky classification into three categories is presented in Figure 14, where the confusion matrices presented in Figures 10-13 were divided into four submatrices: overcast (CIE standard sky types 1 to 5), partial (CIE standard sky types 6 to 10), clear (CIE standard sky types 11 to 15), and critical, which refers to cases classified out of category. The red line indicates the RGB result, taken as a baseline for accuracy improvements, Δ(Acc). Some of the image-processing methods for classifying certain sky categories are highlighted in Figure 14. RBD, D, and B showed the best performance for the detection of overcast skies, increasing the performance of each respective ANN. G, S, and GS achieved better results for the detection of partial skies. Clear skies were the category in which the conventional RGB color space achieved its best performance. Some channels highlighted certain sky types but drastically failed to classify other types. The blue channel saturated in clear skies, to such a point that its performance was almost the worst for clear skies detection. This behavior was also noted for the D channel. Unfortunately, no image preprocessing method drastically improved the RGB classification in the three subcategories (clear, partial, and overcast conditions). However, Y, green, red, RBD, V, and EM processing methods were prominent in one or two categories and their results were acceptable in all other categories, as shown in Table 6.
Figure 9: Improvement in ANN accuracy, Δ(Acc), for CIE standard sky classification in three sky categories: overcast, partial, and clear conditions, using as input the image processed by each image-processing method summarized in Table 1, over ANN accuracy obtained with the RGB images as input.
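The grouping of the 15-type confusion matrix into three categories, together with the notion of critical misclassification, can be sketched as follows. The sketch assumes that sky types 1 to 5 are overcast, 6 to 10 partial, and 11 to 15 clear; the function names and the sample predictions are hypothetical and are not results of the study.

```python
def confusion_matrix(targets, predictions, n_classes=15):
    """Rows are target sky types and columns are predicted sky types (both 1-based)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for target, prediction in zip(targets, predictions):
        matrix[target - 1][prediction - 1] += 1
    return matrix


def category(sky_type):
    """Map a CIE standard sky type (1-15) to its assumed three-category label."""
    if sky_type <= 5:
        return "overcast"
    if sky_type <= 10:
        return "partial"
    return "clear"


def critical_rate(targets, predictions):
    """Share of cases predicted outside the category of their target sky type."""
    critical = sum(category(t) != category(p) for t, p in zip(targets, predictions))
    return critical / len(targets)


# Hypothetical test cases: two exact matches, two in-category errors, one critical error.
targets = [1, 3, 7, 12, 15]
predictions = [1, 5, 11, 12, 14]
matrix = confusion_matrix(targets, predictions)
rate = critical_rate(targets, predictions)
```

In this toy example, type 3 predicted as type 5 remains within the overcast category, whereas type 7 predicted as type 11 crosses from partial to clear and is the only critical case.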

Almost all the image-processing methods reduced critical mistakes or misclassification, which should as far as possible be avoided. Following this criterion, RBR, RBD, NRBR, ARGD, H, and S were discarded as preprocessing image methods for ANN sky classification of sky images.

Conclusions
Sky classification and cloud detection from sky images and machine learning can be largely improved through preliminary image processing, reducing errors in classification and simplifying algorithms. In this study, 22 sky image-processing methods have been reviewed, including the three most common categories: color spaces, spectral features, and texture filters. The CIE standard sky classification has been selected to determine the characteristics of the sky, as it is recognized as representative of the atmospheric conditions. A very extensive unbiased dataset has been used, including 1,500 sky images and their corresponding CIE classification, calculated through the Normalized Luminance method from sky luminance distribution data.
The Artificial Neural Network (ANN) was the selected machine learning algorithm.
As a first conclusion, digital cameras equipped with a fisheye lens can be used as alternatives to sky scanner devices for ANN-assisted CIE standard sky classification. The accuracy of the classification algorithm can be improved with adequate preliminary image processing that highlights the sky image information and optimizes the algorithmic structure.
Figure 14: Acc for the CIE standard sky classification through sky images and ANN, using the different image-processing methods recorded in Table 1.
HSV was a better color space than RGB, as were the monochromatic channels R, G, and B, for classifying the skies on the basis of the images into the fifteen CIE standard sky types. Only the V individual channel of HSV worked better than both HSV and RGB. Spectral feature channels Y and L showed better behavior for sky classification than the RGB color space, but they used more neurons in the hidden layer. Among the rest of the spectral feature channels, only RBD and C1 significantly improved ANN accuracy. Texture filters added no significant advantages over the RGB color space.
For CIE standard sky classification as clear, partial, and overcast conditions, RGB appeared to be the best image-processing method and only the monochromatic channels G and B, GS, and the composed spectral feature C1 improved the accuracy of the RGB color space. No improvement in ANN performance was therefore noted with the use of extra channels.
In contrast to previous studies [14], which have their weakest accuracy in cloudy conditions, several channels have worked successfully, improving the accuracy of the machine learning algorithm by 10% over the RGB color space for cloudy skies. These channels were B, R, S, V, ARGD, RBD, C1, C2, Y, STD, and EM.
RGB and its primary channels, R, G, and B, were not good enough for dark cloudy conditions, due to image-processing information losses. While traditional cloud detection has usually omitted the G channel, both the G and the B channels have been shown to be equally effective. In contrast, the B channel tended to saturate in clear sky conditions. The confusion matrices highlighted that the ANN failed to distinguish CIE sky types 1, 3, and 5.
The main conclusion is that the use of a specific image-processing method could improve the accuracy of an ANN algorithm, depending on the information required from the image for the classification problem. Future work will focus on the classification of skies according to the CIE standard using neural networks specifically designed for the classification of images, such as convolutional neural networks.

Data Availability
The neural network database used to support the findings of this study has been deposited in the Institutional Repository of University of Burgos (https://riubu.ubu.es/).

Conflicts of Interest
The authors declare that they have no conflicts of interest.