Classification of Continuous Sky Brightness Data Using Random Forest

,


Introduction
Global progression of human civilization cannot be separated from the multidimensional side effects it causes. As emphasized by Lewis and Maslin [1], human activity affects global environment in many ways, including major changes in biogeochemical cycles, increase of species extinction rates that lead to extreme term of the sixth mass extinction in Earth's history, and significant alteration of land surface, atmosphere, and oceans. In the last decades, extensive uses of artificial lights have also altered the daily rhythm of human activity and even disturbed other species' life cycle. Excess use of artificial lights inundates natural dark sky at night with light pollution which has various negative impacts ranging from the disruption of the ecology [2], health problems [3,4], wasted energy [5], and also loss in cultural asset [6]. With those impacts, concerns on light pollution increase significantly these days. Numerous efforts to quantify the level of light pollution and to mitigate its impacts have been conducted by many researchers through various measures using different types of instruments to obtain one-or two-dimensional sky brightness or even the spectra [7]. Quantification of light pollution on a global scale was conducted by means of accurate modeling of artificial light at night based on satellite imagery [8,9]. On the other hand, local measurements of sky brightness at night and the portion of light pollution were performed by means of observation campaigns and continuous monitoring using different instruments [10][11][12]. Among those measures, utilization of low-cost photodiodebased Sky Quality Meter (SQM) makes sky brightness measuring and monitoring significantly easier. e increasing use of the SQM for continuous sky brightness measurement opens the stream of data which are valuable for the mitigation of light pollution and other purposes. Measurements conducted in various locations or commenced in long time span increase the volume and variety of the data generated. Typically, SQM produces approximately 50 kilobytes of data in one night when it is operated in one-minute cadence, but multiyear data product will be quite large. A massive dataset can be aggregated into univariate or multivariate distribution of sky brightness for further analysis and inference. Density plots or jellyfish diagrams are often used to visualize the data acquired systematically in long duration [12,13].
Careful examination of the nightly data is required in order to suppress the noise and to avoid biased inference. In most of the cases, SQM data need to be evaluated whether they are moonlight affected or not [14], whether they are common or outliers, whether they are obtained during clear, cloudy, or overcast night [15,16]. Classification of the nightly SQM data is also needed for specific research objectives. For example, to evaluate the amplification of ecological light pollution by clouds, Kyba et al. [15] used SQM data acquired from three distinguished observing conditions, namely, clear, cloudy, and overcast night. is kind of evaluation and classification is a laborious task such that it should not be done manually and machines should work automatically for this purpose.
Nightly SQM data are analogous to the diverse light curves of variable stars such that similar automatic classification algorithms [17,18] are expected to be able to evaluate and classify sky brightness data from SQMs. In variable stars classifications, both periodic and nonperiodic features can be extracted from the light curves and then stuffed to particular machine learning algorithm for obtaining a certain classification model. e algorithm can be classification trees, random forests, Support Vector Machines, likelihood classifier, or artificial neural networks [18,19].
is study aims to develop a model for classification of nightly SQM data and to evaluate its performance. Such a model is expected to help researchers to handle the data appropriately and to compile good data from an agglomeration of data with different qualities.

Data and Feature Extraction. Several Unihedron Sky
Quality Meters (type LU-DL) were used to measure sky brightness over eight different locations in Indonesia since April 2018 and were still working in early 2020. is type of instrument uses converging lense and color compensating filter such that it has a moderate field of view of about 20°( FWHM) and an effective spectral window between 400 and 600 nm [20,21]. SQM measures sky brightness in terms of magnitude per square arcsecond (mpsas) with absolute precision of 0.1 mpsas [7]. With a fine performance, SQMs were used extensively in enormous research and monitoring around the world even for the calibration of global map of light pollution [9].
Zenithal sky brightness data used in this study were acquired from eight locations with different conditions in terms of artificial light contamination. As summarized in Table 1, some locations can be considered to be situated in urban areas where artificial light pollution is overwhelming, while the others are in periurban and rural areas. Sky brightness measured at those locations may behave differently as local atmospheric conditions and light pollution influence the observed sky brightness. Clouds may darken the rural sky as it covers a fraction of celestial lights and high altitude atmospheric emissions. On the other hand, clouds partially reflect artificial lights from ground and brighten the sky or amplify the light pollution [15,16] such that the natural regime of sky brightness is masked [22]. By acquiring data from locations with different conditions, global characteristics and behavior of sky brightness can be understood more thoroughly.
At every location, continuous measurement of sky brightness was conducted from April 2018 to December 2018, though measurements at some locations only cover shorter durations. One-minute sampling rate was used such that a dataset obtained in one observing night contains up to 750 rows. e two most important variables recorded and used in the current study are time (in UT and local time) and sky brightness (in magnitude per square arcsecond, mpsas). Using those two variables, light curve for every observing night can be constructed for further analysis. e magnitude recorded by the SQM is in inverse logarithmic scale so that higher value of mpsas means darker sky; e.g., each 1 magnitude increase means that the observed patch of the sky is 2.5 times dimmer. Typical brightness of a clear rural sky is more than 21.6 mpsas while the light polluted sky in the proximity of a city may be hundreds times brighter than the natural sky (∼16 mpsas).
By examining the light curves visually, experienced users can assess and classify them into several classes (see Figure 1). In this study, six distinguished classes are defined. ey are class-1 for peculiar/invalid data, class-2 for data acquired during overcast conditions, class-3 for cloudy nights, class-4 for clear and moonless nights, class-5 for cloudy nights with moonlight, and class-6 for clear and moonlit nights. Weird data are characterized by unreasonable value of sky brightness (e.g., extremely high or low) or light curves with pattern beyond expectation (e.g., perfectly flat light curve during moonlit night). If the light curves exhibit fluctuation or brightening due to clouds for more than 50% of the time, then they are categorized as class-2; otherwise, they belong to class-3. Brightening due to scattered moonlight as a function of its phase and zenith distance is a gradual and smooth process [23] which can be identified quite easily (for class-6 data), unless clouds interfere and the light curves become wiggly (class-5). Class-4 contains smooth light curves with plateau (constant sky brightness) for the whole night. ose characteristics can be described using a number of features, some of which are often found in the analysis of light curves of variable stars [19,24,25]. Table 2 summarizes 15 features extracted from the data.
Most of the features were extracted from a single variable which is the sky brightness. Beside the raw sky brightness data, smoothed light curve which is obtained through median filtering was taken into account. Smoothing window of 61 data points which corresponds to 1-hour continuous measurements was used. is smoothed model was required as the basis to quantify the fluctuation of sky brightness. Lastly, time of measurement was used to define night time and to calculate the ephemeris of the moon such that moon phase and zenith distance were extracted as features.
After the feature extraction process, the dataset which contains 1250 rows and 15 columns is ready for the machine learning stage. Each row in this dataset has its own label (1, 2, . . ., 6) according to the manual evaluation by one of the authors (RP). For every data row, the evaluation was conducted by displaying the nightly light curves superposed with the plot of moon's altitude as a function of time and also the density function constructed from the sky brightness measured in a month. e density function can be regarded as the reference to determine whether the data obtained in a particular night is peculiar or not. According to the characteristics observed from the plot, a certain class was chosen to describe the quality of the data obtained in the whole night. e process was repeated several times in order to suppress the doubt and reduce subjective bias. e dataset is randomly split into 2 datasets: 875-row training dataset and 375-row testing dataset, while crossvalidation data is a subset of the training dataset. Figure 2 summarizes the distribution of classes and training/testing dataset in terms of observation time and locations.

Random Forest Classifier.
Advancement of computing capabilities in the current era enables astronomers to conduct statistical data analysis and inferences more easily. Machine learning has become a popular approach for dealing with massive observational data. Implementations of machine learning technique range from classification of variable stars [18,19] and regression of photometric redshift [25] to intelligent recognition of predigital images [26].
In this study, Random Forest [27] is utilized for classification tasks since it has several advantageous properties. It works well with data with high dimensionality and it does not require scaling of features or variables. By default, Random Forest Classifier only uses a randomly selected subset of the features such that training or classification process can be quick and parallelization can be performed to make the process faster. is algorithm is considered to be robust to outliers, good in handling nonlinear data, and also able to handle imbalanced data as it tries to minimize the overall error rate. Low bias and low variance are the other advantageous properties of the Random Forest since it is basically an ensemble of decision trees. It is constructed from training dataset through bootstrap aggregation (bagging), where bootstrapped sample is used to grow numerous decision trees, in which sample is recursively divided into more homogeneous subgroups based on the best splitting features and thresholds. Bagging is implemented so that Random Forest becomes robust against overfitting. Gini impurity or information gain is often used as the criteria for selecting the best split during the training. Each tree provides predicted classification, while the average of all trees is the final classification model with higher accuracy compared to single decision tree. In most classification cases, the performance of Random Forest is between Support Vector Machine and Deep Learning [19,28].
Random Forest Classifier from the Python Scikit-Learn [29] is used in this study. Several hyperparameters can be imputed to the algorithm to obtain optimal classification model, among which are the number of trees (n_estimators), criteria to measure the quality of a split (using Gini impurity), and the maximum number of features used (max_features). As demonstrated by Criminisi et al. [30],   the number of trees or the size of the forest influences the smoothness of the posterior probabilities resulting from the classifier and also the confidence. e depth of the trees (max_depth in Scikit-Learn) also influences the prediction confidence, but Random Forest with too deep trees tends to overfit the training data. In this study, max_depth is not set so that the nodes are expanded until all leaves are pure. Balanced Random Forest which assigns class weights is applied, in order to reduce the bias due to imbalanced data.

Results and Discussion
In order to get an optimum model for classification of sky brightness data, the training dataset, which contains 875night sky brightness data with 15 features, was stuffed to Random Forest Classifier. Tenfold stratified cross-validation was used to produce unbiased measure of classification performance.
e following analyses are based on the Random Forest Classifier with 300 trees which accounts for a maximum of 10 features. ese optimum hyperparameters are deduced from randomized optimization based on the out-of-bag errors resulting from multiple training processes with varied number of trees and maximum features accounted for. In this optimization process, the first hyperparameter is randomly selected between 10 and 1000, while the second one ranges from 1 to 15. Execution time increases linearly as the numbers of trees and maximum features increase.

Feature Importance. Not only does Random Forest
Classifier produce the most accurate model for classifying the data, but it also identifies which features are the most efficient for the classification. In Random Forest, feature importance basically is the average value of importance obtained from all decision trees. In each decision tree, feature importance is a function of purity such that a more important feature is able to split the samples into two subsamples with lower impurity. e obtained feature importance is normalized such that the sum of all values is one.
Feature importance which represents predictive strength of any feature can be utilized to rank feature and to avoid dimensionality curse. In this study, 15 features cannot be considered as a big problem of dimensionality that needs to be pruned, especially when Random Forest is the machine. As mentioned before, this algorithm only selects a portion of the features (with typical size of the square root of the total number of features) such that the execution time grows moderately by O(n 1/2 ). However, there are other benefits of analyzing feature importance such as getting a better understanding of the underlying processes that generate the data and reducing the storage requirements [31]. e analysis of feature importance may serve better understanding of the behavior of sky brightness variation in many conditions. Proper feature selection might be of great importance when other machine learning algorithms are used, e.g., Support Vector Machine which is more affected by unimportant or noisy features. Figure 3 displays the importance of 15 features extracted from the sky brightness data for multiclass case and binary classification case. In the former case, feature importance is byproduct of the Random Forest during the classification of the training dataset into six different classes. Table 2 also summarizes the obtained feature importance in which the uncertainties are the standard deviations from 10-fold crossvalidation. In binary classification case, feature importance is calculated separately during yes-no classification, e.g., determining whether a sample belongs to class-1 or not. e overall importance or rank of the features may change slightly by the stochastic nature of the algorithm and also change slightly if the training dataset is altered, but the composition of the top half features does not change.
In general, the Phase becomes the most important feature since this variable effectively classifies the majority of the data. is result is in line with the fundamental classification of sky brightness data into moonlit or moonless class [14]. As displayed in Figure 3, the Phase is ranked first for identifying data with class-2, -3, or -5 while identification of clear sky data (class-4 or -6) requires other features such as MADRes or IQR that works better. Smoothness of the sky brightness data is an important key for finding clear sky data from a pile of data, while for moonless nights, flat sky brightness means small dispersion which can be described by IQR. Apart from those cases, pairwise feature importance shows that identifying data as class-1 requires statistical quantities that describe both dispersion and the shape of distribution. Skew, Kurt, and P05 represent the shape of distribution, while STD and Range describe dispersion. From the perspective of experienced users, class-1 data usually deviates from expectations such that detecting outliers is the key.

Feature Selection.
Some features have significant correlations with other features since they describe similar characteristics. For instance, Range, STD, IQR, MAD, and PDMed represent the dispersion of the data while PDMod, MADSlope, and MADRes represent the data fluctuation as a function of time. Moon Phase and the measures of dispersion are moderately correlated because sky brightness increases as the moon with higher illumination (or Phase) approaches instrument's field of view. Consequently, sky brightness deviates from moonless level and the dispersion increases. Range and STD also correlate with P05 and P95 since those measures of dispersion are more sensitive to the tail of data distribution compared to IQR and MAD. However, some features may be more effective in classifying the data compared to the others. Figure 4 maps the coefficient of determination (R 2 ) between two features. In machine learning, some features that highly correlate with other features can be excluded in order to reduce dimensionality and optimize the computation. e inclusion of such features may not increase the classification performance significantly. Based on the correlation analysis and the feature importance scoring, five features can be excluded for further training/prediction. ey are Range, MAD, and PDMed which describe the Advances in Astronomy dispersion of the data; PDMod which describes fluctuation; and also P50 which highly correlates with P95. e dispersion of the data is sufficiently represented by IQR which gains 10% of importance while the fluctuation is described better by MADRes which is ranked second. P95 is preferred instead of P50 as P95 is better in detecting outliers or identifying class-1 data. e evolution of misclassification error rate as the increase of maximum features utilized can be viewed in Figure 5. Classification based on single feature is approximately 50% accurate, and the accuracy increases abruptly by the introduction of some other important features. e average misclassification rates are below 20% when five or more features are used in training. Minimum errors of about 18% are achieved by using 15 features or 10 carefully selected features. However, training with 10 features can be performed 20% faster, even though the difference is not significant when dealing with a considerably small sample of size 10 3 .

Classification Performance.
e overall performance of the classifier can be evaluated by implementing it on the testing dataset. Confusion matrix which summarizes the accuracy of the prediction compared to the expert classification is displayed in Figure 6. Using only 10 selected features, average accuracy of 92% was achieved with the minimum accuracy of 89% for identifying class-5 (cloudy moonlit sky) data and the highest accuracy of 98% for identifying class-1 (invalid) data. However, these high accuracy rates might be less informative.
As discussed in the literature [32,33], accuracy is not a good metric to evaluate the performance of any classifier handling imbalanced data because it has low informativeness, produces less distinctive and less discriminable values, and also is in favor of dominant class. Alternatively, other metrics such as F-score and G-mean can be computed using the following equations: F-score is basically the harmonic mean between recall and precision while the G-mean is the geometric mean between the two. Recall or true positive rate measures the ability of the classifier to find data with certain class while precision describes the fraction of correct classification in a retrieved class. In general, recall anticorrelates with precision such that adjustment in the classifier hyperparameters may   yield high recall while precision decreases such that proper averaging between the two scores is needed. Table 3 summarizes performance scores calculated from the confusion matrix in Figure 6, but for binary classification in every class. e average values of the scores (weighted by the number of data points in each class) are considerably good while F-score and G-mean tend to be convergent. Nevertheless, misclassification occurs such that some scores are below average. Some peculiar data of class-1 were misclassified as moonlit data since some features (e.g., moon phase, measure of dispersion, and fluctuation) resemble the characteristics of moonlit data. Significant portions of misclassification can also be found for class-3, class-4, class-5, and class-6 where the justification of cloudiness during the night is the key aspect for identifying data of those classes. However, those relatively low scores are not necessarily caused by a poor classification model. Instead, inconsistency of manual classification by experienced users also contributes to the scores achieved since they are calculated by assuming that manual classification is true. is argument may explain the misclassification observed during dichotomy between class-3 and class-4 and also between class-5 and class-6.
Aside from the aforementioned scores, the overall performance of the classifier can also be evaluated according to the posterior distribution of the data. Figure 7 shows posterior probability distribution functions (PDF) of sky brightness over eight selected stations after classification processes. Data of class-1 is not displayed in the chart as it is considerably meaningless.
In most cases, clear moonless sky brightness (class-4, green) shows a sharp distribution with full width at half maximum of about 1.5 mpsas, except for the data from Agam (AGM) where multimodal distribution is obvious. Examination of the AGM dataset reveals that this multimodal distribution is produced by several flat light curves with different levels categorized as class-4. e peaks of the distribution vary from ∼16 mpsas for data acquired at light polluted urban regions to ∼21 mpsas for data obtained at considerably dark rural areas. is implies that the classification model works well for sky brightness data with different degrees of light pollution. Additionally, the classifier is expected to perform equally well for the data acquired at high latitude regions where the duration of dark nights can be shorter and clear moonless sky does not necessarily produce flat light curve [12]. For data with such characteristics, the importance of IQR as the distinguishing feature is expected to be overtaken by other features that represent data fluctuation (PDMod, MADRes) rather than dispersion.
Next, data of class-2 (orange) and class-3 (yellow) have broader and more left skewed distribution as cloud appearance amplifies the light pollution and brightens the night sky [16,17]. Actually, darkening of the night sky was observed several times in rural station of Agam (AGM) and urban station of Bandung (BDG) during overcast conditions where thick clouds, fog, covered the sky or even precipitation occurred. At those moments, darkening started before twilight and the observed brightness reached 22.5 mpsas (in AGM) with less fluctuation compared to the data obtained during partially cloudy conditions. is pattern was also reported by Jechow et al. [16]. Presence of dew and water droplets in front of the sensor container also influences the measurements. BDG data show darkening during overcast conditions even though it was acquired in the urban area where brightening due to the clouds is expected. Because of this anomaly, data with indications of darkening was manually classified as peculiar (class-1) and omitted in further statistical analysis. For instance, Rezky et al. [34] developed an algorithm based on Hough transform to identify the actual clear sky brightness from SQM data which is served as density plot (or scotogram).
is value (or function) is expected to be linear during the ideal night without any interference from clouds, moonlight, or Milky Way patches. e inclusion of peculiar or invalid data in this analysis is likely to obscure the linear pattern observed.
Based on Figure 7, data from Pasuruan (PSR) seem to be outlier because the posterior distributions of class-2 and class-3 have bumps on the right side (darker sky). On several nights, flat light curves suddenly jump to higher magnitude at around 3 am local time without any indication of clouds. e attenuation of sky brightness due to clouds in this region is highly unlikely since the location is situated close to the city and the brightening is expected during cloudy and overcast nights. On the other hand, darkening by smoke is plausible since the measurement was conducted at the location which is somewhat close to the industrial complex. Due to the bumps, the algorithm correctly identifies overcast conditions. Data obtained during clear moonlit sky (class-6) have the widest distribution whose left tail reaches 12 mpsas in most cases. Data of class-5 (cloudy moonlit) has shorter tail because clouds attenuate scattered moonlight and make moonlit sky appear a bit darker. Considering the shape of the distribution, it is noteworthy that failure in discriminating data acquired in overcast conditions may lead to overestimation of night sky brightness up to 1 mpsas. In order to properly obtain a single representative value of local sky brightness, it is highly recommended that only data of class-4 is used. is careful selection is comparable to the significant magnitude which is defined as the average of the highest third of night sky brightness values recorded during ideal night (m 1/3 ) [35]. e ideal night in this regard is defined as the condition of astronomical darkness, when the sun is below -18°and the moon is below −5°. Recently, it is  [36]. In short, these statistical selections are the other ways of selecting clear and moonless data that represent the true value of night sky brightness. e stability of sky brightness during the whole night becomes the main key to the selections both in the studies by Bará et al. [35,36] or in the study reported here. e difference is that Bará et al. [35,36] used statistical approaches to the density distribution constructed using long term data while the current study evaluates nightly characteristics to determine the class. For the case of data obtained in tropical region where the clear sky fraction is somewhat lower [37], there is a caution that the sharp distribution of the ideal data is likely to be overwhelmed (or smoothed) by data obtained during cloudy or overcast conditions. In general, the classification model performed well in following the patterns of data it used in training. Beyond this point, it is normal to question whether the classification is also in agreement with the actual weather data. Additional data from instruments such as all-sky camera and weather station or even ex-situ observation will be required to confirm the weather conditions deduced from SQM data. Not all stations invoked in this study are equipped with operational weather stations so that further data compilation from various sources will be required to discuss this issue.

Conclusions
In this study, a Random Forest Classifier of 300 trees that invokes 10 features from nightly sky brightness data has been trained and tested. Observation data in the form of light curves need to be categorized into six different classes according to their qualities and observing conditions (peculiar data, overcast, cloudy, clear, moonlit-cloudy, and moonlit-clear). Initially, 15 features were extracted from the data, but some of the features were not selected for further classification as they highly correlate with other better performing features. Among those features, the moon phase becomes the most important feature for classification followed by some measures of dispersion and fluctuation.
ere is a slight confusion to discriminate data obtained in cloudy, overcast, and clear nights such that recall and precision scores achieved are below average. is confusion cannot be judged as pure misclassification because the foundation of the Random Forest Classifier is training dataset evaluated or labelled manually.
is laborious task is not independent from error and inconsistency while machine learning technique serves better consistency. Apart from that, the classifier produces a good and reasonable set of posterior distribution functions where the distribution of class-4 (clear sky) data is the most concentrated than others. e distributions of class-2 (overcast) and -3 (cloudy) data are left skewed following the notion that the presence of clouds tends to amplify the light pollution, while data of class-5 (cloudy moonlit) and -6 (clear moonlit) have the largest dispersion/spread. However, further study using actual weather data (either in situ or ex situ) to examine the conformity of the classification model is interesting to do in the future.
Currently, open source software, such as PySQM developed by a group in Universidad Complutense Madrid, is available for SQM user community with the main functions of visualization and standardization of data produced using SQM. Evaluation and classification tasks can be regarded as fundamental and crucial tasks to do after those basic visualization and standardization processes.

Data Availability
e data used in this study can be provided by the corresponding author upon request. Codes related to this study are available through https://github.com/rhorom/skybright.