Random Forest-Based Approach for Maximum Power Point Tracking of Photovoltaic Systems Operating under Actual Environmental Conditions

Many maximum power point tracking (MPPT) algorithms have been developed in recent years to maximize the energy produced by PV systems. These algorithms are not sufficiently robust with respect to fast-changing environmental conditions, efficiency, steady-state accuracy, and tracking dynamics. Thus, this paper proposes a new random forest (RF) model to improve MPPT performance. The RF model can capture the nonlinear association between predictors, such as irradiance and temperature, to determine the maximum power point accurately. An RF-based tracker is designed for 25 SolarTIFSTF-120P6 PV modules with a peak capacity of 3 kW, using two high-speed sensors. For this purpose, a complete PV system is modeled using 300,000 data samples and simulated in the MATLAB/SIMULINK package. The proposed RF-based MPPT is then tested under actual environmental conditions for 24 days to validate its accuracy and dynamic response. For further validation, the response of the RF-based MPPT model is also compared with those of the artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) algorithms. The results show that the proposed MPPT technique gives a significant improvement over the other techniques. In addition, the RF model passes the Bland–Altman test, with more than 95 percent acceptability.


Introduction
Solar energy is inexhaustible, free, and clean, and it has recently come to be considered the core of renewable energy (RE), primarily because of the depletion of fossil fuels and environmental pollution [1]. Among various RE resources, photovoltaic (PV) systems are gaining popularity in a wide range of applications, from small building-integrated systems to large-scale utility systems [2]. However, PV power generation is intermittent under changing weather conditions [3]. Moreover, the power generated by a solar cell depends on its nonlinear power-voltage (P-V) and current-voltage (I-V) characteristics, which vary with irradiance (G) and temperature (T) [4]. Regardless of the size and type, the crucial issue for any PV system is the efficiency […]
2 Computational Intelligence and Neuroscience
[…] observe whether the converter power is increasing toward the MPP, and in the next step the reference current/voltage is increased by the amount of B. The performance of the P&O method depends on the step size applied to the current/voltage reference. However, oscillations occur around the MPP, which lead to power loss. To avoid large oscillations, [12] suggested reducing the applied step size, at the cost of a slower response. Meanwhile, the HC technique closely resembles P&O. The difference between the P&O and HC methods is that HC updates the operating point of the PV system by perturbing the duty cycle instead of the current/voltage reference. If the power is increasing, the operating point is updated by perturbing the duty cycle by the applied step size in the same direction; otherwise, the operating point is moving away from the MPP and the perturbation direction is reversed. However, HC is prone to failure under large changes in irradiance [13]. To overcome some of the limitations of the P&O and HC methods, the IC approach was proposed within the conventional MPPT category.
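The P&O rule described above can be sketched as a single update step. This is a minimal illustration under stated assumptions, not a controller used in this study: the function name `perturb_and_observe`, the fixed step size, and the toy P-V curve are all hypothetical. The same logic applies to HC if the perturbed variable is the duty cycle instead of the voltage reference.

```python
def perturb_and_observe(v, p, v_prev, p_prev, step):
    """One P&O iteration: return the next voltage reference.

    v, p: present PV voltage and power
    v_prev, p_prev: voltage and power at the previous iteration
    step: fixed perturbation size (a design choice)
    """
    dp = p - p_prev
    dv = v - v_prev
    if dp == 0:
        return v  # no change in power: hold the reference
    if (dp > 0) == (dv > 0):
        # Power rose after raising V, or fell after lowering (or holding) V.
        return v + step
    # Power rose after lowering (or holding) V, or fell after raising V.
    return v - step


def toy_power(v):
    """Hypothetical P-V curve with its peak (900 W) at 30 V."""
    return 900.0 - (v - 30.0) ** 2


v_prev, v = 20.0, 21.0
trace = []
for _ in range(50):
    v_next = perturb_and_observe(v, toy_power(v), v_prev, toy_power(v_prev), 1.0)
    v_prev, v = v, v_next
    trace.append(v)
print(trace[-8:])
```

Once the peak is reached, the reference keeps cycling through 29, 30, and 31 V instead of settling at 30 V, which is the steady-state oscillation (and power loss) attributed to P&O above.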
The idea behind IC is to locate the MPP from the slope of the PV power-voltage curve [14]: at the MPP, dP/dV = 0, which is equivalent to dI/dV = −I/V, so comparing the incremental conductance dI/dV with the instantaneous conductance I/V indicates on which side of the MPP the operating point lies. This method improves dynamic performance and tracking accuracy under rapidly changing environmental conditions. However, the IC method also suffers from some oscillation around the MPP, as well as power losses caused by noise and measurement errors. Furthermore, the IC method has a higher computational burden than the P&O method.
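The IC decision rule can likewise be sketched as one update step. This is a minimal illustration, assuming a voltage-reference controller and a hypothetical tolerance `tol` for the test dI/dV ≈ −I/V; the function name and the linear toy I-V curve are illustrative only.

```python
def incremental_conductance(v, i, v_prev, i_prev, step, tol=1e-3):
    """One IC iteration: return the next voltage reference (assumes v > 0).

    At the MPP dP/dV = 0, i.e. dI/dV = -I/V; left of the MPP dI/dV > -I/V,
    right of it dI/dV < -I/V.
    """
    dv = v - v_prev
    di = i - i_prev
    if dv == 0:
        if di == 0:
            return v  # nothing changed: hold
        return v + step if di > 0 else v - step  # irradiance change at fixed V
    g_inc = di / dv   # incremental conductance dI/dV
    g_neg = -i / v    # negative instantaneous conductance -I/V
    if abs(g_inc - g_neg) <= tol:
        return v      # dP/dV is approximately 0: at the MPP, hold
    return v + step if g_inc > g_neg else v - step


def toy_current(v):
    """Hypothetical linear I-V curve; P = V * I peaks at V = 25 V."""
    return 10.0 - v / 5.0


v_prev, v = 10.0, 11.0
for _ in range(40):
    v_next = incremental_conductance(v, toy_current(v), v_prev, toy_current(v_prev), 1.0)
    v_prev, v = v, v_next
print(v)  # 25.0
```

In this noiseless toy case the reference stops exactly at 25 V, where dI/dV = −I/V; with measurement noise the tolerance test is rarely met exactly, which is one source of the residual oscillation mentioned above.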
In the soft computing-based MPPT category, the most widely studied approaches are fuzzy logic control (FLC) [15], artificial neural networks (ANN) [16], and other computational intelligence (CI) [17] methods. The main advantage of FLC-based methods is that they do not require a mathematical model of the system. Thus, FLC-based MPPT has been frequently implemented in PV systems in recent years [18,19]. However, the performance of FLC depends on the rule base, the number of rules, and the membership functions [20], which are determined by a time-consuming trial-and-error procedure. Another well-known approach in this category is ANN. In MPPT applications, ANN is used to estimate unknown parameters [21] such as the reference current/voltage or the duty cycle. However, the weights associated with the neurons must be accurately determined by a training process before the network can supply the reference current (I_MPP) or reference voltage (V_MPP) to the MPPT controller. Moreover, ANN requires a large training dataset before it can be trained and implemented in the MPPT system. Another popular soft computing approach to MPPT is based on CI methods, which are nature-inspired computational methodologies that address complex real-world problems. These methods can be divided into two groups: swarm intelligence algorithms (SAs) and evolutionary algorithms (EAs). The most popular SAs are particle swarm optimization (PSO) [22], artificial bee colony (ABC) [23], and ant colony optimization (ACO) [24]. The most popular EAs are the genetic algorithm (GA) [25], differential evolution [26], and the lightning search algorithm [27]. PSO has been used to optimize a nine-rule FLC for MPPT in a grid-connected PV inverter, in which the FLC generates a DC bus voltage reference [28]. A hybrid GA-ANN MPPT is proposed in [29].
In this approach, GA obtains the optimized values of the array voltage and power for different irradiance and temperature conditions. Similarly, the authors of [30] used GA to optimize FLC-based MPPT. However, CI methods have limiting factors such as becoming trapped in local minima and premature convergence. Most of the aforementioned methods have been criticized as inefficient because they cannot reliably locate the exact MPP. Current challenges in detecting the accurate MPP lie in adapting the algorithms to fast-changing environmental conditions, efficiency, steady-state accuracy, and the response speed of the tracking algorithm. In a number of previous studies, the problems posed by actual environmental conditions were not fully addressed. Hence, the aforementioned methods do not provide an integrated solution to all of the problems encountered in real environmental conditions and are therefore inadequate for producing an effective MPPT system.
Recently, a new soft computing approach known as random forest (RF) has received attention in many applications. The authors of [31] presented a supervised classification method based on RF to identify the layer from which groundwater samples were extracted, and they reported that RF performed much better than linear discriminant analysis and decision tree-supervised classification methods. Ash Booth et al. [32] proposed an expert system that uses RF machine learning techniques to predict price returns over seasonal events, and then used these predictions to develop a profitable trading strategy. Their results show that the RF approach outperforms other ensemble techniques in terms of both profitability and prediction accuracy. RF has also been applied to improving rainfall rate assignment [33], assessing visual attention [34], resampling field spectra [35], and quantifying aboveground biomass [36]. In these applications, the authors concluded that, with proper training parameters, the RF model has higher stability, higher robustness, and better success rates than other models. Therefore, the RF approach is expected to yield better outcomes when implemented in MPPT for PV systems.
This paper designs and implements the RF method to track the MPP of a PV system accurately while accounting for fast-changing environmental conditions. The system is modeled in the MATLAB environment to demonstrate the performance of the proposed controller.

PV Model and Maximum Power Point
The power output of a PV system depends on its voltage and current characteristics. Solar irradiation and temperature are the two main parameters that determine the operating point of the PV panel and hence the MPP [37]. The equivalent electrical circuit of a PV cell, which is used to obtain its characteristics, is shown in Figure 1. The circuit contains a current source, a diode, a series resistor, and a parallel-connected resistor. The mathematical model of the circuit, which gives the cell output current I_PV, can be expressed as follows [38]:
$$I_{PV} = I_{ph} - I_{0}\left[\exp\left(\frac{qV}{nkT}\right) - 1\right] \qquad (1)$$
where I_PV is the cell output current (A), I_ph is the light-generated current (A), I_0 is the cell reverse saturation current or dark current (A), q is the electronic charge (1.6 × 10^−19 C), V is the cell output voltage (V), n is the ideality factor, k is Boltzmann's constant (1.38 × 10^−23 J/K), and T is the cell temperature (K).
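Equation (1) translates directly into code. The sketch below is a minimal illustration; the function name `cell_current` and the example parameter values (I_ph = 7 A, I_0 = 10 nA, n = 1.3, T = 298.15 K) are hypothetical and are not taken from Table 1.

```python
import math

Q = 1.6e-19     # electronic charge (C), value used in the text
K_B = 1.38e-23  # Boltzmann's constant (J/K), value used in the text


def cell_current(v, i_ph, i_0, n, t_kelvin):
    """Cell output current I_PV from (1) for cell voltage v (V)."""
    return i_ph - i_0 * (math.exp(Q * v / (n * K_B * t_kelvin)) - 1.0)


i_ph, i_0, n, t = 7.0, 1e-8, 1.3, 298.15             # hypothetical cell parameters
v_oc = n * K_B * t / Q * math.log(i_ph / i_0 + 1.0)  # V at which (1) gives I_PV = 0
print(cell_current(0.0, i_ph, i_0, n, t))  # 7.0 (short-circuit current equals I_ph)
```

Solving (1) for I_PV = 0 gives the open-circuit voltage in closed form, which the `v_oc` line uses.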
The light-generated current of the photovoltaic cell, I_ph, is directly proportional to the solar irradiance G and varies linearly with the temperature T. Denoting the nominal irradiance and temperature by G_n and T_n, respectively, I_ph at other conditions can be calculated as follows [38]:
$$I_{ph} = \frac{G}{G_n}\left[I_{sc,n} + K_i\,(T - T_n)\right] \qquad (2)$$
where I_sc,n is the short-circuit current at the nominal condition and K_i is the short-circuit current temperature coefficient; both are provided in the manufacturer's datasheet, as shown in Table 1.
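A direct transcription of (2) is shown below as a minimal sketch. The nominal conditions G_n = 1000 W/m² and T_n = 298.15 K (25 °C) are assumed defaults, and the datasheet values used in the example (I_sc,n = 7.5 A, K_i = 0.004 A/K) are hypothetical placeholders, not the Table 1 entries.

```python
def photo_current(g, t, i_sc_n, k_i, g_n=1000.0, t_n=298.15):
    """Light-generated current I_ph from (2).

    g: irradiance (W/m^2); t: cell temperature (K)
    i_sc_n: short-circuit current at nominal conditions (A)
    k_i: short-circuit current temperature coefficient (A/K)
    g_n, t_n: nominal irradiance and temperature (assumed STC defaults)
    """
    return (g / g_n) * (i_sc_n + k_i * (t - t_n))


print(photo_current(500.0, 298.15, 7.5, 0.004))  # 3.75: half the nominal irradiance
```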
Since electric power is the product of current and voltage, the power-voltage (P-V) characteristic curve of a solar cell can be obtained for a given irradiance level, as shown in Figure 2. At the short-circuit point, the current is at its maximum but the voltage is zero, so the power is also zero. At the open-circuit point, the situation is reversed: the voltage is at its maximum, the current is zero, and again the power is zero. Between these two points there is one particular operating point at which the solar cell delivers maximum power for a given irradiance; this point is called the maximum power point (MPP). Equations (1) and (2) show that the cell output current is nonlinear and depends on irradiance and temperature. Together with the cell output voltage, these equations can be used to calculate the reference current (I_MPP), which ultimately yields the maximum power P_MPP. If the number of PV cells is known, the same relationship can be used to obtain P_MPP for a PV module or a system. However, the main drawback of this mathematical model is the time-consuming, iterative process required to calculate the cell output current, which hinders its use in high-speed tracking.
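The MPP definition above can be made concrete with a brute-force sweep of (1): evaluate P = V·I over a voltage grid from 0 to V_oc and keep the largest product. This sketch assumes the form of (1) for a single cell with hypothetical parameter values (not Table 1); the thousands of exponential evaluations per operating condition also illustrate why the model is too slow to call inside a high-speed tracker.

```python
import math

Q = 1.6e-19     # electronic charge (C)
K_B = 1.38e-23  # Boltzmann's constant (J/K)


def find_mpp(i_ph, i_0, n, t_kelvin, points=10_000):
    """Locate the MPP of one cell by sweeping V from 0 to V_oc using (1).

    Returns ((V_MPP, I_MPP, P_MPP), V_oc).
    """
    v_t = n * K_B * t_kelvin / Q               # n * kT / q
    v_oc = v_t * math.log(i_ph / i_0 + 1.0)    # open-circuit voltage of (1)
    best = (0.0, 0.0, 0.0)
    for k in range(points + 1):
        v = v_oc * k / points
        i = i_ph - i_0 * (math.exp(v / v_t) - 1.0)
        p = v * i
        if p > best[2]:
            best = (v, i, p)
    return best, v_oc


(v_mpp, i_mpp, p_mpp), v_oc = find_mpp(7.0, 1e-8, 1.3, 298.15)  # hypothetical cell
print(f"V_MPP = {v_mpp:.3f} V, I_MPP = {i_mpp:.3f} A, P_MPP = {p_mpp:.3f} W")
```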
Thus, most MPPT algorithms start by sensing V_PV and I_PV at the PV system terminals. The MPPT algorithm then applies its own procedure (e.g., P&O) to find V_MPP or I_MPP and extract the maximum power P_PV from the PV system, as shown in Figure 3, where P_PV is the product of V_MPP and I_MPP. Note that the MPPT algorithm only provides a reference to the controller of the DC-DC converter of a PV system [5]; it does not directly generate the duty ratio required for the converter to produce maximum power.

Characteristics of the Studied PV System
In this study, 25 SolarTIFSTF-120P6 PV modules with a total peak capacity of 3 kW supply the load, as shown in Figure 4. The modules are connected in series, producing a DC output voltage of 435 V. G and T are measured using a solar pyranometer sensor and a temperature sensor, respectively, as shown in Figure 4.
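As a quick arithmetic check of these figures, assuming that the "120" in the module designation denotes a rated power of 120 W (Table 1 is not reproduced here), the series string gives:

$$P_{\text{peak}} = 25 \times 120\ \text{W} = 3000\ \text{W} = 3\ \text{kW}, \qquad V_{\text{module}} = \frac{435\ \text{V}}{25} = 17.4\ \text{V}$$

that is, each series-connected module contributes on average 17.4 V to the 435 V string voltage.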

Sensor Characteristics.
As mentioned earlier, two high-speed sensors are required to measure the irradiance and temperature. The irradiance sensor (S-LIB-M003) contains a silicon photodiode that measures solar power per unit area (W/m²). This silicon pyranometer smart sensor is designed to work with the HOBO Weather Station Logger via its plug-in modular connector. In addition, all calibration parameters are stored inside the S-LIB-M003 sensor, which automatically communicates configuration information to the logger without the need for any programming, calibration, or extensive setup.
Similarly, temperature is measured with the S-TMB-M006 smart temperature sensor. Its stainless steel tip and robust cable allow the S-TMB-M006 sensor to be immersed in water at up to 50 °C for 1 year, so it is well suited to PV system condition monitoring. It can also automatically communicate configuration information to the HOBO Weather Station without any programming, calibration, or extensive user setup. The S-LIB-M003 silicon pyranometer smart sensor uses the first channel and the S-TMB-M006 smart temperature sensor uses the second of the 15 available channels of the HOBO Weather Station.

PV Characteristics.
The characteristics of this PV module are listed in Table 1, and the I-V and P-V curves of the SolarTIFSTF-120P6 module obtained from (1) and (2) under varying irradiation and temperature are shown in Figure 5. Once a proper mathematical model is obtained, a suitable MPPT method is required to achieve better performance for the overall system, because the computational burden of the mathematical model prevents it from being used directly to generate reference currents. Thus, in this study, a new MPPT method is proposed, as detailed in the next section.

Proposed Random Forests MPPT Approach
Unlike most MPPT algorithms, such as IC, P&O, and HC, the proposed method uses G and T as inputs, because these two measurements are commonly integrated into many modern PV systems for monitoring purposes, as shown in Figure 4. Given G and T as inputs, a recently developed RF soft computing approach is used to process them and generate the required reference current, I_MPP, for the controller of the PV system, as shown in Figure 6. Thus, this paper develops an RF-based MPPT algorithm that uses historical G and T data and target I_MPP values obtained from the mathematical model described by (1). The design of an efficient control routine and of the DC-DC converter is not the main focus of this work. An overview of RF, its adaptation to MPPT, and the evaluation procedures are described in the following subsections.

Overview of Random Forests.
RF is an ensemble learning method for classification, regression, and other tasks. It constructs a multitude of decision trees at training time and outputs the mode of the classes predicted by the individual trees (classification) or the mean of their predictions (regression). RF corrects the tendency of individual decision trees to overfit their training set. The RF training algorithm applies the general technique of bootstrap aggregating, or bagging, to tree learners. The RF illustrated in Figure 7 classifies or predicts the value of a variable y for an input vector x by building a number K (= ntree) of regression trees and averaging the results. After K trees {T_k(x)}, k = 1, …, K, are grown, the RF regression predictor is
$$\hat{f}_{RF}^{K}(x) = \frac{1}{K}\sum_{k=1}^{K} T_k(x) \qquad (3)$$
In general, the RF algorithm for regression works as follows:
(i) ntree bootstrap samples X_i (i = bootstrap iteration) are randomly drawn with replacement from the original dataset, each containing approximately two-thirds of the distinct elements of the calibration dataset X. The elements not included in X_i are referred to as out-of-bag (OOB) data for the corresponding bootstrap sample.
(ii) A regression tree is grown for each bootstrap sample (resulting in ntree trees) with the following modification: at each node, a subset of mtry predictor variables is selected randomly to create the binary rule. In other words, mtry specifies the number of randomly chosen variables among which the best split at each node is sought. Split selection is based on the residual sum of squares; that is, the split with the lowest residual sum of squares is chosen. mtry is held constant while the forest is grown.
(iii) Each of the ntree trees is grown to the largest extent possible; no pruning is conducted.

Figure 7: Random forests.
(iv) Lastly, predictions are calculated by passing each OOB observation, or each observation of the test data, through each of the ntree trees. The predictions of all regression trees are then averaged to produce the final estimate [39].
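Steps (i) to (iv) and the averaging in (3) can be sketched in plain Python as follows. This is an illustrative from-scratch implementation under stated assumptions, not the MATLAB model used in this study: all function names are hypothetical, cut points are midpoints between distinct predictor values, and, as a common implementation choice, a node examines predictors beyond the mtry randomly chosen ones only when none of those allowed a split. The OOB error discussed next is included for completeness.

```python
import random


def _rss(ys):
    """Residual sum of squares of ys around their mean."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def grow_tree(X, y, mtry, rng):
    """Steps (ii)-(iii): grow an unpruned regression tree on (X, y).

    Internal nodes are tuples (feature, cut, left, right); leaves are the
    target value(s) reaching them, reduced to their mean.
    """
    if len(set(y)) == 1:                 # pure node: stop splitting
        return y[0]
    features = list(range(len(X[0])))
    rng.shuffle(features)                # first mtry entries = random subset
    best = None
    for count, f in enumerate(features):
        if count >= mtry and best is not None:
            break                        # mtry predictors examined, split found
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0        # midpoint between distinct values
            left = [k for k, row in enumerate(X) if row[f] <= cut]
            right = [k for k, row in enumerate(X) if row[f] > cut]
            score = _rss([y[k] for k in left]) + _rss([y[k] for k in right])
            if best is None or score < best[0]:
                best = (score, f, cut, left, right)
    if best is None:                     # identical inputs, differing targets
        return sum(y) / len(y)
    _, f, cut, left, right = best
    left_tree = grow_tree([X[k] for k in left], [y[k] for k in left], mtry, rng)
    right_tree = grow_tree([X[k] for k in right], [y[k] for k in right], mtry, rng)
    return (f, cut, left_tree, right_tree)


def predict_tree(node, x):
    """Follow the binary rules from the root down to a leaf."""
    while isinstance(node, tuple):
        f, cut, left, right = node
        node = left if x[f] <= cut else right
    return node


def fit_forest(X, y, ntree, mtry, seed=0):
    """Step (i): one bootstrap sample per tree; keep each tree's OOB indices."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        oob = set(range(n)) - set(idx)
        tree = grow_tree([X[k] for k in idx], [y[k] for k in idx], mtry, rng)
        forest.append((tree, oob))
    return forest


def predict_forest(forest, x):
    """Step (iv) and (3): average the predictions of all trees."""
    return sum(predict_tree(tree, x) for tree, _ in forest) / len(forest)


def oob_mse(forest, X, y):
    """OOB error: predict each sample only with trees that did not train on it."""
    errors = []
    for k, (xk, yk) in enumerate(zip(X, y)):
        preds = [predict_tree(tree, xk) for tree, oob in forest if k in oob]
        if preds:
            errors.append((sum(preds) / len(preds) - yk) ** 2)
    return sum(errors) / len(errors) if errors else float("nan")
```

For example, with a hypothetical grid of (G, T) inputs and a made-up smooth target standing in for I_MPP, `fit_forest(X, y, ntree=30, mtry=1)` followed by `predict_forest(forest, [500.0, 30.0])` returns the averaged estimate. With only two predictors (G and T), mtry can be 1 or 2, and mtry = 2 reduces to plain bagging.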
The OOB error is an important feature of RF. As mentioned previously, each tree is built on a bootstrap sample that comprises roughly two-thirds of the training data. The remaining one-third (the OOB data) is not included in the learning sample for this tree and can therefore be used for testing: the RF model is applied to the OOB data, and the deviations between predicted and reference values are used to calculate the OOB error, which for regression is the mean square error (MSE). In this way, the OOB elements of each tree are used to evaluate its performance [40].
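The "two-thirds / one-third" split follows from sampling with replacement, which the short sketch below checks empirically. The function name `bootstrap_split`, the seed, and the sample size are arbitrary illustrative choices.

```python
import random


def bootstrap_split(n, rng):
    """Draw one bootstrap sample of size n; return (in-bag indices, OOB indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
    oob = sorted(set(range(n)) - set(in_bag))      # never drawn: out-of-bag
    return in_bag, oob


rng = random.Random(42)
n = 10_000
in_bag, oob = bootstrap_split(n, rng)
distinct_fraction = len(set(in_bag)) / n
oob_fraction = len(oob) / n
print(f"distinct in-bag: {distinct_fraction:.3f}, OOB: {oob_fraction:.3f}")
```

For large n, the two fractions approach 1 − 1/e ≈ 0.632 and 1/e ≈ 0.368, which is where "roughly two-thirds" and "one-third" come from.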
To reduce the correlation among the different trees, RF increases their diversity by growing them from different bootstrap samples created by a procedure called bagging [35]; when mtry equals the number of predictors, RF reduces to plain bagging. This procedure increases generality, makes the regression more robust to slight variations in the training data, and generally increases overall prediction accuracy [39]. When growing a tree, RF uses the best split among a number of randomly sampled predictor variables. If all variables were considered at every split, the trees would become very similar and therefore highly correlated [39]. Thus, the randomly chosen subsets of predictor variables at each split of each tree ensure lower correlation between trees, which in turn increases model robustness.
RF is well suited to MPPT prediction for PV systems because it combines several robust characteristics. It does not require the specification of an underlying PV system model, and it can capture the nonlinear association between predictors, such as irradiance and temperature, to determine an accurate reference current (I_MPP) and calculate P_MPP. It can also handle highly correlated predictor variables. Moreover, it offers the flexibility to perform a number of statistical data analyses and is computationally lighter than other tree ensemble methods [33,39,41].

Random Forests Training.
In this study, an RF-based MPPT system is considered for the 3 kW PV system using the SolarTIFSTF-120P6 PV modules described in Section 2. The input data samples for irradiance (G) and temperature (T) are generated using (4) and (5), respectively, and the resulting values are used, together with the parameters given in Table 1 and (1), to obtain the target data for training.
where i = 1, …, N is the sample index and N is the number of data samples. The minimum and maximum limits of G and T are selected based on historical data. In Malaysia, G typically varies between 0 W/m² and 1200 W/m², while T fluctuates between 20 °C and 35 °C [42]. Therefore, G_min, G_max, T_min, and T_max in (4) and (5) are set to 0 W/m², 1500 W/m², 0 °C, and 45 °C, respectively, so that the generated data cover the typical and historical range with a margin. Since the studied system is small, the effect of partial shading on some of its modules is not considered in this work. Table 2 shows some of the data samples used for training.
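The exact form of (4) and (5) is not reproduced here. The sketch below assumes, purely for illustration, that each G_i and T_i is drawn independently and uniformly between the stated limits; the function name `generate_inputs` is hypothetical. Computing the I_MPP targets from (1) with the Table 1 parameters would be the next step and is not shown.

```python
import random

# Limits stated in the text for generating training inputs.
G_MIN, G_MAX = 0.0, 1500.0   # irradiance, W/m^2
T_MIN, T_MAX = 0.0, 45.0     # temperature, degrees C


def generate_inputs(n_samples, seed=0):
    """Return n_samples (G_i, T_i) pairs for i = 1..N.

    Assumption: G and T are drawn independently and uniformly between the
    limits; the actual form of (4) and (5) may differ.
    """
    rng = random.Random(seed)
    return [(rng.uniform(G_MIN, G_MAX), rng.uniform(T_MIN, T_MAX))
            for _ in range(n_samples)]


samples = generate_inputs(300_000)  # N = 300,000, the size selected in the text
```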
In general, two parameters must be adjusted in RF: the number of trees in the forest (ntree) and the number of data samples (m). The tuning is based on the performance on the OOB data. An important consideration is how many trees should be grown. Breiman [39] showed that the generalization error converges as the number of trees increases, so adding more trees to the model does not result in overfitting. The main cost of increasing ntree is the additional computation time.
To determine the optimal values of ntree and of the number of data samples (m), four RF models with 500 trees each are created using different numbers of data samples, and their MSE values are averaged. Figure 8 shows how the error rate changes with the number of trees for 100,000, 200,000, 300,000, and 400,000 data samples, for which the MSE values are 2.582 × 10^−5, 1.148 × 10^−5, 7.38 × 10^−6, and 5.60 × 10^−6, respectively. From approximately 200 trees onward, the MSE of each dataset stabilizes, and further increasing the number of trees neither increases nor decreases the MSE. Therefore, 200 trees can be regarded as sufficient. As shown in Figure 8, the MSE values for 300,000 and 400,000 data samples differ only slightly. Hence, 300,000 data samples are used, because increasing the number of samples beyond 300,000 increases computation time without a worthwhile gain in accuracy. Once trained, the RF can be used to generate the reference current I_MPP and calculate P_MPP for new input data, as shown in Figure 9. Note from Figure 9 that the trained RF only estimates the reference current I_MPP; P_MPP is then calculated simply by multiplying I_MPP by V_MPP, where V_MPP is the system voltage at the MPP, as shown in Figure 2.
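Choosing ntree as described amounts to finding where the error-versus-trees curve flattens. A minimal sketch of such a rule is shown below, assuming a relative tolerance `rel_tol` (a hypothetical parameter; the paper judged stabilization visually from Figure 8). The curve in the example is made up and only mimics the shape described in the text; it is not the Figure 8 data.

```python
def smallest_stable_ntree(mse_by_ntree, rel_tol=0.01):
    """Smallest ntree from which all later MSE values agree within rel_tol.

    mse_by_ntree: non-empty list of (ntree, mse) pairs sorted by ntree.
    The spread (max - min) of the remaining MSE values must not exceed
    rel_tol * MSE at the candidate ntree.
    """
    for k, (ntree, mse) in enumerate(mse_by_ntree):
        tail = [m for _, m in mse_by_ntree[k:]]
        if max(tail) - min(tail) <= rel_tol * mse:
            return ntree
    raise ValueError("mse_by_ntree must be non-empty")


# Made-up curve that only mimics the shape described in the text.
curve = [(50, 3.0e-5), (100, 1.5e-5), (150, 1.0e-5), (200, 8.05e-6),
         (300, 8.02e-6), (400, 8.00e-6), (500, 8.01e-6)]
print(smallest_stable_ntree(curve))  # 200
```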

Performance Evaluation.
The main concern in forecasting is to test the developed technique for suitability and accuracy. For this purpose, the Bland-Altman test is conducted first. The Bland-Altman test is a statistical analysis typically used to compare measured values with reference values. If the differences between the RF-based P_MPP and the reference P_MPP lie within the d̄ ± 2s (95%) limits of acceptability, the proposed method is considered accurate [43][44][45]. In this expression, d̄ is the mean difference (bias) of the power measurements between the proposed system and the reference, and s is the standard deviation of the differences. To evaluate the performance of the various MPPT methods, namely, ANFIS, ANN, and the proposed RF-based method, three standard error measures are used: mean error (ME), mean square error (MSE), and standard deviation of the error (σ). These indices are defined in [33] in terms of P_RF,i, the ith power measured using RF; P_math,i, the ith reference power based on the mathematical model; and P̄, the average of the measured values.
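The evaluation quantities can be computed as sketched below. The sketch assumes textbook definitions (signed mean error, mean squared error, and sample standard deviation of the error e_i = P_RF,i − P_math,i); the exact normalization used in [33] may differ. The Bland-Altman limits use the ±2s band stated above, and the function names and example powers are hypothetical.

```python
import math


def error_indices(p_rf, p_math):
    """ME, MSE and standard deviation of e_i = P_RF,i - P_math,i (assumed definitions)."""
    errors = [a - b for a, b in zip(p_rf, p_math)]
    n = len(errors)
    me = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    sigma = math.sqrt(sum((e - me) ** 2 for e in errors) / (n - 1))
    return me, mse, sigma


def bland_altman(p_rf, p_math, k=2.0):
    """Bias d_bar, limits d_bar +/- k*s, and share of differences inside them."""
    d = [a - b for a, b in zip(p_rf, p_math)]
    n = len(d)
    bias = sum(d) / n
    s = math.sqrt(sum((x - bias) ** 2 for x in d) / (n - 1))
    lower, upper = bias - k * s, bias + k * s
    inside = sum(lower <= x <= upper for x in d) / n
    return bias, (lower, upper), inside


p_math = [100.0, 200.0, 300.0, 400.0]  # illustrative reference powers (W)
p_rf = [101.0, 199.0, 301.0, 399.0]    # illustrative RF estimates (W)
print(error_indices(p_rf, p_math))
print(bland_altman(p_rf, p_math))
```

The "acceptability" percentage reported in the results corresponds to the `inside` share returned by `bland_altman`.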

Results and Discussion
To validate the MPPT algorithms, the developed and trained RF-based MPPT is verified using actual output data of the SolarTIFSTF-120P6 PV modules, instead of the slow (ramp) or step changes in meteorological conditions used in previous research. The data were obtained from a 3 kW rooftop PV system at Universiti Kebangsaan Malaysia from March 1, 2013, to February 15, 2014, using a data logger with a sampling interval of 30 seconds. G and T are measured using the S-LIB-M003 solar pyranometer sensor and the S-TMB-M006 temperature sensor, respectively, as explained in Section 3. […] RF-based MPPT (RF model) for selected days with varying patterns. These figures show that the output power of the system depends strongly on G and follows the same pattern. However, Figures 10-17 do not clearly show how close the P_MPP obtained with the RF model is to the reference P_MPP from the mathematical model (math model), which was obtained using the data given in Table 1 and (1). Therefore, the Bland-Altman test is conducted to quantify the agreement between the power obtained with the RF-based MPPT (RF model) and the reference power from the mathematical model (math model). For the selected days, the corresponding Bland-Altman plots are shown in Figures 18-25. In these figures, a regression line is added as a dotted line to show the limit of acceptability between the proposed P_MPP and the reference P_MPP. Most of the data fall within the d̄ ± 2s (95%) limits of acceptability [43][44][45], which indicates that the proposed RF-based MPPT is accurate and effective.
To show the reliability of the proposed MPPT algorithm, tests were conducted over an entire year. Table 3 lists the measurement data and the Bland-Altman test results for the 1st and 15th of each month. As shown in this table, for every tested day between 95% and 96.83% of the data points from the proposed MPP tracker lie within the limits of acceptability relative to the reference power. These statistical results support the validity of the power measured by the proposed MPP tracker relative to the reference power. […] Table 4. The best results are shown in bold; they clearly show that the RF approach yields much lower ME values than the ANN and ANFIS models. For the RF-based MPPT, the best ME was 0.002985 on January 15, 2014, and the worst was 0.005087 on May 1, 2013. Over the 24 days, the average ME values for RF, ANN, and ANFIS were 0.00439, 0.03083, and 0.89843, respectively. These values clearly indicate that the proposed method outperformed the other methods in terms of ME.
The second index compares the MSE of the various MPPT methods. A lower MSE means that the output is closer to the reference MPP, whereas a higher MSE implies that the output is spread farther from the true MPP. The performance of the RF, ANN, and ANFIS-based MPPT methods in terms of MSE is listed in […] ANN, and ANFIS were 7.432 × 10^−5, 0.004893, and 4.689996, respectively. The results again show that the RF-based MPPT achieves better results than the other techniques. The third index compares the standard deviations σ of the various MPPT methods. σ measures the spread of values around their mean: a low σ indicates that the data points are close to the mean, while a high σ indicates that they are scattered over a wide range around it. The calculated standard deviations of the RF model, ANN, and ANFIS for the 24 days from March 1, 2013, to February 15, 2014, are given in Table 6. Table 6 clearly shows that the RF-based MPPT performs better than the other methods, with σ varying between 0.005054 on January 15, 2014, and 0.009126 on May 1, 2013. Over the 24 days, the average σ values for RF, ANN, and ANFIS were 0.007295, 0.059747, and 1.878943, respectively.

Conclusion
This paper introduced a new and effective MPPT algorithm based on RF for a 3 kW peak PV system composed of 25 SolarTIFSTF-120P6 PV modules. Using the bootstrapping method in the training procedure and a proper selection of the random forest parameters, better MPPT model performance was achieved. To evaluate the reliability and efficiency of the proposed algorithm, the RF model was tested using actual data obtained on the 1st and 15th of each month from March 1, 2013, to February 15, 2014, and its performance was compared with that of the ANN and ANFIS methods. The proposed RF model was evaluated based on the Bland-Altman test results and on the obtained ME, MSE, and σ values. The results showed that the RF-based MPPT passed the Bland-Altman test, with more than 95% of the data within the limits of acceptability in all tested cases. Furthermore, comparative analysis reveals that the proposed MPPT method outperforms both the ANN and ANFIS algorithms in terms of ME, MSE, and σ by a significant margin when tested under the same strict meteorological and technical conditions. Finally, the proposed method was found to respond quickly to fast-changing environmental conditions and can therefore be adopted for real-time MPPT. An extension of this work is under way to develop DC-DC boost converter hardware based on the proposed MPPT algorithm.