Data-Driven Photovoltaic System Modeling Based on Nonlinear System Identification

Solar photovoltaic (PV) energy sources are rapidly growing in popularity compared to conventional fossil fuel sources. As the integration of PV systems with existing power sources increases, reliable and accurate PV system identification is essential to capture the highly nonlinear dynamic and operational characteristics of PV systems. This paper deals with the identification of a PV system equipped with a switch-mode power converter. Measured input-output data are collected from a real PV panel and divided into estimation and validation sets. The identification methodology is discussed. A Hammerstein-Wiener model is identified and selected because it best captures the PV system dynamics, and results are provided to demonstrate the accuracy of the selected model structure.


Introduction
The modern power system increasingly takes advantage of renewable energy sources entering the marketplace. Traditional central power stations, with their pollution-related problems, will likely be replaced by cleaner and smaller power plants closer to the loads. Solar energy is one of the most promising nonpolluting and free energy sources [1]. Among their benefits, solar-powered systems are easily expanded. Despite their still relatively high cost, photovoltaic (PV) systems installed worldwide show a nearly exponential increase [2]. The PV cell, which converts sunlight directly into electricity, is the basic elementary device of PV systems [3]. PV systems have proven able to supply anything from very small electronic devices up to utility-scale PV power plants. The basic building block of PV systems is a PV panel consisting of a number of prewired cells in series [4]. Panels are then connected in series to increase voltage and in parallel to increase current; the product of voltage and current is power. A PV array is formed by series and parallel combinations of panels [5]. Figure 1 represents a generic PV array structure. The performance of a PV system is normally evaluated under standard test conditions (STC). Under STC, an average solar spectrum at air mass (AM) 1.5 is used, the irradiance is normalized to 1000 W/m², and the cell temperature is defined as 25 °C [6]. However, under real operating conditions (i.e., varying irradiance and significant temperature changes), most commercial panels do not necessarily behave as specified by the manufacturers [7, 8]. In addition, PV panels perform differently depending on the location, time of day, and season of the year.
For a PV system, the relationship between environmental conditions and the electrical output parameters (current and voltage) is highly nonlinear. For this reason, modeling the dynamics of the PV system and identifying a model structure that captures its real-life behavior are essential for controlling the output power. Such a model is also useful for predicting the future performance of the system for maintenance and troubleshooting.
Modeling and simulation of PV systems have been the subject of many research studies [3, 9–12]. However, those studies focused on the PV panel/array stage without including the power electronics of the complete system. In other words, they identified only the nonlinear I-V characteristic of the PV system. In this work, the identification incorporates the entire setup: the PV panel, power converter, maximum power point tracker (MPPT), and load.
Identification of linear and nonlinear systems became an active research topic in the 1960s, probably because proper models were needed to design good controllers [13]. At that time, the system identification field was rather immature. The first attempt to put the field in order was the pioneering work of Eykhoff [14]. Afterward, the field became fully structured when the books of Ljung [15] and Söderström and Stoica [16] laid out a complete theoretical framework and practical methodology for the identification process.
The core of the system identification process is to construct a mathematical model from observed input-output data. It is widely applied in different engineering disciplines to help understand the studied process, predict system responses, and create better designs that meet new specifications.
The remainder of this paper is organized as follows. Section 2 describes the studied PV system. Section 3 details the system identification methodology. Section 4 discusses the results of identifying the PV system from measured input-output data divided into estimation and validation sets. Finally, Section 5 concludes the work.

PV System Description
The PV system illustrated in Figure 2 is considered for this study. Several environmental conditions affect the PV panel performance, but irradiance has the strongest impact on the panel output power. The panel is a multipurpose module consisting of 36 polycrystalline silicon cells connected in series. Under STC, the panel has the following ratings:
- open-circuit voltage ($V_{oc}$): 21.6 V
- short-circuit current ($I_{sc}$): 5.16 A
- maximum power voltage ($V_{mp}$): 17.3 V
- maximum power current ($I_{mp}$): 4.63 A

Figure 3 presents the I-V and P-V characteristics under different irradiances. These characteristics were obtained experimentally [17]. They are not the data used for the system identification procedure, but such prior information about the system is beneficial. The quantities of interest for identification are the measured current, taken as the input, and the measured power, taken as the output, both recorded over time, as indicated in Figure 2.
International Journal of Photoenergy
The I-V and P-V characteristic curves show the standard PV panel/array behavior: the panel current is directly proportional to the irradiance (the solar power per unit area), so cutting the irradiance in half cuts the current in half. Decreasing irradiance also reduces the voltage, but following a logarithmic relationship, which results in a relatively modest voltage change.
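An idealized single-diode model can illustrate why current scales with irradiance while voltage changes only logarithmically. This is a minimal sketch and not the model used in the paper. It neglects series and shunt resistance and assumes the photocurrent scales linearly with irradiance. It also assumes a diode ideality factor of 1.3 and the 25 °C thermal voltage. Only the cell count, short-circuit current, and open-circuit voltage come from the panel ratings above.

```python
import math

# Panel ratings quoted in the text (STC: 1000 W/m^2, 25 degC)
N_CELLS = 36      # polycrystalline cells in series
ISC_STC = 5.16    # short-circuit current, A
VOC_STC = 21.6    # open-circuit voltage, V

# Assumed values, NOT given in the paper
N_IDEAL = 1.3     # diode ideality factor
V_T = 0.025693    # thermal voltage kT/q at 25 degC, V

A = N_IDEAL * N_CELLS * V_T                    # modified thermal voltage of the cell string
I0 = ISC_STC / (math.exp(VOC_STC / A) - 1.0)   # saturation current chosen so Voc(STC) = 21.6 V


def photocurrent(g: float) -> float:
    """Light-generated current, assumed proportional to irradiance g (W/m^2)."""
    return ISC_STC * g / 1000.0


def current(v: float, g: float) -> float:
    """Ideal single-diode I-V relation: I = Iph - I0 * (exp(V/A) - 1)."""
    return photocurrent(g) - I0 * (math.exp(v / A) - 1.0)


def open_circuit_voltage(g: float) -> float:
    """Closed-form root of current(v, g) = 0: Voc = A * ln(Iph/I0 + 1)."""
    return A * math.log(photocurrent(g) / I0 + 1.0)


for g in (1000.0, 500.0):
    print(f"G = {g:.0f} W/m^2: Isc = {current(0.0, g):.2f} A, "
          f"Voc = {open_circuit_voltage(g):.2f} V")
```

With these assumed parameters, halving the irradiance halves the short-circuit current. The open-circuit voltage drops by only about A·ln 2 ≈ 0.83 V, where A is the string's modified thermal voltage.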
In general, the operating point of the PV panel does not coincide with the maximum power point (MPP) of its I-V curve. The MPP is the point at which the panel operates with maximum efficiency and produces maximum power. Thus, a switch-mode power converter (a DC/DC buck converter in our case) with an MPPT is used to keep the PV panel operating point at the MPP. The MPPT controller maintains this operating point by controlling the PV panel's voltage or current independently of those of the load.
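The paper does not specify which MPPT algorithm the converter runs. As an illustration only, the sketch below implements perturb and observe, one widely used hill-climbing MPPT method. It nudges the operating voltage and reverses direction whenever the measured power drops. The `power_at` callback, the step size, and the toy P-V curve are hypothetical stand-ins for the converter's measurements.

```python
from typing import Callable


def perturb_and_observe(power_at: Callable[[float], float], v_start: float,
                        step: float = 0.1, iterations: int = 200) -> float:
    """Hill-climb the operating voltage toward the maximum power point.

    power_at: hypothetical measurement of panel power (W) at voltage v (V).
    The reference voltage is perturbed by +/- step each iteration; if the
    measured power fell, the perturbation direction is reversed.
    """
    v = v_start
    p_prev = power_at(v)
    direction = 1.0
    for _ in range(iterations):
        v_next = v + direction * step
        p_next = power_at(v_next)
        if p_next < p_prev:        # moved away from the MPP: turn around
            direction = -direction
        v, p_prev = v_next, p_next
    return v  # ends oscillating within about one step of the MPP


# Illustrative concave P-V curve peaking at 17.3 V (NOT measured data)
def toy_power(v: float) -> float:
    return 80.0 - (v - 17.3) ** 2


print(f"P&O settles near {perturb_and_observe(toy_power, v_start=12.0):.1f} V")
```

A fixed step trades tracking speed against steady-state oscillation around the MPP. Practical trackers often adapt the step size.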

Identification Methodology
In this section, the principle of system identification is described. Dynamic models depend heavily on how much a priori knowledge about the process is incorporated into them. Generally, approaches to modeling a dynamical system fall into two categories: first-principle modeling and data-driven modeling.
First-principle modeling uses an understanding of the system's physics to derive a mathematical representation, whereas data-driven modeling uses empirical data to construct a model of the system. Depending on how much of the model structure and parameters is known in advance, the modeling process can be classified as white box, grey box, or black box modeling, as illustrated in Figure 4.
According to [38], white box modeling applies when the model is perfectly known, so it can be constructed entirely from prior knowledge and physical insight. Grey box modeling applies when some physical insight is available but several parameters remain to be determined from observed data. In black box modeling, no physical insight is available or used. Instead, the chosen model structure belongs to a family of candidate models that are known to be flexible and have been successful in the past.
Black box modeling is used in this work. It is usually a trial-and-error process in which one estimates the parameters of various structures and compares the results. The PV system constructed as shown in Figure 2 is used for data collection. The measured input-output data are then used to select the model structure and identify the system. Figure 5 illustrates the measured data collected for system identification. When the irradiance changes with time, the maximum output power varies according to the maximum power values in Figure 3. The STC (1000 W/m²) power of the selected panel is 82.5 W. The fluctuating irradiance changes the panel output current, which is the system input, and this results in a continuous change of the system output power.

The PV System Identification Process.
The flowchart in Figure 6 summarizes the identification process using the measured data. The input-output data are sampled time-domain signals. The next step is to select a suitable model structure. This is the most difficult step in the identification process because there are a multitude of possibilities. Next, the MATLAB System Identification Toolbox is used to estimate candidate models and compare them in order to choose the model with the best performance. The final step is to evaluate the resulting model on a validation data set. It is worth mentioning that system identification is not a straightforward technique. Rather, it is an iterative procedure that involves several decisions during the process.
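The compare-and-select loop in this flowchart can be sketched in plain Python. This is a minimal illustration, not the toolbox workflow. The two candidate estimators (`fit_gain`, `fit_affine`) are hypothetical, deliberately trivial static models standing in for structures such as nonlinear ARX or Hammerstein-Wiener. The record is split chronologically into estimation and validation parts. Each candidate is scored with the normalized root-mean-square error (NRMSE) fit, 100(1 − ‖y − ŷ‖/‖y − ȳ‖), which is the percentage fit reported by MATLAB's `compare` function.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]
Estimator = Callable[[Sequence[float], Sequence[float]], Model]


def nrmse_fit(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Percentage fit: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    mean_y = sum(y) / len(y)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    spread = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    if spread == 0.0:
        raise ValueError("validation output is constant; fit is undefined")
    return 100.0 * (1.0 - err / spread)


def fit_gain(u: Sequence[float], y: Sequence[float]) -> Model:
    """Hypothetical candidate: static gain y = k*u, least squares."""
    k = sum(a * b for a, b in zip(u, y)) / sum(a * a for a in u)
    return lambda x: k * x


def fit_affine(u: Sequence[float], y: Sequence[float]) -> Model:
    """Hypothetical candidate: static affine map y = k*u + c, least squares."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    k = (sum((a - mu) * (b - my) for a, b in zip(u, y))
         / sum((a - mu) ** 2 for a in u))
    c = my - k * mu
    return lambda x: k * x + c


def select_model(u: Sequence[float], y: Sequence[float],
                 candidates: Dict[str, Estimator],
                 split: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Estimate every candidate on the first part, score it on the held-out rest."""
    n_est = int(len(u) * split)
    u_est, y_est = u[:n_est], y[:n_est]   # estimation set
    u_val, y_val = u[n_est:], y[n_est:]   # validation set
    scores: Dict[str, float] = {}
    for name, estimate in candidates.items():
        model = estimate(u_est, y_est)
        scores[name] = nrmse_fit(y_val, [model(x) for x in u_val])
    best = max(scores, key=lambda name: scores[name])
    return best, scores


u = [0.1 * i for i in range(1, 41)]   # toy input record
y = [2.0 * x + 1.0 for x in u]        # toy output record
best, scores = select_model(u, y, {"gain": fit_gain, "affine": fit_affine})
print(best, scores)
```

Scoring on held-out data rather than on the estimation data is what lets this loop reject structures that merely memorize the estimation record.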

Hammerstein-Wiener Model.
A nonlinear block-oriented model is frequently applied to adequately describe the nonlinear behavior of a system over a range of operating conditions. Such a model decomposes the identified system into a linear dynamic block and one or more nonlinear static blocks. The Hammerstein-Wiener model is a nonlinear model used in many domains because of its simplicity. It is popular for three reasons: it has a convenient block representation, it has a transparent relationship to linear systems, and it is easier to implement than more complex nonlinear models such as neural networks and Volterra models. It can serve as a black box model structure because it offers a flexible parameterization for nonlinear models. The model describes a dynamical system using one or two static nonlinear blocks in series with a linear block, and only the linear block contains dynamic elements. The linear block is a discrete-time transfer function. The nonlinear blocks are implemented using nonlinearity estimators such as saturation, wavelet network, and dead zone. Figure 7 depicts the structure of the nonlinear Hammerstein-Wiener model.
The input signal passes through the first nonlinear block, a linear block, and a second nonlinear block to produce the output signal [39].
The Hammerstein-Wiener structure can be described by the following general equations:

$$w(t) = f\big(u(t)\big)$$
$$x(t) = \frac{B(q)}{F(q)}\, w(t)$$
$$y(t) = h\big(x(t)\big)$$

where $u(t)$ and $y(t)$ are the inputs and outputs of the system, respectively, and $w(t)$ and $x(t)$ are internal signals. $B(q)$ and $F(q)$ are polynomials in the backward shift operator $q^{-1}$. $f$ and $h$ are nonlinear functions that correspond to the input and output nonlinearities, respectively. For multiple inputs and multiple outputs, $f$ and $h$ are defined independently for each input and output channel. For $n_y$ outputs and $n_u$ inputs, the linear block is a transfer function matrix containing entries of the following form:

$$\frac{B_{j,i}(q)}{F_{j,i}(q)}$$

where $j = 1, 2, \ldots, n_y$ and $i = 1, 2, \ldots, n_u$. If only the input nonlinearity is present, the model is called a Hammerstein model. If only the output nonlinearity is present, the model is called a Wiener model. The following nonlinearity estimators are available for the input and output blocks, and their parameters are estimated during identification: dead zone, piecewise linear, saturation, sigmoid network, and wavelet network [40].
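A minimal single-input, single-output simulation sketch of this three-block structure follows: an input nonlinearity, then a linear block B/F, then an output nonlinearity. The sketch assumes zero initial conditions. It writes the linear block as the difference equation x(t) + f_1 x(t−1) + … + f_nf x(t−nf) = b_1 w(t−nk) + … + b_nb w(t−nk−nb+1). The function name and the coefficient values are illustrative, not identified from the PV data. Passing an identity function for one of the nonlinearities gives a pure Hammerstein or pure Wiener model.

```python
from typing import Callable, List, Sequence


def simulate_hammerstein_wiener(u: Sequence[float],
                                input_nl: Callable[[float], float],
                                output_nl: Callable[[float], float],
                                b_coeffs: Sequence[float],
                                f_coeffs: Sequence[float],
                                nk: int = 1) -> List[float]:
    """Simulate a SISO Hammerstein-Wiener model with zero initial conditions.

    w(t) = input_nl(u(t))
    x(t) = sum_j b_coeffs[j] * w(t - nk - j) - sum_i f_i * x(t - i)
    y(t) = output_nl(x(t))
    (b_coeffs[0] is b_1; f_coeffs[0] is f_1, the q^-1 coefficient of F.)
    """
    w = [input_nl(v) for v in u]            # static input nonlinearity
    x: List[float] = []                     # output of the linear dynamic block
    for t in range(len(u)):
        acc = 0.0
        for j, bj in enumerate(b_coeffs):   # numerator B(q) with input delay nk
            k = t - nk - j
            if k >= 0:
                acc += bj * w[k]
        for i, fi in enumerate(f_coeffs, start=1):  # denominator F(q) = 1 + f_1 q^-1 + ...
            if t - i >= 0:
                acc -= fi * x[t - i]
        x.append(acc)
    return [output_nl(v) for v in x]        # static output nonlinearity


print(simulate_hammerstein_wiener([1.0] * 6, lambda v: v, lambda v: min(v, 1.2),
                                  b_coeffs=[1.0], f_coeffs=[-0.5]))
```

Take identity nonlinearities with b_1 = 1, f_1 = −0.5, and nk = 1. A unit step then produces 0, 1, 1.5, 1.75, and so on, approaching the static gain 1/(1 − 0.5) = 2. Capping the output nonlinearity at 1.2, as in the printed run, turns this into a Wiener model with output saturation.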

Dead Zone Function.
The dead zone function generates zero output within a specified region called the dead zone. The lower limit $a$ and upper limit $b$ of the dead zone are specified as the start and end parameters of the dead zone, respectively. The dead zone defines a nonlinear function $y = F(x)$, where $F$ is a function of $x$. There are three intervals, which can be identified as follows:

$$F(x) = \begin{cases} x - a, & x < a \\ 0, & a \le x < b \\ x - b, & x \ge b \end{cases}$$

$F(x) = 0$ when $x$ has a value between $a$ and $b$; this interval is called the "zero interval". See Figure 8.
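A direct transcription of the three dead zone intervals follows, as a sketch. The limits a and b are parameters that the identification would estimate, and the values in the example are arbitrary.

```python
def dead_zone(x: float, a: float, b: float) -> float:
    """Dead zone nonlinearity with zero interval [a, b)."""
    if a > b:
        raise ValueError("lower limit a must not exceed upper limit b")
    if x < a:
        return x - a      # below the dead zone: identity shifted by a
    if x >= b:
        return x - b      # above the dead zone: identity shifted by b
    return 0.0            # inside the zero interval


print([dead_zone(x, -1.0, 1.0) for x in (-3.0, -0.5, 0.0, 0.5, 3.0)])
```

The function is continuous at both limits, so it models a region of insensitivity rather than a jump.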

Piecewise Linear Function.
The piecewise linear function is defined as a nonlinear function $y = F(x)$, where $F$ is a piecewise linear (affine) function of $x$ with $n$ breakpoints $(x_k, y_k)$, $k = 1, \ldots, n$, and $y_k = F(x_k)$. $F$ is linearly interpolated between the breakpoints. $x$ and $y$ are scalars.
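Here is a sketch of the piecewise linear estimator evaluated at a single point. The text does not state how the function behaves outside the outermost breakpoints, so extending the first and last segments is an assumption of this sketch. The breakpoints in the example are arbitrary.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def piecewise_linear(x: float, breakpoints: Sequence[Tuple[float, float]]) -> float:
    """Evaluate y = F(x) by linear interpolation between breakpoints (x_k, y_k).

    Outside [x_1, x_n] the first/last segment is extended (assumption).
    """
    xs = [p[0] for p in breakpoints]
    ys = [p[1] for p in breakpoints]
    if len(xs) < 2 or any(x1 >= x2 for x1, x2 in zip(xs, xs[1:])):
        raise ValueError("need at least two breakpoints with increasing x_k")
    # Segment k with xs[k] <= x < xs[k+1], clamped to the first/last segment
    k = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    slope = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
    return ys[k] + slope * (x - xs[k])


bps = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)]
print([piecewise_linear(x, bps) for x in (0.5, 1.0, 2.0)])
```

During identification, the breakpoint coordinates are the parameters being estimated.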
Figure 8: Dead zone function.
Figure 9: Saturation function.

Saturation Function.
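The saturation function can be defined as a nonlinear function $y = F(x)$, where $F$ is a function of $x$ with lower limit $a$ and upper limit $b$. There are three intervals, as shown in Figure 9, and they can be identified as follows:

$$F(x) = \begin{cases} a, & x < a \\ x, & a \le x < b \\ b, & x \ge b \end{cases}$$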
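As a sketch, the saturation estimator reduces to clamping x into [a, b]. The limits are parameters to be estimated, and the example values are arbitrary.

```python
def saturation(x: float, a: float, b: float) -> float:
    """Saturation nonlinearity: identity on [a, b), clipped to a below and b above."""
    if a > b:
        raise ValueError("lower limit a must not exceed upper limit b")
    return min(max(x, a), b)


print([saturation(x, -1.0, 1.0) for x in (-3.0, 0.5, 3.0)])
```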

Sigmoid Network Function.
The sigmoid network nonlinearity estimator is a neural network comprising an input layer, an output layer, and a hidden layer whose units use sigmoid activation functions, as represented by Figure 10. The estimator is based on the following expression:

$$y = F(x) = xPL + \sum_{k=1}^{n} a_k\, \sigma\big(xQ\, b_k + c_k\big) + d$$

where $x$ is the regressor (the estimator's input) and $y$ is the output. $Q$ is a nonlinear subspace and $P$ is a linear subspace. $L$ is a linear coefficient. $d$ is an output offset. $b_k$ is a dilation coefficient, $c_k$ is a translation coefficient, and $a_k$ is an output coefficient. $\sigma$ is the sigmoid function, given by the following equation:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Figure 10: Sigmoid network function (input layer, hidden layer with sigmoid activation functions, output layer).
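The following pure-Python sketch evaluates the sigmoid network expression for one regressor vector. It assumes row-vector conventions: x is 1×m, P and Q are matrices with m rows, L is a vector, and each b_k has as many entries as xQ. The helper names are hypothetical. Estimating the parameters is the toolbox's job and is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def row_times(v: Vector, m: Matrix) -> List[float]:
    """Row vector v (length r) times matrix m (r x s)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(p: Vector, q: Vector) -> float:
    return sum(pi * qi for pi, qi in zip(p, q))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_network(x: Vector, P: Matrix, L: Vector, Q: Matrix,
                    b: Sequence[Vector], c: Vector, a: Vector, d: float) -> float:
    """F(x) = x P L + sum_k a_k * sigmoid(x Q b_k + c_k) + d."""
    linear = dot(row_times(x, P), L)                  # linear subspace term
    xq = row_times(x, Q)                              # projection onto nonlinear subspace
    hidden = sum(a_k * sigmoid(dot(xq, b_k) + c_k)    # one hidden unit per k
                 for a_k, b_k, c_k in zip(a, b, c))
    return linear + hidden + d


I2 = [[1.0, 0.0], [0.0, 1.0]]
print(sigmoid_network([1.0, 2.0], I2, [0.5, 0.0], I2,
                      b=[[0.0, 0.0]], c=[0.0], a=[2.0], d=1.0))
```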

Wavelet Network Function.
The wavelet network estimator is a nonlinear function combining wavelet theory and neural networks. Wavelet networks are feedforward neural networks that use a wavelet as the activation function, based on the following expression:

$$y = F(x) = xPL + \sum_{k=1}^{n_s} a_s^k\, \phi\big(b_s^k (xQ - c_s^k)\big) + \sum_{k=1}^{n_w} a_w^k\, \psi\big(b_w^k (xQ - c_w^k)\big) + d$$

where $x$ is the input and $y$ is the output, $Q$ is a nonlinear subspace, and $P$ is a linear subspace. $L$ is a linear coefficient and $d$ is an output offset. The remaining coefficients come in scaling/wavelet pairs:
- $a_s^k$ and $a_w^k$ are the scaling and wavelet coefficients.
- $b_s^k$ and $b_w^k$ are the scaling and wavelet dilation coefficients.
- $c_s^k$ and $c_w^k$ are the scaling and wavelet translation coefficients.

The scaling function $\phi(\cdot)$ and the wavelet function $\psi(\cdot)$ are both radial functions and can be written as follows:

$$\phi(v) = e^{-\frac{1}{2}\lVert v \rVert^2}, \qquad \psi(v) = \big(\dim(v) - \lVert v \rVert^2\big)\, e^{-\frac{1}{2}\lVert v \rVert^2}$$

In the system identification process, the coefficients $a$, the dilation coefficients $b$, and the translation coefficients $c$ are optimized during model learning to obtain the best-performing model.
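The following sketch evaluates the wavelet network expression for one regressor vector, under the same row-vector conventions as a plain matrix product. Each scaling or wavelet unit is passed as a hypothetical `(a, b, c)` tuple: a scalar output coefficient, a scalar dilation, and a translation vector matching the length of xQ. The function names are illustrative, and parameter estimation is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
Unit = Tuple[float, float, Vector]   # (output coeff a, dilation b, translation c)


def row_times(v: Vector, m: Matrix) -> List[float]:
    """Row vector v (length r) times matrix m (r x s)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(p: Vector, q: Vector) -> float:
    return sum(pi * qi for pi, qi in zip(p, q))


def scaling_fn(v: Vector) -> float:
    """Radial scaling function phi(v) = exp(-|v|^2 / 2)."""
    return math.exp(-0.5 * dot(v, v))


def wavelet_fn(v: Vector) -> float:
    """Radial wavelet psi(v) = (dim(v) - |v|^2) * exp(-|v|^2 / 2)."""
    r2 = dot(v, v)
    return (len(v) - r2) * math.exp(-0.5 * r2)


def wavelet_network(x: Vector, P: Matrix, L: Vector, Q: Matrix,
                    scaling_units: Sequence[Unit], wavelet_units: Sequence[Unit],
                    d: float) -> float:
    xq = row_times(x, Q)                   # projection onto nonlinear subspace
    y = dot(row_times(x, P), L) + d        # linear term plus output offset
    for a, b, c in scaling_units:
        y += a * scaling_fn([b * (z - ck) for z, ck in zip(xq, c)])
    for a, b, c in wavelet_units:
        y += a * wavelet_fn([b * (z - ck) for z, ck in zip(xq, c)])
    return y


print(wavelet_network([0.0], [[1.0]], [0.0], [[1.0]],
                      [(1.0, 1.0, [0.0])], [(2.0, 1.0, [0.0])], d=0.5))
```

Unlike the sigmoid units, each scaling and wavelet unit responds mainly near its translation point, so the network builds the nonlinearity from localized bumps.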

Results and Discussion
As mentioned in Section 3, the measured input-output data shown in Figure 5 are used as the estimation data for system identification. Table 1 shows the values of irradiance, current, and power over time for the PV system.
Once the identification process has found the best model within a structure, another data set (validation data) is used to validate the model. The identification procedure follows the flowchart steps in Figure 6. The time-domain data in Figure 5 are imported into the System Identification Toolbox in MATLAB. The input and output data each contain 100001 samples with a sampling interval of 0.0001 s, covering a 10 s record. The toolbox integrates techniques for linear and nonlinear models, which makes the complex problem of estimating and analyzing nonlinear models simpler and more systematic. In our case, the PV system model is identified with the measured current as the system input and the recorded power as the system output. A nonlinear autoregressive model with exogenous input (nonlinear ARX) was selected as a reference point. Several model types can be selected from the toolbox. Figure 11(a) shows the available classifications, but discussing each one in detail is beyond the scope of this paper. Figure 11(b) shows the result of a nonlinear ARX model with a wavelet network nonlinearity. The model output matches the measured output with a 77.48% fit; however, this fit does not adequately describe the dynamics of the PV system.
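A nonlinear ARX model predicts y(t) by applying a nonlinearity, such as the wavelet network used for Figure 11(b), to a regressor vector of past outputs and past inputs. The sketch below builds those regressors under the common [na nb nk] order convention: na past outputs, nb past inputs, and input delay nk. The orders used in the paper are not reported, so they are left as parameters. The toy samples are not measurements.

```python
from typing import List, Sequence, Tuple


def narx_regressors(u: Sequence[float], y: Sequence[float],
                    na: int, nb: int, nk: int) -> Tuple[List[List[float]], List[float]]:
    """Build nonlinear ARX regressor rows and their targets.

    Row for time t: [y(t-1), ..., y(t-na), u(t-nk), ..., u(t-nk-nb+1)].
    Target: y(t). Rows start at the first t for which every lag exists.
    """
    if na < 0 or nb < 1 or nk < 0:
        raise ValueError("need na >= 0, nb >= 1, nk >= 0")
    start = max(na, nk + nb - 1)
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, na + 1)]   # autoregressive part
        past_u = [u[t - nk - j] for j in range(nb)]     # exogenous (input) part
        rows.append(past_y + past_u)
        targets.append(y[t])
    return rows, targets


# Toy current (input) and power (output) samples, NOT measured data
rows, targets = narx_regressors([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], na=2, nb=2, nk=1)
print(rows[0], "->", targets[0])
```

Fitting the chosen nonlinearity then becomes a static regression from each row to its target. This is why the regressor orders matter as much as the choice of nonlinearity.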

Model Estimation.
The iterative process involved selecting different nonlinear models with their associated input and output nonlinearities and search methods, and adjusting numerous estimation settings. After this process, the Hammerstein-Wiener model was found to best capture the main system dynamics, with a 94.51% fit; see Figure 11(c).

Model Cross-Validation.
The model performance is also evaluated using cross-validation, in which the Hammerstein-Wiener model is confronted with a new data set that differs from the data used to estimate it. Figure 11(d) shows the new measured data set collected from the PV system. This time the irradiance steps through 800, 600, 400, and 800 W/m², which emulates the effect of a cloud blocking the irradiance reaching the PV system. The same figure shows that the output of the Hammerstein-Wiener model matches the validation data set well, with a 93.98% fit. This indicates that the identified model generalizes to input data not used during estimation and that the model estimation is successful. Table 2 lists the fits obtained with different input and output nonlinearities, including the best fit of 93.98%. The table also reports the final prediction error (FPE) and the loss function of each estimate.
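The loss function and FPE reported in Table 2 can be illustrated for a single-output model. This sketch assumes the loss function is the mean squared prediction error V = (1/N)Σe(t)² over N samples. It uses Akaike's final prediction error FPE = V(1 + d/N)/(1 − d/N), where d is the number of estimated parameters. The residuals in the example are synthetic.

```python
from typing import Sequence, Tuple


def loss_and_fpe(residuals: Sequence[float], num_params: int) -> Tuple[float, float]:
    """Return (V, FPE) for a single-output model.

    V   = mean squared prediction error (assumed loss function)
    FPE = V * (1 + d/N) / (1 - d/N), Akaike's final prediction error
    """
    n = len(residuals)
    if not 0 <= num_params < n:
        raise ValueError("need 0 <= number of parameters < number of samples")
    loss = sum(e * e for e in residuals) / n
    ratio = num_params / n
    return loss, loss * (1.0 + ratio) / (1.0 - ratio)


# Synthetic residuals: 100 samples of +/-1, 10 estimated parameters
v, fpe = loss_and_fpe([1.0, -1.0] * 50, num_params=10)
print(f"loss = {v:.4f}, FPE = {fpe:.4f}")
```

The factor (1 + d/N)/(1 − d/N) grows with d, so FPE penalizes model structures with many parameters. This complements the percentage fit when comparing candidates.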

Conclusions
The results of this work showed that the nonlinear Hammerstein-Wiener model provides an accurate description of the PV system dynamics. The model was selected through an iterative trial-and-error process. The model must fit the data accurately, and the Hammerstein-Wiener model achieved fits of 94.51% on the estimation data set and 93.98% on the validation data set. Developing such a black box model from measured input-output data can contribute to the design and implementation of nonlinear control strategies. In addition, such models can help researchers and engineers predict the future performance of PV systems for maintenance and troubleshooting.