A Practical Approach for Predicting Power in a Small-Scale Off-Grid Photovoltaic System using Machine Learning Algorithms

Climate change and the energy crisis have substantially motivated the use and development of renewable energy resources. Solar power generation is being identified as the most promising and abundant source for bulk power generation. However, the output of a solar photovoltaic panel depends heavily on the meteorological conditions of the installation site and on weather fluctuations. To overcome these issues, it is important to collect performance data at the remotely installed photovoltaic panel and to predict future power generation. The key objective of this paper is to develop a scaled-down prototype of an IoT-enabled datalogger for a photovoltaic system that is installed in a remote location where human intervention is not possible due to harsh weather conditions or other circumstances. An Internet of Things platform is used to store and visualize the captured data from a standalone photovoltaic system. The collected data from the datalogger is used as a training set for machine learning algorithms. The estimation of power generation is done by a linear regression algorithm. The results are compared with those obtained by other machine learning algorithms, namely polynomial regression and case-based reasoning. Further, a website is developed wherein the user can key in the date and time. The output of that transaction is the predicted temperature, humidity, and forecasted power generation of the specific standalone photovoltaic system. The presented results and obtained characteristics confirm the superiority of the proposed techniques in predicting power generation.


Introduction
Renewable or nonconventional sources of energy are those that replenish themselves at the speed of their consumption. Some examples of renewable energy are solar, wind, tidal, wave, and geothermal. Renewable energy ventures are being undertaken in developed as well as developing countries. The majority of nonconventional energy is harnessed to generate electricity, which is more efficient, clean, does not pollute the environment, and is cost-effective in the long run. Among all the nonconventional resources available, solar energy is the most abundant, and the amount of solar energy that hits the surface of the earth in an hour is enough to fulfill global needs for an entire year. This power from the Sun is used in a variety of ways, such as solar heating, solar thermal energy, photovoltaics, and photosynthesis. For large-scale utility solar installations, solar thermal is employed.
A standalone photovoltaic (SAPV) system, or off-grid system, is one that is not connected to any electricity distribution system. Such systems are classified into two types: direct-coupled systems and stand-alone systems. A direct-coupled system does not have a battery to store energy and can be used only when there is enough sunlight available. Stand-alone systems, on the other hand, employ battery backups so that they can be used at any time of the day. They may also be equipped with sensors (such as a pyranometer, anemometer, temperature sensor, humidity sensor, and system current and voltage sensors) and a datalogger to sense and record, respectively, the parameters that need to be observed. For further analysis of the obtained data, machine learning (ML) tools could be used.
Data collection to digitize industry and society has become a priority nowadays. A summary of up-to-date ML and its practical applicability is given in [1,2]. The demand for new research on AutoML (automated machine learning) is discussed in [3], so that the dependency on domain experts and time-consuming data manipulation is reduced. To estimate the power production of an established PV plant, a comparative analysis of three different methods is performed in [4]: a Sandia National Laboratories model, a multilayer perceptron neural network, and a regression approach. Statistical ML approaches demonstrated more accurate power predictions. The authors in [5] propose a new feature selection-based distributed ML approach to sense the active signatures of diverse power system events. The methodology presented in [5] is demonstrated in an interconnected two-area microgrid with numerous kinds of energy generation arrangements. A residential energy management system is proposed in [6] such that possible loads are effectively switched on to local energy storage based on its charge-discharge cycles and grid availability. The automation of the switchover is achieved using two ML algorithms, artificial neural networks and support vector machines. The work presented in [7] gives a summary of methods for predicting solar irradiation using ML. In [8], the application of deep learning techniques is reported to predict energy consumption and power generation combined with a mathematical weather forecasting simulation. Ensemble-based models are proposed in [9] for long-term forecasting of regional PV generation. Various ML-based predictive models are trained and validated in [10] to accurately estimate the real PV output power over a satisfactory time period.
In [11], an in-depth analysis of the current methods used in the prediction of solar irradiance is presented to enable the selection of a suitable forecast method for the proposed system.
The authors in [12,13] elaborated the design of an economical datalogger capable of measuring electrical and meteorological data, which also meets the IEC requirements for SAPV systems operating in remote areas. The data from the logger is collected via an SD card since telecommunication is a concern. All the required parameters are monitored with high accuracy and low power consumption. The datalogger consists of an Arduino Mega 2560 board with an ATmega2560 chip and a DS1307 real-time clock chip for date stamping on the SD card every time data is logged. The results and findings are recorded and compared in [14] with the data taken by a commercial datalogger, the DataTaker DT80, during the testing stage. In [15], a datalogger to monitor isolated PV systems in developing nations was developed using Arduino. The system met all the relevant requirements of the IEC standard. It was tested in harsh weather conditions and exhibited performance comparable to commercial systems; hence, it was found to be reliable. Microcontroller-based data acquisition systems are designed and developed for feasible operation in [16][17][18][19][20]. In [21], the author developed a wireless data acquisition system for weather station monitoring consisting of sensors to measure various atmospheric parameters, whose data is collected and conditioned using precision electronic circuits and interfaced to a PC over an RS232 link via a wireless unit. The processing and display of the collected data are done using a LabVIEW program. Also, the data is available over the internet to any user. The key task of observing temperature and humidity and transmitting this data as short message service messages to users' cellular phones is done by the system, which also offers a data-logging facility. For further analysis, the logged data can be transferred to a PC having a graphical user interface program [22].
The authors in [23] discuss a software program that can be used to access any Unidalog (universal datalogger) remotely as a remote terminal unit by using the GSM network as an intermediary for data communication. The processing software is designed using transformative development models so that the system is built iteratively in segments as part of the telemetry monitoring. Further, in [24,25], a lossless algorithm based on statistical information is presented with a compression ratio of up to 14:1 to significantly reduce the storage and telecommunication costs of large volumes of measured data. The development of an economical field datalogger model using Raspberry Pi and industrial sensors is described, and an online rainflow counting algorithm is implemented, in [26]. The software and hardware design of a wireless datalogger for a thermal validation system is described in [27]; it measures and operates in a stated temperature range and complies with international and European protocols for the validation of pharmaceutical, biotechnology, and medical devices. The temperature ranges from -60°C to 150°C with a total system measurement accuracy of 0.1°C. A case-based reasoning (CBR) algorithm is employed in [28] to discover the past cases that are most similar to the present case. The accuracy of the model is confirmed using data from the past 10 years.
International Journal of Photoenergy
A few features of photovoltaics, such as loss of load probability and peak solar hour, were demonstrated and simulated by the authors in [29,30]. Better battery management systems are proposed in [31][32][33][34] to extend the life cycle of the batteries and smooth the transition between charging and discharging cycles. Several methodologies are studied in [35][36][37][38] to minimize the cost and size of the SAPV system with DC-DC converters. The parallel connection of the maximum power point tracking (MPPT) system decreases the undesirable effect of power converter losses on the total efficiency, since only a portion of the generated power is processed by the MPPT system, as studied in [39]. For this MPPT to operate with the functions of a step-up converter and battery charger, a simple bidirectional DC-DC power converter is proposed. The operational characteristics of the proposed circuit are investigated by implementing a model in a real-time application. In [40], the authors discussed a scheme that comprises three DC-DC converters that operate in accordance with the main bus load demand and the PV panel power. Implementing the control elements using One-Cycle Control permits high accuracy of convergence, fast response to transient conditions, and a low-cost analogue implementation.
Transition conditions are simulated using actual states such as the absence of sunshine, cloud-edge effects, and slowly passing clouds. The factors that affect the design and performance of the SAPV system include PV sizing and the effects of shadowing, temperature, and dust, which lead to reduced SAPV output power and progressive, permanent degradation of the PV generator and cells. Special attention is drawn by the authors in [41] to the need for remote monitoring and examination and for standardized upkeep measures, and a careful design phase is encouraged so that SAPV systems remain a favored power alternative for telecommunication applications. An appropriate and rational technique for selecting the optimum constraints of the particle swarm optimization algorithm takes into account the topology and constraints of the DC-DC converter and the configuration of the solar panels. The ideal value of the sampling period for digital MPP controllers to deliver their peak performance is determined in [42] based on an innovative method. Compared to the direct duty cycle method, the voltage reference control method has the advantages of fast convergence and minor oscillations in steady state; analytical calculation of the controller gains is also possible [43]. In [44], the design and cost estimation of an SAPV system along with MPPT, inverter, charge controller, and lead-acid batteries are studied. The sizing of the SAPV system is achieved using MATLAB/Simulink by taking into account the radiation and electrical data for a typical household in remote areas or villages. Life cycle cost analysis is used to evaluate the economic feasibility of the system. The results are encouraging in the sense that remote areas of Yemen could be electrified. The errors made during the design, costing, sizing, installation, and maintenance of solar PV systems are discussed in [45]. The design was done in two different ways. The total price came out to be between US$422.5 and US$107.5.
An alternative method to MPPT was discussed in [46], based on the transformation of the energy parameters into new charge-related parameters specified by the Joint Research Center and the IEC Standard 61724. In [47], the authors proposed a scheme which, compared with traditional schemes, needs fewer switching elements in the battery path, hence significantly reducing the cost and size of the inductors involved in battery charging circuits by employing a simple power management topology. The efficiency of the power conversion stage varies with the level of irradiance. Hence, to maximize the system efficiency over a broader range of real-time operating conditions, a novel adaptive control scheme was proposed in [48].
For the electrical devices fitted in an independent house, the battery storage requirements depend on the kind, size, and operating sequence of the devices. The relation between the available stored energy and the size/operational order of the house's electrical equipment is highlighted in [49]. The rules for the selection of a suitable site/location together with the technique for the assessment of solar energy supply at the preferred site are provided in [50]. Sizing of various solar system components such as panels, batteries, inverter, charge controllers, and other accessories with daily load energy demand considerations was studied in [50] to design and install a solar PV system. The energy management of an SAPV system is designed taking into consideration drastic changes in weather conditions and rapid changes in the energy requirements of the client. A storage unit such as a supercapacitor is used to control the energy mismatch between the load and the SAPV system. The effectiveness and performance of the system in [51] are evaluated using MATLAB/Simulink software. The case studies of several SAPV systems in various parts of the world in [52][53][54][55] present information on guidelines, required components, the energy demand and consumption profile, and the calculation process.
IoT-based solutions are proposed by researchers in [56][57][58] to provide appropriate source management, load shedding, data acquisition, and control of SAPV systems and to monitor and evaluate electrification projects. The authors of [59] show that IoT makes it possible to bring MATLAB, ThingSpeak, and other tools/functions into use by granting one person the authority to operate the forecasting system, which reduces the overall effort. It is presented in [60] that the mean absolute percentage error (MAPE) does not depend on scale, can be interpreted intuitively, and is easy to compute. The authors in [61][62][63] discussed that the simulation and design of a hybrid system in an islanded location is feasible both ecologically and economically, with a drop in CO2 emissions, energy, and net cost. The work done in [64][65][66][67][68] evaluates numerous load dispatch strategies for islanded microgrid systems. The outcome attained gives a proper parameter to compare and approximate the sizes of modules and their related costs. This paper aims at developing a novel scaled-down prototype of an IoT-enabled datalogger for PV panels that are installed in a remote location where human intervention is not possible due to harsh weather conditions or other circumstances. An IoT platform is used to store and visualize the captured data. The collected data from the datalogger is used as a training set for ML algorithms. The estimation of power generation is done by a linear regression (LR) algorithm. Further, a website is developed wherein the user can input the date and time. The output of that transaction is the predicted temperature, humidity, and forecasted power generation of the specific standalone PV system.
Hence, the proposed IoT-enabled datalogger contributes extensively to realizing a self-sufficient small-scale off-grid photovoltaic system since it does not require the involvement of any individual in providing and predicting power. The proposed small-scale off-grid photovoltaic system has applications in the electrification of secluded, rural, isolated, and remote areas/homes. It can also be utilized by regions facing frequent power cuts. This system is also capable of supplying electricity under natural disaster circumstances.
The organization of the article is as follows: Section 2 explains the methodology followed for realizing the IoT-enabled datalogger for SAPV networks. It also discusses how ML algorithms are employed to forecast power generation. The purpose of creating the website is also discussed in this section. Section 3 discusses in detail the system design for the hardware prototype of the proposed datalogger. It also explains how the logged data serves as the training data for the ML algorithms. The output of the algorithm is the forecasted power generation. Section 4 elaborates how the datalogger is implemented on the 40 W PV panel and how the data is logged. The LR algorithm is applied to the training data to predict the power generation, which is also made available on the website. Section 5 concludes the article with the working model of the datalogger for SAPV networks.

Methodology
2.1. Datalogger for SAPV Network. The main objective of this work is to develop a low-cost IoT-enabled datalogger for a remotely deployed standalone solar PV system. The datalogger contains calibrated digital sensors and a microcontroller. The sensors monitor and log performance parameters of the PV system, and the microcontroller, which is connected to an IoT server through Wi-Fi, sends the measured data to the server. While collecting data, the sensors may log erroneous values or miss a few values. To handle these scenarios, two data prediction algorithms are employed. The performance of the two algorithms is compared, and based on the application, one of the algorithms is deployed. On the server side, users can view live generation performance and can export data to an Excel sheet for further data analysis. A datalogger is designed for the PV system as demonstrated in Figure 1. The datalogger consists of sensors to measure photovoltaic parameters such as voltage and current and a microcontroller that is connected to an IoT server. The sensors collect information, which is transmitted either to a computer via a direct serial connection or directly to an IoT server, where it is further viewed and analyzed. Appropriate software and weather data platforms are used to store, display, and process the collected data. The collected data then proceeds to the ML algorithm for estimating power generation for the next 30 days.

Machine Learning-Based Prediction Algorithms.
The machine learning stages are shown in Figure 2. The datalogger initially collects the performance parameters of the PV panel, voltage and current, and sends them to the ThingSpeak server. This server provides storage and graphical representation facilities. The collected data from the datalogger is used as a training set for the ML program, where the algorithm analyzes the collected data and attempts to relate the measured data to weather parameters such as temperature and relative humidity.
The algorithms employed here are the following. Linear regression (LR) is a mathematical approach to derive a linear relationship between variables. It does so by finding the straight line that best fits the data points. The equation of this line is used to predict output values that are not present in the data set. LR is fast and is used extensively in analysing and predicting data sets. Fitting LR yields a line equation with two input variables, temperature and relative humidity; substituting these two values into the equation predicts the power generation. The fitting process minimizes the error between the line and the data points.
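The LR step described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of power against temperature and relative humidity; the function names and the sample values are hypothetical and are not the measured data or the MATLAB implementation used in this work.

```python
from statistics import mean


def fit_linear(temperature, humidity, power):
    """Least-squares fit of power = b0 + b1*temperature + b2*humidity."""
    t_bar, h_bar, p_bar = mean(temperature), mean(humidity), mean(power)
    dt = [t - t_bar for t in temperature]
    dh = [h - h_bar for h in humidity]
    dp = [p - p_bar for p in power]
    # Centered sums of squares and cross products.
    s_tt = sum(a * a for a in dt)
    s_hh = sum(a * a for a in dh)
    s_th = sum(a * b for a, b in zip(dt, dh))
    s_tp = sum(a * b for a, b in zip(dt, dp))
    s_hp = sum(a * b for a, b in zip(dh, dp))
    det = s_tt * s_hh - s_th ** 2
    if det == 0:
        raise ValueError("temperature and humidity samples are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    b1 = (s_tp * s_hh - s_hp * s_th) / det
    b2 = (s_hp * s_tt - s_tp * s_th) / det
    b0 = p_bar - b1 * t_bar - b2 * h_bar
    return b0, b1, b2


def predict_power(coeffs, temperature, humidity):
    """Substitute temperature and humidity into the fitted line equation."""
    b0, b1, b2 = coeffs
    return b0 + b1 * temperature + b2 * humidity


# Hypothetical samples (deg C, % relative humidity, W); illustrative only.
temps = [25.0, 28.0, 31.0, 34.0, 36.0]
hums = [70.0, 62.0, 50.0, 47.0, 35.0]
watts = [12.0, 18.0, 24.5, 28.0, 33.0]

coeffs = fit_linear(temps, hums, watts)
print("Estimated power at 30 C and 55 % RH:", round(predict_power(coeffs, 30.0, 55.0), 1), "W")
```

Centering the variables reduces the fit to a 2x2 system, which keeps the sketch free of external libraries; a numerical library would normally be used for larger feature sets.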

Polynomial Regression. Polynomial regression (PR) has an independent variable x and a dependent variable y. The relationship between the variables is modeled as a polynomial of nth degree; hence, the plot forms a curve. This algorithm can adjust to a wide variety of curves but requires additional effort to find the appropriate fit and to interpret the role of its independent variables.
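As a rough illustration of PR, the sketch below fits an nth-degree polynomial by least squares using the normal equations. The helper names and the sample data are hypothetical; the degree is a free choice that the text leaves open.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular system: too few distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_polynomial(x, y, degree):
    """Return c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree."""
    size = degree + 1
    # Normal equations: sum(x^(i+j)) * c_j = sum(x^i * y).
    power_sums = [sum(v ** k for v in x) for k in range(2 * degree + 1)]
    matrix = [[power_sums[i + j] for j in range(size)] for i in range(size)]
    rhs = [sum((v ** i) * w for v, w in zip(x, y)) for i in range(size)]
    return solve_linear_system(matrix, rhs)


def evaluate_polynomial(coeffs, x):
    """Evaluate the fitted polynomial with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical temperature (deg C) versus power (W) samples; illustrative only.
temps = [20.0, 25.0, 30.0, 35.0, 40.0]
watts = [14.0, 21.0, 26.0, 29.0, 30.0]
curve = fit_polynomial(temps, watts, degree=2)
print("Estimated power at 32 C:", round(evaluate_polynomial(curve, 32.0), 1), "W")
```

A higher degree follows the training points more closely but can overfit, which is one reason finding the appropriate fit takes additional effort.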

GUI for Viewing Forecasted Power.
A webpage acts as the GUI to view the generated parameters. The user needs to enter the date and time so that the webpage fetches the corresponding temperature and relative humidity from weather forecasting servers and displays the generated wattage. IoT enables a constant record of performance and failure data so that it can be used for analytics to predict and forecast the impending power generation potential, revenue generation, etc. Photovoltaic systems fitted at isolated places or far away from the control center can be accessed using IoT, which also helps in improving the efficiency of the system, reducing human involvement and supervision time, and facilitating network management.
Figure 3 gives a detailed overview of the methodology incorporated in this article. The design is categorized into two parts: hardware and software. The hardware platform involves realizing a low-cost data monitoring system that gathers data and stores it in the IoT server. To predict power generation, the machine learning algorithm requires at least 4 months of performance data of the PV panel for best results. Due to time constraints, collecting this much data was not feasible. Hence, appropriate approximate values were used for the training sets. The software part consists of MATLAB for the ML algorithm, an IDE for Arduino, and a webpage for viewing the real-time performance of the solar PV panel and the estimated generation. The ThingSpeak platform is programmed to show the voltage and current of a standalone PV panel.
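The webpage flow for viewing forecasted power (date and time in, weather lookup, regression estimate out) might look like the following sketch. The weather lookup is a hypothetical stand-in for a weather forecasting server, the coefficients are illustrative rather than fitted, and clamping negative estimates to zero is an added assumption.

```python
from datetime import datetime


def demo_weather_lookup(timestamp):
    """Hypothetical stand-in for a weather forecasting server.

    Returns (temperature in deg C, relative humidity in %) for the given hour.
    """
    by_hour = {9: (27.0, 65.0), 12: (32.0, 45.0), 15: (30.0, 50.0)}
    return by_hour.get(timestamp.hour, (24.0, 75.0))


def forecast_for(timestamp, weather_lookup, coeffs):
    """Look up the weather for a date and time and estimate generated power."""
    temperature, humidity = weather_lookup(timestamp)
    b0, b1, b2 = coeffs
    # Clamp at zero so the sketch never reports negative generation (assumption).
    power = max(0.0, b0 + b1 * temperature + b2 * humidity)
    return {
        "timestamp": timestamp.isoformat(timespec="minutes"),
        "temperature_c": temperature,
        "humidity_pct": humidity,
        "power_w": round(power, 2),
    }


# Illustrative coefficients only; they are not fitted to the data of this work.
EXAMPLE_COEFFS = (2.0, 0.5, -0.1)
print(forecast_for(datetime(2021, 6, 1, 12, 0), demo_weather_lookup, EXAMPLE_COEFFS))
```

Passing the weather lookup in as a function keeps the estimate logic independent of whichever weather service the website actually queries.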

IoT Based Datalogger
3.1.1. Selection of Microprocessor. The selection of suitable hardware was based on the use of open-source software and hardware, which allows us to attain the economical objective of the final system. Among the many microprocessors based on open-source hardware available in the market, Arduino Uno is chosen for its low cost, flexibility, and large developer community. The Arduino DIY board is based on the ATmega328 microcontroller and has 14 digital I/O pins, 6 analogue inputs, a power jack, a USB connection, and a reset button. The board comprises everything essential to support the microcontroller, and it can be powered from an external supply or from any standard USB port using a USB type A to USB type B cable.
It offers advantages over other development kits such as the following:
(i) Easy programming tool: Arduino programming is done with its free IDE software, which is beginner-friendly yet flexible enough for advanced users. The Arduino IDE is a very small, lightweight program with very basic system requirements.
(1) Voltage sensor. The voltage is measured and recorded by the new low-cost voltage sensor LV25, which works on the principle of a resistive voltage divider. The voltage detection sensor is very cheap and easily available in the market. Its interfacing with the Arduino Uno board is also very simple. It uses a potential divider to decrease any input voltage by a factor of 5, permitting the user to use the analog pin of a microcontroller to monitor voltages greater than its capacity. The specifications of the voltage-sensing device are given in Table 1. Some of the features of LV25 are as follows:
(i) Small in size and compact
(ii) High personnel safety
(iii) High degree of accuracy
ESP8266 can boot up directly from an external flash drive. It has its own memory and integrated cache storage. It remembers data such as the previously connected Wi-Fi SSID (service set identifier) and password and automatically connects to the remembered Wi-Fi network if it is available. It can provide internet access to basically any device that supports a UART connection. The specifications of the Wi-Fi module are given in Table 3. Some features of ESP8266 are as follows:
(i) Low-cost: ESP8266 is very cheap and can be found in the market for less than ₹400.
(iv) Fast boot and recovery: ESP8266's latest firmware allows it to boot up quickly and recover cache memory in no time. It wakes up and transmits packets in less than 2 ms and has very low power consumption.

Forecasting Based on Linear and Polynomial Regression Algorithm.
The collected data from the datalogger is used as a training set for the machine learning program, where the machine learning algorithm analyzes the collected data and attempts to relate the measured data to weather parameters such as temperature and relative humidity. Linear and polynomial regression are mathematical approaches to derive a relationship between variables. LR does so by finding the straight line that best fits the data points, and PR by finding the best-fitting polynomial curve. The fitted equation is used to predict output values that are not present in the data set. LR is fast and is used extensively in analysing and predicting data sets. The prediction process for erroneous/missing data is shown in Figure 4. Power generation is estimated by the LR and PR algorithms from the relationship between generated wattage and temperature and humidity. Irradiance, wind direction, wind speed, and dew point are also key elements that affect PV panels. However, an irradiance measurement instrument (pyranometer) is expensive, and monitoring wind direction and wind speed is also arduous. If cost is not a concern, the prediction can be further enhanced by adding these factors too.
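The idea of predicting erroneous or missing logged values can be sketched as follows. This is only an illustration under stated assumptions: a reading is treated as erroneous when it is missing or falls outside 0 W to the 40 W panel rating, and the replacement comes from an already fitted LR equation with illustrative coefficients. The field names are hypothetical, and the actual procedure is the one in Figure 4.

```python
import math

RATED_POWER_W = 40.0  # rating of the PV panel used in this work


def is_plausible(power_w, rated_power_w=RATED_POWER_W):
    """Assumed validity rule: a number between 0 W and the panel rating."""
    if power_w is None or math.isnan(power_w):
        return False
    return 0.0 <= power_w <= rated_power_w


def repair_log(records, coeffs, rated_power_w=RATED_POWER_W):
    """Replace missing or implausible power readings with regression estimates."""
    b0, b1, b2 = coeffs
    repaired = []
    for record in records:
        entry = dict(record)
        if is_plausible(record["power_w"], rated_power_w):
            entry["estimated"] = False
        else:
            estimate = b0 + b1 * record["temperature_c"] + b2 * record["humidity_pct"]
            # Keep the estimate inside the physically possible range.
            entry["power_w"] = min(max(estimate, 0.0), rated_power_w)
            entry["estimated"] = True
        repaired.append(entry)
    return repaired


# Hypothetical log: one good reading, one missing, one out of range.
log = [
    {"temperature_c": 30.0, "humidity_pct": 50.0, "power_w": 20.0},
    {"temperature_c": 32.0, "humidity_pct": 45.0, "power_w": None},
    {"temperature_c": 28.0, "humidity_pct": 60.0, "power_w": 55.0},
]
for row in repair_log(log, coeffs=(2.0, 0.5, -0.1)):
    print(row)
```

Flagging each replaced value with `estimated` keeps measured and predicted readings distinguishable in later analysis.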

Challenges.
During the development of the prototype, several issues arose for which alternative solutions were brainstormed and implemented. The first constraint was the implementation of the ESP8266 in the circuit, since programming it as needed was a challenge. The initial plan was to use a GSM module, but it was expensive and could only send a text message to a GSM number, whereas the proposed work required something that could store data in the cloud.
Secondly, IoT is often thought to be the future of the internet. Ever since the world has got fast personal internet, people are working towards an interconnected world where

Ref. | Paper title | Type of meteorological data | Method | Inference
[28] | Transformer failure diagnosis using fuzzy association rule mining combined with case based reasoning | - | Hybrid | CBR algorithm is employed to discover the past cases which are the most identical to the present case. The exactness of the model is confirmed using past 10 years data.
[69] | A hybrid algorithm for short-term solar power prediction - sunshine state case study | Hourly | Hybrid | Results obtained from the hybrid algorithm are more accurate with fast convergence compared to the classic algorithm.
[70] | A hybrid ensemble model for interval prediction of solar power output in ship onboard power systems | Hourly | Hybrid | The hybrid algorithm gives outcomes with high efficiency considering the meteorological data along with the ship's swinging as the input parameters.
[71] | A lightweight short-term photovoltaic power prediction for edge computing | Data sampled every 30 minutes | Hybrid | Compared to other standard ML algorithms, the technique employed here is remarkable and is capable of making short-term power predictions.
[72] | A local training strategy-based artificial neural network for predicting the power production of solar photovoltaic systems | Hourly | Intuitive | Various tests were conducted that showed the superiority of the proposed ANN over the benchmark ANN training strategies.
[73] | A practicable copula-based approach for power forecasting of small-scale photovoltaic systems | Daily | Numerical | From the results, it is clear that the mathematical model used here gives satisfactory prediction for cloudy days.
[74] | A solar time based analog ensemble method for regional solar power forecasting | Hourly | Hybrid | The proposed model adapts easily to the changing weather conditions irrespective of the location with high forecasting accuracy, few parameter requirements, data management, etc.
[75] | Ensemble approach of optimized artificial neural networks for solar photovoltaic power prediction | Daily | Intuitive | Accurate day-ahead power prediction is obtained and is verified against a real case study. The number of hidden neurons in the hidden layer of ANN is optimized using trial and error method of the proposed model.
[76] | Photovoltaic power forecasting with a hybrid deep learning approach | Daily | Hybrid | The proposed hybrid method is compared with three other benchmark methods and is shown to have very small prediction errors.
[77] | Power generation forecast of hybrid PV-wind system | 4 hours daily | Numerical | The duration for which data samples are incomplete or missing can be predicted by using the proposed method.
[78] | Prediction of photovoltaic power generation based on general regression and back propagation neural network | Daily | Numerical | Temperature and irradiance were found to be the key parameters. Back propagation neural network predicted accurate results, but general regression technique was more appropriate for big data sets.
[79] | Probabilistic forecasting of photovoltaic generation: An efficient statistical approach | Daily | Probabilistic | The technique employed here exhibits very high computational efficiency and proves to be remarkably effective.
[80] | Real-time anomaly detection for very short-term load forecasting | Daily | Numerical | A way to detect and replace anomalies/corrupted data is proposed here whose performance surpasses state-of-the-art methods.
[81] | Day-ahead hierarchical probabilistic load forecasting with linear quantile regression and empirical copulas | Daily | Hybrid | A simple linear regression is adopted here to improve prediction accuracy.
[82] | Direct quantile regression for nonparametric probabilistic forecasting of wind power generation | Hourly | Probabilistic | The proposed linear programming gives a simple solution with high computational efficiency and flexible framework.
[83] | Solar power probabilistic forecasting by using multiple linear regression analysis | Hourly | Numerical | The forecasting result was satisfactory using linear regression.

Compared to all the ML algorithms, the LR algorithm is found to be better as per the parameters considered in this paper. Forecasting the consumption of power for the next hour is done using the online IoT platform.
... and corrupted, and the rule-based reasoning does not give a satisfactory solution. The implementation of the CBR algorithm is depicted in Figure 5.
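The retrieval step of CBR, which looks for the past case most similar to the present one, can be sketched as a nearest-case search. This is a simplified illustration and not the implementation of Figure 5: the similarity measure, the scaling spans, and the case fields are all assumptions.

```python
import math

# Assumed spans used to make temperature (deg C) and humidity (%) comparable.
TEMPERATURE_SPAN = 50.0
HUMIDITY_SPAN = 100.0


def case_distance(case, query):
    """Scaled Euclidean distance between a stored case and the present case."""
    dt = (case["temperature_c"] - query["temperature_c"]) / TEMPERATURE_SPAN
    dh = (case["humidity_pct"] - query["humidity_pct"]) / HUMIDITY_SPAN
    return math.hypot(dt, dh)


def retrieve_most_similar(case_base, query):
    """Retrieve: return the past case closest to the present conditions."""
    return min(case_base, key=lambda case: case_distance(case, query))


def reuse_power(case_base, query):
    """Reuse: take the power recorded in the most similar past case."""
    return retrieve_most_similar(case_base, query)["power_w"]


# Hypothetical case base of past observations; illustrative values only.
past_cases = [
    {"temperature_c": 25.0, "humidity_pct": 70.0, "power_w": 12.0},
    {"temperature_c": 32.0, "humidity_pct": 45.0, "power_w": 26.0},
    {"temperature_c": 36.0, "humidity_pct": 35.0, "power_w": 33.0},
]
present = {"temperature_c": 31.0, "humidity_pct": 48.0}
print("Power reused from the most similar case:", reuse_power(past_cases, present), "W")
```

Unlike regression, this approach stores the cases themselves rather than a fitted equation, so its quality depends on how well the case base covers the conditions of interest.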

Other Component Specifications.
The specifications of various components and the essential cloud services used to develop the proposed work are tabulated. Table 4 shows the specifications of Arduino Uno. The ratings of the PV panel used are given in Table 5, and Table 6 lists the essential cloud services used.

Experimental Results and Discussion
The proposed power forecasting model is built and verified under different scenarios to assess its superiority and performance. The assessment and the obtained results of the proposed model are discussed in the following subsections.

Datalogger Realization.
The voltage and current sensors are connected to the pins of the Arduino Uno, which is programmed using the Arduino IDE. An ESP8266 Wi-Fi modem is paired with the Arduino Uno to enable wireless capability on the development board. The ESP8266 is flashed with custom firmware to be able to receive and transmit data over the internet. The sensors are manually calibrated and tested with a high-precision multimeter. The sensors collect data in real time and transfer it to the Arduino Uno development board using its analog pins. Upon receiving raw data from the sensors, the Arduino Uno is programmed to calculate the performance parameters. This data is sent to be stored in the cloud via the Wi-Fi module. The voltage and current sensors are connected to the input pins of the Arduino, whereas the ESP8266 Wi-Fi modem is connected to digital pins 2 and 3, which act as TX and RX as shown in Figure 6. The reason for not using the default TX and RX pins is that the Arduino board resets itself each time it starts an operation, and while resetting, there should not be any connection on the default TX and RX pins; otherwise, it will come up with an error code and will not initialize. The Wi-Fi modem has an inbuilt antenna and works as either a Wi-Fi hotspot or a Wi-Fi receiver, with an inbuilt microprocessor and memory. It does not come with any firmware installed in its memory. Users have to flash the latest firmware, which is provided by the manufacturer, by themselves. In this project, the latest Non-OS AT firmware, version 1.6.2, is used. The ESP8266 modem serves as a communicator between the datalogger and the IoT server.
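The calculation of performance parameters from raw sensor data can be illustrated with the arithmetic below. It is written in Python for illustration and is not the Arduino firmware. It assumes the 10-bit ADC of the Arduino Uno with a 5 V reference, the factor-of-5 voltage divider described for the voltage sensor, and a current value that has already been converted to amperes; the function names are hypothetical.

```python
ADC_MAX = 1023          # full-scale count of the 10-bit ADC on the Arduino Uno
ADC_REFERENCE_V = 5.0   # default analog reference voltage, assumed here
DIVIDER_RATIO = 5.0     # the voltage sensor scales the input down by a factor of 5


def adc_to_panel_voltage(raw_count):
    """Convert a raw ADC count into the voltage at the sensor input."""
    if not 0 <= raw_count <= ADC_MAX:
        raise ValueError("raw ADC count must be between 0 and 1023")
    pin_voltage = raw_count * ADC_REFERENCE_V / ADC_MAX
    # Undo the divider to recover the voltage before scaling.
    return pin_voltage * DIVIDER_RATIO


def generated_power(voltage_v, current_a):
    """Electrical power in watts from voltage and current."""
    return voltage_v * current_a


raw = 737  # hypothetical ADC reading
voltage = adc_to_panel_voltage(raw)
print("Panel voltage:", round(voltage, 2), "V")
print("Power at 1.5 A:", round(generated_power(voltage, 1.5), 2), "W")
```

With these assumptions the measurable input range is 0 V to 25 V, which is why the divider lets the analog pin monitor voltages above its own 5 V limit.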
IoT capability is provided by the ThingSpeak IoT server, a free and open-source platform for storing and retrieving data. Its simple and easy-to-use authentication system makes it one of the most popular IoT servers for research and project purposes. ThingSpeak provides read and write access to its channels by issuing a unique API key for each channel.
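As an illustration of key-based channel access, the sketch below builds the URL of a ThingSpeak update request, in which the write API key and the field values travel as query parameters. It is written in Python for readability; in the prototype the request would be issued by the ESP8266 rather than by a script. The function name, the number formatting, and the assignment of readings to field numbers are assumptions made for this example.

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key, readings):
    """Build a ThingSpeak update URL.

    `readings` maps a field number (1 to 8) to a numeric value. Which
    quantity goes to which field is an assumption of this sketch.
    """
    params = {"api_key": write_api_key}
    for field_number, value in sorted(readings.items()):
        params[f"field{field_number}"] = f"{value:.3f}"
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"


# Example with a placeholder key: write a voltage reading to field 1.
example_url = build_update_url("YOUR_WRITE_KEY", {1: 17.5})
```

An HTTP GET on the resulting URL writes one entry to the channel that owns the key.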
An IoT-based datalogger for a standalone PV panel is developed with generation estimation capability. A hardware prototype is realized for a scaled-down version of the datalogger circuit. IoT is incorporated through the Arduino Uno and the ESP8266 Wi-Fi module to collect the performance parameters of the solar PV panel, such as current and voltage, and to store and visualize them on the ThingSpeak server. The temperature and relative humidity are obtained from the Visual-Crossing weather API. Visual-Crossing is a free weather forecasting service that provides hourly or daily weather data for any given date and location. First, a datalogger circuit was assembled to monitor the data of the solar panel.
The developed datalogger circuit was tested with a solar PV emulator as shown in Figure 7.
The solar PV emulator emulates power generation by photovoltaic panels in a closed box. A bright electric bulb emulates solar irradiance, and two small sets of PV arrays work as a PV panel. These PV panels are connected to an inbuilt measurement instrument that displays voltage and current readings on an LCD mounted on the front of the emulator, as displayed in Figure 8. The solar PV emulator has two knobs through which temperature or voltage and current generation can be adjusted. It has a switch that selects between the external and internal measurement instruments and a power outlet pin for attaching external measurement and monitoring instruments. The datalogger system was attached and tested, and the sensors were calibrated.

Data Collection.
The output of the datalogger is monitored in the serial monitor window of the Arduino IDE. A screenshot of this output is shown in Figure 9. PV panel performance data is then collected over ten days from a 40 W PV panel. The interfacing of the datalogger to the PV panel is shown in Figure 10. The generated power is measured and is shown in Figure 11 along with its date and time stamp. The monitored data is analyzed and stored on a local computer.

Forecasting Based on Linear and Polynomial Regression Algorithms.
In this section, linear and polynomial regression algorithms are employed to forecast the missing/erroneous values of voltage and current obtained from the datalogger.

Polynomial Regression Algorithm.
The collected data is used as a training set for the ML algorithm PR, whose output is a curve. PR is considered better than LR when the variables in the data set have little to no linear relation between them. The degree of the polynomial equation can also be adjusted for the best fit. Figure 12 shows the PR between power and temperature. The red dots are the measured points, and the blue curve is the fitted polynomial. The captured data shown in Figure 11 is also used to train another ML algorithm, LR, whose output is a straight line. Figure 13 shows the LR between power and temperature. The blue dots are the measured points, and the red line is the fitted line. Table 7 summarizes the methods of analysing meteorological data for power prediction. From the research papers studied, we find that the regression method is simpler to implement. Better algorithms are available for power forecasting, but because of limited data availability and the use of few parametric variables, the regression technique is more suitable. Hence, LR is the proposed algorithm for power forecasting in SAPV systems with the available hourly data. Figure 14 shows the number of research articles surveyed for each method of analysis.
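The difference between the two fits can be sketched with NumPy as follows. The readings in the arrays are synthetic values invented for this example and are not the measurements of Figure 11. The helper names are likewise hypothetical, and a degree of 2 is used only to keep the example small.

```python
import numpy as np

# Synthetic, illustrative readings (NOT the measured data of the paper).
temperature_c = np.array([24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0])
power_w = np.array([20.1, 21.8, 24.2, 25.9, 28.1, 29.8, 32.2, 33.9])


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; degree 1 is LR, higher degrees are PR."""
    return np.poly1d(np.polyfit(x, y, degree))


def rmse(model, x, y):
    """Root-mean-square error of a fitted model on the given points."""
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))


linear_model = fit_polynomial(temperature_c, power_w, 1)     # straight line
quadratic_model = fit_polynomial(temperature_c, power_w, 2)  # curve

print("LR training RMSE:", rmse(linear_model, temperature_c, power_w))
print("PR training RMSE:", rmse(quadratic_model, temperature_c, power_w))
```

On the training points a higher degree never increases the least-squares error, so a fair comparison of LR and PR should also look at the error on points that were not used for fitting.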

Comparison Summary of Linear and Polynomial Regression Algorithms.
The results of the two algorithms are compared. PR serves the same purpose as LR but fits a polynomial equation to the given data set, so the result is a curve instead of a straight line. In internal testing, the results of PR (Figure 12) were not satisfactory when compared with those of LR (Figure 13). It is evident that LR fits more accurately than PR. Also, the prediction error and error percentage are noticeably higher with PR. Increasing the degree of PR above 6 resulted in even poorer curve fitting, and decreasing it below 6 was also not satisfactory for the considered system. Hence, LR is chosen as the ML algorithm for analysing the data from the datalogger of the SAPV network.

Data Analysis.
The LR algorithm uses weather data (temperature and relative humidity) fetched from the Visual-Crossing API. Using the fetched data, the algorithm estimates the power generation of the PV panel for the next 30 days. Figures 15 and 16 show the comparison between measured and estimated power generation. As seen in Figure 15, column E contains the measured power of the PV panel, and columns F and G contain the estimated values. Column F lists the estimated power generation when only the temperature is considered in the algorithm, whereas column G lists the estimated power generation when both temperature and relative humidity are used for estimation. From Figure 15, it is evident that the wattage obtained with only temperature has an accuracy of 94%, and the wattage obtained with both temperature and humidity correction has an accuracy of 98.955%. The estimated power is found to be more accurate when both parameters, temperature and relative humidity, are taken into account, as in Figure 16. The data recorded in Figure 16 is assessed, and the corresponding results are depicted. Figure 17 compares the measured wattage with the wattage predicted by the LR and CBR algorithms when only temperature is considered. Figure 18 compares the measured wattage with the wattage predicted by the LR and CBR algorithms when humidity correction is considered. MAPE is the measure of the accuracy of the forecasting system: the lower the error, the better the prediction. Figures 19 and 20 illustrate the % accuracy and % MAPE attained using the CBR and LR algorithms, showing that LR is a better power prediction option than CBR. Figure 19 shows that the percentage accuracy of predicting wattage with only temperature is 3.51% higher with the proposed LR method than with the CBR method. Also, the accuracy of predicting wattage with humidity correction is 1.24% higher with LR than with CBR.
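The effect of adding relative humidity as a second predictor can be sketched with an ordinary least-squares fit. The values below are synthetic: the power column is generated from made-up coefficients so that the fit can be checked, and it does not reproduce columns E to G of Figure 15. The function names are hypothetical.

```python
import numpy as np

# Synthetic, illustrative data (NOT the values of Figure 15).
temperature_c = np.array([25.0, 27.0, 29.0, 31.0, 33.0, 35.0])
humidity_pct = np.array([70.0, 60.0, 65.0, 50.0, 55.0, 40.0])
# Power generated from made-up coefficients: intercept 2.0, 0.9 W/degC, -0.05 W/%.
power_w = 2.0 + 0.9 * temperature_c - 0.05 * humidity_pct


def fit_linear(features, target):
    """Ordinary least squares; returns [intercept, coefficient_1, ...]."""
    design = np.column_stack([np.ones(len(target)), features])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


def predict(coeffs, features):
    """Evaluate the fitted linear model on one or more feature rows."""
    features = np.atleast_2d(features)
    return coeffs[0] + features @ coeffs[1:]


def mean_abs_error(coeffs, features, target):
    """Mean absolute difference between predictions and targets."""
    return float(np.mean(np.abs(predict(coeffs, features) - target)))


temp_only = temperature_c.reshape(-1, 1)
temp_and_humidity = np.column_stack([temperature_c, humidity_pct])

coeffs_temp = fit_linear(temp_only, power_w)          # temperature only
coeffs_both = fit_linear(temp_and_humidity, power_w)  # with humidity correction

print("error, temperature only:", mean_abs_error(coeffs_temp, temp_only, power_w))
print("error, with humidity:", mean_abs_error(coeffs_both, temp_and_humidity, power_w))
```

Because the synthetic power depends on both inputs, the temperature-only model leaves a residual error that the two-variable model removes, which mirrors the direction of the improvement reported above without reproducing its magnitude.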
Figure 20 compares the percentage MAPE of the LR and CBR methods. For wattage predicted with only temperature, the % MAPE is 2.958 for LR and 3.978 for CBR. For wattage predicted with humidity correction, the % MAPE is 1.321 for LR and 2.374 for CBR. In both cases, the LR method has a lower % MAPE than the CBR method.
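For reference, the commonly used definition of MAPE is the mean of the absolute errors relative to the measured values, expressed as a percentage. The exact formula behind Figure 20 is not stated in this section, so the sketch below assumes this standard definition, and the function name is hypothetical.

```python
def mape_percent(measured, predicted):
    """Mean absolute percentage error, in percent (standard definition).

    Assumes every measured value is nonzero, since each error is divided
    by the corresponding measurement.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs((m - p) / m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)
```

For example, measured values of 10 W and 20 W predicted as 9 W and 22 W each have a 10% relative error, so the MAPE is 10%.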
The measured performance data of the PV panel is sent to the ThingSpeak IoT server by the microcontroller. ThingSpeak provides an easy and hassle-free setup for importing data from various IoT-enabled sensors. The developed webpage, shown in Figure 21, contains a graphical representation of current and voltage in real time. It also has a drop-down menu for selecting the date and time, and clicking on Get Information opens the next page, which shows the estimated power generated for the entered date and time, as seen in Figure 22. ThingSpeak plots the received data against time for each parameter individually. These graphs are updated in real time and can be embedded anywhere on the web through a link. The website also has a back-end connection to the MATLAB server for the estimation of power generation. The MATLAB server is programmed so that, once the date and time are entered, it obtains the relative humidity and temperature for that date and time from the Visual-Crossing API and substitutes these two values into the LR algorithm. The algorithm calculates the estimated power generation and displays it on the website, as can be seen in Figure 22. Figure 23 shows the linear relation between generated power and relative humidity. The blue dots are the measured points, and the red line is the fitted line.
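The back-end flow described above (date and time in, weather lookup, substitution into the LR model, estimated power out) can be summarized in a short sketch. It is written in Python purely for illustration, whereas the actual back end is a MATLAB server. The weather lookup is passed in as a function, the stand-in `fake_weather` replaces the real Visual-Crossing request, and the coefficients are hypothetical.

```python
from datetime import datetime


def estimate_power(when, fetch_weather, coeffs):
    """Estimate PV power for a date and time with a fitted linear model.

    `fetch_weather` returns (temperature in degC, relative humidity in %)
    for `when`; `coeffs` is (intercept, temperature coeff, humidity coeff).
    """
    temperature_c, humidity_pct = fetch_weather(when)
    intercept, temp_coeff, humidity_coeff = coeffs
    power_w = intercept + temp_coeff * temperature_c + humidity_coeff * humidity_pct
    return {
        "timestamp": when.isoformat(),
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        # Clamping at zero is an assumption of this sketch, not a stated step.
        "power_w": max(power_w, 0.0),
    }


def fake_weather(when):
    """Stand-in for the weather API call; returns fixed illustrative values."""
    return 30.0, 50.0


# Hypothetical coefficients, for illustration only.
result = estimate_power(datetime(2021, 6, 1, 12, 0), fake_weather, (2.0, 0.9, -0.05))
```

Passing the lookup in as a function keeps the estimation step independent of the weather service, so it can be exercised without network access.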

Conclusion
In this paper, a new low-cost portable datalogger for monitoring PV systems is designed, tested, and analyzed. The system design uses easy-to-obtain hardware and free software, making it available to any investigator or user for developing schemes of their own design and usage. This adjustability and adaptability make the system well suited to applications such as monitoring PV plants and collecting data from PV panels at isolated sites in developing nations.
The LR machine learning algorithm is successfully used. The data from the datalogger serves as the training data set for the algorithm. The forecasted power generation is the output of the machine learning algorithm, which is tested with real-time data and found to be accurate, with less than 10% error. To store the data measured by the voltage and current sensors, two channels of ThingSpeak are used. Once the date and time are keyed in, the forecasted power generation for the entered date and time is computed. The website also has a back-end connection to the MATLAB server for the estimation of power generation.
The price of the proposed system is significantly lower than that of commercially available devices, with little loss of accuracy and precision. The power generation estimation system provides essential data for energy management systems. For researchers and users in developing nations, this datalogger permits additional study. The subsequent phases include achieving lower power consumption, developing a small SAPV system realized with Arduino, using wireless technologies for communication, and scrutinizing the pricing meticulously. The proposed work has a very wide scope and can be conveniently used for standalone PV plants, which are usually installed in remote locations where human intervention is not possible due to harsh weather or other circumstances.

A: Ampere
AC: Alternating current
Amp: Current at maximum power
API: Application programming interface
AT: Attention
CBR: Case based reasoning
dBm: Decibel-milliwatts
DC: Direct current
DIY: Do it yourself
GB: Giga byte
GHz: Giga hertz
GSM: Global system for mobile
I/O: Input and output
I2C: Inter-integrated circuit
IC: Integrated circuit
IDE: Integrated development environment
IEC: International electrotechnical commission
IoT: Internet of things
kHz: Kilo hertz
LCD: Liquid crystal display
LR: Linear regression
mΩ: Milli ohm
mA: Milli

Data Availability
The data used to support the findings of this study are included in the article.