LightGBM Low-Temperature Prediction Model Based on LassoCV Feature Selection

Icing disasters on power grid transmission lines can easily lead to major accidents, such as wire breakage and tower overturning, that endanger the safe operation of the power grid. Short-term prediction of transmission line icing relies to a large extent on accurate prediction of daily minimum temperature. This study therefore proposes a LightGBM low-temperature prediction model based on LassoCV feature selection. A data set comprising four meteorological variables was established, and time series autocorrelation coefficients were first used to determine the lag (hysteresis) characteristics in relation to the daily minimum temperature. Subsequently, the LassoCV feature selection method was used to select the meteorological elements that are highly related to minimum temperature, together with their lag characteristics, as input variables, to eliminate noise in the original meteorological data set and reduce the complexity of the model. On this basis, the LightGBM low-temperature prediction model was established. The model was optimized through grid search and cross-validation and validated using daily minimum surface temperature data from Yongshan County (station number 56489), Zhaotong City, Yunnan Province. The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of the model's minimum temperature predictions after feature selection are 1.305, 0.999, and 0.112, respectively. These results indicate that the LightGBM prediction model is effective at predicting low temperatures and can be used to support short-term icing prediction.


Introduction
Evidence from power grid operation shows that wire break and tower toppling accidents caused by transmission line icing do great damage to the transmission lines themselves and also adversely affect the safe and stable operation of the power grid system more generally [1]. Most transmission line icing accidents occur in areas with high small-scale weather variability, which are strongly affected by factors such as temperature, humidity, cold and warm air convection, circulation, and wind [2]. Low temperature is an important cause of transmission line icing. Therefore, accurate prediction of minimum temperature can provide a good basis for short-term transmission line icing prediction. Minimum temperature data are generally based on time series, and most traditional prediction methods use univariate time series modeling [3]. However, changes in temperature are affected by various meteorological factors, and those that correlate highly with temperature include wind direction, wind speed, and relative humidity [4]. Traditional time series temperature prediction models mainly include multiple linear regression, autoregressive integrated moving average (ARIMA) [3], and gray prediction methods. These methods have difficulty capturing dynamic changes in temperature, and their prediction results generally tend toward average values. Tao et al. [4] proposed a temperature prediction method using a long short-term memory network based on a random forest approach. Niu and others proposed the use of principal component analysis, a back propagation (BP) neural network, and a radial basis function (RBF) neural network to establish a temperature prediction model [5]. Although this method considers the influence of multiple meteorological variables on temperature, it does not consider the time series characteristics of those meteorological variables. Jiang [6] proposed an application of a particle swarm-optimized RBF artificial neural network to temperature prediction. Although the structural parameters of the RBF model are optimized by the particle swarm optimization algorithm, problems associated with univariate time series prediction are still encountered, so the prediction accuracy is not high.
Aiming to address the difficulty traditional prediction methods have in learning from large amounts of data, and their inability to fully consider the impact of multiple meteorological variables and their own time correlations on temperature changes, this paper proposes a LightGBM low-temperature prediction model based on LassoCV feature selection. Daily minimum surface temperature data from Yongshan County, Zhaotong City, Yunnan Province (station number 56489) from 2015 to 2019 were selected as the experimental data for prediction. The model first uses autocorrelation to establish lag (hysteresis) characteristics. Subsequently, LassoCV is used to measure the importance of the individual variable features, and those that are highly related to minimum temperature are selected as input variables to LightGBM to model the minimum temperature time series data. The experimental prediction results show that this method has stronger learning ability and higher prediction accuracy than traditional methods and long short-term memory (LSTM) networks. Especially when trained on large-scale multivariable meteorological time series data, this model shows high accuracy and a fast training speed, which are beneficial to practical industrial applications. The accurate prediction of low temperature is a prerequisite to accurately predicting icing on power grid transmission lines. The use of this method can improve the accuracy and speed of prediction and provides a sound basis for supporting the production of icing prediction data.

Principles of LightGBM
LightGBM is derived from Reference [7] and related open source tools. The light gradient boosting (LGB) model is an efficient implementation of the classic gradient boosting decision tree (GBDT) model. The LGB model handles the classification, regression, and ranking problems in machine learning. GBDT obtains the final answer by combining multiple decision trees and adding up the results of all the decision trees. This process has been improved to obtain extreme gradient boosting (XGB). The difference between XGB and GBDT lies in the way the tree is split and the way the value of each leaf node is determined. The core idea is to conduct a second-order Taylor expansion of the loss function to be fitted by GBDT and to introduce a regularization term for the tree, so that the second-order Taylor expansion can be simplified and solved analytically. Thus, a new tree splitting method and leaf node value determination method are derived.
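For concreteness, the standard form of this expansion (following the XGBoost formulation in the literature; the notation below is generic rather than copied from this paper) is:

```latex
% Second-order Taylor expansion of the boosting objective at iteration t,
% with a tree-complexity regularizer over T leaves with weights w_l:
\mathcal{L}^{(t)} \approx \sum_{j=1}^{N} \Big[ g_j f_t(x_j) + \tfrac{1}{2} h_j f_t^{2}(x_j) \Big]
                 + \gamma T + \tfrac{1}{2}\lambda \sum_{l=1}^{T} w_l^{2},
\qquad
g_j = \frac{\partial L\big(y_j, \hat{y}_j^{(t-1)}\big)}{\partial \hat{y}_j^{(t-1)}},\quad
h_j = \frac{\partial^{2} L\big(y_j, \hat{y}_j^{(t-1)}\big)}{\partial \big(\hat{y}_j^{(t-1)}\big)^{2}}.
```

Minimizing this objective leaf by leaf yields the closed-form leaf values and split gain quoted in the subsections that follow.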
LGB further optimizes XGB's improvements to the GBDT formulation. Figure 1 presents a flow chart of the transition from the gradient boosting method to the LGB model.

GBDT Model.
The forward addition model is written as

$$f_M(x) = \sum_{i=1}^{M} \beta_i h(x; \theta_i),$$

where $x$ represents a sample and $h(x; \theta_i)$ represents the $i$th base model. Thus, the entire model is a weighted sum of $M$ base models, which is why it is called the forward addition model. The temperature prediction model solves a regression problem: the trained model $f$ minimizes the loss function $L$,

$$\min \sum_{j=1}^{N} L\big(y_j, f_M(x_j)\big),$$

where $\{(x_j, y_j)\}_{j=1}^{N}$ is the training sample. The idea is to let $\beta_M h(x_j; \theta_M)$ fit the negative gradient $g_j$ through the gradient descent algorithm, where the negative gradient

$$g_j = -\left.\frac{\partial L\big(y_j, f(x_j)\big)}{\partial f(x_j)}\right|_{f = f_{M-1}}$$

exists for every sample. Thus, fitting the new samples $\{(x_j, g_j)\}$ is converted into a regression problem:

$$\theta_M = \arg\min_{\theta} \sum_{j=1}^{N} \big(g_j - h(x_j; \theta)\big)^2.$$

After fitting the negative gradient, the final step size must be determined:

$$\beta_M = \arg\min_{\beta} \sum_{j=1}^{N} L\big(y_j, f_{M-1}(x_j) + \beta h(x_j; \theta_M)\big).$$

Thus, the final model is

$$f_M(x) = f_{M-1}(x) + \beta_M h(x; \theta_M).$$

When the base learner $h$ is a CART decision tree, this becomes the GBDT method.
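A minimal sketch of this fit-the-negative-gradient loop for squared-error regression (where the negative gradient is simply the residual), using scikit-learn trees as base learners; the fixed shrinkage constant stands in for the line-searched step size $\beta_M$ and is an assumption of this sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_stages=100, lr=0.1, max_depth=3):
    """Toy forward-addition GBDT for squared-error loss."""
    f0 = y.mean()                        # initial constant model
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_stages):
        g = y - pred                     # negative gradient of (1/2)(y - f)^2 is the residual
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, g)
        pred += lr * tree.predict(X)     # fixed shrinkage in place of the line-searched beta_M
        trees.append(tree)
    return f0, trees

def gbdt_predict(f0, trees, X, lr=0.1):
    return f0 + lr * sum(t.predict(X) for t in trees)

# Toy usage on synthetic data:
X = np.random.default_rng(0).random((200, 3))
y = 2 * X[:, 0] - X[:, 1]
f0, trees = gbdt_fit(X, y)
print(gbdt_predict(f0, trees, X[:3]))
```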

From GBDT to XGB.
The lowest-level XGB model is still a forward addition model, but the difference is that it is not optimized by simple gradient descent; instead, a regularization term for the tree and a second-order Taylor expansion are introduced in a way that simplifies both the splitting used by GBDT and the determination of leaf node values within a single tree. GBDT requires the gradient corresponding to each sample, $(x_j, g_j)_{j=1}^{N}$, whereas XGB requires both the gradient and the Hessian (second-derivative) value corresponding to each sample, $(x_j, g_j, h_j)_{j=1}^{N}$. The splitting of the XGB tree and the values of the leaf nodes are both determined by $(g_j, h_j)$. Writing $G = \sum_{j \in \text{leaf}} g_j$ and $H = \sum_{j \in \text{leaf}} h_j$, the optimal leaf value and the split gain are expressed as follows:

$$w^{*} = -\frac{G}{H + \lambda},$$

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma,$$

where $\gamma$ and $\lambda$ are two hyperparameters used to measure the complexity of the tree by penalizing the number of leaf nodes and the sum of the squares of the leaf node values, respectively.

LGB Model. The LGB model is further optimized on the basis of the XGB derivation described above. These optimizations are performed to reduce the computational cost, but they can also play a role in preventing overfitting (because the original data are noisy, some coarse processing may increase the generalization ability of the model). The computational cost of each leaf node split is

$$\text{cost}_{\text{time}} \propto \text{feature\_num} \times \text{sample\_num} \times \text{point\_num},$$

where $\text{cost}_{\text{time}}$ represents the time (s) consumed by the calculation, $\text{feature\_num}$ represents the number of features, $\text{sample\_num}$ represents the number of samples, and $\text{point\_num}$ represents the number of candidate split points. This shows that the cost of the tree model is mainly driven by three factors:

(1) Sample size (the cost of calculating $h_j$ is closely related to the sample size)
(2) Number of features (features need to be traversed when the tree is split)
(3) Number of candidate points (the candidate points under each feature need to be traversed when the tree is split)

Therefore, the core of the LGB approach is to minimize the computational cost for each of these factors. The three corresponding technologies are gradient-based one-side sampling (GOSS) for the sample size, exclusive feature bundling (EFB) for the number of features, and histogram-based splitting (Hist) for the number of candidate points, as illustrated in the configuration sketch below.
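As a hedged illustration of where these three mechanisms surface in the open source lightgbm Python package (the parameter names are from that package, not this paper; the specific values are illustrative assumptions):

```python
import lightgbm as lgb

# Histogram-based splitting is LightGBM's default tree algorithm; max_bin
# bounds the number of candidate split points per feature (Hist).
# boosting_type="goss" enables gradient-based one-side sampling (GOSS).
# Exclusive feature bundling (EFB) is on by default and is controlled by
# the enable_bundle parameter.
model = lgb.LGBMRegressor(
    boosting_type="goss",   # GOSS: keep large-gradient samples, subsample the rest
    max_bin=255,            # Hist: candidate points per feature capped by bin count
    n_estimators=200,
    learning_rate=0.1,
)
# model.fit(X_train, y_train) would then train as usual.
```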

Data Collection.
This study uses meteorological observation data from Yongshan County (Station No. 56489), Zhaotong City, Yunnan Province, which come from a data set of the Institute of Geographic Sciences and Natural Resources (https://www.resdc.cn/), collected daily from January 1, 2015, to December 31, 2019. Part of the original meteorological data is shown in Table 1.

Establishing Hysteresis Characteristics.
With time series, establishing lag features is a key step in mining the autocorrelation information of the data. The autocorrelation coefficient (ACF) [8] is usually used to measure the correlation between the current moment $y_t$ and its $k$-order lag $y_{t-k}$. The correlation coefficient measures the linear correlation between these two variables as follows:

$$r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2},$$

where $r_k$, the autocorrelation coefficient, represents the correlation between $y_t$ and its $k$-order lag. The autocorrelation represents the relationship between the values of a time series at different points in time. The ACFs of the various meteorological variables used in this study are shown in Figures 2-7.
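A minimal sketch of how the lag features and ACF values could be computed with pandas and statsmodels; the file name and column names (e.g., "Low_Tem") are assumptions chosen to match the feature labels reported later:

```python
import pandas as pd
from statsmodels.tsa.stattools import acf

# Hypothetical file of daily observations; "Low_Tem" is the assumed column
# name for daily minimum temperature (matching labels like Low_Tem1 below).
df = pd.read_csv("yongshan_56489.csv", parse_dates=["date"], index_col="date")

# Autocorrelation of minimum temperature up to a 10-day lag.
r = acf(df["Low_Tem"].dropna(), nlags=10)
print({k: round(float(v), 3) for k, v in enumerate(r)})

# Build k-order lag features for each meteorological variable.
for col in ["Low_Tem", "aver_Win", "Low_RHU"]:
    for k in range(1, 9):
        df[f"{col}{k}"] = df[col].shift(k)   # e.g., Low_Tem1 = previous day's value
```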
Variables that are highly correlated with temperature include wind direction, wind speed, and relative humidity [4]. When the ACF satisfies $r_k \geq 0.3$, a medium-strength correlation between the variables exists [9], and this is used as the selection criterion for meteorological variables. Second, the input data are normalized. Table 1 shows that the input meteorological variables have different dimensions and units, so they need to be normalized in order to be comparable with each other. The normalization method selected in this paper is min-max scaling:

$$x_i^{*} = \frac{x_i - \min}{\max - \min}, \quad i = 1, 2, 3, \ldots, n,$$

where $x_i$ is the input value, $x_i^{*}$ is the normalized value, and $\max$ and $\min$ are the maximum and minimum values of the variable, respectively.
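The same scaling is available off the shelf; a brief sketch using scikit-learn's MinMaxScaler, fit on the training data only (an assumption about the intended workflow, to avoid leaking validation statistics):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[2.0, 30.0], [4.0, 50.0], [6.0, 90.0]])  # placeholder feature rows
X_valid = np.array([[3.0, 40.0]])

scaler = MinMaxScaler()                          # implements x* = (x - min) / (max - min)
X_train_scaled = scaler.fit_transform(X_train)   # min and max learned from training data
X_valid_scaled = scaler.transform(X_valid)       # same min and max reused on validation
```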
Given the many noise problems in established methods for evaluating lag characteristics of multielement meteorological time series, and the insufficient prediction accuracy of the traditional ARIMA model, this study proposes a LightGBM low-temperature prediction model based on LassoCV feature selection. First, the ACF is used to establish the lag characteristics, and the value of each meteorological element is normalized; that is, the dimensional differences in the multivariate time series data are eliminated, so that the processed multivariate data set can be evaluated using a supervised learning method. LassoCV feature selection is then used to measure the importance of each lag feature. The LGB model is next trained on the training data set, and finally the model is optimized through grid search and cross-validation. Figure 8 presents a flow chart of the model processes.

LassoCV Feature Selection.
Selecting features is necessary here because there are many variables and a large amount of noise, but some variables have very little effect on minimum temperature. Feature selection is used to select variables that are highly correlated with minimum temperature. The Lasso algorithm performs feature screening to effectively reduce the dimension of multidimensional data. The Lasso algorithm [10] is based on linear regression: a threshold is predetermined for the absolute values of the model regression coefficients, and the residual sum of squares of the model is minimized after adding a norm penalty function [11]. This algorithm compresses toward zero and eliminates the variables whose correlation is less than the threshold by optimizing the objective function. The remaining variables are then output as the characteristic variables.
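A sketch of this selection step with scikit-learn's LassoCV, where ten-fold cross-validation chooses the penalty strength (matching the paper's tuning procedure) and the nonzero coefficients mark the retained features; the data and feature names below are synthetic placeholders:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X_scaled = rng.random((300, 6))                 # placeholder normalized lag features
y = 3 * X_scaled[:, 0] - 2 * X_scaled[:, 1] + rng.normal(0, 0.1, 300)
feature_names = ["Low_Tem1", "Low_Tem2", "Low_Tem8",
                 "Max_Win_Aspect1", "aver_Win1", "Low_RHU1"]

# Ten-fold cross-validation selects the penalty coefficient (alpha_).
lasso = LassoCV(cv=10, random_state=0).fit(X_scaled, y)

selected = [n for n, c in zip(feature_names, lasso.coef_) if abs(c) > 1e-6]
print("penalty:", lasso.alpha_)
print("selected features:", selected)
```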
Set the linear regression model to

$$y = X\beta + \varepsilon,$$

where $\varepsilon$ is the error vector. The ordinary least squares estimate of the linear regression model is

$$\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2.$$

When the constraint function is added, this becomes LASSO, which is specifically expressed as

$$\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$

In this formula, the parameter $\lambda$ is the penalty coefficient of the parameter estimation; its size is verified by ten-fold cross-validation, and the parameter $\alpha$ is determined in the same way. This paper uses the least angle regression (LARS) method [12] to solve the LASSO regression algorithm. LARS is a variable selection algorithm based on the forward selection algorithm and the forward gradient algorithm, and it can obtain more accurate eigenvectors. These two building blocks are described in detail as follows:

(1) The calculation process of the forward selection algorithm is as follows: select the independent variable $x_k$ that is most correlated with the target variable $y$. The coefficient $\beta_k$ is determined using

$$\beta_k = \frac{\langle x_k, y \rangle}{\|x_k\|^2},$$

and the variable residual is

$$y_{\text{res}} = y - \beta_k x_k.$$

The variable residual is defined as a new target variable, and the set $X$ without $x_k$ is used as the new independent variable set. Repeat the above process until the residual is less than the set range or the set of independent variables is empty, at which point the algorithm terminates.
(2) The forward gradient algorithm likewise selects the most correlated feature variable $x_k$ at each step to approximate the target variable, but, unlike the forward selection algorithm, it moves only a small step $\epsilon$ in that direction, so the residual is defined as

$$y_{\text{res},k} = y - \epsilon \cdot \text{sign}\big(\langle x_k, y \rangle\big)\, x_k.$$

Regard the residual as the new objective function, keep the original variable set $X = [x_1, x_2, \ldots, x_i, \ldots, x_n]^T$ as the variable set, and recalculate according to formula (16) until the residual $y_{\text{res},k}$ is less than the set threshold range; the optimal solution is then obtained. The specific steps of the Lasso (LARS) algorithm are as follows:

Step 1 (target variable): according to formulae (13) and (14), solve for the variable $x_k$ with the highest correlation with the objective function, remove it from the variable set, and determine the new target variable according to formula (16).
Step 2 (related variable): repeat Step 1 until the correlation between a new variable $x_l$ and the target variable $y_{\text{res},k}$ is the same as the correlation between $x_k$ and $y_{\text{res},k}$.
Step 3 (characteristic variable): along the angular bisector of $x_k$ and $x_l$, use equation (16) to reapproximate the target, finding the variable $x_t$ whose correlation with $y_{\text{res},k}$ is the same as that of $x_k$ and $x_l$ with $y_{\text{res},k}$. Add the variable $x_t$ to the feature set, and use the common angular bisector of the set as the new direction of approach.
Step 4 (loop): repeat the above process until $y_{\text{res},k}$ is small enough or the variable set is empty; the final feature set contains the desired feature variables. In practice, this whole procedure is available off the shelf, as sketched below.
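Since the paper solves LASSO via least angle regression, scikit-learn's LassoLarsCV is the closest off-the-shelf match; a brief hedged sketch on synthetic placeholder data (cv=10 mirrors the paper's ten-fold scheme):

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

rng = np.random.default_rng(1)
X = rng.random((200, 8))                       # placeholder feature matrix
y = 2 * X[:, 0] - X[:, 3] + rng.normal(0, 0.1, 200)

# LassoLarsCV traces the LASSO path with the LARS algorithm and picks the
# penalty coefficient by cross-validation.
model = LassoLarsCV(cv=10).fit(X, y)
print("chosen penalty:", model.alpha_)
print("indices of nonzero coefficients:", np.flatnonzero(model.coef_))
```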
The results of using LassoCV to select characteristic variables that are highly correlated with low temperature changes are shown in Figure 9. Among the meteorological elements and the established time series features, Low_Tem1, Low_Tem2, Low_Tem8, Max_Win_Aspect1_3.0, aver_Win1, and Low_RHU1 have higher correlation with minimum temperature than the other features; hence, these six features are used as the sample data set.

LightGBM Training and Model Tuning.
Low-temperature prediction is defined as using a historical meteorological variable sequence $\ldots, x_{t-1}, x_t$ to predict a future minimum temperature sequence $x_{t+1}, x_{t+2}, \ldots$. The preprocessed and feature-selected data sets are input to LightGBM for training, and then a grid search combined with ten-fold cross-validation is used to optimize the main model hyperparameters, including the number of iterations, learning_rate, max_depth, and criterion, thereby improving the accuracy of the low-temperature prediction model. After optimization, the best hyperparameters of the low-temperature prediction model are as shown in Table 2.
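A hedged sketch of this tuning step with scikit-learn's GridSearchCV wrapped around the lightgbm estimator; the grid values are illustrative assumptions, n_estimators plays the role of the iteration count, and the "criterion" hyperparameter named above has no direct lightgbm equivalent, so it is omitted here:

```python
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200, 500],      # number of boosting iterations
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}

search = GridSearchCV(
    LGBMRegressor(random_state=0),
    param_grid,
    cv=10,                                # ten-fold cross-validation, as in the paper
    scoring="neg_root_mean_squared_error",
)
# search.fit(X_train, y_train)            # X_train, y_train from the split below
# print(search.best_params_)
```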

Experimental Results and Analysis
To evaluate the accuracy and practicability of the low-temperature prediction model based on the proposed LassoCV-LightGBM combination described in Section 2, this work collected meteorological observation data from Yongshan Station in Zhaotong City, Yunnan Province, to train and test the model. Here, the data from the first 1790 days are used as the training set and the data from the last 56 days as the verification set.
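This is a chronological split; continuing from the assumed df of daily observations above, a one-line sketch:

```python
# First 1790 days for training, last 56 days for verification (no shuffling,
# since the data are a time series).
train_df = df.iloc[:1790]
valid_df = df.iloc[-56:]
```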

Model Evaluation Indicators.
After training the model, predictions are produced for the entire verification set, and then the predicted and observed temperature data are denormalized. To evaluate the performance of the model, this study compares its accuracy with that of the traditional ARIMA prediction model [13] and the LSTM [14], by comparing observed and predicted minimum temperature values. The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are selected as the model evaluation indicators. They are calculated as follows:

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(y_i' - y_i\big)^2},$$

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big|y_i' - y_i\big|,$$

$$\text{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i' - y_i}{y_i}\right|,$$

where $y_i'$ is the predicted minimum temperature value, $y_i$ is the observed value, and $N$ is the number of data elements. Smaller values of RMSE, MAE, and MAPE mean a smaller error in the minimum temperature prediction, and therefore a better prediction model performance. Table 3 shows the RMSE, MAE, and MAPE values of the three models.
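These three indicators can be computed directly; a minimal sketch (note that MAPE is treated here as a fraction, matching the 0.112 value reported in the abstract, rather than a percentage):

```python
import numpy as np

def evaluate(y_pred, y_true):
    """RMSE, MAE, and MAPE for denormalized predictions."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    mae = np.mean(np.abs(y_pred - y_true))
    mape = np.mean(np.abs((y_pred - y_true) / y_true))  # undefined if any y_true == 0
    return rmse, mae, mape

print(evaluate([3.1, 4.8, 6.2], [3.0, 5.0, 6.0]))  # toy usage example
```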
To further test the superiority and practicability of the proposed LassoCV-LightGBM model, this study randomly selected data from five other sites in Yunnan Province for testing. The verification results are shown in Table 4. Table 3 shows that the RMSE, MAE, and MAPE values of the LassoCV-LightGBM model are all smaller than those of the LSTM and ARIMA models, indicating that the LassoCV-LightGBM model has higher prediction accuracy and a smaller error between the predicted minimum temperature values and the actual observed values.
In summary, the use of the LassoCV-LightGBM model for multivariate time series data with a large amount of data improves not only the prediction accuracy but also the speed of the model training.

Conclusions
The LightGBM gradient boosting tree integration model is suitable for modeling multivariate time series data. Compared with traditional time series forecasting methods, the GOSS, EFB, and Hist techniques unique to the LightGBM gradient boosting tree integration model can address the problems of high dimensionality, nonlinearity, and local minima more effectively, and the model has stronger data learning and generalization capabilities. Moreover, LassoCV can analyze the importance of features and adds regularization terms to prevent overfitting. This study establishes lag features through ACFs and then uses LassoCV to select features from multivariate meteorological time series data, which provides more effective and accurate data for model construction and reduces the complexity of the model. Taking minimum temperature data as a specific example, a prediction model based on LassoCV-LightGBM was constructed, and meteorological observation data from the Yongshan site were used for prediction and analysis. The experimental results show that the LassoCV-LightGBM model performs better than the ARIMA and LSTM models, with improved low-temperature prediction accuracy, indicating that the LassoCV-LightGBM model has superior capability in analyzing multisource time series data. It is particularly applicable to predicting low temperatures and is clearly a useful tool for supporting power grid icing prediction.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.