Air Quality Prediction Using the Fractional Gradient-Based Recurrent Neural Network

In this study, the air quality index (AQI) of Indian cities of different tiers is predicted by using the vanilla recurrent neural network (RNN). The AQI measures the air quality of a region and is calculated on the basis of the concentration of ground-level ozone, particle pollution, carbon monoxide, and sulphur dioxide in the air. The present air quality of an area depends on current weather conditions, vehicle traffic in that area, and anything else that increases air pollution. It also depends on the climate conditions and industrialization in that area. Thus, the AQI is history-dependent. To capture this dependency, the memory property of fractional derivatives is exploited: the fractional gradient descent algorithm involving Caputo's derivative has been used in the backpropagation algorithm for training the RNN. Due to the availability of a large amount of data and high computation support, deep neural networks are capable of giving state-of-the-art results in time series prediction. In this study, however, the basic vanilla RNN has been chosen to check the effectiveness of fractional derivatives. The prediction results for the AQI and for the gases affecting it in different cities show that the proposed algorithm leads to higher accuracy. It has been observed that the results of the vanilla RNN with fractional derivatives are comparable to those of long short-term memory (LSTM).


Introduction
With the increase in urbanization, industrialization, and traffic in the cities, air pollutants are increasing and air quality is declining [1]. To keep a check on the extent of air pollution, the US Environmental Protection Agency has introduced a parameter called the air quality index (AQI), which tracks the daily effects of air pollutants [2]. AQI is a numerical value between 0 and 500; when the value of AQI is 0, the air quality is adequate, and if the value of AQI is 500, then the air quality is hazardous. AQI is calculated by considering major air pollutants, such as carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), particulate matter (PM10 and PM2.5), and sulphur dioxide (SO2). These pollutants are residual gases and particles emitted by vehicles and industries or arising from climate change [3].
Biomass and coal burning greatly increase the levels of particulate matter (PM10 and PM2.5), which causes haze in the air. These particles deteriorate the air composition and cause respiratory problems in living beings. Moreover, haze reduces visibility, which further affects economic sectors such as tourism and agriculture [4]. The combustion of fossil fuels is carried out in several industries, which are the main contributors of SO2 and NO2 in air [5,6]. Motorized vehicles and the combustion of fossil fuels also emit CO, which is another major pollutant responsible for worsening the air quality. CO is highly poisonous and can even lead to mortality on long exposure [7]. Another major pollutant is ground-level ozone (O3), obtained from the combination of two primary pollutants, nitrogen oxides (NOx) and volatile organic compounds (VOCs). Ninety-five percent of these primary pollutants come from oil, coal, and gasoline combustion in vehicles, industries, power plants, and households, upstream gas and oil production, combustion of residual woods, and evaporated liquid fuels [8]. Exposure to ozone can significantly affect human health, cause asthma, and lead to premature mortality [9]. In addition, ozone can adversely affect vegetation, damage flowers and shrubs, and reduce crop productivity [10,11].
Air pollution is not a local phenomenon; the current quality of air is dependent on its history. Industrialization has massively impacted the environment, especially the air quality [12]. The levels of air pollutants like ground-level ozone and particulate matter are also being influenced by the changing weather patterns that result from climate change [13,14]. The change in climate affects the temperature, humidity levels, and wind patterns, which in turn influence the air quality. In addition, naturally occurring emissions, for example, wind-blown dust and wildfires, are provoked by climate-driven changes in meteorology that affect the air quality. The uncontrolled emission of air pollutants is gradually worsening air pollution. Continuous exposure to polluted air is severely affecting human health [15] and leading to the development of lung, heart, and skin diseases [16]. Six of the world's top 10 most polluted cities are in India. Air pollution has been observed to be the second biggest risk factor causing diseases in India, thus affecting its economy. Thus, there is a need to keep a check on air pollution in Indian cities. Each city has its unique features such as population per square km, temperature, humidity level, climate, vehicles, and industries in the region, and thus, it is better to study air pollution region-wise. Generally, air quality is poor in tier I and urban cities, and such areas require more attention.
To prevent the serious consequences of air pollution, several forecasting techniques for AQI are being developed. Based on target objectives, the techniques and approaches of forecasting are being expanded and improved. Traditional AQI forecasting techniques involve statistical techniques such as autoregressive integrated moving average (ARIMA) [17,18], principal component regression (PCR) [19], multiple linear regression (MLR) [20], and grey models [21,22]. These models perform well, but with the steep increase in pollution, more accurate methods are required. These models are linear and thus are unable to capture nonlinear traits [23,24]. Even with a large amount of data, not much increase is seen in the accuracy of these models. The performance of the statistical techniques has been improved by developing hybrid techniques [25]. Artificial intelligence-based techniques are capable of analyzing nonlinear data and thus have recently been used in time series forecasting [26,27]. With the availability of a sufficient amount of data and computational support, AQI forecasting is being done with deep neural networks [28]. However, these methods must learn a large number of parameters. Thus, a simpler and accurate method has been developed in this study using a vanilla RNN. The current level of air pollution in any area is also dependent on the AQI status in the past. To capture this history dependency, fractional derivatives have been employed in the backpropagation algorithm to train the vanilla RNN for the prediction of AQI in Indian cities.
In this study, five cities of different tiers are considered for AQI prediction. Bengaluru, Kolkata, and Hyderabad are tier I cities, while Patna and Talcher are tier II and tier III cities, respectively, as shown in Figure 1. The major air pollutants of Kolkata are also predicted using the proposed approach. The results show that the proposed method achieves minimum error at some fractional orders. Also, the obtained results are comparable to those of the LSTM. The rest of the study is structured as follows: Section 2 briefly explains the related work. The proposed approach is presented in Section 4. Section 5 discusses the experimental results obtained, followed by the conclusion and future scope in Section 6 and Section 7, respectively.

Preliminaries
Fractional calculus is a 300-year-old branch of mathematics that deals with derivatives and integrals of noninteger order, i.e., the order can be any real or complex number. Earlier, this domain was purely theoretical and involved rigorous calculations, but these derivatives are now used in practical applications as well [29,30]. Several versions of fractional derivatives and integrals have been introduced so far, each with unique characteristics. The most widely used versions are described below.
(i) Riemann-Liouville (RL) fractional integral operator: This is the most frequently used version of the fractional integral [31]. The $\alpha$ order RL fractional integral is expressed as follows:

$$I_a^{\alpha} f(t) = \frac{1}{\Gamma(\alpha)} \int_a^t (t-\tau)^{\alpha-1} f(\tau)\, d\tau. \tag{1}$$

Here, $\alpha > 0$, $t > a$, $t, a \in \mathbb{R}$, the function $\Gamma(\cdot)$ is the Gamma function, and $f$ is a piecewise continuous function on $[0, \infty)$ and integrable on any finite subinterval of $(0, \infty)$.

(ii) Riemann-Liouville (RL) fractional derivative: This is the natural generalization of the integer-order derivative, as this fractional derivative version and the ordinary derivative are left inverses of integrals, i.e.,

$${}^{RL}D_a^{\alpha}\, I_a^{\alpha} f(t) = f(t). \tag{2}$$

For $t > a$, $\alpha > 0$, and $n \in \mathbb{N}$ such that $n - 1 < \alpha < n$, the derivative of order $\alpha$ is evaluated by differentiating the $n - \alpha$ order RL integral of the function $f(t)$ $n$ times, i.e.,

$${}^{RL}D_a^{\alpha} f(t) = \frac{d^n}{dt^n} I_a^{n-\alpha} f(t) = \frac{1}{\Gamma(n-\alpha)} \frac{d^n}{dt^n} \int_a^t (t-\tau)^{n-\alpha-1} f(\tau)\, d\tau. \tag{3}$$

It is the $\alpha$ order RL fractional derivative [31]. But this definition has some disadvantages as well. The most significant disadvantage is that the RL derivative of order $\alpha$ ($\alpha < 1$) of a constant is not zero.

(iii) Caputo's fractional derivative: For $t > a$, $\alpha > 0$, and $n \in \mathbb{N}$ such that $n - 1 < \alpha < n$, Caputo's derivative of order $\alpha$ is obtained by evaluating the $n - \alpha$ order RL integral of the $n$th order derivative of the function $f(t)$, which gives the following:

$${}^{C}D_a^{\alpha} f(t) = I_a^{n-\alpha} f^{(n)}(t) = \frac{1}{\Gamma(n-\alpha)} \int_a^t (t-\tau)^{n-\alpha-1} f^{(n)}(\tau)\, d\tau. \tag{4}$$

It is called the $\alpha$ order Caputo's fractional derivative. Caputo gave this definition of the fractional derivative in 1967 to overcome the limitations of the RL derivative [32]. Caputo's derivative of order $\alpha > 0$, $n - 1 \le \alpha < n$, of a constant $c$ is zero. This version increased the applicability of fractional derivatives in modelling real-world problems. Thus, in this study, Caputo's version of fractional derivatives has been used.

(iv) Grünwald-Letnikov (GL) fractional derivative: The limit definition of the fractional derivative was given by Anton Karl Grünwald and Aleksey Vasilievich Letnikov in 1867 and 1868, respectively [31]. Without any assumptions on the differentiability of the function, for $\alpha > 0$, the $\alpha$-order GL derivative of a function $f$ is expressed as follows:

$${}^{GL}D^{\alpha} f(t) = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{r=0}^{\infty} (-1)^r \binom{\alpha}{r} f(t - rh). \tag{5}$$

Here, $h$ is the step size and $\binom{\alpha}{r} = \dfrac{\Gamma(\alpha + 1)}{\Gamma(\alpha - r + 1)\,\Gamma(r + 1)}$ with the Gamma function $\Gamma(\cdot)$.
Clearly, the limit definition of the first-order derivative shows that its evaluation involves only two points. In contrast, it can be seen from Caputo's definition in equation (4) and from the limit definition of fractional derivatives in equation (5) that their evaluation involves the value of the function at all past points. This makes the fractional derivative a nonlocal operator that incorporates memory into the system. Due to the memory property and the availability of software and other tools, fractional calculus has been used in numerous applications of science and engineering. Fractional derivatives have been widely used in viscoelasticity [33], biology [34], signal and image processing [35], the stock market [27,36], economics [37], and other domains with history dependency [38]. Moreover, the order of differentiation acts as a degree of freedom in the optimization process.
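The memory property can be illustrated numerically. The following is a minimal sketch, not part of the original study, that approximates the Grünwald-Letnikov derivative on a uniform grid with the lower terminal assumed to be 0; the function names `gl_weights` and `gl_derivative` are hypothetical. For order 1 only two weights are nonzero, whereas for a fractional order every past sample contributes.

```python
import math
import numpy as np


def gl_weights(alpha, n):
    """Return (-1)^r * binom(alpha, r) for r = 0..n using the standard recurrence."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for r in range(1, n + 1):
        w[r] = w[r - 1] * (1.0 - (alpha + 1.0) / r)
    return w


def gl_derivative(f, t, alpha, h=1e-3):
    """Approximate the alpha-order GL derivative of f at t (lower terminal assumed 0).

    f must accept a NumPy array; all samples f(t), f(t-h), ..., f(0) are used,
    which is the numerical face of the nonlocal (memory) property.
    """
    n = int(round(t / h))
    w = gl_weights(alpha, n)
    samples = f(t - h * np.arange(n + 1))
    return float(np.dot(w, samples) / h**alpha)


# Compare with the closed form for f(t) = t^2: Gamma(3) / Gamma(3 - alpha) * t^(2 - alpha)
alpha, t = 0.5, 1.0
approx = gl_derivative(lambda x: x**2, t, alpha)
exact = math.gamma(3) / math.gamma(3 - alpha) * t ** (2 - alpha)
print(approx, exact)

# A constant has a nonzero GL/RL derivative for 0 < alpha < 1
print(gl_derivative(lambda x: np.ones_like(x), t, alpha))
```

The first-order scheme above is chosen for readability; its error shrinks roughly in proportion to the step size `h`.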
The nonlocality of fractional derivatives has been the major motivation for their application in different domains. Fractional calculus has been successfully used for air quality prediction [39][40][41]. A fractional derivative-based Kalman filter has been introduced to measure the pollutant emission and hence the air quality [39]. Several variants of fractional Kalman filters have been developed using different versions of fractional-order derivatives to improve the prediction accuracy [40][41][42]. In these air-quality models, fractional calculus is incorporated because of its long-term memory and nonlocal nature. Fractional calculus has also been successfully applied in the training of artificial neural networks [43][44][45][46]. After replacing the integer-order derivative by the fractional-order derivative in the backpropagation of the training algorithm, the update rule becomes

$$w(s+1) = w(s) - \eta\, \nabla^{\alpha}_{w} E(s),$$

where $\eta$ and $\alpha$ are the learning rate and the fractional order of differentiation, respectively. Chen [47] employed the fractional derivative in their backpropagation method for feedforward neural networks (FNNs) in 2013. The simulation results showed that fractional derivative-based FNNs had a substantially better convergence speed than integer-order FNNs. Fractional derivatives have been successfully applied in the backpropagation learning algorithm of the radial basis function network [48], recurrent neural network [46], convolutional neural network [49], and even in deep neural networks and have shown significant improvement in accuracy. In our study, the effect of using fractional derivatives in the learning of the neural network has been analyzed for the prediction of air quality in a few Indian cities. Earlier too, the effectiveness of fractional derivatives has been shown for nonlinear system identification, pattern classification, and Mackey-Glass chaotic time series prediction [46].

Fractional Gradient-Based Backward Propagation Algorithm
In this section, we introduce the fractional-order truncated backpropagation through time algorithm on the RNN with 10 neurons in a layer. This backpropagation algorithm considers a truncated depth of the input data and the state of the network, which makes the algorithm computationally efficient.
For the implementation of the backpropagation algorithm, the mean squared error $E(s)$ at an instant $s$ is considered as follows:
where $i$ is the output neuron, $\Phi(u_i(s))$ and $x_i(s)$ are the actual output and the expected output of the $i$th neuron at time $s$, and $u_i(s) = \sum_{j \in \Omega} w_{ij} v_j(s)$ at time $s$, where $w_{ij}(s)$ is the weight of a signal from the $j$th neuron to the $i$th neuron and $v_j(s)$ is the output of the $j$th neuron at time $s$; then, the update rule becomes as follows:

$$w_{ij}(s+1) = w_{ij}(s) - \eta\, \nabla^{\alpha}_{w_{ij}} E(s),$$

where $\eta$ is the learning rate and $\nabla^{\alpha}_{w_{ij}}$ represents the fractional gradient w.r.t. $w_{ij}$. Now, $\nabla^{\alpha}_{w_{ij}} E(s)$ can be evaluated by applying the approximated chain rule to the error function. The actual chain rule applicable to fractional derivatives is complicated and involves special mathematical functions; thus, several approximated chain rules have been developed for fractional derivatives [50][51][52][53]. The chain rule given by expression (14) has been obtained by using the fractional Taylor series expansion for a differentiable function. Consider a differentiable function, say $f$; then, for a small $h$,
Then,
Taking the limit $h \to 0$, we get
From the above equation, we can also say
Hence,
After using the abovementioned fractional chain rule, we get
Now, as Caputo's version of the fractional derivative is being used in this study, for $p > -1$, the following holds:
Thus, from equations (4), (16), and (17), the following final update rule is obtained:
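As a rough guide to the steps above, the following sketch works through the derivation for a single weight. It is an illustration under stated assumptions and not necessarily the exact set of expressions used in this study: the error is taken as a squared error with a factor of 1/2, the approximate chain rule is taken to be of the Jumarie type, the Caputo derivative is taken with lower terminal 0 and $0 < \alpha < 1$, the weight is assumed positive, and the recurrence through time is ignored.

```latex
% Sketch for a single weight w_{ij} > 0 and 0 < \alpha < 1; every form below is an assumption.
\begin{align*}
% assumed instantaneous error and net input
E(s) &= \tfrac{1}{2}\sum_{i}\bigl(x_i(s)-\Phi(u_i(s))\bigr)^2,
  \qquad u_i(s)=\sum_{j\in\Omega} w_{ij}\,v_j(s) \\
% leading term of a fractional Taylor expansion for small h
f(t+h)-f(t) &\approx \frac{h^{\alpha}}{\Gamma(1+\alpha)}\,f^{(\alpha)}(t)
  \;\Longrightarrow\; d^{\alpha}f \approx \Gamma(1+\alpha)\,df \\
% approximate chain rule that follows from d^alpha f ~ Gamma(1+alpha) df
D^{\alpha}_{t}\,f\bigl(g(t)\bigr) &\approx f'\bigl(g(t)\bigr)\,D^{\alpha}_{t}\,g(t) \\
% chain rule applied to the error
\nabla^{\alpha}_{w_{ij}}E(s) &\approx -\bigl(x_i(s)-\Phi(u_i(s))\bigr)\,\Phi'(u_i(s))\,
  \nabla^{\alpha}_{w_{ij}}u_i(s) \\
% Caputo power rule with p = 1
\nabla^{\alpha}_{w_{ij}}u_i(s) &= v_j(s)\,\frac{w_{ij}^{\,1-\alpha}}{\Gamma(2-\alpha)} \\
% resulting update
w_{ij}(s+1) &= w_{ij}(s)+\eta\,\bigl(x_i(s)-\Phi(u_i(s))\bigr)\,\Phi'(u_i(s))\,v_j(s)\,
  \frac{w_{ij}^{\,1-\alpha}}{\Gamma(2-\alpha)}
\end{align*}
```

Under these assumptions, setting $\alpha = 1$ makes the factor $w_{ij}^{1-\alpha}/\Gamma(2-\alpha)$ equal to 1, so the sketch reduces to the ordinary integer-order delta rule.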

Proposed Approach
In this study, the vanilla RNN has been employed to predict the AQI value of a day based on the previous sequential AQI data of five different cities. RNNs are capable of learning the sequential pattern of historical data. Furthermore, the accuracy of the system has been improved by incorporating memory into the system using the fractional gradient descent algorithm.

Data Exploration and Processing.
In this study, continuous AQI data have been used to predict future unseen AQI values. For each city, a continuous-time patch of around 1000 data points has been used from the AQI dataset. The sample data can be seen in Table 1. For constructing the training data, a min-max scaler has been used to scale the data values between 0 and 1. The predicted values are also obtained between 0 and 1 and are inverse-transformed later to evaluate the final predicted AQI value.
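The scaling and inverse transformation described above can be sketched in NumPy as follows. This is an illustrative sketch only: the function names `minmax_fit`, `minmax_scale`, and `minmax_inverse` are hypothetical, the sample values are made up for demonstration and are not taken from the dataset, and whether the scaler is fitted on the whole series or on the training part only is not stated in the text.

```python
import numpy as np


def minmax_fit(series):
    """Return the (min, max) pair that defines the scaler."""
    series = np.asarray(series, dtype=float)
    return float(series.min()), float(series.max())


def minmax_scale(series, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1 (assumes hi > lo)."""
    return (np.asarray(series, dtype=float) - lo) / (hi - lo)


def minmax_inverse(scaled, lo, hi):
    """Map scaled values (e.g. network predictions in [0, 1]) back to AQI units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


# Made-up AQI-like values, for illustration only
raw = np.array([50.0, 120.0, 200.0, 80.0])
lo, hi = minmax_fit(raw)
scaled = minmax_scale(raw, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled, restored)
```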

Neural Network Architecture and Training.
The vanilla RNN used in the proposed model has a single-layered architecture with 10 nodes. The fractional gradient-based RNN model is built from scratch using the NumPy library, and the Pandas library is used for data preprocessing. Forward propagation and fractional gradient-based backpropagation as given by (18) have been used for training the vanilla RNN. For the LSTM, on the other hand, the TensorFlow library is used to produce all the results, and the integer-order gradient descent algorithm is used to train the model. In both models, backpropagation has been carried out through 10 days (timestamps) to predict the AQI data for the next (11th) day. Training and testing sets have 600-800 and 100 data values, respectively. Xavier's initialization has been used to initialize the weights of the models, with a learning rate of 0.1; 80 epochs were used to train the model for each city and each fractional order. Figure 2 shows the architecture of the RNN with fractional gradient-based backpropagation.
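The data windowing and the weight update described above might look roughly like the following NumPy sketch. It is not the authors' implementation: the names `make_windows`, `xavier_uniform`, and `fractional_step` are hypothetical, the uniform variant of Xavier's initialization is an assumption, and the update form (integer-order gradient scaled by |W|^(1 - alpha) / Gamma(2 - alpha)) is an assumed reading of a Caputo-based rule, with the absolute value and a small `eps` added here only to keep the power real-valued and finite. The gradient `grad` is assumed to come from ordinary truncated backpropagation through time.

```python
import math
import numpy as np


def make_windows(series, steps=10):
    """Build (inputs, targets): `steps` consecutive values -> the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + steps] for i in range(len(series) - steps)])
    y = series[steps:]
    return X, y


def xavier_uniform(rng, fan_in, fan_out):
    """Xavier (Glorot) uniform initialization; the uniform variant is an assumption."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))


def fractional_step(W, grad, lr=0.1, alpha=8 / 9, eps=1e-8):
    """One assumed fractional update: W - lr * grad * |W|^(1 - alpha) / Gamma(2 - alpha)."""
    scale = (np.abs(W) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return W - lr * grad * scale


rng = np.random.default_rng(0)
X, y = make_windows(np.linspace(0.0, 1.0, 30), steps=10)  # 10 past days -> 11th day
W_hh = xavier_uniform(rng, 10, 10)                         # 10 hidden nodes
dummy_grad = rng.normal(size=W_hh.shape)                   # stand-in for a BPTT gradient
W_hh = fractional_step(W_hh, dummy_grad, lr=0.1, alpha=8 / 9)
print(X.shape, y.shape, W_hh.shape)
```

With `alpha=1.0` the scaling factor equals 1, so `fractional_step` falls back to plain gradient descent, which makes the integer-order baseline a special case of the same code path.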

Evaluation Parameter.
The standard evaluation metrics for forecasting models, viz. root mean squared error (RMSE) and mean absolute percentage error (MAPE), have been employed to assess the performance of the proposed model in the prediction of the AQI of different Indian cities and of the major pollutants in one of those cities. The lower the values of RMSE and MAPE, the better the performance of the predictor. These error measures are widely used in forecasting, climatology, and regression analysis to verify experimental results. Detailed information on these parameters is provided below. The root mean squared error (RMSE) is the square root of the average of the squared difference between the actual and predicted values. RMSE can be expressed by the following expression:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( o_t - \hat{o}_t \right)^2},$$

where $\hat{o}_t$ is the predicted value and $o_t$ is the expected output for iteration $t$, which are observed $N$ times. The mean absolute percentage error (MAPE) is the average percentage of the absolute difference between the actual and predicted values divided by the actual value for each time period [54]. MAPE can be expressed by the following expression:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{o_t - \hat{o}_t}{o_t} \right|,$$

where $\hat{o}_t$ is the predicted value and $o_t$ is the expected output for iteration $t$, which are observed $N$ times. This is a form of percentage error, which has helped in the analysis of the proposed model in different situations.
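The two metrics described above can be computed with a few lines of NumPy. This is a minimal sketch with hypothetical function names `rmse` and `mape`; the numbers in the usage lines are made up for illustration and are not results from this study. MAPE is undefined when an expected value is zero, which the sketch does not guard against.

```python
import numpy as np


def rmse(expected, predicted):
    """Square root of the mean squared difference between expected and predicted values."""
    e = np.asarray(expected, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((e - p) ** 2)))


def mape(expected, predicted):
    """Mean absolute percentage error in percent; expected values must be nonzero."""
    e = np.asarray(expected, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((e - p) / e)))


# Made-up values, for illustration only
print(rmse([100.0, 200.0], [110.0, 180.0]))
print(mape([100.0, 200.0], [110.0, 180.0]))
```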

Results and Analysis
This section describes the AQI dataset of the five cities chosen, the results obtained by using the proposed approach on the AQI data, and a comparison between the predictions of LSTM and of the proposed approach at different fractional orders. The performance of the networks is measured using RMSE and MAPE.

Dataset.
The AQI dataset of five cities for 2015-2020 is considered, which is publicly available at the official portal of the Central Pollution Control Board, Government of India (http://cpcb.nic.in/). The dataset consists of daily air quality levels at various stations across multiple cities in India, which are obtained by averaging the hourly values of AQI. The Indian cities chosen for the analysis include Kolkata, Hyderabad, Bengaluru, Patna, and Talcher. Basic information related to these cities is as follows:
(i) Kolkata (22°34′03″N 88°43′57″E), located in West Bengal, is a tier I city and the seventh most populous city of India, with the third-most populous metropolitan area. The concentration of pollutants such as sulphur dioxide and nitrogen dioxide remains within the limit, but the presence of particulate matter in the air is high and is increasing over the years. Due to this, air pollution is severe and is causing respiratory ailments such as lung cancer.
(ii) Bengaluru (12°58′44″N 77°35′30″E), located in Karnataka, is also a tier I city and the third most populous city of India, with the fifth most populous metropolitan area. Bengaluru is also known as the "Silicon Valley of India" because it is the nation's top IT exporter. This IT hub region is the most polluted and is causing several environmental issues. Due to the large population, Bengaluru generates tonnes of solid waste, which pollutes the environment. Thus, the large population and the IT hub of Bengaluru are the major reasons for air pollution.
(iii) Hyderabad (17°21′42″N 78°28′29″E), located in Telangana, is also a tier I city and the fourth most populous city of India, with the sixth most populous metropolitan area. Again, due to the large population, increased economic activity, and rapid urbanization, tonnes of solid waste are generated, and disposal of such waste becomes hazardous and pollutes the environment. The particulate matter (PM10) dispersed in the atmosphere causes around 2500 deaths each year.
(iv) Patna (25°36′0″N 85°6′0″E), located in Bihar, is a tier II city with a high population. Air pollution is a major issue in this city. The situation becomes even worse in winter due to dense smog, leading to an increase in mortality. Patna was declared the second most air-polluted city in India in the WHO survey of 2014.
(v) Talcher (20°57′0″N 85°13′48″E), located in the Angul district of Orissa, is a tier III city. This is a small city with a smaller population, but Talcher has the country's biggest coalfield, with the highest coal reserve of around 52 billion tonnes. The presence of these coal mines leads to air pollution.
Cities of different tiers have been chosen where air pollution is a major issue. To summarise, cities with a large population or with a high emission rate of air pollutants affecting human health are considered. Around 600 normalized data points for each city have been used for the analysis, which are divided into training and test data in the ratio of 4 : 1. The model has been tested on the data for 100 days for each city.

AQI Prediction Results Using Fractional-Order Gradient Learning.
The performance of the vanilla RNN in predicting the AQI values of each city using the fractional backpropagation algorithm has been analyzed. To assess the performance of the model, RMSE and MAPE are computed. The prediction performance of the RNN using the fractional gradient descent algorithm with fractional orders in the neighborhood of 1 is compared with the performance of the RNN with the traditional integer-order gradient descent algorithm, where the order remains 1. The values of the fractional order α which are considered are 6/9, 7/9, 8/9, 1, 10/9, and 11/9 [49]. The results obtained at different orders using the proposed approach can be seen in Table 2. The graphs in Figures 3-7 show the comparison between actual and predicted output for Bengaluru, Kolkata, Hyderabad, Patna, and Talcher, respectively. In all the graphs, the expected output and the actual output are represented by the yellow lines and blue lines, respectively. It can be observed that the least RMSE and MAPE are achieved by the vanilla RNN at a fractional order, either 7/9 or 8/9, for all the cities. The model achieved minimum RMSE and MAPE of 13.22 and 6.11%, respectively, at α = 8/9 for Bengaluru. Also, the minimum RMSE and MAPE are found to be 19 and 7.02% for Kolkata and 23.85 and 7.43% for Patna at α = 8/9. Moreover, the minimum RMSE and MAPE are found to be 7.41 and 3.22% for Hyderabad and 11.40 and 4.15% for Talcher at α = 7/9. Thus, it can be concluded from the results that, on these datasets, the fractional-order gradient algorithm is more accurate than the integer-order gradient algorithm. Moreover, the proposed model performed best at α = 7/9, achieving the least MAPE of 3.22% for Hyderabad among all the cities.
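The comparison across fractional orders amounts to a small grid search over the six orders listed above. The sketch below shows one way this could be organized; the name `best_order` and the `evaluate` callable are hypothetical, and the quadratic function in the usage lines is a toy stand-in for "train the RNN at this order and return its test RMSE", not a result from this study.

```python
from fractions import Fraction

# The six orders considered in the text: 6/9, 7/9, 8/9, 1, 10/9, 11/9
ORDERS = [Fraction(k, 9) for k in (6, 7, 8, 9, 10, 11)]


def best_order(evaluate, orders=ORDERS):
    """Evaluate every order with the supplied callable and return (best order, all scores).

    `evaluate` takes the order as a float and returns an error to minimize,
    for example the test RMSE obtained after training at that order.
    """
    scores = {alpha: evaluate(float(alpha)) for alpha in orders}
    best = min(scores, key=scores.get)
    return best, scores


# Toy stand-in for a train-and-evaluate run, for illustration only
best, scores = best_order(lambda a: (a - 0.8) ** 2)
print(best, len(scores))
```

Keeping the orders as exact fractions avoids rounding ambiguities such as reporting 7/9 as 0.7 or 0.78.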

Comparison of Results Obtained by the Proposed Method and LSTM.
The prediction of AQI for the same datasets of all cities has also been done using LSTM, with the same number of timestamps and nodes and the same procedure for inputs as for the fractional RNN. The obtained results are also shown in Table 2. It can be seen that the least MAPE of 3.22% is obtained by the fractional gradient-based RNN for Hyderabad as compared to other cities and models. Figure 8 shows the comparison between the expected AQI value and the AQI value predicted for all the cities by LSTM.

Prediction Results of Major Pollutants in Kolkata Using Different Fractional-Order Gradient Learning.
The proposed approach has been implemented for the prediction of the concentration of major pollutants such as SO2, CO, and PM10. Here, the considered time period is the same as that used in the prediction of the AQI of Kolkata. As seen in the above section, the performance of the algorithm is better for either α = 7/9 or 8/9. So, these two fractional values have been considered to compare the results with the integer-order-based learning of the vanilla RNN. Figures 9-11 show the comparison between expected air pollutant concentrations and actual concentrations in Kolkata. It can be observed from Table 3 that the minimum RMSE and MAPE for each pollutant are attained at order α = 8/9, thus outperforming the traditional integer-order learning for the vanilla RNN. Moreover, the least MAPE of 4.70% is achieved in the prediction of CO, and thus, the proposed model is better at predicting the concentrations of CO than those of the other pollutants.

Conclusion
In this study, the fractional-order gradient has been used in the backpropagation of the vanilla RNN for the AQI prediction of five Indian cities of different tiers. The proposed approach has also been used for the prediction of major air pollutants in the tier I city of Kolkata. Through the results of the prediction of the AQI of multiple cities and the prediction of air pollutants, it has been observed that the minimum prediction error is achieved at a fractional order. Most cities achieve better results when the order is equal to 8/9. The architecture of the vanilla RNN is much simpler than the structure/functioning of an LSTM, but the predictions made by RNNs with fractional gradient-based backpropagation are comparable to and sometimes even better than those of LSTM with the integer-order gradient descent algorithm. Achieving lower RMSE and MAPE with a simpler architecture shows the effectiveness of the fractional gradient over integer-order gradient descent. The least MAPE value is found for Hyderabad by using the fractional gradient-based RNN as compared to other cities and models. In addition, the least MAPE is achieved during predictions of CO concentrations in Kolkata. Therefore, the proposed model is better in the prediction of AQI values of Hyderabad as compared to other cities and of CO concentrations in the air of Kolkata as compared to other air pollutants. From the results, it can be seen that RMSE is higher for Kolkata and Patna. Patna is even amongst the world's top 10 most polluted cities, and particulate matter is increasing in Kolkata each year. Hence, the memory property of fractional derivatives can be well exploited with deep neural networks for dealing with more complex and dynamic data.

Future Scope
This study can be extended by predicting other air pollutants in the cities, and from there, AQI values can also be predicted. Using this strategy, the major air pollutants in a city can be detected and stringent actions can be taken accordingly to prevent further damage. In the future, a portfolio of economic activities can be created by considering the air quality of a particular city and identifying the gases that affect it most. The order of the derivative is chosen manually in this study, due to which results are evaluated only for a few values of the order. Hence, there is a need to develop an adaptive method that automatically evaluates the optimal order for a particular city or a set of data. Methods like particle swarm optimization (PSO) and genetic algorithms can be employed for optimizing the order of differentiation. Fractional gradient descent can be used with suitable architectures for different cities. Through the results of predictions of various gases in a city, we can find better ways to develop sustainably.

Data Availability
The AQI dataset of five cities for 2015-2020 is considered, which is publicly available at https://cpcb.nic.in, the official portal of the Central Pollution Control Board, Government of India.

Conflicts of Interest
The authors declare that they have no conflicts of interest.