With the growth of society and an escalating population, concerns regarding public health have arisen. Air quality has become a primary concern because of the constant increase in the number of vehicles and ongoing industrial development. In response, several indices have been proposed to indicate pollutant concentrations. In this paper, we present a mathematical framework to formulate a Cumulative Index (CI) on the basis of the individual concentrations of four major pollutants (SO2, NO2, PM2.5, and PM10). Further, a classifier based on a supervised learning algorithm is proposed. This classifier employs a support vector machine (SVM) to classify air quality into two types, that is, good or harmful. The inputs to this classifier are the calculated values of the CI. The efficacy of the classifier is tested on real data from three locations: Kolkata, Delhi, and Bhopal. It is observed that the classifier performs well in classifying air quality.
Air pollution is a critical issue that influences the health of the urban population. The problem becomes more prominent with the rapid increase in population and continuous industrial development. Many problems, namely, deforestation, waste management, solid waste disposal, and the release of toxic materials, contribute to and influence the quality of the air around us. With this concern, air quality assessment has become an active area of research. The Air Quality Index is a numerical indicator used by agencies to assess the concentration of various pollutants in the air [
Major pollutants and their details.
Name of pollutant | Effect on humans | Details |
---|---|---|
SO2 | The presence of high levels of SO2 has an adverse effect on human health. Exposure to high levels of SO2 may lead to bronchitis, heart issues, respiratory illness, and asthma. | (1) This gas is an outcome of the oxidation of sulphur. |
NO2 | The presence of high levels of NO2 in the air leads to acid rain. This corrodes metal structures such as bridges, damages buildings, and has a harmful effect on aquatic life due to acid formation. | (1) This gas is an outcome of the oxidation of nitrogen monoxide. |
SPM (PM2.5) (suspended particulate matter) | The presence of high levels of SPM leads to cardiovascular diseases and respiratory issues, namely, bronchitis, asthma, and lung cancer. | (1) Particulate matter is a term used for solid particles and liquid droplets found in the air. |
RSPM (PM10) (respirable suspended particulate matter) | The presence of high levels of RSPM leads to cardiovascular diseases and respiratory issues, namely, bronchitis, asthma, and lung cancer [ | (1) The term RSPM refers to a composition of dust particles, industrial waste, and combustion products. |
In recent years, various methods have been developed by the researchers to assess air quality. A few of them are based on the calculation of indices. A rich survey of these indices was presented in [
The principal component based regression technique was proposed for the prediction of air quality in [
It is interesting to observe that the approaches employed in this direction are based on the accuracy of the forecasting engine performance. It is also worth mentioning here that the accuracy of the forecasting engine reduces when the span of forecast increases. Environmental problems and their solutions should be based on long term forecasting of the air quality. However, there is less work reported on the classification of air quality. The classification rules are based on the numerical values of the concentration of pollutants. In [
On the basis of a critical literature review, the research objectives of this study are as follows:
(1) To present a mathematical framework for the formulation of the CI by employing the numerical values of the Air Quality Indices of SO2, NO2, PM2.5, and PM10
(2) To present a comparative analysis of the proposed CI with existing air quality indices
(3) To establish an optimization routine and propose a solution based on the Grey Wolf Optimization (GWO) algorithm by estimating the kernel parameters and bias for the SVM
(4) To derive a linear discriminant function for the SVM to classify the data on the basis of air quality.
In the next section, the details of CI are incorporated. This section also discusses AQI. Section
In the past, several indices have been proposed to indicate the quality of air through numerical values. Pollution index has been proposed by Cannistraro and Ponterio [
Comparative study of Air Quality Indices.
Pollutants | Concentration | AQI | IAPI [ | AAQI [ | CI |
---|---|---|---|---|---|
SO2 | 2 | 2.5 | 255.28 | 261.81 | 851.85 |
NO2 | 7 | 8.75 | | | |
PM10 | 264 | 257.21 | | | |
PM2.5 | 96 | 48 | | | |
SO2 | 2 | 2.5 | 255.28 | 111.23 | 455.91 |
NO2 | 11 | 13.75 | | | |
PM10 | 126 | 102.04 | | | |
PM2.5 | 84 | 42 | | | |
SO2 | 8 | 10 | 160.94 | 221.33 | 702.69 |
NO2 | 61 | 76.25 | | | |
PM10 | 158 | 204.5 | | | |
PM2.5 | 71 | 35.5 | | | |
SO2 | 2 | 2.5 | 255.28 | 271.68 | 931.21 |
NO2 | 9 | 11.25 | | | |
PM10 | 292 | 271.14 | | | |
PM2.5 | 25 | 12.5 | | | |
In this table, we have shown a few samples of pollutants concentrations from different cities of India [
We have experimented with the fractional values of
We propose a new Cumulative Index, which has the following attributes:
(1) It is easy to understand and follows the NAAQS (as it is based on the AQI of the pollutants).
(2) It does not suffer from eclipsing and ambiguity.
(3) It can be used as an alert system, as it is based on valid air quality data monitored at various air quality measurement stations located in densely populated cities of India.
(4) It is computationally efficient and puts less burden on computational engines.
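Since the CI is built on the AQI of each pollutant, it may help to see how a concentration can be mapped to a sub-index. The sketch below uses the common piecewise-linear interpolation between breakpoints. The function name `sub_index` and the breakpoint list are assumptions made for illustration: the two segments were chosen only because they reproduce the SO2 and NO2 rows of the comparative table (for example, 2 → 2.5 and 61 → 76.25). Breakpoints for PM10 and PM2.5 are not stated in the text and are not assumed here.

```python
# Illustrative breakpoints (assumption, not taken from the paper):
# each tuple is (conc_lo, conc_hi, index_lo, index_hi).
ILLUSTRATIVE_BREAKPOINTS = [
    (0.0, 40.0, 0.0, 50.0),
    (40.0, 80.0, 50.0, 100.0),
]


def sub_index(concentration, breakpoints=ILLUSTRATIVE_BREAKPOINTS):
    """Map a pollutant concentration to a sub-index by linear interpolation
    inside the breakpoint segment that contains it."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= concentration <= c_hi:
            slope = (i_hi - i_lo) / (c_hi - c_lo)
            return i_lo + slope * (concentration - c_lo)
    raise ValueError("concentration lies outside the illustrative breakpoints")


if __name__ == "__main__":
    for conc in (2, 7, 11, 61):
        print(conc, sub_index(conc))
```

Under these assumed breakpoints, the concentrations 2, 7, 11, and 61 map to 2.5, 8.75, 13.75, and 76.25, which agree with the AQI column of the comparative table for SO2 and NO2.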
In the work five classes
Now, we calculate CI for different cases where
Let
Let
Let
In recent years, the application of SVMs to classification problems has increased because of their capability to segregate datasets with the best hyperplane. SVMs are applied for multidimensional data classification [
Let the
On the basis of ambient air quality of India reported in [
Statistics of example dataset.
Pollutant ( | Mean | Standard deviation |
---|---|---|
SO2 | 8.49 | 1.90 |
NO2 | 40.32 | 13.41 |
PM2.5 | 159.417 | 87.68 |
PM10 | 122.12 | 77.64 |
First, this dataset is employed to calculate the CI, and these values are then used as the input of the supervised learning model. The target data are the numerical values 1 and −1 for good and harmful air quality, respectively. A total of 1000 data points are considered for building this model; 70% of them are used for training, and the remaining 30% are split equally between testing and validation (15% each). Figures
AQI-SO2.
AQI PM10.
AQI PM2.5.
AQI NO2.
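The 70/15/15 partition of the 1000-point example dataset described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_indices`, the random shuffling, and the fixed seed are not taken from the paper, which does not say how the points were assigned to each subset.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test.

    The test set receives whatever remains after the training and
    validation fractions are taken (15% for the default arguments).
    """
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train_idx, val_idx, test_idx = split_indices(1000)
    print(len(train_idx), len(val_idx), len(test_idx))
```

For 1000 samples this yields 700 training, 150 validation, and 150 testing indices, with no index shared between subsets.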
Table
Calculation of CI.
AQI-SO2 | AQI-NO2 | AQI-PM10 | AQI-PM2.5 | CI |
---|---|---|---|---|
14.53 | 88.1 | 215.42 | 234.23 | 747.44 |
15.53 | 87.66 | 209.45 | 229.28 | 745.07 |
11.68 | 34.21 | 146.93 | 241.30 | 559.49 |
7.85 | 30.58 | 263.68 | | 952.23 |
14.68 | 35.62 | 144 | | 1127.87 |
8.58 | 31.08 | 61.22 | 219.38 | 475.35 |
8.18 | 36.46 | 93.87 | 99.5 | 199.99 |
6.62 | 31.31 | | | 1159.31 |
11.12 | 58.72 | 122.44 | 289.39 | 784.01 |
8.52 | 43.5 | 201.49 | 164.76 | 586.34 |
10.35 | 53.27 | 252.24 | 214.43 | 744.03 |
7.1 | 43.17 | 171.42 | 231.40 | 708.73 |
7.08 | 39.15 | 183.67 | 153.01 | 584.68 |
7.9 | 43.02 | 89.79 | 225.04 | 447.46 |
8.93 | 38.12 | 205.47 | 218.67 | 736.27 |
7.65 | 42.87 | 207.46 | 127.84 | 631.23 |
It is observed that the samples exhibited in Table Out of 1000 samples, 16 extreme cases are exhibited in this table. As per the air quality data of Nizamuddin, Delhi, these values are realistic, as the range for the pollutant PM2.5 falls within 14–300 and that for PM10 within 18–890 [ The index proposed in this section is computationally efficient and easy to interpret, as it takes a high numerical value when the concentrations of two or more pollutants are in the harmful range. It is observed from the table that the value of this index is 1159.31 when the concentrations of PM10 and PM2.5 are in the harmful range. It can be concluded that if the value of this index is high, then either a single pollutant concentration is in the harmful range (354.08) or the concentrations of two or more pollutants lie in the harmful range (88.1, 215.42, and 234.23). For these two cases, the values of the index are 1159 and 747, respectively. These observations clearly indicate that a classification boundary can be drawn with the help of the numerical values of the CI. With this motivation, the CI is calculated for 1000 points to build the supervised learning module.
With the help of this dataset, we propose a classifier based on SVM. As it is a known fact that original SVM is a two-class separator, we employed this model to segregate air quality into two types: good and harmful. The classification rule is derived by the fact that either one of the four pollutant concentrations must lie in range
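To make the idea of a two-class separator on CI values concrete, the following sketch fits a hard-margin linear SVM in the special case of a single input feature. In one dimension the maximum-margin boundary is simply the midpoint between the largest "good" CI and the smallest "harmful" CI, so no solver is needed. The function names and the toy CI values are hypothetical, and the sketch does not reproduce the kernel or the parameters used in the paper. The label convention (1 for good, −1 for harmful) follows the text.

```python
def fit_max_margin_1d(ci_good, ci_harmful):
    """Hard-margin linear SVM for separable one-dimensional data.

    Returns (w, b) of the discriminant f(ci) = w * ci + b, scaled so that
    f = +1 at the largest good CI and f = -1 at the smallest harmful CI.
    """
    hi_good = max(ci_good)
    lo_harmful = min(ci_harmful)
    if hi_good >= lo_harmful:
        raise ValueError("classes are not linearly separable on CI alone")
    w = -2.0 / (lo_harmful - hi_good)  # negative: a higher CI means harmful
    b = 1.0 - w * hi_good
    return w, b


def classify(ci, w, b):
    """Return 1 (good) or -1 (harmful) from the sign of the discriminant."""
    return 1 if w * ci + b >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy CI values, chosen only to illustrate the mechanics.
    good = [200.0, 300.0, 400.0]
    harmful = [600.0, 700.0, 800.0]
    w, b = fit_max_margin_1d(good, harmful)
    print(w, b, classify(450.0, w, b), classify(747.44, w, b))
```

With these toy values the boundary falls at a CI of 500, the midpoint of the gap between the two classes. A real dataset that is not separable on CI alone would need a soft-margin or kernel formulation instead.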
A recent population based swarm intelligence technique, called Grey Wolf Optimizer, inspired by the nature of grey wolf is discussed here. This technique was proposed by Mirjalili et al. [
Alphas are the leaders of the pack. They make the decisions regarding hunting, sleeping place, time to wake up, and so forth, and these decisions are followed by the pack. Hence, the alpha wolf is also known as the dominant wolf. The alpha is not necessarily the strongest member, but it is the best at organizing and disciplining the pack.
Betas come at the second level in the hierarchy of grey wolves. Betas help the alpha wolves in decision making and in the activities of the pack. Betas are the best candidates for the position of alpha in case the alpha wolves pass away or become very old. The beta reinforces the alpha's commands throughout the pack.
Omega wolves have the lowest ranking in the pack. They always have to submit to all the other dominant wolves. The omega is not a main member, but the loss of an omega wolf causes internal problems in the pack.
If a wolf does not belong to any of the levels specified above, it is called a delta wolf. Delta wolves have to submit to alphas and betas, but they dominate the omegas. Scouts, elders, hunters, sentinels, and caretakers belong to this group. According to Muro et al. [
(1) Tracking, chasing, and approaching the prey
(2) Pursuing, encircling, and harassing the prey
(3) Attacking the prey.
In the mathematical modeling of social hierarchy of wolf, alpha (
The vectors
Position update in GWO [
As described in (
Figure The search process starts by creating a random population of grey wolves. Over the course of iterations, the alpha, beta, and delta wolves search for candidate solutions; in this work, a solution consists of the choice of kernel, the kernel parameters, and the bias parameter. GWO is based on the philosophy "follow the leader." The hierarchy shown in Figure The encircling mechanism of GWO defines a circle-shaped neighborhood around the solutions, which can be extended to higher dimensions. Exploration and exploitation are guaranteed by adaptive values of
Leadership hierarchy of grey wolves [
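The search procedure described above can be sketched in a few lines. The code follows the commonly published GWO update (a control parameter `a` that decreases linearly from 2 to 0, coefficients A = 2·a·r1 − a and C = 2·r2, and a new position equal to the mean of the three leader-guided estimates). The function name, the default settings, and the sphere objective are assumptions chosen for illustration: the paper applies GWO to SVM parameter estimation with classification accuracy as the objective, which is not reproduced here.

```python
import random


def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimal Grey Wolf Optimizer sketch for minimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial pack inside the search bounds.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for t in range(n_iter):
        # The three fittest wolves act as alpha, beta, and delta.
        leaders = [list(w) for w in sorted(wolves, key=objective)[:3]]
        alpha_fit = objective(leaders[0])
        if alpha_fit < best_fit:
            best_pos, best_fit = list(leaders[0]), alpha_fit
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        new_pack = []
        for wolf in wolves:
            new_wolf = []
            for j in range(dim):
                estimates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])  # distance to leader
                    estimates.append(leader[j] - A * D)
                x = sum(estimates) / 3.0  # average of the three estimates
                new_wolf.append(min(max(x, lo), hi))  # clip to the bounds
            new_pack.append(new_wolf)
        wolves = new_pack
    # Account for the pack produced by the final iteration.
    final = min(wolves, key=objective)
    if objective(final) < best_fit:
        best_pos, best_fit = list(final), objective(final)
    return best_pos, best_fit


def sphere(x):
    """Stand-in objective (assumption): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    position, fitness = gwo_minimize(sphere, dim=2, bounds=(-10.0, 10.0))
    print(position, fitness)
```

To use such a routine for SVM design, the objective would instead map a candidate parameter vector to a quantity to be minimized, for example one minus the validation accuracy. That mapping is left out because the parameter encoding is not recoverable from the text.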
This section presents the classification results of air quality by the proposed supervised learning model. The efficacy of the proposed model is tested over the real data of the state of Madhya Pradesh (Bhopal), West Bengal (Kolkata), and Delhi. The historical data of ambient air quality is taken from [
Comparative analysis of concentration of pollutants.
The Central Pollution Control Board (CPCB), India, is executing a nationwide programme of ambient air quality monitoring known as National Air Quality Monitoring Programme (NAMP). The network consists of three hundred and forty-two (342) operating stations covering one hundred and twenty-seven (127) cities/towns in twenty-six (26) states and four
To determine the concentration of NO2, chemiluminescence technique is used to measure total oxides of nitrogen (
Classification results of air quality data of Bhopal, Madhya Pradesh.
AQI-SO2 | AQI-NO2 | AQI-PM10 | AQI-PM2.5 | CI | Classification results |
---|---|---|---|---|---|
| | | | | 837.71 | |
| | | | | 200 | |
| | | | | 727.23 | |
| | | | | 455.91 | |
| | | | | 781.35 | |
Classification results of air quality data of Kolkata, West Bengal.
AQI-SO2 | AQI-NO2 | AQI-PM10 | AQI-PM2.5 | CI | Classification results |
---|---|---|---|---|---|
2.5 | 32.5 | 37 | 10.5 | 200 | |
20 | 107 | 236.81 | 51 | 761.56 | |
23.75 | 114 | 227.86 | 48.5 | 752.01 | |
10 | 76.25 | 204.48 | 35.5 | 702.69 | |
6.25 | 61.25 | 4.08 | 28.5 | 200 | |
5 | 68.75 | 191.83 | 38.5 | 453.60 | |
16.25 | 109 | 228.36 | 65.5 | 741.58 | |
26.25 | 138 | 339.73 | 184.89 | | |
Classification results of air quality data of Delhi.
AQI-SO2 | AQI-NO2 | AQI-PM10 | AQI-PM2.5 | CI | Classification results |
---|---|---|---|---|---|
| | | | | 722.39 | |
| | | | | 757.63 | |
| | | | | 699.87 | |
| | | | | 712.53 | |
| | | | | 700.37 | |
| | | | | 737.09 | |
| | | | | 686.30 | |
| | | | | 725.10 | |
| | | | | 700.59 | |
| | | | | 722.21 | |
| | | | | 718.83 | |
| | | | | 766.35 | |
| | | | | 715.64 | |
| | | | | 667.36 | |
Table
The highest concentration of PM10 and PM2.5 is the primary reason for higher values of CI in Table
AQI of PM10 Delhi.
CI Delhi.
A total of 139 samples were chosen for testing the supervised engine in the case of Delhi. It can be observed from Figure
Air pollution is a major concern these days because of its impact on human health and public safety. Classifying the areas whose air is harmful to the respiratory system was the major motivation for this paper. A supervised learning model based on the example dataset has been prepared and tested on data from three different monitoring locations. The following are the conclusions:
(1) A mathematical framework for the CI is proposed that employs the AQIs of different pollutants. This index is computationally efficient and easy to understand.
(2) A supervised learning model based on an SVM has been developed with the help of the example dataset and the calculated values of the CI. A two-class SVM has been designed. To design the SVM module, GWO has been employed for parameter estimation with the aim of maximizing classification accuracy.
(3) The proposed architecture has been tested on the real data of Delhi, Bhopal, and Kolkata. It has been observed that the values of the CI and the classification results obtained from the supervised learning model are aligned.
Developing a forecast engine for predicting pollutant concentrations on the basis of CI values lies within the scope of future work.
The authors declare that they have no conflicts of interest.
The authors acknowledge the support and encouragement of Swami Keshvanand Institute of Technology, Management & Gramothan, Jaipur, Rajasthan, India.