A Study of Probability Models in Monitoring Environmental Pollution in Nigeria

In Lagos State, Nigeria, pollutant emissions were monitored across the state to detect any significant change that may harm human health and the environment at large. In this research, three theoretical distributions, the Weibull, lognormal, and gamma distributions, were fitted to carbon monoxide observations to determine the best fit. The characteristics of the pollutant observations were established, and the probabilities of exceeding the Lagos State Environmental Protection Agency (LASEPA) and Federal Environmental Protection Agency (FEPA) acceptable limits were estimated. Increases in vehicle use and in the establishment of industries were found not to contribute significantly to the high carbon monoxide concentration in Lagos State for the period studied.


Introduction
Population growth and globalization are widely regarded as the major drivers of pollution. Of the various forms of pollution, air pollution is the one most often cited as the major environmental issue of concern to the community in studies of the relationship between air quality and health effects. Increased hospitalization and emergency room attendance, as well as decreased lung function, have been associated with the following common air pollutants: carbon monoxide (CO), nitrogen oxides (NO$_x$), inhalable particles (measured as PM$_{10}$), photochemical oxidants (measured as ozone), and sulphur dioxide (SO$_2$).
Air pollution is defined as the presence in the outdoor atmosphere of one or more pollutants in such quantities and of such duration that may tend to be injurious to human, plant, or animal life or property or which may unreasonably interfere with the comfortable enjoyment of life or property or the conduct of business [1,2].
This work focuses on one of these criteria pollutants, carbon monoxide, because of the serious threat it poses to human health.
Carbon monoxide is a colourless, odourless, and highly poisonous gas produced in large quantities by the incomplete combustion of fossil fuels. Its main source is motor vehicle exhaust (vehicular emission): about two-thirds of emissions come from transportation sources, while other sources include industrial processes and open burning [3,4].
The natural background concentration of carbon monoxide in air is around 0.2 ppm, which is not harmful to humans, while exposure at 100 ppm or greater can be dangerous to human health. Carbon monoxide endangers humans mainly through its tendency to combine with haemoglobin in the blood, forming carboxyhaemoglobin (COHb) and thereby reducing the capacity of the blood to carry oxygen [5]. The acute effects produced by exposure to carbon monoxide (in parts per million) are given in Table 1.
Probability models have been applied successfully to many physical phenomena such as wind speed, rainfall, river discharges, and air quality. They have been used, for example, to fit vehicular emission data in Chennai, India, to predict the concentration of carbon monoxide in the ambient atmosphere [6,7]. In that work, ten standard probability models were fitted to the data, and goodness of fit was assessed using the Kolmogorov-Smirnov and Anderson-Darling tests. When the parent probability distribution of an air pollutant is correctly chosen, it can be used to predict the mean concentration and the probability of exceeding a critical concentration [5,8].
The objectives of this paper are to fit the three aforementioned probability distributions to the concentration of carbon monoxide in Lagos State, Nigeria, to determine the "best" distribution for describing the data, and to use that distribution to predict the probability that the concentration exceeds a critical or acceptable level.
To this end, observations of the pollutant concentration were collected (as available) between 2004 and 2010. Because vehicular exhaust is the major source of carbon monoxide, data were also collected on the numbers of newly registered vehicles and newly registered industries in Lagos State over the same period.

Methodology
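Weibull Distribution. Let $X$ denote a random variable; the two-parameter Weibull density function [9] is given by
$$f(x;\alpha,\beta) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1}\exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right], \quad x > 0,$$
where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter.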
Lognormal Distribution. A random variable $X$ is lognormally distributed if $\ln X$ is normally distributed. Its probability density function [10] is given by
$$f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0,$$
where $\mu$ is the location parameter, equal to the mean of $\ln X$, and $\sigma > 0$ is the scale parameter, equal to the standard deviation of $\ln X$.
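Gamma Distribution. Let $X$ denote a random variable; the two-parameter gamma density function [11] with parameters $\alpha$ and $\beta$ is given by
$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}, \quad x > 0,$$
where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter.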
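As a quick check on the three densities, they can be written directly in code. The sketch below is a minimal pure-Python illustration; the function names are hypothetical and not taken from the paper or from Easy-Fit. It uses $\alpha$ and $\beta$ as shape and scale for the Weibull and gamma models, and treats $\mu$ and $\sigma$ as the mean and standard deviation of $\ln X$, which is the standard lognormal parameterization.

```python
import math

def weibull_pdf(x: float, alpha: float, beta: float) -> float:
    """Two-parameter Weibull density: alpha = shape, beta = scale; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = x / beta
    return (alpha / beta) * z ** (alpha - 1) * math.exp(-(z ** alpha))

def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal density: mu and sigma are the mean and SD of ln(x); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Two-parameter gamma density: alpha = shape, beta = scale; zero for x <= 0."""
    if x <= 0:
        return 0.0
    # Evaluate on the log scale so that small shapes (alpha < 1) stay numerically stable.
    log_f = (alpha - 1) * math.log(x) - x / beta - alpha * math.log(beta) - math.lgamma(alpha)
    return math.exp(log_f)

# With shape 1, both the Weibull and gamma densities reduce to the exponential density.
x, beta = 2.0, 3.0
print(weibull_pdf(x, 1.0, beta), gamma_pdf(x, 1.0, beta), math.exp(-x / beta) / beta)
```

The three printed values coincide, which is a simple way to confirm that the shape/scale conventions match.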

Methods of Parameter Estimation.
The parameters of the distributions can be estimated by various methods, such as maximum likelihood estimation (MLE) and the method of moments (MOM). In this paper, maximum likelihood estimation is used because it is widely used and, in large samples, yields estimates with the smallest attainable variance.
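The maximum likelihood estimating equations for the three models (the Weibull shape equation, the closed-form lognormal estimates, and the gamma equation involving the digamma function $\psi$) can be solved numerically with very little code. The following is a minimal pure-Python sketch, not the procedure used by Easy-Fit: the function names are hypothetical, bisection is assumed as the root-finder, and $\psi$ is approximated with the recurrence $\psi(x) = \psi(x+1) - 1/x$ plus the standard asymptotic series. All three log-likelihoods involve $\ln x_i$, so every observation must be strictly positive; zero readings would need separate treatment before fitting.

```python
import math
import random

def digamma(x: float) -> float:
    """psi(x) for x > 0: shift up with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def bisect_root(func, lo: float, hi: float, tol: float = 1e-10, max_iter: int = 200) -> float:
    """Root of func on [lo, hi]; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def lognormal_mle(data):
    """Closed form: mean and (1/n) standard deviation of ln(x)."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))

def weibull_mle(data):
    """Solve the Weibull shape equation for alpha by bisection, then compute beta in closed form."""
    n, x_max = len(data), max(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / n
    scaled = [x / x_max for x in data]  # the weighted-log ratio is scale-invariant; this avoids overflow
    def shape_eq(a):
        w = [z ** a for z in scaled]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / a - mean_log
    alpha = bisect_root(shape_eq, 1e-3, 100.0)
    beta = x_max * (sum(z ** alpha for z in scaled) / n) ** (1.0 / alpha)
    return alpha, beta

def gamma_mle(data):
    """Solve ln(alpha) - psi(alpha) = ln(mean) - mean(ln x) by bisection; then beta = mean / alpha."""
    n = len(data)
    mean = sum(data) / n
    s = math.log(mean) - sum(math.log(x) for x in data) / n  # positive unless all values are equal
    alpha = bisect_root(lambda a: math.log(a) - digamma(a) - s, 1e-4, 1e4)
    return alpha, mean / alpha

# Synthetic check (not the Lagos data): random.gammavariate takes (shape, scale).
random.seed(1)
sample = [random.gammavariate(2.0, 3.0) for _ in range(20000)]
print(gamma_mle(sample))  # should be close to (2.0, 3.0)
```

Bisection is used only because the gamma equation's left side decreases monotonically in $\alpha$ and the Weibull shape equation increases monotonically, so a sign change is guaranteed on a wide bracket; Newton's method would converge faster.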
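The MLE is widely used because it has many desirable properties: the maximum likelihood estimator is consistent, asymptotically normal, and asymptotically efficient. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ drawn from a p.d.f. $f(x;\theta)$, where $\theta$ is an unknown parameter. The likelihood function of this random sample is the joint density function of the $n$ random variables, viewed as a function of the unknown parameter [12]. Thus,
$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$$
is the likelihood function. The maximum likelihood estimator of $\theta$, say $\hat{\theta}$, is the value of $\theta$ that maximizes $L$ or, equivalently, $\ln L$. The MLE of $\theta$ is a solution of
$$\frac{\partial \ln L(\theta)}{\partial \theta} = 0.$$
According to [12], the maximum likelihood estimators $\hat{\alpha}$ and $\hat{\beta}$ of the shape and scale parameters of the Weibull distribution are the solution of the simultaneous equations
$$\frac{\sum_{i=1}^{n} x_i^{\hat{\alpha}} \ln x_i}{\sum_{i=1}^{n} x_i^{\hat{\alpha}}} - \frac{1}{\hat{\alpha}} - \frac{1}{n}\sum_{i=1}^{n} \ln x_i = 0, \qquad \hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\hat{\alpha}}\right)^{1/\hat{\alpha}}.$$
For the lognormal distribution, the maximum likelihood estimates of $\mu$ and $\sigma^2$ are given by
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(\ln x_i - \hat{\mu}\right)^2.$$
Lastly, the maximum likelihood estimators $\hat{\alpha}$ and $\hat{\beta}$ of the gamma distribution are solutions of the simultaneous equations
$$\ln \hat{\alpha} - \psi(\hat{\alpha}) = \ln\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) - \frac{1}{n}\sum_{i=1}^{n} \ln x_i, \qquad \hat{\beta} = \frac{\bar{x}}{\hat{\alpha}},$$
where $\psi(\alpha)$ is the digamma function with argument $\alpha$, defined as
$$\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha) = \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}.$$

Weighted Least Squares.
Weighted least squares (WLS) is an efficient method that makes good use of small data sets.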
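The main advantage of WLS over other methods is its ability to handle regression situations in which the data points are of varying quality. If the standard deviation of the random errors is not constant across all levels of the explanatory variables, using WLS with weights inversely proportional to the variance at each level yields the most precise parameter estimates possible. Consider the model
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, \qquad \operatorname{Var}(\varepsilon_i) = \frac{\sigma^2}{w_i}, \qquad i = 1, \ldots, k,$$
or, in matrix form, $Y = X\beta + \varepsilon$. Since the sample sizes $n_i$ of the data also vary, the weight used in this research work is
$$w_i = n_i,$$
the number of observations behind the $i$th yearly mean, because the variance of a mean of $n_i$ observations is proportional to $1/n_i$. The WLS estimate of $\beta$ is given by
$$\hat{\beta} = \left(X^{\top} W X\right)^{-1} X^{\top} W Y,$$
where the weight matrix $W$ is given by
$$W = \operatorname{diag}(w_1, w_2, \ldots, w_k).$$
Fitting this model is equivalent to minimizing
$$\sum_{i=1}^{k} w_i \left(y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i}\right)^2.$$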
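A minimal sketch of a WLS fit with two explanatory variables and an intercept, in pure Python so it runs without extra libraries. It forms and solves the normal equations $(X^{\top}WX)\hat{\beta} = X^{\top}WY$ by Gaussian elimination. The function names and the numbers in the usage example are hypothetical, chosen only to show that an exact linear relation is recovered under unequal weights; they are not the Lagos data.

```python
from typing import List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wls_fit(x1: Sequence[float], x2: Sequence[float],
            y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Return [b0, b1, b2] minimizing sum_i w_i * (y_i - b0 - b1*x1_i - b2*x2_i)**2."""
    rows = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]  # design matrix X with intercept column
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(p)] for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for wi, r, yi in zip(w, rows, y)) for j in range(p)]
    return _solve(xtwx, xtwy)

# Hypothetical example: y = 1 + 2*x1 - 3*x2 exactly, with unequal weights such as yearly sample sizes.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
w = [10, 20, 5, 8, 12, 3]
print(wls_fit(x1, x2, y, w))  # approximately [1.0, 2.0, -3.0]
```

With noisy data the weights change the estimates; here the relation is exact, so any positive weights recover the same coefficients, which makes the output easy to check.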

Test of Goodness of Fit.
To verify the goodness of fit of the models to the carbon monoxide observations, the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) tests are used. The smaller the value of these statistics, the more closely the fitted distribution matches the data. The hypotheses for the tests are $H_0$: the data follow the specified distribution, versus $H_1$: the data do not follow the specified distribution.
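Given $n$ ordered data points $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, the Kolmogorov-Smirnov test statistic is
$$D = \max_{1 \le i \le n}\left\{F\left(x_{(i)}\right) - \frac{i-1}{n},\; \frac{i}{n} - F\left(x_{(i)}\right)\right\}.$$
The Anderson-Darling test statistic is given by
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right)\right],$$
where $F(\cdot)$ is the CDF of the continuous distribution being tested and $x_{(i)}$ are the ordered data.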
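Both statistics can be computed directly from the sorted sample and a fitted CDF. The sketch below is plain Python with hypothetical function names; the toy check uses the uniform CDF $F(x) = x$ on three points, for which $D = 0.3$ and $A^2 \approx 0.366$ can be verified by hand. It computes the statistics only; critical values and p-values are not reproduced.

```python
import math
from typing import Callable, Sequence

def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov D: largest gap between the fitted CDF and the empirical CDF."""
    x = sorted(data)
    n = len(x)
    d_plus = max((i + 1) / n - cdf(v) for i, v in enumerate(x))   # i/n - F(x_(i)) with 1-based i
    d_minus = max(cdf(v) - i / n for i, v in enumerate(x))        # F(x_(i)) - (i-1)/n
    return max(d_plus, d_minus)

def ad_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Anderson-Darling A^2; requires 0 < F(x) < 1 at every observation."""
    x = sorted(data)
    n = len(x)
    f = [cdf(v) for v in x]
    # With 0-based i, the weight (2i - 1) becomes (2i + 1) and x_(n+1-i) becomes x[n - 1 - i].
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1.0 - f[n - 1 - i])) for i in range(n))
    return -n - s / n

def uniform_cdf(u: float) -> float:
    return u

sample = [0.7, 0.1, 0.4]
print(ks_statistic(sample, uniform_cdf))  # 0.3 (up to floating-point rounding)
print(ad_statistic(sample, uniform_cdf))  # about 0.366
```

To evaluate a fitted model, `cdf` would be replaced by the fitted Weibull, lognormal, or gamma CDF.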

Probability of Exceedance.
The probability that carbon monoxide observations exceed a specified standard or limit is computed from the distribution chosen as the best fit for carbon monoxide concentration in Lagos State for the period studied.
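Mathematically, the probability of exceeding a critical concentration $x_c$ [13,14] is given by
$$P(X > x_c) = 1 - F(x_c) = \int_{x_c}^{\infty} f(x)\,dx,$$
where $F$ and $f$ are the CDF and p.d.f. of the selected distribution.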
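For the three candidate models, the exceedance probability takes the following standard forms, obtained by integrating the Weibull and gamma densities with shape $\alpha$ and scale $\beta$, and the lognormal density with $\mu$ and $\sigma$ the mean and standard deviation of $\ln X$. Here $\Phi$ denotes the standard normal CDF and $\gamma(\alpha, t) = \int_0^t s^{\alpha-1}e^{-s}\,ds$ the lower incomplete gamma function.
$$P(X > x_c) = \exp\left[-\left(\frac{x_c}{\beta}\right)^{\alpha}\right] \quad \text{(Weibull)},$$
$$P(X > x_c) = 1 - \Phi\left(\frac{\ln x_c - \mu}{\sigma}\right) \quad \text{(lognormal)},$$
$$P(X > x_c) = 1 - \frac{\gamma(\alpha, x_c/\beta)}{\Gamma(\alpha)} \quad \text{(gamma)}.$$
Only the Weibull case has an elementary closed form; the lognormal and gamma cases require the normal CDF and the incomplete gamma function, respectively.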

Summary of the Data Collected
In this section, we provide and describe the information gathered on carbon monoxide concentration, number of newly registered vehicles, and industries.

Data on Carbon Monoxide Concentration.
This section describes the secondary data collected on the concentration of carbon monoxide, measured in parts per million (ppm), in Lagos State (as available) from August 2004 to August 2010. The data were recorded daily, but only 412 data points could be gathered for the years considered (for example, there was no record at all for 2007, as shown in Table 7). The data are summarized in Table 2, which gives the minimum and maximum values recorded and the standard deviation, mean, and mode of the observations. The carbon monoxide concentration data (ppm) are shown in Figure 1, which indicates that the data (as collected) are positively skewed, with the mode at 0 ppm. This justifies the use of positively skewed theoretical distributions to model the data in this paper.

Data on Registered Vehicles.
In this section, we provide information on the number of vehicles (trucks, buses, and cars) that were registered in Lagos State each year between the years 2004 and 2010.The data is provided in Table 3.
The graphical representation of the number of newly registered vehicles is given in Figure 2.
Figure 2 shows a slight decline in the number of registered vehicles in 2006, a sharp increase in 2007, and the highest registration in 2008.

Data on Registered Industries.
Table 4 summarizes the information collected on the number of newly registered industries (manufacturing industries) in Lagos State between 2004 and August 2010. There are more industries in Lagos State than those captured here; this paper considers only registered manufacturing industries.
The graphical representation of the number of newly registered industries is given in Figure 3.
Figure 3 shows that only a few manufacturing industries were registered. The number increased every year from 2004 to 2010 except in 2009, when there was a slight decline.

Analysis and Results
The parameters of the distributions under study (Weibull, lognormal, and gamma) were estimated by fitting each distribution to the carbon monoxide concentration data using the Easy-Fit statistical package.

Test of Goodness of Fit.
To choose the "best" probability model for the concentration of carbon monoxide in Lagos State for the period studied, a Kolmogorov-Smirnov goodness-of-fit test was conducted. The results are summarized in Table 6.
Figure 4 shows the cumulative distribution function (CDF) of the three fitted distributions.
The graph shows how well the Weibull, lognormal, and gamma distributions fit the data; the CDF of the gamma distribution lies closest to the empirical CDF of the carbon monoxide concentration.

Probability of Exceeding Critical Concentrations.
Since the gamma distribution fits the data better than the other fitted distributions, the probabilities that the carbon monoxide concentration exceeds the Lagos State Environmental Protection Agency (LASEPA) standard (5 ppm) and the Federal Environmental Protection Agency (FEPA) standard (10 ppm) are calculated from the cumulative distribution function (CDF) of the gamma distribution.
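The probability density function of a gamma distribution with parameters $\alpha$ and $\beta$ is given by
$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}, \quad x > 0,$$
and the cumulative distribution function (CDF) is
$$F(x;\alpha,\beta) = \frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)},$$
where $\gamma(\alpha, t) = \int_0^t s^{\alpha-1}e^{-s}\,ds$ is the lower incomplete gamma function. From Table 5, the shape parameter is $\alpha = 0.1463$ and the scale parameter is $\beta = 87.378$, so
$$P(X \le 5) = F(5; 0.1463, 87.378) = 0.699181,$$
and hence $P(X > 5) = 1 - 0.699181 = 0.300819$. Similarly, the probability of exceeding the FEPA standard is $P(X > 10) = 0.231621$.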
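The gamma tail probabilities can be reproduced without a statistics package. The sketch below (hypothetical function name, pure Python) evaluates the gamma CDF through the standard power series of the regularized lower incomplete gamma function, which converges quickly here because $x/\beta$ is small for both limits. It illustrates the calculation; it is not the routine Easy-Fit uses.

```python
import math

def gamma_cdf(x: float, alpha: float, beta: float, tol: float = 1e-15) -> float:
    """Gamma CDF via P(alpha, z) = z^alpha e^{-z} / Gamma(alpha) * sum_k z^k / (alpha (alpha+1) ... (alpha+k)).

    The series converges for every z = x / beta > 0, but needs few terms only when z is modest.
    """
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha  # k = 0 term
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= z / (alpha + k)
        total += term
    return math.exp(alpha * math.log(z) - z - math.lgamma(alpha)) * total

alpha_hat, beta_hat = 0.1463, 87.378  # shape and scale estimates from Table 5
for limit in (5.0, 10.0):             # LASEPA and FEPA standards (ppm)
    print(limit, 1.0 - gamma_cdf(limit, alpha_hat, beta_hat))
```

With the Table 5 estimates this gives exceedance probabilities of roughly 0.30 at 5 ppm and 0.23 at 10 ppm, in line with the values reported above.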

Linear Regression Modelling.
The mean yearly carbon monoxide concentration $Y$ (in ppm) is regressed on the number of newly registered vehicles ($X_1$) and the number of newly registered industries ($X_2$). No carbon monoxide data were available for 2007, so that year is excluded from the regression analysis. Table 7 summarizes the data used for the regression analysis. The fitted regression equation is Equation (26), which is interpreted as follows: $Y$ decreases by 0.000276 for a unit increase in $X_1$ when $X_2$ is held fixed, and increases by 0.049 for a unit increase in $X_2$ when $X_1$ is held fixed. Decision: we do not reject $H_0$ (that the regression coefficients of $X_1$ and $X_2$ are both zero) since the p-value 0.171 is not less than 0.05.

Analysis of Variance (ANOVA)
Inference. From Table 8, the respective p-values for the parameters $\beta_1$ and $\beta_2$ indicate that the regression parameters are not significantly different from zero, with R-Sq = 69.2% and R-Sq(adj) = 48.7%. Also, from Table 9, based on the p-value (0.171), we conclude that the regression model is not significant at the 0.05 level.
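As a consistency check, the adjusted $R^2$ and the overall F-test p-value can be recomputed from R-Sq alone. The sketch below assumes $n = 6$ yearly observations (2004 to 2010 with 2007 excluded) and $p = 2$ predictors; the function name is hypothetical. For an $F(2, d)$ variable the tail probability has the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$, so no special functions are needed. The identities should hold for a weighted fit as well, provided R-Sq is the one reported for that fit.

```python
def overall_f_test(r2: float, n: int, p: int):
    """Return (adjusted R^2, F statistic, p-value) for a regression with p predictors and n observations.

    The p-value uses P(F > f) = (1 + 2f/d)^(-d/2), which is exact only for 2 numerator degrees of freedom.
    """
    if p != 2:
        raise ValueError("the closed-form p-value below is valid only for p == 2")
    d = n - p - 1  # residual degrees of freedom
    adj_r2 = 1 - (1 - r2) * (n - 1) / d
    f_stat = (r2 / p) / ((1 - r2) / d)
    p_value = (1 + 2 * f_stat / d) ** (-d / 2)
    return adj_r2, f_stat, p_value

adj_r2, f_stat, p_value = overall_f_test(0.692, n=6, p=2)
print(round(adj_r2, 3), round(f_stat, 2), round(p_value, 3))  # 0.487 3.37 0.171
```

These match the reported R-Sq(adj) = 48.7% and p-value 0.171, which supports the assumption that six yearly means entered the regression.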

Conclusion
In this paper, we have established (based on the data collected) that the distribution of carbon monoxide observations in Lagos State over the period studied is positively skewed, as shown in Figure 1. The gamma distribution is considered the best distribution for modeling carbon monoxide concentration in Lagos State, as confirmed by the Kolmogorov-Smirnov and Anderson-Darling tests in Table 6. The carbon monoxide concentration in Lagos State is estimated to exceed the Lagos State Environmental Protection Agency (LASEPA) and Federal Environmental Protection Agency (FEPA) standards with probabilities 0.300819 and 0.231621, respectively. Increases in vehicle use and in the establishment of industries in Lagos State were not found to contribute significantly to the high carbon monoxide concentration levels. Further research could focus on the age of car engines, the quality of fuel used in vehicles and machinery, and smoking activities in the state.

Figure 1: Histogram of data on carbon monoxide concentration.

Figure 2: Bar chart of the number of newly registered vehicles.

Figure 3: Bar chart of the number of newly registered industries.

Figure 4: Graph showing the cumulative distribution function of the fitted distributions.

Table 2: Carbon monoxide concentration (ppm). Source of data: Lagos State Environmental Protection Agency (LASEPA).

Using the MINITAB statistical package, regressing $Y$ on both $X_1$ and $X_2$ gives the results shown in Table 8.

Table 4: Estimated number of newly registered industries. Source:

Table 5: Parameter estimates of the fitted probability models.

Table 6: Fitted distribution type and goodness-of-fit statistics.
Note: * denotes the best fit.

Table 8: Results of the regression analysis.

Table 9: Analysis of variance table.