Prediction on the Seasonal Behavior of Hydrogen Sulfide Using a Neural Network Model

Models to predict seasonal hydrogen sulfide (H2S) concentrations were constructed using neural networks. To this end, two types of neural networks, the generalized regression neural network and the radial basis function network, were considered and optimized. The input data for H2S were collected from August 2005 to Fall 2006 from a large industrial complex located in Ansan City, Korea. Three seasonal datasets were prepared and one optimized model was built for each. These optimized models were then used to analyze the sensitivity and main effect of the parameters. H2S was noted to be very sensitive to rainfall during the spring and summer. In the autumn, its sensitivity showed a strong dependency on wind speed and pressure. Pressure was identified as the most influential parameter during the spring and summer. In the autumn, relative humidity overwhelmingly affected H2S. It was noted that H2S maintained an inverse relationship with a number of parameters (e.g., radiation, wind speed, or dew-point temperature). In contrast, it exhibited a declining trend with a decrease in pressure. An increase in radiation was likely to decrease H2S during spring and summer, but the opposite trend was predicted for the autumn. The overall results of this study thus suggest that the behavior of H2S can be accounted for by a diverse combination of meteorological parameters across seasons.


INTRODUCTION
Hydrogen sulfide (H2S) is generally recognized as the key component of reduced sulfur compounds (RSCs) [1,2]. It is a colorless, poisonous gas with an odor of rotten eggs. A number of industrial sources, such as natural gas plants, have been identified, along with the decomposition of sewage [3]. Exposure to high levels of H2S can cause adverse health effects, such as respiratory stress or olfactory fatigue. A variety of meteorological parameters affect the dispersion and distribution of H2S. They include temperature, relative humidity, dew point, pressure, radiation, and the direction and speed of wind.

Site Characteristics
The target study site, the city of Ansan, is well known for the large industrial complex situated on its western side, which deals mainly with metal works, petrochemicals, electronics, and printing. As development proceeded, large apartment buildings were built in the residential area on the eastern side of the city, and traffic expanded accordingly. With the wind predominantly blowing from the west (industrial complex) towards the east (residential area), complaints grew that the industrial processes were the main cause of malodor. In an effort to control odor problems in the city, the concentrations of some major odorant groups, including H2S, were monitored concurrently with such meteorological parameters as temperature, relative humidity, dew point, pressure, wind speed, wind direction, rainfall, and radiation. The analysis of wind direction data and its impact on H2S behavior was treated separately because of its nonscalar properties. The wind rose pattern, when examined using hourly wind direction data, was helpful in understanding the common factors affecting the distribution of H2S at our study site. In addition, a detailed evaluation of our datasets, covering the diurnal and seasonal variations of H2S along with its properties as an offensive odorant, has been presented in our recent publication [16]. According to that study, H2S generally exhibited relative enhancement during the night-time period. Although this pattern was seen consistently throughout the seasons, it was most prominent during the summer. In contrast, other RSCs generally exhibited much weaker temporal variations than H2S.

Instrument Settings and Operation
To continuously monitor several target compounds in ambient air, a monitoring station was established at the One-Four park district in Ansan, Korea. The station has been in operation since July 2005, and the concentration levels of H2S and several other offensive odorants were monitored at hourly intervals from August 2005 to Fall 2006 by an online GC system (with a CP-Sil 5CB column and a pulsed flame photometric detector [PFPD]). In this study, we focus on the analysis of H2S, as it was detected most abundantly during the study period.

NEURAL NETWORKS
Two types of neural networks were used to build prediction models of H2S. Their fundamentals are briefly introduced.

Generalized Regression Neural Network
A schematic of the generalized regression neural network (GRNN) [17] is depicted in Fig. 1A. As shown in Fig. 1A, the GRNN consists of four layers: the input layer, pattern layer, summation layer, and output layer. Each input unit in the input layer corresponds to an individual process parameter. The input layer is fully connected to the second, pattern layer, where each unit represents a training pattern and its output is a measure of the distance of the input pattern from the stored patterns. Each pattern layer unit is connected to the two neurons in the summation layer: the S-summation neuron and the D-summation neuron. The former computes the weighted sum of the outputs of the pattern layer, while the latter computes their unweighted sum. The connection weight between the ith neuron in the pattern layer and the S-summation neuron is $y_i$, the target output value corresponding to the ith input pattern. For the D-summation neuron, the connection weight is unity. The output layer merely divides the output of the S-summation neuron by that of the D-summation neuron, yielding the predicted value for an unknown input vector $\mathbf{x}$ as

$$\hat{y}(\mathbf{x}) = \frac{\sum_{i=1}^{n} y_i \exp[-D(\mathbf{x}, \mathbf{x}_i)]}{\sum_{i=1}^{n} \exp[-D(\mathbf{x}, \mathbf{x}_i)]} \tag{1}$$

where n indicates the number of training patterns, and the Gaussian D function in Eq. 1 is defined as:

$$D(\mathbf{x}, \mathbf{x}_i) = \sum_{j=1}^{p} \left( \frac{x_j - x_j^i}{\zeta} \right)^2 \tag{2}$$

where p indicates the number of elements of an input vector. The $x_j$ and $x_j^i$ represent the jth element of $\mathbf{x}$ and $\mathbf{x}_i$, respectively. The $\zeta$ is generally referred to as the spread, whose optimal value is experimentally determined. It should be noted that in conventional GRNN applications, all spreads for the units in the pattern layer are identical. Although this simplifies the training process, it may limit the improvement of the GRNN prediction performance. The limitation might be circumvented by adopting a multiparameterization of training factors.
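The layer-by-layer computation described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used in the study: the function name `grnn_predict` and the toy data are hypothetical, and the pattern-layer output is assumed to be exp(-D), with D the spread-scaled squared Euclidean distance and a single spread shared by all pattern units.

```python
import math

def grnn_predict(x, train_x, train_y, spread):
    """Predict a scalar output for input vector x using one shared spread."""
    s_sum = 0.0  # S-summation neuron: pattern outputs weighted by the targets y_i
    d_sum = 0.0  # D-summation neuron: unweighted sum of the pattern outputs
    for xi, yi in zip(train_x, train_y):
        # Assumed distance function: squared differences scaled by the spread
        dist = sum(((xj - xij) / spread) ** 2 for xj, xij in zip(x, xi))
        activation = math.exp(-dist)
        s_sum += yi * activation
        d_sum += activation
    if d_sum == 0.0:
        raise ValueError("all pattern activations underflowed; increase the spread")
    # Output layer: ratio of the two summation neurons
    return s_sum / d_sum

# Hypothetical toy data (not taken from the study)
train_x = [[0.0, 0.0], [1.0, 1.0]]
train_y = [0.0, 2.0]
midpoint_prediction = grnn_predict([0.5, 0.5], train_x, train_y, spread=0.5)
near_first_pattern = grnn_predict([0.0, 0.0], train_x, train_y, spread=0.1)
```

A query equidistant from both stored patterns returns the mean of their targets, while a small spread makes the prediction follow the nearest stored pattern almost exactly, which is why the spread has to be tuned experimentally.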

Radial Basis Function Network
The architecture of the radial basis function network (RBFN) [18] is sketched in Fig. 1B. The RBFN consists of three layers: input, pattern, and output. Each unit in the pattern layer, called a pattern unit, calculates its activation using a radial basis function. The activation ($o_j$) is computed, in the standard Gaussian form, as:

$$o_j = \exp\left( -\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2}{2\sigma_j^2} \right) \tag{3}$$

where $\mathbf{x}$ is the input vector, and $\boldsymbol{\mu}_j$ and $\sigma_j$ represent the center and width of a receptive field. The receptive fields are the areas in the input space that activate local pattern units. In Fig. 1B, the RBFN is composed of three receptive fields in the input layer. Each receptive field is composed of a different number of input training patterns. The widths determine the radii of the areas around the centers, in which the activations from the pattern units are significant. Training the RBFN consists of two separate stages. In the first stage, the weights between the input and pattern layers are determined given a specific number of pattern units (i.e., clusters). For this, an unsupervised clustering algorithm called the k-means algorithm is typically used. In this algorithm, k training input patterns are first sampled from the n training input patterns. These k vectors are regarded as the initial center vectors for the k clusters. For each of the remaining (n-k) training input patterns, the Euclidean distance to each center vector is calculated. A training input pattern is then assigned to the cluster whose center is at the minimum Euclidean distance. The center vector of the chosen cluster is subsequently updated by calculating the mean of the center vector and the newly assigned training input pattern. In this way, the other training patterns are classified while the k center vectors are continuously updated. The first stage is completed by assigning every training input pattern to a specific cluster by comparing its Euclidean distances to the final center vectors.
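The first training stage can be illustrated with the following minimal sketch in plain Python. It follows the sequential update described above (each remaining pattern pulls its nearest center to the mean of the center and the pattern) and is not the authors' code: the names `sequential_kmeans` and `rbf_activation`, the `seed` argument, and the toy patterns are hypothetical, and the Gaussian form of the activation is an assumption.

```python
import math
import random

def sequential_kmeans(patterns, k, seed=0):
    """One-pass clustering: sample k initial centers, then update them pattern by pattern."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(patterns)), k))
    centers = [list(patterns[i]) for i in sorted(chosen)]

    def nearest(p):
        # Index of the center at minimum Euclidean distance from pattern p
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for i, p in enumerate(patterns):
        if i in chosen:
            continue  # initial centers are not re-presented during the update pass
        c = nearest(p)
        # Update: mean of the current center vector and the assigned pattern
        centers[c] = [(cj + pj) / 2.0 for cj, pj in zip(centers[c], p)]

    # Final assignment of every pattern against the final centers
    labels = [nearest(p) for p in patterns]
    return centers, labels

def rbf_activation(x, center, width):
    """Assumed Gaussian radial basis function for one pattern unit."""
    return math.exp(-math.dist(x, center) ** 2 / (2.0 * width ** 2))

# Hypothetical toy patterns (not taken from the study)
patterns = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, labels = sequential_kmeans(patterns, k=2)
```

Because the initial centers are sampled at random, the resulting clusters depend on the seed; repeating the procedure with several seeds is a common precaution.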
In the second stage, the weights between the pattern and output layers are determined in a supervised way. The RBFN error to be minimized is defined as:

$$E = \frac{1}{2} \sum_{j} (d_j - y_j)^2 \tag{4}$$

where $y_j$ and $d_j$ indicate the prediction from the jth output unit and the actual measurement given to that neuron. The E defined in Eq. 4 is typically minimized by the gradient descent algorithm, and the resulting weight update rule, called the delta rule, is expressed as:

$$W_{ij}^{\text{new}} = W_{ij}^{\text{old}} + \alpha (d_j - y_j) o_i \tag{5}$$

where $W_{ij}$ represents the connection weight between the ith pattern unit and the jth output neuron, and $o_i$ indicates the output from the ith pattern unit. The remaining $\alpha$ is the learning rate, which was set to 0.1 in this study. The initial weights are randomly distributed within a predefined range of -1 to 1.
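The second training stage can be sketched as a plain delta-rule loop. The sketch below assumes a single linear output neuron (the H2S prediction), the half-squared-error form of E, and per-pattern updates; the function names, the number of epochs, and the toy activations are hypothetical. The learning rate of 0.1 and the initial weight range of -1 to 1 follow the text.

```python
import random

def train_output_weights(activations, targets, alpha=0.1, epochs=200, seed=0):
    """Fit pattern-to-output weights with the delta rule (single output neuron)."""
    rng = random.Random(seed)
    n_units = len(activations[0])
    # Initial weights drawn uniformly from [-1, 1]
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    for _ in range(epochs):
        for o, d in zip(activations, targets):
            y = sum(w * oi for w, oi in zip(weights, o))  # linear output unit
            error = d - y
            # Delta rule: move each weight along error * pattern-unit output
            weights = [w + alpha * error * oi for w, oi in zip(weights, o)]
    return weights

def half_squared_error(activations, targets, weights):
    """Half the summed squared error between measurements and predictions."""
    return 0.5 * sum(
        (d - sum(w * oi for w, oi in zip(weights, o))) ** 2
        for o, d in zip(activations, targets)
    )

# Hypothetical pattern-unit outputs and targets; the exact solution is [1, 2]
toy_activations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_targets = [1.0, 2.0, 3.0]
fitted = train_output_weights(toy_activations, toy_targets)
final_error = half_squared_error(toy_activations, toy_targets, fitted)
```

On this consistent toy system the weights converge towards the exact solution; with real data the loop would normally be stopped when the validation error stops improving.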

Optimization of Model Prediction Performance
The H2S data were first modeled using the GRNN. The spread was varied from 0.1 to 1.0 in increments of 0.1, and one model was constructed at each spread, giving a total of 10 models. Rather than developing one single model for the whole data, seasonal models were built separately for the spring, summer, and autumn data. Each set of seasonal data was divided into three subdatasets for training, validating, and testing the models. The total number of patterns generated from the spring dataset is 3571 and this was divided into three subdatasets of 1197 patterns each. The 3516 and 3501 patterns corresponding to the summer and autumn datasets, respectively, were grouped into three subdatasets of equal size (1172 and 1167 patterns, respectively). The performance of the model is quantified by the root-mean-square error (RMSE) and the result is shown in Table 1.
The same H2S data were modeled using the RBFN. The performance of the RBFN model was optimized by experimentally increasing the number of cluster units from 2 to 10. The results are shown in Table 2. For all RBFN models, the training error appears to slightly increase with an increase in the number of cluster units. This is true for the other types of error. For the spring model, the smallest testing error (0.348) is obtained for four cluster-unit counts, from 7 to 10. The testing error of the optimized model for the summer data is 0.653, and this is obtained for more than 5 cluster units. For the remaining autumn data, the optimized model occurs for 9 and 10 cluster units, and the corresponding error is the same at 0.377. When the optimized errors are compared with those of the GRNN models, the GRNN models show an improvement of about 2.58 and 4.24% for the spring and autumn data, respectively. The GRNN models thus predict better than the RBFN models for these two seasons. For the summer data, the performances of the optimized models are comparable.
The optimized GRNN models are then used for further analysis.
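The spread search described in this section can be sketched as follows. This is a hedged illustration, not the study's procedure: the helper names, the toy data, the interleaved three-way split, and the choice to select the spread on the validation subset are all assumptions. The spread grid (0.1 to 1.0 in steps of 0.1) and the RMSE criterion follow the text.

```python
import math

def grnn_predict(x, train_x, train_y, spread):
    """Compact GRNN prediction with one shared spread (assumed exp(-D) kernel)."""
    num = den = 0.0
    for xi, yi in zip(train_x, train_y):
        a = math.exp(-sum(((u - v) / spread) ** 2 for u, v in zip(x, xi)))
        num += yi * a
        den += a
    # If every activation underflows, fall back to the mean target (a design choice)
    return num / den if den > 0.0 else sum(train_y) / len(train_y)

def rmse(predictions, targets):
    """Root-mean-square error between predictions and measurements."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))

def sweep_spreads(train, evaluation):
    """Build one model per spread in 0.1..1.0 and return the best spread and all errors."""
    train_x, train_y = zip(*train)
    eval_x, eval_y = zip(*evaluation)
    errors = {}
    for step in range(1, 11):
        spread = step / 10.0
        preds = [grnn_predict(x, train_x, train_y, spread) for x in eval_x]
        errors[spread] = rmse(preds, eval_y)
    return min(errors, key=errors.get), errors

# Hypothetical toy data: y = x^2 sampled on [0, 1], split into three equal subsets
data = [([i / 29.0], (i / 29.0) ** 2) for i in range(30)]
train, validation, test = data[0::3], data[1::3], data[2::3]
best_spread, errors = sweep_spreads(train, validation)
test_x, test_y = zip(*test)
train_x, train_y = zip(*train)
test_rmse = rmse([grnn_predict(x, train_x, train_y, best_spread) for x in test_x], test_y)
```

Keeping the test subset out of the selection step gives an error estimate that is not biased by the search itself.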

Analysis of Parametric Sensitivity and Main Effects
The impact of a parameter can be quantified in terms of its average sensitivity and main effect. We propose an average sensitivity that is calculated over equally spaced sample points. It is computed by summing the sensitivities at all the sampled points and dividing the sum by the number of sampled points. It is noted that the individual sensitivity has already been defined [19].
Here, the distance between two adjacent sample points is 0.01 and the total number of sampled points is 200. In the defining expression for the average sensitivity (Eq. 6), x is a vector of environmental parameters maintained at their medium values, Δx_i is an incremental change of x_i, and f is the function that the neural network learns. The sensitivity calculated from the optimized seasonal models is shown in Fig. 2A-C. The acronyms T, RH, DP, P, WS, RF, and R are used to represent the temperature, relative humidity, dew-point temperature, pressure, wind speed, rainfall, and radiation, respectively. Fig. 2A reveals that during the spring, H2S is very sensitive to the variation of RF and R. A negative sign indicates that H2S decreases with an increase in the corresponding factor. H2S is not sensitive to the variation of P and DP. As illustrated in Fig. 2B, H2S shows the most pronounced sensitivity to RF during the summer period. An insensitivity to the variation of DP, P, and WS is noted. A high sensitivity to RF is thus common to both the spring and summer periods. Fig. 2C shows that during the autumn H2S is very sensitive to the variation of WS and P. Interestingly, WS exerts a higher sensitivity than the other parameters. Another interesting feature is that the sensitivities to WS and P have opposite signs. In other words, H2S increases with an increase in one of the two parameters and decreases with an increase in the other.
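One plausible reading of the averaging procedure is sketched below. It is an assumption-laden illustration rather than the definition used in the study: the finite-difference form of the individual sensitivity, the sweep starting at -1.0 (which presumes inputs scaled to roughly -1 to 1), and the names `average_sensitivity` and `toy_model` are all hypothetical. Only the spacing of 0.01, the 200 sampled points, and the use of medium values for the other parameters come from the text.

```python
def average_sensitivity(f, medium, index, start=-1.0, step=0.01, n_points=200):
    """Average a finite-difference sensitivity of f to parameter `index`.

    f       : callable taking a list of parameter values and returning a prediction
    medium  : medium values at which all other parameters are held
    """
    total = 0.0
    for k in range(n_points):
        x = list(medium)
        x[index] = start + k * step          # k-th equally spaced sample point
        x_next = list(x)
        x_next[index] += step                # incremental change of the parameter
        total += (f(x_next) - f(x)) / step   # assumed individual sensitivity
    return total / n_points                  # sum divided by the number of points

# Hypothetical stand-in for a trained seasonal model
def toy_model(x):
    return 3.0 * x[0] - 2.0 * x[1]

sensitivities = [average_sensitivity(toy_model, [0.0, 0.0], i) for i in range(2)]
```

For this linear stand-in the average sensitivities recover the slopes, including the negative sign for the parameter whose increase lowers the output.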
To investigate the main effect of each parameter, the formula expressed in Eq. 6 was slightly modified so that it does not contain the normalization factor. The main effects calculated using the modified formula are shown in Fig. 3A-C. As exhibited in Fig. 3A, H2S is most strongly affected by P during the spring period. The parameter P is also seen to exert the highest effect during the summer period. This contrasts with the insensitivity of H2S to P noted above. The influential parameter P is thus common to the spring and summer periods. During the autumn, RH is identified as the most influential factor. Its sensitivity is also relatively high.

Model-Based Interpretations
Using the optimized seasonal models, the effects of the parameters on H2S were explored with three-dimensional plots. In each plot, all parameters other than the two of interest were set to their medium values in the experimental data. Fig. 4 shows the main effect on H2S as a function of DP and P during the spring period. An increase in P increases H2S. The positive effect of P was already noted from the main effect analysis. Its effect is highest at a DP of 0°C. At DP values either higher or lower than 0°C, the effect of increasing P becomes smaller. In general, the effect distribution seems to be symmetric relative to 0°C. An increase in H2S with DP in the range of -20 to 0°C is consistent with the SPM data [4]. Fig. 5 shows the effect of RF and R on H2S. Opposite effects of RF are evident in Fig. 5 depending on its amount. A decrease in H2S with a decrease in RF is predicted in the first half of the total range. The amount of RF is also seen to greatly affect the effect of R. Fig. 5 predicts that in the first half of the range, varying R causes little variation in H2S. However, an increase in R in the remaining half of the range, at relatively higher RF, drastically decreases H2S. Therefore, Fig. 5 illustrates that the amount of RF plays a significant role in determining the effect of R on H2S. Fig. 6 shows the effect of WS and RF. As WS increases, H2S decreases, and the decrease becomes noticeably large at lower RF. Although there is one turnover point, increasing RF generally decreases H2S irrespective of WS. This indicates that the effect of RF is not sensitive to WS. This is partly supported by its lower sensitivity, as shown in Fig. 2A.
Fig. 7 shows a plot of H2S as a function of DP and P produced from the summer model. A decrease in P decreases H2S. The decrease becomes drastic at lower DP. In contrast, H2S decreases with an increase in DP. This is similarly predicted from the spring model, as shown in Fig. 4. In this sense, DP and P exert conflicting impacts on H2S. As shown in Fig. 8, an increase of RF in the first half of the range seems to cause little variation in H2S. In contrast, a further increase in the remaining half of the range considerably decreases H2S. The effect of R is noticeably large at a lower RF. A similar effect has been predicted from the spring model at a relatively higher RF. In Fig. 9, the effect of RF is very similar to that of the spring model. Increasing WS is seen to decrease H2S, as was similarly predicted from the spring model.
Plots of H2S prepared from the autumn model are shown in Figs. 10 and 11. As shown in Fig. 10, the effect of P is similar to those for the spring and summer models. The effect of DP is similar to that for the summer model. With respect to the spring model, a similarity is observed only in the range of positive DPs. For the negative range, the DP effects from the spring and autumn models are opposite. In this sense, conflicting DP effects are present only during the spring season. In Fig. 11, H2S is likely to increase with an increase of R. This is similar to that predicted from the summer model at the lower RF, but opposite to that predicted from the spring model at the higher RF.
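The way such two-parameter surfaces are generated can be sketched as follows: two parameters are swept over a grid while the remaining ones are fixed at their medium values, and the model is evaluated at each grid point. The function `response_surface`, the stand-in model, and the parameter ordering are hypothetical; plotting the resulting grid is left to any 3D plotting tool.

```python
def response_surface(model, medium, i, j, grid_i, grid_j):
    """Evaluate model over a grid of parameters i and j, others held at medium values."""
    surface = []
    for vi in grid_i:
        row = []
        for vj in grid_j:
            x = list(medium)      # start from the medium values of all parameters
            x[i], x[j] = vi, vj   # overwrite only the two parameters of interest
            row.append(model(x))
        surface.append(row)
    return surface

# Hypothetical stand-in for a trained seasonal model with three inputs
def toy_model(x):
    return x[0] + 10.0 * x[1] + 100.0 * x[2]

medium_values = [0.0, 0.5, 0.0]
surface = response_surface(toy_model, medium_values, 0, 2, [0.0, 1.0], [0.0, 1.0, 2.0])
```

Each row of `surface` corresponds to one value of the first swept parameter and each column to one value of the second, so the grid can be passed directly to a surface plot.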

CONCLUSIONS
In this study, neural network models were developed and applied to the seasonally divided data groups of a well-known odorant, H2S. Both the GRNN and the RBFN were optimized as a function of their training factor, and the former showed better prediction accuracy. Using the optimized GRNN models, an analysis of average sensitivity and main effect was conducted. The sensitivity analysis showed that the sensitivity of H2S to the parameters differs across seasons. During the spring and summer periods, H2S was found to be the most sensitive to the variation of RF. The main effect analysis revealed that P is the most influential parameter for the spring and summer periods. For the autumn period, the main effect of RH was highest. The optimized models provided useful qualitative information on the parameter effects, and they generally showed similar effects regardless of the season. H2S was noted to decrease with an increase in R, WS, or DP (in its positive range), as well as with a decrease in P. It is not certain that the reported tendencies are applicable to other seasonal data, because only one year of seasonal data was used. Nevertheless, each sensor variable is expected to cover a wide range of the values that might occur during each season. From this perspective, the constructed models are likely to be useful for quantifying and predicting the effect of the environmental variables. The constructed models can be easily updated by retraining them with new data, thereby providing timely and accurate prediction of environmental characteristics.