Orthogonal Signal Correction to Improve Stability Regression Model in Gas Sensor Systems

Metal oxide sensors are the most commonly used sensors in electronic nose devices because of their high sensitivity, long lifetime, and low cost. However, these sensors suffer from a lack of response stability, which makes electronic nose systems unsuitable for industrial applications. The sensor instabilities are caused in particular by an incomplete recovery process, which produces gradual drifts in the sensor responses. This paper focuses on a signal processing method combining baseline manipulation and the orthogonal signal correction technique in order to effectively reduce the impact of drift on the sensor outputs. The proposed signal processing is explored using experimental data obtained from a gas sensor array responding to various concentrations of pine essential oil vapors. The Partial Least Squares method is then applied to the corrected dataset to establish a regression model for the estimation of gas concentration. In this work, we show how our drift correction approach can significantly improve the stability of the regression model while ensuring good accuracy.


Introduction
Gardner defines the electronic nose (E-nose) as "an instrument, which comprises an array of electronic chemical sensors with partial specificity and an appropriate pattern-recognition system, capable of recognizing simple or complex odors" [1]. Metal oxide (MOX) sensors are the most widely used in this instrument because they are very sensitive to many gases, are commercially available, have a long lifetime, and have a low cost [2]. However, these sensors show a lack of reproducibility (or instability), which limits the translation of laboratory results to industrial applications [3]. Instability of sensor responses can be due to several problems.
(1) Drifts: sensor response signals always tend to show a small variation even if the E-nose is exposed to the same gas and concentration under constant environmental conditions. Several reasons can explain this variation: (i) sensor aging due to thermomechanical fatigue after successive gas exposures [4] and (ii) sensor poisoning caused by exposure to high concentrations, to aggressive chemicals, or to silicone vapors [5].
(2) Environmental disturbances: humidity variations and temperature or pressure fluctuations also change the sensor responses [6].
(3) Sampling conditions: the mechanism of MOX technology, which is the exchange of oxygen molecules between the gas mixture and the metal film, makes the acquisition cycles very long; in many applications the steady-state responses of the sensors are never reached [7].
Many correction methods have been investigated to improve the stability of sensor responses; they follow either a univariate or a multivariate approach. In univariate techniques the correction is applied to each sensor individually. Among these methods, baseline manipulation, which is widely used in industry [8], consists of transforming a sensor response by means of its initial response value. Three kinds of transformation can be used to correct the baseline of sensor signals: differential, relative, and fractional corrections. Baseline manipulation is generally used as a preprocessing step for the sensor output [9]. Filtering techniques such as the Fourier bandpass filter, the moving median filter, or the discrete wavelet transform have also been used by many researchers in this field to remove drift effects from measurements [10,11]. Among these filtering methods, the discrete wavelet transform is the most flexible because it can analyze the signal in different frequency bands with different resolutions.
In the case of E-nose measurements, since the drift effects are correlated across sensors, multivariate methods capture more information from all the sensors, which allows more complex or nonlinear drift effects to be modeled [12]. Different multivariate methods can be found in the E-nose literature. For example, adaptive neural network methods such as Self-Organizing Maps [13] show good performance, but they are limited to gas classification applications, and good results would be hard to obtain for gas quantification [14].
When gas sensors are exposed to the same gas under the same sampling and environmental conditions, any changes in the sensor response are essentially related to drift. In order to reduce the variance of this drift, which follows one direction, the multivariate linear correction methods based on partial least squares (PLS) or Principal Component Analysis (PCA) constitute the best approach [6,15,16]. Among these methods, orthogonal signal correction (OSC) has been chosen because different studies have shown it to be the most efficient [16][17][18].
Orthogonal signal correction was first proposed by Wold for the correction of NIR spectra [19], and several algorithms were subsequently published to improve its performance. The main idea of the OSC technique is to remove the variance that is not correlated with the variable to be estimated.
In this study, we combine baseline manipulation with the OSC technique in order to remove the drift effects on a MOX sensor array. PLS is then used to model the behavior of our gas sensors responding to different concentrations of pine essential oil (EO) vapors diluted in pure air. This combined correction approach gives a good quantification of essential oil vapors while using only a few PLS components in the model.

Materials and Methods
The data used in this work were obtained from a home-made experimental setup mainly composed of a gas sensor cell and an EO vapor diffuser (Figure 1). The sensor cell contains seven metal oxide gas sensors (TGS882, TGS2620, SP31, SPAQ1, MQ3, and MQ138 produced by Figaro, FIS, and Hanowei companies), and the diffuser unit uses an air bubbling system in liquid EO. The concentration of the EO in pure air is controlled by the flow rates of two mass flow controllers, MFC1 and MFC2. The total flow rate (MFC1 + MFC2) through the sensor cell is kept constant, so different EO concentrations are generated by varying the percentage of the MFC1 flow rate over the total flow rate. The sensor cell and the liquid EO bottle are placed in a Plexiglas chamber to keep the sensors in constant climatic conditions [20].
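As an illustration of the dilution principle described above, the following minimal sketch converts between a target EO percentage and the two MFC set points. It assumes that the quoted concentration is simply the MFC1 share of the total flow; the function names and the flow unit are hypothetical and do not come from the experimental software.

```python
def eo_concentration_percent(mfc1_flow, total_flow):
    """EO concentration expressed as the MFC1 share of the total flow, in percent.

    Assumption: the percentage quoted in the text equals
    100 * MFC1 / (MFC1 + MFC2). Flows may be in any unit, as long as
    both arguments use the same one.
    """
    if total_flow <= 0:
        raise ValueError("total_flow must be positive")
    if not 0 <= mfc1_flow <= total_flow:
        raise ValueError("mfc1_flow must lie between 0 and total_flow")
    return 100.0 * mfc1_flow / total_flow


def mfc_setpoints(concentration_percent, total_flow):
    """Return (MFC1, MFC2) flows that keep the total flow constant."""
    if not 0 <= concentration_percent <= 100:
        raise ValueError("concentration_percent must lie between 0 and 100")
    mfc1 = total_flow * concentration_percent / 100.0
    return mfc1, total_flow - mfc1


# Hypothetical example: a total flow of 200 (arbitrary units) at 2.5% EO.
mfc1, mfc2 = mfc_setpoints(2.5, 200.0)
```

Keeping the total flow fixed means that only the ratio changes between measurements, so the sensors always see the same gas velocity.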

Measurement Protocol.
Following our previous study [20], we performed series of measurements at different EO concentrations. Each measurement comprises an EO exposure phase followed by a sensor cleaning phase with dry synthetic air. The typical response of MOX sensors during this measurement cycle is presented in Figure 1(b): the sensor conductance increases during the gas exposure and then decreases toward its previous baseline during the cleaning process. The gas exposure time is fixed at 75 s to provide a quantifiable response of the sensors, and the cleaning time is set to 350 s to permit an acceptable recovery of the sensitive element. For the learning measurements, we have created nine constant concentrations in the range of 0.5% to 4.5%, which produces a pleasant odor for aromatherapy, with a step of 0.5%.
Sensor outputs are digitized, filtered, and then recorded every second as sensor conductance values. We have opted to express the sensor response as conductance rather than resistance because this parameter is more efficient for identifying gas concentrations with n-type semiconductor metal oxide sensors [1].
Forty measurements were carried out for each of the nine EO concentrations, which were selected randomly throughout the experiments. Each sensor output is characterized by 425 recorded points (75 points during gas exposure and 350 points during the recovery process). The data are arranged in a dataset of 2975 columns (425 points × 7 sensors) and 360 rows (40 measurements × 9 concentrations).
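The arrangement of the recordings into one row per measurement can be sketched as follows. This is only an illustration: it assumes that the columns are ordered sensor by sensor (425 points of the first sensor, then 425 points of the second, and so on), which the text does not state, and the function name is hypothetical.

```python
import numpy as np

N_SENSORS = 7
N_POINTS = 425  # 75 exposure points + 350 recovery points, one per second


def flatten_measurements(cycles):
    """Turn an (n_measurements, n_sensors, n_points) array into a 2-D dataset.

    Each row of the result holds the concatenated temporal responses of all
    sensors for one measurement (assumed ordering: sensor by sensor).
    """
    cycles = np.asarray(cycles, dtype=float)
    if cycles.ndim != 3:
        raise ValueError("expected a 3-D array of measurement cycles")
    n_meas, n_sensors, n_points = cycles.shape
    return cycles.reshape(n_meas, n_sensors * n_points)


# Placeholder data with the dimensions quoted in the text (not real recordings).
dummy_cycles = np.zeros((40 * 9, N_SENSORS, N_POINTS))
dataset = flatten_measurements(dummy_cycles)  # shape (360, 2975)
```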

Instability of the Measurements.
To illustrate the instability of the temporal responses, we have plotted on the same axes the signals of each sensor for 1, 2, 3, and 4% EO concentration (Figure 2). There is clearly a large disparity between the sensor responses obtained for the same EO concentration under the same conditions. This instability can be explained by the sensor drift, which is mainly due to an incomplete recovery process. In Figure 3, we report the temporal responses of the TGS2620 sensor during several successive measurements at different concentrations. In this figure, the different baselines (sensor conductance at the beginning of a measurement) show that the baseline value depends strongly on the gas concentration used in the previous measurement. As shown in Figure 3, when a measurement is started at 3% EO, the sensor baseline is 26 S when it follows a measurement at 2% and 35 S if the previous experiment was at 4% EO. Moreover, Figure 3 highlights that this drift alters not only the baseline but also the sensitivity of the sensors.

Journal of Sensors
This elementary comparison confirms the inefficiency of the sensor recovery process, which causes noticeable drifts. These drifts will be even greater in the case of real-time, continuous measurement with an E-nose.

Drift Correction and Calibration
We have grouped all the sensor responses in a dataset represented by a matrix $X$ of 396 rows and 2975 columns, where the rows correspond to the observations and the columns to the corrected temporal responses of the sensors. The concentrations of the various experiments are grouped in a vector $y$ of length 396.

Drift Correction.
The reliability of E-nose results depends strongly on how the sensor outputs are treated, particularly to minimize noise and the drift effects shown in the previous section. Signal processing therefore plays a key role in E-nose performance, and many studies have already been carried out on this subject.

Baseline Manipulation.
Baseline manipulations are very often cited in the literature as a way to remove drift effects from sensor responses [9,10]. Usually, baseline corrections are based on the initial response of a sensor. In this work, we have opted for the sensor conductance $G_{final}$ obtained at the end of the preceding cleaning process [21] and a fractional correction of the temporal response $G(t)$:
$$G_{c}(t) = \frac{G(t) - G_{final}}{G_{final}}$$
This manipulation is applied to the temporal signals of each sensor. We observe only a very slight improvement in the disparity of the sensor responses obtained for the same measurement conditions and gas, so concentration discrimination is still not feasible.
For the further development of the drift correction, the dataset $X$, which contains the data corrected by baseline manipulation, is used.
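The fractional baseline manipulation can be sketched in a few lines of NumPy. The sketch assumes the usual fractional form, (G(t) − G_final) / G_final, with G_final taken at the end of the preceding cleaning phase; the function and argument names are hypothetical.

```python
import numpy as np


def fractional_baseline(g, g_final):
    """Fractional baseline correction of temporal conductance signals.

    g       : array of shape (n_sensors, n_points), conductance of one cycle.
    g_final : array of shape (n_sensors,), conductance of each sensor at the
              end of the preceding cleaning phase (the reference value).
    Assumption: corrected signal = (g - g_final) / g_final, sensor by sensor.
    """
    g = np.asarray(g, dtype=float)
    g_final = np.asarray(g_final, dtype=float)
    if np.any(g_final == 0):
        raise ValueError("reference conductance must be non-zero")
    ref = g_final[..., np.newaxis]  # broadcast the reference along time
    return (g - ref) / ref


# Made-up values: two sensors, three time points each.
corrected = fractional_baseline([[2.0, 3.0, 4.0], [10.0, 20.0, 30.0]], [2.0, 10.0])
```

The corrected signals are dimensionless and start near zero when the sensor has recovered to its reference value, which makes cycles with different baselines easier to compare.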

Orthogonal Signal Correction (OSC).
Prior to applying the regression modeling, we followed the baseline manipulation with the OSC technique to reduce the drift effects in the sensor signals more efficiently. We show that this correction technique improves the calibration process, making it reliable and stable.
The main objective of the OSC technique is to remove the variance that is not correlated with the variation of the concentration $y$. This is done by suppressing the information in matrix $X$ that is not relevant to the gas response. Only information orthogonal to $y$ is removed, which guarantees that the information useful for the calibration is largely preserved.
The algorithm for OSC [19] is based on the following steps:
(i) Use Principal Component Analysis (PCA) to decompose $X$ into scores $t$.
(ii) Orthogonalize the first score $t_{1}$ (the first component, which contains the maximum of information) to $y$ in order to obtain the new score $t^{*}$ as follows:
$$t^{*} = \left(I - y\,(y^{T}y)^{-1}y^{T}\right)t_{1}$$
where $y^{T}$ is the transpose of $y$.
(vi) The new corrected matrix $X_{corr}$ is given by
$$X_{corr} = X - t\,p^{T}$$
where $t$ is the final orthogonal score vector and $p$ is its loading vector.
In order to test the benefits of the OSC technique, the new dataset ($X_{corr}$), composed of 396 observations, is divided into a "training set", which contains 75% of the observations, and a "test set" (25% of the observations). Both datasets cover the whole concentration range. For each gas sensor, applying the OSC technique makes the responses at the same EO concentration more similar. Figure 4 shows the temporal responses of the test set after OSC treatment for the 1, 2, 3, and 4% EO concentrations. Comparing Figures 2 and 4, we can observe that the dispersion of the temporal responses at the same concentration is significantly attenuated. Figure 4 also highlights how much easier it becomes to discriminate the different EO concentrations after correcting the temporal signals with OSC.
To better visualize the impact of OSC on gas quantification, we have plotted in Figure 5 the PCA scores of all our data before and after correction with the OSC technique. The PCA plot of the dataset before OSC confirms that concentration discrimination is impossible, whereas Figure 5(b) clearly shows the improvement brought by the OSC correction, which permits a successful separation of all the EO concentrations, even with a very small step (0.5%).
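The OSC procedure discussed in this section can be sketched in NumPy as below. This is a hedged illustration in the spirit of Wold's algorithm [19], not the implementation used for the results above. Two points are assumptions of the sketch: the weight vector is obtained with a pseudo-inverse of the centered data instead of a PLS-based inverse, and the iteration stops when the score vector no longer changes. The function names `osc_fit` and `osc_transform` are hypothetical.

```python
import numpy as np


def osc_fit(X, y, n_components=1, tol=1e-6, max_iter=100):
    """Fit OSC components; returns (x_mean, W, P, X_corr)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean = X.mean(axis=0)
    Xc = X - x_mean        # column-centered responses
    yc = y - y.mean()      # centered concentrations
    W, P = [], []
    for _ in range(n_components):
        # First principal-component score of the current matrix (via SVD).
        u, s, _vt = np.linalg.svd(Xc, full_matrices=False)
        t = u[:, 0] * s[0]
        # Assumption: a pseudo-inverse stands in for the PLS-based inverse.
        X_pinv = np.linalg.pinv(Xc, rcond=1e-10)
        for _it in range(max_iter):
            # Remove from t its projection on y (orthogonalization step).
            t_star = t - yc * (yc @ t) / (yc @ yc)
            w = X_pinv @ t_star          # weights such that Xc @ w ~ t_star
            t_new = Xc @ w               # score actually reachable from Xc
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        p = Xc.T @ t / (t @ t)           # loading of the orthogonal score
        Xc = Xc - np.outer(t, p)         # subtract the y-orthogonal variation
        W.append(w)
        P.append(p)
    return x_mean, np.column_stack(W), np.column_stack(P), Xc


def osc_transform(X_new, x_mean, W, P):
    """Apply previously fitted OSC components to new observations."""
    Xc = np.asarray(X_new, dtype=float) - x_mean
    for w, p in zip(W.T, P.T):
        Xc = Xc - np.outer(Xc @ w, p)
    return Xc
```

In a calibration workflow, `osc_fit` would be run on the training set only, and `osc_transform` would then correct the test set with the stored mean, weights, and loadings. With many more columns than rows, a pseudo-inverse can remove almost any variation orthogonal to y, so the number of OSC components should be kept small.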

Calibration.
As the predictors in our dataset are highly correlated and their number (the number of columns) is very large compared with the number of observations (the number of rows), a multiple linear regression (MLR) model is not suitable because of the resulting multicollinearity [19]. To deal with the multicollinearity problem, the regression model should be built on independent variables. We have therefore used partial least squares (PLS) analysis to find independent components that explain as much as possible of the covariance between $X$ and $y$. These components can be used for regression modeling and guarantee a good prediction [22]. However, regression methods can suffer from overfitting or underfitting: with a large number of components, the model performs poorly on new data, whereas a small number of components may not be sufficient to reach good precision. The number of components should therefore be optimized.
Figure 4: Temporal responses at 1 to 4% EO concentrations after using the OSC technique for each of the seven gas sensors (TGS880, TGS822, TGS2620, MQ3, MQ138, SP31, and SPAQ1).
We have performed the calibration of our E-nose by using PLS regression as the recognition method. In Figure 6 we illustrate the RMSE versus the number of components when PLS is performed on the raw data and when it is performed on the OSC-corrected data. We can observe that both PLS and OSC + PLS give good accuracy, but PLS needs 18 components to reach RMSE = 0.1% while OSC + PLS needs only one component.
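The comparison of RMSE against the number of PLS components can be reproduced in outline with the sketch below. It is a hand-written PLS1 (NIPALS) routine in NumPy, given only as an illustration under the assumption of a single response variable; it is not the software used to produce Figure 6, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS; returns (coef, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # score of the current component
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        q_k = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q_k * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(q_k)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def rmse_vs_components(X_train, y_train, X_test, y_test, max_components):
    """Test-set RMSE for models with 1..max_components PLS components."""
    X_test = np.asarray(X_test, dtype=float)
    y_test = np.asarray(y_test, dtype=float).ravel()
    curve = []
    for k in range(1, max_components + 1):
        coef, intercept = pls1_fit(X_train, y_train, k)
        residual = y_test - (X_test @ coef + intercept)
        curve.append(float(np.sqrt(np.mean(residual ** 2))))
    return curve
```

Calling `rmse_vs_components` once on the raw data and once on the OSC-corrected data, with the same training and test split, would give two curves of the kind compared in Figure 6.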

Feature Selection.
To investigate the stability of the model, we compare the variability of the regression coefficients obtained by PLS and by OSC + PLS. Each sensor output is characterized by 425 points; hence the number of regression coefficients is (425 × 7 + 1), which makes the comparison extremely challenging. To reduce this dimensionality, we use the average of the 425 points of a sensor signal as the characteristic feature [23]. We then have only (7 + 1) coefficients to compare.
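The feature reduction described above amounts to averaging each block of 425 columns. A minimal sketch follows, assuming the columns are ordered sensor by sensor; the function name is hypothetical.

```python
import numpy as np


def mean_features(X, n_sensors=7, n_points=425):
    """Replace each sensor's temporal response by its mean value.

    X has shape (n_observations, n_sensors * n_points); the result has shape
    (n_observations, n_sensors). Assumed column order: sensor by sensor.
    """
    X = np.asarray(X, dtype=float)
    if X.shape[1] != n_sensors * n_points:
        raise ValueError("unexpected number of columns")
    return X.reshape(X.shape[0], n_sensors, n_points).mean(axis=2)
```

A regression on these seven features yields seven coefficients plus an intercept, which is the (7 + 1) count quoted in the text.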

Stability Test.
The coefficient of variation (CV) is calculated for all the regression coefficients in the two cases (PLS analysis and OSC + PLS analysis) over seven cycles. In the first cycle we built a model using one component, and we added one more component in each cycle until all seven components were used in the final cycle. The dataset was divided into 12 subsets, so in each cycle the regression coefficients and the RMSE were calculated 12 times. At the end of each cycle, the CV of each coefficient is calculated as the ratio of its standard deviation to its mean value, and the TOTAL RMSE is calculated as the average of the twelve RMSE values. These results are presented in Tables 1 and 2.
Figure 7: Boxplots of the distributions of the regression coefficient values for the OSC + PLS and PLS models.
As we can see, in the case of OSC + PLS the CVs of the regression coefficients are approximately 10 times lower than those obtained when applying only PLS. Moreover, to obtain the best TOTAL RMSE, we need 1 component when applying OSC and 7 components without OSC.
Consequently, we have chosen the model with one component for OSC + PLS and the model with 7 components for PLS because they give the best results, and also because they have approximately the same RMSE, which allows us to compare the stability of the regression coefficients independently of accuracy. This is confirmed in Figure 7, which shows boxplots of the distribution of the regression coefficients obtained with PLS and with OSC + PLS. The boxes are very narrow in the case of OSC + PLS, which indicates a very low dispersion.
For a better comparison of the stability of the regression coefficients, we have plotted the absolute values of the CVs on the same figure; Figure 8 shows that the CVs of the regression coefficients of the OSC + PLS model are ten times lower than those of the PLS model.
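The two stability indicators used in this test can be computed as in the short sketch below. It assumes that the regression coefficients of the 12 subsets are stacked row by row and that the sample standard deviation (ddof = 1) is used; the text does not specify which estimator was applied, and the function names are hypothetical.

```python
import numpy as np


def coefficient_cv(coefs):
    """Coefficient of variation of each regression coefficient.

    coefs has shape (n_subsets, n_coefficients), one row per fitted model.
    Assumption: sample standard deviation (ddof=1) divided by the mean.
    """
    coefs = np.asarray(coefs, dtype=float)
    return coefs.std(axis=0, ddof=1) / coefs.mean(axis=0)


def total_rmse(rmse_values):
    """TOTAL RMSE of a cycle: the average of the per-subset RMSE values."""
    return float(np.mean(rmse_values))
```

Taking the absolute value of the result of `coefficient_cv` gives the magnitude plotted in Figure 8, since a coefficient with a negative mean has a negative CV.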

Conclusion
The main challenge in the E-nose field lies in sensor signal processing, particularly in correcting the effects of gas sensor drift. As a first step, a fractional baseline correction is proposed that uses as reference value the sensor conductance taken at the end of the cleaning phase ($G_{final}$). PLS regression was then combined with orthogonal signal correction to improve the performance of the regression model for gas quantification. In this paper, the proposed signal processing is applied to a dataset obtained from experiments carried out on various dilutions of pine EO vapors in dry synthetic air. The Principal Component Analysis of the dataset clearly shows that signal treatment with the OSC technique is essential for discriminating the EO concentrations. Regarding the performance of the regression model, the two methods, PLS and OSC + PLS, were compared in terms of stability and accuracy. Both methods give high accuracy, but PLS without OSC needs more components to reach the same performance as OSC + PLS.
We have investigated the stability of the regression model by comparing the variability of the regression coefficients of these two methods. Our results show that, for the same accuracy, the OSC + PLS model is considerably more robust than the PLS model.

Figure 1: (a) Schematic representation of the experimental setup and (b) typical temporal response of a MOX gas sensor during the exposure and regeneration phases.

Figure 3: Illustration of the TGS2620 gas sensor instability in terms of its initial conductance during successive measurements by varying the EO concentrations.

Figure 8: Magnitude of CV for the regression coefficients of OSC + PLS and PLS model.

Table 1: CV values of the regression coefficients and TOTAL RMSE as a function of the number of OSC + PLS components.

Table 2: CV values of the regression coefficients and TOTAL RMSE as a function of the number of PLS components.