Estimating Annual CO2 Flux for Lutjewad Station Using Three Different Gap-Filling Techniques

Long-term measurements of CO2 flux can be obtained using the eddy covariance technique, but these datasets are affected by gaps which hinder the estimation of robust long-term means and annual ecosystem exchanges. We compare results obtained using three gap-fill techniques: multiple regression (MR), multiple imputation (MI), and artificial neural networks (ANNs), applied to a one-year dataset of hourly CO2 flux measurements collected in Lutjewad, over a flat agriculture area near the Wadden Sea dike in the north of the Netherlands. The dataset was separated in two subsets: a learning and a validation set. The performances of gap-filling techniques were analysed by calculating statistical criteria: coefficient of determination (R 2), root mean square error (RMSE), mean absolute error (MAE), maximum absolute error (MaxAE), and mean square bias (MSB). The gap-fill accuracy is seasonally dependent, with better results in cold seasons. The highest accuracy is obtained using ANN technique which is also less sensitive to environmental/seasonal conditions. We argue that filling gaps directly on measured CO2 fluxes is more advantageous than the common method of filling gaps on calculated net ecosystem change, because ANN is an empirical method and smaller scatter is expected when gap filling is applied directly to measurements.


Introduction
A good knowledge of the production rate and local storage of CO 2 as well as of the flow of energy (mainly heat and momentum) and mass (mainly gasses and vapour) is important in view of the recent reports about global climate change. The CO 2 flux to and from the atmosphere is a measure of growth or decrease of biomass in an ecosystem. Inversely, ecosystem-atmosphere gas fluxes can be modelled using knowledge of biomass changes.
When atmospheric conditions (relative humidity, wind velocity, air temperature, and global radiation) are constant and the main vegetation is homogeneous and situated on a flat terrain for an extended distance upwind, the eddy covariance method is the most reliable for determining the quantity of CO 2 exchange between the biosphere and atmosphere [1].
When the eddy covariance method is used over natural and complex landscapes or during atmospheric conditions that vary with time, the measurements must include estimates of atmospheric storage, flux divergence, and advection. Gross ecosystem carbon uptake and ecosystem respiration are the two major components of NEE with the atmosphere [2]; thus, the exchange of CO 2 between the atmosphere and the biosphere is the balance between the gross ecosystem productivity and ecosystem respiration [3].
Gaps or missing data originate in calibration errors, night-time air drainage flow beneath sensors, and missing data due to instrument failure or extreme weather conditions [4][5][6][7][8]. Calculating carbon balances from daily to annual time scales is a challenge because of these errors. Moreover, gaps in recorded data set are usually not distributed randomly during the year due to seasonal variations in the climate and 2 The Scientific World Journal ecosystem function, which adds difficulties to the gap filling process and data processing [9]. Another source of gaps is, for the particular data set used in this study, the selection of wind direction with flow over homogeneous vegetation, because data with northern wind are influenced by the nearby Wadden Sea.
Despite intensive studies of gap-filling techniques, there is a need to improve the quality and reliability of the results, which are still highly dependent on meteorological conditions, on methods, and on the characteristics of the data set. In this paper, we aim at disentangling between three methods that can be used for gap filling and to find the most reliable method that could be used for a set of given meteorological conditions, based on a defined set of available data. We compare results obtained with three methods: multiple regression (MR), artificial neuronal network (ANN), and multiple imputation (MI), which were applied to data basis consisting of hourly eddy covariance measurements collected from Lutjewad, The Netherlands, during 2008.

Experimental Data.
The measurements are taken at the 60-meter-tall atmospheric research tower Lutjewad (6 • 21 E, 53 • 24 N) located in the north of The Netherlands. Concentrations of several greenhouse gases [26] and their isotopes are measured on a continuous basis and by automated flask sampling techniques [27]. Meteorological data include wind speed, air temperature, solar radiation, and relative humidity, measured at various heights. The wind direction is measured at 60 meters, precipitation is measured at ground level, and atmospheric pressure is measured at 7 meters. Starting with the summer of 2006, a Gill Windmaster Pro 3dsonic anemometer/LICOR LI-7500 infrared CO 2 and H 2 O analyzer combination is running at a height of 50 meters. The collected raw EC data have a time resolution of 10 Hz. The 10 Hz data were processed to fluxes with AltEddy software, a powerful program written at AltTerra (Wageningen University and Research Centre). Hourly averaged data were used in this study because the turbulence has a low frequency at the height where the CO 2 flux is measured [28]. Halfhourly averaged momentum fluxes were underestimated by 4.4% compared to hourly fluxes.
The site is influenced mainly by winds originating from southwest and west (about 30% of the time) [29]. Directly to the north of the tower is a dike with a height of almost 8 meters, running in a direction of about 75 • from north. To the north of these lie salt marshes of about 800 meters, followed by the tidal Wadden Sea of about 8 km, and the island of Schiermonnikoog, beyond which starts the North Sea. In all other directions, the region is dominated by arable land for fetches up to at least 10 km, mostly sown with grain, sugar beets, and potatoes. Because of the remoteness of the location, anthropogenic sources of CO 2 apart from the arable land are rather small. Northern winds are influenced by fluxes from the salt marsh and Wadden Sea, and southern winds are influenced by the existence of arable crops. In order to avoid sea influences, we selected only wind direction from the agricultural area between 95 • and 215 • .
Each hourly averaged flux measurement is accompanied by a quality data factor from 1 to 10, where 1 denotes the highest quality [30]. Only data with quality factor 1 have been used in the present study. Due to factor quality and wind-based limitations, about 60% of the existing database is rejected. Most of the data with a high-quality factor have been found from June till September. The lowest number of high-quality measurements of CO 2 flux appears to have been collected in November, when only 69 measurements were used, compared to an average of around 400 used for summer months.

Gap-Filling Methods.
Gap filling in the CO 2 atmospheric flux was done by three methods: MR, ANN, and MI, using the statistical software program SPSS (version 16 and 17 SPSS Inc., IL, USA). Hourly data, separated for each month, were used in the analysis, and monthly averages were calculated in order to account for seasonal changes in plant biomass and ecosystem exchange.
These data were obtained as follows: all hourly highquality measurements obtained during the whole year have been separated in two equal sets. The first half was kept as a witness dataset and was considered as unknown. These data were considered as missing, thus creating artificial gaps. The second half has been considered as learning data, that is, this dataset was considered "known" and was introduced as such in each of the three statistical gap-filling methods. This way we can compare the CO 2 flux that was actually measured with the gap-filled CO 2 flux obtained with the MI, ANN, and MI models, as if these measurements were not available.
The gap fill methods were used directly on measured CO 2 fluxes, in contrast to common methods that fill gaps in NEE [9,10]. The difference between CO 2 flux and NEE in our case is mainly due to CO 2 storage between the surface and measurement height. Filling gaps directly in measured CO 2 data has the advantage that the results are not deteriorated by additional scatter from inaccurate estimations of CO 2 storage. This is especially important when measurements are executed at elevated height, where storage may be large. However, some differences between CO 2 flux and NEE exist: CO 2 fluxes depend on wind velocity, whereas NEE is supposed to be independent of such an atmospheric variable. Input values for all three gap fill methods in this study are meteorological parameters (air temperature, wind velocity, global radiation, and relative humidity) associated with output data, CO 2 flux (Table 1). Wind velocity is closely related to friction velocity and has the advantage that its measurement is more reliable.

Multiple Regression.
We used a stepwise multiple regression analysis (MR) to predict missing data of CO 2 flux using the hourly data measurements of high quality of flux and meteorological condition in SPSS software. Meteorological parameters that influence CO 2 fluxes are global radiation, wind velocity, relative humidity, and air temperature. Correlation coefficients between hourly CO 2 flux and each of the four parameters have been calculated. The correlation coefficient R 2 between global radiation and CO 2 flux is the highest, with a value of 0.96. The other correlation coefficients are 0.93 for wind velocity, 0.51 with humidity, and 0.21 with the temperature. The SSPS program works as follows: global radiation score is entered first as a basis variable, while wind velocity, relative humidity, and air temperature would count for the variance. Next, the algorithm evaluates whether the remaining variables contributed significantly to the R 2 (the part of the variance in the response variable that the explanatory variables account for). If they did not, they would not be entered in the equation, in spite of the fact that initially their correlation levels were about the same strength. Stepwise regressions can be done forward, backward, or both ways, and, in all cases, the computer picks the regression configuration based on purely statistical information, with no logical or theoretical assumptions involved [31].
Multiple regressions require a large number of observations, and the number of input variables must substantially exceed the number of predictor variables. The equation has the following form: where b 1 , b 2 , . . . , b n are regression coefficients. The standard input method is simultaneous because all variables are introduced into the equation at the same time. Each predictor is analysed and evaluated by the influence it has on the prediction of the dependent variable. Variables can be retained or deleted on the basis on the associated statistics [32].

Artificial Neural Networks.
Artificial neural networks (ANNs) are purely empirical nonlinear regression. ANNs are composed of nodes connected by weight that are the regression parameters [33][34][35]. The multilayer perceptron (MLP) is a feed-forward neural network architecture and uses different linear combination functions and nonlinear sigmoidal activation functions. The MLP architecture contains an input layer, at least one hidden layer and an output layer. Each unit from the input layer is connected to a unit from the second layer, and the output layer is connected to the hidden layer. The input units are used to predict the values of the target variable. The hidden units execute an internal nonlinear transformation, and the output units create predicted values and then back-propagate errors (compares the difference between the predicted values with the values of the output units) adjusting the weights so that the network output optimally approximates CO 2 flux. In the present study, a network with one hidden layer was used.
The input values pass the network, and the error is calculated by a comparison of the network's outputs y j with measured target values m j . The quality of the network is evaluated on the basis of the mean squared error (MSE). E is the error as accumulated over all N data records that served as learning patterns [36]: (2)

Multiple Imputation (MI). Multiple imputation (MI) uses a Markov Chain
Monte Carlo algorithm to replace missing value with a range of estimated (imputed) values for each missing item. MI uses a regression model to predict missing values. The MI technique consists of three steps: imputation, analysis, and pooling. First, sets of plausible values for missing data are created that reflect uncertainty about the estimated model. Each of these sets of plausible values is used to impute the missing values and to obtain a complete data set. Second, each of these data sets is analyzed using statistical methods. Third, the results are combined, which allows the uncertainty regarding the imputation to be taken into account [37]. The observed values are the same as in the original data set, and only the missing values have different estimated values [16,38]. The accuracy of MI can be improved by using the multiple imputations (5-10 imputations) instead of the single imputation method that underestimates the error variance of missing data [39].

Statistical Performance Measures.
Five performance indicators were calculated for describing the accuracy of the three gap-filling methods: the mean square bias (MSB), maximum absolute error (MaxAE), and mean absolute error (MAE), which calculate the magnitude and distribution of individual errors, and root mean square error (RMSE), R-squared (R 2 ), which measures the correlation.
MSB is used to evaluate the performance of an estimator and is given by 4 The Scientific World Journal where o i is individual observed CO 2 flux data, p i is the predicted values, N equals the number of hourly predicted observation pairs. MaxAE represents the largest forecasted error, expressed in the same units as the dependent series. MaxAE is useful for analyzing the worst-case scenario for forecasts data: MAE measures how much the series varies from its model-predicted level: RMSE is a measure of the difference between the observed and predicted CO 2 flux data, and it was used to provide the average error of model: R 2 was used to estimate the proportion of the total variation in the series that is explained by the model: where o i is individual observed CO 2 flux data, p i the predicted values, p and o their means.

Diurnal Variation.
Diurnal variations of measured and gap-filled hourly fluxes are shown in Figure 1 for each month. The diurnal cycle of the measured flux from March to October is nicely reproduced by all three methods, but MR and MI underestimate the negative peak during daytime, especially during the summer months. Higher differences between measured and gap-filled data seem to occur in daytime, when the average CO 2 flux is negative and large, compared to the night time, when CO 2 flux is positive and small. In all months, the ANN line is the closest to the measured values, suggesting that the ANN method gives more accurate results than MR and MI. The highest differences are observed in November, when an hourly mean of the measured flux (at 9 AM) is about 2 times higher than the gap-filled data. Further analysis showed that the value at 9 AM is the average of only two measurements, so all random errors are already included in this value.

Seasonal Effects.
The results for monthly averages of gap-filled CO 2 fluxes are shown in Figure 2, together with measured values. The plot shows that, most of the time, gapfilled values are close to the measured ones. However, some departing results can be noticed, for instance, in July, August for MR, or April for MI. The highest difference between measured and all gap-filled data exists in November.
Data were grouped by season in order to observe a possible seasonal effect on the accuracy of gap-filling results. Winter seasons consist of November-February, March, April, September, October are considered to be part of the equinox season, while summer months are May-August.
In winter, there is less sunlight and air temperature is lower. Most fields are bare except maybe for winter wheat with some above ground biomass, so photosynthesis is expected to be very small and soil respiration may explain the positive CO 2 flux. The highest positive CO 2 fluxes are observed in January. The averages of measured and gap-fill data of CO 2 flux for the entire cold season (January, February, November, and December) are presented in Table 2. The ANN method is the closest to the measured flux, with differences between gap-filled and observed data ranging from −0.06 to 0.03 µmol m −2 s −1 (except November). MR differences are between 0.02 to 0.14 µmol m −2 s −1 , and MI differences range from −0.18 to 0.05 µmol m −2 s −1 . Concluding, all gap-filled methods provide good results for winter time.
The equinoctial season consists of data from March, April, September, and October. The equinoctial average of the measured and gap-fill data of CO 2 flux is presented in Table 3. Differences range from 0.31 to −0.42 µmol m −2 s −1 for MI, while ANN departs from measured results with values from −0.02 µmol m −2 s −1 to 0.05 µmol m −2 s −1 . Differences between MR-gap-filled and measured data are in between, varying from -0.23 µmol m −2 s −1 to 0.14 µmol m −2 s −1 . Concluding, the best fill for equinox months is obtained using the ANN method.
The warm season is considered to last from May to August. The average of measured CO 2 flux and gap-fill data by MR, ANN, MI is presented in Table 4. Again, ANN gives the best results out of the three methods for the warm season as well.
Atmospheric conditions and the assimilation/respiration of plants change with the seasons. During the summer season, the absolute value of the CO 2 flux is higher because photosynthesis occurs most rapidly in summer, thus CO 2 exchange is more intense. Peaks of the negative CO 2 [36]. Also, respiration increases with temperature.

Statistical Analysis.
In order to have a better view of the accuracy of each gap-fill method, we compared the monthly variation of MR, ANN, and MI using five statistical parameters ( Figure 3). The MSB is a handy criterion for the evaluation of the gap-filling techniques, determining the systematic error for a long dataset. Its monthly value is computed by the average square root of biases for every hour of the day during a specific month. Averaging over all observations during a month of a certain time of day reduces the impact of 6 The Scientific World Journal   statistical uncertainties in the measurements and thus should give a better focus of the performance of the gap filling method but less focus on the uncertainty in observations. Higher biases are observed in summer, especially for the MR method. Lower differences between measurements and gap fill occur in winter, except for November, when the mean square bias is high for all three methods. The best results are obtained at equinox, when day and night periods are of similar length and CO 2 uptake during day time is almost equal by release during the night. ANN performs better than MI every month and better than MR in 11 out of 12 months. A clear seasonal effect is seen in the MR and MI methods: the poor result of MR and MI during months with high irradiation might be partly caused by the use of linear regression on irradiation. In reality, photosynthesis saturates with high irradiation because the vegetation cannot photosynthesize more quickly. This might indicate that MR and MI calculations are dominated by the radiation monthly budget: the method gives poor results when the radiation is high and good results when the radiation is low.
One explanation for the poor result in November might be the fact that the amount of data used for gap-fill is too small (only 69 measurements) to give reliable results, regardless of gap-fill method. Figure 3 shows that results are good in March, although the number of data that has been used is also relatively small (128, compared to 200-500 for the rest of months). On the other hand, the flux in March is 60% from the November one. This might suggest that any gap-fill method will become unreliable when the number of highquality measurements is below a certain threshold. MaxAE confirms the results described above, with a higher accuracy for all gap-filling techniques for autumn and winter months (except November) and a lower accuracy for June and July. MAE measures the average magnitude of the errors in a set of forecasts, and the same pattern was obtained for all three gap filling methods. The best performance is given by ANN in equinox months, with a MAE of 0.86 µmol m −2 s −1 in October. Low values of RMSE indicate a good fit of the model to measurements. Again, ANN is overall the best method while the worst gap-fill method is the MR method.
The highest values of R 2 are obtained for ANN, supporting that, indeed, ANN is the best gap-fill technique. All methods give good results in December for all three methods, with R 2 ranging from 0.80 (MR) to 0.88 (ANN). A low accuracy of all methods is seen in September, when the lowest R 2 is 0.49 (MR) and the highest is 0.63 for ANN.
A concise view of gap-filling methods is given in Table 5, where annual averages of all statistical parameters is given The Scientific World Journal 7  for all three gap-filling techniques. Bold digits show the best result, and it is clear that the best response at each statistical test is given by the ANN method. Only meteorological parameters have been considered in our calculations; thus, it might be necessary to take into account some indicators of specific spring biological processes. The large bias in April, compared to other equinox months, might be due to the fact that the CO 2 uptake by the agriculture area is strong or to the fact that large differences exist in the CO 2 uptake between day and night. Good results are obtained in autumn which might be explained by the fact that effects of global radiation and air temperature on ecosystem cause the reduction of the potential for agriculture area carbon sequestration. Correlation coefficients between biases (MSB) of each gap-fill method and each meteorological parameter are given in Table 6, which shows that the ANN method is independent of the meteorological parameters. The performance of MR gap-fill method is strongly influenced by the global radiation and by the relative humidity, while the performance of MI depends on relative humidity. None of the three methods is sensitive to the number of measurements, although a possible limitation to a minimum number might be necessary for getting reliable results (see the case of November).
To find which statistical indicator is the best for the determination of the optimal gap fill method, we add a test. Based on the above results, that is, that ANN in the best gap-fill technique, we calculated the difference between other methods and ANN and the scatter in that difference, measured by the variance. The ratio between scatter and the mean difference is a measure of the probability that ANN is actually better than another method. The ratio between mean and scatter is a measure of the probability that the results are caused by statistical uncertainty; the larger the number, the smaller the probability.
The results of the test, shown in Table 7, show that the largest ratio is found for R-squared for the difference between MR and ANN, but this is not valid for the difference between MI and ANN, for which MaxAE seems to have the highest value. However, MaxAE, R 2 , and MAE are very close, without any clear difference. All ratios are relatively close to each other, suggesting that all indicators can be used in evaluating the relative performance of a method compared to others.

Conclusions
Three gap-filling methods, MR, ANN, and MI, were used for estimating atmospheric CO 2 flux, and their accuracy was studied using an hourly dataset covering one whole year (2008) from an agricultural area. Poor weather and/or northern wind conditions led to large gaps in data. Errors may be introduced also by a nonrandom distribution of data set gaps. Each of these statistical methods gives a good estimation of atmospheric CO 2 flux, when number of gaps of original dataset was small and had a random distribution. The small biases that were found in this study imply that gap fill methods could be used directly on CO 2 measurements. The first general conclusion is that, overall, ANN gives better results than MR and MI at yearly, monthly, and diurnal scale. ANN has hardly any diurnal variation, while the MR and MI methods perform better during night time than during day time. The ANN performance indicators are better for almost every month. The efficiency of the gapfilling methods depends on the season, especially for MR and MI. Higher biases are met during warm seasons (April-August), when CO 2 fluxes are negative and their absolute values are higher. All three methods give low errors in colder seasons (September-March), when CO 2 flux is positive and smaller. The decrease of biases towards August (especially in ANN results) coincides with a decrease in the absolute value of the CO 2 flux. An exception occurred in November when very few high-quality measurements were available as learning dataset. We conclude that sufficiently high-quality measurements might be needed to reduce the impact of random errors in the results of the gap-fill methods and a minimum critical number of measurements is needed in order to reduce random errors and obtain reliable results. Negligible errors were obtained for the year average flux, but the good overall result for the MR and MI methods was caused by compensating errors in summer and winter and compensating errors in day and night time.
The ANN produced the best results, having the lowest annual average for RMSE and the highest R 2 values. The accuracy of the method has a small seasonal and diurnal variation, which means that this method is almost independent on environmental conditions. In contrast, the accuracy of the MR and MI methods varies significantly with the season and with the time of day.
The ecosystem exchange with the atmosphere influenced the results of gap filling CO 2 flux for each of the three methods. This is related to two causes: large difference in CO 2 exchange during day and night and strong temporal change throughout a month because increasing soil cover of the vegetation. Since only meteorological parameters have been considered in the calculations, this might be an indication that biological proxies should be also taken into account.
Gaps were filled on CO 2 measurements instead of NEE calculations because CO 2 measurements lack additional noise from inaccurate storage measurements. A possible drawback of using CO 2 data as input might be that relations between CO 2 flux and biological processes are less clear compared to NEE data, but this argument does not hold when the empirical ANN method is used to fill gaps.
Therefore, we conclude that all three methods could be used to calculate year-round average flux, but ANN is clearly preferred when shorter timescale data sets are analysed.