Neural Networks Technique for Filling Gaps in Satellite Measurements: Application to Ocean Color Observations

A neural network (NN) technique to fill gaps in satellite data is introduced, linking satellite-derived fields of interest with other satellites and in situ physical observations. Satellite-derived “ocean color” (OC) data are used in this study because OC variability is primarily driven by biological processes related and correlated in complex, nonlinear relationships with the physical processes of the upper ocean. Specifically, ocean color chlorophyll-a fields from NOAA's operational Visible Imaging Infrared Radiometer Suite (VIIRS) are used, as well as NOAA and NASA ocean surface and upper-ocean observations employed—signatures of upper-ocean dynamics. An NN transfer function is trained, using global data for two years (2012 and 2013), and tested on independent data for 2014. To reduce the impact of noise in the data and to calculate a stable NN Jacobian for sensitivity studies, an ensemble of NNs with different weights is constructed and compared with a single NN. The impact of the NN training period on the NN's generalization ability is evaluated. The NN technique provides an accurate and computationally cheap method for filling in gaps in satellite ocean color observation fields and time series.


Introduction
The number of successful neural network (NN) applications in satellite remote sensing, meteorology, and oceanography has steadily increased over the last two decades [1,2]. In these fields, NNs have been applied to address such problems as classification, feature extraction and tracking, pattern recognition, change detection, forward and inverse problems, and so forth. NNs also have been used to fill data gaps in measurement time series [3][4][5], while Peres et al. [6] used NNs to extend observation records.
NNs have also been applied in satellite remote sensing of ocean color (OC) (see references below). The "color" of the ocean is determined by the interactions of incident light with substances or particles present in the water because suspended particles will increase the scattering of light. In coastal areas, material in river runoff, resuspension of bottom material (sand, silt) by tides, waves, and storms, as well as biologically active components in the water column can change the color of near-shore waters. These components contain substances that absorb certain wavelengths of light, altering the optical signature. For example, microscopic marine algae (phytoplankton) have the capacity to absorb light in the blue and red regions of the spectrum due to the chlorophyll which enables photosynthesis. The underlying principle for remote sensing of ocean color (OC) is water with higher concentrations of phytoplankton (chlorophyll) is greener, while water with lower concentrations of phytoplankton is bluer [7]. Ocean color data is a vital resource for operational forecasting, oceanographic research, and earth sciences, along with a wide variety of related applications [8].
Satellite-derived OC fields are essential for numerical prediction applications, enabling numerical models to address a biophysical feedback process that is particularly important to coupled ocean-atmosphere modeling. As a new capability, integrating/assimilating near-real-time OC data into numerical ocean modeling improves forecast accuracy; consequently, a robust method needs to be developed for filling data gaps and providing the projected OC values needed to run the model into the future for predictions. The assimilation of OC data also drives/constrains the modeling of physical-biogeochemical processes that are the foundation 2 Computational Intelligence and Neuroscience for ecological forecasting; however, such models require continuous (gap-free) satellite ocean color fields for development, initialization, and data assimilation [8]. Thus, this work contributes a critical foundation for physical and biogeochemical modeling by providing continuous ocean color data and projected values. For the past three years, NOAA has been the developing capability to use near-real-time Visible Infrared Imaging Radiometer Suite (VIIRS) and other OC fields for its operational ocean [9,10] and coupled seasonal forecast systems [11].
Multiple NN applications have been developed to solve forward and inverse problems in satellite ocean color remote sensing [12, and references there]. NNs also have been applied to merging OC information from multiple satellite missions [13]. In this work, we developed a new NN approach, which allows gaps (spatial and temporal) in satellite derived OC fields to be filled using physically related, but independently derived, satellite and in situ observations which provide physical information about the state of the upper layer of the ocean.

Formulation of the Problem.
Chlorophyll-a (Chl-a) concentration, a biological proxy for the intensity of photosynthesis derivable from ocean color observations, is affected by processes in the upper layers of the ocean of various spatial and temporal scales. Physical parameters characterizing the state of the ocean surface and upper mixed layer-temperature, salinity, and density-define the active physical background for associated biological processes; thus, variability of the physical background is responsible for a significant portion of the variability of entrained biological parameters. Accordingly, we can consider the OC, (in this case, the single parameter Chl-a), as a function or mapping of a vector of the ocean surface and upper mixed-layer state variables, . This mapping can be symbolically written as where denotes the mapping, is the dimensionality of the input space, and is the dimensionality of the output space. This function/mapping is expected to be a complex nonlinear function because the variability of the physical parameters is transferred into the OC variability through a complex hierarchy of physical, chemical, and biological processes. Also, both the OC and ocean state data have finite spatial and temporal resolutions (provided on a grid with limited spatial resolution and averaged to daily temporal resolution); consequently, the physical and biological variability on scales finer than these resolutions appear as stochastic contributions to the OC, . Thus, the mapping between the OC, , and physical ocean variables, , is a complex, nonlinear stochastic mapping, The stochastic variable represents an uncertainty introduced into the OC, , due to unaccounted high-frequency small scale (subgrid) variability of physical, chemical, and biological processes. Also, all or a part of variables, constituting vectors and , are observations, which have different levels of noise. This noise also contributes into the stochastic variable . Assuming that stochastic part of the mapping is additive, representation (2) can be simplified, It is noteworthy that the uncertainty is an inherent informative part of the stochastic mapping, containing important statistical information about the mapping. Actually, the stochastic mapping is a family of mappings distributed with a distribution function. The range and shape of the distribution function are determined by the uncertainty vector .

NN Emulation for the OC Mapping.
Neural networks are very generic, accurate, and convenient mathematical models that emulate complicated nonlinear input/output relationships through statistical learning algorithms [14]. NNs can be applied to any problem that can be formulated as a mapping (input vector versus output vector dependence). The multilayer perceptron (MLP) with one hidden layer is a generic tool for approximating such mappings [2]. The simplest MLP NN analytical approximations use a family of functions like where and are components of the input and output vectors and , respectively, and and are NN weights. Here, the hyperbolic tangent is used as an activation function. Equation (4) is also a mapping, which can approximate any continuous or almost continuous (with final discontinuities) mapping [2,15]. Symbolically, it can be represented as = NN( ).
To train the NN that is emulating the mapping (1), an error function, , is created, and minimized to find an optimal set of coefficients and . However, for stochastic mapping (3), the training criterion should be modified as A single NN does not provide an adequate emulation/approximation of the stochastic mapping (3); therefore, an ensemble of NNs should be trained using criterion (6) [2]. If each NN member of the ensemble satisfies condition (6), this ensemble provides an adequate approximation for the stochastic mapping (3). Thus, to effectively account for subgrid scale effects and to reduce the impact of noise in NN simulated data (e.g., the Chl-a concentration), an ensemble of NNs was trained using criterion (6) and the average of this ensemble was used as the simulated OC, . In producing the ensemble of NNs to start the NN training, a slightly different initialization of NN weights, and , was chosen for each NN ensemble member; thus, different NN ensemble members correspond to different local minima of the error function (5), all satisfying condition (6). This simplest approach was selected because the data have a significant level of uncertainty/noise. The magnitude of the uncertainty estimated in Section 4.1.2 ( Table 2) shows that the basic ensemble approach allows us to obtain an approximation error of magnitude close to the magnitude of uncertainty in the data. In our opinion, this result shows that, consistent with the parsimony principle, the use of more sophisticated approaches is not justified in this case.
The ensemble was also used to improve the stability of the NN Jacobian [16], which is used below for a sensitivity study. The Jacobians of each th NN ensemble member ( × matrix of the first derivatives of the NN outputs over the input), were calculated and then averaged to calculate the mean Jacobian used for the sensitivity study below. Formally speaking, the Jacobian of the MLP NN (4) can be easily calculated using direct differentiation, However, calculating the derivative of any statistical model (including NN) is an ill-posed problem [2] which should be regularized. As shown by Krasnopolsky [16], the problem can be solved using an NN ensemble and calculating the Jacobian as an average of Jacobians of the NN ensemble members. This approach is used in this effort.

Selecting NN Inputs and Outputs.
Selecting the emulating NN architecture includes selecting NN inputs, NN outputs, and the number of hidden neurons, . For this study, we selected one output-chlorophyll-a concentration. The vector of inputs, , was composed of two parts = { ⃗ , ⃗ }, where ⃗ is a vector of physical parameters, which includes satellite sea-surface elevation (SSH), sea-surface salinity (SSS), and sea-surface temperature (SST) and in situ Argo salinity (sal) and temperature (temp) vertical profiles. It can be expressed as and vector ⃗ is a vector of auxiliary or meta variables or tracers configured as ⃗ = {yr, sin ( ) , cos ( ) , sin (lon) , cos (lon) , sin (lat)} , where yr is the year, = 2 ( /365) ( equals the day of the year), and lon and lat are, respectively, longitude and latitude in radians.
Metadata are included in the input vector, , to permit training a single NN (or single ensemble of NNs) that, given the input for a particular location on the globe (lat and lon) at a particular moment in time (yr and ), provides output (Chl-a concentration) for the same location and time. This NN is trained using records { , } collected over several years at locations representing the entire globe. The trained NN (or NN ensemble), using the same weights, then is used for the entire globe for a long period subsequent to the training interval. Hence, each trained NN (a single NN or NN ensemble member) takes information from one grid point and produces a simulated value of OC (Chl-a) for that grid point at the corresponding time. The same single NN or NN ensemble then moves to the next grid point of the global grid, producing, in this way, a global field of OC (Chl-a). The results presented below are obtained with the input vector , comprising ⃗ (11), including all metadata variables, and ⃗ (10), including three surface variables plus variables representing seven upper layers of sal and temp from Argo profiles. Thus, in this study, the NNs emulating the OC mapping each have 23 inputs and 1 output. Table 1 lists these variables and their units, as well as the output parameter.
In the following assessment, an optimal set of inputs will be determined, specifically which parameters to include in (11) and the number of upper layers established from Argo profiles to include in (10). The accuracy of approximation and the correlation coefficient between NN-generated values and observed OC (Chl-a concentration) will be used as major indicators of the significance of the various inputs. The VIIRS chlorophyll-a global fields are composited daily and interpolated from the 9 km resolution provided by NASA [17] to a global 1 ∘ × 1 ∘ latitude/longitude resolution.

Data
Temperature and salinity profiles for the top 75 m of ocean water from Argo float data, a collaborative international partnership program for measuring upper ocean temperature, salinity, and currents in the Earth's oceans, were obtained from the International Pacific Research Center at Hawaii as gridded global Argo monthly mean data interpolated to daily values (1 ∘ ×1 ∘ latitude/longitude resolution) [18]. Daily global satellite SSH values [19] and daily global satellite SST, interpolated to 1 ∘ × 1 ∘ latitude/longitude grids, were acquired from NOAA [20]. NASA's Physical Oceanography Distributed Active Archive Center (PODAAC) provided SSS (bias adjusted, version 3, gridded) data from the Aquarius mission [21,22]. Since the SSS fields do not have global coverage daily, the fields were composited to obtain daily coverage on a 1 by 1-degree global grid.
The aforementioned satellite and in situ observations are well documented and available in near-real time; however, the Aquarius mission recently ended (June 2015) due to equipment failure. These satellite and in situ data are interpolated to the same global one-degree latitude-longitude grid and

The Accuracy of Approximation.
This study determines the "optimal" architecture for the emulating NN using an early stopping method, evaluating the level of uncertainty in the data, and comparing performances of single NN and the NN ensemble. For this assessment, 2012 and 2013 data are used for training and test set, with 2014 data used for validation.

Selecting the Number of Hidden Neurons.
To evaluate the "optimal" size of the hidden layer (the number, , of hidden neurons in (4)) supporting the previous selection of NN inputs and outputs, a set of ten NNs with 23 inputs and one output were trained, varying from 3 to 45. Figure 1 depicts the corresponding RMSE calculated for the independent test set. From Figure 1, = 30 is the "optimal" number of hidden neurons, because it provides the "best" approximation with where app is the approximation accuracy of the NN per se and is the uncertainty due to unaccounted fine scale processes, subgrid variability, and observation errors (Section 3.1). To better estimate the approximation accuracy of the NN simulating OC, we trained an NN with the same number of hidden Computational Intelligence and Neuroscience 5 neurons, = 30, and one output, Chl-a concentration; however, this NN had only one input, the same Chl-a concentration. In other words, we trained an NN that emulated the identity mapping. Table 2 row 1 shows the approximation statistics for this NN, noting an approximation error of order 0.002 mg/m 3 , which is a very small portion of the RMSE (12) when adding Chl-a, as an additional input to the previously selected 23 inputs ( Table 2, row 4), and comparing Table 2 rows 2 and 4, the RMSE and correlation coefficient between NN output and observed data do not significantly change. Thus, practically the entire RMSE, approximately 0.18 mg/m 3 (Table 2 row 2), can be attributed to uncertainty due to subgrid processes and observation noise in the 23 inputs. Employing an NN ensemble does not materially change the estimate ( Table 2 row 3).
It is clear from physical considerations that the level of observation noise (errors) and subgrid uncertainty are higher at higher values of Chl-a variability ( Figure 2). Higher levels of noise occur in coastal areas due to local subgrid processes. With higher Chl-a concentrations in coastal regimes, satellite observation errors are often higher and the accuracy of the retrieval algorithm is lower at higher levels of Chl-a, because there are few of in situ observations available for the algorithm development.

Bias and RMSE.
Of the data sets examined here, less than one percent of the grid points have a Chl-a concentration greater than 1.0 mg/m 3 ( Figure 2); thus, there is insufficient data for training NN, precluding NN Chl-a estimates from achieving adequate accuracy at higher concentrations of Chla. Accordingly, we train our NNs employing the full data set; however, results are presented for both the full OC interval and, when it is appropriate, the case where data points with Chl-a concentrations greater than 1.0 mg/m 3 (about 0.2% of data) are removed. Table 2 rows 5 through 7 show the error statistics and correlation coefficients for NN cases presented in Table 2 rows 2 through 4, but with data records having Chl-a concentrations greater than 1.0 mg/m 3 removed. For Chl-a concentration less than or equal to 1.0 mg/m 3 , the data uncertainty is significantly smaller (order 0.11 mg/m 3 versus 0.18 mg/m 3 ) and the correlation between the NN simulated OC and observed OC is significantly higher (0.72 versus 0.67).  The NN ensemble (Table 2 row 6) provides a significant RMSE and correlation coefficient improvement over those of the single NN (Table 2 row 5). Figure 3 compares NN simulated versus observed Chl-a concentrations for Chl-a concentrations less than or equal to 1.0 mg/m 3 . Figure 4 shows binned dependence of NN error (bias) on the value of Chl-a. In both figures, the standard deviation bars for each bin reflect the level of noise in the data. For Chl-a concentrations less than 0.5 mg/m 3 , the NN values have a small negative bias, while, for Chl-a concentrations above 0.5 mg/m 3 , the NN values have a small positive bias;  however, the magnitudes of these biases are on order of the level of the data uncertainty.

Performance of NN Ensemble.
Finally, an ensemble consisting of six NNs ensemble members, with all six ensemble members having the same architecture (23 inputs : 30 hidden neurons : 1 output), was trained on the full NN training set using different initial values for NN weights, and , (4). Thus, different NN ensemble members correspond to different local minima of the error function (5). Table 3 displays the ensemble member and the ensemble average performances for Chl-a concentrations less than or equal to 1.0 mg/m 3 , showing that the ensemble mean has higher correlation between the NN output and the VIIRS observations and lower RMSE than any of the individual ensemble members. The ensemble mean clearly outperforms all of the individual ensemble members, suggesting that random noise may be contaminating the input and/or observation streams.

Prediction Accuracy.
Trained NNs and an NN ensemble have been applied to the data sets for 2012-2014 to produce Chl-a fields and to calculate statistics for validating the NN fields against observed VIIRS Chl-a observations. In Figure 5 an individual NN (dashed line) and from the NN ensemble (solid line) are presented. Also presented in Figure 5 are the corresponding time series when using only records with Chl-a concentrations less than or equal to 1.0 mg/m 3 (red). Figure 5 supports our conclusion that the major contributor of noise in the VIIRS Chl-a data is the small amount of data with Chl-a concentrations greater than 1.0 mg/m 3 . These high concentration points, about 0.2% of the data set, create a problem for retrieval algorithm development and make training the NN for Chl-a concentrations greater than 1.0 mg/m 3 more difficult.
The mean biases ( Figure 6) are very small, of order of the estimate for the level of noise in the data ( Table 2 row 7).  Figure 7 indicates that, when compared to results from an individual NN, the ensemble NN mean has lower bias in many regions of the global oceans, especially in the southwestern Indian Ocean, the North Pacific basin, and broadly in the Atlantic Ocean. The results for the tropical Pacific Ocean are mixed, with greater bias along the Equator and reduced biases throughout the higher latitudes of the Tropics. Figure 8 shows the mean correlation (CC) is relatively stable and high throughout the validation period (∼0.8), which is reassuring. The CC is lower and more variable for the cases where all data points are retained, suggesting that a few data points with Chl-a greater than 1.0 mg/m 3 are responsible for notably degrading NN performance. Figures 5-8 indicate that an NN ensemble generally outperforms a single NN. These figures also indicate good generalization skill from NNs trained on two years of data.

Evaluation of NN Generalization Skill.
To better evaluate generalization skill and the smallest training set required for accurate prediction of Chl-a fields, the second partition of data (Section 3.2), with only one year (2012) of training data, was used. Figure 9 shows the correlation coefficient between the single NN-generated chlorophyll-a concentration fields and global VIIRS chlorophyll-a retrieval fields for 2012-2014 for two different single NNs: the first (red curve) corresponds to NN trained with two years of data (2012-2013) and second (black curve) to NN trained with one year of data (2012). As Chl-a correlation coefficient  Figure 9 demonstrates the NN trained on one year of data very quickly loses its generalization skill beyond the training period (2012). The NN trained on two years of data preserves its generalization skill, generating data during the validation period (2014) of similar quality (in terms of correlation with observed data) to that produced during the training period (2012-2013). Similar depictions were obtained for bias and RMSE. Thus, we conclude that using a training data set of two years duration is sufficient to predict global OC fields for at least one year following the training period. These results demonstrate very good generalization skill by the NN, in terms of both spatial and temporal generalization.

Preliminary Evaluation of NN Sensitivity.
Using the NN Jacobian ( (8) and (9)) for evaluating the relative contributions (importance) of the variables comprising the input vector, , to the output vector, , the NN Jacobian for each NN LAT Figure 10: Linearized estimates (NN Jacobian Equation (9)) of the input relative contributions to the NN output, chlorophyll-a concentration. Table 1 input: (1) year, (2) day (sine component), ensemble member and then the mean Jacobian are calculated. Figure 10 depicts the absolute value of the Jacobian vector components, giving an estimate of the significance of each NN input. Figure 10 demonstrates that the most important input is the SST, followed by the Argo surface and subsurface salinity fields. The Argo surface temperature observations and satellite sea-surface salinity are less important than the aforementioned ones, perhaps because the satellite SST and Argo surface salinity observations already capture some of the variability of the other measurements.

Discussion and Conclusions
This work introduces a new NN application, using NNs to relate the biological parameter, Chl-a concentration, to physical processes of the upper ocean. This new NN maps satellite-derived surface parameters, for example, sea-surface temperature (SST), sea-surface height (SSH), and sea-surface salinity (SSS) fields, along with some in situ observations (upper layers of Argo salinity and temperature profiles), to Chl-a concentration. In other words, an NN empirical biological model for Chl-a is introduced in this paper. Ocean color (chlorophyll-a concentration) fields from NOAA's operational Visible Imaging Infrared Radiometer Suite (VIIRS) as well as NOAA SSH and SST fields and NASA Aquarius mission SSS fields were used. Observational data for 2012-2014 were spatially and temporally averaged and fitted to a 1 ∘ × 1 ∘ latitude/longitude grid for NN training, testing, and validation. Results were assessed using the mean error (bias), rootmean-square error (RMSE), and the correlations between observed and NN-generated chlorophyll-a concentrations. An ensemble of NNs with different weights was developed to reduce the impact of the noise in the data, as well as for calculating the relative significance of contributing inputs. Coarse spatial and temporal resolutions of the data limit the features resolved by the NN-generated OC fields. Global and mesoscale features are represented sufficiently well in the NN OC fields; however, resolving finer-scale features requires the NN to be trained with finer-resolution data. This study demonstrates that employing an NN technique can provide an accurate, computationally cheap method for filling gaps in satellite observation fields and time series. It is noteworthy that an individual NN (or a single NN ensemble) is capable of generating OC fields for all global grid points, although an NN ensemble produces better results. This study demonstrated that training with at least two years of data is needed for sufficient skill to ensure that the accuracy of the NN prediction does not significantly degrade during the one-year validation period. These results demonstrate very good NN skill in terms of both spatial and temporal generalization. The NN approach successfully eliminates the systematic component of noise (bias). Employing an NN ensemble reduces the random component of the noise. When the small percentage of noisy data (less than one percent) is removed, the ensemble mean outperforms each of the ensemble members.
The mean Jacobian was used to evaluate the relative significance of the NN inputs, revealing that the daily SST is the most important input, closely followed by Argo monthly subsurface salinity profiles. The Argo monthly temperature subsurface signal moderately contributes to NN performance. We are planning to improve NN skills by (1) optimizing NN inputs, (2) retraining NN with accumulated new data, (3) introducing additional information (new input variables), and (4) employing higher resolutions for both the inputs and outputs. This NN system will be used to create consistent chlorophyll-a concentration time series across various ocean color satellite missions. The NN approach developed in this study and applied to filling ocean color satellite measurement gaps is a generic approach that can be applied to fill gaps in Computational Intelligence and Neuroscience 9 other satellite measurements. It is anticipated that this NN system will also be applied to other ocean color parameters important to numerical modeling, for example, the diffuse attenuation coefficient for photosynthetically active radiation (Kd PAR ).