Adjusting Neural Network to a Particular Problem: Neural Network-Based Empirical Biological Model for Chlorophyll Concentration in the Upper Ocean

The versatility of the neural network (NN) technique allows it to be successfully applied in many fields of science and to a great variety of problems. For each problem or class of problems, a generic NN technique (e.g., multilayer perceptron (MLP)) usually requires some adjustments, which often are crucial for the development of a successful application. In this paper, we introduce a NN application that demonstrates the importance of such adjustments; moreover, in this case, the adjustments applied to a generic NN technique may be successfully used in many other NN applications. We introduce a NN technique that links chlorophyll “a” (chl-a) variability, primarily driven by biological processes, with the physical processes of the upper ocean using a NN-based empirical biological model for chl-a. In this study, satellite-derived surface parameter fields, sea-surface temperature (SST) and sea-surface height (SSH), as well as gridded salinity and temperature profiles from 0 to 75 m depth are employed as signatures of upper-ocean dynamics. Chlorophyll-a fields from NOAA’s operational Visible Infrared Imaging Radiometer Suite (VIIRS) are used, as well as Moderate Resolution Imaging Spectroradiometer (MODIS) and Sea-Viewing Wide Field-of-View Sensor (SeaWiFS) chl-a concentrations. Different methods of optimizing the NN technique are investigated. Results are assessed using the root-mean-square error (RMSE) metric and cross-correlations between observed ocean color (OC) fields and NN output. To reduce the impact of noise in the data and to obtain a stable computation of the NN Jacobian, an ensemble of NNs with different weights is constructed. This study demonstrates that the NN technique provides an accurate, computationally cheap method to generate long (up to 10 years) time series of consistent chl-a concentration that are in good agreement with chl-a data observed by different satellite sensors during the relevant period.
The presented NN demonstrates a very good ability to generalize in terms of both space and time. Consequently, the NN-based empirical biological model for chl-a can be used in oceanic models, coupled climate prediction systems, and data assimilation systems to dynamically consider biological processes in the upper ocean.


1. Introduction
The neural network (NN) technique is a generic machine-learning technique. The versatility of this technique allows it to be successfully applied in many fields of science and to a great variety of problems. Of course, for each problem or class of problems, the generic NN technique (e.g., multilayer perceptron (MLP)) usually requires some adjustments, which often are crucial for the development of a successful application. In this paper, we introduce a NN application that demonstrates the importance of such adjustments; moreover, in this case, adjustments applied to the generic NN technique may be successfully used in many other NN applications.
In this work, we developed a NN-based empirical Biological Model for Chlorophyll Concentration (BMChC) in the upper ocean. Such a model is needed for the assimilation of ocean color (OC) data into ocean models. Operational integration/assimilation of OC fields into ocean models has a significant positive impact on the predictive skill of climate models [1-5]. For successful assimilation, the OC data must satisfy three fundamental requirements: first, spatial and temporal gaps in the observations need to be filled; second, data assimilation must be for a predicted parameter (prognostic variable) or a parameter explicitly related to a prognostic variable; and third, the data being assimilated must have a long data record to facilitate compilation of a robust statistical database spanning multiple seasons.
In our previous studies [6,7], we addressed the following: (1) spatial and temporal gap filling and (2) employing a NN to relate chlorophyll a (chl-a) to predicted parameters, respectively. A new approach, based on a neural network (NN) technique, was developed that allows filling large spatial (up to global-size) and temporal (up to year-long) gaps in ocean color (chl-a) fields produced by the Joint Polar Satellite System (JPSS) Visible Infrared Imaging Radiometer Suite (VIIRS) instrument. The developed NN also provides linkages between the signature of a biological process, that is, satellite-derived OC fields, and signatures of upper-ocean physical processes, that is, prognostic variables. Thus, the NN can be used as an empirical biological model to relate chl-a to ocean modeling prognostic variables in data assimilation systems.
In our previous study, we applied the MLP NN in a rather straightforward manner. In Section 2 of this study, we examine different approaches to optimizing that NN technique. We evaluate the performance of the NN in emulating chl-a concentration and the NN's ability to generate a long, consistent time series of chl-a concentrations, examining the impact of (i) extending the training set, (ii) optimizing NN inputs, (iii) optimizing outputs (or the error function), and (iv) adding outputs that correlate with the primary output. In Section 3, we discuss results and present conclusions.

2. Optimization of NN Performance
In this study, as in our previous studies [6-8], we use a NN multilayer perceptron,
$$y_q = a_{q0} + \sum_{j=1}^{k} a_{qj} \tanh\left(b_{j0} + \sum_{i=1}^{n} b_{ji} x_i\right), \quad q = 1, 2, \ldots, m, \tag{1}$$
where $x_i$ and $y_q$ are components of the NN input and output vectors X and Y, respectively, a and b are fitting parameters (NN weights), k is the number of hidden neurons, and n and m are the numbers of inputs and outputs, respectively. NN weights a and b are "learned" from data (training set, Section 2.1) in the process of training [8,9]. The NN training is a time-consuming process of nonlinear optimization; however, it usually needs to be performed only once for an application. A trained NN provides very fast, accurate, and robust solutions.
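A minimal sketch of the forward pass of such an MLP may help fix the notation. It assumes the standard one-hidden-layer form in which each output is a bias plus a weighted sum of tanh hidden neurons; the function name, the weight layout (bias stored as the first element of each row), and the toy weights are illustrative assumptions, not the authors' code.

```python
import math

def mlp_forward(x, a, b):
    """Evaluate a one-hidden-layer MLP with tanh hidden neurons.

    x : list of n inputs
    b : k rows, each [b_j0, b_j1, ..., b_jn] (hidden bias + input weights)
    a : m rows, each [a_q0, a_q1, ..., a_qk] (output bias + hidden weights)
    """
    hidden = [math.tanh(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
              for row in b]
    return [row[0] + sum(w * h for w, h in zip(row[1:], hidden))
            for row in a]

# Toy network (hypothetical weights): n = 2 inputs, k = 2 hidden neurons, m = 1 output.
b = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
a = [[0.5, 1.0, -1.0]]
y = mlp_forward([0.0, 0.0], a, b)
print(y)
```

Once the weights are fixed, evaluating the network is a handful of multiply-adds and tanh calls per record, which is why a trained NN is cheap to apply even though training is expensive.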
Several options are available for optimizing NN performance:
(I) The NN training set can be extended to expose the NN to a greater variety of input and output patterns. An independent validation set can be extended to better understand the limitations of a trained NN and improve the NN performance during the next NN training. Also, for a given parameter, the validation data set can be enriched with data measured by different instruments/sensors to evaluate cross-sensor consistency of the data simulated by the NN.
(II) The set of NN inputs can be optimized by removing inputs that do not significantly contribute to the result. The importance of an input can be determined by a study (e.g., [6]) that evaluates the sensitivity of outputs to different inputs.
(III) A transformation of the output variable can be performed to reconcile the statistical properties of the outputs with the error function that is used for training.
(IV) Additional outputs, which correlate with primary outputs that have already been used and have similar or lower levels of noise, can be included to improve the accuracy of the primary outputs.
All of the aforementioned options are investigated in this study.

2.1. Extending Training and Enriching Validation Sets.
Figure 1 shows the satellite (SST, SSH, and sea-surface salinity (SSS)) and in situ (ARGO; [10]) data available for NN training and validation in this study. The daily VIIRS data on a 1° by 1° latitude/longitude global grid are available during the three-year period 2012 through 2014. In our previous study [6], the first two years (2012 and 2013, 730 days) of daily data (approximately 20,000,000 grid points/records) were used for NN training and testing. The data for 2014 (365 days) were left for validating the trained NNs and estimating their prediction (generalization) capability. It was shown that two years of training data are sufficient for accurately forecasting chl-a concentration out to at least one year.
In this study, we use all three years of daily VIIRS data for training and testing. For validation, the data from two other satellite sensors, MODIS (ten years, 2005-2014) and SeaWiFS (six years, 2005-2010), were used. The time period starting from 2005 was selected because the in situ ARGO profiles, the most significant NN inputs for simulating chl-a concentration [7], begin in 2005. We used NOAA satellite sea-surface height (SSH; [11]) and sea-surface temperature (SST; [12]) fields, while the Aquarius SSS data [13], which were used in our previous work, were not used in this study. As we showed in [6], the NN for chl-a concentrations is not sensitive to the Aquarius SSS data. For the NN training, the SSH/SST fields were spatially and temporally averaged to daily averages on a 1° × 1° grid to match the chl-a data. Results are assessed using the mean error (bias), root-mean-square error (RMSE), and cross-correlations (CC) between observed and NN-generated chl-a. To reduce the impact of noise in the data and to calculate the NN Jacobian for sensitivity studies, an ensemble of NNs with different weights was developed.
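The ensemble Jacobian mentioned above can be sketched as follows. This assumes a one-hidden-layer tanh MLP whose Jacobian is computed analytically and then averaged over ensemble members; the averaging step, the function names, and the two toy members are assumptions made for illustration, since the text does not spell out how the ensemble Jacobian is formed.

```python
import math

def mlp_jacobian(x, a, b):
    """Analytic Jacobian dy_q/dx_i of a one-hidden-layer tanh MLP.

    b : k rows [b_j0, b_j1, ..., b_jn]; a : m rows [a_q0, a_q1, ..., a_qk].
    """
    slopes = []
    for row in b:
        s = row[0] + sum(w * xi for w, xi in zip(row[1:], x))
        slopes.append(1.0 - math.tanh(s) ** 2)  # derivative of tanh(s)
    return [[sum(out[j + 1] * slopes[j] * b[j][i + 1] for j in range(len(b)))
             for i in range(len(x))]
            for out in a]

def ensemble_mean_jacobian(x, members):
    """Average the Jacobian over ensemble members, each given as (a, b)."""
    jacs = [mlp_jacobian(x, a, b) for a, b in members]
    rows, cols = len(jacs[0]), len(jacs[0][0])
    return [[sum(j[q][i] for j in jacs) / len(jacs) for i in range(cols)]
            for q in range(rows)]

# Two hypothetical members: 2 inputs, 1 hidden neuron, 1 output.
members = [
    ([[0.0, 1.0]], [[0.0, 1.0, 0.0]]),  # y = tanh(x1)
    ([[0.0, 3.0]], [[0.0, 1.0, 0.0]]),  # y = 3 * tanh(x1)
]
J = ensemble_mean_jacobian([0.0, 0.0], members)
print(J)
```

In this toy case neither member responds to the second input, so its column of the averaged Jacobian is zero. A column that stays near zero across the data is the kind of signal that would flag an input as a candidate for removal.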
To assess the use of SeaWiFS and MODIS data for validating NNs trained on VIIRS data, and to evaluate the differences between the sensors, we compared the chl-a data from the three sensors. Figure 2 demonstrates a high level of correlation (greater than 0.9) between VIIRS and MODIS data. The correlation between MODIS and SeaWiFS is of similar magnitude everywhere, except for several months in 2008 when SeaWiFS had problems (approximately months 38-50). Figures 3 and 4 provide estimates of the RMS and mean differences between the sensors that will be used for comparison. The comparison shows that the sensors are close enough, except for the period with SeaWiFS problems, and thus can be used for validating the NN ensembles, taking into account the aforementioned estimates of their differences.
Two NN ensembles, each consisting of six members, were trained: one using two years of daily VIIRS data and the other using three years of data. All ensemble members have the same architecture: 23 inputs (Table 1), 30 hidden neurons in one hidden layer, and one output.
The daily VIIRS and NN data were averaged to get monthly data for comparison with monthly MODIS and SeaWiFS data. Figure 6 depicts the correlations between the monthly mean chl-a concentrations simulated by the NN ensemble and the VIIRS, MODIS, and SeaWiFS observations. Figures 7 and 8 show the RMSE and bias of the monthly mean chl-a concentrations simulated by the NN ensemble versus the VIIRS, MODIS, and SeaWiFS data. These comparisons demonstrate that the ensemble of NNs trained on three years of daily VIIRS data performs satisfactorily, producing chl-a estimates that are in good agreement with the chl-a concentrations observed by three different satellite sensors within a ten-year period.
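The daily-to-monthly averaging step can be sketched for a single grid point as below. The data layout (a list of date/value pairs), the treatment of missing days as None, and the function name are assumptions for illustration; the values are made up and are not satellite data.

```python
from collections import defaultdict
from datetime import date

def monthly_means(daily):
    """Average (date, value) pairs into {(year, month): mean}, skipping gaps (None)."""
    sums = defaultdict(lambda: [0.0, 0])
    for day, value in daily:
        if value is None:
            continue  # assumed handling: a missing day does not enter the mean
        acc = sums[(day.year, day.month)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

# Hypothetical daily chl-a values at one grid point.
daily = [
    (date(2014, 1, 1), 0.10),
    (date(2014, 1, 2), None),
    (date(2014, 1, 3), 0.30),
    (date(2014, 2, 1), 0.25),
]
monthly = monthly_means(daily)
print(monthly)
```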
The RMSE and biases between the NN-simulated data and the three satellite observation data sets are similarly small and, most importantly, they do not increase with temporal distance from the training period. The correlation is high; however, it slowly decreases with increasing distance from the training period.
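For reference, the three statistics used in these comparisons can be computed as in the sketch below. It assumes that the cross-correlation is the Pearson correlation coefficient at zero lag and that bias is defined as NN output minus observation; both conventions, the function names, and the sample values are assumptions.

```python
import math

def bias(pred, obs):
    """Mean error: average of (prediction - observation)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def correlation(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

# Made-up monthly means: predictions are the observations shifted by +0.05.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.15, 0.25, 0.35, 0.45]
print(bias(pred, obs), rmse(pred, obs), correlation(pred, obs))
```

A constant offset, as in this toy case, shows up in the bias and RMSE but leaves the correlation at one, which is why the three statistics are reported together.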
2.2. Optimizing Inputs and Outputs. The inputs of the NN used in this study are slightly different from those in our previous studies [6,7] in that the surface salinity data have been removed from the NN input vector and ocean depth has been added to better resolve chl-a variations with changes in the ocean depth (from top to bottom). The final selection of the NN inputs is presented in Table 1, which shows all NN inputs and outputs. The second output was added for one experiment (Section 2.4).
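2.3. Optimizing the Error Function. In our studies, the mean square differences between the training data $\{Y_i\}_{i=1,\ldots,N}$ and NN outputs are used as the error function,
$$E = \frac{1}{N} \sum_{i=1}^{N} \left[Y_i - Y_{NN}(X_i)\right]^2, \tag{2}$$
where N is the number of records in the training set and $Y_{NN}(X_i)$ is the NN output for the input vector $X_i$. This function is minimized in the process of the NN training.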
The minimization of this error function is equivalent to the basic statistical maximum likelihood principle only if the probability density function (PDF) of the outputs is normal [8]. This condition means that, if the output PDF is not normal, the error function does not deliver the optimal parameters for the trained NN; consequently, that NN does not provide the optimal (best) approximation for the training set. Therefore, to achieve the best results, the goal should be to make the PDF as close to normal as possible. Figure 9 depicts two PDFs: one of chl-a (left, solid line) and one of the natural logarithm (ln) of chl-a (right, solid line). The chl-a PDF is far from normal (the normal distribution is shown by the dashed line): with a very long tail, this distribution is closer to log-normal. The right panel shows the PDF of ln(chl-a), which is almost normal. Thus, if ln(Y_i) is used in (2) rather than Y_i, the error function (2) becomes nearly optimal and the NN, which now generates ln(chl-a), becomes almost the best approximation for the data. The NN with logarithms of chl-a and Kd(PAR) (Section 2.4) as outputs is called LN-NN.
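The effect of the logarithmic transformation can be illustrated with synthetic data. The sketch below draws log-normal samples as a stand-in for chl-a (these are not real observations, and the distribution parameters are arbitrary assumptions) and compares the skewness before and after taking the logarithm; a skewness near zero is one simple indicator of a near-normal shape.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment divided by variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
# Synthetic stand-in for chl-a: log-normal samples with arbitrary parameters.
chl = [rng.lognormvariate(-1.5, 1.0) for _ in range(20000)]
ln_chl = [math.log(v) for v in chl]

print("skewness of chl-a:    ", skewness(chl))
print("skewness of ln(chl-a):", skewness(ln_chl))

def to_concentration(ln_value):
    """Map a NN output in ln units back to a concentration."""
    return math.exp(ln_value)
```

With this choice the NN is trained on ln(chl-a), and the exponential is applied afterwards to recover concentrations.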
The following illustrates the advantages of employing NNs that produce the logarithm of chl-a as the output. An ensemble of these NNs (LN-NNs) was trained using three years of VIIRS data (2012-2014). The trained NN ensemble was applied to the independent test sets and monthly means were calculated. Correlation (Figure 10), RMSE (Figure 11), and bias (Figure 12) statistics are compared for two NN ensembles: (1) an ensemble of NNs having chl-a as the output and (2) an ensemble of LN-NNs having ln(chl-a) as the output. For LN-NN, correlations between the NN-generated and observed data are significantly higher, and the correlation decreases less with temporal distance from the training period (Figure 10). RMSE (Figure 11) and biases (Figure 12) are notably reduced when using LN-NN versus NN. The amplitude of the annual cycle signal is significantly decreased in all the aforementioned statistics when using LN-NN.
Figures 13 and 14 show the global spatial maps of cross-correlations and RMSE, respectively, for LN-NN (a and c) and NN (b and d).
The NN has difficulty in certain regions (Figure 13), because strong spatial gradients and high temporal variability in satellite chl-a values are not adequately sampled by the coarse-resolution inputs (SST, T, and S). Cross-correlation values with respect to VIIRS (Figure 13(a)) and with respect to SeaWiFS (Figure 13(c)) are considerably higher in most regions of the global oceans for LN-NN than for NN (Figures 13(b) and 13(d), respectively). Figure 14 shows that, in the equatorial and tropical oceans, the spatial pattern of the LN-NN RMSE with respect to VIIRS observations (Figure 14(a)) is quite similar to that with respect to SeaWiFS (Figure 14(c)); however, in areas closer to the polar oceans, the RMSE with respect to SeaWiFS observations is larger. Also, the RMSE of LN-NN with respect to VIIRS and SeaWiFS (Figures 14(a) and 14(c)) is lower than the corresponding values for NN (Figures 14(b) and 14(d)) in all regions of the global oceans.
2.4. Supplemental Outputs. In many cases, adding a NN output that is physically related to or statistically correlated with the major output improves the accuracy of the major output, especially when the additional output is more accurate. For this case, satellite OC data include the diffuse attenuation coefficient for photosynthetically active radiation, Kd(PAR), which is derived directly from parameters measured by satellite sensors. This parameter is more accurate than chl-a because chl-a is derived with additional assumptions about the relationships between the radiative parameters and the biological parameter, chl-a. The NN can take advantage of the additional output because both outputs (i) are different linear combinations of the same hidden neurons (basis functions tanh, (1)), (ii) contribute to the same error function, and (iii) are optimized simultaneously when the error function is minimized during training. Fewer neurons are required to approximate a set of correlated outputs than to approximate each of those outputs with a separate NN [8]. Thus, in addition to improving the accuracy of chl-a, including the second output enabled producing global fields of the second parameter, Kd(PAR), using a NN with the same number of inputs and hidden neurons.
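The three properties listed above can be made concrete with a small sketch: two outputs read from one shared tanh hidden layer, and a single error function that sums the squared errors of both outputs. The function names, weight layout, and toy numbers are illustrative assumptions, and no training loop is shown.

```python
import math

def hidden_layer(x, b):
    """Shared tanh hidden neurons; b rows are [bias, w_1, ..., w_n]."""
    return [math.tanh(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
            for row in b]

def outputs(hidden, a):
    """Each output is its own linear combination of the same hidden neurons."""
    return [row[0] + sum(w * h for w, h in zip(row[1:], hidden)) for row in a]

def joint_mse(samples, a, b):
    """Squared error summed over all outputs, averaged over samples."""
    total = 0.0
    for x, targets in samples:
        y = outputs(hidden_layer(x, b), a)
        total += sum((yq - tq) ** 2 for yq, tq in zip(y, targets))
    return total / len(samples)

# Hypothetical weights: 1 input, 2 shared hidden neurons, 2 outputs.
b = [[0.0, 1.0], [0.0, -0.5]]
a = [[0.0, 1.0, 0.0],   # output 1, standing in for ln(chl-a)
     [0.0, 0.5, 1.0]]   # output 2, standing in for ln(Kd(PAR))
samples = [([0.0], [0.0, 0.0]), ([0.0], [1.0, 0.0])]
print(joint_mse(samples, a, b))
```

Because the hidden-layer weights b enter the error of both outputs, minimizing this joint error lets the less noisy output constrain the shared hidden neurons that the noisier output also uses.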
To demonstrate the advantages of such an approach, we trained a third ensemble of NNs with 23 inputs, 30 hidden neurons, and two outputs (chl-a and Kd(PAR)). Table 2 shows the comparison of the three ensembles: (i) an ensemble of NNs with one output (chl-a), (ii) an ensemble of LN-NNs with one output (ln(chl-a)), and (iii) an ensemble of LN-NNs with two outputs, ln(chl-a) and ln(Kd(PAR)). All NNs in these ensembles have 23 inputs and 30 hidden neurons.
As the results presented in Table 2 demonstrate, including the second output improves the accuracy of the output of primary interest (chl-a). Also, the second output, Kd(PAR), is produced with even higher accuracy than chl-a.

3. Discussion and Conclusions
In our previous studies [6,7], we introduced a new approach, based on a NN technique, for relating a biological parameter, chl-a concentration, to the physical processes of the upper ocean.Our NN mapped satellite-derived surface parameters (sea-surface temperature (SST), sea-surface height (SSH), and sea-surface salinity (SSS) fields) and some in situ observations (upper layers of Argo float salinity and temperature profiles), to satellite-derived chl-a concentration.Concisely stated, we previously developed a NN-based empirical biological model for chl-a; however, that NN model had limited predictive skills.
Aiming at improving the predictive skill of the previously developed NN, this effort evaluated several optimization methods, developing (1) an empirical biological model for chl-a capable of long-term (several years) prediction of global chl-a fields and (2) a NN capable of simulating a long-term (up to 10 years, 2005 to 2014) global chl-a data set that is consistent with observations from three OC sensors (SeaWiFS, MODIS, and VIIRS). Results were assessed using the mean error (bias), root-mean-square error (RMSE), mean absolute error (MAE), and cross-correlation between observed and NN-generated chl-a concentrations.
The coarse spatial and temporal resolution of the data limited the types of features that can be resolved in the NN-generated chl-a fields.As shown, global and mesoscale features are represented reasonably well in the NN-estimated OC fields; however, to generate finer scale features, the NN needs to be trained on finer-resolution data.
This study demonstrates that the NN technique provides an accurate, computationally cheap method for generating long time series (up to 10 years long) of consistent chl-a concentration, which agree well with satellite chl-a observations   [14] for evaluating the bulk biophysical ocean-atmosphere feedback effects induced by chl-a variability that have been demonstrated in previous modeling studies (e.g., [15,16]). Our method accurately estimates the seasonal cycle and large-scale spatial patterns in satellite-derived chl-a fields, best reproducing chl-a variability in the major ocean gyres in the mid-latitudes. The largest errors are found in areas where the spatial scales of variability are small and the variability is large, for example, continental shelves, coastal regions, and marginal seas. In these regions, OC (chl-a, Kd(PAR)) variability is high and satellite-derived data have the highest levels of noise. Removing data points with chl-a concentrations greater than 1 mg/m³ (less than 1% of observations) prior to training the NN improves NN performance due to reduced input and output noise. Additionally, the quantity of data in such areas is very small and insufficient for adequate NN training. The NN approach successfully eliminates the systematic component of the noise (bias), while an NN ensemble approach reduces the random component of the noise.
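Two of the steps mentioned above, screening out high chl-a records before training and averaging ensemble members, can be sketched as follows. The record layout (a pair of inputs and a chl-a value), the function names, and the numbers are assumptions for illustration only.

```python
def filter_training_records(records, max_chl=1.0):
    """Keep (inputs, chl) records whose chl-a does not exceed max_chl (mg/m^3)."""
    return [(x, chl) for x, chl in records if chl <= max_chl]

def ensemble_mean(member_outputs):
    """Average the outputs of the ensemble members for one record."""
    return sum(member_outputs) / len(member_outputs)

# Made-up records: (input vector, chl-a in mg/m^3).
records = [([14.2, 0.31], 0.12), ([9.8, 0.05], 2.40), ([21.0, 0.44], 0.95)]
kept = filter_training_records(records)
print(len(kept), "of", len(records), "records kept")

# Made-up outputs of a six-member ensemble for a single record.
print(ensemble_mean([0.10, 0.12, 0.11, 0.09, 0.13, 0.11]))
```

Averaging members trained from different initial weights leaves any error they all share untouched but tends to cancel the part that differs from member to member, which is the random component referred to in the text.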
It would be very important, of course, to compare our NN-based model with traditional statistical approaches (e.g., EOF- or SVD-based approaches); however, it is difficult to perform such a comparison using published results (e.g., [17,18]) for multiple reasons. First, different time scales apply; this application works with daily satellite data and, most relevantly, important upper-ocean data. The aforementioned studies used monthly or even three-month averaged data. Next, in this application, daily chl-a concentration and Kd(PAR) are predicted by the NN, while the previous studies predicted monthly averages of the penetration depth of solar radiation (Hp), which is only related to chl-a concentration, or three-month averages of chl-a. Additionally, this application of a NN globally establishes the relationships between chl-a concentration, Kd(PAR), and major upper-ocean physics parameters (SST, SSH, temperature and salinity profiles, etc.). The previously referenced studies applied simplified statistical models in limited regions to establish the relationship only between (1) Hp and SST, or (2) chl-a, SST, and SSH. Finally, while the referenced statistical models "can reasonably well capture interannual Hp response to SST anomaly in association with ENSO," our NN model captures very well the significantly more complicated relationships between daily chl-a concentration, Kd(PAR), and upper-ocean physics over the entire globe. The NN, EOF, and SVD methods are similar in that statistical correlations are invoked to obtain the bulk biological activity feedbacks to the ocean physics. The NN has the advantages that it is robust, is easily extended to include additional oceanographic and atmospheric variables, and clearly identifies the impacts of the various inputs.

Figure 1: Satellite and in situ data available for NN training and validation.

Figure 5 shows the correlation of NN-simulated chl-a with MODIS data for the two previously identified NN ensembles. The figure demonstrates that three years of data provide sufficient information for training the NN ensemble, with the performance of that NN ensemble (three years of training) only slowly deteriorating during the seven-year validation period (2005-2011). The NN ensemble trained on two years of data does not demonstrate a sufficiently stable level of performance during the validation period. We compared the NN ensemble, trained using three years of daily VIIRS data, with MODIS and SeaWiFS observations.

Figure 5: Correlation of chl-a simulated by two NN ensembles with MODIS data. The black curve shows results for the NN ensemble trained on two years (2012 and 2013) and the pink curve for the NN ensemble trained on three years (2012 to 2014) of daily VIIRS data.

Figure 6: Correlation of monthly mean chl-a concentrations simulated by the NN ensemble with VIIRS data (black curve), MODIS data (pink curve), and SeaWiFS data (green curve).

Figure 8: Bias of monthly mean chl-a concentrations simulated by NN ensemble versus VIIRS data (black curve), MODIS data (pink curve), and SeaWiFS data (green curve).

Figure 9: Probability density functions (PDFs) of chlorophyll-a (solid line) and the corresponding normal distribution (dashed line) having the same mean value and standard deviation: chl-a (left) and ln(chl-a) (right).
) and 14(c)) are lower than the corresponding values for NN (Figures14(b) and 14(d)) in all regions of the global oceans.

Table 1: Neural network inputs and outputs.