Gap Filling of the CALYPSO HF Radar Sea Surface Current Data through Past Measurements and Satellite Wind Observations

High frequency (HF) radar installations are becoming essential components of operational real-time marine monitoring systems. The underlying technology is being further enhanced to fully exploit the potential of mapping sea surface currents and wave fields over wide areas with high spatial and temporal resolution, even in adverse meteo-marine conditions. Data applications are opening to many different sectors, reaching out beyond research and monitoring and targeting downstream services in support of key national and regional stakeholders. In the CALYPSO project, the HF radar system composed of CODAR SeaSonde stations installed in the Malta Channel specifically serves to assist in the response against marine oil spills and to support search and rescue at sea. One key drawback concerns the sporadic inconsistency in the spatial coverage of radar data, which is dictated by the sea state as well as by interference from unknown sources that may be competing with transmissions in the same frequency band. This work investigates the use of Machine Learning techniques to fill in missing data in a high resolution grid. Past radar data and wind vectors obtained from satellites are used to predict missing information and provide a more consistent dataset.


Introduction
The risk of oil from marine spillages beaching on shores, hitting important economic resources, and causing irreversible environmental damage is a very real menace in the Malta Channel and the stretch of sea between Malta and Sicily. In a small island state like Malta, where economic assets are concentrated in space, the damage would be even more devastating. Moreover, this region is situated along the main shipping lanes of the Mediterranean Sea.
Risks can be greatly reduced by using the best tools for surveillance and operational monitoring against pollution threats, together with a capacity to respond with informed decisions in case of emergency. In the CALYPSO project, top-end technology consisting of an array of HF radars was installed to monitor meteo-marine surface conditions in the Malta Channel in real time. The collected measurements continuously provide accurate information to monitor and respond effectively to threats from oil spills. Observed sea surface currents can be coupled with Lagrangian particle models to compute the hindcast trajectory of any detected spill. If such information is coupled with historic data from Vehicle Tracking Systems (VTS), marine vessels intersecting the predicted spill movement can be identified and the source of the pollution may be determined. Moreover, data from the HF radar provides an avenue for a wider range of applications including search and rescue and safer navigation.
While the 13.5 MHz radar frequency used in CALYPSO provides good spatial coverage and resolution over the required domain, considerable interference with the radar signals, noted in the area especially in the early afternoon, results in a significant loss of spatial coverage. In this work, an intelligent gap filling technique that makes use of past sea currents as well as wind measurements recorded by satellite is proposed. A mesh of neural networks is trained, one at each grid cell, to model the circulation patterns from recent observations. The zonal (U) and meridional (V) components of the sea currents are treated separately. The system is trained and tested on a dataset generated over 25 months with a temporal frequency of one hour.

International Journal of Navigation and Observation
McCulloch and Pitts introduced neural networks to the field of artificial intelligence around 1943, when they modelled the switching activity of neurons in the human brain [1]. Since then, their applicability has been tested on applications ranging from handwriting, face, and speech recognition to galaxy morphology classification and other areas where complex real-world data needs to be interpreted and processed [2]. The artificial networks consist of a number of neurons with complex interconnections between them. Each virtual neuron takes a vector of real values and produces one output according to its activation function. In this work, such a method is used to generate a value for the U and V components of sea surface currents at specific positions in space and time.
The performance of the proposed gap filling technique is compared to an existing method which computes missing values by a Data Interpolating Empirical Orthogonal Function (DINEOF) algorithm [3]. The latter is the interpolation scheme in common use at present; its performance is tested here by reconstructing the U and V components of radar sea currents over time slots where data is available. The data generated over the gaps do not match the expected values with sufficient accuracy, thus showing the limitations of the method. Furthermore, the DINEOF algorithm overspills and contaminates the data adjacent to the gaps: the algorithm substantially alters the available measurements before and after a slot of missing data, and appears to let the generated values influence the observations in order to achieve smoother currents.
In the following section, the CALYPSO and CALYPSO Follow On projects as well as the outcomes and deliverables are described briefly. In Section 3, the design details of the prototype data filling technique are given. Results on real datasets are presented in Section 4. A general discussion and the planned way forward are given in the concluding section.

The CALYPSO Project
CALYPSO was a two-year project partly financed by the EU under the Operational Programme Italia-Malta 2007-2013 [4]. The consortium consisted of research and public entities with responsibilities for civil and environmental protection, surveillance, security, and response to hazards. The main project deliverable was the setting up of a permanent and fully operational HF radar observing system, capable of recording (in real-time with hourly updates) surface currents in the Malta Channel. The system, which has been running operationally since 2012, consists of antenna installations on the northern Malta and southern Sicilian shores at selected sites. In particular, radial sites were installed at Ta' Barkat limits of Xghajra in Malta, Ta' Sopu limits of Nadur in Gozo, and Pozzallo and Ragusa in Sicily. A combining station at the University of Malta was set up to elaborate and publish data to users. The system is set to operate at a frequency of 13.5 MHz and with an angular resolution of 5°. With this configuration, hourly sea surface current data over a regular grid of 3 × 3 km² cells is generated through the combination of two or more radials. The sharing of the same frequency band is made possible through GPS time synchronisation between all the radial sites.
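The combination of radials into a total current vector can be illustrated with a minimal least-squares sketch. It assumes that each site measures only the current component along its look direction, r = U sin(b) + V cos(b), where b is the bearing in degrees clockwise from north. The function name and this simple unweighted fit are illustrative assumptions, not a description of the processing actually run at the combining station.

```python
import math

def combine_radials(bearings_deg, radial_speeds):
    """Least-squares estimate of (U, V) from radial speeds along given bearings.

    bearings_deg: direction of each radial, degrees clockwise from north.
    radial_speeds: current speed measured along each bearing.
    """
    if len(bearings_deg) < 2:
        raise ValueError("at least two radials are needed")
    # Each radial satisfies r = U*sin(b) + V*cos(b); build the normal equations.
    sxx = sxy = syy = sxr = syr = 0.0
    for b, r in zip(bearings_deg, radial_speeds):
        s, c = math.sin(math.radians(b)), math.cos(math.radians(b))
        sxx += s * s
        sxy += s * c
        syy += c * c
        sxr += s * r
        syr += c * r
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        raise ValueError("radials are nearly parallel; the vector is unresolved")
    # Solve the 2x2 system by Cramer's rule.
    u = (syy * sxr - sxy * syr) / det
    v = (sxx * syr - sxy * sxr) / det
    return u, v
```

The near-parallel check reflects why at least two sites with sufficiently different look angles are needed at a grid cell before a vector can be resolved.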
The HF radar data are intended primarily to support applications and optimise intervention in case of oil spill response, as well as to support tools for search and rescue (SAR), maritime security, safer navigation, improved meteo-marine forecasts, monitoring of sea conditions in critical areas such as the proximity of ports, and better management of the marine space between Malta and Sicily. A key service consists of direct access to the HF radar data by the Armed Forces of Malta (AFM) through the Search and Rescue Optimal Planning System (SAROPS). Based on the US Coast Guard model, this software is used to support SAR missions. In case of an accident, past and real-time met-ocean data is automatically obtained from the Environmental Data Server (EDS), which is linked to the CALYPSO system. SAROPS can then utilise the high resolution local data to identify the "most likely" location of missing persons or drifting objects based on drift models. The search pattern, probability of success, and probability of containment are computed and given to the authorities [5].
The CALYPSO project also served in capacity building in the monitoring of the coastal seas and adjoining resources. The measured data is shedding new light on the dynamics of the sea in this part of the Mediterranean, leading to research efforts also related to improved forecasting of the marine environment, protection from oil spills, search and rescue, and fisheries.
The CALYPSO Follow On project was a six-month extension project that improved on the achievements of the original project. After its completion, a more robust HF radar monitoring system was established and downstream services to targeted users were accomplished, including the launch of a smartphone application for use by mariners. The location of the radial sites as well as the data recorded on 17/06/2016 at 00:00 are presented in Figure 1. Real-time information can be accessed from the project website http://www.capemalta.net/CALYPSO.
Validation of the observed HF radar currents was done through 27 Surface Velocity Program (SVP) drifters that were released in five different deployments along a chosen transect in the Malta Channel. The Iridium satellite constellation was used to track the position of each buoy with a temporal frequency of one hour. Apart from the geographical coordinates with an accuracy of about 10 m, the transmitted data included battery level, sea surface temperature, and an indication of the presence of the underwater drogue that reduces the wind influence on the followed path. By using consecutive points, the zonal and meridional velocity components at each transmitted location were calculated and compared to the remotely sensed currents recorded by the radar network. Accuracy in the surface layer was found to be between 1 cm/s and 3 cm/s [6]. The validation was also done through a dedicated ADCP survey.
Figures 2 and 3 show the reduction in spatial coverage and the corresponding spectra as measured by the Ta' Barkat station in Malta during normal operation compared to instances when the external interference is present. Figure 4 shows how this interference can drastically affect the spatial coverage of the computed combined sea current fields.
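For the drifter validation described above, the velocity components between two consecutive hourly fixes can be approximated as in the following sketch. It assumes a spherical Earth and a local flat-Earth approximation, which is reasonable over the short distance covered in one hour. The function name and the Earth radius constant are illustrative choices and are not taken from [6].

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (assumed spherical)

def drifter_velocity(lon1, lat1, lon2, lat2, dt_seconds):
    """Zonal and meridional velocity (m/s) between two consecutive position fixes."""
    mean_lat = math.radians(0.5 * (lat1 + lat2))
    # Eastward displacement shrinks with the cosine of latitude.
    dx = math.radians(lon2 - lon1) * EARTH_RADIUS_M * math.cos(mean_lat)
    # Northward displacement depends only on the latitude difference.
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    return dx / dt_seconds, dy / dt_seconds
```

With a position accuracy of about 10 m and a one-hour interval, the positioning error alone contributes only a few millimetres per second to each velocity estimate.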

An Intelligent Interpolation Scheme
Such data gaps in both space and time are highly restrictive on the quality of the service provided to users. HF radar data streams therefore need to be processed to fill in the gaps with reliable estimates. An off-the-shelf interpolation technique was initially applied using the DINEOF algorithm made available by the GeoHydrodynamics and Environment Research (GHER) lab [3,7,8]. This method relies on an iterative Empirical Orthogonal Function (EOF) decomposition of an incomplete data matrix. Missing values were initially padded with zeros and were recursively filled with the results obtained from the previous iterate. The reconstruction was performed using 25 EOF modes, which retained 95% of the variance in the original dataset. Details on the generation of such a dataset and its validation can be found in [6]. As shown in Figure 5, vectors that cover the entire domain were generated. The skill of this technique was tested by reconstructing the U and V sea current components over patches for which observed radar data was available. However, the generated values did not correlate well with the available measured currents. The discrepancies are shown in Figure 6.
Neural networks provide a good basis to generate missing meridional and zonal sea current components from a learning process that makes use of previously observed HF radar and wind fields. Such networks connect a number of elements in a structure that takes a set of inputs and produces a single real number. The learning algorithm determines numeric weights to apply between each of these neurons to obtain the desired output. One main advantage of this technique is that it can produce good results even when it is supplied with noisy and incomplete data.
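The iterative EOF reconstruction used as the baseline in this section can be sketched in a few lines. This is a simplified illustration of the idea (zero padding, truncated decomposition, refilling the gaps from the previous iterate) and not the GHER DINEOF code, which among other things selects the number of modes by cross-validation. The function name, the fixed iteration count, and the use of a plain SVD are assumptions.

```python
import numpy as np

def eof_fill(data, n_modes, n_iter=100):
    """Fill NaNs in a (time x space) matrix by iterated truncated-SVD reconstruction."""
    missing = np.isnan(data)
    filled = np.where(missing, 0.0, data)  # initial zero padding of the gaps
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Rebuild the field from the leading n_modes EOFs only.
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        # Only the gap values are updated; observations are left untouched.
        filled[missing] = recon[missing]
    return filled
```

In this sketch the observed entries are never modified. Returning the full truncated reconstruction instead would also smooth the available measurements.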
Missing current vectors at a particular time are generated by processing the HF radar observations from the few hours preceding the gap. Sea currents in the Malta Channel are the expression of a number of factors influencing the motion of the water at different temporal and spatial scales. The general circulation is dictated by the slow basin scale (vertical) thermohaline structure of the Mediterranean and exhibits known seasonal characteristics. The spatial scale of these circulation patterns is captured at the level of the full HF radar domain. However, the circulation is also modified by strong mesoscale signals in the form of eddy, meander, and filament patterns. These mesoscale processes are triggered by the synoptic scale atmospheric forcing. The heat and momentum fluxes at the air-sea interface represent the dominant factor in the mixing and preconditioning of the surface of the Atlantic Water that crosses the Malta-Sicily Channel on its way to the Eastern Mediterranean [9,10]. To make the data filling technique applicable even over larger temporal gaps, wind observations are therefore used in combination with HF radar currents in order to fill data gaps with a better informed estimate.
The preference for wind velocity over wind stress rests on the linear relationship between currents and friction velocity evidenced in elaborate models of wind driven currents [11]. Studies on the influence of wind on the sea surface current variability observed by HF radars in other parts of the Mediterranean confirm that the local wind forcing is a dominant effect. In the Ibiza Channel, the first and second HF radar EOF modes account for more than 60% of the total variability and cross-correlate with the meridional and zonal wind components, respectively, with a peak at zero time lag. The wind forcing thus has a predominantly immediate effect on the surface current field [12]. The same applies more specifically to the Sicily Channel, where the correlation between the velocities of Lagrangian drifters and ERA-40 winds from the European Centre for Medium-Range Weather Forecasts (ECMWF) is found to be maximal at time lags of less than 6 hours [13].
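The time lag at which wind and current are most strongly related can be examined with a simple lagged Pearson correlation, as in the sketch below. The function name and the hourly sampling are assumptions made for illustration; the studies cited above use their own analysis methods.

```python
import numpy as np

def lagged_correlation(wind, current, max_lag):
    """Pearson correlation between wind(t - lag) and current(t) for lag = 0..max_lag."""
    wind = np.asarray(wind, dtype=float)
    current = np.asarray(current, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        # Pair each current sample with the wind sample taken `lag` steps earlier.
        w = wind[: len(wind) - lag]
        c = current[lag:]
        result[lag] = float(np.corrcoef(w, c)[0, 1])
    return result
```

The lag with the largest correlation, for example `max(result, key=result.get)`, indicates how quickly the surface current responds to the wind in a given series.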
Satellite wind data (Level 4) were acquired for the dataset period from the Copernicus Marine Environment Monitoring Service [14] at 6-hour intervals. The IFREMER CERSAT global mean wind fields were used, consisting of the surface 10 m wind speed, wind zonal component, wind meridional component, wind stress amplitude, wind stress zonal component, wind stress meridional component, and the associated errors. This spatially gridded dataset was estimated from the ASCAT and OSCAT scatterometers that are mounted on the Metop-A and OceanSat-2 satellites and are in quasi sun-synchronous orbits, crossing the equator in ascending mode at 09:30 am and 00:00 local time, respectively [15]. Satellite swath data recorded within a three-hour timeframe from the six-hourly ECMWF wind field were automatically included. ASCAT and OSCAT values with a timestamp between three and nine hours from the ECMWF analysis were interpolated in time. This results in a blended global wind field with a temporal resolution of six hours and a spatial resolution of 0.25° in both longitude and latitude. In this work, the wind data was spatially interpolated onto the HF radar grid and the temporal frequency was increased to one hour by linear interpolation. Figure 7 shows the original and upsampled satellite data product.
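The temporal upsampling step can be sketched as follows for a single wind component at a single grid cell. The function name is hypothetical, and the sketch assumes complete six-hourly input with no missing samples.

```python
def upsample_to_hourly(values, step_hours=6):
    """Linearly interpolate samples spaced step_hours apart onto an hourly axis."""
    hourly = []
    for a, b in zip(values[:-1], values[1:]):
        # Emit the left sample followed by the intermediate hourly values.
        for k in range(step_hours):
            hourly.append(a + (b - a) * k / step_hours)
    hourly.append(values[-1])  # close the series with the final sample
    return hourly
```

Applying the function to the zonal and meridional wind components separately keeps the interpolation linear in each component, so the interpolated wind speed is not in general a linear function of time.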
A dataset was initially created and provided as a training set to a number of Artificial Neural Networks (ANNs) to learn the local patterns. Since the gap pattern in the data is not constant, each grid point is treated separately. Different ANNs are defined and trained to predict the values for each cell without requiring data from adjacent points. This ensures that sea surface current values can always be computed irrespective of how large or long the HF radar data gaps are in space and time. Data with a temporal frequency of 1 hour collected between 01/01/2013 at 00:00 UTC and 31/01/2015 at 23:59 UTC was considered. This resulted in a dataset of 18,264 raster sets collected over 25 months. A high resolution regular grid of 736 nodes with a spatial resolution of 0.04 degrees was defined over the Malta Channel. The domain extended between 13.6376° E and 15.3981° E in longitude and between 35.7263° N and 37.0192° N in latitude. 70% of the labelled datasets were used to build the models. In each iterate, 15% were used to assess, validate, and check for convergence. The remaining 15% of the examples were used to quantify the accuracy of the system before processing the data gaps. Since real radar observations were available for this labelled set of vectors, the behaviour and accuracy of the model could be tested on unseen data.
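The size of the hourly dataset and the 70/15/15 partition can be reproduced with the short sketch below. The random shuffling, the seed, and the function name are assumptions made for illustration; the text does not state how the examples were assigned to each subset.

```python
from datetime import datetime, timedelta
import random

# First and last hourly timestamps of the study period.
start = datetime(2013, 1, 1, 0, 0)
end = datetime(2015, 1, 31, 23, 0)
n_hours = (end - start) // timedelta(hours=1) + 1  # number of hourly raster sets

def split_indices(n, seed=0):
    """Shuffle n example indices and split them 70% / 15% / 15%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # assumed: random assignment
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # remainder, roughly 15%
    return train, val, test
```

The value of `n_hours` matches the 18,264 raster sets quoted above. In practice the split would be applied per grid cell to the labelled vectors only, since cells differ in how many complete examples they have.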
To predict the coefficient at a particular point in time, the wind and current data for the past six hours were used. Table 1 provides a subset of the training and target vectors for one current component of a particular grid cell. Each row encodes the same data fields that are used to make a prediction together with the corresponding current magnitude. Such information includes the wind forcing over the previous six hours (T-6, T-5, T-4, ..., T-1) and the radar current observations over the same six hours (T-6, T-5, T-4, ..., T-1). Initially, wind and current data for the U and V components over 24 hours were considered. However, the correlation of data vectors with a timestamp separation longer than 6 hours was not found to be strong enough. Adding such information to the model did not improve the results.
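The construction of the training rows described above can be sketched for one grid cell and one component. The function name and the use of NaN to mark missing radar values are assumptions. Rows containing any missing value are skipped, so that only complete vectors are used for learning.

```python
import math

def build_training_rows(wind, current, n_lags=6):
    """Return (inputs, targets) for one grid cell and one current component.

    Each input row holds the wind at T-6..T-1 followed by the current at T-6..T-1;
    the target is the current at time T.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(current)):
        row = list(wind[t - n_lags:t]) + list(current[t - n_lags:t])
        if any(math.isnan(x) for x in row + [current[t]]):
            continue  # skip vectors with at least one missing value
        inputs.append(row)
        targets.append(current[t])
    return inputs, targets
```

With six lags for each of the two fields, every row has 12 input values.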
A separate network was generated for each grid cell. As shown in Figure 8, for each position in the grid, a structure with 12 neurons in the input layer, 15 neurons in the hidden layer, and 1 output neuron was implemented. The Levenberg-Marquardt backpropagation algorithm was used to learn the weights and to minimise the required nonlinear function [16,17]. Mathematical details of how the net values, slopes, and outputs for all neurons were computed in the forward pass, and how the weights were updated during backpropagation, can be found in [18].
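A minimal sketch of a 12-15-1 network trained with Levenberg-Marquardt steps is given below. Several details are assumptions made for illustration and are not taken from the text: the tanh hidden activation, the linear output neuron, the initial damping value, and the finite-difference Jacobian, which is used here for brevity in place of the analytic backpropagation described in [18].

```python
import numpy as np

N_IN, N_HID = 12, 15
N_PARAMS = N_IN * N_HID + N_HID + N_HID + 1  # all weights and biases

def predict(params, x):
    """Forward pass: tanh hidden layer (assumed) followed by a linear output neuron."""
    w1 = params[:N_IN * N_HID].reshape(N_IN, N_HID)
    b1 = params[N_IN * N_HID:N_IN * N_HID + N_HID]
    w2 = params[N_IN * N_HID + N_HID:-1]
    b2 = params[-1]
    return np.tanh(x @ w1 + b1) @ w2 + b2

def jacobian(params, x, eps=1e-6):
    """Finite-difference Jacobian of the predictions with respect to the parameters."""
    base = predict(params, x)
    jac = np.empty((x.shape[0], params.size))
    for k in range(params.size):
        shifted = params.copy()
        shifted[k] += eps
        jac[:, k] = (predict(shifted, x) - base) / eps
    return jac

def train_lm(x, y, n_iter=30, mu=1e-2, seed=0):
    """Damped Gauss-Newton steps, accepted only when the squared error decreases."""
    params = np.random.default_rng(seed).normal(scale=0.1, size=N_PARAMS)
    sse = np.sum((predict(params, x) - y) ** 2)
    for _ in range(n_iter):
        resid = predict(params, x) - y
        jac = jacobian(params, x)
        step = np.linalg.solve(jac.T @ jac + mu * np.eye(N_PARAMS), -jac.T @ resid)
        new_sse = np.sum((predict(params + step, x) - y) ** 2)
        if new_sse < sse:
            params, sse, mu = params + step, new_sse, mu / 10  # trust the model more
        else:
            mu *= 10  # fall back towards small gradient-descent steps
    return params, sse / len(y)
```

The damping term mu interpolates between Gauss-Newton behaviour (small mu) and gradient descent (large mu), which is what makes the method practical for small networks such as this one.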

Analysis of Results
Once all the data was collected and processed, an ANN was trained for each grid cell of the HF radar domain. Each network was built using all the available information. Training vectors for which at least one value was missing were ignored. The converged ANNs were run for the timestamps where radar data were missing in order to fill in the gaps. Points near the centre of the domain, for which a lot of information was available, converged in a few iterates with a Pearson correlation of 0.9867 (p value of 0). Cells over regions that experienced many gaps required more iterates to reach steady state conditions. In such cases, the correlation between the few observations available and the predicted vectors went down to about 0.9319 (p value of 5.9544e-07). Figure 9 provides examples of the resulting hourly maps with missing data filled in. Figure 10 shows sections over which original radar data was also available. This allows a comparison to be made between actual observed data and the values generated by the ANN.
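The Pearson correlation quoted above measures the linear agreement between observed and predicted values. A minimal sketch of its computation is given below; the function name is illustrative, and the associated p value is not computed here.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

A high correlation shows that the predictions follow the observed trend, but it is insensitive to a constant offset or scale error, which is why residual-based error measures are also reported.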
Figures 11 and 12 show the time series plots of a single current component at two points, as recorded by the HF radar and as predicted by the DINEOF and ANN techniques, respectively. The absolute residuals between the real currents recorded by the radar and the values predicted by the two methods are also presented. To quantify the error of the predicted trends, the Mean Square Error (MSE) as well as the average absolute residual was computed for the U and V components, respectively. This analysis was carried out on available labelled data (for which radar values were available) which were not used during the training phase. The results for the same grid cells used in the time series shown in Figures 11 and 12 are summarised in Tables 2 and 3, respectively. The ANN interpolation scheme gave an MSE lower by an order of magnitude, and superior performance was recorded for all tested grid cells.
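The two error measures used in this comparison can be computed as in the sketch below, applied separately to each component and each method. The function name is illustrative.

```python
def error_metrics(observed, predicted):
    """Return (mean square error, mean absolute residual) of the predictions."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / len(residuals)
    mean_abs = sum(abs(r) for r in residuals) / len(residuals)
    return mse, mean_abs
```

Because the MSE squares each residual, it penalises occasional large errors more heavily than the mean absolute residual, so reporting both gives a fuller picture of the reconstruction quality.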

Conclusion
The routine acquisition of multidisciplinary, spatially widespread, long-term datasets of the ocean and coastal seas is expected to trigger an unprecedented leap in the economic value of ocean data and information and will additionally target multiple applications and users. The HF Radar Network installed during the CALYPSO projects puts Malta and Sicily at the forefront of such initiatives in the Mediterranean and will serve as a stepping stone for future extensions of the system to cover the full marine space around the Maltese Islands and the Sicilian perimeter, including the coastal areas. To obtain higher quality data, this work investigated the potential of using Machine Learning techniques to fill in gaps within the HF radar observed current maps. While further work is necessary to get the system running in an operational mode, the proof of concept has shown that very good results can be achieved using the latest six hours of observed current and wind data. Planned future work includes experimentation with other learning methods. The applicability of such techniques for short term forecasting will also be studied. In particular, the potential use of observed sea surface currents and wind vectors to predict surface state conditions of the sea over the next few hours will be investigated.

Figure 5: Examples of observed HF radar sea surface current vector fields (black) overlaid on the results by the DINEOF gap filling technique (red).

Figure 6: Inconsistencies between the radar data (black) and the interpolated vectors using the DINEOF technique (red).

Figure 7: Original IFREMER CERSAT global blended mean wind fields (blue) and upsampled (red) satellite wind data. The arrow scales of the two datasets are not the same and have been set for better visualisation of the overall wind pattern circulation.

Figure 9: Gap filling results by the ANN technique.

Figure 10: Strong correlation between the original radar data and the interpolated results by the ANN technique.

Table 1: Sample training dataset used for supervised learning of U and V coefficients for time T.


Table 2: MSE and absolute residual averages for the DINEOF and ANN techniques at the grid point at 14.3601° E longitude and 36.0659° N latitude (corresponding to Figure 11).

Table 3: MSE and absolute residual averages for the DINEOF and ANN techniques at the grid point at 14.888269° E longitude and 36.296182° N latitude (corresponding to Figure 12).