Ship Trajectory Reconstruction from AIS Sensory Data via Data Quality Control and Prediction

Mathematical Problems in Engineering


Introduction
Maritime transportation accounts for over 90% of global trade by volume. Enhancing traffic safety attracts considerable attention, since maritime traffic incidents can cause significant loss of human life, damage to the navigation environment, etc. [1,2]. To avoid potential maritime accidents, various maritime surveillance data are collected to support navigation environment awareness and to provide accurate early-warning information to maritime traffic participants [3]. Automatic Identification System (AIS) data contain meaningful spatial-temporal maritime traffic information that supports various navigation decisions; in particular, AIS data are a popular source for analyzing ship trajectory variation tendencies. Note that AIS is a self-reporting system originally designed to prevent accidents, and it is mandatory for cargo ships (i.e., ships with gross tonnage larger than 300) [4–8]. Moreover, in the European Union Member States, fishing vessels longer than 15 m are required to install AIS equipment [9–12].
The AIS equipment transmits static and kinematic ship information (e.g., ship type, call sign, speed, latitude, longitude, heading, Maritime Mobile Service Identity (MMSI), etc.) at a variable reporting rate. More specifically, the AIS system broadcasts ship information at intervals ranging from several minutes down to two seconds depending on the ship's speed (i.e., the AIS system updates its data more frequently at higher speeds). In this way, the positions of AIS-equipped ships can be obtained in real time in coastal areas. Moreover, large-scale AIS datasets are stored at regional or national data centers and are accessible to users on request (note that users may need to pay for access to the AIS data). Previous studies suggested that AIS data quality has a significant influence on maritime traffic safety analysis, and thus improving AIS data quality has become an active topic in the maritime community [13,14].
AIS data anomaly removal studies involve unsupervised clustering methods, neural-network-based models, and statistical models [15,16]. Liu et al. proposed an adaptive Douglas-Peucker framework to suppress AIS data outliers through data compression [17]. Deng introduced a Markov-based model to explore ship movement patterns, which were further used to identify abnormal AIS data samples [18]. Zhang et al. proposed a model based on hierarchical density-based spatial clustering of applications with noise (HDBSCAN) to cluster and denoise raw AIS trajectories [19]. Rong et al. cleansed raw AIS data in the lateral and longitudinal dimensions with a novel probabilistic trajectory prediction model [20]. Neural-network-based models have been successful in AIS trajectory denoising and prediction tasks [21]. Hoque and Sharma applied a long short-term memory neural network to forecast ship trajectories, which were then used to suppress AIS data anomalies [22]. Kim and Lee proposed a novel deep neural network model to remove AIS outliers and predict both medium- and long-term ship trajectory variation tendencies [23]. Similar studies can be found in [8, 24–28].
We aim to propose a novel AIS denoising and prediction framework supported by a data quality control procedure. Our main contributions can be summarized as follows: (1) we cleansed the raw AIS data through trajectory separation, outlier removal, and data normalization; (2) we predicted ship trajectories from the denoised AIS data with an artificial neural network (ANN); (3) we tested the performance of the proposed framework on two ship trajectories. The study can help maritime traffic participants forecast ship trajectories accurately and thus take early-warning measures to enhance maritime traffic efficiency and safety. The remainder of the paper is organized as follows. Section 2 introduces the data source used in our study. Section 3 describes the AIS data denoising methodology and then presents the ANN model used for predicting ship trajectories. The experimental results are shown in Section 4. Section 5 briefly concludes the study and outlines future work.

Data
The U.S. Bureau of Ocean Energy Management (BOEM) and the National Oceanic and Atmospheric Administration (NOAA) provide large-scale AIS data, which benefit many AIS-related studies due to their public accessibility (https://marinecadastre.gov/ais/) [29,30]. The original AIS dataset includes both kinematic and static ship information, such as MMSI, Coordinated Universal Time (UTC), latitude, longitude, speed over ground (SOG), heading, course over ground (COG), timestamp, and call sign. We collect AIS data from the Gulf of Mexico, with latitude ranging from 18°N to 31°N and longitude ranging from 75°W to 100°W. The minimum time interval between AIS samples is 1 s, and the maximum is 500 s. We collected 12813 AIS data samples on April 11, 2017, from the above-mentioned database (see Figure 1). Following the international standard for representing latitude and longitude, south latitude (west longitude) is denoted as a negative number, while north latitude (east longitude) is represented by a positive number.

Methodology
The raw AIS data may contain different types of outliers due to unstable signal transmission, data transmission congestion, etc. It is important to suppress such anomalies in order to extract reliable maritime traffic kinematic information from the AIS dataset. To address this issue, we first implement a data quality control procedure to remove trajectory outliers and then predict the trajectory with an artificial neural network. The schematic overview of the proposed framework is shown in Figure 2.

Data Quality Control for Suppressing AIS Outliers.
Ship trajectory data (i.e., AIS data) are stored in the database by transmission/reception timestamp, so the data from each individual ship must be aggregated before any ship trajectory analysis can be conducted. To thoroughly remove anomalous AIS samples, we implement the data quality control procedure in three steps: ship trajectory extraction, data cleansing, and data formatting (i.e., AIS time interval normalization).

Ship Trajectory Extraction.
The ship trajectory extraction consists of separating trajectories from different ships and removing discontinuous ship trajectories. Each ship is uniquely identified by its MMSI, which is therefore used to separate AIS data samples from different ships. Then, the raw AIS samples of each ship are sorted by timestamp in ascending order. We found that the same AIS sample may be recorded in the database several times; such repeated samples are removed before further processing when the constraints in equation (1) are satisfied:

$$T_a = T_b, \quad \mathrm{Lat}_a = \mathrm{Lat}_b, \quad \mathrm{Lon}_a = \mathrm{Lon}_b. \tag{1}$$

The outputs of the above step are the raw ship trajectories. We found that several time intervals between neighboring samples are very large (e.g., four hours), indicating that many AIS data were lost. Such discontinuities make it very difficult to analyze the ship's kinematic state in detail. To overcome this, we divide a raw ship trajectory into different segments when the time interval between neighboring samples exceeds a threshold (see equation (2)):

$$T_i = T_b - T_a > T_{th}, \tag{2}$$

where $T_a$ and $T_b$ are the timestamps of two AIS records. The ship positions (latitude and longitude) at timestamp $T_a$ are denoted as $\mathrm{Lat}_a$ and $\mathrm{Lon}_a$, respectively, and $\mathrm{Lat}_b$ and $\mathrm{Lon}_b$ are their counterparts at timestamp $T_b$. $T_i$ is the time interval between neighboring samples, and $T_{th}$ is the threshold, which is set to 4 hours by default.
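The extraction step can be sketched in Python, the language the experiments were run in. This is a minimal sketch under stated assumptions, not the authors' code: the `AISRecord` type, its field names, and the `extract_segments` function are hypothetical, timestamps are assumed to be in seconds, and repeated samples are detected by exact equality of timestamp and position, following equation (1).

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class AISRecord:
    """Hypothetical minimal AIS record (not the MarineCadastre column schema)."""
    mmsi: int
    t: float    # timestamp in seconds
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive


T_TH = 4 * 3600.0  # segmentation threshold T_th = 4 hours, in seconds


def extract_segments(records, t_th=T_TH):
    """Split raw AIS records into per-ship, gap-free trajectory segments."""
    segments = []
    # Group by MMSI and sort each ship's samples by ascending timestamp;
    # position is part of the key so exact repeats end up adjacent.
    ordered = sorted(records, key=lambda r: (r.mmsi, r.t, r.lat, r.lon))
    for _mmsi, ship_records in groupby(ordered, key=lambda r: r.mmsi):
        current, prev = [], None
        for rec in ship_records:
            if prev is not None:
                # Equation (1): same timestamp and same position -> repeated sample.
                if (rec.t, rec.lat, rec.lon) == (prev.t, prev.lat, prev.lon):
                    continue
                # Equation (2): a gap T_i = T_b - T_a above T_th starts a new segment.
                if rec.t - prev.t > t_th:
                    segments.append(current)
                    current = []
            current.append(rec)
            prev = rec
        if current:
            segments.append(current)
    return segments


raw = [
    AISRecord(1, 0, 28.00, -89.70),
    AISRecord(1, 60, 28.01, -89.70),
    AISRecord(1, 60, 28.01, -89.70),             # stored twice in the database
    AISRecord(1, 60 + 5 * 3600, 28.20, -89.50),  # 5 h gap -> new segment
    AISRecord(2, 0, 27.00, -90.00),
]
segments = extract_segments(raw)
```

Here ship 1 yields two segments (the duplicate is dropped and the 5 h gap splits the track) and ship 2 yields one.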

Removal of AIS Data Anomaly.
After obtaining the trajectory segments in the above step, we apply an anomaly removal procedure to suppress AIS data noise. Typical AIS data outliers are summarized as follows.
(a) Longitude and/or latitude far beyond the reasonable range. We collect the AIS data in the Gulf of Mexico, so the longitude (latitude) is supposed to fall between 75°W and 100°W (18°N and 31°N). A trajectory sample is considered an outlier when its spatial data (i.e., latitude and longitude) exceed this range. Moreover, a sudden longitude (latitude) jump is another typical outlier; a sample is flagged when the condition in equation (3) holds, and we employ the moving average method to correct it:

$$\left|\mathrm{lat}_i - \mathrm{lat}_{i-1}\right| > \mathrm{lat}_{th} \ \text{and} \ \left|\mathrm{lat}_i - \mathrm{lat}_{i+1}\right| > \mathrm{lat}_{th}, \qquad \left|\mathrm{lon}_i - \mathrm{lon}_{i-1}\right| > \mathrm{lon}_{th} \ \text{and} \ \left|\mathrm{lon}_i - \mathrm{lon}_{i+1}\right| > \mathrm{lon}_{th}. \tag{3}$$

(b) Abnormal velocity data: after manually checking the raw AIS data, we found several speed samples that are very high (i.e., larger than 30 knots). For maritime traffic safety reasons, a ship is unlikely to travel in inland waterways at such speeds.
(c) Ship course outliers: a ship may change its heading in coastal areas to avoid collisions, but large, abrupt course changes between neighboring samples are implausible in the real world. We average the neighboring ship courses to remove such outliers. Given ship headings $C_1$, $C_2$, and $C_3$ from three neighboring AIS trajectory samples (with timestamps $t_1$, $t_2$, and $t_3$), we consider $C_2$ an outlier when the condition in equation (4) is satisfied, and $C_2$ is then updated to $C_2'$ with equation (5).

Here, the ship latitude and longitude at timestamp $i$ are $\mathrm{lat}_i$ and $\mathrm{lon}_i$, respectively; the same notation applies to $\mathrm{lat}_{i-1}$, $\mathrm{lat}_{i+1}$, $\mathrm{lon}_{i-1}$, and $\mathrm{lon}_{i+1}$. $\mathrm{lat}_{th}$ and $\mathrm{lon}_{th}$ are the thresholds for latitude and longitude, respectively. $C_{th}$ is the ship heading variation threshold. The parameter $t_{13}$ is the time difference between timestamps $t_1$ and $t_3$, and $t_{12}$ is the time interval between $t_1$ and $t_2$.
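The three outlier rules can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names and the tuple layout `(t, lat, lon, sog, course)` are hypothetical; the 30-knot speed limit and the Gulf of Mexico bounding box come from the text, while `lat_th`, `lon_th`, and `c_th` are left as free parameters because their values are not stated. The exact forms of equations (4) and (5) are not given here, so the heading test and the time-weighted correction using $t_{12}/t_{13}$ are an assumption; angle differences are taken on the circle so that, for example, 359° and 1° are 2° apart.

```python
LAT_RANGE = (18.0, 31.0)     # study-area latitude bounds (degrees north)
LON_RANGE = (-100.0, -75.0)  # study-area longitude bounds (west is negative)
V_MAX = 30.0                 # knots; faster samples are treated as outliers


def in_study_area(lat, lon):
    return LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LON_RANGE[0] <= lon <= LON_RANGE[1]


def fix_spikes(values, th):
    """Equation (3): a value far from BOTH neighbours is replaced by their mean."""
    out = list(values)
    for i in range(1, len(out) - 1):
        if abs(out[i] - out[i - 1]) > th and abs(out[i] - out[i + 1]) > th:
            out[i] = (out[i - 1] + out[i + 1]) / 2.0
    return out


def signed_angle_diff(a, b):
    """Smallest signed difference a - b between two headings, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def fix_heading(c1, c2, c3, t1, t2, t3, c_th):
    """Assumed form of equations (4)-(5): time-weighted average of C1 and C3."""
    t12, t13 = t2 - t1, t3 - t1
    if t13 <= 0:
        return c2
    if abs(signed_angle_diff(c2, c1)) > c_th and abs(signed_angle_diff(c2, c3)) > c_th:
        return (c1 + signed_angle_diff(c3, c1) * t12 / t13) % 360.0
    return c2


def clean_track(samples, lat_th, lon_th, c_th):
    """samples: time-sorted (t, lat, lon, sog, course) tuples of one segment."""
    # Rules (a) and (b): drop samples outside the area or faster than V_MAX.
    kept = [s for s in samples if in_study_area(s[1], s[2]) and s[3] <= V_MAX]
    if len(kept) < 3:
        return kept
    t = [s[0] for s in kept]
    lat = fix_spikes([s[1] for s in kept], lat_th)
    lon = fix_spikes([s[2] for s in kept], lon_th)
    # Rule (c): correct isolated course jumps from their two neighbours.
    course = [s[4] for s in kept]
    for i in range(1, len(kept) - 1):
        course[i] = fix_heading(course[i - 1], course[i], course[i + 1],
                                t[i - 1], t[i], t[i + 1], c_th)
    return [(t[i], lat[i], lon[i], kept[i][3], course[i]) for i in range(len(kept))]


track = [
    (0, 28.0, -89.7, 3.0, 90.0),
    (60, 28.5, -89.7, 3.0, 270.0),   # latitude spike and course jump
    (120, 28.0, -89.7, 3.0, 90.0),
    (180, 40.0, -89.7, 3.0, 90.0),   # outside the study area
    (240, 28.0, -89.7, 35.0, 90.0),  # implausible speed
]
cleaned = clean_track(track, lat_th=0.1, lon_th=0.1, c_th=45.0)
```

In the example, the last two samples are dropped by rules (a) and (b), and the middle sample's latitude spike and 180° course jump are both smoothed back to their neighbours' values.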

AIS Data Normalization.
After the above two steps, we obtain a noise-free AIS dataset. However, the time interval varies between data samples, which hinders the ship trajectory reconstruction model from accurately extracting intrinsic AIS patterns and thus makes it difficult to predict ship trajectories in real-world applications. To address this issue, we employ cubic spline interpolation and moving average models to normalize the AIS data series. Given three consecutive noise-free AIS samples $A_1$, $A_2$, and $A_3$, we label and store samples $A_1$ and $A_3$ if one of the conditions in equations (6) to (9) is met. Moreover, sample $A_2$ is marked as flag data when the constraints in equation (10) are satisfied. We normalize the ship trajectory samples between $A_1$ and $A_2$ with cubic spline interpolation; for more details, we refer the reader to [31]. The AIS data between $A_2$ and $A_3$ are normalized with the moving average model, details of which can be found in [26].
Note that an appropriate time interval is crucial for ship trajectory analysis: a large interval leads to the loss of ship kinematic information, while a small interval may introduce trivial ship movement patterns. After examining the time interval distribution of the collected AIS samples (see Figure 3), we found that most sampling intervals are 60 s, which is therefore used as the default value in our study unless otherwise specified:

$$t_{23} < t_{t2},$$

where $d_{12}$ ($d_{23}$) is the ship displacement between positions $A_1$ and $A_2$ ($A_2$ and $A_3$), and $t_{12}$ ($t_{23}$) is the time the ship takes to travel from position $A_1$ to $A_2$ (from $A_2$ to $A_3$). The speed recorded in trajectory sample $A_1$ is $v_1$, and $v_2$ is defined analogously for $A_2$. $d_{t1}$, $t_{t1}$, $v_{th}$, $d_{t2}$, $t_{t2}$, $d_{t3}$, and $d_{t4}$ are thresholds, with default settings of 5, 360, 5, 10, 160, 30, and 30, respectively.
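A minimal sketch of resampling one trajectory coordinate onto a regular 60 s grid with a cubic spline is shown below. It covers only the spline part of the normalization: the flagging rules of equations (6) to (10) and the moving average branch are not modelled, the natural boundary condition (zero second derivative at both ends) is an assumption, and the function names `natural_cubic_spline` and `resample` are hypothetical. In practice, `scipy.interpolate.CubicSpline(ts, values, bc_type="natural")` provides the same interpolant; the standard-library version here only keeps the example self-contained.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two points.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives; natural spline: m[0] = m[-1] = 0
    if n > 2:
        # Tridiagonal system for the interior second derivatives m[1..n-2].
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, len(diag)):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * len(diag)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(len(diag) - 2, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n - 1] = sol

    def evaluate(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped to the ends).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        width = xs[i + 1] - xs[i]
        a = (xs[i + 1] - x) / width
        b = (x - xs[i]) / width
        return (a * ys[i] + b * ys[i + 1]
                + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) * width ** 2 / 6.0)

    return evaluate


def resample(ts, values, step=60.0):
    """Resample an irregularly timed series onto a regular grid of `step` seconds."""
    spline = natural_cubic_spline(ts, values)
    n_steps = int((ts[-1] - ts[0]) // step)
    grid = [ts[0] + k * step for k in range(n_steps + 1)]
    return grid, [spline(t) for t in grid]


# Irregular timestamps (s) and one coordinate, e.g. latitude in degrees.
grid, lat_60s = resample([0, 50, 130, 200], [28.00, 28.01, 28.03, 28.04])
```

Latitude and longitude would each be resampled separately on the same grid; the spline passes exactly through the original samples and reproduces straight-line motion without overshoot.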

Ship Trajectory Prediction with the ANN Model.
Artificial neural network models have been applied successfully in many road traffic flow prediction applications, which suggests their potential for the ship trajectory prediction task. The main advantages of the back-propagation (BP) neural network are its strong nonlinear curve fitting capability, low complexity, and self-learning ability, which allow it to identify and predict ship trajectory variation tendencies. Moreover, owing to its low computational cost, the ANN model can output trajectory predictions in real time, providing instant maritime traffic information for time-critical maritime tasks. For these reasons, we employ the ANN model to predict AIS trajectories. The ANN model learns the intrinsic relationship between the input (training and testing) data and the output samples following a human-like information perception rule.
For the $g$th neuron node, we denote by $A_j$ ($j = 1, 2, \ldots, J$) the input ship AIS trajectory samples, and $w_j$ ($j = 1, 2, \ldots, J$) represents the weight of each input. The input of the $g$th neuron node is then obtained by equation (11). Note that the hidden layer of a BP neural network extracts ship travelling patterns from the AIS data. With the help of a transfer function, the BP neural network can learn nonlinear patterns among the input AIS samples; the sigmoid transfer function used in our study is given in equation (12). The BP network measures the difference between the predicted AIS trajectories and the ground-truth data and propagates this error back through the network to adjust the connection weights, with the aim of obtaining optimal ship trajectory prediction results:

$$N_g^{in} = \sum_{j=1}^{J} w_j A_j, \tag{11}$$

$$s_j^g = \frac{1}{1 + e^{-N_g^{in}}}, \tag{12}$$

where $N_g^{in}$ is the input of the $g$th neuron node and $s_j^g$ is the state of the $g$th hidden-layer neuron for the $j$th AIS sample.
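The BP model can be illustrated with a small NumPy sketch: one sigmoid hidden layer, a linear output layer, a mean-squared-error loss, and full-batch gradient descent. The paper does not report its architecture or training settings, so the window length `k`, the hidden size, the learning rate, the number of epochs, the function names, and the synthetic normalized track below are all assumptions chosen only to make the example self-contained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_windows(track, k):
    """(n, 2) array of normalized [lat, lon] rows -> (inputs, targets).

    Each input row is the previous k positions flattened; the target is the next one.
    """
    X = np.stack([track[i:i + k].ravel() for i in range(len(track) - k)])
    return X, track[k:]


def train_bp(X, Y, hidden=8, lr=0.1, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass: weighted input sums (cf. equation (11)), then sigmoid (equation (12)).
        H = sigmoid(X @ W1 + b1)
        err = H @ W2 + b2 - Y  # linear output layer minus ground truth
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error and take a gradient step on the MSE.
        d_out = 2.0 * err / err.size
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(X @ W1 + b1) @ W2 + b2


# Synthetic, already min-max normalized track (illustrative only).
s = np.linspace(0.0, 1.0, 60)
track = np.column_stack([0.2 + 0.6 * s, 0.5 + 0.3 * np.sin(3.0 * s)])
X, Y = make_windows(track, k=3)
params, losses = train_bp(X, Y)
preds = predict(params, X)
```

Inputs are assumed to be min-max normalized (as the synthetic track already is) so that the sigmoid units are not saturated; predictions would then be mapped back to degrees with the inverse scaling.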

Evaluation Metrics.
To quantify the ship trajectory prediction performance, we compare the predicted AIS data with the ground-truth data using typical statistical measures. Following previous studies [26,27], we employ the root mean square error (RMSE), mean absolute error (MAE), Fréchet distance (FD), and average Euclidean distance (AED) to measure prediction quality. For any given ship trajectory, the prediction accuracy is quantified with these indicators (see equations (13) to (16)). Smaller RMSE, MAE, FD, and AED values indicate more accurate ship trajectory prediction, and vice versa. Note that the RMSE and MAE are computed separately for longitude and latitude:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - g_i\right)^2}, \tag{13}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|p_i - g_i\right|, \tag{14}$$

$$\mathrm{FD} = \max_{i \in [1, n]} \sqrt{\left(p_i(x) - g_i(x)\right)^2 + \left(p_i(y) - g_i(y)\right)^2}, \tag{15}$$

$$\mathrm{AED} = \frac{1}{n}\sum_{i=1}^{n} \sqrt{\left(p_i(x) - g_i(x)\right)^2 + \left(p_i(y) - g_i(y)\right)^2}, \tag{16}$$

where $n$ is the number of AIS data samples, $p_i$ is the $i$th predicted AIS data sample, and $g_i$ is the $i$th ground-truth AIS data sample (in equations (13) and (14), $p_i$ and $g_i$ stand for either their latitude or their longitude component). $p_i(x)$ and $p_i(y)$ are the latitude and longitude of the $i$th predicted AIS sample, and $g_i(x)$ and $g_i(y)$ are their counterparts for the $i$th ground-truth sample.
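The four indicators can be computed as in the sketch below. The function name and the `(x, y)` = (latitude, longitude) tuple layout are illustrative assumptions, and distances are in degrees, matching the definitions rather than a geodesic metric. FD is taken as the maximum distance between index-aligned points; the classical discrete Fréchet distance minimizes over all monotone couplings of the two trajectories and is therefore never larger than this value for equal-length trajectories.

```python
import math


def trajectory_metrics(pred, truth):
    """pred, truth: equal-length lists of (x, y) = (latitude, longitude) pairs."""
    n = len(pred)
    if n == 0 or n != len(truth):
        raise ValueError("pred and truth must be non-empty and of equal length")
    ex = [p[0] - g[0] for p, g in zip(pred, truth)]  # latitude errors
    ey = [p[1] - g[1] for p, g in zip(pred, truth)]  # longitude errors
    dist = [math.hypot(a, b) for a, b in zip(ex, ey)]  # pointwise Euclidean distance
    return {
        "RMSE_x": math.sqrt(sum(e * e for e in ex) / n),
        "RMSE_y": math.sqrt(sum(e * e for e in ey) / n),
        "MAE_x": sum(abs(e) for e in ex) / n,
        "MAE_y": sum(abs(e) for e in ey) / n,
        "FD": max(dist),
        "AED": sum(dist) / n,
    }


metrics = trajectory_metrics([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 2.0)])
```

In this toy example only the second longitude is off by one unit, so the latitude errors are zero, MAE for longitude is 0.5, FD is 1, and AED is 0.5.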

Experiment
To evaluate the framework's performance, we collected two typical ship trajectories (i.e., two groups of AIS data samples) from the observed navigation region. The AIS data for the ships with MMSI No. 357234000 and No. 367715380 were collected from the above-mentioned database and are denoted as Case 1 and Case 2, respectively. The ship trajectory for Case 1 was sampled from 24 January 2017 to 25 January 2017, and the data samples for Case 2 were collected on 6 January 2017. The framework was implemented on Windows 10 with 16 GB RAM and a 4 GHz CPU. We used Python 3.5 to perform the data quality control and prediction procedures on the ship trajectory data.

Ship Trajectory Reconstruction on Case 1.
We first present the ship trajectory reconstruction results on Case 1 (i.e., the ship with MMSI No. 357234000) and then verify the model performance on the AIS data from the ship with MMSI No. 367715380. The spatial-temporal ship trajectory distribution shown in Figure 4 indicates that the ship was travelling back and forth in a small area, since both its longitudes and latitudes varied within a small range. However, several obvious outliers were found in the raw ship trajectory data. More specifically, the ship positions in several anomalous AIS samples were far away from their neighbors, implying unreasonable ship displacements. After carefully checking the raw AIS data, we found that the average ship speed was quite slow (i.e., smaller than 4 knots).
The main reason is that the ship is a seismic survey vessel, which tows equipment over a large area of the water surface (e.g., a dozen sensors connected by hydrophone streamer cables). Moreover, the ship's instantaneous speed reached 20 knots when its position was flagged as an obvious outlier (e.g., abnormal latitude and/or longitude). It can be inferred that the ship finished its task in the current coastal region and sped up toward another sea area. Besides, the abnormal longitude samples occurred at different positions from the abnormal latitude samples (see Figures 4(a) and 4(b), respectively). Hence, a trajectory sample was flagged as anomalous when its longitude (or latitude) deviated from the neighboring ship positions. The denoised ship trajectory shows that the abnormal data samples were successfully removed by the proposed framework, as no outliers are observed in the spatial-temporal trajectory distributions. The denoised latitude and longitude distributions shown in Figures 5(a) and 5(b) confirm this analysis. The denoised ship longitude varied from −89.6° to −89.8°, and the latitude from 27.8° to 28°. Since 0.2° of latitude corresponds to about 22 km, and 0.2° of longitude to about 20 km at this latitude, the ship stayed within an area roughly 20 km across, i.e., it operated within a confined area rather than transiting. We observed that the trajectory samples were not evenly distributed, as many data discontinuities appear in Figure 5. To alleviate such discontinuities, we normalized the ship trajectory samples, as shown in Figure 6. The denoised ship trajectories were interpolated into evenly distributed data series, and the discontinuities were successfully removed (see Figures 6(a) and 6(b), respectively). The ship trajectory reconstruction results were further evaluated in terms of trajectory prediction accuracy, as reported in Table 1 (note that our proposed framework is denoted as DANN).
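As a quick check of the spatial extent implied by the denoised ranges (longitude −89.6° to −89.8°, latitude 27.8° to 28°), the haversine sketch below converts the 0.2° spans into kilometres. It assumes a spherical Earth with a mean radius of 6371 km; the function and variable names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ns_extent = haversine_km(27.8, -89.7, 28.0, -89.7)  # 0.2 deg of latitude
ew_extent = haversine_km(27.9, -89.8, 27.9, -89.6)  # 0.2 deg of longitude at 27.9 N
print(f"{ns_extent:.1f} km N-S, {ew_extent:.1f} km E-W")
```

The north-south span comes out at roughly 22 km and the east-west span at roughly 20 km, because a degree of longitude shrinks by the cosine of the latitude.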
We also implemented another popular trajectory prediction model, the long short-term memory (LSTM) model [32], for performance comparison. In terms of MAE, the longitude error of our proposed framework (3.94 × 10⁻³) was approximately one-tenth of that of the LSTM. The latitude MAE obtained by our model is 2.07 × 10⁻³, which is about 1% of the LSTM counterpart. The RMSE values for longitude and latitude obtained by the proposed framework were both 5.41 × 10⁻⁴, showing a tendency similar to that of the MAE. The FD and AED indicators, which measure the distance between predicted and ground-truth samples, showed the same tendency as the MAE and RMSE.

Ship Trajectory Reconstruction on Case 2.
The performance of the proposed framework was further validated on another trajectory, from the ship with MMSI No. 367715380. We applied the same procedure to improve the AIS data quality and then performed ship trajectory prediction. We observed several sudden variations in both the latitude and longitude samples (see Figures 7(a) and 7(b), respectively). Moreover, the maximum distance between neighboring ship latitudes was over 1000 km, which is physically impossible. Figures 7(c) and 7(d) show the AIS data after the data quality control procedure, demonstrating that the ship trajectory outliers were successfully removed by our proposed framework. The ship trajectory prediction results indicate that our proposed framework achieved higher accuracy than the LSTM model. For instance, the MAE values of DANN for latitude and longitude were 2.23 × 10⁻³ and 2.60 × 10⁻³, respectively, both approximately one-tenth of the LSTM counterparts (see Table 2). The RMSE, FD, and AED indicators showed tendencies similar to those of the MAE.

Conclusion
It is not easy to obtain accurate ship trajectory information from historical AIS data due to unpredictable noise. We proposed a novel framework integrating data denoising, data normalization, and trajectory prediction. The proposed framework first identified different ship trajectories via the time interval between neighboring data samples, which is the first substep of the data quality control procedure. Then, outliers in the raw AIS data were identified with a group of constraints and corrected with the moving average method. After that, the denoised AIS data were normalized for ship trajectory analysis applications. Finally, we predicted ship trajectories with the ANN model to further evaluate the framework's performance. The experiments were conducted on two ship trajectories in which typical outliers were observed in the raw data. The statistical results showed that our proposed framework can successfully remove abnormal AIS data outliers and obtain satisfactory ship trajectory prediction performance (i.e., the average MAE, …).

We can extend our work in the following aspects. First, we applied our framework to cleanse and predict trajectories from the AIS data of two special-purpose ships, which is more challenging due to their irregular and unpredictable spatial-temporal movements. In the future, we can apply the framework to denoise and forecast trajectories of general-purpose merchant ships (e.g., oil tankers and container ships) to further test its performance. Second, the ANN module in the proposed framework may suffer from overfitting, which may degrade model performance; additional bio-inspired models could be employed to further enhance prediction accuracy.
Third, a more holistic assessment of model performance can be obtained by comparing the framework against other popular ship trajectory prediction models. Fourth, we can test the model's robustness on AIS data collected under more complicated navigation conditions (e.g., ships sailing in narrow and busy channels). Last but not least, we can address maritime situation awareness tasks (e.g., ship behavior analysis and prediction) by exploiting the obtained historical AIS data.

Data Availability
All data generated or analyzed during this study can be obtained from the corresponding author upon request by email.