Taiwan lies in the western Pacific Ocean on a path that typhoons frequently follow, and the accompanying strong winds and torrential rains make typhoons particularly damaging there. We therefore aimed to establish an accurate wind speed prediction model for future typhoons, allowing for better preparation to mitigate a typhoon’s toll on life and property. To improve wind speed predictions during a typhoon episode, we constructed the model using machine learning techniques. To ensure model accuracy, we used simulated values from the Weather Research and Forecasting (WRF) model, a numerical weather prediction system, as input variables and adopted deep neural network structures for the estimation models. The networks comprise multilayer perceptrons (MLPs), deep recurrent neural networks (DRNNs), and stacked long short-term memory (LSTM) networks. These three model structures differ in their memory capacity: MLPs have no memory capacity, whereas DRNNs and stacked LSTM have memory capacity. A model structure with memory capacity can analyze time-series data and continue memorizing and learning along the time axis. The study area is northeastern Taiwan. Results showed that the prediction errors of the MLP, DRNN, and stacked LSTM models increased with the prediction lead time (1–6 hours). Comparing the three models revealed that the networks with memory capacity (DRNN and stacked LSTM) were more accurate than the network without memory capacity (MLP). A further comparison of the networks with memory capacity revealed that stacked LSTM yielded slightly more accurate results than DRNN. Additionally, we determined that the use of numerically simulated values in the construction of the wind speed prediction model reduced the error rate by approximately 30%. These results indicate that the inclusion of numerically simulated values in wind speed prediction models enhances their prediction accuracy.

A typhoon is a severe natural disaster that affects tropical and subtropical coastal countries, and it occurs most frequently in the northwestern Pacific Ocean. Taiwan lies east of the Eurasian continent on the western side of the Pacific Ocean, and its climate is intermediate between tropical and subtropical. Thus, typhoons frequently strike Taiwan, generally in summer and fall. Typhoons affecting Taiwan typically develop over the sea southeast of Taiwan, and most are accompanied by torrential rains and strong winds [

For example, the 2015 Typhoon Soudelor was the most destructive typhoon that occurred in Taiwan in recent history, with gust intensity exceeding 12 on the Beaufort wind force scale (32.7 m/s). Its strong gusts caused widespread damage to infrastructure, affecting gas supply, power and utilities, transportation and communication, and weather radar stations. Electricity was cut off in approximately 4.5 million households simultaneously during Typhoon Soudelor—the greatest recorded number in recent history. The economic loss from the typhoon was estimated to be as high as US$76 million [

Therefore, we aimed to establish an accurate wind speed prediction model for future typhoons, allowing for better preparation to mitigate a typhoon’s toll on life and property. In this study, machine learning (ML) techniques were used to improve predictive accuracy. In general, ML algorithms learn from large datasets, improving their ability to identify patterns in the data. Specifically, ML involves creating algorithms that make predictions from previously unseen data. Given their ability to perform parameter adjustment and achieve optimization through self-learning, neural network algorithms in ML are particularly powerful [

Deep learning has become possible due to the exponential increase in computing power in recent years. This approach is the further derivation of multiple neural layers from the original neural layers of a model. Such derivation improves an algorithm’s ability to learn, better approximating the complex neural network structure of a human being. For example, Hu et al. [

In the field of neural networks, recurrent neural networks (RNNs), which can analyze sequential (or time-series) data, have recently been developed [

The widespread application of ensembles in numerical weather prediction (NWP) has helped researchers improve weather forecasts [

Recent studies have applied the NWP model to typhoons and tropical cyclones [

Furthermore, to enhance the predictive accuracy of constructed ML models, other researchers have used numerically simulated results as input data for the construction of ML models. For example, Zhao et al. [

Therefore, our study has two primary aims: (1) develop an ML- and neural network-based wind speed prediction model and compare the predictive accuracy of various neural network-based algorithms and (2) evaluate the improvements to predictive accuracy afforded by the use of numerical solutions obtained from NWP models (by comparing between its use and nonuse) in a typhoon-surface wind-speed prediction model.

Figure

Flowchart of neural network-based typhoon wind velocity prediction model using NWP numerical solutions.

The best model among a set of ML neural network-based models, involving MLP, DRNN, and stacked LSTM, was determined. According to [

In this section, we describe the neural network-based architectures, which used the MLP, DRNN, and stacked LSTM algorithms, that were adopted for model construction. As illustrated in Figure

Architecture of (a) MLP, (b) DRNN, and (c) stacked LSTM neural networks.

As illustrated in Figure

A stacked LSTM architecture is defined as an LSTM model that comprises multiple LSTM layers (Figure

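The basic structure of LSTM, as illustrated in the LSTM layer in Figure, comprises an input gate $i_t$, output gate $o_t$, forget gate $f_t$, and memory cell $c_t$. A single LSTM layer has a second-order RNN architecture that excels at storing sequential short-term memories and retrieving them at many time steps later [

In its standard form, an LSTM layer computes

$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right),\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right),\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right),\\
h_t &= o_t \odot \tanh\left(c_t\right),
\end{aligned}
$$

where $\sigma$ is the logistic sigmoid function; $\odot$ denotes elementwise multiplication; $x_t$ is the input vector; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, and $U_c$ are the weight terms; $b_f$, $b_i$, $b_o$, and $b_c$ are the corresponding bias terms; and $h_t$ and $h_{t-1}$ are the current and previous hidden vectors, respectively.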
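The gate interactions in a single LSTM time step can be sketched in plain Python as follows. This is a minimal illustration of the textbook LSTM formulation (sigmoid gates, tanh candidate), not necessarily the exact configuration used in this study; the names `lstm_step`, `affine`, and `params`, and the toy weights, are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Return W.x + U.h + b, with matrices given as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'o', 'c' to (W, U, b)."""
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]  # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]  # input gate
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]  # output gate
    cand = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]
    # Memory cell: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    # Hidden vector: filtered view of the memory cell.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hypothetical toy setup: 2 input features, 1 hidden unit, all weights zero.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in "fioc"}
h_t, c_t = lstm_step([1.0, 2.0], [0.0], [1.0], params)
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate to 0, so the memory cell is simply halved from 1.0 to 0.5. In a stacked LSTM, the hidden vector `h_t` of one layer would serve as the input `x_t` of the next layer at the same time step.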

In this study, we used the ReLU activation function in the middle layers (the hidden, recurrent, and LSTM layers) of the aforementioned three models because it converges quickly and is not prone to the vanishing gradient problem. The ReLU is defined as the positive part of its argument; that is, $f(x) = \max(0, x)$.
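A minimal Python sketch of the ReLU and its derivative may help illustrate the vanishing-gradient remark: for any positive input the derivative is exactly 1, so gradients passing through active units are not shrunk. The function names are illustrative, and the derivative at zero is set to 0 by convention.

```python
def relu(x):
    """Positive part of x: returns x when x > 0, otherwise 0."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU; taken as 0 at x == 0 by convention."""
    return 1.0 if x > 0 else 0.0

# Apply the activation to a few sample pre-activation values.
activations = [relu(v) for v in (-2.0, 0.0, 3.5)]
gradients = [relu_grad(v) for v in (-2.0, 0.0, 3.5)]
```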

Taiwan is located at the western Pacific Ocean, a region frequently in the path of typhoons. Typhoons in Taiwan are often accompanied by torrential rains, which severely endanger life and property. Thus, more accurate regional wind speed prediction is needed to improve typhoon protection. In this study, we chose northeastern Taiwan as the study area (Figure

Map of Taiwan’s northeastern region and the meteorological stations therein.

According to typhoon data from the Central Weather Bureau (CWB), between 2000 and 2018, 29 typhoons have directly struck Taiwan (Table

Typhoons affecting the study area from 2000 to 2017 (29 incidents).

Year | Typhoon | Period (UTC) |
---|---|---|
2000 | Kai-Tak | 6–10 Jul |
2001 | Toraji | 28–31 Jul |
2001 | Nari | 10–20 Sep |
2002 | Nakri | 9–11 Jul |
2004 | Mindulle | 29 Jun–3 Jul |
2004 | Aere | 23–26 Aug |
2004 | Haima | 12–13 Sep |
2004 | Nock-Ten | 24–26 Oct |
2005 | Haitang | 17–19 Jul |
2005 | Longwang | 1–2 Oct |
2006 | Bilis | 12–14 Jul |
2007 | Sepat | 16–19 Aug |
2007 | Krosa | 5–8 Oct |
2008 | Kalmaegi | 16–19 Jul |
2008 | Fung-Wong | 27–29 Jul |
2008 | Sinlaku | 10–16 Sep |
2008 | Jangmi | 27–30 Sep |
2009 | Morakot | 6–9 Aug |
2010 | Namtheun | 30–31 Aug |
2012 | Saola | 30 Jul–3 Aug |
2013 | Soulik | 10–14 Jul |
2013 | Trami | 16–22 Aug |
2013 | Kong-Rey | 27–30 Aug |
2014 | Matmo | 21–23 Jul |
2014 | Fung-Wong | 19–22 Sep |
2015 | Soudelor | 6–9 Aug |
2015 | Dujuan | 25–29 Sep |
2016 | Megi | 25–28 Sep |
2017 | Nesat | 26–30 Jul |

Tracks of historical typhoon incidents that affected Taiwan.

The data included two types of datasets: one on typhoon dynamic characteristics and another (comprising 2621 hourly records) on surface wind speed observations. Surface wind speed data were released by the CWB and measured from eight gauge stations: Anbu, Banqiao, Keelung, Pengjiayu, Su-ao, Taipei, Tamsui, and Yilan (Figure

WRF model simulations using typhoon data were used to generate wind speed values at each meteorological station. To set the initial field and boundary conditions, we used data from the Final Operational Global Analysis dataset, which is a part of the Global Data Assimilation System of the US National Centers for Environmental Prediction (NCEP). A two-domain nested grid was used. The horizontal grid spacing for the coarser domain (118°E–126°E, 21°N–29°N) was 15 km, and that of the finer domain (121°E–123°E, 24°N–26°N) was 3 km. Both domains had 32 vertical levels. This study used the WRF physical parameters recommended by [

Simulating the typhoon events with the WRF model yielded wind velocity simulations at the eight ground gauge stations. To verify the WRF wind outcomes, we took the simulated wind at a height of 10 m as the WRF output, because the ground gauge stations observe wind speed at a height of 10 m. Figure

Scatterplots depicting observations vs. WRF simulations at eight stations: (a) Pengjiayu, (b) Keelung, (c) Yilan, (d) Su-ao, (e) Banqiao, (f) Taipei, (g) Tamsui, and (h) Anbu.

For the four windward observation stations, the simulated wind speeds were lower than the observed values at high wind speeds and approximately equal to them at low wind speeds. For the three observation stations in the Tamsui Basin, the simulated wind speeds were approximately equal to the observed values at all wind speeds.

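To evaluate the simulation outcomes, this study used the mean absolute error (MAE) and root mean squared error (RMSE), which are defined as follows:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2},
$$

where $\hat{y}_i$ is the estimated value of record $i$, $y_i$ is the observation of record $i$, and $n$ is the total number of records.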
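The two error measures can be computed as in the following Python sketch. The function names and the sample values are illustrative only and are not taken from the study's data.

```python
import math

def mae(estimates, observations):
    """Mean absolute error between paired estimates and observations."""
    if len(estimates) != len(observations) or not observations:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - o) for e, o in zip(estimates, observations))
    return total / len(observations)

def rmse(estimates, observations):
    """Root mean squared error between paired estimates and observations."""
    if len(estimates) != len(observations) or not observations:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((e - o) ** 2 for e, o in zip(estimates, observations))
    return math.sqrt(total / len(observations))

# Hypothetical wind speeds in m/s (simulated vs. observed).
simulated = [2.0, 4.0, 6.0]
observed = [1.0, 4.0, 8.0]
mae_value = mae(simulated, observed)
rmse_value = rmse(simulated, observed)
```

Because the RMSE squares each error before averaging, it weights large errors more heavily than the MAE does, so it is never smaller than the MAE on the same data.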

The performance of the estimation indicators (MAE and RMSE) in terms of errors is presented in Figure

Simulation performance in terms of prediction errors.

We used ML-based models to construct the wind speed prediction model for the study area. The observation stations at Taipei and Yilan were chosen as the test locations. The attribute data entered into the model included data on typhoon dynamics, data from ground meteorological observation stations, and data obtained from the aforementioned simulation. The data were split by typhoon episode: the training-validation set comprised 23 typhoon episodes between 2000 and 2013 (2093 records in total), and the testing set comprised six typhoon episodes between 2014 and 2017 (528 records in total). Model training and validation were performed through 10-fold cross-validation, in which the training-validation set was divided into 10 subsamples; one subsample was retained for model verification, and the other nine were used for model training. The process was repeated until each subsample had been used once for verification.
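The fold rotation in 10-fold cross-validation can be sketched as follows. How the folds were actually formed (contiguous, shuffled, or grouped by typhoon episode) is not stated here, so the contiguous, unshuffled folds below are an assumption, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_records, k=10):
    """Yield (train_idx, val_idx) pairs so each record is validated once.

    Folds are contiguous and unshuffled (an assumption of this sketch);
    the first n_records % k folds receive one extra record.
    """
    if k < 2 or n_records < k:
        raise ValueError("need k >= 2 and at least k records")
    base, extra = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_records))
        yield train_idx, val_idx
        start += size

# 2093 records is the size of the training-validation set reported above.
folds = list(k_fold_indices(2093, k=10))
fold_sizes = [len(val_idx) for _, val_idx in folds]
```

For 2093 records, this produces three validation folds of 210 records and seven of 209. A model would be trained on each `train_idx` and scored on the matching `val_idx`, and the ten scores would then be averaged.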

As mentioned in Section

These hyperparameters were determined through trial and error. First, the number of neurons in a middle layer was varied from 10 to 100, and the RMSE was computed for each setting. The middle layers of the MLP, DRNN, and stacked LSTM models are the hidden, recurrent, and LSTM layers, respectively. The number of neurons yielding the minimum RMSE was then taken as the optimal value. When the lead time was 1 h, for the Taipei Station, the optimal numbers of neurons were 30, 50, and 50 for the MLP, DRNN, and LSTM models, respectively; for the Yilan Station, those numbers for the three models were 40, 30, and 30, respectively (Figures

Sensitivity of model parameters on the Taipei gauge: (a) number of neurons in a middle layer and (b) length of middle layers.

Sensitivity of model parameters on the Yilan gauge: (a) number of neurons in a middle layer and (b) length of middle layers.

Using the aforementioned method, we conducted parameter testing for the prediction models. Tests were conducted using the training-validation set at six hourly lead times between 1 and 6 h. The RMSE performance values (for all lead times) for the three models are presented in Figure

Performance levels of neural network-based predictions for lead times between 1 and 6 h using a training-validation set for the (a) Taipei Station and (b) Yilan Station.

We tested and evaluated the MLP, DRNN, and stacked LSTM models using the testing set. The testing set comprised data (measured at the Yilan and Taipei Stations) on six typhoon episodes that occurred between 2014 and 2017. Figures

Predicted results of typhoons (2014–2017) in the testing set on the Taipei gauge for the lead times of (a) 1 h, (b) 3 h, and (c) 6 h.

Predicted results of testing typhoons (2014–2017) on the Yilan gauge for the lead times of (a) 1 h, (b) 3 h, and (c) 6 h.

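To compute term-by-term comparisons of the relative error in the prediction with respect to the actual value of the variable, the mean absolute percentage error (MAPE) and root mean square percentage error (RMSPE) were calculated. MAPE and RMSPE can be calculated using the following formulae:

$$
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|, \qquad
\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{y}_i - y_i}{y_i}\right)^2},
$$

where $\hat{y}_i$ and $y_i$ are the estimated and observed values of record $i$, respectively, and $n$ is the total number of records.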
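As a sketch in Python, the two relative measures can be computed as below, using their standard definitions and returning ratios rather than values multiplied by 100 (an assumption about the reporting convention). The function names and sample values are illustrative; records with a zero observation are rejected here because the relative error is undefined for them, which is a design choice of this sketch.

```python
import math

def relative_errors(estimates, observations):
    """Per-record relative errors (estimate - observation) / observation."""
    if len(estimates) != len(observations) or not observations:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observations):
        raise ValueError("relative error is undefined for zero observations")
    return [(e - o) / o for e, o in zip(estimates, observations)]

def mape(estimates, observations):
    """Mean absolute percentage error, expressed as a ratio."""
    errors = relative_errors(estimates, observations)
    return sum(abs(r) for r in errors) / len(errors)

def rmspe(estimates, observations):
    """Root mean square percentage error, expressed as a ratio."""
    errors = relative_errors(estimates, observations)
    return math.sqrt(sum(r * r for r in errors) / len(errors))

# Hypothetical wind speeds in m/s (predicted vs. observed).
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 4.0, 8.0]
mape_value = mape(predicted, observed)
rmspe_value = rmspe(predicted, observed)
```

In this toy example, the first record dominates both measures: an absolute error of 1 m/s against an observation of 1 m/s is a 100% relative error, which shows why relative measures can be large when observed wind speeds are low.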

Generally, MAPE is used to express the MAE as a percentage of the observations. MAPE is an unbiased statistic for measuring the predictive capability of a model [

Figures

Performance measures at lead times from 1 to 6 h on the Taipei gauge: (a) MAE, (b) RMSE, (c) MAPE, and (d) RMSPE.

Performance measures at lead times from 1 to 6 h on the Yilan gauge: (a) MAE, (b) RMSE, (c) MAPE, and (d) RMSPE.

Table

Average performance measures of absolute error terms (MAE and RMSE) and relative error terms (MAPE and RMSPE) for 1–6 h predictions.

Station | Measure | MLP | DRNN | Stacked LSTM |
---|---|---|---|---|
Taipei | MAE (m/s) | 1.241 | 1.070 | 0.928 |
Taipei | RMSE (m/s) | 1.666 | 1.439 | 1.228 |
Taipei | MAPE | 0.730 | 0.599 | 0.560 |
Taipei | RMSPE | 2.129 | 1.764 | 1.593 |
Yilan | MAE (m/s) | 1.863 | 1.714 | 1.528 |
Yilan | RMSE (m/s) | 2.680 | 2.489 | 2.185 |
Yilan | MAPE | 1.151 | 0.974 | 0.927 |
Yilan | RMSPE | 3.643 | 3.030 | 2.847 |

We evaluated whether the use of WRF simulation values as input affects the model’s accuracy in predicting wind speed. We focused on the stacked LSTM model because it was the best performing model. Figures

Performance measures of predictive accuracy for the use and nonuse of WRF simulation values on the Taipei gauge: (a) MAE, (b) RMSE, (c) MAPE, and (d) RMSPE.

Performance measures of predictive accuracy for the use and nonuse of WRF simulation values on the Yilan gauge: (a) MAE, (b) RMSE, (c) MAPE, and (d) RMSPE.

We determined that the use of WRF simulation values as input increased the model’s predictive accuracy. Therefore, the use of numerically simulated values as a part of the input data aids in the reduction of predictive error.

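We define the improvement rates for the MAE and RMSE (denoted IR_{MAE} and IR_{RMSE}, respectively) as follows:

$$
\mathrm{IR}_{\mathrm{MAE}} = \frac{\mathrm{MAE}_{\mathrm{without}} - \mathrm{MAE}_{\mathrm{with}}}{\mathrm{MAE}_{\mathrm{without}}} \times 100\%, \qquad
\mathrm{IR}_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{without}} - \mathrm{RMSE}_{\mathrm{with}}}{\mathrm{RMSE}_{\mathrm{without}}} \times 100\%,
$$

where MAE_{with} and MAE_{without} are the MAE results on the use and nonuse of WRF simulation values, respectively, and RMSE_{with} and RMSE_{without} are the RMSE results on the use and nonuse of the WRF simulation values, respectively.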
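A short Python sketch of the improvement rate follows, assuming it is the relative reduction in error with respect to the model that does not use WRF simulation values. The function name and the error values are hypothetical and are not results from this study.

```python
def improvement_rate(error_without, error_with):
    """Percentage reduction in an error measure (MAE or RMSE).

    error_without: error of the model built without WRF simulation values.
    error_with:    error of the model built with WRF simulation values.
    """
    if error_without <= 0:
        raise ValueError("error_without must be positive")
    return (error_without - error_with) / error_without * 100.0

# Hypothetical MAE values in m/s, for illustration only.
ir_mae = improvement_rate(error_without=2.0, error_with=1.5)
```

Here an MAE that falls from 2.0 m/s to 1.5 m/s corresponds to an improvement rate of 25%. A negative rate would indicate that adding the WRF inputs made the predictions worse.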

Figure shows the improvement rates for the two stations. For the Taipei Station, IR_{MAE} ranged between 25.5% and 29.7%, and IR_{RMSE} ranged between 27.0% and 30.8%. For the Yilan Station, IR_{MAE} ranged between 26.4% and 36.3%, and IR_{RMSE} ranged between 28.4% and 35.7%. Generally, the average IR_{MAE} (for all lead times) was 27.3% and 30.3% for the Taipei and Yilan Stations, respectively, and the average IR_{RMSE} (for all lead times) was 28.7% and 31.1% for the Taipei and Yilan Stations, respectively. For both the Taipei and Yilan Stations, improvement was demonstrated in predictive accuracy, although the improvement rate was higher for the Yilan Station. In summary, although wind speed was difficult to predict accurately at the Yilan Station, the use of WRF simulation values increased the predictive accuracy of the model at the Yilan Station.

Improvement rate of predictive accuracy in terms of MAE and RMSE, comparing the use and nonuse of WRF simulation values for the Taipei and Yilan Stations.

To accurately predict the wind speed of future typhoons, we constructed a typhoon wind speed prediction model using ML techniques. RNNs are a type of neural network that can analyze sequential data. Their structure facilitates the effective processing of wind speed-relevant climate data over an extended period; that is, the structure imbues RNN models with long-term memory capacity. Such networks are therefore suitable for predicting typhoon wind speeds. LSTM is a type of RNN in which the length of the memory can be controlled. Additionally, LSTM can filter its output results, which increases its predictive accuracy. According to current developments in deep learning, learning performance is enhanced when the layers of neural networks are deepened. Therefore, we used deep neural networks in this study and compared the performance of three types of networks (MLP, DRNN, and stacked LSTM) in predicting wind speed values. These three model structures differ in their memory capacity: MLPs have no memory capacity, whereas DRNNs and stacked LSTM have memory capacity.

We chose northeastern Taiwan as the study area and the observation stations at Taipei and Yilan as the study subjects. The results indicated that, for both stations and for all three models (MLP, DRNN, and stacked LSTM), the prediction error increased with the prediction lead time (1–6 h in hourly steps); in other words, predictive accuracy decreased as the lead time lengthened. The networks with memory capacity (DRNN and stacked LSTM) were more accurate than the network without memory capacity (MLP), and stacked LSTM was more accurate than DRNN, the traditional RNN method. Because stacking LSTM layers deepens the model and brings it closer to a deep learning technique than its counterparts, model accuracy is increased. Thus, stacked LSTM is a stable technique for overcoming problems in wind velocity prediction.

Finally, we examined if model accuracy is increased by the use of WRF simulation values as an input variable in the wind speed prediction model. The stacked LSTM model was used in the evaluation, given its best performance in most evaluation tests of the current study. Evaluating predictive accuracy in terms of the four error indicators, namely, MAE, RMSE, MAPE, and RMSPE, verified that the inclusion of the numerically simulated input greatly increased model accuracy. Specifically, for all lead times, the use of numerically simulated values reduced the MAE and RMSE error rates of the Taipei Station by 27.3% and 28.7%, respectively. Similarly, the MAE and RMSE rates for the Yilan Station were reduced by 30.3% and 31.1%, respectively. Therefore, the inclusion of numerically simulated values in wind speed prediction models enhanced their prediction accuracy.

The Final Operational Global Analysis data were obtained from the Global Data Assimilation System of the US National Centers for Environmental Prediction, which is available at

The author declares that there are no conflicts of interest regarding the publication of this paper.

This study was supported by the Ministry of Science and Technology, Taiwan, under Grant no. MOST107-2111-M-019-003. The author acknowledges the data provided by the Central Weather Bureau of Taiwan and the Research Data Archive (RDA) at NCAR.
