Severe weather events are occurring more frequently because of climate change, so accurate weather forecasts are increasingly necessary, building on the numerical weather prediction (NWP) developed over the past several decades. One way to improve the accuracy of NWP-based forecasts is to collect more meteorological data by reducing the observation interval. However, in many areas, economic and geographical constraints make it difficult to collect observation data by installing automatic weather stations (AWSs). We developed the Mini-AWS, a station much smaller than an AWS, to compensate for these shortcomings. Mini-AWSs cost less to install and maintain than AWSs and are subject to fewer spatial constraints on installation. However, the data collected with Mini-AWSs must be corrected because they can be affected by the external environment, depending on the installation site. In this paper, we propose a novel machine-learning-based method for correcting errors in atmospheric pressure data observed with a Mini-AWS. Using the proposed method, we obtained corrected atmospheric pressure data that meet the World Meteorological Organization (WMO) standard (±0.1 hPa) and confirmed their potential as an auxiliary resource for AWSs.
Numerical weather prediction (NWP) refers to a method of weather forecasting based on the numerical analysis of current meteorological conditions using physical and mechanical principles of atmospheric processes. Nowadays, NWP accounts for a considerable proportion of weather forecasts worldwide [
Because NWP-based weather forecasting uses observed data, its prediction accuracy can be improved by reducing the observation interval. However, installing enough automatic weather stations (AWSs) to ensure adequate spacing is difficult because of economic and geographical limitations, such as high installation and maintenance costs and difficulties in selecting installation sites.
Many studies have been conducted to overcome these limitations of AWSs. Straka et al. [
A Mini-AWS is a miniature weather station developed to overcome the drawbacks of the AWS. The Mini-AWS costs approximately one-seventh as much as the AWS (1,300 USD versus 9,000 USD) and has low maintenance and repair costs. Because it requires very little space, there are few restrictions on site selection; it can be installed wherever deemed suitable, ensuring stable and steady data collection. It can also be installed on a mobile object such as a vehicle. However, depending on the installation site, it is exposed to the external environment, so the collected observation data must be corrected before they can be applied in weather forecasting.
We propose a novel correction method based on machine learning to make atmospheric pressure data collected by Mini-AWSs amenable for use as auxiliary meteorological data. If the errors of the corrected Mini-AWS data fall within the range of the maximum permissible error (±0.1 hPa) recommended by the World Meteorological Organization (WMO), they are usable as auxiliary meteorological data [
The rest of this paper is organized as follows. In Section
A Mini-AWS is a miniature weather station (dimensions: 157 mm
Specifications of the Mini-AWS.
Category | Contents |
---|---|
Measurement element | Temperature, humidity, pressure |
Accuracy: temperature | |
Accuracy: humidity | |
Accuracy: pressure | |
Range: temperature | −40 to 125°C |
Range: humidity | 0–100% RH |
Range: pressure | 10–1200 hPa |
Resolution: temperature | 0.01°C |
Resolution: humidity | 0.04% RH |
Resolution: pressure | 0.012 hPa |
Measurement interval | Optional (5 s to hours) |
Ambient temperature | −40°C to 85°C |
Duration (without power supply) | Maximum 18 hours |
Data transmission | Wireless (3G mobile network) communication |
Power | USB power 5 V |
Mini-AWS.
For the data collection, we installed eight Mini-AWSs in the Pyeongchang area from January 22 to February 12, 2016. Additionally, we mounted a Mini-AWS on a vehicle and gathered data for three days while driving from Seoul to the Pyeongchang area and returning during the same period. Figure
Installation of the Mini-AWS in a cross-country stadium (a) and on a mobile vehicle (b).
Collection sites of Mini-AWS data.
Number of collected Mini-AWS data.
Linear regression, which is one of the most widely used modeling techniques, is a method to linearly model the relationship between two variables (dependent and independent). It can be classified into simple, multiple, and multivariate linear regression (LR) depending on the number of dependent and independent variables.
Linear regression finds the best-fit equation, that is, the one that minimizes the sum of the squared errors (SSE) between the fitted and observed data. The least squares method is generally used to minimize the SSE.
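As a rough illustration of the least squares idea, the following sketch fits a simple (one independent variable) linear regression in closed form. The function names `fit_simple_lr` and `sse` and the sample values are hypothetical and are not taken from the experiments in this paper.

```python
def fit_simple_lr(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares solution for one independent variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent variable must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of squared errors of the fitted line on the given data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical example: uncorrected readings (xs) versus reference values (ys).
raw = [0.0, 1.0, 2.0, 3.0]
ref = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_simple_lr(raw, ref)
print(slope, intercept, sse(raw, ref, slope, intercept))
```

Multiple linear regression generalizes the same criterion to several independent variables, in which case the coefficients are usually obtained by solving the normal equations rather than with the two-variable formulas above.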
Artificial neural networks (ANNs) are a computational approach in machine learning that simulates the human brain. They analyze a problem through a learning process in which neurons are connected in a multilayer structure and the connection strength between individual neurons is adjusted. There are several types of ANNs, depending on the neuron model and connection method. In this paper, the most common type, that is, the multilayer perceptron (MLP) [
Support vector machines (SVMs) [
The sequential minimal optimization (SMO) algorithm [
Expectation–maximization (EM) clustering [
Data collected with the Mini-AWS and corresponding data from 595 AWSs provided by the Korea Meteorological Administration (KMA) were used for the experiments (Figure
Locations of AWSs
To correct a collected Mini-AWS data point, the sea-level pressure measured at the nearest AWS was used as the reference value. If the corresponding value of the nearest AWS was missing, that of the second nearest AWS was used as the reference value. If the value was still missing after repeating the process twice, that data point was excluded from the analysis to keep the experiments efficient.
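The reference-value selection described above can be sketched as follows. This is only an illustration under stated assumptions: the great-circle (haversine) distance, the dictionary layout of a station record, the function names, and the `max_candidates` parameter (how many nearest stations are tried before the data point is dropped) are all hypothetical choices, not details given in this paper.

```python
import math

MISSING_THRESHOLD = -99.0  # values at or below this are treated as missing


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def reference_pressure(lat, lon, stations, max_candidates=2):
    """Return the pressure of the nearest station with a valid value, or None.

    stations: list of dicts with keys "lat", "lon", "pressure" (hypothetical layout).
    Only the max_candidates nearest stations are tried; None means the
    Mini-AWS data point should be excluded from the analysis.
    """
    ranked = sorted(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))
    for station in ranked[:max_candidates]:
        value = station["pressure"]
        if value is not None and value > MISSING_THRESHOLD:
            return value
    return None


# Hypothetical stations: the nearest one reports a missing value.
aws = [
    {"lat": 37.00, "lon": 128.00, "pressure": -999.0},
    {"lat": 37.10, "lon": 128.00, "pressure": 1012.3},
    {"lat": 38.00, "lon": 128.00, "pressure": 1010.0},
]
print(reference_pressure(37.0, 128.0, aws))
```

Sorting all stations for every data point is simple but slow for hundreds of stations and millions of readings; a spatial index or a precomputed neighbor list per Mini-AWS site would be a natural refinement.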
Atmospheric pressure readings should be reduced to the mean sea level to make the readings of different weather stations comparable by cancelling out altitude-dependent differences. The reduction to the mean sea level was performed on all atmospheric pressure readings based on information about the atmospheric pressure, altitude, and temperature obtained with the Mini-AWS data to provide more information for machine learning. Equation (
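To illustrate what a reduction to mean sea level involves, the sketch below applies a widely quoted approximation based on a constant temperature lapse rate of 0.0065 K/m. It is an assumption chosen for illustration and is not necessarily the equation used in this study; the function name `reduce_to_sea_level` and the sample values are likewise hypothetical.

```python
def reduce_to_sea_level(pressure_hpa, altitude_m, temperature_c):
    """Approximate sea-level pressure (hPa) from a station reading.

    Assumes a standard lapse rate of 0.0065 K/m between the station and
    sea level; this is an illustrative approximation only.
    """
    lapse_term = 0.0065 * altitude_m
    ratio = 1.0 - lapse_term / (temperature_c + lapse_term + 273.15)
    return pressure_hpa * ratio ** -5.257


# Hypothetical reading: 930 hPa at 700 m altitude and 0 degrees Celsius.
print(round(reduce_to_sea_level(930.0, 700.0, 0.0), 1))
```

At zero altitude the correction factor equals one, so the reading is returned unchanged; the higher the station, the larger the upward correction.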
The AWS observation data are subject to errors due to various problems such as observation instrument errors or power and communication line disturbances; it is essential to provide accurate data by eliminating erroneous data through quality control [
When an observation is lost because of instrument errors or communication disturbances, it is stored as a missing value. A missing data point is generally coded as −999 or −99; all values coded as −99 or less were deemed missing in this study.
The AWS and Mini-AWS data points with location information deviating from the latitudinal and longitudinal extent of the Korean Peninsula were considered to be errors because the observations took place on the Korean Peninsula. The latitude/longitude position was set as 33°N–39°N and 124°E–131°E as provided by the Korean National Geographic Information Institute (2009).
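The missing-value and location checks described above can be sketched as a simple record filter. The thresholds come from the text; the record layout (dictionaries with `lat`, `lon`, and `pressure` keys) and the function names are hypothetical.

```python
MISSING_CODE_MAX = -99.0    # values coded as -99 or less are deemed missing
LAT_RANGE = (33.0, 39.0)    # degrees north
LON_RANGE = (124.0, 131.0)  # degrees east


def is_missing(value):
    """True if the value is absent or carries a missing-data code."""
    return value is None or value <= MISSING_CODE_MAX


def in_study_area(lat, lon):
    """True if the position lies inside the latitude/longitude extent used here."""
    return LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LON_RANGE[0] <= lon <= LON_RANGE[1]


def basic_qc(records):
    """Keep records with a valid pressure and a position inside the study area."""
    return [
        r for r in records
        if not is_missing(r["pressure"]) and in_study_area(r["lat"], r["lon"])
    ]


# Hypothetical records: one valid, one missing, one outside the latitude range.
sample = [
    {"lat": 37.5, "lon": 128.5, "pressure": 1013.2},
    {"lat": 37.5, "lon": 128.5, "pressure": -999.0},
    {"lat": 40.2, "lon": 128.5, "pressure": 1010.0},
]
print(len(basic_qc(sample)))
```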
The WMO standards [
The persistence test detects observations that remain unchanged for a certain period of time because of instrument errors or other disturbances. The WMO recommends flagging as “suspect” any case in which the observed value does not change by more than a threshold over 60 minutes; the thresholds are 0.1°C for air temperature, 0.1 hPa for atmospheric pressure, and 1% for relative humidity. We performed the persistence test and removed all “suspect” cases.
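One possible reading of the persistence test is sketched below: every stretch of samples spanning at least 60 minutes whose values vary by less than the threshold is flagged. The sliding-window formulation, the strict inequality, the `(minute, value)` sample layout, and the function name are assumptions made for illustration.

```python
def persistence_suspects(samples, threshold, window_min=60):
    """Return the indices of samples flagged as suspect.

    samples: list of (minute, value) tuples sorted by time.
    A window starting at a sample is flagged when it spans at least
    window_min minutes and its value range stays below threshold.
    """
    suspects = set()
    n = len(samples)
    for i in range(n):
        t0 = samples[i][0]
        j = i
        lowest = highest = samples[i][1]
        # Extend the window while the next sample is within window_min minutes.
        while j + 1 < n and samples[j + 1][0] - t0 <= window_min:
            j += 1
            lowest = min(lowest, samples[j][1])
            highest = max(highest, samples[j][1])
        covers_window = samples[j][0] - t0 >= window_min
        if covers_window and highest - lowest < threshold:
            suspects.update(range(i, j + 1))
    return suspects


# Hypothetical pressure series sampled every 10 minutes and stuck at one value.
stuck = [(10 * k, 1000.0) for k in range(13)]
print(len(persistence_suspects(stuck, threshold=0.1)))
```

The quadratic worst case is acceptable for a sketch; a production check over hundreds of thousands of readings would more likely track the window minimum and maximum incrementally.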
In consideration of the cases in which the observations are influenced by surrounding conditions other than meteorological conditions, we removed all Mini-AWS and AWS observation data deviating from the 3
Various statistical analysis techniques can be used for performance verification; the most important quantitative criterion is accuracy. In this study, performance verification was conducted by comparing the mean absolute error (MAE) and root-mean-square error (RMSE). The RMSE is used as the standard statistical metric when testing model performances in meteorological, atmospheric, and climatological studies; the MAE is also a widely used model evaluation parameter [
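For reference, the two metrics can be computed as in the following sketch. The function names and the sample values are hypothetical; the formulas are the standard definitions of MAE and RMSE.

```python
import math


def mae(predicted, observed):
    """Mean absolute error between paired predictions and observations."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need equally sized, non-empty sequences")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need equally sized, non-empty sequences")
    mean_sq = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    return math.sqrt(mean_sq)


# Hypothetical corrected Mini-AWS values versus reference AWS values.
corrected = [1.0, 2.0, 3.0]
reference = [1.0, 2.0, 5.0]
print(mae(corrected, reference), rmse(corrected, reference))
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE and penalizes occasional large errors more heavily.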
All experiments were cross-validated. Cross-validation is an experimental method for the evaluation of the performance of a supervised learning model. In an
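A minimal sketch of how a k-fold split partitions the data is given below. The number of folds `k`, the contiguous (unshuffled) fold layout, and the function name are assumptions for illustration; they are not settings reported here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical example: 10 samples split into 3 folds.
for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), test_idx)
```

Each sample is used for testing exactly once and for training in the remaining folds; the reported error is then the average over the folds.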
The LR and MLP were carried out using the experimental data and Weka software [
Experimental results of the data divided into Mini-AWSs.
Mini-AWS | Number of data | MAE (LR) | MAE (MLP) | RMSE (LR) | RMSE (MLP) |
---|---|---|---|---|---|
4004 | 297,825 | 0.518 | 0.252 | 0.628 | 0.325 |
4024 | 227,665 | 0.491 | 0.363 | 0.640 | 0.459 |
4025 | 321,815 | 0.390 | 0.310 | 0.523 | 0.399 |
4027 | 8,783 | 0.256 | 0.101 | 0.345 | 0.130 |
4031 | 208,423 | 0.385 | 0.264 | 0.496 | 0.331 |
4032 | 128,439 | 0.329 | 0.180 | 0.438 | 0.230 |
4033 | 252,109 | 0.380 | 0.258 | 0.496 | 0.339 |
4035 | 242,173 | 0.985 | 0.485 | 1.199 | 0.623 |
4036 | 253,600 | 0.395 | 0.239 | 0.524 | 0.311 |
Weighted mean values of the experimental results for the data divided into Mini-AWSs.
ML methods | Weighted mean MAE | Weighted mean RMSE | Average time taken to build the model |
---|---|---|---|
LR | 0.489 | 0.737 | 0.62 s |
MLP | 0.300 | 0.389 | 180.44 s |
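The weighted mean can be illustrated with the per-station LR MAE values listed earlier. The sketch assumes that each station is weighted by its number of data points, which the text does not state explicitly; the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of values with non-negative weights."""
    total_weight = sum(weights)
    if total_weight <= 0 or len(values) != len(weights):
        raise ValueError("need matching sequences with positive total weight")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Number of data points and LR MAE per Mini-AWS, copied from the per-station table.
n_data = [297825, 227665, 321815, 8783, 208423, 128439, 252109, 242173, 253600]
lr_mae = [0.518, 0.491, 0.390, 0.256, 0.385, 0.329, 0.380, 0.985, 0.395]

print(round(weighted_mean(lr_mae, n_data), 2))
```

Under this assumption the result is roughly 0.49, in line with the weighted mean MAE tabulated for LR.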
The SMOreg (regression by SMO) is known to have superior performance with respect to generalization problems. However, the training time increases in proportion to the number of data points. Figure
Time taken to build the SMOreg model as a function of the number of data points.
In an attempt to reduce the SMOreg training time, experiments were conducted using the last 1,500 and 5,000 data points extracted from each Mini-AWS. The experimental results are listed in Tables
Experimental results for 1,500 samples.
ML methods | Weighted mean MAE | Weighted mean RMSE | Average time taken to build the model |
---|---|---|---|
LR | 0.062 | 0.079 | 0.06 s |
MLP | 0.041 | 0.052 | 1.41 s |
SMOreg | 0.057 | 0.077 | 1.51 s |
Experimental results for 5,000 samples.
ML methods | Weighted mean MAE | Weighted mean RMSE | Average time taken to build the model |
---|---|---|---|
LR | 0.087 | 0.111 | 0.17 s |
MLP | 0.056 | 0.071 | 4.39 s |
SMOreg | 0.083 | 0.110 | 32.50 s |
The results in Tables
Considering the different characteristics of the data collected from each Mini-AWS, we created models by Mini-AWSs. Figure
Experimental results for the Mini-AWSs and 1,500 samples (panels: MAE and RMSE).
To improve the results mentioned in Figure
Experimental results for clustered Mini-AWS data.
Mini-AWS | Number of clusters | MAE (LR) | MAE (MLP) | MAE (SMOreg) | RMSE (LR) | RMSE (MLP) | RMSE (SMOreg) |
---|---|---|---|---|---|---|---|
4004 | 7 | 0.066 | | 0.066 | 0.082 | | 0.083 |
4024 | 6 | 0.037 | | 0.037 | 0.046 | | 0.047 |
4025 | 12 | 0.036 | 0.036 | | | 0.047 | 0.049 |
4027 | 4 | 0.032 | | 0.032 | 0.041 | | 0.041 |
4031 | 4 | 0.038 | | 0.038 | 0.049 | | 0.049 |
4032 | 7 | 0.044 | | 0.044 | 0.058 | | 0.059 |
4033 | 2 | 0.103 | | 0.102 | 0.127 | | 0.132 |
4035 | 5 | 0.087 | | 0.079 | 0.111 | | 0.126 |
4036 | 13 | 0.043 | 0.043 | | | 0.057 | 0.055 |
The best MAE and RMSE values for each Mini-AWS are highlighted in bold. MLP gives the best results for all Mini-AWSs except 4025 and 4036. For 4033 and 4035, the LR and SMOreg RMSE values exceed 0.1 hPa; for 4033, the corresponding MAE values do as well.
In the experiments mentioned in Figure
Experimental results for MLP (option
Panels: MAE and RMSE.
The experimental results reveal that SMOreg (option
We present a study aimed at correcting data collected with Mini-AWSs using three different machine learning approaches and atmospheric pressure readings of the nearest AWSs as reference values. The weighted means of the experimental machine learning data divided into Mini-AWSs did not reach WMO standards. In the case of SMOreg, the time taken to build the model increased on a logarithmic scale with increasing number of training data. However, the correction results of the SMOreg implementation with the last 1,500 observation data points sampled with each Mini-AWS, which was conducted additionally in an attempt to reduce the training time, fall within the range of the standard permissible error set by the WMO when EM clustering and SMOreg (option
Sampling and clustering experiments were conducted to improve performance. We used a sampling method that extracts the most recently collected data, and the sampled data gave better results than the full dataset. The smaller the number of samples, the lower the correction error, so we concluded that short learning cycles help to reduce correction errors. However, because the data used for verification were collected over a short period of time, more data must be collected to verify the model properly. In addition, if the method is applied to the collection and correction of real-time data, we need to decide how often a correction model should be rebuilt. Clustering was used to provide additional categorical information on the datasets, and we confirmed that it significantly improved performance. However, it was not easy to determine which characteristics the clustered data share or how they contribute to the performance improvement.
We confirm the feasibility of the error correction method presented in this paper to render Mini-AWS atmospheric pressure data usable as observation data for weather forecasting. However, additional validation is necessary, given the limited data collection period and amount of mobile data. In a follow-up study, additional validation of the Mini-AWS data will be performed taking seasonal and geographical variations into account to test methods and compare errors by applying various preprocessing methods, such as internal consistency tests, in addition to the preprocessing method used in this study. Such studies are expected to continuously enhance the usability of Mini-AWSs and thus contribute to improving the accuracy of numerical weather prediction such as [
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the “Meteorological Industry Support and Utilization Technology Development Program” (no. KMIPA2015-4060), which is one of the Meteorological See-At Technology Development Programs through the Korea Meteorological Industry Promotion Agency (KMIPA) in 2015. This research was also supported by the Information and Communication Promotion Fund through the Ministry of Science and ICT (MSIP) of Korea (C0510-18-1007) and may differ from the official opinion of the MSIP.