Correcting Air-Pressure Data Collected by MEMS Sensors in Smartphones

1 Korea Oceanic and Atmospheric System Technology, No. 1503, 90 Gyeongin-ro 53-gil, Guro-gu, Seoul 152-865, Republic of Korea
2 Department of Computer Science and Engineering, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 139-701, Republic of Korea
3 Department of Computer Engineering, College of Information Technology, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do 461-701, Republic of Korea
4 Observation Research Division, National Institute of Meteorological Science, 33 Seohobuk-ro, Seogwipo-si, Jeju-do 697-845, Republic of Korea
5 Numerical Data Application Division, National Institute of Meteorological Science, 61 Yeoeuidaebang-ro 16-gil, Dongjak-gu, Seoul 156-720, Republic of Korea


Introduction
In densely populated urban areas, environmental and meteorological data is essential to accurate decision-making on important socioeconomic issues, such as demographic changes, healthcare, food supply, security, conflict, and natural disasters. Disastrous weather events, such as localized torrential rains, gusts of wind, extreme temperatures, and rising sea levels, are reminders of the importance of high-resolution data on the ambient environment, which manifests as fine-scale climatic features [1][2][3]. However, an automatic weather station (AWS) operated by a public institution is limited insofar as it can only provide very short-term (within one hour) weather forecasting [4][5][6], owing to the high social costs associated with installation and maintenance, which require large-scale investments of both time and money. Studies on high-resolution meteorological observation and forecasting have been conducted in the United States, Japan, and South Korea by developing portable meteorological equipment, although installing such equipment over a large area poses significant regional and economic challenges [7][8][9].
Smartphones have recently become available with embedded microelectromechanical systems (MEMS) sensors for air pressure, temperature, and humidity, which are used to correct global positioning system (GPS) altitude data. Several Android-based smartphones, in particular, are designed to collect meteorological data (Appendix A). With the increasing ubiquity of such smartphones, the higher the population of an urban area, the more densely distributed the available meteorological data. This study aims to develop a correction method that minimizes the errors in meteorological data collected by smartphones vis-à-vis reference data stored by the Korea Meteorological Administration (KMA). To do so, we developed an app (named Yeowoobi, which means sun shower in Korean) [10] that is capable of collecting and storing data observed by various weather sensors embedded in Android-based smartphones, and we asked participants to use the app to collect meteorological data in June 2013. To our knowledge, this is the first study on correcting air-pressure data collected by smartphones, although guidelines and studies are available for correcting errors from public weather stations. Earlier studies [11][12][13] attempted ambient temperature analysis using battery temperatures monitored by smartphones [14]. Unlike static weather stations, smartphones continually move from one place to another, and they are exposed to heating and cooling devices, the user's body temperature, and a changing external environment, including spaces within automobiles or trains. Because such factors influence the smartphone's sensors, the data cannot be used directly for weather forecasting. However, the data collected with the existing infrastructure in smartphones can be used as a low-cost auxiliary resource to provide information about the atmospheric environment in terms of fine-scale meteorological phenomena. To make this data more useful, it must be adequately corrected. The suggested permissible error range is ±0.5 hPa, as specified in KMA announcement number 2010-5 on the standard for the automatic weather observation system (AWOS), which was provided by the KMA in 2011 [15]. The correction is done by performing a targeted quality control (QC) using preprocessing, statistical analysis, context awareness, and machine learning [16]. This study demonstrates the feasibility of using meteorological data collected by smartphones as an auxiliary resource for weather stations by analyzing and comparing the accuracy of the data with stored data from public weather stations.
The remainder of this paper is organized as follows. Section 2 describes the data used in this study, namely, smartphone data, data from public weather stations, and data from digital elevation models (DEMs). It also explains the preprocessing step for the proposed method, which involves the removal of mechanically derived outliers, a reduction to the mean sea level pressure (MSLP), and the removal of outliers lying outside 3σ. Section 3 explains the linear-regression analysis and statistical values. Section 4 describes the classification of locations by weather station, the inclusion or exclusion of temperature and humidity, enabling and disabling personalization, and the linear-regression analysis results depending on the user's mobility. In Section 5, the results of the study are summarized and discussed. Finally, Section 6 presents the direction of future research.

Data Collected by Smartphones.
The following data is collected from the general public by a mobile app called Yeowoobi: time of data acquisition, the user ID (encrypted), temporal information (year, month, day, hour, and minute), latitude (degrees), longitude (degrees), altitude (m), air pressure (hPa), temperature (°C), humidity (%), accuracy, speed (m/s), and mobile terminal information (Appendix B). Yeowoobi reads and transmits the air-pressure, temperature, and humidity sensor values at a default interval of 10 minutes. Users can change this setting to suit their needs, taking into account battery consumption, data-transfer costs, and notifications of abrupt changes in air pressure, by choosing one of nine observation intervals ranging from one minute to three hours. Some screenshots of the Yeowoobi app are given in Appendix C.

Data Collected by Public Weather Stations.
The AWS run by the KMA has been using the Automated Surface Observing System (ASOS) data as public meteorological (PM) data. The PM data comes from 692 public weather stations spatially distributed and active across the country as of November 2014. Only 256 (37%) of these stations collect air-pressure data, since only an AWS installed after 2007 includes an air-pressure sensor in addition to the conventional sensor arrangement of temperature, precipitation, rainfall occurrence, wind direction, and wind speed. Humidity has also been available since 2010. The nominal observation interval for each reading is one minute. The spatial resolution of the public weather station network is approximately 36 km for the ASOS and approximately 13.5 km for the AWS [15]. Figure 1 shows a sample plot of air-pressure data from two smartphones and an AWS observed over one month (August 2014) at a station.

DEM Data.
The following data was used in this study: (i) smartphone data, specifically the meteorological data collected with smartphones via Yeowoobi between January 1, 2014, and August 31, 2014 (240 days); (ii) PM data, that is, the meteorological data stored in the KMA (AWS and ASOS data) from the same period; and (iii) DEM data, that is, altitude data at 30 m × 30 m resolution [17]. When app users are inside a building or underground, elevation information is not correctly collected by the GPS in smartphones. As a result, elevation information was available for only about 10 percent of the total collected data. We therefore used the elevation information from the DEM data to obtain MSLP values for the air-pressure data observed by smartphones. Excluded from the analysis were the smartphone and PM data collected from the spatial range of some public weather stations containing data flagged as having abnormal values after QC.

Data Scale.
In Figure 2, the locations of the data-collecting smartphones and public weather stations are plotted on a map. The total number of public weather stations was 692, as of October 2014, of which 217 stations are found in the area covered by the smartphone data. Of the total number of locations for smartphone data collection (787,200), 162,387 were found to be spatially distributed, as shown in Figure 2(a), when the latitude and longitude (unit: degree) were recalculated to three decimal places. In the plotted distribution of smartphones shown in Figure 2, the number of smartphone data items collected across the country was 2,654,548 (a larger number than the number of locations, because different data can be collected from the same location at different times), of which over 50% (1,470,818) was distributed in Gyeonggi-do (including Seoul). Locations with high data density other than Gyeonggi-do were big cities with high population densities, such as Busan, Daejeon, Daegu, and Gwangju. By contrast, mountainous regions in Gangwon-do (in the northeast) and the Namhae plain region (in the southwest) yielded a smaller volume of smartphone data.
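The recalculation of coordinates to three decimal places can be sketched as follows. This is only an illustration under two assumptions that the text does not confirm: that "recalculated" means rounding (rather than truncation), and that locations are then counted as distinct coordinate pairs. The function name `distinct_locations` is hypothetical. For scale, 0.001 degrees of latitude corresponds to roughly 111 m.

```python
def distinct_locations(coords, ndigits=3):
    """Return the set of distinct (lat, lon) pairs after rounding.

    coords: iterable of (latitude, longitude) pairs in degrees.
    ndigits: number of decimal places kept (3 in the text above).
    Assumption: rounding is used; the paper only says "recalculated".
    """
    return {(round(lat, ndigits), round(lon, ndigits)) for lat, lon in coords}


# Two nearby readings collapse into one location; the third stays separate.
readings = [
    (37.56612, 126.97801),
    (37.56649, 126.97783),
    (35.17955, 129.07564),
]
print(len(distinct_locations(readings)))
```

Several readings taken at one place at different times therefore count once as a location but several times as data items, which is why the number of data items exceeds the number of locations.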
As for the temporal distribution, the number of smartphones and the volume of collected data were extremely low between January and May 2014. The volume of data from smartphones increased sharply between mid-June and late July, and from August onwards the number of smartphones and the volume of collected data steadied at approximately 600 per day and 47,000 per day, respectively (Figure 3).
Among the approximately 20 models of Android-based smartphones with embedded meteorological sensors for air pressure, temperature, and humidity, 17 models were used to collect the meteorological data (Appendix B). Whereas all of them could measure the air pressure, only a few had embedded sensors for temperature and humidity (namely, Galaxy 4, Galaxy Note 3, and Galaxy Round), with a data-acquisition rate as low as 35%. Speed and elevation data accounted for only 10% of the air-pressure data (Table 1). Smartphones acquire location information via the internet, GPS, or both. Those using the internet adopt both the Wi-Fi transfer mode and the locations of peripheral communication stations as a reference standard. With GPS, speed and altitude cannot always be calculated. In fact, speed and altitude could be calculated via GPS-mediated communication for only 10% of the meteorological data.
During the data collection period, a total of 2,934,718 data items were collected from 3,096 smartphone users.After removal of the data found to have abnormal values when checked against the equivalent data from the nearest public weather station, 2,654,548 (90.5%) of the smartphone data items were used in the final analysis.

Preprocessing: Quality Control (QC)
2.5.1. Physical Limit Test. According to the general meteorological standards from the World Meteorological Organization (WMO) [18], air-pressure values lower than 500 hPa or higher than 1,080 hPa are specified as abnormal. We removed these abnormal values from the smartphone data in accordance with this standard.
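A minimal sketch of the physical limit test is given below. The bounds are those stated above; the function and constant names are hypothetical, and treating the boundary values themselves (exactly 500 or 1,080 hPa) as valid is an assumption.

```python
# Bounds from the WMO-based physical limit test described in the text.
PRESSURE_MIN_HPA = 500.0
PRESSURE_MAX_HPA = 1080.0


def passes_physical_limit(pressure_hpa):
    """Return True if the reading is not lower than 500 hPa or higher than 1,080 hPa."""
    return PRESSURE_MIN_HPA <= pressure_hpa <= PRESSURE_MAX_HPA


def physical_limit_filter(readings_hpa):
    """Keep only the air-pressure readings that pass the physical limit test."""
    return [p for p in readings_hpa if passes_physical_limit(p)]


sample = [1013.2, 499.9, 1080.0, 1250.0, 500.0]
print(physical_limit_filter(sample))
```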

Reduction to Mean Sea Level Pressure (MSLP).
The equation used for the reduction to the MSLP in this study is given in (1) [19], where $P_0$ is the sea level pressure (hPa), $P$ is the measured pressure (hPa), $h$ is the altitude obtained from the 30 m × 30 m DEM (m), and $T$ is the temperature (°C).
Based on the smartphone's location information (i.e., the latitude and longitude), the DEM (30 m × 30 m) altitude data ($h$), and the air temperature ($T$) at the nearest public weather station, the data that remained after purging mechanical errors was reduced to the MSLP.
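The reduction step can be illustrated with a short sketch. Equation (1) itself is not reproduced in this text, so the formula below is an assumption: it is one widely used reduction that takes the same three inputs (measured pressure, altitude, and temperature in °C) with a standard lapse rate of 0.0065 K/m, and it may differ from the form given in [19]. The function name is hypothetical.

```python
def reduce_to_mslp(pressure_hpa, altitude_m, temperature_c):
    """Reduce a measured pressure to mean sea level pressure (hPa).

    Assumed form (not taken from the paper):
        P0 = P * (1 - 0.0065*h / (T + 0.0065*h + 273.15)) ** -5.257
    where P is the measured pressure (hPa), h the altitude (m),
    and T the air temperature (degrees Celsius).
    """
    lapse_term = 0.0065 * altitude_m  # standard lapse rate (K/m) times altitude
    ratio = 1.0 - lapse_term / (temperature_c + lapse_term + 273.15)
    return pressure_hpa * ratio ** -5.257


# A reading of 1000 hPa taken 100 m above sea level at 15 degrees Celsius.
print(round(reduce_to_mslp(1000.0, 100.0, 15.0), 2))
```

At zero altitude the function returns the measured pressure unchanged, and the correction grows with altitude, which is why an accurate altitude source such as the DEM matters for this step.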

Removal of Outliers Existing outside 3σ.
Smartphones are exposed to a number of factors that cause artificial air-pressure changes as users move from one place to another using various means of transport (e.g., by driving along the highway, on high-speed trains, and in elevators). Figure 4 shows the distribution of the air-pressure data of a representative user and of the AWS at two representative stations (108 and 410) during one month (August 2014). The distribution was quite similar to a normal distribution. Thus, all data whose values deviated from the mean by more than three times the standard deviation (SD; σ) of the total smartphone air-pressure data (σ = 5.647) were considered abnormal and consequently removed. The data that remained after the 3σ test consisted of 2,636,328 data items, or 99.31% of the 2,654,548 data items collected; that is, 18,220 items (0.69%) were eliminated.
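The 3σ test can be sketched as below. This is an illustration only: it assumes the deviation is measured from the mean of the same data set and that the population SD is used, neither of which the text specifies. The function name is hypothetical.

```python
import statistics


def three_sigma_filter(values, k=3.0):
    """Keep values lying within k standard deviations of the mean.

    Assumptions: deviations are taken from the sample mean, and the
    population standard deviation (pstdev) is used.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= k * sd]


# Twenty ordinary readings and one artificial spike (e.g., from an elevator ride).
pressures = [1013.0] * 20 + [1100.0]
kept = three_sigma_filter(pressures)
print(len(pressures) - len(kept), "outlier(s) removed")
```

Note that with very small samples a single spike inflates the SD enough to stay inside 3σ, so the test is meaningful mainly on large data sets such as the one used here.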

Linear-Regression Analysis
We compared the mean absolute error (MAE) (4) and the root mean square error (RMSE) (5) using linear-regression analysis in the WEKA (Waikato Environment for Knowledge Analysis) suite [20]. For the linear-regression analysis, we used the MSLP from the temporally and spatially nearest public weather station as the true value. As preprocessing, we used attribute selection with M5's method (stepping through the attributes and removing the one with the smallest standardized coefficient until no improvement is observed in the estimate of the error given by the Akaike information criterion) and a greedy selection using the Akaike information metric [21]. We tested our linear-regression method through 10-fold cross-validation.
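The evaluation procedure can be sketched in plain Python. This is not the WEKA implementation: it is a simplified illustration with a single predictor (for example, the smartphone MSLP) and no attribute selection, and it pools the held-out predictions of all folds before computing the errors, which is an assumption. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx if sxx else 0.0  # constant predictor: fall back to the mean
    return mean_y - w1 * mean_x, w1


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate(xs, ys, k=10, seed=0):
    """k-fold cross-validation; returns (MAE, RMSE) over all held-out items."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        w0, w1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            y_true.append(ys[i])
            y_pred.append(w0 + w1 * xs[i])
    return mae(y_true, y_pred), rmse(y_true, y_pred)


# Synthetic example: station MSLP as an exact linear function of phone MSLP.
phone = [1000.0 + 0.5 * i for i in range(50)]
station = [0.98 * x + 21.0 for x in phone]
print(cross_validate(phone, station))
```

Each item is predicted exactly once by a model that never saw it during fitting, so the reported errors estimate how the correction would perform on new smartphone readings.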
After calculating the weights from the training data and estimating each training data item, as per (2), we obtain the linear-regression equation by selecting the weights that minimize the training data's error, as per (3). The resultant values from the test data are obtained by calculating (4) and (5).
Tables 2 and 3 show similarities between the patterns of meteorological data from Gyeonggi-do's sample area and those from the entire country. We generated a dataset for linear-regression analysis by matching the smartphone data (i.e., the observation time, latitude, longitude, air pressure, and DEM data) with the PM data (i.e., the observation time, latitude, longitude, MSLP, altitude, and the SD of the distance from each smartphone) in the Gyeonggi-do area (latitude: 36.394-38.283, longitude: 126.379-127.858).
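Equations (2) to (5) are not reproduced in this text. For orientation, the standard forms of a linear-regression estimate, its least-squares weights, and the two error measures are written out below. The notation is an assumption and may differ from the original: $x_{ij}$ is the $j$-th attribute of data item $i$, $y_i$ is the true value (the station MSLP), $\hat{y}_i$ is the estimate, $w_j$ are the weights, $k$ is the number of attributes, $n$ is the number of training data items, and $m$ is the number of test data items.

```latex
\begin{align}
\hat{y}_i &= w_0 + \sum_{j=1}^{k} w_j x_{ij} \tag{2} \\
\mathbf{w} &= \arg\min_{\mathbf{w}} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{3} \\
\mathrm{MAE} &= \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right| \tag{4} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2} \tag{5}
\end{align}
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE and grows faster when a few large errors are present, which is consistent with the RMSE exceeding the MAE in every result reported in this paper.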

Classification by Public Weather Stations.
We performed a linear-regression analysis on the PM data's MSLP from the dataset described in Section 2.5.3. Table 4 presents the mean of the results of the linear-regression analysis in the Gyeonggi-do area and the results of the linear-regression analysis from the data obtained from all public weather stations.
The linear-regression analysis for all of the data in the Gyeonggi-do area yielded MAE and RMSE values of 1.58 and 2.05, respectively. When the same data was first grouped by public weather station and then analyzed, these values decreased by 0.51 and 0.57, respectively (to 1.07 and 1.48).
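The grouping step can be sketched as follows: records are split by their nearest public weather station and a separate regression is fitted for each group. This is a simplified illustration with one predictor that reports the training MAE per station rather than a cross-validated value; the record layout and function names are hypothetical, and the numbers are synthetic.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx if sxx else 0.0  # constant predictor: fall back to the mean
    return mean_y - w1 * mean_x, w1


def per_station_mae(records):
    """Fit one line per station and return {station_id: training MAE}.

    records: iterable of (station_id, phone_mslp, station_mslp) tuples.
    """
    groups = defaultdict(list)
    for station_id, phone_mslp, station_mslp in records:
        groups[station_id].append((phone_mslp, station_mslp))
    result = {}
    for station_id, pairs in groups.items():
        w0, w1 = fit_line([x for x, _ in pairs], [y for _, y in pairs])
        result[station_id] = sum(abs(y - (w0 + w1 * x)) for x, y in pairs) / len(pairs)
    return result


# Synthetic records: the two stations have different offsets, so a single
# pooled line cannot fit both groups, while per-station lines can.
records = [
    ("108", 1000.0, 1001.0), ("108", 1010.0, 1011.0), ("108", 1005.0, 1006.0),
    ("410", 1000.0, 998.0), ("410", 1010.0, 1008.0), ("410", 1004.0, 1002.0),
]
print(per_station_mae(records))
```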

Classification Reflecting Temperature and Humidity.
The proportion of data in the Gyeonggi-do area that included air pressure, temperature, and humidity was only approximately 35%. We therefore performed a separate linear-regression analysis for data containing air-pressure information exclusively and compared it with that for data containing temperature and humidity information as well (Table 5).
There were 930,883 data items measuring air pressure exclusively (63.29% of the total data) and 532,946 data items that included pressure, temperature, and humidity (36.23% of the total data). Although the data with all three items amounted to only 57% of the volume of the air-pressure-only data, the linear-regression analysis resulted in a decrease in the MAE and RMSE of 0.23 and 0.26, respectively. Moreover, the linear-regression analysis for the data containing all three items, after classification by public weather station, resulted in a considerable decrease in both the MAE and RMSE.

Classification by Public Weather Stations and Personalization.
We performed linear-regression analysis on the air-pressure data after classifying them by public weather station and by user. The results for the Gyeonggi-do area showed only slight differences in the mean MAE between Station A (108) and Station B (410); that is, the mean value did not differ significantly between Stations A and B. The total number of data items (TN) reveals that Stations A and B are representative areas, accounting for approximately 20% and 10% of the data in the entire Gyeonggi-do area, respectively, and that the MAE values did not differ much across the area around Gyeonggi-do. The correlation analysis of the MAE and RMSE by smartphone user yielded correlation coefficients of 0.18 for the MAE and 0.17 for the RMSE with respect to the SD in distance from the public weather stations, and 0.32 and 0.33, respectively, with respect to the SD in pressure (see the correlation values in Figure 5). In other words, the MAE and RMSE correlated more strongly with pressure differences than with distance differences.

Comparison of Errors according to User Mobility.
Figure 6 illustrates the mobility patterns over time for users with high mobility compared with those with low mobility at public weather Stations A and B. The blue dots represent users with high mobility (Users 1 and 3), and the red dots represent those with lower mobility (Users 2 and 4). Table 7 presents the results from the linear-regression analysis for each user, wherein the patterns are apparent for the MAE and RMSE with respect to the air pressure at the public weather stations.
For the high-mobility users, the MAE (RMSE) values were 1.00 (1.42) at Station A and 0.47 (0.89) at Station B. For the low-mobility users, they were 0.22 (0.37) and 0.09 (0.11) at Stations A and B, respectively. This implies that, at the same location, errors in air-pressure measurements increased with user mobility.

Discussion
Most Android-based smartphones are able to measure air pressure. Unlike temperature and humidity, air pressure is less influenced by the man-made environment (indoor/outdoor or air-conditioned/heated). The results of this study revealed that errors in the meteorological data from smartphones tend to decrease under the following conditions: when temperature and humidity are observed concurrently and when users with similar moving patterns are compared (presumably because of similar means of transport). By correcting the errors resulting from these factors, we could verify the feasibility of using the air-pressure data collected by smartphones as an auxiliary resource for public weather stations. As such, the proposed method contributes to enhancing forecasting accuracy by providing high-resolution meteorological data in countries or regions with a sparse distribution of public weather stations, where such data is otherwise difficult to obtain because of high installation and maintenance costs. The results show that smartphone-based meteorological data with a minimal correction algorithm has the potential to improve high-resolution weather forecasting where precise short-term meteorological observations are required, such as at sporting events.

Future Research Directions
In this study, we presented a new method to correct errors in air-pressure data collected with smartphones, by comparing the errors in air-pressure data from nearby public weather stations after classification according to relevant factors, such as the presence or absence of temperature and humidity data, personalization, and the mobility of individual users.
In future research, we plan to focus on the following: (i) comparing the day-time and night-time mobility of users, (ii) comparing errors at different speeds, using the location information in smartphones to verify mobility, (iii) comparing data with and without altitude data acquired from smartphones in addition to DEM data, (iv) comparing various machine-learning techniques in addition to linear-regression analysis, (v) directly comparing the smartphone pressure with a WMO-approved pressure sensor as a reference, and (vi) comparing various preprocessing steps, such as the time-consistency test (step test) and persistence test, as well as the physical limit test and 3σ test used in this study. There are several problems in directly applying the step test and persistence test used for the AWS to smartphone data. One of the main problems is that the time interval at which smartphone data is collected is irregular (it varies from one minute to three hours). When the time interval is too large or changes over time, it is not easy to apply the two QC tests to smartphone data. However, their successful application to smartphone data is valuable and necessary to improve the correction quality.
We also intend to explore other comparative methodologies and to validate them. Using these methods, we aim to improve the correction ability of the proposed method with regard to the meteorological data from smartphones and to verify the possibility of using such data as additional meteorological data for high-resolution, short-term weather forecasting.

Figure 1: Sample plot of air-pressure data of two smartphones and AWS observed during a month (August 2014) at Station 108.

Figure 2: Locations of the collected smartphone data (a) and public weather stations (b).

Figure 3: Quantity of smartphones and data collected.

Figure 4: Distribution of air-pressure data of a representative user (a) and AWS (b) at two representative stations (108 and 410), during a month (August 2014).

Figure 5: Distribution plots, indicating the MAE ((a) and (c)) and RMSE ((b) and (d)) according to the SD in distance ((a) and (b)) and pressure ((c) and (d)) from the public weather stations by smartphone user (limited to the cases in Gyeonggi-do for data items exceeding 1,000).

Figure 6: Time-dependent location distributions of users with high mobility and those with low mobility at the representative public weather Stations A (a) and B (b): Station A (range: latitude 37.53-37.71 degrees, longitude 126.95-127.17 degrees) and Station B (range: latitude 37.46-37.56 degrees, longitude 126.84-126.95 degrees).

Table 1: Scale and rate of the meteorological data collected by smartphones.

Table 2: Results from the linear-regression analysis throughout South Korea, comparing the result before outlier removal with that after outlier removal.

Table 3: Results from the linear-regression analysis in Gyeonggi-do, comparing the result before outlier removal with that after outlier removal.
MAE: mean absolute error, RMSE: root mean square error, and TN: total number of instances.

Table 4: Results of the linear-regression analysis for all collected data in Gyeonggi-do and the data classified according to the locations of the public weather stations in the same area.
MAE: mean absolute error, RMSE: root mean square error, and TN: total number of instances.

Table 5: Results of the linear-regression analyses for air pressure exclusively and results that include temperature and humidity, along with air pressure, at public weather stations.

Table 6: Comparison of the results from the linear-regression analysis of the cases exceeding 1,000 data items and of the public weather Stations A (108) and B (410).
MAE: mean absolute error, RMSE: root mean square error, and TN: total number of instances.

Table 7: Results from the linear-regression analyses of high-mobility users (Users 1 and 3) and low-mobility users (Users 2 and 4) among the smartphone users located close to the public weather Stations A and B.

Table 8: List of the smartphone models used in the study for data collection and their respective meteorological sensors.

Table 9: Data collected by smartphones and pertinent details.