Data Privacy in Wearable IoT Devices: Anonymization and Deanonymization

With the development of IoT devices, wearable devices are being used to record various types of information. Wearable IoT devices are attached to the user and can collect and transmit user data at all times along with a smartphone. In particular, sensitive information such as location information has an essential value in terms of privacy, and therefore some IoT devices implement data protection by introducing methods such as masking. However, masking can only protect privacy to a certain extent in logs having large numbers of recorded data. However, the effectiveness may decrease if we are linked with other information collected from within the device. Herein, a scenario-based case study on deanonymizing anonymized location information based on logs stored in wearable devices is described. As a result, we combined contextual and direct evidence from the collected information. It was possible to obtain the result in which the user could effectively identify the actual location. +rough this study, not only can a deanonymized user location be identified but we can also confirm that cross-validation is possible even when dealing with modified GPS coordinates.


Introduction
With the development of IoT devices, the intimacy between device users and devices has increased. Previously, IoT devices were strongly perceived as being complex and difficult to use. Typical examples are wearable IoT devices, intelligent buildings, and smart cities. However, there are pros and cons to the development of such devices. Although IoT technology has made people's lives easier and more prosperous, there may be privacy issues with the information collected through such IoT devices [1]. In particular, sensors possessed by IoT devices will become more refined and diversified, and more diverse information will be exchanged through high-performance networks. Under this situation, in the case of a wearable device attached to a person 24 h a day, there is a much greater possibility of infringing on privacy. In line with this idea, wearable device manufacturers try to collect only the minimum amount of data necessary for a service by using anonymization techniques to prevent possible invasion of privacy of the data they collect [2]. In this paper, a method for deanonymizing anonymized location information is described using a scenario studied based on an IoT device investigation. rough four scenarios, the location information, which is sensitive information among the personal information of users, was collected or reprocessed for the development of meaningful data. e structure of this thesis is as follows: Section 1 describes the direction of this research. In Section 2, we describe related studies on data analysis and data privacy processing methods for wearable IoT devices. Section 3 describes the experimental methodology used. In Section 4, case studies conducted based on the methodology described in Section 3 are detailed. Section 5 presents the significance and limitations of the experiments and follow-up studies and provides some concluding remarks.

Related Works
IoT devices are closely related to available devices and their need in people's lives; therefore, their analysis is becoming more critical. In particular, in the case of a wearable device, a user carries the device along with a smartphone at all times, which collects various logs. Based on this perspective, many studies have investigated the internal information of wearable devices.
In particular, Kasukurti and Patil [3] presented a methodology for investigating wearable devices along with a case study. In this study, data such as geolocation information, the user's physical and health information, activity logs, account details of social media interaction, calendar details, media files, key generation mechanisms, and Key-Gen logs can be gathered. We can assert the practicality of a wearable device based on data.
In [4], an investigation method was conducted on three devices: Fitbit, Garmin, and HETP watches, all of which are popular low-cost devices.
e methods for collecting forensic data through these devices are sound, and the identification of files of interest and location data on such devices has been studied.
In [5], the authors conducted a case study on a TomTom Spark 3 device. After installing the application on a companion Android-based device, the authors confirmed that data such as the device information, past user activities, and audio files could be acquired through an analysis.
Yoon and Karabiyik [6], Almogbil et al. [7], and Almogbil et al. [8] conducted a study using Fitbit, which is a wearable device. Fitbit supports functions such as the GPS itself. In addition, there is a possibility of manipulated or altered GPS-tracked activities, and thus there is a need for an open-source tool for dealing with sensitive information such as social media message notifications, credit card information, and health-related data. Fitbit saved this kind of data as plaintext. However, some other devices have stored sensitive information with masking or encryption to protect user privacy.
In [9], the authors researched sensitive information stored in wearable devices. ey used a Samsung Gear S3, which is also a device used in a study by Becirovic and Mrdovic [10]. e authors confirmed that some data, including SMS messages and information, are stored on companion devices, determining the device owner's contextual environment. Baggili et al. [11] and Hassenfeldt et al. [12] also established a research methodology for wearable devices.
To understand the subtle nature of an insider threat, this paper reviews previous studies in this area. It examines the different types of insider threats based on insider characteristics and activities. Moreover, it explores sensors that can detect insider threats in an automated manner and the public datasets available for research. Finally, it examines the detection approaches used in related studies from various perspectives. In particular, IoT devices are the main threat to insider detection [13].
Because mobile devices are used in various areas, the risk of cyberattacks targeting them is becoming critical. In this paper, a threat intelligence evaluation is proposed for mobile malware from the viewpoint of situational awareness through the extraction of features that can detect Android malware using machine learning [14].
Location spoofing is a problem in mobile devices. In the present experiment, the authors used the Bluetooth discovery functionality to collect information about nearby devices and learn about the surrounding environment, which can be used to verify the genuineness of the GPS data. From this perspective, we are trying to discover masked location GPS data [15].
Because the problem of SCA in IoT devices has increased, the trends of SCAs used in such devices are introduced in this paper, and the urgency of developing countermeasures to single-trace attacks that only use sidechannel information is suggested [16].
In addition, we consider deanonymization for a broader approach. is paper proposes an inference attack that can reidentify anonymous mobility data. e attack is based on a mobility model called the mobility Markov chain (MMC). Gambs et al. designed several distances between MMCs to evaluate their impact on anonymization. Experiments on real datasets demonstrate the efficiency of the attack. e results showed that anonymizing mobility data is a difficult task [17].
ese studies collected various types of data, and the research methodologies described wearable devices. In some cases, devices such as Fitbit store sensitive information using plain text with a high risk of privacy violation, whereas devices such as Samsung Gear S3 partially mask the information using GPS. is study attempted to re-identify attempts to process such sensitive information, such as through masking.

Scenario.
We used a specific scenario for the case study because the primary target data in the study are location information, which establishes a situation in which location information can be sensitive. We got inspired from a famous cyber competition in South Korea. With the outbreak and spread of COVID-19 in 2021, the concept of a "self-quarantine" began to emerge. In South Korea, a law was enacted by which a person suspected of being infected with COVID-19 cannot leave the house for two weeks and must have their symptoms observed [18]. Under this scenario, Alice wants to go to the movie theatre to eat something using only a wearable device while leaving her smartphone at home. Law enforcement authorities can investigate such cases and track the self-quarantine location. Figure 1 shows some content related to the introductory part of the scenario. Based on this situation, an attempt was made to re-identify the masked location information in the wearable device.

Experimental Environment.
e device used for the experiment was a Samsung Gear S3, and information about its companion device is shown in Figure 2. As previously confirmed in the related works section, we confirmed that some devices such as the Fitbit store sensitive location information such as GPS in plain text. e related information can be checked, as shown in Figure 3. Moreover, we executed a python-based anonymizer to generate masked sensitive anonymized data.

IoT Device Investigation.
e device investigation used in this study utilized the tools applied in digital auditing. e Samsung Gear S3 used in the scenario applies to Tizen OS based on Android. e most commonly used file format for examining Android devices is the SQLite format [19]. In addition, various third-party applications running on Tizen OS use WebView independently [9]. To this end, we used the WEFA tool to analyze web-related artifacts and cache data. In addition, we conducted an investigation and deanonymization of wearable IoT devices and used basic Linux commands such as cat, grep, and awk for a log analysis [20].
In addition, before proceeding with the case study, classification was carried out using the concept of generating and storing evidence based on the Criminal Procedure Act of the Republic of Korea [21]. Generic evidence is data automatically generated by a system or application and is evidence with relatively little human intervention. Typical examples of its creation include web cache logs, system logs, and event-related logs. Archival evidence is data created to express a person's thoughts or feelings, representative examples of e-mails, SMSs, posts, or photographs. In this study, the generated evidence was directly related to the user's location information. is is called direct evidence. e archived evidence is indirectly related to the user's location information; therefore, only contextual evidence can be known.   e process of the IoT device investigation is shown in Figure 4. We considered masking-based anonymization techniques and encryption, which make it harder to identify specific users.

IoT Data Acquisition Investigation.
e Android Debug Bridge (ADB) was used to acquire data from the Galaxy Gear S3. Because Tizen OS has components of Android, data can be collected using the pull function of the ADB [22]. e collection was generated only under the/data folder related to the user's behavior and the integrity of the collected data, which we can ensure using ad1 of FTK Imager [23].

Contextual Evidence (Personal E-Mail).
Based on this scenario, we assumed that Alice escaped self-quarantine to meet Charlie; it is shown in Figure 5. We checked this type of data through an e-mail record. We determined the e-mail log using an SQLite database labeled/com.samsung.wemail/ data/dbspace/.wemail.db. Alice received that e-mail and asked whether she had left her self-quarantine to go to a movie theatre for something to eat, which remains the limit of the contextual evidence. However, she later stated that she could serve as background knowledge that she could use direct evidence.

Contextual Evidence (Map Application and Web Browser).
As shown in Figure 6, we confirmed that the device owner executed the GoogleMaps application by checking the given wearable device. An artifact labeled/usr/home/owner/applications/dbspace/.context-app-history.db contains the history of the applications executed in TizenOS in SQLite format.
In addition to the previously checked wemail, we can check the execution traces related to the Gearbrowser and Google Map. Google Map-related information can be found in/usr/home/owner/apps_rw/Eb12CGjuFc and/usr/apps/ Eb12CGjuFc. After checking the directory structure before running the analysis, we can estimate that it operates based on chromium by checking the folder called chromium-efl, as shown in Figure 7 [24].
Next, as a result of a string search for files in the folder using the command, data estimated as GPS coordinates could be obtained in Figure 8.
In addition, by examining the cache data using Chro-meCacheView, as shown in Figure 9, it is possible to check the search history of some GPS coordinates (37.535180, 126.903378) and the search near subway station.
As a result of checking the search history and string, traces related to CGV, a Korean movie brand, were found in Mok-dong. We cannot provide direct evidence due to contextual evidence's limitation, but it can act as background knowledge for use as future direct evidence.

Direct Evidence (Anonymized GPS Coordinates).
e GPS data recorded in the device, that is, the system log related to the coordinates, leave a record of the location of the actual device, independent of user actions, such as the map, mail application, and web applications, which we looked at previously. We found GPS-related coordinates on TizenOS Gear S3 in/usr/data/location/dump_gps.log. Only those corresponding to the coordinate data were extracted separately using regular expressions and grep. e extraction results are as shown in Figure 11.
In Gear S3, we have seen that every even number of digits of the GPS coordinates is masked. rough the analysis of 4.1 and 4.2, we can infer that the relevance to Seoul, Korea, is high. In Seoul, Korea, the range is from longitude 37.715133 to 37.413294 and latitude 126.734086 to 127.269311. We observed that the first masked numbers of latitude and longitude were 7 and 2, respectively.
Because the GPS log is data from the GPS sensor, there may be errors in the GPS values depending on the surrounding environment and situation [25]. To prevent risk owing to such errors, we attempted deanonymization using data up to the second masking, which is far from the error range. As a result of checking the statistics based on the masked number, we observed almost no change in latitude and longitude except on 5/22, the day in which the individual escaped self-isolation, as shown in Figure 12.
In particular, checking the GPS masked based on the e-mail timeframe (after 2 pm) conducted in 4.1 on May 22, the individual was estimated to have escaped self-isolation, which we confirmed through the highest number occurring for the situation about Alice. We confirmed this through contextual evidence. e related images are shown in Figure 13.
It is not easy to specify the original GPS log owing to the masking of the original system log. However, we succeeded in deanonymizing the location visited by the actual user by correlating the information on the country where the incident occurred.

Direct Evidence (WI-FI SSID Names).
We can check the logs related to the Wi-Fi connection in/usr/data/snlp/ snlp_dump on TizenOS. In particular, a list of searched APs related to the Wi-Fi search is shown. In many stores and subway stations, the name of the Wi-Fi connection is the name of the store or subway station. In general, the Wi-Fi range is approximately 50-100 m. We can estimate that the user's device exists within 1 of 100 locations, the location of which can be determined using the store name or subway station that appears on the SSID [26]. We can infer this from the current location through/usr/data/snlp/snlp_dump, which records the SSID search result, as shown in Figure 14. In addition, awk and grep were used to extract related logs from/usr/data/snlp/ snlp_dump. Subsequently, cases that could identify the actual location using heuristics were confirmed. e first case is the location identified when contextual evidence is used with other evidence. Under this scenario, we confirmed through the Wi-Fi search, log-based on circumstantial evidence that we determined through the          Security and Communication Networks .pref of Google Maps, that the device exists in the corresponding location, which indicated transportation infrastructure (KTX), as shown in Figure 15. At approximately 17 : 34, we inferred that the device was passed near Kwanghwamoon Station. However, the error range was too broad with a masked GPS log, and we could not conduct cross-validation. In the deanonymization method using the SSID name, we found that the location specification was not as precise as in the anonymized GPS data. However, we found a case in which location identification was sufficiently possible when combined with contextual evidence.

Conclusions
is paper presented a deanonymization method for location information in an anonymized wearable IoT environment. As expected from the existing related studies on wearable devices, we checked various records related to user behavior and a system log. We conducted a case study using these user behavior data by building a scenario that could occur with an actual IoT device in the current COVID-19 environment, which is a hot topic.
In the case study, we divided the data collected from a Samsung Gear S3 into contextual evidence based on user behavior and direct evidence related to the system logs. We found contextual evidence, e-mail data, and application data.
rough an email, users can exchange information such as a promise to go to a specific place. Such evidence is consistent with the nature of contextual evidence. ere is a limitation in that which is necessary to determine whether the user went to the location separately. We stated that only an appointment had been made and that it had not occurred. As traces related to the application, searching for information about movie theaters through the map application and the Internet was found. ese traces are also data related to searching for a route, and there is a limitation in which they cannot be used as direct evidence that the actual user went to the location.
However, this does not mean that contextual evidence is meaningless. We used contextual evidence and direct evidence as a later case study. First, through anonymization, GPS coordinates were set as direct evidence, and masking was applied to the log itself, making it impossible to specify the user's location and path. However, based on the information obtained through contextual evidence in advance, the location can be specified using information that the user is in Seoul and will see a movie. In addition, contextual evidence can help with direct evidence related to the Wi-Fi, which is called snlp dump. Logic reinforcement was possible through Wi-Fi SSID-based distance estimation for contextual information obtained through the map application.
is case study examined the meaning of contextual and direct evidence collected through wearable devices. From the perspective of direct evidence, if privacy such as masking is not processed, it can be used as the most powerful type of intelligence. However, in the case of masking, as in the current scenario, contextual evidence is indispensable. In addition, when using such a GPS manipulation application, if we conduct cross-verification with contextual evidence, we can also characterize whether the attacker applied actual GPS data.
However, the argument in this study has certain limitations. First, we found that it was impossible to generalize the results because only one type of wearable device was masked. However, regardless of the masking process, there are ways in which contextual and direct evidence can be used sufficiently as a general approach. Moreover, to apply these methodologies, there must be a guarantee that no log loss or manipulation exists during the collection phase [27].
In future research, we plan to anonymize sensitive information in anonymized IoT devices. In particular, in this paper, geolocation is limited as sensitive information; however, we intend to research more diverse devices and sensitive information in future research.

Data Availability
Because the data contained private information about an individual, it is difficult to share such data through open access. If someone wants the data for further research, please contact the authors through e-mail (semi0502@korea.ac.kr).

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.