Privacy-Preserving Health Data Collection for Preschool Children

With the development of network technology, more and more data are transmitted over the network and privacy issues have become a research focus. In this paper, we study the privacy in health data collection of preschool children and present a new identity-based encryption protocol for privacy protection. The background of the protocol is as follows. A physical examination for preschool children is needed every year out of consideration for the children's health. After the examination, data are transmitted through the Internet to the education authorities for analysis. In the process of data collection, it is unnecessary for the education authorities to know the identities of the children. Based on this, we designed a privacy-preserving protocol, which delinks the children's identities from the examination data. Thus, the privacy of the children is preserved during data collection. We present the protocol in detail and prove the correctness of the protocol.


Introduction
Now that computers and networks have become important tools in everyday life, more and more data need to be transmitted through networks. Meanwhile, privacy issues have drawn public attention, and how to protect privacy in a network environment has become a research focus in the field of computer networks.
Privacy, broadly speaking, refers to private data held by organizations or individuals that are kept confidential from others. For individuals, information such as personal identification, physical condition, and geographical location is private [1]. The spread of private information can have many negative consequences and may even lead to crimes. Therefore, privacy-preserving technology has become an important research direction. At present, research on privacy-protection technology in networks includes at least the following areas.
Privacy Protection in Wireless Sensor Networks. Wireless sensor networks have broad application prospects in fields such as environmental monitoring, health care, and national defense. However, in practical applications, wireless sensor networks face a serious risk of data disclosure or tampering, which can lead to serious consequences [2][3][4][5]. For example, in the military field, data collected by wireless sensor networks often contain important intelligence which, if disclosed or tampered with, will pose a serious threat or cause military missteps. Privacy-protecting technology is therefore an indispensable part of wireless sensor networks [6][7][8][9][10].
Privacy-Preserving Data Mining. Data mining is an important knowledge discovery tool in today's society. It can reveal the hidden rules behind large amounts of data. However, the data sources used for data mining also contain a great deal of individual private information, business intelligence, or government secrets. If the data are used arbitrarily and without restraint in the data mining process, personal privacy and confidential information will be disclosed, and people's daily lives and even social stability will be seriously affected [11][12][13][14][15]. Extracting potentially valuable knowledge from massive amounts of data while preserving privacy is a dilemma. The ideal solution is to transform the raw data so as to prevent direct and indirect access to private information, while the mining algorithms are still able to obtain from the transformed data almost the same information and knowledge as from the raw data [16][17][18][19][20].
Computational and Mathematical Methods in Medicine
Privacy Studies for Medical and Health Information. In the field of medicine, medical treatments and the handling of their results must respect the patients' privacy. With the application of information technology in the medical field, electronic medical records (EMRs) have become the main carrier of medical information. EMRs are prevalent in medical institutions because of their large storage capacity, saving of resources, convenient querying, and easy sharing of information, which improve the efficiency of diagnosis and treatment [21]. However, since EMRs contain much private patient information and are easy to copy and spread, privacy protection is essential in the field of medical and health information [22][23][24].
In addition to the above, as new network applications emerge, some new privacy issues also need to be addressed. For example, in fields such as social networks, data publishing, cloud computing, and the Internet of Things, privacy has attracted people's attention [25][26][27][28]. In recent years, location privacy in mobile networks has also become a prominent topic as location-based services develop [29][30].
In this paper, we study privacy in health data collection for preschool children and present an identity-based encryption protocol to protect the identities of the children. The background of the protocol is as follows. A physical examination of preschool children is needed every year out of consideration for the children's health. After the examination, the data need to be transmitted through the Internet to the education authorities for analysis. In the process of data collection, it is unnecessary for the education authorities to know the identities of the children. Based on this, we designed a privacy-preserving protocol, which delinks the children's identities from the examination data. Thus, examination data can be transmitted over the network securely without the disclosure of the children's identities.
The rest of this paper is organized as follows. In Section 2, we briefly review related work and discuss its relationship with our work. In Section 3, we describe the preliminaries and the cryptographic tools we use to build our protocol. In Section 4, we present the design of our protocol and analyze it. Finally, we conclude the paper in Section 5.

Related Works
With the application of information technology in the field of medicine and health, privacy issues have begun to attract people's attention. At present, research on EMRs focuses on three areas: privacy protection of raw data, access control of EMRs, and privacy-protecting medical information systems [31][32][33].
Privacy Protection of Raw Data. Privacy protection of raw data means that techniques such as perturbation or anonymization are applied to the raw data to form a new data set before the data are provided to others. After the transformation, the new data set maintains the same distribution characteristics as the raw data, but it no longer contains personal information and therefore protects individual privacy. Most of the existing privacy-protecting technologies for raw data are based on anonymization. Anonymizing the raw data inevitably results in a loss of information. Therefore, research focuses on finding the tradeoff between the availability of data and privacy protection [34][35][36][37].
Access Control of EMRs. Access control technology uses centralized management of rights as a defensive measure against unauthorized use of data. Its basic objective is to control the access rights of users to EMRs or medical information systems and thereby ensure that the medical data are used only under authorization. Access control is an important measure for protecting electronic medical data in information systems, as it determines who can access the system and how the data are used. Appropriate access controls can prevent unauthorized users from gaining accidental or inadvertent access to data; however, the implementation of access control is complex, and the adjustment and management of rights are difficult [38][39][40][41].
Privacy-Protecting Medical Information System. In addition to the above privacy-protecting technologies, scholars have designed some privacy-protecting medical information systems. In [42], Jieun Song and Myungae Chung put forward a secure health privacy framework for an environmental service model. The system includes authentication, access control, and a privacy-protecting service. In [43], Gardner and Xiong constructed an identity-conversion system to protect the health information of patients. The system uses the conditional random fields method to extract identifying attributes from unstructured data and performs identity conversion by anonymization. In [44], Lin et al. proposed a privacy-protecting scheme for electronic health systems and proved by formal reasoning that the scheme is able to protect medical privacy and context information simultaneously. However, so far there is no complete system architecture for privacy protection.
On the whole, privacy-protecting technologies in the field of medicine and health have made considerable progress. However, there are still some problems with the existing technologies. For example, the security assumptions in some models are too strong to hold in real scenarios. In addition, existing privacy-protecting schemes are not general: each scheme addresses only a specific situation or a specific privacy issue. As far as privacy in health data collection for preschool children is concerned, no existing scheme can be directly adopted. Thus, we design an identity-based encryption protocol for this privacy protection problem.

Preliminary
The identity-based encryption method is a kind of public key cryptosystem. In a public key cryptosystem, the public key of the other party is needed when users send encrypted messages or verify a digital signature. In order to ensure the legitimacy of the public key, the traditional public key cryptosystem adopts a public key infrastructure, in which a trusted party, called the certification authority (CA), is responsible for authenticating users and issuing their public key certificates. A public key certificate binds the identity of a user to its public key. In this kind of system, the CA is responsible for the generation, issuance, storage, maintenance, and revocation of public key certificates for users, which requires a significant amount of computing and storage resources.
In 1984, an identity-based encryption (IBE) scheme was presented by Shamir [45], which simplified the management of public key certificates in traditional public key infrastructures. The IBE method directly adopts a user's identity information as the public key. The private key is generated by private key generators (PKGs). Therefore, the communicating parties can use each other's identities as public keys for encrypted communication, without the need to obtain public key certificates and authenticate the identities. The IBE method no longer needs the support of the CA, avoiding the establishment and management of the public key infrastructure required by traditional public key cryptosystems. In 2000, Sakai, Ohgishi, and Kasahara suggested that bilinear maps on elliptic curves can be used to design identity-based cryptography schemes. In 2001, Boneh and Franklin realized the first practical IBE scheme using bilinear maps on elliptic curves and proved that the scheme is resistant to chosen-ciphertext attacks in the random oracle model [46]. Since then, bilinear maps on elliptic curves have gradually become the main tool for identity-based cryptography schemes.
In applications, the IBE scheme is typically composed of four algorithms [47].
(a) Setup. Select a security parameter $k$, and get the system parameters (params) and the master key. Params include a finite message space $\mathcal{M}$ and a finite ciphertext space $\mathcal{C}$, which are public. The master key is private to the PKG.
(b) Extract. Input params, the master key, and $\mathrm{ID} \in \{0, 1\}^*$, and get the private key $d$. ID is an arbitrary string used as a public key; $d$ is the corresponding private key. The Extract algorithm is used to extract private keys from given public keys.
(c) Encrypt. Input params, ID, and $M \in \mathcal{M}$, and get the ciphertext $C \in \mathcal{C}$.
(d) Decrypt. Input params, $C \in \mathcal{C}$, and the private key $d$, and get $M \in \mathcal{M}$.
The above algorithms must be consistent. That is to say, for a given ID, when the private key $d$ is extracted by the Extract algorithm, $\mathrm{Decrypt}(\mathrm{params}, C, d) = M$ holds for every $M \in \mathcal{M}$, where $C = \mathrm{Encrypt}(\mathrm{params}, \mathrm{ID}, M)$.
Based on bilinear maps on elliptic curves, we design our IBE scheme in this paper which slightly differs from the Boneh-Franklin cryptosystem but is equivalent in terms of security. It consists of four algorithms as follows.
Initialization. Let $k$ be a security parameter and $q$ be a $k$-bit prime. Suppose $G_1$ and $G_2$ are two cyclic groups of prime order $q$ and $\hat{e}: G_1 \times G_1 \to G_2$ is an admissible bilinear map with generator $P$ of group $G_1$ (see [46] for the definition of admissible bilinear maps). Assume that identities are $n$-bit strings (where $n$ is polynomial in $k$). Consider a cryptographic hash function $H: \{0, 1\}^n \to G_1$. The private key generator (PKG) chooses $s \in \{0, 1, \ldots, q - 1\}$ uniformly at random and computes
$$P_{\mathrm{pub}} = sP.$$
Here $s$ is the master private key, while all other parameters mentioned above are public.
Private Key Generation. For an identity ID, the private key is $d_{\mathrm{ID}} = sQ_{\mathrm{ID}}$ and the public key is $Q_{\mathrm{ID}} = H(\mathrm{ID})$.
Encryption. To encrypt $M \in G_2$ under identity ID, one can compute
$$E_{\mathrm{ID}}(M, r) = \left(M \cdot \hat{e}(P_{\mathrm{pub}}, Q_{\mathrm{ID}})^{r},\ rP\right),$$
where $r \in \{0, 1, \ldots, q - 1\}$ is picked uniformly at random.
Decryption. Let $C = (C^{(1)}, C^{(2)})$ be a valid ciphertext under identity ID. Then, $C$ can be decrypted as follows:
$$D_{\mathrm{ID}}(C) = \frac{C^{(1)}}{\hat{e}(d_{\mathrm{ID}}, C^{(2)})}.$$
The protocol is homomorphic. That is to say, the following equation is satisfied:
$$E_{\mathrm{ID}}(M_1, r_1) \cdot E_{\mathrm{ID}}(M_2, r_2) = E_{\mathrm{ID}}(M_1 M_2, r_1 + r_2),$$
where the product of the two ciphertexts is defined as taking the product of each component of the ciphertexts (for the second component, the group operation of $G_1$).
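The algebra of these four algorithms can be checked numerically with a small sketch. The code below is a toy under stated assumptions, not the scheme's real instantiation. It writes $s$ for the master key, $r$ for the encryption randomness, and $P$ for the generator of the first group. An element $aP$ is represented by the scalar $a$, and the elliptic-curve pairing is replaced by the map $(a, b) \mapsto g^{ab} \bmod p$ into the order-$q$ subgroup of $\mathbb{Z}_p^*$. This map is bilinear but offers no security, because discrete logarithms in the first group are known by construction. The parameters `Q`, `P_MOD`, `G` and all function names are hypothetical. Decryption succeeds because $\hat{e}(sQ_{\mathrm{ID}}, rP) = \hat{e}(sP, Q_{\mathrm{ID}})^{r}$ by bilinearity, so the mask applied during encryption is exactly the one removed during decryption.

```python
import hashlib
import secrets
from typing import Tuple

# Toy, INSECURE parameters (assumption): P_MOD = 2*Q + 1 with both prime,
# and G generates the subgroup of order Q in Z_{P_MOD}^*.
Q = 1019
P_MOD = 2039
G = 4


def pairing(a: int, b: int) -> int:
    """Toy bilinear map e(aP, bP) = G^(a*b); scalars stand in for curve points."""
    return pow(G, (a * b) % Q, P_MOD)


def hash_to_g1(identity: str) -> int:
    """Map an identity string to a nonzero scalar, playing the role of H(ID)."""
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (Q - 1)


def setup() -> Tuple[int, int]:
    """Return (master key s, public value s*P), with P represented by 1."""
    s = secrets.randbelow(Q)
    return s, s % Q


def extract(s: int, identity: str) -> int:
    """Private key for an identity: s times the hashed identity."""
    return (s * hash_to_g1(identity)) % Q


def encrypt(p_pub: int, identity: str, m: int) -> Tuple[int, int]:
    """Ciphertext (m * e(P_pub, Q_ID)^r, r*P) for fresh random r."""
    r = secrets.randbelow(Q)
    mask = pow(pairing(p_pub, hash_to_g1(identity)), r, P_MOD)
    return (m * mask) % P_MOD, r


def decrypt(d_id: int, c: Tuple[int, int]) -> int:
    """Recover m as c1 / e(d_ID, c2); pow(x, -1, n) needs Python 3.8+."""
    c1, c2 = c
    return (c1 * pow(pairing(d_id, c2), -1, P_MOD)) % P_MOD


def multiply(ca: Tuple[int, int], cb: Tuple[int, int]) -> Tuple[int, int]:
    """Component-wise product: multiply in the target group, add in the first."""
    return (ca[0] * cb[0]) % P_MOD, (ca[1] + cb[1]) % Q
```

With a master key from `setup()` and a private key from `extract`, decrypting `encrypt(p_pub, identity, m)` returns `m` for any `m` between 1 and `P_MOD - 1`, and decrypting `multiply(c1, c2)` returns the product of the two plaintexts modulo `P_MOD`. The sketch uses plain integers as messages and ignores how real data would be encoded as group elements.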

Protocol Design and Analysis
Suppose $\mathrm{ID}_1$ is the identity of the preschool child and $\mathrm{ID}_2$ is the identity of a volunteer helper (who could be one of the preschool children volunteering to contribute his computational resources). We assume the administrator does not collude with the volunteer helper (if there is a risk of collusion, the protocol can be extended in a straightforward way by adding more helpers). Let $N$ be the total number of preschool children. Assume that, before the health data are transmitted, each preschool child $i$ has been assigned a unique number $\sigma_i \in \{1, 2, \ldots, N\}$, such that no two preschool children have the same number, that is, $\sigma_i \neq \sigma_{i'}$ for any $i \neq i'$.
The protocol includes two phases: a health data submission phase and a decryption phase.
In the $j$th round of the health data submission phase, each preschool child $i$ first compares his own number $\sigma_i$ with $j$. If $\sigma_i = j$, then he submits an encryption of $m_i$, where $m_i$ is his health data and $r_{i,j}$ is picked uniformly at random. If $\sigma_i \neq j$, then he submits an encryption in which $r_{i,j}$ is also picked uniformly at random. Upon receiving the encryptions in the $j$th round of the health data submission phase, the administrator computes a ciphertext $C_j$ from them.
In the decryption phase, the administrator first forwards all $C_j$ to the helper, who computes $\hat{C}_j$ and returns it to the administrator. Suppose $C_j = (C_j^{(1)}, C_j^{(2)})$. Then, the administrator computes $D_{\mathrm{ID}_1}(\hat{C}_j, C_j^{(2)})$.
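The two phases can be simulated end to end in a short sketch. Several details below are assumptions made for illustration rather than the protocol's stated construction: a child whose number differs from the round index encrypts the neutral element 1; the administrator multiplies the ciphertexts of a round component-wise; each ciphertext is masked under the sum of two hashed identities, so that the helper removes one mask and the administrator removes the other; and the first private key is held by the administrator. As a stand-in for the elliptic-curve pairing, the code uses an insecure toy map $(a, b) \mapsto g^{ab} \bmod p$ in which a group element $aP$ is represented by the scalar $a$. All names, parameters, and sample values are hypothetical.

```python
import hashlib
import secrets
from typing import List, Sequence, Tuple

# Toy, INSECURE parameters (assumption): P_MOD = 2*Q + 1 with both prime,
# and G generates the subgroup of order Q in Z_{P_MOD}^*.
Q, P_MOD, G = 1019, 2039, 4


def pairing(a: int, b: int) -> int:
    """Toy bilinear map e(aP, bP) = G^(a*b); scalars stand in for curve points."""
    return pow(G, (a * b) % Q, P_MOD)


def hash_to_g1(identity: str) -> int:
    """Map an identity string to a nonzero scalar, playing the role of H(ID)."""
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (Q - 1)


def encrypt_joint(p_pub: int, ids: Sequence[str], m: int) -> Tuple[int, int]:
    """Mask m under the sum of the hashed identities (assumed construction)."""
    r = secrets.randbelow(Q)
    q_sum = sum(hash_to_g1(i) for i in ids) % Q
    return (m * pow(pairing(p_pub, q_sum), r, P_MOD)) % P_MOD, r


def strip_share(d_id: int, c1: int, c2: int) -> int:
    """Remove one key holder's mask: c1 / e(d_ID, c2). Needs Python 3.8+."""
    return (c1 * pow(pairing(d_id, c2), -1, P_MOD)) % P_MOD


def run_protocol(health_data: Sequence[int], numbers: Sequence[int]) -> List[int]:
    """Return the health data ordered by assigned number, not by child."""
    s = secrets.randbelow(Q)          # master key held by the PKG
    p_pub = s                         # s*P with P represented by 1
    ids = ("administrator", "helper")
    d_admin, d_helper = (s * hash_to_g1(i) % Q for i in ids)

    recovered = []
    for j in range(1, len(health_data) + 1):
        # Submission phase, round j: every child sends one ciphertext,
        # so the administrator cannot tell who holds number j.
        c1, c2 = 1, 0
        for m, sigma in zip(health_data, numbers):
            x1, x2 = encrypt_joint(p_pub, ids, m if sigma == j else 1)
            c1, c2 = (c1 * x1) % P_MOD, (c2 + x2) % Q
        # Decryption phase: the helper strips his mask, then the administrator.
        c_hat = strip_share(d_helper, c1, c2)
        recovered.append(strip_share(d_admin, c_hat, c2))
    return recovered
```

For three children with sample values `[120, 95, 133]` and assigned numbers `[2, 3, 1]`, `run_protocol` returns the values in the order of the assigned numbers, so the administrator obtains every measurement without learning which child it belongs to. The sketch uses small integers directly as messages and does not address how real health records would be encoded as elements of the target group.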

Conclusion
With the development of network technology, more and more data need to be transmitted over the network, and the related privacy issues have become an active research topic. We studied the privacy issue of health data transmitted over the network. For the sake of children's health, a physical examination of preschool children is needed every year. After the examination, the data need to be transmitted to the education authorities over the Internet for health analysis. In the process of data collection, it is unnecessary for the education authorities to know the identities of the children. Therefore, we designed a privacy-preserving protocol for health data transmission, which delinks the children's identities from the examination data. The underlying identity-based encryption scheme consists of four algorithms: Initialization, Private Key Generation, Encryption, and Decryption. Finally, we proved the correctness of the protocol.