Blockchain-Based Privacy Protection Scheme for IoT-Assisted Educational Big Data Management

Adoption of the Internet of Things (IoT) in education brings many benefits. However, the poor implementation of access control of educational data produced by the IoT devices has brought students’ and teachers’ privacy into danger. Attackers can access educational data that they are not permitted to access and even erase the records during access. To tackle this problem, we employ blockchain technology to guarantee the integrity of access control rules and trace the records of access events. In this paper, we propose a blockchain-based access control scheme for the data produced by IoT devices. The scheme consists of three components: (1) a well-implemented data collection module that is deployed in smart classrooms, which collects and uploads data about the real-time situation inside the smart classroom to the data center; (2) a MongoDB-based data center and its control module that makes access control decisions based on the verification of the permissions of visitors, where the permissions are managed by blockchain; and (3) a customized blockchain system that stores and keeps security policy updates of the role-based access control module and records access events in a trusted way. Our analysis indicates that the proposed access control scheme guarantees the correctness of the access control process and makes the access of collected educational data auditable and responsible. Our system collectively analyzes the context of the smart classroom and is capable of detecting multiple scenarios such as absence, lateness, and gunshot. We show how the scheme preserves students’ and teachers’ privacy by carrying out extensive experimental studies. The results indicate that the proposed data management system can give correct responses as quickly as a traditional data server does while preserving privacy.


Introduction
With the rapid development of the Internet of Things (IoT), cities around the world are becoming smarter and smarter. One of the most widespread application scenarios of smart cities is smart education, where educational big data is collected through multiple IoT devices deployed in smart campuses and smart classrooms and stored for a variety of data processing and analysis tasks.
Adoption of IoT in education has been widely studied. Marquez et al. [1] proposed a model to integrate objects into Virtual Academic Communities (VAC). Their results indicate that the adoption of IoT yields a more engaging learning environment for learners, and the instructors can obtain more information about the learning process, which in turn enhances the pedagogical process. Moreira et al. [2] conducted a study to provide personalized education to learners by using the data collected through IoT, cloud computing, and learning analytical tools. The results indicate that this approach is able to provide personalized curricula that depend on the abilities of each student. Last but not least, Bagheri and Movahed [3] showed that the use of IoT in education is not limited to teaching and learning. Their study indicated that IoT in education can be used to (1) manage energy and monitor the ecosystem; (2) implement secure campus and classroom access control; and (3) monitor students' health. In short, adoption of IoT in education brings many benefits.
However, access to the educational data produced during the workflow of these applications is not carefully controlled. In particular, the privacy of the teachers and students involved is in danger of being violated. Many incidents demonstrate the severity of the problem; here are two examples. InBloom was a nonprofit educational technology company that developed educational technology products to provide students with personalized learning services. But inBloom survived only 15 months. The main reason was that the information collected by the company involved too much private information about students, and the company shared these data with other companies. Eventually, public protests and pressure from public opinion caused the company to apologize and shut down [4]. In September 2016, a high school student in Tianjin broadcast scenes of her classmates' learning, breaks, outdoor activities, etc., on a live broadcast platform without her classmates' knowledge. Hundreds of people viewed the live broadcast, and some of them posted explicit information and messages, which included the personal information of the students [5].
As more and more schools are building smart campuses and smart classrooms, more and more IoT devices are used by students and teachers to interact. The privacy problems brought by IoT urgently need to be solved. They can be summarized as follows:
(1) The uses of the sensors and the data produced by the sensors are unlimited. Access control schemes for the educational data are not well implemented. Attackers can bypass the access restrictions by tampering with the access rules using methods such as SQL injection.
(2) The access of the data is not auditable. Attackers can erase the records of their visits using simple methods.
To address these issues, we propose a blockchain-based access control scheme to ensure that the probability that an adversary successfully accesses the data is negligible. Our scheme consists of (1) a well-implemented data collection module that is deployed in smart classrooms to collect and upload data to the data center; (2) a MongoDB-based data center and its control module that checks permissions on the blockchain and enforces the results of the permission checks; and (3) a role-based access control module maintained by a customized blockchain system that manages the access permissions and records access events in a trusted way.
The contributions of this paper are summarized as follows:
(1) We propose an educational data access control scheme to support trustworthy educational data management. We use blockchain as a trusted, distributed database to store and keep the updates of the security policies involved in the role-based access control scheme, thus achieving secure and trusted data management. We illustrate that our scheme is effective in preserving privacy for IoT-assisted educational big data management. By using blockchain to record the visit events of educational data, we make the access of educational data auditable.
(2) We fully implement an educational data collection and access control system. The system includes a data collection module deployed in a smart classroom, a MongoDB-based data center and its control module, and a role-based access control module running on top of a customized blockchain system. Our system collectively analyzes the context of the smart classroom and can detect multiple scenarios such as absence, lateness, and gunshot.
(3) We test the correctness and performance of our system. The results indicate that our system gives correct responses to users in less than one second, which is an acceptable performance for most application scenarios.
The paper is organized as follows. Background and related works are presented in Section 2. Our blockchain-enabled access control scheme for educational data is proposed in Section 3. Experimental studies are reported in Section 4, and the paper is concluded in Section 5 with a discussion.

Previous Knowledge.
Here, we introduce the key technologies and their related concepts used in our work.
2.1.1. Role-Based Access Control. We use the role-based access control (RBAC) model to represent and manage access privileges of the educational data. Role-based access control is a policy-neutral access-control mechanism defined around roles and privileges. Within an organization, users are grouped into different roles. The permissions to access certain series of data or to perform certain operations are assigned to specific roles rather than specific users. RBAC thus acts as a bridge between users and permissions. A role represents a set of users and is assigned permissions in place of those users, for simplicity, clarity, and performance. In fact, there exist many other access control schemes such as attribute-based access control (ABAC), access control matrix (ACM), access control list (ACL), and capability-based access control (CapBAC). RBAC has been proved to be equivalent to ACM with respect to the policies they can represent. Besides, RBAC is one of the most widespread, clear, and easy-to-develop access control models. The components of RBAC such as role-permission, user-role, and role-role relationships make it simple to perform user assignments, especially for user assignments on blockchain, because role-permission, user-role, and role-role relationships are highly isomorphic with transactions on blockchain. And by maintaining RBAC with a blockchain system, we can guarantee that all access privileges are correctly stored and cannot be tampered with.
2.1.2. Blockchain. In our scheme, role-based access control is maintained by a blockchain system. Blockchain has served as a trustworthy environment for many different applications, ranging from secure transactions to trusted verifiable computing. Generally, blockchain can be regarded as a distributed ledger, which is kept by a series of computers called blockchain nodes. To make sure that every blockchain node keeps the same ledger, blockchain systems use consensus algorithms. Although Byzantine fault tolerant (BFT) consensus algorithms are well studied, their performance and scalability still restrict their applications. In this paper, we choose proof of work (PoW) as the consensus algorithm of our blockchain system. We make this choice for two reasons. On the one hand, PoW is the consensus algorithm of the first blockchain, the Bitcoin blockchain [6]. On the other hand, PoW is the most widely adopted consensus algorithm in the blockchain community.
In blockchain, events recorded on the ledger are called transactions, and transactions are packed into blocks to be added to the end of the blockchain. In PoW, nodes compete for the right to pack blocks. To get this right, a node needs to find a nonce such that, when the nonce is appended to the block and the hash of the result is calculated, the resulting hash is smaller than a predefined threshold. This process is called mining. As the outcome of the hash process can be seen as completely random, the only way to find such a nonce is to guess and try. We can therefore expect that nodes need to try many times to find a valid nonce and add the block to the blockchain. In our experiment, the difficulty of finding a valid nonce was decreased to make the blockchain system run faster.
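The mining loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Golang implementation: the function names `mine` and `is_valid`, the `target_bits` parameter, and the 8-byte big-endian nonce encoding are assumptions made here for the example.

```python
import hashlib


def mine(block_data: bytes, target_bits: int) -> tuple[int, str]:
    """Search for a nonce whose SHA-256 block hash is below the threshold.

    The threshold is 2**(256 - target_bits), so a valid hash starts with
    at least `target_bits` zero bits. (Hypothetical helper, for illustration.)
    """
    threshold = 1 << (256 - target_bits)
    nonce = 0
    while True:
        # Append the nonce to the block data and hash the result.
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < threshold:
            return nonce, digest.hex()
        nonce += 1  # guess and try: move on to the next candidate


def is_valid(block_data: bytes, nonce: int, target_bits: int) -> bool:
    """Check a claimed nonce with a single hash evaluation."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - target_bits))
```

Finding a nonce requires many hash evaluations, whereas checking one requires a single evaluation; this asymmetry is what lets other nodes cheaply confirm a mined block.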

Related Work.
Before cloud computing became prevalent, most information and data were stored locally on users' computers. As cloud computing and mobile networks prevail, educational programs, applications, and data are stored in clouds, and users do not know the specific storage location of their personal data. In [7], Madeth raised awareness of privacy issues in E-learning that implicate user tracking and personal data usage for instructional purposes. In response to these privacy problems, a widely adopted method is to evaluate the privacy-preserving technology of educational technology products. In fact, many schools and districts in the United States conducted privacy technology reviews on commonly used educational technology products with the help of technology review organizations [8]. The results indicate that most educational technology products cannot protect privacy. To solve the privacy problem fundamentally, a secure and trusted data management system is needed.
In fact, people have been trying to protect education privacy. Specifically, the main practices of American society to protect student privacy include three aspects: (1) publishing education privacy laws and regulations [9], (2) setting up dedicated student privacy protection positions at education departments [10], and (3) carrying out technical privacy reviews for educational products. At the technical level, preserving IoT data privacy in crowdsourcing with blockchain was studied by [11,12]. Blockchain-based privacy preserving schemes on data uploading, trading, and sharing were explored by [13][14][15]. Blockchain systems addressing wireless challenges such as channel variation and adversarial jamming under IoT settings were thoroughly studied in [16,17]. A cloud-enabled blockchain to support IoT applications taking advantage of advances such as remote direct memory access and shared memory technology was presented in [18]. Trust extension from on-chain to off-chain and ground-truth data collection to blockchain were, respectively, investigated by [19,20]. To the best of our knowledge, there is a lack of decentralized, trusted, and automated access control solutions to protect educational privacy, which is what this paper intends to address.

Blockchain-Based Access Control of Educational Data
In this section, we describe the details of our blockchain-based access control scheme of educational data. As illustrated in Figure 1, our scheme consists of a smart classroom with IoT devices, a data center, and a role-based access control module running on a blockchain system.

Smart Classroom.
IoT devices continuously monitor and collect data in smart classrooms. The collected data is uploaded to the data center for further processing and analysis. In this paper, our IoT devices include sound sensors, RFID sensors, and cameras. Sound sensors can be used to monitor whether most students are studying attentively or just chatting with each other. They can also be used as gunshot detectors, to raise an alarm and notify the police when a gunshot is detected. RFID sensors can be used to record attendance of teachers and students, by giving each teacher and student an RFID card. Cameras can take photos and videos of the interior of the classroom. They can be very useful, because based on Artificial Intelligence and Computer Vision technologies, photos and videos can be used to recognize human faces, analyze students' focus and emotion, and extract much other useful information.
For privacy reasons, we use a sound detection module as the sound sensor. It only senses the sound intensity of the environment without collecting detailed sound information such as the timbre, frequency, phase, or any other information about the waveform. So, the content of conversations in the classroom is not recognized. The sound detection module outputs a value between 0 and 1023 representing the current sound intensity in the environment. A larger output value means a louder environment. In particular, as our experiment in Section 4 shows, the output value ranges between 21 and 24 in a relatively quiet environment and goes up to between 30 and 50 when a loud sound is detected.
Most schools, companies, and organizations use RFID sensors to take check-in and check-out records for their members. We do not explain how RFID sensors work here in detail, as it does not affect the design of our system. But we introduce how we use RFID sensors to collect important data. When a check-in or check-out action is detected (someone has tapped his/her RFID card or RFID tag at the RFID sensor), the user's RFID (usually a 4-byte array), the type of action (check-in or check-out), the user's name, the user's role, and the time and location of the action are collected. Besides, we calculate the SHA256 hash [21] of a record as its digest. Formally, $\mathrm{Hash} = \mathrm{SHA256}(\mathrm{action} + \mathrm{RFID} + \mathrm{name} + \mathrm{role} + \mathrm{time} + \mathrm{location} + \mathrm{GPS})$. Using the hash, we can set up a trusted digest of the activity, verify the integrity of the data, and make it more difficult for attackers to tamper with the records.
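As a rough sketch of the digest computation, the Python snippet below hashes the record fields with SHA256. It assumes that the "+" in the formula denotes plain string concatenation of UTF-8 encoded fields, which the text does not state explicitly; the function names and the sample values used with them are hypothetical.

```python
import hashlib


def record_digest(action: str, rfid: str, name: str, role: str,
                  time: str, location: str, gps: str) -> str:
    """Return the SHA-256 hex digest of a check-in/check-out record.

    Assumption: fields are concatenated as strings in the order given by
    the formula, with no separator between them.
    """
    payload = "".join([action, rfid, name, role, time, location, gps])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_is_intact(stored_digest: str, *fields: str) -> bool:
    """Recompute the digest from the stored fields and compare."""
    return record_digest(*fields) == stored_digest
```

Changing any single field, for example turning a check-out into a check-in, yields a different digest, so a record that no longer matches its stored digest can be flagged as modified. A real implementation might add a delimiter between fields to avoid ambiguous concatenations.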
All the collected data including sound, RFID records, and photos are uploaded and stored in the data center, for further processing and analysis.

Data Center.
Collected educational data are stored in the data center. Teachers, parents of students, education managers, or others may need to access these data for different reasons, such as making educational decisions.
MongoDB is a popular NoSQL, nonrelational database for modern app development [22]. When compared to relational databases, NoSQL databases are often more scalable and can provide superior performance. SQL databases are most often implemented in a scale-up architecture, which relies on larger computers with more CPUs and more memory to improve performance, while NoSQL databases were created in the Internet and cloud computing eras, which makes it easier to implement a scale-out architecture. In addition, the flexibility and ease of use of their data models can speed up development in comparison to the relational model, especially in IoT and cloud computing environments.
In our design, the database stores three kinds of data: sound records, check-in and check-out records, and photo records. A sound record contains 5 fields: record hash, time, value, location, and GPS. A check-in and check-out record contains 8 fields: record hash, action type, RFID, user's name, user's role, time, location, and GPS. A photo record contains 5 fields: record hash, time, value, location, and GPS. Figure 2 shows one example of each kind of data.
We attach to each data record its hash, which serves as its index in both the data center and the blockchain. To protect the privacy of students and teachers, access to these data should be under control. The data center should only allow authenticated access to designated data. In our access control scheme, we adopt the role-based access control model and deploy it on a customized blockchain system. When a visitor requests to access some data, the data center checks on the blockchain whether the visitor's role is allowed to access the requested data. If so, the data center grants the visitor the right to access the data. Otherwise, the data center refuses the visitor's request. Based on this principle, we propose a control module of the data center to process visitors' access requests and verify the permissions of visitors' access on the blockchain. Specifically, the control module is programmed to synchronize the state of the RBAC module as a blockchain node and makes access control decisions based on the state of the RBAC module. Algorithm 1 shows the main frame of the control module's workflow.

Wireless Communications and Mobile Computing
In our implementation, the main thread of the control module runs the runServer function, which continuously waits for access requests. When receiving a request, runServer parses the parameters of the request and verifies whether it is permitted, using the core part of the control module, the verify function.
The verify function first checks the signature to make sure that the access request is sent by the corresponding user uid. Then, it extracts the tags of the data and examines the role of uid. Two for loops follow. The first one checks whether the role of uid has been banned from some tag of the data and returns false if so. The second one checks whether the role of uid has been authorized to access the data and returns true if so.
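The decision logic of the verify function can be sketched as follows. This is a reading of the prose description rather than a reproduction of Algorithm 1: the `RBACState` container, the `check_signature` callback, the rule that a single authorized tag is enough to grant access, and the default refusal when no rule matches are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RBACState:
    """Hypothetical in-memory view of the RBAC state synchronized from the chain."""
    role_of: dict[str, str] = field(default_factory=dict)      # uid -> role
    banned: dict[str, set[str]] = field(default_factory=dict)  # role -> banned tags
    allowed: dict[str, set[str]] = field(default_factory=dict) # role -> permitted tags


def verify(state: RBACState, uid: str, request: bytes, signature: bytes,
           data_tags: list[str],
           check_signature: Callable[[str, bytes, bytes], bool]) -> bool:
    """Decide whether `uid` may access data carrying `data_tags`."""
    # Step 1: make sure the request was really sent by uid.
    if not check_signature(uid, request, signature):
        return False
    # Step 2: look up the role registered for uid.
    role = state.role_of.get(uid)
    if role is None:
        return False
    # Step 3 (first loop): a ban on any tag of the data refuses the request.
    for tag in data_tags:
        if tag in state.banned.get(role, set()):
            return False
    # Step 4 (second loop): an explicit authorization grants the request.
    for tag in data_tags:
        if tag in state.allowed.get(role, set()):
            return True
    # Assumption: refuse by default when no rule matches.
    return False
```

Checking bans before authorizations means that a ban always takes precedence, which matches the order of the two loops in the description.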

RBAC Blockchain System.
As mentioned earlier, role-based access control (RBAC) is a policy-neutral access-control mechanism defined around roles and privileges. The components of RBAC such as role-permission, user-role, and role-role relationships make it simple to perform user assignments. A study by NIST has demonstrated that RBAC addresses many needs of commercial and government organizations. RBAC can be used to facilitate administration of security in large organizations with hundreds of users and thousands of permissions. Although RBAC is different from mandatory access control (MAC) and discretionary access control (DAC) frameworks, it can enforce these policies without any complication. Under the role-based access control model, users are grouped into several roles. Access actions of users are permitted or refused based on their roles.
For example, in our experiment described in Section 4, the roles and access control rules are designated as Table 1 shows. There are 5 roles in total: students, students' parents, teachers, education managers, and unauthorized people.
In our access control scheme, we use a blockchain system to implement the RBAC model. We authenticate user identities using SHA256 signatures and public-key cryptography schemes. The identities are registered on the blockchain via an authenticated trusted blockchain node. After registration, a public-private key pair is generated for each user, and the trusted node broadcasts a user-registration transaction on the blockchain. The transaction includes the designated role for the user, and the public-private key pair can be used to verify whether the role in the transaction is designated to the user by verifying the SHA256 signature. To achieve role-based access control features, we implement 4 transaction types:
(i) User-registration: as described above, we use user-registration transactions to register an identity for a user. The format of a user-registration transaction is $\{\text{user-reg}, pk_{uid}, \text{role}\}$, in which uid is the user's
(iv) Access-result: the blockchain system employs access-result transactions to respond to the data center's query about whether a user can access some data. Transaction $\{\text{result}, uid, tag, t, \text{true/false}\}$ means that the user with id uid can or cannot access the data with tag tag, where t is the timestamp of the request action. Access-result transactions play the role of an immutable access log and make the request action auditable and responsible.
The main benefit of running an RBAC model on a blockchain is that all transactions are confirmed by all blockchain nodes, so no adversary can change any user's role at will.
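To illustrate why logged access results are hard to alter silently, the sketch below chains access-result entries by hash. It is a simplified stand-in for a blockchain, not the authors' system: the `AccessResult` class, the JSON serialization, and the helper names are assumptions made for the example, and a real chain would pack transactions into mined blocks replicated across nodes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # hypothetical starting value for the hash chain


@dataclass(frozen=True)
class AccessResult:
    """Mirrors the fields {result, uid, tag, t, true/false} from the text."""
    uid: str
    tag: str
    t: int          # timestamp of the request action
    granted: bool   # True if access was allowed, False if refused


def chain_hash(prev_hash: str, tx: AccessResult) -> str:
    """Hash the transaction together with the hash of the previous entry."""
    body = json.dumps(asdict(tx), sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, tx: AccessResult) -> None:
    """Add a transaction to the end of the log."""
    prev = log[-1][1] if log else GENESIS
    log.append((tx, chain_hash(prev, tx)))


def audit(log: list) -> bool:
    """Recompute every link; any edited entry breaks the chain."""
    prev = GENESIS
    for tx, stored in log:
        if chain_hash(prev, tx) != stored:
            return False
        prev = stored
    return True
```

If an attacker rewrites an earlier entry, for instance flipping a refused request to a granted one, `audit` no longer succeeds unless every later hash is recomputed as well, which consensus among the nodes is meant to prevent.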

Experiment
In this section, we report the evaluation results of our system in a practical scenario. We implemented the blockchain-based educational data access control system and used the system to perform the whole process of the educational data from collection and storage to controlled access.

Setup.
As shown in Figure 3, we used a data collection module to simulate the IoT devices in a smart classroom. The data collection module was implemented on an Arduino UNO board with three modules: (1) OV7670, a camera module; (2) SY-M213, a sound sensor module; and (3) RFID-RC522, an RFID module. On the Arduino UNO, we developed and assembled the drivers of the camera module, sound sensor module, and RFID module in the C language. The Arduino UNO is a microcontroller board based on the ATmega328P, which supports a USB connection with a computer [23]. Every second, the camera module uploaded a 320 × 240 grayscale image and the sound sensor module uploaded its output (a measure of the sound intensity of the environment). The RFID module uploaded a record each time an action was detected.
All the collected data were uploaded to a Lenovo G580 PC running Windows 10 Professional 20H2 through serial port. We developed a Python program to display the data read from serial port. It ran on the Lenovo G580 PC and uploaded data to the data center while displaying the collected data.
For the data center, we developed a control module using Python to process visitors' access requests and verify visitors' access permissions on the blockchain. This module operated as both a server and a blockchain node. It read role-based access control information from on-chain transactions. If a visitor's access permission was authenticated, the control module would fetch data from the MongoDB database and send the data to the visitor by writing the data into the response body of the HTTPS connection. The control module and MongoDB ran on a 16-inch 2019 MacBook Pro with an 8-core Intel i9 @ 2.4 GHz and 16 GB memory running macOS 11.3.
We developed our own blockchain system using Golang for the best flexibility of customization. Golang is a popular programming language in the blockchain community and has become a go-to language for developing decentralized systems [24]. We used the PoW consensus algorithm, which has practically the best performance and scalability. The PoW difficulty was reduced to 16 leading zero bits, as we did not have as much computing power as the Bitcoin network has to produce blocks in an acceptable time. That is, mining nodes needed to find a nonce such that, when the nonce was appended to the block data, the produced SHA256 hash began with 16 zero bits. So, the expected number of nonces tried to mine a block was 2^16 = 65536. We ran the blockchain system on three computers, each having an 8-core Intel i7-9700 CPU @ 3 GHz and 16 GB memory and running Windows 10 Professional 20H2. One blockchain node ran on each of the three computers. Figure 4 shows the average time the blockchain network takes to mine a block under different difficulty settings. In our implementation (targetBits = 16), the difficulty-reduced PoW blockchain network takes 526 ms on average to produce a block.
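The expected number of tries can be checked with a short calculation, under the idealized assumption that each SHA256 output is uniformly random and independent across nonces. The symbols $p$ (the success probability of a single try) and $N$ (the number of tries until the first valid nonce) are introduced here only for illustration.

```latex
p = \Pr[\text{hash begins with 16 zero bits}] = 2^{-16}, \qquad
\Pr[N = k] = (1 - p)^{k-1}\, p, \qquad
\mathbb{E}[N] = \sum_{k \ge 1} k\,(1 - p)^{k-1}\, p = \frac{1}{p} = 2^{16} = 65536 .
```

Under the same assumption, each additional leading zero bit halves $p$ and therefore doubles the expected number of tries.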

Data Collection.
Using the RFID module, we recorded check-in and check-out actions at two different locations (shown in Figure 5). Each time a teacher or a student checks in (swipes his/her RFID card at the RFID sensor), information of his/her identity and the check-in and check-out action including time, location, and hash are collected and uploaded by the RFID module. In our experiment, location 1 is the office of the teacher, and location 2 is the classroom where the students study. Details of these actions are shown in Table 2.
In this experiment, we monitored the environment sound intensity in a classroom with the sound sensor module. As shown in Figure 6, three patterns of environment sound intensity were recorded. In the first pattern (Figure 6(a)), the classroom was relatively quiet, and the uploaded value from the sound sensor module ranged from 21 to 24. In the second pattern (Figure 6(b)), the classroom was noisy, so the sound sensor module frequently uploaded values higher than 25. In the third pattern (Figure 6(c)), we simulated a gunshot scene with a loudspeaker. From 110 seconds to 200 seconds, we used the speaker to play a gunshot sound at the entrance of the classroom. After the sound was played, from 200 seconds onward, students in the classroom began to scream; then, the classroom became as noisy as it was in the second pattern.
We also used the camera module to take photos of the interior of the classroom. Limited by the performance of the OV7670 camera module, only one photo per second was taken. Figure 7 shows three representative scenes in the classroom: students studying in the classroom (Figure 7(a)), students leaving the classroom when the class was over while several students chose to stay for discussions (Figure 7(b)), and an empty classroom (Figure 7(c)).
These educational data were all uploaded and stored in the MongoDB database. Further analysis and data processing can be done after access control.

Access Control of Collected Data.
For simplicity and convenience, we ran the three nodes of the blockchain system, the data center, and the data collection module in the same local area network. This resulted in low network latency. By sending role-registration transactions and user-registration transactions, we registered 8 users of 5 roles: 3 students, 2 parents, 1 teacher, 1 education manager, and 1 unauthorized person. By sending rule-edit transactions, we created the following role-based access control rules:
(iv) The unauthorized person cannot access any data, as he or she is not authorized.
Then, we tested our educational data access control system. We used different $(uid, sk_{uid})$ pairs to simulate different users and sent requests to the data center to access the data collected from the smart classroom. We sent 100 requests: 50 of them were good requests that should be accepted, while the other 50 were bad requests that should be refused. Our access control system worked correctly. That is, for all the 50 good requests, the data center sent data to the user, and for all the 50 bad ones, the data center refused to offer data to the requester. Besides, the result of each request was logged on the blockchain in the form of an access-result transaction. We also analyzed the performance of our access control system. In our observation, counting from the moment a request was sent to the data center, it took the user 13 ms on average to receive a refusal message (Figure 8(a)) and 157 ms on average to receive the requested data (Figure 8(b)). It can be concluded that the response time of our educational data access control system under a local area network is acceptable.

Conclusions
In this paper, we proposed a scheme to preserve privacy in educational application of IoT. We achieved our privacy preservation goal by implementing a blockchain-based access control system. We implemented the full system including the components of collecting educational data, storing the data in a data center, and maintaining a role-based access control on educational data. Our scheme consists of a data collection module, a MongoDB-based data center, and the role-based access control module running on top of a blockchain system. Our educational data access control system guarantees correct execution of the access control rules and makes the access events of educational data auditable and responsible. Our experiment results indicate that our access control system gives a correct response as quickly as a traditional data server. What is more, our access control system was designed to be relatively general-purpose. So, it can be easily extended to other application fields, by including necessary IoT devices and implementing drivers for them.

Data Availability
The recorded data used to support the findings of this study are included within the article.