Hadoop-Based Healthcare Information System Design and Wireless Security Communication Implementation

Human health information from healthcare system can provide important diagnosis data and reference to doctors. However, continuous monitoring and security storage of human health data are challenging personal privacy and big data storage. To build secure and efficient healthcare application, Hadoop-based healthcare security communication system is proposed. In wireless biosensor network, authentication and key transfer should be lightweight. An ECC (Elliptic Curve Cryptography) based lightweight digital signature and key transmission method are proposed to provide wireless secure communication in healthcare information system. Sunspot wireless sensor nodes are used to build healthcare secure communication network; wireless nodes and base station are assigned different tasks to achieve secure communication goal in healthcare information system. Mysql database is used to store Sunspot security entity table and measure entity table. Hadoop is used to backup and audit the Sunspot security entity table. Sqoop tool is used to import/export data between Mysql database and HDFS (Hadoop distributed file system). Ganglia is used to monitor and measure the performance of Hadoop cluster. Simulation results show that the Hadoop-based healthcare architecture and wireless security communication method are highly effective to build a wireless healthcare information system.


Introduction
Since the first biosensor was introduced in 1962 by Clark and Lyons [1], there has been increasing demand for such analytical devices in real applications.Research initially focused mainly on detector principles and recognition elements; however, to obtain a user-friendly and well-performing analytical device, many components have to be considered [1].Biosensors have been developed for many years and research has become very popular in recent years.Advances in microelectronics, material science, and wireless communication technology have led to the development of micro sensors that can be used for the monitoring of bioinformation objects.So biosensors remain a subject of great popular interest [2].
Wireless sensor networks (WSN) [3,4] have received significant attentions due to their widespread applications in military and civilian environments.Sensors are low cost, lowpower devices which have limited resources.A sensor node typically contains a power unit, a sensing unit, a processing unit, a storage unit, and a wireless transmitter/receiver. Wireless biosensor networks can be used in healthcare system: inside a healthcare system, biosensors are placed or embedded in human body to monitor their blood pressure, body temperature, sugar level, heartbeats, and so forth.Biosensors constitute a wireless network and periodically monitor the health information of their hosts.Health monitoring involves collection of data about vital body parameters and making intelligent decisions.This information is required to be transferred securely.Insecurity information transfer can lead to much risk.So security and privacy [5,6] issues have become critical research fields in wireless biosensor network.Development of an effective security scheme is challenged by the limited storage, computing ability, and energy.In modern healthcare environment, it is important to design a security scheme based on small biosensor computing devices and big cloud computing resource.
In this paper, a novel Hadoop-based wireless healthcare architecture is proposed to protect data communication security in biosensor network.There are two main contributions in the paper.The first is that Hadoop-based biosensor wireless healthcare information system architecture is proposed.The second is that ECC-based digital signature and security communication method are implemented in Sunspot WSN.The rest of this paper is organized as follows: Section 2 reviews some related work on biosensor network and Hadoop-based healthcare system.Section 3 describes the system architecture of Hadoop-based wireless healthcare system.Section 4 describes the security data communication method based on small biosensor node and big cloud resource.Section 5 gives simulation results by Sunspot and Hadoop cloud platform.The conclusion is drawn in Section 6.

Related Work Analysis on Biosensor Network and Hadoop-Based Healthcare System
Biosensors, nanosensors, and biochips have become popular as a tool for medical diagnostics due to their noninvasive or minimally invasive nature [7].A biosensor is a probe that integrates a biological component, such as a whole bacterium or a biological product, with an electronic component to yield a measurable signal.It can detect and measure concentrations of specific bacteria or hazardous chemicals; it can also measure acidity levels (pH).Nanosensors provide new and powerful tools for monitoring in vivo processes within living cells [8].Biochips are designed by combining integrated circuit elements, an electrooptics excitation/detection system, and bioreceptor probes into a self-contained and integrated micro device.The rapid improvement in microprocessor and sensing material technology has led to a development of miniature sensors that can be implanted in the human body.The biosensor based approach to healthcare makes it much more effective by reducing the response time [9].The biosensor network consists of a group of biosensors implanted inside the human body, external device (control node) placed on the human body, and a base station, which are shown in Figure 1.
A network is formed by the biosensors between themselves and the control node.The control node is connected to an external base station, as shown in Figure 1.There are three types of wireless communication links in the biosensor network based health care system.They are the communication links between the biosensors, the communication links between the biosensor and control node, and the link between control node and the base station.All these wireless links are considered to be insecure due to the fact that the data is available on the channel.Therefore, data exchange using any of these communication links has to be secured.The set constraints experienced by the biosensors make existing solutions to sensor network security unsuitable for biosensor security.Hence biosensors security requires novel solutions.The algorithm used is a lightweight encryption algorithm.They use the error correcting codes and the multiple biometrics for securing the key for the problems of measurement errors and randomness problems.
Poon et al. explore the use of this conduit in the security mechanism of BASN (body area sensor network) [10], that is, by a biometrics approach that uses an intrinsic characteristic of the human body as the authentication identity or the means of securing the distribution of a cipher key to secure inter-BASN communications.The method was tested on 99 subjects with 838 segments of simultaneous recordings of electrocardiogram.
Perrig et al. [11] have presented a set of protocols for achieving requirements of security in sensor network.Their architecture consists of two blocks that are SNEP and Tesla.In SNEP they use symmetric keys to encrypt the data.Symmetric keys are also used to compute the Message Authentication Code (MAC).Both of these set of keys are derived from a master key which is shared by the nodes with the base station and are placed in them before being deployed.Tesla is used to achieve authenticated broadcast by delayed key disclosure.The keys are computed from the master predeployed key and the counter which is incremented after each block.
New generations of health care systems generally run on thousands of servers to meet the requirements of millions of users [12].Traditional health care data analysis systems are difficult in solving the process problems with massive data.They propose a massive data management and analysis solution based on Hadoop.They present data analysis methods based on MapReduce and Hive.Experiment results show that Hadoop-based framework improves the performance of data upload and data query.Hive-based data analysis method is suitable for massive data analysis tasks.
Traditional data storage for patients is not scalable enough for the increasing number of patients and applications [13].Cloud computing promises low cost, high scalability, and reliability which can be a potential solution for storing patients' medical records.They analyze the impact of cloud computing on improving healthcare services.The architectural design called "MedCloud" which utilizes and integrates services from Hadoop's ecosystem is given for medical systems development.
Kojima and Nagahashi propose the disaster-relief training system using the electronic triage tag [14].They design the graphical user interfaces to develop the scenarios of injured people information and transport information such as ambulance.They let the electronic triage tag generate vital signs of injured people constantly.By collecting and monitoring those data at regular intervals, they construct disaster-relief training system that enables medical staff to conduct a more practical training considering the change in symptoms of injured people.They use SunSPOT developed by Sun Microsystems as an electronic triage tag.
They propose a method to preserve the privacy and security of patients' portable medical records in portable storage media to avoid any inappropriate or unintentional disclosure [15].Following HIPAA guidelines, the method is designed to protect, recover, and verify patient's identifiers in portable EHRs.
IBM InfoSphere Guardium provides database activity monitoring and auditing capabilities that enable user to integrate Hadoop data protection into existing enterprise data security strategy [16].User can configure the system and use InfoSphere Guardium security policies and reports for Hadoop environments.It does not involve wireless sensor network security communication.
HDSM is a Hadoop-based distributed sensor node management system, which uses Hadoop MapReduce framework and distributed file system [17].Each sensor node imitates DVR (digital video recorder) for sensing video data.All sensor nodes are connected to HDSM manager via gigabit ethernet.So HDSM is not suitable to lightweight sensor node and application.
Cloudwave platform is proposed to access and query large volumes of electrophysiological signal data using the HDFS storage module.Cloudwave allows users to search for clinical events using ontology and semantics reasoning [18].However, it does not involve biomedical data security communication.
Seen from the above analysis, how to build security healthcare information system with biosensors and cloud computing is a great challenge to current healthcare information system design and implemention.

Architecture of Hadoop-Based Wireless Biosensor Healthcare System
To build a biosensor healthcare system, multidiscipline knowledge and techniques are needed, such as electronic engineering, bioinformatics, computer science, software engineering, communication technique, and information security.A novel architecture of Hadoop-based wireless biosensor healthcare information system is shown in Figure 2.
As shown in Figure 2, healthcare information system includes biosensors, Sunspot node, Sunspot base station, Mysql database server, and Hadoop cloud computing cluster server.They cooperate to fulfill healthcare information collection, transfer, storage, and processing.Breathing, heartbeat, pulse, blood pressure, and body temperature are important health parameters, which reflect human health status.Continuous monitoring and storage of the health data can provide important diagnosis references to doctors.Seen from Figure 2, biosensor can measure biological signal and converts it into a value.The health parameters can be collected by different kinds of biosensors; the biosensor circuit boards are connected to Sunspot wireless node by standard extension interface.In Sunspot node, there is a 20pin extension interface, in which it supports standard UART (Universal Asynchronous Receiver/Transmitter), I2C, analog signal input, GPIO (General Propose Input Output), Vcc 3V, Vcc 5V, and GND.So different types of biosensors can be connected to Sunspot node; serial port multiplexing technique can be used to connect many biosensors that use serial communication.So the proposed approach has good scalability in case of increase of the number of biosensors.
SunSPOT is a small wireless sensor network device; it is programmable device based on Java.SunSPOT is based on a 32-bit ARM-9 CPU and 11 2.4 GHz radio channels.SunSPOT devices can communicate with each other through the Zigbee protocol.So the measured value can be sent to base station by Zigbee protocol.Then the base station is connected to healthcare Mysql database server by USB interface.All the data received by base station are transferred to the healthcare server and stored in Mysql Database.With the monitor areas enlarging, the number of monitoring nodes will increase greatly.So Hadoop-based big data storage and process are needed.

Biosensor Information Secure Communication and Process
From the network security viewpoint, data communication between Sunspot wireless node and Sunspot base station is vulnerable to attack, such as data tampering attack.So we propose a lightweight digital signature and verification method to protect data communication security.As we know, biosensor has limited computing and storage resource; ECC (Elliptic Curve Cryptography) algorithm is a lightweight public key cryptography algorithm which is suitable for WSN.So we use ECC asymmetric cryptography algorithm (supported elliptic curve SECP160R1) to implement digital signature and verification.Biosensor information secure communication is shown in Figure 3.As shown in Figure 3, biosensor information secure communication is divided into three process steps.The first step is to generate public/private key pair; the wireless node generates its public/private key pair.In Sunspot programming API specification, there are Java class-ECPublicKeyImpl, ECPri-vateKeyImpl, and ECKeyImpl to perform ECC public/private key pairs generation; the method genKeyPair (publicKey and privateKey) in ECKeyImpl class is used to generate the public/private key pairs.Then it transfers its public key to the base station, which stores the public key in the background Mysql database server.Because the step is executed before data communication, it is independent of data transmission; attacker finds it difficult to get the key and the biosensor data simultaneously.So data secure delivery is provided in our communication process.To prevent attackers from tampering with the public keys stored in Mysql database, the public keys are backed up to Hadoop cloud storage.Sqoop are used to execute the task of exchanging data between Mysql database and Hadoop file system.The public keys from Mysql database will be backed up to HDFS storage system by Sqoop  import instruction.The public key can be recovered from Hadoop to Mysql database by Sqoop export instruction.
In the second step, when the wireless node receives the healthcare information from biosensors, it executes ECC digital signature algorithm to sign and protect the biosensor measure values.In Sunspot programming API specification, there is a Java class-signature to perform ECC signature and verification, which includes initSign, update, sign, and verify methods.A 160-bit standard compliant elliptic curve (secp160r1) is used to implement ECC signature by calling Sunspot programming API.
In the third step, when the base station receives the message which includes measure value, base station queries the Sunspot ID and reads its public key from background Mysql database; then base station uses the public key of Sunspot to verify the message.If the message can be verified successfully, the biosensor measure value will be stored in the background Mysql database.The doctors can check and analyze the health information to do intelligent diagnosis.If the message is tampered with by an attacker, it cannot be verified successfully; the message including measure value will be discarded.
Average execution time in Sunspot node is used to measure the overhead of the ECC signature and verification.The message is computed by SHA1 algorithm to get the Hash value; then the Hash value is signed by ECC signature algorithm.Signature and verification time are used to measure the overhead of ECC signature and verification processes in Sunspot wireless node.The experiment results are shown in Table 1.
We use average execution time as the overhead of ECC signature and verification processes on SunSPOT Sensor.Seen from Table 1, the overhead can be accepted, because the average execution time is millisecond level.The average execution time of ECC signature and verification is close, while the time of signature is a little more than that of verification.With the increase of message length, the average execution time of signature and verification increases simultaneously.
To most of signature application in wireless sensor network, the overhead of ECC signature and verification processes can be accepted on the aspect of average execution time.
Two tables are designed in the entity-relationship diagram (ERD) of database; the first table is Sunspot security entity table; it includes the Patient ID, Sunspot ID, and Sunspot public key, in which Patient ID and Sunspot ID are united to be primary key; Sunspot public key can be used to verify biosensor measure value.The security entity table can be backed up to the Hadoop HDFS to prevent public key from tampering attack.The second table is Sunspot measure entity table, which records biosensor health measure values that are measured from the patients.It includes the Sunspot ID, Biosensor ID, measure timestamp, and measure values, in which measure timestamp is the primary key of the Sunspot measure entity table.In every measure time point, the measure value is unique.The entity-relationship model of the two tables is shown is Figure 4.The two entities are connected by verification relation.
Seen from Figure 4, the relation between Sunspot security entity and Sunspot measure entity is verification.Only if the measure value is verified successfully, the measure value can be recorded in the Sunspot measure entity table, otherwise the measure value will be discarded.The integrity of measure values can be protected by the Sunspot security entity.The Sunspot ID of Sunspot security entity is the foreign key of Sunspot measure entity.The two entities can be connected by the Sunspot ID to execute joint query.verify the message; as the tampered with message is not correct, it may misdirect the doctor and harm the patient.

Using Hadoop to Store and Audit Data
Hadoop is an open source cloud computing and big data storage platform.It is used to store and audit the data measured by Sunspot in this paper.Sqoop is a software tool which can transfer data between Hadoop HDFS (Hadoop distributed file system) and structured databases such as Mysql.
In  6 and 7. Hadoop load and memory statistics results are shown in Figure 6.
Seen from Figure 6(a), the time range of statistic results is from 10:20 AM to 11:20 AM.When Hadoop cluster sets up at about 10:26, the number of running processes increases quickly when Hadoop initializes the cluster system.When security entity table is imported/exported from 11:05 AM to 11:15 AM, the number of process increases a little.It shows that import/export operations of Sqoop only use less process resource.
Seen from Figure 6(b), when security entity table is imported, the memory usage rate increases a little.It shows that import operation of Sqoop occupies a little memory resource.When Hadoop cluster sets up, the cluster memory almost keeps constant; it implies that the initial process of Hadoop almost occupies less memory resource.The statistics results show the difference between Hadoop initialization and Sqoop import/export operation.
The CPU and Network statistics results are shown in Figure 7.
Seen from Figure 7(a), when data table is imported/ exported between Mysql and Hadoop from 11:05 AM to 11:20 AM, there are only small waves in User CPU usage curve; the average User CPU usage is about 10% between 11:05 AM and 11:20 AM.Wait CPU usage curve only waves a little from 10:20 AM to 11:00 AM.
Seen from Figure 7(b), there are 4 peaks from 11:05 AM to 11:20 AM in Hadoop cluster network statistics graph.We execute Sqoop import instruction for three times and Sqoop export instruction only once during 11:05 AM-11:20 AM, and the heights of three peaks are different because the amounts of import data are different.In the second import operation, the amount of import data is maximum.The last peak is the lowest, because data export operation occupies less network bandwidth.

Compare the Presented Architecture to Related Works
To show the advantage of our scheme, we compare the presented architecture with related works, which are SNEP and Tesla [11], Cloudwave [18] and MedCloud [13].We compare them from three aspects: lightweight digital signature, big data storage, and interaction with relational database, which are important in the security of wireless healthcare system.The comparison result is shown in Table 2.
Seen from Table 2, our architecture supports lightweight digital signature, big data storage, and interaction with relation database, which achieved better results than other works.Our architecture can meet the security requirement of wireless communication and big data storage in healthcare information system; at the same time, it can exchange data with traditional database.

Conclusion
Recently, biosensor based healthcare information systems attract many researchers and engineers' attentions with the development of microelectronic, Bioinformatics science, embedded computing, wireless communication, and cloud computing.How to use different kinds of biosensors to build an efficient healthcare information system is a challenge problem to current researchers.A novel Hadoop-based wireless healthcare system architecture is proposed in this paper; Sunspot wireless nodes are used to build the wireless biosensor network.A lightweight ECC digital signature algorithm is used to provide secure communication between wireless node and base station.Hadoop cloud platform is used to backup and recover Sunspot security entity table.Sunspot simulation platform and Hadoop cluster are used to validate wireless healthcare system and secure communication method.The main contributions of this paper are the following three aspects: (1) A novel Hadoop-based biosensor Sunspot wireless network architecture is proposed to build human healthcare information system.Multidiscipline knowledge is used to construct a complex healthcare system, such as bioinformation science, wireless sensor network, cryptograph and information security, security communication, database and information system, human-machine interaction, and cloud computing.
(2) To ensure the data communication security in healthcare system, a lightweight ECC digital signature algorithm and Hadoop-based data backup and recovery method are proposed to authenticate Sunspot wireless node and protect Sunspot key.
(3) Sqoop tool is used to import/export data between Mysql database and Hadoop HDFS cloud storage; security administrator can use it to protect and manage key data.
Simulation and monitoring results show that our healthcare information system architecture and secure communication method are highly effective to counter potential data tampering attacks.In the future, more sensor types, such as Beidou position sensor will be integrated in wireless node to provide patient's precise position information.To ensure the security of measure value, lightweight encryption algorithm will be used to protect the confidentiality of measure value.At the same time, more information will be imported from Mysql database to Hadoop HDFS to improve the security of healthcare information system.

Figure 3 :
Figure 3: Biosensor information secure communication and process.
Hadoop cluster memory last hour.

Table 1 :
Execution time of ECC signature and verification on SunSPOT Sensor (ms).
our Hadoop storage experiments, five computer nodes are used to build cloud storage system.One of them is namenode; the other four nodes are datanodes.The import instruction of Sqoop is used to periodically import data tables from Mysql database to Hadoop HDFS.The data tables include Sunspot security entity table and Sunspot measure entity table.Only security entity table is exported to Hadoop HDFS.To ensure the secure audit, only security administrators are granted access to the audit data stored in the Hadoop HDFS.If the security administrator doubts that the security entity table is tampered with, he can check the data on the Hadoop HDSF to detect the tampering attack behaviour.Ganglia tool is used to monitor Hadoop's running status, when the table is imported/exported by Sqoop.The monitoring result of Hadoop is shown in Figures