A Micro Neural Network for Healthcare Sensor Data Stream Classification in Sustainable and Smart Cities

A smart city is an intelligent space in which large amounts of data are collected and analyzed using low-cost sensors and automatic algorithms. The application of artificial intelligence and Internet of Things (IoT) technologies in electronic health (E-health) can efficiently promote the development of sustainable and smart cities. IoT sensors and intelligent algorithms enable the remote monitoring and analysis of patients' healthcare data, which reduces medical and travel expenses in cities. Existing deep learning-based methods for healthcare sensor data classification have made great achievements. However, these methods require substantial time and storage space for model training and inference, and they are difficult to deploy on small devices to classify patients' physiological signals in real time. To solve these problems, this paper proposes a micro time series classification model called the micro neural network (MicroNN). The proposed model is small enough to be deployed on tiny edge devices and can be applied to long-term physiological signal monitoring based on edge computing devices. We conduct comprehensive experiments to evaluate the classification accuracy and computational complexity of MicroNN. Experimental results show that MicroNN performs better than the state-of-the-art methods. The accuracies on the two datasets (MIT-BIH-AR and INCART) are 98.4% and 98.1%, respectively. Finally, we present an application to show how MicroNN can support the development of sustainable and smart cities.


Introduction
The International Telecommunication Union (ITU) and the United Nations Economic Commission for Europe (UNECE) jointly put forward a construction scheme for sustainable smart cities [1,2]. The scheme aims to use information technology to improve people's living standards and increase the efficiency of urban services [3]. Problems such as the uneven distribution of medical resources and the low efficiency of disease treatment have gradually become prominent in urban construction [4,5]. Many research works [6,7] explore advanced Internet of Things (IoT) and artificial intelligence technologies to solve these problems and promote the development of urban intelligence and sustainability. The rapid development of deep learning technology and the Internet of Medical Things (IoMT) has brought new opportunities and challenges to medical development in the construction of smart cities [3]. In recent years, some deep learning-based algorithms [6,8] have been proposed to classify healthcare sensor data streams and thereby address medical problems that arise during urban development. Deep convolutional neural networks (CNNs) [9] and deep recurrent neural networks (RNNs) [10] are two popular methods for classifying healthcare sensor data streams. The former is mainly represented by the one-dimensional convolutional neural network, which can extract the features of one-dimensional time series data [11]. The latter chains its neurons sequentially to process sequential data, so that the neurons in the hidden layers are related to each other [10]. Most existing healthcare sensor data classification methods are improvements on these two methods. However, these methods are difficult to deploy on edge devices because of their large time and space complexity [12].
To reduce the inference time and space complexity of the model, different lightweight neural network models have been proposed in the literature [13,14]. These methods can be divided into three scenarios: manually designed lightweight neural networks, neural network model compression algorithms, and automatic design of neural network structures [15]. In the first scenario, the model is made lightweight by reducing the number of parameters, for example, by limiting the number of feature channels [16,17] or by using decomposed convolution operations or 1 × 1 convolution kernels [18]. However, the design process in this scenario takes a lot of time [19]. The second scenario mainly uses knowledge distillation [20] and network slimming [21] to compress the network model. Unfortunately, these methods often make the model lightweight at the cost of its performance. The third scenario automatically designs a neural network architecture for a specific task according to a certain search strategy [15,22,23]. When methods based on the above scenarios are used to classify healthcare sensor data streams, the accuracy of the models is not very high, mainly because these models do not consider how to distinguish classes with similar features [24,25].
In contrast to the above methods, this paper proposes a novel model, called MicroNN, that preserves the classification accuracy of each class while keeping the model lightweight. Since RNNs have the advantage of memory preservation for time series data, an architecture based on multilayered RNNs [26] is used as the feature extractor of MicroNN. In addition, to improve the ability of MicroNN to distinguish between classes with similar features [27], Kullback-Leibler divergence (KL divergence) is introduced in this paper. Experiments show that the overall accuracy and the per-class classification accuracy of MicroNN exceed those of other work. Our main contributions are as follows: (i) The MicroNN model is composed of a microfeature extractor and several miniclassifiers. (ii) MicroNN uses a method based on KL divergence to eliminate shared knowledge among classes. (iii) We conduct comprehensive experiments on time complexity and space complexity. The rest of this paper is organized as follows: Section 2 presents the related work, Section 3 introduces the proposed model, Section 4 shows the experiments, Section 5 describes an application scenario of MicroNN, and Section 6 summarizes this work.

Related Work
E-health has become a part of the development of sustainable and smart cities [2,32]. With the mature development of deep learning and IoMT, healthcare sensor data stream classification based on edge computing has become possible [1,33,34]. It will effectively alleviate the uneven distribution of urban medical resources and further accelerate the intelligence development of cities.
According to the survey [6], many diseases afflict mankind and seriously threaten human life and quality of life. How to detect and prevent such diseases as early as possible has become a major issue in urban development [1,35]. Therefore, disease diagnosis based on healthcare sensor data stream classification has become a hot research topic. Many studies use traditional machine learning methods to classify healthcare sensor data streams, and these methods rely heavily on manually designed features. Behadada and Chikh [36] proposed a method based on the fuzzy decision tree to improve the detection of arrhythmias. Nasiri et al. [37] designed a model based on the support vector machine and genetic algorithms to diagnose cardiac arrhythmia with relatively high accuracy. Bensujin and Hubert [38] proposed a method that combines the K-means clustering algorithm and the bacterial foraging optimization algorithm to examine a person's heart condition. Sharipov [39] used principal component analysis to improve cardiac diagnosis via ECG. Jadhav et al. [40] proposed static backpropagation algorithms and the momentum learning rule for diagnosing heart diseases.
At present, because of the excellent performance of deep learning technology in image classification and text recognition, more research works are trying to apply deep learning models to disease diagnosis. Liu et al. [26] developed a model based on a multiple-feature-branch convolutional neural network for checking a patient's abnormal heartbeat. Chen et al. [28] proposed a new end-to-end scheme using a convolutional neural network (CNN) for automated ECG analysis. Saadatnejad et al. [30] proposed multiple long short-term memory (LSTM) models to monitor the status of heart activity. Faust et al. [31] proposed a bidirectional LSTM for beat detection. Jun et al. [29] used a CNN model with more layers by transforming the healthcare sensor data into a two-dimensional gray image.
Our work differs from the above work. In Table 1, we compare MicroNN with the discussed methods in terms of space complexity. The space complexity of the discussed models is larger than that of MicroNN, which prevents some of them from being widely used on portable devices or edge devices. Therefore, this paper considers not only the accuracy of the model but also its space complexity (Table 2).

System
Overview. MicroNN mainly includes three parts: a preprocessing model, a microfeature extractor, and miniclassifiers. Figure 1 shows the overall architecture of MicroNN. Table 2 explains the notations used in the paper. The workflow of MicroNN is as follows. The input is a physiological information record $X = x_1, x_2, \ldots, x_n$. The preprocessing model splits the record into slices with equal length $n$, and each slice is denoted $S_i$. Then, the microfeature extractor is used to extract the features of $S_i$, $F_i = e_1, e_2, \ldots, e_m$. Finally, the feature $F_i$ of $S_i$ is input into each miniclassifier $f_i$ to obtain the corresponding score. Hence, the label of heartbeat $S_i$ is $y$, as shown in (1).
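The last step of the workflow can be illustrated with a small Python sketch. Equation (1) is not reproduced in this text, so the sketch assumes that the label is the index of the miniclassifier with the highest score; the function name `predict_label` and the toy classifiers are hypothetical.

```python
from typing import Callable, List, Sequence


def predict_label(features: Sequence[float],
                  miniclassifiers: List[Callable[[Sequence[float]], float]]) -> int:
    """Return the index of the miniclassifier that scores the features highest.

    Assumption: the label is chosen by an argmax over the per-class scores.
    """
    scores = [f(features) for f in miniclassifiers]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy miniclassifiers (hypothetical): each maps a feature vector to one score.
toy_classifiers = [
    lambda F: F[0],          # class 0 scores the first feature
    lambda F: F[1],          # class 1 scores the second feature
    lambda F: 0.0,           # class 2 always returns a neutral score
]
label = predict_label([1.0, 2.0], toy_classifiers)
```

Because each class has its own scoring function, adding a class only requires appending one more miniclassifier to the list.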

Preprocessing Model.
Physiological signals are mainly measured by mobile edge devices. However, as physiological signals have low amplitude and low frequency, they are easily disturbed by noise during acquisition [39]. This noise mainly comes from internal or external interference [36]. Therefore, the wavelet transform [41] is used to denoise the original signal in this paper. First, the original data is decomposed into nine scales. Then, the wavelet coefficients of the nine scales are processed by a threshold operation [41]. Finally, we reconstruct the denoised signal by the inverse wavelet transform. Figure 2 shows the changes in physiological signal records (such as ECG) before and after denoising. After denoising, each physiological signal record is segmented into slices based on the annotations provided by the standard file [42]. Each slice $S_i$ is normalized as $S_{i,j} = S_{i,j} / \|S_i\|_2$, where $S_{i,j}$ represents the $j$-th point of $S_i$ and $\|S_i\|_2$ refers to the 2-norm of a heartbeat slice $S_i$.
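The denoising and normalization steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the text does not name the wavelet family or the thresholding rule, so the sketch uses the Haar wavelet with soft thresholding, and the `levels` and `threshold` parameters are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t (assumed thresholding rule)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(signal, levels, threshold):
    """Decompose, threshold the detail coefficients, and reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append([soft_threshold(c, threshold) for c in d])
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx


def l2_normalize(slice_):
    """Divide every point of a slice by the slice's 2-norm."""
    norm = math.sqrt(sum(v * v for v in slice_))
    return [v / norm for v in slice_] if norm > 0 else list(slice_)


clean = wavelet_denoise([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], levels=3, threshold=0.0)
unit = l2_normalize([3.0, 4.0])
```

With a threshold of zero the transform is lossless, which is a convenient sanity check before choosing a real threshold. The paper's nine decomposition scales would require the record length to be a multiple of 2^9 in this simplified version.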

Microfeature Extractor and Miniclassifiers.
In the past, many research works used the convolutional neural network (CNN) as a feature extraction model. However, as a CNN needs more computing and storage resources [26], it is difficult to deploy on edge devices. The recurrent neural network (RNN), in contrast, has a memory function when processing medical time series data, and its size is smaller than that of a convolutional neural network [28]. Inspired by ShaRNN [43], this paper therefore adopts a collection of multilevel RNNs as the feature extractor (see Figure 1). We set the RNN collection to have two levels. We denote the slice data after preprocessing as $S_i = v_1, v_2, \ldots, v_r$ and divide it into smaller slices of size $\omega$. $S_i$ generates $n/\omega$ such slices, and we use $A_k$ to represent each of them. Then, we set up an RNN model for each slice: $\beta^{[1]}_k = \mathrm{RNN}^{[1]}(A_k)$. Here, $\mathrm{RNN}^{[1]}$ represents the RNN model of the first level, and $\beta^{[1]}_k$ refers to the output of the $k$-th slice from $\mathrm{RNN}^{[1]}$. Therefore, we obtain the result $[\beta^{[1]}_1, \beta^{[1]}_2, \ldots, \beta^{[1]}_{n/\omega}]$ from the first-level RNN collection.

| Notation | Meaning |
| --- | --- |
| $X = x_1, x_2, \ldots, x_n$ | The raw physiological information record; $n$ is the length of the record. |
| $S_i = v_1, v_2, \ldots, v_r$ | $S_i$ refers to the $i$-th slice after segmentation. |
| $S_{i,j}$ | $S_{i,j}$ represents the $j$-th point of the $i$-th slice. |
| $F_i = e_1, e_2, \ldots, e_m$ | The features produced by the feature extractor. |
| $\mathrm{RNN}^{[1]}$, $\mathrm{RNN}^{[2]}$ | $\mathrm{RNN}^{[1]}$ represents the collection of RNN models at the first level; $\mathrm{RNN}^{[2]}$ represents the collection of RNN models at the second level. |
| $X \sim P_X^{\text{class } i}$ | The data distribution of each class. |
| $W_1$, $W_2$, and $W_3$ | The weights of a miniclassifier. |
| $f_1, f_2, \ldots, f_n$ | The outputs of the miniclassifiers. |
| $\eta$, $c$, and $\pi$ | Hyperparameters used in the paper. |

Computational Intelligence and Neuroscience
In the next step, we feed the result into the RNN of the second level, and the output is $y = F(\mathrm{RNN}^{[2]}([\beta^{[1]}_1, \beta^{[1]}_2, \ldots, \beta^{[1]}_{n/\omega}]))$, where $\mathrm{RNN}^{[2]}$ represents the RNN model of the second level, $F$ refers to the activation function, and $y$ is the extracted feature. It should be noted that $\mathrm{RNN}^{[1]}$ or $\mathrm{RNN}^{[2]}$ can be any RNN model, such as RNN, LSTM, Bi-LSTM, GRU, and so on.
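The two-level extraction can be sketched as follows. This is a minimal illustration, not the trained MicroNN extractor: it assumes a plain tanh (Elman) RNN at both levels, random untrained weights, a single first-level RNN shared across all short slices, and tanh as the output activation. The helper names `make_rnn`, `run_rnn`, and `two_level_features` are hypothetical.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, rng):
    """Create randomly initialized weights for a tanh RNN (untrained)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W_in": mat(hidden_dim, input_dim),
            "W_h": mat(hidden_dim, hidden_dim),
            "b": [0.0] * hidden_dim}


def run_rnn(params, inputs):
    """Run the RNN over a sequence of vectors and return the final hidden state."""
    h = [0.0] * len(params["b"])
    for x in inputs:
        h = [math.tanh(sum(w * xi for w, xi in zip(w_in, x))
                       + sum(w * hj for w, hj in zip(w_h, h))
                       + b)
             for w_in, w_h, b in zip(params["W_in"], params["W_h"], params["b"])]
    return h


def two_level_features(slice_, omega, rnn1, rnn2):
    """Level 1 summarizes each window of size omega; level 2 summarizes the summaries."""
    windows = [slice_[k:k + omega] for k in range(0, len(slice_), omega)]
    # Each scalar sample is wrapped as a 1-dimensional input vector.
    betas = [run_rnn(rnn1, [[v] for v in window]) for window in windows]
    return run_rnn(rnn2, betas)


rng = random.Random(0)
rnn1 = make_rnn(input_dim=1, hidden_dim=4, rng=rng)
rnn2 = make_rnn(input_dim=4, hidden_dim=3, rng=rng)
features = two_level_features([0.1 * i for i in range(12)], omega=4, rnn1=rnn1, rnn2=rnn2)
```

Each first-level run only sees ω samples, so the windows can be processed independently, and the second level sees a sequence of only n/ω vectors. This is what keeps the recurrent depth, and therefore the inference cost, small.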
For the classifier of MicroNN, we adopt a per-class classification model. The model establishes a separate miniclassifier for each class of the task (see the classification part of Figure 1). All miniclassifiers are connected to the feature extractor. In addition, to improve the performance of the classifier, we employ a loss function called one-class [24] in the training process, where $X \sim P_X^{\text{class } i}$ refers to the data distribution of each class, $\sigma$ is the activation function, and $\eta$, $c$, and $\pi$ are all hyperparameters. The first term in the loss function is the negative log likelihood. Its purpose is to maximize the score of class $i$ during training. However, if the negative log likelihood is left unconstrained, the score will increase without limit. Therefore, a second term, called H-reg, is added to the loss function to balance the negative log likelihood. Each per-class classifier is a multilayer perceptron with three layers, as shown in (5).
We can see that the derivative of H-reg in the training process depends on the weights ($W_1$, $W_2$, and $W_3$). Therefore, H-reg can restrain the unlimited growth of the weights that the negative log likelihood causes.
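The need for a regularizer can be seen from a small numerical sketch. The full loss function is not reproduced in this text, so the sketch covers only the negative log likelihood term and assumes that σ is the sigmoid function; the function name `nll` is hypothetical.

```python
import math


def nll(score: float) -> float:
    """Negative log likelihood -log(sigmoid(score)), computed in a stable form."""
    # -log(sigmoid(s)) = log(1 + exp(-s)) = max(-s, 0) + log1p(exp(-|s|))
    return max(-score, 0.0) + math.log1p(math.exp(-abs(score)))


# The loss keeps shrinking as the score grows, so nothing stops the score
# (and hence the weights that produce it) from growing without bound.
losses = [nll(s) for s in (0.0, 1.0, 5.0, 20.0)]
```

The loss is strictly decreasing in the score and only approaches zero asymptotically, which is why a term that penalizes weight growth is needed to reach a balance.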
To keep the parameters of the classifiers for different classes in the same parameter space, the method uses the parameters of miniclassifiers 1 to $i − 1$ to initialize the parameters of the $i$-th miniclassifier. Because similar features exist between different classes, deep learning models have difficulty distinguishing these classes during training. During the testing stage, a method based on KL divergence [44] is used to reduce the shared knowledge between classes, as described in the third term of the loss function. Assuming that there are $T$ miniclassifiers in MicroNN, the shared knowledge among the $T$ miniclassifiers is calculated as shown in (6).
where $\varphi_i$ is the mixing ratio with $\sum_{i=1}^{T} \varphi_i = 1$, and $P_i$ refers to the posterior parameter distribution of the $i$-th miniclassifier. The parameters of the $i$-th miniclassifier are updated by (7).
where τ is a hyperparameter.
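As background for the KL-based step, the following generic sketch computes the KL divergence between two discrete distributions. It is not a reconstruction of equations (6) or (7), which are not reproduced in this text; the function name `kl_divergence` and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
same = kl_divergence(p, p)       # identical distributions
forward = kl_divergence(p, q)    # KL(p || q)
backward = kl_divergence(q, p)   # KL(q || p), generally a different value
```

The divergence is zero only when the two distributions coincide and is not symmetric, so the order of the arguments matters when it is used to measure how much two classifiers have in common.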

Performance Analysis
The experiments are conducted on a computer with an Intel(R) Core(TM) i9-11900K CPU and 64.00 GB of memory. Experiments are done on two different ECG datasets to evaluate the performance of MicroNN. In the experiments, we divide each dataset into a training set, a validation set, and a test set in the proportions 6 : 2 : 2. To better evaluate the performance of the model, we mainly use precision (Pre), recall (Rec), and F1-score (F1) in the paper. Their relationship is as follows:

$$\mathrm{Pre} = \frac{TP}{TP + FP}, \qquad \mathrm{Rec} = \frac{TP}{TP + FN}, \qquad \mathrm{F1} = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}},$$

where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives for a class.
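The evaluation setup can be sketched in a few lines of Python. This is an illustrative sketch using the standard definitions of precision, recall, and F1-score; the helper names and the toy labels are hypothetical, and the split is done without shuffling for simplicity.

```python
def split_6_2_2(items):
    """Split a list into training, validation, and test parts in a 6 : 2 : 2 ratio."""
    n_train = len(items) * 6 // 10
    n_val = len(items) * 2 // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1-score, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1


train, val, test = split_6_2_2(list(range(10)))

# Toy heartbeat labels (hypothetical): N, S, and V classes.
y_true = ["N", "N", "S", "S", "V", "S"]
y_pred = ["N", "S", "S", "S", "V", "N"]
pre_s, rec_s, f1_s = precision_recall_f1(y_true, y_pred, positive="S")
```

Reporting the metrics per class, rather than only overall accuracy, exposes weak performance on a minority class even when the majority class dominates the accuracy figure.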

Datasets Description.
The details of the two datasets used in the experiment are as follows: (1) The MIT-BIH arrhythmia database (MIT-BIH-AR) includes the ECG records of 47 subjects studied by the BIH arrhythmia laboratory, and the sampling rate is 360 Hz. It contains 48 half-hour excerpts of two-channel ambulatory ECG recordings. In the experiment, we use the ECG records based on the MLII lead of MIT-BIH-AR. The full name of MIT-BIH is Massachusetts Institute of Technology, Beth Israel Hospital [42].
(2) The St Petersburg INCART 12-lead arrhythmia database (INCART) consists of 75 annotated records from 32 subjects, and the sampling rate is 257 Hz. Each record lasts for half an hour and contains the data of 12 standard leads. In the experiment, we use the ECG records based on lead II of INCART.

Performance of MicroNN.
First, we compared the performance of MicroNN with existing methods on MIT-BIH-AR and INCART (see Tables 3 and 4). MicroNN achieves good performance in ACC and F1. As can be seen from Table 3, the low accuracy of other methods is mainly due to the low F1 of class S. This is because class N and class S have many similar characteristics, so the model is prone to recognition errors. However, MicroNN's F1 in class S is much higher than that of other methods, which shows that MicroNN effectively reduces the shared knowledge among classes during training. Similarly, we can see from Table 4 that although the performance of MicroNN in classes N and V is not as good as that of some other works, MicroNN far exceeds other work in the classification of class S. This is mainly because MicroNN can effectively solve the problem of the fuzzy boundary.

Threats to Validity.
In the paper, threats to the validity of our proposed method are discussed from two perspectives: external validity and internal validity [14].
(1) Threats to internal validity: To prevent the occurrence of overfitting, we divide each dataset into a training set, validation set, and test set. We observed the change in classification accuracy based on different validation sets to check whether the classification model has overfitting.
(2) Threats to external validity: To verify the generalization of the model, we compared MicroNN on two different datasets. The experimental results show that the performance of MicroNN is better than other models.

An Engineering Application of MicroNN
Deep learning research on healthcare sensor data stream classification has attracted extensive attention [33,54,55]. However, we still face many challenges in the process of development. For example, the current urban medical resources are insufficient compared with the soaring urban population. e treatment efficiency cannot meet the needs of patients in time [4].
In this paper, we deploy MicroNN on edge devices to improve the efficiency of medical treatment. Figure 5 shows an application example of MicroNN based on edge computing. Different healthcare devices have the function of classifying healthcare sensor data streams. The healthcare devices classify the collected physiological signals of patients. Then, the results are used to assist doctors in judging the condition of patients. Finally, the doctor informs the patient of the specific situation. Therefore, MicroNN plays a role in promoting the development of sustainable and smart cities.

Conclusion and Future Work
In this paper, we propose a lightweight neural network model called MicroNN for classifying healthcare sensor data streams. It is composed of a microfeature extractor based on multiple recurrent neural networks (RNNs) and multiple miniclassifiers, each built as a three-layer fully connected network. In addition, a method based on KL divergence is used to remove the shared knowledge among different classes to improve the performance of the model. In the experiments, we compared the accuracy, time complexity, and space complexity of the model with those of other models on two different ECG datasets. MicroNN shows better performance than other works. In short, MicroNN is a lightweight and efficient model. In future work, we will further improve the accuracy of MicroNN while keeping the model lightweight and extend the experiments to other healthcare sensor datasets.

Data Availability
The labeled datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
No potential conflicts of interest were reported by the authors.