Personalized Federated Learning for ECG Classification Based on Feature Alignment

Electrocardiogram (ECG) classification is an active research area because of its applications in medical information processing. However, insufficient data, privacy preservation, and local deployment remain challenging. To address these problems, this paper proposes a novel personalized federated learning method for ECG classification. First, a global model is trained with a federated learning framework across multiple local data clients. Then, the global model and private data are used to train the local model. To reduce the feature inconsistency between global and private local data and to better fit the private local data, a novel "feature alignment" module is devised, which consists of two parts: global alignment and local alignment. For global alignment, a graph metric over batch data is used to constrain the dissimilarity between the features generated by the global model and the local model. For local alignment, a triplet loss is adopted to increase the discriminative ability on local private data. Comprehensive experiments are conducted on our collected dataset. The results show that the proposed method adapts better to local data and exhibits superior generalization ability.


Introduction
According to WHO statistics, heart disease is the most lethal chronic disease. Nearly 17.7 million people die of cardiovascular disease every year, which accounts for 31% of all deaths in the world [1]. The electrocardiogram (ECG) is a physiological signal that is widely used in heart health monitoring. It contains much pathological information related to heart activity and is an effective means of monitoring and diagnosing cardiovascular disease [2]. An ECG recording is a long data series that can last for days, so monitoring and diagnosis by human experts are very time- and labor-consuming. Therefore, it is necessary to use artificial intelligence technology for automatic cardiovascular disease diagnosis.
With advanced machine learning technology, especially deep learning, a cardiovascular recognition model can be trained on labeled ECG data; e.g., the method proposed in [3] achieves an F1 value of 0.837 for 12 kinds of cardiac irregularities. However, most state-of-the-art methods are based on publicly available training datasets, which are relatively small and of limited variety. Moreover, they are very difficult to deploy in practical applications.
To expand the available cardiovascular information while guaranteeing privacy, data from multiple medical institutions can be combined into a unified dataset, which can be used to train a superior global model with a federated learning framework [4]. Federated learning is a machine learning setting that trains on datasets distributed across multiple devices while preventing data leakage. It is also a privacy-preserving, decentralized, collaborative learning technique [5].
There have been works that adopt federated learning for medical data processing and model training [6,7].
In this way, a centralized global model can be trained on data from a large number of local nodes. Then, the global model is deployed to the local nodes for data prediction. However, a major challenge is that local clients obtain different performance from the same global model, because the data distribution of each local client differs from the distribution on which the global model was trained. Therefore, personalization of the global model for each client becomes necessary to overcome the problem posed by heterogeneous data distributions [8].
Devices that differ in storage, computation, and communication capabilities generate heterogeneous data. For ECG, the sampling frequency and duration depend on the parameter settings and the environment, which leads to nonuniformity in pacing signals, left ventricular high voltage, and other diagnostic patterns. When there is a large difference between the data distributions of the global server and the local clients, it is hard to directly measure the feature inconsistency between them. This makes it difficult to deploy the global ECG classification model to a local client with acceptable performance.
To solve these problems, a novel personalized federated learning framework for ECG data classification is proposed. First, a global ECG classification model is trained with a typical federated learning method across multiple local clients. Then, the global model is inherited by the local model and serves as its backbone. During personalized training of the local model, the inherited part is fixed. The local model is personalized with the proposed feature alignment module. To exploit the generalization ability of the global model, the features of a batch of data are formed into a graph representation, so that the internal structure between nodes is preserved. The graph distance between the global feature and the local feature is used as the global alignment constraint. Local alignment, in turn, is used to make the model adapt better to local private data. A metric learning method is adopted, and a triplet loss is designed to pull data points of the same class close to each other and push negative data points far away. Finally, these loss functions are combined for model training. With the proposed method, the consistency between global and local data can be learned, and the local personalized model is built with better adaptability and generalization.
As far as we know, there are no related research studies on personalized ECG classification. The main contributions of this paper are twofold: (i) To reduce the difference between global and local data, a novel feature alignment module is designed. With a graph constraint and a triplet metric, it gives the local model better adaptability and generalization. (ii) Extensive experimental evaluations are carried out, and performance analyses are reported from multiple aspects.
The rest of this paper is organized as follows. Section 2 reviews related works. Section 3 describes the methodology. Experimental evaluation and analysis are given in Section 4. Section 5 concludes this paper.

Related Works
Related works are introduced in this section, including ECG data classification methods, federated learning frameworks, and personalization in federated learning.

ECG Classification Method.
With well-designed feature representations, ECG classification can be realized by models based on Bayes, K-means, decision tree, and linear discriminant classifiers [9][10][11], along with commonly used optimization techniques [12,13]. Features such as the cycle and higher-order statistics of the QRS wave were extracted in reference [14]. Then, a fuzzy neural network was trained as the classifier. Wavelet transform was used for feature extraction in reference [15]. Reference [16] adopted a support vector machine for ECG classification.
A convolutional neural network method suitable for multilead ECG data was proposed in reference [17]. A stacked denoising autoencoder recognition model was used in reference [18] for ECG data. In reference [19], deep belief networks were used to construct the model for arrhythmia diagnosis.
A convolutional neural network model was used in reference [20] for heartbeat classification. In reference [21], a deep factor decomposition method was used to decrease the influence of complex noise on the ECG signal. A deep autoencoder was first used for signal denoising and reconstruction, and then a fully convolutional network was trained as the classifier. In reference [22], nonnegative matrix factorization was used for data dimension reduction, and features were extracted with sparse representation. Feature representation at multiple scales was proposed in reference [23], and progressive decisions were fused for final classification.

Federated Learning.
To utilize massive distributed data storage while preserving privacy, federated learning is a useful framework that provides efficient training on data islands and enables model collaboration [4]. In reference [24], local models are trained at each local node, and only the updated parameter set is shared for global training. A multitask-based federated learning method was proposed in reference [25], which addresses the problems of high communication cost and fault tolerance. In reference [26], a safe client-server structure was first constructed. Data were allocated according to different local users. A homomorphic encryption method was designed for model parameter aggregation so as to improve server security [27]. A differential privacy method for federated learning was introduced in reference [28]. It protects client data by hiding each client's contribution during training. A ternary quantization method was proposed in reference [29], which optimizes the quantized networks and removes many redundant parameters and excessive communication costs. To address the problem of unlabeled and unannotated on-device data, reference [30] used a deep temporal neural network to train an auxiliary task by optimizing a contrastive objective with a multiview strategy on diverse datasets.

Personalization in Federated Learning.
Federated learning can be used to train a global model by utilizing distributed local data. However, the benefit that different local clients obtain from the global model may vary greatly because of their differing data distributions. To cope with the non-IID data distributions of clients, personalized federated learning has been proposed to improve the performance of each local client by training a personalized local model. Reference [31] reported that the performance of local models was hard to improve during federated learning and may even be worse than that obtained by training only on local data. Therefore, it is important to personalize the model for each specific local node. In reference [32], local nodes were first clustered, and the model of each group was trained separately. In reference [33], part of the parameters of the global model were copied to the local model, and then the local model was fine-tuned using local data. A meta-learning method was adopted in reference [34], which treated model personalization as a meta-testing procedure. In reference [35], a balance was sought between global and local models, which were combined for final classification. In reference [36], the local model and the global model were considered as two experts, and the personalized model was trained on the mixed output of the personalized model and the global model. Similar work was studied in reference [37]. Reference [38] introduced an attentive message passing mechanism to improve the effectiveness of collaboration between clients.

Methodology
In this section, the proposed personalized federated learning method for ECG classification based on feature alignment is described. Figure 1 gives the main framework of the proposed method. First, a global model M_G′ is constructed with a typical federated learning framework from multiple local clients. Then, for each client c_i, a personalized local model is trained on the private dataset. The local model contains three parts: M_G, M_C, and M_f. M_G is inherited from the global model M_G′. M_C is a convolutional neural network module for local feature extraction and representation. M_f is a fully connected layer or softmax layer that forms the final classifier. During local model training, M_G is fixed to maintain the generalization ability of the global model. Specifically, two alignment modules, global alignment and local alignment, are designed to constrain the feature distribution between the local model and the global model. They are realized by constraints on the graph representations of the features generated by M_G and M_C, along with metric learning for intraclass and interclass losses. Finally, the global alignment loss L_a, the local alignment loss L_t, and the cross-entropy loss L_c are all incorporated into the objective function. Details are described in the following subsections.

Global Model Training.
Federated learning is a popular way to train a model that makes the best use of distributed data while guaranteeing privacy. In the first step, the global model M_G′ is trained with FedAvg [39], one of the most widely used federated learning frameworks, which updates synchronously in each communication round. Figure 2 shows the framework for global model training in our work. There are n clients, and each holds a local dataset P_i and a local model M_i. Initially, the global server sends the model parameters to all clients. Then, each client c_i trains its local model on local data and sends the parameter update w_i back to the server. The server collects all local updates and optimizes the global model parameters, and this process is repeated multiple times. For efficiency, the global model can be updated once a subset of the local updates has been collected.
For client c_i, its model parameter w_i is trained with local data (x_k, y_k) ∈ P_i. Equation (1) gives the loss function l_{c_i}, which is the mean value of the loss l() over all training data, where n_i is the total number of samples in P_i. Equation (2) then gives the minimization objective, which is solved by adjusting w_i with stochastic gradient descent (SGD) and backpropagation (BP).
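Equations (1) and (2) can be written out from the definitions in the preceding paragraph. The following is a reconstruction from those definitions rather than a verbatim copy of the original formulas; writing the per-sample loss as l(x_k, y_k; w_i) and the optimum as w_i^* are notational assumptions.

```latex
\begin{align}
l_{c_i}(w_i) &= \frac{1}{n_i} \sum_{(x_k,\, y_k) \in P_i} l(x_k, y_k; w_i), \tag{1} \\
w_i^{*} &= \arg\min_{w_i} \; l_{c_i}(w_i). \tag{2}
\end{align}
```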
When the clients finish their training, the server updates the global model parameter w_G by averaging the parameters w_i of all client models. As shown in equations (3) and (4), N is the number of clients and t_i is the weight of each client model.
w_G is distributed to all clients after one iteration, and each client uses w_G as the base model for further training using equations (1) and (2). After multiple iterations, the global model M_G′ is trained to its best performance. Other federated learning frameworks can also be adopted for global model training.
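The server-side aggregation step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the weight t_i of each client is its share of the total number of training samples, as in the standard FedAvg formulation, and the `Params` container and the function name `fedavg` are hypothetical.

```python
from typing import Dict, List

# Hypothetical container: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    # Assumed form of the weights t_i: n_i divided by the total sample count.
    weights = [n / total for n in client_sizes]
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(w * p[name][j] for w, p in zip(weights, client_params))
            for j in range(length)
        ]
    return averaged


# Two toy clients; the second holds three times as much data as the first.
w_global = fedavg(
    [{"layer": [0.0, 4.0]}, {"layer": [4.0, 0.0]}],
    client_sizes=[1, 3],
)
```

In this toy call the averaged parameters lie closer to those of the second client, because it contributes more samples.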

Local Model Training.
The global model M_G′ was trained with federated learning over distributed data in the previous subsection. However, M_G′ cannot be directly deployed to local clients for local data inference when there is a large difference between the data distributions of the global server and the local clients. In this subsection, a personalized model adaptation method is designed based on the globally trained model and the private data of a specific client. To better fit the local private data, a special constraint strategy called "feature alignment" is devised to guarantee the uniformity between the global and local models. The alignment module is further divided into two parts, global alignment and local alignment, which are described as follows.

Global Alignment.
A global alignment module is first designed to constrain local model training by constructing the structure between feature nodes. Unlike other methods, a batch of training data is used to form a graph structure, which represents the relationships between data nodes. In this way, the effect of a feature shift in a single sample is reduced, and the relations between data nodes are retained. As shown in Figure 3, for the ith batch of training samples (X_i, Y_i) in a private dataset, its global feature f_g(X_i) and local feature f_p(X_i) are extracted through models M_G and M_C, respectively. The training batch is then treated as the basic group for the global alignment operation, and the sample features f_g(X_i) and f_p(X_i) are used to construct graph representations. If a training batch contains n samples, the graph representation contains n nodes and n^2 edges. The nodes of the graph are the features of the samples in the batch, and the edges are the distances between nodes. gm(X_i) and gp(X_i) denote the graph representations of the ith batch produced by the global and local models, respectively.
To measure the similarity between the two graphs of a batch, the graph structure is represented in matrix form. In this paper, only the edges between nodes are incorporated in the representation. The hypothesis is that the internal skeleton between the nodes of a graph is what matters most and should be learned from the global model. If the node features themselves were also used, the local model would very likely become too similar to the global model. The white and yellow matrices in Figure 3 denote gm(X_i) and gp(X_i), which are formulated in equations (5) and (6). e represents the edge between two nodes, and x_i^j and x_i^k are two training samples in X_i.
Equation (7) gives the distance metric between the two graph representations of a given batch X_i. d() denotes a distance computation method, and |X_i| denotes the batch size. In essence, all corresponding edges of the two feature graph representations are compared.
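The edge-only comparison can be illustrated with a small sketch. It is written under explicit assumptions, since the exact forms of equations (5) to (7) are not fixed here: edges are taken to be Euclidean distances between sample features, d() is taken to be the absolute difference between corresponding edges, and the result is normalized by the number of edges. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def edge_matrix(features: List[Vector]) -> List[List[float]]:
    """Pairwise Euclidean distances between the feature vectors of one batch."""
    return [[math.dist(a, b) for b in features] for a in features]


def graph_distance(global_feats: List[Vector], local_feats: List[Vector]) -> float:
    """Mean absolute difference between corresponding edges of the two graphs."""
    gm = edge_matrix(global_feats)  # graph built from global-model features
    gp = edge_matrix(local_feats)   # graph built from local-model features
    n = len(gm)
    if n == 0 or n != len(gp):
        raise ValueError("both feature lists must describe the same non-empty batch")
    diff = sum(abs(gm[j][k] - gp[j][k]) for j in range(n) for k in range(n))
    return diff / (n * n)  # assumed normalization by the n^2 edges


# The two models may use different feature dimensions; only edges are compared.
same_structure = graph_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0,), (5.0,)])
shifted_structure = graph_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0,), (1.0,)])
```

Because only edges enter the comparison, two feature sets with the same pairwise distances are treated as perfectly aligned even when the individual feature vectors differ, which matches the stated motivation for omitting node features.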

Local Alignment.
For model personalization with local private data, a local alignment module is also designed, which aims to increase the classification performance on local private data. As shown in Figure 4, for a training batch (X_i, Y_i) sampled from a private dataset, its local feature f_p(X_i) is extracted through model M_C. The training batch is then used as the basic group for the local alignment operation. Using a metric learning method, the distance between features from the same class is decreased, and the distance between features from different classes is increased. A training sample is randomly selected from f_p(X_i) and denoted as f_p(x_i^a). It is called the anchor sample. Then, two more samples are selected: one has the same label as the anchor, and the other has a different label. These three samples constitute a triplet, in which f_p(x_i^p) denotes the positive sample and f_p(x_i^n) denotes the negative sample. Through training, the distance between f_p(x_i^a) and f_p(x_i^p) is expected to decrease, and the distance between f_p(x_i^a) and f_p(x_i^n) is expected to increase. The constraint is given in equation (8), where α is the threshold for the minimal distance and Γ is the triplet set.
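A triplet loss of this kind can be sketched as follows. The sketch assumes squared Euclidean distances and the usual hinge form with margin α; it also enumerates every valid triplet in a batch for determinism, whereas the text describes random selection. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, positive, negative) features


def build_triplets(features: List[Vector], labels: List[int]) -> List[Triplet]:
    """Enumerate every (anchor, positive, negative) combination in one batch."""
    triplets: List[Triplet] = []
    for a, label_a in enumerate(labels):
        for p, label_p in enumerate(labels):
            if p == a or label_p != label_a:
                continue  # the positive must be another sample of the same class
            for n, label_n in enumerate(labels):
                if label_n != label_a:
                    triplets.append((features[a], features[p], features[n]))
    return triplets


def triplet_loss(triplets: List[Triplet], alpha: float) -> float:
    """Sum of hinge terms [d(a, p)^2 - d(a, n)^2 + alpha]_+ over all triplets."""
    total = 0.0
    for anchor, positive, negative in triplets:
        d_pos = math.dist(anchor, positive) ** 2
        d_neg = math.dist(anchor, negative) ** 2
        total += max(d_pos - d_neg + alpha, 0.0)
    return total


# Toy batch: two samples of class 0 and one sample of class 1.
batch_triplets = build_triplets([(0.0,), (1.0,), (3.0,)], [0, 0, 1])
loss_small_margin = triplet_loss(batch_triplets, alpha=0.5)
loss_large_margin = triplet_loss(batch_triplets, alpha=4.0)
```

With the small margin every triplet already satisfies the constraint and contributes nothing, while the larger margin leaves one triplet violated, which is what drives further separation of the classes.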

Training Objective Function.
The final loss function of the proposed model contains three parts: the global alignment loss L_a, the local alignment loss L_t, and the cross-entropy loss L_c.
The global alignment loss L_a is given in equation (9), which follows from equation (7). M is the number of batches in the whole training dataset, and i is the batch index.
The local alignment loss L_t is given in equation (10), which is based on equation (8). The subscript "+" means that the function value is 0 when the content of [ ] is smaller than 0; otherwise, it is the normal loss function value. Γ_i is the triplet set for the ith batch.
The cross-entropy loss L_c is given in equation (11). CE() is the standard cross-entropy function. M and i are the same as in equation (9). Y(X_i) is the ground truth label of batch X_i, and Y′(X_i) is the output of the local model.
The final loss function L is a combination of L_a, L_t, and L_c, as shown in equation (12). w_a, w_t, and w_c are weighting hyperparameters. L is used to update the parameters of the M_C and M_f modules in the local personalized model.
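For orientation, the four losses can be written in one place. This is a hedged reconstruction from the verbal definitions, not the original typeset equations: D denotes the graph distance of equation (7) and is a symbol introduced here, the squared Euclidean norm in the triplet term is an assumption, and whether the sums are additionally normalized by M is not known.

```latex
\begin{align}
L_a &= \sum_{i=1}^{M} D\bigl(gm(X_i),\, gp(X_i)\bigr), \tag{9} \\
L_t &= \sum_{i=1}^{M} \sum_{(x_i^a,\, x_i^p,\, x_i^n) \in \Gamma_i}
       \Bigl[ \lVert f_p(x_i^a) - f_p(x_i^p) \rVert_2^2
            - \lVert f_p(x_i^a) - f_p(x_i^n) \rVert_2^2 + \alpha \Bigr]_{+}, \tag{10} \\
L_c &= \sum_{i=1}^{M} \mathrm{CE}\bigl(Y(X_i),\, Y'(X_i)\bigr), \tag{11} \\
L   &= w_a L_a + w_t L_t + w_c L_c. \tag{12}
\end{align}
```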

Experimental Evaluation
In this section, the dataset and the experimental settings are first described. Then, the performance of the proposed method for personalized ECG classification is evaluated under various settings.

Dataset and Experiment Setting.
As there are no existing research studies on personalized federated learning for ECG classification, we construct a specific setting to evaluate the proposed algorithm. We collect about 120,000 ECG samples from 8 hospitals. Table 1 gives a detailed description of the dataset. Six types, namely sinus rhythm, sinus arrhythmia, sinus tachycardia, sinus bradycardia, T-wave alternans, and normal, are selected because they are the most common types and have abundant data. There are 20201, 18581, 13682, 15854, 14211, and 38524 samples for the types sinus rhythm, sinus arrhythmia, sinus tachycardia, sinus bradycardia, T-wave alternans, and normal, respectively. The data distribution of each medical institution is listed in Table 2.
In our research, each medical institution corresponds to a local node, and these local nodes provide private data for global training. In this way, the federated learning setting is established.
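The per-type totals quoted above can be cross-checked against the per-institution counts of Table 2. The short script below is only a consistency check on the reported numbers; the dictionary names are hypothetical, and the order of the eight columns is assumed to follow the order of the institutions in the table.

```python
# Per-institution sample counts for each type, as listed in Table 2.
per_institution = {
    "sinus rhythm": [2924, 1940, 2308, 1483, 3610, 3103, 3645, 1188],
    "sinus arrhythmia": [2451, 1720, 1893, 1264, 3171, 2728, 3513, 1841],
    "sinus tachycardia": [1913, 1283, 1407, 1308, 2262, 2178, 2619, 712],
    "sinus bradycardia": [2046, 1523, 1660, 1418, 2898, 2665, 2857, 787],
    "T-wave alternans": [1850, 1295, 1477, 1227, 2418, 2260, 2767, 917],
    "normal": [5533, 3866, 3920, 3653, 6652, 6196, 7473, 1231],
}

# Per-type totals quoted in the dataset description.
reported_totals = {
    "sinus rhythm": 20201,
    "sinus arrhythmia": 18581,
    "sinus tachycardia": 13682,
    "sinus bradycardia": 15854,
    "T-wave alternans": 14211,
    "normal": 38524,
}

# Sum each row over the eight institutions and compare with the quoted totals.
row_sums = {name: sum(counts) for name, counts in per_institution.items()}
mismatches = [name for name in row_sums if row_sums[name] != reported_totals[name]]

# Overall dataset size and the size of each institution's private dataset.
dataset_size = sum(row_sums.values())
institution_sizes = [sum(column) for column in zip(*per_institution.values())]
```

The row sums agree with the quoted totals, and the overall size is consistent with the figure of about 120,000 samples; the per-institution sizes differ noticeably, which is one source of the heterogeneity that motivates personalization.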

Base Model Training.
The server (global model) employs FedAvg to train the model globally, whereas each local client updates its model locally between successive global aggregations using an SGD-style algorithm. A CNN with the ResNet-34 architecture is used as the base network structure for both the global model and the local models.
Each experiment is run for 100 global aggregations, with e = 4 epochs of SGD between successive global aggregations. A constant learning rate of 0.01 is used across global aggregations and clients. Table 3 gives the classification results of each local node after federated learning. The average classification rate is used as the metric in our work. There are two node types, local and global.
The global model has a classification rate of 82.6%. For local nodes, there are two evaluations. Local nodes 1 to 8 obtain classification rates of 89.48%, 88.76%, 90.25%, 91.54%, 88.31%, 89.57%, 90.18%, and 90.54%, respectively, on their corresponding local private datasets. Meanwhile, local nodes 1 to 8 obtain classification rates of 54.65%, 57.18%, 49.75%, 51.43%, 48.92%, 46.35%, 52.45%, and 50.20%, respectively, on the global testing dataset. It can be clearly seen that the local models perform better on private data than the global model does. The local models perform poorly on global data, about 30 percentage points lower than the global model. These results indicate that local models are better suited to local data, while the global model trained with the traditional federated learning framework needs further improvement. Therefore, model personalization is urgently required.
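The size of the gap can be read directly from the reported rates. The snippet below simply averages the numbers quoted in the preceding paragraph; the variable names are hypothetical, and an unweighted mean over the eight nodes is an assumption.

```python
from statistics import mean

# Classification rates (%) as reported for the base models.
global_model_rate = 82.6
local_on_private = [89.48, 88.76, 90.25, 91.54, 88.31, 89.57, 90.18, 90.54]
local_on_global = [54.65, 57.18, 49.75, 51.43, 48.92, 46.35, 52.45, 50.20]

# Unweighted averages over the eight local nodes.
avg_private = mean(local_on_private)
avg_global = mean(local_on_global)

# Gap between the global model and the local models on global testing data.
gap_to_global_model = global_model_rate - avg_global

print(f"local models on private data: {avg_private:.2f}%")
print(f"local models on global data:  {avg_global:.2f}%")
print(f"gap to the global model:      {gap_to_global_model:.2f} points")
```

On these figures the local models average just under 90% on their own data and just over 51% on the global testing data, so the gap to the global model is a little over 31 percentage points.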

Model Personalization Evaluation.
In this subsection, the proposed personalization model based on feature alignment is evaluated. The global model M_G′ obtained in the previous subsection is first downloaded to each local node, and then the personalized model for each local node is trained on the basis of the local dataset D_p and M_G′. Following the notation of Section 3, w_a, w_t, and w_c are set to 0.3, 0.3, and 0.4, respectively. The batch size is set to 16, and the learning rate is 0.001. Table 4 gives the results of model personalization. Column 1 is the node index. Columns 2 and 3 give the average performance on local data of the models trained with only local data and with personalization, respectively. Columns 4 and 5 give the average performance on global data of the models trained with only local data and with personalization, respectively. On local data, the performance of the model with personalization decreases by about 3%. This indicates that the personalized model is less biased toward the distribution of the local data. On global testing data, the performance of the model with personalization increases greatly, by about 15-18%. This is good evidence that the personalized model generalizes better.

Comparisons with Other Methods.
In this subsection, some related model personalization methods are compared with our proposed model. We implement the algorithms of [31,34,35], and the average classification rate on local and global testing data is evaluated. Table 5 presents the comparison results. The methods in [33][34][35] obtain average performance of 84.41%, 82.70%, and 83.55% on local node testing data and 79.80%, 78.68%, and 76.45% on global testing data. These results are about 4% and 6% lower than those of our proposed method on local and global testing data, respectively, which validates the effectiveness of the proposed personalization framework and feature alignment module.

Effect of Global Alignment and Local Alignment.
In this subsection, the effect of global alignment and local alignment is evaluated. Global alignment and local alignment are the two novel operations proposed in our work, which aim to capture the generalization ability of the global model and to extract the discriminative information of local private data. Here, their effect is evaluated by assigning them different weights. Table 6 presents the comparison results. Five parameter settings for w_a, w_t, and w_c are adopted. It can be seen from the

Table 2: Data distribution of each medical institution (columns are the eight institutions).
Type               1     2     3     4     5     6     7     8
Sinus rhythm       2924  1940  2308  1483  3610  3103  3645  1188
Sinus arrhythmia   2451  1720  1893  1264  3171  2728  3513  1841
Sinus tachycardia  1913  1283  1407  1308  2262  2178  2619  712
Sinus bradycardia  2046  1523  1660  1418  2898  2665  2857  787
T-wave alternans   1850  1295  1477  1227  2418  2260  2767  917
Normal             5533  3866  3920  3653  6652  6196  7473  1231

Conclusions
This work proposes a novel personalized federated learning method for ECG classification. We explore feature alignment as a personalization strategy on both the global and local sides.
Experiments on our collected dataset show that personalization gives the local model high performance and better generalization. To our knowledge, this is the first evaluation of personalized federated learning for ECG data analysis.
Our future work will focus on two aspects: (1) we will conduct more in-depth research on personalization methods with specific structures, and (2) external datasets, such as data collected from the web [40], will be used to improve model performance.

Data Availability
The dataset used in this research is private.

Conflicts of Interest
The authors declare that there are no conflicts of interest.