Enhancing Fairness in Federated Learning: A Contribution-Based Differentiated Model Approach

Federated learning (FL) has emerged as a promising framework for collaborative machine learning, allowing the training of machine learning models on distributed devices without centralizing sensitive data. However, FL falls short in terms of fairness, as each client receives the same model regardless of its individual contribution. This unfairness discourages active client participation in FL. To address this challenge, we propose a contribution-based differentiated global model mechanism. Specifically, we introduce the contribution score as a metric to assess client contributions in FL and utilize deep Q-networks (DQN) to dynamically update the contribution scores. Subsequently, we allocate clients to different clusters based on their contributions by using a clustering algorithm, where each cluster is associated with a distinct global model. This mechanism encourages clients to make greater contributions in return for improved global models. Experimental results confirm the effectiveness of our approach in enhancing fairness in FL.


Introduction
Federated learning (FL) [1] has gained considerable attention because it effectively addresses the challenges of training models on distributed data. In FL, clients collaboratively train a global model while keeping their local data private. Specifically, a central server distributes the global model parameters to individual clients, who utilize their respective data for training local models. Once the training process is completed, the local model parameters, rather than the raw data, are transmitted back to the server. Subsequently, the server aggregates the received local models to generate an updated global model. This process enables FL to effectively safeguard the security of user data.
Research in the field of FL has primarily focused on protecting user data privacy [2] and reducing communication costs while overlooking the issue of fairness. In FL, clients with varying levels of contribution ultimately obtain the same global model [3]. Such inequity in the distribution of benefits can lead to dissatisfaction among high-contributing clients, subsequently reducing their motivation to actively participate in FL and ultimately impeding the sustainable development of FL. To address this problem, it is necessary to establish a reasonable criterion for evaluating the contribution of each client. Furthermore, the development of a fair allocation mechanism that satisfies all stakeholders is crucial [4].
Several methods have been proposed for fairness research in FL. In [5], the authors proposed a smart contract-based data-model provenance registry to enable accountability and a weighted fair data sampler algorithm to enhance fairness in FL. In [6], an algorithm was presented to achieve more fairness and accuracy in horizontal federated learning (FedFa). The authors proposed a weight selection algorithm that combines the information quantity of training accuracy and training frequency to measure the weights, thereby helping the server aggregate client updates with a more equitable weighting. Another approach, presented in [7], is the pairwise correlated agreement method, which utilizes the idea of peer prediction to evaluate user contributions in FL without requiring a test dataset. This approach was used to design a new incentive mechanism that ensures truthfulness in FL. The work in [8] proposed the federated learning incentivizer (FLI) payoff-sharing scheme. The scheme dynamically divides a given budget in a context-aware manner among data owners in a federation by jointly maximizing the collective utility while minimizing the inequality among the data owners. Additionally, the authors of [9] proposed the primal-dual greedy auction mechanism, which utilizes auction theory to evaluate the resource conditions of clients and provide corresponding incentives.
Previous studies have employed trust models [10] or incentive mechanisms to ensure fairness in FL. The server provides corresponding rewards to clients based on their reputation or contributions. However, it is challenging for the server to assess the real-time contributions made by each client to the FL training process [11].
Reinforcement learning (RL) [12] is a prominent field in machine learning that focuses on determining optimal actions for agents based on their interaction with the environment, aiming to maximize expected rewards. Alongside supervised and unsupervised learning, RL forms one of the fundamental branches of machine learning and has witnessed continuous advancements over the past few decades. However, traditional RL methods often encounter difficulties when confronted with complex real-world problems. Deep reinforcement learning (DRL) [13] has emerged as a promising approach that combines the powerful perception and comprehension capabilities of deep learning with the decision-making abilities of RL. This integration allows DRL to tackle complex real-world problems effectively, making RL more practical across diverse domains. Moreover, DRL has been successfully applied in the FL domain to address various challenges, including reducing training completion time under resource constraints [14], improving the model aggregation rate and minimizing communication costs [15], and maximizing the social welfare of the FL services market [16].
In this article, we propose an evaluation metric called the contribution score to quantify the contributions made by clients to FL. As the FL training iterations progress, the contributions of clients change dynamically, necessitating an adaptive mechanism to account for these variations. This is where the deep Q-network (DQN) [13] comes into play, as it effectively addresses problems in discrete action spaces by leveraging deep neural networks to learn the value function that maps states and actions to expected rewards. By randomly sampling from a replay buffer and utilizing a fixed target network to reduce the instability of Q-value estimations, DQN provides a robust and efficient solution for dynamically updating the contribution scores.
In this work, we propose a contribution-based differentiated global model mechanism to ensure fairness in FL. We use the contribution scores to measure the contributions of clients to FL. Then, based on these scores, we employ clustering algorithms to divide clients into different clusters, with each cluster corresponding to a specific global model. After the completion of FL iterations, clients receive global models of different performance levels based on their assigned clusters. This mechanism allows clients who make greater contributions to obtain higher-performing global models, thus incentivizing their active participation in FL. Furthermore, DQN dynamically updates the contribution score of each client, ensuring that client grouping is not static. The dynamic updating of contribution scores enables us to adapt to changes in client behavior over time and ensure a fair assessment of client contributions. The main contributions of this work are listed as follows:
This article is structured as follows. In Section 2, we provide an overview of the existing literature on FL fairness. The system model is presented in Section 3. Subsequently, we explain our proposed contribution scores updating method based on DQN in Section 4. Following that, the performance results of our proposed model are analyzed in Section 5. Finally, we present our concluding remarks in Section 6.

Related Work
There has been recent research activity related to fairness in FL. The most significant and relevant publications are presented in three categories: contribution and reputation as metrics, contribution as an incentive in FL, and contribution update methods.

Contribution and Reputation as Metrics.
Various techniques and methods have been proposed to address fairness issues in FL. These approaches utilized contribution or reputation as metrics to assess client performance. In [17], the authors designed a blockchain-based FL system where reputation validators on the blockchain employed the multi-KRUM algorithm to compute reputation and evaluate unsatisfactory updates. Validators increase the reputation of a client upon acceptance of its update and diminish it in case of rejection. In [18], the authors proposed a blockchain-based reliable FL reputation system to select trustworthy workers. The reputation of workers was calculated by using a multiweight subjective logic model based on their historical performance and recommendations from other workers. However, there is a potential for workers to receive malicious ratings or unfair evaluations. In [19], the authors proposed an FL incentive mechanism based on reputation and reverse auction theory, where each client bids for the opportunity to participate in FL. Their reputation reflects their reliability and data quality. However, this method is only applicable to horizontal FL and may result in high model complexity.
International Journal of Intelligent Systems
There are other approaches that utilize contribution to ensure fairness in FL. These methods assess the contributions of clients by considering multiple factors. In [20], the authors proposed a blockchain-based FL framework and a protocol to transparently evaluate the contribution of each participant. This framework protects privacy for all parties in the model-building phase and transparently evaluates contributions based on the model updates. In [21], the authors proposed the guided truncation gradient Shapley (GTG-Shapley) approach. It reconstructs FL models from gradient updates for calculating the Shapley value instead of repeatedly training with different combinations of FL participants. In [22], the authors proposed a lightweight multidimensional contribution method based on progressive computation. This approach evaluates the contribution of clients by using the performance gain as an indicator, which reflects the degree to which the client model improves the global model in each round. This approach effectively reduces the traffic and computational overhead. However, it requires that clients make positive contributions to the final global model performance in each iteration.

Contribution as an Incentive in FL.
Contribution scores are widely used in FL to evaluate the contribution of clients to FL training. An effective contribution mechanism can not only optimize the utility of the FL system but also motivate high-quality participants to join FL. Several incentive mechanisms have been proposed to integrate fairness into FL. In [23], the authors presented a system called federated learning with quality awareness (FAIR). This system utilizes historical learning records to estimate the learning quality of users, considering the freshness of the records. Furthermore, FAIR employs a reverse auction as an incentive mechanism to encourage the participation of high-quality learning users. In [24], the authors proposed a decentralized fair and privacy-preserving deep learning (FPPDL) framework. This framework incentivizes participants by rewarding them with points based on the amount of gradient information they upload. Participants can subsequently utilize these points to acquire gradient information from other participants. Another incentive mechanism called FMore was proposed in [25], which employs a multidimensional procurement auction of K winners to select participants based on their scores. The scores are determined by the bidding amount and the quality of the model update. This lightweight mechanism encourages high-quality edge nodes to participate in training and ultimately improves the performance of FL. In [26], the authors proposed an incentive mechanism called fairness-aware incentive mechanism for federated learning (FedFAIM), which ensures reward fairness by utilizing an effective Shapley value-based contribution assessment method and a novel reward allocation method based on reputation and the distribution of local and global gradients. It is worth noting that these approaches primarily focus on rewarding high-contributing users without adequately considering penalties for malicious users and free-riders. Additionally, participants may provide false gradient information to obtain better incentives in certain situations, which will undermine the fairness and effectiveness of the FL system.

Contribution Update Methods.
In the literature, various techniques have been proposed to measure and update the contribution scores of clients in FL. In [27], the authors introduced the concept of the contribution index, which quantifies the contribution of data providers by considering local datasets and machine learning algorithms. In [28], the authors proposed a new measure called the completed federated Shapley value, which offers a fair evaluation of data quality. However, it is only applicable to horizontal FL.
The dynamic updating of contribution scores is essential to accommodate the changing environment of FL. In [29], the authors proposed a DRL-based reputation mechanism for optimal selection and evaluation of reliable FL clients. The reputation of each client is calculated by using the subjective logic model. Clients whose reputation exceeds the reputation threshold are granted the opportunity to participate in FL training. The reputation threshold is dynamically updated through DRL. In [30], the authors proposed a DRL-based incentive mechanism that can automatically learn the optimal pricing strategy in a dynamic network environment. By actively interacting with the environment, the DRL algorithm improves its strategy based on accumulated experience, ultimately approaching an optimal solution. In [31], the authors modeled the total reward of the server and the total revenue of the edge nodes as a Stackelberg game, solved the Nash equilibrium to obtain the optimal solution, and used the DRL algorithm to dynamically adjust the incentive strategy to optimize the profits of all parties. However, this approach relies on the assumption of independent and identically distributed data among edge nodes, which is unrealistic in real FL environments.

System Model and Problem Formulation
To address the challenges in FL fairness, we design a framework based on DQN for optimal contribution evaluation and fair reward distribution in FL, as illustrated in Figure 1. Based on the contributions of clients, the server employs clustering algorithms to categorize the clients into three clusters. Each client within a cluster downloads the corresponding global model from the parameter server and utilizes its local dataset to train the model. Once the predetermined number of training rounds is reached, the trained local model parameters are uploaded to the server for aggregation. We utilize contribution scores to quantify the contributions of clients to FL and employ the DQN algorithm to automatically adjust contribution scores. The adaptive updating of contribution scores ensures that high-contributing clients receive greater scores, thereby enhancing the fairness of FL. The following sections discuss our proposed DQN-based fair FL solution.

Federated Learning Model.
In our FL model, we divide all clients into three clusters. The clients within each cluster synchronously run the FL algorithm, resulting in three separate FL processes. We denote $K$ as the total number of clients. The objective of FL is to optimize the global loss function $f(w)$ by minimizing the weighted average of each client's local loss function $f(w_k)$, where the contribution score $cs_k$ of each client serves as the weight for the aggregation process:
$$f(w) = \sum_{k=1}^{K} \frac{cs_k}{CS} f(w_k), \tag{1}$$
where $CS$ denotes the aggregate contribution score from the $K$ clients, which can be calculated as $CS = \sum_{k=1}^{K} cs_k$, and $w_k$ represents the local model parameters of the $k$-th client.
Various optimization algorithms can be used to train an FL model. In this study, we adopt the stochastic gradient descent (SGD) algorithm as our FL training algorithm. The SGD algorithm iteratively selects a batch of training examples to compute the gradients with respect to the current model parameters $w$ and updates the parameters in the direction that minimizes the loss function $f(w)$ [1]. Hence, the objective is to discover the optimal model parameters $w^*$ that minimize $f(w)$:
$$w^* = \arg\min_{w} f(w). \tag{2}$$
At the beginning of each FL iteration $t$, the server selects a subset of clients to participate in the task and sends the current global model parameters $w_t$ to the selected clients. The selection of clients can be random or based on specific requirements set by the server, such as data size or computational resources [32]. The selected clients download the global model parameters $w_t$ and use their local data samples to calculate local model updates denoted by $w_t^k$. To achieve this, each client performs multiple iterations of SGD on its local data samples to compute the gradient $g_k$. The local update $w_t^k$ of each client is computed as follows:
$$w_t^k = w_t - \eta_k g_k, \tag{3}$$
where $\eta_k$ is the learning rate used to adjust the impact of the gradient on the local model parameters.
After receiving local model updates from the selected clients, the server aggregates the updates to obtain a new global model $w_{t+1}$. The aggregation process can be performed using various techniques such as federated averaging (FedAvg) [1] or FedProx [33]. The updated global model $w_{t+1}$ is calculated as follows:
$$w_{t+1} = \sum_{k=1}^{K} \frac{cs_k}{CS} w_t^k. \tag{4}$$
In traditional FL, the clients and server repeat the above process in subsequent training iterations until the global model achieves a specific accuracy or a predetermined number of iterations set by the server is reached. Clients with a larger amount of data typically exhibit higher local model accuracy, which can help the local model $w_t^k$ in (3) and the global model $w_{t+1}$ in (4) converge towards the target value in fewer iterations. These high-contributing clients play a crucial role in enhancing the convergence speed and overall performance of the FL process, demonstrating the importance of their active involvement in FL.
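As an illustration of the local update and the server-side aggregation described in this subsection, the following is a minimal sketch in plain Python. It assumes that model parameters are flat lists of floats and that aggregation weights are the contribution scores normalized by their sum; the function names `local_sgd_step` and `aggregate` are hypothetical and chosen for this example only.

```python
from typing import List, Sequence

Vector = List[float]


def local_sgd_step(w_global: Sequence[float], grad: Sequence[float], lr: float) -> Vector:
    """One local update of the form w_k = w_t - lr * g_k."""
    return [w - lr * g for w, g in zip(w_global, grad)]


def aggregate(local_models: Sequence[Sequence[float]], scores: Sequence[float]) -> Vector:
    """Contribution-weighted average: sum_k (cs_k / CS) * w_k, with CS = sum_k cs_k."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("contribution scores must sum to a positive value")
    dim = len(local_models[0])
    return [
        sum(cs * w[i] for cs, w in zip(scores, local_models)) / total
        for i in range(dim)
    ]


# Toy round with two clients (all numbers are illustrative only).
w_t = [1.0, 2.0]
grads = [[1.0, 0.0], [0.0, 2.0]]
local_models = [local_sgd_step(w_t, g, lr=0.5) for g in grads]
w_next = aggregate(local_models, scores=[3.0, 1.0])
```

In this toy round the first client carries three times the weight of the second, so the aggregated parameters lie closer to the first client's local model.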

Problem Formulation.
Federated learning is a decentralized machine learning approach in which multiple parties collaborate to train a shared machine learning model using their respective local datasets without sharing raw data.However, the decentralized nature also brings about the threat of free-riders [34].
In this work, we consider the varying contributions of individual clients to the global model training process by introducing a metric known as the contribution score. The contribution score allows us to quantify the level of contribution provided by each client. Utilizing this metric, the clients are categorized into three clusters: high-contributing clients, ordinary clients, and free-riders. High-contributing clients typically represent companies or organizations with abundant data and computing resources, which play a crucial role in contributing to the global model. Ordinary clients possess a moderate amount of data and may also make valuable contributions. In contrast, free-riders are clients who consume network bandwidth and computing resources without actively contributing their own data. The global model $w_t$ is updated according to (5), where $cs_i^k$ represents the contribution score of the $k$-th client in the $i$-th cluster, with $i \in \{1, 2, 3\}$, and $CS$ is the total contribution score from all $K$ clients, defined as $CS = \sum_{k=1}^{K} \left(cs_1^k + cs_2^k + cs_3^k\right)$. From (5), we can see that the global model of federated learning is an aggregation of local models from three types of clients. The formula assigns different weights to each local model based on contribution scores. This is where the problem arises. If we give higher weights to free-riders, their local models will have a greater impact on the global model even though their data volume, data quality, and data diversity are all very low, which will reduce the performance and accuracy of the global model. At the same time, high-contributing clients will feel unfairly treated, because they have invested more resources and computing power but have received fewer rewards. If we give excessively high weights to high-contributing clients, their local models will dominate the global model, yet their data features may differ greatly from those of other clients, which will cause the global model to overfit the data of high-contributing clients and ignore the data of other clients. This will reduce the generalization ability of the global model, that is, its performance on unknown or new data.
To address the issue of free-riders, effective incentive mechanisms should be integrated into the FL framework to promote active client participation and data contribution. These measures may include reward systems or reputation evaluation mechanisms. A comprehensive FL system should consider the participation of all types of clients, incentivizing high-contributing clients while also penalizing free-riders who do not contribute.

Clustering Based on Contribution Scores.
In order to address the fairness issue in FL, a differentiated global model approach is proposed in this paper, which allows different clients to obtain different models. Specifically, we use a contribution score metric to measure the contribution made by each client to the global model training process. Based on the contribution scores, clients are divided into three clusters: high-contributing clients, ordinary clients, and free-riders. The FL algorithm is run separately for each cluster, and clients are assigned different models based on their cluster membership. This differentiated model approach is designed to balance the contributions of clients and incentivize more participation, thereby improving the fairness and effectiveness of FL.
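The idea of running a separate aggregation per cluster can be sketched as follows. This is only an illustrative sketch under stated assumptions: models are flat lists of floats, cluster labels are integers already produced by some clustering step, the weights within a cluster are contribution scores normalized over that cluster, and the helper name `aggregate_per_cluster` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def aggregate_per_cluster(
    models: Sequence[Sequence[float]],
    scores: Sequence[float],
    clusters: Sequence[int],
) -> Dict[int, List[float]]:
    """Build one global model per cluster from that cluster's clients only."""
    grouped = defaultdict(list)
    for w, cs, c in zip(models, scores, clusters):
        grouped[c].append((w, cs))

    result: Dict[int, List[float]] = {}
    for c, members in grouped.items():
        total = sum(cs for _, cs in members)  # positive when scores lie in [1, 10]
        dim = len(members[0][0])
        result[c] = [sum(cs * w[i] for w, cs in members) / total for i in range(dim)]
    return result


# Three clients: two in cluster 0 and one in cluster 2 (illustrative values).
models = [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]]
scores = [1.0, 3.0, 9.0]
clusters = [0, 0, 2]
cluster_models = aggregate_per_cluster(models, scores, clusters)
```

Each client would then download only the model stored under its own cluster label, so clients in different clusters end up with different global models.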
Contribution scores are stored on the server instead of being maintained by the clients. This prevents malicious clients from tampering with their scores. FL applies a homomorphic encryption algorithm to protect the security of the models [35]. Homomorphic encryption is a special kind of encryption technique that allows arithmetic operations to be performed on ciphertexts without decrypting them. Moreover, the clients in FL only transmit model parameters or gradients instead of raw data. These measures effectively protect the privacy and security of data in FL.
To effectively classify clients into three clusters, we adopt the k-means++ algorithm [36]. The algorithm starts by selecting the first cluster center $cc_1$ uniformly at random from the dataset. To select each subsequent cluster center $cc_i$, the algorithm computes the squared distance $D(x)^2$ from each data point $x$ to the closest cluster center that has already been chosen and chooses a new center with probability proportional to this squared distance. This makes new centers likely to lie far from the existing ones, which improves the quality of clustering. The algorithm then assigns each data point to the nearest cluster center and computes the new cluster centers. These steps are repeated until convergence. The squared distance $D(x)^2$ is calculated as follows [36]:
$$D(x)^2 = \min_{cc \in CC} \lVert x - cc \rVert^2, \tag{6}$$
where $CC$ represents the set of currently selected cluster centers and $\lVert x - cc \rVert$ represents the Euclidean distance between sample point $x$ and cluster center $cc$, so that the minimum is attained at the closest center. The probability of $x_i$ being selected as the next cluster center is computed as follows:
$$P(x_i) = \frac{D(x_i)^2}{\sum_{x \in X} D(x)^2}, \tag{7}$$
where $x_i$ is a point in the dataset $X$. The numerator increases the probability of selecting a point that is far away from the existing cluster centers as the next cluster center. This helps keep the new centers well spaced and improves the clustering quality. Meanwhile, the denominator normalizes the probability so that the probabilities of all points sum to 1. In this way, the probability of selecting any given point is proportional to its squared distance from the nearest cluster center that has already been selected.
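The seeding probabilities described above can be made concrete with a small sketch. It assumes, for illustration, that each data point is a single scalar contribution score, so the Euclidean distance reduces to an absolute difference; the function names are hypothetical.

```python
from typing import List, Sequence


def squared_distance_to_nearest(x: float, centers: Sequence[float]) -> float:
    """D(x)^2: squared distance from x to the closest already chosen center."""
    return min((x - c) ** 2 for c in centers)


def selection_probabilities(points: Sequence[float], centers: Sequence[float]) -> List[float]:
    """P(x_i) = D(x_i)^2 / sum_x D(x)^2 for every point in the dataset."""
    d2 = [squared_distance_to_nearest(x, centers) for x in points]
    total = sum(d2)
    if total == 0:
        # Every point coincides with a center; fall back to a uniform choice.
        return [1.0 / len(points)] * len(points)
    return [d / total for d in d2]


# One center has been chosen at score 1.0 (illustrative values).
scores = [1.0, 2.0, 9.0]
probs = selection_probabilities(scores, centers=[1.0])
```

Here the squared distances are 0, 1, and 64, so the point at 9.0 is selected with probability 64/65 and the already chosen point at 1.0 can never be selected again.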
In this work, we divide all clients into three clusters based on their contribution scores by setting $k$ to 3 in k-means++. The entire process of the k-means++ algorithm is summarized in Algorithm 1. This division is not a one-time process. After each update of the contribution scores, we need to use the k-means++ algorithm to recluster. Therefore, to ensure the fairness of FL and make the clustering of the k-means++ algorithm reasonable, an optimal contribution scores updating strategy needs to be adopted. In the proposed scheme, we use DQN to dynamically update the contribution scores of each client, ensuring the fairness of FL.
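A compact sketch of the full clustering step, with k-means++ seeding followed by the usual assign-and-update iterations, is given below. It is written for scalar contribution scores with $k = 3$ and uses only the Python standard library; the function name `kmeans_pp`, the `seed` and `max_iter` parameters, and the example scores are assumptions made for this illustration, not part of the original scheme.

```python
import random
from typing import List, Sequence, Tuple


def kmeans_pp(points: Sequence[float], k: int, seed: int = 0,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Cluster scalar points with k-means++ seeding and Lloyd iterations."""
    rng = random.Random(seed)
    pts = list(points)

    # Seeding: first center uniformly at random, the rest proportional to D(x)^2.
    centers = [rng.choice(pts)]
    while len(centers) < k:
        d2 = [min((x - c) ** 2 for c in centers) for x in pts]
        if sum(d2) == 0:
            centers.append(rng.choice(pts))  # all points coincide with centers
        else:
            centers.append(rng.choices(pts, weights=d2, k=1)[0])

    labels = [0] * len(pts)
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in pts]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(pts, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # centers no longer change
        centers = new_centers
    return centers, labels


# Contribution scores in [1, 10] for eight clients (illustrative values).
scores = [1.0, 1.5, 2.0, 5.0, 5.5, 9.0, 9.5, 10.0]
centers, labels = kmeans_pp(scores, k=3, seed=0)
```

Because the scores change after every update, this function would be called again on the new scores to recluster the clients. Note that the result depends on the random seeding and is a local optimum, so the three groups are not guaranteed to match the intuitive low, medium, and high split in every run.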

DQN-Based Contribution Scores Update Strategy
In this article, we adopt contribution scores to evaluate and acknowledge the individual contributions of clients in the context of FL. However, devising a fair and effective scheme for computing these scores poses a significant challenge. It necessitates a delicate balance between appropriately rewarding each client's efforts and deterring free-riders seeking personal benefits. Assigning a client a score higher than its actual contribution might foster discontent among other clients and exacerbate the free-rider problem. Conversely, if a client's contribution is inaccurately reflected in its score, it may dampen the client's enthusiasm and diminish its engagement in the FL process. Hence, it is imperative to derive contribution scores in a judicious manner that ensures fairness. By doing so, we can establish an equitable FL system that motivates active participation and fosters collaborative progress.
Determining the optimal contribution scores presents a formidable challenge for the server in FL. The dynamic and uncertain nature of client behavior within the FL environment introduces complexities that directly impact the assignment of contribution scores. The stochasticity of client behavior over time and the potential for clients to withdraw from the FL task due to various environmental factors further exacerbate the challenge. These factors may include unreliable network connections, device mobility, or device energy limitations. Therefore, to address this issue and select the optimal contribution scores, we propose a contribution scores update mechanism leveraging DQN in this work.

Reinforcement Learning for Contribution Scores Update.
In our proposed scheme, the server in FL acts as the DQN agent and determines the optimal contribution scores update policy by interacting with the FL environment. The server receives the model parameters sent by each client and, based on these parameters, can estimate the quality of the model submitted by the client. Moreover, the server also knows the previous contribution score of each client and the cluster to which the client belongs. Therefore, the server can estimate the current state based on the previous contribution scores and the quality of the received models. Based on the optimal policy, the server chooses the action, i.e., the contribution scores update policy, that maximizes the reward. We define the components of our proposed model as follows:
(1) Agent represents the parameter server of the FL system.
(2) Environment represents the FL system that is divided into three clusters based on the contribution scores.
(3) State Space: the state $s_t$ of the server is defined using the contribution scores, the model accuracy, and the clustering result, where $cs_k$ is the contribution score of the $k$-th client calculated by the server, with $cs_k \in [1, 10]$, $acc_k$ is the accuracy of the model submitted by the $k$-th client at global iteration $t$, and $C$ is the clustering result obtained by applying k-means++ to the contribution scores. These state metrics provide the basis for determining the contribution scores update strategy in our scheme and together define the state space of the server.
(4) Action Space is denoted by $a_t$. At each global iteration $t$, the server selects an action $a_t^k$ for client $k$. The contribution scores update policy is denoted by $a_t = \{a_t^1, a_t^2, \ldots, a_t^K\}$, where $a_t^k$ is a discrete variable taking values in the set $\{-1, 0, 1\}$. Specifically, these values represent the actions of decreasing, maintaining, and increasing the contribution score $cs_k$ of client $k$, respectively.
(5) Reward $r_t$ acts as a reinforcement signal that indicates the effectiveness of the chosen actions, guiding the learning and optimization process. In order to measure the contributions of different parties in FL, we employ a method based on deletion diagnostics [37]. We assume a test dataset is available at the server to evaluate the learned model after global model aggregation. The deletion approach entails reaggregating the model each time a local model from a cluster is omitted and measuring the change in model accuracy. Specifically, when evaluating the impact of the $k$-th client in the cluster $C_i$ in FL, the influence measure can be formulated as follows:
$$\text{Influence}^{-k} = acc_i - acc_i^{-k},$$
where $acc_i$ is the global model accuracy of cluster $C_i$ and $acc_i^{-k}$ is the accuracy of the global model when the local model of the $k$-th client is omitted. Then, min-max normalization is performed on $\text{Influence}^{-k}$ within each cluster separately. The reward $r_t^k$ for client $k$ at the current iteration $t$, and from it the overall reward $r_t$, is defined on the basis of these normalized influence values.
The server, i.e., the agent, perceives a state $s_t$ based on feedback from the FL system and selects an action $a_t$, which represents the contribution scores update policy. The action $a_t$ is determined by the optimal contribution scores update policy $\pi$ given the current state. The server executes the chosen action $a_t$ and receives the reward $r_t$ from the FL environment. The environment is modeled as a Markov decision process (MDP) and transitions to a new state $s_{t+1}$. The objective of the agent is to maximize the expected long-term discounted rewards by determining the optimal policy. The optimal policy can be obtained using the Q-value, which can be updated by using the following expression:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right],$$
where $\alpha$ is the learning rate, $r_t$ denotes the immediate reward received by the agent for selecting action $a_t$ in state $s_t$ at time $t$, and $\gamma$ is the discount factor that determines the significance of future rewards. This update rule is a form of temporal difference (TD) learning and is used to update the Q-value function following each action taken by the agent at each time step. The Q-learning algorithm with TD learning is a model-free reinforcement learning algorithm that employs this update rule to learn the optimal policy for the agent over time.
Algorithm 1: The k-means++ algorithm.
(1) Select the first cluster center $cc_1$ uniformly at random from the sample set $Con$.
(2) for $i = 2, \ldots, k$ do
(3)   Compute the squared distance $D(x)^2$ from each sample $x \in Con$ to its closest already selected cluster center.
(4)   Select the next cluster center $cc_i$ from $Con$ with probability proportional to $D(x)^2$.
(5) end for
(6) Use the selected $k$ cluster centers as initial cluster centers, i.e., $CC = \{cc_1, cc_2, \ldots, cc_k\}$.
(7) while the cluster centers still change do
(8)   For each sample $x_i \in Con$, calculate its distance $dist(x_i, cc_j)$ to each cluster center $cc_j \in CC$.
(9)   Assign each sample $x_i$ to the cluster $C_j$ of the nearest cluster center $cc_j$.
(10)   for $j = 1, 2, \ldots, k$ do
(11)     Calculate the mean $mean_j$ of all samples in cluster $C_j$.
(12)     Update the cluster center $cc_j = mean_j$.
(13)   end for
(14) end while
(15) return the clusters $C_1, C_2, \ldots, C_k$.
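The action and reward components described in this subsection can be sketched in a few lines. The sketch rests on several assumptions that go beyond the text: an action changes a score by exactly one unit, scores are clipped to the stated range $[1, 10]$, and a cluster whose influence values are all equal is normalized to zeros. All function names are hypothetical.

```python
from typing import List, Sequence

CS_MIN, CS_MAX = 1, 10  # score range stated in the state definition


def apply_actions(scores: Sequence[int], actions: Sequence[int]) -> List[int]:
    """Apply a per-client action in {-1, 0, 1} and clip to [CS_MIN, CS_MAX]."""
    if any(a not in (-1, 0, 1) for a in actions):
        raise ValueError("each action must be -1, 0, or 1")
    return [min(CS_MAX, max(CS_MIN, cs + a)) for cs, a in zip(scores, actions)]


def influences(cluster_acc: float, acc_without: Sequence[float]) -> List[float]:
    """Deletion-based influence: cluster accuracy minus accuracy without client k."""
    return [cluster_acc - acc for acc in acc_without]


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max normalization within one cluster (all zeros if values are equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative cluster of three clients.
new_scores = apply_actions([1, 5, 10], [-1, 0, 1])
raw = influences(0.90, [0.85, 0.90, 0.80])
normalized = min_max(raw)
```

In this example the scores at the boundaries stay at 1 and 10 because of clipping, and the client whose removal costs the most accuracy receives the largest normalized influence.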
The optimal policy can be expressed as follows:
$$\pi^*(s_t) = \arg\max_{a_t} Q^*(s_t, a_t).$$
In the MDP, the effectiveness of the Q-learning algorithm relies on a thorough exploration of states and actions, facilitating the convergence of Q-values towards the optimal Q-value, denoted as $Q^*$. The optimal Q-value $Q^*$ is then utilized to derive the optimal policy. However, in large-scale MDPs, the size of the Q-table becomes prohibitively large, making exploration challenging. Due to the enormous number of state-action combinations, fully populating and updating the Q-table requires a significant amount of storage space and computational resources. To avoid the enormous overhead of maintaining a large Q-table, alternative methods are needed to approximate the Q-values. One popular approach is to utilize function approximators, such as neural networks, to estimate the Q-values.
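For reference, the tabular form of Q-learning discussed here can be sketched as follows. The table is a dictionary keyed by (state, action) pairs, states are arbitrary hashable labels, and the three actions mirror the set $\{-1, 0, 1\}$; the state names, learning rate, and discount factor in the example are arbitrary illustrative choices.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

ACTIONS = (-1, 0, 1)
QTable = DefaultDict[Tuple[Hashable, int], float]


def q_update(q: QTable, state: Hashable, action: int, reward: float,
             next_state: Hashable, alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One temporal-difference update of Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


def greedy_action(q: QTable, state: Hashable) -> int:
    """Greedy policy: the action with the largest Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q: QTable = defaultdict(float)  # unseen entries start at 0.0
q_update(q, state="s0", action=1, reward=1.0, next_state="s1", alpha=0.5, gamma=0.9)
best = greedy_action(q, "s0")
```

The table needs one entry per state-action pair, which is what becomes impractical when the state combines the scores, accuracies, and cluster assignments of many clients.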

Deep Reinforcement Learning.
In the DQN algorithm, the Q-value function is approximated using a deep neural network. This network takes the state as input and outputs a Q-value for each possible action. During the training process, instead of directly updating a Q-table, the weights of the network are updated to minimize the discrepancy between the predicted Q-values and the target Q-values. DQN leverages state-action pairs to maximize the cumulative discounted rewards. The neural network employs weights, denoted as $\theta$, to establish relationships between inputs and outputs. The objective of DQN is to minimize the loss by finding the optimal weights along the gradients. Initially, the weights of the DQN are randomly initialized. Over time, the DQN iteratively updates its weights based on the discrepancy between the expected reward and the ground truth reward. At each iteration, the loss function is minimized using the following equation:
$$L(\theta) = \mathbb{E}\left[ \left( Q_{\text{target}} - Q(s_t, a_t; \theta) \right)^2 \right],$$
where $\theta$ is the set of weights in the deep neural network, $Q(s_t, a_t; \theta)$ is the predicted value, and $Q_{\text{target}}$ is the target value, which can be computed as follows:
$$Q_{\text{target}} = r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta),$$
where $r$ is the reward received for taking action $a_t$ in state $s_t$ and $\gamma$ is the discount factor for future rewards.
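The target and loss computation can be illustrated without a deep learning framework by treating the Q-network as any function that maps a state to one Q-value per action. In the sketch below, `toy_q_net` is a hypothetical stand-in for a trained network, the optional `target_net` argument reflects the common practice of evaluating the target with a separate fixed network, and the batch contents are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

State = Sequence[float]
Transition = Tuple[State, int, float, State]  # (s_t, action index, reward, s_{t+1})
QFunction = Callable[[State], List[float]]    # state -> one Q-value per action


def dqn_loss(batch: Sequence[Transition], q_net: QFunction,
             target_net: Optional[QFunction] = None, gamma: float = 0.9) -> float:
    """Mean squared error between TD targets and predicted Q-values."""
    target_fn = target_net if target_net is not None else q_net
    total = 0.0
    for state, action, reward, next_state in batch:
        target = reward + gamma * max(target_fn(next_state))
        prediction = q_net(state)[action]
        total += (target - prediction) ** 2
    return total / len(batch)


def toy_q_net(state: State) -> List[float]:
    """Hypothetical stand-in for a neural network with three action outputs."""
    s = sum(state)
    return [0.0 * s, 0.5 * s, 1.0 * s]


batch = [([1.0], 1, 1.0, [2.0])]
loss = dqn_loss(batch, toy_q_net, gamma=0.5)
```

For this single transition the target is 1.0 + 0.5 * 2.0 = 2.0 and the prediction is 0.5, giving a loss of 2.25. In an actual DQN, the gradient of this loss with respect to the network weights would drive the weight update.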
The objective of the DQN algorithm is to determine the optimal set of weights, denoted as θ. Using the gradients of the loss function with respect to the weights, the DQN algorithm updates the weights iteratively so as to minimize the discrepancy between the estimated Q-values and the true Q-values. This process adjusts the weights in the direction that reduces the overall loss, allowing the DQN model to better approximate the optimal Q-values for different state-action pairs. Through this iterative weight update process, the DQN algorithm gradually learns to estimate more accurate Q-values, enhancing its ability to make optimal decisions in complex environments such as FL.
In our DQN environment, the policy is randomly initialized and progressively refined through training. The DQN is deployed on the server, which interacts with the environment to gather information for decision-making. The agent selects actions based on the current state and the policy learned by the DQN. The exploration-exploitation trade-off is a critical aspect of the DQN algorithm. Exploration involves trying out new actions to gain a better understanding of the environment, while exploitation involves leveraging the acquired knowledge to make optimal decisions based on the current state. Striking the right balance between exploration and exploitation is crucial to achieving good performance. To address this challenge, a commonly used technique in DQN is the ϵ-greedy algorithm. This strategy selects a random action with probability ϵ and chooses the action with the highest Q-value with probability 1 − ϵ. The value of ϵ typically decreases over time to shift the emphasis towards exploitation as the agent gains more knowledge about the environment. By managing the exploration-exploitation trade-off in this way, the DQN can effectively explore the environment while gradually focusing on exploiting the learned knowledge to converge towards an optimal policy.
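A minimal sketch of ϵ-greedy action selection with a decaying ϵ is given below. The multiplicative decay rate and the lower bound on ϵ are assumptions chosen for illustration; the paper does not state its decay schedule in this section.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, rate=0.995, floor=0.01):
    """Shrink epsilon multiplicatively, never going below the floor (assumed values)."""
    return max(floor, epsilon * rate)


# With epsilon = 0 the choice is purely greedy.
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

Calling `decay_epsilon` once per training step gradually shifts the agent from exploration to exploitation.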
The DQN agent actively interacts with the environment by selecting actions and receiving the corresponding rewards. These interaction data are recorded and stored in a memory buffer, creating a collection of past experiences. During training, the deep neural network is updated using mini-batches of experiences sampled from the memory buffer, with the objective of minimizing the discrepancy between the predicted Q-values and the target Q-values computed from the observed rewards. This technique, known as experience replay, is widely used in deep reinforcement learning [38]. Because the mini-batches are sampled from the whole buffer, the agent learns from a diverse set of experiences, which prevents the learning process from being biased towards recent experiences and improves the stability and efficiency of training. As a result, the agent learns from a broader range of scenarios, leading to enhanced performance and adaptability in complex environments such as FL. The whole procedure of updating contribution scores with DQN is summarized in Algorithm 2.

Each MNIST example [39] is a 28 × 28 gray-level image, with the digit located at the center of the image. This dataset has been used extensively in FL evaluations.
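The experience replay memory used in Algorithm 2 can be sketched as a fixed-capacity buffer with uniform random sampling. The capacity of 64 matches the replay memory size reported in the simulation settings; the mini-batch size of 8 and the dummy transitions are assumptions for illustration only.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A bounded deque silently discards the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=64)
for step in range(100):
    memory.push(step, 0, 0.0, step + 1)  # dummy transitions
batch = memory.sample(8)
```

After 100 insertions only the 64 most recent transitions remain, and each mini-batch is drawn uniformly from them.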

Performance Evaluation
We simulated the FL environment by iteratively training models on the MNIST dataset. For the local model architecture of the clients, we employed a convolutional neural network (CNN). When training the local models, we set the learning rate to 0.01 and the batch size to 20. The training samples were distributed in a highly imbalanced manner, where the labels available to the clients were unevenly and randomly distributed. In this work, we considered a more realistic scenario where 2500 randomly selected examples were assigned to high-contributing clients, 600 examples were allocated to ordinary clients, and free-riders had access to only 150 examples.
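The sample allocation described above can be sketched as follows. This is only an illustration under stated assumptions: the role mix of the four clients is hypothetical, the shards are assumed to be disjoint, and the label imbalance mentioned in the text is not modeled.

```python
import random

# Sample counts per client role, as stated in the text.
SAMPLES_PER_ROLE = {"high": 2500, "ordinary": 600, "free_rider": 150}


def partition_indices(roles, dataset_size=60000, seed=0):
    """Assign each client a disjoint random set of training-set indices."""
    rng = random.Random(seed)
    pool = list(range(dataset_size))
    rng.shuffle(pool)
    shards, start = [], 0
    for role in roles:
        size = SAMPLES_PER_ROLE[role]
        shards.append(pool[start:start + size])
        start += size
    return shards


# Hypothetical role assignment for a four-client run.
roles = ["high", "ordinary", "ordinary", "free_rider"]
shards = partition_indices(roles)
```

Each shard would then be used to build the local training set of the corresponding client.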
The DQN consists of two hidden layers, with the number of units equal to the number of states and actions, respectively. The neural network uses the rectified linear unit (ReLU) as the activation function for all hidden layers, enhancing the fitting capability of the network. To learn the network parameters, we employ the Adam optimizer with a learning rate of 0.01. The discount factor γ is set to 0.99 to adjust the importance of future rewards in the current decision-making process. To break the temporal correlation between consecutive training samples, we use a replay memory of size 64. The simulation parameters of the DQN are summarized in Table 1.
To visualize the fairness of FL from a holistic perspective, we quantify fairness by calculating the Pearson correlation coefficient between the contributions of clients (i.e., the amount of data contributed by each client) and their rewards (i.e., the final model accuracy achieved by each client). Specifically, we construct a coordinate system with the contributions of clients on the x-axis and the corresponding model accuracies achieved by each client on the y-axis [24]. By computing the Pearson correlation coefficient, we can quantify the fairness of FL. The fairness metric ranges from −1 to 1, where higher values indicate better fairness. Conversely, negative coefficients indicate poorer fairness.
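The fairness metric can be computed directly from its definition, as in the short sketch below. The contribution and accuracy values are hypothetical and serve only to show the calculation. Note that the coefficient is undefined when either series is constant, which the sketch reports as an error.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (std_x * std_y)


# Hypothetical example: data contributed per client and final accuracy obtained.
contributions = [2500, 600, 600, 150]
accuracies = [0.97, 0.93, 0.92, 0.88]
fairness = pearson(contributions, accuracies)
```

A value close to 1 means that clients who contribute more data tend to obtain more accurate models.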

Performance Results.
We conducted a series of experiments to evaluate the performance of our proposed method in an FL environment. The experiments involved different numbers of clients, namely 4, 8, 16, and 32, denoted as N4, N8, N16, and N32, respectively.

ALGORITHM 2: DQN-based contribution scores update for FL.
(1) Initialize the global model parameters for each cluster and the experience replay memory D
(2) Initialize action-value function Q with random weights $\theta$
(3) Initialize target action-value function Q with weights $\theta^{-} = \theta$
(4) for $t = 1$ to $T$ do
(5)   With probability ϵ select random actions $a_t$
(6)   Otherwise select actions $a_t = \arg\max_{a} Q(s_t, a; \theta)$
(7)   Execute actions $a_t$, update contribution score cs for each client
(8)   Obtain clustering results C by using Algorithm 1 based on contribution score cs
(9)   Run the FL algorithms independently on each cluster
(10)  Observe reward $r_t$ and next state $s_{t+1}$
(11)  Store transition $(s_t, a_t, r_t, s_{t+1})$ in D
(12)  Sample random mini-batch of transitions from D
(13)  Update weights $\theta$ by minimizing the loss
(14)  At every certain step, update the target network weights: $\theta^{-} = \theta$
(15) end for

For the FedAvg algorithm, poor correlation coefficients are observed across all client numbers (N4, N8, N16, and N32), with some coefficients being negative, indicating a lack of fairness. Negative coefficients imply an inverse relationship between client contributions and model accuracy, in which higher contributions tend to coincide with lower accuracy. This signifies an unfair distribution of the model, with clients that contribute more data receiving lower model accuracy.
In contrast, our proposed method demonstrates significant improvements in fairness. Positive correlation coefficients are obtained, all above 0.5, indicating a positive relationship between client contributions and model accuracy. Higher data contributions are positively correlated with higher accuracy, suggesting a fair distribution of the model among clients. These positive coefficients reflect the effectiveness of our method in addressing fairness concerns in federated learning.
The findings emphasize the benefits of incorporating the DQN-based contribution scores update method into the federated learning framework. By dynamically updating the contribution scores of clients based on DQN, our method achieves a more equitable distribution of the model, ensuring that clients contributing more data are rewarded with higher accuracy models. The positive correlation coefficients affirm the improved fairness achieved by our proposed method, validating its potential for ensuring a more equitable distribution of the model among participating clients. This improvement in fairness is particularly noteworthy when compared to the FedAvg algorithm, which lacks mechanisms for addressing fairness concerns.
In Figures 2-5, we compare the performance of our proposed method with several baseline methods, including FedAvg, the centralized framework, and the standalone framework. Our proposed method consists of three global models, represented by curves C1, C2, and C3 in the figures. These curves correspond to the model accuracy that clients with different contribution scores will obtain. FedAvg is a common federated learning algorithm. The centralized framework assumes that the server can access user data and pool all client data for training, which infringes upon client privacy. The standalone framework assumes that clients do not collaborate with each other and train their models independently using their own data. This method maximizes client privacy but may lead to suboptimal results.
In Figure 2, we simulate the case with four participating clients. Figure 2(a) shows the varying model performance among different clients due to the differences in their respective sample data. The high-contributing clients quickly improve their accuracy, converging to approximately 0.97. The ordinary clients achieve slightly lower accuracy, converging around 0.93. On the other hand, the free-riders achieve the poorest model performance, struggling to surpass an accuracy of 0.9. To provide a comprehensive comparison, we calculate the average accuracy of the clients in the standalone framework and include it as the standalone curve in Figure 2(b). Figure 2(b) illustrates the accuracy variations of different methods as the models iterate over time. With each iteration, the performance of these models improves and eventually converges. Among them, the centralized framework exhibits the fastest convergence rate and achieves the highest model performance. However, in the initial stages, the performance of the FedAvg model is inferior to that of the model corresponding to cluster C1, due to the influence of free-riders. From the graph, it can be seen that our proposed method effectively enhances fairness in FL. The model of cluster C1 outperforms that of C2, which in turn outperforms that of C3. This correlation between the model performance and the contribution scores of clients highlights the impact of our approach. Furthermore, the model performance of cluster C3, composed of clients with low contribution scores, is even inferior to the average performance of the standalone framework.
To further validate the effectiveness of our proposed method, we expand the number of participating clients and conduct multiple experiments. Figures 3-5 display the accuracy curves of different methods when the number of clients is 8, 16, and 32, respectively. In Figures 3(a), 4(a), and 5(a), it is evident that the accuracy of models trained by clients with varying sample quantities can be distinctly categorized into three groups, effectively simulating the differences between clients with different contributions. Even clients with the same sample quantity exhibit slight variations in accuracy, as the samples of each client are randomly selected from the MNIST dataset. In the initial stages of model iteration, certain clients do not receive matching contribution scores, since the contribution scores of all clients are initialized as 5. As shown in Figure 3(b), the accuracy curve corresponding to cluster C3 rises rapidly at the beginning. However, as the model iterates, each client receives a reasonable contribution score and is assigned to the corresponding cluster. Clients in cluster C3 who make significant contributions receive higher contribution scores, leading them to leave C3 and join the cluster where they belong. This gradually restores the accuracy curve of cluster C3 to its appropriate level. As the number of participating clients increases, the training dataset also expands, resulting in improved model accuracy for the different methods.

To ensure fairness in FL, we aim to provide greater benefits, i.e., better models, to clients who contribute more. For visual comparison, we present bar graphs in Figure 6, depicting the model accuracies obtained by clients under different methods and client quantities. It is evident that the centralized framework achieves the highest model performance. However, it requires the server to have access to the private data of all clients, which is challenging to realize in practice. On the other hand, FedAvg considers data privacy and achieves slightly lower model performance compared to the centralized framework. Nevertheless, it does not consider fairness. In FedAvg, regardless of their contributions, clients can only obtain the same model, which discourages active participation from high-contributing clients. To address this, we use the k-means++ algorithm to cluster different clients based on their contribution scores, resulting in three clusters: C1, C2, and C3. Figure 6 demonstrates a clear hierarchy, where clusters composed of clients with higher contribution scores exhibit superior model performance compared to clusters with lower contribution scores. This effectively ensures fairness in FL and motivates clients to contribute more data to the FL process.
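The clustering step can be illustrated with a small one-dimensional k-means++ sketch that groups clients by contribution score and ranks the resulting clusters so that C1 holds the highest scores. This is an assumption-laden illustration rather than the paper's Algorithm 1, whose details are not given in this section; the scores, iteration limit, and seed are hypothetical.

```python
import random


def kmeans_pp_1d(scores, k=3, iters=50, seed=0):
    """Cluster scalar scores with k-means++ seeding; return ranks (1 = highest)."""
    rng = random.Random(seed)
    # k-means++ seeding: later centers are drawn with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [rng.choice(scores)]
    while len(centers) < k:
        d2 = [min((s - c) ** 2 for c in centers) for s in scores]
        if sum(d2) == 0:  # all scores coincide with existing centers
            centers.append(rng.choice(scores))
        else:
            centers.append(rng.choices(scores, weights=d2, k=1)[0])

    def assign(cs):
        return [min(range(k), key=lambda j: (s - cs[j]) ** 2) for s in scores]

    # Lloyd iterations: reassign points, then recompute the cluster means.
    for _ in range(iters):
        labels = assign(centers)
        updated = []
        for j in range(k):
            members = [s for s, l in zip(scores, labels) if l == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated

    labels = assign(centers)
    # Rank clusters by center so that rank 1 (C1) has the highest scores.
    order = sorted(range(k), key=lambda j: -centers[j])
    rank = {j: i + 1 for i, j in enumerate(order)}
    return [rank[l] for l in labels]


# Hypothetical contribution scores for six clients.
scores = [9.0, 8.5, 5.0, 5.2, 1.0, 1.3]
clusters = kmeans_pp_1d(scores)
```

Because the scores are one-dimensional, a client with a higher score is never placed in a lower-ranked cluster than a client with a lower score, which is the ordering the differentiated models rely on.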

Conclusion
This study addresses the issue of fairness in FL and proposes a method of differentiated global models to enhance fairness. Specifically, we use DQN to dynamically adjust the contribution scores and group clients based on them, so that clients obtain global models whose performance reflects their contributions. Experimental results demonstrate that our method significantly improves on the fairness of FedAvg. Our method maintains fairness above 0.5, whereas FedAvg suffers from low or negative fairness.
For future work, we plan to consider more metrics to evaluate the contributions of clients, such as computational power and data quality. We also aim to apply our method in real-world scenarios to test its practical effectiveness and impact. By doing so, we hope to further advance FL and promote fair and equitable collaboration among clients in various domains.
(i) We design a contribution scores updating method based on DQN. This method fairly evaluates the contributions of clients to FL and updates their contribution scores.
(ii) We propose a contribution-based differentiated global model mechanism to address the fairness challenge in FL. This mechanism ensures that high-contributing clients obtain better global models than low-contributing clients at the end of FL training.
(iii) The final experimental results confirm the effectiveness of our method in improving the fairness of FL, thereby motivating clients to participate in the FL process.

Figure 1: DQN-based contribution scores updating method for federated learning.

5.1. Simulation Settings.
This section evaluates the performance of our proposed DQN-based contribution scores update and evaluation strategy through simulations. The simulations were performed on an Nvidia GeForce RTX 3060 GPU running Windows 11. The proposed model is developed using Python 3 and PyTorch. The experiment is conducted on the well-known MNIST dataset for handwritten digit recognition, which consists of 60,000 training examples and 10,000 test examples.

Figure 2: Performance comparison with 4 participating clients. (a) Comparison between clients. (b) Comparison of different methods.

Figure 3: Performance comparison with 8 participating clients. (a) Comparison between clients. (b) Comparison of different methods.

Figure 4: Performance comparison with 16 participating clients. (a) Comparison between clients. (b) Comparison of different methods.
Output: clustering results $C = \{C_1, C_2, \ldots, C_k\}$
ALGORITHM 1: Clustering algorithm based on contribution.

International Journal of Intelligent Systems

future rewards, $s_{t+1}$ represents the new state reached after executing action $a_t$, and $\max Q_t(s_{t+1}, a_{t+1})$ denotes the maximum Q-value among all possible actions $a_{t+1}$ in state $s_{t+1}$ at time $t + 1$. This Q-value update equation is known as the Q-learning update rule with temporal difference (TD)

Table 2 presents a comparison of fairness between the FedAvg algorithm and our proposed method, considering different numbers of clients.

(1) Initialize the global model parameters for each cluster, experience replay memory D
(2) Initialize action-value function Q with random weights $\theta$
(3) Initialize target action-value function

Table 1: Simulation parameters for DQN-based contribution scores update for FL.

Table 2: Fairness of FedAvg and our method over the MNIST dataset, with different client numbers (N-k).