A Multiagent Dynamic Assessment Approach for Water Quality Based on Improved Q-Learning Algorithm

Dynamic water quality assessment is a challenging and critical issue in water resource management systems. To deal with this complex problem, this paper proposes a dynamic water quality assessment model based on multiagent technology and an improved Q-learning algorithm. In the proposed Q-learning algorithm, a fuzzy membership function and a punishment mechanism are introduced to improve the learning speed. An interaction factor enables dynamic water quality assessment for different regions and prewarning of water pollution. The proposed approach handles both static and dynamic water quality assessment. The experimental results show that water quality assessment based on the proposed approach is more accurate and efficient than the general methods.


Introduction
Water quality assessment plays an essential role in both engineering applications and scientific research. However, because abrupt water pollution accidents occur frequently [1,2], conventional static assessment of water quality can no longer meet practical requirements. It is therefore important to assess the water quality of different regions accurately and dynamically, which has become an active topic in water environment management. Dynamic water quality assessment can issue a timely alarm before a pollutant reaches sensitive water regions, which helps these regions prepare and control water pollution effectively.
Various methods have been proposed for water quality evaluation [3][4][5]. The main static assessment methods include the comprehensive index method [6], the fuzzy comprehensive evaluation method [7], BP neural networks [8], and the comprehensive water quality identification index method [9]. Although each of these methods has its advantages, they also have shortcomings. For example, the comprehensive index method is computationally complex. The fuzzy comprehensive evaluation method has lower accuracy and cannot assess water that is worse than Grade V. Models based on BP neural networks are complex, and selecting training samples for them is difficult. The general comprehensive water quality identification index method treats all indicators as having the same effect, so it cannot account for the specific characteristics of different water bodies.
Static evaluation methods can only assess water quality after water pollution has occurred. To address this limitation, more and more research has focused on dynamic water quality assessment. For example, Yun et al. [10] evaluated changes in river water quality over a period of time using a probability transition matrix. Su et al. [11] studied the spatiotemporal patterns and source apportionment of pollution in the Qiantang River (China) using neural-based modeling and multivariate statistical techniques. Although much research has addressed dynamic water quality assessment, few studies consider the rapid perception of abrupt water pollution, and few methods determine the water quality of other regions in the same basin from the water quality changes observed in one region.
To control water pollution and improve water environment quality effectively, the trend of water pollution should be predicted accurately when a water pollution accident occurs [12,13]. Because this is a complex-system problem, general methods cannot handle it efficiently. Recently, increasing attention has been paid to agent-based methods, which are both feasible and efficient [14,15]. For example, Wen et al. [16] studied consensus in directed networks of multiple agents with intrinsic nonlinear dynamics and sampled-data information. Leon [17] proposed an interaction protocol for a task allocation system that can reveal emergent behaviors in social networks of adaptive agents. In a multiagent system, an agent is defined as an entity with the capabilities of environment perception, problem solving, and communication with the outside world. Based on these features, agents can solve complex practical problems by sharing knowledge with each other [18]. To solve the problem of dynamic water quality assessment, a multiagent model of the water environment is set up [19,20], in which different regions of the water environment are abstracted as agents. An improved Q-learning algorithm is proposed to handle the cooperation among the agents and to carry out dynamic water quality assessment.
The paper is organized as follows. Section 2 introduces the dynamic assessment model for water quality based on multiagent technology. Section 3 presents the proposed Q-learning algorithm for water quality assessment. Section 4 reports and discusses the experiments. Finally, Section 5 gives the conclusions.

The Multiagent Dynamic Assessment Model for Water Quality
This paper studies dynamic water quality assessment, which has attracted much attention because of its complexity and significance. Two main problems must be solved in dynamic water quality assessment. The first is how to assess the water quality of different regions efficiently when the water quality indicators of all these regions are available. The second is how to assess the water quality of other regions when the indicators of only one region are available.
To achieve dynamic water quality assessment, this paper proposes an assessment model based on multiagent technology, in which the water environment is divided into regional agents according to administrative requirements. Through information exchange among these regional agents, dynamic water quality assessment can be accomplished efficiently. Each agent contains a water quality assessment model, defined in this study as

$$G = \omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_n x_n, \quad (1)$$

where $G$ is the level of water quality, $x_1, x_2, \ldots, x_n$ are the indicators used to assess water quality, and $\omega_1, \omega_2, \ldots, \omega_n$ are the weights of these indicators.
To assess water quality more accurately, the water quality level $G$ is written in the following form:

$$G = G_1 \cdot G_2, \quad (2)$$

where $G_1 \in \{1, 2, \ldots, 6\}$ is the water quality grade determined by the Chinese national standard for water quality (see the Environmental Quality Standards for Surface Water in China (GB3838-2002)), $G_2$ is the relative position of the water quality between two adjacent grades, and the symbol $\cdot$ is a separating character with the same function as a plus sign (so $G$ equals $G_1 + G_2$). For example, $G = 2.5$ means that the water is Grade II under the national standard and that its relative position is 0.5; that is, the water quality lies midway between Grade II and Grade III. When $G_1 = 6$, the water quality is worse than Grade V. To reduce computational complexity, $G_2$ takes discrete values in this paper, namely, $G_2 \in \{0.0, 0.1, \ldots, 0.9\}$.
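As a rough illustration of (1) and (2), the Python sketch below computes the weighted level $G$ and splits it into the grade $G_1$ and the discrete relative position $G_2$. The function names `assess` and `split_level` are hypothetical, the processed indicator values in the usage example are invented, and the weights 0.5, 0.3, and 0.2 are the values reported for COD$_{\mathrm{Mn}}$, TN, and TP in the static experiment of Section 4. Clamping $G$ to the range 1.0 to 6.9 and truncating $G_2$ to one decimal place are assumptions, not details given in the paper.

```python
import math
from typing import Sequence, Tuple


def assess(weights: Sequence[float], indicators: Sequence[float]) -> float:
    """Weighted water quality level G = w1*x1 + ... + wn*xn, as in (1)."""
    if len(weights) != len(indicators):
        raise ValueError("one weight is needed per indicator")
    return sum(w * x for w, x in zip(weights, indicators))


def split_level(g: float) -> Tuple[int, float]:
    """Split G into the grade G1 in {1..6} and the relative position G2 in {0.0..0.9}.

    Assumption: G is clamped to [1.0, 6.9] and G2 is truncated to one decimal place.
    """
    tenths = math.floor(g * 10 + 1e-9)  # small epsilon guards against float error
    tenths = min(max(tenths, 10), 69)   # clamp to levels 1.0 .. 6.9
    return tenths // 10, (tenths % 10) / 10


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]     # COD_Mn, TN, TP weights from the static experiment
    processed = [2.4, 3.1, 2.0]   # hypothetical reduced indicator values
    g = assess(weights, processed)
    print(round(g, 2), split_level(g))  # 2.53 (2, 0.5)
```

Keeping $G_2$ on a fixed 0.1 grid mirrors the paper's choice of a discrete relative position, which is what keeps the state space small in the membership-function step.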
In the water quality assessment model above, the weight $\omega_i$ of the $i$th indicator must be optimized according to the dynamic changes of the water environment. In this study, a Q-learning-based algorithm is proposed for this purpose; it is described in detail in Section 3.

The Proposed Multiagent Q-Learning Algorithm
In the assessment model, the weights of the indicators must be determined. In general water quality assessment methods, these weights are usually set empirically.
Recently, some artificial intelligence methods, such as genetic algorithms and neural networks, have been introduced to optimize these weights. However, these approaches cannot transmit or exchange information among different regions, so the weights they produce are intrinsically static. To deal with this problem, multiagent technology is introduced into water quality assessment, and an improved Q-learning algorithm is proposed to realize cooperation among the agents. In general multiagent reinforcement learning, the Markov decision process is extended so that the agents in a multiagent system can learn from one another. Most multiagent reinforcement learning algorithms require each agent to know which actions the other agents will take before it acts. Consequently, the state space of each agent grows exponentially as the number of agents or the number of actions per agent increases [21]. To solve these problems, an improved multiagent Q-learning algorithm is proposed in this study. Q-learning is a reinforcement learning method based on trial and error. Compared with other machine learning methods, Q-learning can actively discover which action produces the greatest reward instead of being told which action to take [22][23][24]. To improve the learning efficiency of Q-learning, both the number of state-action pairs and the amount of search over them should be reduced. In a multiagent system, an agent must keep track of its environment as well as of the other agents, so convergence and learning speed are the first problems to be solved in multiagent Q-learning [25,26]. Some improvements have been made to multiagent Q-learning to address convergence [27,28]. However, problems remain in the approaches reported in the literature; for example, most of them do not consider the interactions among the agents.
In this study, a fuzzy membership function with distinguish weight [29] is used to reduce the size of the state-action set, and a punishment mechanism [30] is used to reduce the search frequency. Compared with the general Q-learning algorithm, the proposed algorithm learns faster and converges better. Furthermore, an interaction factor is introduced into the proposed Q-learning algorithm to realize information transmission and interaction among the agents in the system. The flowchart of the proposed Q-learning algorithm is shown in Figure 1, and the approach is presented in detail as follows.
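The exponential growth cited from [21] can be made concrete with a small count. Assume, purely for illustration, that each of $n$ agents must condition its Q-values on the joint action of all agents and that every agent has $m$ actions; the numbers below are arithmetic under that assumption, not figures from the paper.

```latex
|A_{\mathrm{joint}}| = \underbrace{m \times m \times \cdots \times m}_{n\ \text{agents}} = m^{n},
\qquad
m = 2:\quad n = 6 \;\Rightarrow\; 2^{6} = 64, \qquad n = 12 \;\Rightarrow\; 2^{12} = 4096 .
```

Doubling the number of agents from 6 to 12 multiplies the joint action space by $2^{6} = 64$; this multiplicative growth is the problem the text gives as motivation for the improved algorithm.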

The State Reduction Based on Fuzzy Membership Function.
In the state preprocessing module of the proposed Q-learning algorithm, a fuzzy membership function with distinguish weight is used to reduce the size of the state-action sets by removing superfluous or unrelated information from the system. The membership function (3) used in this paper is defined in terms of the following quantities: $x$ is the original state (namely, the evaluation indicator for water quality in this study), $\lambda$ is the distinguish weight value of the indicator within its membership domain (namely, the parameter $G_2$ in (2)), $a_k$ is the demarcation point of a grade, and $a_{k+1}$ is the demarcation point of the next grade.
When the grade of $x$ is $k$ (namely, the parameter $G_1$ in (2)), the membership value of the evaluation indicator is $v = k + \lambda$.
Because the distinguish weight $\lambda$ takes 10 values and the grade $k$ takes 6 values in this paper, the total number of states is $N = 10 \times 6 = 60$. In this way, the state space is reduced substantially.
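The exact form of the membership function (3) does not survive in this text, so the sketch below assumes the simplest reading consistent with the surrounding definitions: $\lambda$ is the linear position of $x$ between the demarcation points $a_k$ and $a_{k+1}$, truncated to one decimal place, and the reduced state is $v = k + \lambda$. The function names are hypothetical, the demarcation points in the usage example are illustrative placeholders, and the upper bound for the sixth grade (worse than Grade V) is a hypothetical cap, because that grade has no next demarcation point.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple

N_GRADES = 6                        # Grades I..V plus "worse than Grade V"
N_POSITIONS = 10                    # lambda in {0.0, 0.1, ..., 0.9}
N_STATES = N_GRADES * N_POSITIONS   # 60 discrete states per indicator


def membership(x: float, points: Sequence[float]) -> Tuple[int, float]:
    """Map a raw indicator value x to (grade k, distinguish weight lambda).

    Assumption: `points` holds 7 ascending demarcation points a_1..a_7, so that
    grade k covers [a_k, a_{k+1}); values at or above a_7 are capped at (6, 0.9).
    """
    if len(points) != N_GRADES + 1:
        raise ValueError("expected 7 demarcation points")
    x = max(x, points[0])
    if x >= points[-1]:
        return N_GRADES, (N_POSITIONS - 1) / N_POSITIONS
    k = bisect_right(points, x)     # 1-based grade, because x >= points[0]
    lo, hi = points[k - 1], points[k]
    step = math.floor(N_POSITIONS * (x - lo) / (hi - lo) + 1e-9)
    step = min(step, N_POSITIONS - 1)  # keep lambda on the 0.0..0.9 grid
    return k, step / N_POSITIONS


def membership_value(x: float, points: Sequence[float]) -> float:
    """Reduced state v = k + lambda, used in place of the raw value x."""
    k, lam = membership(x, points)
    return k + lam


if __name__ == "__main__":
    demo_points = [0.0, 2.0, 4.0, 6.0, 10.0, 15.0, 30.0]  # illustrative only
    for value in (1.0, 3.0, 12.5, 40.0):
        print(value, membership(value, demo_points))
    print(N_STATES)  # 60
```

Whatever the precise form of (3), the reduction works the same way: every raw concentration collapses onto one of 60 grid cells, so the Q-table size no longer depends on the measurement resolution.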

The Information Transmission Based on Interaction Factor.
To transfer information among the agents in the system, this paper proposes the concept of an interaction factor (denoted by $\sigma$), which transfers the information of the key state to other agents for water quality assessment. In practice, the value of $\sigma$ should be learned from experience. In this paper, $\sigma$ is calculated by (4) from $d$, the distance between two agents (which can be an abstract concept or an actual physical distance), and $\beta$, the attenuation coefficient. The attenuation coefficient is calculated by the least squares method:

$$J(\beta) = \sum_{j} \left( s'_j(\beta) - \tilde{s}'_j \right)^2, \quad (5)$$

where $s'_j(\beta)$ is the value of the next state obtained by the proposed algorithm for a given $\beta$, $\tilde{s}'_j$ is the actual value of the next state, and $j$ indexes the available observations. The attenuation coefficient is the value of $\beta$ at which $J$ reaches its minimum. Based on (4) and (5), the time at which the water quality grade of a region reaches its highest value (that is, when its water quality is worst) can be estimated.
Sensitive areas can then make preparations in advance to prevent water pollution.
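A minimal sketch of the least squares fit in (5) follows. Because the form of (4) is not reproduced in this text, the sketch assumes a hypothetical exponential attenuation $\sigma(d, \beta) = \Delta e^{-\beta d}$, where $\Delta$ is the concentration increase at the source region; it scans $\beta$ over a grid with step 0.1 and keeps the value that minimizes $J(\beta)$. The function names, the distances, and the synthetic "observations" are invented so that the expected answer is known; the 1.8 mg/L source increase and $\beta = 0.4$ echo the values reported in the dynamic experiment of Section 4.

```python
import math
from typing import Callable, Sequence


def sigma_exp(d: float, beta: float, delta: float) -> float:
    """Hypothetical interaction factor: the source increase delta decays as exp(-beta * d)."""
    return delta * math.exp(-beta * d)


def squared_error(beta: float, distances: Sequence[float],
                  observed: Sequence[float], delta: float,
                  model: Callable[[float, float, float], float] = sigma_exp) -> float:
    """J(beta): sum of squared differences between predicted and observed values."""
    return sum((model(d, beta, delta) - y) ** 2 for d, y in zip(distances, observed))


def fit_beta(distances: Sequence[float], observed: Sequence[float], delta: float,
             beta_max: float = 2.0, step: float = 0.1) -> float:
    """Grid search for the beta in [0, beta_max] that minimizes J(beta)."""
    n_steps = int(round(beta_max / step))
    grid = [round(i * step, 10) for i in range(n_steps + 1)]
    return min(grid, key=lambda b: squared_error(b, distances, observed, delta))


if __name__ == "__main__":
    delta = 1.8                        # COD_Mn increase at the source agent (mg/L)
    distances = [0.5, 1.0, 2.0, 3.0]   # hypothetical distances to the source agent
    observed = [sigma_exp(d, 0.4, delta) for d in distances]  # synthetic data
    beta = fit_beta(distances, observed, delta)
    print(beta)  # 0.4
    print([round(sigma_exp(d, beta, delta), 3) for d in distances])
```

A grid search is used here only because it is easy to verify; with a smooth model such as the assumed exponential, a closed-form or gradient-based least squares solver would find the same minimum.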

The Action Execution Module.
In the action execution module, the regional agents select their actions by the softmax strategy [31]:

$$P(a \mid s) = \frac{e^{Q(s,a)/T}}{\sum_{b \in A} e^{Q(s,b)/T}}, \quad (6)$$

where $a$ is the action of an agent (in this study, increasing or decreasing the weight of an indicator), $T$ is the simulated annealing temperature parameter, which controls the search rate, and $Q(s,a)$ is the Q-value function for the state-action pair. To reduce the number of searches in the state-action set and to accelerate learning, a punishment mechanism is introduced into the proposed algorithm, and the Q-value function $Q(s,a)$ is separated into a punishment Q-value function $Q_p(s,a)$ and a reward Q-value function $Q_r(s,a)$. The punishment Q-value is updated by (7), and the cumulative reward Q-value is updated by

$$Q_r(s_t, a_t) \leftarrow (1 - \alpha)\, Q_r(s_t, a_t) + \alpha \left[ r + \gamma \max_{a \in A} Q_r(s_{t+1}, a) \right], \quad (8)$$

where $\alpha \in (0, 1)$ is the learning rate, $\gamma \in (0, 1)$ is the discount factor, and $r$ and $p$ are the reward value and the punishment value, respectively.
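The sketch below illustrates the action execution module: Boltzmann (softmax) selection as in (6), the reward Q-value update in (8), and the punishment limit used in the work flow ($P_{max} = 50$ in this paper). Because the punishment update (7) is not reproduced here, the punishment Q-value is simply accumulated ($Q_p \leftarrow Q_p + p$) as a placeholder; that rule, the class name `DualQAgent`, the default $\alpha$, $\gamma$, and $T$, and the string encoding of states and actions are all assumptions for illustration.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

State = Hashable
Action = Hashable


def softmax_probs(q_values: List[float], temperature: float) -> List[float]:
    """Boltzmann probabilities exp(Q/T) / sum exp(Q/T), shifted by max(Q) for stability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


class DualQAgent:
    """Hypothetical agent keeping separate reward (Q_r) and punishment (Q_p) tables."""

    def __init__(self, actions: List[Action], alpha: float = 0.5, gamma: float = 0.9,
                 temperature: float = 1.0, p_max: float = 50.0,
                 rng: Optional[random.Random] = None) -> None:
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.temperature, self.p_max = temperature, p_max
        self.q_r: Dict[Tuple[State, Action], float] = defaultdict(float)
        self.q_p: Dict[Tuple[State, Action], float] = defaultdict(float)
        self.rng = rng or random.Random(0)

    def select_action(self, state: State) -> Action:
        """Softmax over Q_r, restricted to actions whose punishment is below p_max."""
        allowed = [a for a in self.actions if self.q_p[(state, a)] < self.p_max]
        if not allowed:  # every action is over the limit: fall back to all actions
            allowed = list(self.actions)
        probs = softmax_probs([self.q_r[(state, a)] for a in allowed], self.temperature)
        return self.rng.choices(allowed, weights=probs, k=1)[0]

    def update(self, state: State, action: Action, reward: float,
               punishment: float, next_state: State) -> None:
        # Placeholder for (7): accumulate the punishment for this state-action pair.
        self.q_p[(state, action)] += punishment
        # (8): standard Q-learning update of the reward table.
        best_next = max(self.q_r[(next_state, a)] for a in self.actions)
        key = (state, action)
        self.q_r[key] = ((1 - self.alpha) * self.q_r[key]
                         + self.alpha * (reward + self.gamma * best_next))


if __name__ == "__main__":
    agent = DualQAgent(actions=["increase", "decrease"])
    agent.update("s0", "increase", reward=1.0, punishment=0.0, next_state="s1")
    print(agent.q_r[("s0", "increase")])    # 0.5
    print(softmax_probs([0.0, 0.0], 1.0))   # [0.5, 0.5]
```

Filtering out over-punished actions before the softmax draw is one way to read "select a new action" once the limit is reached; it stops the agent from repeatedly sampling pairs it has already learned to avoid, which is the search reduction the punishment mechanism is meant to provide.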

The Work Flow of the Proposed Approach.
The work flow of the proposed approach for dynamic water quality assessment is summarized as follows.
(1) Obtain the initial state set from the water quality monitoring system, denoted as $S_1 = \{s_{11}, s_{12}, \ldots, s_{1n}\}$, where $s_{1i}$ is the actual concentration of the $i$th indicator. The action set of an agent is $A = \{\omega_i + 1, \omega_i - 1\}$, where $\omega_i$ is the weight of the $i$th indicator.
(2) Initialize $Q_p$ and $Q_r$ to 0, select a key state from the state set, and initialize the interaction factor $\sigma$ for this state to 0.
(3) Reduce the initial state set by (3) to obtain a new state set $S_2 = \{s_{21}, s_{22}, \ldots, s_{2n}\}$, where $s_{2i}$ is the processed concentration value of the $i$th indicator.
(4) Each agent calculates the real-time state $s_{key}$ of the key state from the processed state set. For ease of computation and without loss of generality, $s_{key} = s_{2key} + \sigma$, where $s_{2key}$ is the key state after processing. The interaction factor $\sigma$ is obtained by (4) and (5).
(5) Each agent selects its action $a$ under the current state $s$ according to (6) and obtains the next state $s'$ after executing $a$. A reward $r$ and a punishment $p$ are then obtained from the environment feedback.
(6) If $Q_p \geq P_{max}$, select a new action from the action set, where $P_{max}$ is the upper limit of the punishment value ($P_{max} = 50$ in this paper). If $Q_p < P_{max}$, set $Q_r(s_{t+1}, a)_{max} = \max_{a \in A} Q_r(s_{t+1}, a)$ and obtain $Q_r$ by (8).
(7) Repeat steps (5) and (6) to find the weights for each group of indicators, and calculate the average weight of each indicator. The water quality can then be assessed by (1).

Experimental Studies
To test the performance of the proposed approach, several experiments are conducted on a lake water area with six regions (see Figure 2). The task is to assess the water quality of the six regions. The pollution sources include industrial, agricultural, and domestic pollution sources. According to the characteristics of the water area, the main pollution indicators are assumed to be the permanganate index (COD$_{\mathrm{Mn}}$), total nitrogen (TN), and total phosphorus (TP); that is, the initial state set is $S_1 = \{C_1, N_1, P_1\}$. The reduced state set obtained from the membership function (3) is $S_2 = \{C_2, N_2, P_2\}$. The water quality assessment model is $G = \omega_1 x_C + \omega_2 x_N + \omega_3 x_P$, where $x_C$, $x_N$, and $x_P$ are the processed values of COD$_{\mathrm{Mn}}$, TN, and TP. Two experiments were conducted, with the interaction factor set to $\sigma = 0$ and $\sigma \neq 0$, to test the performance of the proposed approach in static and dynamic assessment, respectively.

Static Water Quality Assessment ($\sigma = 0$).
In this experiment, the interaction factor $\sigma$ is set to 0, which means that there is no information transmission among the regional agents in the water area. Each regional agent assesses its own water quality from the monitoring data of its indicators. The training data set for the Q-learning algorithm, shown in Table 1, is used to learn the weights $\omega_1$, $\omega_2$, and $\omega_3$ for COD$_{\mathrm{Mn}}$, TN, and TP, respectively.
The training data $C_1$, $N_1$, and $P_1$ in Table 1 are collected from the monitoring points of each regional agent; $C_2$, $N_2$, and $P_2$ are the corresponding values of the three indicators after reduction by the membership function; and $G$ is the water quality assessed by water quality experts. From these training data, the optimal weights for the three indicators are $\omega_1 = 0.5$, $\omega_2 = 0.3$, and $\omega_3 = 0.2$, and these weights are used to assess the water quality of the regional agents. To show the advantages of the proposed Q-learning approach (QL), it is compared with the fuzzy comprehensive evaluation method (FC) and the comprehensive identification index evaluation method (CI). The test data, which are the indicator data collected at each monitoring point, and the water quality assessment results are shown in Table 2.
The results in Table 2 show that the three methods give almost the same assessment results for regional agents 1, 2, 3, and 4. The result for regional agent 5 shows that the proposed method is more accurate than FC: the proposed approach gives not only the water quality grade but also the degree of pollution within that grade, and it can assess water quality that is worse than Grade V. The results for agent 6 show that the comprehensive water quality identification index method gives incorrect assessments when some indicators exceed the range of the national standard. Because the weight of each indicator is considered in the assessment model, the proposed approach gives more accurate results. This experiment shows that the proposed Q-learning approach can assess water quality accurately, can handle abnormal conditions such as abnormal indicator values, and can assess water quality worse than Grade V.

Dynamic Water Quality Assessment ($\sigma \neq 0$).
This experiment tests the performance of the proposed approach in dynamic water quality assessment. An abrupt industrial water pollution accident occurs in regional agent 1, and the main contaminant in the wastewater is COD$_{\mathrm{Mn}}$. The interaction factor $\sigma$ is therefore used to transfer the COD$_{\mathrm{Mn}}$ concentration information among the regional agents. In the dynamic water quality assessment model, $x_C = C_2 + \sigma$, $x_N = N_2$, and $x_P = P_2$.
For ease of analysis, the following assumptions are made in this experiment. (1) The value of $\sigma$ depends only on the physical distance between regional agents. (2) The change step in (4) is assumed to be 0.1. (3) The water flow speed is fixed at 0.02 km/h. (4) The water quality of each regional agent before the abrupt pollution accident is known and is the same as in the first experiment (see the water quality assessed by the proposed approach in Table 2). The actual COD$_{\mathrm{Mn}}$ concentrations in the six regional agents before the accident and the physical distances from the other agents to agent 1 are listed in Table 3. The attenuation coefficient $\beta$, calculated by (5) from the data in Table 3, is 0.4. With this $\beta$ and the distance $d$, the interaction factor $\sigma$ of each agent is obtained by (4). After the accident, the COD$_{\mathrm{Mn}}$ concentration in regional agent 1 increases by 1.8 mg/L. The changes in COD$_{\mathrm{Mn}}$ concentration in the other regional agents predicted by the proposed approach, and the times at which the COD$_{\mathrm{Mn}}$ concentration reaches its highest value, are shown in Figure 3. The results of the dynamic water quality assessment for each regional agent are shown in Figure 4.
The results in Figure 4 show that the water quality of the other regional agents also deteriorates when the COD$_{\mathrm{Mn}}$ concentration in regional agent 1 increases. These results show that the proposed approach can assess the water quality of different regions in the same water area when indicator concentration information is available for only one region. Furthermore, the proposed approach can calculate the time at which the indicator concentration reaches its highest value, which is very important for sensitive regions preparing for water pollution control.

Conclusions
The dynamic water quality assessment of a whole water basin has been investigated. A water quality assessment model based on multiagent technology is set up, and an improved multiagent Q-learning algorithm is proposed. The proposed approach handles both static and dynamic situations. In static water quality assessment, its results are more accurate than those of the general methods. In addition, it can perform dynamic water quality assessment, which is very important for water pollution prewarning and control. The feasibility and efficiency of the proposed approach have been discussed and illustrated through experimental studies. The results show that the proposed approach can assess water quality efficiently without requiring a complex mathematical model or prior knowledge of the water environment. The proposed approach is also applicable to other real-time cooperative tasks of multiagent systems, such as fire disaster response over wide tracts of forest.

Figure 1: Flowchart of the proposed Q-learning algorithm for the multiagent system.

Figure 2: Schematic drawing of the water area studied (the water basin divided into regional agents 1 to 6).

Figure 3: Changes of water quality and the diffusion time of the pollutant in each agent.

Table 1: The training data set for the Q-learning algorithm.

Table 2: The test data and results of water quality assessment.

Table 3: The actual monitored COD$_{\mathrm{Mn}}$ concentration of each agent.