Big Data Digging of the Public's Cognition about Recycled Water Reuse Based on the BP Neural Network

Reuse of recycled water is important to both the environment and the economy, and the public's degree of cognition of recycled water reuse plays a key role in this process because it determines how far the public accepts such reuse. Against the background of big data, the Hadoop platform was used to collect and store data about the public's cognition of recycled water in one city, and the BP neural network algorithm was used to construct an evaluation model of the factors that affect the public's cognition level. The public's risk perception, subjective norm, and perceived behavioral control regarding recycled water reuse were selected as key factors. Based on a multivariate clustering algorithm, MATLAB software was used to test the assumption model against the massive valid data, so as to analyze the weights of the three evaluation factors and to understand the range of simulation parameters for the cognition degree of different groups of citizens. Lastly, several policy-level suggestions were proposed to improve the public's cognition of recycled water reuse based on the big data.


Introduction
Recycled water reuse has brought great social and environmental benefits in terms of the water resource utilization rate and the improvement of economic efficiency in urban areas. Besides, it can also promote the sustainable development of water resources. However, a low utilization rate and slow development of recycled water reuse are two problems in China. Thus, the development of practice and technology for recycled water is indispensable. Meanwhile, the public's degree of cognition also influences the development of recycled water utilization. With increasingly severe competition for benefits and data, the public's recognition of recycled water becomes one of the key factors that determine whether water resource protection is valued against the background of big data development, and it also affects the success of that process.
In the era of big data, the public's cognition of recycled water becomes the basic data that make the analysis of wastewater reclamation feasible, and it is also the precondition for the establishment and popularization of recycled water projects. Recycled water reuse has brought huge social and environmental benefits in terms of improved water resource utilization and economic efficiency in cities [1]. Meanwhile, it can also promote the sustainable development of water resources [2]. However, the publicity of recycled water reuse in China lags far behind that of developed countries because of a late start. Although the popularization of recycled water reuse in China has been slow and Chinese citizens have very little contact with recycled water, there have been no large-scale events resisting recycled water reuse in China; yet this does not mean that the potential risks of this issue can be neglected [3]. In China, recycled water reuse is beneficial for both the environment and society but carries certain risks for users. However, compared with other programs such as genetically modified food, waste incineration, and PX projects, recycled water reuse has a much wider influence, and there have always been opposing views on the issue [4]. As the social economy develops and urbanization progresses, water pollution and the shortage of water supply worsen, and recycled water reuse is the best way to address these problems. Therefore, this issue will draw increasing attention from the public. With further popularization of recycled water reuse in China, the number of direct and indirect participants in recycled water reuse projects will increase and the influence of such projects will expand, so opposing opinions among residents, like those in the field of genetically modified food, will also increase. Residents' acceptance of recycled water reuse will have an increasing effect on its popularization in the future [5]. Therefore, how to improve the acceptance of the final consumers of recycled water, so that residents move from passive acceptance to active and willing participation in recycled water reuse, is an issue that must be handled in its future popularization.
At the end of the last century, some scholars had already realized that the public's low acceptance of wastewater reclamation and reuse technology, rather than backward technology, is the biggest obstacle to the popularization of wastewater reclamation [6]. Numerous studies and engineering projects have verified that the public's low acceptance of wastewater recycling technology is the key factor that affects the popularization of recycled water reuse [7]. For example, the water management department in San Diego planned to use recycled water in drinking water, and this triggered a large-scale protest movement that ultimately ended the project after a large investment [8]. Similarly, a project implemented in Toowoomba in 2006 to replenish a water storage with recycled water met strong resistance, and the opponents claimed "refusing drinking wastewater." Although the water stored behind the dam was as low as 23% of the reservoir's capacity, the project was abandoned after 63% of residents voted against it [9]. Obviously, ignoring residents' opinions about recycled water reuse can, to a great extent, lead to the failure of a project.
Because recycled water is obtained from wastewater processing, negligence in technology or manual operation may leave residues of chemical components and pernicious microbes [10]. Therefore, the public is concerned about the health effects that recycled water reuse may cause, which is also an important cause of the low acceptance of recycled water [11]. Based on this, Rozin et al. defined the public's aversion towards recycled water as "spiritual contagion" [12] because recycled water is processed from wastewater. The rapid development of urbanization and the social economy widens the supply-demand gap of water resources, which places higher requirements on the popularization of recycled water reuse technology. However, few studies have been done to clarify Chinese residents' cognition of recycled water and thereby to figure out why it is difficult to popularize recycled water reuse technology.
In this paper, the public's cognitive degree of recycled water is evaluated against the background of the big data era. The BP neural network and MATLAB software were used to run a fitting test on massive data from the Hadoop big data platform, so that the characteristics of the multidimensional cognitive degree model of different groups can be analyzed and then used to seek the optimal evaluation solution quantitatively.

Hadoop Cloud Computation
At present, big data has already become one of the most popular research technologies in the field of the Internet. The basic principle of cloud computation is to form a server group by connecting computers distributed in different places through network technology and to provide all kinds of services for users through the group, including computation and storage. Based on the cluster service of the servers, computing resources are distributed across different computers rather than held on one single device or remote-controlled server, so as to reduce the consumption of computing and storage resources and to balance the resources on each distributed computer; thus, it can provide users with rapid, reliable, and high-performance computing service while hiding the implementation details, so users do not need to be concerned with the underlying implementation. See Figure 1.
Hadoop hides the implementation details of the distributed framework. Two major basic components, MapReduce and HDFS, are used to provide the cloud computation service: the distributed MapReduce computation model can complete computation tasks rapidly [13], while HDFS is a distributed file system with high fault tolerance, so it can provide stable and reliable data service for the system. A cloud computation platform coordinates the work among the computers through virtualization technology at the software level and thereby expands computing capability and implements the sharing of powerful computing and storage resources. The cloud computing system structure includes a server cluster, administration system, system deployment tool, system service catalog, resource monitoring terminal, and client terminals. These modules provide reliable services for the cloud computation platform. The big data platform also provides a more convenient data acquisition method for empirical research. By accessing the required data from the network cloud platform, big data digging can obtain larger-scale samples than traditional surveys. See Figure 2 for the specific system structure.
2.1. HDFS System. A distributed file system (DFS) is used to provide large-scale distributed computing service across servers within the same network. Usually, resources are operated through distributed servers at different locations, which makes it inconvenient to manage them. A DFS integrates the information held on different servers, so users and administrators can access resources through the shared resource catalog of the DFS. Based on this principle, Hadoop has set up HDFS (Hadoop Distributed File System), which serves the cloud platform as its storage core and handles storage for all files in the cluster, making it one of the most widely used DFSs at present [14,15].
HDFS adopts the classic Master/Slave system structure, modeled on the Master-Chunk Server structure of GFS. The entire HDFS consists of a name node, data nodes, and clients; except for the name node, there can be multiple instances of the other two. The name node is the system administrator that is responsible for namespace maintenance and block management [16]. The data nodes are the actual data storage. Usually, one file is cut into many blocks that are stored on multiple nodes. The client terminal is the access entrance for users. The HDFS structure diagram can be seen in Figure 3. The Hadoop platform is a distributed storage and computation system that operates on a server cluster. In HDFS, the master node is the name node, which is responsible for the control of the distributed system and the management of the file cluster, from the coordination of data nodes to the allocation of computation and storage tasks. Each data node performs computation locally on the data it stores, carrying out the tasks assigned by the name node, so as to improve the performance of the distributed system [17]. The distributed MapReduce computation framework can perform parallel computation on data effectively and manage and schedule the nodes within the entire cluster to complete the computation tasks.

2.2. Distributed MapReduce Computation Model. Distributed MapReduce computation is one of the core components of the Hadoop platform, and it is the model and framework for the computation tasks of big data. MapReduce changed the way large-scale computation is organized and realized the abstraction of large-scale computation at the level of a large-scale server cluster [18].
Figure 4 shows the detailed procedures of distributed MapReduce computation. Firstly, the MapReduce library cuts the input file into n splits of about 16 MB-64 MB each; each split is then parsed into key-value pairs, and the entire job is executed on the basis of the splits. The distributed MapReduce computation model schedules multiple tasks through the core master program. A mapper reads its assigned split from HDFS based on the break points. At the Map stage, a corresponding map task is created for each split and assigned to a worker program, which uses the map function to compute the intermediate results and then sorts them. After that, the key-value pairs with the same key form a list, and the result set is divided into R partitions and saved in local storage. At the Reduce stage, the data from different map tasks are merged and sorted to form new key-value pairs, and the reduce function is applied to this result set to obtain the final results, as shown in Figure 4.
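The split, map, group, and reduce stages described above can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code and not the authors' implementation: the function names, the survey answers, and the number of splits are hypothetical, and the partitioning into R local files is omitted for brevity.

```python
from collections import defaultdict

def split_input(records, n_splits):
    """Cut the input into roughly equal splits (stand-in for 16-64 MB file splits)."""
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_task(split):
    """Map stage: emit one (key, 1) pair per record, then sort the pairs by key."""
    return sorted((answer, 1) for answer in split)

def group_by_key(mapped_outputs):
    """Collect the values that share a key into one list per key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Reduce stage: combine the list of values of one key into a single result."""
    return key, sum(values)

def run_job(records, n_splits=3):
    splits = split_input(records, n_splits)
    mapped = [map_task(s) for s in splits]      # one map task per split
    groups = group_by_key(mapped)               # merge the map outputs
    return dict(reduce_task(k, v) for k, v in sorted(groups.items()))

# Hypothetical survey answers used only for illustration.
answers = ["accept", "reject", "accept", "unsure", "accept", "reject", "accept"]
counts = run_job(answers)
print(counts)
```

In a real cluster each map task and each reduce task would run on a different worker, and the master program would schedule them; here they simply run one after another.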

BP Neural Network Model
The BP neural network is a one-way, multilayer feedforward network, which includes three kinds of layers, namely, input layer, hidden layer, and output layer. The input layer receives signals from the outside, the hidden layer performs various mapping conversions of the input signals, and the output layer finally outputs the simulation results of the network. There may be several hidden layers. Under normal circumstances, a three-layer structure is adopted, as shown in Figure 5.
The BP neural network has a complete learning mechanism and theoretical system. It imitates the response process of human brain neurons to external stimuli. Through the establishment of a multilayer perceptron model, the forward transmission of signals, and the backward adjustment of errors, the network performs N iterations of learning and thus forms a neural network model that deals with nonlinear information. This paper takes public risk perception, subjective norms, and perceived behavioral control as the input variables and takes the public perception as the output value to construct the BP neural network model [19].
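As a rough illustration of forward signal transmission and backward error adjustment in a three-layer network, the following pure-Python sketch trains a network with three inputs, one hidden layer, and one output. The hidden-layer size, learning rate, sigmoid activation, and the five normalized samples are assumptions made for the example; they are not the survey data or the MATLAB configuration used in this paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPNetwork:
    """Three-layer network: 3 inputs -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_in=3, n_hidden=4, seed=0):
        rng = random.Random(seed)
        # The last entry of every weight row is the bias (threshold) term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Forward transmission: input -> hidden activations -> output."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """Backward adjustment: propagate the output error and update all weights."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed with the output weights before they change.
        delta_hidden = [delta_out * self.w_out[h] * hidden[h] * (1.0 - hidden[h])
                        for h in range(len(hidden))]
        for h, v in enumerate(hidden + [1.0]):
            self.w_out[h] -= lr * delta_out * v
        for h, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[h] * v

def mse(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

# Hypothetical samples: [risk perception, subjective norm, perceived control] -> cognition.
samples = [
    ([0.9, 0.2, 0.3], 0.1),
    ([0.8, 0.4, 0.2], 0.2),
    ([0.2, 0.8, 0.7], 0.9),
    ([0.3, 0.7, 0.9], 0.8),
    ([0.5, 0.5, 0.5], 0.5),
]

net = BPNetwork()
error_before = mse(net, samples)
for _ in range(2000):               # N iterations of learning
    for x, t in samples:
        net.train_step(x, t)
error_after = mse(net, samples)
print(error_before, error_after)
```

The sketch updates the weights after every sample (online learning); batch updates or a different activation function would be equally valid choices.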
The BP neural network algorithm based on genetic algorithm optimization combines the neural network algorithm with the genetic algorithm. When the training convergence speed of the BP algorithm is not satisfactory, the thresholds and weights of the nodes at the hidden layers of the BP neural network are taken as the input information of the genetic algorithm and encoded to generate chromosomes [20]. The offspring produced by the selection, crossover, and mutation operators of the genetic algorithm are then used as the initial values of the BP algorithm. After that, the network is trained with the BP algorithm until the error precision meets the requirement.
The BP neural network algorithm based on the genetic algorithm uses the genetic algorithm to search widely in the solution space of the target information. Once the genetic algorithm has located a better network configuration, the BP algorithm seeks the optimal solution to the problem from there. The process is shown in Figure 6.
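The genetic search for initial weights can be sketched as follows. This is a generic illustration, not the procedure of Figure 6: the real-valued encoding, tournament selection, one-point crossover, Gaussian mutation, elitism, and all parameter values are assumptions, and the error function is a hypothetical stand-in for the training error of a BP network.

```python
import random

def evolve(error_fn, n_genes, pop_size=20, generations=30, seed=0):
    """Search for a weight vector (chromosome) with low error; needs n_genes >= 2."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []                                  # best error per generation
    for _ in range(generations):
        population.sort(key=error_fn)
        history.append(error_fn(population[0]))
        next_gen = [population[0][:]]             # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            # Selection: the better of three random candidates becomes a parent.
            p1 = min(rng.sample(population, 3), key=error_fn)
            p2 = min(rng.sample(population, 3), key=error_fn)
            # Crossover: one cut point, head from p1 and tail from p2.
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # Mutation: perturb each gene with a small probability.
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < 0.1 else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=error_fn)
    history.append(error_fn(best))
    return best, history

# Hypothetical stand-in for the network error: squared distance to a target vector.
target = [0.3, -0.2, 0.5, 0.1]

def toy_error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, target))

best_weights, history = evolve(toy_error, n_genes=4)
print(history[0], history[-1])
```

In a combined scheme, `best_weights` would be decoded into the thresholds and weights of the network and handed to BP training as its starting point.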

Multivariate Cluster Analysis
Cluster analysis, also known as group analysis, is a multivariate statistical analysis method for classifying samples or indicators according to the principle of "like attracts like". Its objects are a large number of samples, which must be classified reasonably in accordance with their characteristics. There are no models available for reference or to follow, so it is usually performed without prior knowledge. Cluster analysis originated from taxonomy. In ancient taxonomy, people mainly classified things according to their experience and professional knowledge and seldom used mathematical tools to make quantitative classifications. With the development of science and technology, the requirements for classification became increasingly higher, and accurate classification could no longer be made with experience and professional knowledge alone. Therefore, mathematical tools were gradually introduced into taxonomy, which then became numerical taxonomy. Afterwards, the technique of multivariate analysis was introduced into numerical taxonomy to form cluster analysis.
Cluster analysis has been applied in many fields. In business, cluster analysis is used to find different customer groups and describe their characteristics through purchasing patterns. In biology, it is used to classify animals and plants so as to understand the intrinsic structure of different populations. In geography, this method can help identify similarities among data observed on the earth. In the insurance industry, cluster analysis is used to identify groups of car insurance holders with a high average claim cost. At the same time, it is also used to identify groups of residential buildings in a city according to their type, value, and geographical location. In Internet applications, cluster analysis is used to classify documents for information discovery.
Clustering is a process of classifying data into different classes or clusters, so that objects in the same cluster are highly similar to each other, while objects in different clusters differ greatly from each other. The goal of cluster analysis is to group data according to similarities. It is related to many fields such as mathematics, computer science, statistics, biology, and economics. Clustering techniques have been developed in various fields, where they are mainly applied to describing data, measuring the similarities between different data sources, and classifying data sources into different clusters.
High-dimensional cluster analysis has become an important research direction of cluster analysis. At the same time, high-dimensional data clustering is also a difficulty in clustering technology. With the advancement of technology, data collection becomes much easier, resulting in increasingly larger and more complex databases. Take various trade transaction data, web documents, and gene expression data as examples. They may have hundreds or even thousands of dimensions (properties). However, due to the influence of the "dimensional effect," many clustering methods that perform well in a low-dimensional data space are often unable to obtain good clustering results in a high-dimensional space. High-dimensional data clustering analysis is a very active field in cluster analysis, and it is also a challenging task. At present, high-dimensional data cluster analysis has a wide range of applications in market analysis, information security, finance, entertainment, and antiterrorism. First, we consider that in the Euclidean space, the quality of clusters is usually measured with the following metric: the sum of the squared error (SSE). To be specific, after performing cluster analysis, an error value is computed for each point, namely the distance from the noncentroid point to its nearest centroid. Since each noncentroid point already belongs to a certain cluster, this amounts to calculating the distance from each noncentroid point to the centroid of its cluster.
Finally, the squares of these distance values are summed to evaluate the quality of the clustering as the SSE. The ultimate goal is to minimize the final SSE. In the n-dimensional Euclidean space, the SSE is defined formally and can be calculated through the following formula:

$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \operatorname{dist}(c_i, x)^2,$$

where $K$ is the number of clusters, $C_i$ is the $i$-th cluster, $c_i$ is its centroid, and $\operatorname{dist}$ is the Euclidean distance.

The basic idea of the bisecting k-means clustering algorithm is the introduction of a local bisection test. In each round, the cluster with the largest SSE value is selected and bisected into two subclusters. Among several trial bisections, the one that gives the smallest total SSE of the two subclusters is chosen, which makes the subclusters of each bisection better (or optimal). In other words, each bisection is at best locally optimal, and how good it is depends on the number of trials.
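The SSE criterion and the bisection test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the helper names, the number of trials, the iteration limit, and the nine two-dimensional points are hypothetical, and the sketch assumes distinct points and clusters of at least two points.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def sse(cluster):
    """Sum of squared distances from each point to the centroid of its cluster."""
    c = centroid(cluster)
    return sum(dist2(p, c) for p in cluster)

def two_means(points, rng, iters=10):
    """One trial bisection: plain 2-means started from two random points."""
    centers = rng.sample(points, 2)
    parts = None
    for _ in range(iters):
        new_parts = ([], [])
        for p in points:
            nearer = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            new_parts[nearer].append(p)
        if not new_parts[0] or not new_parts[1]:
            break                                  # keep the last valid split
        parts = new_parts
        centers = [centroid(parts[0]), centroid(parts[1])]
    return parts

def bisecting_kmeans(points, k, trials=5, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]                      # start with one cluster
    while len(clusters) < k:
        worst = max(clusters, key=sse)             # cluster with the largest SSE
        clusters.remove(worst)
        candidates = [two_means(worst, rng) for _ in range(trials)]
        best = min(candidates, key=lambda pair: sse(pair[0]) + sse(pair[1]))
        clusters.extend(best)                      # keep the best trial bisection
    return clusters

# Hypothetical 2-D points forming three visible groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
          (10.0, 0.0), (10.1, 0.2), (9.9, 0.1)]
clusters = bisecting_kmeans(points, k=3)
total_sse = sum(sse(c) for c in clusters)
print(len(clusters), total_sse)
```

Raising `trials` makes each bisection more likely to be the best available one, which mirrors the dependence on the number of tests noted in the text.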
The specific process of the bisecting k-means clustering algorithm can be described as follows: (1) At first, take the data set D to be clustered as cluster  7. Similar to the most commonly used shortest distance method and longest distance method in sample set clustering analysis, the variable clustering method adopts the same ideas and processes as the system clustering method.In the issue of variable clustering, the longest distance method and the shortest distance method are commonly used.

The Longest Distance Method.
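In the longest distance method, the distance between two classes of variables $G_1$ and $G_2$ is defined as $R(G_1, G_2) = \max_{j \in G_1,\, k \in G_2} d_{jk}$, where $d_{jk}$ is the distance between variables $j$ and $k$. In this formula, $R(G_1, G_2)$ is related to the similarity metric of the two variables with the least similarity in the two categories.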

The Shortest Distance Method.
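In the shortest distance method, the distance between two classes of variables $G_1$ and $G_2$ is defined as $R(G_1, G_2) = \min_{j \in G_1,\, k \in G_2} d_{jk}$, where $d_{jk}$ is the distance between variables $j$ and $k$. In this formula, $R(G_1, G_2)$ is related to the similarity metric of the two variables with the greatest similarity in the two categories.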
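The two between-class distances can be compared on a small example. The sketch below assumes that the distance between two variables is derived from their correlation coefficient as d = 1 - |r|, which is one common choice for variable clustering and is not stated in the text; the correlation matrix and the two groups are hypothetical.

```python
def longest_distance(group_a, group_b, d):
    """Longest distance method: the largest pairwise distance between the groups."""
    return max(d[j][k] for j in group_a for k in group_b)

def shortest_distance(group_a, group_b, d):
    """Shortest distance method: the smallest pairwise distance between the groups."""
    return min(d[j][k] for j in group_a for k in group_b)

# Hypothetical correlation matrix of four variables.
r = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]

# Assumed conversion from similarity to distance: d = 1 - |r|.
d = [[1.0 - abs(v) for v in row] for row in r]

group_1 = [0, 1]
group_2 = [2, 3]
longest = longest_distance(group_1, group_2, d)     # least similar pair: variables 0 and 3
shortest = shortest_distance(group_1, group_2, d)   # most similar pair: variables 1 and 2
print(longest, shortest)
```

With these numbers the longest distance is driven by the pair with correlation 0.1 and the shortest distance by the pair with correlation 0.4, which is why the former reflects the least similar variables and the latter the most similar ones.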
Based on the data processed by the Hadoop big data platform, the algorithm yields the clustering diagram of ten types of indicators shown in Figure 8.
The clustering diagram demonstrates that the variables of recycled water can be roughly divided into two categories. The first comprises the variables that reflect the internal conditions of recycled water, such as the benefits to human health, the benefits to the environment, element content, and the treatment process. The other comprises the variables that reflect the external conditions of recycled water, such as the water price mechanism; health indicators; sources; policies, regulations, and standards; treatment technologies; and safety issues.
The multilevel clustering algorithm is adopted to introduce 6 levels of perceptions on recycled water as shown in Figure 8.

Comprehension
It refers to an understanding of things that need not be deep; it may be preliminary and superficial.

Application
It refers to the use of concepts, rules, and principles learned. It requires the learner to apply an abstract concept correctly to the appropriate situation without being given the problem-solving model.

Analysis
It means decomposing the material into its constituent elements, so that the interrelationship between the various concepts and the organizational structure of the material are clearer and the basic theory and basic principles are elucidated in detail.

Synthesis
Based on the analysis, all the elements that have been decomposed are processed in a comprehensive manner, and they are again reassembled as required in order to comprehensively and creatively solve the problem.

Evaluation
This is the highest level of educational goals in the cognitive field.The requirements of this level are not judged by intuitive feelings or observed phenomena but rather by rational and profoundly convincing judgments about the value of the nature of things.

Complexity
In order to simplify the level division, integers are taken. The residents' risk perception, subjective norm, and perceived behavioral control form the input layer, while the residents' feedback cognition about recycled water is regarded as the output layer. The three indicators affect the public's cognition of recycled water positively or negatively: some indicators are beneficial for improving it, while others have negative effects [21]. The investigated subjects are divided into different groups for discussion and weight simulation, which yields Table 2.
From Table 2, it can be found that the risk perception on recycled water use has the biggest influence on the public's cognition about recycled water, followed by subjective norm and perceived behavioral control. However, the influence on cognition about recycled water can be divided into positive influence and negative influence. Therefore, further study is required to analyze which indicator has a positive influence on the public's cognition.
It is required to process the information with big data and make vertical comparison among the information, that is, to compare the negative or positive effect of each indicator on different groups when the other two indicators were kept the same.
Figure 9 represents the effect of the public's risk perception, subjective norm, and perceived behavioral control of different groups on their cognition on recycled water use. A value above zero indicates that the factor has a positive effect on the public's cognition, and a value below zero indicates a negative effect [22]. The risk perception about recycled water use has the biggest negative effect on the public's cognition degree. From the fitting results of the assumption model and investigation data, it can be seen that risk perception, subjective norm, and perceived behavioral control on recycled water have a significant effect on the public's cognitive level about recycled water use, and the risk perception on recycled water use has a negative effect on the public's

Discussion and Conclusion
6.1. Discussion. The research conclusions could provide certain guidance for promoting recycled water reuse, and they indicate that changing the public's attitude towards recycled water reuse should be the major concern in encouraging them to use recycled water. Helping the public to remove their concern about water quality safety is the fundamental measure to change their cognition about recycled water reuse, while legislation, supervision, and publicity can help the public to eliminate their "spiritual contagion."
(1) The perception of the risk of recycled water reuse is an important factor that affects the public's acceptance of recycled water. Recycled water is acquired from wastewater processing; undoubtedly, the quality of recycled water becomes an important factor that affects the public's acceptance of recycled water reuse. From questionnaire surveys conducted in different places worldwide, it can be seen that over 50% of participants have concerns about the taste, color, smell, salt content, and the existence of harmful microbes. Among these, the possible existence of harmful microbes and chemical residuals poses potential risks to human health and is an important reason why the public is against the use of recycled water. Some people think that one day harmless disposal of wastewater can be realized with the advancement of technology and science. In fact, as wastewater treatment technology develops, some existing harmful substances can be removed; meanwhile, more and more new harmful substances are being found.
Therefore, how to treat potential risks of recycled water and seek for the balance between recycled water treatment and economic feasibility is an issue that we have to consider in the future.
(2) In the publicity of recycled water, social norms and social opinions shall be considered as guidance. In "Preface to A Contribution to the Critique of Political Economy", Marx pointed out that "people are social animals." The behavior of human beings is affected by numerous factors such as social preferences, social identity, and social norms. Therefore, when making decisions, people unconsciously think about the behaviors of those around them and check whether they fit into the group, and they even imitate the behavior of others unconsciously; this kind of behavior also exists in the investment and consumption behaviors of residents. Investors follow the trend just like a flock of goats following the bellwether and make the same decisions as other investors. This effect is defined as the sheep-flock effect by economists. Making full use of this group psychology of residents can guide their behavior decisions effectively. For example, the water supply pipelines in Bogotá, Colombia, cracked in 1997, which led to a shortage of urban water supply. The mayor showed how to save water by turning off the faucet while taking a shower, which had a good guiding effect on urban residents' water-saving behavior.
Obviously, the influences of social norms on the public's acceptance towards recycled water reuse cannot be neglected.
(3) The residents' perceived behavioral control in the use of recycled water has a positive effect on the public's acceptance of recycled water reuse. Usually, ordinary urban residents are unfamiliar with recycled water reuse, so they find it very inconvenient to use recycled water and may even misunderstand its reuse. This kind of misunderstanding lowers the residents' perceived behavioral control over recycled water reuse and further affects their acceptance of it. Therefore, it is important to strengthen the popularization of related knowledge about recycled water reuse and guide the public to participate in the reuse of recycled water.

Conclusion
(1) Policy and regulations should be made, and technical innovation should be sped up. The improvement of policy and regulations is the precondition for the normal operation of recycled water projects. A legal system for recycled water management shall be established, so as to clarify the legal status of recycled water reuse and make detailed stipulations for recycled water utilization. It is necessary to improve the requirements for recycled water sources, water quality standards, and water quality monitoring methods, so that the safety of recycled water use can be ensured. Investment in such projects should be increased, and technical innovation should be sped up, so as to enable research and development on recycled water processing technology. Multiple processes have been used to improve the quality of recycled water, reduce the cost of recycled water processing, and expand the application scope of recycled water.
(2) Improve the supervision mechanism and increase information transparency. Construct supporting facilities for recycled water, carry out regular maintenance and inspection, post safety warning signs, and maintain the normal performance of the tunnels. Supervise in real time, monitor and ensure good and stable water quality, and investigate the feedback from users about recycled water use to avoid risks in the use of recycled water. The government should establish a good information platform, keep information transparent, compare the data regularly to prove the reliability of recycled water, and help the public to remove their concerns about recycled water quality.
(3) It is required to increase the intensity of publicity and improve the public's cognition of recycled water. The government needs to publicize the necessity and urgency of recycled water reuse and encourage relevant people to organize activities such as car washing and plant watering with recycled water. Media resources like TV, broadcasting, and the Internet shall be fully used for public service announcements. For publicity among young people, an environmental protection theme shall be added to commercial films, and film stars should be invited as environmental protection ambassadors to appear in public service advertisements about saving water. For publicity among the elderly, it is a good choice to hold publicity activities about the hygiene and safety of recycled water in parks and to display the technology and equipment used for wastewater processing and recycling. For colleges and universities that have high water consumption, the government should provide policy support and funds, so as to establish demonstration sites for recycled water use and encourage students and teachers to develop products for wastewater reclamation independently.
Lastly, in the era of information explosion, information flow means money. In the competition for cognitive resources, it is not easy to bring information about recycled water reuse to the grassroots level. Therefore, related knowledge about recycled water reuse shall be included in preschool education, so that residents' knowledge level about recycled water can be improved. Before children form their own prejudice towards recycled water use, it is very important to guide them, through proper education, to understand recycled water use scientifically and correctly.
In future studies, we can try to find out the internal mechanism that influences urban residents' behaviors on recycled water reuse from the perspectives of behavioral science, psychology, brain science, and even molecular biology. On that basis, policies for recycled water reuse can be designed effectively and in a targeted way. Besides, the working effect of these policies can be simulated through computer simulation technology. Finally, the conclusions of theoretical research can be used to guide the promotion of recycled water reuse in real life.

Figure 2: Architecture and technology implementation of the cloud computation platform.

Figure 3: The overall framework of HDFS system structure.

Figure 5: Structure diagram of the BP neural network.

4.1. Bisecting k-Means Clustering Algorithm (from Top to Bottom). The bisecting k-means clustering algorithm is a variation of the k-means clustering algorithm. It is mainly used to reduce the uncertainty of the clustering result caused by the random selection of the initial centroids in the k-means algorithm. The bisecting k-means algorithm is less affected by the random selection of the initial centroids.

Figure 8: Cluster diagram of 10 indexes of recycled water.

Figure 9: Evaluation index training data forecast.

Table 1: Evaluation level of the public's recycled water cognition.

Table 2: Weights of the multiple indicators of public cognition.