Research on FCM and NHL Based High Order Mining Driven by Big Data

To obtain high order evaluations and correlation degrees among big data characterized by multiple dimensions and multiple granularities, a high order mining algorithm based on the fuzzy cognitive map (FCM) and nonlinear Hebbian learning (NHL) and driven by big data is proposed, which is a kind of machine learning based on qualitative knowledge. The algorithm is applied to scientific and technical talent forecast. Driven by the big data of the scientific research tracks of scientific and technical talents, an index system is designed and the big data are automatically acquired and processed. The high order evaluations at the dimension level and the target level are then inferred by mining the correlation weights. The outstanding young talents in the material field in 2014 have been actively recommended to the review department for decision-making.


Introduction
Big data tend to be presented in multidimensional and multigranular form. Two problems have become urgent: how to obtain the correlation degree between concepts with different granularities, and how to obtain the high order value of an abstract concept with high granularity. These are the topics studied in this paper.
The FCM [1], introduced by Kosko in 1986 by suggesting the use of fuzzy causal functions taking values in [−1, 1] in concept maps, is well suited to building models from data resources. It has the appealing properties of knowledge representation, fuzzy logic, and fuzzy inference.
As the relationships between data have grown more complex, "high" FCMs have been proposed for modeling complex problems. At present, these mainly include the aggregation fuzzy cognitive map [2,3], the hierarchical fuzzy cognitive map [4,5], and the quotient fuzzy cognitive map [6,7]. However, they have been studied only from the perspective of the construction, combination, and decomposition of fuzzy cognitive maps. The FCM has also been studied in terms of the relationships between concepts. Literature [8] uses ordered weighted averaging (OWA) and weighted OWA (WOWA) relational operators instead of the threshold function and weighted sum for and/or relations. Literature [9] establishes a rule based FCM by introducing related nodes for and/or relationships. However, in these works the related concept nodes have the same granularity; that is, the relationships are first order.
There are two main approaches to obtaining an FCM model: (1) manual methods, carried out by experts who have knowledge of both FCMs and the application domain, and (2) automated methods, which use learning algorithms to establish models from historical data and have been successfully applied in various fields [10][11][12][13] such as society, engineering, medicine, and environmental science. Among automated methods, the Hebbian based algorithms use an unsupervised learning mechanism; one has been applied to weight mining in an industrial process control problem [14] by learning a single instance toward target values. AHL is also an unsupervised learning algorithm, but it requires experts to select in advance the activation and activated concepts and the sequence of activation, and it has been applied only to tumor grading [15] with 100 cases.
However, none of these methods solves the high order evaluation and correlation mining of big data with multidimensional and multigranular relationships. Accordingly, researching the high order relationships between concepts with different granularities, driven by big data, for the representation of abstract knowledge is an inevitable trend in the development of FCM and data mining. The remainder of the paper is organized as follows. Section 2 describes the background of FCM. Section 3 proposes NHL based high order mining of FCM. In Section 4, the method is applied to scientific and technical talent forecast based on big data. Finally, Section 5 briefly concludes the paper.

Background of FCM and Mining Algorithms
(4) $f$ is a transformation function, which defines a recurrence relation for $t \geq 0$ between $A(t+1)$ and $A(t)$.
The transformation function may be a bivalent function, a trivalent function, or a logistic function. The state value of $A_i$ at step $t+1$ can be deduced by a sigmoid function $f(\cdot)$ as shown in (1). The state value of a concept node at iteration $t+1$ is a transformation function of the states at the previous time of the other concepts directly associated with the node and of the weight matrix of the FCM:

$$A_i(t+1) = f\Big(\sum_{j \neq i} w_{ji}\, A_j(t)\Big). \tag{1}$$

Weight matrix of the FCM:

$$W = \begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{pmatrix}. \tag{2}$$

FCM Mining Algorithms.
FCM mining algorithms are a kind of automated machine learning that establishes models from data resources. There are two main classes of FCM learning algorithms [14][15][16][17][18][19]: Hebbian based learning and evolutionary learning. The evolutionary algorithms include GS (genetic strategy), PSO (particle swarm optimization), SA (simulated annealing), and RCGA (real coded genetic algorithm). They can be used to simulate an FCM model based on time sequence data, which does not fit high order FCM mining.
The former class, Hebbian based algorithms, mainly includes DHL (differential Hebbian learning), BDA (balanced differential algorithm), NHL (nonlinear Hebbian learning), DD-NHL (data-driven nonlinear Hebbian learning), and AHL (active Hebbian learning). These algorithms differ in the way they adjust the edge weights. Hebbian based algorithms use an unsupervised learning mechanism, which adjusts the weight values by machine learning according to an objective function for the specific problem.
For example, in literature [14] a weight adaptation method based on NHL for FCM learning is presented to solve an industrial process control problem. It proposes two termination functions $F_1$ and $F_2$, shown in (3). $F_1$ minimizes the Euclidean distance between the output concepts $\mathrm{DOC}_j$ and the mean target values $T_j$ of the output concepts. $F_2$ minimizes the variation between two subsequent values of an output concept. This NHL method is for a single sample and multiple targets:

$$F_1 = \sqrt{\sum_j \big(\mathrm{DOC}_j - T_j\big)^2}, \qquad F_2 = \big|\mathrm{DOC}_j(t+1) - \mathrm{DOC}_j(t)\big|. \tag{3}$$

The NHL method proposed in the literature does not suit the multisample, single-target problem that high order mining must solve. Thus, a high order mining algorithm of FCM is proposed, which obtains the high order concept value (the target value) and the correlation weights between concepts of different granularity.
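The two termination criteria described above can be sketched as follows. This is a minimal illustration, not the implementation from literature [14]: the function names and the tolerance `eps` are hypothetical, and the stopping test simply takes the largest change over all output concepts.

```python
import math

def f1_distance(doc, targets):
    """Euclidean distance between output concept values and their mean target values."""
    return math.sqrt(sum((d - t) ** 2 for d, t in zip(doc, targets)))

def f2_variation(doc_prev, doc_next):
    """Largest change of any output concept between two subsequent iterations."""
    return max(abs(a - b) for a, b in zip(doc_prev, doc_next))

def should_stop(doc_prev, doc_next, eps=0.002):
    """Stop once the output concepts have stabilized (eps is an assumed tolerance)."""
    return f2_variation(doc_prev, doc_next) < eps

# Example: two output concepts, one already on target.
distance = f1_distance([0.7, 0.9], [0.7, 0.5])
```

Both criteria refer to the output concepts of a single sample, which is why this scheme does not carry over directly to the multisample, single-target setting.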

High Order NHL Mining Algorithm of FCM
To obtain the optimal high order output and correlation weights, NHL is extended to a high order NHL of FCM for solving problems with multiple dimensions and multiple granularities. The high order concept of the FCM in Figure 2 is a coarse-grained node, which has a nonlinear output. The low order concepts of the FCM in Figure 2 are fine-grained nodes, which serve as the input concepts of big data with massive samples.
The high order value of $A_i$ in the FCM can be seen as a nonlinear evaluation of low order values. High order NHL can be seen as a kind of accurate clustering by nonlinear weight learning. To improve NHL clustering, the cumulative logistic distribution ($\tanh(\lambda x)$ with $\lambda = 1$, similar to the cumulative normal distribution) is selected as the activation (transformation) function $f$, according to literature [20]:

$$f(x) = \tanh(\lambda x), \quad \lambda = 1, \tag{4}$$

where $x = \sum_{j \neq i} A_j(t)\, w_{ji}$ is the independent variable of $A_i$. According to the principle of NHL, the objective function for multiple cases and a single target is proposed as in (5). It maximizes the mathematical expectation of the sum of $(A_i^k)^2$, where $k$ indicates the $k$th case:

$$\max\; E\Big[\sum_k \big(A_i^k\big)^2\Big]. \tag{5}$$

In addition, a constraint is necessary to stabilize the learning rule, which yields the nonlinear Hebbian learning rule (6) applied to each instance, where $\eta$ is a very small positive factor known as the learning rate parameter. Accordingly, the high order NHL is presented in Algorithm 1.
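As a rough illustration of the idea, the sketch below learns the weights from many samples toward a single high order output. It is not Algorithm 1 itself. It assumes an Oja-type nonlinear Hebbian update, $\Delta w_j = \eta\, y\,(x_j - y\, w_j)$ with $y = \tanh(\sum_j w_j x_j)$, whose decay term plays the role of the stabilizing constraint. The function names, the learning rate, the stopping tolerance, and the synthetic data are all hypothetical.

```python
import math
import random

def high_order_value(x, w):
    """Nonlinear output of the high order concept for one sample of low order values."""
    return math.tanh(sum(wj * xj for wj, xj in zip(w, x)))

def objective(samples, w):
    """Sample mean of the squared high order output (estimate of the expectation)."""
    return sum(high_order_value(x, w) ** 2 for x in samples) / len(samples)

def nhl_epoch(samples, w, eta):
    """One pass over all samples with an assumed Oja-type nonlinear Hebbian update."""
    for x in samples:
        y = high_order_value(x, w)
        w = [wj + eta * y * (xj - y * wj) for wj, xj in zip(w, x)]
    return w

def high_order_nhl(samples, w0, eta=0.01, max_epochs=200, tol=1e-6):
    """Start from expert weights w0 and iterate until the objective stops changing."""
    w = list(w0)
    prev = objective(samples, w)
    for _ in range(max_epochs):
        w = nhl_epoch(samples, w, eta)
        cur = objective(samples, w)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return w

# Synthetic stand-in for normalized index data: 200 samples, 4 low order indexes.
rng = random.Random(0)
samples = [[rng.random() for _ in range(4)] for _ in range(200)]
w0 = [0.2, 0.2, 0.2, 0.2]  # stands in for initial weights given by experts
w = high_order_nhl(samples, w0)
```

Starting from the expert weights rather than random values keeps the learned weights interpretable as a refinement of qualitative knowledge.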

Application in Scientific and Technical Talent Forecast Driven by Big Data
The basic information of talents is from the Chinese high-level talent database; paper information is from the Chinese high-level talent database, the Web of Knowledge, and the Journal Citation Reports; patent information is from the Chinese high-level talent database and the patent information service platform; project information is from the scientific and technical project and achievement transformation database; awards and honors information is from the Chinese high-level talent database and network information resources.
There are 196 winners of the national science foundation for distinguished young scholars in the material field from 1987 to 2009, who are selected as the scientific and technical talents studied in this paper. They have 49784 high-level published papers, 976 applied patents (mainly invention patents), 1456 approved projects, and 685 awards in total, which can be seen as the scientific research tracks of the talents.

Index System of Scientific and Technical Talent Forecast.
The basic information of talents is mainly used to search for talents. The indexes come from the other five dimensions. The index system is determined by domain experts and is shown in Table 1.

Prizes: the talent rankings; the level of prize; the grade of prize; the field correlation.
* refers to an index that is not considered in the application.
The index system has three levels. The first level is the target level, the second level is the dimension level, and the third level is the data level.
Most of the index data at the data level need to be automatically computed from the basic big data resource of scientific and technical talents. In the index system, the 49784 high-level published papers are all indexed by SCI in the form of articles, so the index of paper and the type of magazine are both ignored. The list of talent in projects and prizes is also ignored in the application, because the talents are listed first in all of the 1456 approved projects and the 685 awards.

Automatic Acquisition and Processing of Big Data.
The indexes that need to be automatically computed are the number of paper cited, the talent rankings, the field correlation, the type of patent, the state of patent, the level of project, the rate of project, and the level of prize.
The number of paper cited is the total number of citations of a paper from its publication time to the award-winning time of the talent.
The talent rankings is the rank of CnName (the name of the talent) among the authors of papers, inventors of patents, or members of projects, which needs to be extracted by the segmentation symbols in these fields.
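The ranking extraction can be sketched as a split on the segmentation symbols followed by a lookup. The separator set below (ASCII and full-width commas and semicolons, plus the enumeration comma) is an assumption, as are the function name and the sample strings; only CnName comes from the text.

```python
import re

# Assumed segmentation symbols found in author, inventor, and member fields.
SEPARATORS = r"[;,，；、]"

def talent_ranking(name_field, cn_name):
    """Return the 1-based position of cn_name in a delimited name list, or None."""
    names = [n.strip() for n in re.split(SEPARATORS, name_field) if n.strip()]
    for position, name in enumerate(names, start=1):
        if name == cn_name:
            return position
    return None

rank = talent_ranking("Li Si; Zhang San; Wang Wu", "Zhang San")
```

Exact string matching is the simplest choice; real data would likely need name normalization before comparison.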
The field correlation is the correlation of the fields of papers with the subject of the talent, of the MainIPC of patents with the subject of the talent, and of the subject of projects with the subject of the talent.
The type of patent is derived from the patent application code. Read from left to right, a Chinese patent application code begins with two letters (CN) representing the country, followed by 2 or 4 digits indicating the year of application (2 digits before 2004 and 4 digits from 2004 onward), then one digit representing the type of patent, and finally the serial number. The type digit is defined as follows: 1 means an invention patent, 2 a utility model, 3 a design patent, 8 a PCT invention patent entering China, and 9 a PCT utility model entering China. The computing rule of the patent type is shown in Figure 3.
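A possible form of this rule is sketched below. It is not a transcription of Figure 3. It assumes that the two year formats can be told apart by length (8 digits for the 2-digit-year format, 12 digits for the 4-digit-year format), that an optional check digit follows a dot, and that 2-digit years from 85 upward belong to the 1900s. The function name and sample codes are hypothetical.

```python
import re

PATENT_TYPES = {
    "1": "invention",
    "2": "utility model",
    "3": "design",
    "8": "PCT invention",
    "9": "PCT utility model",
}

def parse_application_code(code):
    """Return (application year, patent type) from a Chinese application code."""
    match = re.fullmatch(r"CN(\d{8}|\d{12})(?:\.[0-9X])?", code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized application code: {code!r}")
    digits = match.group(1)
    if len(digits) == 8:
        # Assumed old format: 2-digit year, then the type digit.
        yy = int(digits[:2])
        year = 1900 + yy if yy >= 85 else 2000 + yy
        type_digit = digits[2]
    else:
        # Assumed new format: 4-digit year, then the type digit.
        year = int(digits[:4])
        type_digit = digits[4]
    return year, PATENT_TYPES.get(type_digit, "unknown")

parsed = parse_application_code("CN200410012345.6")
```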
The state of patent is the applied, published, or authorized state at the time of the award. The state value is derived from the Applydate, Pubdate, and Authorizedate fields of patents. The rate of project indicates whether the project is in the applied, executing, or final stage at the time of the award, according to the Applydate, Startdate, and Enddate fields of projects.
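Both indexes reduce to finding the latest milestone reached by the award date. The sketch below assumes the date fields are already parsed into `date` objects and that a missing field means the milestone has not happened. The function names and stage labels are illustrative.

```python
from datetime import date

def stage_at(award_date, milestones):
    """Return the label of the last milestone dated on or before award_date.

    milestones is a list of (label, date or None) in chronological order.
    """
    current = None
    for label, when in milestones:
        if when is not None and when <= award_date:
            current = label
    return current

def patent_state(award_date, applydate, pubdate=None, authorizedate=None):
    return stage_at(award_date, [
        ("applied", applydate),
        ("public", pubdate),
        ("authorized", authorizedate),
    ])

def project_stage(award_date, applydate, startdate=None, enddate=None):
    return stage_at(award_date, [
        ("applied", applydate),
        ("execute", startdate),
        ("final", enddate),
    ])

award = date(2009, 12, 31)
state = patent_state(award, date(2007, 3, 1), date(2008, 9, 1), date(2010, 5, 1))
```

A patent authorized only after the award date is therefore counted in its published state, which matches the "at the time of the award" condition.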
The level of project is determined by the Plancode field of projects. The level of prize is determined by the Prizetype field of prizes.

High Order Mining of FCM.
The target level index and the dimension level indexes both belong to high order knowledge, a kind of evaluation value that must be mined from big data resources of different dimensions. The mining method can be seen as an integration of clustering and association rule mining: the clustering obtains an accurate value of the high order knowledge from the weights of the low level indexes, and the weight relationship is itself a kind of association rule.
Scientific and technical talent forecast is a complex system that needs a method integrating expert knowledge with quantitative computing. High order mining based on NHL and FCM is a machine learning optimization driven by big data, starting from initial weights given by experts, which fits scientific and technical talent forecast well. The procedure of the system is shown in Figure 4.
The 196 distinguished young scholars from 1987 to 2009, with their 49784 high-level published papers, 976 applied patents (mainly invention patents), 1456 approved projects, and 685 awards, are the basic big data resource. Based on the big data and the high order NHL algorithm of FCM, the weights of the indexes need two-level learning because the index system has two levels of granularity. First, the optimal weights of the data level indexes are acquired by high order NHL mining driven by big data and are shown in Tables 2, 3, 4, and 5. The high order evaluation at the dimension level of each outstanding young scholar can then be computed from these weights and the data resource. The comprehensive value in each dimension is the average of the top five values in that dimension.
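The dimension level evaluation can be sketched as follows. The sketch assumes that each item (for example, one paper) is scored by applying tanh to the weighted sum of its normalized index values, and that a talent with fewer than five items is averaged over the items available. The function names and sample values are hypothetical.

```python
import heapq
import math

def item_value(indexes, weights):
    """High order value of one item from its data level index values (assumed tanh output)."""
    return math.tanh(sum(w * x for w, x in zip(weights, indexes)))

def top_five_average(values, k=5):
    """Average of the k largest values; uses all values when fewer than k exist."""
    top = heapq.nlargest(k, values)
    return sum(top) / len(top) if top else 0.0

def dimension_evaluation(items, weights):
    """Comprehensive value of one dimension for one talent."""
    return top_five_average(item_value(x, weights) for x in items)

score = top_five_average([0.9, 0.1, 0.8, 0.7, 0.6, 0.5, 0.2])
```

Averaging only the top five items rewards a talent's best work rather than sheer volume.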
Second, the optimal weights of the dimension level indexes are learned by high order NHL mining based on these high order evaluations. The weights at the dimension level are shown in Table 6. Accordingly, the high order evaluations of the 196 outstanding talents can be acquired by the average of the top five values, and they fall within an evaluation interval.
The accuracy validated on the forecast of outstanding young talents from 2010 to 2013 is up to 85.9% within the evaluation interval. Accordingly, a scientific and technical talent forecast platform is implemented. And scientific and technical talents

Conclusions
Big data mining can not only implement resource sharing and reasonable allocation but also promote the fairness and justice of society. The scientific and technical talent forecast is implemented as a system, which promotes the fairness and justice of scientific and technical talent management and evaluation to some extent. Its big data resource involves talent basic information, paper information, patent information, project information, awards and honors information, and so forth, drawn from various databases, platforms, and network resources. Its method is based on the NHL high order mining algorithm of FCM, an intelligent tool that is well suited to obtaining models from big data resources. The scientific and technical talent forecast system has changed the existing passive and qualitative approach into an active and open talent recommendation.

Figure 1: A simple fuzzy cognitive map.

Scientific and Technical Talent Big Data.
Scientific and technical talent forecast evaluates and finds scientific and technical talents directly from the big data of the scientific research tracks of talents. It is an important means of promoting the development and management of scientific and technical talent resources and a key task in evaluating scientific and technical talents scientifically and objectively. The big data of the scientific research tracks comprise multiple dimensions of scientific and technical talents, including talent basic information, paper information, patent information, project information, and awards and honors information, which come from the Chinese high-level talent database [21], the Web of Knowledge and Journal Citation Reports [22], the patent information service platform [23], the scientific and technical project and achievement transformation base [24], and network information resources.

Figure 3: The computing rule of patent type.
(1) $C = \{C_1, C_2, \ldots, C_n\}$ is the set of $n$ concepts forming the nodes of the graph.
(2) $W = \{w_{ij} \mid w_{ij}$ is the weight value of the interconnection $\langle C_i, C_j \rangle\}$. $W$ is an $n \times n$ square matrix showing the association weight map between concepts, as in (2). Each $w_{ij}$ belongs to the interval $[-1, 1]$ and represents the fuzzy causal degree between concepts $C_i$ and $C_j$.
(i) If there is positive causality between the concepts $C_i$ and $C_j$, then $w_{ij} > 0$: an increase of the value of concept $C_i$ will cause an increase of the value of concept $C_j$, and a decrease of the value of $C_i$ will lead to a decrease of the value of $C_j$.
(3) $A: C_i \to A_i(t)$ is a function that associates each concept $C_i$ with the sequence of its activation degrees; for $t \in \mathbb{N}$, $A_i(t) \in L$ gives its activation degree at the moment $t$. $A(0) \in L^n$ is the initial vector and specifies the initial values of all concept nodes, and $A(t) \in L^n$ is the state vector at iteration $t$. The state of the FCM at step $t$ is the state vector $A(t) = \{A_1(t), A_2(t), \ldots, A_n(t)\}$.
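The definitions above can be illustrated with a short simulation in which each concept's next value is the transformation function applied to the weighted sum of the other concepts' current values. This is a minimal sketch: the three-concept weight matrix, the initial state, the logistic transformation with steepness `lam`, and the stopping tolerance are all made-up assumptions.

```python
import math

def sigmoid(x, lam=1.0):
    """Logistic transformation function (one of the options named in the text)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(state, weights, f=sigmoid):
    """One synchronous update; weights[j][i] is the causal weight from concept j to concept i."""
    n = len(state)
    return [
        f(sum(weights[j][i] * state[j] for j in range(n) if j != i))
        for i in range(n)
    ]

def fcm_run(state, weights, steps=50, tol=1e-6):
    """Iterate until the state vector stops changing or the step limit is reached."""
    for _ in range(steps):
        nxt = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state

# Hypothetical map: concept 0 promotes 1, concept 1 inhibits 2, concept 2 promotes 0.
W = [
    [0.0, 0.6, 0.0],
    [0.0, 0.0, -0.4],
    [0.5, 0.0, 0.0],
]
A0 = [0.5, 0.5, 0.5]
A1 = fcm_step(A0, W)
A_final = fcm_run(A0, W)
```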

Table 1: The index system of scientific and technical talent forecast.

Table 2: The weights of indexes in papers.

Table 3: The weights of indexes in patents.

Table 4: The weights of indexes in projects.

Table 5: The weights of indexes in prizes.

Table 6: The weights of indexes in dimension level.

outstanding young winner in material field in 2014, where the top 15 talents are actively recommended to the review department for decision-making of expert selection.