To obtain the high order evaluation and correlation degree of big data that is multidimensional and multigranular, we propose a high order mining algorithm driven by big data and based on the fuzzy cognitive map (FCM) and nonlinear Hebbian learning (NHL), a kind of machine learning built on qualitative knowledge. The algorithm is applied to scientific and technical talent forecast. Driven by big data on the scientific research tracks of scientific and technical talents, an index system is designed, and the data are acquired and processed automatically. The high order evaluations at the dimension level and the target level are then inferred by mining the correlation weights. Finally, the outstanding young talents in the material field in 2014 were actively recommended to the review department to support decision-making.
Big data tend to be multidimensional and multigranular. Two problems are therefore urgent: how to obtain the correlation degree between concepts of different granularities, and how to obtain the high order value of an abstract concept with high granularity. These are the topics studied in this paper.
FCM [
As the relationships between data become more complex, a “high” FCM has been proposed for modeling complex problems. At present, the FCM mainly includes the aggregation fuzzy cognitive map [
On mining method, there are two main approaches to get FCM model:
However, none of them solves high order evaluation and correlation mining for big data with multidimensional and multigranular relationships. Researching the high order relationship between concepts of different granularities, driven by big data and aimed at representing abstract knowledge, has therefore become an inevitable trend in the development of FCM and data mining.
Therefore, we have the following researches. Section
A fuzzy cognitive map
A simple fuzzy cognitive map.
If there is positive causality between the concepts If there is inverse causality between the concepts, If there is no relationship between two concepts,
The transformation function may be bivalent function, trivalent function, or logistic function. The state value of
FCM mining algorithms are a kind of automated machine learning that builds a model from data resources. There are mainly two classes of FCM learning algorithms [
The evolution-based algorithms comprise GS (genetic strategy), PSO (particle swarm optimization), SA (simulated annealing), and RCGA (real coded genetic algorithm). They can be used to fit an FCM model to time-series data, which does not suit high order FCM mining.
The Hebbian-based algorithms mainly include DHL (differential Hebbian learning), BDA (balanced differential algorithm), NHL (nonlinear Hebbian learning), DD-NHL (data-driven nonlinear Hebbian learning), and AHL (active Hebbian learning). These algorithms differ in how they adjust the edge weights. Hebbian-based algorithms use an unsupervised learning mechanism that adjusts the weight values according to an objective function for the specific problem.
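To make the Hebbian mechanism concrete, the following is a minimal sketch of one FCM inference step with a logistic transformation function, followed by an Oja-type Hebbian weight adjustment of the kind that NHL variants build on. It is a generic illustration, not the algorithm of this paper: the function names, the learning rate `eta`, the steepness `lam`, the omission of a self-memory term, and the clipping of weights to [-1, 1] are all assumptions.

```python
import math


def logistic(x, lam=1.0):
    """Logistic transformation function; lam (assumed) controls the steepness."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_step(state, weights, lam=1.0):
    """One inference step: each concept aggregates its weighted causes.

    weights[j][i] is the edge weight from concept j to concept i.
    The self-memory term is omitted here (an assumption of this sketch).
    """
    n = len(state)
    return [
        logistic(sum(weights[j][i] * state[j] for j in range(n) if j != i), lam)
        for i in range(n)
    ]


def hebbian_update(state, weights, eta=0.05):
    """Oja-type update of existing edges: dw_ji = eta * A_i * (A_j - w_ji * A_i).

    Only nonzero edges are adjusted, so the expert-given structure is kept.
    Weights are clipped to [-1, 1] (an assumption of this sketch).
    """
    n = len(state)
    new_weights = [row[:] for row in weights]
    for j in range(n):
        for i in range(n):
            if i != j and weights[j][i] != 0.0:
                delta = eta * state[i] * (state[j] - weights[j][i] * state[i])
                new_weights[j][i] = max(-1.0, min(1.0, weights[j][i] + delta))
    return new_weights
```

In an unsupervised loop, `fcm_step` and `hebbian_update` would alternate until the weights or the monitored concept values stop changing; the stopping criterion depends on the objective function chosen for the specific problem.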
For example, in literature [
The NHL method proposed in the literature does not suit problems with multiple samples and a single target, which is the case that high order mining must solve. A high order mining algorithm of FCM is therefore proposed; it yields the high order concept value (or target value) and the correlation weights between concepts of different granularities.
To obtain the optimal high order output and correlation weights, NHL is extended to a high order NHL of FCM for problems with multiple dimensions and granularities. High order concept of FCM in Figure
FCM with different granularity nodes.
The high order value of
Accordingly, the high order NHL is presented in Algorithm
Scientific and technical talent forecast evaluates and identifies scientific and technical talents directly from big data on their scientific research tracks. It is an important means of promoting the development and management of scientific and technical talent resources, and a key task in evaluating such talents scientifically and objectively.
The big data on the scientific research tracks of talents comprise multiple dimensions of scientific and technical talents, including talent basic information, paper information, patent information, project information, and awards and honors information, which are from Chinese high-level talent database [
The basic information of talent is from the Chinese high-level talent database; paper information is from the Chinese high-level talent database, the Web of Knowledge, and the Journal Citation Reports; patent information is from the Chinese high-level talent database and the patent information service platform; project information is from the scientific and technical project and achievement transformation database; awards and honors information is from the Chinese high-level talent database and network information resources.
The 196 winners of the national science foundation for distinguished young scholars in the material field from 1987 to 2009 are selected as the scientific and technical talents studied in this paper. In total they have 49784 high-level published papers, 976 applied patents (mainly patents of invention), 1456 approved projects, and 685 awards, which are treated as the scientific research tracks of the talents.
The basic information of talent is mainly used to search for talents. The indexes are drawn from the other four dimensions. The index system is determined by domain experts and shown in Table
The index system of scientific and technical talent forecast.
| Target level | Dimension level index | Data level index |
|---|---|---|
| Scientific and technical talent forecast | Papers | Talent rankings |
| | | The index of paper* |
| | | The number of paper cited |
| | | The type of magazine* |
| | | The field correlation |
| | | The influence factor |
| | Patents | The talent rankings |
| | | The type of patent |
| | | The state of patent |
| | | The field correlation |
| | Projects | The talent rankings |
| | | The level of project |
| | | The funds of project |
| | | The rate of project |
| | | The field correlation |
| | Prizes | The talent rankings |
| | | The level of prize |
| | | The grade of prize |
| | | The field correlation |
The index system has three levels: the target level, the dimension level, and the data level.
Most index data at the data level must be computed automatically from the basic big data resource on scientific and technical talents. In the index system, all 49784 high-level published papers are indexed by SCI as articles, so the index of paper and the type of magazine are both ignored. The talent rankings in projects and prizes are also ignored in this application, because the talent is ranked first in all 1456 approved projects and all 685 awards.
The indexes that need to be computed automatically are the number of paper cited, the talent rankings, the field correlation, the type of patent, the state of patent, the level of project, the rate of project, and the level of prize.
The number of paper cited is the sum of the citation counts of the papers, counted from the publication time of each paper to the award-winning time of the talent.
The talent rankings index is the position of CnName (the name of the talent) among the authors of papers, inventors of patents, or members of projects; it is extracted by splitting these fields on their segmentation symbols.
The field correlation is the correlation of the fields of papers, the MainIPC of patents, or the subject of projects with the subject of the talent.
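The extraction of the talent rankings can be sketched as follows. This is only an illustration: the function name, the set of segmentation symbols, and the choice to return `None` when the name is absent are assumptions, and real author fields may need extra normalization (for example, names written as "Last, First" would be split incorrectly by the comma).

```python
import re


def talent_ranking(name_field, cn_name, separators=";；,，、"):
    """Return the 1-based position of cn_name in a delimited name field.

    name_field: raw string of authors, inventors, or project members.
    separators: characters treated as segmentation symbols (assumed set).
    Returns None when cn_name does not occur in the field.
    """
    pattern = "[" + re.escape(separators) + "]"
    names = [part.strip() for part in re.split(pattern, name_field) if part.strip()]
    for position, name in enumerate(names, start=1):
        if name == cn_name:
            return position
    return None
```

For example, `talent_ranking("Zhang San; Li Si; Wang Wu", "Li Si")` returns the second position, because the field is split on the semicolons and each part is trimmed before comparison.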
The type of patent originates from patent application code. In accordance with the order from left to right, first letters in patent application code of China are 2 English alphabets (CN) representing the country and then 2–4 digits indicating the year of patent application (<2004 is 2 digits and
The computing rule of patent type.
The state of patent is the applied, public, or authorized state at the time of the award. The state value comes from the Applydate, Pubdate, and Authorizedate fields of patents. The rate of project indicates whether the project is in the applied, execute, or final stage at the time of the award, according to the Applydate, Startdate, and Enddate fields of projects.
The level of project is determined by the Plancode field of projects. The level of prize is determined by the Prizetype field of prizes.
Both the target level index and the dimension level index are high order knowledge, that is, evaluation values that must be mined from the big data resources of the different dimensions. The mining method can be seen as an integration of clustering and association rules: the clustering obtains an accurate value of the high order knowledge from the weights of the lower-level indexes, and the weight relationships themselves are a kind of association rule.
Scientific and technical talent forecast is a complex system that calls for a method integrating expert knowledge with quantitative computing. High order mining based on NHL and FCM is a machine learning optimization driven by big data and starting from initial weights given by experts, which makes it well suited to scientific and technical talent forecast. The procedure of the system is shown in Figure
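The date comparison behind the state of patent and the rate of project can be sketched as below. The field names Applydate, Pubdate, Authorizedate, Startdate, and Enddate come from the text; the function names, the returned labels, the treatment of missing dates as `None`, and the inclusive comparison with the award date are assumptions of this sketch.

```python
from datetime import date


def patent_state(award_date, apply_date, pub_date=None, authorize_date=None):
    """State of a patent at the time of the award.

    The most advanced state reached on or before award_date wins.
    Returns None if the patent had not yet been applied for.
    """
    if authorize_date is not None and authorize_date <= award_date:
        return "authorized"
    if pub_date is not None and pub_date <= award_date:
        return "public"
    if apply_date <= award_date:
        return "applied"
    return None


def project_stage(award_date, apply_date, start_date=None, end_date=None):
    """Stage of a project (applied, execute, or final) at the time of the award."""
    if end_date is not None and end_date <= award_date:
        return "final"
    if start_date is not None and start_date <= award_date:
        return "execute"
    if apply_date <= award_date:
        return "applied"
    return None
```

Checking the latest milestone first keeps the logic short: a patent that was authorized before the award is reported as authorized even though it was also applied for and published earlier.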
The procedure of scientific and technical talents high order mining system.
The basic big data resource consists of the 196 distinguished young scholars from 1987 to 2009, with their 49784 high-level published papers, 976 applied patents (mainly patents of invention), 1456 approved projects, and 685 awards. Based on this data and the high order NHL algorithm of FCM, the weights of the indexes are learned in two stages, because the index system has two levels of granularity.
First, the optimal weights of the data level indexes are acquired by high order NHL mining driven by big data and shown in Tables
The weights of indexes in papers.
| Dimension level index | Data level index | Weight of data level index |
|---|---|---|
| Papers | Talent rankings | 0.215 |
| | The number of paper cited | 0.560 |
| | The field correlation | 0.012 |
| | The influence factor | 0.213 |
The weights of indexes in patents.
| Dimension level index | Data level index | Weight of data level index |
|---|---|---|
| Patents | Talent rankings | 0.312 |
| | The type of patent | 0.040 |
| | The state of patent | 0.351 |
| | The field correlation | 0.297 |
The weights of indexes in projects.
| Dimension level index | Data level index | Weight of data level index |
|---|---|---|
| Projects | The level of project | 0.529 |
| | The funds of project | 0.316 |
| | The rate of project | 0.102 |
| | The field correlation | 0.053 |
The weights of indexes in prizes.
| Dimension level index | Data level index | Weight of data level index |
|---|---|---|
| Prizes | The level of prize | 0.380 |
| | The grade of prize | 0.497 |
| | The field correlation | 0.123 |
Second, the optimal weights of the dimension level indexes are learned by high order NHL mining based on these high order evaluations. The weights at the dimension level are shown in Table
The weights of indexes in dimension level.
| Target level | Dimension level index | Weight of dimension level index |
|---|---|---|
| Scientific and technical talent | Papers | 0.549 |
| | Patents | 0.216 |
| | Projects | 0.177 |
| | Awards | 0.058 |
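The two-level use of these weights can be illustrated with a short sketch. It assumes that every data level index has already been scaled to [0, 1] and that a high order value is a plain weighted sum of its lower-level values; both are assumptions made for illustration, since the high order formula of the algorithm may also apply a transformation function. The dictionary keys and function names are hypothetical; the weights are copied from the tables above.

```python
# Data level weights, copied from the four data level tables.
DATA_LEVEL_WEIGHTS = {
    "papers": {"talent_rankings": 0.215, "number_cited": 0.560,
               "field_correlation": 0.012, "influence_factor": 0.213},
    "patents": {"talent_rankings": 0.312, "type": 0.040,
                "state": 0.351, "field_correlation": 0.297},
    "projects": {"level": 0.529, "funds": 0.316,
                 "rate": 0.102, "field_correlation": 0.053},
    "prizes": {"level": 0.380, "grade": 0.497, "field_correlation": 0.123},
}

# Dimension level weights ("prizes" is labeled "Awards" in the dimension table).
DIMENSION_WEIGHTS = {"papers": 0.549, "patents": 0.216,
                     "projects": 0.177, "prizes": 0.058}


def weighted_sum(values, weights):
    """Weighted sum over the keys of weights (assumed aggregation rule)."""
    return sum(weight * values[key] for key, weight in weights.items())


def talent_score(index_values):
    """Aggregate scaled data level values into dimension and target scores.

    index_values: {dimension: {index_name: value in [0, 1]}}.
    Returns (target_score, dimension_scores).
    """
    dimension_scores = {
        dim: weighted_sum(index_values[dim], weights)
        for dim, weights in DATA_LEVEL_WEIGHTS.items()
    }
    return weighted_sum(dimension_scores, DIMENSION_WEIGHTS), dimension_scores
```

Each group of weights in the tables sums to 1.000, so under these assumptions a talent whose indexes all equal 1 receives a target score of 1, and the scores of different talents are directly comparable.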
Validated on the forecast of outstanding young talents from 2010 to 2013, the accuracy reaches up to 85.9% within the evaluation interval. A scientific and technical talent forecast platform is implemented accordingly. Scientific and technical talents are then forecast for the outstanding young winners in the material field in 2014, and the top 15 talents are actively recommended to the review department to support decisions on expert selection.
Big data mining can not only implement resource sharing and reasonable allocation but also promote fairness and justice in society. The scientific and technical talent forecast is implemented as a system, which promotes fairness and justice in the management and evaluation of scientific and technical talents to some extent. Its big data resource involves talent basic information, paper information, patent information, project information, awards and honors information, and so forth, drawn from various databases, platforms, and networks. Its method is based on the NHL high order mining algorithm of FCM, an intelligent tool that is well suited to building models from big data resources. The scientific and technical talent forecast system has changed the existing passive and qualitative approach into an active and open form of talent recommendation.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research was supported by the Postdoctoral Science Foundation of China (no. 2014M550793), the Provincial Natural Science Foundation (no. F2014508028 and no. 2012-Z-932Q), and the National Natural Science Foundation (no. 61175048).