Name Disambiguation Based on Graph Convolutional Network

Recently, massive online academic resources have provided convenience for scientific study and research. However, author name ambiguity degrades the user experience when retrieving literature from these databases. Extracting the features of papers and calculating their similarity for clustering constitute the mainstream of present name disambiguation approaches, which can be divided into two branches: clustering based on attribute features and clustering based on linkage information. Neither, however, achieves high performance on its own. In order to improve the efficiency of literature retrieval and provide technical support for the accurate construction of literature databases, a name disambiguation method based on Graph Convolutional Network (GCN) is proposed. The GCN-based disambiguation model designed in this paper combines both attribute features and linkage information. We first build paper-to-paper graphs, coauthor graphs, and paper-to-author graphs for each reference item of a name. The nodes in the graphs carry attribute features and the edges carry linkage features. The graphs are then fed to a specialized GCN, which outputs a hybrid representation. Finally, we use the hierarchical clustering algorithm to divide the papers into disjoint clusters. The experimental results show that the proposed model achieves an average F1 value of 77.10% on three name disambiguation datasets. In order to let the model automatically select the appropriate number of convolutional layers and adapt to the structure of different local graphs, we improve upon the prior GCN model by utilizing an attention mechanism. Compared with the original GCN model, it increases the average precision and F1 value by 2.05% and 0.63%, respectively. What is more, we build a bilingual dataset, BAT, which contains various forms of academic achievements and will be an alternative in future research of name disambiguation.


Introduction
The development of the Internet allows users to conveniently and quickly obtain information from digital learning platforms. Most academic research activities use the Internet as a source to search and download academic resources in various databases. Under such circumstances, how to quickly and accurately filter the required content from the massive data becomes the key to improving the user experience. In literature database systems, using person names as keywords is a common search method. Researchers who have been working in a certain field for years often search for relevant studies and reviews in this field. However, the ambiguity of the name itself affects not only the query of information on the Internet but also the inquiry of literature in academic research. When a name is searched in a literature database, it returns a mixed presentation of all documents shared by this author name, because most of the original databases use a simple string-matching method. This may cause users to waste a lot of time browsing irrelevant content or to add more keywords in order to find the documents they want.
In the real world, the ambiguity of names is manifested in two aspects. On the one hand, various types of name reference items exist, such as pseudonyms and aliases. Besides, the forms of a name are not fixed, such as full names and partial abbreviations. On the other hand, the same name reference item can refer to multiple entities in the real world. We focus on the latter problem, which is also called author name disambiguation. With the rapid growth of academic resources, the problem of authors with the same name not only affects the efficiency of academic research and brings inconvenience or even misleading results to researchers, but also affects the construction and use of academic resource libraries. Therefore, author name disambiguation has attracted more attention from the research community in recent years.
Author name disambiguation is beneficial to accurate retrieval in the retrieval system. When the user enters the author name to be queried, we can first give the user a series of entity interfaces that share this name reference item. Each interface corresponds to an author entity in the real world. By discriminating the attributes of each entity, the user accesses the relevant literature of the entity he/she wants to query through the interface, which reduces the user's workload and improves the user's experience.
Author name disambiguation also helps improve the accuracy and completeness of the author's profile information. Name disambiguation technology can not only classify the documents with shared names, but also integrate the scattered information of each author entity, so that the characteristic information of the author entity can be continuously improved. This is one of the important steps in the construction of an author's personal home page.
Author name disambiguation is an important part of the construction of literature databases. Both universities and scientific research institutions need to count and file their collected papers. One of the important tasks is to file papers by individual author and accurately construct their own literature database in order to evaluate the scientific research achievements and level of the institution. For example, DBLP (https://dblp.uni-trier.de/) is an English literature database system in the computer field. It collects the scientific research achievements published in international journals and conferences, with the authors as the core, and reflects the frontier directions of academic research.
For the dataset we used, there is abundant information about the attributes of the papers, but little descriptive information about the authors. It is therefore impossible to construct a list of target entities for linking-based disambiguation, so we use a clustering-based disambiguation method. Existing methods can be divided into three categories: clustering based on attribute features, clustering based on linkage structure, and hybrid methods. Attribute features describe the characteristics of the object itself, such as the title, keywords, and organization. Methods based on attribute features usually focus on measuring the similarity between papers. Structural features describe the relationships between objects, such as whether an author participated in writing a paper and whether two authors have a coauthor relationship; linkage-based methods pay more attention to the structural information of the graph constructed from papers and authors. Considering the different emphases of the two kinds of features, our idea is that the dataset can be abstracted into a graph, in which the attribute features can be regarded as the characteristics of the nodes and the structural features can be quantified as the weights of the edges. In this way, all the information we obtain can be reflected in the form of graph data and can be processed by graph neural networks.
In order to effectively integrate the two levels of feature information, that is, the features of nodes and edges in the graph, we naturally turn to the GCN. The GCN is an excellent model for processing graph data, which can continuously aggregate node information along weighted edges to form a new node representation. Therefore, we combine the attribute features with the structure information of the constructed graphs, and the learned embedding representation has stronger distinguishing power. Finally, we use a clustering algorithm to complete the disambiguation task.
The main contributions of this paper include the following aspects: We use hybrid features to disambiguate papers based on clustering. More specifically, the attribute features include the title, keywords, venue, names of collaborators, and organizations; the structural information includes the coauthor relationships between authors and the writing relationships between authors and papers. We construct three association graphs for each candidate set, in which nodes contain attribute features and edges contain structural information. The two levels of features are effectively integrated using the GCN, so that the embeddings of papers have strong distinguishing power.
Besides, we use a GCN based on the attention mechanism (AGCN). The attention mechanism gives more weight to the areas related to the current task. The AGCN can adaptively select different numbers of convolutional layers according to the structure of different association graphs and properly train and fit the graph data for all name reference items, so as to improve the performance of the disambiguation model.
We also build a dataset, BAT. This dataset collects different forms of achievements, including papers, patents, and projects, and the diversification of data forms expands the scope of application of the disambiguation system. In addition, the achievements can be recorded in English or Chinese, and the disambiguation algorithm supports multilanguage processing, which is closer to the actual scenario. The BAT dataset provides an experimental platform for exploring and studying various disambiguation models in the future.

Author Name Disambiguation.
Most disambiguation methods [1, 2] use the information of the title, abstract, keywords, and coauthor relationships to extract features for disambiguation. Research on author name disambiguation is divided into two categories. One is the simple and efficient disambiguation method based on the name itself [3]. It only uses the author's last name and initials to mine the information contained in the name, which is simple and accurate to implement. However, it is more suitable for western countries where middle names are customary. The other category relies on more advanced methods, including traditional machine learning methods, probability-based methods, and graph-based methods. We now present the details. Due to the rich variety of machine learning models, we have many choices for solving the problem of name disambiguation. Supervised methods use labeled samples to train a model that predicts whether two papers belong to the same author. Han et al. [4] proposed a Naive Bayes model and a Support Vector Machine model, but the experimental results are quite unsatisfactory when the information is incomplete. The method proposed by Zhang [5] constructs a Bayesian nonexhaustive classification framework with a prior data model; when a new ambiguous entity appears, a Dirichlet module handles it, and the online name disambiguation task is completed with a Gibbs sampler. In practice, manually labeling a large set of data is expensive, which limits the application of supervised methods. Unsupervised methods extract feature vectors from the data to calculate the similarity between two records and disambiguate with a clustering algorithm. In 1998, Bagga and Baldwin [6] proposed a method that uses the vector space model to deal with cross-document coreference disambiguation. However, this method cannot deal with the problem of vacant text fields, so its performance is limited.
Mann and Yarowsky [7] construct the feature space by extracting the basic attributes of the author, which achieves higher disambiguation accuracy, but the sparsity of the character features can easily lead to a lower recall rate. Chen and Martin [8] use SoftTFIDF-weighted semantic and syntactic features for clustering, which significantly improves the accuracy of disambiguation. Han et al. [9] use K-way spectral clustering to solve the disambiguation problem. Compared with supervised methods, unsupervised methods do not need labeled data, but their limitation is that the number of clusters is usually unpredictable.
When there is only a small amount of labeled data, semisupervised methods become a better choice. The distance-based semisupervised clustering method uses a similarity distance index to train the model, and the constraint-based semisupervised clustering method combines unsupervised methods with user feedback or guidance constraints to guide clustering [10]. Zhang et al. [11] combined the two previous approaches and constructed a probability model.
This method considers a variety of constraints; the EM algorithm is used to calculate the distance between two papers, and users are allowed to improve the disambiguation results. The experimental results show that this method is better than the one based on hierarchical clustering. The problem of author name disambiguation can also be solved by probability models. Tang et al. [12] use Markov random fields to combine text attributes and linkage features for disambiguation. Song et al. [13] divided the name disambiguation task into two processes. They first extended Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), two hierarchical Bayesian text models, and established a topic distribution model. Then, they extracted features from the distribution and used a clustering algorithm to complete the disambiguation task.
In recent years, graph-based disambiguation methods have been applied. They use attribute features extracted from papers or an external knowledge base to construct a graph [14] and then cluster the nodes in the graph. They also use the linkage information in the graph to mine close relationships between papers and collaborators [15, 16]. Fan et al. [17] propose the graph-based GHOST framework. They first construct the relationship graph between papers and authors, confirm the effective paths between pairs of points, and eliminate the invalid ones. Then, they calculate the similarity matrix and use the affinity propagation clustering algorithm to obtain paper clusters. Finally, they incorporate user feedback to improve the results. The network representation learning algorithm Diting [18] consists of three components: network construction, embedding representation, and clustering. Other works use the topological structure of the graph. Franzon et al. [19] proposed a two-step traversal disambiguation method that uses topological similarity to evaluate candidate nodes. In addition, Peng et al. [20] constructed a heterogeneous network with different types of edges by extracting the author name, venue, title, abstract, and other information and proposed using a Generative Adversarial Network (GAN) to obtain the embedding representation of the network, so as to cluster the papers.

Graph Convolution Network.
The original Convolutional Neural Network (CNN) can only deal with Euclidean data that has a regular spatial structure, but it struggles with the graphs generated from recommendation systems, social networks, and molecular structures, in which each node has its own feature information and structure information. It is necessary to use the Graph Convolutional Network (GCN) to automatically learn and extract the spatial features of a topological graph. The function of the GCN is to aggregate node information along the edges and output a new node representation.
For a graph G = (N, E), the GCN takes as input the feature matrix X of all nodes and the adjacency matrix A, and outputs the embedding representation Z of the nodes. The computation procedure of the GCN is shown in Figure 1 and can be expressed as

H^(l+1) = σ(D^(-1/2) A D^(-1/2) H^(l) W^(l)), (1)

where H^(l) is the output of the nodes at the l-th layer, D^(-1/2) A D^(-1/2) is the symmetrically normalized adjacency matrix of graph G, D is the degree matrix of G, and W^(l) is the weight matrix of the l-th layer. The state of each node depends on the other nodes in the graph: the closer the nodes are, the greater the influence. Ideally, stacking N convolutional layers can propagate at most N-hop node information, so a node in the center of the graph can completely aggregate the information of the whole graph.
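The propagation rule above can be sketched in a few lines. The following minimal NumPy implementation, with self-loops added as in the common GCN formulation, is an illustrative sketch rather than the exact implementation used in this paper:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency matrix, H: (n, d_in) node features,
    W: (d_in, d_out) weight matrix. Adding self-loops (A + I) lets
    each node keep its own information while aggregating neighbors'.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # normalized adjacency
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation
```

Stacking two such layers, as in the model of this paper, propagates information from 2-hop neighborhoods.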

Disambiguation Model Based on Graph Convolution Network
In this section, we first define the problem of name disambiguation. Then, we present the proposed name disambiguation model based on the GCN. Specifically, the proposed model consists of two components: global representation and local representation. The global representation extracts features from the attribute information of the papers and authors [14], and the local representation extracts features from the linkage information in the graphs [15].

Problem Formulation.
Given an ambiguous name term, let P = {P_i}, i = 1, 2, . . ., denote all the papers related to it, called the candidate set for this item. Each paper is represented by a variable-length set of attributes including the title, keywords, venue, names of collaborators, and organization. The real author of paper P_i is denoted by A(P_i). The goal of author name disambiguation is to find a mapping function Φ that clusters the papers in P such that each cluster corresponds to exactly one author entity in the real world. That is to say,

Φ(P_i) = Φ(P_j) if and only if A(P_i) = A(P_j).

Overall Model for Name Disambiguation.
To sum up, this section proposes a disambiguation model based on the GCN, which mainly includes two parts: global representation learning and local linkage learning. We first extract the attributes of papers, embed them into a unified vector space, and fine-tune them with a triplet loss function. Then, we construct three association graphs within each candidate set, whose edges contain linkage information. In order to integrate the two parts effectively, we build two GCNs and use three kinds of loss functions for iterative training. The final output of the Paper-GCN is the hybrid feature representation of the papers. Finally, we use HAC for clustering. The framework of the entire disambiguation model is shown in Figure 2.

Global Representation Learning.
In order to effectively compute the similarity between different papers, it is necessary to convert the papers, represented as strings, into vector representations. Here, we use an embedding approach to obtain the vector representation of a paper. The embedding process is divided into two steps. First, all the papers in the database are mapped to a unified vector representation. Then, we use the labeled data to train a supervised model that updates the vector representations of the papers. The core of this method is modeling the representation of context and the relationship between context and target words. We use the word embedding method to encode the information of papers; the specific steps are as follows: Step 1: Extract the attribute information of the papers, including the title, keywords, venue, names of collaborators, and organization; carry out word segmentation and cleaning; and then concatenate these attributes into a sequence to construct the feature information P_i = {x_j}, j = 1, 2, . . ., for each paper P_i, where x_j is a word appearing in the sequence.
Step 2: Take the whole dataset as the corpus and input the feature words of all papers to the Word2Vec model [21]. It embeds each word x_j into a low-dimensional vector representation x_j′.
Step 3: Calculate the IDF value of each word x_j by

IDF(x_j) = log(|X| / |x_j|),

where |x_j| indicates the frequency of occurrence of x_j in the corpus and |X| indicates the size of the corpus.
Step 4: Multiply the embedding x_j′ of each word in a paper by the corresponding IDF value, and sum all the products to obtain the unified embedding P_i′ of the paper.
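Steps 1-4 can be sketched as follows. This is a minimal illustration that assumes pretrained word vectors are already available (e.g., from Word2Vec) and reads |x_j| as document frequency; it is not the authors' exact implementation:

```python
import math
from collections import Counter

def idf_weighted_embedding(papers, embeddings):
    """Build a unified paper embedding: IDF-weighted sum of word vectors.

    papers: list of token lists, one per paper (the corpus).
    embeddings: dict word -> vector (list of floats), e.g. from Word2Vec.
    Returns one vector per paper.
    """
    n_docs = len(papers)
    doc_freq = Counter(w for p in papers for w in set(p))
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}
    dim = len(next(iter(embeddings.values())))
    out = []
    for p in papers:
        vec = [0.0] * dim
        for w in p:
            if w in embeddings:
                for k in range(dim):
                    vec[k] += idf[w] * embeddings[w][k]  # IDF-weighted sum
        out.append(vec)
    return out
```

Words that appear in every paper get an IDF of zero and therefore contribute nothing, which suppresses uninformative tokens.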
The distinguishing ability of the unified embedding of papers is limited, so we use the labeled data to adjust the unified representation. Considering that, in the real world, the research fields and interests of the same author entity are relatively fixed, within the candidate set corresponding to a name reference item we call two records belonging to the same author entity a positive sample pair; otherwise, they form a negative sample pair. Our goal is to train a neural network that finds a new mapping f: p′ → y and updates the paper embeddings at the same time, where p′ is the unified embedding of the paper and y is the global representation. We expect positive sample pairs to stay close and negative ones to stay as far apart as possible in the vector space. For any triple (p′, p′+, p′−) in the dataset, the constraint condition is |y, y+| ≪ |y, y−|, where |·,·| denotes the Euclidean distance. We define the margin-based triplet loss function

Loss = max(0, |y, y+| - |y, y−| + m),

where m is a hyperparameter indicating the margin. The structure of the network is shown in Figure 3.
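The margin-based triplet loss above can be written directly; a minimal sketch follows (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def triplet_loss(y_anchor, y_pos, y_neg, margin=1.0):
    """Margin-based triplet loss on Euclidean distances.

    Pushes the anchor-positive distance to be at least `margin`
    smaller than the anchor-negative distance; triples that already
    satisfy the margin contribute zero loss.
    """
    d_pos = np.linalg.norm(y_anchor - y_pos)
    d_neg = np.linalg.norm(y_anchor - y_neg)
    return max(0.0, d_pos - d_neg + margin)
```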

Construction of Association Graph.
The global representation considers the attributes of papers and authors; however, the close relations between papers and authors are not taken into account. In order to make full use of these relations, we consider integrating the relation information to learn a more effective representation. Within the candidate set of the same name reference item, we construct three kinds of graphs: the paper-to-paper graph, the coauthor graph, and the paper-to-author graph. The weight w of an edge in a graph represents the degree of relevance between papers and collaborators.
Paper-to-paper graph: All papers in the candidate set are represented as paper nodes. If the intersection of the attribute features of two papers, after being weighted by IDF, exceeds a threshold ε, an edge is constructed between the two nodes. The reason for this procedure is that the more attributes two papers have in common, the more likely they are to be written by the same author.
Coauthor graph: This graph shows the cooperative relationships between authors. All the collaborators involved in the candidate set are represented as author nodes. An edge indicates that there is a cooperative relationship between two authors, and the weight of the edge indicates the number of cooperations. Paper-to-author graph: This graph shows the writing relationships within the candidate set. We take papers and authors as nodes and construct an edge between a paper node and an author node.
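The paper-to-paper edge rule can be sketched as follows; the token sets, IDF table, and threshold below are illustrative assumptions:

```python
import itertools

def build_paper_edges(paper_features, idf, threshold):
    """Connect two papers when the IDF-weighted size of their shared
    attribute features exceeds `threshold`.

    paper_features: list of sets of attribute tokens, one per paper.
    idf: dict token -> IDF weight. Returns a list of (i, j) edges.
    """
    edges = []
    for i, j in itertools.combinations(range(len(paper_features)), 2):
        shared = paper_features[i] & paper_features[j]   # common attributes
        score = sum(idf.get(t, 0.0) for t in shared)     # IDF-weighted overlap
        if score > threshold:
            edges.append((i, j))
    return edges
```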

Local Linkage Learning.
In order to effectively integrate the attribute information of papers and authors with the structure information of the association graphs, we build two networks, Paper-GCN and Author-GCN. The input of the networks is the feature matrix Y of the paper and author nodes. Guided by the information carried by the edges of the graphs, the networks continuously aggregate node information, producing new embeddings Z of the paper and author nodes in the same vector space. By optimizing the loss functions, the embeddings of the nodes are adjusted to ensure that closely connected objects are also adjacent in the embedding space. Finally, a clustering algorithm is utilized to solve the author name disambiguation problem.
We use the linkage information of edges in the graph to integrate the attribute information of nodes. Each candidate set is a relatively independent operation space, and different candidate sets do not affect each other. Hence, we call this module local linkage learning.
In the paper-to-paper graph, the more common features two papers have, the more likely they are to belong to the same author entity in the real world, so they should be kept close in the embedding space. That is, for any paper node i, a positive sample node j satisfies w_ij = 1 and a negative sample node k satisfies w_ik = 0, and we require |Z_pi, Z_pj| ≪ |Z_pi, Z_pk|. The paper-to-paper loss is defined as

Loss_pp = max(0, |Z_pi, Z_pj| - |Z_pi, Z_pk| + m),

where |·,·| indicates the Euclidean distance and m is the margin between positive and negative node pairs. Similarly, in the coauthor graph, the more often two authors cooperate, the more similar their research fields and interests are, or the more likely they work at similar institutions, so they should remain close in the embedding space.
That is, for any author node i, positive sample node j, and negative sample node k, the author-to-author loss is defined as

Loss_aa = max(0, |Z_ai, Z_aj| - |Z_ai, Z_ak| + m).
In the paper-to-author graph, if there is a writing relationship between a paper node and an author node, they should remain close in the embedding space compared with sample pairs without that relationship. For any paper node i, positive sample author node j, and negative sample author node k with w_ij = 1 and w_ik = 0, it should be ensured that |Z_pi, Z_aj| ≪ |Z_pi, Z_ak|. The paper-to-author loss is defined as

Loss_pa = max(0, |Z_pi, Z_aj| - |Z_pi, Z_ak| + m).
The paper-to-author graph not only provides information about writing relationships, but also serves as a bridge between the Paper-GCN and the Author-GCN. Loss_pa is computed from Z_p and Z_a, which constrains them to the same vector space and facilitates the representation and measurement of distances between different types of nodes. In the process of optimizing Loss_pa, the parameter learning processes of the two GCNs influence each other until the whole disambiguation model converges.
As mentioned above, the three loss functions share the same structure, forcing the distance between positive sample pairs in the embedding space to be much smaller than that between negative sample pairs. When generating the triples, for each anchor point i, we randomly select the positive sample node j according to the weights of the edges connected to i in the graph: the greater the weight, the higher the probability of being selected. When choosing a negative sample node k, the smaller the weight, the higher the probability of being selected, and the triplets must satisfy w_ij > w_ik.
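The weighted triplet sampling can be sketched as follows; the helper and its interface are illustrative assumptions, not the authors' code:

```python
import random

def sample_triplet(anchor, weights, rng=random):
    """Sample (anchor, positive, negative) neighbors of `anchor`.

    weights: dict node -> edge weight w(anchor, node). The positive is
    drawn with probability proportional to its weight; the negative is
    drawn from the strictly lighter nodes, favoring the lightest, so
    that w(anchor, pos) > w(anchor, neg) always holds. Returns None
    when no valid negative exists.
    """
    nodes = list(weights)
    pos = rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]
    lighter = [n for n in nodes if weights[n] < weights[pos]]
    if not lighter:
        return None
    inv = [weights[pos] - weights[n] for n in lighter]  # invert weights
    neg = rng.choices(lighter, weights=inv, k=1)[0]
    return anchor, pos, neg
```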
Paper-GCN and Author-GCN are two parallel networks trained in the same way. The training process is shown in Figure 4, where the thin blue lines represent the input and output of data, and the thick red lines represent the backpropagation of the weight parameters. The inputs are the adjacency matrix A of each graph and the feature matrix Y of all nodes. An adjacency matrix is the symmetric matrix obtained from an undirected weighted graph. Y_paper is the global representation of the papers. Attribute information about the authors is rarely described in the dataset, at most including affiliated organization information.
Thus, we use one-hot encoding to represent the attribute information Y_author of the author nodes. The outputs are the embeddings Z_paper and Z_author of the graph nodes. For each iteration, we do the following: (1) We obtain the embeddings Z_p and Z_a from the Paper-GCN and the Author-GCN, respectively. (2) We sample triples from the paper-to-paper graph, minimize Loss_pp, and update the weights of the Paper-GCN. (3) We sample triples from the coauthor graph, minimize Loss_aa, and update the weights of the Author-GCN. (4) We sample triples from the paper-to-author graph, minimize Loss_pa, and update the weights of both networks at the same time.
The GCN plays a key role in the whole model. It accepts the feature matrix of the nodes and the adjacency matrix of the graph as input, corresponding to attribute information and structure information, respectively. As shown in (1), the GCN uses the information of the edges to aggregate the information of the nodes. To be more specific, under the guidance of the linkage information, the attribute information carried by a node is transmitted to other nodes from near to far. At the same time, each node also receives information from other nodes and constantly aggregates it with its own. Each node in the graph is affected by the surrounding nodes and changes its own state; the closer the relationship, the greater the influence. The node embedding finally output by the GCN aggregates the features of the nodes and edges and the topological information of the whole graph, so it has stronger distinguishing power.
After training the Paper-GCN, the embedding of the paper nodes is a hybrid feature representation that combines attribute information and linkage information. We use the hierarchical agglomerative clustering (HAC) algorithm to cluster the papers in the candidate set.
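The clustering step can be illustrated with a minimal single-linkage sketch of HAC over the paper embeddings; production implementations (e.g., scikit-learn's AgglomerativeClustering) are far more efficient, and the linkage criterion here is an illustrative choice:

```python
def hac(points, n_clusters, dist):
    """Single-linkage HAC: start with one cluster per paper embedding
    and repeatedly merge the two closest clusters until only
    `n_clusters` remain. Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, cluster index a, cluster index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge the closest pair
    return clusters
```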

Improved Model Based on Attention Mechanism.
When using the GCN for name disambiguation, we fixed the number of convolutional layers at 2. In practice, this setting has some limitations.
First, the optimal number of convolutional layers is usually difficult to determine. If the number of layers is too small, the network cannot fully learn the spatial features of the graphs. If there are too many layers, the N-hop neighbor nodes starting from a certain node may form a loop; when aggregating node information, this makes it more difficult to distinguish between distant and nearby nodes. Theoretically, when the number of layers reaches a certain level, the state of the entire network reaches a stable fixed point and achieves equilibrium. In practice, the optimal number of layers often requires many tests before it can be adopted. Second, a fixed number of convolutional layers is not applicable to all graph data. By analyzing the data distribution in the datasets, we find that the nodes and edges of the graphs constructed under different name references are of different orders of magnitude. If the topological information in the graphs is extracted with a fixed number of convolutional layers, it is likely to result in overfitting on sparse graph data and underfitting on dense graph data. All of this affects the performance of the disambiguation task.
A series of articles shows that 3 convolutional layers in a GCN can accomplish most tasks. We therefore set up 3 layers in the GCN and use the node embeddings output by all convolutional layers, H = {h_1, h_2, h_3}, to assign an attention coefficient a_i to the output h_i of each layer:

a_i = softmax(W · h_i),

where W is a parameter learned along with the training of the GCN. The output of the whole network is the weighted summation of the outputs of the convolutional layers, so that the network can adaptively choose the best number of layers at any given time:
Z = a^T · H.
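A plausible sketch of this attention-weighted combination is given below. The exact scoring function is not fully specified in the text, so the per-layer score used here (a learned vector W applied to each layer's mean embedding) is an illustrative assumption:

```python
import numpy as np

def attention_combine(layer_outputs, W):
    """Combine the outputs of all GCN layers with attention weights.

    layer_outputs: list of (n, d) arrays [h1, h2, h3], one per layer.
    W: (d,) learned parameter vector scoring each layer. A softmax
    over the scores gives coefficients a_i, and the network output is
    Z = sum_i a_i * h_i, letting the model choose its effective depth.
    """
    scores = np.array([h.mean(axis=0) @ W for h in layer_outputs])
    a = np.exp(scores - scores.max())
    a = a / a.sum()                           # softmax over layers
    return sum(ai * h for ai, h in zip(a, layer_outputs))
```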
We call the disambiguation model based on the attention mechanism the AGCN model; its specific structure is shown in Figure 5.

Dataset.
In order to effectively evaluate the method proposed in this paper, we select two public datasets based on the AMiner (https://www.aminer.cn/) system. The AMiner-18 dataset comes from the data (https://github.com/neozhangthe1/disambiguation/) published with the disambiguation paper [14] in 2018. The dataset contains a total of 600 name items, including 39781 real authors and 203078 papers. The AMiner-12 dataset (https://www.aminer.cn/disambiguation) contains 109 name items, involving 7447 papers from 1546 real author entities. The format of an AMiner-18 data record is shown in Table 1; AMiner-12 records are the same, except that the fields "org," "keywords," and "abstract" are empty.
We construct a bilingual disambiguation dataset, which is provided by the China Association for Science and Technology (https://www.actkg.com/). It brings together a wide range of data sources covering a variety of research fields, including tens of millions of scientific and technological talents. The disambiguation dataset contains 2905 name items with a total of 47273 real author entities and provides 7 data subsets: a talent information set, a paper information set, an author-paper relation information set, a patent information set, an author-patent relation information set, a project information set, and an author-project relation information set.
This dataset has the following characteristics: Each subset contains detailed attributes. However, due to different data sources, some attributes of some papers are incomplete, so the disambiguation algorithm should account for missing data attributes.
In addition to common papers, we also collect the relevant achievement information of patents and projects, which enriches the diversity of the data and expands the scope of application of the disambiguation system. This dataset also supports cross-language disambiguation algorithms.
The achievements can be written in different languages such as English and Chinese, which is more in line with actual scenarios.
In order to unify the format with the other datasets, this paper only reports the disambiguation results on the paper datasets. The Chinese dataset BAT-CN contains 94 name items, involving 4128 papers of 307 real authors, while the English dataset BAT-EN contains 15 name items, involving 1288 papers of 35 real authors. The format of the BAT paper dataset is the same as AMiner-18.

Implementation and Parameter Settings.
In global representation learning, the title of the paper is regarded as a necessary attribute; records missing this field are regarded as invalid data and discarded. Other attributes are simply not extracted if their fields are empty. Because we use an IDF-weighted summation of the extracted attributes to represent the feature vector of a paper, vacancies in non-essential fields do not affect the calculation of the feature vector. We sampled 500 name items from AMiner-18 and built a two-layer neural network for model training to fine-tune the unified representation described in the global representation learning section. The remaining 100 name items involve 35129 papers of 6399 real authors and, together with AMiner-12 and BAT, serve as test sets for local linkage learning.
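The IDF-weighted summation above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names and the toy two-dimensional word vectors are hypothetical, and in practice the word vectors would come from a trained Word2Vec model.

```python
import math
from collections import Counter

def idf_weights(corpus_tokens):
    """Compute IDF for each token over a corpus of per-paper token lists."""
    n_docs = len(corpus_tokens)
    df = Counter(tok for doc in corpus_tokens for tok in set(doc))
    return {tok: math.log(n_docs / df[tok]) for tok in df}

def paper_embedding(tokens, idf, word_vecs, dim):
    """IDF-weighted sum of word vectors. A missing attribute simply
    contributes no tokens, so the vector is still well defined."""
    vec = [0.0] * dim
    for tok in tokens:
        wv = word_vecs.get(tok)
        if wv is None:
            continue
        w = idf.get(tok, 0.0)
        vec = [v + w * x for v, x in zip(vec, wv)]
    return vec
```

Note how a term appearing in every paper (IDF = 0) drops out of the embedding, which is why common words in titles and venues carry no discriminative weight.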
The BAT datasets contain different forms of achievements, and the main difference between their disambiguation tasks lies in the extraction of attribute features. For patents, we extract attributes such as patent title, all designers and their organizations, patent type, agency, and keywords. For projects, we extract project name, all staff and their organizations, keywords, project source, and project type. Because of the difference in writing conventions between Chinese and English, BAT-CN is segmented into words using Jieba before attribute extraction.
When building the paper-paper graph, considering the differences in the capacity of the three datasets, we set the IDF threshold ε to 32 (AMiner-18), 20 (AMiner-12), and 10 (BAT), respectively. We set up two convolutional layers in the GCNs, with 128 and 64 neurons, respectively. The GCN is trained with the Adagrad optimizer at a learning rate of 0.01. To obtain credible results, we train the network multiple times and report the average performance metrics, avoiding accidental errors.
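The two-layer GCN forward pass with the dimensions stated above can be sketched in NumPy, assuming the standard symmetric adjacency normalization of Kipf-style GCNs. This is an illustrative sketch only: the paper's actual model also incorporates the coauthor and paper-author graphs, and the weights here are random rather than trained.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^{-1/2}(A + I)D^{-1/2} used by GCNs."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d_inv_sqrt @ a @ d_inv_sqrt

def gcn_forward(adj, feats, w1, w2):
    """Two-layer GCN: ReLU(A_hat X W1), then A_hat H W2."""
    a_hat = normalize_adj(adj)
    h = np.maximum(a_hat @ feats @ w1, 0.0)  # layer 1: 128 units in the paper
    return a_hat @ h @ w2                    # layer 2: 64 units in the paper

rng = np.random.default_rng(0)
n, d = 5, 16
adj = np.zeros((n, n)); adj[0, 1] = adj[1, 0] = 1.0
feats = rng.normal(size=(n, d))
emb = gcn_forward(adj, feats, rng.normal(size=(d, 128)), rng.normal(size=(128, 64)))
print(emb.shape)  # (5, 64)
```

Each node's output embedding mixes its own attribute features with those of its graph neighbors, which is exactly how the model combines attribute and linkage information.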

Baseline Model.
To verify the effectiveness of the GCN-based name disambiguation method, we select common baseline disambiguation methods for comparison.
A simple clustering method (HAC). This method only considers the similarity between papers: extract the title, keywords, venue, and other attributes; use IDF-weighted Word2Vec to obtain the embedding of each paper; and directly apply hierarchical clustering to the papers.
Rule-based method. This method judges whether two papers belong to the same author entity according to manually defined rules. For two papers under the same name reference item, if the number of common collaborators exceeds a threshold, an edge is constructed between the two paper nodes. In the resulting paper-paper graph, the paper nodes within the same connected component belong to the same author entity.
Graph autoencoder [14]. This method constructs a local-linkage paper-paper graph for each candidate set. The common attributes of two papers are weighted by IDF to obtain their similarity; if the similarity exceeds a threshold, an edge is constructed between the two nodes. An unsupervised autoencoder is used to learn the local linkage: the global metric matrix is encoded, and the intermediate coding vectors serve as embeddings that integrate the global metric and the local linkage information. Finally, hierarchical clustering is applied to the intermediate codes to disambiguate.
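The rule-based baseline reduces to connected components over the coauthor-overlap graph, which can be sketched with a small union-find. This is an illustrative sketch under the assumption that "exceeds the threshold" means at least `threshold` shared collaborators; the function name and toy data are hypothetical.

```python
def rule_based_clusters(coauthors, threshold=2):
    """coauthors: list of coauthor-name sets, one per paper.
    Papers sharing at least `threshold` collaborators are linked;
    connected components of the resulting graph are the clusters."""
    n = len(coauthors)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if len(coauthors[i] & coauthors[j]) >= threshold:
                parent[find(i)] = find(j)  # union the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

This makes the baseline's failure mode visible: two papers by the same author with fewer than `threshold` shared coauthors can never be merged, which is why its recall stays below 50% in Table 2.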

Results and Discussion.
We use pairwise precision, recall, and F1 as the evaluation metrics of model performance. We count sample pairs in the candidate clusters and compute the values of TP, FP, and FN, defined as follows: True Positive (TP): sample pairs that actually belong to the same class and are predicted to be of the same class. False Positive (FP): sample pairs that actually belong to different classes but are predicted to be of the same class. False Negative (FN): sample pairs that actually belong to the same class but are predicted to be of different classes. The metrics are then Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 × Precision × Recall/(Precision + Recall). Table 2 shows the overall performance of the different disambiguation methods on each dataset. As can be seen from the table, the precision of the rule-based method approaches 100%, but its recall is below 50%, indicating that manually defined rules cannot uncover the implicit relations between papers. The performance of the simple clustering method (HAC) is moderate, which shows that using only the attributes of a paper and comparing similarities can meet basic disambiguation requirements; however, because it neither takes real-world constraints into account nor uses other linkage information, its average F1 value is about 10% lower than that of the GCN method. The performance of the graph autoencoder is second only to our proposed GCN method. It also uses both global and local features, but exploits only the topological information of the paper-paper graph in the local features and does not consider linkage information such as authorship or coauthor relations; its average F1 value is about 6% lower than that of the GCN method. The GCN-based disambiguation model we propose achieves good performance; the detailed analysis is shown in Table 3.
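The pairwise metrics defined above can be computed directly by enumerating all sample pairs. A minimal sketch (the function name and toy labels are illustrative):

```python
from itertools import combinations

def pairwise_prf(true_labels, pred_labels):
    """Pairwise precision, recall, and F1 over all sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1            # correctly placed in the same cluster
        elif same_pred:
            fp += 1            # wrongly merged
        elif same_true:
            fn += 1            # wrongly split
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

For example, with true clusters [0, 0, 1, 1] and predicted clusters [0, 0, 0, 1], one pair is a TP, two pairs are FP, and one pair is an FN, giving precision 1/3, recall 1/2, and F1 0.4.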
For the first three rows, although the numbers of name items are all about 100, the actual capacities differ greatly. Because of the huge amount of data in AMiner-18, the structure of its association graphs is complex: each name item has on average 60 clusters, and each cluster contains only 5 to 6 papers, so accurate partitioning is difficult and the F1 value is the lowest (66.77%). The AMiner-12 and BAT datasets are smaller, their association graphs are relatively simple, and the number of clusters per name item is 5-10, so their disambiguation performance is better than on AMiner-18. It is worth noting that, because the AMiner-12 records are empty in the "org," "keywords," and "abstract" attributes while the BAT fields are complete, the F1 value of BAT is about 9% higher than that of AMiner-12.
This also shows that the more complete the attributes of a paper, the better the disambiguation performance. AGCN improves on GCN to some extent: overall precision increases by 2.05%, and although recall decreases, the F1 value increases by 0.63%. On the AMiner-18, AMiner-12, and BAT-EN datasets, the precision of GCN disambiguation is relatively low compared with the corresponding recall, whereas AGCN makes a trade-off between precision and recall, thus improving the F1 value. Therefore, the attention mechanism helps improve the performance of the GCN-based author name disambiguation task.

Component Contribution Analysis
Using Only the Global Representation.
The global representation learns the attribute information of papers and authors. If we ignore the linkage information in the graphs, such as coauthor and authorship relations, and cluster the global representations of papers directly, the results are as shown in Table 4. Using only the title, keywords, venue, names of coauthors, organizations, and other attributes, the disambiguation performance metrics can exceed 50%, but compared with the hybrid features there is still a clear gap. The results of using only the linkage information are shown in Table 5.
It can be seen from the table that using only the linkage information, namely, coauthor relations between authors, similarity relations between papers, and authorship relations between authors and papers, can also complete the disambiguation task. However, compared with our hybrid feature model, it is still lacking (the F1 difference is 4%-11%). Furthermore, when the dataset is smaller and the network structure simpler, the advantage of linkage information in the disambiguation task becomes less obvious; for example, the three evaluation metrics on the simplest dataset, BAT-EN, are about 11% lower than those of our hybrid feature model. This also shows that the attribute information of papers and authors themselves plays an important role in author name disambiguation.

Conclusions
The rapid development of modern science and technology has provided great convenience for today's academic research, but author name ambiguity in literature databases has become one of the pressing problems in the field of information retrieval. We study authors of the same name in literature databases and propose a GCN-based method for author name disambiguation that uses context information to improve performance. This model can also benefit other disambiguation tasks, such as word sense disambiguation (WSD) [22], because context information is indispensable for disambiguation. We obtain some improvement, but several problems remain to be solved, mainly concerning the practical application of the method: (1) In real disambiguation tasks, the data is unlabeled, so the number of true clusters is unknown. If we use a partition-based clustering algorithm, estimating the true number of clusters is a problem; if we use a density-based clustering algorithm, we need not specify the number of clusters, but we must still estimate other parameters from the data distribution. These will be the subjects of our next research. (2) Academic papers on the Internet have grown rapidly; in particular, in some recent hot fields, publications tend to increase exponentially. Faced with such large datasets, we should not only ensure the accuracy of the disambiguation algorithm but also explore more efficient methods. (3) We live in a digital world with a strong academic research atmosphere: excellent articles constantly emerge, and literature databases are dynamically updated. How to set the update strategy, solve same-name disambiguation in real time, and keep the database consistent are also problems we should consider and solve.

Data Availability
1. Previously reported AMiner-18 data were used to support this study and are available at https://github.com/neozhangthe1/disambiguation. These prior datasets are cited at relevant places within the text. 2. Previously reported AMiner-12 data were used to support this study and are available at https://www.aminer.cn/disambiguation. These prior datasets are cited at relevant places within the text. 3. The BAT data used to support this study were provided by the China Association for Science and Technology (https://www.actkg.com/).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.