
Graph convolutional networks (GCNs) are efficient networks for learning graph representations. However, it is expensive to learn the high-order interaction relationships among node neighbors. In this paper, we propose a novel graph convolutional model to learn and fuse multihop neighbor information. We adopt a weight-sharing mechanism to design the different-order graph convolutions, avoiding the potential concern of overfitting. Moreover, we design a new multihop neighbor information fusion (MIF) operator which mixes different neighbor features from 1-hop to

Text classification is a fundamental problem in many natural language processing (NLP) applications, such as text mining, spam detection, summarization, and question-answering systems [

Text can be modeled as a typical graph-structured network, and graph networks have natural advantages for processing such data. Scarselli et al. [

When messages pass through the graph of the text network, a node’s output is affected not only by the directly connected nodes but also by the

To address the above issue, we propose a multihop neighbor information fusion graph convolutional network for text classification based on the GCN. In our model, we propose a novel negative minimum value fusion operator to fuse multihop neighbor information (MIF). To reduce computational complexity, we share the trainable weight [

The contributions of this paper are as follows. First, we propose a novel negative minimum value fusion operator, which fuses the feature information of multihop neighbors. Second, we propose a high-efficiency MIF-based graph convolutional network that successfully captures

The remainder of this paper is organized as follows. In Section

In this section, we describe related work on graph convolutional network text classification, and we also introduce related work on the multihop neighbor information of graph convolutional networks.

Gori et al. [

Graph convolutional networks (GCNs) were developed from convolutional neural networks [

Recent works have been proposed for capturing

Recently, deep learning models have been introduced into text classification, achieving far better performance than traditional models.

With the development of deep learning techniques, an increasing number of deep learning models have been applied to text classification. Kim [

With the development of graph network models, many researchers developed more GCN-based classification methods [

In this section, we review the related graph notation and analyse the layer-wise propagation model of the GCN in detail. We then develop a novel information propagation method to capture and fuse multihop neighbor information, and propose two novel frameworks to capture the rich information of the text network. Finally, we analyse the computational complexity and parameter quantities of our models.

We assume graph signal

The popular convolutional propagation model [
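The standard propagation rule referenced here can be sketched as follows. This is a minimal NumPy illustration assuming the usual Kipf-and-Welling formulation, in which the adjacency matrix is augmented with self-loops and symmetrically normalized before being multiplied with the features and a trainable weight:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, W):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)
```

Each layer therefore mixes a node's features with those of its 1-hop neighbors; stacking layers (or raising the power of the normalized adjacency) widens the receptive field.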

It is assumed that matrix

The multihop neighbor information fusion operator is a topology-preserving operator.

If

The MIF operator is an information aggregation layer that is used to mix

The element-wise minimum of these features from the different-order graph convolutions is calculated as:

We give a concrete example to show how the MIF operator works. It is assumed that

The output of the MIF operator is defined as the negative of each element of
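The fusion described above — an element-wise minimum across the hop feature matrices, followed by negation — can be sketched as follows, assuming each entry of `hop_features` is the feature matrix produced by one hop order:

```python
import numpy as np

def mif(hop_features):
    """MIF operator: element-wise minimum across the K hop feature
    matrices (each of shape (N, F)), then negated."""
    return -np.min(np.stack(hop_features, axis=0), axis=0)
```

Negating the minimum keeps, for every feature position, the (sign-flipped) most conservative activation across hop orders, so no single hop dominates the fused representation.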

Following DIFFPOOL [

To address the limitations of the Text GCN, we propose a propagation model of multihop neighbor information fusion graph convolution as follows:

In formula (
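A full layer of the proposed propagation model can be sketched as below. This is an illustrative reading, not the paper's exact code: it assumes the normalized adjacency `A_hat` is precomputed, and that each hop order k reuses the same trainable weight `W` (the weight-sharing mechanism) before the MIF fusion:

```python
import numpy as np

def nmgc_layer(A_hat, X, W, K=2):
    """One multihop fusion layer: compute A_hat^k X W for k = 1..K with a
    single shared weight W, then fuse with the negative-minimum (MIF) operator."""
    hop_features, H = [], X
    for _ in range(K):
        H = A_hat @ H                 # raise the hop order by one
        hop_features.append(H @ W)    # the same W is reused for every hop
    return -np.min(np.stack(hop_features, axis=0), axis=0)
```

Because `A_hat @ H` is applied iteratively, the k-hop term never materializes the dense power of the adjacency, which keeps the per-layer cost close to that of a 1-hop convolution.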


In Figure

The proposed model.

The second layer is the traditional graph convolution. We set the nonlinear activation function

In the preliminary network design, we compare how many convolutional layers and hops fit our model. The two-convolutional-layer network performs better than networks with three or more convolutional layers. When we implement the multihop neighbor information fusion, we observe that

When

When

The cross-entropy loss is utilized as our model loss function:
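In this transductive setting only the labeled training documents contribute to the loss. A hedged sketch of such a masked cross-entropy, assuming `Z` holds the softmax class probabilities and `Y` the one-hot labels:

```python
import numpy as np

def masked_cross_entropy(Z, Y, mask):
    """Cross-entropy averaged over labeled nodes only.
    Z: predicted class probabilities (N, C); Y: one-hot labels (N, C);
    mask: boolean array (N,) marking the training documents."""
    per_node = -np.sum(Y * np.log(Z + 1e-12), axis=1)
    return per_node[mask].mean()
```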

Because the actual running time is sensitive to hardware and implementations, we follow He and Sun [

Since the different-hop graph convolutions share the same weight within a layer, the parameter quantity is consistent with that of a 1-hop graph convolution. It is assumed that the hidden dimension is fixed in the 1^{st} layer and in the 2^{nd} layer. Then, the output dimension of every hop convolution in the 1^{st} layer is the same, namely,

We evaluate our NMGC-2 and NMGC-3 on text networks and compare them with classic and deep learning methods, including embedding-based, CNN-based, LSTM-based, and GCN-based models. We analyse computational complexity and trainable parameters in detail, and we investigate the impact of network framework parameters and training epochs on classification accuracy.

We test our methods on five benchmark corpora datasets including R52 and R8 of Reuters-21578, 20-Newsgroups (20NG), Ohsumed, and Movie Review (MR). According to the preprocessing steps by Yao et al. [

Dataset statistics.

Dataset | #training | #test | #docs | #classes | #words | #nodes | #length |
---|---|---|---|---|---|---|---|

R52 | 6532 | 2568 | 9100 | 52 | 8892 | 17,992 | 69.82 |

R8 | 5485 | 2189 | 7674 | 8 | 7688 | 15,362 | 65.72 |

20NG | 11,314 | 7532 | 18,846 | 20 | 42,757 | 61,603 | 221.26 |

Ohsumed | 3357 | 4043 | 7400 | 23 | 14,157 | 21,557 | 135.82 |

MR | 7108 | 3554 | 10,662 | 2 | 18,764 | 29,426 | 20.39 |

We compare with the following baseline methods as in Yao et al. [

We tune a series of hyperparameters (learning rate, dropout rate, hidden units, and epochs) to determine the best values for our model on the text networks. The hyperparameters are reported in Tables

The hyperparameters in NMGC-2.

Dataset | Learning rate | Dropout rate | Hidden units | Epochs |
---|---|---|---|---|

R52 | 0.005 | 0.4 | 128 | 800 |

R8 | 0.01 | 0.4 | 64 | 200 |

20NG | 0.005 | 0.5 | 128 | 330 |

Ohsumed | 0.005 | 0.4 | 128 | 410 |

MR | 0.01 | 0.4 | 64 | 40 |

The hyperparameters in NMGC-3.

Dataset | Learning rate | Dropout rate | Hidden units | Epochs |
---|---|---|---|---|

R52 | 0.005 | 0.5 | 128 | 1300 |

R8 | 0.015 | 0.4 | 128 | 180 |

20NG | 0.01 | 0.5 | 128 | 265 |

Ohsumed | 0.01 | 0.6 | 128 | 210 |

MR | 0.01 | 0.4 | 32 | 85 |

We compare our NMGC-2 and NMGC-3 with other baseline methods in terms of test accuracy. As shown in Table

Test accuracy on five text datasets. The benchmark results were reported by Yao et al. [

Method | R52 | R8 | 20NG | Ohsumed | MR |
---|---|---|---|---|---|

CNN-rand [ | 85.37 | 94.02 | 76.93 | 43.87 | 74.98 |

CNN-pretrain [ | 87.59 | 95.71 | 82.15 | 58.44 | |

PTE [ | 90.71 | 96.69 | 76.74 | 53.58 | 70.23 |

LSTM [ | 85.54 | 93.68 | 65.71 | 41.13 | 75.06 |

LSTM-pretrain [ | 90.48 | 96.09 | 75.43 | 51.10 | 77.33 |

fastText [ | 92.81 | 96.13 | 79.38 | 57.70 | 75.14 |

fastText-bigrams [ | 90.99 | 94.74 | 79.67 | 55.69 | 76.24 |

LEAM [ | 91.84 | 93.31 | 81.91 | 58.58 | 76.95 |

SWEM [ | 92.94 | 95.32 | 85.16 | 63.12 | 76.65 |

GCNN-S [ | 92.74 | 96.80 | — | 62.82 | 76.99 |

GCNN-F [ | 93.20 | 96.89 | — | 63.04 | 76.74 |

GCNN-C [ | 92.75 | 96.99 | 81.42 | 63.86 | 77.22 |

Text GCN [ | 93.56 ± 0.18 | 97.07 ± 0.10 | 86.34 ± 0.09 | 68.36 ± 0.56 | 76.74 ± 0.20 |

NMGC-2 (ours) | 94.35 ± 0.06 | 97.31 ± 0.09 | 86.61 ± 0.06 | 69.21 ± 0.17 | 76.21 ± 0.25 |

NMGC-3 (ours) | 93.83 ± 0.16 | 97.16 ± 0.10 | 86.68 ± 0.18 | 68.20 ± 0.35 | 76.36 ± 0.40 |

The success of NMGC-2 and NMGC-3 is mainly due to the following three aspects. (1) Both models can capture the word-word and document-word relations in the datasets. (2) Both make full use of the advantages of the GCN: in each layer, feature information is aggregated over a node and its 1-hop neighbors, so that the node (word-word and document-word) features within the same cluster become similar and thus easy to classify. (3) Both capture more and richer feature information from 1-hop to

On dataset MR, CNN-pretrain [

Compared to Text GCN [

To evaluate the relationship between hidden units and model performance, we conduct experiments with different numbers of hidden units, choosing a representative set that balances computational complexity and classification performance. The results are summarized in Table

Comparison of hidden units.

Dataset | Model | Hidden units | Epochs | Test Acc. |
---|---|---|---|---|

R52 | NMGC-2 (ours) | 128 | 800 | 94.35 ± 0.06 |

64 | 1200 | 94.23 ± 0.20 | ||

32 | 2000 | 94.16 ± 0.14 | ||

NMGC-3 (ours) | 128 | 1300 | 93.83 ± 0.16 | |

64 | 1100 | 93.29 ± 0.26 | ||

32 | 2000 | 92.85 ± 0.37 | ||

R8 | NMGC-2 (ours) | 128 | 150 | 97.25 ± 0.13 |

64 | 200 | 97.31 ± 0.09 | ||

32 | 330 | 97.30 ± 0.13 | ||

NMGC-3 (ours) | 128 | 180 | 97.16 ± 0.10 | |

64 | 220 | 96.92 ± 0.08 | ||

32 | 300 | 96.69 ± 0.06 | ||

20NG | NMGC-2 (ours) | 128 | 330 | 86.61 ± 0.06 |

64 | 480 | 86.55 ± 0.12 | ||

32 | 600 | 86.02 ± 0.12 | ||

NMGC-3 (ours) | 128 | 265 | 86.68 ± 0.18 | |

64 | 420 | 86.14 ± 0.19 | ||

32 | 540 | 85.64 ± 0.13 | ||

Ohsumed | NMGC-2 (ours) | 128 | 410 | 69.21 ± 0.17 |

64 | 495 | 69.07 ± 0.28 | ||

32 | 795 | 68.46 ± 0.24 | ||

NMGC-3 (ours) | 128 | 210 | 68.20 ± 0.35 | |

64 | 305 | 67.50 ± 0.24 | ||

32 | 415 | 66.25 ± 0.35 | ||

MR | NMGC-2 (ours) | 128 | 35 | 76.19 ± 0.43 |

64 | 40 | 76.21 ± 0.25 | ||

32 | 55 | 76.16 ± 0.26 | ||

NMGC-3 (ours) | 128 | 50 | 76.27 ± 0.32 | |

64 | 70 | 76.24 ± 0.11 | ||

32 | 85 | 76.36 ± 0.40 |

We design a weight-sharing mechanism that shares the weight within the proposed convolutional layer, which reduces the number of parameters. With weight sharing, our calculations are very efficient, which naturally reduces computational complexity; it also avoids the overfitting that many parameters can cause.
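The saving can be illustrated with a back-of-the-envelope count. The numbers below are illustrative assumptions drawn from the tables in this paper: 15,362 nodes (the R8 graph), 8 classes, 200 hidden units for Text GCN and 64 for NMGC-2; with weight sharing, extra hops add no extra weights:

```python
def layer_params(dims):
    """Trainable weights of stacked graph convolutions: one W per layer,
    of size d_in x d_out, summed over consecutive layer dimensions."""
    return sum(d_in * d_out for d_in, d_out in zip(dims, dims[1:]))

# Illustrative: R8 graph with 15,362 nodes and 8 classes.
text_gcn = layer_params([15362, 200, 8])  # 200 hidden units
nmgc_2 = layer_params([15362, 64, 8])     # 64 hidden units, shared across hops
```

Under these assumptions NMGC-2 trains roughly a third as many weights as Text GCN, in line with the comparison in the table below.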

As shown in Table

Comparison of computational complexity and the number of trainable weight parameters. Comp. and Params. denote the computational complexity and parameters, respectively. Constant numbers 1, 2, and 3 represent the hops of the graph convolutional network, and constant numbers 200, 64, 128, and 32 denote the number of hidden units.

Method | Comp. | Params. |
---|---|---|

Text GCN [ | O( | |

NMGC-2 (ours) | O( | O( |

NMGC-3 (ours) | O( | O( |

In this work, we propose a new multihop neighbor information fusion graph convolutional network for graph-structured data. We develop a novel MIF operator to combine the graph convolution features of multihop neighbor information from the 1-hop graph convolution to

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work was funded by the National Natural Science Foundation of China (U1701266, 61702117, and 61672008), the Guangdong Provincial Key Laboratory of Intellectual Property and Big Data (2018B030322016), the Special Projects for Key Fields in Higher Education of Guangdong (2020ZDZX3077), and in part by Qingyuan Science and Technology Plan Project (Grant nos. 170809111721249 and 170802171710591).