With the extensive usage of social media platforms, spam information, especially rumors, has become a serious problem on social networks. Rumors make it difficult for people to obtain credible information from the Internet and can cause social panic. Existing detection methods typically rely on a large amount of training data; however, the number of identified rumors is usually insufficient for training a stable detection model. To handle this problem, we propose a deep transfer model to achieve accurate rumor detection on social media platforms. In detail, an adaptive parameter tuning method is proposed to solve the negative transfer problem in the parameter transferring process. Experiments based on real-world datasets demonstrate that the proposed model achieves more accurate rumor detection and significantly outperforms state-of-the-art rumor detection models.

With the rapid development of mobile Internet technology, online social networking (OSN), a novel information publishing and sharing platform, has become an essential part of our daily life. OSN platforms such as Facebook, Twitter, Weibo, and WeChat have triggered a media revolution with their interactivity, immediacy, and diversity, which has profoundly affected all aspects of our society and economy. However, the existence of false information makes it difficult for OSN users to obtain credible information on these platforms. Rumors, the most common type of false information, are false messages that spread among a large number of people and mislead them [

Most existing rumor detection methods treat rumor detection as a binary classification task and employ learning algorithms that incorporate a wide variety of features [

Transfer learning (TL) is a branch of machine learning (ML) algorithms, which leverages the knowledge stored within a source domain and provides a method to transfer the knowledge of the source domain to a target domain [

The deep neural network incorporates the domain knowledge into the parameters of their nodes during the training process. We can transfer the related knowledge embedded in neural networks to the rumor detection domain by reusing the parameters of the neural networks. This paper proposes a deep transfer model based on CNN to approach an accurate rumor detection scheme. In detail, we propose a learning rate adaptive update method to solve the negative transferring problem in the transfer process.

The main contributions are listed as follows:

A novel deep transfer model based on CNN for rumor detection is proposed, which can effectively identify rumors without sufficient training data. We find that the knowledge learned from large-scale datasets in the field of e-commerce reviews shares similar features with the knowledge about the characteristics of rumors; this knowledge is used to train a source model whose parameters are transferred to the rumor detection model.

We propose a learning rate adaptive update method to solve the negative transfer problem during the parameter transfer process. In detail, based on the stochastic gradient descent algorithm, we derive an adaptive learning rate updating method for fine-tuning the rumor detection model obtained in the transfer process.

We implement the proposed detection scheme on an open source deep learning platform, TensorFlow [

The rest of the paper is organized as follows. In Section

In this section, we focus on providing a brief review of the work most closely related to effective and efficient rumor detection. We outline related research approaches in three fields: rumor detection, deep learning, and transfer learning.

Rumor is a powerful, pervasive, and persistent force that misleads people and groups [

Early exploration started from two special studies on rumor propagation during natural disasters like earthquakes and hurricanes [

Deep learning models simulate the human brain's thinking patterns to discover various characteristics of texts. Therefore, the accuracy of deep learning models is often higher than that of traditional rumor detection techniques. Recently, deep neural networks have emerged as the prevailing technical solution in almost all fields of natural language processing (NLP). Word embedding is the basis for deep learning to solve many NLP problems [

Transfer Learning can alleviate the lack of labelled training data for training a deep learning model [

Difference between transfer learning and traditional machine learning.

In view of the excellent performance of transfer learning for constructing a deep learning model without sufficient training data, this paper proposes a scheme for rumor detection based on transfer learning in the next section.

After the deep learning model completes its training process, the domain knowledge will be fixed into the model parameters. When the training data are insufficient, an effective training model cannot be obtained, as shown in Figure

The effect of the training dataset size.

In this section, we propose a deep transfer model, namely, TL-CNN. It achieves accurate rumor detection by using review evaluation knowledge from the e-commerce domain. In detail, based on the stochastic gradient descent algorithm, we propose an adaptive learning rate updating method for fine-tuning the model obtained in the transfer process. The overall framework of the model is illustrated in Figure

The framework of TL-CNN.

Basic detection model.

The convolutional neural network (CNN) model was originally proposed in computer vision and has proven effective in natural language processing, semantic analysis, and other traditional NLP tasks [

Parameter configuration.

Attribute | Value |
---|---|

Convolutional units | 3, 4, 5 |

Feature maps | 100 |

Activation function | Softmax |

Pooling | 1-max pooling |

Dropout rate | 0.5 |

l2 norm constraint | 3 |

Batch_size | 50 |

The embedding layer is the first layer of the basic detection model and preprocesses the raw data. In detail, the embedding layer formulates the original input data as a matrix. For a text classification task, a sentence is represented as a vector of word identifiers, called a word vector. All the input data of the model consist of a

The convolutional layer is the core building block of a convolutional neural network. It consists of several convolutional units, and the parameters of each convolutional unit are optimized by the backpropagation algorithm. Each convolutional unit covers a part of the input matrix.

The difference between our convolutional layer and the existing convolutional layer is illustrated in Figure

Embedding layer. (a) Traditional convolutional neural network. (b) Our convolutional neural network.

Ultimately, the feature value corresponding to each convolution unit, as well as the corresponding feature vector, is obtained. In detail, we use formula (

The feature values obtained on the convolution units of a specific size comprise the corresponding feature matrix.

To remove trivial feature values from the feature map, a max pooling layer is used to reduce the number of features, which lowers the computational complexity of the CNN and reduces overfitting. Max pooling is the most popular pooling operation: it takes the maximum values in the feature map after performing a dot product between the weight matrix and the feature map. The weight matrix is valuable for obtaining the most important features included in the feature map. This paper uses the max pooling operation to process the results of the convolutional layer and obtains a brief semantic representation of the input texts,
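The embedding, convolution, and 1-max pooling steps described above can be sketched in NumPy as follows. The vocabulary size, embedding dimension, and number of filters here are illustrative, not the paper's exact configuration (Table 1 uses convolutional units of sizes 3, 4, and 5 with 100 feature maps each):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim = 1000, 8              # illustrative sizes
sentence = np.array([4, 17, 256, 3, 99])     # word identifiers (the word vector)

# Embedding layer: map word identifiers to an n x embed_dim matrix.
embedding = rng.standard_normal((vocab_size, embed_dim))
X = embedding[sentence]                      # shape (5, embed_dim)

def conv_1max(X, kernel_size, n_filters, rng):
    """Slide each filter over kernel_size-word windows, then 1-max pool."""
    n_words, d = X.shape
    W = rng.standard_normal((n_filters, kernel_size * d))
    feats = np.empty((n_words - kernel_size + 1, n_filters))
    for i in range(n_words - kernel_size + 1):
        window = X[i:i + kernel_size].ravel()    # one region of the input matrix
        feats[i] = W @ window                    # one feature value per filter
    return feats.max(axis=0)                     # 1-max pooling per feature map

# Convolutional units of sizes 3, 4, 5 as in Table 1, with fewer filters here.
pooled = np.concatenate([conv_1max(X, k, 4, rng) for k in (3, 4, 5)])
print(pooled.shape)   # (12,): a brief semantic representation of the text
```

The concatenated maxima form the fixed-length representation passed on to the fully connected layer, regardless of the original sentence length.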

The fully connected layer is a regular hidden layer of a multilayer neural network that makes higher-order decisions. It receives inputs from the pooling layer. To avoid overfitting, dropout randomly discards some inputs from the pooling layer, and a Softmax-based fully connected layer then produces the class probabilities.

In this way, the layer aggregates the most important features contributing to the classification, connecting the overall features of the input texts.

The output layer is responsible for obtaining the final detection result. The weight matrix of this layer is highly related to the characteristics of the detection targets and is invaluable for transfer learning. The output layer is defined as follows:

Normalize

Transfer learning algorithms reuse existing knowledge in the target domain to solve the problem of training data (i.e., labelled data) shortage. The hierarchical architecture of the deep neural network model is very suitable for transfer learning [
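As a sketch of this layerwise parameter reuse, the lower layers of a trained source model can be copied while the layers most tied to the source labels are re-initialized. The layer names and shapes below are illustrative, and re-initializing the output layer is a common transfer learning practice, not necessarily the paper's exact choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Weights of a source model trained on the large e-commerce review dataset
# (shapes are illustrative only).
source_model = {
    "embedding": rng.standard_normal((1000, 8)),
    "conv_3":    rng.standard_normal((4, 24)),
    "fc":        rng.standard_normal((12, 16)),
    "output":    rng.standard_normal((16, 2)),   # tied to source-domain labels
}

def transfer_parameters(source, reinit_layers, rng):
    """Copy all layers from the source model, but re-initialize the layers
    whose weights encode source-domain-specific knowledge."""
    target = {}
    for name, W in source.items():
        if name in reinit_layers:
            target[name] = rng.standard_normal(W.shape) * 0.01  # fresh start
        else:
            target[name] = W.copy()   # reuse the transferred knowledge
    return target

# The output weights are closely tied to the source labels, so start afresh.
target_model = transfer_parameters(source_model, {"output"}, rng)
print(np.array_equal(target_model["conv_3"], source_model["conv_3"]))  # True
print(np.array_equal(target_model["output"], source_model["output"]))  # False
```

The reused layers then serve as the starting point for fine-tuning on the small rumor dataset.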

Transfer learning process.

As illustrated in Figure

Accuracy rates of the original detection models.

Model | Dataset | Accuracy (%) |
---|---|---|

Basic | YELP-2 | 89.71 |

TL-CNN-Strawman | FBN | 67.85 |

As depicted in Table

In order to solve this problem, we need to fine-tune the hyperparameters in the model training process to avoid negative transfer, instead of reusing the parameters of the basic detection model in a straightforward way. As depicted in Figure

To achieve an effective layerwise tuning scheme, we analyze the effect of the hyperparameters in the training process. Among them, the learning rate is the most important hyperparameter: it determines both the efficiency of the training process and the accuracy of the trained model. With an appropriate learning rate, an accurate model can be obtained quickly. If the learning rate is too high, the model will overshoot the optimal point and need many iterations to converge. On the other hand, a low learning rate leads to a longer training process and can cause the model to get stuck at a local optimum. Different layers in the neural network capture different types of features, so the parameters of these layers should be tuned with different learning rates. Given the limited amount of training data in the rumor detection domain, we apply discriminative fine-tuning to configure each layer with its own learning rate, instead of using the same learning rate for all layers of the rumor detection model.
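Discriminative fine-tuning can be sketched as a plain SGD step where each layer uses its own learning rate. The layer names and rate values below are illustrative, following the common convention that general lower layers are tuned gently while the task-specific output layer moves more:

```python
import numpy as np

# One illustrative learning rate per layer (not the paper's exact values).
layer_lrs = {"embedding": 1e-4, "conv": 5e-4, "fc": 1e-3, "output": 5e-3}

def discriminative_sgd(weights, grads, layer_lrs):
    """Plain SGD, but each layer is updated with its own learning rate."""
    return {name: W - layer_lrs[name] * grads[name]
            for name, W in weights.items()}

# Toy weights and gradients to show the layerwise effect of one step.
weights = {name: np.ones(3) for name in layer_lrs}
grads   = {name: np.full(3, 2.0) for name in layer_lrs}
updated = discriminative_sgd(weights, grads, layer_lrs)
print(updated["embedding"][0], updated["output"][0])   # 0.9998 0.99
```

With identical gradients, the output layer moves fifty times farther than the embedding layer, which is exactly the layerwise asymmetry discriminative fine-tuning is meant to provide.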

To discover the optimal learning rate for each layer, we propose a learning rate updating scheme to update the learning rate in a reasonable way. In detail, stochastic gradient descent (SGD) [

The adaptive learning rate updating rule is as follows:

Updating learning rate

where

where

Updating the weight of this layer

The detailed updating process is introduced in Algorithm

Input: Batch_size, decay rate, weight parameter, gradient parameter, learning rate, hyperparameter

(1) Initialize parameter

(2) Obtain

(3) Calculate the loss function

(4) Calculate the moving average of the uncentered variance over the past first-order gradients of the loss function

(5) Update the learning rate:

(6) Calculate the past first-order gradient of the weights:

(7) Update the weights of this layer:
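Since the exact formulas are not reproduced here, the following NumPy sketch is one plausible reading of these steps, in which each layer's learning rate is scaled by the moving average of the uncentered variance of its past gradients (an RMSProp-style rule; the decay rate, base rate, and epsilon are illustrative):

```python
import numpy as np

def adaptive_sgd_step(W, grad, state, base_lr=0.01, decay=0.9, eps=1e-8):
    """One layerwise update: track the moving average of the uncentered
    variance of past gradients, derive this layer's learning rate from it,
    then apply the SGD weight update."""
    # Moving average of the uncentered variance of past first-order gradients.
    state["v"] = decay * state["v"] + (1.0 - decay) * grad**2
    # Adapt the learning rate: large recent gradients shrink the step.
    lr = base_lr / (np.sqrt(state["v"]) + eps)
    # Update the weights of this layer.
    return W - lr * grad, state

# Minimize f(w) = w^2 for a single "layer" weight; the gradient is 2w.
W = np.array([5.0])
state = {"v": np.zeros_like(W)}
for _ in range(1000):
    grad = 2.0 * W
    W, state = adaptive_sgd_step(W, grad, state)
print(W)   # W has moved close to the optimum at 0
```

The key property for fine-tuning is that layers with volatile gradients automatically take smaller steps, which is one way to dampen the disruption of transferred parameters.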

In order to evaluate the proposed rumor detection scheme, we implemented it and the baseline schemes on TensorFlow, a machine learning system that operates at large scale and in heterogeneous environments and is the second-generation artificial intelligence learning system developed by Google. On real-world datasets, we comprehensively compare the proposed scheme and the baseline schemes in terms of different accuracy metrics.

The Yelp review dataset was obtained from the 2015 Yelp dataset challenge [

The FBN dataset is a small rumor dataset that is about all five events and includes 5,802 labelled tweets [

These two datasets are used, respectively, in the basic detection model and the rumor detection model obtained in the transfer learning process, which is named as

Statistics for the datasets.

Datasets | Domain type | Class | Size |
---|---|---|---|

YELP-2 | E-commerce review | Positive/negative review | 1,569,264 |

FBN | Social media message | Regular message/rumor | 5,802 |

We provide the relevant parameters used in the proposed model in Table

We initially train the basic detection model on YELP-2 dataset. After the transfer learning process, we fine-tune the obtained rumor detection model on FBN dataset. To extensively evaluate the performance of the rumor detection model, we implement the proposed scheme (TL-CNN) on TensorFlow as well as three state-of-the-art baseline schemes. The detailed information of these three baseline schemes is introduced as follows:

VDCNN-based model:

VDCNN is a convolutional neural network-based model proposed by Gereme and Zhu [

Char-CNN-based model:

Based on a character-level convolutional network, Char-CNN was proposed by Joo and Hwang [

RCNN-based model:

RCNN is a model proposed by Fang et al. [

We use accuracy rate, precision rate, recall rate,

Accuracy rate is the ratio of correctly classified messages (both rumors and regular messages) to the total number of messages.

Precision rate is calculated as the ratio of all messages correctly classified as rumors (TP) to all messages classified as rumors (TP + FP).

Recall is the ratio of all messages correctly classified as rumors (TP) to all messages that should be classified as rumors (TP + FN).

Accuracy gain
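The metrics above can be computed directly from confusion-matrix counts, treating "rumor" as the positive class. The counts in this small illustration are hypothetical, not taken from the paper's experiments:

```python
def rumor_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    with 'rumor' as the positive class."""
    total = tp + fp + fn + tn
    accuracy  = (tp + tn) / total       # all correct / all messages
    precision = tp / (tp + fp)          # correct rumors / predicted rumors
    recall    = tp / (tp + fn)          # correct rumors / actual rumors
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only.
acc, prec, rec, f1 = rumor_metrics(tp=80, fp=20, fn=20, tn=80)
print(acc, prec, rec, f1)   # 0.8 0.8 0.8 0.8
```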

To evaluate the accuracy of the proposed rumor detection model TL-CNN, we compare the rumor detection results of TL-CNN to the results of the baseline models. As depicted in Figure

Trend of accuracy.

In Figure

Trend of accuracy gain.

A detailed result is listed in Table

Accuracy evaluation.

Model | Accuracy (%) | Precision (%) | Recall (%) | F1 |
---|---|---|---|---|

VDCNN | 71.88 | 60.59 | 59.12 | 0.5973 |

RCNN | 79.53 | 76.92 | 82.79 | 0.7997 |

Char-CNN | 83.21 | 78.80 | | 0.8195 |

TL-CNN | 84.76 | | | |

The proposed model was trained within 529.57 minutes on Yelp and FBN. On average, it processes 6.6963 samples per second during training and 2.4361 samples per second during testing. A detailed result is listed in Table

Efficiency evaluation.

Model | Training (min) | Test (data/s) |
---|---|---|

VDCNN | 1482.58 | 3.69 |

RCNN | 356.19 | 5.17 |

Char-CNN | 275.83 | 6.78 |

TL-CNN | 529.57 | 6.70 |

In this paper, we present an effective deep transfer model based on a convolutional neural network, TL-CNN, to detect rumors with a limited amount of training data. Considering the phenomenon of negative transfer during the transfer learning process, we propose an adaptive learning rate tuning method to avoid it. Extensive experiments on real-world datasets demonstrate that the proposed rumor detection model significantly improves the accuracy of rumor detection and can be applied to social media, e-commerce, and other fields.

Previously reported text data, Yelp and FBN, were used to support this study and are available at WOS:000450913101042 and ArXiv:1610.07363v1. These prior studies (and datasets) are cited at relevant places within this paper as references [

The authors declare that they have no conflicts of interest.

This study was supported by the National Key Research and Development Program of China (2018YFB1800403 and 2016YFE0121500), the National Natural Science Foundation of China (61902382, 61972381, 61672500, 61962045, 61502255, and 61650205), the Strategic Priority Research Program of Chinese Academy of Science (XDC0203500), and the Natural Science Foundation of Inner Mongolia Autonomous Region (2017MS(LH)0601 and 2018MS06003).