Fake Detect: A Deep Learning Ensemble Model for Fake News Detection

The pervasive use and continued development of social media networks have given fake news a platform from which to spread quickly. Fake news often misleads people and creates false perceptions in society, and the spread of low-quality news on social media has negatively affected both individuals and society. In this study, we propose an ensemble-based deep learning model to classify news as fake or real using the LIAR dataset. Due to the nature of the dataset attributes, two deep learning models were used. For the textual attribute "statement," a Bi-LSTM-GRU-dense deep learning model was used, while for the remaining attributes, a dense deep learning model was used. Experimental results showed that the proposed model achieved an accuracy of 0.898, recall of 0.916, precision of 0.913, and F-score of 0.914 using only the statement attribute. Moreover, the proposed model compares favorably with previous studies on fake news detection using the LIAR dataset.


Introduction
The advancement of hand-held devices and high-speed Internet has exponentially increased the number of digital media users. According to the digital global report 2020, the number of users for digital media reached 4.75 billion, and the social media users reached 301 million in 2020 [1]. This digitalization has turned the world into a global village. Due to this advancement, individuals are just one click away from information worldwide. Despite several advantages, this transformation has raised some challenges. Fake news is one of the challenges faced by the digital community nowadays.
Fake news is pervasive propaganda that spreads misinformation online, using social media like Facebook, Twitter, and Snapchat to manipulate public perceptions. Social media has two sides for news consumption: it can be used to update the community about the latest news, but it can also be a source of false news. Social media offers low-cost, quick access to news and fast distribution of information about what is happening worldwide. At the same time, its simplicity and the lack of control on the Internet allow "fake news" to spread widely.
Fake news has become a focal point of discussion in the media over the past three years due to its impact on the 2016 US Presidential election [2]. Reports showed that the human capability for detecting deception without special assistance is only 54% [3]. Therefore, there is a need for an automated way to classify fake and real news accurately. Some studies have been conducted, but further attention and exploration are still needed. The proposed study attempts to curb the spread of rumors and fake news and helps people identify whether a news source is trustworthy by automatically classifying the news. The organization of this paper is as follows. Section 2 includes a review of previous studies. Section 3 explains the proposed methodology, which contains the "LIAR" dataset description, preprocessing, and classification models used. Section 5 includes the experimental setup, results, and discussion. Finally, Section 6 contains the conclusion of this paper.

Related Studies
One of the earlier studies on fake news detection and automatic fact-checking with more than a thousand samples was done by [4] using the LIAR dataset. The dataset contains 12.8 K human-labeled short statements from POLITIFACT.COM. The statements were labeled in six different categories: pants fire, false, barely true, half true, mostly true, and true. The study used several classifiers such as logistic regression (LR), support vector machine (SVM), a bidirectional long short-term memory (Bi-LSTM) network model, and a convolutional neural network (CNN) model. For LR and SVM, the study used the LIBSHORTTEXT toolkit, which showed strong performance on short text classification problems. The study compared several techniques using text features only and achieved an accuracy of 0.204 and 0.208 on the validation and test sets. Due to overfitting, the Bi-LSTMs did not perform well. However, the CNN outperformed all models, resulting in an accuracy of 0.270 on the holdout data split.
Similarly, another study compared three datasets: the LIAR dataset, the fake or real news dataset [5], and a dataset generated by collecting fake news and real news from the Internet [6]. The study compared various conventional machine learning models such as SVM, LR, decision tree (DT), AdaBoost (AB), Naive Bayes (NB), and K nearest neighbor (KNN), using lexical, sentiment, unigram, and bigram techniques with term frequency and inverse document frequency (TF-IDF). Furthermore, several neural network models such as NN, CNN, LSTM, Bi-LSTM, hierarchical attention network (HAN), convolutional HAN, and character-level C-LSTM were also used with GloVe embedding and character embedding to train the model. They found that the performance of the LSTM model depends heavily on the size of the dataset. The results showed that NB with n-gram (bigram TF-IDF) features produced the best outcome of approximately 0.94 accuracy on the combined corpus dataset.
The study by [4] indicated that the CNN model outperformed the other models on the LIAR dataset, whereas the study by [6] showed that the CNN model was the second best for all the datasets. The NB model showed the best performance for the LIAR dataset with 0.60 accuracy and 0.59 F1-score. For the fake or real news dataset, the character-level C-LSTM showed the best performance with 0.95 accuracy and 0.95 F1-score. LSTM-based models showed the best outcome on the combined corpus dataset, where both Bi-LSTM and C-LSTM produced an accuracy of 0.95 and an F1-score of 0.95. Furthermore, another study was performed by Girgis et al. [3] regarding the spread of fake news; it used recurrent neural network (RNN) models (vanilla RNN and gated recurrent unit (GRU)) and long short-term memories (LSTMs) on the LIAR dataset to predict fake news. They compared and analyzed their results against Wang's [4] findings. Although similar results were achieved, GRU (0.217) outperformed the other models. Nevertheless, in comparison with the findings of Wang, they found that CNN is better in terms of speed and outcomes. Similarly, the authors in [7] used the LSTM model on the LIAR dataset. They found that adding the speaker profile enhances the performance of the algorithm. The model achieved an accuracy of 0.415. Moreover, the study by [8] proposed a novel approach to fake news detection using two metaheuristic algorithms, salp swarm optimization (SSO) and grey wolf optimization (GWO). The study performed experiments using three different datasets: BuzzFeed Political News, Random Political News, and LIAR Benchmark. The results showed that the GWO algorithm outperformed SSO and the other algorithms. GWO obtained the best accuracy on all datasets and produced the highest precision and F1-score on two out of three datasets. Moreover, the precision of SSO was better than that of all the other algorithms on two out of three datasets.
The results obtained from the two algorithms were very promising because the representation structure and flexible fitness function handled many different objectives simultaneously and efficiently. The study recommended using different similarity metrics in model construction and testing to improve the performance of their model. Binary versions of metaheuristic optimization techniques can also be used on the converted document vector. Similarly, to improve the results of the study, adaptive and hybrid versions of the SSO and GWO algorithms were proposed.
Another study [9] used a self multihead attention-based CNN (SMHACNN). The study implemented CNN and self multihead attention (SMHA) techniques and evaluated the truthfulness of news based on its content. The experiments were conducted on a public dataset collected from fakenews.mit.edu. The study conducted two experiments using 5-fold cross-validation, and the results showed that the model was effective in detecting fake news, with a precision of 0.95 and a recall of 0.95. In addition, the authors compared their results with previous work and showed that combining self multihead attention with the CNN achieved remarkable performance.
Additionally, the authors in [10] developed an exploratory analysis model using Facebook news during the 2016 US Presidential election. The model is based on the elaboration likelihood model as well as numerous cognitive and visual indicators of information, most of which have already been shown to affect the quality of online information. The study investigated how the cognitive, visual, affective, and behavioural cues of news posts, together with the addressed user community, can be used by machine learning models to detect fake news automatically. The study used a BuzzFeed dataset of Facebook posts. They trained many machine learning models appropriate for binary classification. The classifiers were LR, SVM, DT, random forest (RF), and extreme gradient boosting (XGB), and all were trained with the same feature set. The study achieved a highest accuracy of 0.80 and a recall of approximately 0.90.
A study used a hybrid approach combining deep learning, natural language processing (NLP), and semantics using the LIAR and PolitiFact datasets [11]. The study compared the performance of some classical machine learning models like multinomial Naïve Bayes (MNB), stochastic gradient descent (SGD), LR, DT, and SVM with DL models like CNN, basic LSTM, Bi-LSTM GRU, and CapsNet. The study found that CapsNet outperformed the other models with an accuracy of 0.649 using the LIAR dataset. The integration of semantic features such as named entity recognition (NER) and sentiments in the LIAR dataset enhanced the performance of the classification model. Similarly, another study also compared the performance of machine learning and DL models and found similar performance for SVM and Bi-LSTM, with an accuracy of 0.61 using the LIAR dataset [12]. However, the training time of the Bi-LSTM was very long. Recently, a study used an ensemble-based machine learning approach for the classification of fake news using two datasets, LIAR and ISOT [13]. The ensemble model used DT, RF, and extra tree classifiers. The study achieved a testing accuracy of 44.15%.
Despite the several studies already conducted on fake news detection, there is still room for further improvement and investigation. The studies mentioned above highlight the significance of CNN and deep learning models for the classification of fake news. The LIAR dataset is also one of the most widely used benchmark datasets for fake news detection. In our study, we attempt to develop an ensemble-based deep learning model for fake news classification that produces a better outcome than previous studies using the LIAR dataset.

Material and Methods
This section presents an overview of the dataset, the preprocessing techniques, and a description of the deep learning models used for classification. Figure 1 represents the proposed study methodology. The dataset contains two types of features: a short textual feature, i.e., the statement, and other features like speaker job title, subject, and venue. Therefore, the features were initially divided according to their category. For the statement attribute, several NLP techniques like tokenization, lemmatization, and stop word removal were used. For the other category of features, different data preprocessing techniques were applied, which are discussed further in the preprocessing section.

Dataset Description
The study used the "LIAR" dataset [4], which contains 12.8 K human-labeled short statements from POLITIFACT.COM; each statement is checked for its truthfulness by a POLITIFACT.COM editor. The label has six categories to rate accuracy: pants fire, false, barely true, half true, mostly true, and true. The dates for the statements are primarily from 2007 to 2016. The speakers include a combination of democrats and republicans, and for each speaker, there is a rich collection of metadata that includes historical counts of false statements. The statements are sampled from different contexts/venues, and the speakers discuss a diverse set of subjects. Table 1 shows the description of the dataset. The statistical analysis of the historical counts of inaccurate statements for each speaker is also presented in the table. For numeric variables, the mean (μ), standard deviation (σ), and range are reported; for categorical variables, the number of categories is reported. The table also contains the number of missing values per attribute. In the dataset, only three attributes have missing values, namely, speaker's job title, state info, and the context. The study used the records with the class labels true and false, for a total of 4557 records. The number of news records with the true class label is 2053 and with the false class label is 2504.

Preprocessing
Several preprocessing techniques were applied to the dataset. Initially, the dataset consists of 14 attributes.
Three attributes have missing values, namely, state_info, speaker's job title, and venue. State info was removed from the study due to the low relevance of the attribute. However, the other two attributes with missing values, namely, speaker's job title and venue, were included for further analysis. In the speaker's job title and venue attributes, missing values were replaced with the unique category unknown. The party affiliation feature consists of 24 categories and is converted into four categories, namely, republican, democrat, unknown, and other. The category none is replaced with unknown, while all other 19 categories are replaced with other except republican and democrat. Normalization was performed on five columns, namely, barely true counts, false counts, half true counts, mostly true counts, and pants on fire counts. The data were normalized to the range (0-1).
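The category mapping and normalization described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' code: the function names `map_party`, `fill_missing`, and `min_max_scale` are hypothetical, and treating a missing party value the same way as the category none is an assumption.

```python
from typing import List, Optional


def fill_missing(value: Optional[str], placeholder: str = "unknown") -> str:
    """Replace a missing job title or venue with the category 'unknown'."""
    if value is None or value.strip() == "":
        return placeholder
    return value


def map_party(party: Optional[str]) -> str:
    """Collapse a raw party affiliation into one of four categories."""
    label = fill_missing(party).strip().lower()
    if label in ("none", "unknown"):
        return "unknown"  # 'none' is replaced with 'unknown'
    if label in ("republican", "democrat"):
        return label  # the two major parties are kept as-is
    return "other"  # every remaining category becomes 'other'


def min_max_scale(values: List[float]) -> List[float]:
    """Linearly rescale a column of counts onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


parties = [map_party(p) for p in ["Republican", "none", "libertarian", None]]
scaled_counts = min_max_scale([0, 5, 10])
```

Min-max scaling is applied per column, so each of the count columns would be passed through `min_max_scale` separately.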
After performing all preprocessing steps with the data, the dataset contains 10 features and a target variable. One of the features in the dataset, namely, "statement" contains textual data.
For the statement attribute, the wordnet tokenizer was applied first. For lemmatization, we used WordNetLemmatizer. After lemmatization, stop words were removed using the English stop word list. The word clouds before and after preprocessing are shown in Figures 5 and 6.

After the basic NLP steps, a word embedding technique was applied. Word embedding is a technique that enhances the performance of deep learning models for NLP tasks [14]. The words are converted into real-valued vectors that can be easily processed by neural network models. Words with similar meanings have similar representations in the vector space. The details of the word embedding are discussed further in the description of the Bi-LSTM-GRU-dense deep learning model.

Deep Learning Model
Based on the nature of the features in the dataset, two deep neural network models were designed as discussed below. The first dense model was used for the other features, while the Bi-LSTM-GRU model was used for the statement feature.

Deep Learning Dense Model
The first model was designed with 10 fully connected dense layers with 9 feature variables as input. The structure of the layers was 512, 256, 256, 256 (dropout layer), 64, 64, 64 (dropout layer), and 1 (output layer) neurons, respectively. Dropout layers with a value of 0.5 were added to force the model to learn more robust features. The rectified linear unit "ReLU" activation function was used for the input and all hidden layers, while "sigmoid" was used as the activation function for the output layer. The "Adam" optimizer was used as the optimization algorithm [15], while "binary_crossentropy" was used as the loss of the model. The "accuracy" metric was used to evaluate the model. The model was trained for 150 epochs with a batch size of 128, with a callback configured to monitor the "validation accuracy" and save only the best model.
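A minimal Keras sketch of this layer stack is given here for orientation. It assumes that each dropout layer follows the third 256-unit and the third 64-unit dense layer (the text does not pin down the exact placement), and the checkpoint file name `best_dense_model.keras` is hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_dense_model(n_features: int = 9) -> keras.Model:
    """Eight dense layers plus two dropout layers, following the listed sizes."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),  # assumed placement of the first dropout layer
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),  # assumed placement of the second dropout layer
        layers.Dense(1, activation="sigmoid"),  # binary output: fake vs. real
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Keep only the weights with the best validation accuracy seen so far.
checkpoint = keras.callbacks.ModelCheckpoint(
    "best_dense_model.keras",  # hypothetical file name
    monitor="val_accuracy",
    save_best_only=True,
)

model = build_dense_model()
```

Training would then follow the reported settings, for example `model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=150, batch_size=128, callbacks=[checkpoint])`, where the arrays are placeholders for the preprocessed non-textual features.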

Bi-LSTM-GRU-Dense Deep Learning Model
The main architecture of the second model, the bidirectional LSTM gated recurrent unit (GRU) model, was a dense neural network with 9 layers and an input of length 200, matching the size of the vector for each input document. The embedding layer in the deep learning model uses a vocabulary size of 5000, a real-valued vector space of size 300 (EMBEDDING_DIM), and a maximum input document length of 200. The two deep neural models are (a) a bidirectional LSTM (bi-LSTM) with 50 units and return_sequences set to TRUE and (b) a bidirectional gated recurrent unit (bi-GRU) with 50 units, return_sequences set to TRUE, and return_state also set to TRUE. Global maximum and global average pooling layers are added at the output of the LSTM and GRU to make the resulting feature map more robust to positional changes of features. The outputs of both models (i.e., bi-LSTM and bi-GRU) after global max and global average pooling are concatenated into a single vector. The output layer is a single dense layer with 1 output. The model was trained for 10 epochs with a batch size of 64 and class weights of {0: 1.1304960541149944, 1: 0.8965131873044255}.
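The text does not state how the class weights were derived. One common convention, shown here only as an assumption, is the "balanced" heuristic, which weights each class by n_samples / (n_classes × class_count) so that the minority class contributes more to the loss. The sketch applies it to the class counts reported for the full binary subset; weights computed on a training split would differ, and the helper name `balanced_class_weights` is hypothetical.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Class counts reported for the binary subset of LIAR used in the study.
weights = balanced_class_weights({"true": 2053, "false": 2504})
```

Under this heuristic the smaller class always receives a weight above 1 and the larger class a weight below 1, which matches the general pattern of the reported weights.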

Experimental Setup and Results
The model was implemented in Python 3.9.0, using several libraries such as sklearn, Keras, and Matplotlib. The experimental setup was prepared according to the nature of the dataset features. The dataset was divided into three sets as presented in Table 2. Dataset 1, dataset 2, and dataset 3 represent the feature combinations used in the experimental setup.
To prepare the dataset for experiment 1 using the deep neural network, word embedding techniques were applied during preprocessing. The embeddings capture the context and semantic meaning of the words by producing an n-dimensional vector. During the embedding process, the data are encoded to represent each word with a unique integer. Prior to embedding, the Tokenizer API from TensorFlow Keras is used to perform the tokenization. Padding was added so that all input sequences have the same length (i.e., the maximum length was set to 200). The embedding matrix was created using the FastText "cc.en.300.vec" [16] pretrained vectors. The performance of the proposed model was compared in terms of accuracy, precision, recall, and F-score. The K-fold cross-validation technique was applied for data partitioning with K = 10. The results of the proposed model are presented in Table 3. The experiments demonstrated the significance of the proposed deep learning model for fake news detection. The model produced the best results using the statement as the only feature. Adding other features to the statement degraded the prediction performance, and the model using all the other features without the statement feature produced the lowest results. The highest accuracy of 0.898, precision of 0.913, recall of 0.916, and F-score of 0.914 were achieved using only one attribute, i.e., statement. Moreover, Figures 7(a)-8(b) present the model validation and testing loss and accuracy for the first feature set.
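For reference, the evaluation metrics named above can be written out directly. The sketch below assumes the standard definitions, with the F-score taken as the harmonic mean of precision and recall (F1); the function names are illustrative and the confusion counts in the usage line are invented toy values, not results from the study.

```python
from typing import Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_score(precision, recall)


# Consistency check on the reported precision (0.913) and recall (0.916).
reported_f = f_score(0.913, 0.916)

# Toy confusion counts, purely for illustration.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
```

Plugging the reported precision and recall into the harmonic mean gives a value that agrees with the reported F-score of 0.914 to within rounding.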
Additionally, Table 4 compares the proposed model with benchmark studies in the literature. The benchmarks were selected from studies in the literature that used the LIAR dataset. The proposed study outperformed the previous studies with an accuracy of 0.898. Like the previous studies [3,6], the proposed study also achieved its highest results using the statement feature only. However, the author in [4] used 12 features including the statement. Long et al. [7] found the speaker profile to be one of the most significant features. However, in the current study, the performance of the proposed model was not enhanced by the inclusion of the speaker profile attribute. In contrast, the authors in [11] obtained their best results using a combination of textual and other features. The highest accuracy achieved in [11] using the statement feature alone was 0.565. Thus, the performance of their model was greatly enhanced by the integration of other features like speaker job title and speaker info. In [4,12], the authors focused on binary classification like the proposed study. They converted the news into two categories, fake and real.

Conclusion
The primary goal of this paper is to reduce a drawback of social media, namely the fast spread of fake news that often misleads people, creates wrong perceptions, and has a negative influence on society. Therefore, an ensemble-based deep learning model was constructed to classify news as fake or real. Several preprocessing techniques were first applied to the dataset, and NLP techniques were applied to the statement attribute. Two deep learning models were used: a deep learning dense model for the other 9 attributes excluding the statement and a Bi-LSTM-GRU-dense model for the statement attribute. The results achieved by the proposed study are significant, with an accuracy of 0.898 using the statement feature. This performance surpassed that of the other studies on the same dataset, and the model is very effective in detecting fake news. Finally, fake news detection using machine learning is still a new and challenging topic. Despite the significant results achieved by the proposed study, there is still room for improvement. The model needs to be investigated using other fake news datasets.

Data Availability
The study used an open-source dataset, which can be accessed at https://www.kaggle.com/mrisdal/fake-news.

Conflicts of Interest
The authors declare that they have no conflicts of interest.