An Optimized Hybrid Deep Learning Model to Detect COVID-19 Misleading Information

Fake news is challenging to detect because it mixes accurate and inaccurate information from reliable and unreliable sources. Social media is not always a trustworthy data source, especially during the COVID-19 outbreak, when fake news has spread widely. The best way to deal with this problem is early detection. Accordingly, in this work, we propose a hybrid deep learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM) to detect COVID-19 fake news. The proposed model consists of the following layers: an embedding layer, a convolutional layer, a pooling layer, an LSTM layer, a flatten layer, a dense layer, and an output layer. In the experiments, three COVID-19 fake news datasets are used to evaluate six machine learning models, two deep learning models, and our proposed model. The machine learning models are DT, KNN, LR, RF, SVM, and NB, while the deep learning models are CNN and LSTM. Four metrics are used to validate the results: accuracy, precision, recall, and F1-measure. The experiments show that the proposed model outperforms the six machine learning models and the two deep learning models on these datasets. Consequently, the proposed system is capable of detecting COVID-19 fake news effectively.


Introduction
A novel coronavirus was discovered in Wuhan, China, at the beginning of December 2019. The World Health Organization (WHO) declared the COVID-19 outbreak a global pandemic on 11 March 2020 [1]. Due to the panic caused by COVID-19, people started posting fake news and misinformation about the coronavirus on social media networks. The posts, tweets, and comments contain misleading statements. Recently, researchers have taken a particular interest in using sentiment analysis to identify fake news about COVID-19 [2]. Social media is the biggest contributor to the spread of COVID-19 fake news because of the huge number of posts expressing panic. Therefore, governmental authorities have launched official websites for COVID-19 announcements to stop the circulation of fake stories about COVID-19 [3]. Consequently, researchers began to pay attention to COVID-19 misleading information by analyzing social media content and applying advanced AI technologies (i.e., machine learning and deep learning) to profile COVID-19 fake news [4]. As a result of this research direction in content analysis, research organizations have started raising funds to provide novel solutions to combat COVID-19 by analyzing misleading information about the pandemic [5][6][7][8][9][10]. Recently, machine learning and deep learning have played a vital role in different areas such as sentiment analysis [11,12], Alzheimer's detection [13], cancer prediction [14], and others [15,16]. Researchers have used datasets related to COVID-19 collected from social media to evaluate their proposed approaches [17]. In this work, we propose an optimized hybrid model to detect fake news about COVID-19 on social media. The core idea of the proposed model is the hybridization of CNN and LSTM.
Our main contributions in this work are as follows: (i) We develop a hybrid model integrating CNN and LSTM to detect fake news about COVID-19. (ii) The proposed model is optimized using the Hyperopt optimization technique to select the optimal parameter values. (iii) The proposed model, CNN, LSTM, and regular ML algorithms are applied to three COVID-19 fake news datasets. (iv) The experimental results demonstrate that the proposed model achieves the best performance compared with the other models.
The rest of this paper is structured as follows. Section 2 presents the related work. Section 3 describes the architecture of the proposed system for COVID-19 fake news detection. Section 4 describes the experimental results. Finally, the paper is concluded in Section 5.

Related Works
Recently, researchers have been actively working to detect fake news about COVID-19. Wani et al. [18] used CNN, LSTM, and bidirectional encoder representations from transformers (BERT) to detect fake news about COVID-19. They used the Constraint@AAAI 2021 COVID-19 fake news detection dataset [19]. Elhadad et al. [20] proposed a voting ensemble classifier using 10 ML algorithms with seven feature extraction techniques to identify misleading information related to the COVID-19 outbreak. They tested their proposed classifier on 3,047,255 tweets about COVID-19. The best results were obtained from the NN, DT, and LR classifiers. Müller et al. [21] proposed a transformer model, COVID-Twitter-BERT (CT-BERT), trained on a large corpus of Twitter messages about COVID-19. In [22], the authors created an annotated dataset of COVID-19 fake news tweets. They proposed a multilingual bidirectional encoder (mBERT) to extract the textual features from the dataset. The results show that mBERT achieved the highest performance compared with SVM, RF, and a multilayer perceptron. In [17], two stages are developed to detect fake news. The first stage uses a novel fact-checking approach to retrieve the most relevant facts about COVID-19, while the second stage verifies the level of truth by computing the textual entailment. In addition, the authors used the pretrained transformer-based language models BERT and ALBERT to retrieve and classify fake news in the COVID-19 domain. They used a dataset that consists of more than 5000 COVID-19 false claims. In [23], the authors gathered COVID-19 news articles from two data sources, Poynter and Snopes, and then crawled the sources' textual contents. They classified the articles into 11 categories. They also applied ML algorithms to the annotated articles to detect misinformation about COVID-19. Al-Rakhami et al. [24] proposed an ensemble-learning-based framework to classify tweets as credible or noncredible. They applied the framework to a large dataset of tweets carrying news about COVID-19. Their framework obtained high accuracy. Hossain et al. [25] introduced a benchmark dataset, COVIDLIES, which contains known COVID-19 misconceptions. They classified each tweet in the dataset into one of three categories: Agree, Disagree, or No Stance. Patwa et al. [19] created and annotated a dataset that includes 10,700 posts and articles of real and fake news on COVID-19. Four ML baselines (DT, LR, Gradient Boost, and SVM) were applied to the annotated dataset to classify posts as fake or real. SVM obtained the best performance on the testing set.

The Proposed System of Detecting COVID-19 Fake News
In this section, the proposed system for detecting COVID-19 fake news is introduced. Figure 1 depicts the workflow of the proposed system, which consists of the following steps: (1) data collection, (2) data cleaning, (3) feature extraction, (4) hyperparameter optimization, and (5) model evaluation.

Data Collection.
Three datasets of COVID-19 fake news are used, which are described as follows.

Dataset 1.
Dataset 1 was collected from Facebook posts and from a far-right website called Natural News (https://towardsdatascience.com/explore-covid-19-infodemic-2d1ceaae2306). Another medical website, orthomolecular.org, is also used. Although some of the data sources have been removed from the Internet and social media, they can still be reached through Internet archives.

Dataset 2.
Dataset 2, a COVID-19 fake news dataset, was collected from the Internet (https://www.researchgate.net/publication/346036811_COVID-19_Fake_News_Data). Among the attributes of dataset 2, the one used later as the label is binary: 0 indicates that the news is fake, while 1 indicates that the news is true.

Dataset 3.
Dataset 3, a COVID-19 fake news dataset, was collected from the Internet (https://www.researchgate.net/publication/349517903_COVID-19_Fake_News_Dataset). It was gathered with Webhose.io and then manually labeled. The dataset contains three types of news: (1) false news, which is called fake, (2) true news, and (3) partially false news. For simplification, false and partially false news are both considered false and labeled as 0, while the real news is labeled as 1.
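
The label simplification for dataset 3 can be sketched as a small mapping. This is only an illustration: the label strings below are hypothetical, and the actual values stored in the dataset file may differ.

```python
# Hypothetical label strings; the real dataset may spell its classes differently.
LABEL_MAP = {"false": 0, "partially false": 0, "true": 1}


def to_binary_label(raw_label: str) -> int:
    """Map a three-way news label to the binary scheme (0 = fake, 1 = real)."""
    key = raw_label.strip().lower()
    if key not in LABEL_MAP:
        raise ValueError(f"unexpected label: {raw_label!r}")
    return LABEL_MAP[key]


binary_labels = [to_binary_label(x) for x in ["False", "Partially False", "True"]]
print(binary_labels)
```

Raising an error on unknown labels, rather than silently defaulting to one class, is a design choice made here so that labeling mistakes surface early.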

Data Cleaning.
In this phase, we have applied five steps, described as follows:
(i) Text Parsing. In this step, tokenization functions are used to divide the text within the datasets into tokens for further analysis.
(ii) Data Cleaning. In this step, regular expressions are used to extract English letters, numbers, and their combinations. This step eliminates noisy data from the dataset texts.
(iii) Part of Speech (PoS) Tagging. This step marks each word in the text with its part of speech, such as verb, adjective, or noun.
(iv) Stop Words Removal. In this step, all common words, such as "a" and "the," are removed from the text.
(v) Stemming. In this step, each word is replaced with its root to eliminate redundancy within the text. An English stemmer is used, which reduces the text by 40-50% relative to the original text of the three datasets.
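
The cleaning steps can be sketched with the Python standard library alone. This is a minimal illustration, not the pipeline used in the experiments: the stop-word list is a small sample, the suffix-stripping function is a crude stand-in for a real English stemmer, and PoS tagging is omitted because it needs a trained tagger.

```python
import re

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}

# Crude suffix stripping, used here only as a stand-in for an English stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and keep only runs of English letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [stem(t) for t in tokens]


print(clean("The vaccines are spreading rumors!"))
# ['vaccin', 'spread', 'rumor']
```

The regular expression handles both tokenization and noise removal in one pass, since anything that is not a letter or digit acts as a separator.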

Feature Extraction Methods.
In this phase, two feature extraction methods are applied: Term Frequency-Inverse Document Frequency (TF-IDF) for the regular ML models and word embedding for the DL models.
(i) For regular ML models, we have applied the TF-IDF method to assign weights to the preprocessed text of the three datasets [26]. The key idea of TF-IDF is to weight each word by its frequency within a document, discounted by how many documents contain it.
(ii) For DL models, word embedding is a mechanism for representing words as vectors, where words with similar meanings have similar vectors. Thus, every word within the text is represented as a dense vector. GloVe is one of the most popular word embedding techniques [27].
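
As an illustration of the TF-IDF weighting, the sketch below uses one common variant, tf(t, d) · log(N / df(t)), where N is the number of documents and df(t) the number of documents containing term t. The exact variant and any smoothing used in the experiments are not stated, so this formula is an assumption.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["covid", "cure", "garlic"], ["covid", "vaccine", "trial"]]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus "covid" appears in every document, so its weight is zero, while terms unique to one document receive a positive weight.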

The Proposed Model Description.
In this section, the architecture of our proposed model is described. Figure 2 depicts the layers of the proposed model, which work sequentially to detect fake news. In particular, the following layers are used: an embedding layer, a convolutional layer, a pooling layer, an LSTM layer, a flatten layer, a dense layer, and an output layer. In addition, we have applied the Hyperopt optimization method to select the best values for the proposed model's parameters. The layers are described as follows:
(i) Embedding Layer. Each news item is represented as a sequence of vectors, one per word. We have implemented this layer using the Keras library [28]. The Keras embedding layer has three parameters: input_dim, output_dim, and input_length. The input_dim parameter configures the vocabulary size, output_dim configures the size of the embedding vectors, and input_length configures the length of the input sequences. In our implementation, input_dim, output_dim, and input_length are 20000, 200, and 32, respectively.
(ii) Dropout Layers. These layers are used to prevent overfitting and reduce the complexity of the model [29]. This layer receives its input from the embedding layer output. The dropout rate is chosen from the range 0.1 to 0.9.
(iii) Convolutional Layer. The convolutional layer receives its input from the dropout layer. It has two main parts: the filter (kernel) and the feature map. The filter is applied to the input word matrix, and the filtering process produces a feature map that indicates the patterns in the input data [30]. The ReLU activation function is used to identify the features within the news.
(iv) The Pooling Layer. This layer uses the max operation to reduce the features in the feature map. In particular, configuring a high value helps capture the essential features and reduces the computation for the next layer.
(v) LSTM Layer. LSTM is a type of recurrent neural network that makes predictions by learning long-term dependencies. In this work, we have used LSTM to build the hybrid model.
(vi) The Flatten Layer. The flatten layer converts its input into a 1-dimensional array, which is then passed to the following layer.
(vii) Dense Layer. This is a fully connected neural network layer. Its parameters are input, kernel, bias, and activation. The input parameter represents the input data, the kernel represents the weights, and the activation represents the activation function.
(viii) The Output Layer. This layer takes the output of the preceding layer and generates the model's final output: real or fake news. We have used the ADAM optimizer [31] and the sigmoid activation function [32].
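
To make the layer sequence concrete, the following sketch traces the per-sample tensor shape through the stack. Only the input length (32) and embedding size (200) come from the text. The number of filters, kernel size, pool size, LSTM units, and dense units are hypothetical placeholders, and the sketch assumes "valid" padding with stride 1 and an LSTM that returns the full sequence (which is why a flatten layer follows it).

```python
def conv1d_len(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def pool1d_len(length, pool_size):
    """Output length of non-overlapping 1-D max pooling."""
    return length // pool_size


def trace_shapes(input_length=32, embed_dim=200,
                 filters=64, kernel_size=3, pool_size=2,
                 lstm_units=32, dense_units=16):
    """Return (layer name, output shape) pairs, excluding the batch axis.

    All defaults except input_length and embed_dim are assumed values.
    """
    shapes = [("embedding", (input_length, embed_dim)),
              ("dropout", (input_length, embed_dim))]
    steps = conv1d_len(input_length, kernel_size)
    shapes.append(("conv1d", (steps, filters)))
    steps = pool1d_len(steps, pool_size)
    shapes.append(("max_pool", (steps, filters)))
    shapes.append(("lstm", (steps, lstm_units)))       # full sequence returned
    shapes.append(("flatten", (steps * lstm_units,)))
    shapes.append(("dense", (dense_units,)))
    shapes.append(("output", (1,)))                    # single sigmoid unit
    return shapes


for name, shape in trace_shapes():
    print(f"{name:10s} {shape}")
```

With these placeholder values the convolution shortens the sequence from 32 to 30 steps, pooling halves it to 15, and flattening yields a vector of 480 features for the dense layer.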

Different Models.
Different models have been compared with the proposed model: six regular ML models, CNN, and LSTM.
(i) Six regular ML models, namely DT [33], LR [34], KNN [35], RF [36], SVM [13], and NB [37], were compared with the proposed model.
(ii) The long short-term memory (LSTM) model has five layers: (1) an embedding layer, (2) a hidden layer, (3) a dropout layer, (4) a flatten layer, and (5) an output layer. The embedding layer is the first layer and is designed in the same way as in the proposed model. The hidden layer is an LSTM layer [38,39]. L2 weight regularization is applied to it, with the rate given by the reg_rate parameter. The third layer is the dropout layer, which is used to reduce overfitting and simplify the model [29]. Its dropout rate is chosen from the range 0.1 to 0.9. The fourth layer is the flatten layer, which converts the entire text into a vector of features. Finally, the output layer takes the flatten layer output and generates the final output, which classifies the text as a whole as real or fake news.
(iii) The convolutional neural network (CNN) model consists of an embedding layer, a convolutional layer, a pooling layer, a flatten layer, a dense layer, and an output layer.

Hyperparameter Optimization.
Hyperparameter optimization aims to tune the hyperparameters of ML and DL models automatically. Various parameter values are passed to the model, and the values that achieve the best performance are selected.
(i) To optimize the ML models, we have used grid search. We define a set of candidate values for each hyperparameter. The search evaluates these values and selects, for each hyperparameter, the value that gives the highest accuracy. In addition, K-fold cross-validation (CV) is used, where the dataset is divided equally into K folds. K-1 folds are used for training, and the remaining fold is used for testing. The process is repeated until each fold has been used for testing. Finally, the classifier is evaluated based on the average accuracy over the 10 folds.
(ii) To optimize the DL algorithms, we have used Hyperopt, a Python library for distributed asynchronous hyperparameter optimization. Hyperopt is an open-source library for large-scale AutoML and is the basis of HyperOpt-Sklearn (https://github.com/hyperopt/hyperopt). The set of parameters configured for the proposed model is shown in Table 1. The set of parameters for the LSTM model is shown in Table 2. Similarly, the set of parameters for the CNN model is shown in Table 3.
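
The grid search with K-fold cross-validation described in (i) can be sketched without any ML library. In this illustration the scoring function is a toy stand-in for "train on the training folds, measure accuracy on the test fold", and the hyperparameter names `C` and `depth` are hypothetical. The folds are contiguous and unshuffled, which a real experiment would likely change.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def grid_search(param_grid, score_fn, n_samples, k=10):
    """Return (best_params, best_mean_score) over every grid combination."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, test_idx):
    """Toy stand-in for fitting a model and measuring test-fold accuracy."""
    return 1.0 - 0.1 * abs(params["C"] - 1.0) - 0.01 * abs(params["depth"] - 5)


best, score = grid_search({"C": [0.1, 1.0, 10.0], "depth": [3, 5, 7]},
                          toy_score, n_samples=100)
print(best, score)
```

Hyperopt replaces the exhaustive loop over `product(...)` with a guided search over a declared parameter space, which matters when the grid is too large to enumerate.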

Evaluation Metrics.
Four standard machine learning metrics are used, which are accuracy, precision, recall, and F1-score. Equations (1)-(4) describe the formulas for calculating these metrics. Accordingly, TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative.
(i) Accuracy is the most popular metric used to evaluate ML and DL models. It measures the proportion of correctly predicted observations. The accuracy calculation formula is as follows:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
(ii) The second metric is precision, which indicates the ratio of true positives to all predicted positives. The precision calculation formula is as follows:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]
(iii) Recall is the third metric. It indicates the ratio of true positives to all actual positives. The recall calculation formula is as follows:
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]
(iv) The fourth metric is the F1-score, which shows the trade-off between precision and recall. It is the harmonic mean of precision and recall. The F1-score calculation formula is as follows:
\[ \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]
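
The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions, and the counts in the usage example are made up for illustration; they are not results from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Returning 0.0 for an undefined ratio is one convention among several; some libraries emit a warning or return NaN instead.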

Experiment Setup.
The experiments were conducted on Google Colab with 25 GB of RAM, Python 3, and a GPU. The three DL models (the proposed model, LSTM, and CNN) are implemented using the Keras library. The regular ML models are implemented using the scikit-learn package. For optimization, grid search is used for the ML models, and the Hyperopt library is used for the DL models. For the embedding layer, 200-dimensional pretrained GloVe word vectors are used. Three datasets of COVID-19 fake news are used. Each dataset is divided into 80% for training and 20% for testing. Each experiment has been repeated ten times. The cross-validation (CV) and testing performance results have been recorded. [...] Table 6. The optimal values for CNN's parameters are shown in Table 7.

Results
The optimal settings of the LSTM parameters are shown in Table 8. [...] Table 16. The optimal values for CNN's parameters are shown in Table 17. The optimal settings of the LSTM parameters are shown in Table 18.

Discussion.
This section presents the best models for each dataset. Figure 3 illustrates a comparison between the best models in terms of CV and test performance for dataset 1. Overall, the proposed model has achieved the best performance for CV and testing. Figure 4 illustrates a comparison between the best models in terms of CV and test performance for dataset 2. For CV performance, the proposed model has the best performance [...]. Overall, the proposed model has achieved the best performance for CV and testing. Figure 5 illustrates a comparison between the best models in terms of CV and test performance for dataset 3. For CV performance, the proposed [...]. Overall, the proposed model has achieved the best performance for CV and testing.

Conclusion
In this paper, a hybrid model based on CNN and LSTM has been proposed to detect COVID-19 fake news. The proposed model consists of several layers (i.e., an embedding layer, a convolutional layer, a pooling layer, an LSTM layer, a flatten layer, a dense layer, and an output layer). Three datasets of COVID-19 fake news were used to evaluate the proposed model. The experimental results show that the proposed model detects COVID-19 fake news better than six machine learning models (i.e., DT, KNN, LR, RF, SVM, and NB) and two deep learning models (i.e., CNN and LSTM) on these datasets.

Conflicts of Interest
The authors declare that they have no conflicts of interest.