In recent years, convolutional neural networks (CNNs) have attracted considerable attention owing to their impressive performance in various applications, such as Arabic sentence classification. However, building a powerful CNN for Arabic sentiment classification can be highly complicated and time consuming. In this paper, we address this problem by combining the differential evolution (DE) algorithm with a CNN, where the DE algorithm is used to automatically search for the optimal configuration, including the CNN architecture and network parameters. To this end, five CNN parameters are searched by the DE algorithm: the convolution filter sizes that control the CNN architecture, the number of filters per convolution filter size (NFCS), the number of neurons in the fully connected (FC) layer, the initialization mode, and the dropout rate. In addition, the effect of the mutation and crossover operators of the DE algorithm was investigated. The performance of the proposed DE-CNN framework is evaluated on five Arabic sentiment datasets. Experimental results show that DE-CNN achieves higher accuracy and is less time consuming than state-of-the-art algorithms.
People and organizations post their information and opinions on various social media platforms such as Twitter and Facebook. Understanding the public sentiments, emotions, and concerns expressed on these platforms is a crucial issue and the focus of sentiment analysis (SA). SA is a natural language processing (NLP) application that automatically determines and classifies the sentiment of large amounts of text or speech [
To choose the best architecture and hyperparameters for a DL model and apply it to Arabic sentiment classification, the model is usually evaluated on different architectures and hyperparameter combinations manually, or previously successful models are reused directly [
DE-CNN starts by generating a population, where each individual represents a configuration selected randomly from the possible values of each parameter. Then, DE-CNN evaluates each individual by computing the fitness function value of the current configuration. After that, all individuals in the population are updated using the DE algorithm operators. These steps are repeated until the termination criteria are satisfied. To evaluate the performance of the proposed framework, various Arabic sentiment classification datasets covering Twitter data are used. The evaluations on these datasets show that the proposed framework outperforms existing methods.
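The initialize-evaluate-update loop described above can be sketched as follows. This is a minimal, hypothetical illustration: the search-space excerpt, the toy fitness function, and the single-index perturbation standing in for the full DE mutation and crossover operators are all assumptions, not the paper's implementation (in DE-CNN, the fitness of an individual is the classification accuracy of the CNN built from its configuration).

```python
import random

# Hypothetical excerpt of the DE-CNN search space: each parameter maps to a
# list of candidate values (the full lists are given in the paper's tables).
SEARCH_SPACE = {
    "filter_sizes": [[2, 3], [3, 4], [2, 3, 5], [3, 4, 5]],
    "nfcs": [50, 100, 150, 200],
    "neurons": [50, 100, 200, 300],
    "init_mode": ["uniform", "lecun_uniform", "normal", "he_uniform"],
    "dropout": [0.2, 0.3, 0.5, 0.7],
}

def random_individual(rng):
    # An individual is a vector of indices, one index per parameter.
    return [rng.randrange(len(values)) for values in SEARCH_SPACE.values()]

def decode(individual):
    # Map each index to the corresponding parameter value.
    return {key: SEARCH_SPACE[key][i] for key, i in zip(SEARCH_SPACE, individual)}

def evolve(fitness, pop_size=5, generations=3, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    scores = [fitness(decode(ind)) for ind in pop]
    value_lists = list(SEARCH_SPACE.values())
    for _ in range(generations):              # termination: generation budget
        for i in range(pop_size):
            # Stand-in for the DE mutation/crossover operators:
            # re-sample one randomly chosen index of the individual.
            trial = list(pop[i])
            j = rng.randrange(len(trial))
            trial[j] = rng.randrange(len(value_lists[j]))
            trial_score = fitness(decode(trial))
            if trial_score >= scores[i]:      # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return decode(pop[best]), scores[best]
```

For instance, `evolve(lambda cfg: cfg["dropout"])` drives the search toward high-dropout configurations; in DE-CNN the fitness call would instead build, train, and evaluate a CNN.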
The main contributions of this paper can be summarized as follows:
- Modeling the problem of evolving CNNs as a metaheuristic optimization task to build an Arabic sentiment classification system
- Using two different fitness evaluation techniques to assess the generalization of the CNN
- Integrating two different mutation strategies to improve the exploration and exploitation ability of the DE algorithm
- Building and training different CNN architectures with a variable number of parallel convolution layers
The rest of this paper is organized as follows: Section
In this section, we review the most recent works related to Arabic sentiment classification and NE. Recently, many works on SA have targeted English and other European languages; however, only a few studies focus on the Arabic language [
NE is a subfield of artificial intelligence (AI) that aims to automatically evolve neural network architectures and hyperparameters by means of evolutionary algorithms. For example, Young et al. [
Differential evolution (DE) is one of the most popular evolutionary algorithms; it was introduced by Storn and Price in [
Then, the fitness function
The previous steps are repeated until the stop condition is met. If it is satisfied, DE stops and returns the best individual; otherwise, it continues from the mutation phase. The DE algorithm can use different strategies to perform mutation, some of which are designed to improve the exploration and exploitation of the search space [
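As a concrete sketch, the DE/best/1 and DE/best/2 mutation strategies and binomial crossover can be written as follows. This is a generic real-valued DE, not the paper's operator code; `F` and `CR` denote the mutation and crossover parameters.

```python
import numpy as np

def mutate(pop, best_idx, F=0.5, strategy="best/1", rng=None):
    """Build donor vectors with DE/best/1 or DE/best/2:
    v = x_best + F*(x_r1 - x_r2) [+ F*(x_r3 - x_r4)].
    Requires a population of at least 5 vectors (4 distinct partners)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, _ = pop.shape
    donors = np.empty_like(pop)
    for i in range(n):
        # pick 4 distinct random indices different from i
        candidates = [j for j in range(n) if j != i]
        r = rng.choice(candidates, size=4, replace=False)
        v = pop[best_idx] + F * (pop[r[0]] - pop[r[1]])
        if strategy == "best/2":
            v = v + F * (pop[r[2]] - pop[r[3]])
        donors[i] = v
    return donors

def crossover(pop, donors, CR=0.8, rng=None):
    """Binomial crossover: each gene of the trial vector comes from the donor
    with probability CR; one gene per vector is forced from the donor."""
    rng = np.random.default_rng(1) if rng is None else rng
    n, d = pop.shape
    mask = rng.random((n, d)) < CR
    mask[np.arange(n), rng.integers(0, d, size=n)] = True
    return np.where(mask, donors, pop)
```

With `F = 0`, every donor collapses to the current best vector, which makes the exploitation role of the `x_best` term easy to see.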
Deep learning approaches, known for their ability to automatically learn features, have shown remarkable performance in various fields, for example, computer vision (CV) [
The parallel CNN architecture.
In this section, the proposed DE-CNN framework for evolving CNNs with the DE algorithm is presented in detail. The aim of DE-CNN is to determine the optimal architecture and parameters of a CNN and to enhance the performance of Arabic sentiment classification. To achieve this goal, the DE algorithm searches for the best configuration from a set of parameters used to build and train a CNN. Unlike most existing CNN architectures for text classification, which employ one-dimensional (1D) convolution and pooling operations, we apply 2D convolution operations in DE-CNN. To gain better performance, words from the dataset are fed to the CNN as a two-dimensional word-embedding matrix, where each word is represented by a vector extracted from a pretrained word embedding. Therefore, the 2D convolution operations may help to extract more meaningful sentiment features and avoid destroying the structure of the word embeddings [
The proposed DE-CNN framework consists of three stages: initialization, evaluation, and update. First, in the initialization stage, DE-CNN generates a random population
In this stage, the list of values corresponding to each parameter is generated, and the DE algorithm parameters, such as crossover and mutation, are set. Moreover, the size of solutions
Example of a random configuration
| Parameters | Index range (minimum) | Index range (maximum) | Index | Example value |
|---|---|---|---|---|
| Filter sizes list | 1 | 20 | 2 | [3, 4] |
| Number of neurons | 1 | 7 | 4 | 300 |
| NFCS | 1 | 8 | 3 | 150 |
| Initialization mode | 1 | 4 | 1 | Uniform |
| Dropout rate | 1 | 7 | 5 | 0.7 |
As shown in Table
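The index-based encoding illustrated in the table can be decoded as in the sketch below. The pool of precomputed filter-size lists is an assumed excerpt (the index range 1..20 suggests 20 randomly generated lists), and the exact ordering of the dropout values is likewise an assumption.

```python
# Hypothetical decoding of an index-based individual into a CNN configuration.
# The value lists follow the paper's individual-structure table; the
# filter-size pool below is an illustrative excerpt, not the paper's pool.
FILTER_SIZE_LISTS = [[2, 3], [3, 4], [2, 3, 5], [3, 4, 5], [2, 5, 7]]  # excerpt
NEURON_VALUES = [50, 100, 200, 300, 350, 400, 500]
NFCS_VALUES = [50, 100, 150, 200, 250, 300, 400, 500]
INIT_MODES = ["uniform", "lecun_uniform", "normal", "he_uniform"]
DROPOUT_VALUES = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # assumed ordering

def decode_individual(ind):
    """Map a five-dimensional index vector (1-based, as in the paper's table)
    to a concrete CNN configuration."""
    fs, neurons, nfcs, init, drop = ind
    return {
        "filter_sizes": FILTER_SIZE_LISTS[fs - 1],
        "neurons": NEURON_VALUES[neurons - 1],
        "nfcs": NFCS_VALUES[nfcs - 1],
        "init_mode": INIT_MODES[init - 1],
        "dropout": DROPOUT_VALUES[drop - 1],
    }
```

Under these assumed lists, decoding the table's example indices [2, 4, 3, 1, 5] yields filter sizes [3, 4], 300 neurons, an NFCS of 150, and uniform initialization, matching the example values shown.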
CNN architecture and parameters for the example listed in Table
This stage starts by constructing the CNN model based on the parameters of the current solution
Meanwhile, in 5-fold CV, the dataset is divided into five groups, where four of them form the training set and the remaining one forms the testing set. The evaluation is repeated five times, and the average classification accuracy over the five runs is used as the fitness function value
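A minimal sketch of this 5-fold evaluation follows, with `train_and_score` standing in for building, training, and testing the CNN of the current individual (the real fitness computation in DE-CNN). The contiguous folds are a simplification; a real implementation would shuffle or stratify.

```python
def kfold_indices(n, k=5):
    """Split range(n) into k contiguous folds (a minimal stand-in for the
    shuffled splits a real CV implementation would use)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_fitness(train_and_score, data, labels, k=5):
    """Average test accuracy over k folds, used as the fitness value of one
    configuration. `train_and_score(train_x, train_y, test_x, test_y)` stands
    in for building, training, and evaluating the CNN of the individual."""
    folds = kfold_indices(len(data), k)
    accuracies = []
    for i in range(k):
        test_idx = set(folds[i])
        train_x = [x for j, x in enumerate(data) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        test_x = [data[j] for j in folds[i]]
        test_y = [labels[j] for j in folds[i]]
        accuracies.append(train_and_score(train_x, train_y, test_x, test_y))
    return sum(accuracies) / k
```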
In this stage, the best solution
The general framework of the proposed model DE-CNN is shown in Figure
The proposed DE-CNN framework.
Several experiments are conducted using different Arabic sentiment classification datasets in both their balanced and imbalanced forms, where each dataset is described in Section
In this section, various Arabic sentiment datasets used to evaluate the proposed framework are introduced. Nabil et al. [
Each dataset in our experiments has been preprocessed by applying several operations to clean and normalize the Arabic text. Stopwords have been removed by mapping each word of the dataset vocabulary against a stopword list that contains the 750 most frequent Arabic words (
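A minimal sketch of this preprocessing step is shown below. The four-word stopword excerpt and the specific normalization rules (stripping diacritics and tatweel, unifying alef, teh-marbuta, and alef-maqsura forms) are illustrative assumptions of typical Arabic cleaning, since the exact rules are not enumerated here.

```python
import re

def normalize_arabic(text):
    """Typical Arabic normalization (assumed, not the paper's exact rules)."""
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)  # harakat + tatweel
    text = re.sub(r"[إأآ]", "ا", text)                 # alef variants -> bare alef
    return text.replace("ة", "ه").replace("ى", "ي")

# Illustrative four-word excerpt of a stopword list (the paper uses the 750
# most frequent Arabic words). Normalized with the same function so that
# lookups after normalization still match.
STOPWORDS = {normalize_arabic(w) for w in ("في", "من", "على", "هذا")}

def preprocess(sentence):
    """Normalize a sentence, tokenize on whitespace, and drop stopwords."""
    tokens = normalize_arabic(sentence).split()
    return [t for t in tokens if t not in STOPWORDS]
```

Normalizing the stopword list with the same function as the input text is deliberate: otherwise a stopword containing, e.g., alef maqsura would no longer match its normalized form in the tokens.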
In this section, we learn the proper CNN architecture and parameters automatically using the differential evolution (DE) algorithm. Parameterizing the CNN model using DE requires an individual structure (configuration). In our experiments, the individual structure consists of parameters from two layers: the convolution layer and the FC layer. In addition, at most three convolution layers are trained in parallel at the same time. In total, five different parameters are coded into each individual. We fix the optimizer and the merge operation for each individual, and we change the parameter values to maximize the accuracy of the classified sentences over a test set. Moreover, CBOW58 Arabic word embeddings from [
Individual structure (configuration).
| Parameter | Values |
|---|---|
| Filter sizes | 2 to 9 |
| NFCS | 50, 100, 150, 200, 250, 300, 400, 500 |
| Number of neurons | 50, 100, 200, 300, 350, 400, 500 |
| Initialization mode | Uniform, LeCun uniform, normal, He uniform |
| Dropout rate | 0.2 to 0.9 |
The number of filters used to perform the convolution operation varies from 50 to 500 filters per filter size. The number of convolution layers trained in parallel equals the length of the filter sizes list. A random function is implemented to generate random filter sizes lists, where each filter size ranges from 2 to 9 and each list contains at most three filter sizes. For example, if the generated filter sizes list is [2, 5, 7], three convolution layers run in parallel, each using a single filter size from the generated list. The number of neurons used to construct the fully connected layer is 50, 100, 200, 300, 350, 400, or 500. The ReLU function is used as the activation function for both the convolution and FC layers. Different initialization modes for the FC layer were investigated: uniform, LeCun uniform, normal, and He uniform. To prevent overfitting of the CNN, a regularization technique named dropout is adopted with rates ranging from 0.2 to 0.9, applied in three positions: after the embedding layer, the pooling layer, and the FC layer. Adam is used as the optimizer to train the CNN. Moreover, to handle sentences of variable lengths, all sentences are zero-padded to the maximum sentence length in each dataset.
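The parallel-branch computation described above can be sketched at the shape level with plain NumPy; randomly initialized filters stand in for trained ones, so this illustrates the tensor shapes rather than the paper's implementation.

```python
import numpy as np

def parallel_cnn_features(embeddings, filter_sizes, nfcs, rng=None):
    """Shape-level sketch of the parallel 2D-convolution branches.

    `embeddings` is an (L, d) word-embedding matrix for one sentence. Each
    branch convolves `nfcs` filters of shape (h, d) over it in valid mode,
    applies ReLU, then max-pools over time, yielding `nfcs` features per
    branch; the pooled branches are concatenated."""
    rng = np.random.default_rng(0) if rng is None else rng
    L, d = embeddings.shape
    pooled = []
    for h in filter_sizes:
        filters = rng.standard_normal((nfcs, h, d)) * 0.1
        # The filter spans the full embedding width, so it only slides along
        # the sentence axis: L - h + 1 valid positions.
        conv = np.empty((nfcs, L - h + 1))
        for t in range(L - h + 1):
            window = embeddings[t:t + h]  # (h, d)
            response = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
            conv[:, t] = np.maximum(response, 0.0)  # ReLU
        pooled.append(conv.max(axis=1))             # max over time: (nfcs,)
    return np.concatenate(pooled)                   # len(filter_sizes) * nfcs
```

For a 20-word sentence with 300-dimensional embeddings and filter sizes [2, 5, 7] with an NFCS of 150, the concatenated feature vector has 3 × 150 = 450 entries, which then feed the FC layer.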
A set of measures such as precision, recall, accuracy, and F-score were used to evaluate the performance. These measures are defined as follows:
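The measures above can be computed from the binary confusion counts as follows; this is a straightforward reference implementation, not code from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy, and F-score for binary predictions."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}
```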
It is worth mentioning that 5-fold CV and
In this experimental series, we analyze the influence of different parameters of DE algorithm, which include population size, mutation parameter
The results based on 5-fold CV accuracy as fitness function value used with DE/best/1, different population sizes, and mutation parameter.
| DE parameters | DE/best/1 (mutation value 1) | | | | DE/best/1 (mutation value 2) | | | |
|---|---|---|---|---|---|---|---|---|
| Population size | 5 | | 10 | | 5 | | 10 | |
| Dataset | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 91.79 | 7585.98 | 91.58 | 22699.37 | 90.34 | 13523.39 | 91.68 | 22544.00 |
| STD | 86.75 | 4047.13 | 86.97 | 8578.22 | 86.63 | 10524.26 | 86.97 | 17972.58 |
| AAQ | 86.03 | 26251.50 | 86.10 | 56413.17 | 86.12 | 38387.81 | 86.73 | 66221.46 |
| ASTD-B | 80.60 | 6200.27 | 81.11 | 5690.58 | 80.98 | 4678.05 | 81.11 | 9112.36 |
| AJGT | 85.75 | 5061.12 | 88.56 | 7862.54 | 90.83 | 8486.48 | 91.72 | 18209.64 |
| Average | 86.18 | 9829.20 | 86.87 | 20248.78 | 86.98 | 15119.99 | – | – |
After that, the value of
The effect of the DE strategy is tested by using DE/best/2, as given in Table
The results based on 5-fold CV accuracy as fitness function value used with DE/best/2, different population sizes, and mutation parameter.
| DE parameters | DE/best/2 (mutation value 1) | | | | DE/best/2 (mutation value 2) | | | |
|---|---|---|---|---|---|---|---|---|
| Population size | 5 | | 10 | | 5 | | 10 | |
| Dataset | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 91.48 | 16570.82 | 91.27 | 21842.62 | 91.37 | 12498.83 | 91.73 | 20115.25 |
| STD | 86.63 | 4160.50 | 87.03 | 8306.28 | 86.75 | 5956.65 | 86.75 | 9726.51 |
| AAQ | 85.54 | 14978.24 | 86.76 | 68793.32 | 83.86 | 22016.83 | 85.91 | 76816.73 |
| ASTD-B | 80.85 | 4408.58 | 81.48 | 8175.77 | 81.04 | 4455.77 | 82.05 | 9056.05 |
| AJGT | 91.06 | 6314.36 | 92.56 | 8543.75 | 92.06 | 3259.03 | 91.67 | 21229.50 |
| Average | 87.11 | 9286.50 | – | – | 87.01 | 9637.42 | 87.62 | 27388.81 |
From all results listed in Tables
The optimal configuration for each dataset based on the DE-CNN-5CV model.
Dataset | Filter sizes list | Number of neurons | NFCS | Initialization mode | Dropout rate |
---|---|---|---|---|---|
ArTwitter | [2, 3, 5] | 350 | 50 | LeCun uniform | 0.6 |
STD | [2, 4] | 100 | 200 | Normal | 0.6 |
AAQ | [4, 5, 6] | 200 | 300 | Normal | 0.2 |
ASTD-B | [3, 4] | 100 | 100 | He uniform | 0.3 |
AJGT | [2, 3, 6] | 200 | 300 | Uniform | 0.2 |
From the previous experimental series, we noticed that DE-CNN takes a long time to select the optimal configuration. This is due to using the 5-fold CV accuracy as the fitness evaluation technique in the DE evaluation stage for each individual. In this experimental series, we analyze the effect of using
The results of DE/best/1 strategy, different population sizes, and mutation parameter using
| DE parameters | DE/best/1 (mutation value 1) | | | | DE/best/1 (mutation value 2) | | | |
|---|---|---|---|---|---|---|---|---|
| Population size | 5 | | 10 | | 5 | | 10 | |
| Dataset | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 92.72 | 2341.01 | 93.44 | 3921.61 | 92.66 | 3270.71 | 92.61 | 6540.61 |
| STD | 87.97 | 1198.57 | 88.30 | 2643.58 | 88.42 | 3497.13 | 86.91 | 5547.04 |
| AAQ | 86.97 | 4949.27 | 86.38 | 11575.05 | 87.43 | 7880.75 | 87.39 | 17957.25 |
| ASTD-B | 82.04 | 1057.19 | 81.17 | 1940.99 | 82.29 | 1170.57 | 81.61 | 2114.62 |
| AJGT | 91.67 | 2858.29 | 92.56 | 5039.01 | 92.50 | 3382.44 | 92.44 | 5338.89 |
| Average | 88.27 | 2480.87 | 88.37 | 5024.05 | 88.66 | 3840.32 | 88.29 | 7499.68 |
The results of DE/best/2, different population sizes, and mutation parameter using
| DE parameters | DE/best/2 (mutation value 1) | | | | DE/best/2 (mutation value 2) | | | |
|---|---|---|---|---|---|---|---|---|
| Population size | 5 | | 10 | | 5 | | 10 | |
| Dataset | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 92.72 | 1780.57 | 92.97 | 2716.75 | 92.36 | 3462.59 | 92.56 | 4735.10 |
| STD | 88.19 | 1302.01 | 87.69 | 2023.61 | 88.03 | 3117.11 | 88.14 | 3102.14 |
| AAQ | 87.29 | 9051.87 | 87.11 | 20436.34 | 86.99 | 7233.76 | 86.43 | 20061.08 |
| ASTD-B | 81.29 | 1140.01 | 82.23 | 1961.08 | 81.67 | 1152.20 | 82.48 | 1847.88 |
| AJGT | 92.44 | 3064.20 | 92.17 | 6451.25 | 92.50 | 2723.33 | 92.83 | 6598.21 |
| Average | – | – | 88.43 | 6717.81 | 88.31 | 3537.80 | 88.49 | 7268.88 |
According to the results in Tables
Table
The optimal configuration for each dataset based on the DE-CNN-TSF model.
Dataset | Filter sizes list | Number of neurons | NFCS | Initialization mode | Dropout rate |
---|---|---|---|---|---|
ArTwitter | [2, 3, 5] | 200 | 150 | LeCun uniform | 0.2 |
STD | [2, 4, 5] | 300 | 500 | LeCun uniform | 0.3 |
AAQ | [8, 9] | 350 | 250 | LeCun uniform | 0.3 |
ASTD-B | [5, 6, 8] | 300 | 150 | He uniform | 0.2 |
AJGT | [2, 7, 8] | 400 | 400 | Normal | 0.5 |
In this section, the influence of the crossover probability used as a parameter in the DE algorithm is analyzed. Two new crossover probability values were chosen: 0.5 and 0.8. The experiments in this section were implemented using the same setup as in experimental series 2, where
The results of DE/best/1, different population sizes, and crossover probability using
| DE parameters | DE/best/1 (crossover 0.5) | | | | DE/best/1 (crossover 0.8) | | | |
|---|---|---|---|---|---|---|---|---|
| Population size | 5 | | 10 | | 5 | | 10 | |
| Dataset | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 92.25 | 1861.73 | 93.03 | 3154.44 | 92.46 | 700.90 | 92.56 | 928.86 |
| STD | 88.31 | 1253.39 | 87.86 | 3548.95 | 88.36 | 383.54 | 87.69 | 730.12 |
| AAQ | 87.36 | 8404.20 | 87.88 | 12614.47 | 87.50 | 4009.01 | 87.55 | 7515.25 |
| ASTD-B | 81.54 | 1313.47 | 81.79 | 2041.57 | 82.17 | 1271.83 | 81.85 | 2756.65 |
| AJGT | 93.00 | 4424.83 | 92.50 | 6963.73 | 92.83 | 277.75 | 92.11 | 780.79 |
| Average | 88.49 | 3451.52 | 88.61 | 5664.63 | – | – | 88.35 | 2542.33 |
The results of DE/best/2, different population sizes, and crossover probability using
| DE parameters | DE/best/2 (crossover 0.5) | | | | DE/best/2 (crossover 0.8) | | | |
|---|---|---|---|---|---|---|---|---|
| Population size | 5 | | 10 | | 5 | | 10 | |
| Dataset | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 92.56 | 1855.46 | 93.03 | 4105.46 | 93.03 | 747.39 | 92.51 | 1590.83 |
| STD | 87.47 | 2927.36 | 87.47 | 6936.65 | 88.03 | 444.72 | 87.64 | 3682.39 |
| AAQ | 87.39 | 8774.22 | 87.48 | 11471.57 | 87.39 | 6166.45 | 87.57 | 7590.81 |
| ASTD-B | 81.98 | 1015.00 | 81.48 | 2619.95 | 81.85 | 2330.10 | 82.10 | 3463.05 |
| AJGT | 92.39 | 3783.20 | 92.56 | 7449.43 | 92.67 | 636.44 | 92.44 | 1340.59 |
| Average | 88.36 | 3671.05 | 88.40 | 6516.61 | 88.59 | 2065.02 | 88.45 | 3533.53 |
From Table
From this experimental series, we can conclude that the best DE-CNN model is constructed at
The optimal configuration for each dataset based on the DE-CNN-TSC model.
Dataset | Filter sizes list | Number of neurons | NFCS | Initialization mode | Dropout rate |
---|---|---|---|---|---|
ArTwitter | [2, 5, 9] | 50 | 200 | LeCun uniform | 0.2 |
STD | [3, 4, 5] | 300 | 300 | LeCun uniform | 0.3 |
AAQ | [2, 5, 8] | 300 | 500 | Normal | 0.5 |
ASTD-B | [2, 3, 8] | 350 | 150 | LeCun uniform | 0.3 |
AJGT | [2, 3, 8] | 350 | 50 | Uniform | 0.2 |
In this experimental series, a general model with only one optimal configuration is selected for all datasets. The general model is constructed by selecting the most frequent parameters extracted from all the optimal configurations obtained during the previous experimental series. We previously conducted three experimental series in which we selected the best DE-CNN model that produces the optimal configuration for each dataset. From experimental series 1, 2, and 3, we selected DE-CNN-5CV (
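The most-frequent-parameter selection can be sketched as follows. The example configurations in the usage below are hypothetical, and list-valued parameters (the filter sizes) are converted to tuples so they can be counted.

```python
from collections import Counter

def most_frequent_config(configs):
    """Combine per-dataset optimal configurations into one general model by
    taking, for each parameter, its most frequent value (ties broken by the
    first value encountered)."""
    result = {}
    for key in configs[0]:
        counts = Counter(
            tuple(c[key]) if isinstance(c[key], list) else c[key]
            for c in configs
        )
        value = counts.most_common(1)[0][0]
        result[key] = list(value) if isinstance(value, tuple) else value
    return result
```

For example, given three hypothetical optimal configurations where [2, 3, 8], LeCun uniform, and a 0.2 dropout each occur twice, the general model adopts exactly those values.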
The frequency for each value with the best parameters over all datasets after combining DE-CNN-5CV and DE-CNN-TSC parameters. (a) Dropout rate, (b) NFCS, (c) initialization mode, (d) number of neurons, and (e) filter sizes list.
The frequency for each value with the best parameters over all datasets after combining DE-CNN-TSF and DE-CNN-TSC parameters. (a) Dropout rate, (b) NFCS, (c) initialization mode, (d) number of neurons, and (e) filter sizes list.
From Figures
General models configurations.
Combined DE-CNN models | Generated model name | Filter sizes list | Number of neurons | NFCS | Initialization mode | Dropout rate |
---|---|---|---|---|---|---|
DE-CNN-TSF and DE-CNN-TSC | DE-CNN-G1 | [2, 3, 8] | 300 | 150 | LeCun uniform | 0.2 |
DE-CNN-G2 | [2, 3, 8] | 300 | 150 | LeCun uniform | 0.3 | |
DE-CNN-5CV and DE-CNN-TSC | DE-CNN-G3 | [2, 3, 8] | 350 | 300 | Normal | 0.2 |
DE-CNN-G4 | [2, 3, 8] | 350 | 300 | LeCun uniform | 0.2 |
In order to determine the optimal general model from these four models listed in Table
Comparison between general models using 10-fold CV.
| Dataset | DE-CNN-G1 | | DE-CNN-G2 | | DE-CNN-G3 | | DE-CNN-G4 | |
|---|---|---|---|---|---|---|---|---|
| | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) | Accuracy | Time (s) |
| ArTwitter | 93.28 | 552.24 | 93.23 | 723.64 | 92.77 | 1061.17 | 93.13 | 1254.12 |
| STD | 88.14 | 458.10 | 87.75 | 486.06 | 87.97 | 728.43 | 88.14 | 1034.33 |
| AAQ | 87.50 | 1601.06 | 87.97 | 1610.54 | 87.69 | 2452.71 | 87.50 | 2434.58 |
| ASTD-B | 82.48 | 368.28 | 82.29 | 185.35 | 82.60 | 873.89 | 82.41 | 785.69 |
| AJGT | 93.06 | 690.82 | 92.61 | 699.20 | 92.94 | 1318.60 | 92.83 | 1285.41 |
| Average | – | – | 88.77 | 740.96 | 88.79 | 1286.96 | 88.80 | 1358.83 |
In this section, the performance of DE algorithm is compared against two metaheuristic methods which are particle swarm optimization (PSO) [
Comparison between DE, PSO, and GA.
| Dataset | DE | | | GA | | | PSO | | |
|---|---|---|---|---|---|---|---|---|---|
| | Time (s) | 5-fold CV | 10-fold CV | Time (s) | 5-fold CV | 10-fold CV | Time (s) | 5-fold CV | 10-fold CV |
| ArTwitter | 700.90 | 92.46 | 92.61 | 767.73 | 92.41 | 92.92 | 1326.14 | 92.82 | 92.61 |
| STD | 383.54 | 88.36 | 88.25 | 1122.77 | 87.03 | 87.80 | 2079.96 | 87.25 | 88.14 |
| AAQ | 4009.01 | 87.50 | 88.02 | 6969.59 | 87.41 | 87.78 | 3698.91 | 87.13 | 86.89 |
| ASTD-B | 1271.83 | 82.17 | 82.29 | 1693.62 | 81.85 | 81.73 | 683.22 | 82.17 | 82.29 |
| AJGT | 277.75 | 92.83 | 93.17 | 2320.46 | 92.33 | 92.72 | 1122.90 | 92.44 | 92.33 |
| Average | – | – | – | 2574.83 | 88.21 | 88.59 | 1782.23 | 88.36 | 88.45 |
From Table
In this section, the results of the proposed DE-CNN models are compared with state-of-the-art methods on different Arabic sentiment classification datasets. For a fair comparison, the optimal configuration selected in each experimental series is used to build the CNN, each CNN is evaluated using 10-fold CV, and the results are listed in Table
- CNN-base: a CNN similar to the model described in Section
- Combined LSTM: a model proposed by Al-Azani and El-Alfy [
- Stacking ensemble (eclf14): a model based on stacking ensemble presented in [
- NuSVC: a model employed in [
- SVM (bigrams): a support vector machine classifier trained on TF-IDF as the weighting scheme over bigrams, evaluated in [
Comparisons with other models.
| ArTwitter | CNN-base | DE-CNN-5CV | DE-CNN-TSF | DE-CNN-TSC | DE-CNN-G1 | Combined LSTM [ |
|---|---|---|---|---|---|---|
| Acc | 90.95 | 93.28 | 92.25 | 92.61 | 93.28 | 87.27 |
| Prc | 89.76 | 93.30 | 90.85 | 91.24 | 92.14 | 87.36 |
| Rec | 93.03 | 93.74 | 94.45 | 94.75 | 95.05 | 87.27 |
| F1 | 91.32 | 93.44 | 92.57 | 92.91 | 93.55 | 87.28 |

| STD | CNN-base | DE-CNN-5CV | DE-CNN-TSF | DE-CNN-TSC | DE-CNN-G1 | Stacking ensemble (eclf14) [ |
|---|---|---|---|---|---|---|
| Acc | 87.24 | 88.31 | 88.36 | 88.25 | 88.14 | 85.28 |
| Prc | 80.26 | 79.66 | 80.41 | 79.07 | 79.09 | 61.04 |
| Rec | 64.82 | 71.58 | 70.69 | 72.25 | 71.36 | 67.14 |
| F1 | 71.39 | 75.33 | 75.14 | 75.36 | 74.93 | 63.95 |

| AAQ | CNN-base | DE-CNN-5CV | DE-CNN-TSF | DE-CNN-TSC | DE-CNN-G1 | NuSVC [ |
|---|---|---|---|---|---|---|
| Acc | 84.69 | 87.15 | 87.43 | 88.01 | 87.50 | 80.21 |
| Prc | 83.62 | 87.38 | 86.75 | 88.08 | 86.70 | 83.00 |
| Rec | 86.48 | 87.05 | 88.49 | 88.07 | 88.77 | 76.50 |
| F1 | 85.00 | 87.16 | 87.58 | 88.03 | 87.70 | 79.62 |

| ASTD-B | CNN-base | DE-CNN-5CV | DE-CNN-TSF | DE-CNN-TSC | DE-CNN-G1 | Combined LSTM [ |
|---|---|---|---|---|---|---|
| Acc | 80.47 | 81.60 | 80.72 | 82.28 | 82.48 | 81.63 |
| Prc | 81.08 | 81.89 | 82.18 | 82.66 | 81.86 | 82.32 |
| Rec | 79.72 | 81.35 | 78.84 | 81.85 | 83.48 | 81.63 |
| F1 | 80.27 | 81.54 | 80.32 | 82.17 | 82.57 | 81.64 |

| AJGT | CNN-base | DE-CNN-5CV | DE-CNN-TSF | DE-CNN-TSC | DE-CNN-G1 | SVM (bigrams) [ |
|---|---|---|---|---|---|---|
| Acc | 90.16 | 92.72 | 92.56 | 93.17 | 93.06 | 88.72 |
| Prc | 89.62 | 91.80 | 91.86 | 92.79 | 92.36 | 92.08 |
| Rec | 91.00 | 93.89 | 93.44 | 93.67 | 93.89 | 84.89 |
| F1 | 90.24 | 92.81 | 92.63 | 93.19 | 93.10 | 88.27 |
From Table
Moreover, the average performance results over all datasets for DE-CNN models and CNN-base model are depicted in Figure
Average of results for the methods on various datasets.
From all the previous results, we can conclude that the proposed DE-CNN-G1 is the best model among all the models in this study, where the DE algorithm finds the optimal configuration to build a proper CNN model that improves Arabic sentiment classification. Moreover, after analyzing the influence of the DE parameters, it was found that a crossover probability of 0.8, a mutation parameter of 0.3, and the DE/best/2 strategy improved the ability of DE to find the optimal CNN configuration. Despite these accuracy improvements and computational time savings, the proposed DE-CNN in general, and DE-CNN-G1 in particular, can still be improved, since finding the optimal parameters of the metaheuristic algorithm, such as the mutation parameter, population size, and DE strategy, requires further exploration and exploitation. One of the major conclusions that can be drawn from the obtained results is that the technique used to measure the fitness function value is crucial to exploring the architecture and parameter search space of the CNN. Moreover, training a deep neural network usually relies on randomness to perform better. Various forms of randomness can be applied at different stages of training, such as the random initialization of the network weights, regularization using dropout, and the optimization phase itself. This phenomenon may affect the stability and repeatability of the results obtained with different evaluation techniques such as 5-fold CV and 10-fold CV.
This paper proposed a framework that adopts the differential evolution (DE) algorithm to evolve a convolutional neural network (CNN) and generate an Arabic sentiment classification system. The DE mutation strategies help the system by slightly increasing performance in terms of accuracy and time cost. To further assess the stability of the proposed framework, we built and evaluated two general DE-CNN models using the most frequent parameters extracted from all the optimal configurations. Simulation results show that the two DE-CNN models, DE-CNN-TSC and DE-CNN-G1, are robust and stable, and they can outperform state-of-the-art methods.
Given the promising results of the proposed DE-CNN model for enhancing Arabic sentiment classification, in future work the model can be extended and applied to several applications such as image classification, object detection, and big data classification.
The resources used to support the findings of this study have been deposited in the Github repository (
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was in part supported by the National Key Research and Development Program of China (grant no. 2017YFB1402203), the Defense Industrial Technology Development Program (grant no. JCKY2018110C165), Hubei Provincial Natural Science Foundation of China (grant no. 2017CFA012), and the Key Technical Innovation Project of Hubei Province of China (grant no. 2017AAA122).