Developing an Intelligent System with Deep Learning Algorithms for Sentiment Analysis of E-Commerce Product Reviews

Most consumers rely on online reviews when deciding whether to purchase e-commerce products or services. Unfortunately, a major problem with these reviews that has not yet been fully addressed is the presence of deceptive reviews. The novelty of the proposed system lies in applying opinion mining to consumers' reviews to help businesses and organizations continually improve their market strategies and obtain an in-depth analysis of consumers' opinions about their products and brands. In this paper, a long short-term memory (LSTM) model and a hybrid convolutional neural network and LSTM (CNN-LSTM) model were used for sentiment analysis of reviews in the e-commerce domain. The system was tested and evaluated using real-time data that included reviews of cameras, laptops, mobile phones, tablets, televisions, and video surveillance products from the Amazon website. Data preprocessing steps, such as lowercase conversion, stopword removal, punctuation removal, and tokenization, were used for data cleaning. The cleaned data were processed with the LSTM and CNN-LSTM models to detect and classify consumers' sentiment as positive or negative. The LSTM and CNN-LSTM algorithms achieved accuracies of 94% and 91%, respectively. We conclude that the deep learning techniques applied here provide optimal results for classifying customers' sentiment toward the products.


Introduction
Web 3.0 is characterized by the semantic web, artificial intelligence, and connectivity, among other features, allowing people to use social media to communicate and express their opinions about real-world events. In this context, analyzing users' reviews is essential for companies seeking to grow worldwide. This makes opinion mining a key tool for analyzing reviews and discussions. Nowadays, companies analyze this type of information to improve the quality and performance of their products and, consequently, to survive in a competitive market.
Opinion mining can be described as uncovering the reasons behind the actions that people take [1].
Important information is hidden within the huge amount of data generated on the Internet, and data mining techniques are used to extract it and solve various problems. Online product reviews are an important part of the data stored on the Internet, and commercial websites are platforms where users express their sentiment or opinion on many topics. Sentiment analysis is a broad area spanning natural language processing (NLP), computational linguistics, and text mining [2]. These techniques make it possible to extract and analyze opinions about a given product. Opinion mining classifies an opinion as positive or negative, and sentiment analysis determines the polarity value of a user's opinion on a particular product or service. Current sentiment analysis approaches fall mainly into three groups [3]: machine learning algorithms [4], lexicon-based methods [5], and hybrid models [6,7].
Negation is a prevalent linguistic phenomenon that affects polarity and, therefore, must be taken into account when assessing sentiment. Automatic detection of negation in news articles is required for numerous text processing applications, including sentiment analysis. Here, we used sentiment analysis to explore the role and importance of users' reviews of particular products in purchase decisions. We present experimental results demonstrating that sentiment analysis is appropriate for this purpose. The goal was to determine the polarity of the natural-language text written in product reviews. Existing straightforward approaches are statistical, based on the frequencies of positive and negative words. Recently, researchers have found new ways to account for other aspects of content, such as structural or semantic features. The present work focuses on identifying document-level negation using multiple computational methods. In recent years, with the exponential growth of smartphone use, many people have connected to social networking platforms such as Facebook, Twitter, and Instagram. Social networks have become a venue for expressing beliefs, opinions, emotions, thoughts, and views on personal issues, places, or personalities.
There are numerous studies applying sentiment analysis, some of which extracted patterns from real-time Twitter data collected with the Twitter streaming application programming interface (API) [8,9]. Two lexical resources commonly used by sentiment analyzers are SentiWordNet [10] and WordNet [11]. Sentiment analysis uses positive and negative scores to classify opinions. Using a model developed to analyze word sequence disambiguation [12], data concerning the Indonesian presidential elections were gathered with the Twitter streaming API [13]. Irrelevant tweets were removed, and the remaining data were analyzed for sentiment by dividing each tweet into several sub-tweets and calculating the sentiment polarity of each sub-tweet to predict the outcome of the elections. The results were evaluated with the mean absolute error metric, and the prediction error was 0.6 lower than in the previous study [14]. A system was developed to predict the outcome of the Swedish elections using Twitter data [15]. To predict the outcome of the European elections, a new method was designed that studied the similarity of the structure with the outcome of the vote. Another method was created to test Brazilian municipal elections in six cities [16]. In this methodology, sentiment analysis was applied together with a stratified sample [17] of users to compare the characteristics of the findings with those of the actual voters.
Many researchers have used machine learning and artificial intelligence to analyze the sentiment of tweets [18,19]. In [20], Naive Bayes, support vector machine (SVM) [21], and information entropy-based [22] models were applied to classify product reviews. A hybrid machine learning algorithm for Twitter opinion mining was proposed in [23]. Heydari et al. [24] proposed a time series model for analyzing fraudulent sentiment reviewers. Hajek et al. [25] developed a deep feedforward neural network and a convolutional model to detect fake positive and negative reviews in an Amazon dataset. Long et al. [26] applied LSTM with a multi-head attention network to predict text sentiment on a Chinese social media dataset. Dong et al. [27] proposed a supervised linear regression model to predict customer sentiment in online shopping data using sentiment analysis learning approaches.
Researchers have focused on developing powerful models to deal with the ever-increasing complexity of big data [28,29] and on extending sentiment analysis to a wide range of applications [30,31], from financial forecasting to marketing strategies [32], among other areas [33,34]. However, only a few studies have compared different deep learning approaches to provide real evidence of their performance [35]. Deep learning techniques are becoming increasingly popular. Studies that assess a single approach on a single dataset in a specific area suggest that CNNs and RNNs achieve relatively good accuracy. Gao et al. [36] proposed a CNN model based on AdaBoost combination for sentiment analysis of user-generated text. In this vein, Hassan and Mahmood [37] demonstrated that CNN and RNN models overcome the problem of short texts in deep learning models.
Some traditional approaches, assisted by machine learning techniques, are based on aspects of the language used. Using the domain of movie reviews, Pang et al. [18] studied the performance of various machine learning algorithms, including Naive Bayes, maximum entropy, and SVM, and achieved an accuracy of 82.9% using SVM with unigrams. NLP is typically used to extract the features used by a sentiment classifier. In this respect, most NLP strategies are centered on n-grams, but a bag-of-words representation is also common [38,39]. Numerous studies have reported significant results when employing bag-of-words as the text representation for item categorization [40][41][42][43][44].
Researchers have taken advantage of NLP themes to develop deep learning models based on neural networks with more than three layers, according to the journal Nature. Most of these studies found that deep learning models accurately detect sentiment in various situations. Representative examples include the CNN [45,46], RNN [47], deep neural network [48], recursive neural deep model [49], and attention-based bidirectional CNN-RNN [50] models. Some researchers combine models into what are referred to as hybrid neural networks; the hierarchical bidirectional RNN is one example [51]. The main issue with sentiment analysis of product reviews in the e-commerce domain is the existence of fake reviews that lead customers to select undesired products [52]. The main contributions of the proposed research are the following:
(1) The generation of a sentiment score for each product review in the dataset using a lexicon-based approach.
(2) Labeling the review texts as positive if the generated sentiment score is >0 and as negative otherwise.
(3) The combination of all product reviews into a single data frame to obtain more sentiment-related words.
(4) Improving accuracy by developing a hybrid deep learning model that combines the CNN and LSTM models for product-related sentiment classification.
(5) Comparing the classification performance of the CNN-LSTM and LSTM models.
Computational Intelligence and Neuroscience

Materials and Methods
The proposed methodology for predicting review-related sentiment is based on the deep learning algorithms presented here. The phases of the proposed system are dataset collection, data preprocessing, sentiment score generation, polarity calculation, application of the CNN-LSTM model, evaluation metrics, and analysis of the results. Figure 1 shows the framework of the proposed methodology used in the present study.

Datasets.
To evaluate the proposed system, the dataset [53] was collected from reviews on the Amazon website in JSON format. Each JSON file contains a number of reviews (Table 1). The dataset includes reviews of laptops, mobile phones, tablets, televisions, and video surveillance products.
Each review is stored with meta-features such as the reviewer's ID and the product ID alongside the review text, and the preprocessing steps, starting with lowercase conversion, were applied to the review text.
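Contribution (3) above merges the reviews of all product categories into one data frame. The standard-library sketch below shows one way to do that, assuming each file holds one JSON object per line. The field names `reviewerID`, `asin`, and `reviewText` are an assumption borrowed from a common Amazon review dump layout and may not match dataset [53]; a plain list of dicts stands in for a pandas DataFrame.

```python
import json

# Hypothetical field names: "reviewerID", "asin" (product ID), and "reviewText" follow a
# common Amazon review dump layout; the actual keys of dataset [53] may differ.
FIELDS = ("reviewerID", "asin", "reviewText")


def load_reviews(json_lines, category):
    """Parse one category's JSON-lines text into a list of review records."""
    records = []
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        # Keep only the meta-features used later and tag the product category.
        record = {key: obj.get(key, "") for key in FIELDS}
        record["category"] = category
        records.append(record)
    return records


def combine_categories(files_by_category):
    """Merge all categories into one list, standing in for a single data frame."""
    combined = []
    for category, text in files_by_category.items():
        combined.extend(load_reviews(text, category))
    return combined


# Two tiny made-up "files", one JSON object per line.
laptops = '{"reviewerID": "A1", "asin": "B01", "reviewText": "Great laptop!"}\n'
phones = '{"reviewerID": "A2", "asin": "B02", "reviewText": "Battery died fast."}\n'
reviews = combine_categories({"laptops": laptops, "phones": phones})
print(len(reviews), reviews[1]["category"])  # 2 phones
```

Tagging each record with its category keeps it possible to analyze per-product results after the merge.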

Data Preprocessing.
We applied several preprocessing steps to clean the review texts so that they are easy to process. The following preprocessing methods were performed on the dataset as a whole.

Lowercase.
This step converts all words of the review text to lowercase.

Stopword Removal.
Stopwords are very common words in a language, such as "the," "a," "an," "is," and "are." As these words carry little information that is useful to the model, they were removed from the review content.

Punctuation Removal.
All punctuation marks in the review texts were removed.

One-Word Review Elimination.
Reviews that included only one word were eliminated.

Contraction Removal.
This process replaces a word originally written in its short (contracted) form with the respective full form; for instance, "when've" becomes "when have."

Tokenization.
Each sentence of the review texts was divided into small pieces, that is, words or tokens.
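The cleaning steps above can be chained as in the following minimal standard-library sketch. The stopword and contraction lists are small illustrative samples, not the ones used in the study. Contractions are expanded before punctuation is stripped (otherwise the apostrophe would already be gone), whitespace splitting stands in for a proper tokenizer, and applying the one-word filter after stopword removal is our assumption.

```python
import re
import string

# Illustrative samples only; the paper's full stopword and contraction lists are not given.
STOPWORDS = {"the", "a", "an", "is", "are", "it", "this", "and", "of", "to"}
CONTRACTIONS = {"when've": "when have", "don't": "do not", "it's": "it is", "can't": "cannot"}

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def expand_contractions(text):
    """Replace each known short form with its full form (whole words only)."""
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    return text


def preprocess(review):
    """Return the cleaned token list, or None if the review should be discarded."""
    text = review.lower()                                # lowercase
    text = expand_contractions(text)                     # contraction removal
    text = text.translate(PUNCT_TABLE)                   # punctuation removal
    tokens = text.split()                                # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    if len(tokens) <= 1:                                 # one-word review elimination
        return None
    return tokens


print(preprocess("It's GREAT, the screen is sharp!"))  # ['great', 'screen', 'sharp']
print(preprocess("Excellent!"))                        # None
```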

Part-of-Speech Tagging.
This step tags each word in the sentence with a part-of-speech (POS) tag, for example, "VB" for a verb, "JJ" for an adjective, and "NN" for a noun.

Score Generation.
The review text was evaluated for sentiment, and a score was generated. To calculate the sentiment score, the dataset was matched against an opinion lexicon [53] consisting of 5,000 positive words and 4,500 negative words with their respective scores. The sentiment score of each review text was calculated from the lexicon scores, and the review was labeled as positive if the score was >0 and as negative otherwise.
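A minimal sketch of the scoring and labeling rule follows. The lexicon is a made-up toy with illustrative scores, and summing the scores of the matched words is our assumption, since the text does not say how the per-word scores are aggregated.

```python
# Toy lexicon with made-up scores; the real opinion lexicon (about 5,000 positive and
# 4,500 negative words) and its scores are not reproduced here.
LEXICON = {"great": 1.0, "sharp": 0.5, "love": 1.0, "poor": -1.0, "broken": -1.0, "slow": -0.5}


def sentiment_score(tokens, lexicon=LEXICON):
    """Sum the lexicon scores of the review's tokens; unknown words contribute 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def label(tokens, lexicon=LEXICON):
    """Positive if the score is > 0, otherwise negative, as described above."""
    return "positive" if sentiment_score(tokens, lexicon) > 0 else "negative"


print(sentiment_score(["great", "screen", "sharp"]))  # 1.5
print(label(["slow", "broken", "charger"]))           # negative
```

Note that, under this rule, a review whose score is exactly 0 (for example, one with no lexicon words) is labeled negative.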

Word Embeddings.
We computed numerical vectors for every preprocessed sentence in the product review dataset using word embeddings. To create word indices, we first converted the terms of all review texts into sequences. The Keras text tokenizer [54] was used to obtain these indices. We ensured that no term receives the index zero in the tokenizer and that the vocabulary size is set properly. Then, a distinct index is generated for each word in the training and testing sets, and these indices are used to create numeric vectors for all review texts in the dataset. Figure 2 presents the structure of the CNN-LSTM model used for sentiment classification of customers' reviews in the Amazon dataset.
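The indexing step can be illustrated without Keras. The pure-Python sketch below is intended to mirror, as we understand them, the defaults of the Keras `Tokenizer` (indices ranked by word frequency and starting at 1, so 0 stays free for padding, with `num_words` keeping only indices below the limit) and of `pad_sequences` (padding and truncating at the front). It is an illustration, not the Keras API itself; the study used `max_features = 15,000` and `maxlen = 400`, while tiny values are used here.

```python
from collections import Counter


def build_word_index(tokenized_reviews):
    """Map each word to an integer >= 1, most frequent first; 0 stays reserved for padding."""
    counts = Counter()
    for tokens in tokenized_reviews:
        counts.update(tokens)
    # Counter keeps first-seen order and sorted() is stable, so ties keep that order.
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_padded(tokenized_reviews, word_index, max_features, maxlen):
    """Map tokens to indices (< max_features only), then truncate/left-pad to maxlen."""
    padded = []
    for tokens in tokenized_reviews:
        seq = [word_index[t] for t in tokens if word_index.get(t, max_features) < max_features]
        seq = seq[-maxlen:]                               # keep the last maxlen indices
        padded.append([0] * (maxlen - len(seq)) + seq)    # left-pad with the reserved 0
    return padded


sample = [["great", "screen"], ["poor", "screen", "battery"], ["great", "battery", "life"]]
word_index = build_word_index(sample)
padded = texts_to_padded(sample, word_index, max_features=5, maxlen=4)
print(word_index)  # {'great': 1, 'screen': 2, 'battery': 3, 'poor': 4, 'life': 5}
print(padded)      # [[0, 0, 1, 2], [0, 4, 2, 3], [0, 0, 1, 3]]
```

The word "life" receives index 5, which is not below `max_features = 5`, so it is dropped from the third sequence.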

Embedding Layer.
This is the first layer of the CNN-LSTM model. It transforms each word in the training dataset into a real-valued vector, so that the set of sentiment-related words is represented in numerical form. This process is known as word embedding. The embedding layer has three parameters: the vocabulary size (maximum features; 15,000 words), the embedding dimension (50), and the input sequence length (400 words).
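Conceptually, an embedding layer is a trainable lookup table with one row per vocabulary index, so its size follows directly from the two numbers above: 15,000 × 50 = 750,000 weights. The sketch below uses random values as a placeholder for learned weights and simply looks up one row per index of a padded 400-word sequence.

```python
import random

MAX_FEATURES = 15_000   # vocabulary size stated in the text
EMBED_DIM = 50          # embedding dimension stated in the text
MAXLEN = 400            # input sequence length stated in the text

# One row of EMBED_DIM weights per vocabulary index; random values stand in for
# the weights that training would learn.
rng = random.Random(0)
embedding_matrix = [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)] for _ in range(MAX_FEATURES)]


def embed(sequence):
    """Turn a padded index sequence (length MAXLEN) into a MAXLEN x EMBED_DIM matrix."""
    return [embedding_matrix[idx] for idx in sequence]


n_params = MAX_FEATURES * EMBED_DIM   # 15,000 * 50 = 750,000 trainable weights
vectors = embed([0] * (MAXLEN - 3) + [1, 2, 3])
print(n_params, len(vectors), len(vectors[0]))  # 750000 400 50
```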

Dropout Layer.
The main task of this layer is to prevent overfitting of the model [52]. Here, we set the dropout rate to 0.4 (the rate ranges between 0 and 1). The dropout layer randomly deactivates a set of neurons in the embedding layer, where each neuron corresponds to the dense representation of a sentiment word in a review text.
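The sketch below shows dropout with rate 0.4 in its "inverted" form, where surviving values are scaled by 1/(1 − rate) during training so that the expected sum is unchanged, and nothing is dropped at inference time. Keras' `Dropout` layer works this way. Dropping individual values (rather than whole word vectors) is our assumption about how the layer was applied.

```python
import random


def dropout(values, rate=0.4, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` and scale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(values)  # no units are dropped at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


out = dropout([1.0] * 10_000, rate=0.4, rng=random.Random(42))
print(out.count(0.0) / len(out))  # approximately 0.4
```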
CNN is a deep learning technique used in areas such as natural language processing, computer vision, and medical image processing.

Convolution Layer.
The third layer of the CNN-LSTM model extracts features from the input matrix. It applies n convolution filters that slide over the elements of the input sequence matrix to compute the convolutions for each sequence. We set the number of filters to 64 and the filter kernel size to 3 × 3.
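For text, a convolution usually slides along the word axis only. In the sketch below, the stated kernel size is interpreted as a window of 3 consecutive words spanning the full embedding width, as in a Keras `Conv1D(64, 3)` layer. This interpretation is an assumption, and so is the absence of an activation, since none is stated. With "valid" (unpadded) convolution, a 400 × 50 input yields 398 positions for each of the 64 filters.

```python
def conv1d(sequence, kernels, biases):
    """Valid (unpadded) 1D convolution over time.

    sequence: T word vectors, each of length D
    kernels:  F filters, each a K x D list of weights
    biases:   F bias values
    Returns T - K + 1 rows, each holding F filter responses.
    """
    T, K = len(sequence), len(kernels[0])
    out = []
    for t in range(T - K + 1):
        window = sequence[t:t + K]  # K consecutive word vectors
        row = []
        for kernel, b in zip(kernels, biases):
            # Element-wise product of the K x D window and the K x D filter, summed, plus bias.
            s = b + sum(w * x for k_row, x_row in zip(kernel, window) for w, x in zip(k_row, x_row))
            row.append(s)
        out.append(row)
    return out


seq = [[1, 0], [0, 1], [1, 1], [2, 0]]   # T = 4 words, D = 2
f1 = [[1, 0], [0, 1], [0, 1]]            # K = 3
f2 = [[1, 1], [1, 1], [1, 1]]
print(conv1d(seq, [f1, f2], [0.0, 0.5]))  # [[3.0, 4.5], [1.0, 5.5]]
```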

Max Pooling Layer.
This layer downsamples the input sequences along their spatial dimension by keeping the maximum value of the input features within each pooling window. The pooling size was set to 5 × 5.
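The following sketch shows max pooling along the word axis, assuming that the stated pool size corresponds to non-overlapping windows of 5 positions (stride equal to the pool size, incomplete tail dropped), as in a Keras `MaxPooling1D(5)` layer. Under that assumption, the 398 convolution outputs shrink to 398 // 5 = 79 positions.

```python
def max_pool1d(features, pool_size=5):
    """Non-overlapping max pooling over time; an incomplete final window is dropped."""
    pooled = []
    for start in range(0, len(features) - pool_size + 1, pool_size):
        window = features[start:start + pool_size]
        # Maximum of each filter channel within the window.
        pooled.append([max(channel) for channel in zip(*window)])
    return pooled


feats = [[1, 9], [3, 2], [5, 4], [0, 7], [2, 2], [8, 1], [6, 6]]
print(max_pool1d(feats))                # [[5, 9]]
print(len(max_pool1d([[0.0]] * 398)))   # 79
```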

LSTM Layer.
LSTM is a type of RNN capable of learning long-term dependencies [52]. We used one LSTM layer with 50 hidden units, whose output is passed to the next layer. One of the most notable advantages of using a convolutional neural network for feature extraction ahead of the LSTM, compared with a traditional standalone LSTM, is the reduction in the total number of features.
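Throughout the feature extraction process, the sentiment classification model uses these features (words) to predict whether a product review text has positive or negative sentiment. The LSTM performs computations on the input sequences before passing an output to the last layer of the network. In every cell, four distinct computations are performed based on four gates: input ($i_t$), forget ($f_t$), candidate ($\tilde{c}_t$), and output ($o_t$). The structure of the LSTM model is presented in Figure 3. The equations for these gates are as follows:

$$
\begin{aligned}
i_t &= \mathrm{sig}\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \\
f_t &= \mathrm{sig}\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \\
o_t &= \mathrm{sig}\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

where sig and tanh are the sigmoid and hyperbolic tangent activation functions, respectively, X is the input data, W and b represent the weight and bias factors, respectively, $C_t$ is the cell state, $\tilde{c}_t$ is the candidate gate, $h_t$ refers to the output of the LSTM cell, and $\odot$ denotes element-wise multiplication.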
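The sketch below is a pure-Python implementation of one LSTM time step using the four gates listed above. It follows the standard formulation, in which each gate acts on the concatenation of the previous hidden state and the current input and the cell starts from zero states. In the described model, each step would receive a 64-value pooled feature vector and maintain 50 hidden units. Passing on only the final hidden state is our assumption.

```python
import math


def sig(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate name ("i", "f", "c", "o") to (W, b); W has one row per
    hidden unit and acts on the concatenation [h_{t-1}, x_t].
    """
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, act):
        W, b = params[name]
        return [act(s + bias) for s, bias in zip(matvec(W, z), b)]

    i_t = gate("i", sig)              # input gate
    f_t = gate("f", sig)              # forget gate
    c_tilde = gate("c", math.tanh)    # candidate values
    o_t = gate("o", sig)              # output gate
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]  # new cell state
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                       # cell output
    return h_t, c_t


def lstm_last_hidden(xs, hidden_size, params):
    """Run the cell over a sequence and return only the final hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h


# One hidden unit, one input feature, all weights and biases zero (illustrative only).
zero = ([[0.0, 0.0]], [0.0])
params = {g: zero for g in "ifco"}
h, c = lstm_step([0.3], [0.0], [1.0], params)
print(c, h)  # c = [0.5], h = [0.5 * tanh(0.5)]
```

With all weights at zero, every sigmoid gate equals 0.5 and the candidate is 0, so the cell simply keeps half of its previous state.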

Dense Layer (Fully Connected Layer).
This is a hidden layer of the CNN-LSTM model. It consists of 512 neurons, each connected to all outputs of the previous layer. The activation function applied in this layer is the rectified linear unit (ReLU), defined as follows:

$$\mathrm{ReLU}(x) = \max(0, x)$$
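A fully connected ReLU layer computes a weighted sum of all inputs plus a bias for each unit and then clips negative values to zero. The toy weights below are made up. The parameter count assumes that the layer receives the 50 outputs of the LSTM layer.

```python
def relu(x):
    return max(0.0, x)


def dense_relu(inputs, W, b):
    """Fully connected layer: every output unit sees every input, then ReLU is applied."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + bias) for row, bias in zip(W, b)]


print(dense_relu([1.0, -2.0], [[1.0, 1.0], [2.0, 0.5]], [0.0, 0.5]))  # [0.0, 1.5]

# With 50 LSTM outputs feeding 512 units: 50 * 512 weights + 512 biases.
n_dense_params = 50 * 512 + 512
print(n_dense_params)  # 26112
```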

Sigmoid Activation Function.
This is the output layer, which detects and classifies the output classes (positive or negative sentiment). The sigmoid function is given as follows (Algorithm 1):

$$\mathrm{sig}(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
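With all layers described, the output shape and the number of trainable parameters of each layer can be worked out from the hyperparameters stated in the text. The sketch below assumes the following: a word-axis convolution with a window of 3 and no padding; non-overlapping pooling of size 5; an LSTM that returns only its last hidden state; and a single sigmoid output unit. Under these assumptions, the counts should agree with what Keras' `model.summary()` reports for an equivalent sequential model. That equivalence is our assumption, not something stated in the paper.

```python
def cnn_lstm_summary(max_features=15_000, embed_dim=50, maxlen=400, filters=64,
                     kernel=3, pool=5, lstm_units=50, dense_units=512):
    """Return (layer name, output shape, trainable parameters) for each layer."""
    layers = [("embedding", (maxlen, embed_dim), max_features * embed_dim),
              ("dropout", (maxlen, embed_dim), 0)]
    conv_len = maxlen - kernel + 1                      # valid convolution
    layers.append(("conv1d", (conv_len, filters), kernel * embed_dim * filters + filters))
    pool_len = conv_len // pool                         # non-overlapping pooling
    layers.append(("max_pool", (pool_len, filters), 0))
    # Four gates, each with input weights, recurrent weights, and one bias per unit.
    lstm_params = 4 * (lstm_units * (filters + lstm_units) + lstm_units)
    layers.append(("lstm", (lstm_units,), lstm_params))
    layers.append(("dense_relu", (dense_units,), lstm_units * dense_units + dense_units))
    layers.append(("sigmoid_out", (1,), dense_units + 1))
    return layers


summary = cnn_lstm_summary()
for name, shape, params in summary:
    print(f"{name:12s} {str(shape):10s} {params:>8,d}")
total = sum(p for _, _, p in summary)
print("total", total)  # total 809289
```

The embedding table accounts for more than 90% of the parameters, which is typical when the vocabulary is large and the rest of the network is small.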

Evaluation Metrics.
To evaluate the proposed models (CNN-LSTM and LSTM), the accuracy, precision, recall, F1-score, and specificity metrics were used. The performance measures are defined as follows:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{F1-score} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Specificity} &= \frac{TN}{TN + FP}
\end{aligned}
$$

where true positive (TP) is the number of positive samples correctly classified as positive, false positive (FP) is the number of negative samples incorrectly classified as positive, true negative (TN) is the number of negative samples correctly classified as negative, and false negative (FN) is the number of positive samples incorrectly classified as negative.
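All five metrics listed above can be derived from the four confusion counts. The standard-library sketch below encodes positive sentiment as 1 and negative as 0 (an encoding chosen for illustration). It returns 0.0 when a denominator is zero, which is one common convention. The counts in the usage lines are made up.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = positive sentiment and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1-score, and specificity from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(m["accuracy"], m["recall"], m["specificity"])         # 0.85 0.8 0.9
```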

Experimental Results
In this section, we present the experimental results of applying the CNN-LSTM and LSTM models to analyze and predict sentiment in the e-commerce domain. The experiments were run in a Jupyter environment on hardware with 4 GB of RAM and an i7 2800 CPU. The evaluation metrics (accuracy, precision, recall, F1-score, and specificity) were used to assess the proposed system. Figure 4 presents the word cloud (sentiment words and product names) of the dataset, in which the words that appear most frequently in the product review dataset are shown in larger fonts.

Data Splitting.
In this phase, we divided the dataset of 13,057 product reviews into training (70%), validation (10%), and testing (20%) sets. Then, the CNN-LSTM and LSTM models were applied to detect and classify the review texts as positive or negative. Table 2 shows the splitting of the dataset.
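The sketch below shows one way to carry out a 70/10/20 split of the 13,057 reviews. It shuffles the review indices with a fixed seed (the value 42 is hypothetical) and gives the test set whatever remains after truncating the training and validation sizes. The exact counts in Table 2 may differ depending on rounding and on whether the split was shuffled or stratified.

```python
import random


def split_indices(n, train=0.7, val=0.1, seed=42):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts (test gets the remainder)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(13_057)
print(len(train_idx), len(val_idx), len(test_idx))  # 9139 1305 2613
```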
The empirical results of our system were compared with those of [28], as shown in Table 4.
The CNN-LSTM model achieved an accuracy of 94%.

Conclusion
Recently, sentiment analysis has become a valuable tool for generating and evaluating different types of data, supporting decision-making processes that lead to the improvement of businesses and companies. Social networks create large amounts of data that require processing and analysis to obtain relevant insights. In the present study, the experimental dataset was collected from the Amazon website and included reviews of laptops, mobile phones, tablets, televisions, and video surveillance products. A lexicon-based approach was used to calculate the sentiment score of each review text, and the preprocessed data were classified with the LSTM and CNN-LSTM models.
The experimental results showed that our models performed satisfactorily on all evaluation metrics.

Conflicts of Interest
The authors declare that they have no conflicts of interest.