LSTM-DGWO-Based Sentiment Analysis Framework for Analyzing Online Customer Reviews

Sentiment analysis reveals consumer concerns regarding products, enabling product enhancement and development. Existing sentiment analysis based on machine learning techniques is computationally intensive and less reliable. Deep learning approaches to sentiment analysis, such as long short-term memory (LSTM), have matured considerably, but the selection of optimal hyperparameters remains a significant issue. This study combines LSTM with differential grey wolf optimization in the LSTM-DGWO deep learning model. The app review dataset is processed using the bidirectional encoder representations from transformers (BERT) framework for efficient word embeddings. Then, review features are extracted by the genetic algorithm (GA), and the optimal review feature set is selected using the firefly algorithm (FA). Finally, the LSTM-DGWO model categorizes app reviews, with the DGWO algorithm optimizing the hyperparameters of the LSTM model. The proposed model outperformed conventional methods with a greater accuracy of 98.89%. The findings demonstrate that sentiment analysis can be practically applied to understand customers' perceptions of products and enhance them from a business perspective.


Introduction
With the evolution of technology, business owners are releasing applications with different functionalities. Mobile apps are software-based applications installed on smartphones to offer a user-friendly experience. Different mobile applications, including entertainment, education, and business, have been released. Due to the proliferation of mobile applications, people are now shopping online through apps downloaded to their mobile devices instead of traditional web browsers. Approximately 66% of smartphone users utilize mobile apps [1]. Businesses make an effort to make their applications effective, convenient to use, and error-free. They keep improving user experiences and application services by adding new features and functions [2]. App developers need a way to successfully compile user feature requests, feedback, and general thoughts to satisfy user needs [3]. Apps seek user ratings as an input to provide new services and enhance existing ones. Users can rate items using stars and provide text reviews for products. However, reviews include enormous amounts of unstructured information that cannot be manually evaluated.
Sentiment analysis (SA) is a field that analyzes how customers react to products and services. Additionally, it measures how these sentiments are expressed in their attitudes and assessments. The connections between SA and product design still need to be explored. SA's primary objective is to determine the polarity of a product in web commerce. As a result, it identifies the emotional state and uncovers the subjective data concealed in user experiences, and analyzing these sentiments in light of user feedback is crucial. Sentiment analysis and text mining need to be more consistent in driving suitable decisions, improving market competitiveness, and building customer trust [4]. The advantages of app review sentiment analysis are shown in Figure 1. Online reviews influenced 90% of consumer choices [5]. Most consumers choose the option with a high star rating because they believe it to be supported by favorable evaluations. However, the ratings provided by people on Internet platforms do not necessarily correspond to the written reviews. An intelligent sentiment analysis system enables app developers to customize products as per the requirements and interests of customers.
The overall accuracy of existing research on sentiment analysis in polarity identification, despite signs of progress, remains limited by theoretical and technological difficulties [6]. The naive Bayes (NB), support vector machine (SVM), maximum entropy (MaxEnt), random forest (RF), and conditional random field frameworks are instances of conventional machine learning (ML) techniques that are often employed in sentiment analysis. Word2Vec, GloVe, and FastText are approaches that can automatically extract feature vectors from text. However, typical ML methods still require human intervention to extract the emotional aspects of the input text [7]. Due to computational complexity and the selection of incomplete feature vectors in sentiment analysis, these ML techniques showed lower accuracy [8].
Sentiment analysis has made extensive use of deep learning (DL). Deep learning requires more processing power and storage than traditional ML algorithms since it uses more hidden layers, but accuracy has considerably improved. DL designs, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and gated recurrent unit (GRU) architectures, have been successfully used in text mining applications [9]. RNNs are helpful for many text-processing applications, but they encounter vanishing and exploding gradients when the input data contain long-term dependencies [10]. The LSTM, however, improved sentiment analysis over the RNN.
When consumers access an online platform for a purchase, they first review the feedback left by other customers and make their purchase decision based on it. To improve the quality of customer service, many organizations are turning to the same problem-solving techniques [11]. However, the effectiveness of DL techniques for sentiment analysis relies on how textual statements are characterized. The large dimensionality and sparsity of feature vectors resulting from traditional text representation techniques, like bag-of-words and term frequency approaches, reduce the accuracy of sentiment analysis. Therefore, a compelling feature extraction and selection approach is needed to raise the accuracy of sentiment analysis [12]. Deep learning optimization approaches have been used in only a limited number of studies on user sentiment analysis for mobile apps. The significant contributions of this work are as follows:
(1) The BERT model has been employed to obtain efficient word embeddings and automatic labels for online reviews in the preprocessing stage, since the quality of labeling is critical for efficient learning.
(2) A genetic algorithm-based feature extraction method extracts suitable features from word embeddings, and the firefly algorithm-based feature selection method is used to obtain the optimum app review feature subset, which enhances the sentiment analysis process.
(3) A deep neural network framework has been proposed to process online application reviews in a scalable way without affecting performance. An optimized deep learning technique, the LSTM-DGWO, is applied to classify the reviews into multiple classes: positive, negative, and neutral.
(4) A comparative analysis has been presented, illustrating how the proposed model can be effectively used for sentiment analysis.
The remainder of the paper is organized as follows: Section 2 discusses the related works; Section 3 illustrates the proposed work, dataset, and techniques; Section 4 presents the results and discussion; and Section 5 concludes the paper with future work directions.

Literature Review
This section reviews recent studies on online customer reviews using ML and DL approaches.

Machine Learning-Based Sentiment Analysis of Customer Reviews.
Xia et al. [13] presented a conditional random field technique to extract emotional cues and an SVM to identify the sentiment polarity of reviews for classifying online reviews; the asymmetrical weighting of features used in this approach resulted in inconsistent accuracy. Tang et al. [14] presented the maximum entropy-based joint aspect-dependent sentiment topic approach (MaxEnt-JABST) to increase accuracy and performance in extracting aspects and opinions from online reviews; however, it struggles with sentiment analysis across various domains. Shah et al. [15] proposed a novel strategy that included nine abstract-level dynamic analyses of user reviews using POS tagging and n-gram classification, followed by classification using NB and MaxEnt classifiers; however, the performance of sentiment classification needs improvement. Saad et al. [16] used multinomial naive Bayes (MNB), support vector regression (SVR), decision tree (DT), and RF algorithms. Jiang et al. [17] integrated SVM with improved particle swarm optimization (IPSO) to categorize attitudes.

Deep Learning-Based Sentiment Analysis of Customer Reviews.
Chen et al. [18] proposed the deep belief network and sentiment analysis (DBNSA) method to analyze user reviews and enhance user rating categorization; however, the categorization of user ratings by DBNSA involves complicated computing operations. A neural network model that includes the extraction of user behavior data from tweets was suggested by the authors of [19]. Asghar et al. [20] designed Senti-eSystem, a tool for assessing customer satisfaction using a hybrid fuzzy and deep neural network; performance suffers due to the imbalance in the dataset gathered for that study. The multichannel convolution and bidirectional GRU multihead attention capsule (AT-MC-BiGRU-capsule) is a model for text sentiment analysis that replaces scalar neurons with vector neurons and employs capsules to define text emotions, as presented in [21]; the model, however, has stability issues. Zulqarnain et al. [22] presented a two-state GRU (TS-GRU) based on a feature attention process, focusing on word-feature capturing and sequential modeling to discover and classify sentiment polarity; the TS-GRU approach can be computationally challenging. Alam et al. [23] suggested a domain-specific distributed word representation with a dilated CNN for social media SA to create smart city apps but considered only one domain. An RNN was utilized in [24] to anticipate client opinions based on web reviews; GloVe feature extraction produced unsatisfactory results when combined with the RNN algorithm. A hybrid of deep CNN and LSTM models was suggested by the authors of [25] for the e-commerce industry; however, this strategy uses more computing resources. The best aspects of the online reviews were extracted by the authors of [26] using ML methods, and the selected features were subsequently fed into a CNN for sentiment analysis; the flexibility and computational efficiency of the suggested approach remain limited.

Sentiment Analysis of App Reviews.
Using a latent Dirichlet allocation (LDA) model to find sentiments and a logistic regression model to determine the variables influencing e-rider satisfaction, Aman et al. [27] analyzed app store comments from two prominent micromobility businesses. A biased allocation was produced by the LDA model, which overemphasized topics relating to user experience and app performance. Rahman et al. [28] utilized ML classifiers, including K-nearest neighbor (KNN), RF, SVM, DT, and NB, together with NLP-based approaches like n-gram, bag-of-words, and TF-IDF; they discovered and built a well-fitted model to recognize user opinions on mobile applications. Aslam et al. [29] proposed a CNN-based DL methodology to categorize app reviews. However, this technique has yet to consider location-based and temporal traits. Jha et al. [30] suggested an improved dictionary-based multilabel classification method to categorize nonfunctional requirements in user comments taken from samples of Android and iOS applications. However, the accuracy of this method could have been higher. Venkatakrishnan et al. [31] used an improved dataset with NB, XGBoost, and multilayer perceptron (MLP) models to examine numerous app-centric variables and predict user ratings; the functionality of the model needs to be enhanced. Rustam et al. [32] used logistic regression, RF, and AdaBoost classifier approaches to categorize the reviews of Shopify applications. Due to traditional feature selection approaches, lower accuracy was obtained. RF, SVM, and NB models were built for sentiment analysis of English textual comments obtained from three digital payment apps [33]; these techniques were less accurate in classifying emotions and were not cost-effective. Tchakounte et al. [34] presented a model using NB to obtain information valuable for enhancing the security features of mobile applications, but the findings were inconsistent over time.
Ireland et al. [35] employed logistic models to demonstrate that sentiment classification of user-generated big data can be used to compare airline service quality with existing survey-based methods by analyzing real-time customer views; in particular, the article examines how user-generated big data sentiment analysis might be utilized to study airline service quality. Oyebode et al. [36] presented a method to evaluate health record data using machine learning approaches. Lin et al. [37] offered a sentiment analysis model of app reviews using deep learning. Bose et al. [38] proposed a model using the NRC emotion lexicon over six product reviews and showed how sentiment analysis assists in determining consumer behavior. Zaki et al. [39] proposed a methodology for determining the significant labels denoting customer sentiments: the titles of the comments, which typically include the terms that most effectively characterize the customer experience, were combed to locate relevant labels, and the findings indicate that labels developed from the titles are valid for analyzing the feelings expressed in the comments. Iqbal et al. [40] suggested that predicting attitudes demonstrates superior, or at least comparable, outcomes with much reduced computational complexity. The findings of that study highlight the critical significance of performing sentiment analysis on the content of consumer reviews and social media platforms to acquire valuable insights. Akram et al. [41] suggested a technique for clustering short text using a deep neural network; this approach learns clustering aims by transforming the high-dimensional feature space into a lower-dimensional one. Abbasi et al. [42] proposed a new method for authorship detection that combines ensemble learning, DistilBERT, and more traditional machine learning strategies.
A count vectorizer and bigram term frequency-inverse document frequency (TF-IDF) are used in the suggested method to extract essential qualities. Witte et al. [43] offered an international survey for online consultations in mental health care using statistical analysis. Assaker et al. [44] proposed a model for travelers using online travel reviews through an extended unified theory of acceptance and use of technology; this analysis improves the interpretation of the explanatory variables for online reviews. Assaker [45] presented the effects of trustworthiness and expertise on usage intention toward user-generated content and online reviews among female, male, younger, and older travelers.
Machine learning techniques like the binary support vector machine may not be the most effective for categorization when analyzing online customer reviews. Clustering, classification, regression, and rule extraction are some of the issues machine learning must address. Consequently, deep learning algorithms are used to analyze customer review sentiments effectively, and the LSTM framework is effective for sentiment analysis. However, building an LSTM model with optimum hyperparameters is a complex problem, and the application of an LSTM framework with optimum hyperparameters to sentiment analysis of online app reviews has yet to be explored. This motivates us to research the LSTM-DGWO methodology for sentiment analysis of customer reviews. The findings illustrate the detailed analysis performed, explore hidden factors of customer sentiment analysis, and build a model. A summary of current sentiment analysis models for online customer reviews is illustrated in Table 1.

Proposed Work
Discovering the sentiment class of reviews is a multiclass classification issue. Efficient and automatic classification of the reviews posted by app users into three classes (positive, negative, and neutral) is the main objective of this study. Figure 2 depicts the overall framework for sentiment analysis of app reviews. Initially, we collected online reviews for Shopify apps from the Kaggle website. Then, the BERT model processed the reviews to obtain efficient word embeddings and to extract labels indicating whether each review belongs to the positive, negative, or neutral class. The GA extracted the more relevant app review features, and the optimum feature subset was obtained using the FA. We employed the LSTM model to categorize the reviews. To improve the behavior of the LSTM in sentiment analysis of reviews, hyperparameters of the LSTM, like the learning rate and batch size, are optimized by the DGWO algorithm.

Data Collection.
The Shopify app store dataset utilized in this study was collected from Kaggle [46], and 50,140 reviews were selected randomly. The dataset has eight fields; their description is presented in Table 2.

Data Preprocessing Using the BERT Model.
In the preprocessing stage, the BERT model is used to represent app reviews efficiently and extract their labels. BERT is a compelling architecture that utilizes transformer-based topologies and is built on an encoder-decoder network [47], and the task-specific layers of the BERT model are crucial [48]. The steps involved in review preprocessing using the BERT model are demonstrated in Figure 3.
The document containing app reviews is provided as an input to the BERT model:

[A] = {A_1, A_2, ..., A_p},   (1)

where [A] denotes the app review set and p denotes the number of app reviews. The preprocessing steps include removing special characters and numerical data from app reviews and substituting uppercase characters with lowercase characters. The BERT model uses a WordPiece tokenizer to split the review sentences into a list of tokens or words. The token set obtained for an app review after tokenization is defined by equation (2); then, stop words such as prepositions, articles, and conjunctions are removed from the token set:

A_p = {a_1^p, a_2^p, ..., a_m^p},   (2)

where a_m^p denotes the mth token of the pth review and m denotes the number of tokens in each review.
The part of speech to which each token belongs is assigned by a POS (part-of-speech) tagger. The POS-tagged app review vector is defined by

A_p = {(a_1^p, t_{a_1^p}), (a_2^p, t_{a_2^p}), ..., (a_m^p, t_{a_m^p})},   (3)

where t_{a_m^p} denotes the POS tag assigned to the token a_m^p. Following POS tagging, lemmatization takes place to obtain the root word (lemma) of each token. Then, the BERT model creates word embeddings for the app reviews; tokens with higher semantic similarity are represented similarly in the embedding, which contains the feature words of each app review. The sentiment score for each app review is calculated as

sentiment_score(A_p) = Σ_{i=1}^{m} S_{a_i^p},   (4)

where sentiment_score(A_p) denotes the sentiment score for the pth app review (A_p) and S_{a_i^p} denotes the sentiment value of the ith token. The BERT model extracts the label for each app review depending on the sentiment_score: if the sentiment_score of the review (A_p) is greater than 0, the review is labeled positive; if it is less than 0, the review is labeled negative; and if it equals 0, the review is labeled neutral. As a result, the BERT model obtains word embeddings with respective labels for the app review dataset.
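The preprocessing and sign-of-score labeling rule above can be sketched in a few lines of Python. The tiny lexicon and stop-word list below are hypothetical stand-ins: in the actual pipeline, token sentiment values come from the BERT model, not a lookup table.

```python
import re

# Hypothetical mini lexicon standing in for the BERT-derived token
# sentiment values S_{a_i^p}.
LEXICON = {"great": 1, "love": 1, "easy": 1, "slow": -1, "crash": -1, "bad": -1}
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "it", "to", "of", "this"}

def preprocess(review: str) -> list[str]:
    """Lowercase, strip special characters/digits, tokenize, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", review.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def label_review(review: str) -> str:
    """Label a review by the sign of its summed token sentiment scores."""
    score = sum(LEXICON.get(tok, 0) for tok in preprocess(review))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_review("I love this app, it is great!"))  # positive
print(label_review("The app is slow."))               # negative
```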

Feature Extraction Using the Genetic Algorithm (GA).
Dimensionality reduction is needed to ease the sentiment classification process and can be formulated as an optimization issue. We employed the GA to reduce the review feature dimensions. The advantage of a genetic algorithm is that it can solve complex problems that cannot be handled by traditional methods. Given a set of x-dimensional input app review data, the GA seeks a transformed review feature set in a y-dimensional space that satisfies the optimization criteria; the classification accuracy is utilized to assess modified patterns. Word embeddings of app reviews are provided as an input to the feature extraction step and defined as

[W] = {k_1^p, k_2^p, ..., k_x^p},   (5)

where [W] is the word embedding set and k_x^p denotes the xth feature of the pth review.
Chromosomes make up the population in the GA; a solution vector is referred to as a chromosome or an individual, and genes are the separate building blocks that make up chromosomes. The procedure involved in GA-based feature extraction, as depicted in Figure 4 and Algorithm 1, is explained as follows. The population is randomly initialized with a set of chromosomes: the GA randomly selects a set of features from the app review dataset and stores them in each chromosome for later usage. All features in chromosomes are encoded with real-number representations. If the xth bit of the ith vector equals 1, then the corresponding feature k_x^p is permitted to take part in classification; if the bit is 0, then the corresponding feature does not take part in classification. Each resultant feature subset is rated based on categorization efficiency: the fitness value of each chromosome is evaluated by training the LSTM model over the feature subset and observing the classification accuracy. The fitness function for the GA is the proportion of properly identified app reviews given by the LSTM trained on the particular feature subset (chromosome):

fitness_i = N_correct / p,   (6)

where fitness_i denotes the fitness value of the ith chromosome, N_correct denotes the number of correctly classified app reviews, and p denotes the number of app reviews to classify. The fitness value of each chromosome in the population is compared to a threshold fitness value, and the chromosomes whose fitness value exceeds the threshold are selected for the next generation. Crossover and mutation are the two operators that the GA uses to create new feature solutions from preexisting ones. In crossover, two chromosomes, referred to as parents, are combined to create new chromosomes, referred to as offspring.
For the offspring to inherit the good genes that make parents fitter, parents are chosen from the population's existing chromosomes with a preference for fitness. The offspring population is generated from the chromosomes of the initial population whose fitness value is greater than the threshold value. Then, each solution in the offspring population is mutated by a mutation operator with a specific mutation rate; the mutation operator modifies chromosomal properties at random, usually at the gene (feature) level. Following mutation, the fitness value of each mutated offspring (modified feature solution) is evaluated according to equation (6) using the LSTM model. Then, the N mutated solutions whose fitness value is greater than the threshold value are selected from the offspring population. Finally, the termination condition of the GA is checked: if the current iteration is less than the maximum iteration, the N selected solutions undergo new crossover and mutation operations, and this procedure is repeated until the maximum iteration is reached. At the maximum iteration, the solutions containing the most relevant features for sentiment analysis are returned by the GA, transforming the x-dimensional input features of app reviews into a y-dimensional feature set.
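The GA loop described above can be sketched as follows. The fitness function here is a hypothetical stand-in that rewards a known "informative" subset; in the paper, fitness is the LSTM's classification accuracy N_correct/p on the candidate feature subset, and the dimensions and rates below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
X_DIM = 20          # original feature dimension x
POP_SIZE = 12       # number of chromosomes
GENERATIONS = 30
MUTATION_RATE = 0.05

def fitness(chrom: np.ndarray) -> float:
    """Stand-in for LSTM accuracy: pretend only the first 8 features help."""
    informative = chrom[:8].sum()
    noise = chrom[8:].sum()
    return (informative - 0.3 * noise) / 8.0

pop = rng.integers(0, 2, size=(POP_SIZE, X_DIM))   # binary chromosomes
for _ in range(GENERATIONS):
    scores = np.array([fitness(c) for c in pop])
    # Keep the fitter half of the population as parents (threshold selection).
    parents = pop[np.argsort(scores)[-POP_SIZE // 2:]]
    # Single-point crossover produces the offspring population.
    children = []
    for _ in range(POP_SIZE - len(parents)):
        a = parents[rng.integers(len(parents))]
        b = parents[rng.integers(len(parents))]
        cut = rng.integers(1, X_DIM)
        children.append(np.concatenate([a[:cut], b[cut:]]))
    pop = np.vstack([parents, children])
    # Bit-flip mutation at the gene (feature) level.
    flips = rng.random(pop.shape) < MUTATION_RATE
    pop = np.where(flips, 1 - pop, pop)

best = pop[np.argmax([fitness(c) for c in pop])]
print("selected features:", np.flatnonzero(best))
```

With the surrogate fitness, the selected indices concentrate in the informative block; swapping in a real classifier evaluation recovers the paper's scheme.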

Feature Selection Using the Firefly Algorithm.
The optimum app review feature set that enhances sentiment analysis performance must be selected from the y-dimensional app review feature set. The FA is utilized for choosing the optimum app review feature set in this paper; the pseudocode for FA-based feature selection is presented in Algorithm 2. A search space with y dimensions, corresponding to the features in the app review dataset, is initialized with m fireflies. The position (x_{F_a}) and intensity (J_{F_a}) of each firefly in the search space are initialized. Each firefly at a specific position is represented as a binary vector over the y features for the p app reviews:

F_a = {k_1^p, k_2^p, ..., k_y^p},  a = 1, ..., m,   (7)

where F_a represents the ath firefly (an app review feature solution), m denotes the number of fireflies, and k_y^p denotes the yth feature of the pth review. Each element of F_a is limited to 0 or 1, indicating whether that app review feature is selected (1) or not (0). The change in brightness and the attractiveness are two crucial aspects of the FA. The intensity of a firefly (F_a) at a distance d from another firefly (F_b) is defined by

J_{F_a} = J_0 e^{-δ d²(F_a, F_b)},   (8)

where J_0 denotes the initial brightness, d(F_a, F_b) refers to the distance between the two fireflies, and δ is the light absorption coefficient influencing intensity. Depending on the intensity and distance, two fireflies, F_a and F_b, are more or less attractive to each other. The attractiveness of a firefly, proportional to the intensity noticed by another firefly, is determined by equation (9):

A_{F_a} = A_0 e^{-δ d²(F_a, F_b)},   (9)

where A_{F_a} denotes the attractiveness of the firefly F_a at distance d from F_b and A_0 denotes the attractiveness constant.

According to the classifier's efficiency using the chosen feature subset, each firefly travels in a specific direction in the search space to locate the ideal feature subset. Here, the LSTM model was used as the evaluating classifier, and its accuracy using the chosen features is regarded as the intensity (objective function) of the firefly. The light intensity J of a firefly representing an app review feature subset is proportional to the fitness function:

J_{F_a} ∝ fitness(F_a),   (10)

where the fitness function for the FA is defined by

fitness(F_a) = N_correct / p,   (11)

i.e., the classification accuracy of the LSTM trained on the features selected by F_a. Using equation (12), a firefly with lower intensity (accuracy) travels toward a firefly with greater intensity (accuracy); when a firefly moves, its position and feature vector change:

x_{F_a}^t = x_{F_a}^{t-1} + A_{F_a}(x_{F_b}^{t-1} - x_{F_a}^{t-1}) + c(r - 0.5),   (12)

where x_{F_a}^t and x_{F_a}^{t-1} denote the position of the firefly F_a at times t and t - 1, respectively, x_{F_b}^{t-1} denotes the position of the firefly F_b at time t - 1, A_{F_a} indicates the attractiveness of the firefly F_a, c denotes the randomization parameter, and r denotes a random number between 0 and 1. The distance between two fireflies, F_a and F_b, is defined as

d(F_a, F_b) = sqrt((x_{F_a} - x_{F_b})² + (y_{F_a} - y_{F_b})²),   (13)

where (x_{F_a}, y_{F_a}) and (x_{F_b}, y_{F_b}) are the position vectors of the fireflies F_a and F_b, respectively. This process is repeated for the other fireflies; at the end of each iteration, each firefly represents a better feature solution, and the process continues until the maximum iteration is reached. When the termination condition is achieved, all fireflies holding local best solutions are ranked by the fitness function, and the firefly with the highest intensity (accuracy) is returned as the global best solution. As a result of the FA, the optimal app review feature set [W_optimal] is obtained from the global best firefly.
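The firefly search above can be sketched minimally as follows, with a surrogate intensity function standing in for the LSTM accuracy; positions are kept continuous and thresholded at 0.5 to produce the binary feature mask, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
Y_DIM = 10          # GA-reduced feature dimension y
M = 8               # number of fireflies m
ITERS = 40
DELTA = 1.0         # light absorption coefficient δ
A0 = 1.0            # attractiveness constant A_0
C_RAND = 0.2        # randomization parameter c

def intensity(pos: np.ndarray) -> float:
    """Surrogate intensity: pretend the first 5 features are the good ones."""
    subset = pos > 0.5
    return subset[:5].sum() / 5.0 - 0.1 * subset[5:].sum()

pos = rng.random((M, Y_DIM))        # continuous positions, thresholded to 0/1
for _ in range(ITERS):
    J = np.array([intensity(p) for p in pos])
    for a in range(M):
        for b in range(M):
            if J[b] > J[a]:         # dimmer firefly moves toward brighter one
                d2 = np.sum((pos[a] - pos[b]) ** 2)
                attract = A0 * np.exp(-DELTA * d2)      # A = A0 * exp(-δ d²)
                pos[a] += attract * (pos[b] - pos[a]) \
                          + C_RAND * (rng.random(Y_DIM) - 0.5)

best = pos[np.argmax([intensity(p) for p in pos])] > 0.5
print("optimal feature subset:", np.flatnonzero(best))
```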

Classification of App Reviews by the LSTM-DGWO Technique.
The sentiment analysis includes classifying reviews into three classes (positive, negative, and neutral) using the LSTM-DGWO technique. In the LSTM-DGWO module, hyperparameters of the LSTM model, such as the learning rate and batch size, are optimized by the DGWO technique, and the optimized LSTM model is trained over the optimal app review feature subset. The procedure is shown in Algorithm 3. The first stage of the LSTM-DGWO module is the optimization of the LSTM hyperparameters by DGWO. A grey swarm is generated with n grey wolves in the search space. The grey swarm is divided into four levels, namely, α, β, γ, and δ. The α grey agent is the head of the grey swarm, controlling the hunting, habitat, and moving behavior of the swarm; the β grey agent is at the second level; the γ grey agent obeys the commands of the α and β agents; and the δ grey agent is the lowest agent in the grey swarm. The maximum number of iterations is denoted i_max. The hyperparameters of the LSTM are initialized as R for the learning rate and B.S for the batch size. The position of each grey wolf represents a hyperparameter solution for the LSTM. In each iteration, each grey wolf searches for prey in the search space; here, the prey denotes the threshold app review classification accuracy. Based on the accuracy of the LSTM classifier with the given hyperparameters, each grey agent travels in a certain direction; the objective function is the accuracy of the classifier model with the provided hyperparameters. During the search, the position of each grey agent changes continuously to achieve higher accuracy. The three wolves closest to the prey become the local best solutions, held by the α, β, and δ grey agents.
The position of each grey search agent is updated according to the three local best solutions at time t using

x(t + 1) = (v_1 + v_2 + v_3)/3,   (14)

where v_1, v_2, and v_3 are defined by

v_1 = x_α - E_1 · T_α,  v_2 = x_β - E_2 · T_β,  v_3 = x_δ - E_3 · T_δ,   (15)-(17)

with x_α, x_β, and x_δ the positions of the three leading agents. E denotes the coefficient vector, defined by

E = 2b · k_1 - b,   (18)

where k_1 denotes a random vector in [0, 1] and b decreases linearly from 2 to 0 over the iterations:

b = 2(1 - t/t_max),   (19)

where t and t_max denote the current and maximum iteration numbers, respectively. T_α, T_β, and T_δ are determined by

T_α = |D_1 · x_α - x(t)|,  T_β = |D_2 · x_β - x(t)|,  T_δ = |D_3 · x_δ - x(t)|,   (20)-(22)

where D is the coefficient vector, determined by

D = 2k_2,   (23)

with k_2 a second random vector in [0, 1]. The procedure above is repeated until the maximum number of iterations is reached, after which DGWO yields the ideal hyperparameter solution for improving the sentiment analysis process by the LSTM. The optimal hyperparameters obtained from DGWO are set for the LSTM model. Our proposed LSTM model includes an embedding layer, a one-dimensional convolutional (Conv1D) layer, a one-dimensional max pooling (MaxPooling1D) layer, a bidirectional LSTM (BiLSTM) layer, a dropout layer, and a dense layer. The framework of our proposed LSTM model is presented in Figure 5, and its architecture is shown in Table 3. The embedding layer efficiently represents the optimal app review feature set. The Conv1D layer generates a feature map for the selected app review features; there are 32 filters in the convolution layer, with a kernel size of 3. The MaxPooling1D layer extracts the maximum information from the feature map. Then, the resulting feature map is sent as an input to the BiLSTM layer.
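The grey wolf position updates can be sketched for the two LSTM hyperparameters as follows. The search bounds and the smooth surrogate objective below are illustrative assumptions; in the paper, the objective is the LSTM's classification accuracy with the candidate learning rate R and batch size B.S.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WOLVES, T_MAX = 10, 50
# Assumed search bounds for [learning rate R, batch size B.S]
LOW = np.array([1e-4, 16.0])
HIGH = np.array([1e-1, 256.0])

def accuracy(hp: np.ndarray) -> float:
    """Surrogate objective peaking near lr=0.01, batch=64 (stand-in for
    the LSTM validation accuracy used in the paper)."""
    lr, bs = hp
    return np.exp(-((np.log10(lr) + 2) ** 2)) * np.exp(-((np.log2(bs) - 6) ** 2) / 8)

wolves = LOW + rng.random((N_WOLVES, 2)) * (HIGH - LOW)
for t in range(T_MAX):
    order = np.argsort([accuracy(w) for w in wolves])[::-1]
    x_alpha, x_beta, x_delta = wolves[order[:3]]      # three local best agents
    b = 2 * (1 - t / T_MAX)                           # b decreases linearly 2 -> 0
    for i in range(N_WOLVES):
        v = np.zeros(2)
        for leader in (x_alpha, x_beta, x_delta):
            E = 2 * b * rng.random(2) - b             # E = 2b·k1 - b
            D = 2 * rng.random(2)                     # D = 2·k2
            T = np.abs(D * leader - wolves[i])        # distance term to leader
            v += leader - E * T                       # v_1, v_2, v_3 summed
        wolves[i] = np.clip(v / 3, LOW, HIGH)         # x(t+1), kept in bounds

best = wolves[np.argmax([accuracy(w) for w in wolves])]
print(f"learning rate ~ {best[0]:.4g}, batch size ~ {best[1]:.0f}")
```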
The feature sequences generated by the MaxPooling layer do not provide sequence information. With a focus on sequential modeling, the BiLSTM can further decode the feature sequences acquired by the previous layer to provide contextual information with respect to app reviews. Both forward and backward LSTM units make up the BiLSTM; combining a forward hidden layer with a backward hidden layer allows the BiLSTM to retrieve both the prior and subsequent contextual elements of app reviews. A group of memory blocks, often recurrently linked, makes up each LSTM layer, and each memory block contains memory cells and three gates: input, forget, and output. The process occurring in each LSTM unit is explained as follows.
The feature map is added to the BiLSTM neurons through activation function collaboration with the input gate, where the forget gate's output has already been acquired. The output of the input gate is calculated using equation (24):

x_t = σ(S_wx W_t + S_rx r_{t-1} + S_dx d_{t-1} + q_x),   (24)

where x_t denotes the output of the input gate at time t, σ is the logistic sigmoid function, S_wx, S_rx, and S_dx are the weight matrices of the input gate, q_x is the variable bias of the input gate, W_t is the information regarding app reviews at time t, and r_{t-1} and d_{t-1} are the hidden and cell states at time step t - 1, respectively. The output of the preceding LSTM neuron modulates the forget gate of the present LSTM neuron. The output of the forget gate of the LSTM neuron is computed using

g_t = σ(Z_wg W_t + Z_rg r_{t-1} + Z_dg d_{t-1} + a_g),   (25)

where g_t indicates the output of the forget gate, Z_wg, Z_rg, and Z_dg are the weight matrices of the forget gate, and a_g is the variable bias of the forget gate.

Computational Intelligence and Neuroscience
The output gate of an LSTM neuron regulates how much of the current app review feature information is passed on, using equation (26). The contextual information of review features resulting from the output gate is defined as

$$y_t = \sigma\!\left(S_{wy} W_t + S_{ry} r_{t-1} + S_{dy} d_{t-1} + a_y\right), \qquad (26)$$

where $y_t$ denotes the filtered information obtained from the output gate, $S_{wy}$, $S_{ry}$, and $S_{dy}$ are the weight matrices for the output gate, and $a_y$ is the bias of the output gate. The state of the updated neuron (memory cell) of the LSTM is defined by

$$d_t = g_t \odot d_{t-1} + x_t \odot \tanh\!\left(S_{wd} W_t + S_{rd} r_{t-1} + a_d\right), \qquad (27)$$

where $d_t$ denotes the normalized state of the updated neuron, $S_{wd}$ and $S_{rd}$ are the weight matrices for the updated neuron, and $a_d$ is the bias of the updated neuron. The hidden state of the LSTM unit is defined by

$$r_t = y_t \odot \tanh(d_t), \qquad (28)$$

where $r_t$ is the hidden state of the LSTM unit at time $t$. The BiLSTM unit combines the contextual information read by the forward LSTM units with that read by the backward LSTM units; the output of the BiLSTM layer is the concatenation of the forward and backward hidden states,

$$h_t = \left[\overrightarrow{r_t};\, \overleftarrow{r_t}\right]. \qquad (29)$$

Following the BiLSTM layer, a dropout layer is introduced to reduce overfitting. The dense layer combines the outputs from the dropout layer, and its output is presented to a sigmoid layer to predict the sentiment category of the reviews using the following equation:

$$V = \sigma(w \cdot h + b), \qquad (30)$$

where $V$ denotes the prediction result for app reviews, $w$ the weights of the dense layer, and $b$ the bias.
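The gate equations above can be traced with a minimal scalar (one-unit) sketch. This is an illustration of the standard LSTM cell step, not the paper's implementation; the function name `lstm_step` and the dictionary of scalar weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(W_t, r_prev, d_prev, p):
    """One scalar LSTM cell step following equations (24)-(28).

    W_t: input at time t; r_prev, d_prev: previous hidden and cell states
    p: dict of scalar weights/biases (hypothetical names matching the text)
    """
    # Input gate (24)
    x_t = sigmoid(p["S_wx"] * W_t + p["S_rx"] * r_prev + p["S_dx"] * d_prev + p["q_x"])
    # Forget gate (25)
    g_t = sigmoid(p["Z_wg"] * W_t + p["Z_rg"] * r_prev + p["Z_dg"] * d_prev + p["a_g"])
    # Output gate (26)
    y_t = sigmoid(p["S_wy"] * W_t + p["S_ry"] * r_prev + p["S_dy"] * d_prev + p["a_y"])
    # Candidate state and cell update (27)
    cand = math.tanh(p["S_wd"] * W_t + p["S_rd"] * r_prev + p["a_d"])
    d_t = g_t * d_prev + x_t * cand
    # Hidden state (28)
    r_t = y_t * math.tanh(d_t)
    return r_t, d_t
```

A BiLSTM runs one such cell forward over the sequence and another backward, then concatenates the two hidden states at each step, as in equation (29).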

Results and Discussion
This section evaluates the performance of the proposed LSTM-DGWO model. The proposed method analyzes review sentiment categories using the hyperparameter-optimized LSTM. The performance of the LSTM-DGWO model in classifying reviews as positive, negative, or neutral is examined, and its sentiment analysis performance is compared to that of existing methods such as CNN, stochastic gradient descent (SGD), BiLSTM + attention mechanism, and SVM. Accuracy is the proportion of accurately classified app reviews in the overall dataset, determined using the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \qquad (31)$$

TP indicates the count of negative app reviews identified exactly as negative. TN refers to the number of positive/neutral app reviews identified accurately as positive/neutral. FN indicates the number of negative app reviews misclassified as positive or neutral. FP denotes the number of positive/neutral app reviews misclassified as negative.
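Using the TP/TN/FP/FN definitions above, accuracy can be computed directly. A minimal sketch (the function name and the example counts are illustrative, not figures from the paper):

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of correctly classified app reviews, per equation (31).

    tp: negative reviews correctly identified as negative
    tn: positive/neutral reviews correctly identified as positive/neutral
    fp: positive/neutral reviews misclassified as negative
    fn: negative reviews misclassified as positive/neutral
    """
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 90 + 5 correct out of 100 reviews.
print(accuracy(tp=90, tn=5, fp=3, fn=2))  # -> 0.95
```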
The accuracy value indicates the number of correct predictions obtained, while the loss value indicates the difference from the desired target sentiment categories. Figure 6 portrays the overview of model accuracy versus loss. Precision, determined using (32), is the proportion of app reviews correctly identified as negative out of all reviews identified as negative:

$$\text{Precision} = \frac{TP}{TP + FP}. \qquad (32)$$

Figure 8 depicts the precision-based performance analysis of different sentiment analysis models. The precision of the LSTM-DGWO technique is higher than that of the existing models. Higher precision implies that fewer positive/neutral app reviews are misclassified as negative than with the existing models.
Recall is the proportion of app reviews correctly identified as negative out of the total negative reviews in the dataset, calculated by

$$\text{Recall} = \frac{TP}{TP + FN}. \qquad (33)$$

Figure 9 shows the recall-based performance analysis. The recall of the LSTM-DGWO technique is higher than that of the existing sentiment analysis models, namely, CNN, SGD, BiLSTM + attention, and SVM. Higher recall for the proposed approach means that fewer negative app reviews are misclassified as positive/neutral than with the existing models. The lower misclassification error achieved by the proposed model means that it can accurately identify the sentiment category of app reviews. Figures 10 and 11 present the overview of model precision and recall versus loss, respectively. The LSTM-DGWO model showed greater precision and recall and lower loss for app review analysis.
The F1 score is the harmonic mean of precision and recall, determined by

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \qquad (34)$$

Figure 12 depicts the comparative analysis of different sentiment analysis models based on the F1 score. The F1 score of the LSTM-DGWO technique is higher than that of the existing sentiment methods considered in this study. A higher F1 score for the proposed approach indicates that the numbers of negative app reviews correctly classified as negative and of positive/neutral app reviews correctly classified as positive/neutral are significantly higher than those of the existing models.
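Equations (32)-(34) can be sketched together; the function names and example counts below are illustrative, not values from the paper:

```python
def precision(tp, fp):
    """Equation (32): negatives correctly flagged / all flagged as negative."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Equation (33): negatives correctly flagged / all true negatives."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Equation (34): harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r)

# Illustrative counts: 8 true positives, 2 false positives, 2 false negatives.
p = precision(tp=8, fp=2)   # 0.8
r = recall(tp=8, fn=2)      # 0.8
f1 = f1_score(p, r)         # 0.8 (harmonic mean of equal values is that value)
```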
The area under the curve (AUC) is used for probability-based classification problems to conduct in-depth prediction analysis. Figure 13 illustrates the comparative analysis of different sentiment analysis models based on the AUC score, and Figure 14 depicts the ROC curve for the various sentiment analysis methods. The ROC curve illustrates the trade-off between sensitivity and specificity. The AUC score of the LSTM-DGWO technique is higher than that of the existing sentiment analysis models. The LSTM-DGWO model's improved accuracy and lower error rate demonstrate its robustness and convergence in sentiment analysis of reviews.
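The AUC admits a simple probabilistic reading: it is the chance that a randomly chosen review from the target (negative) class receives a higher predicted score than a randomly chosen review from the other classes. A minimal pairwise sketch of that interpretation (function name and scores are illustrative):

```python
def auc_score(scores_target, scores_other):
    """Pairwise AUC: P(target-class score > other-class score), ties count half.

    scores_target: predicted scores for reviews of the target (negative) class
    scores_other: predicted scores for positive/neutral reviews
    """
    wins = 0.0
    for s_t in scores_target:
        for s_o in scores_other:
            if s_t > s_o:
                wins += 1.0
            elif s_t == s_o:
                wins += 0.5  # ties contribute half a win
    return wins / (len(scores_target) * len(scores_other))

# Perfect separation of the two groups gives AUC = 1.0.
print(auc_score([0.9, 0.8], [0.1, 0.2]))  # -> 1.0
```

Library implementations compute the same quantity from the ROC curve via the trapezoidal rule; the pairwise form above is just easier to verify by hand.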
The comparative performance analysis with existing studies is illustrated in Table 4, and with different datasets [49-52] in Table 5. Training accuracy exhibits the classification performance of the LSTM-DGWO model on the training review dataset, and validation accuracy indicates its classification performance on the validation review dataset. Figure 15 portrays the comparative investigation of training and validation accuracies for the proposed LSTM-DGWO model. The training set is the largest subset formed from the original dataset and is used to fit the models. The models are then evaluated on the validation set to complete the model selection process. From the analysis, it is observed that validation accuracy is slightly lower than training accuracy. The proposed model efficiently classifies the reviews into positive, negative, and neutral in both the training and validation phases. Figure 16 exhibits the comparative investigation of training and validation losses for the proposed LSTM-DGWO model. From the figure, it is observed that validation loss was slightly lower than training loss, demonstrating that the proposed model fits the training and validation review datasets well. Table 6 reports the accuracy and loss of LSTM-DGWO on the training and validation review data and demonstrates that the LSTM-DGWO model efficiently mitigates overfitting and generalization errors. The proposed approach outperforms existing approaches such as CNN, SGD, SVM, and BiLSTM.
The CNN model does not accurately encode object location and orientation and requires a large amount of training data. The SVM approach is inappropriate for handling massive datasets, whereas SGD models can be computationally complex. Figure 6 illustrates the accuracy-based performance analysis of different sentiment analysis models. As the epoch count increases, the classification accuracy of the proposed LSTM-DGWO model slightly increases. The accuracy of the LSTM-DGWO technique is higher than that of the existing sentiment analysis models, namely, CNN [29], SGD [36], BiLSTM + attention [37], and SVM [28]. Table 4 illustrates the performance analysis of the proposed and conventional sentiment analysis models. In addition, the performance of the proposed and conventional models on other datasets, such as book reviews, IMDb movie reviews, Sentiment140, and SemEval-2017, is illustrated in Table 5; the gains stem from GA-based feature extraction and FA-based feature selection. The optimal review features selected by the FA efficiently improved the sentiment analysis.
Sentiment analysis can be affected by the quality of the labeled datasets used, which puts construct validity at risk: inconsistent annotation can decrease the accuracy of sentiment analysis. This issue has been addressed by using the BERT model to label the dataset and enhance the learning effectiveness of the model. Internal factors, such as how the hyperparameters of the proposed LSTM model are set, pose a threat to internal validity. The optimized LSTM model obtained by the proposed DGWO method is used to overcome this issue.

Conclusion
The sentiment posted by mobile app users is significant and delivers accurate insights into the app. This research employed an optimized DL model named LSTM-DGWO for sentiment analysis of online reviews, with effective feature extraction using the GA and feature selection employing the FA. The proposed LSTM-DGWO model demonstrated an accuracy of 98.89% and a loss of 0.0484, outperforming existing conventional sentiment analysis methods.
The standard GWO cannot seamlessly transition from exploration to exploitation simply by adding more iterations, and its primary shortcoming is that its single search strategy hinders its ability to competently manage optimization problems with varying characteristics. Traditional LSTM parameters are prone to falling into local optima when adjusted by backpropagation, and the high complexity of the algorithm is a disadvantage that reduces prediction accuracy. Future research should classify reviews using DL optimization techniques based on multiple aspects, such as satisfied/unsatisfied, like/dislike, and recommended/not recommended, with different sets of datasets.

Conflicts of Interest
The authors declare that they have no conflicts of interest.