An Efficient Sentiment Classification Method with the Help of Neighbors and a Hybrid of RNN Models



Introduction
The present generation of widely available and affordable web technologies generates significant social big data with perspectives that aid decision-making. Sentiment analysis (SA) is the computational study of individuals' perspectives, opinions, and emotions toward a particular entity; it can track individuals' moods and viewpoints by analyzing unstructured, multimodal, informal, noisy, and high-dimensional social data [1]. SA is a subset of natural language processing (NLP) that can be used in various real-world applications, including financial and stock price forecasting [2], politics [3], and medicine [4,5]. Many researchers have dedicated significant efforts to investigating textual SA [6][7][8][9][10] through various methodologies, resulting in notable advancements on social media platforms. One significant constraint of current sentiment classification systems is their predominant dependence on the textual content of a post or tweet.
However, on social platforms such as Twitter, Flickr, and Instagram, a wealth of metadata is available alongside the text in posts. People attach metadata to their shared content, including user-generated labels, as a means of engaging with others. This metadata, which includes information about the attributes of social media posts and their authors, serves as a valuable source of context for dissecting the sentiment expressed within the text. Moreover, these metadata elements can complement the text-based features in sentiment classification. For instance, the sentence "Nowruz Mubarak!" displayed in the green bounding box of Figure 1 carries an ambiguous meaning, and it is challenging to discern the true sentiment behind it. Now, consider the sentences in the red bounding box: "Happy Iranian New Year (Nowruz), Happy Nowruz, ..." These sentences can help determine the true sentiment expressed by the input sentence, which can be read as positive. As a result, SA can greatly benefit from the metadata or neighbors of input sentences. Based on this intuition, we enhance SA by augmenting each input text with a neighborhood of related texts.
Previous studies have utilized metadata extracted from tweets, including metrics such as URL counts, hashtags, and mentions, as features for sentiment analysis. This approach assumes that texts sharing similar contents or features can convey similar sentiments [11,12]. However, these studies relied on parametric models for text analysis, assuming consistent vocabularies between training and testing datasets. In the real world, metadata vocabularies can change as new tags emerge. Therefore, it is crucial to investigate the incorporation of metadata, or neighboring data, to adapt to these evolving vocabularies and enhance sentiment classification. In conclusion, a compelling rationale exists for utilizing both tags and metadata, facilitating the seamless integration of additional data sources to enhance the representation of the original input data.
The primary technical contribution of our study is a nonparametric approach for generating text neighborhoods, as depicted in Figure 1. Subsequently, we employ a novel parametric model to learn an informative representation from an initial input and its corresponding neighbors. This strategy enables our model to perform tasks that are complex or impractical with existing techniques and to reach a cutting-edge level of performance in text classification and SA. We specifically demonstrate the following capabilities of our model.

Adapt to Changing Vocabularies. Our model can handle various vocabularies during training and testing thanks to our nonparametric approach to finding near neighbors. Even when the training and testing vocabularies are entirely unrelated, our model still performs well. In other words, during training, the model learns the specific words in the training data and their relationships and similarities. However, it can still effectively classify sentiments in text even when it encounters entirely different words during the testing phase. This adaptability is attributed to the model's nonparametric approach, allowing it to generalize to new categories of text metadata and adjust to changes in that metadata over time. So, despite variations in the text vocabulary, the model's performance remains robust due to its design and training process.

Handle Different Text Types with and without Metadata.
The proposed model can calculate neighbors from either the input data or the associated metadata. This feature ensures that our approach is not restricted to using only metadata or only input data, showcasing the model's adaptability across various text data types. It is essential to highlight the significance of both the presence and integration of metadata, as these aspects can significantly influence the model's performance in sentiment analysis tasks. This reinforces the pivotal role of metadata, underscoring its potential to enhance the model's accuracy, and stresses the need to consider it a crucial factor in data analysis.

Efficient Deep Model. An efficient model is designed to jointly learn representations from samples and their neighbors to generate intraclass-oriented meaningful representations.
The main objective of this study is to identify substantial correlations between the input text and the neighborhood of related texts within various sentiment datasets. It seeks to explore whether this metadata or neighboring texts can indicate potential biases in sentiment labeling. Lastly, we demonstrate how leveraging this metadata as features in a classifier might aid in enhancing sentiment classifiers and sentiment quantification.
The paper's remaining sections are structured as follows: Section 2 presents the literature review. Section 3 describes the proposed model in depth. Section 4 illustrates the findings of the experiments. Section 5 concludes the study.

Literature Review
The significance of sentiment analysis (SA) increases in the context of natural language processing (NLP) when dealing with a substantial volume of user-generated textual content. Many supervised traditional machine learning (ML) classifiers are used to classify sentiment based on various features [13]. Ahuja et al. [14] examined the effects of TF-IDF word-level and N-gram features on the SS-Tweet sentiment analysis dataset. Across six machine learning classification algorithms, the TF-IDF word-level features outperformed N-gram features by 3-4%. Mee et al. [15] examined the relationship between textual qualities and Twitter user characteristics using regression and sentiment analysis, specifically the TF-IDF approach. Kunal et al. [16] recommended combining Tweepy and TextBlob, a Python framework, to analyze and classify tweets using the Naive Bayes (NB) classifier. Htet and Myint [17] developed a system for analyzing social media data from Twitter to assess individuals' health, education, and business status. This system employs the maximum entropy (ME) classifier to identify specific requirements. Obiedat et al. [7] presented a hybrid approach that combines support vector machine (SVM), particle swarm optimization, and various oversampling techniques to address the issue of imbalanced data. This SVM-based approach improved sentiment prediction on a restaurant reviews dataset.

Complexity
Recently, text analysis has benefited from diverse deep learning (DL) models [18,19], which have been extensively implemented in numerous research investigations. Wang et al. [20] categorized short texts using a convolutional neural network (CNN). Basiri et al. [21] proposed a CNN-recurrent neural network (RNN) model that utilizes attention mechanisms to capture past and future contexts in text sentiment analysis. The incorporation of bidirectional temporal information flow enhances the accuracy of classification in this approach. The present trend in the field of sentiment analysis involves the development of innovative text classification methods utilizing deep learning techniques such as CNN [22,23] and long short-term memory (LSTM) [24]. While CNNs can collect and analyze local data, they may be less effective in capturing long-range dependencies. LSTM overcomes this limitation by modeling texts sequentially across sentences, but its performance in collecting local information is suboptimal. Integrating CNN and LSTM therefore becomes essential for enhancing the efficacy of text sentiment classification [25][26][27]. Li et al. [28] presented a new padding methodology that enhanced consistency in the dimensions of input data instances, thereby augmenting the amount of sentiment-oriented information incorporated in every review. Their deep learning sentiment analysis model, denoted as "lexicon," incorporated two-channel CNN-LSTM/Bi-LSTM family models through parallelization techniques. Abid et al. [29] presented a unified procedure for SA on the Twitter platform. This approach incorporates an RNN architecture to capture long-term dependencies efficiently and utilizes a CNN together with Global Vectors for Word Representation (GloVe) as the word embedding technique. In experiments on Twitter corpora, the approach outperformed the baseline model. Dang et al. [30] proposed the integration of LSTM networks, CNNs, and SVM in hybrid deep SA learning models. The proposed model was evaluated utilizing eight textual datasets comprising tweets and reviews from various domains. The findings indicate that the hybrid models outperformed the single models in sentiment analysis across all datasets.
Salur and Aydin [31] presented a new hybrid deep learning model that combines different types of word embeddings, namely, Word2Vec, FastText, and character-level embeddings, in combination with various deep learning models such as LSTM, GRU, Bi-LSTM, and CNN. The model under consideration amalgamates features from various deep learning word embedding methods and classifies textual information based on its emotional content. Zulqarnain et al. [32] proposed a novel methodology for SA by utilizing an encoder approach with a two-state GRU named E-TGRU. This framework was designed to enhance the effectiveness of SA. The study's results indicate that, with adequate training data, the GRU model can proficiently acquire the vocabulary utilized in user opinions. The findings indicate that E-TGRU exhibited superior performance compared to GRU, LSTM, and Bi-LSTM. Li et al. [33] proposed a sentiment classification model for analyzing online restaurant reviews, integrating Word2Vec, bidirectional GRU, and the attention technique. The results show that the model's performance surpassed established sentiment analysis models. Kamyab et al. [34] presented a novel approach to sentiment analysis utilizing attention mechanisms in conjunction with CNNs and two distinct bidirectional recurrent neural networks. The proposed method aims to enhance the understanding of sentiment in textual data. Initially, a preprocessor was utilized to improve the quality of the data. Then, max-pooling was used in conjunction with a CNN layer to reduce the dimensionality of features and retrieve contextual information. In addition, the study employed two autonomous bidirectional recurrent neural networks, namely, LSTM and GRU, to effectively capture long-term dependencies. Ultimately, the attention mechanism was implemented to highlight the degree of attention attributed to each word.
Figure 1: Even for humans, it might be challenging to understand some sentences without additional context. On social media, however, similar texts are frequently shared. Based on this intuition, given an ambiguous sentence such as "Nowruz Mubarak!", we retrieve a neighborhood of sentences with similar words to help define the sentiment of the input text. (Example texts shown in the figure: "Happy Nowruz, the old and ancient festival of Iranians."; "Nowruz Mubarak in Persian means happy new day."; "Happy Nowruz".)

Mishra et al. [11] developed a sentiment analysis tool that is both cost-free and open-source, featuring a graphical user interface. This tool enables users to perform two key functions: first, to retrain the weights of a given model by relabeling predictions and/or adding labeled instances, and second, to tailor lexical resources to address errors in sentiment lexicons, such as false positives and false negatives. The proposed approach has the potential to iteratively improve or augment models in a readily available manner while avoiding both the expense of training a new model from scratch and the loss of predictive accuracy over time. Mishra and Diesner [12] analyzed the metadata characteristics at the user and tweet levels, identifying associations and relationships between these characteristics and the log odds for sentiment categories. The reliability of this analysis is strengthened by replicating the experiments on current tweets obtained from the user population present in their datasets. The results suggest that most patterns identified in this analysis exhibit high consistency. The metadata characteristics that have been identified are ultimately employed as features for a sentiment classification algorithm, resulting in an improved outcome for sentiment classification.
Deep learning-based sentiment analysis and the BERT (bidirectional encoder representations from transformers) technique have recently piqued the interest of researchers. Chiorrini et al. [35] suggested two BERT-based text classification approaches: BERT-base and cased BERT-base. Their investigation used two independent datasets for sentiment analysis and emotion recognition. They noted that BERT offers positive text classification results. Huang et al. [36] proposed an innovative DCNN-Bi-GRU (deep convolutional neural network and bidirectional gated recurrent unit) text categorization model. The word semantic representation language model is trained using BERT. The DCNN-Bi-GRU hybrid model receives the dynamically generated semantic vector from the word context. This model is validated by experiments on the CCERT Chinese e-mail sample set and a movie comment data set. Bello et al. [37] proposed utilizing BERT and several of its variants for text classification in NLP. The experimental results indicate that the integration of BERT with CNN, BERT with RNN, and BERT with Bi-LSTM yields favorable outcomes in terms of accuracy, precision, recall, and F1-score when compared to the outcomes achieved by employing these deep learning models with Word2Vec or without any variation.
The earlier research has exhibited satisfactory results using different ML and DL models. However, achieving high accuracy in sentiment categorization remains a formidable challenge, particularly when dealing with data from social media platforms. One of the limitations of current sentiment classification systems is their heavy reliance on traditional techniques, such as bag-of-words (BOW) and N-gram approaches, which use term frequency as a feature. While these methods are straightforward and effective, they generate feature vectors that are often sparse and high-dimensional. This can lead to scalability issues and potential overfitting, even with regularization techniques. Furthermore, these models operate under the implicit assumption that sentiment expression in a post is solely conveyed through the text data, without considering contextual factors. However, social media platforms like Twitter offer an opportunity to access rich metadata alongside the textual content of posts. This metadata includes information about the characteristics of social network posts and their authors, which can provide valuable contextual cues for analyzing sentiment in a tweet. Additionally, leveraging this metadata can complement the textual features in the sentiment classification task.
Although a few studies have relied on tweet metadata, they assume consistent language patterns in training and testing data; in reality, metadata vocabularies can change as new categories and trends emerge. Therefore, finding methods to incorporate this changing metadata or neighboring data is crucial to improve sentiment analysis accuracy. Motivated by these observations, we introduce an innovative deep text sentiment classification (DTSC) model. Our model features a nonparametric approach to generating text neighborhoods, making it adaptable to a wide range of signals and capable of generalizing to new categories of text vocabularies. Then, the input texts and their neighbors are converted into embedding vectors of lower dimensions, allowing neural networks to capture semantic word relationships. Importantly, this dense vector representation maintains a fixed size (the embedding dimension), reducing model parameters and improving computational efficiency.
Additionally, we employ a novel parametric model to learn an informative representation from the input text and its associated neighbors. This approach equips our model with the ability to tackle complex tasks that may have been challenging with conventional techniques. As a result, our model demonstrates state-of-the-art performance in text sentiment classification.

The Proposed Model
A novel deep text sentiment classification (DTSC) model is proposed, as shown in Figure 2. The model considers the neighborhoods in which input text features are embedded. It uses a nonparametric approach to construct neighborhoods of related texts based on Jaccard similarities, with the aim of building a system that can handle a wide range of signals, generalize to new categories of text metadata, and adjust to changes in that metadata over time. The input texts and their neighbors are transformed into embedding vectors to efficiently capture the semantic relationships between words and texts, making them more appropriate for in-depth analysis. Then, a new deep recurrent neural network architecture is proposed. Specifically, two distinct modules, Bi-LSTM and GRU, extract valuable representations from an input text and its neighbors. The outputs of each module (extracted features of a text and its neighbors) are fed through the maximum operation, which selects the most pertinent data. Finally, the extracted features are concatenated and deeply fused using multiple fully connected layers with a classifier layer to perform sentiment classification. The weights and biases are shared among the input text and its k neighbors. In other words, the input text and its neighbors are passed through a common architecture.

Candidate Neighborhoods.
In the nonparametric approach, we assume that integrating neighboring data alongside the primary input during network training will yield extracted features that demonstrate similarity among samples belonging to the same class. In other words, it is assumed that if two texts are related or share similar content, their feature representations should also be similar after training. The key challenge lies in the selection of the nearest neighbors. Our approach leverages the Jaccard measure between words to calculate text similarity, allowing for the nonparametric creation of candidate neighborhoods. In particular, Jaccard similarity is employed to assess how similar or dissimilar individual words are between different texts. This similarity measurement involves the following steps.

Jaccard Similarity Calculation.
In this context, the Jaccard similarity measures how similar the words in one text are to those in another. It is a nonparametric measure that quantifies the similarity between two sets by comparing the intersection (common words) of the word sets in both texts to their union (all unique words in both texts), and it ranges between 0 (no similarity) and 1 (perfect similarity). It does not assume any specific probability distribution for the data but calculates the proportion of common elements. Given two sets A and B, the Jaccard similarity (Jsimi) is defined as

$$\mathrm{Jsimi}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \tag{1}$$

where
(1) |A| is the cardinality (number of elements) of set A.
(2) |B| is the cardinality (number of elements) of set B.
(3) |A∩B| is the number of elements in the intersection of sets A and B (i.e., the number of elements common to both sets).
(4) |A∪B| is the number of elements in the union of sets A and B (i.e., the total number of distinct elements in both sets).
Concretely, for $x, x' \in X$, we compute

$$\mathrm{Jsimi}(x, x') = \frac{|w_x \cap w_{x'}|}{|w_x \cup w_{x'}|}, \tag{2}$$

where $w_x$ and $w_{x'}$ represent the sets of words of the x-th sample and its nearest x′-th neighbor. We set $\mathrm{Jsimi}(x, x) = 0$ for all $x \in X$ to prevent a text from appearing in its own neighborhood.
If the Jaccard similarity is high, the texts share many common words and are more similar. Conversely, if the Jaccard similarity is low, the texts have fewer words in common and are more dissimilar.
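As a minimal sketch of the similarity computation described above, the following Python snippet computes the Jaccard similarity between the word sets of two texts. The whitespace tokenization and lowercasing in the hypothetical helper `word_set` are assumptions made for illustration; the preprocessing actually used may differ.

```python
def word_set(text):
    """Lowercase a text and split it on whitespace into a set of words.

    This tokenization is an illustrative assumption, not a prescribed step.
    """
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |A n B| / |A u B| between two sets."""
    union = a | b
    if not union:  # both sets empty: define the similarity as 0
        return 0.0
    return len(a & b) / len(union)


x = word_set("Happy Nowruz to everyone")
x_prime = word_set("Happy Iranian New Year Nowruz")

# Two shared words ("happy", "nowruz") out of seven distinct words.
score = jaccard(x, x_prime)
print(score)
```

Here the two texts share 2 of 7 distinct words, so the similarity is 2/7.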

Creating Candidate Neighborhoods.
In each batch of data, candidate neighbors for each sample are computed using the Jaccard similarity. Based on the calculated similarities, the texts are then grouped into "neighborhoods" or clusters. Texts with a high Jaccard similarity between their words are placed in the same neighborhood because they are considered more similar. These neighborhoods are groups of related texts that share common features or themes. The only constraint is that the number of neighbors must be smaller than the batch size, because the neighbors are selected from among the closest samples within a batch.
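The batch-wise neighbor selection can be sketched as follows. This is an illustrative reading of the procedure, not the reference implementation: the function name `batch_neighbors`, the whitespace tokenization, and the tie-breaking by batch position are assumptions. Excluding a text from its own candidate list serves the same purpose as setting its self-similarity to 0.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def batch_neighbors(texts, k):
    """For each text in a batch, return the indices of its k most similar texts."""
    if k >= len(texts):
        raise ValueError("the number of neighbors must be smaller than the batch size")
    word_sets = [set(t.lower().split()) for t in texts]
    neighbors = []
    for i, ws in enumerate(word_sets):
        # Score every other text in the batch; a text is never its own neighbor.
        scored = [(jaccard(ws, other), j)
                  for j, other in enumerate(word_sets) if j != i]
        # Highest similarity first; ties are broken by batch position.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors.append([j for _, j in scored[:k]])
    return neighbors


batch = [
    "nowruz mubarak",
    "happy nowruz",
    "happy iranian new year nowruz",
    "terrible traffic today",
]
result = batch_neighbors(batch, k=2)
print(result)
```

In this toy batch, the ambiguous text "nowruz mubarak" receives the two Nowruz greetings as its neighbors, while the unrelated traffic text has no overlapping words and simply falls back to the first texts in the batch.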

In summary, this method employs Jaccard similarity calculations to evaluate word overlap between texts, facilitating their organization into potential neighborhoods or clusters. However, a critical transformation is applied to enhance the suitability of these textual representations for further analysis. The input texts and their neighbors (with the number of neighbors defined by the user) are transformed into embedding vectors.
This conversion is pivotal because it translates textual information into numerical vectors, enabling the model to effectively understand and work with the text. These embedding vectors capture the semantic relationships between words and texts, making them more appropriate for in-depth analysis. The input vectors and their corresponding candidate neighbors are introduced into a network featuring two distinct modules: Bi-LSTM and GRU. These modules leverage the embedded representations to extract valuable patterns and insights between the input samples and their neighbors, enabling efficient comprehension and classification of the sentiments expressed in the texts.
The rationale for implementing a nonparametric approach over parametric models lies in several key advantages:
(1) Adaptability to changing vocabularies: nonparametric approaches can adapt seamlessly to varying vocabularies. This adaptability is crucial when working with text data from different sources or domains, where the vocabulary can be entirely unrelated between training and testing datasets. In contrast, parametric models often assume a fixed vocabulary, limiting their ability to handle such dynamic language usage.
(2) Complex and diverse data handling: nonparametric approaches excel at managing complex and diverse data, which is common in real-world text applications. They can accommodate different word choices, expressions, and language styles, making them suitable for texts from various sources. Parametric models may struggle when confronted with data heterogeneity and nonlinear relationships.
(3) Enhanced performance in text classification and sentiment analysis (SA): the combination of nonparametric neighborhood generation and a novel parametric model brings the best of both worlds. The nonparametric approach captures contextual information from neighbors, while the parametric model effectively learns informative representations. This synergy enables the model to perform complex tasks and achieve a cutting-edge level of performance in text classification and SA, surpassing the capabilities of traditional parametric models.
(4) Robustness in exploring correlations and bias detection: beyond classification, the model's nonparametric foundation allows it to investigate correlations between the input text and related neighborhood texts. Furthermore, it explores the potential for metadata to indicate biases in sentiment labeling. This capability is essential for understanding and addressing bias in sentiment analysis applications, ensuring more accurate and fair results.
In summary, the decision to employ a nonparametric model is justified by its adaptability to varying vocabularies, ability to handle complex and diverse data, improved performance, and robustness in exploring correlations and detecting biases. This hybrid approach, combining nonparametric and parametric techniques, is well suited for addressing the challenges posed by dynamic and diverse text data in the context of sentiment analysis and classification.

Bi-LSTM.
The RNN [38] model has gained significant attention in NLP due to its complex architecture, which facilitates effective feature extraction. The model handles short data sequences well, but its single memory renders it incapable of capturing long-term dependencies. As a result, the LSTM architecture is used as an extension of the RNN model to address the problem of long-term dependency in text SA. For every term in the text sentiment data, the LSTM model uses the present word embedding and the preceding hidden state to compute the next hidden state. The hidden state $h_t \in \mathbb{R}^T$ (T is the feature vector dimension) at time t is updated as follows:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i),\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f),\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o),\\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh(W_c x_t + U_c h_{t-1} + b_c),\\
h_t &= o_t \circ \tanh(c_t),
\end{aligned}
$$

where $\circ$ is the element-wise product, $\sigma(\cdot)$ is the sigmoid activation function, $x_t$ represents the lower layer input at time step t, and tanh is the hyperbolic tangent activation function. $i_t$, $f_t$, $o_t$, and $c_t$ are the input gate, forget gate, output gate, and memory cell, respectively. The parameters of the LSTM are W, U, and b.
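In sequence modeling tasks, it is beneficial to understand past and future contexts. Adding a second hidden layer to the unidirectional LSTM model expands the architecture, giving rise to the Bi-LSTM [39], which incorporates hidden connections that propagate in the reverse temporal sequence. The Bi-LSTM model consists of two sequences, a forward one and a backward one, which are as follows:

$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}\left(x_t, \overleftarrow{h}_{t+1}\right).$$

The output is shown in equation (4):

$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t, \tag{4}$$

where $\oplus$ is the element-wise sum operation.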
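To make the bidirectional combination concrete, the following pure-Python sketch runs a standard LSTM cell forward and backward over a short embedded sequence and merges the two hidden states by element-wise sum. It assumes the commonly used LSTM gate formulation; the helper names (`affine`, `lstm_step`, `init_params`, `run_lstm`, `bilstm`), the random weights, and the toy dimensions are illustrative assumptions, not the trained model.

```python
import math
import random


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def lstm_step(p, x, h_prev, c_prev):
    """One LSTM update (standard formulation) returning (h_t, c_t)."""
    i = sigmoid(affine(p["Wi"], x, p["Ui"], h_prev, p["bi"]))  # input gate
    f = sigmoid(affine(p["Wf"], x, p["Uf"], h_prev, p["bf"]))  # forget gate
    o = sigmoid(affine(p["Wo"], x, p["Uo"], h_prev, p["bo"]))  # output gate
    g = [math.tanh(v) for v in affine(p["Wc"], x, p["Uc"], h_prev, p["bc"])]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(input_dim, hidden_dim, rng):
    """Random illustrative weights; a trained model would learn these."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in "ifoc":
        params["W" + gate] = matrix(hidden_dim, input_dim)
        params["U" + gate] = matrix(hidden_dim, hidden_dim)
        params["b" + gate] = [0.0] * hidden_dim
    return params


def run_lstm(p, sequence, hidden_dim):
    """Run the cell over a sequence and collect every hidden state."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in sequence:
        h, c = lstm_step(p, x, h, c)
        states.append(h)
    return states


def bilstm(p_fwd, p_bwd, sequence, hidden_dim):
    """Combine forward and backward hidden states by element-wise sum."""
    forward = run_lstm(p_fwd, sequence, hidden_dim)
    # Process the reversed sequence, then flip back to align time steps.
    backward = run_lstm(p_bwd, sequence[::-1], hidden_dim)[::-1]
    return [[a + b for a, b in zip(hf, hb)] for hf, hb in zip(forward, backward)]


rng = random.Random(0)
embedded_text = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.4, 0.2]]  # 3 tokens
outputs = bilstm(init_params(3, 4, rng), init_params(3, 4, rng),
                 embedded_text, hidden_dim=4)
print(len(outputs), len(outputs[0]))  # one 4-dimensional vector per token
```

Because the two directions are summed rather than concatenated, the output keeps the hidden dimension of a single LSTM.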

GRU. GRU is a distinctive variant within the family of RNNs [40]. The internal unit of the GRU is analogous to the LSTM internal unit [36], except that the GRU combines the forget and input gates into a single update gate.
Although it is simpler than the LSTM unit that inspired it, this model maintains the LSTM's capacity to overcome the vanishing gradient issue. The simplified internal architecture of GRUs facilitates their training process by reducing the computational complexity involved in updating the internal states. The hidden state $h_t \in \mathbb{R}^T$ (T is the feature vector dimension) at time t is updated as follows:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z),\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r),\\
\tilde{h}_t &= \phi_h\left(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h\right),\\
h_t &= (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t,
\end{aligned}
$$

where $\sigma$ is the sigmoid activation function and $\phi_h$ is the hyperbolic tangent function. The operation $\circ$ denotes the element-wise vector product, $x_t$ denotes an input vector, $h_t$ indicates the output vector, $\tilde{h}_t$ denotes the candidate activation vector, $z_t$ is the update gate vector, $r_t$ is the reset gate vector, and the learned parameters are W, U, and b.
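A single GRU update can be sketched in pure Python as follows. The sketch assumes one common convention, in which the update gate weights the candidate state and its complement weights the previous state (some references swap the two roles); the helper names and the hand-picked weights are hypothetical and chosen only so that the result can be verified by hand.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gate(W, x, U, h, b, activation):
    """Apply activation(W x + U h + b) element-wise."""
    return [activation(a + c + d)
            for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(p, x, h_prev):
    """One GRU update returning the new hidden state h_t."""
    z = gate(p["Wz"], x, p["Uz"], h_prev, p["bz"], sigmoid)  # update gate
    r = gate(p["Wr"], x, p["Ur"], h_prev, p["br"], sigmoid)  # reset gate
    reset_h = [rk * hk for rk, hk in zip(r, h_prev)]
    candidate = gate(p["Wh"], x, p["Uh"], reset_h, p["bh"], math.tanh)
    # Interpolate between the previous state and the candidate state.
    return [(1 - zk) * hk + zk * ck
            for zk, hk, ck in zip(z, h_prev, candidate)]


# Hidden size 1 and input size 1, with only W_h non-zero, so the result
# can be checked by hand: z = r = 0.5 and the candidate is tanh(x).
params = {
    "Wz": [[0.0]], "Uz": [[0.0]], "bz": [0.0],
    "Wr": [[0.0]], "Ur": [[0.0]], "br": [0.0],
    "Wh": [[1.0]], "Uh": [[0.0]], "bh": [0.0],
}
h_next = gru_step(params, x=[1.0], h_prev=[0.8])
print(h_next)  # equals 0.5 * 0.8 + 0.5 * tanh(1.0)
```

Compared with the LSTM, the GRU keeps a single state vector and two gates, which is where its lower computational cost comes from.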
In the proposed architecture, the i-th input ($x_i$) and its i-th neighbor ($z_i$) go through RNN-based layers, and new representations for the input sample ($v_{x_i}$) and its i-th neighbor ($v_{z_i}$) are generated (equations (7) and (8)):

$$v^{\mathrm{BiLSTM}}_{x_i} = \mathrm{BiLSTM}(x_i), \qquad v^{\mathrm{BiLSTM}}_{z_i} = \mathrm{BiLSTM}(z_i),$$

$$v^{\mathrm{GRU}}_{x_i} = \mathrm{GRU}(x_i), \qquad v^{\mathrm{GRU}}_{z_i} = \mathrm{GRU}(z_i).$$

For the Bi-LSTM-based representation,
(1) $v^{\mathrm{BiLSTM}}_{x_i}$ represents the new representation of the input text $x_i$, generated by applying the Bi-LSTM model to $x_i$. This captures both the backward and forward dependencies of a word in the i-th input text ($x_i$).
(2) Similarly, $v^{\mathrm{BiLSTM}}_{z_i}$ represents the new representation of the neighbor text $z_i$, generated using Bi-LSTM to capture the backward and forward dependencies of a word in the i-th neighbor ($z_i$).
For the GRU-based representation,
(3) $v^{\mathrm{GRU}}_{x_i}$ represents the new representation of the input text $x_i$, generated by applying the GRU model to $x_i$. GRU also extracts contextual and high-level textual information with long-term dependencies of the i-th input text ($x_i$).
(4) Similarly, $v^{\mathrm{GRU}}_{z_i}$ represents the new representation of the neighbor text $z_i$, generated using GRU to extract contextual and high-level textual information with long-term dependencies of the i-th neighbor ($z_i$).
The representations that were generated from the sample ($v^{\mathrm{BiLSTM}}_{x_i}$, $v^{\mathrm{GRU}}_{x_i}$) and its neighbors ($v^{\mathrm{BiLSTM}}_{z_i}$, $v^{\mathrm{GRU}}_{z_i}$) are merged after applying the maximum operation that selects the most pertinent data, as shown in the following equation:

$$f_\theta(x_i) = \max\left(v^{\mathrm{BiLSTM}}_{x_i}, v^{\mathrm{BiLSTM}}_{z_i}\right) + \max\left(v^{\mathrm{GRU}}_{x_i}, v^{\mathrm{GRU}}_{z_i}\right), \tag{9}$$

where "+" represents the concatenation operation and $f_\theta(x_i)$ is the output feature generated by combining the features from the text $x_i$ and its neighbors. Finally, the output feature $f_\theta(x_i)$ is deeply fused using multiple fully connected layers, as shown in equation (10), with a classifier layer for sentiment classification. The weights and biases are shared among the input text and its k neighbors. In other words, the input text and its neighbors are passed through a common architecture.

$$y'_i = \sigma\left(W_x f_\theta(x_i) + b_x\right), \tag{10}$$

where $y'_i$ is the predicted label of the i-th input text, $\sigma$ is the activation function, $b_x$ and $W_x$ represent the biases and weights, and $f_\theta(x_i)$ is the output feature of the RNN blocks.
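The fusion step can be illustrated with a short sketch. It assumes that the maximum operation is applied element-wise across the feature vector of the text and those of its neighbors, separately for each module, before the two results are concatenated; this reading, the function names, and the toy feature values are assumptions made for illustration.

```python
def elementwise_max(vectors):
    """Element-wise maximum across a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def fuse(v_bilstm_x, v_bilstm_neighbors, v_gru_x, v_gru_neighbors):
    """Max over text and neighbors per module, then concatenate both parts."""
    bilstm_part = elementwise_max([v_bilstm_x] + v_bilstm_neighbors)
    gru_part = elementwise_max([v_gru_x] + v_gru_neighbors)
    return bilstm_part + gru_part  # list concatenation


features = fuse(
    v_bilstm_x=[0.2, 0.9, -0.1],
    v_bilstm_neighbors=[[0.5, 0.1, 0.0], [0.3, 0.4, -0.3]],
    v_gru_x=[0.7, -0.2],
    v_gru_neighbors=[[0.1, 0.6], [0.0, 0.2]],
)
print(features)  # [0.5, 0.9, 0.0, 0.7, 0.6]
```

The fused vector has the combined length of the two module outputs, and each entry may come from the text itself or from any of its neighbors.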
The proposed method uses the cross-entropy objective function to train the network. The objective function calculates the loss, which is estimated between the predicted labels (equation (10)) and the true label (equation (11)).

$$\mathcal{L} = -\sum_{i=1}^{N} y_i \log\left(y'_i\right), \tag{11}$$

where N is the number of classes and $y_i$ indicates the true label of the i-th text.
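As a small worked example of the cross-entropy objective, the sketch below computes the loss for one sample with a one-hot true label and a predicted probability vector. The three-class setup, the `eps` clipping constant, and the function name are illustrative assumptions.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities.

    eps guards against log(0); it is a numerical convenience, not part of
    the definition.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# True class is the second one; the model assigns it probability 0.7.
loss = cross_entropy([0, 1, 0], [0.1, 0.7, 0.2])
print(loss)  # equals -log(0.7)
```

Only the probability assigned to the true class contributes to the loss, so a confident correct prediction drives the loss toward zero.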

Experiment
The effectiveness of the proposed methodology is assessed and compared with several established baseline techniques, including Bi-LSTM, bidirectional-GRU (Bi-GRU), LSTM, GRU, CNN, and LSTM-CNN. The parameters and their respective values for the proposed method and other baseline approaches are displayed in Table 1.
The architectures of Bi-LSTM, LSTM, Bi-GRU, and GRU employ three layers of 128, 64, and 64 units. Furthermore, the classifier employs three fully connected layers with dimensions of 128, 64, 32, and 1 (or more based on class count). Additionally, a 128-dimensional word embedding is utilized.
In the 1D CNN model, three layers are employed, each consisting of 32 filters and a kernel size of 5. A 1D max pooling layer is employed with a pooling size of 3. A 1D global max pooling layer is employed as the last layer of the convolutional-based model. The classifier employs two layers, with the first consisting of 16 units and the second consisting of 1 unit (or more based on class count). In addition, a 128-dimensional word embedding is employed.
The LSTM-CNN model incorporates two layers with 32 filters and a kernel size of 5. A 1D max pooling layer is employed with a pool size of 3. In the final part of the LSTM-CNN model, two LSTM layers with 32 units are employed. The classifier employs two layers, with 16 and 1 unit (or more based on class count). In addition, a 128-dimensional word embedding is employed.

Evaluation Criteria.
To assess the effectiveness of the proposed model and conduct a comparative analysis with prior research, we employ the following evaluation metrics: precision, recall, F1-score, and accuracy. These metrics are defined in equations (12)-(15). In these equations, TP stands for true positive, FP represents false positive, TN stands for true negative, and FN represents false negative [41,42].

$$
\begin{aligned}
\mathrm{Precision} &= \frac{TP}{TP + FP},\\
\mathrm{Recall} &= \frac{TP}{TP + FN},\\
\mathrm{F1\text{-}score} &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},\\
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{aligned}
$$

Adjective-noun pairs (ANPs) [43] are used as keywords to collect 20,127 sentiment sentences with two classes (positive, negative), named the "Binary_Getty" (BG) dataset, which includes textual explanations and labels. The initial labeling is accomplished using the sentiment scores associated with ANP keywords. We further employ the Valence-Aware Dictionary and sentiment Reasoner (VADER) [44], a lexicon and rule-based SA tool (https://github.com/chute/vaderSentiment (accessed Mar. 16, 2023)), to label the preprocessed textual description. Then, we select only the text samples for which the ANP and VADER sentiment scores are the same. Finally, three volunteers were chosen to assess the quality of our datasets. Each sample is graded 1 (suitable) or 0 (unsuitable). The results show that 95% of the samples are suitable and 5% are unsuitable; we only consider the samples with grade 1 (suitable) and ignore the others.
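The two-stage label cleaning described above (agreement between the ANP-based and VADER-based labels, followed by the manual suitability grade) can be sketched as a simple filter. The record fields `anp_label`, `vader_label`, and `grade` are hypothetical names, the labels are assumed to be precomputed, and a single aggregated grade per sample is assumed because the text does not state how the three volunteers' grades are combined.

```python
def filter_samples(samples):
    """Keep samples whose two automatic labels agree and that were graded suitable."""
    return [
        s for s in samples
        if s["anp_label"] == s["vader_label"] and s["grade"] == 1
    ]


samples = [
    {"text": "beautiful sunset over the bay", "anp_label": "positive",
     "vader_label": "positive", "grade": 1},
    {"text": "abandoned house in the fog", "anp_label": "negative",
     "vader_label": "positive", "grade": 1},   # labels disagree: dropped
    {"text": "happy child stock photo 123", "anp_label": "positive",
     "vader_label": "positive", "grade": 0},   # graded unsuitable: dropped
]
kept = filter_samples(samples)
print([s["text"] for s in kept])
```

Only the first record survives both checks in this toy example.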

iStock Images.
iStock Images (https://www.istockphoto.com/ (accessed Mar. 16, 2023)) is an online platform that offers a wide range of international royalty-free microstock content, including images with accompanying textual descriptions, graphics, clipart, videos, and audio tracks. Following the same procedure used for Getty Images, 3244 ANPs are used as keywords to retrieve 19,279 sentiment sentences with two classes (positive, negative), named the "Binary_iStock" (BIS) dataset. The dataset includes textual explanations and labels. The labeling procedure described for the BG dataset is also used to establish the final labels of the BIS dataset.
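The labeling procedure shared by the BG and BIS datasets keeps a sample only when its two automatic label sources agree. A minimal sketch of that agreement filter follows. It assumes that both the ANP sentiment score and the VADER score are signed numbers whose sign gives the polarity, and that a score of exactly zero is left unlabeled; the function names, thresholds, and sample scores are hypothetical, and the scores are passed in precomputed rather than obtained from the VADER library.

```python
from typing import Optional


def polarity(score: float) -> Optional[str]:
    """Map a signed sentiment score to a binary label; zero is left unlabeled."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def keep_agreeing(samples):
    """Keep (text, label) pairs whose ANP and VADER polarities match."""
    kept = []
    for text, anp_score, vader_score in samples:
        anp_label = polarity(anp_score)
        vader_label = polarity(vader_score)
        if anp_label is not None and anp_label == vader_label:
            kept.append((text, anp_label))
    return kept


# Hypothetical (text, ANP score, VADER score) triples.
samples = [
    ("happy child playing in the park", 1.8, 0.72),
    ("abandoned house at night", -1.2, -0.45),
    ("crazy party with friends", -0.6, 0.51),  # sources disagree, dropped
]
labeled = keep_agreeing(samples)
print(labeled)
```

The manual suitability check by the three volunteers would then be applied to the samples that survive this filter.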

Twitter Dataset.
Additionally, we gathered a new dataset from Twitter. English tweets are gathered using the Twitter streaming application programming interface (API) (https://developer.twitter.com/en (accessed Mar. 16, 2023)), with user-generated hashtags as keywords. We filtered out all texts that were too short (fewer than five words) or too long (more than 100 words). To speed up the labeling process, we use VADER, a lexicon and rule-based SA tool, to predict the sentiment polarity of each text. Guided by the predicted polarity, the tweets are manually categorized into neutral, negative, and positive sentiment polarities. Finally, 17,073 high-quality tweets were obtained.
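The length filter described above can be sketched in a few lines. The bounds of 5 and 100 words come from the text; counting words by splitting on whitespace is an assumption, since the tokenization used for this step is not stated, and the example tweets are made up.

```python
MIN_WORDS = 5    # texts with fewer words are dropped (from the text)
MAX_WORDS = 100  # texts with more words are dropped (from the text)


def within_length_limits(text: str) -> bool:
    """True when the whitespace-token count lies in [MIN_WORDS, MAX_WORDS]."""
    n_words = len(text.split())
    return MIN_WORDS <= n_words <= MAX_WORDS


tweets = [
    "so good",                                              # too short
    "spring is finally here and the city looks wonderful",  # kept
    " ".join(["word"] * 101),                               # too long
]
kept = [t for t in tweets if within_length_limits(t)]
print(len(kept))
```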

Multiview Sentiment Analysis (MVSA) Dataset.
The MVSA-Single dataset [45] comprises 5129 image-text pairs extracted from Twitter. Each pair was shown to a single annotator, who assigned one of three polarities (neutral, negative, or positive) to the image-text pair. Like [46], we first delete tweets with contradicting textual and visual labels. When one modality is labeled neutral and the other is labeled positive (or negative), the final polarity assigned to the multimodal sample is positive (or negative). Thus, we obtain a new MVSA-Single dataset with 4511 text-image pairs. Here, we use only the textual data from this dataset, which serves as a benchmark collected mainly from Twitter, to demonstrate the performance of our proposed model on different social datasets.
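The label-cleaning rule for MVSA-Single can be written as a small function. This is a sketch under one stated assumption: "contradicting" labels are taken to mean one modality labeled positive and the other negative, which is the only case the neutral rule above does not cover. The function name is illustrative.

```python
from typing import Optional


def fuse_labels(text_label: str, image_label: str) -> Optional[str]:
    """Combine per-modality labels; return None when the pair is discarded."""
    if text_label == image_label:
        return text_label
    if text_label == "neutral":
        return image_label
    if image_label == "neutral":
        return text_label
    # Remaining case: one positive and one negative, treated as contradictory.
    return None


print(fuse_labels("positive", "neutral"))
print(fuse_labels("neutral", "negative"))
print(fuse_labels("positive", "negative"))
```

Pairs for which the function returns `None` correspond to the tweets removed in the first cleaning step.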

IMDB Dataset.
IMDB (https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-moviereviews) is an online database of information related to films, television series, podcasts, home videos, video games, and critical reviews. The dataset includes 50,000 IMDB movie reviews and the binary sentiment of each review: positive or negative.
To prepare the text data for sentiment analysis (SA), it undergoes the following preprocessing steps: (1) lowercasing, which converts all text to lowercase; (2) removal of irrelevant information, including punctuation, special characters, hashtags, multiple spaces, URL references, stop words, and numbers; and (3) emoticon translation, which replaces each emoticon with its corresponding term.
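The three preprocessing steps can be sketched with the Python standard library. Several choices here are assumptions rather than the paper's settings: the emoticon table and stop-word list are tiny hypothetical stand-ins, a hashtag is removed together with its word, and emoticons are translated first so that punctuation removal does not destroy them.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration.
EMOTICONS = {":)": "smile", ":(": "sad", ":D": "laugh"}
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "at", "in"}


def preprocess(text: str) -> str:
    # Step (3) runs first: translate emoticons while they are still intact.
    for emoticon, term in EMOTICONS.items():
        text = text.replace(emoticon, f" {term} ")
    # Step (1): lowercase.
    text = text.lower()
    # Step (2): remove URL references and hashtags.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"#\w+", " ", text)
    # Step (2): remove numbers, punctuation, and special characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Step (2): drop stop words; split/join also collapses multiple spaces.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


example = "The sunset at the beach is AMAZING :) #nofilter https://example.com 2023"
print(preprocess(example))
```

In practice, a full stop-word list and emoticon lexicon would replace the small tables used here.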

Experiment 1: The Effect of the Neighborhood Technique.
The proposed method's performance is compared with and without the neighborhood technique in this experiment to ascertain the impact of the technique on our approach. In the first part of the experiment, only the original input is used in the network training phase and neighbors are ignored. In the second part, the number of neighbors is set to 2. The outcomes are shown in Table 3.
The proposed method with the neighborhood technique outperforms the variant without it. Using the neighbors of each sample enables the learning of better representations. With neighbors, the accuracy, precision, recall, and F1-score of the proposed model improve by an average of 1.20%, 1.52%, 1.42%, and 1.40%, respectively, compared with the variant without neighbors. The benefit is larger on the MVSA-Single dataset than on the other datasets: there, the accuracy, precision, recall, and F1-score of the proposed method improve by an average of 1.72%, 2.73%, 1.70%, and 2.63%, respectively. Using neighbors thus improves performance on all four metrics.

Experiment 2: The Effect of Neighborhood Size.
In this experiment, to determine the effect of the neighborhood size on our approach, the performance of the proposed method is evaluated with different neighborhood sizes and batches. Our method is executed ten times for each dataset, and the average result is reported. The results are shown in Table 4.
As shown in Table 4, the efficacy of the proposed method increases with the neighborhood size. For sizes 2, 4, 8, 16, and 32, the average of all criteria increases by 0.72%, 1.37%, 1.48%, 1.53%, and 1.54%, respectively, compared to size 0. All the criteria improve as the number of neighbors increases. However, as the neighborhood grows, the marginal improvement shrinks. For example, there is only a 0.03% difference between sizes 16 and 32. Meanwhile, there is a 0.61% difference between sizes 2 and 4. A likely reason is that, as the neighborhood size increases, more distant neighbors are also selected. Although these distant neighbors are in the same class as the input sample, the text of the two samples may differ.
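The effect of distant neighbors can be illustrated with a small k-nearest-neighbor selection based on Jaccard similarity, the measure this paper uses to construct neighborhoods. The sketch below is only an illustration: computing the similarity over whitespace-separated word sets is an assumption, the corpus is made up, and the function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(query: str, corpus: list, k: int) -> list:
    """Return the k (similarity, text) pairs most similar to the query."""
    q = set(query.lower().split())
    scored = [(jaccard(q, set(doc.lower().split())), doc) for doc in corpus]
    # Sort by similarity only; the stable sort keeps corpus order on ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


corpus = [
    "happy nowruz",
    "happy iranian new year",
    "nowruz mubarak to everyone",
    "traffic was terrible today",
]
result = nearest_neighbors("nowruz mubarak", corpus, k=3)
for score, doc in result:
    print(f"{score:.2f}  {doc}")
```

In this toy corpus the first two neighbors share words with the query, while the third has a similarity of zero. Raising k therefore pulls in texts that add little lexical overlap, which is consistent with the shrinking gains reported for larger neighborhoods.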

Experiment 3: The Proposed Method vs. Baseline Models.
In this section, an evaluation of the proposed method and other established baseline methods is conducted across all datasets. The outcomes of the proposed method and other approaches are presented in Table 5.
Table 5 shows that the LSTM model outperforms the CNN and GRU models, which exhibit comparatively inferior performance. The hybrid LSTM-CNN model performs better than the LSTM-only model, followed by bidirectional methods with relatively good performance, such as Bi-LSTM and Bi-GRU. The proposed method's average accuracy, precision, recall, and F1-score are 92.77%, 93.73%, 91.82%, and 92.73%, respectively. For the second-best method (Bi-GRU), the average accuracy, precision, recall, and F1-score are 92.37%, 93.32%, 91.42%, and 92.33%, respectively. The 1D CNN's average recall of 90.54% indicates that it misses more samples of the target class, whereas the proposed method's average recall of 91.82% indicates that fewer samples are misclassified. As can be seen, the proposed model performs better than the current leading approach in all evaluation metrics. Specifically, it achieves the highest accuracy of 99.60% when evaluated on the BG dataset. This result provides strong evidence for the effectiveness of the DTSC model in enhancing the classifier's performance.

Experiment 4: Generalization.
Our model can readily handle scenarios in which different types of metadata are accessible during training and testing. Additionally, our model handles circumstances in which the words used change over time. In other words, there may be some differences between the words in the training and test sets.
In the real world, the vocabulary or tags may change as new words become popular and older words fall out of favor. Any method that relies on user metadata should be able to handle these conditions. Ideally, to test our model's resilience to changes in user words over time, we should train it with texts from one time period and test it with texts from another.
Instead of randomly dividing one dataset into training and test sets, in this experiment we use the BG dataset to train the model and generate neighborhoods in the training phase, and the BIS dataset to test the model and generate neighborhoods in the testing phase. The results are reported in Table 6.
Table 6 presents a comparative analysis between the proposed method and the baseline algorithms, namely, LSTM and GRU. The evaluation uses BG for training and BIS for testing in two distinct scenarios. The findings indicate that the proposed methodology yields performance gains, with average accuracy improvements of 1.14% and 1.09% observed for group one and respective improvements of 1.89% and 2.23% achieved for group two.
This suggests that leveraging additional metadata generates enhanced representations even when different vocabularies are used during training and testing. Utilizing neighbors plays a crucial role in achieving these improvements. During the network training phase, this approach generates superior features characterized by a high level of generalizability. Even during the testing phase, when the input sample contains words not present in the training dictionary, the model produces superior representations with the assistance of its neighbors.
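The text does not report how much the BG and BIS vocabularies differ. One simple way a reader could quantify such a shift is the share of test tokens that never appear in the training texts. The sketch below is an illustration only: the out-of-vocabulary measure, the whitespace tokenization, and the example sentences are all assumptions, not part of the reported experiment.

```python
def build_vocabulary(texts):
    """Set of lowercase whitespace tokens seen in the given texts."""
    return {tok for text in texts for tok in text.lower().split()}


def oov_rate(train_texts, test_texts) -> float:
    """Fraction of test tokens that never occur in the training texts."""
    vocab = build_vocabulary(train_texts)
    test_tokens = [tok for text in test_texts for tok in text.lower().split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical stand-ins for BG (training) and BIS (testing) sentences.
train = ["happy family on the beach", "sad lonely man"]
test = ["happy couple on the shore", "lonely road"]
rate = oov_rate(train, test)
print(f"{rate:.3f}")
```

A higher rate means more test words are unseen at training time, which is the condition under which the neighbors of a test sample are expected to help.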
The results in Tables 3 to 6 demonstrate the benefits of the proposed model, which can be summarized as follows:
(1) Improved accuracy: the model is designed to enhance accuracy in sentiment categorization, a task known to be challenging, especially in the context of social media data. By leveraging both textual content and extensive metadata, it aims to provide more accurate sentiment analysis.
(2) Contextual understanding: unlike models relying solely on text data, this model considers the contextual aspects of social media posts. This contextual understanding is crucial for interpreting sentiment accurately, given the nuances and informal language often used in social media.
(3) Utilizing metadata: the model harnesses the wealth of metadata, or neighboring data, available on platforms like Twitter, which includes information about posts and their authors. This additional ...
In summary, the model's benefits include improved accuracy, enhanced contextual understanding, effective utilization of metadata, flexibility in handling diverse tasks, and cutting-edge performance, all of which contribute to more precise sentiment categorization, especially in the challenging domain of social media analysis.

Conclusion
Some texts that are difficult to recognize on their own may become more understandable in a neighborhood of related texts with similar contexts. Motivated by this idea, a novel deep text sentiment classification (DTSC) model was proposed to improve the classifier's performance by integrating the neighborhood of related texts. Our model uses a nonparametric approach to construct neighborhoods of related texts based on Jaccard similarities.
Moreover, two distinct deep learning-based recurrent neural networks (Bi-LSTM and GRU) were integrated to extract sophisticated features, capture temporal relationships, and generate SA insights. The result of each module was further processed through the maximum operation, which selects the most pertinent data. Finally, the extracted features were concatenated and subjected to classification to achieve accurate sentiment prediction. In contrast to previous studies, our approach is nonparametric, enabling it to perform strongly even when the text vocabulary varies between training and testing. The effectiveness of the proposed model was evaluated on five real-world sentiment datasets of short English text along with a dataset of lengthy movie reviews. The DTSC model performs more accurately and efficiently in identifying and understanding the semantics of both short and lengthy texts when compared to baseline approaches. The proposed model demonstrated a high level of accuracy across the datasets. Specifically, it achieved 99.60% accuracy on the Binary_Getty (BG) dataset, 98.32% on the Binary_iStock (BIS) dataset, 96.13% on Twitter, 82.19% on the multiview sentiment analysis (MVSA) dataset, and 87.60% on the IMDB dataset. These findings indicate that the proposed model performs better than the existing state-of-the-art techniques on the evaluation criteria for text sentiment classification. Future work primarily comprises (1) broadening the model's scope to encompass additional languages, such as Persian, and (2) leveraging transformer-based language models to produce more resilient embedding representations.

Table 2: The complete statistics of each dataset.

Table 4: The effect of neighborhood size.

Table 3: The effect of the neighborhood technique.
(4) Flexibility: the model's nonparametric generation of text neighborhoods and subsequent use of a novel parametric model make it adaptable to various tasks. It can handle complex sentiment analysis tasks that may be impractical for other techniques.
(5) Cutting-edge performance: the model demonstrates a state-of-the-art level of performance in text sentiment classification. It outperforms traditional ...

Table 5: Performance comparison of different models.