Augmentation of Contextualized Concatenated Word Representation and Dilated Convolution Neural Network for Sentiment Analysis

Department of Information Systems, School of Business and Economics, University of Management and Technology, Lahore, Pakistan
School of Information Science and Technology, Northwest University, Xi’an, China
Department of Information Technology, The University of Haripur, Pakistan
Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11633, Saudi Arabia
School of Electrical Engineering and Computer Science, National University of Sciences and Technology (NUST), Islamabad, Pakistan
Department of Electrical Engineering, FAST National University of Computer and Emerging Sciences, CFD Campus, Faisalabad, Pakistan


Introduction
In recent years, progress towards intelligent applications has produced notable technological developments through social media data analytics [1]. These advances rely largely on social networks such as Twitter, Facebook, and Instagram [2], which have become a rich source for mining social information about people's sentiments. The enormous volume of opinions on social media consists of short, simple sentences that nonetheless carry useful information on many topics, so social media data can be used to derive valuable insights. Social media data mining algorithms must therefore be designed to handle textual data. Sentiment analysis of social media data is a rapidly evolving field that aims to understand people's opinions, attitudes, and behaviors. Intelligent applications can benefit from social media sentiment analysis because these attitudes, feelings, and reactions can be correlated with disasters, epidemics, government policies, and public perception, making them a substantial source for assessing polarity: positive, negative, or neutral.
These applications target improvements in multiple aspects, such as technological and legislative measures, through social media data mining. They could also broaden people's perspectives and awareness and help them build a sustainable environment [3]. Social networks are a principal source of reactions to events, disasters, and the current epidemic, but the sheer volume of social data calls for efficient and scalable techniques that can cope with noise. Cleaning noisy data requires automatic techniques for classifying useful information, and these techniques must also deal with very short sentences, special notations, and typing mistakes. In this regard, semantic exploration can exploit the syntactic regularities of large-scale social media data, improving integration and scalability [4]. Social media data mining algorithms must therefore collect and handle social data from platforms such as Instagram, Facebook, and Twitter effectively. For sentiment analysis, we choose Twitter, whose posts are limited to 280 characters [5].
Machine learning methods are now widely used to enhance services by mining social media data [6]. Among the many machine learning methods for sentiment classification, Naïve Bayes (NB) has been used for topic detection [7], sentiment analysis [8], recommendation systems [9], and spam detection [10]. The support vector machine, a popular technique for social media data, has also been applied [11]. However, because text sequences vary in length, these methods struggle with feature extraction, which is a critical step. Deep learning, a subfield of machine learning, uses neural architectures to extract high-level features efficiently when classifying social media data. Neural network techniques are also increasingly used to solve problems in both supervised and unsupervised learning [12][13][14].
Deep learning methods allow researchers to extract features with neural networks without complex manual feature engineering [15,16]. For feature extraction and classification, each word in a sequence is represented as a one-hot vector (or the sequence as a matrix) and multiplied by the corresponding weights [17]. Each word is thereby mapped into a continuous vector space that initializes a multilayer neural architecture used for prediction, which helps the model learn and improves classification metrics such as accuracy [18]. Among neural architectures, convolutional neural networks (CNNs) have achieved good results in classifying sentences obtained from social media [19,20]. Distributed word representations such as Word2Vec [21], GloVe [22], and FastText [23] learn by mapping words into a lower-dimensional space. The CNN-based sentence classification technique with handcrafted features introduced in [24] cannot capture long-term dependencies. By contrast, dilated convolution, a CNN variant, avoids the information loss caused by the downsampling in conventional pooling operations and strided convolution. It also enlarges the receptive field substantially without adding parameters, which makes dilated convolution well suited to capturing long-term dependencies and semantics.
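Multiplying a one-hot word vector by a weight matrix, as described above, is equivalent to selecting one row of that matrix. The NumPy sketch below illustrates this with a hypothetical five-word vocabulary and a randomly initialized 4-dimensional weight matrix; neither the vocabulary nor the dimensions come from this work.

```python
import numpy as np

# Toy vocabulary (hypothetical): each word maps to a row index of W.
vocab = {"flood": 0, "help": 1, "safe": 2, "storm": 3, "vaccine": 4}
embedding_dim = 4

rng = np.random.default_rng(seed=0)
# Weight (embedding) matrix W: one dense row vector per vocabulary word.
W = rng.normal(size=(len(vocab), embedding_dim))


def one_hot(index: int, size: int) -> np.ndarray:
    """Return a one-hot vector with a 1 at the given index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


sentence = ["storm", "help", "safe"]
indices = [vocab[w] for w in sentence]

# Stacked one-hot vectors (3 x 5) multiplied by W (5 x 4) ...
one_hot_matrix = np.stack([one_hot(i, len(vocab)) for i in indices])
via_matmul = one_hot_matrix @ W
# ... equal a direct row lookup of the same words.
via_lookup = W[indices]

print(via_matmul.shape)                     # (3, 4)
print(np.allclose(via_matmul, via_lookup))  # True
```

Because the two are equivalent, embedding layers are usually implemented as row lookups, which avoids materializing large sparse one-hot matrices.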
Although many studies have only loosely modeled the relations in social media data, which ought to be strengthened, this study adapts a deep neural network technique that automatically classifies social media data using a dilated convolutional neural network architecture with a parallel mechanism. Based on our evaluation, a parallel mechanism in a dilated convolutional neural network can efficiently predict appropriate information by learning features from a contextualized concatenated word representation model that combines different embeddings. Beyond these developments, we examine various hyperparameters of dilated convolutional neural networks for analyzing the sentiment of social media data. This paper sets up a new approach for sentiment analysis that can be used to improve many services. The contributions of this work are as follows:
(i) A contextualized concatenated word representation (CCWRs) model is used to give the classifier improved features compared with many state-of-the-art techniques.
(ii) A novel approach is proposed with a parallel mechanism of three dilated convolution-pooling layers with different dilation rates, followed by two fully connected layers.
(iii) The work applies a deep learning approach, tuned over multiple parameters and hyperparameters, to support intelligent applications that use Twitter data for sentiment analysis to improve people's behavior.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed framework for sentiment analysis. Section 4 covers the experimental setup and results, and Section 5 provides the discussion. Finally, Section 6 concludes the paper.

Related Work
The continuous growth of social media data has spurred advanced scientific research on sustainable smart cities. Many studies have addressed sustainability in smart applications through social media networks [25,26], yet their collective significance has not been fully explored or appreciated [27]. Social media users now value the accessibility and necessity of smart services, so the requirements of smart applications must be defined with social media contexts in mind [28,29]. Smart applications combine social media networks with a smart environment, and the opinions and expectations of social users have a strong impact on them, as explained in [30,31]. By focusing on social media networks and their associated information, such as hashtags, time, location, and user name, the present work explores how these networks fundamentally increase the value of smart applications. Social media users can therefore be regarded as sensors for smart applications, and their associated metainformation has been used in many studies [32][33][34]. Moreover, data posted by social users through ubiquitous smartphones tend to be syntactically distinctive and can be collected and analyzed as short texts [35]. Numerous applications of short social media texts, from event detection to disease tracking and monitoring, have been proposed [35][36][37]. Such short texts from social networks like Twitter are semantically rich and widely used in heterogeneous text classification applications [38,39].
A methodology that uses meta-level features capturing emotions in Twitter data for polarity classification has been proposed [40]. Other methodologies manually tag Twitter data with metainformation such as location and user and use it to train conditional random fields [41,42]. Similarly, several techniques tag tweets with emotion categories such as joy, fear, anger, love, and surprise to perform emotion analysis [43,44]. For building smart applications, Twitter can reveal many aspects, such as what people infer and emerging trends. However, these signals are prone to noise from non-associative content, which is a crucial problem, as clarified in [45]. A filter should therefore be applied to mentions, URLs, slang words, and numbers so that adequate associative information remains. The evaluation metrics then depend on the class of features retained.
Machine and deep learning techniques applied to social media data are essential for mining valuable insights [46,47]. Mining social media data aims to assess people's opinions from many perspectives, for example by gathering user-related data and observing interactions [47]. When turning unstructured data into structured representations, deep learning performs better than conventional machine learning methods, which are time-consuming and require complex manual feature engineering. These advantages of deep learning for mining opinions from huge social networks in streaming, multimedia, and textual form have motivated numerous studies [48][49][50][51][52]. This work therefore extracts opinions in text form from social networks such as Twitter and relies on unsupervised learning to obtain appropriate representations.
Among neural networks, conventional convolutional and recurrent neural networks have achieved strong results on social network data, capturing long-term dependencies and extracting opinions for tasks such as sentiment analysis. Long short-term memory (LSTM), an optimized variant of the recurrent neural network, was used to improve semantic modeling in [53], but training it was computationally demanding. Convolutional neural networks have proved better suited to extracting syntactic features and training quickly for social media text classification [54,55]. Furthermore, a convolutional neural network trained jointly on both tasks can perform feature extraction and classification effectively [56]. Researchers increasingly combine convolutional layers with pooling layers for morphological modeling [57,58].
Similarly, a deep architecture with two convolutional layers that handles contextual information at the character and sentence levels was proposed for short-text classification in [59]. However, a conventional convolutional architecture must stack more layers as the text grows longer. The dilated convolutional neural network, an improved form, is a more sensible choice: it enlarges the receptive field adequately, overcoming this issue, and has been used in many works [60][61][62].
Distributed word representations in deep learning map words to continuous vectors, and pretrained representations have an important impact on classifying social data for sentiment analysis. Since the classification of social data benefits from syntactic, phonological, and sentiment information, some works combine pretrained vectors. Syntactic and phonological issues, however, demand modeling the relationships between words for adequate and accurate classification, as explained in [63][64][65]. Combining different pretrained representations as separate channels has given significant results for classifying multiple sentence types [66]. However, that approach requires the combined representations to share the same dimension, which restricts the scope and use of pretrained representations that come in different dimensions.
Furthermore, a CNN that considers both word position and entity information has been used to identify event words in sentence-level social data [67]. A more in-depth study of sentence-level classification of social media data confirmed that multiple opinions or events often occur within a single sentence [68]; it introduced a dynamic multipooling layer to extract opinions about events and retain more information. Although CNNs continue to attract researchers' attention, capturing long-term semantic dependencies across the input sequence remains challenging, and CNNs tend to rely on stacking many convolution-pooling layers to model such dependencies. To the best of our knowledge, a dilated convolution network that enlarges the receptive field without adding dimensions or losing detailed information addresses the problem of long contextual and semantic dependencies effectively through varied dilation rates.

The Proposed Architecture for Sentiment Analysis Based on CCWRs and 3D-CNN
This section presents a model for sentiment analysis of Twitter data from the perspective of COVID-19. First, different word representation models are concatenated into what we call contextualized concatenated word representations (CCWRs). Second, the 3D-CNN architecture is built from three dilated convolution kernels and two fully connected layers to capture long-term contextual dependencies among semantic features. Words are fed to the 3D-CNN as a matrix, and multiple convolution operations extract features through the corresponding weights, handling additional complexity and improving performance. The proposed architecture consists of the following parts: concatenated word representations, three dilated convolution-pooling layers, two fully connected layers, and a Softmax output (as shown in Figure 1).

Contextualized Concatenated Word Representations (CCWRs).
Word representation models, which represent words as dense vectors, are essential for feature extraction. These representations are effective and have advanced sentiment analysis of social media data [69][70][71]. Many recent works have improved word and feature representations by using different word embedding models [65,72,73]. Although these models differ in architecture and pretraining, they all encode each input according to its surrounding context. In most works, however, words are represented with only a single pretrained language model. Moreover, such representations can be impractical because of slow training and evaluation. Pretrained models trained on different datasets inherit the biases of those datasets, so the same word receives different representations in each model. Concatenating multiple word representation models, by contrast, can produce better representations that approach contextualized embeddings without the computational complexity of a single such model.
In this work, the pretrained word representation models Word2vec [21], fastText [23], and GloVe [22] are concatenated through a weighting mechanism to capture contextual and semantic information relevant to sentiment. For each pretrained model, a lookup table embeds every input token in that model's vector space; the resulting vectors are then concatenated into a single vector. This weighted concatenation improves semantic coverage and helps with the common problems of misspelled and out-of-vocabulary words. Concatenation with associated weights helps the model find better representations and supports sentence encoding for feature selection. We use GloVe trained on Twitter (2 billion tweets, 27 billion tokens, and a 1.2-million-word vocabulary) and Word2vec trained on 30 million tweets and on Google News. fastText provides 1 million word vectors trained with subword information on 16 billion tokens from Wikipedia, the UMBC web-based corpus, and the http://statmt.org news dataset. The dimensions considered range from 100 to 200.
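The weighted concatenation can be sketched as follows: each pretrained model contributes its own lookup table, a token is embedded in each table separately, each vector is scaled by a per-model weight, and the results are concatenated. The toy tables, their dimensions (100, 200, 150), the fixed weights, and the zero vector for unknown tokens are all hypothetical stand-ins; in practice the tables would be loaded from the pretrained Word2vec, GloVe, and fastText files, and this sketch does not claim to reproduce how the weights were set in this work.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
words = ["covid", "lockdown", "vaccine", "flood"]


def toy_table(dim: int) -> dict:
    """Hypothetical stand-in for a pretrained lookup table of a given dimension."""
    return {w: rng.normal(size=dim) for w in words}


# One table per pretrained model; the dimensions deliberately differ.
tables = {
    "word2vec": toy_table(100),
    "glove": toy_table(200),
    "fasttext": toy_table(150),
}
# Hypothetical per-model weights, not values from this work.
weights = {"word2vec": 0.3, "glove": 0.4, "fasttext": 0.3}


def embed(token: str, name: str) -> np.ndarray:
    """Look up a token in one table; unknown tokens get a zero vector (an assumption)."""
    table = tables[name]
    dim = len(next(iter(table.values())))
    return table.get(token, np.zeros(dim))


def ccwr(token: str) -> np.ndarray:
    """Weighted concatenation of the token's vectors from every table."""
    return np.concatenate([weights[n] * embed(token, n) for n in tables])


def encode_sentence(tokens: list) -> np.ndarray:
    """Stack per-token vectors into a (sequence_length, total_dim) matrix."""
    return np.stack([ccwr(t) for t in tokens])


X = encode_sentence(["covid", "vaccine", "unseenword"])
print(X.shape)  # (3, 450)
```

Because the vectors are concatenated rather than summed or averaged, the source models do not need to share a dimension, which is the restriction noted for channel-based combinations in the related work.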
We discard words that occur fewer than ten times, convert all characters to lowercase, and use a context window size of 5, which proved most suitable. While training the CCWRs, we decreased the learning rate in proportion to training progress. We observed that the text seen early in training affects the overall precision of the model. In this work, we train the concatenated representations on multiple datasets associated with different real-world crises. The first dataset was collected using TAGS and the streaming API, as described in [74]. The keywords used on social media evolve continuously; to stream contextually relevant tweets, we used the filtering keywords listed in the table.
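The training settings above (lowercasing, discarding words that occur fewer than ten times, a context window of 5, and a learning rate that decreases as training progresses) can be sketched in plain Python. The linear decay schedule and its start and end values (0.025 and 0.0001) are assumptions borrowed from common Word2vec defaults, not values reported in this work, and the helper names are hypothetical.

```python
from collections import Counter

MIN_COUNT = 10   # discard words occurring fewer than ten times
WINDOW = 5       # context window size


def build_vocab(sentences, min_count=MIN_COUNT):
    """Lowercase all tokens and keep only words seen at least min_count times."""
    counts = Counter(tok.lower() for sent in sentences for tok in sent)
    return {w for w, c in counts.items() if c >= min_count}


def context_pairs(tokens, vocab, window=WINDOW):
    """Yield (center, context) pairs within +/- window positions, skipping rare words."""
    kept = [t.lower() for t in tokens if t.lower() in vocab]
    for i, center in enumerate(kept):
        lo, hi = max(0, i - window), min(len(kept), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, kept[j]


def linear_lr(step, total_steps, start=0.025, end=0.0001):
    """Learning rate that decreases linearly from start to end over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start - (start - end) * frac


sentences = [["Stay", "safe", "stay", "home"]] * 10 + [["rare", "word"]]
vocab = build_vocab(sentences)
print(sorted(vocab))                                 # ['home', 'safe', 'stay']
print(len(list(context_pairs(sentences[0], vocab)))) # 12
```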

Three Dilated Convolutional Neural Network (3D-CNN).
In text-based sentiment analysis, the pooling layer of a conventional CNN discards some significant text features, which harms the overall precision of the network. To capture more significant features, CNN architectures are often deepened by stacking more layers, which adds parameters and requires extra computational resources. Moreover, as layers are added, backpropagated gradients may vanish, which can sharply reduce performance. In addition, the limited size of convolution kernels means that a classical CNN captures only short-term dependencies in text. To address these issues, [60] introduced the dilated convolution kernel, formed by inserting zeros into the original convolution kernel. Inserting these holes enlarges the receptive field, which allows the kernel to capture more information and improves the network's overall performance. For example, in Figure 2, a conventional 3 × 3 convolution kernel is applied at each point of the input in Figure 2(a) and is then dilated with zero weights in Figure 2(b); Figure 2(c) shows the resulting receptive fields of 3 × 3, 7 × 7, and 15 × 15. The receptive field grows as more holes are inserted, yet the number of parameters stays the same across Figures 2(a)-2(c). A dilated convolution kernel can therefore process text and capture additional information without extra computational resources, and this enlargement of the receptive field is valuable for many tasks such as prediction and classification. Figure 3 presents a model with three dilated convolutional layers in which the dilation rate increases markedly at each layer.
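The zero insertion described above can be made concrete. A 3 × 3 kernel dilated with rate d has (d − 1) zeros placed between neighbouring weights, giving an effective size of 3 + 2(d − 1) while still holding only nine trainable weights. The NumPy sketch below is illustrative; the helper name `dilate_kernel` is hypothetical.

```python
import numpy as np


def dilate_kernel(kernel: np.ndarray, d: int) -> np.ndarray:
    """Insert (d - 1) zeros between neighbouring kernel weights along both axes."""
    k = kernel.shape[0]
    size = k + (k - 1) * (d - 1)          # effective kernel size
    dilated = np.zeros((size, size), dtype=kernel.dtype)
    dilated[::d, ::d] = kernel            # original weights land on a stride-d grid
    return dilated


kernel = np.arange(1, 10, dtype=float).reshape(3, 3)
for d in (1, 2, 4):
    dk = dilate_kernel(kernel, d)
    print(d, dk.shape, np.count_nonzero(dk))
# 1 (3, 3) 9
# 2 (5, 5) 9
# 4 (9, 9) 9
```

Deep learning libraries usually do not build the zero-filled kernel explicitly; they read the input at stride-d offsets instead, which gives the same result without the wasted multiplications.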
The model produces feature maps $(F_1, F_2, F_3, F_4)$ through dilated convolutions $(1, 2, 3, 4)$, with receptive field sizes $\big(3 \times 3,\ (2^3 - 1) \times (2^3 - 1),\ (2^4 - 1) \times (2^4 - 1)\big)$; in general, the receptive field of each element in $F_{i+1}$ is $(2^{i+2} - 1) \times (2^{i+2} - 1)$.
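The receptive field sizes above can be checked with a short calculation. For stacked stride-1 convolutions with kernel size k, each layer with dilation rate d adds (k − 1)·d to the receptive field along one axis. With k = 3 and rates that double (1, 2, 4, ...), an assumption consistent with the 3, 7, 15 sizes of Figure 2, this reproduces the closed form $2^{i+2} - 1$ for $F_{i+1}$. The function name is hypothetical.

```python
def receptive_field(kernel_size: int, dilations: list) -> list:
    """Receptive field (per axis) after each stacked stride-1 dilated convolution."""
    rf, sizes = 1, []
    for d in dilations:
        rf += (kernel_size - 1) * d   # each layer widens the field by (k - 1) * d
        sizes.append(rf)
    return sizes


sizes = receptive_field(3, [1, 2, 4])
print(sizes)                                  # [3, 7, 15]
print([2 ** (i + 2) - 1 for i in range(3)])   # [3, 7, 15]
```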
To improve on the traditional CNN, we propose a novel architecture, 3D-CNN, which takes the CCWRs as input and contains three dilated convolution and pooling layers followed by two fully connected layers. The enlarged receptive field extracts sufficient linguistic and contextual information without increasing the dimensions or the number of parameters. Applying distinct dilation rates lets the dilated operation cover multiple scales efficiently, as described in [77] and shown in Figure 4.
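For text, the convolution runs along the token axis of the input matrix, so the parallel use of distinct dilation rates can be sketched in one dimension. In this minimal NumPy version, three branches with dilation rates 1, 2, and 4 apply width-3 filter banks to the same input, each branch applies ReLU and max pooling over tokens, and the pooled vectors are concatenated and passed through two fully connected layers and a softmax. The filter count, input size, 'same' zero padding, hidden size of 150, three output classes, and random weights are assumptions for illustration; this is not the authors' TensorFlow implementation.

```python
import numpy as np


def dilated_conv1d(x: np.ndarray, w: np.ndarray, d: int) -> np.ndarray:
    """1D dilated convolution along the token axis with 'same' zero padding.

    x: (seq_len, in_dim) input matrix; w: (k, in_dim, filters) weights with odd k;
    d: dilation rate.
    """
    k = w.shape[0]
    pad = (k - 1) * d // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        # The k taps are d positions apart: t, t + d, ..., t + (k - 1) * d (padded indices).
        taps = xp[t : t + (k - 1) * d + 1 : d]     # (k, in_dim)
        out[t] = np.einsum("ki,kif->f", taps, w)
    return out


def parallel_dilated_features(x, weights, rates=(1, 2, 4)):
    """One branch per dilation rate: ReLU, max pooling over tokens, then concatenation."""
    pooled = []
    for w, d in zip(weights, rates):
        h = np.maximum(dilated_conv1d(x, w, d), 0.0)   # ReLU
        pooled.append(h.max(axis=0))                   # max pooling over time
    return np.concatenate(pooled)


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())                            # shift for numerical stability
    return e / e.sum()


rng = np.random.default_rng(seed=2)
seq_len, emb_dim, filters = 12, 450, 64                # assumed sizes
x = rng.normal(size=(seq_len, emb_dim))                # random stand-in for one encoded tweet
weights = [rng.normal(scale=0.05, size=(3, emb_dim, filters)) for _ in range(3)]
features = parallel_dilated_features(x, weights)

# Two fully connected layers; the second outputs logits for 3 polarity classes (assumed).
w1, b1 = rng.normal(scale=0.05, size=(3 * filters, 150)), np.zeros(150)
w2, b2 = rng.normal(scale=0.05, size=(150, 3)), np.zeros(3)
probs = softmax(np.maximum(features @ w1 + b1, 0.0) @ w2 + b2)
print(features.shape, probs.shape)                     # (192,) (3,)
```

Running the branches in parallel and concatenating them lets one layer see 3-, 5-, and 9-token spans at once, rather than relying on depth alone to widen the context.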
The choice of dilation rates is significant when designing the 3D-CNN structure and, as mentioned in [21], depends on a condition over the rates of successive layers, where $i = 1, 2, \dots, n$, $d_i$ is the dilation rate of the $i$th layer, and $D_i$ is the maximum $d_i$ in that layer. Figure 5(a) shows three dilated convolution kernels of size 3 × 3 with $d_i = 2$, whereas Figure 5(b) uses $d_i = 1$, 2, and 3.
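The effect contrasted in Figure 5 can be checked in one dimension by listing which input offsets a stack of width-3 dilated kernels can reach: layer i adds −d_i, 0, or +d_i to the offset. With identical rates (2, 2, 2), only even offsets are reached, leaving gaps, whereas rates (1, 2, 3) cover every offset in the same span. The helper below is an illustrative sketch, not part of the original method.

```python
from itertools import product


def covered_offsets(rates):
    """Input offsets reachable by stacked width-3 dilated kernels with the given rates."""
    return {sum(t * d for t, d in zip(taps, rates))
            for taps in product((-1, 0, 1), repeat=len(rates))}


for rates in [(2, 2, 2), (1, 2, 3)]:
    cov = covered_offsets(rates)
    span = range(min(cov), max(cov) + 1)
    missing = [o for o in span if o not in cov]
    print(rates, len(cov), "of", len(span), "missing:", missing)
# (2, 2, 2) 7 of 13 missing: [-5, -3, -1, 1, 3, 5]
# (1, 2, 3) 13 of 13 missing: []
```

The rates (1, 2, 4) used in this work also leave no gaps: they cover every offset from −7 to 7.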
Using dilated convolution kernels with dilation rates (1, 2, 4) ensures that both semantic and contextual information are extracted, without gaps, when computing the feature maps. Our model thus extracts the semantic and contextual information needed for sentiment analysis. The three dilated convolution-pooling layers that extract long-range semantic and contextual information from the CCWRs are computed layer by layer, where $F *_{d_i}$ denotes the dilated convolution with rate $d_i$ in the corresponding layer.
Figure 2: The process of stacking and its effect on the dilated convolution kernel [60].
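For reference, the dilated convolution operator of [60] can be written with the dilation rate $d$ used above. For a discrete input $F$ and kernel $k$, the output at position $\mathbf{p}$ sums over all input positions $\mathbf{s}$ and kernel taps $\mathbf{t}$ that are $d$ steps apart:

$$
(F *_{d} k)(\mathbf{p}) = \sum_{\mathbf{s} + d\,\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})
$$

With $d = 1$ this reduces to ordinary discrete convolution; in the 3D-CNN, layer $i$ applies $*_{d_i}$ with its own rate $d_i$.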

Experiments
This section describes the datasets and experiments and analyzes the results of different methods on multiple datasets. The main goal is to evaluate the proposed technique: a classifier with three dilated convolutional layers built on CCWRs.

Datasets.
To evaluate the proposed model rigorously, we test its suitability, adaptability, and reliability on two datasets. The first dataset is publicly available and covers 27 real-world events, such as disasters, emergencies, and incidents [78]. The second dataset was collected with the streaming API and TAGS [79]. TAGS works on top of the Twitter search interface, which retrieves tweets matching user-specified keywords and terms: the user enters a query, TAGS stores the results in a free Google Sheet, and the sheet can be set to update whenever needed. The keywords used to collect tweets from September 2020 to March 2021 are listed in Table 1, and the tweets accumulated through these search terms number 18,920. Twitter data contain much noise, such as numbers, URLs, and user mentions; to normalize the data and handle redundancy, we apply preprocessing techniques including tokenization and lemmatization, as described in [80]. Tweets containing only five words were removed, as were stop words. Both datasets were shuffled to obtain reliable and applicable results, so that all datasets are equal in size for better performance [81]. Shuffling yields four sets, each containing a comparable number of tweets, referred to as the equally shuffled datasets ESD1, ESD2, ESD3, and ESD4.
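The cleaning and shuffling steps can be sketched as below. The regular expressions for URLs, mentions, and numbers, the small stop-word list, the simple regex tokenizer, the fixed random seed, and the reading of "only five words" as "five words or fewer" are all assumptions for illustration; lemmatization, which this work also applies, is omitted to keep the sketch dependency-free.

```python
import random
import re

URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}  # illustrative subset


def clean_tweet(text: str) -> list:
    """Lowercase, strip URLs, mentions, and numbers, tokenize, and drop stop words."""
    text = URL.sub(" ", text.lower())
    text = MENTION.sub(" ", text)
    text = NUMBER.sub(" ", text)
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(tweets, max_short=5):
    """Clean all tweets and drop those with max_short or fewer remaining words (assumed rule)."""
    cleaned = [clean_tweet(t) for t in tweets]
    return [toks for toks in cleaned if len(toks) > max_short]


def equally_shuffled_sets(items, n_sets=4, seed=42):
    """Shuffle once, then split into n_sets parts of (nearly) equal size: ESD1..ESDn."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return {f"ESD{i + 1}": items[i::n_sets] for i in range(n_sets)}


tweet = "@user Flood warning issued for 3 districts, stay indoors and check https://t.co/x"
print(clean_tweet(tweet))
# ['flood', 'warning', 'issued', 'for', 'districts', 'stay', 'indoors', 'check']
```

With 18,920 tweets, `equally_shuffled_sets` returns four sets of 4,730 tweets each.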

Experimental Setup.
The experiments use diverse datasets together with multiple word representation language models to build the contextual concatenated word representation with suitable parameters. The activation functions, optimization algorithms, training settings, minibatch sizes, filter sizes, numbers of hidden layers, receptive field sizes, and numbers of epochs are presented in Table 1. To mitigate vanishing gradients during training, the rectified linear unit (ReLU) and hyperbolic tangent (tanh) activations are used; an activation function transforms a neuron's output, which then serves as input to the neurons of the next layer [82]. To regularize the network and reduce overfitting, we also consider the randomized ReLU (RReLU), a ReLU variant whose slopes for negative inputs are sampled randomly [83]. For training, we use stochastic gradient descent (SGD) with a learning rate of 0.01 and the stochastic optimizer Adam with a learning rate of 0.001. To further improve training, we also use the root mean square propagation (RMSprop) optimizer, which scales each update by a moving average of recent squared gradients. There is no established rule for choosing the number of neurons in hidden layers; a poor choice leads to underfitting with too few neurons or overfitting with too many, which ultimately affects training [84]. Given the nature of this work, we evaluate 150, 300, and 400 neurons in the hidden layers.
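The activation functions and update rules named above can be written out in a few lines of NumPy. Only the SGD (0.01) and Adam (0.001) learning rates come from this work; the RReLU slope bounds (1/8 and 1/3, PyTorch's defaults), the RMSprop learning rate (0.001) and decay (0.9), and Adam's β and ε values are common library defaults assumed here. These are textbook forms, not the exact TensorFlow implementations used in the experiments.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def rrelu(x, lower=1 / 8, upper=1 / 3, training=True, rng=None):
    """Randomized leaky ReLU: negative slopes are sampled from U(lower, upper) in training
    and fixed to their mean at inference."""
    if training:
        if rng is None:
            rng = np.random.default_rng()
        slope = rng.uniform(lower, upper, size=np.shape(x))
    else:
        slope = (lower + upper) / 2
    return np.where(x >= 0, x, slope * x)

# tanh is available directly as np.tanh.


def sgd_step(w, g, lr=0.01):
    return w - lr * g


def rmsprop_step(w, g, state, lr=0.001, rho=0.9, eps=1e-8):
    """Scale the step by a moving average of recent squared gradients."""
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g ** 2
    return w - lr * g / (np.sqrt(state["v"]) + eps)


def adam_step(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam with bias-corrected first and second moment estimates."""
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)


# Usage: 100 Adam steps on f(w) = w**2, whose gradient is 2w.
w, state = 1.0, {}
for _ in range(100):
    w = adam_step(w, 2 * w, state)
```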
Moreover, neural architectures are trained efficiently with minibatches, which split large datasets into smaller sets. We use minibatch gradient descent with batch sizes of 64, 128, and 256. We also consider 10, 15, and 20 hidden layers, since the number of hidden layers determines the depth of the model and affects the overall complexity of the neural network architecture. For generalization, the number of epochs, that is, the number of times the dataset passes through the network, matters because too few or too many epochs can cause underfitting or overfitting. We vary the number of epochs from 10 to 100 to analyze the effect of the hyperparameters above. Finally, the filter sizes are (2, 2, 3), (2, 2, 4), and (2, 2, 5), and the dilation rates are set to (1, 2, 4).
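The hyperparameter choices above form a grid, and enumerating it shows the size of the search space. This sketch assumes an exhaustive grid search over the listed values, which the section does not confirm, and omits the epoch count (10 to 100) because the exact values tried are not listed. The `minibatches` helper is a hypothetical illustration of splitting a dataset into minibatches.

```python
from itertools import product

# Hyperparameter values listed in this section.
grid = {
    "activation": ["relu", "tanh", "rrelu"],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "batch_size": [64, 128, 256],
    "hidden_layers": [10, 15, 20],
    "neurons": [150, 300, 400],
    "filter_size": [(2, 2, 3), (2, 2, 4), (2, 2, 5)],
}
DILATION_RATES = (1, 2, 4)  # fixed for every configuration

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 729


def minibatches(n_samples, batch_size):
    """Yield (start, end) index ranges that split a dataset into minibatches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)
```

For example, a set of 4,730 tweets split with a batch size of 256 gives 19 minibatches, the last one holding 122 tweets.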

Results and Analysis.
We ran experiments on the equally shuffled datasets with both baseline and proposed methods and evaluated them with several metrics. Alongside the classical accuracy metric, we report precision, recall, and F1-score, which balances precision and recall, to account for imbalanced data. All experiments were run on a 16-core 3000 MHz processor with 32 GB of RAM. The open-source machine learning library TensorFlow [85] was used to train the proposed framework and the models it was compared with. For the comparison, each baseline model was evaluated with each pretrained word vector on a similar architecture and the same hyperparameters as the proposed framework. The baseline models were evaluated with accuracy and F1-score. The highest baseline results were an accuracy of 74.04% with an F1-score of 70.42% using FastText, and an accuracy of 73.65% with an F1-score of 70.64% using GloVe on ESD-1; these results are presented in Tables 2 and 3 and displayed in Figures 4 and 5.
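For reference, the four metrics can be computed from true and predicted labels as in the sketch below. Macro-averaging over the three polarity classes (positive, negative, neutral) is an assumption; the section does not state which averaging was used for precision, recall, and F1-score, and the labels in the usage example are made up.

```python
def classification_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


m = classification_metrics(
    ["pos", "pos", "neg", "neu", "neg", "pos"],
    ["pos", "neg", "neg", "neu", "pos", "pos"],
    ["pos", "neg", "neu"],
)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.722, 'f1': 0.722}
```

Macro-averaging weights each class equally, so a rare class counts as much as a frequent one, which is why it is often preferred for imbalanced data.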
The accuracy of the proposed model is reported in Tables 4, 5, 6, and 7, and the selected hyperparameter settings are shown in Table 8. The number of hidden layers has a significant impact on tuning the network. The RMSprop optimizer achieves the highest accuracy, 77.92%, on ESD-4 with a batch size of 256, 20 hidden layers, and 300 neurons using randomized rectified linear units (RReLU). Other settings achieve close accuracies: 77.80% on ESD-3 with SGD, a batch size of 256, 15 hidden layers, and 150 neurons using the rectified linear unit (ReLU), and 76.84% on ESD-4 with Adam, a batch size of 128, 15 hidden layers, and 150 neurons using tanh. Furthermore, with batch sizes of 64, 128, and 256, our architecture achieved the best accuracy on the equally shuffled datasets, contrary to works that report the best performance with batch sizes of 2 to 32. The other metrics for handling imbalanced data, namely precision, recall, and F1-score, are shown in Tables 9, 10, 11, and 12.

Discussion
Deep learning methods have driven the wide availability of word representation models such as Word2Vec, GloVe, and fastText. This work investigates the quality of different word representation models for social media sentiment analysis in intelligent applications. It covers the collection, selection, and evaluation of multiple standard metrics and appropriate hyperparameters, listed in Table 8. The main challenge in this work concerned the dimensions of the multiple word representation models combined by weighted concatenation to produce the novel contextual concatenated word representations (CCWRs). The 3D-CNN, a dilated convolutional neural network architecture, enlarges the receptive field with different dilation rates to capture long-term contextual regularities; it comprises three dilated convolutional layers and two fully connected layers. When the CCWRs are fed in, the processing of successive text and the computation time are governed by the length of the text sequence [62]. As Figures 10 and 11 show, merely stacking dilated convolution kernels reduces training time and raises training accuracy up to a certain level, but it is not enough to improve testing accuracy. This happens because of the gaps between the dilated convolution kernels, which capture only sparse information and lose the continuity of information. In addition, with a fixed dilation rate, large-scale and small-scale information cannot be captured simultaneously when extracting feature maps. These issues affect both the training and testing accuracy of a model with a fixed dilation rate.
Figure 5: The stacking effect of dilated convolution kernels [21].
In our work, the novel CCWRs combined with the 3D-CNN, which uses different dilation rates in its layers, apply a series of convolution operations that capture complete information without holes. Enlarging the receptive field with different dilated convolution kernels in this way avoids information loss and the drop in testing accuracy.
Comparing the individual distributed word representation models with the contextual concatenated word representation model, we find that the CCWRs bring a significant improvement despite the small size of the corpus. Our experiments with the 3D-CNN yield two important findings: (i) multiple word representation models combined by weighted concatenation generate contextual representations that, together with two fully connected layers, classify social media data by exploiting its linguistic features for intelligent applications; and (ii) the optimization, selection, tuning, and configuration of multiple parameters significantly affect the entire structure.
Data from social platforms such as Twitter are now widely used and strongly support intelligent, informed decision-making, because they can be analyzed to reveal people's opinions about real-world events. Although many methodologies have been examined, they still struggle with out-of-vocabulary and misspelled words. This work is also relevant to social media text mining algorithms for real-world situations such as disasters and the current COVID-19 pandemic, which call for timely and effective techniques that track people's reactions to support governments in policy and strategic decisions.
Further, this paper offers authorities an accessible source of reliable, powerful, and evolving techniques for responding to the changing global situation caused by multiple variants of COVID-19. More broadly, the idea can be extended to empower smart cities, with professionals contributing new methods for intelligent applications in epidemic situations that make techniques and interpretations more robust. To establish intelligent applications as an essential tool, social media text mining algorithms must be adopted in intelligent environments where the volume of social media text is growing rapidly. The development of proactive, responsive, and cost-effective intelligent applications will remain inadequate without deep learning approaches and, more importantly, without mining insights from social media data.

Conclusion
Social media data have become an essential means of understanding people's attitudes in order to improve services. This paper formulates the tuning, selection, and configuration of several hyperparameters to optimize the model across different evaluation metrics. The proposed contextual concatenated word representations (CCWRs), trained on streamed social media data, outperform various word representation models and partly overcome the out-of-vocabulary (OOV) word problem. In addition, the novel three dilated convolution layers (3D-CNN), which use a different dilated convolution kernel at each layer instead of stacked convolutional layers, were validated through a series of experiments on multiple datasets. The combination of CCWRs and the 3D-CNN performs accurately, avoiding the loss of detailed information and capturing long-range contextual information. However, larger and more specific training sets of social media data could further improve the evaluation metrics. Finally, imbalanced training data and the topic-based collection of Twitter data through relevant keywords remain challenges to be addressed in future work.

Data Availability
The data used to support the findings of this study are included in the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Figure 11: Evaluation metrics of the proposed method on ESD-4.

Wireless Communications and Mobile Computing