Targeted Advertising in Social Media Platforms Using Hybrid Convolutional Learning Method besides Efficient Feature Weights



Introduction
The majority of advertising budgets are allocated to social media advertising; by 2021, global advertising expenditures were projected to reach 757.44 billion USD. The three principal players in online advertising are as follows: (i) the company desiring to serve online advertisements, known as the advertiser; (ii) the company delivering the advertisements via its website, known as the publisher; and (iii) the individual browsing the publisher's website and being exposed to advertisements, known as the customer. Social media marketing has become very appealing to advertisers owing to the growing number of active users and the ability to establish one-to-one relationships with customers [1]. Consequently, advertisers work to enhance campaign targeting and to define better advertisement placement strategies [2].
Companies that invest significant funds in advertising expect a successful promotion and increased sales. Effective advertising is a challenge for advertisers and companies alike. Businesses consider each advertisement that influences the audience and becomes popular with them. Influencers and well-known locations can be utilized in advertisements to attract the audience's attention.
Recent research has employed popularity metrics to predict and analyze Instagram popularity. These metrics include popularity prediction [3,4], fake users [5], the number of likes [6][7][8][9], the engagement rate [10], intrinsic popularity [11,12], and virality [13]. Numerous factors, including hashtags [14], metadata, visuals [7,9], and sentiments [15], were considered during the analysis. The engagement rate, defined as the number of likes per follower, is the most popular metric for measuring popularity. Another report suggested alternative metrics, including follower growth, reach, and impressions [16]. The engagement rate represents the reaction of users based on quantifiable metrics such as likes and comments. Increasing the engagement rate on social networks, particularly Instagram, is a competitive factor for pages that advertise products. Prior research highlighted the evaluation of post quality and engagement rate for user-generated text content. However, it failed to consider image attributes and additional contexts, including emotional attributes, captions, and hashtags of published posts.
Because of massive gains in processing capability and data storage, deep learning (DL) and machine learning (ML) have attracted a lot of interest in recent years. These approaches may be applied to a variety of real-world systems [17][18][19]. Clustering is a technique for analyzing market data. Data clustering is used to identify a classification in which data share hidden characteristics [20]. These techniques are highly effective during the early stages of data analysis, when the dataset is unknown. One of the most significant clustering methods is fuzzy c-means clustering, which faces difficulties with high-dimensional datasets and numerous archetypes [21]. Moreover, the performance of the fuzzy c-means algorithm (FCM) is strongly influenced by the selection of the primary centroid clusters [22]. Recently, deep learning algorithms, particularly convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have been applied to marketing for more accurate analysis. Deep learning is utilized in various marketing fields, including marketing intention detection, advertising, and data mining [23]. For instance, a great number of articles have studied the application of semantic analysis with deep learning techniques to the texts communicated in social networks. Chennafi et al. [24] analyzed Arabic texts through Seq2Seq dialect normalization and transformer-based analysis. By decreasing the number of OOV (out of vocabulary) words, a Seq2Seq model for dialect normalization may operate as a preprocessing stage for the classification task of ABSA (aspect-based sentiment analysis). As a result, the accuracy of the model is improved. Recognizing the types of advertising texts, that is, offensive language, racism, hatred, and all other types of verbal violence, is of utmost importance. The classification system developed by Aldjanabi et al. [25] aimed at determining hate and offensive speech via a multitask learning (MTL) model built on a pretrained language model. The developed multitask learning model showed noticeable performance, much better than that of the other models on three out of four datasets for detecting hate and offensive speech. A similar study was also carried out by Fan et al. [26] on tweets published on the Twitter social network platform in order to categorize and identify the hashtags written about UK Brexit. According to the results, their suggested model could analyze and classify toxic tweets efficiently. In order to comprehend the ad complicity criteria on social media platforms, Ghanbarpour et al. [27] employed statistical techniques. According to their results, consumers with higher perceived ad complicity experienced higher perceived ad intrusiveness.
This study demonstrates the feasibility and utility of a hybrid convolutional model based on FCM and Extreme Gradient Boosting (XGBoost) for evaluating and estimating the accuracy of user engagement with offered advertisements, preventing spam advertisements, and lowering the cost for advertisers. We cluster the selected data based on the weight and significance of their attributes using the FCM algorithm and the XGBoost method, and then we use CNN- and LSTM-based methods to train and predict the user engagement rate with a published advertisement. The innovations of our method are as follows: (i) utilizing and extracting a dataset of the promoted post's primary attributes and context; (ii) determining the weight of the attributes that affect the user engagement rate and providing a clustering technique based on the XGBoost and FCM methods; and (iii) employing a hybrid convolutional algorithm based on the feature-weight (FW) and CNN-LSTM algorithms to learn and predict the user engagement rate in promotions. This research is structured as follows: in the first section, we examine the existing literature on evaluating the efficacy of advertising in social networks and machine learning algorithms. Section 2 evaluates the user engagement rate with advertising using a deep learning methodology. The final section discusses the results, outcomes, and future works.

Engagement Rate.
In practice, advertisers may not be able to participate in every ad auction due to their limited budget and a large number of online users. They must first identify potential user interests and then bid on audiences whose interests align with the advertiser's advertising campaigns [28]. Numerous studies use the engagement rate as an essential criterion for measuring the success rate of targeted advertising. Accurate prediction of engagement rate reveals user interest and response to published advertisements.
In several studies, engagement with online advertising as measured by the click-through rate (CTR) has been analyzed [29]. Tree structures and parameters used to extract the CTR of a new impression are constructed using historical data. Commonly used generative models include hierarchical Bayesian frameworks [30] and CTR hierarchy trees [31]. The natural advantage of a generative model is its interpretability, which enables businesses to determine which factors improve CTR values the most. However, due to model limitations, these methods can frequently only estimate a limited number of parameters (e.g., by using several chosen factors to break down the tree hierarchy). In addition, they cannot accurately estimate CTR by incorporating a significant amount of useful information from websites, publishers, and users. On the other hand, the increasing use of machine learning (deep learning in particular) has given rise to a group of predictive modeling techniques that view user clicks as binary events and train a classifier through supervised learning to estimate the probability of an advertisement impression being clicked [32]. Several CTR estimation methods based on deep neural networks are included in these techniques [33]. Generally, these techniques can manage tens of thousands of features and are typically more robust than generative models. However, when it comes to interpretability, they are not entirely transparent. Hong et al. [34] used a combined convolutional method based on a recurrent neural network (RNN) to classify and extract users' interests in social networks. The results showed that combining text and image information could improve classification precision. Yap [35] evaluated Facebook engagement rates to determine the efficacy of library advertisements. To this end, they analyzed the average engagement rates of Facebook pages maintained by university libraries.
Luke and Suharjito [36] investigated the use of Twitter to post product advertisements. Their proposed method measured the effectiveness of advertisements by analyzing user engagement with advertising tweets. Consequently, they utilized the Naïve Bayes algorithm to categorize and estimate the engagement rate of Twitter followers based on the products or services they promoted. In their study, Bonilla-Quijada et al. [37] measured the effectiveness of urban tourism advertising using the engagement rate. They analyzed the number of likes and comments posted on Instagram to determine user engagement rate and then measured the effectiveness of published ads. According to Zheng et al. [38], user engagement directly affects brand loyalty. They evaluated user engagement by creating online communities that encouraged other users to engage. In a separate study, Kim et al. [39] examined the effects of various Facebook content types on the engagement rate of online users. It was observed that educational posts had the greatest impact on online users. In addition, engagement was identified as a significant factor in disseminating knowledge through online tools. Stefko et al. [40] observed that the success of a promotional post is contingent on maximizing users' engagement with that post via likes, comments, and sharing, among other activities. Gasparoni [41] examined the correlation between user age and level of engagement with various social media platforms. Their findings indicated that different age groups utilize social media differently. Accordingly, they observed that people over 45 prefer Facebook, whereas younger people (those aged 18 to 34 years) are more active on Instagram.
The impact of various factors (including user characteristics, posts, emotions, relationships, images, and backgrounds, among others) on the engagement rate must be investigated because assessing these influential factors in different networks can increase the engagement of users with advertising posts and thus increase the success rate of targeted advertising. The following section describes relevant methods.

Feature-Weight Learning Algorithm: Extreme Gradient Boosting.
XGBoost [42] is an improved algorithm derived from the gradient boosting decision tree. It can efficiently construct boosted trees and execute parallel operations. These boosted trees are categorized into classification and regression trees. The central idea behind the algorithm is to optimize the objective function. While feature vectors compute the similarity of the historical and forecast days, gradient boosting builds boosted trees that can extract the feature scores intelligently, demonstrating to the training model the importance of each feature. The more a feature is used to make important decisions with boosted trees, the higher its score. The algorithm counts importance via "cover," "frequency," and "gain" [43]. Cover is defined as the relative value of observing a feature. Frequency, a simplified version of gain, is defined as the occurrence of a given feature in all constructed trees. Gain is the most significant reference factor for determining the importance of a particular feature in the tree branches and is used to set the feature importance in this study. For a single decision tree T, Breiman et al. [44] proposed the following score of importance for each predictor feature $X_l$:

$\hat{I}_l^2(T) = \sum_{t=1}^{J-1} \hat{\tau}_t^2 \, \mathbb{1}\left(v(t) = l\right).$

The decision tree has J − 1 internal nodes, each of which divides the region into two subregions over node t using the splitting feature $X_{v(t)}$. The selected feature is the one that provides the most significant estimated improvement $\hat{\tau}_t^2$ in the squared error risk over a constant fit across the entire region. The squared importance of feature $X_l$ is the sum of such squared improvements at all the J − 1 nodes for which this feature was selected as the splitting feature. The importance is calculated over the additive M trees as follows:

$\hat{I}_l^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{I}_l^2(T_m).$

Fuzzy Clustering Algorithm.
Lotfi Zadeh pioneered fuzzy set theory to eliminate linguistic ambiguities and event uncertainties [45,46].
In the traditional set theory, set boundaries are defined so that a given element is either definitely included or definitely excluded from the set. A similar clear distinction exists in classical logic, which states that a given proposition is either entirely accurate or incorrect. However, many propositions and complexes may not conform to such distinct and clear boundaries.
A fuzzy set results when a feature is not fully clear, that is, when the membership of all or some members belonging to a set is not completely clear. In hard clustering or classic strategies, such as K-means clustering, a sample or object either definitely belongs to a set or not, with a respective membership degree of either 1 or 0. In the fuzzy or soft membership theory, an object may belong to a cluster with a membership degree between 0 and 1 [47]. Fuzzy clustering may refer to precise or fuzzy data analysis using fuzzy methods; this article uses fuzzy techniques to analyze precise data. In practice, FCM-based fuzzy clustering algorithms are the most prevalent [48,49]. Like K-means clustering, these algorithms belong to a class of clustering algorithms with objective functions; they aim at minimizing the cost function shown as follows:

$J(U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} \mu_{ij}^{m} \, \| x_j - v_i \|^2.$

Here, $V = \{v_1, v_2, \ldots, v_C\}$ represents the cluster center set, $U = (\mu_{ij})$ denotes a fuzzy partition matrix, $m \in (1, \infty)$ is a weighting factor that indicates cluster fuzziness, and $\mu_{ij} \in [0, 1]$ represents the degree of membership of data point j within cluster i. The cost function J(U, V) is minimized by calculating the cluster centers as follows:

$v_i = \frac{\sum_{j=1}^{N} \mu_{ij}^{m} \, x_j}{\sum_{j=1}^{N} \mu_{ij}^{m}}.$

Using the computed distance norms $d_{ij} = \| x_j - v_i \|$, we update the fuzzy partition matrix as follows:

$\mu_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \right]^{-1},$

where C represents the number of clusters.
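As a sanity check, the alternating FCM updates (centers, then memberships) can be sketched in a few lines of NumPy. This is a minimal illustration on toy two-cluster data, not the paper's implementation; the function name and parameters are ours.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: alternate between cluster centers and memberships.
    U has shape (n_points, c); each row sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # valid fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))       # membership ∝ d^{-2/(m-1)}
        U /= U.sum(axis=1, keepdims=True)        # normalize over clusters
    return U, V

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U, V = fcm(X, c=2)
print(U.round(3))
```

With m = 2 the membership update reduces to inverse squared distances, normalized per point, which matches the closed-form update above.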

Convolutional Neural Network (CNN).
The basis of a convolutional neural network (CNN) is a series of convolution and subsampling processes conducted across multiple layers [50]. Subsequently, one or several fully connected layers follow the CNN. All CNN operations pass through three successive layers, as depicted below.

Convolution Layer.
The name CNN originates from the performed convolution operation. This process primarily contributes to the extraction of features from the input data. For instance, if the input is an image, the convolution operation extracts image features while maintaining the interpixel spatial relationship by learning the image features via small squares of input data (2-D filters). Convolution contributes to extracting the feature matrix when applied to text classification by maintaining a high-level word or phrase representation.

Pooling Layer.
A good practice is to decrease the number of trainable parameters when the input size is too large. The feature dimension must be decreased without losing significant information. Pooling layers periodically appear between subsequent convolution layers. Pooling (also known as down-sampling or subsampling) decreases the spatial size of the feature maps but preserves the most significant information. There are different types of spatial pooling, including max, average, and sum. In max-pooling, a spatial neighborhood (e.g., a 2 × 2 window) is defined, and the largest element of the rectified feature map within that window is extracted. Instead of the largest element, the sum or average of all elements within that window can be taken for sum and average pooling, respectively. This article adopts the max-pooling strategy.
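The windowed maximum described above can be sketched in NumPy; this toy example uses non-overlapping 2 × 2 windows (the window size is illustrative, not the paper's configuration):

```python
import numpy as np

def max_pool(fmap, window=(2, 2)):
    """Max-pooling: tile the feature map with non-overlapping windows
    and keep the largest element of each window."""
    h, w = window
    H, W = fmap.shape
    out = fmap[:H - H % h, :W - W % w]        # crop so windows tile exactly
    out = out.reshape(H // h, h, W // w, w)
    return out.max(axis=(1, 3))               # max inside each window

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]])
print(max_pool(fmap))    # 2x2 windows -> [[6, 8], [3, 4]]
```

Average or sum pooling follow by swapping `max` for `mean` or `sum` over the same window axes.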

Fully Connected Layer.
This layer is a classical multilayer perceptron with an output layer that employs a softmax activation function. The term "fully connected" indicates that every neuron in the preceding layer is connected to every neuron in the next layer. The convolution and pooling layers output high-level features from the input data. The fully connected layer uses these features to classify the input into different classes according to the training dataset. The majority of the features produced by the convolution and pooling layers are appropriate for classification.

LSTM-Based RNN for Prediction.
Hochreiter et al. proposed LSTM in 1997 [51] as an efficient recurrent neural network (RNN) architecture, extensively utilized in numerous fields. Additionally, LSTM is a well-known model for time series prediction that can handle long-term dependency data effectively. RNNs were designed to work on nonlinear time-varying problems [52]. The internal connections of an RNN allow signals to travel back and forth, making RNNs suitable for time series prediction. RNNs can mine rules from time sequences to predict data that have not yet occurred. These features result from the feedback connections, which enable weight updates based on the residual at each forward step (Figure 1): (i) x_t is taken as the input to the network at time step t; (ii) h_t represents the hidden state at time t and acts as the "memory" of the network; (iii) W is the weight matrix parameterizing the hidden-to-hidden recurrent connections. The RNN has demonstrated suitability for the specified problem [53]. However, RNNs are heavily impacted by vanishing and exploding gradients, which can lead to network failure. Consequently, simple RNNs may not be optimal for prediction problems with long-term dependencies. LSTM was initially designed to deal with the problem of vanishing gradients in standard RNNs when handling long-term dependencies. This section concludes with the long short-term memory (LSTM) neural network. The LSTM model augments the RNN neurons with input, output, and forget gates. This structure can effectively circumvent the problem of vanishing gradients [54].
This capability makes LSTM a suitable architecture for problems that contain long-term dependencies. The main novelty of LSTM lies in the memory cell, which essentially acts as a state information collector. LSTM uses the forget gate to determine which information to discard from the cell state, as shown in Figure 2. The forget gate activation f_t is calculated using a sigmoid function:

$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right).$

The subsequent step involves determining which new information to store in the cell state. Initially, a sigmoid layer called the "input gate layer" decides what information must be updated. Subsequently, a tanh layer produces a vector $\tilde{c}_t$ of new candidate values to be added:

$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right).$

Afterward, the previous cell state $c_{t-1}$ is updated to the new cell state $c_t$: $c_{t-1}$ is multiplied by $f_t$ to discard the information from the old cell, and $i_t * \tilde{c}_t$ is added, so the candidates are scaled according to the amount of information that must be updated for each state value:

$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t.$

Finally, the output must be decided. There are two parts to this: first, a sigmoid layer is run as the output gate $o_t$ to filter the cell state; next, the cell state is passed through $\tanh(\cdot)$ and multiplied by $o_t$ to compute the desired information:

$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t * \tanh(c_t).$

In these equations, $W_f, W_i, W_c, W_o$ denote the weight matrices, $b_f, b_i, b_c, b_o$ the bias vectors, and $\sigma(\cdot)$ the sigmoid function.
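The gate updates above can be sketched as a single LSTM step in NumPy. The weight shapes and toy dimensions here are illustrative, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: forget gate f_t, input gate i_t, candidate values,
    new cell state c_t, output gate o_t, and hidden state h_t."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # what to discard from c_{t-1}
    i = sigmoid(W["i"] @ z + b["i"])           # what new information to store
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell values
    c = f * c_prev + i * c_tilde               # updated cell state
    o = sigmoid(W["o"] @ z + b["o"])           # filter the cell state
    h = o * np.tanh(c)                         # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, c.shape)
```

Because h_t is an output gate times tanh of the cell state, its entries are always bounded in (−1, 1).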

Proposed Method
We propose a hybrid convolutional network model for learning and predicting user engagement rates based on the FW-FCM and CNN-LSTM algorithms. We extract the effective attributes using the FCM and XGBoost methods. Selected data are clustered based on the attributes' weights, and our model learns similar features by employing a convolutional neural network. The output of this algorithm is the predicted level of user engagement with a promotion. The overall process of our approach is shown in Figure 4.

Data Collection and Preprocessing.
This study aims at providing an analytical model independent of the dataset. Consequently, we evaluate our method using three datasets from different domains. The following is a detailed description of each dataset's selected features.

Taobao Dataset.
Taobao is an advertising dataset provided by Alibaba, containing eight days of ad click-through data (26 million records) randomly sampled from 1,140,000 users.

IMDB Dataset.
This dataset contains top-rated movie reviews for training and an equal number for testing. The task is to determine whether a particular movie review is positive or negative. The data were collected by Stanford researchers and published in 2011 in a paper in which the ratio of training data to testing data was 70 : 30 [57] (Table 3).

Implementation
The proposed FW-CNN-LSTM model's entire procedure is outlined below and illustrated in Figure 5. The IMDB features used are as follows:

Title features:
M1: Text title
M2: A number that uniquely identifies rows for a specific title ID
M3: The more well-known title / the title on the promotional materials at the time of release
M4: Title text sentiment analysis
M5: Title language
M6: Enumerated attribute set for the alternative title, one or more of the following: "DVD," "alternative," "TV," "festival," "working," "video," "imdbDisplay," and "original." New values can be added prospectively without warning
M7: Length of the title text
M8: Number of keywords in the title

Movie features:
M9: Video format/type (e.g., short, movie, TV series, TV episode, and video)
M10: The title release year; for a TV series, the start year of the series
M11: Primary runtime (in minutes)
M12: Genres related to the title, including the number of hashtags, the hashtag's length, and the hashtag's text

In the IMDB database, the movie title and actors achieve the highest scores, and in the Taobao dataset, the timestamp and brand ID achieve the highest scores. In addition, the supplementary features are significant for prediction. This outcome is consistent with the data analysis results. The importance values of every feature have now been extracted; these will be employed as a priori knowledge for the subsequent clustering algorithm.
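The gain-based feature scores used as a priori knowledge can be sketched as follows. This toy implements the Breiman-style importance from the earlier section (sum of squared split improvements per feature, averaged over the M trees); the tiny hand-made ensemble is illustrative, not the paper's fitted XGBoost model:

```python
import numpy as np

def feature_importance(trees, n_features):
    """Gain-style importance: for each tree, sum the squared improvement
    tau^2 at every internal node, attributed to the feature that node
    splits on; average over the M trees, then take the square root."""
    imp_sq = np.zeros(n_features)
    for tree in trees:                    # each tree = list of internal nodes
        for feat, tau_sq in tree:         # (splitting feature, improvement tau^2)
            imp_sq[feat] += tau_sq
    return np.sqrt(imp_sq / len(trees))   # importance over the additive M trees

# Toy ensemble of M = 2 trees, each a list of (feature index, tau^2) pairs.
ensemble = [
    [(0, 4.0), (2, 1.0)],   # tree 1 splits on features 0 and 2
    [(0, 2.0), (1, 1.0)],   # tree 2 splits on features 0 and 1
]
print(feature_importance(ensemble, n_features=3))
```

In practice these scores would come from a fitted booster (e.g., XGBoost's gain importances) rather than hand-made trees.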

FCM Clustering Based on Feature Weight.
This section improves FCM clustering by calculating the original cluster centers and employing a new distance computation method. The core idea is to determine the attribute weights using the XGBoost algorithm and to compute distances for the predicted engagement rate under a norm that measures the various attributes with varying weights. This process comprises the following steps: (1) The initial center c_0 is chosen so as to minimize the clustering cost function J(U, V). (2) The subsequent center c_j is selected as the furthest point from the cluster centers chosen previously, that is, c_0, c_1, . . . , c_{j−1}; steps (1) and (2) are repeated until all K centers are found. (3) The feature weights are calculated via the XGBoost method, and a weight is attributed to every feature, giving the features different importance levels. Assuming w_p to be the weight related to feature p, we present the norm as follows:

$d_w^2(x_i, c_j) = \sum_{p} w_p \, (x_{ip} - c_{jp})^2.$

Then, (a) we assign each data point to the closest cluster under this weighted norm, and (b) we update the clusters by recalculating the cluster centroids. The algorithm repeatedly executes steps (a) and (b) until convergence occurs.
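The weighted norm and the furthest-point center selection can be sketched together in NumPy. The weights here are hypothetical stand-ins for XGBoost gain scores, and the data are toy values:

```python
import numpy as np

def weighted_dist_sq(x, c, w):
    """Feature-weighted squared norm: d^2(x, c) = sum_p w_p (x_p - c_p)^2."""
    return np.sum(w * (x - c) ** 2, axis=-1)

def init_centers(X, k, w):
    """Furthest-point initialization under the weighted norm: start from
    X[0], then repeatedly add the point furthest from all chosen centers."""
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([weighted_dist_sq(X, c, w) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])   # furthest point becomes next center
    return np.array(centers)

X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0], [0.1, 9.0]])
w = np.array([1.0, 0.0])   # hypothetical weights: second feature is ignored
print(init_centers(X, k=2, w=w))
```

Because the second feature carries zero weight, the outlier at (0.1, 9.0) is never picked as a center; the weighted norm changes which points count as "far."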

Construction of the Feature Matrix.
Each feature in the text list is embedded in a 128-element vector trained through the backpropagation process. Figure 7 depicts a sample feature matrix created from the representation embedding model. The index values for each feature are displayed as an example in Figure 6; these values can vary with datasets. The values in the matrix are weights that are assigned randomly to the embedding layers and adjusted via the backpropagation procedure.
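The index-to-vector lookup behind such a feature matrix can be sketched as follows; the vocabulary entries are hypothetical examples, and the random values stand in for weights that would be adjusted via backpropagation:

```python
import numpy as np

# Hypothetical vocabulary of post features; each index maps to one
# 128-element row of the embedding matrix E.
rng = np.random.default_rng(0)
vocab = {"#sale": 0, "summer": 1, "brand_x": 2}
E = rng.standard_normal((len(vocab), 128)) * 0.01  # random init, trained later

# Looking up a post's features stacks the corresponding rows.
feature_indices = [vocab["#sale"], vocab["brand_x"]]
feature_matrix = E[feature_indices]                # shape: (2, 128)
print(feature_matrix.shape)
```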

The Convolution Layer.
In the convolutional layer, the input feature matrix is traversed by eight filters with a size of 7 × 1 and a stride of 1 to obtain the required features. We utilize multiple filters to extract different feature types. For instance, if a matrix with a size of 26 × 130 is traversed using a filter with dimensions of 7 × 1, the convolution process delivers an 8@26 × 130 feature map. The matrix includes every local hidden feature, as shown in Figure 8. The input of the final fully connected layer is the output of the last LSTM layer.
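The 7 × 1 traversal with stride 1 can be sketched directly in NumPy. To keep the 26 × 130 output size of the example, this sketch assumes zero padding of three rows on each side (an assumption, since the paper does not state its padding):

```python
import numpy as np

def conv_7x1(feature_matrix, filters):
    """Traverse the feature matrix with 7x1 filters (stride 1, zero
    padding so the input keeps its size); one output map per filter."""
    rows, cols = feature_matrix.shape
    padded = np.pad(feature_matrix, ((3, 3), (0, 0)))   # 3 zero rows top/bottom
    out = np.empty((len(filters), rows, cols))
    for k, f in enumerate(filters):                     # each f has shape (7,)
        for r in range(rows):
            out[k, r] = f @ padded[r:r + 7]             # dot over 7-row window
    return out

X = np.random.default_rng(0).standard_normal((26, 130))
filters = np.random.default_rng(1).standard_normal((8, 7))
print(conv_7x1(X, filters).shape)   # -> (8, 26, 130), i.e., 8@26x130
```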

The Max-Pooling Layer.
After a feature matrix with a size of m × n is obtained from the convolutional layer, max-pooling is applied using a filter with dimensions of 2 × 1. In the max-pooling process, the highest feature value is selected at each filter position during the traverse. The resulting feature matrix has dimensions of m/2 × n. The process of max-pooling is performed independently for each convolution filter. The max-pooling layer's structure is schematically depicted in Figure 8.

The Sigmoid Layer.
The output feature vectors from the LSTM layer are transferred to a fully connected sigmoid layer to determine each category's probability distribution. This is mathematically expressed as follows:

$D_{P_{sig}}(V_j) = \frac{1}{1 + e^{-z_j}},$

where $D_{P_{sig}}(V_j)$ represents the probability distribution of category j and $z_j$ denotes the output related to category j. We use the sigmoid activation function to normalize the confidence score of the classifier between 0 and 1. After the probability distribution is obtained from the sigmoid layer, we apply binary cross-entropy as a loss function to calculate the disparity between the actual and predicted engagement labels [58]:

$L = -\sum_{i=1}^{k} \left[ R(V_i) \log D_{P_{sig}}(V_i) + \left(1 - R(V_i)\right) \log\left(1 - D_{P_{sig}}(V_i)\right) \right].$

Journal of Electrical and Computer Engineering
Here, k represents the number of categories, and R(V_i) denotes the actual label related to the text. This parameter can adopt discrete values belonging to the set T = {A, B, C, D, E}, with T being the label set for the engagement rate. The loss resembles a negative log-likelihood, which seeks to minimize the difference between the probability distribution within the training set and that predicted by the model for the testing dataset.
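The sigmoid normalization and the binary cross-entropy disparity can be sketched as follows (illustrative scores and labels, not the model's outputs):

```python
import numpy as np

def sigmoid(z):
    """Squash the classifier's raw score z_j into a confidence in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean disparity between actual and predicted labels; probabilities
    are clipped away from 0 and 1 to keep the logs finite."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

z = np.array([2.0, -1.0, 0.5])    # raw classifier outputs
p = sigmoid(z)                    # normalized confidences
y = np.array([1.0, 0.0, 1.0])     # actual labels
print(binary_cross_entropy(y, p))
```

The loss is zero only when every predicted confidence matches its label exactly, and it grows without bound as a confident prediction lands on the wrong side.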

Setting the Long Short-Term Memory (LSTM) Network.
The LSTM layer receives the output of the max-pooling layer for sequential left-to-right analysis of the created feature vectors. Because the significant local features were extracted at the max-pooling layer output, the LSTM network can examine long-term dependencies for global features. The LSTM layer output is flattened in order to reduce the feature dimensionality. Then, it is passed through the fully connected layer so that the actual engagement label is predicted.

Quantifying the Number of Hidden Layers in the LSTM.
Attention-based LSTM has varying output sequences and learning effects depending on the number of hidden layers. This section seeks to determine the number of hidden layers that is optimal for classifier performance and includes three experiments, in which the experimental variable is the number of hidden layers. We compute the time consumption and precision of the model by keeping the other variables unchanged and setting the number of hidden layers to 1, 2, and 3. The number of neurons per hidden layer in all three experiments is 128. The results are displayed in Table 4. As shown in the table, there is a strong correlation between the number of hidden layers and time consumption: the more hidden layers, the more time required to train identical data, because the number of neurons that must be trained grows rapidly as the number of hidden layers increases. An increase in parameters results in increased time and resource consumption. However, there is no strong correlation between precision and the number of layers; increasing the number of layers will not improve the model's precision. Consequently, more hidden layers are not necessarily better. As shown in the table, time consumption is moderate for two hidden layers. Most importantly, the two-layer network has the highest precision. Therefore, the LSTM models discussed below employ two hidden layers.

Quantifying the Number of Neurons in the LSTM.
The previous section's experiment determined the number of model layers by adjusting only the number of hidden layers. Nevertheless, for convenience, the experiment assumes the number of neurons to be 128 in each layer. In this section, after determining the number of hidden layers in the model, we identify the number of neurons in each layer through experiments to improve the attention-based LSTM formulation. Since there are two hidden layers in LSTM, the number of neurons must be tested in the network with two layers. We designed three sets of tests with neuron numbers of (128, 128), (256, 128), and (128, 64). The time consumption and accuracy of each set of tests were computed, and the results are presented in Table 5.
As shown in the table, the training time is only slightly affected by the number of neurons. Consequently, the current experiment utilizes the group of parameters with the highest precision, corresponding to 128 neurons in each of the two hidden layers.

The Performance Evaluation Parameters
We randomly split the dataset into training (70%), validation (20%), and test (10%) sets. The validation set was used for tuning hyperparameters, and the final performance comparison was conducted on the test set. Figure 9 demonstrates the efficiency of the proposed model in terms of accuracy versus the number of epochs. The proposed network overfitted on the Taobao dataset beyond 100 epochs, whereas the presented method achieved a good fit at 150 epochs on the IMDB dataset and at 100 epochs on the Instagram dataset. In this study, we employed dropout (0.3) after the convolutional layers to avoid overfitting. Then, to determine the proper optimization algorithm for our proposed model, the standard Adam, RMSProp, and SGD methods were compared based on their cost functions; the Adam algorithm, with a learning rate of 0.0001, obtained better results and was selected.
All the experiments in this article used the Python programming language, and PyCharm was utilized as the development environment. The FW-CNN-LSTM model was compared to the other machine learning models, such as CNN, logistic regression, Naïve Bayes, and SVM, for validation purposes. The proposed algorithm was evaluated regarding the F-measure, recall, precision, and accuracy obtained from the confusion matrix. In the following, the criteria set in line with the research objective, identifying and suggesting suitable users for targeted advertisements, are introduced.
The accuracy criterion indicates how accurately the algorithm predicts user engagement rates overall. The following equation shows how this criterion is calculated:

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$

The accuracy criterion has several weaknesses. For instance, it does not differentiate the prediction error for the engagement rate of included users from that of excluded users. The precision and recall criteria were employed to solve this problem. The main focus of the precision criterion is on positive classifier outputs and the extent to which the engagement rate is correctly predicted in positive responses:

$\text{Precision} = \frac{TP}{TP + FP}.$

The recall criterion is used to verify the algorithm's coverage in introducing users to be selected for targeted advertisements. Its main focus is on the prediction error for included users' engagement rates, so recall is more important in issues related to sending targeted advertisements:

$\text{Recall} = \frac{TP}{TP + FN}.$

The values of recall and precision can differ significantly, in which case the best algorithm is selected using the F-measure, the harmonic mean of recall and precision:

$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$
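The four criteria follow directly from the confusion matrix counts; a minimal sketch with illustrative counts (not the paper's results):

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall, F-measure, and accuracy from confusion matrix counts."""
    precision = tp / (tp + fp)                             # correctness of positives
    recall = tp / (tp + fn)                                # coverage of positives
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)             # overall correctness
    return precision, recall, f_measure, accuracy

# Illustrative counts only.
p, r, f, a = metrics(tp=90, fp=6, tn=80, fn=9)
print(round(p, 3), round(r, 3), round(f, 3), round(a, 3))
```

Note how a model can trade precision against recall (e.g., including more users raises recall but admits more false positives), which is exactly the tension the F-measure summarizes.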

Analysis and Discussion
To evaluate the scalability of the proposed method for large datasets, the presented method was assessed on various ratios of training data (0.1, 0.2, 0.3, . . ., 1) and their corresponding training times. Figure 10 illustrates this process. As can be seen, increasing the training-data ratio from 0.1 to 1 increased the training time from 97 to 675 seconds. Moreover, the actual training times across the various data ratios grow roughly linearly, which supports the scalability of the proposed model to large datasets.
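The linearity claim can be checked by fitting a least-squares line to the (ratio, time) pairs and inspecting the coefficient of determination R². Only the two endpoints (97 s at ratio 0.1 and 675 s at ratio 1.0) are reported in the text; the intermediate timings below are hypothetical placeholders used to illustrate the check:

```python
import numpy as np

ratios = np.arange(1, 11) / 10.0                # 0.1, 0.2, ..., 1.0
times  = np.array([97, 158, 224, 297, 362,      # hypothetical values; only
                   430, 489, 551, 612, 675])    # the endpoints are reported

# Least-squares line and R^2 against the fitted values.
slope, intercept = np.polyfit(ratios, times, 1)
pred = slope * ratios + intercept
r2 = 1 - np.sum((times - pred) ** 2) / np.sum((times - times.mean()) ** 2)
# An R^2 close to 1 indicates training time grows roughly linearly with data size.
```
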
We demonstrate the efficacy of the proposed FW-CNN-LSTM model by comparing it to widely used machine learning techniques: CNN, Naïve Bayes, logistic regression, and SVM. The comparison indicators are the F-measure, precision, recall, and accuracy. Table 6 presents a comparative analysis on the Taobao dataset. As shown by the recall criterion, the FW-CNN-LSTM algorithm recognized over 91% of the users suitable for targeted advertisements, meaning that fewer than 18,400 customers were not covered by the proposed method. Regarding the precision criterion, the proposed method (FW-CNN-LSTM) correctly selected 94% of the users it marked as suitable for targeted advertisements, which is acceptable compared to the other methods. The Naïve Bayes algorithm also performed acceptably on the precision criterion. The F-measure indicates that the FW-CNN-LSTM algorithm outperformed the other examined methods. On the accuracy criterion, the FW-CNN-LSTM algorithm predicted the customer engagement rate with 91% accuracy, again displaying its superiority.
In the IMDB dataset, the Naïve Bayes algorithm displayed the best performance (97%) on the recall criterion, optimally covering the users suitable for targeted advertisements. However, it reached only 84% precision, wrongly sending advertisements to 870 users. The FW-CNN-LSTM algorithm outperformed the other algorithms with 96% precision, assigning advertisements to users more accurately, although it covered a smaller range of users (about 87%). Given the significant difference in performance on the precision and recall criteria, the methods can be compared more reliably using the F-measure, which again shows that the FW-CNN-LSTM algorithm outperformed the other examined methods. On the accuracy criterion, the FW-CNN-LSTM method predicted the user engagement rates with 91% accuracy, outperforming the other algorithms (Table 7).
In the Instagram dataset, the FW-CNN-LSTM method optimally (93%) covered the selected users for targeted advertisements on the recall criterion, failing to recognize only 79 included users. The CNN algorithm displayed the best performance (92%) on the precision criterion, while the FW-CNN-LSTM method reached 91% precision, sending 108 advertisements to excluded users. On the accuracy criterion, the proposed method outperformed the other methods, predicting the user engagement rate with 90% accuracy (Table 8).
The overall performance of the proposed algorithm was assessed and ranked using Friedman's test, which computes the adjusted chi-square (χ²) between the proposed algorithm and the other algorithms [59]. The larger the adjusted χ², the more the ranks differ. This study considered a significance level of 0.05: if the p-value is smaller than 0.05, the null hypothesis (that the mean ranks of the variables are not significantly different) is rejected, and consequently the variables have different ranks. Tables 9-11 present the Friedman test's ranking results on all datasets.
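As a rough illustration of the test, the plain (unadjusted) Friedman chi-square statistic can be computed from per-block rankings as follows. This is a simplified sketch (ties are not rank-averaged, and it is not the adjusted variant used in the paper), and the score matrix is hypothetical, not the paper's results:

```python
import numpy as np

def friedman_chi2(scores):
    """Friedman chi-square for an (n_blocks, k_algorithms) score matrix."""
    n, k = scores.shape
    # Rank algorithms within each block (1 = lowest score); ties not averaged.
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1
    mean_ranks = ranks.mean(axis=0)
    chi2 = 12 * n / (k * (k + 1)) * np.sum((mean_ranks - (k + 1) / 2) ** 2)
    return chi2, mean_ranks

# Hypothetical scores: 4 blocks (evaluation criteria) x 5 algorithms, where
# the last column always scores highest (identical ordering in every block).
scores = np.array([[0.80, 0.82, 0.85, 0.88, 0.91],
                   [0.78, 0.81, 0.84, 0.87, 0.92],
                   [0.79, 0.83, 0.86, 0.88, 0.90],
                   [0.81, 0.82, 0.85, 0.89, 0.91]])
chi2, mean_ranks = friedman_chi2(scores)
```

With a perfectly consistent ordering across blocks, the mean ranks are 1 through k and the statistic reaches its maximum for that n and k, which is the regime in which the null hypothesis of equal ranks is rejected.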
According to Friedman's test, the FW-CNN-LSTM algorithm obtained the highest rank on all datasets, outperforming the other classification approaches across the evaluation criteria. Logistic regression on the IMDB dataset, the CNN algorithm on the Instagram dataset, and the SVM algorithm on the Taobao dataset follow with the ranks presented in Tables 9-11. Figure 11 displays the performance of the presented algorithm on the top-k users with a high engagement rate and indicates its superiority over the other algorithms. The results also show that the accuracy criterion decreases as k, the number of users whose engagement level is predicted, increases. Accordingly, it can be concluded that sending advertisements to the first k top-ranked users can be significantly effective, as these users are more likely to interact with the advertisements.
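The top-k observation can be operationalized with a simple precision-at-k check: rank users by predicted engagement and measure what fraction of the first k actually engage. The scores and labels below are hypothetical, chosen only to show the metric falling as k grows:

```python
import numpy as np

def precision_at_k(pred_scores, engaged, k):
    """Fraction of the k highest-scoring users who actually engaged."""
    top_k = np.argsort(pred_scores)[::-1][:k]  # indices of the top-k scores
    return engaged[top_k].mean()

pred_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])  # hypothetical scores
engaged     = np.array([1,   1,   0,   1,   0,   0])    # hypothetical labels

p3 = precision_at_k(pred_scores, engaged, k=3)  # 2 of the top 3 engaged
p6 = precision_at_k(pred_scores, engaged, k=6)  # precision drops as k grows
```
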

Conclusion and Future Works
This article proposes a hybrid convolutional method, termed FW-CNN-LSTM, to predict the user engagement rate. The algorithm selects SDs by first defining the attribute weights with XGBoost and then estimating the distance between the selected engagement rate and the one calculated from various attributes with varying weights. The FCM algorithm then combines the feature-weight and XGBoost distance measures into a single cluster for further projection. Subsequently, the selected comparable feature-based data are used as input to a hybrid convolutional CNN-LSTM network to predict the engagement rate. In addition, we develop a model capable of producing accurate results regardless of the dataset or domain; our algorithm is therefore evaluated on three distinct databases: the IMDB, Taobao, and Instagram datasets, associated with the film, advertising, and food industries, respectively. Accuracy, recall, F-measure, and precision are used to analyze the results against baseline methods, namely SVM, logistic regression, Naïve Bayes, and CNN. The presented algorithm achieved accuracy levels of 0.9056 on the Instagram dataset, 0.9107 on the IMDB dataset, and 0.9166 on the Taobao dataset; consequently, the results demonstrate that our algorithm outperforms the others. According to the findings, the hashtag, brand ID, movie title, and actor features achieve the highest scores. Furthermore, the actual training times across various data ratios are relatively linear, which confirms the scalability of the proposed model to large datasets. Our method analyzes the user engagement rate with advertising content, resulting in more targeted advertisements and less spam on social media platforms. For future work, we suggest utilizing relational features and post links to enhance our context-based strategy for predicting and analyzing the user engagement rate with advertisement content.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.