Ensemble Machine Learning Model for Classification of Spam Product Reviews



Introduction
Product reviews express individuals' feelings or views about products or services delivered by particular firms or companies. In the modern technological world, online product reviews occupy a central place in the product evaluation process for a company and its customers. These reviews serve as feedback that helps a company improve product quality and plan and monitor its business ventures, which in turn boosts its productivity and profit.
These reviews also help customers select the right products with less time and effort. Sometimes companies post spam reviews on products to enhance their sales. Detection of these reviews is a challenging task in natural language processing (NLP). An automatic approach is therefore needed to detect spam reviews on products, allowing users to quickly distinguish spam from nonspam product reviews.
Opinions such as online product reviews are a key source of information that guides consumers in purchasing products of interest. Customers post reviews about such products that serve as feedback by describing their good and bad experiences [1]. These experiences also have a great impact on the future of a business. Unsurprisingly, reviews also provide a means of manipulating customers' decision-making through misleading and false reviews. This unethical practice is termed opinion spamming, where spammers write false reviews/opinions either to attract more customers or to harm the reputation of a business or merchandise [1].
Bogus reviews/opinions can be classified into three groups: first, false reviews whose objective is to place bogus and untrusted details about a product to either harm or enrich its reputation; second, reviews that focus on products without mentioning any experience with other products; and third, nonreviews and advertisements comprised of text only and indirectly linked to the product [1].
The first group is technically difficult to identify, while the other two require less effort. The writer of these reviews/opinions is either a single spammer who works for a business or a group of spammers who collaborate to achieve the desired result.
Google also raised concerns about fake reviews in an official report and clearly directed the innovators and users not to purchase or receive payments from firms that supply false reviews [2]. In certain countries, authorities have initiated actions against firms using false reviews to exaggerate their products. For example, a Canadian telecommunication company was fined $1.25 million for posting fake favourable reviews about its products. In addition, the reputation of the CNN application was badly damaged by numerous adverse fake reviews that lowered its rating and status in the Apple App Store [2]. In the past, numerous attempts have been made to detect opinion spam. The researchers in [3] applied a machine learning methodology using logistic regression to spot opinion spam.
The authors in [4][5][6] adopted a supervised machine learning approach to detect spam. Other approaches [7,8] proposed a mixed approach combining supervised and semisupervised learning to identify spam opinions. Despite these efforts, numerous shortcomings were noticed in the machine learning approaches. For instance, the use of numerous features is computationally expensive and yields poor flexibility and less accurate results.
This study uses an ensemble machine learning approach that combines the predictions from three classifiers, namely, Random Forest (RF), multilayer perceptron (MLP), and K-Nearest Neighbour (KNN), to improve the classification accuracy of spam product reviews. The selection of the three classifiers is based on empirical analysis. The proposed ensemble machine learning model functions in the following way: first, we extract 25 features from mobile application reviews of Yelp Dataset and represent the product reviews as feature vectors. Too many features affect the performance of the model, and not all features contribute equally to the predictive model, so nonvaluable features need to be filtered out. Therefore, we employ three different feature selection methods, Chi-square, Univariate, and Information Gain, to select the ten best features. Finally, the proposed ensemble model is used to classify the reviews as spam or real (nonspam).
The effectiveness of the ensemble model is compared with the boosting approaches (XGBoost, GBM AdaBoost, and GBM Gaussian) that are the benchmark techniques for this study. The contributions of this study are as follows:
(1) To propose an ensemble learning model for classification of spam product reviews
(2) To evaluate the ensemble classification model with all features extracted from spam product reviews in terms of classification accuracy
(3) To evaluate the effectiveness of the ensemble classification model with the best features obtained using three feature selection techniques
The paper is organized as follows: the related literature is reviewed in Section 2. The suggested ensemble model is described in Section 3. Experimental results and a full discussion are given in Section 4. Finally, the conclusion and future directions are presented in Section 5.

Related Work
Product reviews that convey user feedback and satisfaction with particular products are widely consulted by customers to assess a product or service. These reviews help consumers in the decision-making process of purchasing products.
They also serve as guidelines for businesses to assess their future customers. Customer reviews related to mobile applications (apps) in the Google Play store use star ratings to indicate the quality of apps to other consumers. In addition, they serve as a tool for producers or manufacturers of such applications to upgrade their products/services to attract more customers [9]. Although product reviews benefit both the consumer and the producer, their usefulness is undermined by spam reviews.
The existence of such unethical reviews is a stumbling block for online businesses, either illegally boosting business profits or destroying product reputation. The identification and classification of spam reviews is needed to safeguard the trust of customers. Various studies [9,10] have presented techniques to identify spam reviews.
Technological advances have made it possible for online companies to sell mobile applications through playstores. Mobile users not only purchase products/applications from these playstores but also have the facility to post reviews that express their views on these products. Some of these product reviews/opinions are spam posted by fake users for their vested interests. The authors in [11] concentrated on fictitious opinions in a crowdsourced fake review dataset, using n-gram-based classifiers to identify spam reviews. The study in [12] also focused on detecting fictitious opinions and compared the behavioral features of spam reviews with the real-life Yelp Dataset through Support Vector Machine (SVM). Compared to the simple n-gram-based approach, the approach relying on behavioral features improved the accuracy of spam classification. Both approaches were tested with generated datasets, that is, fictitious opinions that do not reflect the spam found in real-life product reviews.
The authors in [13] used a ranking-based approach on real-life product reviews and exploited review burstiness to spot opinion spammers. In order to separate spammers from legitimate users, they used Markov Random Field (MRF) and Loopy Belief Propagation (LBP) techniques to study the data. The authors in [14] presented a spam review detection method that uses the rating deviation of the review, the reviewer's activeness, and content-based information, modeled as time series. The main shortcomings of this approach were high computational processing time and limited ability to capture the semantics and information contained in the review text. The study in [15] presented a model called Fraud Eagle, which identified the network and graph associations of fake product reviews using an iterative propagation-based classification approach. It is noted that the Fraud Eagle framework successfully detected fake product reviews on an online review site. The research work in [16] introduced the SPEAGLE framework, which uses information collected from review metadata such as timestamps, ratings, texts, and network review information to identify spam product reviews.
In order to detect spam reviews on the Chinese-language review site DianPing, the study in [17] attempted two approaches: KNN and generic graph-based approaches.
The results of these approaches showed that the behavior of reviewers also helps to detect spam reviews. To detect fake and unknown reviews in the same research area, the study introduced another methodology, Positive-Unlabeled (PU) learning, to identify false opinions using a supervised learning approach. Unknown reviews may be either fake or genuine, whereas fake reviews are always deceptive. To extend these studies, the researchers in [7] proposed a novel semisupervised model called mixing population and individual property PU learning (MPIPUL). It is noted that PU learning operated well on balanced datasets but was not validated on imbalanced datasets. The main limitation of these approaches is that the reviews are language-specific, that is, in Chinese.
Current methodologies for detecting spam product reviews/opinions suffer from imbalanced datasets, which is why their results may not be fully trusted. The method in [14], which uses the rating deviation of the review, the reviewer's activeness, and content-based information via time series, was a good approach for spam detection, but it suffers from high computational processing time and is poor at interpreting the semantics and information included in the review text. Therefore, the study in [6] proposed a neural network approach that combines a recurrent neural network and a convolutional neural network to learn a continuous document-level representation of the reviews. Compared to discrete models, the results of this work showed superior generalization ability.
The process of spam detection includes spotting the user accounts from which spamming activities are performed for malicious purposes. Detection approaches such as n-gram, linguistic, and pattern-based methods are unable to detect well-equipped spammers who write their reviews in a manner that seems real. Therefore, the study in [18] introduced an approach based on heterogeneous graphs to capture the connections existing among the reviewers and the reviews. This methodology does not use any textual content from the reviews and can increase the chances of identifying opinion spammers. The research work in [19] concentrated on network footprints and presented a two-step approach to detect spammed products and spammer groups. This approach comprises two main modules, that is, Network Footprint Score (NFS) and GroupStrainer. The results showed that this methodology surpassed other approaches that had studied iTunes and Amazon datasets in spam detection, with a high accuracy rate.
The authors in [20] used Naïve Bayes, maximum entropy, support vector machine (SVM), and RF techniques on the iPhone mobile review dataset collected from Kaggle. Part of speech (POS) tagging and vectorized features were exploited to detect spam reviews. The best accuracy was given by RF. The authors in [9] used sentiment analysis as a feature on a movie reviews dataset to detect spam reviews. Naïve Bayes performed better than the other machine learning classifiers. The authors in [21] used Naïve Bayes (NB), SVM, KNN, and Decision Tree (DT) to classify movie reviews via sentiment analysis, building the feature vectors with and without stop words. The authors in [22] used Count Vectorizer and TF-IDF features with the SVM classifier on MTurk and Yelp Amazon Dataset of different product reviews. The study in [23] used logistic regression, Naïve Bayes, RF, SVM, and a deep neural network classifier on a dataset of Amazon product reviews using TF-IDF features and found that the deep neural network performed better than the other machine learning classifiers. The authors in [24] developed an automatic system to identify rumours in online business reviews by classifying them as rumours and nonrumours using several machine learning classifiers.
In recent research studies, researchers have also exploited supervised boosting approaches based on statistical features to achieve good accuracy in detecting fake product reviews. To improve the detection of opinion spam in mobile application playstores, the authors in [25] proposed a methodology based on statistical features that are modeled via supervised boosting techniques such as GBM and XGBoost, and evaluated it on two datasets in different languages, English and Malay. The evaluation showed that XGBoost is the most appropriate for spotting spam opinions in the English language dataset, whereas GBM Gaussian is appropriate for the Malay dataset; compared to other approaches, statistical features attained better accuracy on both datasets. We propose an ensemble supervised machine learning approach for classification of spam product reviews. The detailed framework/structure of this methodology is given below.

Proposed Methodology
This section presents the methodology of the suggested ensemble model for spam product reviews classification, as shown in Figure 1.
The methodology consists of four phases: preprocessing, feature extraction, feature selection, and the classification model for spam product reviews.
3.1. Preprocessing. In computational linguistics, data preprocessing is a vital step that cleans unwanted data so that the cleaned data can be used efficiently in further processing or as input to the system. This phase comprises sentence segmentation, tokenization, stop word removal, and word stemming, which are discussed below.

Sentence Segmentation.
It is used to detect sentence boundaries and split the text into sentences. Commonly, the exclamation mark (!), question mark (?), and full stop (.) are used as indicators to segment the text.
For instance, given the input product review text "I purchased this product. It is the finest product available in this marketplace.", we get the two following sentences after segmentation:
Segment 1: "I purchased this product."
Segment 2: "It is the finest product available in this marketplace."

3.1.2. Tokenization. At this step, the sentences are divided into distinct words by splitting them at whitespace, such as tabs and blanks, and at punctuation signs, that is, dot (.), comma (,), semicolon (;), colon (:), and so forth. These are the main indications for dividing the text into tokens.
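The segmentation and tokenization steps can be illustrated with a minimal sketch. The function names `segment_sentences` and `tokenize` and the regular expressions are assumptions made for illustration; the paper does not state which tool was used.

```python
import re

def segment_sentences(text):
    """Split text into sentences at '.', '!' or '?' (delimiters are kept).

    Trailing text without a closing delimiter is returned as a final segment.
    """
    parts = re.findall(r"[^.!?]+(?:[.!?]+|$)", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    """Split a sentence into word tokens, discarding whitespace and punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

review = ("I purchased this product."
          "It is the finest product available in this marketplace.")
sentences = segment_sentences(review)
tokens = [tokenize(s) for s in sentences]
```

Here `sentences` holds the two segments from the example above, and `tokens` holds one word list per segment.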

Stop Words Removal.
Words that occur very frequently in text but carry little meaning are called stop words. These consist of prepositions (in, on, at, etc.), conjunctions (and, also, thus, etc.), articles (a, an, and the), and so forth. These words have little meaning in the text documents, and removing them from the text helps to improve the system performance.
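A minimal sketch of stop word removal follows. The stop list contains only the example words named above; a real stop list would be considerably longer, and the name `remove_stop_words` is an assumption for illustration.

```python
# Illustrative stop list built only from the examples given in the text.
STOP_WORDS = {"in", "on", "at", "and", "also", "thus", "a", "an", "the"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens whose lowercase form is not in the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["It", "is", "the", "finest", "product",
          "available", "in", "this", "marketplace"]
filtered = remove_stop_words(tokens)
```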

Word Stemming.
Word stemming plays a vital role in preprocessing. In order to capture the related concept, this step reduces derived words to their base or stem form.
The well-known stemming algorithm, Porter's stemming (Porter, 1980), is adopted to remove suffixes like -ing, -es, and -ers from the words. For example, the words "rising" and "rises" will be changed to the base form "rise" after stemming.
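The idea of suffix stripping can be sketched as below. This is a deliberately simplified stripper written for illustration, not the full Porter algorithm: it handles only -ing, -ers, and a plural -s, and restores a final "e" when the remaining stem ends in consonant-vowel-consonant. The name `simple_stem` and its rules are assumptions.

```python
VOWELS = set("aeiou")

def _ends_cvc(stem):
    """True if the stem ends consonant-vowel-consonant (last one not w, x, y)."""
    if len(stem) < 3:
        return False
    c1, v, c2 = stem[-3], stem[-2], stem[-1]
    return (c1 not in VOWELS and v in VOWELS
            and c2 not in VOWELS and c2 not in "wxy")

def simple_stem(word):
    """Strip a few common suffixes; a toy stand-in for Porter stemming."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Restore the dropped 'e' for short stems such as 'ris' -> 'rise'.
        return stem + "e" if _ends_cvc(stem) else stem
    if word.endswith("ers") and len(word) > 4:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

stems = [simple_stem(w) for w in ["rising", "rises", "walking", "reviewers"]]
```

Because the rules are so coarse, this sketch will mis-stem many words that the full Porter algorithm handles correctly; it is meant only to show the mechanism.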

Features Extraction.
Features play a significant role in text classification problems. The purpose of this step is to extract features from review datasets for the product reviews classification problem. In this study, we extracted 25 features from mobile application reviews of Yelp Dataset. Almost all of these features are statistical and can be calculated directly from text.
The proposed ensemble and the benchmark boosting approaches are tested with all 25 features for the task of spam reviews classification, as shown in Table 1. The description of all these features is presented in Table 2.
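To illustrate what "statistical features calculated directly from text" can look like, the sketch below computes a handful of counts from a raw review. The feature names here are hypothetical examples chosen for illustration; they are not the 25 features of Table 2.

```python
def extract_features(review):
    """Compute a few illustrative statistical features from raw review text."""
    words = review.split()
    n_words = len(words)
    return {
        "char_count": len(review),
        "word_count": n_words,
        "avg_word_length": (sum(len(w) for w in words) / n_words) if n_words else 0.0,
        "exclamation_count": review.count("!"),
        # Words written fully in capitals (single letters such as 'I' are ignored).
        "uppercase_word_count": sum(1 for w in words if w.isupper() and len(w) > 1),
        "digit_count": sum(ch.isdigit() for ch in review),
    }

review = "BEST app ever!! Rated 10 of 10."
features = extract_features(review)
```

Each review is thus mapped to a fixed-length numeric vector that a classifier can consume.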

Features Selection.
It is generally not a good idea to use all 25 features (in our case) to classify product reviews as spam or nonspam, since not all features have the same relevance in constructing a reliable and accurate predictive model. Some features are valuable and contribute more to model prediction, while others are less valuable and can seriously degrade the effectiveness of the model. Moreover, keeping only the relevant and valuable features helps avoid overfitting, enhances accuracy, and lessens the training time of the predictive model. In order to address this issue, we exploited three feature filtering techniques, that is, Chi-square, Information Gain, and Univariate, to reduce the size of the feature space and obtain optimal features, as discussed in Section 3.5. Using these feature selection techniques, ten important and relevant statistical features are selected out of twenty-five from mobile application reviews of Yelp Dataset.
Table 3 shows the ten most salient features selected by the Chi-square technique for the Yelp Dataset. The ten best features chosen by the Univariate selection technique from the same dataset are given in Table 4. Finally, Table 5 lists the ten best features selected using Information Gain. The next section shows that the classifiers with the selected optimal features have performed well in comparison to all features. More specifically, given the reduced optimal features, the proposed ensemble model outperformed all the classifiers, as discussed in Section 4.2.
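A minimal sketch of filter-based feature scoring is given below for two of the criteria, Chi-square and Information Gain. It assumes features have already been discretized (here, binarized); the toy columns `feat_a` and `feat_b` and the helper `select_top_k` are hypothetical and serve only to show the ranking mechanism.

```python
import math
from collections import Counter

def chi_square(feature, labels):
    """Chi-square statistic of independence between a discrete feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts, y_counts = Counter(feature), Counter(labels)
    stat = 0.0
    for f, nf in f_counts.items():
        for y, ny in y_counts.items():
            expected = nf * ny / n
            stat += (joint.get((f, y), 0) - expected) ** 2 / expected
    return stat

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in class entropy obtained by splitting on the feature values."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def select_top_k(columns, labels, score_fn, k):
    """Rank feature columns by score and return the names of the k best."""
    scores = {name: score_fn(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

labels = [1, 1, 1, 1, 0, 0, 0, 0]            # 1 = spam, 0 = real
columns = {
    "feat_a": [1, 1, 1, 1, 0, 0, 0, 0],      # perfectly aligned with the class
    "feat_b": [1, 0, 1, 0, 1, 0, 1, 0],      # independent of the class
}
best = select_top_k(columns, labels, chi_square, k=1)
```

In the study itself, the same kind of ranking is applied with k = 10 over the 25 extracted features.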

Classification Model for Spam Product Reviews.
The focus of this phase is to categorize the product reviews as spam or real (nonspam), using an ensemble learning model. Ensemble learning enhances the results of machine learning by integrating numerous models. This approach allows the creation of an improved predictive model compared to a single model.
In this study, the Simple Majority Voting Ensemble, or Voting Classifier, has been employed to combine the predictions from multiple machine learning algorithms (MLP, Random Forest, and KNN) in order to get an improved combined result. Once the Voting Classifier has been trained, it can be used to predict the label of a new instance based on the majority vote of the contributing models. In order to evaluate the effectiveness of the individual models and the ensemble model, we first train and test the individual models on the product review dataset using 10-fold cross validation. Then, we train our proposed ensemble classifier on the same review dataset using 10-fold cross validation. MLP, RF, and KNN are state-of-the-art algorithms and have been proven to be very effective in addressing text classification problems. RF is commonly used as a baseline in text classification problems. It is an ensemble learning approach for classification that operates by creating a number of decision trees at training time and predicting the most frequent class decided by the contributing decision trees. The KNN algorithm works by calculating the distance (given in equations (1)-(3)) between a query and all examples in the data, picking the specified number of examples (K) that are nearest to the query, and assigning the most frequent label among them [9].
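The majority-vote step of the Voting Classifier can be sketched as follows. The prediction lists are made-up placeholders standing in for the outputs of trained RF, MLP, and KNN models; `majority_vote` is a hypothetical helper name.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one list by simple majority.

    predictions: list of equal-length label lists, one per contributing model.
    """
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Placeholder outputs of the three contributing models for four reviews.
rf_pred = ["spam", "real", "real", "spam"]
mlp_pred = ["spam", "spam", "real", "real"]
knn_pred = ["real", "spam", "real", "spam"]

ensemble_pred = majority_vote([rf_pred, mlp_pred, knn_pred])
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of models.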
In classification problems, different K values in the KNN algorithm yield different classification results. A good value of K is therefore determined by running the experiment several times with different values of K and choosing the one that gives the best classification results.
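A minimal KNN sketch is shown below. The three distance functions (Euclidean, Manhattan, and Minkowski) are commonly used with KNN and are an assumption here about what equations (1)-(3) contain; the toy training points are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Label the query with the most frequent class among its k nearest examples."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy two-dimensional feature vectors forming two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["spam", "spam", "spam", "real", "real", "real"]

label_a = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
label_b = knn_predict(train_X, train_y, (5.0, 4.9), k=3, distance=manhattan)
```

Choosing K as described above amounts to calling `knn_predict` with several values of `k` inside a validation loop and keeping the value with the best accuracy.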
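RF works by building a number of decision trees at training time and predicting the most frequent class decided by the contributing decision trees. RF employs the Gini Index and Entropy for classification purposes, which, in their standard form, are given by the two following equations:
$$\text{Gini} = 1 - \sum_{i=1}^{c} p_i^{2}$$
$$\text{Entropy} = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where $p_i$ is the proportion of samples belonging to class $i$ at a node and $c$ is the number of classes.
MLP is often colloquially referred to as a "vanilla" neural network, particularly when it has only one hidden layer [10]. As mentioned earlier in this section, this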

Datasets for Evaluation.
The proposed ensemble model is assessed on Yelp Dataset [14], which is a publicly available dataset that contains English reviews/opinions from several hotels and restaurants. This dataset is widely used in the spam review detection problem. The dataset contains a total of 2526 opinions/reviews taken from Yelp's hotel reviews. It includes 389 spam and 2136 normal opinions.
For the task of spam reviews classification, we also compared the proposed ensemble classifier with the benchmark models [11] in terms of the performance metric, that is, classification accuracy.
The benchmark models used boosting techniques such as XGBoost, GBM AdaBoost, and GBM Gaussian classifiers, while the proposed ensemble model combined the predictions from machine learning classifiers such as KNN, RF, and MLP.

Evaluation Results and Discussion
First of all, the preprocessing methods are applied to the given dataset to split the review text into sentences, tokenize the sentences into words, and remove the stop words. Word stemming is then performed on the remaining words. Initially, we extracted all 25 statistical features from Yelp Dataset for the task of spam review classification. We evaluated the effectiveness of the proposed ensemble approach, the individual classifiers (MLP, RF, and KNN), and the benchmark boosting approaches for the spam review classification task. Not all features have the same significance in constructing a reliable and accurate predictive model. Some features are valuable and contribute more to model prediction, while others are less valuable and adversely affect the model performance. In order to get rid of nonvaluable and extra features, three feature selection techniques (Chi-square, Univariate, and Information Gain) are adopted to extract the top 10 features from the review dataset. In order to investigate the impact of the reduced optimal feature set on classification accuracy, the proposed ensemble learning model, the individual models, and the boosting approaches are tested again using the top 10 features obtained using the mentioned feature selection techniques.
To perform spam review classification, the RF, KNN, and MLP classifiers are adopted to classify the reviews as spam or nonspam. The training and testing of all the classifiers, including RF, KNN, and MLP, the proposed ensemble model, and the boosting approaches (GBM Gaussian, XGBoost, and GBM AdaBoost), are performed using stratified 10-fold cross validation (SCV). In SCV, the folds are picked in such a way that each fold contains roughly the same proportion of class labels.
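The fold construction in stratified cross validation can be sketched as follows. This is a simplified round-robin assignment written for illustration (a real experiment would normally shuffle the indices first); the function names and the toy label list are assumptions.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so each fold keeps the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class round-robin across the folds.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def train_test_splits(labels, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

# Toy imbalanced label list: 20 spam and 80 real reviews.
labels = ["spam"] * 20 + ["real"] * 80
folds = stratified_folds(labels, k=10)
splits = list(train_test_splits(labels, k=10))
```

Each model is trained on the training indices of a split and scored on its test indices, and the reported accuracy is the average over the ten splits.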
Classification results of all classifiers adopted in this work, with all features and with the features chosen by the three selection techniques, are presented in this section. The individual models (RF, KNN, and MLP), the ensemble model (RF, KNN, and MLP), GBM Gaussian, XGBoost, and GBM AdaBoost are used in this study. In the first phase, the performances of the individual models, the ensemble model, and the boosting models are compared using all 25 features of Yelp Dataset. The results in Table 1 show that the proposed ensemble model has the highest accuracy, 88.13%, compared to the other classifiers. In the second phase, the feature space is reduced using the Chi-square feature selection method, and the top 10 features are selected for Yelp Dataset as given in Table 3. Given the 10 best features, all the classifiers are applied to Yelp Dataset for the spam review classification task.
The classification outcomes, given in Table 6, demonstrate that the proposed ensemble model achieved an accuracy of 89.26% and performed better than the individual models and the boosting approaches on the ten best features selected using the Chi-square feature selection technique.
Referring to the results given in Table 6, the proposed ensemble model achieved the highest accuracy of 89.26 percent, RF obtained the second highest accuracy of 85.72 percent, the accuracy of GBM AdaBoost was 85.59 percent, XGBoost accuracy was 85.03 percent, and GBM Gaussian got the lowest accuracy among the boosting approaches, 84.74 percent. Among the individual models, RF achieved the highest accuracy, which is even better than the boosting approaches. However, MLP obtained the lowest accuracy of 84.50 percent.
In the third phase, the Univariate feature selection method is used to filter the feature space in order to choose the top 10 features for Yelp Dataset, as shown in Table 4. Given the 10 best features, all the classifiers are applied to Yelp Dataset for the spam review classification task. The classification outcomes, given in Table 7, reveal that the proposed ensemble model attained an accuracy of 88.70% and performed better than the individual models and the boosting approaches on the ten best features selected using the Univariate selection technique.
Similarly, in the fourth phase, Information Gain is used to select the 10 optimal features from all features of Yelp Dataset, as shown in Table 5.
Given the 10 best features, all classifiers are applied to Yelp Dataset for the spam review classification task. The classification outcomes are given in Table 8 and reveal that the proposed ensemble model achieved an accuracy of 88.13% and performed better than the individual models and the boosting approaches.
Figure 2 depicts the classification accuracy of the proposed ensemble model and other benchmark classifiers on all features and top 10 features obtained using Chi-square, Univariate, and Information Gain selection techniques.
From the results shown in Figure 2, we observed that accuracy of classifiers either improved, remained unaffected, or slightly downgraded with the best selected features compared to the accuracy of classifiers obtained using all features.It is also noteworthy that the proposed ensemble model (RF, KNN, and MLP) surpasses all the benchmark classifiers on top 10 best features obtained using aforementioned selection techniques.From the empirical results given in Tables 1 and 6

Conclusion and Future Work
Spam product review classification is a difficult task in the area of opinion mining. Numerous research efforts have attempted to address this issue. In this study, we present an ensemble model that combines the predictions from MLP, KNN, and RF to classify product reviews as spam or nonspam. For the task of spam review classification, we studied the impact of all 25 statistical features on the proposed ensemble model, the individual models, and the boosting approaches. We found from the empirical results that the proposed ensemble model outperformed all classifiers in terms of classification accuracy. In the next step, we employed feature selection techniques (Chi-square, Univariate, and Information Gain) to extract the top 10 features from the reviews dataset. The performances of the proposed ensemble model and the other classifiers were evaluated using the 10 best features obtained with the three selection techniques, and we found from the experimental outcomes that the ensemble model surpassed all the classifiers in terms of accuracy for the task of spam review classification on Yelp Dataset. Hence, the results indicate that the proposed ensemble approach is superior to boosting approaches such as Extreme Gradient Boost (XGBoost), the Generalized Boosted Regression Model (GBM), and the AdaBoost Regression Model. In the future, we want to explore deep learning approaches and long short-term memory with weighted TF-IDF embedding for the task of spam review classification.

Figure 1: Proposed ensemble model for spam product reviews classification.

Table 1: Classification results using all features on Yelp Dataset.

Table 2: All features in Yelp Dataset.

Table 3: Ten best features chosen by Chi-square for Yelp Dataset.

Table 4: Ten best features selected by Univariate for Yelp Dataset.

Table 5: Ten best features selected by Information Gain for Yelp Dataset.

Table 6: Classification results using top 10 features selected by Chi-square.

Table 7: Classification results using top 10 features selected by Univariate.
Maximum accuracy achieved is highlighted in bold.
-8, we concluded the following:
(1) The accuracy of the proposed ensemble model improved with the best features obtained using Chi-square and Univariate selection techniques, while it remained constant with Information Gain (IG)
(2) The accuracy of the GBM Gaussian remained constant on all feature selection techniques
(3) The accuracy of XGBoost and GBM AdaBoost either remained constant or slightly downgraded on best features
(4) The accuracy of RF and KNN classifiers either improved or slightly downgraded on best features
Overall, the classification accuracy of the proposed ensemble model is superior to those of all individual models as well as other boosting approaches.

Figure 2: Classifiers accuracy on top 10 features set obtained using three feature selection techniques (Chi-square, Univariate, and Information Gain) for Yelp Dataset.