A Comparative Analysis of Fraudulent Recruitment Advertisement Detection Methods in the IoT Environment

The growth of the Internet of Things has changed the way people look for jobs. Online recruitment has gradually replaced the traditional offline recruitment mode. Some unscrupulous people use online recruitment platforms to publish fraudulent recruitment advertisements, which not only cause financial and reputational losses to job seekers but also harm the sustainable development of society. However, previous studies have not used unified evaluation metrics and datasets, and the detection of fraudulent recruitment advertisements lacks systematic research. To address this problem, this paper selects four representative traditional learning methods (i.e., random forest, support vector machine (SVM), logistic regression, and Naïve Bayes) and three deep learning methods (i.e., TextCNN, gated recurrent unit (GRU), and bidirectional long short-term memory (Bi-LSTM)) that perform well in natural language processing (NLP), and conducts comparative experiments on balanced and unbalanced datasets using the same evaluation metrics and datasets. The experimental results show that the TextCNN method achieves the best detection performance with relatively low energy consumption on the balanced dataset, with all metric values greater than 0.93. On the unbalanced datasets, the TextCNN method still performs best as the imbalance ratio increases.


Introduction
With the development of the Internet of Things (IoT), people can quickly obtain information from electronic devices. At the same time, the primary recruitment method in the labor market has rapidly shifted from offline to online, and obtaining recruitment information from the Internet has become mainstream. IoT has changed the inefficient way of finding a job: online recruitment is effective, easy, and efficient [1]. However, some unscrupulous people exploit the weaknesses of network platforms to post fraudulent recruitment advertisements on the Internet, defrauding job seekers of money and exploiting their labor in the name of recruitment. Some fake recruitment has escalated from fraud to violent robbery, threats, restriction of personal freedom, and other serious violations [2]. Fraudulent recruitment advertisements have become a nationwide social nuisance. According to the 2017 China Internet Users' Consumer Protection Rights Report (https://wenku.baidu.com/view/f50682067ed5360cba1aa8114431b90d6d85894e.html), among all the fraud cases reported for rights protection, fraudulent part-time work was the most frequently reported type (22.1%), and most of the fraudulent recruitment occurred on well-known recruitment platforms. Data from China's Justice Big Data Service Platform (http://data.court.gov.cn/pages/research.html) show that fraudulent recruitment cases have increased year by year. The 2019 China Internet Recruitment Industry Market Research by Ai Media Consulting (https://www.iimedia.cn/c400/63879.html) shows that, among the bad experiences on online platforms, what job seekers minded most was untrue enterprise information (34.8%), followed by personal information leakage (31.8%). Clearly, detecting fraudulent recruitment advertisements is a critical problem that urgently needs to be solved.
Detecting fraudulent recruitment advertisements from data generated by IoT systems not only helps safeguard the rights and interests of job seekers but also promotes economic growth and supports a green IoT environment. Figure 1 shows the detection process for fraudulent recruitment advertisements in an IoT environment. However, this remains a relatively unexplored area that has received little attention from the academic community. In addition, distinguishing fraudulent recruitment advertisements from legitimate ones is a technically challenging problem [3]. Most research on fraudulent recruitment advertisements is theoretical. For example, Rubin [4] analyzed the causes of and countermeasures against commercial fraud through advertising from the perspective of information economics, stating that deception is the manipulation of information to gain advantage. From a legal perspective, Jiang [5] put forward the slogan "Taking the law as a guarantee, strengthening advertising supervision functions, increasing rectification efforts, and cracking down on false and fraudulent advertisements." Technology for detecting fraudulent recruitment advertisements remains limited and immature. According to the methods adopted, the few existing studies on detecting fraudulent recruitment advertisements can generally be divided into three categories: traditional learning-based detection methods, traditional learning + feature extraction-based detection methods, and deep learning-based detection methods. Traditional learning-based detection methods mainly use traditional learning algorithms to detect fake job advertisements. Traditional learning + feature extraction-based detection methods mainly use feature extraction methods to improve the performance of traditional learning algorithms. Deep learning-based detection methods use various deep learning algorithms to detect employment fraud without feature engineering.
What these detection methods have in common is that they identify implicit fraud patterns in data. However, existing studies have used different detection methods, feature extraction methods, evaluation metrics, and datasets, so a systematic study of fake recruitment advertisement detection is needed. To this end, this paper selects seven algorithms that perform well in NLP and conducts systematic and comprehensive experiments using the same evaluation metrics and datasets. The main contributions are summarized as follows.
(1) The existing methods for detecting fraudulent recruitment advertisements are described and analyzed in detail.
(2) A comparative analysis of the existing work is carried out experimentally using the same dataset and evaluation metrics. Seven algorithms are compared on the public employment fraud detection dataset, and the experimental results are analyzed in detail.
(3) The experimental results show that TextCNN, a deep learning method, outperforms all other compared methods in accuracy, precision, recall, and F1-score. In terms of time performance, although the training time of TextCNN is much higher than that of the traditional learning methods and the traditional learning + feature extraction methods, its testing time is similar to that of SVM and SVM+TF-IDF, which is acceptable for IoT devices. Therefore, considering accuracy, precision, recall, F1-score, and testing time together, TextCNN performs best among all the compared methods.
(4) This paper provides a reference for further research on detection methods with higher performance and lower energy consumption, toward the goal of green IoT.

The organization of this paper is as follows. Section 2 reviews the relevant research on fraudulent recruitment detection. Section 3 introduces the representative methods. Section 4 presents the experimental settings and analyzes the experimental results. Section 5 summarizes the work of this paper and discusses future work.

Related Work
This section reviews the related work from three categories (i.e., the traditional learning-based detection methods, the traditional learning + feature extraction-based detection methods, and the deep learning-based detection methods) that we classify in Section 1 in detail.

Traditional Learning-Based Detection Methods. Some researchers used traditional learning-based methods to learn detection rules. For example, Vidros et al. [6] were the first to analyze employment fraud. They explained in detail the workflow of online recruitment and the role of the applicant tracking system in it, and summarized a set of rules of thumb from an analysis of real-world data. They also discussed in detail spam [7], insurance fraud, and phishing, which are highly similar to recruitment fraud.
Building on this work [6], a more comprehensive and extensive study [8] contributed and evaluated a real dataset of 17880 recruitment advertisements, the Employment Scam Aegean Dataset (EMSCAD) (https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction). On a subset of this dataset, they built bag-of-words models and trained six WEKA (http://www.cs.waikato.ac.nz/ml/weka) classifiers (logistic regression, J48 decision trees, Naïve Bayes, random forest, etc.). As a result, an extended empirical ruleset was derived.

Journal of Sensors
Based on supervised learning methods, Dutta and Bandyopadhyay [9] proposed an automatic classification tool that used single and ensemble classifiers to detect fraudulent recruitment advertisements. The single classifiers were Naïve Bayes, multilayer perceptron (MLP), K-nearest neighbor (KNN), and decision tree, and the ensemble classifiers were random forest and AdaBoost. They compared the performance of these classifiers on the original, highly unbalanced dataset.
To collect data comprehensively, recruitment advertisements should be drawn from various sources. To this end, Nindyati and Nugraha [10] built the Indonesian Employment Scam Dataset (IESD) from Indonesian recruitment data and labeled it manually based on empirical analysis and public reports of employment scam complaints. They took into account the platforms on which fraudulent recruitment advertisements were posted. In addition, they added behavioral features to those of the previous study [8], using behavioral activities as contextual features for fraud detection. Naïve Bayes, logistic regression, KNN, decision tree, and SVM were applied as classifiers. The results indicated that adding behavioral features can improve the detection of fake recruitment advertisements.

Traditional Learning + Feature Extraction-Based Detection Methods. Following the feature classification proposed by Vidros et al. [8], the study in [3] divided features into three categories in the feature extraction stage, i.e., language-based, context-based, and metadata-based features. It selected J48, logistic regression, and random forest as three baselines and then combined voting techniques (average vote, majority vote, and maximum vote) with these baselines to construct detection models. Moreover, it evaluated the models on unbalanced datasets, which made the experiments more comprehensive.
The lack of sufficient background information on recruitment websites makes detecting fraudulent recruitment advertisements even more challenging. To address this problem, Mahbub and Pardede [11] focused on a novel feature space design that extracted additional information about recruiting companies. They extracted contextual features manually and considered them in addition to textual and structural features. Experimental results with the Naïve Bayes, J48, and JRip classifiers suggested that adding contextual features improved detection performance and further enriched the ruleset.
Alghamdi and Alharby [12] used an ensemble method based on the random forest classifier to detect fraudulent recruitment advertisements. The SVM algorithm was used to extract the main features, including company profile, company logo, and required experience. Moreover, detection performance was improved by the reliable model obtained in the preprocessing and feature selection stages.
Mehboob and Malik [13] focused on the influential features of the EMSCAD to detect fake recruitment advertisements. They used information gain to select significant features. Their experimental results showed that the company profile, salary range, organization type, and required education were the most influential. Therefore, they considered combining well-performing features and adding valuable features to help improve the model's performance. In addition, they established a robust detection model using gradient boosting techniques.

Deep Learning-Based Methods.
Fraudsters may know the ruleset in advance, which makes detecting fraudulent recruitment advertisements increasingly tricky. Kim et al. [14] believed there was an internal correlation between frauds, so they proposed a deep neural network algorithm based on hierarchical clustering to detect implicit fraud in data. They took the anomaly characteristics as the initial weights of the deep neural network and trained them with an autoencoder. This method discarded the feature information, took the global and local data structure as the entry point, and used clustering and deep neural networks to reveal the internal relationships between frauds.

Summary.
Existing studies have done valuable work on detecting fraudulent recruitment advertisements. However, rulesets for detecting fake job advertisements extend poorly and are difficult to apply to new datasets. At the same time, existing studies used different detection methods, evaluation metrics, and datasets and lacked a comparative analysis, so systematic research on detecting fake job advertisements is still missing.

To solve this problem, this paper selects four traditional algorithms, namely random forest, logistic regression, SVM, and Naïve Bayes, which are frequently used in the above literature and have proven effective on various datasets under various evaluation metrics. In addition, three popular deep learning algorithms that perform well in NLP, namely GRU, Bi-LSTM, and TextCNN, are adopted to detect fake recruitment advertisements. For the traditional learning methods, the Bag of Words (BoW) algorithm and the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm [15] are adopted for feature extraction. For the deep learning methods, we also try pretrained word embeddings, namely Word to Vector (Word2Vec) [16] and Global Vectors (GloVe) [17]. We use the same evaluation metrics and datasets to carry out systematic and comprehensive comparative experiments.
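As a minimal sketch of the two feature extraction schemes named above, the following pure-Python code builds BoW count vectors and textbook TF-IDF weights (tf = term count divided by document length, idf = log(N/df)). Whitespace tokenization and this exact weighting formula are assumptions for illustration; the paper does not state which variant it used, and library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed idf and vector normalization by default.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count (BoW) vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[term] for term in vocab] for c in counts]


def tf_idf(docs):
    """Textbook TF-IDF: tf = count / document length, idf = log(N / df)."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in bow:
        length = sum(row)
        weights.append([(cnt / length) * idf[j] if length else 0.0
                        for j, cnt in enumerate(row)])
    return vocab, weights


# Hypothetical toy advertisements, not taken from EMSCAD.
docs = [
    "earn money fast from home",
    "software engineer with python experience",
    "earn money now",
]
vocab, weights = tf_idf(docs)
```

Here "earn" occurs in two of the three toy ads and so receives a lower idf than "python", which occurs in only one; a term present in every document would get weight 0.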

Comparison and Analysis of Fraudulent Recruitment Advertisement Detection Methods
Since this paper is aimed at comparing and analyzing the detection methods used in the existing work experimentally, this section will analyze the four typical traditional learning methods frequently adopted in recruitment advertising detection literature [3, 8-10, 12, 13] and the three currently popular deep learning methods. In detail, the four traditional learning methods include random forest, logistic regression, SVM, and Naïve Bayes. The three deep learning methods include GRU, Bi-LSTM, and TextCNN.
3.1. Random Forest Method. Random forest, a classical ensemble learning method, was first proposed by Breiman [18], who combined Bagging ensemble learning theory [19] with the random subspace method [20]. Random forest is a classifier built from decision trees; by outputting the majority vote of many trees, it overcomes the performance bottleneck of a single decision tree. In addition, random forest tolerates noise and outliers better, scales better to high-dimensional classification, and generalizes better. Moreover, random forest is a data-driven nonparametric classification method that learns classification rules from the given samples without prior knowledge.
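The voting idea can be illustrated with a deliberately simplified, pure-Python sketch. Each "tree" here is only a one-level decision stump, an assumption made to keep the example short (real random forests grow full decision trees). Every stump is trained on a bootstrap sample (Bagging) using a random subset of the features, and the forest predicts by majority vote. Breiman's random forest draws a fresh feature subset at every split; with a single split per stump, per-tree and per-split sampling coincide. The function names and the parameters `n_trees` and `n_sub_features` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Return the one-split classifier (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for polarity in (0, 1):
                # Predict `polarity` when x[f] >= thr, otherwise the other class.
                errors = sum((polarity if row[f] >= thr else 1 - polarity) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, polarity)
    _, f, thr, polarity = best
    return lambda x: polarity if x[f] >= thr else 1 - polarity


def fit_forest(X, y, n_trees=25, n_sub_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (Bagging)
        feats = rng.sample(range(d), n_sub_features)    # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, x):
    """Majority vote over the individual trees (labels: 0 = legitimate, 1 = fraudulent)."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


# Hypothetical two-feature toy data.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
```

A few stumps may be trained on a bootstrap sample that happens to contain only one class and therefore always vote for that class; the majority vote absorbs such individual errors, which is the point of the ensemble.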

SVM Method.
Based on statistical learning theory [21], SVM is a data mining method that can successfully handle regression and pattern recognition problems. SVM searches for the optimal hyperplane in the feature space, i.e., the hyperplane that separates the classes with the maximum margin. SVM is one of the most commonly used and effective classifiers, and the principle of structural risk minimization gives it good generalization performance. Moreover, SVM has a solid theoretical basis and a specific mathematical model and has attracted wide attention from researchers since it was proposed.
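To make the search for the optimal hyperplane concrete, here is a small pure-Python linear SVM trained with a Pegasos-style stochastic subgradient method on the regularized hinge loss. This solver is an illustrative assumption: the paper does not say which SVM solver or kernel it used, and libraries typically rely on dedicated solvers such as LIBSVM or LIBLINEAR. Labels are +1/-1, and a constant feature is appended so that the bias is learned (and regularized) as part of `w`; `lam` and `epochs` are hypothetical settings.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM; y must contain +1 / -1 labels.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)) with step size 1/(lam*t).
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]   # constant feature acts as the bias term
    w = [0.0] * len(Xa[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xa)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # subgradient step for the regularizer
            if margin < 1.0:                            # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def svm_predict(w, x):
    """Sign of the decision function <w, [x, 1]>."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

Only samples that fall inside the margin (`margin < 1`) move the hyperplane, which mirrors the role of support vectors in the SVM formulation.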

Naïve Bayes Method.
Based on a probability model, Naïve Bayes was proposed by Maron and Kuhns [22]. The "naive" in Naïve Bayes refers to two primary assumptions: conditional independence and positional independence. The conditional independence assumption holds that feature values are independent of each other given the class, namely, there is no dependency between terms. The positional independence assumption means that a term's position in the document does not affect the probability calculation.
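Both assumptions can be seen in a minimal multinomial Naïve Bayes text classifier. The sketch below is pure Python with Laplace (add-one) smoothing; the multinomial event model, whitespace tokenization, and the toy labels `"fraud"`/`"legit"` are assumptions for illustration, not details taken from [22] or from the paper's experiments.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        model[c] = {
            "log_prior": math.log(class_docs[c] / len(docs)),
            "log_lik": {w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
                        for w in vocab},
        }
    return model, vocab


def predict_nb(model, vocab, doc):
    scores = {}
    for c, params in model.items():
        # Conditional independence: the document likelihood is a sum of per-token log-likelihoods.
        # Positional independence: only token identity matters, not where the token appears.
        scores[c] = params["log_prior"] + sum(params["log_lik"][w]
                                              for w in doc.lower().split() if w in vocab)
    return max(scores, key=scores.get)


# Hypothetical toy training data.
docs = ["earn money fast no experience", "pay fee to start earn money",
        "software engineer python experience", "data analyst sql experience"]
labels = ["fraud", "fraud", "legit", "legit"]
model, vocab = train_nb(docs, labels)
```

Because scores are sums over tokens, reordering the words of an advertisement (for example "money earn" versus "earn money") cannot change the prediction.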

Logistic Regression Method. Logistic regression [23] fits a logistic model to a set of data, relating multiple independent variables to the probability that an event occurs. An advantage of logistic regression is that the independent variables can be continuous or discrete and need not follow a normal distribution. Logistic regression is well suited to binary classification, and detecting fraudulent recruitment advertisements is a typical binary text classification problem. The logistic regression classifier is simple and easy to understand, and the model is highly interpretable. It does not need to assume a data distribution in advance and directly models the probability of each class, which avoids errors caused by an inaccurate distributional assumption. In addition, only one weight per feature dimension needs to be stored, so memory consumption is small.

TextCNN Method. Convolutional neural network (CNN) [24] is a feedforward neural network that recognizes two-dimensional images with invariance to scaling (enlargement and shrinkage) and translation. In recent years, CNNs have mostly been used for image processing and classification. Kim [25] first applied CNN to text classification and proposed the TextCNN model. Figure 2 shows the structure of TextCNN. TextCNN and CNN are very similar in design. The difference is that CNN uses convolution kernels of equal width and height when processing images, whereas the width of a TextCNN convolution kernel equals the word vector dimension. Consequently, CNN performs a two-dimensional convolution at the convolution layer when it processes images, while TextCNN performs a one-dimensional convolution along the word sequence when it processes text.
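The one-dimensional convolution and max-over-time pooling described above can be sketched in pure Python. Each kernel spans the full word-vector width, so it slides only along the word axis; kernels with different window sizes produce feature maps that are max-pooled into one value each. The tiny embeddings, hand-set kernel weights, and function names below are hypothetical; in a real TextCNN the kernels are learned, and the pooled features feed a fully connected output layer.

```python
def conv1d_full_width(embeddings, kernel, bias=0.0):
    """Slide a (window x emb_dim) kernel down the sentence with ReLU activation.

    The kernel covers the whole word vector, so it moves only along the word axis (1-D convolution).
    """
    window = len(kernel)
    outputs = []
    for start in range(len(embeddings) - window + 1):
        s = bias
        for k in range(window):
            s += sum(a * b for a, b in zip(kernel[k], embeddings[start + k]))
        outputs.append(max(0.0, s))  # ReLU
    return outputs


def textcnn_features(embeddings, filters):
    """filters: list of (kernel, bias) pairs, possibly with different window sizes.

    Returns one max-over-time pooled value per filter.
    """
    feats = []
    for kernel, bias in filters:
        fmap = conv1d_full_width(embeddings, kernel, bias)
        feats.append(max(fmap) if fmap else 0.0)  # sentence shorter than the window -> 0.0
    return feats


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]   # 4 words, 2-dim embeddings
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),                # window size 2
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0),   # window size 3
]
features = textcnn_features(sentence, filters)      # [2.0, 3.0]
```

Max pooling makes the output length independent of sentence length, so advertisements of different lengths map to a fixed-size feature vector.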
3.6. LSTM Method. Long short-term memory (LSTM) [26] is a special kind of recurrent neural network (RNN) [27]. Its network structure is the same as that of a general RNN, except that the summation unit in the hidden layer is replaced by a memory module. Figure 3 shows the structure of the LSTM model. Through its "gate" design, LSTM can strengthen or weaken the information in the cell state, so it can learn long-term dependencies, effectively overcoming the vanishing-gradient defect of traditional RNNs.
GRU [28] and Bi-LSTM [29] are the two most classical variants of LSTM. GRU was proposed to address long-term memory and the gradient problems of backpropagation. Bi-LSTM overcomes the limitation that an LSTM can only pass information from front to back and not from back to front: a forward LSTM and a backward LSTM together capture the context on both sides, which effectively improves the model's performance.
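The gating described above can be written as one LSTM time step, and a Bi-LSTM simply runs one LSTM forward and another backward over the sequence and concatenates their final hidden states. The pure-Python sketch below follows the standard LSTM formulation with input, forget, and output gates; the parameter layout `params[gate] = (W, U, b)` and the function names are hypothetical, and GRU (which merges gates and has no separate cell state) is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params[g] = (W, U, b) for gates 'i', 'f', 'o' and candidate 'g'."""
    def pre(g):
        W, U, b = params[g]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h_prev), b)]
    i = [sigmoid(z) for z in pre("i")]      # input gate: how much new content to write
    f = [sigmoid(z) for z in pre("f")]      # forget gate: how much old cell state to keep
    o = [sigmoid(z) for z in pre("o")]      # output gate: how much cell state to expose
    g = [math.tanh(z) for z in pre("g")]    # candidate cell content
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def bilstm_encode(xs, fwd_params, bwd_params, hidden):
    """Run one LSTM left-to-right and one right-to-left; concatenate the final hidden states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in xs:
        h, c = lstm_step(x, h, c, fwd_params)
    hb, cb = [0.0] * hidden, [0.0] * hidden
    for x in reversed(xs):
        hb, cb = lstm_step(x, hb, cb, bwd_params)
    return h + hb


# Hypothetical 1-unit cell whose forget gate is held open and input gate held shut:
# the cell state is carried over almost unchanged, which is how long-term information survives.
params = {
    "i": ([[0.0]], [[0.0]], [-20.0]),
    "f": ([[0.0]], [[0.0]], [20.0]),
    "o": ([[0.0]], [[0.0]], [0.0]),
    "g": ([[0.0]], [[0.0]], [0.0]),
}
h, c = lstm_step([1.0], [0.0], [2.0], params)
```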

Experiments and Results
In this paper, we have designed two sets of experiments. The first set of experiments is to verify the performance of each algorithm on the balanced dataset. The second set of experiments is conducted to verify the influence of the dataset imbalance ratio on the experimental performances. The following subsections illustrate the data and experimental setting details, the evaluation metrics, and the experimental results.

Data and Experimental Settings.
This paper uses EMSCAD, which contains 17880 real-life recruitment advertisements from 2012 to 2014, labeled as legitimate (17014) or fraudulent (866). The dataset description is shown in Table 1. We construct five datasets from the original dataset by downsampling: one balanced dataset and four unbalanced datasets. Table 2 gives detailed information on these datasets; its last column indicates the set of experiments in which each dataset is used. Because many fields contain missing values, we select company profile, description, requirements, and benefits, the fields mainly used in the existing literature, as text features. Before the experiments, we clean the data by removing punctuation and stop words and handling missing values. Seven algorithms are compared: random forest (abbr. RF), logistic regression (abbr. LR), SVM, and Naïve Bayes (abbr. NB), which are the four typical traditional learning algorithms adopted in the recruitment advertisement detection literature [3, 8-12], and the three deep learning methods GRU, Bi-LSTM, and TextCNN.
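The preprocessing and dataset construction steps above can be sketched as follows. This is a pure-Python illustration under stated assumptions: each advertisement is a dict whose keys `company_profile`, `description`, `requirements`, `benefits`, and `fraudulent` follow the column names of the public Kaggle release of EMSCAD; the stop-word list is a tiny hypothetical subset; and `legit_per_fraud` is a hypothetical parameter, since the actual ratios are those listed in Table 2.

```python
import random
import re

TEXT_FIELDS = ("company_profile", "description", "requirements", "benefits")
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "with", "is", "are"}  # illustrative subset


def clean_text(record):
    """Join the four text fields, treat missing values as empty, drop punctuation and stop words."""
    text = " ".join(record.get(f) or "" for f in TEXT_FIELDS)
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def downsample(records, legit_per_fraud, seed=0):
    """Keep every fraudulent ad and sample legitimate ads so that legit:fraud = legit_per_fraud:1."""
    rng = random.Random(seed)
    fraud = [r for r in records if r["fraudulent"] == 1]
    legit = [r for r in records if r["fraudulent"] == 0]
    n_legit = min(len(legit), int(round(legit_per_fraud * len(fraud))))
    subset = fraud + rng.sample(legit, n_legit)
    rng.shuffle(subset)
    return subset


# Hypothetical synthetic records: 10 fraudulent and 100 legitimate ads.
records = ([{"fraudulent": 1, "description": "f"} for _ in range(10)]
           + [{"fraudulent": 0, "description": "l"} for _ in range(100)])
balanced = downsample(records, 1)      # 10 fraudulent + 10 legitimate
unbalanced = downsample(records, 4)    # 10 fraudulent + 40 legitimate
```

Downsampling the majority class keeps all minority (fraudulent) examples, which are scarce in EMSCAD, and varying `legit_per_fraud` produces datasets of increasing imbalance from the same source.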

Evaluation Metrics. Accuracy, precision, recall, and F1-score are adopted as evaluation metrics. To define them, we first introduce the confusion matrix. Table 3 is a binary confusion matrix, in which TP is the number of actually positive samples classified as positive, FP is the number of actually negative samples classified as positive, FN is the number of actually positive samples classified as negative, and TN is the number of actually negative samples classified as negative.
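The four metrics follow directly from the confusion-matrix counts. Below is a small pure-Python sketch that treats the fraudulent class (label 1) as positive, which is an assumption since the paper does not state which class is positive, and returns 0 when a denominator is zero (also an assumption).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Three fraudulent ads (1) and five legitimate ads (0); one of each class is misclassified.
example = metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
# accuracy 0.75; precision, recall, and F1-score all 2/3
```

On a 1:9 unbalanced test set, a classifier that labels every advertisement legitimate reaches 0.9 accuracy yet 0 recall, which is why recall and F1-score matter alongside accuracy when the datasets become unbalanced.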
Accuracy, precision, recall, and F1-score are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Table 4 lists the results of the experiments on the balanced dataset. Table 4 shows that all the traditional learning methods achieve good results (A, P, R, and F are all greater than 0.88), with random forest performing best among them. After TF-IDF is used for feature extraction, the results of all the traditional learning + feature extraction methods except Naïve Bayes improve (A, P, R, and F are all greater than 0.9). In particular, SVM+TF-IDF achieves the best performance. TF-IDF measures the importance of a word by its frequency in a document, discounted by the number of documents that contain it. These results show that using TF-IDF for feature extraction plays a significant part in enhancing the effectiveness of the methods. Among the deep learning methods, TextCNN performs best (A, P, R, and F are all greater than 0.93). We attribute this to TextCNN's use of convolution kernels of three different sizes over the word embeddings. Compared with random forest and SVM+TF-IDF, TextCNN improves performance by (A: 3.1%, P: 3.2%, R: 3%, F: 3%) and (A: 0.2%, P: 0.1%, R: 0.3%, F: 0.2%), respectively. After Word2Vec and GloVe are used for word embedding, GRU, Bi-LSTM, and TextCNN all perform worse than before. We believe this is because the two pretrained models were trained on specific corpora whose writing style differs from that of the EMSCAD dataset.

Table 5 lists the results of the experiments on the four unbalanced datasets. Table 5 shows that when the datasets are only slightly unbalanced, i.e., unbalanced dataset-1 and unbalanced dataset-2, the results are roughly the same as those on the balanced dataset: random forest performs best among the traditional learning methods, and SVM+TF-IDF performs best among the traditional learning + feature extraction methods.
Similarly, TextCNN in the deep learning category performs best compared with the above methods. On unbalanced dataset-1, compared with random forest and SVM+TF-IDF, TextCNN improves performance by (A: 3.1%, P: 2.3%, R: 3.8%, F: 3.3%) and (A: 3%, P: 2.7%, R: 3.3%, F: 3.2%), respectively. On unbalanced dataset-2, compared with random forest and SVM+TF-IDF, TextCNN improves performance by (A: 3.2%, P: 1.2%, R: 6.8%, F: 4.8%) and (A: 2.4%, P: 0.8%, R: 5%, F: 3.5%), respectively. For the deep learning + pretraining methods, after Word2Vec and GloVe are used for word embedding, the performance of GRU and Bi-LSTM decreases significantly, while that of TextCNN decreases only slightly. We suspect that these results are also caused by the two pretrained models having been trained on corpora whose writing style differs from that of the EMSCAD dataset.

Results and Analysis of the Second Set of Experiments.
On unbalanced dataset-3 and unbalanced dataset-4, although the imbalance ratio increases, TextCNN maintains its advantage and still achieves the best detection performance among all the methods. As the datasets become more unbalanced, the benefit of TF-IDF on accuracy, precision, recall, and F1-score gradually decreases, and the recall and F1-score of the traditional learning methods and the traditional learning + feature extraction methods drop significantly, whereas TextCNN remains the best, which shows that TextCNN is robust even when the dataset is highly unbalanced. We attribute this to TextCNN's use of multiple convolution kernels of different sizes, which enrich the semantic representation of the documents. In conclusion, TextCNN has advantages on both balanced and unbalanced datasets and is therefore better suited to real-life data.

Analysis of Energy Consumption.
Since the goal of green IoT is to achieve good results in an environmentally friendly manner (i.e., with less computation), two aspects of computing consumption should be considered: training time and testing time. For IoT devices, testing time is even more critical than the training time needed to obtain a good model, because devices with shorter testing times respond faster and support better human-computer interaction. Therefore, to further compare the methods, this section analyzes the experimental results in terms of training time and testing time, taking testing time as the primary consideration. Figures 4 and 5 show the training and testing times on the balanced dataset. Two figures are drawn because the training times of the traditional and deep learning methods differ by orders of magnitude.
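The paper does not describe how the times were measured. One common way, sketched here as an assumption, is to wrap training and per-sample prediction with Python's monotonic `time.perf_counter()` clock. `fit` and `predict` are hypothetical callables standing in for any of the compared models, and the majority-class model is only a placeholder.

```python
import time
from collections import Counter


def time_model(fit, predict, X_train, y_train, X_test):
    """Return (training time in s, mean testing time per sample in s)."""
    start = time.perf_counter()
    model = fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for x in X_test:          # one prediction per test sample, like a device answering requests
        predict(model, x)
    per_sample_seconds = (time.perf_counter() - start) / len(X_test)
    return train_seconds, per_sample_seconds


# Placeholder model: always predicts the majority training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


train_s, test_s = time_model(fit_majority, predict_majority, [[0]] * 3, [1, 1, 0], [[0]] * 5)
```

Single timings on a shared machine are noisy, so repeating each measurement and reporting a median gives more stable comparisons.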
As can be seen from Figure 4(a), among the four traditional learning methods (see the left group of bars in Figure 4(a)), Naïve Bayes has the shortest training time and SVM the longest. After feature extraction (see the right group of bars in Figure 4(a)), the training time of every method except logistic regression, i.e., RF, SVM, and NB, increases. Compared with the traditional learning methods, the training time of the deep learning-based detection methods rises to a much higher order of magnitude, as shown in Figure 5(a), because deep neural networks take much longer to train. Except for GRU, the training time of the deep learning methods (Bi-LSTM and TextCNN) increases after pretrained models are used for word embedding (see the right group of bars in Figure 5(a)), and the training time of TextCNN increases most significantly. From Figure 4(b), we can see that almost all the methods need less than 0.05 s per test, except SVM and SVM+TF-IDF. We believe SVM and SVM+TF-IDF take longer because evaluating the decision function of the learned hyperplane can require comparing each test sample with many support vectors. In particular, logistic regression has the shortest testing time, which means it is the most responsive method for IoT devices. After TF-IDF is used for feature extraction (see the right group of bars in Figure 4(b)), the testing time of NB and LR decreases, but that of RF and SVM does not. From Figure 5(b), we can see that among the deep learning methods, TextCNN has a shorter testing time than GRU and Bi-LSTM. With Word2Vec and GloVe, the testing times of GRU, Bi-LSTM, and TextCNN all increase. It is worth noting that the response time of TextCNN is about 0.6 s per test, which is slightly higher than that of the traditional learning methods.
Comparing these results with those obtained by the traditional learning methods and the traditional learning + feature extraction methods in Table 4 and Table 5, we can see that TextCNN achieves the best detection performance in accuracy, precision, recall, and F1-score. Moreover, comparing the results
The experimental results indicate that the deep learning methods are generally better than the traditional learning methods, the traditional learning + feature extraction methods, and even the deep learning + pretraining-based methods, on both balanced and unbalanced datasets. In particular, TextCNN outperforms the other deep learning methods. In terms of time performance, although TextCNN needs a considerably longer offline training time, its testing time (i.e., response time) is only slightly higher than that of the traditional learning-based methods. These results suggest that TextCNN can be used to detect real-life fraudulent recruitment advertisements in the IoT environment.
In summary, using unified evaluation metrics and datasets and considering the impact of imbalance ratios make the comparison and analysis of fraudulent recruitment advertisement detection methods more systematic and comprehensive. Therefore, this paper can help researchers systematically understand the methods for detecting fraudulent recruitment advertisements and provide directions for selecting and exploring suitable methods. The experimental results can serve as a reference for further research on higher-performance recruitment advertisement detection methods.
Building on this paper, we plan to collect our own employment fraud detection dataset and to study a fraudulent recruitment advertisement detection method with higher performance and lower energy consumption to help achieve the goal of green IoT. Ensembling high-performance methods such as TextCNN, LSTM, or other popular deep learning methods with techniques such as attention and masking mechanisms is a promising direction. Deeper pretrained models for word embeddings are also an interesting direction.

Data Availability
The data we used is available and can be obtained from the author (202082060057@sdust.edu.cn).

Conflicts of Interest
The authors declare that they have no conflicts of interest.